=Paper= {{Paper |id=None |storemode=property |title=Optimal Feature Selection for Context-Aware Recommendation Using Differential Relaxation |pdfUrl=https://ceur-ws.org/Vol-889/paper1.pdf |volume=Vol-889 }}
Optimal Feature Selection for Context-Aware Recommendation using Differential Relaxation

Yong Zheng, Robin Burke, Bamshad Mobasher
Center for Web Intelligence
School of Computing, DePaul University
Chicago, Illinois, USA
{yzheng8,rburke,mobasher}@cs.depaul.edu


ABSTRACT
Research in context-aware recommender systems (CARS) usually requires the identification of the influential contextual variables in advance. In collaborative recommendation, there is a substantial trade-off between applying context very strictly and achieving good coverage and accuracy. Our prior work showed that this trade-off can be managed by applying the contexts differentially in different components of the recommendation algorithm. In this paper, we extend our previous model and show that our differential context relaxation (DCR) model can also be used to identify demographic and item features that are linked to the contexts. We also demonstrate the application of binary particle swarm optimization as a scalable optimization technique for deriving the optimal relaxation.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval

General Terms
Algorithms, Modeling, Optimization

Keywords
Context-aware recommender systems, collaborative filtering, differential context relaxation, particle swarm optimization

CARS-2012, September 9, 2012, Dublin, Ireland.
Copyright is held by the author/owner(s).

1. INTRODUCTION
Context-aware recommender systems (CARS) extend the focus of recommender systems research beyond users and items to the contexts in which items are experienced by users, attempting to model the interactions that are often influential in making good recommendations. For example, Alice may prefer casual restaurants with spicy Thai food when dining with friends, but would rather choose a quiet but upscale Italian place when entertaining work colleagues. Clearly, the occasion (work vs. personal life) and the companions (friends vs. co-workers) play a key role in what a system should recommend. The recognition of the importance of such contextual factors is a starting point for much CARS research.

What we see in this example, however, is that these contextual factors (occasion, companions) are linked to specific features associated with the restaurant itself: the type of food, the atmosphere, perhaps the price. To the extent that such linkages are shared among many users, we call these context-linked features. When contextual information is limited, we may be able to observe context-linked features more readily, and so discovering influential context-linked features is an important goal for CARS research.

In our previous research [16], we developed a context-aware collaborative recommendation approach based on differential context relaxation (DCR). The differential aspect of the algorithm is that we segment the recommendation algorithm into different components and apply different aspects of the context to each part. Each component of the algorithm operates on the same collaborative user-item-rating data, but uses only those parts that match its particular context. The context relaxation part of the algorithm arises because we treat the context as a set of constraints controlling what data is available to a given component. Finding the optimum set of contextual features then becomes a matter of finding a relaxation of these contextual constraints so that a balance between accuracy and coverage is achieved.

In this paper, we show that context-linked features are also good candidates for our relaxation technique. Previously, we used a data set with only a handful of contextual variables, so it was possible to do an exhaustive search of all possible relaxations in order to find the optimum. In this work, we show that binary particle swarm optimization (BPSO) [10] can be applied to find optimal relaxations with a larger set of context and context-linked variables.

2. RELATED WORK
CARS researchers have sought different means of discovering influential contextual variables. Baltrunas et al. [2] conducted a survey asking subjects to evaluate the importance of contextual factors in order to acquire contextual relevances. Others [15, 7] applied feature selection or reduction techniques to extract influential contexts from a set of candidate variables. However, each of these techniques has its drawbacks: the technique in [2] requires a lot of user effort, and the feature reduction techniques are not reliable unless the data set is dense across contexts – that is, items have been rated multiple times in different contexts.

Compared with the explicit contextual approaches above, there are also approaches incorporating latent factors: previously we applied labeled-LDA to infer contexts based on review mining [6]. A. Karatzoglou et al. [8] introduced multiverse recommendation based on an N-dimensional tensor factorization model, and L. Baltrunas et al. [3] present a novel context-aware recommender based on matrix factorizations. In this paper, we mainly focus on the explicit contextual features we already know, and we will incorporate
latent factor models in the future.

Rather than seeking to define a set of contextual variables to be applied to the algorithm as a whole, our previous work [16] sought to decompose collaborative recommendation into components and apply different aspects of the context to each. We think of the context as a constraint governing what data each component considers, and we seek the strongest constraint consistent with low error and acceptable coverage. Note that this concept of constraining the data used by different parts of the recommendation algorithm is quite different from constraint-based recommendation, in which constraints are applied to the recommended items themselves [5].

3. METHODOLOGY
We take as our starting point Resnick's well-known algorithm for collaborative recommendation:

P_{a,i} = \bar{r}_a + \frac{\sum_{u \in N} (r_{u,i} - \bar{r}_u) \times sim(a,u)}{\sum_{u \in N} sim(a,u)}    (1)

where a is a user, i is an item, and N is a neighborhood of other users u similar to a. The algorithm calculates P_{a,i}, which is the predicted rating that user a is expected to assign to item i.

The algorithm can be understood as operating over the rating data in three different ways. \bar{r}_a averages over all of the ratings supplied by user a to establish an overall baseline for the target user. The formation of the neighborhood N requires selecting users for their similarity to a. And, finally, \bar{r}_u is computed for each user u to establish their baseline rating, to be subtracted from their individual rating of item i.

Contextual effects may enter into the algorithm in various ways. Perhaps a user has a different baseline rating in different contexts. For example, Alice, our hypothetical diner, may have relatively low expectations for her casual Thai restaurants and give a lot of high scores, but give out relatively few five-star ratings for her work-oriented experiences. Perhaps good collaborative peer groups are context-specific – recommendations from expense-account-wielding business types might be good for her work dinners but not for anything else.

To account for this kind of variance, we start with the target context C, which is the context for which the recommendation is sought. Our prediction now is for target user a, item i, and context C: how will the user respond to this item in this context? For example, how will Alice like a fancy regional American restaurant in the "work" context with Betsy, her boss, and Carl, an out-of-town client? It is unlikely that other users will have dined with exactly the same individuals before, which is why we want to think of controlling recommendation based on constraints derived from C as opposed to C itself. For example, we may want to generalize Alice's dining occasion as a small work event, which would exclude peer experiences where the meal was a large banquet.

From the target context, we derive three sets of contextual constraints, C1, C2, and C3, and apply them to these components separately, yielding the following context-sensitive formulation:

P_{a,i,C} = \bar{r}_{a,C_3} + \frac{\sum_{u \in N_{C_1}} (r_{u,i,C_2} - \bar{r}_{u,C_2}) \times sim(a,u)}{\sum_{u \in N_{C_1}} sim(a,u)}    (2)

This version of the equation replaces N with N_{C_1}, meaning that we only consider neighbors who have rated item i in a context matching C1. Instead of averaging over all ratings when normalizing peer ratings, we select only those ratings in contexts matching C2, and we do a similar thing for the target user by matching against C3.

In our previous work, we were able to show significant improvement in accuracy for hotel recommendation in a travel data set (http://www.tripadvisor.com/) with the following constraints: C1 (neighbor filtering) using "state of origin", C2 (peer baseline) using "trip type" (e.g. business, family, etc.), and no constraints on the target user baseline, C3 = {}. Trip type is known as a typical influential context [6], and we found that state of origin was another. For more details, see [16]. What is significant here is that each of the three components of the algorithm worked best with its input data filtered in a different context-related way. We derived this result by evaluating all possible sets of constraints on all of the data and finding the optimum combination.

3.1 Optimization of Constraint Search
We model the relaxation as a process of binary selection – we represent C1, C2, and C3 as binary vectors where a bit value of one denotes that we filter the data using that feature. In extending this work to other data sets and adding context-linked features, it became clear that an optimization approach based on exhaustive search was not scalable.

To some extent, regularities in the data and dependencies between the variables can shrink the combinatorial space. For example, in our travel data set, the origin city was fixed for each user, meaning that whether we restricted on it or ignored it, the computation was the same when computing a user's baseline. So, that variable only became relevant for the neighborhood calculation. There were other similar simplifications for the other components, and we imagine that this would be true of most recommendation domains.

Even with such simplifications, we require a more scalable optimization technique than exhaustive search. Furthermore, the optimization space is highly non-linear, and standard approaches such as gradient descent cannot be used. However, we have had success with the particle swarm optimization technique.

3.2 Particle Swarm Optimization
Particle swarm optimization (PSO) [9] is a kind of swarm intelligence originally introduced by Eberhart and Kennedy in 1995. It is a population-based optimization approach inspired by the social behaviors of swarming and flocking creatures like bees, birds, or fish. It has recently been introduced into information retrieval [4] and recommender systems [1, 14] as a technique for feature selection and feature weighting. Binary particle swarm optimization (BPSO) is a discrete binary version of the technique, introduced in 1997, which fits the binary context and feature relaxation selections used in this paper.

The basic idea behind PSO is that optimization is performed by a number of particles at different points in an optimization space. Each particle searches independently for the optimum, guided by its local value and by communication with the other particles. In our case, we use RMSE as the value to be minimized, and a position in the space corresponds to a set of constraints. Each particle i records its personal best performance (pBest) and the corresponding best position (P_{ibest}). The algorithm also keeps track of the global best performance (gBest) and the corresponding position (P_{gbest}).

In each iteration of the algorithm, each particle moves through the optimization space as a function of its velocity. The velocity combines an inertia term with terms derived from pBest and gBest. The inertia is gradually decreased over time so that the particles become more influenced by the previously-located minima.
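Before detailing the update formulas, it is worth making the search target concrete: the constrained prediction of Equation 2, over the bit-vector relaxations of Section 3.1, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the data layout (each user mapped to a list of (item, rating, context) tuples) and the helper names are assumptions, and the numerator simply reuses the neighbor's C1-matching rating of item i rather than applying a separate C2 filter to it.

```python
def matches(ctx, target_ctx, constraint):
    """A rating's context satisfies a constraint (a set of selected
    contextual features) iff it agrees with the target context on
    every selected feature; an empty constraint matches everything."""
    return all(ctx.get(f) == target_ctx.get(f) for f in constraint)

def mean(xs):
    return sum(xs) / len(xs) if xs else None

def dcr_predict(a, i, target_ctx, ratings, sim, C1, C2, C3):
    """Differentially constrained Resnick prediction (Equation 2).
    C1 filters the neighborhood, C2 the peer baselines, and C3 the
    target user's baseline."""
    # Target-user baseline r̄_{a,C3}.
    r_a = mean([r for (_, r, ctx) in ratings[a]
                if matches(ctx, target_ctx, C3)])
    num = den = 0.0
    for u in ratings:
        if u == a:
            continue
        # Neighborhood N_{C1}: u must have rated i in a matching context.
        r_ui = [r for (it, r, ctx) in ratings[u]
                if it == i and matches(ctx, target_ctx, C1)]
        if not r_ui:
            continue
        # Peer baseline r̄_{u,C2} over contextually matching ratings.
        r_u = mean([r for (_, r, ctx) in ratings[u]
                    if matches(ctx, target_ctx, C2)])
        if r_u is None:
            continue
        s = sim(a, u)
        num += (r_ui[0] - r_u) * s
        den += s
    if r_a is None or den == 0:
        return None  # no coverage under these constraints
    return r_a + num / den
```

Note that with C2 = C3 = {} and a non-empty C1, this reduces to contextual pre-filtering on the neighborhood, one of the comparison conditions used later in the experiments.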
The update formulas for V_{ij} (the velocity of the jth bit in particle i) and the global inertia w are as follows:

V_{ij,t} = w_t \times V_{ij,t-1} + \alpha_1 \varphi_1 \times (P^j_{ibest} - X_{ij,t-1}) + \alpha_2 \varphi_2 \times (P^j_{gbest} - X_{ij,t-1})    (3)

where \varphi_1 and \varphi_2 are positive random numbers drawn from a uniform distribution between 0.0 and 1.0, and \alpha_1 and \alpha_2 are constant learning factors (set to 2.0 following empirical suggestions [13]) which control the weight given to the local and the global minima, respectively. X_{ij,t} is the particle's position (the value of the jth bit in particle i) at iteration t.

Usually the velocity is restricted to the range [-V_{max}, V_{max}]. It cannot be too fast or too slow – a small velocity results in only minor changes in position, while a large velocity drives the particle to move too far in the space. V_{max} is suggested to be the maximum value of each bit in the position vector, which in our case is 1 because each value switches between 0 and 1. Further experiments showed that 2.0 is a better configuration.

The inertia at a given iteration t is calculated by:

w_t = w_{end} + \frac{T_{max} - t}{T_{max}} \times (w_{start} - w_{end})    (4)

where T_{max} denotes the maximum number of iterations. The designer specifies a starting and ending weight with w_{start} > w_{end}. A large inertia weight facilitates a global search, while a small value facilitates a local search. The linearly decreasing value means that the algorithm begins with the particles being more influenced by their current direction and ends with them being more influenced by their neighbors. We followed the empirical parameter configuration w_{start} = 0.9 and w_{end} = 0.4 suggested in PSO research [13].

The velocity is used to update the value (position) X_{ij} of each bit in the binary vector. Basically, the higher the velocity, the greater the probability of switching to 1:

X_{ij,t} = \begin{cases} 1, & \text{if } rand_t < S(V_{ij,t}) \\ 0, & \text{otherwise} \end{cases}    (5)

where S(V_{ij,t}) = \frac{1}{1 + \exp(-V_{ij,t})} is a sigmoid function that squashes the velocity into the range [0,1], and rand_t is a uniform random number in the range [0,1].

In our application, we use from 1 to 5 particles, initialized with random velocities and random bit vectors representing possible constraints. In each iteration, every particle runs our algorithm across the training set, compares the RMSE it achieves in the current iteration with the best values so far (pBest and gBest), and updates them along with the corresponding positions. The new position for the next iteration is then generated based on the equations above. The ability of the particles to communicate within the swarm reduces the total number of iterations required to reach an optimum.

3.2.1 Improved BPSO
Whether each bit value in the binary vector of each particle is 1 or 0 depends on the velocity – the larger the velocity, the more likely the bit is to switch to 1. The weakness of classical BPSO is thus that it only considers the possibility of a change to 1, and does not take the possibility of a change to 0 into account. Mojtaba et al. [11] improved BPSO by taking the possibility of a change to 0 into consideration. Suppose the best position visited so far by a particle is P_{ibest} and the global best position is P_{gbest}, and assume the jth bit of the ith particle's best position is 1. To guide the jth bit of particle i toward its best position, the velocity of change to 1 for that bit should be increased and the velocity of change to 0 should be decreased. The improvement can then be described as follows:

If P^j_{ibest} = 1 then d^1_{ij,1} = \alpha_1 r_1 and d^0_{ij,1} = -\alpha_1 r_1
If P^j_{ibest} = 0 then d^0_{ij,1} = \alpha_1 r_1 and d^1_{ij,1} = -\alpha_1 r_1
If P^j_{gbest} = 1 then d^1_{ij,2} = \alpha_2 r_2 and d^0_{ij,2} = -\alpha_2 r_2
If P^j_{gbest} = 0 then d^0_{ij,2} = \alpha_2 r_2 and d^1_{ij,2} = -\alpha_2 r_2

where d^1_{ij} and d^0_{ij} are temporary values and r_1 and r_2 are two random variables in the range (0,1). \alpha_1 and \alpha_2 are the same learning factors. With this in mind, we define two separate velocities, one in the "1" direction, V^1, and one in the "0" direction, V^0:

V^1_{ij} = w V^1_{ij} + d^1_{ij,1} + d^1_{ij,2}    (6)

V^0_{ij} = w V^0_{ij} + d^0_{ij,1} + d^0_{ij,2}    (7)

We choose V^1_{ij} or V^0_{ij} as the velocity, depending on the current position of the particle for that bit:

V_{ij,t} = \begin{cases} V^1_{ij}, & \text{if } X_{ij,t-1} = 0 \\ V^0_{ij}, & \text{if } X_{ij,t-1} = 1 \end{cases}    (8)

Finally, the bit value is updated as follows:

X_{ij,t} = \begin{cases} \bar{X}_{ij,t-1}, & \text{if } rand_t < S(V_{ij,t}) \\ X_{ij,t-1}, & \text{otherwise} \end{cases}    (9)

where \bar{X}_{ij,t-1} denotes the complement of X_{ij,t-1}; that is, if X_{ij,t-1} = 0, then \bar{X}_{ij,t-1} = 1, and otherwise \bar{X}_{ij,t-1} = 0. Our experiments show that this improved version can find the optimum more efficiently than classical BPSO.

4. EXPERIMENTAL SETUP
For this paper, we use a rating data set in the area of food, the "AIST context-aware food preference dataset", which was used and distributed by the author Hideki Asoh [12]. The data set was generated using a survey of 212 users asking them to rate items on a menu. There were two different conditions in the survey. In one condition, "real hunger", users were asked to rate the items based on their current degree of hunger. In the "virtual hunger" condition, they were asked to imagine a degree of hunger different from their current state and asked what their preferences would be in that state. Asoh and his colleagues collected 6,360 ratings for 20 food menus, and also included demographic and item features in the data. Each user supplied 30 ratings.

For our exploration of contextual recommendation, we use 6 variables. The key contextual variable is degree of hunger (real or virtual), but we also wanted to explore context-linked features, including gender and various item features related to the food that were available in the data. Food genre indicates whether the food is Chinese, Japanese, or Western; food stuff denotes whether the food category is vegetable, pork, beef, or fish; and food style contains values that represent the style of food preparation.

The data set is split into five folds, where each user has at least one rating in each fold. For each fold, we used DCR to find the optimal combinations of context and context-linked features. We compare this technique to collaborative filtering without contexts and to contextual pre-filtering, which is similar to Equation 2 where C2 and C3 are null and we only constrain the neighbor selection.

In order to determine the value of context-linked features, we created two additional test conditions, looking at the contextual feature alone (degree of hunger) and using the contextual feature together with the context-linked features (the demographic and item features discussed above).
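The improved BPSO search used in these experiments (Section 3.2.1, Equations 4 and 6-9) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the fitness function here is a toy stand-in for running DCR over the training folds and returning RMSE, and the parameter defaults follow the values reported above (alpha1 = alpha2 = 2.0, w_start = 0.9, w_end = 0.4, V_max = 2.0).

```python
import math
import random

def improved_bpso(fitness, n_bits, n_particles=5, t_max=100,
                  alpha1=2.0, alpha2=2.0, w_start=0.9, w_end=0.4,
                  v_max=2.0):
    """Improved BPSO: keeps separate per-bit velocities toward 1 (V1)
    and toward 0 (V0), and minimizes `fitness` over {0,1} vectors."""
    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    X = [[random.randint(0, 1) for _ in range(n_bits)]
         for _ in range(n_particles)]
    V1 = [[random.uniform(-1, 1) for _ in range(n_bits)]
          for _ in range(n_particles)]
    V0 = [[random.uniform(-1, 1) for _ in range(n_bits)]
          for _ in range(n_particles)]
    p_best = [x[:] for x in X]
    p_best_fit = [fitness(x) for x in X]
    g = min(range(n_particles), key=p_best_fit.__getitem__)
    g_best, g_best_fit = p_best[g][:], p_best_fit[g]

    for t in range(t_max):
        w = w_end + (t_max - t) / t_max * (w_start - w_end)  # Eq. (4)
        for i in range(n_particles):
            for j in range(n_bits):
                r1, r2 = random.random(), random.random()
                # d-terms of Eqs. (6)-(7); each term toward 0 is the
                # negation of the corresponding term toward 1.
                d1 = (alpha1 * r1 if p_best[i][j] == 1 else -alpha1 * r1) \
                   + (alpha2 * r2 if g_best[j] == 1 else -alpha2 * r2)
                V1[i][j] = max(-v_max, min(v_max, w * V1[i][j] + d1))
                V0[i][j] = max(-v_max, min(v_max, w * V0[i][j] - d1))
                v = V1[i][j] if X[i][j] == 0 else V0[i][j]   # Eq. (8)
                if random.random() < sigmoid(v):             # Eq. (9)
                    X[i][j] = 1 - X[i][j]                    # complement
            f = fitness(X[i])
            if f < p_best_fit[i]:
                p_best[i], p_best_fit[i] = X[i][:], f
                if f < g_best_fit:
                    g_best, g_best_fit = X[i][:], f
    return g_best, g_best_fit
```

As a toy usage, minimizing the Hamming distance to a fixed target bit string (in place of the RMSE of a relaxation) exercises the same search:

```python
target = [1, 0, 1, 1, 0, 0, 1, 0]
best, fit = improved_bpso(lambda b: sum(x != t for x, t in zip(b, target)), 8)
```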
In order to assess the effectiveness of our particle swarm technique, we performed an exhaustive search of the constraint space (8,192 combinations) and compared this result with the one found by the optimization technique.

5. EXPERIMENTAL RESULTS
Recall that we want to discover influential contexts and context-linked features, and our method for doing so is to find the optimal set of constraints and see what features are contained therein. Our results show that, on this data set, appropriate relaxation of contexts and features can help improve predictive performance, and that incorporating context-linked features outperforms using contexts only.

5.1 Predictive Performance
The experimental results are shown in Figure 1 and Table 1. We use CO to denote contextual features (degree of hunger) and CL to denote the other demographic and item features that we suspect may be context-linked. These are the results achieved by exhaustive search; the optimization results are discussed below.

[Figure 1: Comparison of Predictive Performance. RMSE (bars, roughly 1.08-1.215) and coverage (line, roughly 97%-100%) for Standard CF, Pre-filtering (CL), Pre-filtering (CO), Pre-filtering (CO+CL), DCR (CL), DCR (CO), and DCR (CO+CL).]

As the figure shows, the contextual pre-filtering and our DCR model can make improvements compared with the standard CF

5.2 Analysis of the Optimal Constraints
Table 1 shows how the different conditions yield different optimal constraints. For pre-filtering and DCR using contextual features only, real hunger is influential. When context-linked features are added, real hunger is still influential, but it is applied to different components in the DCR model. Context-linked features can provide accuracy improvements, but they have to be applied to the appropriate components: in the best DCR model, which uses all of the algorithm components, the pre-filtering part of the algorithm retains only "gender", and the hunger-related features are applied elsewhere.

Context-linked features (gender, food genre, food stuff, and food style) are selected and applied in the optimal model. The position of contextual constraints in the optimal relaxation is also telling. "Gender" turns out to be the best feature for C1, indicating that same-sex peer neighborhoods work best in this domain. Also, we see a fully restricted constraint: all contexts and food features appear in C2, which is where we choose the ratings to be used in constructing the peer user's baseline rating. This confirms the assumption made by CARS: users' preferences differ from context to context, so a neighbor's exactly matched contextual ratings contribute significantly to predicting the user's rating on the same item under the same contexts. The optimal model took real hunger as the only constraint in the third component, which implies that the user's average rating under the same real-hunger situation works best as the representative of this component.

5.3 Optimization Performance
It is clear that differential context relaxation is useful both for improving algorithm performance and for discovering crucial features. However, exhaustive search is not a practical or scalable way to realize this goal, although it is possible on this data set.

We used this data set as an opportunity to experiment with BPSO as an alternative method of finding the optimal context relaxation. We examined particle counts from 1 to 5, using a maximum of 100 iterations. Classical BPSO became trapped in a local minimum: it converged to an RMSE (1.116) not quite as good as the global minimum found by exhaustive search (1.114). The improved BPSO algorithm does find the global minimum RMSE and the same set
without incorporating contexts, and the DCR model achieved the                                                                of relaxations as the exhaustive search. Figure 2 shows the learn-
best RMSE among these algorithms. Besides, DCR shows better                                                                   ing curve of the improved version, where N -BPSO denotes we use
accuracy improvement when context-linked features are added, but                                                              number of N particles in the swarm. Improved BPSO finds the
it does not work for contextual pre-filtering. We believe the ef-                                                                                              1 BPSO         3 BPSO            5 BPSO
fects shown by adding context-linked features are domain-specific.
                                                                                                                                           1.210
And those features should be applied to appropriate components                                                                             1.200
in the model, where DCR using CL only and DCR using CO and                                                                                 1.190
CL together provide significant improvements in terms of RMSE.                                                                             1.180
But pre-filtering restricts the application of context-linked features                                                                     1.170
                                                                                                                                    RMSE




                                                                                                                                           1.160
for neighborhood filtering only. – it may not provide accuracy im-
                                                                                                                                           1.150
provements; instead, more restrict constraints may bring lower cov-                                                                        1.140
erage as shown in the figure above.                                                                                                        1.130
    We also examined coverage. When the dataset is sparse, intro-                                                                          1.120
ducing contextual constraints can greatly limit coverage, an effect                                                                        1.110
                                                                                                                                           1.100
that we found in our previous research. With this dataset, there was
                                                                                                                                                   1   6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96
little cost in coverage – the best coverage is 99.7% by standard CF                                                                                                      Number of Iterations
model, while the best DCR model shows a 99.3% coverage. Notice
that when contexts are introduced into the 1st algorithm component                                                                                           Figure 2: Improved BPSO
(neighbor filtering), the coverage may depend on which variables
are selected as the relaxed ones for this component, as well as the                                                           optimal solutions on average around the 18th iteration using one
density of contextual information in the data set. From Figure 1 and                                                          particle, and 12th iteration using five particles. 18 iterations with 1
Table 1, we can see the coverage of pre-filtering using both context                                                          particles corresponds to 18 evaluations over the training set, a more
and context-linked features is the lowest, because real hunger and                                                            than 455× reduction in computational cost. Increasing the number
gender together as more strict constraints are selected and applied                                                           of particles expands the power of social contributions by other par-
in this component.                                                                                                            ticles, and can reduce the number of iterations, but also increases
                                                       Table 1: Optimal Relaxations
 Dataset: Food Menu         RMSE      Coverage      Optimal Relaxation
 Standard CF                1.203      99.7%        N/A
 Pre-filtering (CL)         1.202      99.3%        C1 = {gender}
 Pre-filtering (CO)         1.198      99.6%        C1 = {real hunger}
 Pre-filtering (CO+CL)      1.204      98.3%        C1 = {real hunger, gender}
 DCR (CL)                   1.163      99.7%        C1 = {}; C2 = {food genre, food stuff, food style}; C3 = {food genre, food stuff, food style}
 DCR (CO)                   1.119      99.6%        C1 = {real hunger}; C2 = {}; C3 = {real hunger}
                                                    C1 = {gender}; C2 = {real hunger, virtual hunger, food genre, food stuff, food style};
 DCR (CO+CL)                 1.114      99.3%
                                                    C3 = {real hunger}


the amount of computation per iteration. The selection of the num-             the 10th international conference on Electronic commerce,
ber of particles and the number of iterations depends on the specific          ICEC ’08, pages 3:1–3:10, New York, USA, 2008. ACM.
cost and computational complexity of the algorithm.                        [6] N. Hariri, B. Mobasher, R. Burke, and Y. Zheng.
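To make the search concrete, the sketch below implements classical BPSO in the style of Kennedy and Eberhart [9, 10], not the improved variant [11] used in our experiments. Each particle is a bit vector in which a bit switches one candidate contextual constraint on or off; `toy_rmse` and its `TARGET` optimum are hypothetical stand-ins for the real evaluation of DCR's RMSE over the training set, invented for illustration.

```python
import math
import random

def bpso(fitness, n_bits, n_particles=3, n_iters=100,
         w=0.8, c1=2.0, c2=2.0, seed=7):
    """Classical binary PSO minimizing `fitness` over bit vectors."""
    rng = random.Random(seed)
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    # Random initial positions (constraint subsets) and zero velocities.
    xs = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vs = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                 # each particle's best position
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]   # swarm's best position
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + cognitive pull + social pull.
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                # Binary position update: bit set with probability sigmoid(v).
                xs[i][d] = 1 if rng.random() < sig(vs[i][d]) else 0
            f = fitness(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f

# Hypothetical fitness: pretend the best relaxation keeps only bits 0 and 3
# (say, real hunger in C1 and C3), with RMSE 1.114 at the optimum and a
# penalty of 0.01 per wrongly set constraint. A real fitness function would
# retrain/evaluate DCR under the chosen relaxation instead.
TARGET = (1, 0, 0, 1, 0, 0)

def toy_rmse(bits):
    return 1.114 + 0.01 * sum(b != t for b, t in zip(bits, TARGET))

best, best_rmse = bpso(toy_rmse, n_bits=len(TARGET))
```

With n candidate constraints, exhaustive search costs 2^n evaluations of the training set, while BPSO costs at most (number of particles) × (iterations until convergence), which is the source of the savings reported above.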
6. CONCLUSIONS
   Differential context relaxation is a technique for context-aware recommendation that decomposes the recommendation algorithm into components and filters the data used in each component. The context is treated as a set of constraints, which may be relaxed to achieve the best recommendation performance.
   In this work, we examine an optimization technique, binary particle swarm optimization, to efficiently locate the highest-performing set of contextual constraints. We show that BPSO can find the globally optimal constraint set with a fraction of the iterations required by exhaustive search.
   We also show that both contextual features and context-linked features can be useful in our context relaxation model. Furthermore, our algorithm highlights the different roles that contextual features play, producing some interesting results, such as the fact that same-sex neighbor populations work best in our food data set.
   Future research will focus on three directions: 1) exploring a real-valued (as opposed to binary) definition of differential context relaxation; 2) applying the same technique to additional algorithms beyond user-based collaborative filtering; and 3) combining the model with latent factor approaches, such as matrix factorization, in order to alleviate the sparsity problem of contextual ratings and examine our differential model in another way.

7. REFERENCES
 [1] A. Abdelwahab, H. Sekiya, I. Matsuba, Y. Horiuchi, and S. Kuroiwa. Feature optimization approach for improving the collaborative filtering performance using particle swarm optimization. Journal of Computational Information Systems, 8(1):435–450, 2012.
 [2] L. Baltrunas, B. Ludwig, S. Peer, and F. Ricci. Context relevance assessment and exploitation in mobile recommender systems. Personal and Ubiquitous Computing, pages 1–20, 2011.
 [3] L. Baltrunas, B. Ludwig, and F. Ricci. Matrix factorization techniques for context aware recommendation. In Proceedings of the fifth ACM conference on Recommender systems, pages 301–304. ACM, 2011.
 [4] H. Drias. Web information retrieval using particle swarm optimization based approaches. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011 IEEE/WIC/ACM International Conference on, volume 1, pages 36–39. IEEE, 2011.
 [5] A. Felfernig and R. Burke. Constraint-based recommender systems: technologies and research issues. In Proceedings of the 10th international conference on Electronic commerce, ICEC '08, pages 3:1–3:10, New York, USA, 2008. ACM.
 [6] N. Hariri, B. Mobasher, R. Burke, and Y. Zheng. Context-aware recommendation based on review mining. In Proceedings of the 9th Workshop on Intelligent Techniques for Web Personalization and Recommender Systems (ITWP 2011), page 30, 2011.
 [7] Z. Huang, X. Lu, and H. Duan. Context-aware recommendation using rough set model and collaborative filtering. Artificial Intelligence Review, pages 1–15, 2011.
 [8] A. Karatzoglou, X. Amatriain, L. Baltrunas, and N. Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the fourth ACM conference on Recommender systems, pages 79–86. ACM, 2010.
 [9] J. Kennedy and R. Eberhart. Particle swarm optimization. In Neural Networks, 1995. Proceedings., IEEE International Conference on, volume 4, pages 1942–1948. IEEE, 1995.
[10] J. Kennedy and R. Eberhart. A discrete binary version of the particle swarm algorithm. In Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation., 1997 IEEE International Conference on, volume 5, pages 4104–4108. IEEE, 1997.
[11] M. Khanesar, M. Teshnehlab, and M. Shoorehdeli. A novel binary particle swarm optimization. In Control & Automation, 2007. MED'07. Mediterranean Conference on, pages 1–6. IEEE, 2007.
[12] C. Ono, Y. Takishima, Y. Motomura, and H. Asoh. Context-aware preference model based on a study of difference between real and supposed situation data. User Modeling, Adaptation, and Personalization, pages 102–113, 2009.
[13] Y. Shi and R. Eberhart. Empirical study of particle swarm optimization. In Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on, volume 3. IEEE, 1999.
[14] S. Ujjin and P. Bentley. Particle swarm optimization recommender system. In IEEE Swarm Intelligence Symposium, pages 124–131. IEEE, 2003.
[15] B. Vargas-Govea, G. González-Serna, and R. Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In ACM RecSys'11, the 3rd Workshop on Context-Aware Recommender Systems (CARS-2011), 2011.
[16] Y. Zheng, R. Burke, and B. Mobasher. Differential context relaxation for context-aware travel recommendation. In 13th International Conference on Electronic Commerce and Web Technologies (EC-WEB 2012), volume 123 of Lecture Notes in Business Information Processing, pages 88–99. Springer Berlin Heidelberg, 2012.