Context-Aware Recommendation Based On Review Mining

Negar Hariri, Bamshad Mobasher, Robin Burke and Yong Zheng
DePaul University, College of Computing and Digital Media
243 S. Wabash Ave, Chicago, IL 60604, USA
{nhariri, mobasher, burke, yzheng8}@cs.depaul.edu

Abstract

Recommender systems are important building blocks in many of today's e-commerce applications, including targeted advertising, personalized marketing and information retrieval. In recent years, the importance of contextual information has motivated many researchers to focus on designing systems that produce personalized recommendations in accordance with the available contextual information of users. Compared to traditional systems that mainly utilize users' preference history, context-aware recommender systems provide more relevant results to users. We introduce a context-aware recommender system that obtains contextual information by mining user reviews and combines it with user rating history to compute a utility function over a set of items. An item's utility is a measure of how much the item is preferred according to the user's current context. In our system, context inference is modeled as a supervised topic-modeling problem in which the set of categories for a contextual attribute constitutes the topic set. As an example application, we used our method to mine hidden contextual data from customers' reviews of hotels and used it to produce context-aware recommendations. Our evaluations suggest that our system can help produce better recommendations in comparison to a standard kNN recommender system.

1 Introduction

In recent years, recommender systems (RS) have been extensively used in various domains to recommend items of interest to users based on their profiles. A user's profile is a reflection of the user's previous selections and preferences, and can be captured as rating scores given to different items in the system. Using preference data, different systems have been developed to produce personalized recommendations based on collaborative filtering, content-based filtering or a hybrid approach.

Despite the broad usage of such recommender systems, failure to consider users' current situations may result in considerable performance degradation in recommendations. For example, a customer who has once bought a toy for his friend's child may repeatedly receive suggestions to buy items related to kids, as the recommendation algorithm decides based on the whole history in the user's profile without prioritizing his current interests. To address this issue, the notion of context and context-aware recommender systems (CARS) has been introduced.

Contextual information can be explicit or implicit and can be inferred in different ways, such as using GPS sensor data, clickstream analysis or monitoring user rating behavior. In this paper, we concentrate on deriving context from a textual description of a user's current state and the item features in which he/she is interested. This data can come in different forms, such as tweets, blog posts or review texts, or it can be given directly to the system as part of a query.

As an example application of our approach, we have used our method to mine hidden contextual data from customers' reviews of hotels. The reason behind the selection of this dataset is that users usually provide contextual cues in their comments. For example, they may mention that they are with family or on a business trip, or they may express their opinions about the hotel services that are important to them, such as having wireless internet, conference rooms, etc. In order to evaluate our method, we have used the "Trip Advisor" hotel reviews dataset, where each review contains an overall rating, an optional review comment and a "trip type" attribute that shows the types of trips the user suggests for the hotel. For this attribute, the user can select a subset of five possible values: Family, Couples, Solo travel, Business, and Friends' getaway. The "trip type" attribute is not a feature of the user or the hotel (as different users may assign different values); rather, it is related to the interaction, and it is assumed to be an indication of context in our system.

Our approach to inferring context is based on a classifier that is trained on samples of descriptions and their corresponding contexts. Usually the trip type that a customer picks for a hotel is related to his review. Under this assumption, a set of review texts and their associated trip types are selected as the training set for the context classifier. After training, for a given description (as the user context) the classifier computes the probability of each trip category. This probability distribution is used to infer context. Since we are dealing with a multi-class supervised classification problem, we chose Labeled-LDA [1] as our categorization method, as our experiments showed it performs better on our dataset in comparison to other similar methods.

We propose a method that uses this inferred context to produce context-aware recommendations. While most existing approaches assume that a user's rating behavior depends on the current context and predict a rating function, we differentiate between the "rating" that a user gives to an item and the "utility" he gains from choosing it. The inferred context is used to define a utility function over the items, reflecting how much each item is preferred by a user given his current context. More specifically, the utility value depends on two factors: the predicted rating and the "context score", where the context score represents the suitability of an item for a user in a given context. The rating can be predicted by any conventional recommendation algorithm such as kNN.

Through the rest of this paper, we first review some of the related work. Section 3 describes our proposed context-aware recommendation process. Finally, section 4 presents the evaluation of the proposed method and its comparison with a traditional recommender.

2 Related Work

Several researchers have previously investigated the use of contextual information in various applications of recommender systems. Although there is no clear-cut definition of context, one of the most commonly used definitions was suggested by Abowd et al. [2] as follows: "Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves." This is a general definition that limits context only to the information that could be used to characterize the situation or the circumstances. Another similar definition, by Lieberman et al. [3], is: "context can be considered to be everything that affects computation except the explicit input and output". In addition to these general definitions, a number of more specific definitions of context have recently been provided. For example, "Context can be described by a vector of context attributes, e.g. time, location or currently available network bandwidth in a mobile scenario" [4].

Capturing and representing context in a system depends on the way context is defined in that system. Dourish [5] presented two different views of modeling context: the representational view and the interactional view. In the representational view, context is defined as a form of information that is stable, delineable and separate from activity. Under this view, context can be defined and represented as a specific set of attributes of the environment within which the user's interaction with the system has taken place. For example, time and location can be considered contextual attributes. In the interactional view, it is assumed that contextuality is a relational property that holds between objects and activities, rather than being information (as in the representational view). Also, the contextual features are not definable and static; rather, their scope is defined dynamically. Furthermore, rather than assuming that context defines the situation within which an activity occurs, it is assumed that context arises from activity and activity is induced by context. Therefore, even though context is not observable itself, the activity that arises from the context can be observed.

Adomavicius et al. [6] suggest three different architectures for context-aware recommender systems. In the contextual pre-filtering approach, the dataset is first filtered; recommendations are then produced from the contextualized dataset. The contextual post-filtering approach, on the other hand, generates recommendations as a traditional recommender system would and then filters and re-ranks these recommendations to provide contextual recommendations. In contextual modeling, context is added to the problem as an additional dimension, meaning that in contrast to traditional recommender systems that estimate the rating function in the two-dimensional space of User × Item, the context-aware recommender system is defined over the space of User × Item × Context. The representation of context and the way it should be captured and integrated into the recommendation algorithm depend on the available contextual information as well as the definition of context in the system.

An interesting application of context-aware recommender systems is in mobile devices that are equipped with GPS or have internet access. In this case, different contextual information can be captured in real time to be used in the recommendation process. For example, PioApp Recommender [4] produces recommendations based on points of interest (such as restaurants, museums and train stations) in the neighborhood of the mobile user. The social camera introduced in [7] assists users in picking photo compositions given their current location and scene context. Many mobile travel applications such as [8–10] have also taken advantage of context in order to make better suggestions. Numerous algorithms have also been suggested for music and movie recommendation (as well as many other domains). Micro-profiling, introduced in [11], splits each single user profile into several possibly overlapping sub-profiles, each of which represents the user's preference in a particular context. A context random walk algorithm was proposed in [12] to model the user's movie browsing behavior and then use it to make context-aware recommendations.

Some of the above-mentioned approaches, such as [8, 9], use a simple representational view of context where context is shown as a set of attributes (such as time, location, weather conditions) that is given to the system as input, while some other systems try to infer the contextual attributes from the user's behavior. Instead of using a representational model, the context-aware recommender in [13] uses an interactional model. The proposed system was inspired by the human memory model in psychology, where short term and long term memories are separately modeled. The short term memory contains the user preferences derived from his active interaction with the system, while the long term memory stores the preference models related to his previous interactions with the system. They introduced three types of contextual cues, including collaborative, semantic and behavioral cues, in order to retrieve relevant preference models from the long term memory. The retrieved memory objects are then combined with the user's current preference model to generate and aggregate a final preference model that is used to produce recommendations.
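To make the contrast between the pre- and post-filtering architectures of Adomavicius et al. [6] concrete, the following minimal sketch applies a context-free top-k recommender before and after a context filter. This is our illustration, not code from any of the surveyed systems; the function names and toy rating store are hypothetical.

```python
# Illustrative contrast of contextual pre- vs post-filtering (our sketch; all
# names and the toy data are hypothetical, not from any surveyed system).
from collections import defaultdict

def recommend(ratings, k=2):
    """Context-free baseline: rank items by their average rating."""
    by_item = defaultdict(list)
    for (item, _context), r in ratings.items():
        by_item[item].append(r)
    return sorted(by_item, key=lambda i: -sum(by_item[i]) / len(by_item[i]))[:k]

def pre_filter(ratings, context, k=2):
    """Contextual pre-filtering: contextualize the dataset first, then recommend."""
    subset = {key: r for key, r in ratings.items() if key[1] == context}
    return recommend(subset, k)

def post_filter(ratings, context, fits, k=2):
    """Contextual post-filtering: recommend as usual, then filter/re-rank."""
    return [i for i in recommend(ratings, k=len(ratings)) if fits(i, context)][:k]

ratings = {("hotel_a", "business"): 5, ("hotel_a", "family"): 2,
           ("hotel_b", "business"): 3, ("hotel_b", "family"): 4}
print(pre_filter(ratings, "business"))  # ranks using business-trip ratings only
```

Note how the same baseline ranks the hotels differently once the dataset is restricted to a single trip context; contextual modeling, the third architecture, would instead make `context` an input of the rating function itself.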
In this paper we propose a method for mining contextual data from textual reviews. The importance of the hidden data in review comments has been the subject of much research in the areas of opinion mining and sentiment analysis. In opinion analysis, various natural language processing and text analysis methods are applied to a set of reviews to extract the attributes of the object that are referred to in the review text and to discover the polarity (positive, negative or neutral) of the expressed opinions.

The problem of extracting contextual information from unstructured text is fairly new and has not been extensively addressed in prior research. Aciar [14] introduces a method to identify review sentences which contain contextual information. In that approach, rule sets were created to classify sentences into contextual and preference categories, where the preference category groups sentences that include the user's evaluation of the features. The approach presented in [14] does not discuss the use of the retrieved information in the recommendation process, while we provide a way of incorporating the contextual knowledge into producing the recommendations.

3 Context-Aware Recommendation Process

Our context-aware recommender system (CARS) includes several components. The first component is the context miner, which is responsible for determining a user's current context. Context is represented as a distribution function over the set of trip types and can be mined from a textual description of a user's current situation and the features that are important to him. The main part of the context inference module consists of a multi-class supervised classifier. After training the classifier, context can be inferred for a given query. An example query is shown in Table 1. Based on the underlined words, it seems that the user is most probably looking for a "couples" or "family" type trip rather than a "business" one.

    I'm planning a romantic trip for my anniversary. I'm
    looking for an all inclusive resort near a beach. I ex-
    pect the hotel room to be spacious, have a nice view
    over the sea and to be nicely decorated.

Table 1: A sample query

The second component of our system is the rating predictor, a simple collaborative filtering recommender which predicts ratings of items. This component can be replaced with other types of rating prediction algorithms. The third component calculates the utility function based on the user's current context and the predicted rating, and presents a set of suggestions ordered by their utility values.

3.1 Context Representation

Contextual recommender systems can have either an interactional or a representational view of the context. In this paper, we assume there are explicit labels representing context, and the contextual information is obtained for each textual review by mapping it to this label set.

In our experiments, a dataset containing a set of hotel reviews from the Trip Advisor website has been used. In this dataset, the "trip type" attribute assigned to a hotel review shows the types of trips that the user suggests for the hotel. The attribute can be selected by the user from a set of five possible choices: Family, Couples, Solo travel, Business, and Friends' getaway. We assume that this element is the representation of context in our system. A sample review from this dataset is depicted in Table 2. This sample shows the relationship between the trip type attribute and the review comment. For example, "budget accommodation", "twin bedroom", "small" and "shared bathroom" can be more related to a Friends' getaway trip than to a business trip or a family travel.

    Trip type   Friends' getaway
    Review      This is an excellent option for budget accommodation
    Comment     in a hostel type establishment in a top class location,
                very close to central station and quick bus journey to
                circular quay. Stayed in twin bedroom which was very
                small but did the trick. If all you want is a clean bed
                in a clean room then this is grand. Shared bathroom and
                showering facilities were kept clean too.
    Summary     Excellent hostel accommodation in great location
    Quote

Table 2: A sample review comment and the associated trip type

Producing context-aware recommendations requires mining the user's current context. If the user explicitly specifies his context, it can easily be used in the recommendation algorithm. On the other hand, if he implies his context in a set of sentences describing his current state or his desired features for the hotel, then an inference method is required to determine the probability of each trip type. In this way, the context is shown as a distribution over the set of trip categories. In both cases, let Context_u^i denote the context of user u when using item i. For example, if the reviewer u indicates the trip type for hotel i as business and solo travel, then the context representation is Context_u^i = {P(family) = 0, P(couples) = 0, P(solo travel) = 0.5, P(business) = 0.5, P(friends' getaway) = 0}.

The context inference problem just described is similar to a multi-labeled text classification problem in which documents can be classified into one or more categories. The general solution is to provide a training set, build a model and use the model to categorize new documents. If the trip type categories assigned to each review are assumed to be related to the review comment (and we will show they are related in our dataset), then we can use a set of review comments and their corresponding trip type values as the training set for the classifier.
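The explicit-label case above (the Context_u^i example) can be sketched in a few lines; the helper name and data layout are our own illustration, not the paper's code.

```python
# Sketch of the explicit-label case of Section 3.1 (our helper, not the paper's
# code): the ticked trip types become a uniform distribution Context_u^i.
TRIP_TYPES = ["family", "couples", "solo travel", "business", "friends' getaway"]

def context_from_labels(selected):
    """Uniform probability over the trip types the reviewer selected, zero elsewhere."""
    weight = 1.0 / len(selected)
    return {t: (weight if t in selected else 0.0) for t in TRIP_TYPES}

# The paper's example: reviewer u marks hotel i as "business" and "solo travel".
ctx = context_from_labels({"business", "solo travel"})
# ctx == {'family': 0.0, 'couples': 0.0, 'solo travel': 0.5,
#         'business': 0.5, "friends' getaway": 0.0}
```

In the implicit case, the same dictionary would instead hold the per-category probabilities produced by the trained classifier.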
3.2 Inferring the Context

Different techniques have been used in text categorization, such as probabilistic methods, regression modeling and SVM classification. In this article, we have used Labeled Latent Dirichlet Allocation [1] (L-LDA), as it has been shown to perform relatively well on our dataset. This method is a supervised classification algorithm for multi-labeled text corpora and is based on topic modeling.

Topic modeling and Labeled-LDA

Topic modeling deals with the statistical modeling of documents in order to discover the latent topics behind them. Probabilistic latent semantic analysis (PLSA) [15] is one of the early approaches in this area, modeling a document as a probability distribution over the set of topics.

Later, Latent Dirichlet Allocation [16], known as LDA, was proposed as an extension of PLSA. LDA specifies a generative process for creating documents, based on the idea that documents are mixtures of topics, where a topic is a probability distribution over words. To generate a new document d, first the distribution over topics, denoted by θ(d), is specified. For each word in the document a topic t is selected based on θ(d). Let φ(t) denote the multinomial distribution over words for topic t; according to this distribution a word is picked and added to the document. It should be noted that this is similar to the general procedure followed by most existing topic models, except that the statistical assumptions differ from model to model. The LDA model assumes that the topic mixture θ is a k-dimensional random variable distributed as follows [16]:

    P(θ|α) = [Γ(Σ_{i=1}^k α_i) / Π_{i=1}^k Γ(α_i)] · θ_1^{α_1 − 1} · · · θ_k^{α_k − 1}    (1)

where α is a k-vector with elements α_i > 0 and Γ(x) is the gamma function. Figure 1 shows the graphical representation of LDA, where the rectangles denote replicates. The outer rectangle represents the M documents, while the inner rectangle illustrates the process of sampling words for a document of size N. In the LDA model, the document size follows a Poisson distribution. In corpora with a large vocabulary, it is likely that some words do not appear in the training examples. In order to cope with this problem, a smoothing strategy is used by placing a Dirichlet prior with parameter β on φ, as shown in the figure.

Figure 1: Graphical Representation of LDA [16]

In our problem, the user reviews are taken to be the documents, and the topics behind these documents are the set of possible values for the trip type. As the topics are predefined, we need to adopt a supervised topic modeling approach. Several variations of LDA have been proposed to support supervised learning, such as [1, 17, 18], among which we chose Labeled-LDA [1], as the other methods limit each document to be associated with only one topic while in our case reviews can have multiple labels. Similar to LDA, in Labeled-LDA each word in the document is assigned a single topic. However, in order to incorporate supervision, the topic must belong to the label set of the document. In other words, there is a one-to-one relationship between the set of labels assigned to the documents and the topics, and the topic mixture of each document is formed according to its label set. Figure 2 shows the graphical representation of Labeled-LDA. With k unique labels across all documents, the parameter Λ for each document is a k-dimensional binary vector that indicates the presence or absence of each topic in the document's label set. For each document, Λ is generated by a Bernoulli coin toss with a prior probability vector η.

Figure 2: Graphical Representation of Labeled-LDA [1]

As in [1], we used Gibbs sampling [19] for training. Let C^WT and C^DT represent two matrices containing word-topic counts and document-topic counts, respectively. Gibbs sampling begins by randomly assigning words to topics and filling the two matrices accordingly, then iteratively updates them to finally converge to estimates of θ and φ. At each iteration, a word token is selected, its current topic assignment is removed, and C^WT and C^DT are updated by decrementing the entries corresponding to the removed topic assignment. Then a new topic is sampled based on the topic assignments of all other words, and the count matrices are incremented accordingly. After convergence, estimates of θ and φ can be obtained using equations 2 and 3, respectively:

    θ_j^(d) = (C_dj^DT + α) / (Σ_{k=1}^T C_dk^DT + Tα)    (2)

    φ_i^(j) = (C_ij^WT + β) / (Σ_{k=1}^W C_kj^WT + Wβ)    (3)
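As an illustration of this final estimation step (equations 2 and 3), the following pure-Python sketch recovers θ and φ from toy count matrices after the sampler has converged. The variable names and toy counts are our own assumptions.

```python
# Sketch of the closing step of the Gibbs sampler: equations (2) and (3),
# recovering theta and phi from the count matrices C_DT (documents x topics)
# and C_WT (words x topics). Our illustration; names and toy counts are assumed.

def estimate_theta(C_DT, alpha):
    """Eq. (2): smoothed per-document topic mixtures."""
    T = len(C_DT[0])                    # number of topics
    theta = []
    for row in C_DT:                    # one row per document d
        denom = sum(row) + T * alpha
        theta.append([(c + alpha) / denom for c in row])
    return theta

def estimate_phi(C_WT, beta):
    """Eq. (3): smoothed per-topic word distributions."""
    W = len(C_WT)                       # vocabulary size
    phi = []
    for col in zip(*C_WT):              # one column per topic j
        denom = sum(col) + W * beta
        phi.append([(c + beta) / denom for c in col])
    return phi

# Toy counts: 2 documents x 3 topics, 4 vocabulary words x 3 topics.
theta = estimate_theta([[3, 1, 0], [0, 2, 2]], alpha=0.5)
phi = estimate_phi([[2, 0, 1], [1, 1, 0], [0, 2, 1], [1, 0, 0]], beta=0.01)
```

The α and β terms implement the Dirichlet smoothing described above: even a topic never observed in a document (or a word never observed under a topic) retains a small nonzero probability.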
3.3 Predicting Item Utility

As noted earlier, we make a distinction between predicting rating and predicting utility. We assume that the utility of an item for a user may differ among different contexts, even if the user has rated the item equally in those contexts. For instance, in the hotel review dataset it is possible that the rating given by a customer to a hotel on a business trip would not change if he visited the same hotel again with his family, while the utility of selecting that hotel changes from one trip type to the other. When he is on a business trip, the business services of the hotel are more important, while on a family trip other characteristics of the hotel (such as having a pool, distance to the beach, etc.) gain more priority.

We define context score as a measure of the suitability of an item for a user in a given context. To calculate the context score for user u and item i, we need to predict the context that u would assign to i, denoted by predictedContext(u, i). The predicted context is then compared to the current context of u (which can be inferred). We use a collaborative approach for calculating the context of a (user, item) pair. The similarity between two items i and j is computed using the cosine similarity as follows:

    contextualSimilarity(i, j) = commonLabels(i, j) / sqrt(|labels(i)| × |labels(j)|)    (4)

where commonLabels(i, j) is the number of times users assign the same trip type category to both i and j, and labels(i) counts the number of trip type labels given to i by all users. This similarity is used to obtain a neighborhood for item i by selecting the top N most similar items. Then, the predicted context can be computed as in equation 5: the probability of each trip category in the predicted context is calculated by taking the weighted average of its probabilities in the neighbors' contexts.

    predictedContext(u, i) = Σ_{k ∈ Neighbors(i)} context_u^k · contextualSimilarity(k, i) / Σ_{k ∈ Neighbors(i)} |contextualSimilarity(k, i)|    (5)

where context_u^k stands for the context of neighbor k given by user u.

Our notion of predicted context for a (user, item) pair is somewhat similar to the idea of "best context" introduced in [20] for music recommendation. The authors define this concept as the contextual information most suited for a particular item. They use a vector representation of context where each dimension corresponds to a contextual attribute; if the user believes that context is suitable for the specific item, the value of the corresponding dimension is set to one. They propose four different approaches for the prediction of the best context. The first method is based on averaging the context vectors of the item across all users. Another technique is to find the K-nearest neighbors of the user (based on rating history) and compute the predicted context as the weighted average of the contexts assigned to that item by his neighbors. The other two methods follow the same approach except that the similarity of users is computed based on the context vectors, independent of their rating history. Our method differs from these approaches in various respects: the above methods focus on predicting the suitable context for a (user, item) pair, while we address the whole process of context-aware recommendation; in other words, predicting the best context is just one part of our context-aware algorithm. Moreover, our method for the calculation of contextual similarity, and for the prediction of the best context, is different from the previous techniques.

The context score of item i for user u can be estimated by comparing the distribution of the inferred context of u with the predicted context for this item. We tried three different methods, namely Chebyshev similarity [21], Kullback-Leibler similarity [22] and simple cosine similarity, and chose cosine similarity for our evaluations as it performs better on our dataset. Let IC_u denote the inferred context for user u and PC_u^i the predicted context (calculated based on equation 5). The context score for item i and user u is computed as follows:

    contextScore(u, i) = (IC_u · PC_u^i) / (||IC_u|| ||PC_u^i||)    (6)

The utility score of item i for user u is calculated as a function of both the context score of i and the predicted rating of the item. In our experiments, standard item-based kNN was used to calculate the predicted ratings.

    utility(u, i) = α · predictedRating(u, i) + (1 − α) · contextScore(u, i)    (7)

In equation 7, α is a constant representing the weight of the predicted rating in the utility function. The items are sorted by utility value and the top N items are suggested to the user.

4 Evaluation

The evaluations presented in this paper were performed on the Trip Advisor dataset, which contains 12558 reviews for 8941 hotels made by 1071 reviewers. About 9500 of the reviews have the "trip type" label, which has been used as an indication of context.

Our system consists of two main parts, and the experiments have been designed accordingly. The first experiment focuses on assessing the accuracy of the context inference module on our dataset. In the second experiment, the performance of the recommender system is compared with a standard kNN recommender.

4.1 Context Inference Evaluation

The accuracy of the context inference algorithm plays a significant role in the performance of the system. As previously explained, we used Labeled-LDA as it has been shown to perform relatively better than other multi-labeled text classification methods. In this experiment we assess its performance on our dataset. The experiment was set up as a five-fold cross validation. In each of the five runs, one of the folds was used for testing while the topic model was built on the remaining four folds. For every test case (i.e., review text), the probability distribution over the trip type categories was predicted. A category is assigned to a test case if the predicted probability for that category exceeds a certain threshold.

The results are evaluated by measuring both precision and recall, where precision is computed as the fraction of predicted categorical labels that are correct, and recall as the ratio of correct labels to the total number of labels. Figures 3 and 4 depict recall and precision values for the different categories. As shown, precision tends to be higher as the threshold increases. Also, as expected, increasing the confidence threshold causes recall to decrease.

Figure 3: Recall values for different categories

Figure 4: Precision values for different categories
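Before turning to the recommendation experiments, the scoring pipeline of Section 3.3 (equations 4–7) can be sketched end-to-end. The function names mirror the paper's notation, while the toy contexts, similarity weights and α value are our own illustration.

```python
# End-to-end sketch of equations (4)-(7); toy data and alpha are our assumptions.
import math

def contextual_similarity(common, n_labels_i, n_labels_j):
    """Eq. (4): cosine-style co-assignment similarity between items i and j."""
    return common / math.sqrt(n_labels_i * n_labels_j)

def predicted_context(neighbor_contexts, sims):
    """Eq. (5): similarity-weighted average of the neighbors' context distributions."""
    denom = sum(abs(s) for s in sims)
    return {k: sum(c[k] * s for c, s in zip(neighbor_contexts, sims)) / denom
            for k in neighbor_contexts[0]}

def context_score(inferred, predicted):
    """Eq. (6): cosine similarity between inferred and predicted context."""
    dot = sum(inferred[k] * predicted[k] for k in inferred)
    norm = (math.sqrt(sum(v * v for v in inferred.values()))
            * math.sqrt(sum(v * v for v in predicted.values())))
    return dot / norm if norm else 0.0

def utility(pred_rating, c_score, alpha=0.5):
    """Eq. (7): blend of predicted rating and context score."""
    return alpha * pred_rating + (1 - alpha) * c_score

ic = {"business": 1.0, "family": 0.0}                       # inferred context IC_u
pc = predicted_context([{"business": 0.8, "family": 0.2},   # two neighbors of item i
                        {"business": 0.4, "family": 0.6}], sims=[0.9, 0.3])
u = utility(pred_rating=4.0, c_score=context_score(ic, pc), alpha=0.7)
```

Note that the two terms of equation 7 live on different scales (a 1–5 rating versus a 0–1 cosine), so the choice of α also absorbs that scale difference.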
The in- ferred context is used to define a utility function for the items Figure 4: Precision values for different categories reflecting how much each item is preferred by a user given his current context. The utility value for each item depends on two factors: the predicted rating and the “context score” it is shown, the precision tends to be higher as the threshold where context score represents the suitability of the item for increases. Also, as expected, by increasing the confidence a user in a given context. Rating can be predicted based on threshold, recall is likely to decrease. any conventional recommendation algorithms such as kNN. 4.2 Evaluation of Recommendations As an example application, we have used our method to mine hidden contextual data from customers’ reviews of ho- As we are working with a sparse dataset, a preprocessing tels in “Trip Advisor” dataset and used it to produce context- phase has been added to the procedure in order to prune the aware recommendations. Our evaluations indicate that using matrix by removing all those items that have less than 5 rat- the contextual information can improve the performance of ings. the recommender system in terms of hit ratio. In previous sections, we introduced a context-aware rec- ommender that produce recommendations for a user based on a utility function that depends both the user’s current context References and also the predicted rating for that item. As recommenda- [1] D. Ramge, D. Hall, R. Nallapati, and C. Manning, “La- tions are based on utility function (and not ratings alone), it is beled lda: a supervised topic model for credit attribution not logical to use metrics such as MAE and other metrics that in multi-labeled corpora,” in Proceedings of the 2009 compare the predicted rating with the actual ones. Instead, Conference on Empirical Methods in Natural Language hit ratio was chosen as our performance measure and we per- Processing, 2009. 
formed a leave-one-out cross validation experiment on those [2] G. Abowd, A. Dey, N. D. P.J. Brown, M. Smith, and reviews that have ratings greater than the reviewer’s average P. Steggles, “Towards a better understanding of con- rating. Having the recommendation size of k, the hit ratio is text and context-awareness,” Handheld and Ubiquitous calculated as the probability that the left-out item is included Computing, vol. 1707, no. 2, pp. 304–307, 1999. in the list of N recommendations. The standard item-based kNN algorithm has also been run on the same dataset and un- [3] H. Lieberman and T.Selker, “Out of context: Computer der the same condition as our recommender method. Figure systems that adapt to, and lean from, context,” IBM Sys- 5 shows the hit ratio having different sizes of recommenda- tems Journal, vol. 39, no. 3, pp. 617–632, 2000. [4] W. Woerndl and J. Schlichter, “Introducing context into classification,” Neural Information Processing Systems, recommender systems,” in Proceedings of AAAI Work- vol. 22, 2008. shop on Recommender Systems in E-Commerce, 2007, [19] D. j. S. W. R. Gilks, S. Richardson, Markov chain Monte pp. 138–140. Carlo in practice. London: Chapman & Hall, 1996. [5] P. Dourish, “What do we talk about when we talk about [20] L. Baltrunas, M. Kaminskas, F. Ricci, L. Rokach, context,” Personal and Ubiquitous Computing, vol. 8, B. Shapira, and K. Luke, “Best usage context predic- no. 1, pp. 19–30, 2004. tion for music tracks,” in Proceedings of the 2nd Work- [6] G. Adomavicius and A. Tuzhilin, “Context-aware rec- shop on Context Aware Recommender Systems, Septem- ommender systems,” in Proceedings of the 2008 ACM ber 2010. conference on Recommender Systems. ACM, 2008. [21] C. Cantrell, Modern Mathematical Methods for Physi- [7] S. Bourke, K. McCarthy, and B. Smyth, “The so- cists and Engineers. Cambridge University Press, cial camera: Recommending photo composition using 2000. contextual features,” in Proceedings of Workshop on [22] R. L. S. 
Kullback, “On information and sufficiency,” Context-Aware Recommender System. ACM, 2010. Annals of Mathematical Statistics, vol. 22, pp. 79–86, [8] K. Cheverst, N. Davies, K. Mitchell, A. Friday, and 1951. C. Efstratiou, “Developing a context-aware electronic tourist guide: some isues and experiences,” in Proceed- ings of the SIGCHI conference on Human Factors in Computing Systems, p. 17. [9] L. Ardissono, A. Goy, G. Petrone, M. Segnan, and P. Torasso, “Intrigue: personalized recommendation of tourist attractions for desktop and hand held devices,” Applied Artificial Intelligence, vol. 17, no. 8, pp. 678– 714, 2003. [10] M. V. Setten, S. Pokraev, and J. Koolwaaij, “Context- aware recommendations in the mobile tourist applica- tion compass,” in Proceedings of Third International Conference In Adaptive Hypermedia and Adaptive Web- Based Systems. Springer, August 2004. [11] L. Baltrunas and X. Amatriain, “Towards time- dependant recommendation based on implicit feed- back,” in Proceedings of Workshop on Context-Aware Recommender System. ACM, 2009. [12] T. Bogers, “Movie recommendation using random walks over the contextual graph,” in Proceedings of Workshop on Context-Aware Recommender System. ACM, 2010. [13] S. Anand and B. Mobasher, “Contextual recommenda- tion,” From Web to Social Web: Discovering and De- ploying User and Content Profiles, 2007. [14] S. Aciar, “Mining context information from consumers reviews,” in Proceedings of Workshop on Context- Aware Recommender System. ACM, 2010. [15] T. Hoffman, “Probabilistic latent semantic indexing,” in Proceedings of the 22nd annual international ACM SI- GIR conference on Research and development in infor- mation retrieval (SIGIR99), 1999. [16] D. Blei, A. Ng, and M. Jordan, “Latent dirichlet al- location,” The Journal of Machine Learning Research, vol. 3, 2003. [17] D. Blei and J. McAuliffe, “Supervised topic models,” Neural Information Processing Systems, vol. 21, 2007. [18] L. Julien, F. Sha, and M. I. 
Jordan, “Disclda: Dis- criminative learning for dimensionality reduction and