=Paper=
{{Paper
|id=None
|storemode=property
|title=Context-Dependent Recommendations with Items Splitting
|pdfUrl=https://ceur-ws.org/Vol-560/paper16.pdf
|volume=Vol-560
|dblpUrl=https://dblp.org/rec/conf/iir/BaltrunasR10
}}
==Context-Dependent Recommendations with Items Splitting==
Linas Baltrunas and Francesco Ricci, Free University of Bozen-Bolzano, Piazza Università 1, Bolzano, Italy (lbaltrunas@unibz.it, fricci@unibz.it)

Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR'10), January 27–28, 2010, Padova, Italy. http://ims.dei.unipd.it/websites/iir10/index.html Copyright owned by the authors.

ABSTRACT

Recommender systems are intelligent applications that help on-line users to tackle information overload by providing recommendations of relevant items. Collaborative Filtering (CF) is a recommendation technique that exploits users' explicit feedback on items to predict the relevance of items not yet evaluated. In classical CF, users' ratings do not specify the contextual conditions under which the item was evaluated (e.g., the time when the item was rated or the goal of the consumption). But in some domains the context can heavily influence the relevance of the item, and this must be taken into account. This paper analyzes the behavior of a technique that deals with context by generating new items restricted to a contextual situation. The rating vectors of some items are split into two vectors containing the ratings collected in two alternative contextual conditions. Hence, each split generates two fictitious items that are used in the prediction algorithm instead of the original one. We evaluated this approach on semi-synthetic data sets, measuring precision and recall while using a matrix-factorization algorithm for generating rating predictions. We compared our approach to the previously introduced reduction based method. We show that item splitting can improve system accuracy. Moreover, item splitting leads to a better recall than the reduction based approach.

1. INTRODUCTION

The Internet, interconnecting information and business services, has made available to on-line users an overabundance of information and very large product catalogues. Hence, users trying to decide what information to consult or what products to choose may be overwhelmed by the number of options. Recommender systems are intelligent applications that try to solve the information overload problem by recommending relevant items to a user [2, 11]. Here an item is usually descriptive information about a product such as a movie, a book or a place of interest. Recommender systems are personalized Information Retrieval systems where users make generic queries, such as "suggest a movie to be watched with my family tonight".

Collaborative Filtering (CF) is a recommendation technique that emulates a simple and effective social strategy called "word-of-mouth" and is now largely applied in the "social" web. For example, amazon.com recommends items that the user could be interested in buying, and delicious.com recommends links that were tagged by similar users with commonly used tags. CF recommendations are computed by leveraging historical log data of users' online behavior [12]. The relevance of an item is usually expressed and modeled by the explicit user rating: the higher the rating a user assigned to an item, the more relevant the item is for that user. CF assumes that the users' recorded ratings for items can help in predicting the ratings of like-minded users. We want to stress that this assumption is valid only to some extent. In fact, the user's general interests can be relatively stable, but the exact evaluation of an item can be influenced by many additional and varying factors. In certain domains the consumption of the same item can lead to extremely different experiences when the context changes [1, 4]. Therefore, the relevance of an item can depend on several contextual conditions. For instance, in a tourism application the visiting experience at a beach in summer is strikingly different from the same visit in winter (e.g., during a conference meeting). Here context plays the role of query refinement, i.e., a context-aware recommender system must try to retrieve the most relevant items for a user given the knowledge of the current context. However, most CF recommender systems do not distinguish between these two experiences, thus providing poor recommendations in certain situations, i.e., when the context really matters.
Context-aware recommender systems are a new area of research [1]. The classical context-aware reduction based approach [1] extended the classical CF method by adding to the standard dimensions of users and items new ones representing contextual information. Here recommendations are computed using only the ratings made in the same context as the target one. For each contextual segment, e.g., a sunny weekend, the algorithm checks (using cross-validation) whether predictions generated using only the ratings of that segment are more accurate than those generated using the full data set. The authors use a hierarchical representation of context; therefore, the exact granularity of the used context is searched (optimized) among those that improve the accuracy of the prediction. Similarly, in our approach we enrich the simple two-dimensional CF matrix with a model of the context comprising a set of features of either the user, the item, or the evaluation. We adopt the definition of context introduced by Dey, where "context is any information that can be used to characterize the situation of an entity" [8]. Here, the entity is an item consumption that can be influenced by contextual variables describing the state of the user and the item. In this paper we propose a new approach for using these contextual dimensions to pre-filter items' ratings. Actually, to be precise, the set of ratings for an item is not filtered but split into two subsets according to the value of a contextual variable, e.g., ratings collected in "winter" or in "summer" (the contextual variable is the season of the rating/evaluation). These two sets of ratings are then assigned to two new fictitious items (e.g., the beach in winter and in summer).

This paper extends the results presented in [5, 6]. Here we evaluate the same item splitting technique in a different set of experiments, namely we measure precision and recall, whereas previously we used MAE. Also, the nine semi-synthetic data sets are generated differently. Moreover, we extended our analysis by studying the behavior of item splitting with respect to various Information Gain thresholds.
2. ITEM SPLITTING

Our approach extends the traditional CF data model by assuming that each rating rui in an m × n users-items matrix is stored (tagged) together with some contextual information c(u, i) = (c1, ..., cn), cj ∈ Cj, describing the conditions under which the user experience was collected (cj is a nominal variable). The proposed method identifies items having significant differences in their ratings (see below the exact test criterion). For each one of these items, the algorithm splits its ratings into two subsets, creating two new artificial items with ratings belonging to these two subsets. The split is determined by the value of one contextual variable cj, i.e., all the ratings in a subset have been acquired in a context where the contextual feature cj took a certain value. So, for each item the algorithm looks for a contextual feature cj that can be used to split the item. It then checks whether the two subsets of ratings show some (statistically significant) difference, e.g., in the mean. If this is the case, the split is done and the original item in the ratings matrix is replaced by the two newly generated items. In the testing phase, the rating prediction for the split item is computed for one of the newly generated items. For example, assume that an item i has generated two new items i1 and i2, where i1 contains the ratings for item i acquired in the contextual condition cj = v, and i2 the ratings acquired in contexts where cj ≠ v; hence the two sets partition the original set of ratings. Now assume that the system needs to compute a rating prediction for the item i and user u in a context where cj = x. Then the prediction is computed for the item i1 if x = v, or for i2 if x ≠ v, and is returned as the prediction for i.

Figure 1 illustrates the splitting of one item. As input, the item splitting step takes an m × n rating matrix of m users and n items and outputs an m × (n + 1) matrix. The total number of ratings in the matrix does not change, but a new item is created. This step can be repeated for all the items having a significant dependency of their ratings on the value of one contextual variable. In this paper we focus on a simple application of this method where an item is split only into two items, using only one selected contextual variable. A more aggressive split of an item into several items, using a combination of features, could produce even more "specialized" items, but would potentially increase data sparsity. We note again that, for the same user and different items, one can in principle obtain ratings in different contexts, as in our context model the context depends on the rating. Therefore, items i1 and i2 could overlap, i.e., could both be rated by the same user in different contextual conditions. However, such situations are not very common.

Figure 1: Item splitting of a single item i: its ratings are divided between two new items i1 and i2, and the n-item matrix becomes an (n + 1)-item matrix.

We conjecture that the splitting could be beneficial if the ratings within each newly obtained item are more homogeneous, or if they are significantly different across the new items coming from a split. One way to accomplish this is to define an impurity criterion t [7]. So, if there are some candidate splits s ∈ S, which divide i into i1 and i2, we choose the split s that maximizes t(i, s) over all possible splits in S. A split is determined by selecting a contextual variable and a partition of its values into two sets. Thus, the space of all possible splits of item i is defined by the context model C. In this work we analyzed the tIG impurity criterion. tIG(i, s) measures the information gain (IG), also known as Kullback-Leibler divergence [10], given by s to the knowledge of the item i rating: tIG(i, s) = H(i) − (H(i1) Pi1 + H(i2) Pi2), where H(i) is the Shannon entropy of the item i rating distribution and Pi1 is the proportion of ratings that i1 receives from item i. To ensure the reliability of this statistic, we compute it only for splits that could potentially generate items each containing 4 or more ratings. Thus, the algorithm never generates items with fewer than 4 ratings in their profile.
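To make the split-selection step concrete, the following is a minimal Python sketch of the tIG criterion as described above. It is not the authors' implementation; the data layout (a list of (rating, context) pairs per item) and the function names are illustrative assumptions. A candidate split partitions the values of one nominal contextual feature into two sets, and a split is accepted only if both resulting items keep at least 4 ratings and the information gain exceeds the threshold.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(ratings):
    """Shannon entropy of a rating distribution (ratings in {1, ..., 5})."""
    counts = Counter(ratings)
    n = len(ratings)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, child1, child2):
    """t_IG(i, s) = H(i) - (H(i1) * P_i1 + H(i2) * P_i2)."""
    p1, p2 = len(child1) / len(parent), len(child2) / len(parent)
    return entropy(parent) - (entropy(child1) * p1 + entropy(child2) * p2)

def candidate_splits(values):
    """Binary partitions of the observed values of one nominal feature."""
    values = sorted(set(values))
    for r in range(1, len(values) // 2 + 1):
        for left in combinations(values, r):
            yield set(left), set(values) - set(left)

def best_split(item_ratings, min_ratings=4, ig_threshold=0.01):
    """item_ratings: list of (rating, context) pairs; context is a dict of
    nominal contextual features.  Returns (feature, left_values, gain) for
    the best accepted split, or None if no split qualifies."""
    best = None
    features = {f for _, ctx in item_ratings for f in ctx}
    for feature in features:
        observed = [ctx[feature] for _, ctx in item_ratings if feature in ctx]
        for left, right in candidate_splits(observed):
            r1 = [r for r, ctx in item_ratings if ctx.get(feature) in left]
            r2 = [r for r, ctx in item_ratings if ctx.get(feature) in right]
            if len(r1) < min_ratings or len(r2) < min_ratings:
                continue  # never generate an item with fewer than 4 ratings
            gain = information_gain([r for r, _ in item_ratings], r1, r2)
            if gain > ig_threshold and (best is None or gain > best[2]):
                best = (feature, left, gain)
    return best

# Toy example: one item whose ratings clearly depend on the "season" feature.
profile = ([(5, {"season": "summer"})] * 4 + [(4, {"season": "summer"})] * 2 +
           [(2, {"season": "winter"})] * 4 + [(1, {"season": "winter"})] * 2)
print(best_split(profile))   # -> ('season', {'summer'}, ~1.0)
```

In the toy example the ratings depend strongly on the season, so the split into a "summer" and a "winter" item is selected with an information gain of about one bit.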
3. EXPERIMENTAL EVALUATION

We tested the proposed method on nine semi-synthetic data sets with ratings in {1, 2, 3, 4, 5}. The data sets were generated from the Yahoo! Webscope movies data set (Webscope v1.0, http://research.yahoo.com/), which contains 221K ratings for 11,915 movies by 7,642 users. The semi-synthetic data sets were used to analyze item splitting when varying the influence of the context on the user ratings. The original Yahoo! data set contains user age and gender features. We used 3 age groups: users below 18 (u18), between 18 and 50 (18to50), and above 50 (a50). We modified the original Yahoo! data set by replacing the gender feature with a new artificial feature c ∈ {0, 1} that was randomly assigned the value 1 or 0 for each rating. This feature c represents a contextual condition that could affect the rating. We randomly chose α·100% of the items from the data set, and then from these items we randomly chose β·100% of the ratings to modify. We increased (decreased) the rating value by one if c = 1 (c = 0) and if the rating value was not already 5 (1). For example, if α = 0.9 and β = 0.5, the corresponding synthetic data set has 90% of altered item profiles, each containing 50% of changed ratings. We generated nine semi-synthetic data sets varying α ∈ {0.1, 0.5, 0.9} and β ∈ {0.1, 0.5, 0.9}. So, in these data sets the contextual condition influences the rating value more strongly as α and β increase.
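The perturbation procedure just described can be sketched as follows. This is an illustrative reconstruction: the data layout and names are assumptions, and the random choices are obviously not those used to build the nine published data sets.

```python
import random

def perturb(ratings, alpha, beta, seed=0):
    """ratings: dict item_id -> list of (user_id, rating) pairs.
    Tags every rating with a random binary feature c and, for beta*100% of
    the ratings of alpha*100% of the items, shifts the rating by +1 when
    c == 1 and by -1 when c == 0, staying inside the 1..5 scale."""
    rng = random.Random(seed)
    items = list(ratings)
    altered_items = set(rng.sample(items, int(alpha * len(items))))
    tagged = {}                      # item_id -> list of (user_id, rating, c)
    for item, profile in ratings.items():
        to_alter = set()
        if item in altered_items:
            k = int(beta * len(profile))
            to_alter = set(rng.sample(range(len(profile)), k))
        new_profile = []
        for pos, (user, r) in enumerate(profile):
            c = rng.randint(0, 1)
            if pos in to_alter:
                r = min(5, r + 1) if c == 1 else max(1, r - 1)
            new_profile.append((user, r, c))
        tagged[item] = new_profile
    return tagged

# Toy usage: 90% of the item profiles, 50% of their ratings altered.
data = {item: [(user, 3) for user in range(20)] for item in range(100)}
synthetic = perturb(data, alpha=0.9, beta=0.5)
```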
In this paper we used matrix factorization (FACT) as the rating prediction technique. We used the algorithm implemented and provided by Timely Development (http://www.timelydevelopment.com). FACT uses 60 factors, and the other parameters are set to the values optimized for another data set (Netflix); this might not be the best setting, but all the system variants that we compared used the same settings. To evaluate the described methods we used 5-fold cross-validation and measured precision and recall. The usage of precision and recall in recommender systems needs some clarification. These measures, in their purest sense, are impossible to compute, as they would require knowledge of the rating (relevance) of every item and user combination [9]. Usually there are thousands of candidate items to recommend (11K in our case), and only for a small percentage of them (typically less than 1%) do we know the true user evaluation. Herlocker et al. [9] proposed to approximate these measures by computing the prediction just for the user × item pairs that are present in the ratings data set, and to consider items worth recommending (relevant items) only if the user rated them 4 or 5. We computed the measures on the full test set of each fold, while the models were trained on the training set. Please refer to [5] for additional experiments. These include the evaluation of other impurity criteria, the performance of the proposed method on the original Yahoo! data set, and experiments using other prediction methods, such as user-based CF, while computing the Mean Absolute Error (MAE).

3.1 Context-aware Prediction Methods

To understand the potential of item splitting on a context-dependent set of ratings, we tested this approach on the semi-synthetic data sets described earlier, i.e., with the gender feature replaced by a new contextual variable that does influence the ratings. The baseline method is FACT when no contextual information is considered. It is compared with the context-aware reduction based approach [1] and with our item splitting technique. Figure 2 shows the comparison of the three methods for the nine semi-synthetic data sets. For each data set we computed precision and recall. We considered an item worth recommending if the algorithm made a prediction greater than or equal to 4. For all nine data sets the algorithm splits an item if any split leads to an IG bigger than 0.01. This small IG threshold value led to good results in our previous experiments [6], and it allows the algorithm to split up to 15% of the items (depending on the data set). In Subsection 3.3 we report results obtained when choosing bigger values, which typically decrease the impact of item splitting.

Figure 2: Comparison of the contextual pre-filtering methods (No Context, Reduction, Item-Split): (a) precision and (b) recall for the nine semi-synthetic data sets (10%/50%/90% of items × 10%/50%/90% of ratings altered).

As expected, the smaller the impact of the contextual feature c, the smaller the improvement of the performance measures obtained by the methods that use the context. In fact, item splitting improved the performance of the baseline method for four data sets: α ∈ {0.5, 0.9}, β ∈ {0.5, 0.9}. The highest improvement in precision, 9.9%, was observed for the data set α = 0.9, β = 0.9, where most items and most ratings were influenced by the artificial contextual feature. Increasing the values of α and β, i.e., increasing the number of items and ratings that are correlated with the value of the context feature, decreased the overall precision and recall of the baseline method. We conjecture that the contextual condition plays the role of noise added to the data, even if it is clearly not noise but a simple functional dependency on a hidden variable. In fact, FACT cannot exploit the additional information brought by this feature and cannot effectively deal with its influence.

The reduction based approach increased precision by 1.3%, and only for the α = 0.9, β = 0.9 data set. This is the data set where the artificial contextual feature has the highest influence on the ratings and 90% of the items are modified. In [1] the authors optimized MAE when searching for the contextual segments where the context-dependent prediction improves the default one (no context). Here, we searched for the segments where precision and recall are improved, and we used all better-performing segments to make the predictions. For example, Figure 2(a) reports the precision of the reduction based approach. To conduct this experiment, the algorithm first sought (optimized) the contextual segments where precision is improved (using a particular split of the training and test data). Then, when it has to make a rating prediction, it uses either only the data in one of these segments, i.e., if the prediction is for a user-item combination falling in one of the found segments, or all the data, i.e., if the rating falls in a contextual condition where no improvement over the baseline was found. Note that in all three data sets with α = 0.5, β ∈ {0.1, 0.5, 0.9}, the results are similar to those of the baseline approach. In these cases the reduction based approach does consider the segments generated using the artificial feature. However, the data set was constructed in such a way that half of the items have no rating dependency on the artificial feature, and no benefit is observed.

These experiments show that both context-aware pre-filtering approaches can outperform the baseline FACT CF method when the context influences the ratings. It is worth noting that item splitting is computationally cheaper and performed better than the reduction based approach. Note also that accuracy could depend on the particular baseline prediction algorithm, i.e., FACT in our experiments. However, we chose FACT as it is currently largely used, and in our previous experiments it outperformed the traditional user-based CF method [5].
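For comparison, the segment-selection and prediction-routing logic of the reduction based experiment described above can be sketched as follows. This is only a schematic reading of the procedure, not the implementation of [1]; `train_model` and `precision_of` are stand-ins for the underlying CF algorithm (FACT in the paper) and its evaluation, and are assumptions of this sketch.

```python
def reduction_based(train, validation, segments, train_model, precision_of):
    """Schematic contextual pre-filtering in the spirit of the reduction
    based approach: keep a segment-specific model only when it beats the
    model trained on all data, and route predictions accordingly.

    train / validation: lists of (user, item, rating, context) tuples.
    segments: predicates context -> bool (e.g. lambda c: c["c"] == 1)."""
    full_model = train_model(train)
    baseline = precision_of(full_model, validation)
    kept = []
    for segment in segments:
        seg_train = [t for t in train if segment(t[3])]
        seg_valid = [t for t in validation if segment(t[3])]
        if not seg_train or not seg_valid:
            continue
        seg_model = train_model(seg_train)
        # keep the segment only if restricting the data improves precision
        if precision_of(seg_model, seg_valid) > baseline:
            kept.append((segment, seg_model))

    def predict(user, item, context):
        for segment, model in kept:
            if segment(context):
                return model(user, item)   # segment-specific prediction
        return full_model(user, item)      # otherwise fall back to all data
    return predict

# Trivial stand-ins, only to make the sketch executable: a "model" predicting
# the mean training rating, and precision at a fixed recommendation threshold.
def train_model(data):
    mean = sum(r for _, _, r, _ in data) / len(data)
    return lambda user, item: mean

def precision_of(model, data, threshold=4):
    recommended = [r for u, i, r, _ in data if model(u, i) >= threshold]
    return sum(r >= 4 for r in recommended) / len(recommended) if recommended else 0.0
```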
3.2 Precision Versus Recall

In this section we illustrate the precision/recall curves for the three selected methods. For these experiments we used the three data sets with α = 0.9, β ∈ {0.1, 0.5, 0.9}. As was done in the previous experiment, we set the IG threshold to 0.01. For the reduction based approach we optimized precision. The results can be seen in Figure 3. The left figure shows the results for the α = 0.9, β = 0.5 data set and the right figure for α = 0.9, β = 0.9. We skip the α = 0.9, β = 0.1 data set, as for this data set all three methods perform similarly to each other. Each curve was computed by varying the threshold at which a recommendation is made. For example, all methods obtained the highest precision when recommending only the items that were predicted with rating 5; in this case, we do not recommend the items that were predicted with a lower rating. Note that we always count a recommendation as relevant if the user rated the item 4 or 5. We set the threshold to the values {1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5}. Note that the previous experiment (see Figure 2) was done with the recommendation threshold equal to 4. The recall is equal to 1 if we recommend all the items, i.e., those predicted with a rating of 1 or higher. Even at this level of recall, the precision is more than 70%. This can be explained by the high fraction of high ratings in the data set.

Figure 3: Precision/recall curves of No Context, Reduction and Item-Split for the 90%i-50%r and 90%i-90%r data sets.

Recommender systems usually try to improve precision. Even with recall as small as 0.01, we could still recommend too many items for a user to consume, i.e., approximately 119 items in our data set. Interestingly, as we can see, it is also much harder to make precise recommendations than to obtain high recall. The curves for all three methods flatten when approaching a precision of 0.97. At this point we recommend only the items that were predicted with rating 5. This is the maximum possible rating predicted by FACT, and precision cannot be improved further by varying the threshold at which a recommendation is made. We also observe that the item splitting method achieves a higher maximum precision than the other methods. When α = 0.9 and β = 0.9, the highest precision value of item splitting improves over the baseline method by 7%; the improvement when α = 0.9 and β = 0.5 is 2.7%. This experiment also gives valuable insights into the behavior of the reduction based approach. We see that at each level of the recommendation threshold it shows a higher recall value than the other two methods. At the highest level of precision, the reduction based approach is close to item splitting and gives an improvement in precision of 6.1% for the α = 0.9, β = 0.9 data set and of 1.3% for the α = 0.9, β = 0.5 data set. However, the precision/recall curve of the reduction based approach is always below that of item splitting.

In conclusion, considering both precision and recall, we see that the two context-aware recommendation methods yield quite similar results. More noticeably, both methods outperform the baseline CF, which does not take context into account.
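Under Herlocker et al.'s approximation [9] (evaluate only the user-item pairs present in the test set, with relevant meaning rated 4 or 5), curves like those in Figure 3 can be produced by sweeping the recommendation threshold. A minimal sketch, with illustrative names:

```python
def precision_recall_curve(predictions,
                           thresholds=(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)):
    """predictions: list of (predicted_rating, true_rating) pairs for the
    user-item pairs present in the test set.  An item is relevant if the
    user actually rated it 4 or 5; it is recommended if its predicted
    rating reaches the current threshold."""
    relevant = sum(true >= 4 for _, true in predictions)
    curve = []
    for t in thresholds:
        recommended = [true for pred, true in predictions if pred >= t]
        if not recommended or not relevant:
            continue
        hits = sum(true >= 4 for true in recommended)
        curve.append((t, hits / len(recommended), hits / relevant))
    return curve

# Toy usage with made-up predictions (FACT produces them in the paper).
pairs = [(4.8, 5), (4.2, 4), (3.9, 5), (3.1, 2), (2.0, 1), (4.6, 3)]
for threshold, precision, recall in precision_recall_curve(pairs):
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```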
3.3 Item Splitting for Various IG Thresholds

To better understand the item splitting method we further analyzed the prediction process. We looked at the number of items the algorithm splits and at which attribute the split was performed on. For this purpose we varied the item splitting threshold parameter. For this experiment we used the tIG impurity measure and the three data sets with α = 0.9, β ∈ {0.1, 0.5, 0.9}. A summary of the results is shown in Figure 4.

Figure 4: Item splitting behavior for different IG thresholds: (a)-(c) number of items split using the artificial and the age feature for the 90%i-10%r, 90%i-50%r and 90%i-90%r data sets; (d) precision and (e) recall versus the IG threshold.

Figures 4(a), 4(b) and 4(c) show the number of splits that the item splitting algorithm performs when varying the IG threshold for the three considered data sets. When using α = 0.9, β = 0.1, the algorithm chooses the artificial feature approximately twice as often as the age feature. More precisely, when the threshold is IG = 0.2, item splitting splits 101.8 items (on average over the 5 folds); the artificial feature was chosen 69.8 times and the age feature 32 times. When the influence of the artificial feature increases, a higher proportion of items is split using the artificial feature. For the α = 0.9, β = 0.9 data set and IG = 0.2, it splits 576.8 items using the artificial feature and 29.8 using the age feature. Note that, although IG favors attributes with many possible values [10], item splitting chooses the attribute having the larger influence on the rating. We further observe that the number of split items is not large: for all three data sets we split no more than 2050 items (17%). This low number can be explained by looking at the size of the item profiles. Note that in the considered data sets the average number of ratings per item is 18.5. The algorithm splits an item only if each newly generated item has at least 4 ratings; therefore, an item must have a minimum of 8 ratings to be considered for splitting. Lowering the minimum number of ratings in the item profile could cause unreliable computation of the statistics and was observed to decrease the overall performance.
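Counts like those reported in Figures 4(a)-(c) can be collected with a simple threshold sweep. The sketch below is purely illustrative and assumes a `best_split` routine like the one sketched after Section 2, returning the chosen feature, its value partition and the information gain, or None.

```python
from collections import Counter

def split_counts(items, best_split, thresholds=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """items: dict item_id -> list of (rating, context) pairs.
    Returns, per IG threshold, how many items get split and which
    contextual feature was chosen for the split."""
    summary = {}
    for t in thresholds:
        chosen = Counter()
        for item, profile in items.items():
            split = best_split(profile, ig_threshold=t)
            if split is not None:
                chosen[split[0]] += 1        # e.g. 'artificial' or 'age'
        summary[t] = {"items_split": sum(chosen.values()),
                      "by_feature": dict(chosen)}
    return summary
```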
Figures 4(d) and 4(e) show the precision and recall measures for the three data sets. We observe that item splitting is beneficial only when the context (i.e., the artificial feature here) has a high influence on the rating. The best performance for the α = 0.9, β = 0.1 data set, both for recall and precision, is obtained when no items are split. Each split of an item also affects the predictions for the items that are not split. Splitting an item is equivalent to creating two new items and deleting one; therefore, it modifies the data set. When CF generates a prediction for a target user-item pair, all the other items' ratings, including those in the new items coming from some split, are used to build that prediction. In [5] we observed that we can increase the performance on split items, but at the same time the decrease of performance on the untouched items can cancel any benefit. When α = 0.9, β = 0.5 the situation is different: we observe that splitting more items leads to an increase in precision and a decrease in recall. Finally, for α = 0.9, β = 0.9 splitting more items increases both precision and recall, which are maximal when the IG threshold is equal to 0.1. In conclusion, we could regard item splitting as a more dynamic version of the reduction based approach: the split is done for each item separately, using an external measure (such as IG) to decide whether the split is needed. Using the IG criterion, splitting items is beneficial when the context highly influences the ratings.

4. CONCLUSIONS AND FUTURE WORK

This paper evaluates a contextual pre-filtering technique for CF, called item splitting. Based on the assumption that certain items may have different evaluations in different contexts, we proposed to use item splitting to cope with this. The method is compared with a classical context-aware pre-filtering approach [1], which uses an extensive search to find the contextual segments that improve the baseline prediction. We observed that, despite the increased data sparsity, item splitting is beneficial when some contextual feature separates the item ratings into two more homogeneous rating groups. However, if the contextual feature is not influential, the splitting technique sometimes produced a minor decrease in precision and recall. Item splitting outperforms the reduction based context-aware approach when the FACT CF method is used. Moreover, the method is more time and space efficient and could be used with large context-enriched databases.

The method we proposed can be extended in several ways. For instance, one can try to split the users (not the items) according to the contextual features, in order to represent the preferences of a user in different contexts by using various parts of the user profile. Another interesting problem is to find a meaningful item splitting in continuous contextual domains such as time or temperature. Here, the splitting is not easily predefined but has to be searched for in the continuous space. Finally, item splitting could ease the task of explaining recommendations. The recommendation can be made for the same item in different contexts, and the contextual condition on which the item was split could be mentioned as a justification of the recommendation. For example: we recommend going to the museum instead of the beach because it will be raining today. We would also like to extend the evaluation of the proposed algorithm. First of all, we want to use real-world context-enriched data. Moreover, we want to evaluate precision and recall on top-N recommendation lists. Finally, we want to develop a solution able to deal with missing contextual values.
aware pre-filtering approach [1] which uses extensive search- Springer-Verlag, Berlin, Heidelberg, 2007. ing to find the contextual segments that improve the base- [5] L. Baltrunas and F. Ricci. Context-based splitting of line prediction. As a result we observed that despite the item ratings in collaborative filtering. In L. D. increased data sparsity, item splitting is beneficial, when Bergman, A. Tuzhilin, R. Burke, A. Felfernig, and some contextual feature separates the item ratings into two L. Schmidt-Thieme, editors, RecSys, pages 245–248. more homogeneous rating groups. However, if the contex- ACM, 2009. tual feature is not influential the splitting technique some- [6] L. Baltrunas and F. Ricci. Context-dependent items times produced a minor decrease of the precision and re- generation in collaborative filtering. In call. Item-splitting outperforms reduction based context- G. Adomavicius and F. Ricci, editors, Proceedings of aware approach when FACT CF method is used. Moreover, the 2009 Workshop on Context-Aware Recommender the method is more time and space efficient and could be Systems, 2009. used with large context-enriched data bases. [7] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. The method we proposed can be extended in several ways. Stone. Classification and Regression Trees. For instance one can try to split the users (not the items) Statistics/Probability Series. Wadsworth Publishing according to the contextual features in order to represent Company, Belmont, California, U.S.A., 1984. the preferences of a user in different contexts by using vari- ous parts of the user profile. Another interesting problem is [8] A. K. Dey. Understanding and using context. Personal to find a meaningful item splitting in continuous contextual Ubiquitous Comput., 5(1):4–7, February 2001. domains such as time or temperature. Here, the splitting [9] J. L. Herlocker, J. A. Konstan, L. G. Terveen, John, is not easily predefined but have to be searched in the con- and T. Riedl. Evaluating collaborative filtering tinuous space. Finally, item splitting could ease the task of recommender systems. ACM Transactions on explaining recommendations. The recommendation can be Information Systems, 22:5–53, 2004. made for the same item in different context. The contextual [10] J. R. Quinlan. C4.5: Programs for Machine Learning condition on which the item was split could be mentioned (Morgan Kaufmann Series in Machine Learning). as justifications of the recommendations. For example, we Morgan Kaufmann, 1 edition, January 1993. recommend you to go to the museum instead of going to the [11] P. Resnick and H. R. Varian. Recommender systems. beach as it will be raining today. We would also like to ex- Communications of the ACM, 40(3):56–58, 1997. tend our evaluation of the proposed algorithm. First of all, [12] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. we want to use real world context-enriched data. Moreover, Collaborative filtering recommender systems. In The we want to evaluate precision and recall at top-N recommen- Adaptive Web, pages 291–324. Springer Berlin / dation list. At the end, we want to develop a solution to be Heidelberg, 2007. able to deal with missing contextual values.