A Contextual Modeling Approach to Context-Aware Recommender Systems

Umberto Panniello
Polytechnic of Bari
Bari, Italy
u.panniello@poliba.it

Michele Gorgoglione
Polytechnic of Bari
Bari, Italy
m.gorgoglione@poliba.it

ABSTRACT
Methods for generating context-aware recommendations were classified into the pre-filtering, post-filtering and contextual modeling approaches. This paper proposes a novel type of contextual modeling (CM) based on the contextual neighbors approach and introduces four specific contextual neighbors methods. It compares these four types of contextual neighbors techniques to determine the best-performing alternative among them. Then it compares this best-of-breed method with the contextual pre-filtering, post-filtering and un-contextual methods to determine how well the CM approach compares with other context-aware recommendation techniques.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval

General Terms
Recommender system, Context-aware, Algorithms.

Keywords
Recommender systems, pre-filtering, post-filtering, contextual modeling.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CARS-2011, October 23, 2011, Chicago, Illinois, USA
Copyright is held by the author/owner(s).

1. INTRODUCTION
To incorporate contextual information into recommender systems (RSes), a new subfield, called CARS (Context-Aware Recommender Systems), has recently emerged. Several approaches to incorporating contextual information into recommender systems have been proposed in the literature [2]. In particular, they are categorized into contextual modeling, pre-filtering and post-filtering methods [2]. Although the contextual pre- and post-filtering methods have been studied before, e.g. in [9], the contextual modeling methods have been little explored. Among the few attempts, [6] and [11] proposed two approaches to include context in the recommendation engine. In this paper, we study the contextual modeling (CM) methods and propose a specific type of CM that we call contextual neighbors. We also propose four specific types of contextual neighbors methods, called Mdl1, Mdl2, Mdl3 and Mdl4, each of them selecting contextual neighborhoods in its own way. The long-term goal of this research is to identify several different contextual modeling approaches, including our contextual neighbor approach, and to investigate the strengths and weaknesses of each one by comparing these approaches with each other and with other approaches to CARS. Because of space limits, in this paper we only present the results of the first step of this research, i.e. the comparison of the four contextual neighbors methods among themselves, to identify the best performing one. Then we select the best-of-breed contextual neighbors method and compare it with the pre-filtering, post-filtering and un-contextual methods across various experimental settings in order to determine how well the CM methods fit into the overall taxonomy of CARS methods. Future papers will present the comparison of the contextual neighbor approach to other existing approaches to CM, such as those proposed by [6] and [11].

2. BACKGROUND ON CARS
Unlike the traditional two-dimensional (2D) recommender systems that deal with two types of entities, Users (e.g., customers) and Items (e.g., products), and try to estimate unknown ratings in the Users × Items matrix, context-aware recommender systems (CARS) also take into account contextual information [1]. In this paper, we follow the representational view to modeling contextual information [3] and assume that the context is defined with a predefined set of observable attributes. For the CARS paradigm, this means that the rating (or utility) function is of the following form [2]:

R: Users × Items × Context → Ratings

where Context is a set of contextual variables, each such variable K having a hierarchical structure defined by a set of q atomic variables, i.e., K = (K1,…, Kq) [2]. Further, the values taken by variable Kq define finer (more granular) levels, while those of K1 define coarser (less granular) levels of contextual knowledge [7]. For example, Figure 1(a, b) presents the hierarchy for the contextual variables “Season” and “Intent of the purchase”, respectively, that we use in the study presented in Section 4.

The function R can be of the following two types. In the ratings-based RSes, users rate some of the items that they have seen in the past by specifying how much they liked these items. Alternatively, in the transaction-based RSes, function R defines the utility of an item for a user and is usually specified either as (a) a Boolean variable indicating if the user bought a particular item or not, (b) the purchasing frequency of an item, or (c) a click-through rate (CTR) of various Web objects (URLs, ads, etc.) [5]. In this paper we follow the transaction-based approach and measure the utility of product j for user i with the purchasing frequency xij specifying how often user i purchased product j.

Figure 1. Hierarchical structure of the contextual information, with levels K1 and K2, for (a) DB1 and (b) DB2 datasets.
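As a minimal sketch of the transaction-based utility with a hierarchical context, the snippet below counts purchasing frequencies from a small transaction log at the finer level K2 and at the coarser level K1. The log, the user and category names, and the helper `purchase_frequencies` are illustrative assumptions and not data or code from the study; the K2 values follow the Season hierarchy described for DB1 (Winter or Summer, Holiday or Not Holiday), and the K1 value is obtained here by dropping the holiday part.

```python
from collections import Counter

# Hypothetical transaction log: (user, item category, context at the finest level K2).
# A K2 value pairs a season with a holiday flag; K1 keeps the season only.
TRANSACTIONS = [
    ("u1", "cameras", ("Winter", "Holiday")),
    ("u1", "cameras", ("Winter", "Not Holiday")),
    ("u1", "phones", ("Summer", "Not Holiday")),
    ("u2", "cameras", ("Winter", "Holiday")),
    ("u2", "phones", ("Winter", "Holiday")),
    ("u2", "phones", ("Winter", "Holiday")),
]


def purchase_frequencies(transactions, level):
    """Count how often user i purchased item j in context k.

    level="K2" uses the full (season, holiday) context;
    level="K1" rolls it up to the coarser season-only context.
    """
    counts = Counter()
    for user, item, (season, holiday) in transactions:
        context = (season, holiday) if level == "K2" else (season,)
        counts[(user, item, context)] += 1
    return counts


x_k2 = purchase_frequencies(TRANSACTIONS, "K2")
x_k1 = purchase_frequencies(TRANSACTIONS, "K1")
print(x_k2[("u2", "phones", ("Winter", "Holiday"))])  # 2
print(x_k1[("u1", "cameras", ("Winter",))])           # 2
```

Rolling K2 counts up to K1 only merges contexts, so the total number of counted purchases is the same at both levels.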
As proposed in [2] and shown in Figure 2, the estimation of unknown ratings (or utilities) can be done using the following three types of methods, each of them starting with data (on users, items, ratings and contextual information) and resulting in contextual recommendations:

1. Contextual pre-filtering (PreF): contextual information is used to filter out irrelevant ratings before they are used for computing recommendations using classical (2D) methods.
2. Contextual post-filtering (PoF): contextual information is used after the classical (2D) recommendation methods are applied to the standard (non-contextual) recommendation data.
3. Contextual modeling (CM): contextual information is used inside the recommendation-generating algorithms.

Figure 2. How to use context in the recommendation process.

The work presented in [2] helped researchers to understand different aspects of using the contextual information in the recommendation process. However, [2] did not examine which of these methods are more effective for providing contextual recommendations. To address this issue, Panniello et al. [9] proposed certain contextual pre- and post-filtering approaches and compared them among themselves and also with the un-contextual (2D) approach to determine which one is better. More specifically, [9] proposed the Weight and Filter post-filtering approaches and the exact pre-filtering (EPF) method. Because of space limits we do not present the details, which can be found in the original paper. [9] shows that the comparison of the un-contextual and the contextual RSes depends very significantly on the type of the post-filtering method used. The results also suggested that there is a big difference between good and bad post-filtering approaches in terms of performance measures. For example, the performance differences between the Filter and Weight methods range between 37% and 90% for the F-measure across different datasets, and vary between 2.5 and 17 for the MAE and between 1 and 3.5 for the RMSE metric.

3. CONTEXTUAL MODELING
In this section we present a new CM method called contextual-neighbors CM, and see how it compares against the pre- and the post-filtering methods. This approach is based on user-based collaborative filtering and works as follows. First, for each user i and context k, we define the user profile in context k, i.e. the contextual profile Prof(i, k). For example, if contextual variable k has two values (e.g., Winter and Summer), then we have two contextual profiles for each user, one for the Winter and the other for the Summer. Note that these contextual profiles can be defined in many different ways, some of which are presented in [6, 8, 11], and our approach does not depend on any particular choice of a profiling method. However, in the experimental study described in Section 4 we use the following specific contextual profiling technique. As explained in Section 2, we follow the transaction-based approach to RSes and measure the utility rijk of product j for user i in context k with the purchasing frequency xijk specifying how often user i purchased product j in context k. Then we use this measure to define the contextual profile as Prof(i, k) = (ri1k, …, rink). We use these profiles to define similarity among users and also to define and find the N nearest “neighbors” of user i in context k, where “neighbors” are determined using contextual profiles Prof(i’, k’) and similarity measures between the profiles. In order to focus on the comparison among CARS, we decided to use a popular CF approach as a common method for the CARS that we compare, although much research has produced more sophisticated methods, and we defined the distance using the cosine similarity in our experiments. The basis for generating different contextual neighbors approaches is the way context is used to form the neighborhood. We find N pairs (i’, k’) such that the similarity between these profiles is the largest among all the candidate pairs (i’, k’) subject to the following constraints:

• Mdl1: There are no constraints on the set of (i’, k’) pairs, and we select N pairs that are the most similar to (i, k).
• Mdl2: we select an equal proportion of pairs (i’, k’) corresponding to each context k (e.g., if the contextual variable has only two values, Winter and Summer respectively, and the neighborhood size is 80, we select 40 neighbors from Winter and 40 from Summer).
• Mdl3: we select N pairs (i’, k’) that are the most similar to (i, k) corresponding to each context k at the same level of the context of interest (e.g., if the context of interest is “Winter Holiday” in Fig. 1(a), we select the neighborhood by using only profiles referred to level K2 of that contextual variable).
• Mdl4: we select an equal proportion of pairs (i’, k’) corresponding to each context k at the same level of the context of interest (e.g., if the context of interest is “Winter Holiday” in Fig. 1(a) and the neighborhood size is 80, we define the neighborhood by using 20 users from the context “Winter Holiday”, 20 users from the context “Winter Not Holiday”, 20 users from the context “Summer Holiday” and 20 users from the context “Summer Not Holiday”).

After selecting the neighbors, we used their contextual profiles to make the rating predictions. Having introduced the contextual neighbors approach and its four implementations Mdl1, Mdl2, Mdl3 and Mdl4, we next want to (a) compare them to determine which one is the best among them, and (b) see how it compares against the previously studied pre- and post-filtering methods.
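The four neighborhood constraints of this section can be sketched as one selection routine with two switches. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the encoding of a context as a tuple whose length gives its level (one element for K1, two for K2), the exclusion of the target pair from its own neighborhood, and the use of integer division for the equal shares are all hypothetical choices. The profile vectors are made-up purchase counts, and the rating prediction step is not covered.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length utility vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_neighbors(target, profiles, n, same_level=False, balanced=False):
    """Pick up to n contextual neighbors (i', k') of the target pair (i, k).

    profiles maps (user, context) -> Prof(user, context).
    Mdl1: defaults; Mdl2: balanced=True; Mdl3: same_level=True;
    Mdl4: same_level=True and balanced=True.
    """
    _, target_ctx = target
    scored = []
    for pair, prof in profiles.items():
        if pair == target:
            continue  # assumption: a profile is not its own neighbor
        if same_level and len(pair[1]) != len(target_ctx):
            continue  # keep only contexts at the level of the context of interest
        scored.append((cosine(profiles[target], prof), pair))
    scored.sort(key=lambda s: s[0], reverse=True)
    if not balanced:
        return [pair for _, pair in scored[:n]]  # global top n
    # Equal share of the neighborhood from every context value (n // #contexts each).
    by_context = defaultdict(list)
    for _, pair in scored:
        by_context[pair[1]].append(pair)  # lists stay sorted by similarity
    share = n // len(by_context) if by_context else 0
    return [pair for pairs in by_context.values() for pair in pairs[:share]]


# Made-up contextual profiles over three product categories.
profiles = {
    ("u1", ("Winter",)): [3, 0, 1],
    ("u2", ("Winter",)): [2, 0, 1],
    ("u3", ("Winter",)): [3, 1, 1],
    ("u4", ("Winter",)): [0, 4, 0],
    ("u2", ("Summer",)): [1, 1, 1],
    ("u3", ("Summer",)): [0, 1, 5],
    ("u2", ("Winter", "Holiday")): [3, 0, 1],
}
target = ("u1", ("Winter",))
mdl1 = select_neighbors(target, profiles, n=2)
mdl2 = select_neighbors(target, profiles, n=3, balanced=True)
mdl3 = select_neighbors(target, profiles, n=2, same_level=True)
mdl4 = select_neighbors(target, profiles, n=2, same_level=True, balanced=True)
for name, result in [("Mdl1", mdl1), ("Mdl2", mdl2), ("Mdl3", mdl3), ("Mdl4", mdl4)]:
    print(name, result)
```

With equal shares computed by integer division, the neighborhood size should be a multiple of the number of candidate contexts, as in the examples above (80 neighbors over 2 or 4 contexts).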
4. EXPERIMENTAL SETUP
In this study, we compared the four types of CM methods, contextual modeling vs. the un-contextual case, and pre- vs. post-filtering vs. contextual modeling recommendations across a wide range of experimental settings. First, we selected two different data sets having contextual information. The first dataset (DB1) comes from an e-commerce website commercially operating in a certain European country which sells electronic products. For this dataset, we selected the time of the year (or Season) as a contextual variable (Fig. 1(a)). The classification into Summer or Winter and Holiday or Not Holiday is based on the experience of the CEO of the e-commerce website that we used in our study. The second dataset (DB2) is taken from the study described in [8]. The key contextual information elicited from the students was the intent of a purchase (IntentOfPurchase), and it was hierarchically classified as in Fig. 1(b).

In our study, we recommend product categories instead of individual items because the e-commerce applications that we consider have very large numbers of items (hundreds of thousands or even millions). Therefore, if single items were used, the conversion from implicit to explicit ratings would not work due to the low amount of rated data (e.g., many of the products were not purchased at all). We tried different item aggregation strategies and found that the best results are for 14 categories for DB1 and 24 categories for DB2. In particular, we performed experiments varying the number of categories and we found that each recommender system reached the best performance with these levels of aggregation. For our two datasets, we aggregated items into categories of products according to the classification provided by the Web site product catalogue. When using a context-aware recommender system it is useful to recommend a category instead of a product because users may not know what categories to look for in a specific context (for example, one may want to receive a recommendation about the category of items for an unfamiliar context, such as a gift for a child).

The utility of items for the customers was measured by the purchasing frequencies, as described in Section 2 for the transaction-based RSes. Estimations of unknown utilities were done by using a standard user-based collaborative filtering (CF) method [10].

The neighborhood size N was set to N = 80 users as follows. We performed an experiment where we varied the neighborhood size, moving from 30 to 200 users, and we computed the F-measure. We performed this experiment for each dataset. In general, the F-measure increased as we increased the number of neighbors. However, these improvement gains stopped when we set the neighborhood size around 80, and the performance decreased when it went over 80 users. Therefore, we set N = 80 users as an appropriate neighborhood size for our experiments.

When comparing the pre-filtering, the post-filtering and the contextual modeling methods, we used the two post-filtering approaches (Weight and Filter), the exact pre-filtering (EPF) method and the four contextual modeling methods Mdl1, Mdl2, Mdl3 and Mdl4 described above. Furthermore, we used the same user-based CF method for estimating unknown ratings in the pre-filtering, the post-filtering and the CM cases to make sure that we compare “apples with apples”. Since our aim was to compare different contextual approaches instead of finding the best contextual approach, we used a well known collaborative filtering method instead of a newer, but less known, recommendation engine.

Further, we have performed t-tests in order to determine if the chosen contextual variables matter. The results of these tests demonstrated that the contextual variables Season and IntentOfPurchase matter (i.e., result in statistically significant differences in ratings across the values of the contextual variable at 95%). We used Precision, Recall, F-Measure, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) [4] as performance measures in our experiments. To this aim we divided each dataset into the training and the validation sets, the training set containing 2/3 and the validation set 1/3 of the whole dataset. For the DB1 dataset, the first two years were the training set and the third year was the validation set. For the DB2 dataset, we randomly split it in 2/3 for the training set and the remaining 1/3 for the validation set (in this case, it was impossible to make a good temporal split because all the transactions were made within a couple of months).

5. RESULTS
First of all we compared the four contextual modeling approaches among themselves across each experimental setting. In particular, Fig. 3 shows the comparison between the four CM approaches (namely Mdl1, Mdl2, Mdl3 and Mdl4) for each dataset. Because of space limits, we only show the graphs of F-measure, Recall and RMSE for the two databases. The graphs of Precision and MAE are very similar to those of F-measure and RMSE, respectively. Fig. 3 demonstrates that the performances of the four CM approaches are not remarkably different. The difference between Mdl1 and Mdl2 for DB1 is 0.008, 0.13 and 0.02 in terms of the F-measure, MAE and RMSE, respectively, and for DB2 is 0.09, 0.17 and 0.18, respectively, which is not very significant. In comparison, performance differences between various pre- and post-filtering methods, as reported in [9], are much more pronounced (as an example, [9] reports that the Filter PoF post-filtering method outperformed exact pre-filtering on DB1 by 0.21 and on DB2 by 0.4 in terms of the F-measure). Also, Mdl1 slightly dominates other Mdl methods in some cases (see Fig. 3(b)) and is very close to Mdl2 in other cases (see Fig. 3(a)). This makes sense because the N neighbors are selected for Mdl1 in an unconstrained manner, whereas they are selected according to various types of constraints for the other three approaches. Since Mdl1 (slightly) outperforms other Mdl methods, we selected it as the “best-of-breed” and will use it for the comparison with the pre- and the post-filtering methods in the rest of the paper.

We have also compared the contextual neighbors methods with the un-contextual approach across various experimental conditions. Table 1 reports all the accuracy gains (in terms of F-measure) across the recommender systems for DB1 and DB2 (negative values mean performance reduction). For example, its first row shows the performance gains (reductions), in terms of F-measure, for the un-contextual RS vis-à-vis the EPF, Filter PoF, Weight PoF and Mdl1 methods. The matrix in Table 1 is anti-symmetric, as should be the case when two methods are compared in terms of their relative performance. As Table 1 shows, the contextual modeling approaches dominate the un-contextual case across all the levels of context for the F-Measure, Precision, MAE and RMSE. For example, if we consider Mdl1 and the F-measure, for DB1, the difference between contextual and un-contextual models is 22% on average and for DB2 it is 7% on average. Mdl1 clearly outperforms the un-contextual method. The fact that the contextual modeling methods outperform the un-contextual one in almost all of the cases is not surprising because the contextual modeling method uses the same information as the un-contextual one and also includes the contextual variable, which brings homogeneity in the data without causing the sparsity effect.

Figure 3. Comparison between the four contextual modeling methods (Mdl1, Mdl2, Mdl3 and Mdl4) for (a) DB1 and (b) DB2. Each row of panels shows the F-measure, Recall and RMSE.
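For reference, the accuracy measures named in Section 4 and plotted in Figure 3 can be computed along the following lines. This is a minimal sketch, assuming the balanced F-measure (harmonic mean of Precision and Recall) and plain lists of predicted and observed utilities; the function names and the sample values are illustrative and are not taken from the experiments.

```python
import math


def precision_recall_f(recommended, purchased):
    """Precision, Recall and balanced F-measure of a recommendation list.

    recommended: categories suggested to a user in a given context.
    purchased: categories the user actually bought in the validation set.
    """
    rec, bought = set(recommended), set(purchased)
    hits = len(rec & bought)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(bought) if bought else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


def mae(predicted, actual):
    """Mean Absolute Error between predicted and observed utilities."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and observed utilities."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Made-up example values.
p, r, f = precision_recall_f(
    ["cameras", "phones", "tv", "audio"], ["cameras", "tv", "laptops"]
)
print(round(p, 3), round(r, 3), round(f, 3))
print(round(mae([0.2, 0.5, 0.9], [0.0, 0.5, 1.0]), 3))
print(round(rmse([0.2, 0.5, 0.9], [0.0, 0.5, 1.0]), 3))
```

Precision, Recall and the F-measure are better when higher, whereas MAE and RMSE are better when lower, which matters when reading the panels of a figure that mixes both kinds of measures.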
We next compare the CM, pre-filtering and post-filtering approaches to determine the best among them.

As explained in Section 2, the performance of the post-filtering methods may significantly depend on the type of the post-filtering approach being used [9]. Therefore, we decided to use two post-filtering methods in our experiments, Weight PoF and Filter PoF, to account for these differences. Fig. 4 presents the comparison results among the two post-filtering methods Weight PoF and Filter PoF, the exact pre-filtering (EPF) and the contextual modeling method Mdl1 across each contextual level and each dataset (DB1 and DB2). As Fig. 4 (and Table 1) demonstrates, Filter PoF dominates the EPF approach across the considered experimental settings. In particular, the difference between Filter PoF and EPF in terms of F-measure is 19% on average for DB1 and 26% for DB2. In contrast, EPF dominates Weight PoF in our experiments. In particular, the difference between the EPF and Weight PoF models in terms of F-measure is 29% on average for DB1 and 16% for DB2. In addition, the CM method Mdl1 dominates the Weight PoF and in some cases the EPF. In particular, the difference between the Mdl1 and Weight PoF models in terms of F-measure is 39% on average for DB1 and 18% for DB2, while the difference between Mdl1 and EPF is 21% on average for DB1 and 5% for DB2. In contrast, the Filter PoF dominates the modeling method. In particular, the difference between the Mdl1 and Filter PoF models in terms of F-Measure is 3% on average for DB1 and 28% for DB2.

These results mean that the performance of the CM approach (as represented by Mdl1) is very similar to that of the EPF method. This also implies, among other things, that CM is better than the un-contextual case and some of the weaker post-filtering methods, such as Weight PoF. However, like EPF, it is inferior to the best-performing post-filtering methods, such as Filter PoF. However, as argued in [9], finding the best-performing post-filtering methods can be a hard problem. Therefore, the CM approach, as represented in this paper by the Mdl1, Mdl2, Mdl3 and Mdl4 methods, constitutes a stable, easy to implement and reasonably well-performing alternative that does not require expensive identification procedures, unlike the post-filtering methods. Therefore, considering our experimental settings, it has its niche among the range of various CARS methods, as EPF does.

Table 1. F-measure gains (reductions) across recommender systems for DB1 and DB2. Each entry is the gain of the row method vis-à-vis the column method.

Dataset  Method         Un-contextual  EPF    Filter PoF  Weight PoF  Mdl1
DB1      Un-contextual  0%             -2%    -20%        27%         -22%
DB1      EPF            2%             0%     -19%        29%         -21%
DB1      Filter PoF     20%            19%    0%          59%         -3%
DB1      Weight PoF     -27%           -29%   -59%        0%          -39%
DB1      Mdl1           22%            21%    3%          39%         0%
DB2      Un-contextual  0%             -2%    -27%        14%         -7%
DB2      EPF            2%             0%     -26%        16%         -5%
DB2      Filter PoF     27%            26%    0%          57%         28%
DB2      Weight PoF     -14%           -16%   -57%        0%          -18%
DB2      Mdl1           7%             5%     -28%        18%         0%

Figure 4. Comparison between EPF, Weight PoF, Filter PoF and Mdl1 for (a) DB1 and (b) DB2. Each row of panels shows the F-measure, Recall and RMSE.

6. CONCLUSIONS
In this paper we proposed a new type of CM, which we called contextual neighbors, and four specific types of contextual neighbors methods, called Mdl1, Mdl2, Mdl3 and Mdl4, each of them selecting contextual neighborhoods in a different way. We also compared the contextual neighbors methods Mdl1, Mdl2, Mdl3 and Mdl4 to identify the best performing one. Finally, we compared it to other approaches to CARS. This is the first step of a broader research in which we want to compare the relative performance of different contextual modeling approaches, including ours. Although Mdl1 slightly outperforms the others, we have shown that there are no relevant performance differences among them.
This result is not surprising because different ways of selecting the contextual neighborhood do not fundamentally change recommendation results. We have also compared Mdl1 with the pre-filtering, post-filtering and un-contextual methods developed in our previous studies across various experimental settings, including two datasets, different levels of item aggregation, different neighborhood sizes, different contextual levels (K1 and K2) and several performance measures (Precision, Recall, F-Measure, MAE and RMSE). We have shown that Mdl1 dominates the traditional un-contextual approach and is comparable to the pre-filtering method (EPF). We have also shown that Mdl1 dominates some of the less advanced post-filtering methods (such as Weight PoF) but is inferior to the best post-filtering methods (such as Filter PoF). Since identification and selection of the best post-filtering methods is a laborious process (as argued in [9]), this means that contextual neighbors CM methods (such as Mdl1) have their prominent place in the spectrum of various CARS recommendation methods: they are easy to implement, reasonably well-performing and do not require expensive identification procedures, unlike the post-filtering methods. The main limit of these results is the fact that we do not compare our contextual modeling approach to the existing ones. The reason is that we only present the first step of the research.

In future work, we will present the comparison of the contextual neighbor approach to other CM approaches. In addition, we will use other recommendation engines and other representations of the contextual variables, different from the straightforward kNN and the hierarchical representation of the context used in this paper. We will use other performance metrics beyond the accuracy-based ones, such as the diversity of recommendations, in order to better understand the impact of the different contextual approaches on customers’ behavior. In future research steps we will also measure the effect of different CM approaches on customers’ trust and on their actual purchases.

7. REFERENCES
[1] Adomavicius, G., Sankaranarayanan, R., Sen, S., and Tuzhilin, A. 2005. Incorporating contextual information in recommender systems using a multidimensional approach. ACM T. Inform. Syst. 23, 1 (2005), 103-145.
[2] Adomavicius, G., and Tuzhilin, A. 2011. Recommender Systems Handbook, Chapter 7, Springer.
[3] Dourish, P. 2004. What we talk about when we talk about context. Personal and Ubiquitous Computing 8, 19-30.
[4] Herlocker, J.L., Konstan, J.A., Terveen, L.G., and Riedl, J.T. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22, 5-53.
[5] Huang, Z., Li, X., and Chen, H. 2005. Link prediction approach to collaborative filtering. In Proc. of the 5th ACM/IEEE-CS joint conference on Digital libraries, 141-142.
[6] Karatzoglou, A., Amatriain, X., Baltrunas, L., and Oliver, N. 2010. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proc. of the fourth ACM conference on Recommender Systems, 79-86.
[7] Kwon, O., and Kim, J. 2009. Concept lattices for visualizing and generating user profiles for context-aware service recommendations. Expert Systems with Applications 36, 1893-1902.
[8] Palmisano, C., Tuzhilin, A., and Gorgoglione, M. 2008. Using Context to Improve Predictive Models of Customers in Personalization Applications. IEEE TKDE, 20(11), 1535-1549.
[9] Panniello, U., Tuzhilin, A., Gorgoglione, M., Palmisano, C., and Pedone, A. 2009. Experimental comparison of pre- vs. post-filtering approaches in context-aware recommender systems. In Proc. of RecSys ’09, 265-268.
[10] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. 1994. GroupLens: an open architecture for collaborative filtering of netnews. In Proc. of Conference on Computer Supported Cooperative Work, 175-186.
[11] Shi, Y., Larson, M., and Hanjalic, A. 2010. Mining mood-specific movie similarity with matrix factorization for context-aware recommendation. In Proc. of the Workshop on Context-Aware Movie Recommendation, 34-40.