=Paper=
{{Paper
|id=Vol-1705/06-paper
|storemode=property
|title=Improving the Accuracy of Latent Space Based Recommender Systems by Introducing a Cut-off Criterion
|pdfUrl=https://ceur-ws.org/Vol-1705/06-paper.pdf
|volume=Vol-1705
|authors=Ludovico Boratto,Salvatore Mario Carta,Roberto Saia
|dblpUrl=https://dblp.org/rec/conf/eics/BorattoCS16
}}
==Improving the Accuracy of Latent Space Based Recommender Systems by Introducing a Cut-off Criterion==
Ludovico Boratto, Salvatore Carta, Roberto Saia
Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124, Cagliari, Italy
{ludovico.boratto, salvatore, roberto.saia}@unica.it

Copyright is held by the author/owner(s). EICS'16, June 21-24, 2016, Bruxelles, Belgium.

==Abstract==
Recommender systems filter the items a user did not evaluate, in order to acquire knowledge on those that might be suggested to her. To accomplish this objective, they employ the preferences the user expressed in the form of explicit ratings, or of implicit values collected through the browsing of the items. However, users have different rating behaviors (e.g., users might use just the ends of the rating scale, to express whether they loved or hated an item), while the system assumes that the users employ the whole scale. Over the last few years, Singular Value Decomposition (SVD) became the most popular and accurate form of recommendation, because of its capability of working with sparse data, exploiting latent features. This paper presents an approach that pre-filters the items a user evaluated and removes those she did not like. In other words, by analyzing a user's rating behavior and the rating scale she used, we capture and employ in the recommendation process only the items she really liked. Experimental results show that our form of filtering leads to more accurate recommendations.

==Author Keywords==
Data Mining; Recommender Systems; User Profiling; Algorithms.

==Introduction==
A recommender system is designed to suggest items of possible interest to the users [24]. In order to generate the recommendations, different forms of data are employed by the different types of systems. Indeed, the two most effective classes of systems, i.e., collaborative filtering and content-based approaches, respectively use (i) the ratings given by the user to express a preference for the items and (ii) the content of the items (e.g., their textual description). Independently of the employed approach, user ratings (or implicitly collected values, like the number of seconds spent while browsing an item's characteristics) are the elements that allow a system to acquire knowledge on what the users like, or not. However, it is widely known that users have different rating behaviors and that some of them do not use the whole rating scale, but express only whether they love or hate an item [21].

At the moment, however, all the recommender systems base their filtering on a unique scale of values. Therefore, if a user is required to express a rating on a defined scale, the system assumes that the rating behavior of the user covers the whole scale. Instead, if the system implicitly collects the data, a fixed cut-off value is chosen in order to determine whether a user liked an item (e.g., Fastweb's recommender system collects a positive preference for a TV program if a user watches it for at least 5 minutes [5]).

It is widely known that the recommendation form that generates the most accurate results is collaborative filtering and, more specifically, Koren's implementation of Singular Value Decomposition (SVD), known as SVD++ [15]. The algorithm is able to find latent spaces, based on the ratings expressed by the users, thus avoiding problems such as sparsity and improving the efficiency of the algorithm.

The problem that might arise is that, if users have different behaviors (both when rating the items and when browsing the Web), the system might consider as liked by a user an item with a high rating that actually represents the lowest rating she gave (the same problem holds in the opposite scenario, in which a user only gives low ratings and her favorite item might be misclassified if considering the system's scale). The intuition behind this paper is that, since SVD can detect latent spaces and work with sparse data, the algorithm might benefit from receiving less but very accurate information about what the users actually like.

In this work, we first show that users have different rating behaviors, then we propose an approach that calculates the weighted average of the user ratings and leaves in the user profile only the ratings greater than or equal to this value, removing the other ones. By modeling the positive behavior of the users and understanding what they actually like, our approach should lead to more accurate recommendations. Note that this study is based on explicitly given ratings, to facilitate its validation with a public dataset, but the technique can be applied, as is, to implicitly collected data (e.g., by removing all the items that have been browsed for a number of seconds lower than the user's average).

More formally, the problem statement is the following:

'''Problem 1.''' We are given a set of users U = {u1, ..., uN}, a set of items I = {i1, ..., iM}, and a set R = [1, 5] of ratings used to express the user preferences. The set of all possible preferences expressed by the users is a ternary relation P ⊆ U × I × R. We also denote as Iu = {i ∈ I | ∃(u, i, r) ∈ P ∧ u ∈ U} the set of items in the profile of a user u. Let SVD_Iu denote the fact that the SVD algorithm is run with the set of preferences Iu, ∀u ∈ U. Our objective is to define a Weighted Cut-off Criterion (WCC) able to generate a set Îu, which considers the rating scale employed by the user and removes from the set of items positively evaluated by her (Iu) those in the lower part of her scale. The goal of this paper is to show that accuracy(SVD_Îu) > accuracy(SVD_Iu).

The contributions of our work are the following:
* analysis of the user ratings of a real-world dataset, aimed at showing the non-coincidence of the range of values adopted by the users to rate the evaluated items with that defined by the recommender system;
* formalization of a Weighted Cut-off Criterion (WCC) able to remove from the user ratings those below the weighted mean value of her preferences;
* evaluation of the proposed criterion, by comparing the performance of a state-of-the-art recommendation approach before and after the WCC process applied to the user ratings.

In the rest of this paper, we first show that users actually have different rating behaviors (Section "Analysis of the Users' Rating Behavior"), continuing by defining our approach (Section "Approach"). Then we present the results of the performed experiments (Section "Evaluation") and the literature related to our study (Section "Background and Related Work"), concluding with some remarks (Section "Conclusions and Future Work").

==Analysis of the Users' Rating Behavior==
In order to validate our intuition and understand whether users actually have different rating behaviors or use the whole rating scale, in this section we present the number of users who use a specific rating scale. The study has been performed on the Yahoo! Webscope R4 dataset (http://webscope.sandbox.yahoo.com). It contains a large amount of data related to user preferences expressed by the Yahoo! Movies community, rated on two different scales, from 1 to 13 and from 1 to 5 (we have chosen to use the latter). The training data is composed of 7,642 users (|U|), 11,915 movies/items (|I|), and 211,231 ratings (|R|); all users involved have rated at least 10 items, and all items are rated by at least one user. The test data is composed of 2,309 users, 2,380 items, and 10,136 ratings. There are no test users/items that do not also appear in the training data. Each user in the training and test data is represented by a unique ID.

Figure 1 shows that the users express their ratings in different ways with respect to the range of values allowed by the recommender system (in our case, R = [1, 5]). Since the users in the dataset are 7,642, only about half of them (52.51%, 4,013 users) have given their opinion by using the whole rating scale, while the others have used a different range of values. Indeed, these users can mostly be classified into three groups: 1,319 users (17.25%) used the range 3–5 (i.e., evaluating their worst experience with a minimum score of 3), 1,315 users (17.20%) used the range 2–5 (i.e., evaluating their worst experience with a minimum score of 2), and 688 users (9.00%) expressed their opinion in the range 4–5 (i.e., keeping a high rating in all their evaluations).

These results clearly indicate that each user adopts personal criteria of evaluation. For this reason, an effective exploitation of her preferences should take this aspect into account.

''Figure 1: Ranges of Values in User Evaluations (number of users per rating range, from [1-2] to [4-5]; figure omitted).''

''Figure 2: Approach Architecture (user profiles → WCC process → SVD → output; figure omitted).''
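The per-user range analysis described above is straightforward to reproduce. The Python sketch below is illustrative only (the dictionary-based profile format and the toy data are assumptions, not part of the paper's Java/Mahout pipeline): it tallies the (min, max) rating range each user employs.

```python
from collections import Counter

def rating_ranges(profiles):
    """Tally the (min, max) rating range employed by each user.

    `profiles` maps a user id to the list of ratings in her profile.
    """
    return Counter((min(rs), max(rs)) for rs in profiles.values() if rs)

# Toy profiles on the system scale R = [1, 5]
profiles = {
    "u1": [1, 3, 5],  # uses the whole scale
    "u2": [3, 4, 5],  # never rates below 3 (range 3-5)
    "u3": [4, 5, 4],  # keeps a high rating in all evaluations (range 4-5)
}
print(rating_ranges(profiles))
```

Applied to the Yahoo! Webscope R4 training data, a tally of this kind yields the distribution summarized in Figure 1.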
==Approach==
Given the fact that users have different rating behaviors, in this section we present an algorithm that detects the rating scale of a user and removes from the items she evaluated those under her average rating. The algorithm performs two main steps:
* '''Weighted Cut-off Criterion''': calculation of the average value of the ratings of each user (Weighted Ratings Average) and definition of a Weighted Cut-off Criterion (WCC), to keep only the items with a rating above the user's average.
* '''Item Recommendation''': the state-of-the-art algorithm SVD is run with the items processed by the previous step.

The architecture of the proposed system is summarized in Figure 2. In the following, we describe in detail how each step works.

'''Weighted Cut-off Criterion.''' In order to understand which items a user actually likes, it is first necessary to identify the average rating among those she gave. This value, called the Weighted Ratings Average (WRA), is calculated on the basis of the ratings of each user u ∈ U, over the items i ∈ Iu in her profile, as shown in Equation 1:

 WRA(u) = ( Σ_{i ∈ Iu} r_i ) / |Iu|    (1)

Given the WRA of a user, we define the Weighted Cut-off Criterion (WCC) that allows us to filter the user ratings r ∈ R in the profile Iu of a user u ∈ U. We perform this operation for each item i ∈ Iu, according to the criterion shown in Equation 2:

 r = 0, if r < WRA(u); r = r, otherwise    (2)

The output of this step is a set Îu, which contains the ratings processed through the WCC criterion. This filtering process is summarized in Algorithm 1.
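Equations 1 and 2 translate directly into code. The sketch below is an illustrative Python rendering (the dictionary-based profile format is an assumption; the paper's actual implementation is in Java with Mahout): it computes WRA(u) as the mean of the profile's ratings and keeps only the items rated at or above it.

```python
def wra(profile):
    """Weighted Ratings Average (Equation 1): mean rating of the profile."""
    return sum(profile.values()) / len(profile)

def wcc_filter(profile):
    """Weighted Cut-off Criterion (Equation 2): keep items with r >= WRA(u)."""
    threshold = wra(profile)
    return {item: r for item, r in profile.items() if r >= threshold}

# A user whose personal scale is 2-5: WRA = 3.5, so the 2 and the 3 are dropped
profile = {"i1": 2, "i2": 3, "i3": 4, "i4": 5}
print(wcc_filter(profile))  # {'i3': 4, 'i4': 5}
```

Rather than setting the filtered ratings to 0 as in Equation 2, the sketch simply omits them, which is equivalent for the purpose of building the filtered profile Îu.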
 '''Algorithm 1: Ratings filtering'''
 Input: Iu = user profile
 Output: Îu = filtered user profile
  1: procedure FilterUserProfile(Iu)
  2:   WRA = GetWeightedRatingsAverage(Iu)
  3:   for each i in Iu do
  4:     r = GetItemRating(i)
  5:     if r ≥ WRA then
  6:       Îu ← (i, r)
  7:     end if
  8:   end for
  9:   return Îu
 10: end procedure

The algorithm takes as input (step 1) the profile Iu of a user u ∈ U, and provides as output (step 9) the user profile Îu filtered on the basis of the proposed WCC criterion. After the calculation of the Weighted Ratings Average (WRA) of the items in the user profile Iu, performed at step 2, from step 3 to step 8 we extract the rating r of each item i ∈ Iu (step 4), adding it to the set Îu when its value is greater than or equal to the WRA value (steps 5–6). The set Îu, which represents the user profile Iu filtered on the basis of the proposed criterion, is returned as output at the end of the process (step 9).

'''Item Recommendation.''' This step runs the state-of-the-art SVD algorithm, using as input the set Îu of each user u ∈ U. In this way, the algorithm only processes the items for which a user expressed an interest above her average.

==Evaluation==
In this section, we first describe the environment and the parameters setup, then we present the strategy and the involved metric, concluding with the experimental results and their discussion. The dataset employed to validate our proposal is the same one previously presented, i.e., the Yahoo! Webscope R4 dataset.

'''Environment.''' The environment is based on the Java language, with the support of the Mahout framework (http://mahout.apache.org/) to implement the state-of-the-art recommendation approach (i.e., SVD) and to perform the evaluation of the experimental results in terms of accuracy. The experimental framework was developed on a machine with an Intel i7-4510U quad-core CPU (2 GHz × 4) and a 64-bit Linux operating system (Debian Jessie) with 4 GB of RAM.

'''Parameters Setup.''' The optimal values of two of the three parameters needed to run the Mahout implementation of SVD (i.e., the regularization parameter lambda, used to avoid overfitting, and the number of training steps) were chosen through a preliminary training (the selected values are, respectively, 0.05 and 10). The third parameter (i.e., the dimension of the feature space) was instead tested in a range from 2 to 20, during the set of experiments aimed at evaluating the accuracy of the SVD recommender approach before and after the use of our WCC approach. This is useful to test the effectiveness of the proposed approach when the size of the latent space varies.

'''Metric.''' The accuracy of the performed recommendations was measured through the Root Mean Squared Error (RMSE). This metric considers the test set and the predicted ratings by comparing each rating r_ui, given by a user u for an item i and available in the test set, with the rating p_ui predicted by the recommender system. Its formalization is shown in Equation 3, where n is the number of ratings available in the test set:

 RMSE = sqrt( ( Σ_{i=1}^{n} (r_ui − p_ui)² ) / n )    (3)

'''Strategy.''' We evaluate our proposal through a comparative analysis, by considering the recommendations generated by the SVD approach before and after the proposed filtering process based on the WCC criterion. The comparisons have been made by measuring the accuracy of the results through the well-known Root Mean Squared Error (RMSE) metric, described above. In order to guarantee the repeatability of the performed experiments, according to the Mahout documentation, we used in the Java code the instruction RandomUtils.useTestSeed().

''Figure 3: Recommendations Accuracy (RMSE of SVD_Iu and SVD_Îu as the SVD feature space ranges over 2–20; figure omitted).''
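For concreteness, the RMSE of Equation 3 reduces to a few lines over the paired actual/predicted ratings of the test set. The sketch below is illustrative only and independent of the Mahout evaluator actually used in the experiments:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error (Equation 3) over paired rating lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = len(actual)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / n)

print(rmse([4, 3, 5], [3.5, 3.0, 4.5]))  # sqrt(0.5 / 3) ≈ 0.408
```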
The evaluation process has been performed by using the Mahout functionalities designed for this task (the RecommenderEvaluator Java class). We validate our proposal by running a set of experiments that measure the accuracy of the SVD recommendations before and after the use of our WCC approach.

'''Experimental Results.''' As shown in Figure 3, our approach obtains better accuracy values along almost all the considered SVD feature space range. This means that a preliminary filtering of the user ratings leads to better performance in recommendation approaches such as SVD, which are strongly based on this kind of information. We can observe that the RMSE values get worse when the latent space increases (SVD_Iu approach), while they remain stable in our case (SVD_Îu approach), showing that the accuracy of the system worsens when increasing the latent space if non-filtered user ratings are used.

'''Results Summary.''' The results obtained in this study first showed the existence of the problem related to the different ways the users adopt to evaluate the items, and then illustrated how the proposed WCC approach is able to improve the performance of the state-of-the-art recommendation approach, SVD. Indeed, the results of the experiments show that the ranges of values the users adopt during their evaluations are, in about half of the cases, different from the one allowed by the system (Figure 1), and that a preliminary filtering of the ratings by our Weighted Cut-off Criterion overcomes this problem, improving the accuracy of the recommendations (Figure 3).

==Background and Related Work==
In this section we briefly review the main concepts closely related to the proposed work.

'''User Profiling.''' In the e-commerce environment, recommender systems play a determinant role. Their first implementations were based on the so-called Collaborative Filtering approach [13, 14], which is based on the assumption that users have similar preferences on an item if they have already rated other similar items [27].
An alternative approach is Content-based filtering, where the items to recommend are those whose content is similar to that of the items previously evaluated by the user [19, 22]. The early systems used relatively simple retrieval models, such as the Vector Space Model with the basic TF-IDF weighting [4, 6, 18, 23]: a spatial representation of the textual description of the items, where each item is represented by a vector in an n-dimensional space, and each dimension is related to a term from the overall vocabulary of a specific document collection. There are several approaches to create user profiles: some of them focus on short-term user profiles that capture features of the user's current search context [7, 11, 26], while others accommodate long-term profiles that capture the user preferences over a long period of time [3, 8, 20]. As shown in [28], compared with short-term user profiles, the use of a long-term user profile generally produces more reliable results, at least when the user preferences are fairly stable over a long time period. It should be noted that, regardless of the approach used to define the user profiles, almost all the state-of-the-art strategies take into account, as their main source of information, the user ratings (i.e., the scores given to the evaluated items), either by using them directly or by exploiting their latent characteristics (e.g., latent-factor-based approaches [15]).

'''Latent Factor Models.''' The type of data with which a recommender system operates is typically a sparse matrix where the rows represent the users and the columns represent the items. The entries of this matrix are the interactions between users and items, in the form of ratings or purchases. The aim of a recommender system is to infer, for each user u, a ranked list of items, and many works in the literature focus on the rating prediction problem. The most effective strategies in this field exploit the so-called latent factor models and, especially, matrix factorization techniques [16]. Other CF ranking-oriented approaches that extend the matrix factorization techniques have been recently proposed, and most of them use a ranking-oriented objective function in order to learn the latent factors of users and items [17]. Nowadays, the Singular Value Decomposition (SVD) approach [10] and Koren's version SVD++ [15] are considered the best strategies in terms of accuracy and scalability.

'''User Ratings Reliability.''' The concept of bias, introduced into a recommender system process as noise in the user ratings, has been well known in the literature since 1995, when it was cited in a work aimed at discussing the reliability of users in terms of rating coherence [13]. Similar studies have been performed subsequently, such as that in [9], where hundreds of users evaluated a set of randomly selected movies that they had already evaluated in the past, with the result of showing an incoherence in their evaluations in 40% of the cases. All these studies lead to the same problem, identified in the literature as the magic barrier [12], a term used to define the theoretical boundary for the level of optimization that can be achieved by a recommendation algorithm on transactional data [25]. The evaluation models assume as ground truth that the transactions made in the past by the users, and stored in their profiles, are free of noise. This concept has been studied in [2, 1], where a study aimed at capturing the noise in a service operating in a synthetic environment was performed.

To the best of our knowledge, there are no studies aimed at tackling the problem of the inconsistency in the user ratings when this issue derives from the different ways adopted by the users to assign a rating to the evaluated items. The approach proposed in this work addresses the aforementioned problem in a twofold manner: first, it defines a method able to operate with any type of profile (e.g., short-term or long-term profiles); second, it faces the limitation related to the magic barrier problem, by removing from the user profiles all the ratings that do not reflect the user preferences in terms of weighted ratings average, i.e., those items that could represent a kind of noise in the recommendation process.

==Conclusions and Future Work==
The work presented in this paper highlights and faces a problem that arises when users assign ratings to the evaluated items by adopting a range of values that might not cover the entire interval allowed by the system with which they interact. Through the first experiment we showed the real existence of this problem, which has been faced by introducing a novel cut-off criterion (WCC). This criterion is based on a weighted ratings average value (WRA), and its effectiveness in improving the accuracy of a state-of-the-art recommender system, such as SVD, has been demonstrated in the second experiment.

In summary, our contribution in this field is twofold: first, we are able to improve the performance of a recommender system based on the user ratings (i.e., almost all of the state-of-the-art approaches); second, we are also able to reduce the computational load of the recommendation process, thanks to the removal of certain items from the user profiles (i.e., those with a rating less than the WRA value). It should also be noted that the proposed approach leads to a considerable improvement of the accuracy of the recommendations, without the need to adopt sophisticated mathematical criteria to preprocess the user ratings, with a great advantage in terms of computational complexity.

Future work will study the relations between the ranges of values adopted by the users to express their opinion in different domains, to better model the preferences in environments that sell different types of items (e.g., a site such as Amazon (http://www.amazon.com/), which sells different types of goods). Indeed, each type of item might be associated with a different rating behavior. This will allow us to generate more effective recommendations.

==Acknowledgment==
This work is partially funded by Regione Sardegna under project NOMAD (Next generation Open Mobile Apps Development), through PIA - Pacchetti Integrati di Agevolazione "Industria Artigianato e Servizi" (annualità 2013).

==References==
[1] Xavier Amatriain, Josep M. Pujol, and Nuria Oliver. 2009. I like it... I like it not: Evaluating user ratings noise in recommender systems.
In User Modeling, Adaptation, and Personalization. Springer, 247–258.

[2] Xavier Amatriain, Josep M. Pujol, Nava Tintarev, and Nuria Oliver. 2009. Rate it again: increasing recommendation accuracy by user re-rating. In Proceedings of the third ACM conference on Recommender systems. ACM, 173–180.

[3] Fabio A. Asnicar and Carlo Tasso. 1997. ifWeb: a prototype of user model-based intelligent agent for document filtering and navigation in the world wide web. In Sixth International Conference on User Modeling. 2–5.

[4] Marko Balabanović and Yoav Shoham. 1997. Fab: content-based, collaborative recommendation. Commun. ACM 40, 3 (1997), 66–72.

[5] Riccardo Bambini, Paolo Cremonesi, and Roberto Turrin. 2011. A Recommender System for an IPTV Service Provider: a Real Large-Scale Production Environment. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). Springer, 299–331.

[6] Daniel Billsus and Michael J. Pazzani. 1999. A hybrid user model for news story classification. Springer.

[7] Jay Budzik and Kristian J. Hammond. 2000. User Interactions with Everyday Applications As Context for Just-in-time Information Access. In Proceedings of the 5th International Conference on Intelligent User Interfaces (IUI '00). ACM, New York, NY, USA, 44–51.

[8] Paul Alexandru Chirita, Wolfgang Nejdl, Raluca Paiu, and Christian Kohlschütter. 2005. Using ODP metadata to personalize search. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 178–185.

[9] Dan Cosley, Shyong K. Lam, Istvan Albert, Joseph A. Konstan, and John Riedl. 2003. Is seeing believing?: how recommender system interfaces affect users' opinions. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 585–592.

[10] Daniel Billsus and Michael J. Pazzani. 1998. Learning Collaborative Information Filters. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24-27, 1998, Jude W. Shavlik (Ed.). Morgan Kaufmann, 46–54.

[11] Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing Search in Context: The Concept Revisited. ACM Trans. Inf. Syst. 20, 1 (Jan. 2002), 116–131.

[12] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John Riedl. 2004. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22, 1 (2004), 5–53.

[13] Will Hill, Larry Stead, Mark Rosenstein, and George Furnas. 1995. Recommending and evaluating choices in a virtual community of use. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM Press/Addison-Wesley Publishing Co., 194–201.

[14] George Karypis. 2001. Evaluation of item-based top-n recommendation algorithms. In Proceedings of the tenth international conference on Information and knowledge management. ACM, 247–254.

[15] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 426–434.

[16] Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.

[17] Yehuda Koren and Joe Sill. 2011. OrdRec: an ordinal model for predicting personalized item rating distributions. In Proceedings of the fifth ACM conference on Recommender systems. ACM, 117–124.

[18] Henry Lieberman and others. 1995. Letizia: An agent that assists web browsing. IJCAI (1) 1995 (1995), 924–929.

[19] Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based recommender systems: State of the art and trends. In Recommender Systems Handbook. Springer, 73–105.

[20] Zhongming Ma, Gautam Pant, and Olivia R. Liu Sheng. 2007. Interest-based Personalized Search. ACM Trans. Inf. Syst. 25, 1, Article 5 (Feb. 2007).

[21] Xia Ning, Christian Desrosiers, and George Karypis. 2015. A Comprehensive Survey of Neighborhood-Based Recommendation Methods. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer, 37–76.

[22] Michael J. Pazzani and Daniel Billsus. 2007. Content-based recommendation systems. In The Adaptive Web. Springer, 325–341.

[23] Michael J. Pazzani, Jack Muramatsu, Daniel Billsus, and others. 1996. Syskill & Webert: Identifying interesting web sites. In AAAI/IAAI, Vol. 1. 54–61.

[24] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2015. Recommender Systems: Introduction and Challenges. In Recommender Systems Handbook. Springer US, 1–34.

[25] Alan Said, Brijnesh J. Jain, Sascha Narr, Till Plumbaum, Sahin Albayrak, and Christian Scheel. 2012. Estimating the magic barrier of recommender systems: a user study. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 1061–1062.

[26] Xuehua Shen, Bin Tan, and ChengXiang Zhai. 2005. Implicit user modeling for personalized search. In Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 824–831.

[27] Xiaoyuan Su and Taghi M. Khoshgoftaar. 2009. A survey of collaborative filtering techniques. Advances in Artificial Intelligence 2009 (2009), 4.

[28] Dwi H. Widyantoro, Thomas R. Ioerger, and John Yen. 2001. Learning user interest dynamics with a three-descriptor representation. Journal of the American Society for Information Science and Technology 52, 3 (2001), 212–225.