Improving Sparsity Problem in Group Recommendation

Sarik Ghazarian‡, Nafiseh Shabib‡†, Mohammad Ali Nematbakhsh‡
‡University of Isfahan, ‡†Norwegian University of Science and Technology
sarikghazarian@yahoo.com, shabib@idi.ntnu.no, mnematbakhsh@eng.ui.ir

ABSTRACT
Group recommendation systems can be very challenging when the datasets are sparse and there are not many available ratings for items. In this paper, by enhancing basic memory-based techniques, we resolve the data sparsity problem for the users in a group. The results show that applying our techniques to the users in a group yields higher group satisfaction and lower group dissatisfaction.

Keywords
sparsity, group recommendation, collaborative filtering

1. INTRODUCTION
Recommendation systems (RSs) are tools and techniques that provide suggestions for items to be used by users. They are generally directed towards helping users find items of likely interest among an overwhelming number of items, and they try to predict the most suitable products or services based on the users' preferences and constraints. However, even active users have rated just a few of the items available in a database and, correspondingly, even popular items have been rated by only a small fraction of the available users. This problem is commonly referred to as the sparsity problem [17]. Different approaches focusing on the sparsity problem have been proposed in the research literature for single-user recommendations [21, 24]. However, as far as we know, this is the first work presenting a complete model for group recommendations that resolves the sparsity problem for a group. In general, sparsity has a major negative impact on the effectiveness of a collaborative filtering approach, and especially on group recommendation. The main challenge in group scenarios is that of computing recommendations from a potentially diverse set of group members' ratings in a sparse situation. In this work, we studied the sparsity problem in group recommendation. First, we formalize the problem of sparsity in group recommendation and use our model for aggregating user ratings in a group. Second, we run an extensive set of experiments with different group sizes and different levels of group cohesiveness on the Million Song Dataset. Our experiments show that in most cases the group satisfaction of our proposed model is higher, and the group dissatisfaction lower, than in previous models, which do not take sparsity into account.
The rest of the paper is organized as follows: Section 2 describes the sparsity problem for a group, and we propose a complete model for sparsity in group recommendation. Experiments are presented in Section 3. Section 4 discusses related work. We conclude in Section 5.

2. DATA MODEL AND RECOMMENDATION ALGORITHM
We assume a set of users U = {u1, ..., un} out of which any ad-hoc group G ⊆ U can be built. We consider a set I = {i1, i2, ..., im} with m items.

2.1 Item-Item Similarity
The basic component of the proposed method is a machine learning regression method called Support Vector Machine (SVM), which is used for calculating similarities between items [26]. SVM is a supervised learning technique that learns, as well as possible, the function underlying the input data, and then uses the learned function to produce appropriate output for new inputs [26]. The input data are pairs (x1, y1), ..., (xi, yi), where xi is a record in a d-dimensional space and yi is a real value. SVM tries to find a function f(x) which approximates the relations between the data points [20]. The target function has two types: linear and nonlinear. In linear regression, the relationships between input and output data points are linear and can be approximated by a straight line. The linear function is computed as equation 1:

f(x) = w·x + b   (1)

where w ∈ X, X is the input space, and b is a real value [20].
In the nonlinear case, SVM preprocesses the input data: a nonlinear mapping function ϕ maps the data from the input space to a new feature space ρ. After this mapping, the standard linear SVM regression algorithm is applied in the new, higher-dimensional feature space. The dot product between data points in the higher-dimensional feature space is called the kernel function [23]. Equation 2 shows this function:

K(x, x') = ϕ(x)·ϕ(x')   (2)

There are different kernel functions, such as linear, polynomial, radial basis function (RBF), and the Pearson VII Universal Kernel (PUK) [23]. In our proposed method the PUK function has been used for modeling the similarities between items, because it had higher accuracy than the other functions:

PUK: k(x, x') = 1 / [1 + (2·√(‖x − x'‖²) · √(2^(1/ω) − 1) / σ)²]^ω   (3)

with kernel parameters σ and ω [23].
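To make this concrete, the sketch below fits an SVM regressor with a hand-written PUK kernel. It is only an illustration, not the paper's setup (the experiments use WEKA, see section 3.1.1): scikit-learn stands in, and toy data replaces the song-pair records of section 3.1.

```python
import numpy as np
from sklearn.svm import SVR

def puk_kernel(X, Y, sigma=1.0, omega=1.0):
    """Pearson VII Universal Kernel (equation 3), evaluated pairwise."""
    # Squared Euclidean distance between every row of X and every row of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    d2 = np.maximum(d2, 0.0)                 # guard against rounding noise
    b = 2.0 * np.sqrt(d2) * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + b**2) ** omega

# Toy stand-ins: X holds attribute-difference records for song pairs
# (section 3.1) and y holds their known similarity degrees in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = rng.random(200)

model = SVR(kernel=puk_kernel)               # sigma = omega = 1, as in 3.1.1
model.fit(X, y)
item_sim_pred = model.predict(X[:5])         # predicted ItemSim values
```

Scikit-learn accepts any callable that returns the Gram matrix between two sets of points, so the same pattern works for other kernels as well.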
2.2 Listen Count
The algorithms in our work are based on explicit feedback from users; consequently, the listening counts need to be normalized to a predefined scale so that the algorithms can work optimally. In [11], the basic latent factor model is modified to convert implicit ratings to explicit ones. Similarly to the approach taken in [11], a boolean variable p_ui shows a user's interest in an item (equation 4): if a user has listened to a song (l_ui > 0), the boolean variable's value is 1, otherwise it is 0. Thus, implicit data do not indicate users' preferences directly; rather, they show confidence (c_ui) about users' preferences, and there is a direct relationship between the confidence value and the number of times the user has listened to a song (equation 5). The relationship is controlled by a constant α.

p_ui = 1 if l_ui > 0; p_ui = 0 if l_ui = 0   (4)

c_ui = 1 + α·l_ui   (5)

With these alterations, the equation of the latent factor model is modified as equation 6. This equation is a least squares optimization process in which the user factors (p_u) or the item factors (q_i) are held fixed in each step. After finding the user factors and item factors, their dot products give the users' explicit ratings on items.

min_{q*, p*} Σ_{u,i: r_ui is known} c_ui (r_ui − q_i^T p_u)² + λ(‖q_i‖² + ‖p_u‖²)   (6)
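As a concrete illustration of equations 4 to 6, the following sketch implements the confidence-weighted alternating least squares of Hu et al. [11] in plain NumPy. It is a minimal, unoptimized rendering under our own choice of hyperparameters (factors, alpha, lam, iters are illustrative), not the authors' implementation.

```python
import numpy as np

def implicit_als(L, factors=10, alpha=40.0, lam=0.1, iters=10, seed=0):
    """Confidence-weighted ALS for equations 4-6 (after Hu et al. [11])."""
    n_users, n_items = L.shape
    P = (L > 0).astype(float)      # p_ui, equation 4: 1 if listened, else 0
    C = 1.0 + alpha * L            # c_ui, equation 5: confidence levels
    rng = np.random.default_rng(seed)
    user_f = rng.normal(scale=0.01, size=(n_users, factors))  # p_u in eq. 6
    item_f = rng.normal(scale=0.01, size=(n_items, factors))  # q_i in eq. 6
    reg = lam * np.eye(factors)
    for _ in range(iters):
        # Fix the item factors and solve each user's weighted least squares.
        for u in range(n_users):
            Cu = np.diag(C[u])
            A = item_f.T @ Cu @ item_f + reg
            user_f[u] = np.linalg.solve(A, item_f.T @ Cu @ P[u])
        # Fix the user factors and solve for each item symmetrically.
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            A = user_f.T @ Ci @ user_f + reg
            item_f[i] = np.linalg.solve(A, user_f.T @ Ci @ P[:, i])
    return user_f @ item_f.T       # explicit-style ratings r_ui

# Toy listen-count matrix: 3 users x 4 songs.
listens = np.array([[3, 0, 1, 0], [0, 5, 0, 2], [1, 0, 0, 4]], dtype=float)
ratings = implicit_als(listens)    # dense matrix of predicted preferences
```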
2.3 Sparsity Calculation
The sparsity value was computed as the ratio of the specified ratings in the initial user-item matrix to all possible (specified and unspecified) ratings:

SparsityValue = Num. of specified ratings / Num. of all possible ratings   (7)

2.4 Group Modeling
We define the following hypothesis: the relevance between a group and an item i depends only on the relevance of i to the individual members of the group. Using this hypothesis, we derive a definition that not only includes the preferences of individual users but also integrates those preferences when the users are in a group and a set of items is being recommended.

2.4.1 User-User Similarity
The major goal of this component is to overcome the weakness of Pearson's correlation method in sparse situations. Pearson's correlation is limited to the joint items in both users' preference lists. In a random group setting, the collection of common items between users is very small, so comparing users based on very few items leads to lower accuracy [8, 19]. To solve this problem, the idea of the proposed method is to compare all items rated by one user with all items rated by another user in the group, one by one. In other words, our method involves all possible combinations of items in the preference lists of both users. Equation 8 demonstrates the idea. The basic part of this equation is based on our conception of similar and dissimilar users: two users are considered similar if their rating differences on similar items are small; two users are dissimilar if they have rated two dissimilar items.
Given a group G, the similarity of each user u ∈ G to a user v is denoted as:

UserSim_uv = [Σ_{∀i∈R_u, ∀j∈R_v} (1 − |r_ui − r_vj| / (r_max − r_min)) × ItemSim_ij] / [Σ_{∀i∈R_u, ∀j∈R_v} |ItemSim_ij|]   (8)

where R_u = {i | r_ui ≠ 0}, R_v = {j | r_vj ≠ 0}, and r_max and r_min are the maximum and minimum possible rating values. Note that ItemSim_ij is the similarity value between items i and j calculated by the SVM regression model explained in section 2.1.

2.4.2 User-Item Relevance
Given a group G, the relevance of a user u ∈ G for an item i ∈ I is denoted as:

Rel_ui = r̄_u + [Σ_{v∈U} (r_vi′ − r̄_v) × UserSim_uv] / [Σ_{v∈U} |UserSim_uv|]   (9)

where i′ is the item most similar to i that user v has rated. Thus, by considering i′ in the relevance function, it is not necessary to take into account only the users who have rated the same item; all ratings given by users are considered, and we can use the ratings of the items most similar to the target item to fill in the sparseness.

2.4.3 Group Relevance
The preference of a group G for an item i, denoted Grel(G, i), is an aggregation over the preferences of each group member for that item. We consider two main aggregation strategies:

Average:
Grel(G, i) = Σ_{u∈G} Rel_ui / |G|   (10)

Least misery:
Grel(G, i) = min_{u∈G}(Rel_ui)   (11)

2.5 Group Satisfaction
To evaluate our method's accuracy in the group recommendation process, we used the group satisfaction metric [5]. This metric is the average of all group members' satisfaction with the recommended items:

Gsat = Σ_{u∈G} Usat(u) / |G|   (12)

A user's satisfaction, Usat(u), is calculated as:

Usat(u) = Σ_{i=1}^{k} Rel_ui / (k × Max(Rel_ui))   (13)

where Rel_ui is the user's preference for an item, k is the number of recommended items, and Max(Rel_ui) is the maximum preference value of user u over all items.

2.6 Group Dissatisfaction
To evaluate our method in the group recommendation process, we also used the group dissatisfaction metric [13]. This metric is the fraction of dissatisfied users, i.e., users whose satisfaction measure is less than a threshold. In our case we set the threshold to 0.6:

GdisSat = |U_dis| / |G|   (14)

where U_dis = {u ∈ G | Usat(u) < 0.6} (equation 13).
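To make the formulas concrete, here is a minimal NumPy sketch of equations 8 to 14; it is our own illustration, not the authors' code. It assumes a dense rating matrix R with 0 for unrated cells, a precomputed item_sim matrix from the SVM model of section 2.1, and ratings scaled to [r_min, r_max]; the quadratic pairwise loops are kept naive for readability.

```python
import numpy as np

def user_sim(R, item_sim, u, v, r_max=1.0, r_min=0.0):
    """Equation 8: compare every item rated by u with every item rated by v."""
    Ru, Rv = np.nonzero(R[u])[0], np.nonzero(R[v])[0]
    num = den = 0.0
    for i in Ru:
        for j in Rv:
            s = item_sim[i, j]
            num += (1.0 - abs(R[u, i] - R[v, j]) / (r_max - r_min)) * s
            den += abs(s)
    return num / den if den else 0.0

def relevance(R, item_sim, u, i):
    """Equation 9: each other user contributes the rating of its own rated
    item that is most similar to the target item i."""
    mean = lambda w: R[w][R[w] > 0].mean()
    num = den = 0.0
    for v in range(R.shape[0]):
        rated = np.nonzero(R[v])[0]
        if v == u or rated.size == 0:
            continue
        i_prime = rated[np.argmax(item_sim[i, rated])]
        s = user_sim(R, item_sim, u, v)
        num += (R[v, i_prime] - mean(v)) * s
        den += abs(s)
    return mean(u) + (num / den if den else 0.0)

def group_recommend(R, item_sim, group, candidates, k=10, strategy="average"):
    """Equations 10-11: aggregate member relevances, return the top-k items."""
    rel = {i: [relevance(R, item_sim, u, i) for u in group] for i in candidates}
    agg = {i: (np.mean(r) if strategy == "average" else min(r))  # least misery
           for i, r in rel.items()}
    return sorted(agg, key=agg.get, reverse=True)[:k]

def gsat_gdissat(R, item_sim, group, recommended, all_items, threshold=0.6):
    """Equations 12-14; Max(Rel_ui) is taken over the candidate item set."""
    usats = []
    for u in group:
        rel_rec = [relevance(R, item_sim, u, i) for i in recommended]
        rel_max = max(relevance(R, item_sim, u, i) for i in all_items)
        usats.append(sum(rel_rec) / (len(recommended) * rel_max))  # eq. 13
    gsat = float(np.mean(usats))                                   # eq. 12
    gdissat = sum(s < threshold for s in usats) / len(group)       # eq. 14
    return gsat, gdissat
```

In practice UserSim_uv would be cached once per user pair rather than recomputed for every candidate item, as done here for brevity.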
3. EXPERIMENTS
We show that after solving the sparsity problem for each single user in the group, we obtain higher group satisfaction and lower group dissatisfaction.
Dataset description: We evaluate our method on the Million Song Dataset (MSD)¹ in the music recommendation domain. The MSD is a collection of music audio features and metadata created to support research into industrial-scale music information retrieval. It is a freely available collection of metadata for one million contemporary songs, including song title, artist, publication year, audio features, and much more [14]. In addition, the MSD is a cluster of complementary datasets contributed by the community: the SecondHandSongs dataset for cover songs, the musiXmatch dataset for lyrics, the Last.fm dataset² for song-level tags and similarity, and the Taste Profile subset for user listening histories. Comprising several complementary datasets linked to the same set of songs, the MSD contains extensive metadata, audio features, song-level tags, lyrics, cover songs, similar artists, and similar songs. In this work, we have used song features such as title, release, artist, duration, year, song hotness, song similarities, user listening histories, and song tags. In addition, we have information about song tags and their degrees from the Last.fm dataset, where a tag's degree shows how strongly the song is associated with that particular tag. In our work, for each song we consider its three main tags.
We implemented our prototype system in Java, and for computing the SVM model's accuracy we used WEKA³.

¹http://labrosa.ee.columbia.edu/millionsong
²http://last.fm
³http://www.cs.waikato.ac.nz/ml/weka

3.1 Item-Item Similarity
In order to use the similarity data between songs and create the SVM regression model, we needed to prepare suitable data (preprocessing) for the training process, in the form: song, release, artist, term1, term2, term3, song-hotness, duration, year, similarity-degree. In MSD, about half of the songs have at least one tag. In this research, each song's three most relevant tags were considered; if a song did not have three relevant tags, the remaining slots were filled with the highest available tag. Similarity-degree is an attribute in the [0, 1] interval: 1 indicates the most similar songs and, conversely, 0 is used for dissimilar songs. In an SVM model each record should be represented as a point in the input space. To achieve this, similarity-based functions have been used [10]. For computing the similarity between string attributes, the Jaro-Winkler method has been used, which gives 1 to the most similar items and 0 to dissimilar ones. For the terms, we used the similarity function for nominal attributes. After computing the similarity between corresponding pairs of attributes, each record took the form: title-dif, release-dif, artist-dif, term-dif, song-hotness-dif, duration-dif, year-dif, similarity-degree, where the "dif" suffix stands for the difference values between corresponding pairs of attributes. We then used these new records to create the SVM model for predicting similarities between songs.
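A sketch of how such "-dif" records might be built is shown below. This is our illustration: the attribute names follow the list above, the third-party jellyfish package is one assumed source of a Jaro-Winkler implementation (recent versions expose jaro_winkler_similarity), and the simple tag-overlap rule stands in for the paper's nominal-attribute similarity function.

```python
import jellyfish  # third-party package with a Jaro-Winkler implementation

def song_pair_record(a, b, similarity_degree):
    """Build one training record of '-dif' features for a pair of songs."""
    jw = jellyfish.jaro_winkler_similarity   # 1 = identical strings
    return {
        "title-dif":        jw(a["title"], b["title"]),
        "release-dif":      jw(a["release"], b["release"]),
        "artist-dif":       jw(a["artist"], b["artist"]),
        # Nominal attributes: overlap of the three tags per song stands in
        # for the nominal-attribute similarity function.
        "term-dif":         len(set(a["terms"]) & set(b["terms"])) / 3.0,
        "song-hotness-dif": abs(a["hotness"] - b["hotness"]),
        "duration-dif":     abs(a["duration"] - b["duration"]),
        "year-dif":         abs(a["year"] - b["year"]),
        "similarity-degree": similarity_degree,   # the regression target
    }
```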
3.1.1 Item-Item similarity results
To compute the SVM model's accuracy, the mean absolute error (MAE) [25] values of different regression models were compared using the Waikato Environment for Knowledge Analysis (WEKA) software tool. All parameters of the different methods were tested. Among the SVM methods with different kernel functions (PUK, RBF, normalized polynomial, and polynomial), the PUK kernel function with σ = 1 and ω = 1 had the minimum, and hence best, MAE value. Figure 1 illustrates the MAE values of the different regression methods in WEKA. Therefore, in our work the PUK function has been used for modeling the similarities between items.

Figure 1: MAE values for different regression methods.

3.2 User Collection Phase
We selected a subset of users to provide their music preferences; these users are later used to form different groups and to perform judgments on group recommendations. For this purpose, we selected the users who have listened to at least fifteen songs in our dataset. As mentioned in the previous section, MSD contains users' listening histories, which give the number of times each user has listened to a particular song. Thus, preferences are expressed in an implicit format. This format is not equivalent to an explicit one, which shows the exact preferences of users. Since user-based and item-based collaborative filtering (CF) approaches have been designed for explicit ratings, converting the implicit feedback to explicit ratings was essential. To achieve this, we used the latent factor model with the alterations proposed in the Listen Count section (2.2).

3.3 Group Formation
We considered two main factors in forming user groups: group size and group cohesiveness [2]. We hypothesized that varying group sizes will impact group satisfaction. We chose three group sizes, 3, 5, and 10, representing small, medium, and large groups, respectively. Similarly, we assumed that group cohesiveness (i.e., how similar the group members are in their music tastes) is also a significant factor in their satisfaction with the group recommendation. As a result, we chose to form three kinds of groups: similar, dissimilar, and random.
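A small sketch of this group-formation step is given below; it is our illustration, using the 0.5 cohesiveness threshold of section 3.4.3 and a pairwise_sim lookup assumed to hold precomputed UserSim values.

```python
import itertools
import random

def form_group(users, size, kind, pairwise_sim, seed=0):
    """kind is 'similar' (all pairs > 0.5), 'dissimilar' (< 0.5) or 'random'."""
    if kind == "random":
        return random.Random(seed).sample(users, size)
    for combo in itertools.combinations(users, size):
        sims = [pairwise_sim[u][v]
                for u, v in itertools.combinations(combo, 2)]
        if kind == "similar" and all(s > 0.5 for s in sims):
            return list(combo)
        if kind == "dissimilar" and all(s < 0.5 for s in sims):
            return list(combo)
    return None  # no group with the requested cohesiveness was found
```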
3.4 Result Interpretation
After predicting the scores of the unknown items in all users' preference lists, it is essential to aggregate the users' preferences to make a recommendation for the group. For this purpose, we used the basic methods (average and least misery) and recommended the k items with the highest values. To evaluate our method in the group recommendation process, we used the group satisfaction and dissatisfaction metrics. The reason for using the group dissatisfaction metric is to observe how the algorithm performs when there are dissatisfied members in the group. Note that the sparsity values (equation 7) for the groups are as follows:
Similar groups: G3 = 0.31, G5 = 0.55, G10 = 0.77
Dissimilar groups: G3 = 0.52, G5 = 0.68, G10 = 0.80
Random groups: G3 = 0.58, G5 = 0.72, G10 = 0.84

3.4.1 Varying Group Size
We examined the effect of different group sizes on group satisfaction/dissatisfaction in Figure 2. The number of recommended items is fixed at 10 and the group size varies between 3, 5, and 10 members. As we can see in Figure 2, in the similar groups the group satisfaction remains the same even though the number of people in each group increases. In addition, in most cases our algorithm achieves higher group satisfaction with both the average and least misery methods than the CF method, which does not take sparsity into account. Additionally, the sparsity value increases with group size, but our algorithm performs fairly constantly. Moreover, the results show that in the dissimilar and random groups we have lower dissatisfaction.

3.4.2 Varying Top-k
We examined the effect of different numbers of recommended items (top-k = 5, 10, 15, and 20) on group satisfaction/dissatisfaction in Figure 3. The group size is fixed at 10. The results show that as the number of items increases, group satisfaction decreases in all the groups, but it decreases more in the similar and dissimilar groups than in the random groups. In general, our method achieves higher group satisfaction than the CF method. The results also show less dissatisfaction when Average is applied as the aggregation method, and less dissatisfaction with our method overall.

3.4.3 Varying Group Cohesiveness
We examined the effect of different levels of group cohesiveness on group satisfaction/dissatisfaction in Figure 4. Group cohesiveness varies between similar groups (similarity between members > 0.5), dissimilar groups (similarity between members < 0.5), and groups with random members. The number of recommended items is fixed at 10. Our observations showed that for small groups, group satisfaction is very close across the different techniques, but in the random groups we can see a noticeable difference in group satisfaction between CF and our proposed method, which takes sparsity into account. In addition, the results show that in the dissimilar and random groups our method has lower dissatisfaction.

Figure 2: Comparison of group satisfaction and group dissatisfaction with varying group size.
Figure 3: Comparison of group satisfaction and group dissatisfaction with varying top-k.
Figure 4: Comparison of group satisfaction and group dissatisfaction with varying group cohesiveness.
4. RELATED WORK
Research on recommendations is extensive. Typically, recommendation approaches are distinguished between content-based, collaborative filtering, and hybrid [1]. Recently, there are also approaches focusing on group recommendations. Group recommendation aims to identify items that are suitable for the whole group instead of individual group members. Group recommendation has been designed for various domains such as news pages [18], tourism [9], music [6], and TV programs [27]. A group is defined as two or more individuals who are connected to one another; it can range in size from two members to thousands of members. A group may be formed at any time by a random number of people with different interests, by a number of persons who explicitly choose to be part of a group, or by computing similarities between users with respect to some similarity function and then clustering similar users together [15, 2]. There are two dominant strategies for groups: (1) aggregation of individual preferences into a single recommendation list, or (2) aggregation of individual recommendation lists into the group recommendation list [2, 3]. In other words, the first strategy creates a pseudo-user for the group based on its members and then makes recommendations for the pseudo-user, while the second strategy computes a recommendation list for each single user in the group and then combines the results into the group recommendation list.
However, in both approaches we may face the sparsity problem. Sparsity is one of the major problems in memory-based CF approaches [22]. Under sparse conditions most cells of the user-item matrix are not rated. The reason is that users may not be willing to provide their opinions and preferences, and do so only when it is necessary [7]. In this type of matrix, the accuracy of the predictions calculated by memory-based CF approaches is low, since there is not enough information about user ratings [12]. Lately, Ntoutsi et al. applied a user-based CF approach in order to predict unknown ratings [16]. For this, they partitioned users into clusters; then, to predict a particular item's rating for a user, they considered just the users in the target user's cluster instead of all users in the dataset. They calculated the relevance of an item to a user based on the relevance of that item to similar users in the target user's cluster. Moreover, they included a support score in the prediction process to show how many users in the cluster had rated that item. Because it uses memory-based approaches as its basis, this approach also cannot be used in sparse data situations. Chen et al. proposed a method which predicts each item's group rating by considering similar items that have been rated by the whole group or by most subgroups [4]. For this purpose, they first applied a collaborative filtering technique to find each user's preference for the item, and then used a genetic algorithm on the subgroups' ratings to obtain the item's overall score. However, whereas our main focus in this research is the sparsity problem in users' preference lists, Chen et al. worked on the sparsity problem in groups' ratings, which is why they could use collaborative filtering in their calculations.

5. CONCLUSION
We formalized the problem of sparsity in group recommendation and used our model for aggregating user ratings for the group. In this work, we proposed a new method that overcomes the weakness of basic memory-based approaches under sparsity. We evaluated our method in sparse cases and compared it with prior methods. The results show that in sparse matrices our proposed method achieves better group satisfaction and lower group dissatisfaction than basic CF. In addition, even in conditions where the user-based approach can be run, our proposed method performs better. In the future, we plan to pursue the accuracy of our proposed method in fields that have received less attention, like TV programs, books, and images, and we want to extend our research to big groups.

6. REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.
[2] S. Amer-Yahia, S. B. Roy, A. Chawlat, G. Das, and C. Yu. Group recommendation: semantics and efficiency. Proc. VLDB Endow., 2(1):754-765, 2009.
[3] S. Berkovsky and J. Freyne. Group-based recipe recommendations: Analysis of data aggregation strategies. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 111-118, 2010.
[4] Y.-L. Chen, L.-C. Cheng, and C.-N. Chuang. A group recommendation system with consideration of interactions among group members. Expert Syst. Appl., 34(3):2082-2090, 2008.
[5] I. A. Christensen and S. N. Schiaffino. Entertainment recommender systems for group of users. Expert Syst. Appl., 38(11):14127-14135, 2011.
[6] A. Crossen, J. Budzik, and K. J. Hammond. Flytrap: Intelligent group music recommendation. In Proceedings of the 7th International Conference on Intelligent User Interfaces, IUI '02, pages 184-185, New York, NY, USA, 2002. ACM.
[7] L. N. Dery. Iterative voting under uncertainty for group recommender systems (research abstract). In J. Hoffmann and B. Selman, editors, AAAI. AAAI Press, 2012.
[8] A. Eckhardt. Similarity of users' (content-based) preference models for collaborative filtering in few ratings scenario. Expert Syst. Appl., 39(14):11511-11516, Oct. 2012.
[9] I. Garcia, L. Sebastia, and E. Onaindia. On the design of individual and group recommender systems for tourism. Expert Syst. Appl., 38(6):7683-7692, June 2011.
[10] J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.
[11] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM '08, pages 263-272, Washington, DC, USA, 2008. IEEE.
[12] Z. Huang, H. Chen, and D. D. Zeng. Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Trans. Inf. Syst., 22(1):116-142, 2004.
[13] J. K. Kim, H. K. Kim, H. Y. Oh, and Y. U. Ryu. A group recommendation system for online communities. Int. J. Inf. Manag., 30(3):212-219, June 2010.
[14] B. McFee, T. Bertin-Mahieux, D. P. Ellis, and G. R. Lanckriet. The million song dataset challenge. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW '12 Companion, pages 909-916. ACM, 2012.
[15] E. Ntoutsi, K. Stefanidis, K. Nørvåg, and H.-P. Kriegel. Fast group recommendations by applying user clustering. In Proceedings of the 31st International Conference on Conceptual Modeling, pages 126-140. Springer-Verlag, 2012.
[16] E. Ntoutsi, K. Stefanidis, K. Nørvåg, and H.-P. Kriegel. Fast group recommendations by applying user clustering. In ER, pages 126-140, 2012.
[17] M. Papagelis, D. Plexousakis, and T. Kutsuras. Alleviating the sparsity problem of collaborative filtering using trust inferences. In Proceedings of the Third International Conference on Trust Management, iTrust'05, pages 224-239, Berlin, Heidelberg, 2005. Springer-Verlag.
[18] S. Pizzutilo, B. De Carolis, G. Cozzolongo, and F. Ambruoso. Group modeling in a public space: Methods, techniques, experiences. In Proceedings of the 5th WSEAS International Conference on Applied Informatics and Communications, AIC'05, pages 175-180, Stevens Point, Wisconsin, USA, 2005. WSEAS.
[19] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. T. Riedl. Application of dimensionality reduction in recommender systems: A case study. In WebKDD Workshop at the ACM SIGKDD, 2000.
[20] A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3):199-222, Aug. 2004.
[21] X. Su, T. M. Khoshgoftaar, X. Zhu, and R. Greiner. Imputation-boosted collaborative filtering using machine learning classifiers. In Proceedings of the 2008 ACM Symposium on Applied Computing, SAC '08, pages 949-950, New York, NY, USA, 2008. ACM.
[22] X. Su, T. M. Khoshgoftaar, X. Zhu, and R. Greiner. Imputation-boosted collaborative filtering using machine learning classifiers. In SAC, pages 949-950, 2008.
[23] B. Ustun, W. Melssen, and L. Buydens. Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemometrics and Intelligent Laboratory Systems, 2006.
[24] J. Wang, A. P. de Vries, and M. J. T. Reinders. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In SIGIR. ACM, 2006.
[25] G.-R. Xue, C. Lin, Q. Yang, W. Xi, H.-J. Zeng, Y. Yu, and Z. Chen. Scalable collaborative filtering using cluster-based smoothing. In SIGIR, pages 114-121. ACM, 2005.
[26] H. Yu and S. Kim. SVM tutorial: classification, regression and ranking. In G. Rozenberg, T. Bäck, and J. Kok, editors, Handbook of Natural Computing, pages 479-506. Springer Berlin Heidelberg, 2012.
[27] Z. Yu, X. Zhou, Y. Hao, and J. Gu. TV program recommendation for multiple viewers based on user profile merging. User Model. User-Adapt. Interact., 16(1):63-82, 2006.