=Paper=
{{Paper
|id=None
|storemode=property
|title=Predicting Component Utilities for Linear-Weighted Hybrid Recommendation
|pdfUrl=https://ceur-ws.org/Vol-1271/Paper7.pdf
|volume=Vol-1271
|dblpUrl=https://dblp.org/rec/conf/recsys/VahedianB14
}}
==Predicting Component Utilities for Linear-Weighted Hybrid Recommendation==
Predicting Component Utilities for Linear-Weighted Hybrid Recommendation Fatemeh Vahedian & Robin Burke Center for Web Intelligence, Depaul University, Chicago, IL 60604 {fvahedia, rburke}@cs.depaul.edu ABSTRACT nesses, and other related elements. One obvious recommen- An effective technique for recommendation in social media dation task within Yelp is to recommend new businesses and other heterogeneous networks is the weighted hybrid to users, but there are a variety of others. Recommend- of low-dimensional components (WHyLDR). Recent stud- ing other users to befriend, recommending locations, and ies have shown this technique is comparable to other in- recommending categories of businesses are all user-focused tegrative approaches while being considerably more flexi- recommendation tasks. A site like Yelp may also be in- ble. One key issue for the implementation of a WHyLDR terested in recommending users to businesses for marketing system is the choice of components to generate. Research purposes. In addition, a user may wish to constrain the rec- has shown that the contribution of components based on ommendations in various ways: looking for a recommended different network paths varies in unexpected and domain- business in a particular category or in a particular location, dependent ways. This work examines an information theo- for example. retic technique for estimating component performance. Us- ing a real-world social media dataset, we show that this tech- nique is useful both for optimization (estimating component weights) and for determining which components to include in a hybrid. 1. INTRODUCTION Social media sites are an important element of today’s In- ternet, drawing millions of users each day. The wealth of information found in these sites makes recommender sys- tems essential. Such sites often must integrate recommen- dations of many types: recommending content, like-minded users or appropriate tags to name just a few possibilities. Figure 1: Yelp network schema Our approach, called the Weighted Hybrid of Low-Dimen- sional Recommenders (WHyLDR), is designed to support As in other work with heterogeneous networks [10], the WHy- the flexible creation and rapid deployment of a wide vari- LDR model views the network structure as a set of map- ety of recommenders in a heterogeneous environment. We pings, or projections from nodes to nodes: projA (n) → have demonstrated its effectiveness in prior work focusing {m0 , m1 , ..., mi } where A is a set of paths, n is a starting on social tagging systems [8, 7, 6, 3, 4]. node and the ms are nodes reachable from n via some path in A. Sets of paths that pass from one type of node to another A social media site can be viewed as a heterogeneous network over specific categories of edges are known as meta-paths. defined by a diversity of objects and relations. This diver- sity gives rise to a wide variety of recommendation tasks. Meta-paths can also be envisioned as paths through a net- For example, consider the popular social media site Yelp, work schema, an abstract view of a heterogeneous network which allows users to review and rate restaurants and other that shows the node types and the possible connections be- types of businesses.1 In Yelp, there are users, reviews, busi- tween them. Figure 1 shows the schema for part of the 1 www.yelp.com Yelp social network. In this network, a path from any user node to the businesses that individual has reviewed to the categories associated with that business would be a UBC metapath. Following such a path for any given user yields a type of user profile – in this case, a user represented as the categories of rated businesses. Using such profiles, we can build standard collaborative filtering components to se- lect neighbors of users and generate recommendations. Out of a collection of such meta-paths, we can build an ensem- Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), collocated with ACM RecSys 2014, 10/06/2014, Foster ble of components, each capturing a different aspect of the City, CA, USA. Copyright held by the authors. network. While this model has proved successful in multiple social me- items and users are compared directly based on their dia settings, a key problem remains of how to control hybrid profiles (the rows in their respective matrices), and size. The set of components, like the set of possible meta- paths, is unbounded. We have found that in some settings • a non-personalized recommendation component based components built using longer paths can outperform those solely on item popularity. using shorter paths. There is therefore no simple way to choose a limited number of components and be sure of opti- mal performance. In this paper, we examine an information Table 1: Meta-paths for recommendation compo- gain metric that can be used to estimate the potential con- nents tribution of components. We show that this metric can be Type Meta-paths User-based UB, UBC, UBL, UBH, UBCB, UBLB, UBHB used in two ways: Item-based BU, BL, BC, BH, BLBC, BLBU, BUBU, BUBL Cosine UBC, UBH • to estimate the hybrid weights directly, and 3. CONTROLLING META-PATH • to filter components based on their expected contribu- tion to the hybrid. GENERATION There is no requirement that meta-paths be simple: nodes and edges can be revisited, as seen in components like kNNUBLB , 2. WEIGHTED HYBRID where the meta-path loops from businesses to locations and A weighted hybrid recommender is a system comprised of back to businesses again. Component generation could in multiple recommendation components, each of which re- theory continue indefinitely. However, there are significant turns a real-valued score for a combination of user and item. computational costs in generating components and in opti- The scores from all the components are combined in a weighted mizing a hybrid with a large number of components. More- sum [2]. More formally, over, some components will make only a minor contribution to recommendation performance. It is therefore important X to control this process. Ideally, we would like to be able s(u, i) = αj sj (u, i) to estimate in advance what components are likely to make j a substantial contribution to the hybrid and make an in- formed decision to trade off expected accuracy against the computational costs of additional components. where s(u, i) is the overall score computed for a user-item combination, sj (u, i) is the score computed by the jth com- We have developed a measure based on mutual information ponent, and αj is the weight associated with the jth compo- for each meta-path to estimate the utility of recommenda- nent. tion components. For a given two-dimensional projection AB, the mutual information can be calculated as The weights are learned through an optimization procedure as discussed below. The components are a function of the I(A, B) = H(A) − H(A|B) recommendation task and the structure of the network. where H(A) is entropy of dimension A and H(A|B) is the WHyLDR components are built from two-dimensional ma- conditional entropy. Entropy is defined as trices familiar to researchers in collaborative recommenda- X H(A) = − p(ai )log(p(ai )) tion [5]. A user-based matrix is one in which the rows are i users and the columns are the destination nodes for a given The entropy is therefore a function of the probability of oc- meta-path. Users are compared on the basis of their profiles currence of nodes in each dimension. In our networks, we and peer users form a neighborhood from which a target define probability of node ai from dimension A based on the user’s preferences for unknown items can be extrapolated.2 node degree: The experiments reported here are for the task of recom- Degree(ai ) p(ai ) = P mending businesses to users. Four types of recommendation i Degree(ai ) components are included in our Yelp model: The intuition behind this measure comes from results re- lated to random walks. As the length of a random walk • user-based KNN components constructed based on meta- approaches infinity, the probability of hitting a node con- paths starting from a user node, verges to its normalized degree [?]. • item-based KNN components from meta-paths start- Conditional entropy measures the uncertainty of one dimen- ing from a business node, sion given another dimension, computed by summing up in- dividual conditional probabilities. In our networks, we de- • cosine components in which two separate matrices are fine the conditional probability PM (bi |aj ) as the fraction of constructed (one for users and one for items) and then meta-paths of type M leaving node ai and arriving at bj out 2 All of the optimizations that have been applied to collabo- of all meta-paths of type M . rative recommenders can therefore be applied to the individ- ual WHyLDR components, for example, matrix factoriza- |aj −→ bi | M tion. We plan to study the properties of such optimizations PM (bi |aj ) = P |aj −→ bk | on our hybrids in future work. k M For example, consider the user-business-category meta-path 3.1 Methodology and associated recommendation component. The values for For the experiments reported here, we used the Yelp Aca- H(U ) and HU BC (U |C) can be calculated using the formu- demic Dataset, and followed the four fold cross validation las above. If these values were roughly the same then the methodology described in [4]. The α weights were learned IU BC (U, C) will be around zero. This suggests that the using Particle Swarm Optimization (PSO) [9] with recall meta-path does not add much information beyond what is as the optimization objective. In order to compare hybrid already contained in the U dimension and that the UBC model performance, we calculate recall and precision at list meta-path is unlikely to give rise to a useful recommenda- sizes from 1 to 10. These are averaged for each partition tion component. The same principle can be applied to any and then averaged over all partitions. We also calculate F1 user-based or item-based component. For the cosine com- at list size 10, the harmonic mean of precision and recall. ponents, we used the minimum information gain from either We also measured the diversity of the results returned by constituent meta-path. For the popularity component, we these recommendations, but omit these results for reasons used minimum information gain across all components. of space. In our previous experiments, we found that there was a sig- 4. RESULTS nificant correlation between the information gain associated Figure 3 shows the recall-precision curve for the top indi- with a meta-path and the learned α weight for a component vidual recommendation component (kNNUB ) and the four built using that meta-path [4]. In this work, we sought to hybrids. There are several points to note here. One is there build on that result by substituting our measure of meta- is a relatively small difference between the performance of path information for weights learned through optimization. hybrid with learned weights HM-1 (dashed line) and the hy- We use a simple normalization of the information gain mea- brid with weights estimated from our information gain mea- sure, so the contribution of ith component can be calculated sure HM-IGW (solid line with diamond marks) and indeed, as: some data points of the HM-IGW curve are above those of Inf ormationGain(i) the learned hybrid. The results for the thresholded hybrids αi = k (1) P (HM-T0.1 and HM-T0.2) are generally lower except at list Inf omationGain(j) size of 1 and list size of 10. j=1 Figure 2 shows a different perspective on these experiments with the F1 measure computed at a recommendation list size of 10. Again, the hybrid with the learned weights shows the top performance with the other hybrids about 6-7% below. What we find therefore is that there is a modest trade-off be- tween recommendation performance with a fully-optimized hybrid and with weights computed using information gain. As the number of components increases the dimensionality of the optimization space increases, making the optimization step more time-consuming and more prone to over-fitting. Information gain can be computed directly from the meta- path expansions used to create the recommendation compo- nents and so is essentially free. Figure 2: F1 values for each hybrid model We also see that information gain may be useful in con- Another way to use the information gain is as a filter con- trolling the size of the hybrid. HM-T0.2 has one third fewer trolling which components are incorporated into the hybrid. components than the full hybrid and approximately 6% lower In these experiments, we used two different thresholds (0.1 F1 performance. This suggests a process whereby meta- and 0.2) for rejecting components with low information gain. paths are generated and their information gain assessed be- Table 2 shows which components were dropped in each case: fore components are constructed, thus saving both off-line the only difference between the lower threshold and the optimization time and run-time computation. higher one was the exclusion of the BUBU component at the higher threshold. 5. CONCLUSIONS A key challenge in social media recommendation is to in- The results below report on four different configurations of tegrate the different types of information available in such the system: HM-1, a hybrid model including all the com- systems to enhance recommendations and to offer recom- ponents discussed in Section 2 with the weights learned mendations of multiple types. The WHyLDR approach has through optimization; HM-IGW, a hybrid including the same been demonstrated to be successful in both of these respects. components as HM-1 but with weights calculated as normal- As there are an unbounded number of possible components ized information gain; HM-T0.1 and HM-T0.2, hybrids with in a WHyLDR hybrid, the questions arise of how to choose learned weights but components with low information gain components for a hybrid and when to stop adding to it. filtered out using thresholds of 0.1 and 0.2, respectively. Ta- In this paper, we show that our information gain measure ble 2 shows which components were removed in each case. shows promise for controlling hybrid creation and possibly for estimating component weights. While the full hybrid with learned weights showed the strongest performance, the Table 2: Removed components at each threshold Threshold Components removed 0.1 UBH, UBHB, BLBC, BUBC, BUBL 0.2 Above, plus BUBU Figure 3: Recall vs. Precision versions where our information gain heuristic was applied of neighborhood-based recommendation methods. In offered comparable results at lower computational cost. F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages One of the key problems unaddressed by this work is the in- 107–144. Springer, 2011. fluence of recommendation task. We know from prior work [6] J. Gemmell, T. Schimoler, B. Mobasher, and that, for most recommendation problems, the most strongly- R. Burke. Recommendation by example in social contributing component is the one that directly maps to the annotation systems. E-Commerce and Web recommendation task. For example, in this paper, kNNUB Technologies, pages 209–220, 2011. is the most important single component because we are rec- [7] J. Gemmell, T. Schimoler, B. Mobasher, and ommending businesses to users. If on the other hand, we R. Burke. Tag-based resource recommendation in were recommending locations, we would expect the kNNUBL social annotation applications. User Modeling, component to be a stronger contributor. This effect is not Adaption and Personalization, pages 111–122, 2011. captured by our information gain measure, which is a func- [8] J. Gemmell, T. Schimoler, B. Mobasher, and tion of the network structure alone. In future work, we will R. Burke. Resource recommendation in social examine additional recommendation tasks and try to deter- annotation systems: A linear-weighted hybrid mine how to modify our information gain measure to take approach. Journal of Computer and System Sciences, the task into account. We will also be experimenting with 78(4):1160–1174, 2012. other heterogeneous network data sets to understand the [9] J. Kennedy and R. C. Eberhart. Particle swarm generalizability of these results. optimization. In Proceedings of the IEEE International Conference on Neural Networks, pages 1942–1948, 6. REFERENCES 1995. [1] https://www.yelp.com/academic dataset. [10] Y. Sun and J. Han. Mining Heterogeneous Information [2] R. Burke. Hybrid recommender systems: Survey and Networks: Principles and Methodologies. Synthesis experiments. User Modeling and User-Adapted Lectures on Data Mining and Knowledge Discovery. Interaction, 12(4):331–370, 2002. Morgan & Claypool Publishers, 2012. [3] R. Burke and F. Vahedian. Social web recommendation using metapaths. In RSWeb@RecSys, 2013. [4] R. Burke, F. Vahedian, and B. Mobasher. Hybrid recommendation in heterogeneous networks. In Proceedings of the 22nd Conference on User Modeling, Adaptation and Personalization, 2014, to appear. [5] C. Desrosiers and G. Karypis. A comprehensive survey