Social Web Recommendation using Metapaths Robin Burke and Fatemeh Vahedian Center for Web Intelligence DePaul University 243 S. Wabash Ave Chicago,IL (rburke, fvahedia)@cs.depaul.edu ABSTRACT ad-hoc solutions for each recommendation problem. The social web is characterized by a wide variety of con- nections between individuals and entities. A challenge for To illustrate this problem, consider a user Alice who is a social web recommendation is make the most effective use member of the Last.fm web site for music lovers, looking for of a diverse set of relations. Typically, researchers focus a song to add to her current playlist: on a limited set of relations (for example, person to person ties for user recommendation or annotations in social tag- Track Song Artist ging recommendation). In this paper, we propose a general 1 Bad Girls Blood Orange approach to recommendation in social networks that can in- 2 Under the Gun Supreme Beings of Leisure 3 The Sea Morcheeba corporate multiple relations in combination. A key feature 4 Paris Train Beth Orton of this approach is the use of the metapath, an abstraction of a large class of paths in the network in which edges of We might expect that a suitable song would also be mellow different types are traversed in a particular order. As a pre- electronica featuring a female vocalist but there will be a liminary demonstration, we show that our approach yields very large number of tracks with these characteristics. We improvements over a state-of-the-art technique on several can discriminate among these tracks using data from the social tagging datasets. Last.fm social network, as summarized in the schema in Fig- ure 1. General Terms Hybrid recommender system, heterogeneous network, social tagging system Keywords 1. INTRODUCTION The social web is characterized by a diversity of data types and relations. For example, the employment-oriented web- site LinkedIn contains information about individuals, com- panies, jobs and skills, and connections can be drawn among any of these entities. There are also discussion forums and user groups. Diversity of information means that there are many kinds of recommendation that can be made to users: other users with whom to connect, groups to join, skills to Figure 1: Network schema for Last.fm acquire, companies to follow, jobs to apply for, etc. At the same time, the complexity of information means that there are many more types of information that can be integrated As the schema shows, a given song may have many possible into making recommendations: should the system recom- associations. It may appear on multiple playlists; it may mend companies that have hired your friends, those that have been tagged by one or more users (AnnotationS); it have many (or few) employees with your skills? Often build- may be associated with one or more artists. We can select ing recommenders for such sites involves devising individual any of these data sources, and build a recommender system with that basis. For example, using a user-based collabora- tive approach we could look at similarities across playlists or across tagging histories. Any such choice inevitably excludes a great deal of possibly-relevant knowledge. Ideally, we would like a recommendation method that is in- tegrative – bringing all of the available data to bear. In this paper, we describe one such technique: the Weighted Hy- brid of Low-Dimensional Recommenders (WHyLDR). The WHyLDR technique was originally developed for social tag- Figure 2: Two-dimensional projections for a social tagging network ging systems [7]; here we show how the concept can be ex- users that have incorporated it into a playlist. tended to more complex networks. A metapath can be arbitrarily long although we anticipate The key insight of the WHyLDR design is that a complex very long paths may not be very useful. Metapaths may also network structure can be viewed as a set of two-dimensional contain multiple occurrences of the same object type. For projections from nodes of one type to nodes of another. Fig- example, the songs on the playlists of the user’s friends of ure 2 illustrates this idea in the case of social tagging sys- friends can be expressed via the UUPS metapath huser → tems. The tagging system on the left has annotations con- user → playlist → songi. sisting of user, tags and web resources the users have tagged. One projection (the UT projection) maps each user to the 2. RELATED WORK set of tags that user has applied. Another projection (UR) The integration of social network data into recommender maps the user to the resources he or she has tagged. Other systems has been studied extensively in recent years [17], projections link resources to tags and to users: six such pro- [4], [13],[14]. Most of this work has been focused on system- jections in total. specific solutions. For example, [10] shows a LastFM music recommendation based on combination of social data and Given a two-dimensional representation, such as user repre- annotation system. A similar system incorporating social sented by tags, it is quite straightforward to apply standard data and tags has been used to recommend publications in collaborative recommendation methodology: find neighbor- the Bibsonomy dataset [3]. A more general technique is the hoods of similar users and make recommendations on that multi-relational approach of [2] in which the heterogeneous basis. with a hybrid recommendation approach, it is not nec- network in Epinions is separated into multiple homogeneous essary to choose just one of these projections as the source networks and then an optimization approach is used to find of data: a recommendation can be made by combining the the best combination of recommendations coming from the results of recommendation components built from these low- different networks. Kazienko and his colleagues [9] take a dimensional projections. Our previous work has shown that similar approach, treating the different kinds of relations in a linear weighted hybrid build of such components is more Flickr as “layers.” flexible and more accurate than integrative techniques that attempt to model all of the dimensions at once [7]. Our domain-independent approach for recommendation with social network data draws heavily on recent research in the We extend this idea to more complex networks through the area of complex heterogeneous information networks. Ac- concept of the metapath [15]. A path in a network is a se- cording to Han[8], heterogeneous networks are “information quence of edges that can be traversed to move from one node systems which consist of a large number of interacting, multi- to another. A metapath is an abstraction of a network path typed components”. In particular, heterogeneous informa- into a sequence of edge types. Navigating a metapath from tion networks involve multiple types of objects and multiple a node reaches all destination nodes reachable by following types of links denoting different relations [16]. edges with the appropriate type. For example, in the music recommendation scenario, we might have the SPU metap- Sun and Han [15] argue that information propagation across ath hsong → playlist → useri. This path goes from a song heterogeneous nodes and links can be very different from to all playlists into which it is a part and then to all users that across homogeneous nodes and links. To capture this contributing those playlists. A different metapath would go diversity, the authors defined the concept of the “metap- from a song to all annotations in which it appears to all users ath”. On top of the metapath abstraction, they were able to creating such annotations: hsong → annotationS → useri, build algorithms operating on heterogeneous networks such denoted SAsU. Note that both the SPU and SAsU meta- as metapath-based similarity search. paths map songs to users, but they follow different routes through the network. As discussed above, this work is an extension of research ap- plying linear weighted hybrids to recommendation problems A metapath can be used to generate a two-dimension pro- in social tagging systems. This work employed a collection of jection where each originating node is mapped to all of the recommendation components including the two-dimensional terminating nodes reachable by following the path. For ex- projection components built as described above and used ample, the SPU metapath can be used to generate an item- random-restart hill climbing to optimize the contribution of based matrix where each song is represented in terms of the each component. This technique is both simple and gen- eral. Our results showed that it was at least as effective In the experiments reported in [7], the system used the fol- as other, more computationally-sophisticated techniques for lowing recommendation components: the well-studied problem of tag recommendation, with the added advantages that it could be applied to a wider vari- ety of recommendation problems and could be more easily • Popular: A non-personalized recommender that scores updated. See [5, 6, 7] for more detail on this line of research. resources based on their overall popularity. There is a close relationship between recommendation in • User-based kNN, user-tag matrix (kNNUT ): A user- a network setting and link prediction, which is a standard based collaborative recommendation component in which problem in the computational study of networks [11]. For users are compared by their usage of tags. The entries example, in our playlist example, if the system recommends in this matrix are normalized counts – the fraction a track and Alice adds it to her playlist, this will become a of annotations in which a user has employed a given new hplaylist → songi link in the network. However, there tag. Pearson correlation is used to compare users and are some important distinctions between link prediction, as Resnick’s algorithm is used to compare predictions. it is customarily studied, and the problem of recommenda- tion. Foremost is the difference of emphasis demonstrated in • User-based kNN, user-resource matrix (kNNUR ): As the output of the system. In link prediction, the output of above, but where users are compared on the basis of the system is a set of links likely to appear in the network. which resources they have tagged. In this matrix, we In recommendation, we are suggesting items for an indi- did not find any benefit to make use of the count in- vidual and personalization is therefore a key element. We formation: the number of tags that a user applied to could filter the link predictions just to those that apply to a given resource. The matrix is therefore binary, re- the current user, but it is important to recognize that link flecting whether or not the user tagged a particular prediction techniques are not really designed or evaluated resource. Predictions are computed as with kNNUT . with personalized presentation in mind. Secondly, recom- • Item-based kNN, resource-tag matrix (kNNRT ): Item- mendation often involves a host of additional considerations based collaborative recommendation in which resources (serendipity, diversity, etc.) that are not typically factors in are compared on the basis of the tags that have been link prediction analysises. associated with them. This matrix is similar to kNNUT , but instead of users, we are profiling resources. To 3. LINEAR-WEIGHTED HYBRID make predictions, we use the adjusted cosine method For the present discussion, we assume that a recommender is from [12]. The predicted relevance of a resource is a function that takes a user as input and returns a ranked list a function of the normalized tag counts of similar re- of recommended items. One common way to implement such sources. Note that this component is not personalized: a recommendation function is to build it on top of a scoring it will give the same predictions for all users. function s(u, i), where u is a user and i is an item. If we can calculate a score for each item available for recommendation, • Item-based kNN, resource-user matrix (kNNRU ): Item- we can sort the items and present the best items to the user. based collaborative recommendation in which resources are compared on the basis of the users who have tagged A weighted hybrid recommender is therefore a scoring func- them. This matrix is the transpose of the UR matrix, tion that forms a weighted sum of the results of its con- and is also binary. Adjusted cosine is used here as well. stituent components [1] • Cosine: In this component, the user is represented as the vector of tags they have applied, normalized as in X kNNUT and each resource is represented as a vector of s0 (u, i) = αi ∗ si (u, i) (1) tags that have been applied to it as in kNNRT . The si scoring of a resource for a user is done by computing the cosine between the two vectors. where the si ’s are the recommendation components and αi ’s are the associated weights. To define a weighted hybrid, we 3.2 Heterogeneous Networks need to specify its components and their weights. Following Sun and Han [15], we define a heterogeneous in- formation network as a directed graph G = (ν, ε) with an 3.1 Recommendation Components object type mapping function γ : ν → A and a edge type The components needed for a hybrid recommender are a mapping function φ : ε → R where each object belongs to function of the recommendation task and the data available particular object type a ∈ A and each edge belongs to a par- to support recommendation. In our work on social tagging ticular relation type r ∈ R. Two edges of the same type by systems, we identified a number of recommendation tasks definition share the same object types at their originating appropriate to that context, including tag recommendation, and terminating points. resource recommendation, resource recommendation by ex- ample, user recommendation, and others. Resource recom- A heterogeneous network is one where there are multiple ob- mendation is the task of identifying items of interest for ject types and/or multiple edge types – typically both. For a user in social tagging system based on tagging behavior. example, the music example above is clearly a hetereoge- Note that these items may or may not be items that the neous network. There are multiple types of nodes (artists, user “likes” – a user may frequently tag disliked items with users, songs, etc.) and multiple types of relations (user-user, deprecatory tags, for example. user-playlist, artist-song, etc.). A network schema, such as that shown in Figure 1, gives an overview of a heteroge- neous network by indicating the different object types and the relations that exist between them. A metapath in a het- erogeneous network is a path over the network schema, a se- quenced composition of relations between two object types. 3.3 Metapath-Based Recommendation Com- ponents A social tagging system can be viewed as a heterogeneous network with four different types of nodes (users, tag, re- sources, and annotations). See Figure 3. With this in mind, consider the UR projection on which the kNNUR component is built. This is a matrix in which the rows correspond to users and the columns correspond to resources, and the en- tries reflect the whether or not the user has tagged that Figure 4: Simple Tagging Network particular resource. We can generate the same matrix using the schema shown in Figure 3 by following the metapath huser → annotation → resourcei. Since the schema has users. The resulting matrix shows a similarity between these a simple star structure, we will omit the reference to the two users that was not present in the UR version. central annotation node (all navigation must go through it) and refer to this as the UR metapath. Track1 Track2 Track3 Carol 1 1 0 Bob 0 1 1 Alice 0 1 1 Of course, this process can be extended indefinitely: UTTR, UTTTR, etc. We can envision in addition a wide variety of other metapaths: for example, UTUR would be all resources tagged by users who share tags with the target user. For our preliminary investigation of this style of recommenda- tion, we opted to explore only a few possible components using short metapaths. We created three additional compo- Figure 3: Network schema for Social Tagging Sytems nents to augment the six already incorporated in the system described in [7]. Adopting the metapath formalism allows us to express a much wider set of possible projections. We can expand the set of resources by which a user is represented by follow- • User-based kNN with the user-tag matrix formed by ing an extended metapath: huser → annotation → tag → following the URT metapath: kNNURT . annotation → resourcei or UTR for short. This path finds • User-based kNN with the user-resource matrix formed all tags a user has employed and then all annotations includ- by following the UTR metapath: kNNUTR . ing those tags (even those not created by the user) and then the resources for that larger set of annotations. This can be • A version of the Cosine metric above in which the vec- seen as a kind of ”query expansion” of the resource space by tor of tags for a user is formed using the URT metap- considering other users’ annotations of the same resources. ath: Cosine-M. To see how this process works, consider Figure 4, which 4. DATASETS shows a simplified network with 3 users having tagged three For our experiments, we used three social tagging networks music tracks. from the data sets studied in [7]. All of the data sets were filtered as described in [7] to eliminate rare and idiosyncratic The UR matrix for this network is as follows: tags and resources. Track1 Track2 Track3 Carol 1 1 0 • Bibsonomy which enables users to tag URL book- Bob 0 1 0 mark and journal articles. This dataset contains 357 Alice 0 0 1 users, 1.783 resources and 1,573 tags. • Amazon includes 4817 users, 5801 resources (prod- There are four annotations (A1...A4) because Carol has cre- ucts on the Amazon.com web site) and 3201 tags. (Note ated two. Note that Bob and Alice have no tagged tracks in that this is a subset of the full network from [7].) common. • LastFM users have music profiles and can create play- However, if we follow the UTR metapath, the fact that Bob lists. User may tag songs, albums and artists. This and Alice have both used the tag “Relaxing” means that dataset contains 2,368 users, 2,350 resources and 1,141 the UTR metapath yields both Track2 and Track3 for these tags. 5. METHODOLOGY Figure 5b shows the results for the Bibsonomy dataset. The For each data set, we divided the data randomly into five metapath enhanced HM hybrid shows a slight improvement partitions each having equal numbers of annotations. The over original hybrid model for this dataset. Most interesting first partition is used to learn the α weights for each com- are the characteristics of the URT and UTR components. ponent. The other partitions are used for cross validation: Unlike the Amazon results, the kNNURT component offers three partitions are used as training data and the fourth is higher precision across almost the whole range of recall val- used to test the system’s predictions. ues and the kNNUTR component is roughly comparable to the kNNUR one, with slightly higher precision at lower recall. 5.1 Weight Learning The α values in Equation 1 are learned empirically from the Finally, the Last.fm results are shown in Figure 5c. One first data partition. We choose a random set of α values and well-known characteristic of this data set is the high degree calculate the F-measure for the hybrid using these weights. of noise associated with the tags. Users often apply vague Then we adjust one weight at random and re-compute the tags such as “rock” or (worse) “favorite song” to music tracks F-measure over the same data partition. If it increases, the on this site. Again the HM hybrid is superior. The UT and change is accepted and the process repeats. Otherwise, the URT components are both equally bad – not surprising given change is rejected and another random modification is pro- the characteristics of the tag dimension. The UR and UTR posed. When the values stabilize, another random start- components are quite similar in performance, which is rather ing point is chosen. The weights leading to the highest F- surprising, given that the UTR metapath makes use of these measure are then chosen for the rest of the experiment and same problematic tag links. the data is discarded. Figure 6 shows the learned α weights for the components in the recommendation experiments. In each graph, the grey 5.2 Evaluation bar represents the weights of the components in the origi- To measure the quality of recommendations, the remaining nal hybrid and the striped bars are the weights for the HM partitions are used for four-fold cross validation. For each algorithm. The first set of results in Figure 6a are for the user in the test partition, we calculate recommendation lists Amazon.com data. In this data, the Cosine component is a Recu of size 1 through 10 and compare these results with strong contributor to the hybrid. When we add the longer the resources Ru tagged by that user in the test partition, metapaths Cosine-M also proves useful. kNNUR also has rel- calculating precision and recall as below. atively high weight in the original version. The extended version of this component via the UTR metapath gets some weight in the HM model. Most surprisingly, UT, which has |Ru ∩ Recu | very low weight in the original model, becomes the single recall = (2) most heavily-weighted component in the HM model. This is |Ru | the only data set where the simple popularity recommender has a relatively large weight. |Ru ∩ Recu | precision = (3) Figure 6b shows the weight contribution of recommender |Recu | components learned for the Bibsonomy dataset. The original hybrid showed a very strong contribution from the kNNUR The recall and precision are calculated for each user and component, which in the HM model is dispersed to the averaged across all users, and then averaged across the four longer metapath components kNNURT and kNNUTR . Inter- folds. estingly, the contribution of the Cosine component drops by more than 50%, while the extended version Cosine-M barely We evaluated two recommendation hybrids: the original hy- makes a showing. It seems that perhaps the components brid (labeled “H”) in the figures containing the 6 components with longer metapaths are doing a better job of capturing used in [7], and an enhanced hybrid (labeled “HM”) adding the tag-based connections between users and resources, ren- the three components using extended metapaths. dering the Cosine component less valuable. We will need to do additional experimentation to characterize this phe- 6. RESULTS nomenon. In all experiments we see that hybrid model that includes The learned weights for the two hybrids on the Last.fm components with longer metapaths offers improved results. data appear in Figure 6c. kNNUTR and kNNUR have similar Figure 5a shows the results for Amazon dataset. This is weights in the larger hybrid. Again, the Cosine metric loses one of the most difficult recommendation data sets in our weight in the HM model, as the Cosine-M gains comparable evaluations: note the extremely low recall and precision re- weight. sults. We can see the HM model offers better results than the original hybrid method. 7. FUTURE WORK Looking at the performance of the individual components, The work described here is very preliminary. While our aim we see that the components with longer metapaths are not is to explore recommendation is data sets with complex net- better than their shorter-path counterparts. kNNUR is the work schemas, to date we have only extended our prior work top individual component but kNNUTR is a distant third. on social tagging systems. The results are very encouraging, Cosine-M and kNNURT appear to be about the same as the however, showing improvement on the baseline system us- non-extended versions. ing a few components based on deeper paths into the same network. We expect that the incorporation of additional ob- L. Christiansen, and B. Mobasher. A fast effective ject types and relations will yield even more improvement multi-channeled tag recommender. In European in recommendation accuracy. Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases One important question is whether the hybrid weights can Discovery Challenge, pages 59–63, Bled, Slovenia, be predicted or at least estimated from the characteristics of 2009. the data. This issue takes on greater urgency when we con- [6] J. Gemmell, T. Schimoler, B. Mobasher, and sider the fact that the set of metapath-based components is R. Burke. Hybrid tag recommendation for social unbounded – it is always possible to consider more friends annotation systems. In 19th ACM International of friends or to follow a link back and forth again: UTR, Conference on Information and Knowledge UTTR, UTTTR, etc. Intuitively, the value of longer paths Management, pages 829–838, Toronto, Canada, 2010. should be less in the limit – eventually there will be a fixed [7] J. Gemmell, T. Schimoler, B. Mobasher, and point. But, as the results here show, under some condi- R. Burke. Resource recommendation in social tions components built with longer metapaths are actually annotation systems: A linear-weighted hybrid more successful than those with shorter paths, so we cannot approach. Journal of Computer and System Sciences, assume that weights will decrease monotonically with path 78(4):1160–1174, 2012. length. [8] J. Han. Mining heterogeneous information networks by exploring the power of links. In J. Gama, V. Costa, We are experimenting with entropy-based measures of the A. Jorge, and P. Brazdil, editors, Discovery Science, contribution of each component, with the aim of finding a volume 5808 of Lecture Notes in Computer Science, metric with which to discriminate between components and pages 13–30. Springer Berlin Heidelberg, 2009. filter out those unlikely to be useful, prior to the weight [9] P. Kazienko, K. Musial, and T. Kajdanowicz. learning step. Limiting the number of components is key Multidimensional social network in the social to making weight learning efficient. In the experiments re- recommender system. Systems, Man and Cybernetics, ported here, we found that adding three more components Part A: Systems and Humans, IEEE Transactions on, (50% increase) quadrupled the amount of training time re- 41(4):746–759, 2011. quired to learn the α values. So it is not possible add com- [10] I. Konstas, V. Stathopoulos, and J. M. Jose. On social ponents indiscriminately. In addition, a weight estimator networks and collaborative recommendation. In might be useful for providing an initial seed for the hill- Proceedings of the 32nd international ACM SIGIR climbing step. All of these questions will be explored in conference on Research and development in future work. information retrieval, SIGIR ’09, pages 195–202, New York, NY, USA, 2009. ACM. 8. CONCLUSION [11] D. Liben-Nowell and J. M. Kleinberg. The link One of the key challenges in social web recommendation prediction problem for social networks. In 12th ACM is the effective integration of the many dimensions of the International Conference on Information and available data. In this paper, we describe a linear-weighted Knowledge Management, pages 556–559, 2003. hybrid approach that generalizes our prior work on social [12] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. tagging systems to a larger space of heterogeneous networks. Item-Based Collaborative Filtering Recommendation We show that this more general approach can offer perfor- Algorithms. In 10th International Conference on mance improvement in social tagging data, and offers signifi- World Wide Web, Hong Kong, China, 2001. cant potential for incorporating additional data dimensions. [13] S. Siersdorfer and S. Sizov. Social recommender systems for web 2.0 folksonomies. In Hypertext, pages 9. REFERENCES 261–270, 2009. [1] R. Burke. Hybrid recommender systems: Survey and [14] Y. Song, L. Zhang, and C. L. Giles. Automatic tag experiments. User Modeling and User-Adapted recommendation algorithms for social recommender Interaction, 12(4):331–370, 2002. systems. ACM Transactions on the Web, 5(1):4, 2011. [2] J. Chen, G. Chen, H. L. Zhang, J. Huang, and [15] Y. Sun and J. Han. Mining Heterogeneous Information G. Zhao. Social recommendation based on Networks: Principles and Methodologies. Synthesis multi-relational analysis. In International Conference Lectures on Data Mining and Knowledge Discovery. on Web Intelligence and Intelligent Agent Technology, Morgan & Claypool Publishers, 2012. pages 471–477, 2012. [16] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. [3] S. Doerfel, R. Jäschke, A. Hotho, and G. Stumme. Pathsim: Meta path-based top-k similarity search in Leveraging publication metadata and social data into heterogeneous information networks. In Proceedings of folkrank for scientific publication recommendation. In the 37th International Conference on Very Large Proceedings of the 4th ACM RecSys workshop on Databases, pages 992–1003, 2011. Recommender systems and the social web, RSWeb ’12, [17] S. J. Yu. The dynamic competitive recommendation pages 9–16, New York, NY, USA, 2012. ACM. algorithm in social network services. Inf. Sci., [4] F. A. Durão and P. Dolog. A personalized tag-based 187:1–14, 2012. recommendation in social web systems. In Proceedings of International Workshop on Adaptation and Personalization for Web 2.0, 2009. [5] J. Gemmell, M. Ramezani, T. Schimoler, (a) Amazon dataset (b) Bibsonomy dataset (c) Last.fm dataset Figure 5: Resource recommendation results (a) Amazon dataset (b) Bibsonomy dataset (c) LastFM dataset Figure 6: Hybrid Weights