Item2vec: Neural Item Embedding for Collaborative Filtering

Oren Barkan, Noam Koenigstein
Microsoft, Israel

ABSTRACT

Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, Skip-gram with Negative Sampling (SGNS), also known as Word2vec, was shown to provide state-of-the-art results on various linguistic tasks. In this paper, we show that item-based CF can be cast in the same framework of neural word embedding. Inspired by SGNS, we describe a method we name Item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of Item2vec and show that it is competitive with SVD.

1. INTRODUCTION

Computing item similarities is a key building block in modern recommender systems. While many recommendation algorithms focus on learning a low dimensional embedding of users and items simultaneously [1], computing item similarities is an end in itself.

There are several scenarios where item-based CF methods [2] are desired. In a large scale dataset, when the number of users is significantly larger than the number of items, the computational complexity of methods that model items alone is significantly lower than that of methods that model both users and items simultaneously. For example, online music services may have hundreds of millions of enrolled users with just tens of thousands of artists (items).

Recent progress in neural embedding methods for linguistic tasks has dramatically advanced state-of-the-art natural language processing (NLP) capabilities [3, 4]. These methods attempt to map words and phrases to a low dimensional vector space that captures semantic and syntactic relations between words. Specifically, Skip-gram with Negative Sampling (SGNS), also known as word2vec [4], set new records in various NLP tasks [4].

In this paper, we propose to apply SGNS to item-based CF. Motivated by its great success in other domains, we suggest that SGNS with minor modifications may capture the relations between different items in collaborative filtering datasets. To this end, we propose a modified version of SGNS named Item2vec. We show that Item2vec can induce a similarity measure that is competitive with an item-based CF that uses SVD.

2. ITEM2VEC

SGNS is a neural word embedding method that was introduced by Mikolov et al. in [4]. The method aims at finding a representation of words that captures the relation between a word and its surrounding words in a sentence.

In the context of CF data, the items are given as user generated sets. The application of SGNS to CF data is straightforward once we realize that a sequence of words is equivalent to a set or basket of items. Since we ignore the spatial information, we treat each pair of items that share the same set as a positive example. Therefore, for a given set of $K$ items $\{w_i\}_{i=1}^{K} \subseteq W$, we aim at maximizing the following term:

$$\frac{1}{K} \sum_{i=1}^{K} \sum_{j \neq i} \log \left[ \sigma(u_i^T v_j) \prod_{k=1}^{N} \sigma(-u_i^T v_k) \right] \qquad (1)$$

where $u_i \in U (\subset \mathbb{R}^m)$ and $v_i \in V (\subset \mathbb{R}^m)$ are latent vectors that correspond to the target and context representations of the item $w_i \in W$, respectively, and $\sigma(x) = 1/(1 + \exp(-x))$. The dimension $m$ is chosen empirically, according to the size of the dataset, and $N$ is a parameter that determines the number of negative examples to be drawn per positive example. Negative items are sampled from the unigram distribution raised to the 3/4 power.

In order to overcome the imbalance between rare and frequent items, we subsample the data [4]. Specifically, we discard each item $w$ from its set with probability $p(\text{discard} \mid w) = 1 - \sqrt{\rho / f(w)}$, where $f(w)$ is the frequency of the item $w$ and $\rho$ is a prescribed threshold.

$U$ and $V$ are estimated by applying stochastic gradient ascent with respect to the objective in (1). Finally, we use $u_i$ as the representation of the $i$-th item, and the affinity between a pair of items is computed by cosine similarity.
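The paper defines the objective but not an implementation. As a minimal sketch, an off-the-shelf SGNS trainer can approximate it: shuffling each basket and using a context window at least as wide as the longest basket makes every pair of items in a set a positive example. The snippet below assumes gensim's Word2Vec as the SGNS engine and a toy list of baskets; these are illustrative choices, not the authors' code.

```python
# A minimal Item2vec sketch using gensim's Word2Vec as the SGNS engine.
import random
from gensim.models import Word2Vec

# Each user's listening history (or each store order) is a set of item ids.
baskets = [
    ["david_guetta", "avicii", "calvin_harris"],
    ["johnny_cash", "willie_nelson", "dolly_parton"],
    ["david_guetta", "calvin_harris", "martin_solveig"],
]

# Item2vec discards spatial information inside a set, so shuffle each basket;
# a window spanning the whole basket then yields all positive pairs.
random.seed(0)
for basket in baskets:
    random.shuffle(basket)

model = Word2Vec(
    sentences=baskets,
    vector_size=40,                       # m = 40, as in the paper's experiments
    window=max(len(b) for b in baskets),  # context spans the whole basket
    sg=1,                                 # skip-gram
    negative=15,                          # N = 15 negatives per positive pair
    ns_exponent=0.75,                     # negatives ~ unigram distribution ^ 3/4
    sample=0,                             # subsampling off for this toy corpus;
                                          # the paper uses rho = 1e-5 / 1e-3
    min_count=1,
    epochs=20,
    shrink_windows=False,                 # gensim >= 4.1: no random window shrink
)

# The affinity between two items is the cosine similarity of their vectors.
print(model.wv.most_similar("david_guetta", topn=2))
```

On a real dataset one would feed in the full set of baskets and tune the subsampling threshold per dataset, as the paper does in Section 3.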
3. EXPERIMENTAL RESULTS

In this section, we provide qualitative and quantitative results. As a baseline item-based CF algorithm we used item-item SVD; a code sketch of this baseline is given below. Specifically, we apply SVD to decompose $A = U S V^T$, where $A$ is a square matrix whose size is the number of items. The $(i, j)$ entry of $A$ contains the number of times the pair $(w_i, w_j)$ appears as a positive pair in the dataset, normalized by the square root of the product of its row and column sums. The latent representation is given by the rows of $U_m S_m^{1/2}$. The affinity between items is computed by cosine similarity of their representations.

We evaluate the methods on two different private datasets. The first dataset consists of user-artist relations retrieved from the Microsoft Xbox Music service. This dataset consists of 9M events, where each event is a user-artist relation, meaning the user played a song by the specific artist. The dataset contains 732K users and 49K distinct artists.

The second dataset contains orders of products from the Microsoft Store. An order is given as a basket of items without any information about the user that made it. Therefore, the information in this dataset is weaker in the sense that we cannot bind users to items. The dataset consists of 379K orders (each containing more than a single item) and 1706 distinct items.

We applied Item2vec and SVD to both datasets. The dimension parameter was set to m = 40. We ran Item2vec on both datasets for 20 epochs with a negative sampling value of N = 15. We further applied subsampling with ρ values of 1e-5 and 1e-3 to the Music and Store datasets, respectively. The reason we set different parameter values is the different sizes of the datasets.
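To make the baseline concrete, here is a hedged sketch of the item-item SVD described at the start of this section. The function name and the dense-matrix shortcut are assumptions made for illustration; at the scale of the Music dataset (49K items) one would use sparse matrices and a truncated SVD.

```python
# Item-item SVD baseline: build the normalized co-occurrence matrix A,
# decompose it, and take the rows of U_m S_m^(1/2) as item vectors.
from itertools import combinations
import numpy as np

def svd_item_vectors(baskets, items, m=40):
    idx = {w: i for i, w in enumerate(items)}
    n = len(items)

    # A[i, j] counts how often items i and j appear together in a basket.
    A = np.zeros((n, n))
    for basket in baskets:
        for wi, wj in combinations(set(basket), 2):
            A[idx[wi], idx[wj]] += 1
            A[idx[wj], idx[wi]] += 1

    # Normalize each entry by sqrt(row sum * column sum).
    row = A.sum(axis=1, keepdims=True)
    col = A.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        A = np.where((row > 0) & (col > 0), A / np.sqrt(row * col), 0.0)

    # A = U S V^T; keep the m leading singular pairs, scale by S^(1/2).
    U, S, _ = np.linalg.svd(A)
    return U[:, :m] * np.sqrt(S[:m])  # one row per item

# As with Item2vec, affinity between items is the cosine of their rows.
```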
The Music dataset does not provide genre metadata. Therefore, for each artist we retrieved the genre metadata from the web to form a genre-artist catalog. Then we used this catalog in order to visualize the relation between the learnt representation and the genres. This is motivated by the assumption that a useful representation would cluster artists according to their genre. To this end, we generated a subset that contains the top 100 popular artists per genre for 13 distinct genres. We applied t-SNE [5] with a cosine kernel to reduce the dimensionality of the item vectors to 2. Then, we colored each artist point according to its genre.

[Figure 1: t-SNE embedding of the item vectors produced by Item2vec. Items are colored according to their genres.]

Figure 1 presents the 2D embedding produced by t-SNE for Item2vec. We observe that some of the relatively homogeneous clusters in Fig. 1 are contaminated with items that are colored differently. We found that many of these cases originate in artists that are mislabeled on the web or have a mixed genre.

In order to quantify the similarity quality, we tested the genre consistency between an item and its nearest neighbors. We do that by iterating over the top q popular items (for various values of q) and checking whether their genre is consistent with the genres of the k nearest items that surround them. This is done by simple majority voting, as sketched in code below. Table 1 presents the results obtained for k = 8.

TABLE 1: A COMPARISON BETWEEN SVD AND ITEM2VEC ON A GENRE CLASSIFICATION TASK FOR VARIOUS SIZES OF TOP POPULAR ARTIST SETS

Top (q) popular artists     SVD accuracy    Item2vec accuracy
2.5k                        85%             86.4%
5k                          83.4%           84.2%
10k                         80.2%           82%
15k                         76.8%           79.5%
20k                         73.8%           77.9%
10k unpopular (see text)    58.4%           68%

We observe that Item2vec is consistently better than the SVD model, and the gap between the two keeps growing as q increases. This might imply that Item2vec produces a better representation for less popular items than the one produced by SVD. We further validate this hypothesis by applying the same 'genre consistency' test to a subset of 10K unpopular items (the last row in Table 1). We define an item as unpopular if fewer than 15 users played its corresponding artist. The accuracy obtained by Item2vec was 68%, compared to 58.4% by SVD.
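The following sketch implements the genre-consistency test as described: an item counts as a hit if the majority genre among its k nearest neighbors matches its own. It assumes items are ordered by popularity and carry a single genre label; the function and variable names are illustrative, not the paper's.

```python
# Genre-consistency test via k-nearest-neighbor majority voting.
from collections import Counter
import numpy as np

def genre_consistency(vectors, genres, top_q, k=8):
    # vectors: (n, m) item embeddings, rows ordered by popularity
    # genres:  length-n list of genre labels, same order
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ V.T                    # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)   # an item is not its own neighbor

    hits = 0
    for i in range(top_q):
        neighbors = np.argsort(-sims[i])[:k]
        majority = Counter(genres[j] for j in neighbors).most_common(1)[0][0]
        hits += int(majority == genres[i])
    return hits / top_q  # accuracy, as reported in Table 1
```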
Qualitative comparisons between Item2vec and SVD are presented in Tables 2 and 3 for the Music and Store datasets, respectively. The tables present seed items and their 4 nearest neighbors (in the latent space). The main advantage of this comparison is that it enables the inspection of item similarities at a higher resolution than genres. Moreover, since the Store dataset lacks any informative tags or labels, a qualitative evaluation is inevitable. We observe that for both datasets, Item2vec provides lists that are better related to the seed item than the ones provided by SVD. Furthermore, we see that even though the Store dataset contains weaker information, Item2vec manages to infer item relations quite well.

TABLE 2: A QUALITATIVE COMPARISON BETWEEN ITEM2VEC AND SVD FOR SELECTED ITEMS FROM THE MUSIC DATASET

Seed item (genre)        | Item2vec – Top 4 recommendations                        | SVD – Top 4 recommendations
David Guetta (Dance)     | Avicii, Calvin Harris, Martin Solveig, Deorro           | Brothers, The Blue Rose, JWJ, Akcent
Katy Perry (Pop)         | Miley Cyrus, Kelly Clarkson, P!nk, Taylor Swift         | Last Friday Night, Winx Club, Boots On Cats, Thaman S.
Dr. Dre (Hip Hop)        | Game, Snoop Dogg, N.W.A, DMX                            | Jack The Smoker, Royal Goon, Hoova Slim, Man Power
Johnny Cash (Country)    | Willie Nelson, Jerry Reed, Dolly Parton, Merle Haggard  | Hank Williams, The Highwaymen, Johnny Horton, Hoyt Axton
Guns N' Roses (Rock)     | Aerosmith, Ozzy Osbourne, Bon Jovi, AC/DC               | Bon Jovi, Gilby Clarke, Def Leppard, Mötley Crüe
Justin Timberlake (Pop)  | Rihanna, Beyonce, The Black Eyed Peas, Bruno Mars       | JC Chasez, Jordan Knight, Shontelle, Nsync

TABLE 3: A QUALITATIVE COMPARISON BETWEEN ITEM2VEC AND SVD FOR SELECTED ITEMS FROM THE STORE DATASET

Seed item                | Item2vec – Top 4 recommendations                                                                          | SVD – Top 4 recommendations
LEGO Emmet               | LEGO Bad Cop, LEGO Simpsons: Bart, LEGO Ninjago, LEGO Scooby-Doo                                          | Minecraft Foam, Disney Toy Box, Minecraft (Xbox One), Terraria (Xbox One)
Minecraft Lanyard        | Minecraft Diamond Earrings, Minecraft Periodic Table, Minecraft Crafting Table, Minecraft Enderman Plush  | Rabbids Invasion, Mortal Kombat, Minecraft Periodic Table
GoPro LCD Touch BacPac   | GoPro Anti-Fog Inserts, GoPro The Frame Mount, GoPro Floaty Backdoor, GoPro 3-Way                         | Titanfall (Xbox One), GoPro The Frame Mount, Call of Duty (PC), Evolve (PC)
Surface Pro 4 Type Cover | UAG Surface Pro 4 Case, Zip Sleeve for Surface, Surface 65W Power Supply, Surface Pro 4 Screen Protection | Farming Simulator (PC), Dell 17 Gaming laptop, Bose Wireless Headphones, UAG Surface Pro 4 Case
Disney Baymax            | Disney Maleficent, Disney Hiro, Disney Stitch, Disney Marvel Super Heroes                                 | Disney Stitch, Mega Bloks Halo UNSC Firebase, LEGO Simpsons: Bart, Mega Bloks Halo UNSC Gungoose
Windows Server 2012      | Windows Server Remote Desktop Services 1-User, Exchange Server 5-Client, Windows Server 5-User Client Access, Exchange Server 5-User Client Access | NBA Live (Xbox One) – 600 points Download Code, Windows 10 Home, Mega Bloks Halo Covenant Drone Outbreak, Mega Bloks Halo UNSC Vulture Gunship

In the future we plan to investigate more complex CF models [1] and compare them with Item2vec.

4. REFERENCES

[1] Koren Y., Bell R., Volinsky C. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.
[2] Linden G., Smith B., York J. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[3] Mnih A., Hinton G.E. A scalable hierarchical distributed language model. In Proceedings of NIPS, 2009, pp. 1081-1088.
[4] Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS, 2013, pp. 3111-3119.
[5] Van der Maaten L., Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.