Item2vec: Neural Item Embedding for Collaborative Filtering

Oren Barkan, Noam Koenigstein
Microsoft, Israel

ABSTRACT

Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, Skip-gram with Negative Sampling (SGNS), also known as Word2vec, was shown to provide state-of-the-art results on various linguistic tasks. In this paper, we show that item-based CF can be cast in the same framework of neural word embedding. Inspired by SGNS, we describe a method we name Item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of Item2vec and show that it is competitive with SVD.

1. INTRODUCTION

Computing item similarities is a key building block in modern recommender systems. While many recommendation algorithms focus on learning a low dimensional embedding of users and items simultaneously [1], computing item similarities is an end in itself.

There are several scenarios where item-based CF methods [2] are desired. In a large scale dataset, when the number of users is significantly larger than the number of items, the computational complexity of methods that model items alone is significantly lower than that of methods that model both users and items simultaneously. For example, online music services may have hundreds of millions of enrolled users with just tens of thousands of artists (items).

Recent progress in neural embedding methods for linguistic tasks has dramatically advanced state-of-the-art natural language processing (NLP) capabilities [3, 4]. These methods attempt to map words and phrases to a low dimensional vector space that captures semantic and syntactic relations between words. Specifically, Skip-gram with Negative Sampling (SGNS), also known as word2vec [4], set new records in various NLP tasks [4].

In this paper, we propose to apply SGNS to item-based CF. Motivated by its great success in other domains, we suggest that SGNS with minor modifications may capture the relations between different items in collaborative filtering datasets. To this end, we propose a modified version of SGNS named Item2vec. We show that Item2vec can induce a similarity measure that is competitive with an item-based CF that uses SVD.

2. ITEM2VEC

SGNS is a neural word embedding method that was introduced by Mikolov et al. in [4]. The method aims at finding a representation of words that captures the relation between a word and its surrounding words in a sentence.

In the context of CF data, the items are given as user generated sets. The application of SGNS to CF data is straightforward once we realize that a sequence of words is equivalent to a set or basket of items. Since we ignore the spatial information, we treat each pair of items that share the same set as a positive example. Therefore, for a given set of $K$ items $\{w_i\}_{i=1}^{K} \subseteq W$, we aim at maximizing the following term:

$$\frac{1}{K} \sum_{i=1}^{K} \sum_{j \neq i} \log \left[ \sigma(u_i^T v_j) \prod_{k=1}^{N} \sigma(-u_i^T v_k) \right] \qquad (1)$$

where $u_i \in U (\subset \mathbb{R}^m)$ and $v_i \in V (\subset \mathbb{R}^m)$ are latent vectors that correspond to the target and context representations of the item $w_i \in W$, respectively, and $\sigma(x) = 1/(1 + \exp(-x))$. The dimension $m$ is chosen empirically, according to the size of the dataset, and $N$ is a parameter that determines the number of negative examples to be drawn per positive example. Negative items are sampled from the unigram distribution raised to the 3/4 power.

In order to overcome the imbalance between rare and frequent items, we subsample the data [4]. Specifically, we discard each item $w$ from its set with probability $p(\text{discard} \mid w) = 1 - \sqrt{\rho / f(w)}$, where $f(w)$ is the frequency of the item $w$ and $\rho$ is a prescribed threshold.

$U$ and $V$ are estimated by applying stochastic gradient ascent with respect to the objective in (1). Finally, we use $u_i$ as the representation of the $i$-th item, and the affinity between a pair of items is computed by cosine similarity.
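The paper defines the objective but not an implementation. As a minimal sketch, an off-the-shelf SGNS trainer can approximate it: shuffling each basket and using a context window at least as wide as the longest basket makes every pair of items in a set a positive example. The snippet below assumes gensim's Word2Vec as the SGNS engine and a toy list of baskets; these are illustrative choices, not the authors' code.

```python
# A minimal Item2vec sketch using gensim's Word2Vec as the SGNS engine.
import random
from gensim.models import Word2Vec

# Each user's listening history (or each store order) is a set of item ids.
baskets = [
    ["david_guetta", "avicii", "calvin_harris"],
    ["johnny_cash", "willie_nelson", "dolly_parton"],
    ["david_guetta", "calvin_harris", "martin_solveig"],
]

# Item2vec discards spatial information inside a set, so shuffle each basket;
# a window spanning the whole basket then yields all positive pairs.
random.seed(0)
for basket in baskets:
    random.shuffle(basket)

model = Word2Vec(
    sentences=baskets,
    vector_size=40,                       # m = 40, as in the paper's experiments
    window=max(len(b) for b in baskets),  # context spans the whole basket
    sg=1,                                 # skip-gram
    negative=15,                          # N = 15 negatives per positive pair
    ns_exponent=0.75,                     # negatives ~ unigram distribution ^ 3/4
    sample=0,                             # subsampling off for this toy corpus;
                                          # the paper uses rho = 1e-5 / 1e-3
    min_count=1,
    epochs=20,
    shrink_windows=False,                 # gensim >= 4.1: no random window shrink
)

# The affinity between two items is the cosine similarity of their vectors.
print(model.wv.most_similar("david_guetta", topn=2))
```

On a real dataset one would feed in the full set of baskets and tune the subsampling threshold per dataset, as the paper does in Section 3.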
3. EXPERIMENTAL RESULTS

In this section, we provide qualitative and quantitative results. As a baseline item-based CF algorithm we used item-item SVD; a code sketch of this baseline is given below. Specifically, we apply SVD to decompose $A = U S V^T$, where $A$ is a square matrix whose size is the number of items. The $(i, j)$ entry of $A$ contains the number of times the pair $(w_i, w_j)$ appears as a positive pair in the dataset, normalized by the square root of the product of its row and column sums. The latent representation is given by the rows of $U_m S_m^{1/2}$. The affinity between items is computed by cosine similarity of their representations.

We evaluate the methods on two different private datasets. The first dataset consists of user-artist relations retrieved from the Microsoft Xbox Music service. This dataset consists of 9M events, where each event is a user-artist relation, meaning the user played a song by the specific artist. The dataset contains 732K users and 49K distinct artists.

The second dataset contains orders of products from the Microsoft Store. An order is given as a basket of items without any information about the user that made it. Therefore, the information in this dataset is weaker in the sense that we cannot bind users to items. The dataset consists of 379K orders (each containing more than a single item) and 1706 distinct items.

We applied Item2vec and SVD to both datasets. The dimension parameter was set to m = 40. We ran Item2vec on both datasets for 20 epochs with a negative sampling value of N = 15. We further applied subsampling with ρ values of 1e-5 and 1e-3 to the Music and Store datasets, respectively. The reason we set different parameter values is the different sizes of the datasets.
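To make the baseline concrete, here is a hedged sketch of the item-item SVD described at the start of this section. The function name and the dense-matrix shortcut are assumptions made for illustration; at the scale of the Music dataset (49K items) one would use sparse matrices and a truncated SVD.

```python
# Item-item SVD baseline: build the normalized co-occurrence matrix A,
# decompose it, and take the rows of U_m S_m^(1/2) as item vectors.
from itertools import combinations
import numpy as np

def svd_item_vectors(baskets, items, m=40):
    idx = {w: i for i, w in enumerate(items)}
    n = len(items)

    # A[i, j] counts how often items i and j appear together in a basket.
    A = np.zeros((n, n))
    for basket in baskets:
        for wi, wj in combinations(set(basket), 2):
            A[idx[wi], idx[wj]] += 1
            A[idx[wj], idx[wi]] += 1

    # Normalize each entry by sqrt(row sum * column sum).
    row = A.sum(axis=1, keepdims=True)
    col = A.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        A = np.where((row > 0) & (col > 0), A / np.sqrt(row * col), 0.0)

    # A = U S V^T; keep the m leading singular pairs, scale by S^(1/2).
    U, S, _ = np.linalg.svd(A)
    return U[:, :m] * np.sqrt(S[:m])  # one row per item

# As with Item2vec, affinity between items is the cosine of their rows.
```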
The Music dataset does not provide genre metadata. Therefore, for each artist we retrieved the genre metadata from the web to form a genre-artist catalog. Then we used this catalog in order to visualize the relation between the learnt representation and the genres. This is motivated by the assumption that a useful representation would cluster artists according to their genre. To this end, we generated a subset that contains the top 100 popular artists per genre for 13 distinct genres. We applied t-SNE [5] with a cosine kernel to reduce the dimensionality of the item vectors to 2. Then, we colored each artist point according to its genre.

[Figure 1: t-SNE embedding of the item vectors produced by Item2vec. Items are colored according to their genres.]

Figure 1 presents the 2D embedding produced by t-SNE for Item2vec. We observe that some of the relatively homogeneous clusters in Fig. 1 are contaminated with items that are colored differently. We found that many of these cases originate in artists that are mislabeled on the web or have a mixed genre.

In order to quantify the similarity quality, we tested the genre consistency between an item and its nearest neighbors. We do that by iterating over the top q popular items (for various values of q) and checking whether their genre is consistent with the genres of the k nearest items that surround them. This is done by simple majority voting, as sketched in code below. Table 1 presents the results obtained for k = 8.

TABLE 1: A COMPARISON BETWEEN SVD AND ITEM2VEC ON A GENRE CLASSIFICATION TASK FOR VARIOUS SIZES OF TOP POPULAR ARTIST SETS

Top (q) popular artists     SVD accuracy    Item2vec accuracy
2.5k                        85%             86.4%
5k                          83.4%           84.2%
10k                         80.2%           82%
15k                         76.8%           79.5%
20k                         73.8%           77.9%
10k unpopular (see text)    58.4%           68%

We observe that Item2vec is consistently better than the SVD model, and the gap between the two keeps growing as q increases. This might imply that Item2vec produces a better representation for less popular items than the one produced by SVD. We further validate this hypothesis by applying the same 'genre consistency' test to a subset of 10K unpopular items (the last row in Table 1). We define an item as unpopular if fewer than 15 users played its corresponding artist. The accuracy obtained by Item2vec was 68%, compared to 58.4% by SVD.
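The following sketch implements the genre-consistency test as described: an item counts as a hit if the majority genre among its k nearest neighbors matches its own. It assumes items are ordered by popularity and carry a single genre label; the function and variable names are illustrative, not the paper's.

```python
# Genre-consistency test via k-nearest-neighbor majority voting.
from collections import Counter
import numpy as np

def genre_consistency(vectors, genres, top_q, k=8):
    # vectors: (n, m) item embeddings, rows ordered by popularity
    # genres:  length-n list of genre labels, same order
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ V.T                    # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)   # an item is not its own neighbor

    hits = 0
    for i in range(top_q):
        neighbors = np.argsort(-sims[i])[:k]
        majority = Counter(genres[j] for j in neighbors).most_common(1)[0][0]
        hits += int(majority == genres[i])
    return hits / top_q  # accuracy, as reported in Table 1
```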
Qualitative comparisons between Item2vec and SVD are presented in Tables 2 and 3 for the Music and Store datasets, respectively. The tables present seed items and their 4 nearest neighbors (in the latent space). The main advantage of this comparison is that it enables the inspection of item similarities at a higher resolution than genres. Moreover, since the Store dataset lacks any informative tags or labels, a qualitative evaluation is inevitable. We observe that for both datasets, Item2vec provides lists that are better related to the seed item than the ones provided by SVD. Furthermore, we see that even though the Store dataset contains weaker information, Item2vec manages to infer item relations quite well.

TABLE 2: A QUALITATIVE COMPARISON BETWEEN ITEM2VEC AND SVD FOR SELECTED ITEMS FROM THE MUSIC DATASET

Seed item (genre)        | Item2vec – Top 4 recommendations                        | SVD – Top 4 recommendations
David Guetta (Dance)     | Avicii, Calvin Harris, Martin Solveig, Deorro           | Brothers, The Blue Rose, JWJ, Akcent
Katy Perry (Pop)         | Miley Cyrus, Kelly Clarkson, P!nk, Taylor Swift         | Last Friday Night, Winx Club, Boots On Cats, Thaman S.
Dr. Dre (Hip Hop)        | Game, Snoop Dogg, N.W.A, DMX                            | Jack The Smoker, Royal Goon, Hoova Slim, Man Power
Johnny Cash (Country)    | Willie Nelson, Jerry Reed, Dolly Parton, Merle Haggard  | Hank Williams, The Highwaymen, Johnny Horton, Hoyt Axton
Guns N' Roses (Rock)     | Aerosmith, Ozzy Osbourne, Bon Jovi, AC/DC               | Bon Jovi, Gilby Clarke, Def Leppard, Mötley Crüe
Justin Timberlake (Pop)  | Rihanna, Beyonce, The Black Eyed Peas, Bruno Mars       | JC Chasez, Jordan Knight, Shontelle, Nsync

TABLE 3: A QUALITATIVE COMPARISON BETWEEN ITEM2VEC AND SVD FOR SELECTED ITEMS FROM THE STORE DATASET

Seed item                | Item2vec – Top 4 recommendations                                                                          | SVD – Top 4 recommendations
LEGO Emmet               | LEGO Bad Cop, LEGO Simpsons: Bart, LEGO Ninjago, LEGO Scooby-Doo                                          | Minecraft Foam, Disney Toy Box, Minecraft (Xbox One), Terraria (Xbox One)
Minecraft Lanyard        | Minecraft Diamond Earrings, Minecraft Periodic Table, Minecraft Crafting Table, Minecraft Enderman Plush  | Rabbids Invasion, Mortal Kombat, Minecraft Periodic Table
GoPro LCD Touch BacPac   | GoPro Anti-Fog Inserts, GoPro The Frame Mount, GoPro Floaty Backdoor, GoPro 3-Way                         | Titanfall (Xbox One), GoPro The Frame Mount, Call of Duty (PC), Evolve (PC)
Surface Pro 4 Type Cover | UAG Surface Pro 4 Case, Zip Sleeve for Surface, Surface 65W Power Supply, Surface Pro 4 Screen Protection | Farming Simulator (PC), Dell 17 Gaming laptop, Bose Wireless Headphones, UAG Surface Pro 4 Case
Disney Baymax            | Disney Maleficent, Disney Hiro, Disney Stitch, Disney Marvel Super Heroes                                 | Disney Stitch, Mega Bloks Halo UNSC Firebase, LEGO Simpsons: Bart, Mega Bloks Halo UNSC Gungoose
Windows Server 2012      | Windows Server Remote Desktop Services 1-User, Exchange Server 5-Client, Windows Server 5-User Client Access, Exchange Server 5-User Client Access | NBA Live (Xbox One) – 600 points Download Code, Windows 10 Home, Mega Bloks Halo Covenant Drone Outbreak, Mega Bloks Halo UNSC Vulture Gunship

In the future we plan to investigate more complex CF models [1] and compare them with Item2vec.

4. REFERENCES

[1] Koren Y., Bell R., Volinsky C. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.
[2] Linden G., Smith B., York J. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[3] Mnih A., Hinton G.E. A scalable hierarchical distributed language model. In Proceedings of NIPS, 2009, pp. 1081-1088.
[4] Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS, 2013, pp. 3111-3119.
[5] Van der Maaten L., Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.