<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Item2vec: Neural Item Embedding for Collaborative Filtering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oren Barkan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Microsoft</institution>
          ,
          <country>Israel</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) have suggested learning a latent representation of words using neural embedding algorithms. Among them, the Skip-gram with Negative Sampling (SGNS), also known as Word2vec, was shown to provide state-of-the-art results on various linguistic tasks. In this paper, we show that item-based CF can be cast in the same framework as neural word embedding. Inspired by SGNS, we describe a method we name Item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of the Item2vec method and show it is competitive with SVD.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Computing item similarities is a key building block in modern
recommender systems. While many recommendation
algorithms are focused on learning a low dimensional
embedding of users and items simultaneously [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], computing
item similarities is an end in itself.
      </p>
      <p>
        There are several scenarios where item-based CF methods
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are desired: in a large scale dataset, when the number of
users is significantly larger than the number of items, the
computational complexity of methods that model items solely
is significantly lower than methods that model both users and
items simultaneously. For example, online music services may
have hundreds of millions of enrolled users with just tens of
thousands of artists (items).
      </p>
      <p>
        Recent progress in neural embedding methods for linguistic
tasks has dramatically advanced state-of-the-art natural
language processing (NLP) capabilities [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. These methods
attempt to map words and phrases to a low dimensional vector
space that captures semantic and syntactic relations between
words. Specifically, Skip-gram with Negative Sampling
(SGNS), known also as word2vec [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], set new records in
various NLP tasks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>In this paper, we propose to apply SGNS to item-based CF.
Motivated by its great success in other domains, we suggest that
SGNS with minor modifications may capture the relations between
different items in collaborative filtering datasets. To this end, we
propose a modified version of SGNS named Item2vec. We show
that Item2vec can induce a similarity measure that is competitive
with item-based CF using SVD.</p>
    </sec>
    <sec id="sec-2">
      <title>2. ITEM2VEC</title>
      <p>
        SGNS is a neural word embedding method that was introduced
by Mikolov et al. in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The method aims at finding a word representation that
captures the relation between a word and its surrounding words
in a sentence.
      </p>
      <p>
        In the context of CF data, the items are given as user-generated
sets. The application of SGNS to CF data is straightforward once
we realize that a sequence of words is equivalent to a set or
basket of items. Since we ignore the spatial information, we treat
each pair of items that share the same set as a positive example.
Therefore, for a given set of K items
        <inline-formula><tex-math>\{ w_i \}_{i=1}^{K} \subseteq W</tex-math></inline-formula>
        , we aim at maximizing the following term:
        <disp-formula id="eq1"><tex-math>\frac{1}{K} \sum_{i=1}^{K} \sum_{j \neq i} \log \left[ \sigma(u_i^{T} v_j) \prod_{k=1}^{N} \sigma(-u_i^{T} v_k) \right] \qquad (1)</tex-math></disp-formula>
        where
        <inline-formula><tex-math>u_i \in U (\subset \mathbb{R}^{m})</tex-math></inline-formula>
        and
        <inline-formula><tex-math>v_i \in V (\subset \mathbb{R}^{m})</tex-math></inline-formula>
        are latent vectors that correspond to the target and context
representations for the item
        <inline-formula><tex-math>w_i \in W</tex-math></inline-formula>
        , respectively, and
        <inline-formula><tex-math>\sigma(x) = 1 / (1 + \exp(-x))</tex-math></inline-formula>
        . The dimension m is chosen empirically, according to the size
of the dataset, and N is a parameter that determines the number
of negative examples to be drawn per positive example. A
negative item is sampled from the unigram distribution raised to
the 3/4 power.
      </p>
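      <p>As a concrete illustration, the objective in (1) can be maximized with stochastic gradient ascent over positive pairs and sampled negatives. The NumPy sketch below shows one update step and the smoothed negative-sampling distribution; all function names, the learning rate, and the toy sizes are illustrative assumptions, not the paper's implementation.</p>
      <preformat>
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_dist(counts):
    """Unigram distribution raised to the 3/4 power, renormalized."""
    p = np.asarray(counts, dtype=float) ** 0.75
    return p / p.sum()

def sgns_pair_update(U, V, i, j, neg_ids, lr=0.025):
    """One stochastic ascent step on log sig(u_i.v_j) + sum_k log sig(-u_i.v_k)."""
    u = U[i].copy()                    # use the old u_i for all gradients
    # positive pair: pull the target u_i and context v_j together
    g = 1.0 - sigmoid(u @ V[j])        # gradient of log sigmoid(x)
    du = g * V[j]
    V[j] += lr * g * u
    # sampled negatives: push u_i away from each negative context v_k
    for k in neg_ids:
        g = -sigmoid(u @ V[k])         # gradient of log sigmoid(-x)
        du += g * V[k]
        V[k] += lr * g * u
    U[i] += lr * du
```
      </preformat>
      <p>Repeating such updates over all co-occurring pairs in the user-generated sets, with negatives drawn from the smoothed unigram distribution, yields the item vectors.</p>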
      <p>
        In order to overcome the imbalance between rare and
frequent items, we subsample the data [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Specifically, we discard each item w from its set with
probability
        <inline-formula><tex-math>p(\text{discard} \mid w) = 1 - \sqrt{\rho / f(w)}</tex-math></inline-formula>
        , where
        <inline-formula><tex-math>f(w)</tex-math></inline-formula>
        is the frequency of the item w and
        <inline-formula><tex-math>\rho</tex-math></inline-formula>
        is a prescribed threshold.
      </p>
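      <p>A minimal sketch of this subsampling rule; the item frequencies and the threshold below are illustrative values, not those used in the paper's experiments.</p>
      <preformat>
```python
import math
import random

def discard_prob(freq, rho):
    """p(discard | w) = 1 - sqrt(rho / f(w)), clipped at zero for rare items."""
    return max(0.0, 1.0 - math.sqrt(rho / freq))

def subsample(item_set, freqs, rho, rng=random):
    """Independently drop each item of a set with its discard probability."""
    return [w for w in item_set if rng.random() >= discard_prob(freqs[w], rho)]
```
      </preformat>
      <p>Items at or below the threshold frequency are always kept, while very frequent items are dropped aggressively, balancing rare and frequent items in the training pairs.</p>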
      <p>
        U and V are estimated by applying stochastic gradient ascent
with respect to the objective in (1). Finally, we use
        <inline-formula><tex-math>u_i</tex-math></inline-formula>
        as the representation for the i-th item, and the affinity between
a pair of items is computed by the cosine similarity.
      </p>
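      <p>The affinity computation can be sketched as follows; the helper names are illustrative.</p>
      <preformat>
```python
import numpy as np

def cosine_affinity(U):
    """Pairwise cosine similarity between the rows of U (the item vectors u_i)."""
    X = U / np.linalg.norm(U, axis=1, keepdims=True)
    return X @ X.T

def nearest_items(U, i, k):
    """Indices of the k items with highest cosine affinity to item i."""
    sims = cosine_affinity(U)[i].copy()
    sims[i] = -np.inf                  # exclude the item itself
    return np.argsort(-sims)[:k]
```
      </preformat>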
    </sec>
    <sec id="sec-3">
      <title>3. EXPERIMENTAL RESULTS</title>
      <p>
        In this section, we provide qualitative and quantitative results.
As a baseline item-based CF algorithm we used item-item SVD.
Specifically, we apply SVD to decompose
        <inline-formula><tex-math>A = U S V^{*}</tex-math></inline-formula>
        , where A is a square matrix whose dimension equals the number
of items. The (i, j) entry of A contains the number of times
        <inline-formula><tex-math>(w_i, w_j)</tex-math></inline-formula>
        appears as a positive pair in the dataset, normalized by the
square root of the product of its row and column sums. The latent
representation is given by the rows of
        <inline-formula><tex-math>U_m S_m^{1/2}</tex-math></inline-formula>
        . The affinity between items is computed by cosine similarity of
their representations.
      </p>
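      <p>A NumPy sketch of this baseline; the pair list, item count, and rank below are illustrative, and the function name is an assumption.</p>
      <preformat>
```python
import numpy as np

def svd_item_embedding(pairs, n_items, m):
    """Item-item SVD baseline: build the normalized co-occurrence matrix A
    and return the rows of U_m S_m^(1/2) as item representations."""
    A = np.zeros((n_items, n_items))
    for i, j in pairs:                  # positive item pairs from the dataset
        A[i, j] += 1.0
        A[j, i] += 1.0
    sums = A.sum(axis=1)                # A is symmetric: row sums = column sums
    denom = np.sqrt(np.outer(sums, sums))
    denom[denom == 0] = 1.0             # avoid division by zero for isolated items
    A = A / denom
    U, S, _ = np.linalg.svd(A)
    return U[:, :m] * np.sqrt(S[:m])    # rows of U_m S_m^(1/2)
```
      </preformat>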
      <p>We evaluate the methods on two different private datasets.
The first dataset consists of user-artist relations that were retrieved
from the Microsoft Xbox Music service. This dataset consists of 9M
events. Each event consists of a user-artist relation, which means
the user played a song by the specific artist. The dataset contains
732K users and 49K distinct artists.</p>
      <p>The second dataset contains orders of products from Microsoft
Store. An order is given as a basket of items without any
information about the user that made it. Therefore, the information
in this dataset is weaker in the sense that we cannot bind between
users and items. The dataset consists of 379K orders (that contains
more than a single item) and 1706 distinct items.</p>
      <p>We applied Item2vec and SVD to both datasets. The
dimension parameter was set to m = 40. We ran Item2vec on
both datasets for 20 epochs with a negative sampling value of
N = 15. We further applied subsampling with ρ values of
1e-5 and 1e-3 to the Music and Store datasets, respectively.
We set different parameter values because of the different
sizes of the datasets.</p>
      <p>
        The music dataset does not provide genre metadata.
Therefore, for each artist we retrieved the genre metadata from
the web to form a genre-artist catalog. Then we used this
catalog in order to visualize the relation between the learnt
representation and the genres. This is motivated by the
assumption that a useful representation would cluster artists
according to their genre. To this end, we generated a subset
that contains the top 100 popular artists per genre for 13
distinct genres. We applied t-SNE [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] with a cosine kernel to
reduce the dimensionality of the item vectors to 2. Then, we
colored each artist point according to its genre. Figure 1
presents the 2D embedding that was produced by t-SNE, for
item2vec. We observe that some of the relatively homogeneous
clusters in Fig. 1 are contaminated with items that are colored
differently. We found that many of these cases originate from
artists that are mislabeled on the web or have a mixed genre.
      </p>
      <p>In order to quantify the similarity quality, we tested the
genre consistency between an item and its nearest neighbors.
We do this by iterating over the top q popular items (for
various values of q) and checking whether their genre is
consistent with the genres of the k nearest items that surround
them, using a simple majority vote. Table 1
presents the results obtained for k = 8. We observe that
item2vec is consistently better than the SVD model, and the
gap between the two keeps growing as q increases. This
might imply that item2vec produces a better representation for
less popular items than the one produced by SVD. We further
validate this hypothesis by applying the same ‘genre
consistency’ test to a subset of 10K unpopular items (the last
row in Table 1). We define an item as unpopular if fewer than
15 users played its corresponding artist. The
accuracy obtained by item2vec was 68%, compared to 58.4%
by SVD.</p>
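      <p>This evaluation can be sketched as a k-nearest-neighbor majority vote over the learned embeddings; the toy vectors and genre labels below are illustrative, not the paper's data.</p>
      <preformat>
```python
import numpy as np
from collections import Counter

def genre_consistency(E, genres, k):
    """Fraction of items whose own genre equals the majority genre of
    their k nearest neighbors under cosine similarity."""
    X = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)     # never count an item as its own neighbor
    hits = 0
    for i, g in enumerate(genres):
        nn = np.argsort(-sims[i])[:k]
        majority = Counter(genres[j] for j in nn).most_common(1)[0][0]
        hits += (majority == g)
    return hits / len(genres)
```
      </preformat>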
      <p>Qualitative comparisons between Item2Vec and SVD are
presented in Tables 2-3 for Music and Store datasets, respectively.
The tables present seed items and their 4 nearest neighbors (in the
latent space). The main advantage of this comparison is that it
enables inspecting item similarities at a higher resolution than
genres. Moreover, since the Store dataset lacks any informative
tags / labels, a qualitative evaluation is unavoidable. We observe that
for both datasets, Item2Vec provides lists that are better related to
the seed item than the ones that are provided by SVD. Furthermore,
we see that even though the Store dataset contains weaker
information, Item2Vec manages to infer item relations quite well.</p>
      <p>
        In future work, we plan to investigate more complex CF models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and compare them with item2vec.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. TABLES</title>
      <table-wrap id="tbl2">
        <label>Table 2</label>
        <caption>
          <p>A qualitative comparison between Item2vec and SVD on the Music dataset: seed items and their top 4 recommendations.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Seed item (genre)</th>
              <th>Item2vec - top 4 recommendations</th>
              <th>SVD - top 4 recommendations</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>David Guetta (Dance)</td><td>Avicii, Calvin Harris, Martin Solveig, Deorro</td><td>Brothers, The Blue Rose, JWJ, Akcent</td></tr>
            <tr><td>Katy Perry (Pop)</td><td>Miley Cyrus, Kelly Clarkson, P!nk, Taylor Swift</td><td>Last Friday Night, Winx Club, Boots On Cats, Thaman S.</td></tr>
            <tr><td>Dr. Dre (Hip Hop)</td><td>Game, Snoop Dogg, N.W.A, DMX</td><td>Jack The Smoker, Royal Goon, Hoova Slim, Man Power</td></tr>
            <tr><td>Johnny Cash (Country)</td><td>Willie Nelson, Jerry Reed, Dolly Parton, Merle Haggard</td><td>Hank Williams, The Highwaymen, Johnny Horton, Hoyt Axton</td></tr>
            <tr><td>Guns N' Roses (Rock)</td><td>Aerosmith, Ozzy Osbourne, Bon Jovi, AC/DC</td><td>Bon Jovi, Gilby Clarke, Def Leppard, Mötley Crüe</td></tr>
            <tr><td>Justin Timberlake (Pop)</td><td>Rihanna, Beyonce, The Black Eyed Peas, Bruno Mars</td><td>JC Chasez, Jordan Knight, Shontelle, Nsync</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tbl3">
        <label>Table 3</label>
        <caption>
          <p>A qualitative comparison between Item2vec and SVD on the Store dataset: top 4 recommendation lists.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Item2vec - top 4 recommendations</th></tr>
          </thead>
          <tbody>
            <tr><td>LEGO Bad Cop, LEGO Simpsons: Bart, LEGO Ninjago, LEGO Scooby-Doo</td></tr>
            <tr><td>Minecraft Diamond Earrings, Minecraft Periodic Table, Minecraft Crafting Table, Minecraft Enderman Plush</td></tr>
            <tr><td>GoPro Anti-Fog Inserts, GoPro The Frame Mount, GoPro Floaty Backdoor, GoPro 3-Way</td></tr>
            <tr><td>UAG Surface Pro 4 Case, Zip Sleeve for Surface, Surface 65W Power Supply, Surface Pro 4 Screen Protection</td></tr>
            <tr><td>Disney Maleficent, Disney Hiro, Disney Stich, Disney Marvel Super Heroes</td></tr>
            <tr><td>Windows Server Remote Desktop Services 1-User, Exchange Server 5-Client, Windows Server 5-User Client Access, Exchange Server 5-User Client Access</td></tr>
          </tbody>
        </table>
        <table>
          <thead>
            <tr><th>SVD - top 4 recommendations</th></tr>
          </thead>
          <tbody>
            <tr><td>Minecraft Foam, Disney Toy Box, Minecraft (Xbox One), Terraria (Xbox One)</td></tr>
            <tr><td>Titanfall (Xbox One), GoPro The Frame Mount, Call of Duty (PC), Evolve (PC)</td></tr>
            <tr><td>Farming Simulator (PC), Dell 17 Gaming laptop, Bose Wireless Headphones, UAG Surface Pro 4 Case</td></tr>
            <tr><td>Disney Stich, Mega Bloks Halo UNSC Firebase, LEGO Simpsons: Bart, Mega Bloks Halo UNSC Gungoose</td></tr>
            <tr><td>NBA Live (Xbox One) - 600 points Download Code, Windows 10 Home, Mega Bloks Halo Covenant Drone Outbreak, Mega Bloks Halo UNSC Vulture Gunship</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Koren</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bell</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volinsky</surname>
            <given-names>C</given-names>
          </string-name>
          .
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          .
          <source>Computer. 2009 Aug</source>
          <volume>42</volume>
          (
          <issue>8</issue>
          ):
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Linden</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>York</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Amazon.com recommendations: Item-to-item collaborative filtering</article-title>
          .
          <source>Internet Computing, IEEE. 2003 Jan;7</source>
          (
          <issue>1</issue>
          ):
          <fpage>76</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Mnih</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            <given-names>GE</given-names>
          </string-name>
          .
          <article-title>A scalable hierarchical distributed language model</article-title>
          .
          <source>In Proceedings of NIPS 2009</source>
          (pp.
          <fpage>1081</fpage>
          -
          <lpage>1088</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Mikolov</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            <given-names>GS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Proceedings of NIPS 2013</source>
          (pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Van der Maaten</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Visualizing data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research. 2008</source>
          ;
          <volume>9</volume>
          :
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>