                     div2vec: Diversity-Emphasized Node Embedding
                      Jisu Jeong                                             Jeong-Min Yun                                  Hongi Keam
       Clova AI Research, NAVER Corp.                                        WATCHA Inc.                                    WATCHA Inc.
            Seongnam, South Korea                                         Seoul, South Korea                              Seoul, South Korea
          jisu.jeong@navercorp.com                                       matthew@watcha.com                               paul@watcha.com

                  Young-Jin Park                                                Zimin Park                                   Junki Cho
       Naver R&D Center, NAVER Corp.                                         WATCHA Inc.                                    WATCHA Inc.
             Seoul, South Korea                                            Seoul, South Korea                            Seoul, South Korea
        young.j.park@navercorp.com                                        holden@watcha.com                               leo@watcha.com

ABSTRACT

Recently, the interest in graph representation learning has been rapidly increasing in recommender systems. However, most existing studies have focused on improving accuracy, while in real-world systems the recommendation diversity should be considered as well to improve user experiences. In this paper, we propose the diversity-emphasized node embedding div2vec, which is a random walk-based unsupervised learning method like DeepWalk and node2vec. When generating random walks, DeepWalk and node2vec sample nodes of higher degree more and nodes of lower degree less. On the other hand, div2vec samples nodes with probability inversely proportional to their degree so that every node can evenly belong to the collection of random walks. This strategy improves the diversity of recommendation models. Offline experiments on the MovieLens dataset showed that our new method improves the recommendation performance in terms of both accuracy and diversity. Moreover, we evaluated the proposed model on two real-world services, WATCHA and LINE Wallet Coupon, and observed that div2vec improves the recommendation quality by diversifying the system.

CCS CONCEPTS

• Computing methodologies → Learning latent representations; Machine learning algorithms; Knowledge representation and reasoning; Neural networks.

KEYWORDS

graph representation learning, node embedding, diversity, recommender system, link prediction, live test

Proceedings of the ImpactRS Workshop at ACM RecSys '20, September 25, 2020, Virtual Event, Brazil.
Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION

Most recommender system studies have focused on finding users' immediate demands; they try to build models that maximize the click-through rate (CTR). The learned system suggests high-ranked items that users are likely to click in a myopic sense [6, 9, 30, 32]. Such recommendation strategies have successfully replaced simple popularity-based or handmade lists, and are thus widely adopted on many online platforms including Spotify [15], Netflix [18], and so on.

However, the previous approaches have a potentially severe drawback: a lack of diversity. For example, consider a user who just watched Iron Man. Since a majority of people tend to watch other Marvel Cinematic Universe (MCU) films like Iron Man 2, Thor, and Marvel's The Avengers after watching Iron Man, the system would recommend such MCU films based on historical log data. While this approach may maximize CTR, 1) can we say that users are satisfied with these apparent results? Or, 2) would a wider variety of recommendations achieve a better user experience?

Recently, a method that addresses the first question was presented at Spotify [2]. This work categorized those who listen to very similar songs and those who listen to different sets of entities as specialists and generalists, respectively, and observed that generalists are much more satisfied than specialists according to long-term user metrics (i.e., conversion to subscription and retention on the platform). Thus, even if some users are satisfied with the evident recommendations (clicked or played), this satisfaction does not imply that the users continue to use the platform.

To answer the second question, we propose the diversity-emphasized node embedding div2vec. Recently, the number of studies on graph structure [8, 10, 17, 24, 26, 28] has been increasing. Unfortunately, most of those studies have merely focused on accuracy. DeepWalk and node2vec are the first and the most famous node embedding methods [8, 24]. DeepWalk, node2vec, and our method div2vec generate random walks first and then use the Skip-gram model [20] to compute embedding vectors of all nodes. When generating random walks, these methods choose nodes of high degree more often. It makes sense because if a node had many neighbors in the past, it will have many neighbors in the future, too. However, it may be an obstacle for personalization. Using our new method, all nodes are evenly distributed in the collection of random walks. Roughly speaking, the key idea is to choose a node with weight 1/d, where d is the degree of the node. Also, we propose a variant of div2vec, which we call rooted div2vec, obtained by changing the weight 1/d to 1/√d in order to balance accuracy and diversity. To the best of our knowledge, our approach is the first node embedding method focusing on diversity. Details are in Section 3.

We evaluate our new methods on a benchmark and two real-world datasets. As the benchmark, we verify the offline metrics on the MovieLens dataset. As a result, div2vec got higher scores in the diversity metrics, such as coverage, entropy-diversity, and average intra-list similarity, and lower scores in the accuracy metrics, such
                                     Figure 1: Screenshots of WATCHA and LINE Wallet Coupon.
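The 1/d weighting sketched in the introduction can be illustrated with a few lines of code. This is a minimal sketch with our own function and variable names (not from the paper), showing the next-node probabilities for div2vec (f(d) = 1/d) and rooted div2vec (f(d) = 1/√d):

```python
import math

def next_node_probs(neighbor_degrees, f):
    """Probability of picking each neighbor: f(degree) normalized
    over all neighbors of the current node."""
    weights = [f(d) for d in neighbor_degrees]
    total = sum(weights)
    return [w / total for w in weights]

# Two candidate neighbors with degrees 10 and 90.
p_low, p_high = next_node_probs([10, 90], lambda d: 1.0 / d)             # div2vec
q_low, q_high = next_node_probs([10, 90], lambda d: 1.0 / math.sqrt(d))  # rooted div2vec

print(round(p_low, 3), round(p_high, 3))  # 0.9 0.1
print(p_high < q_low < p_low)             # rooted weighting sits in between
```

The low-degree neighbor dominates under 1/d, while 1/√d softens the bias, which matches the accuracy/diversity trade-off the paper describes.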


as AUC score, than DeepWalk and node2vec. Furthermore, its variant rooted div2vec had the highest AUC score, and the diversity scores of rooted div2vec are the best or the second-best.

We find that increasing diversity actually improves online performance. We test on two different live services, WATCHA and LINE Wallet Coupon. Screenshots of the services are in Figure 1. WATCHA is one of the most famous OTT streaming services in South Korea. Like Netflix, users can watch movies and TV series using WATCHA. LINE is the most popular messenger in Japan, and the LINE Wallet Coupon service provides various coupons for pizza, coffee, shampoo, etc. In these two different kinds of recommender systems, we used our diversity-emphasized node embedding and succeeded in enhancing online performance. The biggest contribution of our work is to show that users in the real world prefer diverse and personalized recommendations.

The structure of the paper is as follows. In Section 2, we review random walk-based node embedding methods and studies on diversity problems. The proposed method is described in Section 3. Section 4 and Section 5 report the results of our offline and online experiments, respectively. Section 6 concludes our research.

2 RELATED WORK

2.1 Random walk-based node embeddings

The famous word2vec method transforms words into embedding vectors such that similar words have similar embeddings. It uses a language model, called Skip-gram [20], that maximizes the co-occurrence probability among the words near the target word.

Inspired by word2vec, Perozzi et al. [24] introduced DeepWalk, which transforms nodes in a graph into embedding vectors. A walk is a sequence of nodes in a graph such that two consecutive nodes are adjacent. A random walk is a walk such that the next node in the walk is chosen randomly from the neighbors of the current node. DeepWalk first samples a collection of random walks from the input graph where each node in the random walks is chosen uniformly at random. Once a collection of random walks is generated, we treat nodes and random walks as words and sentences, respectively. Then, by applying the word2vec method, we can obtain an embedding vector of each node.

node2vec [8] is a generalization of DeepWalk. When nodes in random walks are chosen, node2vec uses two parameters p and q. Suppose we have an incomplete random walk v_1, v_2, ..., v_i and we will choose one node in the neighborhood N(v_i) of v_i to be v_{i+1}. Here, for x in N(v_i), we set the weight w(v_i, x) as follows:

w(v_i, x) = 1/p if x = v_{i-1};  1 if x is adjacent to v_{i-1};  1/q otherwise.

Note that if the graph is bipartite, the second case does not appear. node2vec chooses v_{i+1} at random with probability proportional to the weight w(v_i, x).

The main advantage of graph representation learning or graph neural networks is that these models can access both local and higher-order neighborhood information. However, since the number of edges is usually very large, they may be inefficient. Random walk-based methods solve this problem: instead of considering all nodes and all edges, they only consider the nodes in the collection of random walks. Therefore, the way random walks are generated is important and affects performance.

2.2 Diversity problems

The term "filter bubble" refers to a phenomenon in which the recommender system blocks various information and filters only information similar to the user's taste. In [3, 21, 22], the authors show the existence of the filter bubble in their recommender systems. Some research [1, 27] claims that diversity is one of the essential components of a recommender system.

Some studies prove that diversity increases user satisfaction. Spotify, one of the best music streaming services, observed that diverse consumption behaviors are highly associated with long-term metrics like conversion and retention [2, 13]. Also, Liu et al. [14] improve user preference by using neural graph filtering, which learns diverse fashion collocation.

One may think that if a recommender system gains diversity, then it loses accuracy. However, the following research succeeds in improving both. Adomavicius and Kwon [1] applied a ranking technique to original collaborative filtering in order to increase diversity without decreasing accuracy. Zheng et al. [31] proposed a Deep Q-Learning based reinforcement learning framework for news recommendation. Their model improves both click-through rate and intra-list similarity.

3 PROPOSED METHOD

3.1 Motivation

In the framework of DeepWalk and node2vec, the model first generates a collection of random walks, and then runs the famous word2vec algorithm to obtain embedding vectors. In their way,
Figure 2: The x-axis shows the nodes and the blue line shows the degree of each node. The y-axis denotes the frequency of nodes in the collection of random walks. Panels: (a) DeepWalk, movieId; (b) DeepWalk, userId; (c) rooted div2vec, movieId; (d) rooted div2vec, userId; (e) div2vec, movieId; (f) div2vec, userId.


nodes of high degree should be contained more than nodes of low degree in the collection of random walks because, roughly speaking, if a node v has d neighbors, then there are d chances for v to belong to the collection of random walks. Figure 2a and Figure 2b illustrate this phenomenon. The x-axis shows the nodes sorted by degree, and the blue line shows the degree of each node. The blue line is always increasing, which means that nodes of higher degree are on the right side of each figure. Orange bars show the frequencies of nodes in the collection of random walks. It is easy to observe that nodes of high degree appear far more often than nodes of low degree.

As the collection of random walks is the training set of the skip-gram model, DeepWalk and node2vec might be trained with a bias toward nodes of high degree. This may not be a problem for tasks focused on accuracy. For example, in link prediction, high-degree nodes in the original graph might have a higher probability of being linked with other nodes than low-degree nodes. In terms of movies, if a movie is popular, then many people will like it. However, it is a problem when we want to focus on personalization and diversity. Unpopular movies might not be trained enough, so they are not well represented. Hence, even if a person actually prefers some unpopular movie to some popular movie, the recommender system tends to recommend the popular movie to be safe.

Motivated by Figure 3, which shows the difference between reality and equity, we decided to weight the next candidate nodes inversely by their degree. The main idea is "Low degree, choose more." We propose a simple but effective method, formally described in the next subsection, which gives Figure 2e and Figure 2f. Compared to Figure 2a and Figure 2b, the nodes in Figure 2e and Figure 2f are evenly distributed regardless of their degree.

3.2 div2vec

Now, we introduce the diversity-emphasized node embedding method. Similarly to DeepWalk and node2vec, we first produce a large collection of random walks and then train a skip-gram model. We apply a simple but effective idea when generating random walks so that our model can capture the diversity of the nodes in their embedding vectors.

Suppose a node v is the last node in an incomplete random walk and we are going to choose the next node among the neighbors of v.

• DeepWalk picks the next node in N(v) uniformly at random.
• In node2vec, if w is the node added to the random walk just before v, then there are three types of probability, depending on whether a node u ∈ N(v) satisfies u = w, is adjacent to w, or neither.
• Our method chooses the next node according to the degrees of the neighbors.

Formally, our method chooses the next node u ∈ N(v) with probability

f(deg(u)) / Σ_{w ∈ N(v)} f(deg(w))

for some function f. For example, when f(x) = 1/x, if a node has two neighbors y and z whose degrees are 10 and 90, respectively, then y is chosen with probability (1/10)/(1/10 + 1/90) = 0.9 and z is chosen with probability 0.1. That is, since the degree of y is smaller than the degree of z, the probability that y is chosen is larger than the

Figure 3: There are three people of different heights. The concept of equality is to give the same number of boxes to all. However, in reality, the rich get richer and the poor get poorer. From the perspective of equity, the short person gets more boxes than the tall person.¹


probability that z is chosen. In Section 4, we set f to the inverse of the identity function, f(x) = 1/x, and the inverse of the square root function, f(x) = 1/√x. We call the method div2vec when f(x) = 1/x and rooted div2vec when f(x) = 1/√x.

Intuitively, DeepWalk chooses the next node without considering the past or future nodes, node2vec chooses the next node according to the past node, and div2vec chooses the next node with respect to the future node. Note that it is possible to combine node2vec and div2vec by first dividing the neighbors into the three types and then considering their degrees. Since there would be too many hyperparameters to control, we leave this to future work.

Figure 2 shows the results of generating random walks with several methods. Details of the dataset are in Subsection 4.1. With DeepWalk, Figure 2a and Figure 2b show that high-degree nodes appear far more often than low-degree nodes. The problem is that, if the result is too skewed, then the skip-gram model might not train some parts of the data well. For example, a popular movie will appear many times in the collection of random walks, so the model may overfit to the popular movie. On the other hand, an unpopular movie will appear only a few times in the collection of random walks, so the model may underfit to the unpopular movie.

This problem is solvable by our method. Using div2vec, the nodes appear evenly in the collection of random walks. Figure 2e and Figure 2f show that our method solves this problem: the nodes are chosen equally regardless of their degree. Normally, popular movies are consumed more than unpopular movies, so div2vec may decrease accuracy. Our experiments show that even though we emphasize diversity, the accuracy decreases very little. Furthermore, we suggest the variant rooted div2vec. Figure 2c and Figure 2d can be seen as a combination of DeepWalk and div2vec. In Subsection 4.4, our experiments show that, compared to DeepWalk and node2vec, rooted div2vec records better scores on every metric.

4 OFFLINE EXPERIMENTS

4.1 Datasets

We used the famous MovieLens dataset [11] for the offline test. We used MovieLens-25M because it is the newest version, and we only used the most recent 5 years of the dataset. Ratings are given on a 10-step scale, but we need binary data, meaning watched/not-watched or satisfied/unsatisfied, in order to train a model and compute the AUC score. We set ratings of more than 4 stars to be positive and less than 3 stars to be negative. To reduce noise, we remove both the movies having fewer than 10 records and the users having fewer than 10 or more than 1000 records. In the end, there are 2,009,593 records with 16,002 users and 5,298 movies. For the test set, 20% of the data are used.

To compute intra-list similarity, which will be described in Subsection 4.3, we use the 'Tag Genome' [29] from MovieLens-25M. It contains data in 'movieId, tagId, relevance' format for every pair of movies and tags. Relevance values are real numbers between 0 and 1, so the data can be treated as a dense matrix in which each row represents one movie as a vector of tag information.

4.2 Experiment settings

In movie recommender systems, a model recommends a list of movies to each user. In other words, a model needs to find out which movies will be connected with an individual user, so our task is link prediction. However, the methods we have discussed so far only compute node embeddings. That is, we have an embedding vector for movies and users but not for their interactions. Grover and Leskovec [8] introduced four operators to obtain edge embeddings from node embeddings as follows. Let u and v be two nodes and emb(u) and emb(v) be their embedding vectors.

(1) Average: (emb(u) + emb(v)) / 2
(2) Hadamard: emb(u) ∗ emb(v) (element-wise product)
(3) Weighted-L1: |emb(u) − emb(v)|
(4) Weighted-L2: |emb(u) − emb(v)|²

For each edge, we obtain a 64-dimensional vector from the graph induced by positive edges and a 64-dimensional vector from the graph induced by negative edges. We then concatenate the positive edge embedding vector and the negative edge embedding vector to represent the edge.

To avoid the prediction model dominating the comparison, we use a simple deep neural network with one hidden layer of size 128.

4.3 Evaluation metrics

For each embedding and each operator, we compute four metrics, one for accuracy and the others for diversity. A larger score means better performance.

¹ This figure is from http://www.brainkart.com/article/Equality_34271/.
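The four edge-embedding operators of Grover and Leskovec [8] are simple element-wise operations. A minimal sketch (the function name and toy vectors are ours, not from the paper):

```python
def edge_embedding(u, v, op):
    """Combine two node embedding vectors into an edge embedding
    using one of the four operators of Grover and Leskovec [8]."""
    if op == "average":
        return [(a + b) / 2 for a, b in zip(u, v)]
    if op == "hadamard":
        return [a * b for a, b in zip(u, v)]  # element-wise product
    if op == "weighted-l1":
        return [abs(a - b) for a, b in zip(u, v)]
    if op == "weighted-l2":
        return [(a - b) ** 2 for a, b in zip(u, v)]
    raise ValueError(op)

emb_u, emb_v = [1.0, 2.0], [3.0, 6.0]
print(edge_embedding(emb_u, emb_v, "average"))      # [2.0, 4.0]
print(edge_embedding(emb_u, emb_v, "weighted-l2"))  # [4.0, 16.0]
```

Each operator keeps the embedding dimension, so the concatenated positive/negative edge vectors described above remain fixed-size inputs for the prediction network.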
              Method           AUC       CO(1)        ED(1)     CO(10)     ED(10)       ILS(10)     CO(50)      ED(50)       ILS(50)
             DeepWalk        0.874204    1035       4.882882     2944     6.051621     0.673236      4510      6.865239     0.670873
              n2v-(1,2)      0.868537    1125       5.387989     2998     6.223997     0.685589      4483      6.877584     0.680990
              n2v-(2,1)      0.864024     958       5.064918     2577     6.034628     0.682573      4081      6.773981     0.674534
               div2vec       0.851322    2793       6.859013     4717     7.308675     0.706817      5243      7.614828     0.700030
           rooted div2vec    0.888123    2332       6.713877     4500     7.275315     0.705358      5207      7.614992     0.700071
                                               (a) The results with the operator Weighted-L1.

              Method           AUC       CO(1)        ED(1)     CO(10)     ED(10)       ILS(10)     CO(50)      ED(50)       ILS(50)
             DeepWalk        0.878293    1131       5.231090     3016     6.170295     0.665313      4453      6.837852     0.666019
              n2v-(1,2)      0.871771    1302       5.504011     3261     6.323582     0.672085      4598      6.908540     0.671838
              n2v-(2,1)      0.867549     973       4.865928     2725     5.942549     0.669559      4135      6.731078     0.667980
               div2vec       0.853404    2435       6.544910     4608     7.196730     0.704196      5213      7.555974     0.700267
           rooted div2vec    0.889674    2168       6.457435     4597     7.236516     0.700865      5233      7.612159     0.698983
                                               (b) The results with the operator Weighted-L2.

              Method           AUC       CO(1)        ED(1)     CO(10)     ED(10)       ILS(10)     CO(50)      ED(50)       ILS(50)
             DeepWalk        0.862314    1331       6.088063     3156     6.856834     0.671195      4642      7.420330     0.667491
              n2v-(1,2)      0.866459    1593       6.217412     3570     6.988375     0.684405      4740      7.478146     0.680929
              n2v-(2,1)      0.863299    1587       6.315792     3371     7.053128     0.670602      4645      7.516015     0.669998
               div2vec       0.837683    2605       7.092134     4623     7.711741     0.679191      5200      8.033483     0.673614
           rooted div2vec    0.870787    2565       7.177497     4573     7.775564     0.670754      5254      8.103879     0.666571
                                                   (c) The results with the operator Hadamard.

              Method           AUC       CO(1)        ED(1)     CO(10)     ED(10)       ILS(10)     CO(50)      ED(50)       ILS(50)
             DeepWalk        0.903541     440       3.419942     1442     4.940299     0.709170      2872      5.988137     0.706576
              n2v-(1,2)      0.907638     617       4.031595     1957     5.464436     0.730591      3481      6.403608     0.717133
              n2v-(2,1)      0.909999     686       4.259565     2083     5.700572     0.745205      3540      6.471546     0.724843
               div2vec       0.894085     882       4.730973     2575     6.130730     0.725977      4039      6.972115     0.715884
           rooted div2vec    0.913342     831       4.695825     2481     6.083932     0.742770      4121      6.929550     0.727009
                                                    (d) The results with the operator Average.

                                                   Table 1: The results on an offline test.
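The mechanism behind these numbers, uniform next-node sampling concentrating visits on high-degree nodes while inverse-degree sampling flattens the visit distribution (Section 3.1, Figure 2), can be reproduced on a toy graph. This is an illustrative sketch with a made-up graph, not the paper's experiment:

```python
import random
from collections import Counter

# Toy graph as an adjacency list; node "h" is a high-degree hub.
graph = {
    "h": ["a", "b", "c", "d"],
    "a": ["h", "b"],
    "b": ["h", "a"],
    "c": ["h", "d"],
    "d": ["h", "c"],
}

def walk(start, length, weight):
    """One random walk; each next node is drawn with probability
    proportional to weight(neighbor)."""
    path = [start]
    for _ in range(length - 1):
        nbrs = graph[path[-1]]
        path.append(random.choices(nbrs, weights=[weight(n) for n in nbrs])[0])
    return path

def frequencies(weight, num_walks=2000, length=10):
    counts = Counter()
    for node in graph:
        for _ in range(num_walks // len(graph)):
            counts.update(walk(node, length, weight))
    return counts

random.seed(0)
uniform = frequencies(lambda n: 1.0)                  # DeepWalk-style sampling
inverse = frequencies(lambda n: 1.0 / len(graph[n]))  # div2vec-style sampling

# The hub dominates uniform walks but is down-weighted by inverse-degree walks.
print(uniform["h"] / sum(uniform.values()))
print(inverse["h"] / sum(inverse.values()))
```

With uniform sampling the hub's visit share approaches its degree share; with 1/deg weighting the share drops noticeably, mirroring the flattened orange bars of Figures 2e and 2f.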



  AUC SCORE AUC score is area under the Receiver Operating                 means that users can have difference items. Furthermore, if the
Characteristic curve, which is plotting True Positive Rate (TPR)           accuracy is competitive, then we may say that the model is good at
against False Positive Rate (FPR) at various thresholds. TPR and           personalization.
FPR are defined as follow:                                                    ENTROPY-DIVERSITY Adomavicius and Kwon [1] introduced
                          (π‘‘β„Žπ‘’ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œ 𝑓 π‘‘π‘Ÿπ‘’π‘’ π‘π‘œπ‘ π‘–π‘‘π‘–π‘£π‘’)                   the entropy-based diversity measure Entropy-Diversity. Let π‘ˆ be
TPR =                                                                     ,
        (π‘‘β„Žπ‘’ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œ 𝑓 π‘‘π‘Ÿπ‘’π‘’ π‘π‘œπ‘ π‘–π‘‘π‘–π‘£π‘’) + (π‘‘β„Žπ‘’ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œ 𝑓 𝑓 π‘Žπ‘™π‘ π‘’ π‘›π‘’π‘”π‘Žπ‘‘π‘–π‘£π‘’) the set of all users, and π‘Ÿπ‘’π‘ 𝑀,π‘˜ (𝑖) be the number of users 𝑒 such
                                                                           that 𝑖 ∈ 𝑅𝑀,π‘˜ (𝑒) for a model 𝑀, an integer π‘˜, and an item 𝑖. Then
                         (π‘‘β„Žπ‘’ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œ 𝑓 𝑓 π‘Žπ‘™π‘ π‘’ π‘π‘œπ‘ π‘–π‘‘π‘–π‘£π‘’)
FPR =                                                                     .                                     Γ•  π‘Ÿπ‘’π‘ (𝑖)   π‘Ÿπ‘’π‘ (𝑖) 
        (π‘‘β„Žπ‘’ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œ 𝑓 𝑓 π‘Žπ‘™π‘ π‘’ π‘π‘œπ‘ π‘–π‘‘π‘–π‘£π‘’) + (π‘‘β„Žπ‘’ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œ 𝑓 π‘‘π‘Ÿπ‘’π‘’ π‘›π‘’π‘”π‘Žπ‘‘π‘–π‘£π‘’)     ENTROPY-DIVERSITY(𝑀) = βˆ’                        ln             .
AUC score is in range 0 to 1. As close to 1, it gives better evaluation                                          𝑖
                                                                                                                     π‘˜ Γ— |π‘ˆ |    π‘˜ Γ— |π‘ˆ |
and is close to perfect prediction. AUC score is useful evaluation         Note that we can say that if ENTROPY-DIVERSITY(𝑀1 ) < ENTROPY-
metric because of scale invariant and classification threshold in-         DIVERSITY(𝑀2 ), then 𝑀2 recommends more diverse items than 𝑀1 .
variant to compare multiple prediction model.                              Here is an example. For an item set 𝐼 = {π‘–π‘‘π‘’π‘š1, π‘–π‘‘π‘’π‘š2, . . . , π‘–π‘‘π‘’π‘š9}
  COVERAGE Coverage is how many items appear in the recom-                  and a user set π‘ˆ = {π‘’π‘ π‘’π‘Ÿ 1, π‘’π‘ π‘’π‘Ÿ 2, π‘’π‘ π‘’π‘Ÿ 3}, suppose a model 𝑀1 gives
mended result. Formally, we can define as                                   𝑅𝑀1 ,3 (𝑒) = {π‘–π‘‘π‘’π‘š1, π‘–π‘‘π‘’π‘š2, π‘–π‘‘π‘’π‘š3} for every user 𝑒, and a model
                                    Ø                                       𝑀2 gives 𝑅𝑀2 ,3 (π‘’π‘ π‘’π‘Ÿ 1) = {π‘–π‘‘π‘’π‘š1, π‘–π‘‘π‘’π‘š2, π‘–π‘‘π‘’π‘š3}, 𝑅𝑀2 ,3 (π‘’π‘ π‘’π‘Ÿ 2) =
                   coverage(𝑀) =        𝑅𝑀,π‘˜ (𝑒)                            {π‘–π‘‘π‘’π‘š4, π‘–π‘‘π‘’π‘š5, π‘–π‘‘π‘’π‘š6}, 𝑅𝑀2 ,3 (π‘’π‘ π‘’π‘Ÿ 3) = {π‘–π‘‘π‘’π‘š7, π‘–π‘‘π‘’π‘š8, π‘–π‘‘π‘’π‘š9}. Then
                                    𝑒                                       ENTROPY-DIVERSITY(𝑀1 ) = βˆ’(3/9) ln(3/9) Γ— 3 + 0 Γ— 6 = ln 3 and
where 𝑀 is a model, 𝑅𝑀,π‘˜ is a set of top-π‘˜ recommended items                ENTROPY-DIVERSITY(𝑀2 ) = βˆ’(1/9) ln(1/9) Γ— 9 = ln 9.
for a user 𝑒 by 𝑀. Many papers [1, 7, 12, 16] discuss the impor-               AVERAGE INTRA-LIST SIMILARITY From the recommen-
tance of the coverage. If the coverage of the model is large, then          dation model, every user will receive a list of items. Intra-List Simi-
the model recommends a broad range of items, and it implicitly              larity (ILS) measures how dissimilar or similar items in the list are.
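The coverage and Entropy-Diversity computations above can be sketched in a few lines; this is a minimal illustration (the helper names are ours, not from the paper) that reproduces the worked 𝑀1/𝑀2 example with π‘˜ = 3.

```python
import math

def coverage(rec_lists):
    """Number of distinct items appearing in any user's recommendation list."""
    return len(set().union(*rec_lists.values()))

def entropy_diversity(rec_lists, k):
    """-sum_i p_i ln p_i with p_i = rec(i) / (k * |U|), over recommended items."""
    counts = {}
    for items in rec_lists.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    total = k * len(rec_lists)  # k * |U|
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# The worked example from the text: three users, k = 3, nine items.
M1 = {u: ["item1", "item2", "item3"] for u in ["user1", "user2", "user3"]}
M2 = {"user1": ["item1", "item2", "item3"],
      "user2": ["item4", "item5", "item6"],
      "user3": ["item7", "item8", "item9"]}

print(entropy_diversity(M1, 3))  # ln 3 β‰ˆ 1.0986
print(entropy_diversity(M2, 3))  # ln 9 β‰ˆ 2.1972
print(coverage(M1), coverage(M2))  # 3 9
```

𝑀1, which shows every user the same three items, scores ln 3 with coverage 3, while 𝑀2, which spreads nine items across the three users, scores ln 9 with coverage 9, matching the hand computation above.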
                    Week            | clicks | plays | watch time
                    the first week  |  66.19 | 62.07 |       3.58
                    the second week |  39.69 | 28.52 |       4.19

                    Table 2: Percentage of improvements (%) of div2vec over node2vec.
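The Intra-List Similarity measure used in this section, with sim(𝑖, 𝑗) = 1 βˆ’ cosine similarity between item vectors, can be sketched as follows. The double sum is read over unordered item pairs, and the toy vectors are hypothetical stand-ins, not real MovieLens tag-genome vectors.

```python
import math

def sim(v, w):
    """Dissimilarity-style 'similarity' from the text: 1 - cosine(v, w)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return 1.0 - dot / norm

def ils(vectors):
    """Average pairwise sim over a recommended list of item vectors."""
    n = len(vectors)
    total = sum(sim(vectors[i], vectors[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

# A list of near-duplicate items vs. a list of spread-out items (toy data).
similar_list = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
diverse_list = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(ils(similar_list) < ils(diverse_list))  # True: similar items give a small ILS
```

Because sim is 1 minus the cosine, a small ILS means the items in the list are similar, which is why a diversity-oriented model should push the averaged ILS up, not down.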



Formally,

                    ILS(𝐿) = ( Ξ£_{π‘–βˆˆπΏ} Ξ£_{π‘—βˆˆπΏ, 𝑗≠𝑖} sim(𝑖, 𝑗) ) / ( |𝐿|(|𝐿| βˆ’ 1)/2 ),

where 𝐿 is the recommended item list and sim(𝑖, 𝑗) is the similarity measure between the tag-genome vectors of 𝑖 and 𝑗, which are given in the MovieLens dataset [11, 29]. We set sim(𝑖, 𝑗) = 1 βˆ’ (𝑣_𝑖 Β· 𝑣_𝑗) / (||𝑣_𝑖|| Β· ||𝑣_𝑗||), where 𝑣_𝑖 and 𝑣_𝑗 are the corresponding tag-genome vectors of 𝑖 and 𝑗. By definition, if the value is small, then the items in the list are similar; otherwise, they are dissimilar. Note that Bradley and Smyth [4] and Meymandpour and Davis [19] use the same definition in terms of β€˜diversity’. For every user 𝑒, we compute ILS(𝑅_{𝑀,π‘˜}(𝑒)) and take the average, which we call the Average Intra-List Similarity, in order to measure how diverse a model is.

4.4     Results on an offline test
Table 1 summarizes the results of our offline experiments on the MovieLens dataset. We ran experiments under various conditions:
      β€’ five methods: DeepWalk [24], node2vec [8] with different hyperparameters, div2vec, and its variant rooted div2vec
      β€’ four operators: Weighted-L1, Weighted-L2, Hadamard, Average
      β€’ four metrics: AUC score, coverage, entropy-diversity, and average intra-list similarity
      β€’ three sizes of recommendation lists: 1, 10, 50
AUC means the AUC score. CO(π‘˜), ED(π‘˜), and ILS(π‘˜) mean the coverage, entropy-diversity, and average intra-list similarity of the top-π‘˜ recommended items, respectively. n2v-(p,q) is node2vec with hyperparameters 𝑝 and π‘ž.
   Our proposed methods div2vec and rooted div2vec record the highest scores on all metrics in Table 1a and Table 1b. In Table 1c and Table 1d, the average intra-list similarity is not the best but the second best, with tiny gaps. Overall, div2vec and rooted div2vec clearly score better than DeepWalk and node2vec on the diversity metrics. Furthermore, rooted div2vec achieves the best scores on the accuracy metric. Thus, we conclude that our proposed methods increase the diversity of recommender systems.

5   LIVE EXPERIMENTS
5.1     Video Recommendation
In the previous experiments, we verified that our methods, div2vec and rooted div2vec, clearly increase the diversity of the recommended results. The remaining task is to show that div2vec actually increases user satisfaction in real-world recommender systems. To this end, we conduct an A/B test in the commercial video streaming service WATCHA, and we measure and compare various user activity statistics related to user satisfaction.
   We collected four months of watch-complete logs starting from January 2020, where watch-complete means that a user finished watching a video. We filtered out users who had no watch-complete logs in the last few days, as well as extreme users (with too many or too few logs); this results in 21,620 users. Two methods, node2vec and div2vec, were trained on these filtered logs. For node2vec, we set the parameters 𝑝 = π‘ž = 1.
   The A/B test was conducted for two weeks: the 21,620 users were randomly and evenly partitioned into two groups, and each group received either the node2vec or the div2vec recommendation results. In more detail, WATCHA has a list-wise recommendation home whose lists consist of several videos, and our list was inserted into the fifth row. To build the list, we sorted all available videos by their final scores and picked the top π‘˜ of them (π‘˜ varies with the device); we also randomly shuffled the top 3π‘˜ videos to avoid always showing the same recommendation.
   We compare the clicks and plays of the node2vec and div2vec lists by week (the first two columns in Table 2). In the first week, the div2vec list received more than 60% more actions than the node2vec list in both clicks and plays; it received 39.69% more clicks and 28.52% more plays in the second week. As we can see, div2vec beats node2vec in clicks and plays by a significant margin.
   One may argue that the above improvement does not imply improved user satisfaction if the users who received the node2vec list were satisfied with other recommended lists. To check whether this happens, we compare the total watch time of each group during the test (the last column in Table 2). In the first week, div2vec achieved 3.58% more watch time than node2vec, and 4.19% more in the second week. We note that in watch-time comparisons even a 1% improvement is hard to achieve [5], so this is quite an impressive result.

5.2      Coupon Recommendation
To further demonstrate the effectiveness of the div2vec embedding in other real-world recommender systems, we ran an A/B test in the LINE Wallet Coupon service and evaluated the online performance for two weeks in the spring of 2020. The system serves over six million users and over five hundred unique items. We constructed the user-item bipartite graph by defining each user and each item as an individual node and connecting the nodes that interacted with each other. Using this graph, we obtained div2vec embedding vectors for each node. In this experiment, we compared the number of unique users who clicked (Click UU) and the click-through rate (CTR) of the neural-network-based recommendation modelΒ² using the precomputed div2vec embedding vectors as additional features against the same model without them. As side information, the model utilized gender, age, mobile OS type, and interest information for users, and brand, discount information, text, and image features for items. The online experiment results show that the overall Click UU and CTR improved by 6.55% and 2.90%, respectively. The relative

Β² The details of the model architecture for the LINE Wallet Coupon recommender system are presented in [23, 25].
[Figure: two panels, (a) Relative Click UU and (b) Relative CTR, each plotting w/ div2vec, w/ div2vec (avg), and w/o div2vec over days 1–13.]

Figure 4: Relative performance of the model applying the div2vec feature to the existing model in the LINE Wallet Coupon service by date.


performance for two weeks is illustrated in Figure 4 by date. By applying the div2vec feature, a larger number of users became interested in the recommended coupon list, and the rate at which users react to the exposed items increased significantly. Considering that the online tests were conducted for a relatively long period, we conclude that the diversified recommendation based on the proposed method led to positive consequences for the user experience rather than merely attracting temporary curiosity from users.

6    CONCLUSION
We have introduced the diversity-emphasized node embedding div2vec. Several experiments showed the effectiveness of our method. Compared to DeepWalk and node2vec, the recommendation model based on div2vec increased diversity metrics such as coverage, entropy-diversity, and average intra-list similarity on the MovieLens dataset. The main contribution of this paper is verifying that users are satisfied with the recommendation model using div2vec in two different live services.
   We remark that, as div2vec is an unsupervised learning method like word2vec, it can easily be combined with other studies and services to improve their performance. Also, by changing the function 𝑓, the distribution of nodes in the collection of random walks can be adjusted to each domain.

ACKNOWLEDGMENTS
Special thanks to those who lent their insight and technical support for this work, including Jaehun Kim, Taehyun Lee, Kyung-Min Kim, and Jung-Woo Ha.

REFERENCES
 [1] Gediminas Adomavicius and YoungOk Kwon. 2012. Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques. IEEE Trans. on Knowl. and Data Eng. 24, 5 (May 2012), 896–911. https://doi.org/10.1109/TKDE.2011.15
 [2] Ashton Anderson, Lucas Maystre, Ian Anderson, Rishabh Mehrotra, and Mounia Lalmas. 2020. Algorithmic Effects on the Diversity of Consumption on Spotify. In Proceedings of The Web Conference 2020. 2155–2165.
 [3] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to Ideologically Diverse News and Opinion on Facebook. Science 348, 6239 (2015), 1130–1132. https://doi.org/10.1126/science.aaa1160
 [4] Keith Bradley and Barry Smyth. 2001. Improving Recommendation Diversity.
 [5] Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 456–464.
 [6] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (Boston, MA, USA) (DLRS 2016). Association for Computing Machinery, New York, NY, USA, 7–10. https://doi.org/10.1145/2988450.2988454
 [7] Mouzhi Ge, Carla Delgado-Battenfeld, and Dietmar Jannach. 2010. Beyond Accuracy: Evaluating Recommender Systems by Coverage and Serendipity. In Proceedings of the Fourth ACM Conference on Recommender Systems (Barcelona, Spain) (RecSys ’10). Association for Computing Machinery, New York, NY, USA, 257–260. https://doi.org/10.1145/1864708.1864761
 [8] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 855–864. https://doi.org/10.1145/2939672.2939754
 [9] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (Melbourne, Australia) (IJCAI ’17). AAAI Press, 1725–1731.
[10] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS ’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035.
[11] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (Dec. 2015), 19 pages. https://doi.org/10.1145/2827872
[12] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst. 22, 1 (Jan. 2004), 5–53. https://doi.org/10.1145/963770.963772
[13] David Holtz, Ben Carterette, Praveen Chandar, Zahra Nazari, Henriette Cramer, and Sinan Aral. 2020. The Engagement-Diversity Connection: Evidence from a Field Experiment on Spotify. In Proceedings of the 21st ACM Conference on Economics and Computation (Virtual Event, Hungary) (EC ’20). Association for Computing Machinery, New York, NY, USA, 75–76. https://doi.org/10.1145/3391403.3399532
[14] Xiaohua Liu, Yongbin Sun, Ziwei Liu, and Dahua Lin. 2020. Learning Diverse Fashion Collocation by Neural Graph Filtering. arXiv preprint arXiv:2003.04888 (2020).
[15] Kurt Jacobson, Vidhya Murali, Edward Newett, Brian Whitman, and Romain Yon. 2016. Music Personalization at Spotify. In Proceedings of the 10th ACM Conference on Recommender Systems (Boston, Massachusetts, USA) (RecSys ’16). Association for Computing Machinery, New York, NY, USA, 373. https://doi.org/10.1145/2959100.2959120
[16] Marius Kaminskas and Derek Bridge. 2016. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems. ACM Trans. Interact. Intell. Syst. 7, 1, Article 2 (Dec. 2016), 42 pages. https://doi.org/10.1145/2926720
[17] Kyung-Min Kim, Donghyun Kwak, Hanock Kwak, Young-Jin Park, Sangkwon Sim, Jae-Han Cho, Minkyu Kim, Jihun Kwon, Nako Sung, and Jung-Woo Ha. 2019. Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation. arXiv preprint arXiv:1908.02569 (2019).
[18] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 689–698. https://doi.org/10.1145/3178876.3186150
[19] Rouzbeh Meymandpour and Joseph G. Davis. 2020. Measuring the Diversity of Recommendations: A Preference-Aware Approach for Evaluating and Adjusting Diversity. Knowledge and Information Systems 62, 2 (01 Feb 2020), 787–811. https://doi.org/10.1007/s10115-019-01371-0
[20] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1301.3781
[21] Tien T. Nguyen, Pik-Mai Hui, F. Maxwell Harper, Loren Terveen, and Joseph A. Konstan. 2014. Exploring the Filter Bubble: The Effect of Using Recommender Systems on Content Diversity. In Proceedings of the 23rd International Conference on World Wide Web (Seoul, Korea) (WWW ’14). Association for Computing Machinery, New York, NY, USA, 677–686. https://doi.org/10.1145/2566486.2568012
[22] Eli Pariser. 2011. The Filter Bubble: What the Internet Is Hiding from You. The Penguin Group.
[23] Young-Jin Park, Kyuyong Shin, and Kyung-Min Kim. 2020. Hop Sampling: A Simple Regularized Graph Learning for Non-Stationary Environments. arXiv preprint arXiv:2006.14897 (2020).
[24] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, New York, USA) (KDD ’14). Association for Computing Machinery, New York, NY, USA, 701–710. https://doi.org/10.1145/2623330.2623732
[25] Kyuyong Shin, Young-Jin Park, Kyung-Min Kim, and Sunyoung Kwon. 2020. Multi-Manifold Learning for Large-Scale Targeted Advertising System. arXiv preprint arXiv:2007.02334 (2020).
[26] Rianne van den Berg, Thomas N. Kipf, and Max Welling. 2017. Graph Convolutional Matrix Completion. arXiv preprint arXiv:1706.02263 (2017).
[27] SaΓΊl Vargas. 2014. Novelty and Diversity Enhancement and Evaluation in Recommender Systems and Information Retrieval. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (Gold Coast, Queensland, Australia) (SIGIR ’14). Association for Computing Machinery, New York, NY, USA, 1281. https://doi.org/10.1145/2600428.2610382
[28] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro LiΓ², and Yoshua Bengio. 2018. Graph Attention Networks. International Conference on Learning Representations (2018). https://openreview.net/forum?id=rJXMpikCZ
[29] Jesse Vig, Shilad Sen, and John Riedl. 2012. The Tag Genome: Encoding Community Knowledge to Support Novel Interaction. ACM Trans. Interact. Intell. Syst. 2, 3, Article 13 (Sept. 2012), 44 pages. https://doi.org/10.1145/2362394.2362395
[30] Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. 2019. Knowledge Graph Convolutional Networks for Recommender Systems. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 3307–3313. https://doi.org/10.1145/3308558.3313417
[31] Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A Deep Reinforcement Learning Framework for News Recommendation. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 167–176. https://doi.org/10.1145/3178876.3185994
[32] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (London, United Kingdom) (KDD ’18). Association for Computing Machinery, New York, NY, USA, 1059–1068. https://doi.org/10.1145/3219819.3219823