=Paper= {{Paper |id=Vol-2431/paper12 |storemode=property |title=Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation |pdfUrl=https://ceur-ws.org/Vol-2431/paper12.pdf |volume=Vol-2431 |authors=Kyung-Min Kim,Donghyun Kwak,Hanock Kwak,Young-Jin Park,Sangkwon Sim,Jae-Han Cho,Minkyu Kim,Jihun Kwon,Nako Sung,Jung-Woo Ha |dblpUrl=https://dblp.org/rec/conf/recsys/KimKKPSCKKSH19 }} ==Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation== https://ceur-ws.org/Vol-2431/paper12.pdf
                                                                   Tripartite Heterogeneous Graph
                                                                   Propagation for Large-scale Social
                                                                   Recommendation
                                                  Kyung-Min Kim1,∗ , Donghyun Kwak2,∗ , Hanock Kwak3,∗ , Young-Jin Park4,∗
                                                  Sangkwon Sim1 , Jae-Han Cho1 , Minkyu Kim1 , Jihun Kwon4 , Nako Sung1 , Jung-Woo
                                                  Ha1∗
                                                  {kyungmin.kim.ml,donghyun.kwak}@navercorp.com
                                                  hanock.kwak2@linecorp.com
                                                  {young.j.park,jaehan.cho,min.kyu.kim,jihun.kwon,andy.sangkwon,nako.sung,jungwoo.ha}@
                                                  navercorp.com
                                                  1 Clova AI Research, NAVER Corp., 2 Search Solution Inc., 3 LINE Plus Corp. and 4 Naver R&D

                                                  Center, NAVER Corp.
∗ Authors contributed equally to this research.

                                                  ABSTRACT
                                                  Graph Neural Networks (GNNs) have been emerging as a promising method for relational representa-
                                                  tion including recommender systems. However, various challenging issues of social graphs hinder the
                                                  practical usage of GNNs for social recommendation, such as their complex noisy connections and
                                                  high heterogeneity. The oversmoothing of GNNs is a obstacle of GNN-based social recommendation
                                                  as well. Here we propose a new graph embedding method Heterogeneous Graph Propagation (HGP)
                                                  to tackle these issues. HGP uses a group-user-item tripartite graph as input to reduce the number of
                                                  edges and the complexity of paths in a social graph. To solve the oversmoothing issue, HGP embeds
                                                  nodes under a personalized PageRank based propagation scheme, separately for group-user graph and
                                                  user-item graph. Node embeddings from each graph are integrated using an attention mechanism.
                                                  We evaluate our HGP on a large-scale real-world dataset consisting of 1,645,279 nodes and 4,711,208
                                                  edges. The experimental results show that HGP outperforms several baselines in terms of AUC and
                                                  F1-score metrics.

                                                  ACM RecSys 2019 Late-breaking Results, 16th-20th September 2019, Copenhagen, Denmark
                                                  Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International
                                                  (CC BY 4.0).
Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation               ACM RecSys 2019 Late-breaking Results, 16th-20th September 2019, Copenhagen, Denmark


                                                                     CCS CONCEPTS
                                                                     • Information systems → Recommender systems; • Computing methodologies → Neural
                                                                     networks; Learning latent representations.

                                                                     KEYWORDS
                                                                     Social recommendation, Graph neural networks, Heterogeneous graph embedding, User profiling

                                                                     INTRODUCTION
                                                                     Recently, deep learning-based methods have shown promising results in recommender systems. The
                                                                     GNNs are becoming increasingly popular methods to leverage relational information, successfully
                                                                     applied in IT industries. They represent user-item interactions as a user-item graph and compute
                                                                     the similarity between nodes to recommend items to a user. Besides, it is beneficial to use additional
                                                                     user-user interaction information in social recommender systems to alleviate cold-spots in a user-item
                                                                     graph. However, it has not been straightforward to apply GNNs to social recommendation tasks due
                                                                     to the following challenging issues: 1) As the number of user node increases, the number of edges in
                                                                     the user-user graph grows exponentially in general. 2) Tractable social recommendation using GNNs
                                                                     requires proper computational tricks and sampling methods to handle large-scale social graphs. 3)
                                                                     Previous GNNs suffer from the oversmoothing problem, which hinders GNNs from stacking two or
                                                                     more layers. 4) There are two inherently different graphs, i.e., user-user graph and user-item graph. A
                                                                     model has to combine these two graphs coherently.
                                                                        In this paper, we propose a novel graph configuration and embedding method, Heterogeneous
                                                                     Graph Propagation (HGP), to tackle these four problems. We introduce the concept of a group node
Figure 1: Social graph representation. (a) Tradi-
                                                                     that connects groups of related users defined by common social properties to mitigate the complexity
tional user-item graph for social recommendation.                    of user connections. A user node would belong to multiple group nodes, and there is no edge between
(b) Group-user-item tripartite attributed multiplex                  user nodes. This configuration reduces computing time and memory by the square of the number of
heterogeneous graph used in our task.                                nodes while preserving social attributes and structures. This graph can be formulated as a tripartite
                                                                     attributed multiplex heterogeneous network as illustrated in Figure 1.
                                                                        At first, HGP builds node embeddings separately for user-item graph and user-group graph. To
                                                                     prevent the oversmoothing problem, it uses personalized PageRank scheme [3] when propagating
                                                                     node embedding through the whole graph structure. The HGP handles the heterogeneity of graph in
                                                                     two ways; it applies different predicting functions for each node type and combines two types of the
                                                                     node embeddings from each graph with an attention-based function. Finally, the HGP recommends
                                                                     items to a user by retrieving the nearest items to the user in user-item joint embedding space.
                                                                        We evaluate our HGP on a large-scale real-world dataset collected from a social application for our
                                                                     experiments. The dataset includes 1.7M nodes consisting of 456K groups, 1.1M users, and 135K items.
Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation                   ACM RecSys 2019 Late-breaking Results, 16th-20th September 2019, Copenhagen, Denmark


                                                                     The nodes have multiple attributes such as group topic (group node), demographic properties (user
                                                                     node), visual-linguistic information, and category (item node). The total number of edges is 4.7M.
                                                                     The experimental results show that our HGP outperforms competitive graph embedding methods.
                                                                     Moreover, as the number of layers increases, the HGP can achieve better performance. It implies that
                                                                     propagating item preference of friends indeed help improve the performance of recommendation.

                                                                     HETEROGENEOUS GRAPH PROPAGATION
                                                                     Items, users, and social relationships build a complex graph with multiple types of nodes and edges.
                                                                     To effectively handle different edge types, Heterogeneous Graph Propagation (HGP) propagates
                                                                     neighborhoods for each edge type independently and then combines the final node representations
                                                                     with an attention model. Also, HGP uses the predicting neural networks separated for each node type
                                                                     considering heterogeneity of node attributes.
                                                                        In a heterogeneous graph G = (V , E), there is a node type mapping function ϕ : V → O and an
                                                                     edge type mapping function ψ : E → R. We denote Ar as an adjacency matrix that only includes
                                                                     edges of type r ∈ R. To propagate self information of nodes to itself, the graph networks add self loops
                                                                                                                                                                         −1/2         −1/2
                                                                     to the adjacency matrix: A˜r = Ar + I . It is then symmetrically normalized as: Aˆr = D˜r                A˜r D˜r      ,
                                                                     where D r is the diagonal degree matrix of Ã.
                                                                             ˜
                                                                        The nodes are initially represented by the feature matrix X ∈ R |V |×n where n is the number of
                                                                     features. We split the nodes by types: X 1 , X 2 , ..., X |O | , apply each predicting neural network: Hi = fi (X i )
                                                                     and concatenate the results: H = [H 1 , H 2 , ..., H |O | ]. Starting with Z r(0) = H ∈ R |V |×m for each edge
                                                                     type r ∈ R, HGP uses similar scheme as that of APPNP [3] which avoids the oversmoothing problem.
                                                                     The purpose of APPNP is not to learn deep node embedding, but to learn a transformation from
                                                                     attributes to class labels in the semi-supervised setting. HGP instead uses non-linear propagation
                                                                     with additional learnable weights to learn deep node representations:

Figure 2: Schematic architecture of Heterogeneous                                                       Z r(k +1) = (1 − α)ReLU (Aˆr Z r(k)WH(k) ) + αH .                               (1)
Graph Propagation (HGP) as a social recommender
                                                                       HGP combines the final node representation matrices with an attention model. Without loss of
system. HGP propagates neighborhoods indepen-
dently for each edge type and then combines the fi-                  generality, we select i-th node (row) from Z r(K ) for each edge type. We stack these vectors building a
nal node representations with an attention model.                    matrix Yi ∈ R |R |×m . Using a single layer Transformer, the HGP performs self attention to Yi :
We compute dot-product similarity between user
                                                                                                           Yi′ = Attention(YiWQ , YiWK , YiWV ),                                        (2)
and item representations to predict CTR.
                                                                     where query, key and value are same, except that different weight matrices are used. Then, the HGP
                                                                     concatenates all rows of Yi′ and pass it to a linear layer, generating representation zi for i-th node.
                                                                       We compute dot-product similarity between the user and item representations to predict CTR. We
                                                                     optimize the model by reducing the cross-entropy loss. We sample subset of nodes for every batch
Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation                ACM RecSys 2019 Late-breaking Results, 16th-20th September 2019, Copenhagen, Denmark


                                                                     to reduce memory size. Specifically, we adjust the sampling probability to be proportional to the
                                                                     number of nodes for each type. To reduce approximation variance, the sampling probability is also
                                                                     proportional to the degree of node.
     Table 1: Statistics of the datasets.
                                                                     EXPERIMENTS
       Node or edge types      Numbers                               Datasets
               User            1,105,921                             We use a dataset collected from a large-scale social network service including 1,645,279 nodes con-
              Group             456,483                              nected with 4,711,208 edges. Group, User and Item are three node types in the dataset. The group and
               Item              82,875                              user nodes are connected if the user belongs to the group. The item and user nodes are connected
            Item-User          3,746,650                             when the user positively interacts with the item. Table 1 shows the overall statistics of the dataset.
           Group-User           964,548
                                                                     To enhance accuracy and generality, the nodes contain various attributes such as group topic of
                                                                     group node, demographic properties of user node, and visual-linguistic information and category
                                                                     of item node. We extract high-level features with BERT and VGG16 for visual-linguistic attributes.
                                                                     We transform categorical attributes into dense features with linear embedding layers. Finally, we
Table 2: Performance comparison of com-                              aggregate all features to represent the nodes. We use the first eleven days as a training set, the
peting models. The hyphen ‘-’ implies that                           subsequent two days as a validation set, and the last four days as a test set.
we can not measure the stable perfor-
mance due to the high variance of the re-                            Comparable Models
sults.
                                                                     We compare our model with several models proposed for graph structures as well as a traditional
                                                                     model, i.e., Factorization Machine (FM).
        Models            ROC-AUC       PR-AUC        F1               metapath2vec [2]: It performs meta-path based random walk and leverages heterogeneous skip-
         FM                 0.5725         0.5654   0.5400           gram model for node embedding. It only uses the identification information of nodes and does not
   metapath2vec               0.5             -        -             cover attributes. In our datasets, we choose G-U-I-U-G as a meta-path which considers all node types.
 metapath2vec+EGES          0.6136         0.6290   0.5604             metapath2vec+EGES: We modify metapath2vec to use the attributes. Following the attribute
    MNE+EGES                0.6158         0.6307   0.5660           embedding methods from EGES [4], it densely embeds each attribute and fuses them using attention.
      FastGCN               0.6010         0.5937   0.5417             MNE+EGES: The nodes in MNE [5] use its distinctive embedding and several additional embeddings
     HGP (ours)             0.6365         0.6378   0.5967           for each edge type, which are jointly learned by a unified graph embedding model. The attribute
                                                                     embedding method is same as EGES.
                                                                       FastGCN [1]: The FastGCN is a homogeneous graph embedding method that directly subsamples
                                                                     the nodes for each layer altogether. It is scalable but does not consider edge types. In our task, it uses
                                                                     the same attribute embedding method as HGP for initial node features.

                                                                     Results
                                                                     We report the experimental results of the competitors on Table 2. We found that the values of PR-
                                                                     AUC and F1-score are proportional to that of ROC-AUC. The metapath2vec that does not use node
Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation                    ACM RecSys 2019 Late-breaking Results, 16th-20th September 2019, Copenhagen, Denmark


                                                                     attributes fails to learn CTR prediction. The HGP outperforms the FastGCN, which is not suitable
                                                                     for heterogeneous graphs and suffers from the oversmoothing problem. It also outperforms other
Table 3: Learning time of our model ac-                              recent heterogeneous graph models (metapath2vec+EGES and MNE+EGES). Moreover, the validation
cording to the use of sampling method.
                                                                     loss of our model converges within a half day by using sampling scheme. In Table 3, we compare the
                                                                     learning time of HGP according to the use of sampling scheme .
       Method           Learning Time Per Epoch                         Figure 3 shows the performance comparison of HGP with different propagation steps. In our graph,
 HGP w/o sampling                ≈ 45 hrs                            HGP needs at least two propagation steps to know other members in a group (user → group → user).
      HGP                        ≈ 1.1 hrs                           If the number of propagation steps is three, it can approach other preferred items of users who have
                                                                     common preferences (user → item → user → item). The performance of previous GCN architecture
                                                                     degrades as the number of propagation steps k increases, even when the k is two or three. Overall,
                                                                     the HGP achieves the best performance at k=10.

                                                                     CONCLUSION
                                                                     In this paper, we proposed a graph configuration, group-user-item tripartite attributed multiplex
                                                                     heterogeneous networks, for a social recommender system. To avoid the oversmoothing problem, the
                                                                     HGP propagates neighborhood information using the personalized PageRank scheme. The HGP can
                                                                     effectively handle the heterogeneity of a graph in two ways: 1) It builds node embeddings separately
                                                                     for each edge type and combines them with the attention function. 2) It uses different predicting
                                                                     functions on each node type. We tested our model on the large-scale real-world dataset and showed
                                                                     that the HGP outperforms competitive graph embedding methods in terms of various metrics.

                                                                     REFERENCES
Figure 3: Performance comparison of HGP with dif-                    [1] Jie Chen, Tengfei Ma, and Cao Xiao. 2018. Fastgcn: fast learning with graph convolutional networks via importance
ferent propagation steps.                                                sampling. In International Conference on Learning Representations (ICLR).
                                                                     [2] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heteroge-
                                                                         neous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining.
                                                                         ACM, 135–144.
                                                                     [3] Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. 2018. Predict then Propagate: Graph Neural Networks
                                                                         meet Personalized PageRank. In International Conference on Learning Representations (ICLR).
                                                                     [4] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity
                                                                         embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference
                                                                         on Knowledge Discovery & Data Mining. ACM, 839–848.
                                                                     [5] Hongming Zhang, Liwei Qiu, Lingling Yi, and Yangqiu Song. 2018. Scalable Multiplex Network Embedding.. In IJCAI.
                                                                         3082–3088.