<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tripartite Heterogeneous Graph Propagation for Large-scale Social Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kyung-Min Kim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donghyun Kwak</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hanock Kwak</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Young-Jin Park</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sangkwon Sim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jae-Han Cho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minkyu Kim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jihun Kwon</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nako Sung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jung-Woo H</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Clova AI Research, NAVER Corp.</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LINE Plus Corp</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Naver R&amp;D Center</institution>
          ,
          <addr-line>NAVER Corp</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Search Solution Inc.</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Graph Neural Networks (GNNs) have been emerging as a promising method for relational representation including recommender systems. However, various challenging issues of social graphs hinder the practical usage of GNNs for social recommendation, such as their complex noisy connections and high heterogeneity. The oversmoothing of GNNs is a obstacle of GNN-based social recommendation as well. Here we propose a new graph embedding method Heterogeneous Graph Propagation (HGP) to tackle these issues. HGP uses a group-user-item tripartite graph as input to reduce the number of edges and the complexity of paths in a social graph. To solve the oversmoothing issue, HGP embeds nodes under a personalized PageRank based propagation scheme, separately for group-user graph and user-item graph. Node embeddings from each graph are integrated using an atention mechanism. We evaluate our HGP on a large-scale real-world dataset consisting of 1,645,279 nodes and 4,711,208 edges. The experimental results show that HGP outperforms several baselines in terms of AUC and F1-score metrics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Recommender systems; • Computing methodologies → Neural
networks; Learning latent representations.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Social recommendation, Graph neural networks, Heterogeneous graph embedding, User profiling
Recently, deep learning-based methods have shown promising results in recommender systems. The
GNNs are becoming increasingly popular methods to leverage relational information, successfully
applied in IT industries. They represent user-item interactions as a user-item graph and compute
the similarity between nodes to recommend items to a user. Besides, it is beneficial to use additional
user-user interaction information in social recommender systems to alleviate cold-spots in a user-item
graph. However, it has not been straightforward to apply GNNs to social recommendation tasks due
to the following challenging issues: 1) As the number of user node increases, the number of edges in
the user-user graph grows exponentially in general. 2) Tractable social recommendation using GNNs
requires proper computational tricks and sampling methods to handle large-scale social graphs. 3)
Previous GNNs sufer from the oversmoothing problem, which hinders GNNs from stacking two or
more layers. 4) There are two inherently diferent graphs, i.e., user-user graph and user-item graph. A
model has to combine these two graphs coherently.</p>
      <p>In this paper, we propose a novel graph configuration and embedding method, Heterogeneous
Graph Propagation (HGP), to tackle these four problems. We introduce the concept of a group node
that connects groups of related users defined by common social properties to mitigate the complexity
of user connections. A user node would belong to multiple group nodes, and there is no edge between
user nodes. This configuration reduces computing time and memory by the square of the number of
nodes while preserving social atributes and structures. This graph can be formulated as a tripartite
atributed multiplex heterogeneous network as illustrated in Figure 1.</p>
      <p>
        At first, HGP builds node embeddings separately for user-item graph and user-group graph. To
prevent the oversmoothing problem, it uses personalized PageRank scheme [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] when propagating
node embedding through the whole graph structure. The HGP handles the heterogeneity of graph in
two ways; it applies diferent predicting functions for each node type and combines two types of the
node embeddings from each graph with an atention-based function. Finally, the HGP recommends
items to a user by retrieving the nearest items to the user in user-item joint embedding space.
      </p>
      <p>We evaluate our HGP on a large-scale real-world dataset collected from a social application for our
experiments. The dataset includes 1.7M nodes consisting of 456K groups, 1.1M users, and 135K items.
The nodes have multiple atributes such as group topic (group node), demographic properties (user
node), visual-linguistic information, and category (item node). The total number of edges is 4.7M.
The experimental results show that our HGP outperforms competitive graph embedding methods.
Moreover, as the number of layers increases, the HGP can achieve beter performance. It implies that
propagating item preference of friends indeed help improve the performance of recommendation.</p>
    </sec>
    <sec id="sec-3">
      <title>HETEROGENEOUS GRAPH PROPAGATION</title>
      <p>Items, users, and social relationships build a complex graph with multiple types of nodes and edges.
To efectively handle diferent edge types, Heterogeneous Graph Propagation (HGP) propagates
neighborhoods for each edge type independently and then combines the final node representations
with an atention model. Also, HGP uses the predicting neural networks separated for each node type
considering heterogeneity of node atributes.</p>
      <p>In a heterogeneous graph G = (V , E), there is a node type mapping function ϕ : V → O and an
edge type mapping function ψ : E → R. We denote Ar as an adjacency matrix that only includes
edges of type r ∈ R. To propagate self information of nodes to itself, the graph networks add self loops
to the adjacency matrix: A˜r = Ar + I . It is then symmetrically normalized as: Aˆr = D˜r −1/2A˜r D˜r −1/2,
where D˜r is the diagonal degree matrix of A˜.</p>
      <p>
        The nodes are initially represented by the feature matrix X ∈ R|V |×n where n is the number of
features. We split the nodes by types: X1, X2, ..., X |O |, apply each predicting neural network: Hi = fi (Xi )
and concatenate the results: H = [H1, H2, ..., H |O |]. Starting with Z (0) = H ∈ R|V |×m for each edge
r
type r ∈ R, HGP uses similar scheme as that of APPNP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] which avoids the oversmoothing problem.
The purpose of APPNP is not to learn deep node embedding, but to learn a transformation from
atributes to class labels in the semi-supervised seting. HGP instead uses non-linear propagation
with additional learnable weights to learn deep node representations:
      </p>
      <p>Yi′ = Attention(YiWQ , YiWK , YiWV ),
where query, key and value are same, except that diferent weight matrices are used. Then, the HGP
concatenates all rows of Yi′ and pass it to a linear layer, generating representation zi for i-th node.</p>
      <p>We compute dot-product similarity between the user and item representations to predict CTR. We
optimize the model by reducing the cross-entropy loss. We sample subset of nodes for every batch</p>
      <p>HGP combines the final node representation matrices with an atention model. Without loss of
generality, we select i-th node (row) from Zr(K) for each edge type. We stack these vectors building a
matrix Yi ∈ R|R |×m . Using a single layer Transformer, the HGP performs self atention to Yi :
to reduce memory size. Specifically, we adjust the sampling probability to be proportional to the
number of nodes for each type. To reduce approximation variance, the sampling probability is also
proportional to the degree of node.</p>
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTS</title>
    </sec>
    <sec id="sec-5">
      <title>Datasets</title>
      <p>We use a dataset collected from a large-scale social network service including 1,645,279 nodes
connected with 4,711,208 edges. Group, User and Item are three node types in the dataset. The group and
user nodes are connected if the user belongs to the group. The item and user nodes are connected
when the user positively interacts with the item. Table 1 shows the overall statistics of the dataset.
To enhance accuracy and generality, the nodes contain various atributes such as group topic of
group node, demographic properties of user node, and visual-linguistic information and category
of item node. We extract high-level features with BERT and VGG16 for visual-linguistic atributes.
We transform categorical atributes into dense features with linear embedding layers. Finally, we
aggregate all features to represent the nodes. We use the first eleven days as a training set, the
subsequent two days as a validation set, and the last four days as a test set.</p>
    </sec>
    <sec id="sec-6">
      <title>Comparable Models</title>
      <p>We compare our model with several models proposed for graph structures as well as a traditional
model, i.e., Factorization Machine (FM).</p>
      <p>
        metapath2vec [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: It performs meta-path based random walk and leverages heterogeneous
skipgram model for node embedding. It only uses the identification information of nodes and does not
cover atributes. In our datasets, we choose G-U-I-U-G as a meta-path which considers all node types.
      </p>
      <p>
        metapath2vec+EGES: We modify metapath2vec to use the atributes. Following the atribute
embedding methods from EGES [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], it densely embeds each atribute and fuses them using atention.
      </p>
      <p>
        MNE+EGES: The nodes in MNE [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] use its distinctive embedding and several additional embeddings
for each edge type, which are jointly learned by a unified graph embedding model. The atribute
embedding method is same as EGES.
      </p>
      <p>
        FastGCN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: The FastGCN is a homogeneous graph embedding method that directly subsamples
the nodes for each layer altogether. It is scalable but does not consider edge types. In our task, it uses
the same atribute embedding method as HGP for initial node features.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Results</title>
      <p>We report the experimental results of the competitors on Table 2. We found that the values of
PRAUC and F1-score are proportional to that of ROC-AUC. The metapath2vec that does not use node
≈ 45 hrs
≈ 1.1 hrs
atributes fails to learn CTR prediction. The HGP outperforms the FastGCN, which is not suitable
for heterogeneous graphs and sufers from the oversmoothing problem. It also outperforms other
recent heterogeneous graph models (metapath2vec+EGES and MNE+EGES). Moreover, the validation
loss of our model converges within a half day by using sampling scheme. In Table 3, we compare the
learning time of HGP according to the use of sampling scheme .</p>
      <p>Figure 3 shows the performance comparison of HGP with diferent propagation steps. In our graph,
HGP needs at least two propagation steps to know other members in a group (user → group → user).
If the number of propagation steps is three, it can approach other preferred items of users who have
common preferences (user → item → user → item). The performance of previous GCN architecture
degrades as the number of propagation steps k increases, even when the k is two or three. Overall,
the HGP achieves the best performance at k=10.</p>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>In this paper, we proposed a graph configuration, group-user-item tripartite atributed multiplex
heterogeneous networks, for a social recommender system. To avoid the oversmoothing problem, the
HGP propagates neighborhood information using the personalized PageRank scheme. The HGP can
efectively handle the heterogeneity of a graph in two ways: 1) It builds node embeddings separately
for each edge type and combines them with the atention function. 2) It uses diferent predicting
functions on each node type. We tested our model on the large-scale real-world dataset and showed
that the HGP outperforms competitive graph embedding methods in terms of various metrics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jie</given-names>
            <surname>Chen</surname>
          </string-name>
          , Tengfei Ma, and
          <string-name>
            <given-names>Cao</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Fastgcn: fast learning with graph convolutional networks via importance sampling</article-title>
          .
          <source>In International Conference on Learning Representations (ICLR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Yuxiao</given-names>
            <surname>Dong</surname>
          </string-name>
          , Nitesh V Chawla,
          <string-name>
            <given-names>and Ananthram</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <year>2017</year>
          . metapath2vec:
          <article-title>Scalable representation learning for heterogeneous networks</article-title>
          .
          <source>In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM</source>
          ,
          <volume>135</volume>
          -
          <fpage>144</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Klicpera</surname>
          </string-name>
          , Aleksandar Bojchevski, and
          <string-name>
            <given-names>Stephan</given-names>
            <surname>Günnemann</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Predict then Propagate: Graph Neural Networks meet Personalized PageRank</article-title>
          .
          <source>In International Conference on Learning Representations (ICLR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jizhe</given-names>
            <surname>Wang</surname>
          </string-name>
          , Pipei Huang,
          <string-name>
            <given-names>Huan</given-names>
            <surname>Zhao</surname>
          </string-name>
          , Zhibo Zhang,
          <source>Binqiang Zhao, and Dik Lun Lee</source>
          .
          <year>2018</year>
          .
          <article-title>Billion-scale commodity embedding for e-commerce recommendation in alibaba</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining. ACM</source>
          ,
          <volume>839</volume>
          -
          <fpage>848</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Hongming</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Liwei Qiu, Lingling Yi, and
          <string-name>
            <given-names>Yangqiu</given-names>
            <surname>Song</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Scalable Multiplex Network Embedding.</article-title>
          .
          <source>In IJCAI. 3082-3088.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>