<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Workshop on Deep Learning for Knowledge Graphs (DL4KG), October 2021</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>GraphPOPE: Retaining Structural Graph Information Using Position-aware Node Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jeroen B. den Boef</string-name>
          <email>jeroen@socialdatabase.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joran Cornelisse</string-name>
          <email>joran@socialdatabase.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Groth</string-name>
          <email>p.t.groth@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Socialdatabase</institution>
          ,
          <addr-line>Slego 1A, 1046 BM Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Science Park 904, 1098 XH Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>25</volume>
      <issue>2021</issue>
      <abstract>
<p>Exponential computational cost arises when graph convolutions are performed on large graphs such as knowledge graphs. This computational bottleneck, dubbed the 'neighbor explosion' problem, has been overcome through the application of graph sampling strategies. Graph Convolutional Network architectures that employ such a strategy, e.g. GraphSAGE and GraphSAINT, circumvent this bottleneck by sampling subgraphs. This approach improves scalability and speed at the cost of information loss with respect to the overall graph topology. To improve topological information retention and utilization in graph sampling frameworks, we introduce Graph Position-aware Preprocessed Embeddings (GraphPOPE), a novel, feature-enhancing preprocessing technique. GraphPOPE samples influential anchor nodes in the graph based on centrality measures and subsequently generates normalized geodesic, Cosine or Euclidean distance embeddings for all nodes with respect to these anchor nodes. Structural graph information is retained during sampling as the position-aware node embeddings act as a skeleton for the graph. Our algorithm outperforms GraphSAGE on the Flickr benchmark dataset. Moreover, we demonstrate the added value of topological information retention, particularly for larger graphs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Data that emphasizes relationships between data points, e.g. social networks, general knowledge,
protein interactions, can be formally represented as graphs. Within machine learning there
has been much interest in leveraging these graph representations, leading to the inception
of graph learning and Graph Neural Networks (GNN) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Graph neural networks have been
successfully applied to a wide variety of tasks utilizing graph-structured data ranging from
knowledge graphs to social networks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">2, 3, 1, 4</xref>
        ].
      </p>
      <p>
        Many of the initial teething problems of GNNs have been resolved [
        <xref ref-type="bibr" rid="ref10 ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9, 10</xref>
        ]. Graph
Convolutional Networks (GCN) combined a convolutional smoothing kernel with a spectral
graph representation to achieve state-of-the-art results on transductive node classification tasks
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While accurate in a transductive setting, this approach to node classification tasks fails to
generalize well to unseen nodes [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. GraphSAGE opened up the avenue for inductive graph
convolution models by learning general embedding generation functions for node features [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
The GraphSAGE propagation rule utilizing a mean aggregator function is nearly equivalent
to the one utilized in transductive GCNs and can be viewed as a linear approximation of
localized spectral convolution. Additionally, primitive skip connections are performed by
concatenating previous neighborhood representations of nodes with the current neighborhood
representation. GraphSAGE also introduced a complementary NeighborSampler dataloader
which improved scalability by introducing mini-batch training to Graph Convolutional Networks.
The NeighborSampler aids embedding computation by sampling neighboring nodes iteratively
and constructing mini-batches of nodes. Bipartite graphs are subsequently constructed to
simulate the computation flow of GNNs. While inductive by nature and competitively accurate
when introduced, the GraphSAGE model is constrained by its preset neighborhood sampling
function, as this restricts the convolutional kernel to a fixed size. The introduction of Graph
Attention Networks (GAT) resolved this constraint by using an attention mechanism as a
dynamic smoothing kernel [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. When combined with a self-attention mechanism, this approach
to graph convolution produces competitive results on both inductive and transductive tasks
[
        <xref ref-type="bibr" rid="ref11 ref7">7, 11</xref>
        ].
      </p>
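      <p>
        Concretely, the mean-aggregator propagation rule of GraphSAGE [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] described above can be written, in slightly simplified notation, as:
      </p>
      <disp-formula>
        <tex-math>h^{k}_{\mathcal{N}(v)} = \mathrm{mean}\left(\left\{ h^{k-1}_{u} : u \in \mathcal{N}(v) \right\}\right), \qquad h^{k}_{v} = \sigma\left( W^{k} \cdot \mathrm{concat}\left( h^{k-1}_{v},\; h^{k}_{\mathcal{N}(v)} \right) \right)</tex-math>
      </disp-formula>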
      <sec id="sec-1-1">
        <title>1.1. Present work</title>
        <p>
          Graph sampling architectures improve scalability and speed for Graph Convolutional Networks
on large graphs at the cost of information loss with respect to overall graph topology. In an effort
to improve topological information retention and utilization in graph sampling frameworks, we
propose a general preprocessing technique for Neural Networks operating on graph-structured
data, called GraphPOPE (Graph POsition-aware Preprocessed Embeddings). In this framework,
topological information is embedded into the feature matrix through the generation of relative
distance embeddings. By sampling anchor nodes from a given graph, identification points are
determined. Normalized relative distance embeddings are then generated for all pairings of
nodes and anchor nodes. These embeddings serve as a skeleton of the graph and identify
which neighborhood a node belongs to. Intuitively, GraphPOPE embeddings can be interpreted
as node2vec neighborhood embeddings for the whole graph, whereas node2vec generates
second-order random walk neighborhood embeddings for individual nodes [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This makes
GraphPOPE applicable to Multi-Layer Perceptrons and local pooling models such as Graph
Convolutional Networks alike, as topological information beyond the scope of a convolutional
kernel is provided.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Literature</title>
      <p>Conceptually, GraphPOPE is closely related to recent advances in GNNs that seek to improve
topological information usage through position-aware convolutional layers. We consider
GraphPOPE closely intertwined with graph sampling techniques as it mitigates topological
information loss.</p>
      <p>Code implementations are publicly available on GitHub: https://github.com/JeroendenBoef/GraphPOPE</p>
      <sec id="sec-2-1">
        <title>2.1. Graph Sampling Approaches</title>
        <p>
          Graph sampling approaches to GNNs favor scalability and speed over embedding or self-attention strategies by sampling
subgraphs for training. Notable instances of this approach are Cluster-GCN and GraphSAINT,
which both address the main computational bottleneck of GCNs [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. This computational
bottleneck has been dubbed the neighbor explosion problem and is twofold: First, outputs of a
GCN for a single node require data from neighbouring nodes in the previous layer of the network
[
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. Every layer within a GCN requires another k-hop neighborhood, where k depends on the
convolutional kernel size, increasing computational cost exponentially with every layer: with an
average degree of d, an L-layer GCN touches on the order of d^L nodes per output node. Second,
back-propagation of a GCN requires all of the embeddings in the computation graph to be stored
in GPU memory. Cluster-GCN proposes a solution to this bottleneck by preceding training
with a clustering phase. Clusters of nodes belonging to dense subgraphs are identified during
this clustering phase. These subgraphs are then used to restrict neighborhood search, acting as
boundaries for the convolutional kernel. This relatively simple strategy introduces scalability to
graph convolutional networks while reducing computational costs by a large margin. However,
the employed clustering algorithm introduces additional heavy computational costs.
        </p>
        <p>
          GraphSAINT adopts a similar approach to Cluster-GCN, marginally improving accuracy but
substantially decreasing computation time [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The improved computational speed is mainly
achieved through employment of inexpensive sampling algorithms, contrasting the expensive
clustering algorithm utilized by Cluster-GCN.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Information Loss</title>
        <p>
          While sampling graphs during training mitigates the neighbor explosion problem and
reintroduces scalability to GCNs, it nevertheless results in the loss of information. Restricting the
GCN to specific clusters or completely disconnected subgraphs during training withholds or
even removes edges and thus information from the graph. Chiang and colleagues identify this
shortcoming for Cluster-GCN and introduce stochastic multiple clustering in an effort to reduce
clustering bias and restore lost information simultaneously [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. However, stochastic multiple
clustering exclusively addresses the issue of cut edges during stochastic gradient descent batch
updates, disregarding information loss preceding this phase.
        </p>
        <p>
          A residual weakness within GNNs is their inability to distinguish between node positions
with regards to the broader context of graph structure [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Not taking node features into
account, two nodes can reside within opposite sides of a graph while having a topologically
identical neighbourhood structure. Proposed heuristics range from deeper GNNs
to node feature augmentation using position-aware convolutional layers [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method: GraphPOPE</title>
      <p>The inclusion of position-aware node embeddings as a general preprocessing technique for
graph-based Neural Networks is motivated by the perceived topological information loss in
graph sampling GCNs. Embedding this information into the node features before subgraph
sampling could improve model performance. We first describe the geodesic GraphPOPE
algorithm, which samples anchor nodes stochastically and subsequently generates position-aware
node embeddings for all nodes in a given graph (Section 3.1). Embedding enhancement through
biased anchor node sampling and algorithmic time complexity are detailed in Section 3.2. Finally,
we introduce a faster, embedding space approximation of the geodesic GraphPOPE in Section
3.3. A schematic overview of the GraphPOPE algorithm is depicted in Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Geodesic Distance Embeddings</title>
        <p>This section details the GraphPOPE anchor node sampler and geodesic distance embedding
generator algorithm (Algorithm 1 - GraphPOPE-geodesic), which assumes the output matrix
is concatenated with the feature matrix so that all nodes are enriched with their respective
distance embeddings. Let G(V, ℰ) denote graph G with nodes V and edges ℰ, k the number
of anchor nodes u to sample, q the geodesic distance function used to derive relative node
distances, and D of size |V| × k the geodesic distance matrix generated by GraphPOPE. The intuition behind
this algorithm is that for each node v_i in the graph, normalized geodesic distances between this
node and all sampled anchor nodes u_j are computed and added to feature vector v_i, which
is subsequently added to the relative distance matrix D at index i. Anchor nodes are sampled
stochastically to reduce algorithm complexity and prevent bias in the data. The distance function
q employed for this computation is either a single-source or all-pairs shortest path algorithm,
returning 0 if the target node is unreachable and 1/(s(v_i, u_j) + 1) otherwise, where s(v_i, u_j)
is the shortest path distance. As this distance function serves as an approximation of how many
hops a node is from an anchor node, it can be replaced by similar but faster distance functions.</p>
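        <p>To make the procedure concrete, a minimal Python sketch using NetworkX is given below. The 1/(s + 1) normalization and the stochastic sampler follow the description above; all function and variable names, the seeding, and the matrix layout are illustrative assumptions of this sketch, not the reference implementation.</p>
        <preformat>
# Minimal sketch of GraphPOPE-geodesic (Algorithm 1) using NetworkX.
# Assumptions: normalization 1/(s + 1) with 0 for unreachable nodes,
# stochastic anchor sampling; names are illustrative only.
import random

import networkx as nx
import numpy as np

def graphpope_geodesic(G: nx.Graph, k: int, seed: int = 0) -> np.ndarray:
    """Return the |V| x k normalized geodesic distance matrix D."""
    nodes = list(G.nodes())
    anchors = random.Random(seed).sample(nodes, k)  # stochastic sampler
    D = np.zeros((len(nodes), k))
    for j, anchor in enumerate(anchors):
        # Single-source shortest path lengths from the anchor: O(V + E).
        lengths = nx.single_source_shortest_path_length(G, anchor)
        for i, v in enumerate(nodes):
            if v in lengths:  # unreachable nodes keep distance 0
                D[i, j] = 1.0 / (lengths[v] + 1.0)
    return D  # concatenated column-wise with the node feature matrix
        </preformat>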
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Biased Anchor Node Sampling</title>
        <p>The vanilla GraphPOPE (Algorithm 1) avoids bias through stochastic sampling of anchor nodes.
This approach to sampling has a potential drawback of sampling less influential nodes. We
introduce biased anchor node sampling based on node centrality which replaces the stochastic
sampler in Algorithm 1 and alleviates this drawback.</p>
        <p>In this algorithm (see Appendix A, Algorithm 2: Biased Sampler), centrality scores are derived
for all nodes and the highest ranking nodes are selected as anchor nodes. This extension upon
the vanilla GraphPOPE algorithm aims to increase and stabilize the amount of topological
information encoded in the distance embeddings by selecting nodes with higher centrality.</p>
        <p>Algorithm 1 GraphPOPE-geodesic
Input: Graph G(V, ℰ); Sampling amount k; Distance function q
Output: Normalized geodesic distance matrix D of size |V| × k
1: Stochastically sample k anchor nodes u from G
2: for v_i ∈ V do
3:    Embedding vector v_i[j] ← q(v_i, u_j) for every anchor node u_j
4:    D[i] ← v_i
5: end for
(Index i enumerates v ∈ V; in line 3, each normalized distance is inserted into vector v_i at index j.)</p>
        <p>
          The geodesic distance matrix generation of Algorithm 1 employs a single-source shortest
path algorithm for distance function q with time complexity O(V + E) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. With this function
being utilized for every combination of sampled anchor node u_j with every node v_i ∈ V, this
results in an overall complexity of O(k(V(V + E))). This can be simplified to a notation of
O(k(V² + VE)), which is close to a worst-case complexity of O(V⁴) for a densely connected,
directed graph (E = V²) with k = V. This is nevertheless an extreme scenario, divergent
from most average cases. When treated as an all-pairs shortest path problem on a directed
graph and computed in parallel, this complexity can be improved to O(V(V + E)). An example
of such an approach is the Floyd-Warshall algorithm, which generates a complete mapping of
all shortest paths in a given graph. While this all-pairs approach to the shortest path derivation
reduces the complexity from approximately O(V³) to O(V²), it is substantially more costly with
regards to memory usage.
        </p>
          <p>
            Biased anchor node sampling through node centrality introduces additional time complexity.
Betweenness centrality would likely identify well connected, influential nodes most accurately
as it denotes the fraction of shortest paths that pass through a given node. Nodes with a high
betweenness centrality score would logically be more connected than those with a lower score,
and thus less likely to return 0 when fed into distance function  . As this centrality measure
requires shortest path computation, biased sampling has similar scalability issues to the shortest
path algorithms employed in Algorithm 1. As faster approximations for betweenness centrality,
other measures such as eigenvector, clustering coefficient, degree, farness and closeness
centrality are utilized for centrality function c [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ].
          </p>
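          <p>A corresponding sketch of the biased sampler follows, with NetworkX degree centrality standing in for centrality function c; any of the measures listed above could be substituted, and the function name is an assumption of this sketch.</p>
          <preformat>
# Sketch of biased anchor node sampling (Appendix A, Algorithm 2).
# Degree centrality is used here as a fast stand-in for centrality function c.
import networkx as nx

def biased_anchor_sample(G: nx.Graph, k: int, centrality=nx.degree_centrality):
    """Select the k highest-centrality nodes as anchor nodes."""
    scores = centrality(G)  # dict: node -> centrality score
    return sorted(scores, key=scores.get, reverse=True)[:k]
          </preformat>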
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Embedding Space Approximation</title>
        <p>
          In order to resolve the exponential scaling complexity of GraphPOPE-geodesic, we propose
an embedding space alternative. This algorithm (see Appendix A, Algorithm 3:
GraphPOPE-node2vec) utilizes node2vec to generate local neighborhood embeddings z_v for every node v in
G [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Anchor nodes can then be sampled stochastically or with a bias by K-means clustering
the embeddings into k clusters and utilizing the cluster centroids as pseudo anchor nodes u. Classical
K-means has a time complexity of O(n²), which can be reduced to O(n) through cluster shifting
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Moreover, the necessity of biased sampling is reduced for this algorithm, as edge traversal
is not utilized in the distance computation. As a result, information loss does not occur in
a similar fashion to the geodesic distance calculation. Distance between z_v and the pseudo anchor nodes u is then
calculated through parallelized matrix multiplications, which has a linear complexity of O(n).
Algorithms used for the distance function are Cosine similarity, Cosine distance and Euclidean
distance.
        </p>
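        <p>A minimal sketch of this embedding-space variant with biased sampling is given below, assuming a precomputed node2vec embedding matrix Z; the column-wise min-max normalization and all names are assumptions of this sketch.</p>
        <preformat>
# Sketch of GraphPOPE-node2vec (Appendix A, Algorithm 3) with biased sampling:
# K-means centroids of a precomputed node2vec embedding matrix Z act as
# pseudo anchor nodes; distances are batched matrix operations.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_distances

def graphpope_node2vec(Z: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Return the |V| x k normalized cosine distance matrix."""
    centroids = KMeans(n_clusters=k, random_state=seed).fit(Z).cluster_centers_
    D = cosine_distances(Z, centroids)  # shape (|V|, k)
    # Column-wise min-max normalization so all embeddings lie in [0, 1].
    return (D - D.min(axis=0)) / (D.max(axis=0) - D.min(axis=0) + 1e-12)
        </preformat>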
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>
        We evaluate position-aware node embeddings by performing node property predictions on two
benchmark datasets: Flickr and PubMed [
        <xref ref-type="bibr" rid="ref17 ref9">9, 17</xref>
        ]. In all experiments, predictions are performed on
nodes that are unseen during training but included in the preliminary GraphPOPE embedding
generation. This is thus considered a supervised, transductive learning setting. We used Weights
&amp; Biases for experiment tracking and hyperparameter optimization [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Sections 4.1 and 4.2
detail the experimental setup and data, respectively.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>
          Experiments were conducted on Ubuntu with RTX 2060 and RTX 3060 NVIDIA GPUs
and Intel i7-5820K and Intel Xeon Gold 6130 CPUs. Geodesic distances are derived with NetworkX
and all models are implemented through a combination of PyTorch, PyTorch Geometric and
a PyTorch Lightning abstraction layer [
          <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">19, 20, 21, 22</xref>
          ]. Our baseline model is a vanilla
GraphSAGE architecture, consisting of 2-3 SAGE convolutional layers with intermediate batch
normalization layers, the GraphSAGE mini-batch NeighborSampler, a hidden dimension size
of 256, a dropout rate of 0.5 and a cross-entropy loss function [
          <xref ref-type="bibr" rid="ref21 ref23 ref6">23, 6, 21</xref>
          ]. GraphPOPE models
used for experimentation are divided into two categories: geodesic and node2vec approaches.
Geodesic iterations consist of the vanilla GraphPOPE-stochastic version and its biased
alternatives, specifically betweenness, closeness, degree, clustering coefficient, eigenvector
centrality and PageRank-based versions. Node2vec implementations are normalized Cosine and
Euclidean distances supported by biased anchor node sampling through K-means clustering.
All experiments were conducted utilizing identical architecture and hyperparameters with the
exception of optimized batch sizes, the number of convolutional layers (2 or 3) and GraphPOPE
settings.
        </p>
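        <p>For concreteness, a sketch of the two-layer variant of this baseline in PyTorch Geometric follows; the layer sizes and dropout rate follow the text, while the activation and remaining details are assumptions of this sketch.</p>
        <preformat>
# Sketch of the vanilla GraphSAGE baseline: SAGE convolutions with
# intermediate batch normalization, hidden size 256 and dropout 0.5.
# Two-layer variant shown; the ReLU activation is an assumption.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class SAGEBaseline(torch.nn.Module):
    def __init__(self, in_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden)
        self.bn1 = torch.nn.BatchNorm1d(hidden)
        self.conv2 = SAGEConv(hidden, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.bn1(self.conv1(x, edge_index)))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)  # paired with cross-entropy loss
        </preformat>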
        <p>
          Hyperparameter optimization was conducted separately per dataset for
GraphPOPE-geodesic, GraphPOPE-node2vec and the vanilla GraphSAGE baseline to ensure
unbiased comparison. These optimizations are implemented through hyperparameter sweeps
with a Bayesian search method to optimize validation accuracy [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Sweeps are performed
over k anchor nodes, batch size and the number of convolutional layers ℓ.
        </p>
        <p>
          Experiments are run for a maximum of 300 epochs using an early stopping mechanism and a learning
rate monitor. A starting learning rate of 0.001 is employed, which is reduced upon stagnation of
validation loss. Finally, an Adam optimizer is used for all models and early stopping is triggered
if validation accuracy does not increase for 20 epochs [
          <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
          ].
        </p>
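        <p>In plain PyTorch terms, this optimization setup corresponds roughly to the following sketch; the scheduler choice and the placeholder model are assumptions beyond what the text states.</p>
        <preformat>
# Sketch of the training setup: Adam at lr 0.001, with the learning rate
# reduced when validation loss stagnates. Early stopping with patience 20
# is assumed to be handled by a PyTorch Lightning EarlyStopping callback.
import torch

model = torch.nn.Linear(500, 7)  # placeholder for the GraphSAGE model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
# per epoch: scheduler.step(val_loss)
        </preformat>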
        <p>The motivation behind this experimental setup is twofold. It allows us to assess whether graph
sampling GCNs can be improved through the introduction of additional topological information.
Moreover, it reduces the possibility of experimental bias, as the convolutional kernel already
introduces unfavourable conditions: convolutional kernels have access to topological information
of local node neighborhoods, rendering the information gain of GraphPOPE less potent.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data</title>
        <p>Two graph datasets of contrasting size and density are employed. Test partition nodes are unseen
during training, with a separate dataloader that is exclusively instantiated upon conclusion of
training.</p>
        <p>
          Flickr dataset We use the Flickr dataset introduced with GraphSAINT [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. In the dataset,
edges denote shared metadata among images such as locations or users. Labels are manually
merged tags and represent 7 entities such as animals, nature and humans. Features are
500-dimensional bag-of-words representations of SIFT image descriptors. The graph contains roughly 89,250
nodes with 899,756 edges connecting them. Similar to Zeng and colleagues, edge weights are
normalized in-degrees and a fixed-partition split is applied to the data, resulting in a 50/25/25
train/val/test split [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          Citation dataset The PubMed medical citation graph is used to assess performance on a
smaller, less challenging node property prediction dataset [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. In this directed graph, nodes
represent scientific publications regarding diabetes research from the PubMed database, edges
indicate outgoing citations and labels represent publication categories. The dataset consists of
19,717 nodes divided over 3 classes with 44,338 edges. We employ the FastGCN partition split,
resulting in 500 validation and 1000 test nodes, leaving the remaining 18,217 nodes for training
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We provide experimental results detailing accuracy metrics on the tasks, feature importance of
the k anchor nodes and hyperparameter sweeps. Reported accuracy scores are averages of 20 runs
in a range of fixed global seeds to ensure reproducibility. Table 1 details the results. Accuracy
scores are accompanied by their respective standard error and the highest accuracy values on
the benchmarking tasks are denoted in bold. Optimized hyperparameters are given in Appendix A,
Table 2.</p>
      <p>Table 1 shows an accuracy increment for all geodesic GraphPOPE models with respect
to the baseline on the Flickr dataset. Moreover, GraphPOPE-geodesic implementations that
employ biased sampling of anchor nodes generally experience more substantial accuracy gains.
Contrastingly, node2vec-based approximations yield no improved performance. Results on
PubMed indicate a homogeneous performance of 89% accuracy.</p>
      <p>Results on Flickr indicate that configurations with more anchor nodes generally experience a
more substantial increase in performance, with the exception of the node2vec approximations. Increasing the
number of anchor nodes from 32 to 256 for an unoptimized, geodesic GraphPOPE with closeness
centrality sampling raises accuracy from 51.98% to 53.05%. Moreover, validation accuracy yields
an 88% positive correlation with the number of anchor nodes over 90 hyperparameter sweeps
on the aforementioned configuration. PubMed experiments display a contrasting trend in which
the number of anchor nodes provides diminishing returns. On this smaller dataset, the number
of anchor nodes has a linear correlation of -25%, resulting in an optimal configuration with 64
anchor nodes to maximize validation accuracy.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>Our experimental results on the Flickr dataset display trends of accuracy gain for
GraphPOPE-enhanced architectures, with a more substantial improvement on larger datasets. Additionally,
accuracy gains improve upon biased anchor node sampling. The number of anchor nodes
correlates positively with validation accuracy. This suggests that additional topological
information can be beneficial to sampling-assisted Graph Convolutional Networks, specifically
for larger graphs, where the fraction of topological information beyond the convolutional kernel
is higher.</p>
      <p>GraphPOPE-node2vec proves an ineffective embedding-space alternative. Whereas
computational complexity is improved, model accuracy is not. This phenomenon might be explained
by the fact that convolutional kernels employed by GCNs already have access to the local
neighborhood information encoded by node2vec.</p>
      <p>Scalability is a recurring bottleneck for GNNs. GraphPOPE's time complexity stems mainly from its
geodesic distance calculation. Overcoming these scalability issues could prove beneficial for
graph learning on large datasets, given the substantial accuracy increments on such graphs.
Embedding-space approximations of the distance calculation could provide a solution for
the memory and computation time bottlenecks, removing the need for costly edge traversal
operations. Alternatively, models could be trained to approximate the distance function. Finally,
deep learning alternatives for the identification of influential nodes in the graph could accelerate
performance of biased anchor node sampling, potentially improving another time complexity
component.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>We introduced GraphPOPE, a novel preprocessing technique designed to improve topological
information retention and utilization in Graph Neural Networks. Our algorithm is applicable to
any Neural Network that has access to graph-structured data and operates through
position-aware node embedding generation. We demonstrate an accuracy gain on the graph benchmarking
datasets Flickr and PubMed with the application of our position-aware node embeddings, which
can be improved further at the cost of additional time complexity. Our experimental results
indicate that larger graphs benefit more from increased amounts of topological information
retention. Finally, we propose approximations for future research to reduce time complexity,
thus increasing applicability to real-world scenarios.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <article-title>Graph representation learning</article-title>
          ,
          <source>Synthesis Lectures on Artificial Intelligence and Machine Learning</source>
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Graph neural networks for social recommendation</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2019</year>
          )
          <fpage>417</fpage>
          -
          <lpage>426</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <article-title>A survey on graph neural networks for knowledge graph completion</article-title>
          (
          <year>2020</year>
          ). arXiv:2007.12374.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Thanapalasingam</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. van Berkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bloem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <article-title>Relational graph convolutional networks: A closer look</article-title>
          ,
          <source>CoRR abs/2107.10015</source>
          (
          <year>2021</year>
          ). arXiv:2107.10015.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Kipf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          ,
          <source>arXiv preprint arXiv:1609.02907</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Inductive representation learning on large graphs</article-title>
          ,
          <source>Advances in Neural Information Processing Systems 2017-Decem</source>
          (
          <year>2017</year>
          )
          <fpage>1025</fpage>
          -
          <lpage>1035</lpage>
          .
          arXiv:1706.02216.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Veličković</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cucurull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Graph attention networks</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
          arXiv:1710.10903.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          Cluster-GCN:
          <article-title>An efficient algorithm for training deep and large graph convolutional networks</article-title>
          ,
          <source>Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          (
          <year>2019</year>
          )
          <fpage>257</fpage>
          -
          <lpage>266</lpage>
          .
          doi:10.1145/3292500.3330925. arXiv:1905.07953.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prasanna</surname>
          </string-name>
          , R. Kannan,
          <article-title>GraphSAINT: Graph sampling based inductive learning method</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2019</year>
          ).
          arXiv:1907.04931.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Position-aware graph neural networks</article-title>
          ,
          <source>36th International Conference on Machine Learning, ICML 2019</source>
          (
          <year>2019</year>
          )
          <fpage>12372</fpage>
          -
          <lpage>12381</lpage>
          . arXiv:1906.04817.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zitnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Catasta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Open graph benchmark: Datasets for machine learning on graphs</article-title>
          , arXiv preprint arXiv:2005.00687 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , node2vec:
          <article-title>Scalable feature learning for networks</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          ,
          <article-title>How powerful are graph neural networks?</article-title>
          , arXiv preprint arXiv:1810.00826 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhargava</surname>
          </string-name>
          , Grokking Algorithms:
          <article-title>An illustrated guide for programmers and other curious people</article-title>
          ,
          <source>Simon and Schuster</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meghanathan</surname>
          </string-name>
          ,
          <article-title>Alternatives to betweenness centrality: A measure of correlation coefficient</article-title>
          ,
          <year>2016</year>
          . doi:10.5121/csit.2016.61301.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pakhira</surname>
          </string-name>
          ,
          <article-title>A linear time-complexity k-means algorithm using cluster shifting</article-title>
          ,
          <year>2014</year>
          . doi:10.1109/CICN.2014.220.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <article-title>Revisiting semi-supervised learning with graph embeddings</article-title>
          ,
          <year>2016</year>
          .
          arXiv:1603.08861.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Biewald</surname>
          </string-name>
          ,
          <article-title>Experiment tracking with weights and biases, 2020</article-title>
          . URL: https://www.wandb.com/, software available from wandb.com.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hagberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Swart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Schult</surname>
          </string-name>
          ,
          <article-title>Exploring network structure, dynamics, and function using NetworkX</article-title>
          ,
          <source>Technical Report</source>
          , Los Alamos National Lab.
          <source>(LANL)</source>
          , Los Alamos,
          <source>NM (United States)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          , G. Chanan,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
          PyTorch:
          <article-title>An imperative style, high-performance deep learning library</article-title>
          (
          <year>2019</year>
          )
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
          . URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Lenssen</surname>
          </string-name>
          ,
          <article-title>Fast graph representation learning with pytorch geometric</article-title>
          , arXiv preprint arXiv:
          <year>1903</year>
          .
          <volume>02428</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>W.</given-names>
            <surname>Falcon</surname>
          </string-name>
          , et al.,
          PyTorch Lightning
          ,
          <source>GitHub</source>
          . Note: https://github.com/PyTorchLightning/pytorch-lightning
          <volume>3</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ioffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <article-title>Batch normalization: Accelerating deep network training by reducing internal covariate shift</article-title>
          ,
          <source>CoRR abs/1502</source>
          .03167 (
          <year>2015</year>
          ). URL: http://arxiv.org/abs/1502.03167.
          arXiv:1502.03167.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Fixing weight decay regularization in adam</article-title>
          ,
          <source>CoRR abs/1711</source>
          .05101 (
          <year>2017</year>
          ). URL: http://arxiv.org/abs/1711.05101.
          arXiv:1711.05101.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          , T. Ma, C. Xiao, FastGCN:
          <article-title>Fast learning with graph convolutional networks via importance sampling</article-title>
          , CoRR abs/1801.10247 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1801.10247. arXiv:1801.10247.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Appendix</surname>
          </string-name>
          <article-title>Algorithm 2 Biased sampler Input: Graph  (</article-title>
          , ℰ ) ; Sampling amount  ; Centrality function  Output: Anchor nodes    ← ( ) 2: Sample  highest   anchor nodes   from
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>