<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>EvalNE: A Framework for Evaluating Network Embeddings on Link Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandru Mara</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeffrey Lijffijt</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tijl De Bie</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Network embedding (NE) methods aim to learn low-dimensional representations of network nodes as vectors, typically in Euclidean space. These representations are then validated on a variety of downstream prediction tasks. Link prediction is one of the most popular choices for assessing the performance of NE methods. However, the complexity of link prediction requires a carefully designed evaluation pipeline in order to provide consistent, reproducible and comparable results. We argue this has not been considered sufficiently in many recent works. The main goal of this paper is to overcome difficulties associated with evaluation pipelines and reproducibility of results. We introduce EvalNE, an evaluation framework to transparently assess and compare the performance of NE methods on link prediction. EvalNE provides automation and abstraction for tasks such as hyper-parameter tuning, model validation, edge sampling and computation of edge embeddings. The framework integrates efficient procedures for edge and non-edge sampling and can be used to easily evaluate any off-the-shelf embedding method. The framework is freely available as a Python toolbox. Finally, demonstrating the usefulness of EvalNE in practice, we conduct an empirical study in which we try to replicate and analyse the experimental sections of several influential papers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Network embedding (NE) methods map nodes in the network to d-dimensional vectors, typically in Euclidean space. The obtained representations are then used as features for a variety of tasks such as visualization, multi-label classification, clustering or link prediction.</p>
      <p>Unfortunately, the practical performance of most NE methods is poorly understood and not comparable due to inconsistent evaluation procedures. In this paper, we focus on a number of difficulties specific to the evaluation of NE methods for link prediction. Link prediction is particularly challenging, as the evaluation involves a number of design choices that can confound the results and are prone to errors. For example, the implicit assumption is that the input graph is not complete and the purpose is to predict the missing edges as accurately as possible. To evaluate the performance of a NE method for link prediction, one thus needs an (incomplete) training graph along with a (more) complete version of that graph for testing.</p>
      <p>Much research has been devoted to determining the best approach to generate these training graphs [8, 18, 29], with theoretical and empirical evidence suggesting how link prediction methods can be evaluated fairly.</p>
      <p>
        A related problem is the fact that, in addition to the `positive' train and test edges, in most cases `negative' train and test edges (also called non-edges) are required. Sometimes these are used also to derive the embedding, while in other cases they are used only to train a classifier that predicts links. These sets of non-edges can be selected according to different strategies [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and can be of various sizes.
      </p>
      <p>
        Furthermore, many NE methods only provide node
embeddings. From these, edge embeddings need to
be derived prior to performing predictions. There are
several approaches for deriving edge embeddings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
and the selection of the edge embedding approach has
a strong impact on the performance [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        The metrics used to report the accuracy of different methods also vary widely, from, e.g., AUC-ROC [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
to precision-recall [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] to precision@k [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
      <p>Finally, it appears to be common practice in recent
literature to use recommended default settings for
existing methods, while tuning the hyper-parameters for the
method being introduced. Especially when the
recommended default settings were informed by experiments
on other graphs than those used in the study at hand,
this can paint an unduly unfavorable picture.</p>
      <p>
        Contributions. In this paper we propose EvalNE, a framework that simplifies the complex and time-consuming process of evaluating NE methods for link prediction. EvalNE automates many parts of the evaluation process: hyper-parameter tuning, selection of train and test edges, negative sampling, and more. The framework: (1) Implements guidelines from research in the area of link prediction evaluation and sampling [
        <xref ref-type="bibr" rid="ref18 ref29 ref8">8, 18, 29</xref>
        ]. (2) Includes (novel) efficient edge and non-edge sampling algorithms. (3) Provides the most widely used edge embedding methods. (4) Evaluates the scalability and accuracy of methods through wall clock time and a wide range of fixed-threshold metrics and threshold curves. (5) Integrates in a single operation the evaluation of any set of NE methods, coded in any language, on any number of networks. (6) Finally, EvalNE ensures reproducibility and comparability of evaluations and results through shareable configuration files.
      </p>
      <p>The remainder of this paper is organized as follows.
In Section 2 we give a brief overview of related work.
Section 3 presents the proposed evaluation framework.
Our experiments reproducing the experimental setups
of several papers and empirical analysis of the proposed
edge set selection strategy are reported in Section 4.
Finally, Section 5 concludes this paper.</p>
      <p>The open-source toolbox is freely available at
https://github.com/Dru-Mara/EvalNE and the
documentation and examples at ReadTheDocs: https://evalne.readthedocs.io/en/latest/index.html.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Our work is related to the evaluation of link prediction, which has been studied in several recent works, e.g., [
        <xref ref-type="bibr" rid="ref18 ref20 ref29 ref8">8, 18, 20, 29</xref>
        ]. These studies focus on the impact of different train-set sampling strategies, negative sampling, and fair evaluation criteria.
      </p>
      <p>
        Link prediction as a means to evaluate the representations learned by a NE algorithm was first introduced in the pioneering work of Grover et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The survey work
of Zhang et al. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] points out the importance of link
prediction as an application of network representation
learning. The authors also signal the inconsistencies in the evaluation of different NE approaches and conduct an empirical analysis of many of these methods on a wide range of datasets. Their empirical study, however, only focuses on vertex classification and clustering. The
importance of standard evaluation frameworks as tools
to bridge the gap between research and application of
NEs is discussed in Hamilton et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        To the best of our knowledge, only two
frameworks for the evaluation of NE methods currently exist.
OpenNE is a recently proposed toolbox for evaluating NE
methods on multi-label classification. The toolbox also
includes implementations of several state-of-the-art
embedding methods. GEM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a similar framework which also implements a variety of embedding methods and includes basic evaluation on multi-label classification, visualization and link prediction tasks. These frameworks, however, are focused on the implementations of embedding methods rather than on the evaluation pipeline. Furthermore, these libraries are limited to the NE methods provided by the authors, or require new implementations that comply with pre-defined interfaces.</p>
    </sec>
    <sec id="sec-2b">
      <title>3 The EvalNE framework</title>
      <p>In this section we discuss the key aspects of EvalNE.
The framework has been designed as a pipeline of
interconnected and interchangeable building blocks, as
illustrated in Figure 1. These blocks have specific functions, such as providing sets of train and test edges or computing edge embeddings from given node embeddings. The modular structure of our framework simplifies code maintenance and the addition of new features, and allows for flexible model evaluation. The framework can be used to evaluate methods providing node embeddings, edge embeddings, or similarity scores (we include in this category the link prediction heuristics). Next, we describe the core functions as well as the other building blocks that constitute EvalNE, and provide some details regarding the software design.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3.1 Evaluation framework</title>
      <p>The core building blocks of our framework are the data split and the model evaluation. These blocks constitute the most basic pipeline for assessing the quality of link predictions. Before describing them, we introduce some notation.</p>
      <p>We represent a directed weighted network as G = (V, E, W) with vertex set V = {1, ..., N}, edge set E ⊆ V × V and weight matrix W ∈ ℝ^(N×N). Edges are represented as ordered pairs e = (u, v) ∈ E with weights w_e ∈ [0, 1). Etrain and Etest denote the training and testing edge sets. We represent the embedding of the network nodes into a d-dimensional space as X = (x_1, x_2, ..., x_N), where X ∈ ℝ^(N×d).</p>
      <p>3.1.1 Data split. As pointed out in Section 1, in order to perform link prediction on a given input graph G, sets of train and test edges are required. The set of train edges is generally required to span all nodes in G and to induce a training graph Gtrain with a single connected component, because the embeddings of independent components will be far away from and unrelated to each other.</p>
      <p>Most studies resort to a naive algorithm to derive a connected Gtrain. The procedure iteratively removes edges from the input graph until the required number of train edges remains, and the removed edges are used as test samples. In each iteration, the connectedness of the graph needs to be checked, and an edge is only removed if this does not cause the graph to become disconnected. This requirement is generally satisfied by running a Breadth First Search (BFS) on the graph after each edge removal, which is a costly operation (O(|V| + |E|)).</p>
      <p>Integrated in our evaluation framework, we include a novel algorithm to perform the train-test splits which, as we will show in Section 4, is orders of magnitude faster yet equally simple. Our algorithm also accounts for the fact that the training graph Gtrain must span all nodes in G and contain a single connected component. Given as input an undirected graph G = (V, E, W) and a target number of train edges |Etrain|, the proposed algorithm to split the train and test edges proceeds as follows:
1. Obtain a uniform spanning tree ST of G.
2. Initialize the set of training edges Etrain to all edges in ST.
3. Select edges uniformly at random without replacement from the remaining edges E \ Etrain.</p>
      <p>We select a spanning tree uniformly at random from the set of all possible ones using Broder's algorithm [<xref ref-type="bibr" rid="ref2">2</xref>]:
1. Select a random vertex s of G and start a random walk on the graph until every vertex is visited. For each vertex i ∈ V \ {s}, collect the edge e = (j, i) that corresponds to the first entrance to vertex i. Let T be this collection of edges.
2. Output the set T.</p>
      <p>The complexity of the uniform spanning tree generation method is O(n log n) in expectation [<xref ref-type="bibr" rid="ref2">2</xref>]. The same bound applies to the proposed train-test split algorithm, as the addition of random edges to the train set can be done efficiently, in O(|Etrain|) time. A more efficient algorithm for constructing a uniform spanning tree of a given graph exists. This method, named Wilson's algorithm [<xref ref-type="bibr" rid="ref28">28</xref>], will be included in future versions of EvalNE.</p>
      <p>For directed graphs G, we first construct an equivalent undirected version of G by adding reciprocals for every edge in E. We then run the same algorithm described above on this undirected graph and include in the train set only those edges present in the initial directed graph G. This method results in a weakly connected training graph spanning the same nodes as the original G.</p>
      <p>In addition to train and test edges, sets of train and test non-edges (also referred to as negative samples) are required in order to evaluate link prediction. These are edges between pairs of vertices u, v such that e = (u, v) ∉ E. The proposed toolbox can compute these non-edges according to either the open world or the closed world assumption. The two strategies only differ in the selection of the train non-edges. Under the open world assumption, train non-edges are selected such that they are not in Etrain; thus, this strategy allows overlap between train non-edges and test real edges. Under the closed world assumption, we consider the non-edges to be known a priori and therefore select them so that they are neither in Etrain nor in Etest.</p>
      <p>The number of train and test edges and non-edges are user-controlled parameters. The minimum number of train edges is determined by the size of the spanning tree, and the combined sets of non-edges have a trivial upper limit of (N × N) − |E|. However, for the train set size, fractions between 50% and 90% of the total edges E are recommended. For values below 50%, the resulting training graph will often not preserve the properties of the original graph G, as shown in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
3.1.2 Evaluation The proposed framework can
evaluate the scalability, parameter sensitivity and accuracy
of embedding methods. We assess the scalability
directly by measuring the wall clock time. The
performance of the methods can be easily reported for
different values of embedding dimensionality and
hyperparameter values. Finally, the link prediction accuracy
is reported using two types of metrics: fixed-threshold metrics and threshold curves. Fixed-threshold metrics summarize the performance of the methods in single
values. The framework provides the following measures:
confusion matrix (TP, FN, FP, TN), precision, recall,
fallout, miss, accuracy, F-score and AUC-ROC.
Threshold curves present the performance of the methods for
a range of threshold values between 0 and 1. EvalNE
includes precision-recall curves [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and receiver operating characteristic (ROC) curves [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The framework also recommends the
most suitable metrics based on the evaluation setup.
      </p>
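      <p>The train-test split of Section 3.1.1 and the open-world negative sampling can be sketched in a few lines. This is an illustrative plain-Python sketch, not EvalNE's actual implementation; the function names and the dict-based graph representation are our own assumptions.</p>

```python
import random
from collections import defaultdict

def broder_spanning_tree(adj, seed=None):
    """Uniform spanning tree via Broder's random-walk algorithm: start
    anywhere, walk until all nodes are visited, and keep the first-entry
    edge of every node except the start vertex."""
    rng = random.Random(seed)
    nodes = list(adj)
    current = rng.choice(nodes)
    visited = {current}
    tree = set()
    while len(visited) < len(nodes):
        nxt = rng.choice(adj[current])
        if nxt not in visited:
            visited.add(nxt)
            tree.add(frozenset((current, nxt)))  # first-entry edge of nxt
        current = nxt
    return tree

def train_test_split(edges, train_frac=0.5, seed=None):
    """Split undirected edges so that the train graph stays connected:
    seed E_train with a uniform spanning tree, then top it up with random
    remaining edges; everything left over becomes the test set."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    train = broder_spanning_tree(adj, seed)
    n_train = max(len(train), int(train_frac * len(edges)))
    rest = [frozenset(e) for e in edges if frozenset(e) not in train]
    rng.shuffle(rest)
    while len(train) < n_train and rest:
        train.add(rest.pop())
    return train, set(rest)

def sample_non_edges(nodes, edges, n, seed=None):
    """Open-world negative sampling: draw node pairs uniformly at random
    and keep those not present in the full graph. Assumes n does not
    exceed the number of available non-edges."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    nodes = list(nodes)
    non_edges = set()
    while len(non_edges) < n:
        u, v = rng.sample(nodes, 2)
        if frozenset((u, v)) not in edge_set:
            non_edges.add(frozenset((u, v)))
    return non_edges
```

      <p>Because the spanning tree touches every node, the returned training graph is connected and spans all of G by construction, without any BFS connectivity checks.</p>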
    </sec>
    <sec id="sec-4">
      <title>3.2 Baselines and building blocks</title>
      <p>
        In addition to the core building blocks, and in order to extend the evaluation process to different types of embedding methods, our framework also provides data preprocessing, data manipulation, and node-to-edge embedding functions. Moreover, EvalNE integrates a standard binary classification technique which can be used, for example, to obtain link predictions from edge embeddings. Furthermore, the toolbox contains a series of built-in link prediction heuristics which can be used as baselines.</p>
      <p>3.2.1 Preprocessing. The toolbox offers a variety of functions to load, store, and manipulate networks. These include methods to prune nodes based on their degree, remove self-loops, relabel nodes, obtain sets of specific types of edges, restrict networks to their main connected components and obtain common network statistics. The preprocessing functions of EvalNE build on top of, and can be used in combination with, those provided by other mainstream software packages (e.g. NetworkX [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
      <p>3.2.2 Link prediction heuristics. A training graph Gtrain = (V, Etrain, W), spanning the same set of vertices as the initial graph G but containing only the edges in the train set, is provided as input to the link prediction heuristics. Depending on the input graph type, we use one of the following definitions. For undirected graphs we use the definitions presented below, where Γ(u) denotes the neighbourhood of a node u. For directed graphs we restrict our analysis to either the in-neighbourhood Γ_i(u) or the out-neighbourhood Γ_o(u) of the nodes.</p>
    </sec>
    <sec id="sec-5">
      <title>Common neighbours (CN):</title>
    </sec>
    <sec id="sec-6">
      <title>Jaccard coe cient (JC):</title>
      <p>CN(u, v) = |Γ(u) ∩ Γ(v)| and JC(u, v) = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|</p>
    </sec>
    <sec id="sec-7">
      <title>Adamic-Adar index (AA):</title>
      <p>AA(u, v) = Σ_{w ∈ Γ(u) ∩ Γ(v)} 1 / log |Γ(w)|</p>
    </sec>
    <sec id="sec-8">
      <title>Resource allocation index (RAI):</title>
      <p>RAI(u, v) = Σ_{w ∈ Γ(u) ∩ Γ(v)} 1 / |Γ(w)|</p>
      <p>PA(u, v) = |Γ(u)| · |Γ(v)| and Katz(u, v) = Σ_{l=1}^{∞} β^l |paths(u, v | l)|</p>
    </sec>
    <sec id="sec-9">
      <title>Preferential attachment (PA):</title>
    </sec>
    <sec id="sec-10">
      <title>Katz:</title>
      <p>For Katz, the parameter 2 [0; 1] is a damping
factor and j (u; vjl)j represents the number of paths of
length l from node u to node v.</p>
      <p>In addition to these methods, a random prediction
model is provided for reference. This method simply
outputs a uniform random value in [0, 1] as the likelihood of a link between any pair of given vertices (u, v).</p>
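      <p>For concreteness, the undirected heuristics above can be written down directly. This is an illustrative plain-Python sketch, not EvalNE's implementation: the graph G is assumed to be a dict mapping each node to the set of its neighbours, Katz is truncated at a maximum path length, and Adamic-Adar skips degree-1 common neighbours to avoid dividing by log 1 = 0.</p>

```python
import math

def cn(G, u, v):
    """Common neighbours: |Γ(u) ∩ Γ(v)|."""
    return len(G[u] & G[v])

def jc(G, u, v):
    """Jaccard coefficient: |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|."""
    union = G[u] | G[v]
    return len(G[u] & G[v]) / len(union) if union else 0.0

def aa(G, u, v):
    """Adamic-Adar: sum of 1 / log |Γ(w)| over common neighbours w.
    Degree-1 neighbours are skipped (log 1 = 0 would divide by zero)."""
    return sum(1.0 / math.log(len(G[w]))
               for w in G[u] & G[v] if len(G[w]) > 1)

def rai(G, u, v):
    """Resource allocation index: sum of 1 / |Γ(w)| over common neighbours."""
    return sum(1.0 / len(G[w]) for w in G[u] & G[v])

def pa(G, u, v):
    """Preferential attachment: |Γ(u)| * |Γ(v)|."""
    return len(G[u]) * len(G[v])

def katz(G, u, v, beta=0.005, max_len=4):
    """Truncated Katz: sum over l of beta^l times the number of
    length-l walks from u to v (adjacency-power definition)."""
    walks = {u: 1}  # walks[w] = number of length-l walks from u to w
    score = 0.0
    for l in range(1, max_len + 1):
        nxt = {}
        for w, c in walks.items():
            for x in G[w]:
                nxt[x] = nxt.get(x, 0) + c
        walks = nxt
        score += beta ** l * walks.get(v, 0)
    return score
```

      <p>All of these score a candidate node pair directly from the training graph, which is why such heuristics need no edge embedding or classifier stage.</p>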
    </sec>
    <sec id="sec-11">
      <title>3.2.3 Node to edge embeddings</title>
      <p>Unlike for the above-mentioned LP heuristics, whose output can be directly used for link prediction, for NE methods this is generally not the case. Most implementations of NE methods produce only node embeddings. Thus, an additional step of learning edge embeddings is required in order to predict links via binary classification.</p>
      <p>The edge representations are typically derived in an unsupervised fashion. For example, to derive the edge embedding y_(u,v) of edge e = (u, v), we apply a binary operator ∘ over the embeddings of nodes u and v, i.e. x_u and x_v respectively: y_(u,v) = x_u ∘ x_v.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] the authors propose the following alternatives for the ∘ operator, which we include in our evaluation framework. Any other user-defined operators can also be easily integrated.
      </p>
    </sec>
    <sec id="sec-12">
      <title>Average (Avg.):</title>
      <p>(x_u ∘ x_v)_i = (x_{u,i} + x_{v,i}) / 2. Hadamard (Had.): (x_u ∘ x_v)_i = x_{u,i} · x_{v,i}</p>
    </sec>
    <sec id="sec-13">
      <title>Weighted L1:</title>
      <p>(x_u ∘ x_v)_i = |x_{u,i} − x_{v,i}|, the element-wise version of ||x_u − x_v||_1. Weighted L2: (x_u ∘ x_v)_i = |x_{u,i} − x_{v,i}|^2, the element-wise version of ||x_u − x_v||_2^2.</p>
      <p>Users can create new configuration files defining their own evaluations (e.g. by adding new methods or different networks) and share them in order to ensure the reproducibility of results.</p>
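      <p>Element-wise, the four operators can be sketched as follows (a plain-Python illustration over embedding vectors given as lists; not EvalNE's vectorized code):</p>

```python
def avg(xu, xv):
    """Average: y_i = (x_{u,i} + x_{v,i}) / 2"""
    return [(a + b) / 2 for a, b in zip(xu, xv)]

def hadamard(xu, xv):
    """Hadamard: y_i = x_{u,i} * x_{v,i}"""
    return [a * b for a, b in zip(xu, xv)]

def weighted_l1(xu, xv):
    """Weighted L1: y_i = |x_{u,i} - x_{v,i}|"""
    return [abs(a - b) for a, b in zip(xu, xv)]

def weighted_l2(xu, xv):
    """Weighted L2: y_i = (x_{u,i} - x_{v,i})^2"""
    return [(a - b) ** 2 for a, b in zip(xu, xv)]
```

      <p>All four are symmetric in u and v, so the resulting edge embeddings are well defined for undirected edges.</p>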
      <p>When used as an API, the framework exposes a modular design with blocks providing independent and self-contained functionalities which can be easily exploited. The user interacts with the library through an evaluator object, which integrates all the building blocks and orchestrates the evaluation pipelines. As an API, EvalNE provides users with access to more advanced capabilities and more flexible evaluation workflows.</p>
    </sec>
    <sec id="sec-14">
      <title>4 Experiments</title>
      <p>
        In this section we seek to demonstrate the simplicity and usefulness of our evaluation framework. To this end we have selected four papers from the NE literature, namely Node2vec, PRUNE, CNE and SDNE, and replicated their experimental setups. We have also conducted a series of experiments in which we compare our edge set selection algorithm to the naive approach. We performed these comparisons in terms of both scalability and link prediction accuracy. All experiments were run on a single machine equipped with an Intel Core i5 processor and 16GB of RAM.
      </p>
      <p>3.2.4 Binary classification. Most NE methods (e.g., [<xref ref-type="bibr" rid="ref10 ref13 ref15 ref7">7, 10, 13, 15</xref>]) rely on a logistic regression classifier to predict link probabilities from edge embeddings. In EvalNE we include logistic regression with 10-fold cross validation. The framework, however, is flexible and allows for any binary classifier to be used.</p>
      <p>3.3 Software design. The EvalNE framework is provided as a Python toolbox, compatible with Python 2 and Python 3, which can be easily installed and used on Linux, MacOS, and Microsoft Windows. The toolbox depends only on a small number of popular open-source Python packages, and the coding style and code documentation comply with the PEP8 and numpy docstring standards, respectively. The documentation provided includes instructions on installation and use, as well as examples of high-level and low-level use and of integration with existing code.</p>
      <p>EvalNE can be used both as a command line tool and as an API. As a command line tool, the framework requires as input a configuration file defining a complete experimental setup, from the methods to evaluate to the networks, edge splits and parameters to be tuned. For convenience, several pre-filled configuration files which reproduce the experimental sections of influential papers on NE are provided as examples.</p>
      <p>4.1 Experimental setups. Next, we present the main characteristics of the experimental settings we have replicated and some important remarks.</p>
      <p>Node2vec [<xref ref-type="bibr" rid="ref10">10</xref>]: In this work the authors evaluate the following embedding methods and link prediction heuristics: Node2vec, DeepWalk [<xref ref-type="bibr" rid="ref22">22</xref>], LINE [<xref ref-type="bibr" rid="ref23">23</xref>], Spectral Clustering [<xref ref-type="bibr" rid="ref24">24</xref>], CN, JC, AA, and PA. Experiments are performed on the Facebook [<xref ref-type="bibr" rid="ref17">17</xref>], PPI [<xref ref-type="bibr" rid="ref10">10</xref>], and AstroPh [<xref ref-type="bibr" rid="ref17">17</xref>] datasets with 50-50 train-test splits. The same number of edges and non-edges are used throughout the experiment. The results are reported in terms of AUC-ROC, and the edge embedding methods used are average, Hadamard, weighted L1 and weighted L2.</p>
      <p>PRUNE [<xref ref-type="bibr" rid="ref15">15</xref>]: This experimental setup includes Node2vec, DeepWalk, LINE, SDNE, PRUNE and NRCL [<xref ref-type="bibr" rid="ref27">27</xref>]. The networks used are HepPH [<xref ref-type="bibr" rid="ref17">17</xref>], FB-wallpost [<xref ref-type="bibr" rid="ref25">25</xref>] and Webspam, and the train-test fraction is 0.8, with similar sizes for the non-edge sets. The node-to-edge embedding operator used is not reported by the authors, and the results are presented in terms of AUC-ROC. In this setting, the networks used are directed.</p>
      <p>CNE [<xref ref-type="bibr" rid="ref13">13</xref>]: The evaluation setup in this case is similar to that of Node2vec, with the following differences: the CNE and Metapath2vec [<xref ref-type="bibr" rid="ref5">5</xref>] methods are also evaluated; BlogCatalog [<xref ref-type="bibr" rid="ref30">30</xref>], Wikipedia [<xref ref-type="bibr" rid="ref17">17</xref>] and StudentDB are included in the list of evaluated networks; and the results are only reported for the Hadamard operator. It should be noted that two authors of this paper are also authors of CNE, and to some degree the development of CNE inspired this work. However, the code used for evaluation in that paper was completely separate.</p>
      <p>SDNE [<xref ref-type="bibr" rid="ref26">26</xref>]: The authors report in this case experiments using the following methods: DeepWalk, LINE, SDNE, GraRep [<xref ref-type="bibr" rid="ref3">3</xref>], LapEig [<xref ref-type="bibr" rid="ref1">1</xref>] and CN. The link prediction experiments are performed on the GR-QC dataset [<xref ref-type="bibr" rid="ref17">17</xref>] with a 0.85 train-test fraction. The results are reported only for the Hadamard operator, in terms of precision@k for a collection of values between 2 and 10000. In this setup, the number of train non-edges is the same as that of train edges, while all the remaining non-edges in the graph are used for testing.</p>
      <p>In our experiments we have replicated these settings with the following exceptions: (1) for Node2vec we did not include Spectral Clustering, and (2) for PRUNE we could not obtain NRCL and lacked the computational resources to evaluate the NE methods on the Webspam dataset. In all cases we report the same metrics as the original authors and average the results over five independent executions. We used the open-world assumption for the non-edges, logistic regression for binary classification, and tuned the same hyper-parameters. All experiments were run using specific configuration files created for each setting, which will be made public.</p>
      <p>4.2 Evaluation results. For each of the experimental settings reproduced, we present a table containing the original values reported in the paper and, in parentheses, the difference between our result and these values (Tables 3-6). Positive values in parentheses thus indicate that our results are higher than the ones reported in the original papers by that margin, while negative values indicate the opposite.</p>
      <p>The results obtained show that our experiments align fairly well with those reported in the CNE and PRUNE papers. The only exception is the result of Metapath2vec on the Facebook dataset, which substantially differs from the CNE paper. For Node2vec and SDNE, the differences are larger and occasionally severe (differences over 0.15 are marked in bold in all tables). Possible explanations for these inconsistencies are the use of different implementations of the NE methods, different unreported default parameters, or the use of parallelization. In addition, we have also observed important differences when computing the AUC-ROC directly from class labels or from class probabilities. In order to reproduce the CNE experiments we used class probabilities, while for Node2vec we used class labels, as seems to have been done by the original authors. These results illustrate the need to: (a) create reproducible pipelines for experiments, and (b) report specifics about the parameter settings and the precise implementations used in the evaluation.</p>
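      <p>The labels-versus-probabilities issue mentioned above is easy to reproduce. The following self-contained sketch is our own illustration (the scores are made up): AUC-ROC equals the probability that a random positive example is ranked above a random negative one, so binarizing the scores before computing it discards ranking information and generally changes the value.</p>

```python
def auc_roc(y_true, y_score):
    """AUC-ROC as the probability that a random positive example is
    scored above a random negative one (ties count 1/2). O(P*N),
    which is fine for illustration purposes."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
probas = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]          # classifier scores
labels = [1 if p >= 0.5 else 0 for p in probas]  # binarized predictions

auc_from_probas = auc_roc(y_true, probas)  # uses the full ranking
auc_from_labels = auc_roc(y_true, labels)  # coarser, generally different
```

      <p>Reporting which of the two variants was used is therefore essential for reproducibility.</p>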
      <p>Regarding the implementations, we evaluated the LP heuristics included in EvalNE; original code by the authors for DeepWalk1, Node2vec2, LINE3, PRUNE4, CNE5, and Metapath2vec6; and, for the remaining methods, the implementations in the OpenNE7 library. Table 1 contains the main characteristics of all the networks used. Network sizes are those corresponding to the main (weakly) connected components of each graph.</p>
      <p>4.3 Edge sampling. We compared the proposed edge and non-edge sampling strategy to the naive approach in terms of accuracy and scalability. For the accuracy experiment we selected the Facebook, PPI, and AstroPh datasets, and performed link predictions with the CN, JC, AA, and PA heuristics. We collected all the metrics available in EvalNE and computed, for each metric, the absolute difference between the naive and the proposed approach.</p>
      <p>The recall presents the highest deviation, so we show these results in Table 2. Although the recall differs by up to 0.0393, in this case it is traded off against precision. The accuracy differs only by 0.0154. For the AUC-ROC, one of the most widely used metrics, the maximum deviation is 0.015, reached on the AstroPh dataset by the PA heuristic.</p>
      <p>1 https://github.com/phanein/deepwalk
2 https://github.com/aditya-grover/node2vec
3 https://github.com/tangjianpku/LINE
4 https://github.com/ntumslab/PRUNE
5 https://bitbucket.org/ghentdatascience/cne/
6 https://ericdongyx.github.io/metapath2vec/m2v.html
7 https://github.com/thunlp/OpenNE</p>
      <p>AUC-ROC scores for the Node2vec experimental setting (differences with respect to the originally reported values in parentheses):
Facebook: CN 0.81 (+0.14); JC 0.88 (+0.04); AA 0.83 (+0.13); PA 0.71 (+0.04); Avg.: DeepWalk 0.72 (-0.01), LINE 0.70 (-0.03), node2vec 0.73 (-0.01); Had.: DeepWalk 0.97 (-0.03), LINE 0.95 (-0.06), node2vec 0.97 (0.0); W-L1: DeepWalk 0.96 (-0.01), LINE 0.95 (-0.33), node2vec 0.96 (-0.01); W-L2: DeepWalk 0.96 (-0.01), LINE 0.95 (-0.33), node2vec 0.96 (0.0).
PPI: CN 0.71 (+0.06); JC 0.70 (+0.04); AA 0.71 (+0.06); PA 0.67 (+0.13); Avg.: DeepWalk 0.69 (+0.08), LINE 0.63 (+0.12), node2vec 0.75 (-0.01); Had.: DeepWalk 0.74 (-0.2), LINE 0.72 (-0.01), node2vec 0.77 (-0.17); W-L1: DeepWalk 0.60 (+0.14), LINE 0.70 (-0.01), node2vec 0.63 (-0.03); W-L2: DeepWalk 0.61 (+0.14), LINE 0.71 (-0.02), node2vec 0.62 (-0.02).
AstroPh: CN 0.82 (+0.13); JC 0.81 (+0.12); AA 0.83 (+0.12); PA 0.70 (+0.08); Avg.: DeepWalk 0.71 (+0.01), LINE 0.65 (+0.14), node2vec 0.72 (-0.02); Had.: DeepWalk 0.93 (-0.12), LINE 0.89 (+0.05), node2vec 0.94 (-0.05); W-L1: DeepWalk 0.83 (+0.09), LINE 0.88 (-0.28), node2vec 0.85 (+0.03); W-L2: DeepWalk 0.83 (+0.09), LINE 0.89 (-0.27), node2vec 0.85 (+0.03).</p>
      <p>Across methods and datasets, the results obtained with the naive approach were slightly higher than those using our method.</p>
      <p>Regarding the scalability experiments, in Figure 2 we summarize the execution time in seconds of both methods on four different datasets. This experiment shows that the proposed method is several orders of magnitude faster than the naive approach, and independent of the number of train and test edges required. The execution time of the naive approach decreases, as expected, as the number of test edges requested becomes smaller. In Figure 3 we select the BlogCatalog dataset and restrict it to sub-graphs with different numbers of nodes, ranging from 400 up to 10000. The execution times using both sampling strategies are reported in Figure 3.</p>
      <p>[Figures 2 and 3: execution times of the proposed and naive edge split strategies on the StudentDB, Facebook and GrQc datasets, as a function of the proportion of train edges (0.6-0.9) and of the number of graph nodes (0-10000), respectively.]</p>
      <p>For our proposed strategy we have also evaluated the variance observed in the performance metrics for different numbers of repetitions of the experiments (or, equivalently, the effect of averaging the results over different numbers of edge splits). In this case we used the same datasets, NE methods and LP heuristics as in the Node2vec experiment, and 50-50 and 90-10 train-test splits. We compared the results obtained in a single run with those averaged over five independent runs, for both split fractions. For the 50-50 split, the average difference observed over all methods, datasets and metrics is 3.4 * 10^-3, with a variance of 1.8 * 10^-5 and a maximum difference of 0.0293. For the 90-10 split, the average difference is 3.0 * 10^-3, the variance 1.0 * 10^-5 and the maximum difference 0.0186. These results indicate that a single train and test split already gives a sufficiently good estimate of the generalization error of the models. Thus, experiment repeats are not necessary for networks of similar sizes. These observations seem to hold for different train-test split fractions.</p>
      <p>Finally, we study the effect of the train-test split sizes on the method results. We use an identical setting as described above and compare the absolute differences between the two split fractions, observing an average difference of 0.0016 and a maximum difference of 0.2429. This indicates that, in general, the train-test split size has indeed a strong impact on the method results.</p>
      <p>5 Conclusions. The recent surge of research in the area of network embeddings has resulted in a wide variety of data sets, metrics, and setups for evaluating and comparing the utility of embedding methods. Comparability across studies is lacking, and not all evaluations are equally sound. This highlights the need for specific tools and pipelines to ensure the correct evaluation of these methods. In particular, the use of representation learning for link prediction tasks requires train and test sampling, non-edge sampling, and in many cases the selection of edge embedding methods and binary classifiers. The evaluation procedure thus becomes an ensemble of tasks which allow for many errors or inconsistencies.</p>
      <p>In this work we have proposed EvalNE, a novel framework that can be used to evaluate any network embedding method for link prediction. Our pipeline automates the selection of train and test edge sets, simplifies the process of tuning model parameters, and reports the accuracy of the methods according to many criteria. Our experiments highlight the importance of the edge sampling strategy and parameter tuning for evaluating NE methods. We have also introduced a scalable procedure to sample edge sets from given
of the results obtained using a 50-50 split with those networks and showed empirically that is orders or
obtained using a 90-10. The average di erence across magnitude faster than the naive approach that has been
all methods, datasets and metrics is 0:038, the variance used in recent literature.</p>
    </sec>
  </body>
  <back>
  </back>
</article>