<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Aug</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Sensitive attribute prediction for social networks users</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Younes Abid</string-name>
          <email>younes.abid@loria.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdessamad Imine</string-name>
          <email>abdessamad.imine@loria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michaël Rusinowitch</string-name>
          <email>michael.rusinowitch@loria.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lorraine University</institution>
          ,
          <addr-line>Cnrs, Inria, Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lorraine University</institution>
          ,
          <addr-line>Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>1849</volume>
      <fpage>153</fpage>
      <lpage>168</lpage>
      <abstract>
        <p>Social networks are popular means of data sharing but they are vulnerable to privacy breaches. For instance, by relating users with similar profiles, an entity can predict personal data with high probability. We present SONSAI, a tool that helps Facebook users protect their private information from such inferences. The system samples a subnetwork centered on the user, cleanses the collected public data and predicts user sensitive attribute values by leveraging machine learning techniques. Since SONSAI displays the most relevant attributes exploited by each inference, the user can modify them to prevent undesirable inferences. The tool is designed to perform reasonably with the limited resources of a personal computer, by collecting and processing a relatively small relevant part of the network data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Data published on social networks profiles can be mined
for inferring sensitive information about users. For instance
it was shown in [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] how musical tastes allow one to
predict educational level. To increase user awareness about
these privacy threats we have designed a tool, SONSAI, for
Facebook users to audit their profiles. The system crawls
the network around the user and predicts the user's sensitive
attribute values using a machine learning engine. The results
provided by SONSAI also show the public attributes of
the user that have oriented the learning algorithm towards
a particular sensitive attribute value. The user can therefore
modify these public attributes to prevent inference.
      </p>
      <p>For the approach to be feasible several problems have
to be solved: First, data collection by crawling is limited
both by the social network and by country regulations.
Hence the crawler exploration strategy has to focus only on
meaningful representative network nodes. Since attributes
are numerous, for the learning program to scale one has
to select only the most relevant ones for inferring sensitive
attribute values. Hence the second problem is to find an
attribute relevance measure that is both accurate and easy
to compute. Note that we cannot rely on semantic proximity
since a user who hides a sensitive attribute will probably
hide semantically related ones too. Moreover, for fully
anonymised datasets the attribute semantics is hard to
recover. Therefore we follow an alternative approach
by modelling attributes as bipartite graphs and measuring
the relevance of attributes by comparing their bipartite graph
structures.</p>
      <p>For specific attributes such as gender and relationship
status, the sets of values are much smaller than for other
attributes like music and movies. Consequently, the graphs
that model these attributes have higher connectivity than
the other graphs. For instance, the density of the graph
that models gender (as most users publish their gender) is
close to 0.5. In order to infer hidden links in such graphs
we need to learn from highly connected graphs. However
most of the available learning graphs are sparse. To cope
with this last problem, we derive new graphs by merging
several learning graphs.</p>
      <p>All the proposed methods have been implemented for
Facebook, however they can be applied to many other
social networks. The system has been tested by several
volunteer users for auditing their Facebook profiles. In each
case a dataset was built from real profiles collected in the
user's neighborhood network.</p>
      <p>
        Related works. In [15] the authors propose algorithms to
detect whether a sensitive attribute value can be inferred
from the neighborhood of a target user in a social
network. Heatherly et al. [12] infer attribute values in social
networks with Bayesian classification techniques. For the
same purpose Estivill-Castro et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] employ decision-tree
learning algorithms. In these works learning is performed
off-line on large datasets. In order to perform attribute
inference from sparser datasets collected in a short time by
our tool in the user's ego-network, we rather use random
walk-based learning. The random walk technique has been
applied to social representations in [14] and [11] where the
authors analysed friendships and used a skip-gram model.
In these works, user profiles that have similar friends will
be mapped to similar representations. This model helps
to detect communities and can predict a set of potential
friends for a given user profile. Skip-gram model has also
been applied recently to infer social relationship from
mobility data [5]. Our work uses a more adapted Continuous
Bag of Words (CBOW) model to predict the most likely
values for a user sensitive attribute. In our setting friends
are considered as an attribute among others. Moreover
we first determine automatically a relevant subset of the
social network for optimizing random walks. In [3] the
relevance of attributes is computed by Bayesian optimisation,
which is much less efficient than the graph comparison
approach adopted here. In particular [3] does not succeed
in reasonable time (on a standard PC) with attributes like
gender and relationship status. Let us note that (unlike
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) our approach does not need any ontology to perform
semantic correlation between attributes. Finally we notice
that the proposed system SONSAI is close in some
aspects to a recommendation system: an item suggestion
can be viewed as an attribute value prediction [4].
However, unlike recommendation systems, SONSAI also provides
explanations for the predicted values, namely an ordered
list of attributes that have played a significant role in the
computation. For enforcing privacy, our final goal will be
to reduce the “recommendation accuracy” by acting on
this list of attributes.
      </p>
    </sec>
    <sec id="sec-2">
      <title>ARCHITECTURE OF SONSAI</title>
      <p>The architecture of SONSAI is overviewed in Figure 1.
The Facebook crawler (about 5k lines of Java 8) drives a
Firefox 58.0b4 browser through a Selenium 3.5.3 server1.
The information collected from each profile, group and page is
stored in separate XML files. The Anonymizer, Cleanser,
Random walker and Ranker components are written in
Python 2.7 (about 2.5k lines of code). The Anonymizer
component parses all the XML files, generates anonymized
graphs and stores them as TSV files. Then, the Cleanser
selects the most relevant graphs and stores them as
adjacency lists. The Random walker component browses the
adjacency lists and stores the resulting walks in a text
document. We use the Python gensim2 implementation of
Word2Vec to parse the text document and compute a
vectorial representation of the social network nodes encountered
in the walks. Finally, the Ranker component classifies the
sensitive nodes according to their similarity to the target
user profile.</p>
    </sec>
    <sec id="sec-3">
      <title>SAMPLING FACEBOOK</title>
      <p>We have designed a crawler that explores the social network
around a user (to some distance given as a parameter) and
collects public information from the visited web pages (i.e.
friends, liked pages . . . ) in order to build a representative
subnetwork. We distinguish two types of Facebook nodes:
user profiles (u) and pages (p), and two types of links:
like-ships between user profiles and pages, and friendships
between user profiles. Given a node v, N(v) denotes the
set of nodes that are linked to v. A discovered node is a
node whose URL is known by the crawler. For instance, if
the crawler retrieves a user profile and collects its public
friends list, then all the friends of that particular user
profile are discovered. Algorithm 1 crawls at most n nodes
at distance ≤ d from the target node u0. Each iteration of
the outer loop samples a node, crawls it and updates the sets
of discovered and crawled nodes. The sampling is done by
random walks of length ≤ d with a transition probability p
designed to crawl closer nodes with higher priority and to
favour neighbour nodes according to their type. The function
sinks(j) returns the set of sinks, i.e. crawled nodes such
that all discovered nodes at distance ≤ j are also crawled.
Sinks are avoided by the random walks to guarantee that
the final node has not been crawled yet.
1http://www.seleniumhq.org/
2https://radimrehurek.com/gensim/index.html</p>
      <p>1 Procedure crawl_nodes(u0, n, d, p)
2   crawl_node(u0);
3   while |crawled_nodes| &lt; n do
4     v ← u0; S ← {};
5     for i ← 1 to d by 1 do
6       S.addAll(sinks(d − i));
7       v ← random_select_in({N(v) ∪ {v}} ∖ S);
8     end
9     crawl_node(v);
10  end</p>
      <p>Algorithm 1: Crawling nodes around a target user.</p>
    </sec>
    <sec id="sec-4">
      <title>SOCIAL NETWORK MODEL</title>
      <p>Modelling friendship relations. Since friendship on
Facebook is symmetric, we model friendship between user
profiles by an undirected graph (U, F) where U is a set of
user profiles and F is a set of friendship links between
them.</p>
      <p>Modelling page like-ship. We model like-ship between
user profiles and pages by several bipartite graphs (U, P, L)
where U is a set of user profiles, P is a set of pages (of a given
type) and L is a set of like-ship links between them. Figure
2 shows an example of page like-ship modelled by two
graphs3. One graph models liked pages of music type and
the other models liked pages of book type. We note that
user profiles can like several pages of the same type.</p>
      <p>Anonymizing the social network graphs. For ethical and
regulatory reasons Facebook identifiers are replaced by fresh
identifiers. Each node in the network is then identified
by a unique integer ID replacing its Facebook ID. The
anonymized IDs are sorted according to the node types.
Anonymized graphs are saved under the tab-separated
value (TSV) format, one of the most general
delimiter-separated values (DSV) formats. TSV is widely used in
graph exchange.
3Icon made by Smashicons from www.flaticon.com</p>
      <p>In contrast to the dataset released by4, where only user
IDs are anonymised (but not movie titles, ratings and dates
of rating), all attribute values are
anonymised in our datasets.</p>
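      <p>A minimal sketch of this anonymization step, assuming edges come as pairs of raw identifiers (the type-based sorting of IDs is omitted, and the function names are illustrative):</p>

```python
import csv, io

def anonymize_edges(edges):
    """Replace raw identifiers by fresh integer IDs (first-seen order)."""
    ids = {}
    fresh = lambda node: ids.setdefault(node, len(ids))
    return [(fresh(a), fresh(b)) for a, b in edges]

edges = [("alice.fb", "page/rockband"), ("bob.fb", "page/rockband")]
anon = anonymize_edges(edges)

# Dump the anonymized edge list in TSV form
buf = io.StringIO()
csv.writer(buf, delimiter="\t").writerows(anon)
print(buf.getvalue())
```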
      <p>In the following a sensitive graph is a graph that models
an attribute that is considered sensitive by the user (i.e.
the user does not want its value to be predictable). The
learning graphs are the attribute graphs available to the
learning module in order to predict hidden links in the sensitive
graph. We do not rely on attribute semantics since i) a user
that hides a sensitive attribute will probably also hide
other semantically-related attributes, and moreover ii) for
fully anonymised datasets the attributes semantics is hard
to recover. On the contrary we will rely on comparing
attribute graph structures.</p>
      <p>In the following we assume a fixed sensitive graph S.</p>
      <p>Step 1: Computing the learning and the confidence rates
of learning graphs. In order to compare the structure of
a given learning graph to the structure of the sensitive
graph, we first split each graph in two parts. The first part
contains the user profiles that hide their links in the sensitive
graph. The ratio of user profiles that publish their links in
the first part of the learning graph is the learning rate l.
The second part contains the user profiles that publish
their links in the sensitive graph. The ratio of user profiles
that publish their links in the second part of the learning
graph is the confidence rate c.</p>
      <p>The function connected_profiles_in(G) returns the set
of user profiles that publish their links in the graph G given
as argument. Given a learning graph G, a sensitive graph S
and the set U of user profiles in all graphs, we define:</p>
      <p>U_G = connected_profiles_in(G), U_S = connected_profiles_in(S),
l(G) = |U_G ∩ (U ∖ U_S)| / |U ∖ U_S| and c(G) = |U_G ∩ U_S| / |U_S|.</p>
      <p>Step 2: Computing the Hamming rates of learning graphs.
The similarity between two user profiles u1 and u2 in a given
graph G is computed as follows, where the function N_G(u)
returns the set of nodes to which user node u is connected
in the graph G:</p>
      <p>sim_G(u1, u2) = |N_G(u1) ∩ N_G(u2)| / |N_G(u1) ∪ N_G(u2)| (1)</p>
      <p>The Hamming distance H between graphs G and S is
defined by:</p>
      <p>H(G) = ∑_{ui ≠ uj; ui, uj ∈ U_G ∩ U_S} |sim_G(ui, uj) − sim_S(ui, uj)| (2)</p>
      <p>In order to compare learning graphs with different sets
of common connected profiles U_G ∩ U_S, we normalize this
distance by the maximal Hamming distance that can be
obtained on such a set. Hence we define the Hamming rate
h(G) = H(G)/M(G), where M(G) is given by:</p>
      <p>M(G) = ∑_{ui ≠ uj; ui, uj ∈ U_G ∩ U_S} max(sim_S(ui, uj), 1 − sim_S(ui, uj)) (3)</p>
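      <p>These cleansing metrics can be computed directly from adjacency lists. A toy sketch (the data, symbols and function names are illustrative, assuming the notation above: U is the set of all user profiles, N_G maps each user to its neighbours in graph G, and S is the sensitive graph):</p>

```python
from itertools import combinations

def rates(N_G, N_S, U):
    """Learning rate l(G) and confidence rate c(G) of graph G w.r.t. S."""
    U_G = {u for u in U if N_G.get(u)}
    U_S = {u for u in U if N_S.get(u)}
    l = len(U_G & (U - U_S)) / len(U - U_S)   # users hiding sensitive links
    c = len(U_G & U_S) / len(U_S)             # users publishing them
    return l, c

def jaccard(N, u1, u2):
    a, b = set(N.get(u1, ())), set(N.get(u2, ()))
    return len(a & b) / len(a | b) if a | b else 0.0

def hamming_rate(N_G, N_S, U):
    """h(G) = H(G) / M(G) over the common connected profiles."""
    common = {u for u in U if N_G.get(u)} & {u for u in U if N_S.get(u)}
    pairs = list(combinations(sorted(common), 2))
    H = sum(abs(jaccard(N_G, u, v) - jaccard(N_S, u, v)) for u, v in pairs)
    M = sum(max(s, 1 - s)
            for u, v in pairs for s in [jaccard(N_S, u, v)])
    return H / M if M else 0.0

U = {"u1", "u2", "u3", "u4"}
N_S = {"u1": ["p1"], "u2": ["p1"]}               # u3, u4 hide their links
N_G = {"u1": ["m1"], "u2": ["m1"], "u3": ["m2"]}
print(rates(N_G, N_S, U))
print(hamming_rate(N_G, N_S, U))
```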
    </sec>
    <sec id="sec-5">
      <title>Densifying graphs</title>
      <p>Step 3: Selecting the most relevant graphs for learning
sensitive attribute values. We first discard the learning graphs
that have a learning rate l lower than a threshold t_l since
they do not convey enough information. We then discard
the graphs that have a confidence rate c lower than t_c
since they are considered as unreliable. Finally, from the
remaining graphs we only select the graphs that have a
Hamming rate h no higher than t_h since they are the most
similar to the sensitive graph.</p>
      <p>For some sensitive attributes such as gender, age and
relationship status, user profiles are linked to at most one value.
Moreover the sets of values for these particular attributes
are much smaller than for other attributes. Consequently,
the graphs that model these attributes are denser than the
other ones. In this case, for improving the random-walk
based learning process (see Section 6) we need to merge
several learning graphs in order to obtain a denser one.
We explain the method with a simple example of gender
prediction: we select attribute graphs with high l and c rates
but also with a good rate of discrimination between
genders, that is, the gender of connected users in the graph is
unbalanced between male and female. For instance we can
select the jewelry and fast-food graphs. We merge these graphs
by grouping all fast-foods in a unique node and similarly
for jewelries, as shown in Figure 4, to obtain a new learning
graph.</p>
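      <p>A small sketch of the threshold-based selection and of the merging trick. The thresholds, graph names and data below are made-up examples, not the paper's values:</p>

```python
def select_graphs(stats, t_l, t_c, t_h):
    """Keep graphs whose learning rate l, confidence rate c and
    Hamming rate h pass the three thresholds."""
    return [g for g, (l, c, h) in stats.items()
            if l >= t_l and c >= t_c and h <= t_h]

def merge_as_supernodes(likes_by_type):
    """Collapse all pages of a type into one super-node, keeping one
    link per user who liked at least one page of that type."""
    merged = {}
    for page_type, likes in likes_by_type.items():
        for user in likes:
            merged.setdefault(user, set()).add(page_type)
    return merged

stats = {"jewelry": (0.3, 0.7, 0.1),      # (l, c, h) per attribute graph
         "fast-food": (0.4, 0.8, 0.2),
         "tv-shows": (0.05, 0.2, 0.6)}
kept = select_graphs(stats, t_l=0.2, t_c=0.6, t_h=0.3)

likes = {"jewelry": {"u1": ["ring-page"]},
         "fast-food": {"u1": ["burger-page"], "u2": ["pizza-page"]}}
merged = merge_as_supernodes(likes)
print(kept, merged)
```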
      <p>To specify the random walk transitions we first define
the probability p_{u,k} that, being in node u, the next node in
the random walk is in a selected graph G_k:</p>
      <p>p_{u,k} = d_k(u) / ∑_{j | d_j(u) &gt; 0} d_j(u) if d_k(u) &gt; 0, and
p_{u,k} = 0 otherwise, (4)</p>
      <p>where d_k(u) is the degree of user u in graph G_k. A value
node v is followed by a user node chosen uniformly at
random from the ones connected to v. Assuming the node
following user node u is in G_k, then it will be chosen
uniformly at random from the nodes in G_k that are connected
to u. Therefore, the transition probabilities are (G_1 is the
friendship graph):</p>
      <p>u → u′ : p_{u,1} / d_1(u) if (u, u′) ∈ G_1;
u → v : p_{u,k} / d_k(u) if k ≠ 1 and (u, v) ∈ G_k;
v → u : 1 / d_k(v) if k ≠ 1 and (u, v) ∈ G_k. (5)</p>
      <p>As illustrated in Figure 5 the document is constructed
by connecting all graphs through random jumps between
their nodes (see also [14]). At each step the walker state
changes and a new word is written in the text document.
One step amounts to select a graph where the current node
occurs with non null degree, and then to select a node that
is connected to the current node in the selected graph. The
selected node then becomes the new current node.</p>
      <p>In this example we aim to predict the liked pages of
politicians masked by user profile u3. The sensitive graph is
Graph 2 and the learning graphs are Graph 1 and Graph
3. Since the values of the sensitive attribute (the pages
of politicians) are labelled (each value belongs to a unique
cluster), they are represented by the label of their cluster
in the final document. We use a greedy clustering
algorithm [3] to define clusters of similar size. Pages of politicians
that share many common "likers" end up in the same
cluster. For instance the first walk depicted by Figure 5 is
[u1, u4, v2,3, u4]. But for efficiency the walk [u1, u4, c2,2, u4]
is stored instead in the document since the value v2,3
belongs to the cluster c2,2.</p>
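      <p>This substitution of sensitive values by their cluster labels amounts to one lookup per walk token (the node and cluster names below are illustrative):</p>

```python
# Map each sensitive value node to the label of its cluster
cluster_of = {"v2,1": "c2,1", "v2,2": "c2,2", "v2,3": "c2,2"}

def relabel(walk):
    """Store cluster labels instead of raw sensitive value nodes."""
    return [cluster_of.get(node, node) for node in walk]

print(relabel(["u1", "u4", "v2,3", "u4"]))
```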
      <p>Applying word2vec to compute node representations. We
have performed multi-graph random walks in the social
network and generated a text document. Walks in the
document can be interpreted as sentences, where the words
are network nodes. Hence, inferring a link between a user
node and an attribute value node is similar to the problem
of estimating the likelihood of word co-occurrences in a
corpus. We use word2vec [9, 13] to map one-hot encoded
vectors that represent words in a high-dimensional
vocabulary space to low-dimensional vectors (see [6]). Word2vec
is employed with the Continuous Bag of Words (CBOW)
model for predicting a word given its context, defined by
the w − 1 words surrounding it (w is the window size of the
context). The inputs of the model in our case are users and
published attribute values. Since there is no order between the
attribute values, the CBOW model is more adequate than
Skip-gram. The output of the CBOW model is a vector of
size |V| (the vocabulary size) representing a probability
distribution of co-occurrence between all the words of the
context and each word from the vocabulary within a
window of size w.</p>
      <p>Ranking sensitive attribute values. We measure semantic
similarity between two nodes by the cosine of the angle
formed by the vectors representing the nodes. Cosine
similarity is known to take greater account of context
information. We rank all the sensitive values by their cosine
similarity to the target user profile. The values that have
the lowest rank are most likely the sensitive attribute
secret values of the target user profile, where secret values
are actually the true values of the target user but are not
published by him on the social network.</p>
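      <p>The ranking step reduces to sorting candidate vectors by cosine similarity to the target vector. A minimal sketch with made-up 2-dimensional vectors:</p>

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank(target, candidates):
    """Most similar candidate first (rank 1 = most likely secret value)."""
    return sorted(candidates,
                  key=lambda name: cosine(target, candidates[name]),
                  reverse=True)

u1 = (1.0, 0.2)                                  # target user vector
clusters = {"c2,1": (0.9, 0.1),                  # sensitive value vectors
            "c2,2": (-0.5, 0.8),
            "c2,3": (0.1, 1.0)}
print(rank(u1, clusters))
```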
      <p>Figure 6 depicts an example of 2-dimensional vectors
that encode 8 nodes: 3 user profiles (u1, u2 and u3), 2
music pages (p3,1 and p3,2) and 3 clusters of pages of
politicians (c2,1, c2,2 and c2,3). The clusters of the pages of
politicians are the sensitive values and their vectors are red.
The node u1 is the target user profile and its vector is blue.
The clusters of pages of politicians are ranked according
to their distances to u1, and the inference algorithm infers,
as the most probable pages of politicians to be liked by u1,
the pages of politicians of the cluster that has the smallest
rank (the closest cluster to u1).</p>
      <p>In [16] Schakel et al. show that the word2vec unsupervised
learning algorithm encodes word semantics by orienting
the vectors of co-occurring words in the same direction during
training. Besides, the magnitude of a vector reflects both
the frequency of appearance of the related word in the corpus
and the homogeneity of its contexts, where a context is a set
of words that have a high co-occurrence probability in the
corpus.</p>
      <p>In fact, the words that appear in the same contexts have
small angular distances between them. The less overlapping
the contexts are, the larger the angular distances between
their different words are. However, words that appear in
many contexts are represented by vectors that average
vectors pointing in many context directions. Hence, the
vector magnitude generally decreases with respect to the
number of contexts. Moreover, the higher the word
frequency is, the higher the chance that it is used in different
contexts. Consequently, the vector magnitude also
decreases with respect to frequency. From these remarks, we
conclude that the Euclidean distance is not a good measure
for our inference purpose. Actually, words that appear in
many contexts have low magnitude. As a result, their
Euclidean distances will be small and, using this criterion, they
would be considered close even if they do not appear in
any common context. For instance, the Euclidean distance
between the clusters of pages of the most popular
politicians will be small even if they are rivals. In the example
depicted by Figure 6 the politicians of the clusters c2,1 and
c2,2 are rivals. The angular distance between those two
clusters is large. However, the Euclidean distance is small.
Moreover, the Euclidean distance between a user that has
many friends, for instance the user u1 in Figure 6, and
a popular song like “Despacito”, for instance one of the
music pages in Figure 6, will be small. But popular users
do not necessarily like popular music.</p>
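      <p>The remark above can be checked numerically: two low-magnitude vectors pointing in opposite directions are close in Euclidean distance yet maximally far apart angularly (the vectors are made up for illustration):</p>

```python
import math

a, b = (0.1, 0.0), (-0.1, 0.0)   # e.g. two popular but rival clusters

euclid = math.dist(a, b)          # small Euclidean distance
cos = sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

print(euclid, cos)                # cosine is ~ -1: opposite directions
```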
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS</title>
      <p>To build datasets we have crawled Facebook profiles of
people that live in North-East France and in Île-de-France
(Paris). Table 1 gives statistics about the two crawled
datasets.</p>
      <p>Dataset 1 (D1) was collected in North-East France and
Dataset 2 (D2) in Île-de-France (Paris). We set the
dimension of the node vectors to 512. The dimension is
usually taken between 100 and 300
for natural languages. However the size of the vocabulary
in social networks (equal to the number of nodes) is much
higher than in natural languages.</p>
      <p>In Dataset 1 the sensitive graph represents the links
between 2554 user profiles and 4589 politician pages. For
each experiment we generate a new social graph from the
dataset by selecting the user profiles that publish their
preferences concerning the sensitive attribute (pages of
politicians) and at least another attribute. Then we remove
all the links in the sensitive graph of 10% of the selected user
profiles. The algorithm makes sure that all the nodes in the
resulting social graph remain connected. The experiments
then consisted in inferring the hidden links based on
information from the learning graphs.</p>
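      <p>The evaluation protocol above (hide the sensitive links of 10% of the eligible users, then try to recover them) can be sketched as follows. The data and helper names are illustrative, and the connectivity check is omitted:</p>

```python
import random

def mask_links(sensitive, fraction, seed=0):
    """Hide all sensitive links of a random fraction of users and return
    the visible graph plus the hidden ground truth."""
    rng = random.Random(seed)
    users = sorted(sensitive)
    hidden = set(rng.sample(users, int(fraction * len(users))))
    visible = {u: links for u, links in sensitive.items() if u not in hidden}
    truth = {u: sensitive[u] for u in hidden}
    return visible, truth

sensitive = {f"u{i}": [f"pol{i % 3}"] for i in range(20)}
visible, truth = mask_links(sensitive, 0.10)
print(len(visible), len(truth))
```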
      <p>Among the 1928 learning graphs, we selected the ones
with a learning rate greater than t_l = 20%, a confidence rate
greater than t_c = 60% and a Hamming rate lower than the
threshold t_h.</p>
      <p>Table 2 lists the selected attribute graphs: Users
(friendship), Communities, Musicians/Bands, PublicFigures,
NonProfitOrganizations, Artists, Companies, Websites, TVShows,
EntertainmentWebsites, Media/NewsCompanies, Products/Services,
News/MediaWebsites, Organizations, Movies, LocalBusinesses,
Clothings, Gastronomies, Actors/Directors, Magazines, Athletes,
ApplicationPages and SportsTeams.</p>
      <p>G1 models the links between user profiles and
2357 political organizations, G2 models the links between
1120 user profiles and 1758 political parties and G3 models
the links between 39 user profiles and 41 political ideologies.
Although the selected graphs seem promising, the inference
accuracy is only 0.46. This can be explained by the fact that
the selected graphs are very sparse and users are vigilant
when publishing their preferences about those attributes.
Consequently, the algorithm cannot learn well from them.</p>
      <p>We note that the musicians/bands graph was
automatically selected by our relevance-based selection method,
confirming that music and politics are correlated, as was
empirically discovered in previous studies [17].</p>
      <p>Table 3 summarizes the results of the conducted
experiments.</p>
      <sec id="sec-6-1">
        <title>Selection based on relevance</title>
        <p>Relevance-based selection of 23 graphs yields an inference
accuracy of 0.79 over 252 attacked targets and 409 deleted links,
against an accuracy of 0.41 over 204 targets and 351 deleted
links for a random selection of 23 graphs (with on average
558125 versus 11200 nodes).</p>
      </sec>
      <sec id="sec-6-2">
        <title>Gender and relationship status</title>
        <p>For sensitive attributes such as gender and relationship status, the
number of sensitive values is smaller. Moreover learning
graphs are obtained by grouping several attribute values in
the merging operation (see Section 5). For the experiments,
we have generated a new social graph from the dataset
by randomly selecting 10% of user profiles that publish
the value of their sensitive attribute and we have masked
it. Then SONSAI has tried to infer the masked values of
the sensitive attribute for each user based on the
attributes selected by the cleanser module.</p>
        <p>Relationship status. The sensitive graph models the
relationship status of user profiles. To simplify the presentation
we define two meta-relationship statuses, R1 and R2, each
grouping several of the Facebook relationship status values.</p>
        <p>We aim to infer the meta-relationship status of users.
Table 4 gives more details about the selected attributes
from dataset D2.</p>
        <p>We notice that the discriminant attributes toward R1 are
focused around education and leisure. On the other hand,
the discriminant attributes toward R2 are focused around
business. The accuracy in AUC of inferring the
meta-relationship status is higher than 0.7 in both datasets D1
and D2 as soon as the target publishes values concerning
at least 4 of the attributes selected by the cleanser.</p>
        <p>Gender. The sensitive graph models the gender of user
profiles. We notice that the discriminant attributes toward
male are focused around sports, games and software. On
the other hand, the discriminant attributes toward female
are focused around health, home and luxury. The accuracy
in AUC of inferring the gender is higher than 0.83 in dataset
D1 and higher than 0.67 in dataset D2 as soon as the target
publishes values concerning at least 2 of the attributes
selected by the cleanser.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7.3 Processing times</title>
      <p>Table 6 displays the processing times. The processor clock
is 2.3 GHz. Cleansing and random walk algorithms are
not parallelized. Cleansing takes more time than the other
processes in the case of gender inference since it handles
hundreds of thousands of nodes, compares hundreds of
graphs to the sensitive graph and computes their
importance. The random walk, in the case of gender inference, is
performed on a small graph containing only a few dozen
of super-values and a few thousands of user profiles. On
the other hand, in the case of politicians inference, the
task is performed on larger graphs containing dozens of
thousands of values. The machine disposes only of 8GB
of RAM memory. Each chunk of 5k steps is stored
separately in a text file of about 25MB. Those files are then
parsed by Word2Vec. Word2Vec speed depends on the size
of the document vocabulary. It is fast in the case of gender
inference since the vocabulary is limited to user profiles,
super-values (i.e. clusters of values) and sensitive values.</p>
      <p>The ranking task computes the cosine similarity between
a few vectors that represent the sensitive values and the
target user vector. The cleansing task selects the most
important attributes and reduces the vocabulary. Consequently,
this process accelerates the inference tasks (relying on the
random walk and Word2Vec). Moreover, cleansing increases
accuracy by discarding irrelevant information.</p>
    </sec>
    <sec id="sec-8">
      <title>7.4 Parameter sensitivity analysis</title>
      <p>Let us investigate the impact of the cleansing parameters
l, c and h. All experiments detailed in this section are
conducted on dataset D1 to infer users’ political orientation.</p>
      <p>Table 7 shows that only 3 graphs among the 1928
available graphs have a learning rate l higher than 30%. Based
on those graphs, inference accuracy can be very low. For
instance, inference accuracy based on the gender attribute is
only 0.36. Based only on the users (i.e. friendship) graph the
accuracy improves to 0.64. The communities graph gives a
high accuracy of 0.74 for inferring political views.
However, we notice that the best accuracy is obtained
when selecting graphs with a learning rate between 10% and
40%. Table 7 shows that the learning rate parameter l is
important to select the best graphs for inference. However,
accuracy does not depend only on this parameter since
some graphs, such as the gender graph, that have a high
learning rate may lead to very low accuracy results.</p>
      <p>Table 8 shows that when the Hamming rate h decreases,
accuracy increases. However, most graphs have a low
Hamming rate because only a small part of them can be
compared to the sensitive graph, as few users publish their
preferences in both graphs. Hence, their structure is not
fairly comparable to the politicians’ graph structure. To
cope with this problem we compute a third parameter, the
confidence rate c, that indicates how reliable the structure
comparison is.</p>
      <p>Table 9 shows that the confidence rate c does not
give information about the best graph to select when it
is considered in isolation; it must be coupled with the other
parameters. For instance, if a given graph G has a high
confidence rate but a low Hamming rate, that means that
it is a good graph for inference. However, if a given graph G
has a high confidence rate and a high Hamming rate, that
means that G is probably a bad graph for inferring the
sensitive attribute. But a given graph G could still be
interesting for inference if it has a low c and a high h.</p>
    </sec>
    <sec id="sec-9">
      <title>8 CONCLUSION</title>
      <p>The SONSAI application enables users to predict their sensitive
links in social networks with a relatively small amount of data
and computing resources. Indeed, sensitive data inferences
are fast and accurate on typical personal attributes. It
should be noted that the friendship graph was not selected
among important ones to deduce both the gender and
the relationship status of users. This probably means that
alternative techniques based solely on homophily would
be inaccurate in this context. Moreover, we have observed
that the privacy of users is threatened as soon as they start
publishing at least three important attributes. These ones
are automatically brought to light by SONSAI, regardless</p>
      <p>♯ attacked
targets
♯ masked
links</p>
      <p>Inference
accuracy in AUC
♯ selected
graphs
1744
♯ attacked
targets
♯ masked
links</p>
      <p>on inference accuracy.</p>
      <p>Inference
accuracy in AUC
♯ selected
graphs
1711
♯ attacked
targets
♯ masked
links
ℎ</p>
      <p>Foundation5.
their semantics and only through a structural analysis
of the social network graph. As future work, we plan to
incorporate countermeasures into our tool to protect users
against posts that might compromise their privacy. We
also plan to enhance SONSAI prediction engine with a tool
that permits one to disclose hidden friendship links using
adequate combinations of queries provided by the social
Acknowledgments. This work is supported by MAIF</p>
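      <p>The homophily-only baseline that the conclusion argues would be inaccurate here can be sketched as a simple majority vote over a user's friends (a hypothetical illustration with made-up data, not the SONSAI method):</p>

```python
from collections import Counter

def homophily_predict(user, friends_of, attribute_of):
    """Majority vote over a user's friends' disclosed attribute values.

    `friends_of` maps a user to a list of friends; `attribute_of` maps
    a user to a disclosed attribute value.  Returns None when no friend
    discloses the attribute.
    """
    votes = Counter(attribute_of[f]
                    for f in friends_of.get(user, [])
                    if f in attribute_of)
    return votes.most_common(1)[0][0] if votes else None

friends_of = {"alice": ["bob", "carol", "dan"]}
attribute_of = {"bob": "female", "carol": "female", "dan": "male"}
print(homophily_predict("alice", friends_of, attribute_of))  # prints female
```

      <p>Such a baseline relies entirely on the friendship graph, which is precisely the graph that was not selected as important here.</p>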
    </sec>
  </body>
</article>