From Node Embeddings to Triple Embeddings

Valeria Fionda¹, Giuseppe Pirrò²

¹ DeMaCS, University of Calabria, via Pietro Bucci 30B, 87036, Rende, Italy
² DI, Sapienza Università di Roma, Piazzale Aldo Moro, 5, 00185 Roma, Italy

Abstract
An extended version of this paper has been published at the 34th AAAI Conference on Artificial Intelligence (AAAI) with the title "Learning Triple Embeddings from Knowledge Graphs".
Graph embedding techniques make it possible to learn high-quality, low-dimensional graph representations useful in various tasks, from node classification to clustering. Knowledge graphs are particular types of graphs characterized by several distinct types of nodes and edges. Existing knowledge graph embedding approaches have focused only on learning embeddings of nodes and predicates. However, the basic pieces of information stored in knowledge graphs are triples; an interesting problem is therefore that of learning embeddings of triples as a whole. In this paper we report on Triple2Vec, a new technique to directly compute triple embeddings in knowledge graphs. Triple2Vec leverages the idea of line graph and extends it to the context of knowledge graphs. Embeddings are then generated by adopting the Skip-gram model, where sentences are replaced with walks on a weighted version of the line graph.

Keywords
Knowledge Graphs, Triple Embeddings, Line Graph

1. Introduction

In the last years, learning graph representations using low-dimensional vectors has received attention as a viable support to various (machine) learning tasks, from node classification to clustering [1]. Several approaches have focused on computing node embeddings for homogeneous graphs, which include only one type of edge (e.g., [2], [3]), or for knowledge graphs (aka heterogeneous information networks), which are characterized by several distinct types of nodes and edges [4] (e.g., [5], [6], [7]). There have also been attempts to obtain edge embeddings in homogeneous graphs by applying some operator (e.g., average, Hadamard product) to the embeddings of the edge endpoint nodes [3]. In the knowledge graph landscape, node and predicate embeddings are learned separately, and these pieces of information can then be aggregated (e.g., [8], [9], [10], [11]).

In this paper we report on a technique to directly learn triple embeddings from knowledge graphs, which builds upon the notion of line graph of a graph; in particular, we leverage the fact that here triples can be turned into nodes. As an example, the directed line graph obtained from the knowledge graph in Fig. 1 (a) is shown in Fig. 1 (b), where the two copies of the node (Lauren Oliver, Americans) correspond to the triples having nationality and citizenship as a predicate, respectively. It would be tempting to directly apply embedding techniques to the nodes of the directed line graph to obtain triple embeddings.
Figure 1: A knowledge graph (a), its directed line graph (b), and its triple line graph (c).

However, we identify two main problems. The first is that it is impossible to discern between the two triples encoded by the nodes (Lauren Oliver, Americans). The second is that the directed line graph is disconnected and, as such, it becomes problematic to learn triple embeddings via random walks. Therefore, we introduce the notion of triple line graph 𝒢𝐿 of a knowledge graph 𝐺. In 𝒢𝐿, nodes are the triples of 𝐺 and an edge is introduced whenever two triples of 𝐺 share an endpoint. This construction guarantees that 𝒢𝐿 is connected if 𝐺 is connected. The triple line graph for the graph in Fig. 1 (a) is shown in Fig. 1 (c). Unfortunately, directly working with 𝒢𝐿 may lead to low-quality embeddings, because edges in 𝒢𝐿 are added based on adjacency only. Thus, we introduce a mechanism to weight the edges of 𝒢𝐿 based on predicate relatedness [12]. The weight of an edge between two nodes of 𝒢𝐿 is equal to the semantic relatedness between the predicates in the triples of 𝐺 represented by the two nodes. As an example, in Fig. 1 (c) the weight of the edge between the nodes (M. Damon, birthPlace, Cambridge) and (Cambridge, country, United States) of 𝒢𝐿 will be equal to the relatedness between birthPlace and country. Finally, to compute triple embeddings we generate truncated random walks, in the form of sequences of nodes, on the weighted triple line graph. Note that weights based on semantic relatedness bias the random walker so that nodes of the weighted triple line graph linked by related predicates obtain similar contexts. The underlying idea is that similar contexts will lead to similar embeddings. The walks are then fed into the Skip-gram model [13], which gives the embeddings of the nodes of the weighted triple line graph, that is, the embeddings of the original triples.

The remainder of the paper is organized as follows. We introduce some preliminary definitions in Section 2. In Section 3, we introduce the notion of triple line graph of a knowledge graph, along with an algorithm to compute it. Section 4 describes the Triple2Vec approach to learn triple embeddings from knowledge graphs. In Section 5, we discuss an experimental evaluation. We conclude and sketch future work in Section 6.

2. Preliminaries

A Knowledge Graph (𝐺) is a kind of heterogeneous information network. It is a node- and edge-labeled directed multigraph 𝐺=(𝑉𝐺, 𝐸𝐺, 𝑇𝐺), where 𝑉𝐺 is a set of uniquely identified vertices representing entities (e.g., D. Lynch), 𝐸𝐺 is a set of predicates (e.g., director), and 𝑇𝐺 is a set of triples of the form (𝑠, 𝑝, 𝑜) representing directed labeled edges, where 𝑠, 𝑜 ∈ 𝑉𝐺 and 𝑝 ∈ 𝐸𝐺.

The line graph 𝒢𝐿=(𝑉𝐿, 𝐸𝐿) of an undirected graph 𝐺=(𝑉𝐺, 𝐸𝐺) is such that: (i) each node of 𝒢𝐿 represents an edge of 𝐺; (ii) two vertices of 𝒢𝐿 are adjacent if and only if their corresponding edges in 𝐺 have a node in common. Starting from 𝐺 it is possible to compute the number of nodes and edges of 𝒢𝐿 as follows: (i) |𝑉𝐿| = |𝐸𝐺|; (ii) |𝐸𝐿| = ½ ∑_{𝑣∈𝑉𝐺} 𝑑𝑣² − |𝐸𝐺|, where 𝑑𝑣 denotes the degree of the node 𝑣 ∈ 𝑉𝐺.

The concept of line graph has been extended to other types of graphs, including multigraphs and directed graphs. The extension to multigraphs adds a distinct node in the line graph for each edge of the original multigraph. If the graph 𝐺 is directed, the corresponding line graph 𝒢𝐿 will also be directed; its vertices are in one-to-one correspondence with the edges of 𝐺 and its edges represent two-length directed paths in 𝐺.
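As a quick sanity check of these counting formulas, the following minimal sketch (ours, not part of Triple2Vec) builds the line graph of a small made-up undirected graph with the networkx library and verifies both quantities.

```python
# Sanity-check the line graph counting formulas on a toy graph
# (a sketch assuming the networkx library; the graph is a made-up example).
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")])
L = nx.line_graph(G)  # each node of L is an edge of G, e.g. ('a', 'b')

# |V_L| = |E_G|
assert L.number_of_nodes() == G.number_of_edges()

# |E_L| = 1/2 * sum_v d_v^2 - |E_G|   (here: 18/2 - 4 = 5)
expected_edges = sum(d * d for _, d in G.degree()) // 2 - G.number_of_edges()
assert L.number_of_edges() == expected_edges
```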
3. Triple Line Graph of a Knowledge Graph

As we have discussed in the Introduction, applying the standard notion of line graph to knowledge graphs would lead to counter-intuitive behaviors. Consider the graph 𝐺 in Fig. 1 (a). Fig. 1 (b) shows the directed line graph 𝒢𝐿 obtained by applying the standard definition, where the nodes of 𝒢𝐿 are in one-to-one correspondence with the edges of 𝐺. Moreover, from the same sub-figure it can be noticed that each directed edge of 𝒢𝐿 corresponds to a path of length 2 in 𝐺. At this point, three main issues arise.

First, the standard definition associates to each node of the directed line graph the two endpoints of the corresponding edge; however, the edge labels in a knowledge graph carry a semantic meaning, which is completely lost if only the endpoints are considered. As an example, it is not possible to discern between the two copies of the node (Morgan Freeman, Americans).

Second, the edges of the line graph are determined by considering their direction. This disregards the fact that edges in 𝐺 witness some semantic relation between their endpoints (i.e., entities) that can be interpreted bidirectionally. As an example, according to the definition of directed line graph, the two nodes (Lauren Oliver, Americans) in 𝒢𝐿 remain isolated, since the corresponding edges do not belong to any two-length path in 𝐺 (see Fig. 1 (a)-(b)). However, consider the triple (Lauren Oliver, nationality, Americans): while the edge label from Lauren Oliver to Americans states the relation nationality, the same label read in the opposite direction states the relation is nationality of. Hence, in the case of knowledge graphs, two nodes of the line graph must be connected by an edge if they form a two-length path in the original knowledge graph regardless of edge direction, as the semantics of edge labels can be interpreted bidirectionally.

Finally, triples encode some semantic meaning via predicates, and the desideratum is to preserve this semantics when connecting two nodes (i.e., triples of 𝐺) in the line graph. Because of these issues, we introduce the novel notion of triple line graph, suitable for knowledge graphs.

Definition 1. (Triple Line Graph). Given a knowledge graph 𝐺 = (𝑉𝐺, 𝐸𝐺, 𝑇𝐺), the associated triple line graph 𝒢𝐿 = (𝑉𝐿, 𝐸𝐿, 𝑤) is such that: (i) each node of 𝒢𝐿 represents a triple of 𝐺; (ii) two vertices of 𝒢𝐿, say (𝑠₁, 𝑝₁, 𝑜₁) and (𝑠₂, 𝑝₂, 𝑜₂), are adjacent if, and only if, {𝑠₁, 𝑜₁} ∩ {𝑠₂, 𝑜₂} ≠ ∅; (iii) the function 𝑤 associates a weight in the range [0, 1] to each edge of 𝒢𝐿.

We devised an algorithm (outlined in Algorithm 1) to compute the triple line graph 𝒢𝐿 of a knowledge graph 𝐺; it is at the core of Triple2Vec for the computation of triple embeddings.

Input: Knowledge Graph 𝐺
Output: 𝒢𝐿: the Triple Line Graph associated to 𝐺
1: 𝒢𝐿 = (∅, ∅, ∅)
2: for all (𝑠, 𝑝, 𝑜) in 𝐺 do
3:   add the node (𝑠, 𝑝, 𝑜) to 𝒢𝐿
4: end for
5: for all 𝑠 ∈ 𝐺 do
6:   𝐼(𝑠) = ∅
7:   for all (𝑠, 𝑝, 𝑜) (resp., (𝑜, 𝑝, 𝑠)) in 𝐺 do
8:     add (𝑠, 𝑝, 𝑜) (resp., (𝑜, 𝑝, 𝑠)) to 𝐼(𝑠)
9:     for all pairs 𝑛, 𝑛′ in 𝐼(𝑠) do
10:      add the edge (𝑛, 𝑛′) to 𝒢𝐿
11:      set 𝑤(𝑛, 𝑛′) = computeEdgeWeight(𝑛, 𝑛′)
12:    end for
13:  end for
14: end for
15: return 𝒢𝐿
Algorithm 1: BuildTripleLineGraph(𝐺)

After initializing the sets of nodes and edges of 𝒢𝐿 to the empty set (line 1), the algorithm iterates over the triples of the input 𝐺 and adds a node to 𝒢𝐿 for each visited triple (lines 2-3), thus inducing a bijection from the set of triples of 𝐺 to the set of nodes of 𝒢𝐿. Besides, if two triples share a node in 𝐺 then an edge will be added between the corresponding nodes of 𝒢𝐿 (lines 5-14). In particular, the data structure 𝐼(𝑠) (line 6) keeps track, for each node 𝑠 of 𝐺, of the triples in which 𝑠 appears as subject or object (lines 7-8). Since such triples correspond to nodes of the triple line graph, by iterating over pairs of triples in 𝐼(𝑠) it is possible to add the desired edges between the corresponding nodes of 𝒢𝐿 (lines 9-10). The algorithm also accommodates a generic edge weighting mechanism (line 11), based on predicate relatedness (Section 4). By inspecting Algorithm 1, we observe that 𝒢𝐿 can be computed in time 𝒪(|𝑇𝐺|² × costWeight), where costWeight is the cost of computing the weight between two nodes of 𝒢𝐿.
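For illustration, here is a possible Python rendering of Algorithm 1. It is a sketch under the assumptions that triples are given as (s, p, o) tuples and that compute_edge_weight stands in for the predicate-relatedness weighting of Section 4; the function names are ours, not the authors'.

```python
# A compact sketch of Algorithm 1 (BuildTripleLineGraph), not the authors' code.
from collections import defaultdict
from itertools import combinations

def build_triple_line_graph(triples, compute_edge_weight):
    nodes = set(triples)              # one node of G_L per triple of G (lines 2-4)
    incident = defaultdict(list)      # I(s): triples where s is subject or object
    for s, p, o in triples:
        incident[s].append((s, p, o))
        incident[o].append((s, p, o))
    edges = {}                        # maps an edge (n, n') of G_L to its weight
    for ts in incident.values():      # lines 5-14: pairs sharing an endpoint
        for n, n2 in combinations(ts, 2):
            key = (n, n2) if n <= n2 else (n2, n)   # undirected edge, stored once
            edges[key] = compute_edge_weight(n, n2)
    return nodes, edges
```

With the triples of Fig. 1 (a) and a constant weight function, this sketch yields the (unweighted) structure of the triple line graph of Fig. 1 (c).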
4. Triple2Vec: Learning Triple Embeddings

We now describe Triple2Vec, which directly learns triple embeddings from knowledge graphs. Triple2Vec includes four main phases: (i) building the triple line graph (Section 3); (ii) weighting the edges of the triple line graph; (iii) computing walks on the weighted triple line graph; and (iv) computing embeddings via the Skip-gram model.

Triple Line Graph Edge Weighting. We have mentioned in Section 3 that the number of edges in the (triple) line graph can be large. This structure is much denser than that of the original graph and may significantly affect the dynamics of random walks on the graph. To remedy this drawback, we introduce an edge weighting mechanism (line 11 of Algorithm 1). The desideratum is to come up with a strategy to compute walks so that the neighborhood of a triple will include triples that are semantically related. To this end, we leverage a predicate relatedness measure [14]. This measure is based on the Triple Frequency, defined as 𝑇𝐹(𝑝𝑖, 𝑝𝑗) = log(1 + 𝐶𝑖,𝑗), where 𝐶𝑖,𝑗 counts the number of times the predicates 𝑝𝑖 and 𝑝𝑗 link the same subjects and objects. Moreover, it uses the Inverse Triple Frequency, defined as 𝐼𝑇𝐹(𝑝𝑗, 𝐸) = log(|𝐸| / |{𝑝𝑖 : 𝐶𝑖,𝑗 > 0}|) [12]. Based on TF and ITF, for each pair of predicates 𝑝𝑖 and 𝑝𝑗 we can build a (symmetric) matrix 𝐶𝑀, where each element is 𝐶𝑀(𝑖, 𝑗) = 𝑇𝐹(𝑝𝑖, 𝑝𝑗) × 𝐼𝑇𝐹(𝑝𝑗, 𝐸). The final predicate relatedness matrix 𝑀𝑅 can then be constructed such that 𝑀𝑅(𝑝𝑖, 𝑝𝑗) = Cosine(𝑊𝑖, 𝑊𝑗), where 𝑊𝑖 (resp., 𝑊𝑗) is the row of 𝑝𝑖 (resp., 𝑝𝑗) in 𝐶𝑀. This approach guarantees that the more related the predicates in the triples represented by two nodes of the triple line graph are, the higher the weight of the edge between these nodes. Driving the random walks according to a relatedness criterion can capture both the graph topology, in terms of triple-to-triple relations (i.e., edges in the triple line graph), and semantic proximity, in terms of relatedness between predicates in triples.
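In matrix form, the computation just described can be sketched as follows (our illustration, assuming numpy); the input C is the |E| × |E| count matrix with C[i, j] = 𝐶𝑖,𝑗, which would be counted from the triples of 𝐺.

```python
# A sketch of the relatedness matrix M_R (our illustration, not the authors' code).
import numpy as np

def relatedness_matrix(C):
    num_preds = C.shape[0]                       # |E|: number of predicates
    TF = np.log1p(C)                             # TF(p_i, p_j) = log(1 + C_ij)
    df = np.maximum((C > 0).sum(axis=0), 1)      # |{p_i : C_ij > 0}| for each j
    ITF = np.log(num_preds / df)                 # ITF(p_j, E) = log(|E| / df_j)
    CM = TF * ITF[np.newaxis, :]                 # CM(i, j) = TF(p_i, p_j) * ITF(p_j, E)
    # M_R(p_i, p_j) = Cosine(W_i, W_j), with W_i the i-th row of CM
    W = CM / np.maximum(np.linalg.norm(CM, axis=1, keepdims=True), 1e-12)
    return W @ W.T
```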
Computing Walks. Triple2Vec leverages a language model approach to learn the final triple embeddings. As such, it requires a "corpus" of both nodes and sequences of nodes, similarly to word embedding techniques that require words and sequences of words (i.e., sentences). We leverage truncated graph walks. The idea is to start from each node of 𝒢𝐿 (representing a triple of the original graph) and provide a context for each such node in terms of a sequence of other nodes. Although walks have been used by previous approaches (e.g., [3]), none of them has tackled the problem of computing triple embeddings.

Computing Triple Embeddings. Once the "corpus" (in terms of the set of walks 𝒲) is available, the last step of the Triple2Vec workflow is to compute the embeddings of the nodes of 𝒢𝐿, which correspond to the embeddings of the triples of the input graph 𝐺. The embedding we seek can be seen as a function 𝑓 : 𝑉𝐿 → ℝᵈ, which projects nodes of the weighted triple line graph 𝒢𝐿 into a low-dimensional vector space, where 𝑑 ≪ |𝑉𝐿|, so that neighboring nodes are in proximity in the vector space. For every node 𝑢 ∈ 𝑉𝐿, 𝑁(𝑢) ⊂ 𝑉𝐿 is the set of its neighbors, which is determined by the computed walks. The co-occurrence probability of two nodes 𝑣𝑖 and 𝑣𝑖+1 in a set of walks 𝒲 is given via their vector embeddings 𝑒𝑣𝑖 and 𝑒𝑣𝑖+1:

𝑝((𝑣𝑖, 𝑣𝑖+1) ∈ 𝒲) = 𝜎(𝑒𝑣𝑖ᵀ 𝑒𝑣𝑖+1)    (1)

where 𝜎 is the sigmoid function and 𝑒𝑣𝑖ᵀ 𝑒𝑣𝑖+1 is the dot product of the vectors 𝑒𝑣𝑖 and 𝑒𝑣𝑖+1. As the computation of (1) is computationally demanding [3], we use negative sampling to train the Skip-gram model. Negative sampling randomly selects nodes that do not appear together in a walk as negative examples, instead of considering all nodes in the graph. If a node 𝑣𝑖 appears in a walk of another node 𝑣𝑖+1, then the vector embedding 𝑒𝑣𝑖 is closer to 𝑒𝑣𝑖+1 than to that of any other randomly chosen node. The probability that a node 𝑣𝑖 and a randomly chosen node 𝑣𝑗 do not appear together in a walk starting from 𝑣𝑖 is given by 𝑝((𝑣𝑗, 𝑣𝑖) ∉ 𝒲) = 𝜎(−𝑒𝑣𝑖ᵀ 𝑒𝑣𝑗). For any two nodes 𝑣𝑖 and 𝑣𝑖+1, the negative sampling objective of the Skip-gram model to be maximized is given by the following objective function:

𝒪(𝜃) = log 𝜎(𝑒𝑣𝑖ᵀ 𝑒𝑣𝑖+1) + ∑_{j=1}^{k} 𝔼𝑣𝑗 [log 𝜎(−𝑒𝑣𝑖ᵀ 𝑒𝑣𝑗)],    (2)

where 𝜃 denotes the set of all parameters and 𝑘 is the number of negative samples. For the optimization of the objective function, we use the parallel asynchronous stochastic gradient descent algorithm [15].
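Phases (iii) and (iv) can be sketched as follows (our illustration, not the authors' code). The sketch assumes the (nodes, edges) output of the build_triple_line_graph sketch above; gensim's Word2Vec provides Skip-gram with negative sampling, and the parameter values mirror those reported in Section 5.

```python
# Relatedness-biased truncated walks on G_L, then Skip-gram over the walks.
import random
from collections import defaultdict
from gensim.models import Word2Vec

def truncated_walks(nodes, edges, num_walks=10, walk_len=100):
    nbrs = defaultdict(list)
    for (n, n2), w in edges.items():          # treat G_L as undirected
        nbrs[n].append((n2, w))
        nbrs[n2].append((n, w))
    walks = []
    for _ in range(num_walks):
        for start in nodes:
            walk, cur = [start], start
            while len(walk) < walk_len and nbrs[cur]:
                succs, weights = zip(*nbrs[cur])
                cur = random.choices(succs, weights=weights)[0]  # relatedness bias
                walk.append(cur)
            walks.append([str(n) for n in walk])  # gensim expects string tokens
    return walks

walks = truncated_walks(nodes, edges)
# sg=1 selects Skip-gram; negative=50 negative samples; workers>1 uses gensim's
# lock-free multithreaded training, in the spirit of [15].
model = Word2Vec(walks, vector_size=128, window=10, sg=1,
                 negative=50, min_count=0, workers=4)
triple_vec = model.wv[str(("Invictus", "starring", "Matt Damon"))]  # triple lookup
```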
5. Experiments

We now report on the evaluation and a comparison with related work. Triple2Vec has been implemented in Python and uses the Gensim¹ library to learn embeddings. In the experiments, we used three real-world datasets. DBLP [16], about authors, papers, venues, and topics, contains ∼16K nodes, ∼52K edges, and 4 predicate types; authors are labeled with one among four labels (i.e., database, data mining, machine learning, and information retrieval). Foursquare [7], with ∼30K nodes and ∼83K edges, includes four different kinds of entities, that is, users, places, points of interest and timestamps, where each point of interest also has one among 10 labels. Finally, we used a subset of Yago [16] in the domain of movies, including ∼22K nodes, ∼89K edges, and 5 predicate types, where each movie is assigned one or more among 5 available labels.

¹ https://radimrehurek.com/gensim

Since there are no benchmarks available for the evaluation of triple embedding approaches, we constructed two benchmarks for triple classification and triple clustering by using the labels available in the datasets. For DBLP, the 4 labels for author nodes have been propagated to paper nodes and from paper nodes to topic and venue nodes. For Yago, the labels for movies have been propagated to actors, musicians and directors. For Foursquare, the 10 labels for points of interest have been propagated to places, users and timestamps. With this reasoning, each node has been labeled with a subset of labels, and the nodes of 𝒢𝐿 (i.e., the triples of 𝐺) have been labeled with the union of the labels associated with the triple endpoints. To each subset of labels we assigned a different meta-label.

We compared Triple2Vec to the following two groups of related approaches: (i) metapath2vec [6], node2vec [3] (implemented in StellarGraph, https://www.stellargraph.io) and DeepWalk [2], configured with the best parameters reported in their respective papers; these compute embeddings for nodes only (not for predicates), and a triple embedding was obtained by averaging the embeddings of the triple endpoints; (ii) ConvE [9] and RotatE [11], proposed for knowledge graph embedding, configured with the best parameters reported in their respective papers and implemented in pykg2vec (https://github.com/Sujit-O/pykg2vec); these compute embeddings for each node and each predicate, and triple embeddings were obtained by concatenating the embeddings of the triple endpoints and the predicate. For the sake of space, we only report the best results, obtained by setting the parameters of Triple2Vec as follows: number of walks per node 𝑛=10, maximum walk length 𝐿=100, and window size (necessary for the context in the Skip-gram model) 𝑤=10. Moreover, we used 𝑑=128 as the dimension of the embeddings. The number of negative samples Γ is set to 50. All results are the average of 10 runs.

Results on Triple Classification. To carry out this task, we trained a one-vs-rest logistic regression model, giving as input the triple embeddings along with their labels (the labels of the nodes of 𝒢𝐿). Then, we computed the Micro-F1 and Macro-F1 scores by varying the percentage of training data. The results of the evaluation are reported in Fig. 2.

Figure 2: Triple classification results in terms of Micro and Macro F1.

We observe that Triple2Vec consistently outperforms the competitors. This is especially true on the DBLP and Yago datasets. We also note that metapath2vec performs worse than node2vec and DeepWalk, although the former has been proposed to work on knowledge graphs. This may be explained by the fact that the metapaths used in the experiments [7], while being able to capture node embeddings, fall short in capturing edge (triple) embeddings. As for the second group of approaches, we can see that their performance is in some cases even worse than that of the first group, although predicate embeddings are also considered. This may be because the goal of these approaches is to learn entity and predicate embeddings for link prediction. Another reason is the fact that these approaches compute a single embedding per predicate and node, which is the same for all triples in which it appears.
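The evaluation protocol for this task can be sketched as follows (our illustration, assuming scikit-learn, an embedding matrix X with one row per triple, and integer meta-labels y constructed as described above).

```python
# A sketch of the triple classification protocol (not the authors' code).
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def evaluate(X, y, train_fraction, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_fraction, random_state=seed)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return (f1_score(y_te, pred, average="micro"),
            f1_score(y_te, pred, average="macro"))
```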
Results on Triple Clustering and Visualization. To get a better account of how triple embeddings are placed in the embedding space, we used t-SNE [17] to obtain a 2-d representation of the triple embeddings (originally 𝑑-dimensional) obtained from Triple2Vec and metapath2vec. We only report results for DBLP (Fig. 3), as we observed similar trends in the other datasets; moreover, metapath2vec was the system giving the "most graphically readable" results.

Figure 3: DBLP triple embedding visualization: (a) Triple2Vec; (b) metapath2vec.

We note that while Triple2Vec can clearly identify groups of triples (i.e., triples labeled with the same labels), metapath2vec offers a less clear perspective. We can explain this behavior by the fact that Triple2Vec defines a specific strategy for triple embeddings based on the notion of semantic proximity, while the triple embeddings for metapath2vec have been obtained from the embeddings of the endpoint nodes according to predefined metapaths.
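The visualization step can be sketched as follows (our illustration, assuming scikit-learn and matplotlib; X and y are as in the previous sketch).

```python
# A sketch of the 2-d t-SNE projection of the d-dimensional triple embeddings.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

points = TSNE(n_components=2, random_state=0).fit_transform(X)   # (n, 2)
plt.scatter(points[:, 0], points[:, 1], c=y, s=5, cmap="tab10")  # color = meta-label
plt.show()
```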
6. Concluding Remarks and Future Work

We discussed a technique to directly learn triple embeddings in knowledge graphs. While for homogeneous graphs there have been some sub-optimal proposals [3], for knowledge graphs this problem had not been explored before. The presented solution builds upon the notion of line graph coupled with semantic proximity. Although existing approaches can be adapted, we have shown that: (i) simply aggregating the embeddings of the triple endpoint nodes is problematic when the endpoint nodes are connected by more than one predicate; (ii) even concatenating node and predicate embeddings does not work, since it is not possible to discriminate the role of the same entities and predicates across all the triples in which they appear.

References

[1] H. Cai, V. W. Zheng, K. C.-C. Chang, A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications, Transactions on Knowledge and Data Engineering 30 (2018) 1616–1637.
[2] B. Perozzi, R. Al-Rfou, S. Skiena, DeepWalk: Online Learning of Social Representations, in: Proc. of KDD, 2014, pp. 701–710.
[3] A. Grover, J. Leskovec, node2vec: Scalable Feature Learning for Networks, in: Proc. of Int. Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[4] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge Graph Embedding: A Survey of Approaches and Applications, Trans. on Knowledge and Data Engineering 29 (2017) 2724–2743.
[5] P. Ristoski, H. Paulheim, RDF2Vec: RDF Graph Embeddings for Data Mining, in: Proc. of International Semantic Web Conference, 2016, pp. 498–514.
[6] X. Dong, N. V. Chawla, A. Swami, metapath2vec: Scalable Representation Learning for Heterogeneous Networks, in: Proc. of Int. Conference on Information and Knowledge Management, 2017, pp. 135–144.
[7] R. Hussein, D. Yang, P. Cudré-Mauroux, Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings, in: Proc. of Conference on Information and Knowledge Management, 2018, pp. 437–446.
[8] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex Embeddings for Simple Link Prediction, in: Proc. of Int. Conf. on Machine Learning, 2016, pp. 2071–2080.
[9] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D Knowledge Graph Embeddings, in: Proc. of the AAAI Conference, 2018, pp. 1811–1818.
[10] H. Xiao, M. Huang, X. Zhu, TransG: A Generative Model for Knowledge Graph Embedding, in: Proc. of Association for Computational Linguistics, 2016, pp. 2316–2325.
[11] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, in: Proc. of International Conference on Learning Representations, 2019.
[12] G. Pirrò, Building Relatedness Explanations from Knowledge Graphs, Semantic Web 10 (2019) 963–990.
[13] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed Representations of Words and Phrases and their Compositionality, in: Proc. of International Conference on Neural Information Processing, 2013, pp. 3111–3119.
[14] V. Fionda, G. Pirrò, Fact Checking via Evidence Patterns, in: Proc. of International Joint Conference on Artificial Intelligence, 2018, pp. 3755–3761.
[15] B. Recht, C. Re, S. Wright, F. Niu, Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, in: Proc. of International Conference on Neural Information Processing, 2011, pp. 693–701.
[16] Z. Huang, N. Mamoulis, Heterogeneous Information Network Embedding for Meta Path based Proximity, arXiv preprint arXiv:1701.05291 (2017).
[17] L. van der Maaten, G. Hinton, Visualizing Data Using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605.