<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Jun</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Methodology for Re-evaluation of Knowledge Graph Embedding Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bhushan Zope</string-name>
          <email>bhushan.zope@hotmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sashikala Mishra</string-name>
          <email>sashikala.mishra@sitpune.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanju Tiwari</string-name>
          <email>sanju.tiwari.2007@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deepali Vora</string-name>
          <email>deepali.vora@sitpune.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ketan Kotecha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Symbiosis International (Deemed University) (SIU)</institution>
          ,
          <addr-line>Lavale, Pune 412115</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Knowledge Graph Embedding Models</institution>
          ,
<addr-line>Natural Language Processing, Knowledge Representation, Reproducibility</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology</institution>
          ,
          <addr-line>Symbiosis International</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Symbiosis Institute of Technology, Symbiosis International (Deemed University) (SIU)</institution>
          ,
          <addr-line>Lavale, Pune 412115</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Universidade Autonoma de Tamaulipas</institution>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <issue>2023</issue>
      <abstract>
<p>Knowledge Graph (KG) has emerged as a favored tool in many areas of research and industry. One research area in the KG domain is knowledge graph embedding, which involves mapping entities and relationships to low-dimensional vectors. Many knowledge graph embedding models have been proposed in the literature. However, minimal effort has been made to investigate the reproducibility of these models. This research presents a reproducibility study of four state-of-the-art knowledge graph embedding models, viz. CompGCN, NodePiece, PairRE, and TorusE. The PairRE results are 80% comparable to the corresponding published results on the Hit@10 parameter. On the other hand, for the MRR parameter, TorusE provided findings 95% comparable to the results reported in the accompanying publication. This research also demonstrates that reproducibility is a significant challenge in knowledge graph embedding research and highlights the importance of transparency and standardization in this field.</p>
      </abstract>
      <kwd-group>
        <kwd>co-located with Extended Semantic Web Conference (ESWC)</kwd>
        <kwd>Hersonissos</kwd>
        <kwd>Greece</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Knowledge Graphs are often incomplete; Knowledge Graph Completion (KGC) tries to solve this problem by identifying the missing entities or relations. Most of the research in KGC focuses on finding low-dimensional embeddings for entities and relations. These models are called Knowledge Graph Embedding Models (KGEMs).</p>
      <p>
        Many neural-network-based methods [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5, 6, 7, 8</xref>
] have been proposed for KGEMs, showing
promising results. However, they have a complex scoring function in the form of a black-box
neural network. Due to this black-box nature, it sometimes becomes difficult to correlate
architectural changes with better performance. To alleviate these problems, Ali et al. [9]
performed extensive experimentation over 21 KGEMs with various training approaches,
loss functions, and many other hyperparameters. However, four recent KGEMs, viz.
TorusE [10], PairRE [11], NodePiece [12], and CompGCN [13], are not considered in that research
work.
      </p>
<p>Keeping the research work by Ali et al. [9] as a base for our study, we have performed similar
experimentation on those four KGEMs. Hence this study aims to replicate the published
results under similar circumstances to examine whether embedding models are particularly
successful for link prediction tasks. Thorough experimentation is done on those KGEMs on
four widely popular datasets, and the results are reported on four well-known performance metrics.
The PyKEEN library has been used for the implementation. Our experimental results show that
reproducing the results published in the respective research papers is difficult, and performance
varies drastically with slight changes in hyper-parameters. These observations point to the
need for more study on KGEM algorithms and their evaluation on complex, practical datasets.</p>
<p>This paper is organized in the following way: Section 2 introduces the basics of KGs and
their embedding concepts, along with a discussion of interaction models. Section 3 discusses the
experimental setup, with a detailed discussion of the evaluation metrics and the datasets. Section 4
presents the results of this experimentation along with a discussion and implications. Finally,
the conclusion and future work are discussed in section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Knowledge Graph Embedding Models</title>
<p>KGs are widely used for storing knowledge in the form of vertices and edges. All the
important entities in a text are treated as vertices, while the relation between two entities is
expressed as an edge. Figure 1 shows a sample knowledge graph for drug discovery. Such
KGs can then be processed to derive insights from vast amounts of heterogeneous data or
to perform inference on them. KGs thus possess the possibility of becoming the ‘brains’ of
machines. Hence a KG may be described as a collection of triples (h, r, t), where h and t stand
for the head node and tail node, respectively, and r indicates the relationship between them. If
𝔼 is an entity collection and ℜ is a collection of relations, a KG can be defined as 𝕂 ⊂ 𝔼 × ℜ × 𝔼.</p>
<p>In KGEMs, entities and relations are represented in a vector space, preserving the latent
structure, as demonstrated in figure 2. One can manipulate the KG entities and relations through
vector manipulation. Due to this simplicity of manipulation, KGEMs have recently gained attention
for their applicability in recommendation systems [14, 15], information retrieval [16, 17],
question answering [18], etc. In the subsequent section, the interaction models that have been
considered for the experimentation are explained.</p>
      <sec id="sec-2-1">
        <title>2.1. Interaction model</title>
<p>Interaction models calculate the plausibility of a fact (h, r, t) when embeddings for the head
entity, relation, and tail entity are given. Thus an interaction model can be summarized
as a mapping function f : 𝔼 × ℜ × 𝔼 → ℝ that gives the plausibility score of a triple
(h, r, t) ∈ 𝕂. Generally, interaction models can be categorized into two types, viz., 1. translation
models and 2. semantic matching models.</p>
<p>The Translation Model finds the distance between entities and uses it as a scoring
function. Figure 3 illustrates the translation principle, which models this problem as the
minimization of ‖(h + r) − t‖.</p>
<p>Canonical methods that come under this category are TransE [19] and its variants like
TransR [20], TransH [21], TransD [22]; RotatE [23], HAKE [24], MuRE [25], KG2E [26], and PairRE [11].
All these methods follow the general principle of translation while adopting different
representation spaces.</p>
        <p>
The Semantic Matching Model preserves latent semantics by using a similarity-based scoring
function. Entities and semantically close relations are mapped to nearby points in the vector space,
as demonstrated in figure 4. Usually, semantic matching is done using factorization or a
neural-network approach. RESCAL [27], ComplEx [28], QuatE [29], TuckER [30], DistMult [31],
SimplE [32], etc., are a few factorization-based approaches. Whereas ProjE [33], ERMLP [34],
NTN [35], ConvKB[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], ConvE[
          <xref ref-type="bibr" rid="ref5">5</xref>
], etc., are neural-network-based models.
        </p>
<p>For this study four recent KGEM models, viz. TorusE, PairRE, CompGCN, and NodePiece, are
considered. In the following subsections, these models are explained in depth.</p>
<p>2.1.1. TorusE
Regularization affects the TransE algorithm negatively, as it forces embeddings onto a sphere in
the embedding space, reducing its link prediction capability [36]. TorusE [10] avoids
regularization by projecting the n-dimensional Euclidean space onto an n-dimensional torus.
As shown in figure 5, TorusE also follows the underlying translational principle of TransE, i.e.,
[h] + [r] ≈ [t]. It uses equation 1 as a scoring function on the n-dimensional torus, where d is a
distance function defined on the torus:
f(h, r, t) = d([h] + [r], [t]) (1)</p>
<p>2.1.2. PairRE
Two major challenges in embedding any relation are: 1. handling complex relations, and
2. preserving the inherent properties of relations. To overcome both challenges, the PairRE model
proposed in [11] represents a relation as a vector pair [r^H, r^T]. As shown in figure 6, the
Hadamard products of this pair with the head and tail entity vectors project them in Euclidean
space, and the plausibility of the triple (h, r, t) is calculated from the distance between the
projected vectors. Thus the vector pair [r^H, r^T] is adjusted so that h ∘ r^H ≈ t ∘ r^T if the triple
is present; otherwise, h ∘ r^H and t ∘ r^T should not be close. Minimizing this distance yields the
scoring function in equation 2:
f_r(h, t) = −‖h ∘ r^H − t ∘ r^T‖ (2)</p>
<p>2.1.3. CompGCN
Most multi-relational GCN methods suffer from the over-parameterization problem.
CompGCN [13], which is a generalization of multi-relational GCNs, alleviates this problem
by using the same embedding space for entities and relations and subsequently applying
composition operators on them.</p>
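<p>The PairRE scoring function of equation 2 reduces to an elementwise (Hadamard) product and a norm; the sketch below uses toy vectors, not trained embeddings, to show that matching projections score 0 while mismatched ones score lower.</p>
<preformat>
```python
import numpy as np

def pairre_score(h, t, r_head, r_tail, norm=1):
    """PairRE score f_r(h, t) = -|| h o r_H - t o r_T ||, o = Hadamard product."""
    return -np.linalg.norm(h * r_head - t * r_tail, ord=norm)

h = np.array([0.2, 0.5])
t = np.array([0.4, 0.25])
r_head = np.array([2.0, 1.0])  # the relation vector pair [r_H, r_T]
r_tail = np.array([1.0, 2.0])

# h o r_H = [0.4, 0.5] and t o r_T = [0.4, 0.5]: projections coincide, score 0.
assert pairre_score(h, t, r_head, r_tail) == 0.0
# Perturbing the tail projection strictly lowers the score.
assert pairre_score(h, t, r_head, np.array([1.0, 1.0])) < 0.0
```
</preformat>
<p>Because head and tail are projected by different halves of the relation pair, the same entity vectors can satisfy one-to-many or symmetric relations that a single translation vector cannot.</p>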
<p>Basically, CompGCN views the KG as (𝔼, ℜ, ℤ, 𝕏), where 𝔼 and ℜ are as defined earlier, while
ℤ ∈ ℝ^{|𝔼|×d₀} and 𝕏 ∈ ℝ^{|ℜ|×d₀} denote the d₀-dimensional input features of entities and relations, respectively.</p>
<p>GCN uses equation 3 to update the node embeddings. Since it doesn't involve relation-specific
input features, relations are modeled only through trainable matrices, and it suffers from
over-parameterization:</p>
<p>z_v = f( ∑_{(u,r)∈N(v)} W_r z_u ) (3)
where N(v) : neighbourhood of v,
W_r : trainable matrix representing the relation type.</p>
<p>To reduce the problem of over-parameterization, CompGCN uses a composition operator
Φ over the nodes in a neighborhood with respect to the relation r. For better information flow,
CompGCN assumes bi-directional edges in the graph; hence, the relation set is expanded by
adding an inverse relation for each relation type. To incorporate these points, CompGCN uses
equation 4.</p>
<p>z_v = f( ∑_{(u,r)∈N(v)} W_{λ(r)} Φ(x_u, x_r) ) (4)
where W_{λ(r)} specifies the relation-type parameter, and
x_u and x_r are the initial representations of u and r, respectively.</p>
<p>For an example neighbourhood of a node v with edges (u1, r1, v) and (u2, r2_inv, v),
equation 4 expands as in equation 5:
z_v = f( W_{λ(r1)} Φ(x_u1, x_r1) + W_{λ(r2_inv)} Φ(x_u2, x_r2_inv) ) (5)</p>
<p>2.1.4. NodePiece
Large Language Models (LLMs) like BERT and GPT don't employ shallow embedding by finding
an embedding for every word. Instead, they learn embeddings for a few tokens and derive the
embeddings of other words from the learned ones. This reduces the number of parameters
drastically.</p>
<p>Taking a cue from this, NodePiece [12] uses a few selected entities (anchor nodes) and all
relations as the vocabulary set 𝕍; hence |𝕍| ≪ |𝔼|. Similar to CompGCN, NodePiece also assumes
bi-directional relations; hence the inverse relation of each relation type is added to the relation set.</p>
<p>Each target node is then tokenized in terms of its k nearest anchor nodes, the distances of
those anchors from the target node, and its m incident relations. The encoder function for the
same is given in equation 7:
hash(target node) = [ {a_i}, {Δ_i}, {r_j} ] (7)
where {a_i} and {Δ_i} represent the embeddings of the k nearest anchors and their distances
from the target node, respectively, and {r_j} represents the embeddings of all m incident relations.</p>
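<p>The tokenization feeding equation 7 can be sketched on a toy graph. The graph, anchor choice, and helper names below are assumptions for illustration; only the hash structure (k nearest anchors, their BFS distances, incident relations) follows the description above.</p>
<preformat>
```python
from collections import deque

# Toy KG: (head, relation, tail) edges; a1-a3 are the chosen anchor nodes.
edges = [("n1", "r3", "a1"), ("n1", "r5", "a2"), ("a2", "r7", "a3")]
anchors = {"a1", "a2", "a3"}

def bfs_distances(start):
    """Hop distances from start, treating every edge as bi-directional."""
    adj = {}
    for h, _, t in edges:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)
    dist, queue = {start: 0}, deque([start])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def hash_node(node, k=2):
    """Tokenize node as (k nearest anchors, their distances, incident relations)."""
    dist = bfs_distances(node)
    nearest = sorted((d, a) for a, d in dist.items() if a in anchors)[:k]
    incident = sorted({r for h, r, t in edges if node in (h, t)})
    return [a for _, a in nearest], [d for d, _ in nearest], incident

anchors_k, dists_k, rels_m = hash_node("n1")
```
</preformat>
<p>An encoder (e.g., an MLP or Transformer over these token embeddings) then maps the hash to the node's embedding, so only anchor and relation embeddings need to be stored.</p>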
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental setup</title>
<p>The models selected for this study belong to different categories; separate experimental setups
are considered by looking at the advantages and prerequisites of each model. In every experiment,
the KG is split into training and testing parts, and then Hyper-Parameter Optimization
(HPO) is performed on the TorusE, PairRE, CompGCN, and NodePiece models. The detailed
experimental setup is designed as prescribed by Ali et al. [37] and is also depicted in figure 9.</p>
<p>Datasets: Four well-known datasets are used in this research.</p>
<p>1. Kinships: The Kinships dataset [38] has 104 entities representing the people of a
tribe, who are related to each other through 26 different relationships. It has
10,686 triples.
2. Nations: The Nations dataset [39] is one of the oldest knowledge graph datasets,
comprising 14 countries and their various relationships.
3. WN18: The WN18 dataset [40] is part of the WordNet dataset. There are 40,943 synsets (i.e.,
entities) and 18 conceptual semantic relationships. WN18 has a test leakage; the inverse
relations present in WN18 were removed to form the WN18RR dataset [41]. Though
WN18RR is a better version of WN18, most of the methods have used WN18; hence, for
consistency purposes, we have used the WN18 dataset for detailed analysis.
4. FB15k-237: Since FB15K [19] also has a similar leakage issue, the FB15k-237 dataset [42] was
formed by removing the inverse relations. It has 14,541 entities and 237 relations.</p>
<p>Evaluation Metric: KGEMs are mostly evaluated using link prediction tasks. There are
many metrics available to evaluate knowledge graph embeddings; however, Mean Rank (MR),
Adjusted Mean Rank (AMR), Mean Reciprocal Rank (MRR), and Hits@K are used most
frequently in the literature.</p>
<p>Mean Rank: MR is the arithmetic mean of the ranks of all triples (h, r, t) ∈ 𝕂_test. It is given in
equation 8:
MR = (1 / |𝕂_test|) ∑_{t ∈ 𝕂_test} rank(t) (8)</p>
<p>Adjusted Mean Rank: As explained in [43], MR is flawed, since getting a low rank is easy when
there are few possible candidates. AMR neutralizes this by taking the ratio of MR to the
expected mean rank, making it useful for comparing results across different-sized datasets.</p>
<p>Mean Reciprocal Rank: MRR, also known as the inverse harmonic mean rank, is the mean of
the reciprocals of the ranks, as defined in equation 9:
MRR = (1 / |𝕂_test|) ∑_{t ∈ 𝕂_test} 1 / rank(t) (9)</p>
<p>[44] and [45] argue that MRR is theoretically incorrect, as ranks are on an ordinal scale and
taking their reciprocal is wrong. However, these arguments are countered by [46]. Moreover,
MRR is a frequently used metric. Unlike the Hits@K metric, it doesn't ignore changes in
high rank values. At the same time, unlike MR, it is more sensitive to changes in low-rank
values than in high-rank values. Thus it gives more importance to small ranks while remaining
less affected by outliers.</p>
<p>Hits@K: Hits@K (mostly K ∈ {1, 3, 5, 10}) is a simple metric that measures the ratio of test
triples appearing in the top K entries. Mathematically it can be represented as equation 10:
Hits@K = |{ t ∈ 𝕂_test | rank(t) ⩽ K }| / |𝕂_test| (10)</p>
<p>One of the biggest drawbacks of the Hits@K metric is that it considers the entries appearing in
the top K ranks but ignores all the cases where rank(t) &gt; K. In consequence, it makes no
difference to Hits@K whether rank(t) = K + 1 or K + n with n ≫ 1. Thus Hits@K alone is of
limited use for comparing different models. However, most published articles include Hits@K
as an evaluation metric; hence, this metric has been used for the reproducibility study.
With this experimental setup, a series of trials has been performed. The following section
discusses the outcome of these trials.</p>
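<p>All four metrics of equations 8-10 can be computed directly from the list of ranks assigned to the test triples. The ranks below are toy values; the candidate count of 14 (used for the AMR expectation) is borrowed from the 14 entities of the Nations dataset purely as an example.</p>
<preformat>
```python
ranks = [1, 2, 4, 10, 50]  # one rank per test triple (toy values)

mr = sum(ranks) / len(ranks)                   # Mean Rank (eq. 8)
mrr = sum(1 / r for r in ranks) / len(ranks)   # Mean Reciprocal Rank (eq. 9)

def hits_at(k):                                # Hits@K (eq. 10)
    return sum(r <= k for r in ranks) / len(ranks)

# AMR: MR divided by the mean rank expected from uniform random guessing,
# which is (n + 1) / 2 for n candidate entities.
num_candidates = 14
amr = mr / ((num_candidates + 1) / 2)

assert hits_at(10) == 0.8 and hits_at(1) == 0.2
```
</preformat>
<p>Note how the single outlier rank 50 dominates MR (13.4) but barely moves MRR (≈0.374), matching the robustness argument made for MRR above.</p>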
    </sec>
    <sec id="sec-4">
      <title>4. Result and Discussions</title>
<p>The objective of this research was to assess whether the findings of four current knowledge graph
embedding techniques are reproducible, and to provide comprehensive results of these models for
comparability with newer models. To assess reproducibility, this study replicated
the key findings from TorusE, PairRE, CompGCN, and NodePiece. The analysis revealed that results
are reproducible only in some cases, indicating a substantial challenge in reproducing results
across scientific research. This section is divided into two parts. Section 4.1 compares the
obtained results with published results and discusses the probable reasons for the deviations.
Section 4.2 gives the results for the selected models for the various datasets and
evaluation metrics.</p>
      <sec id="sec-4-1">
        <title>4.1. Comparison with published results</title>
<p>The PyKEEN framework, a Python toolkit for KGEMs, has been used for all the work. In addition,
the Optuna package is used for hyperparameter optimization. The evaluation metrics used are
rank-based; hence it becomes necessary to select a strategy to break ties. Due to the
absence of any strategy reported in the selected publications, the 'realistic rank' method on
'head' and 'tail' prediction is used.</p>
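<p>Since the realistic rank is defined as the mean of the optimistic and pessimistic ranks, tie-breaking can be illustrated in a few lines. The scores below are toy values; only the three rank definitions are taken from the description in this paper.</p>
<preformat>
```python
def ranks(true_score, candidate_scores):
    """Optimistic, pessimistic, and realistic rank of the true triple's score.

    candidate_scores includes the true triple's own score; higher is better.
    """
    better = sum(s > true_score for s in candidate_scores)
    equal = sum(s == true_score for s in candidate_scores)
    optimistic = better + 1            # true triple ranked first among its ties
    pessimistic = better + equal       # true triple ranked last among its ties
    realistic = (optimistic + pessimistic) / 2
    return optimistic, pessimistic, realistic

# Three candidates (including the true triple) tie at score 0.7:
scores = [0.9, 0.7, 0.7, 0.7, 0.3]
assert ranks(0.7, scores) == (2, 4, 3.0)
```
</preformat>
<p>With many ties, the optimistic and pessimistic ranks diverge sharply, which is why an unreported tie-breaking strategy alone can make two otherwise identical evaluations disagree.</p>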
<p>In this research, 160 experiments, ten for each model-dataset pair, were carried out.
Each experiment was repeated multiple times for more reliable reporting. Moreover, the CompGCN
and NodePiece methods were also used to train a model on the WN18RR dataset, but the results on
WN18RR and WN18 were found to be similar. Table 2 shows the best results obtained for the
respective setups (results in brackets use the WN18RR dataset). For the FB15k-237 dataset, the
results for the PairRE method are 80% and 70% of the published results for Hit@10 and MRR,
respectively. Moreover, with 95% comparable results, the TorusE method has also performed
exceptionally well with respect to the MRR parameter. However, for the other methods the results
were not near the published results. Reproducing the same result under similar conditions is
challenging. Also, due to different implementations and various interpretations of the link
prediction evaluation metrics, it becomes difficult to compare two previously published
results [9]. The source code and results obtained have been made available at
https://github.com/bhushan-zope/ReproducibilityStudy. The difference between the observed and
reported results could be attributed to the various factors discussed below.</p>
<p>1. Difference in ranking approach: As discussed in [47], [48], various authors have
implemented the ranking metrics differently. If more than one triple has the same rank,
which one should be ranked higher is an important question to answer, and it affects the
overall scores. Yet none of the authors has declared their ranking approach in the publication.
In this research, a realistic ranking approach is used, which is the mean of the
pessimistic and optimistic ranking approaches.
2. Difference in implementation: This complete study is based on the PyKEEN
library, whereas the respective authors implemented their methods independently.
3. Missing finer details: In most cases, the hyper-parameter values have not been reported.</p>
<p>Additionally, some publications have used grid search for hyper-parameters. However,
the best configuration is not reported, leaving scope for various possible combinations to
experiment with. These hyper-parameters, initialized to different values, can significantly
affect the results [9].</p>
<p>Additionally, WN18 is based on WordNet, which consists of English words and their semantic
relationships; it mainly consists of semantically similar relations. Hence, due to homogeneous
relations (i.e., many relations of the same type) and data imbalance (i.e., some relations occur
more frequently), the results of all the methods are better on the FB15k-237 dataset than on the
WN18 dataset.</p>
<p>Moreover, figure 10 represents the central tendency of the results obtained. From figures 10a
and 10b, it is evident that the results for CompGCN and NodePiece lie away from the
central tendency. The interquartile range (IQR) indicates the reliability of the results: more
points clustered around the median give a narrow IQR, indicating that the data are more reliable
or more representative of the population. Hence, the wide IQR for TorusE (as shown in figures 10c
and 10d) indicates that the results obtained might not be very reliable; on the other hand, the
narrow IQR for CompGCN, NodePiece, and (up to a certain extent) PairRE makes their results
more reliable. A highly skewed distribution around the median also indicates poor quality of
the results.</p>
<p>[Figure 10 panels: (a) Hit@10, (b) MRR]</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Comprehensive results</title>
<p>Due to the availability of various performance parameters and datasets, researchers in the KGEM
domain have used various combinations of them. As a result, it becomes practically impossible
to compare two KGEMs. To eliminate this problem, another objective of this study was
to present the results of the selected models on more datasets with various parameters. Section
3 explains the considered datasets and evaluation metrics in detail. This section discusses
the results of the TorusE, PairRE, CompGCN, and NodePiece methods with those datasets and
evaluation metrics.</p>
        <p>Results listed in appendix A show that the models’ performance varied significantly depending
on the dataset and the performance parameter being used. Figure 11 shows the complete results
of all the models on all four datasets using all four parameters. The blue line in figure 11
represents the results of CompGCN, whereas the red, yellow, and green lines represent the
results for NodePiece, PairRE, and TorusE, respectively.</p>
<p>Unlike MR and AMR, larger values represent better results for the Hit@10 and MRR parameters.
Thus, in figures 11a and 11b the yellow line covers most of the area, while in figures 11c
and 11d it covers the smallest area compared to the other three lines, representing
an excellent performance of the PairRE method on all the datasets. Similar observations can be
drawn for the other models from figure 11. Moreover, it is surprising to notice the inconsistent
gain on some datasets. For example, PairRE shows a ≈30% increase in the Hit@10 metric
compared to TorusE on the WN18 and Kinships datasets, but only a 3% increase on the
FB15k-237 dataset and a 1% decrease on the Nations dataset. This also leads to less confidence
in the methods.</p>
<p>Overall, the findings can be summarized as follows:
1. The results of the PairRE method were satisfactorily close to the results reported in its
publication, Chao et al. [11]. However, the results of the other methods were difficult to reproduce.
2. Compared to the other three methods, PairRE performed outstandingly well on all datasets
considering all metrics.
3. TorusE is more suitable for smaller datasets, and its performance degrades on more
complex datasets.</p>
<p>These findings suggest that the choice of model and performance parameter can significantly
impact the effectiveness of knowledge graph embedding models. Our study provides valuable
insights for researchers and practitioners seeking to use these models for knowledge graph
completion tasks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
<p>In this work, the reproducibility of knowledge graph embedding models was examined. The study's
main objective was to replicate the findings of four cutting-edge embedding models on four
benchmark datasets. The findings showed that several factors, such as the selection of
hyperparameters, the evaluation dataset, and the implementation specifics, significantly impact the
reproducibility of these models. To help ensure reproducibility, the study emphasizes the
significance of including thorough explanations of the experimental setup and hyperparameters
in research articles. It also highlights the necessity of uniformity in embedding-model evaluation
to enable comparison and benchmarking across various research works.</p>
<p>In summary, the reproducibility analysis of knowledge graph embedding models offers
insightful information on the challenges and opportunities of this emerging research area. This
work will serve as a catalyst for future studies to enhance the consistency and dependability of
knowledge graph embedding models, which will ultimately result in better comprehension and
implementation of knowledge graphs in many contexts.</p>
<p>[6] L. Guo, Z. Sun, W. Hu, Learning to exploit long-term relational dependencies in knowledge
graphs, ArXiv abs/1905.04914 (2019).
[7] X. Jiang, Q. Wang, B. Wang, Adaptive convolution for multi-relational learning, in:
Proceedings of the 2019 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short
Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp.
978–987. URL: https://aclanthology.org/N19-1103. doi:10.18653/v1/N19-1103.
[8] D. Q. Nguyen, T. Vu, T. D. Nguyen, D. Q. Nguyen, D. Q. Phung, A capsule network-based
embedding model for knowledge graph completion and search personalization, CoRR
abs/1808.04122 (2018). URL: http://arxiv.org/abs/1808.04122. arXiv:1808.04122.
[9] M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, M. Galkin, S. Sharifzadeh, A. Fischer, V. Tresp,
J. Lehmann, Bringing light into the dark: A large-scale evaluation of knowledge graph
embedding models under a unified framework, IEEE Transactions on Pattern Analysis
and Machine Intelligence (2021).
[10] T. Ebisu, R. Ichise, TorusE: Knowledge graph embedding on a Lie group, in: Proceedings
of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[11] L. Chao, J. He, T. Wang, W. Chu, PairRE: Knowledge graph embeddings via paired
relation vectors, in: Proceedings of the 59th Annual Meeting of the Association
for Computational Linguistics and the 11th International Joint Conference on
Natural Language Processing (Volume 1: Long Papers), Association for Computational
Linguistics, Online, 2021, pp. 4360–4369. URL: https://aclanthology.org/2021.acl-long.336.
doi:10.18653/v1/2021.acl-long.336.
[12] M. Galkin, E. Denis, J. Wu, W. L. Hamilton, NodePiece: Compositional and
parameter-efficient representations of large knowledge graphs, in: International Conference on
Learning Representations, 2022. URL: https://openreview.net/forum?id=xMJWUKJnFSw.
[13] S. Vashishth, S. Sanyal, V. Nitin, P. Talukdar, Composition-based multi-relational graph
convolutional networks, in: International Conference on Learning Representations, 2020.</p>
      <p>URL: https://openreview.net/forum?id=BylA_C4tPr.
[14] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: A survey of approaches
and applications, IEEE Transactions on Knowledge and Data Engineering 29 (2017)
2724–2743. doi:1 0 . 1 1 0 9 / T K D E . 2 0 1 7 . 2 7 5 4 4 9 9 .
[15] H. Wang, F. Zhang, M. Zhao, W. Li, X. Xie, M. Guo, Multi-task feature learning for
knowledge graph enhanced recommendation, in: The World Wide Web Conference,
WWW ’19, Association for Computing Machinery, New York, NY, USA, 2019, p. 2000–2010.</p>
      <p>URL: https://doi.org/10.1145/3308558.3313411. doi:1 0 . 1 1 4 5 / 3 3 0 8 5 5 8 . 3 3 1 3 4 1 1 .
[16] R. Reinanda, E. Meij, M. de Rijke, Knowledge Graphs: An Information Retrieval Perspective,
2020.
[17] S. Tiwari, F. Ortiz-Rodriguez, B. Villazon, Guest editorial: Special issue on “current topics
of knowledge graphs and semantic web”, International Journal of Web Information Systems
18 (2022) 237–239.
[18] B. Zope, S. Mishra, K. Shaw, D. R. Vora, K. Kotecha, R. V. Bidwe, Question answer
system: A state-of-art representation of quantitative and qualitative analysis, Big Data
and Cognitive Computing 6 (2022). URL: https://www.mdpi.com/2504-2289/6/4/109. doi:1 0 .
3 3 9 0 / b d c c 6 0 4 0 1 0 9 .
[19] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings
for modeling multi-relational data, in: C. Burges, L. Bottou, M. Welling, Z.
Ghahramani, K. Weinberger (Eds.), Advances in Neural Information Processing Systems,
volume 26, Curran Associates, Inc., 2013. URL: https://proceedings.neurips.cc/paper/2013/file/
1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf.
[20] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for
knowledge graph completion, in: Proceedings of the Twenty-Ninth AAAI Conference on
Artificial Intelligence, AAAI’15, AAAI Press, 2015, p. 2181–2187.
[21] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on
hyperplanes, Proceedings of the AAAI Conference on Artificial Intelligence 28 (2014). URL:
https://ojs.aaai.org/index.php/AAAI/article/view/8870. doi:1 0 . 1 6 0 9 / a a a i . v 2 8 i 1 . 8 8 7 0 .
[22] G. Ji, S. He, L. Xu, K. Liu, J. Zhao, Knowledge graph embedding via dynamic mapping
matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on Natural Language Processing
(Volume 1: Long Papers), Association for Computational Linguistics, Beijing, China, 2015,
pp. 687–696. URL: https://aclanthology.org/P15-1067. doi:1 0 . 3 1 1 5 / v 1 / P 1 5 - 1 0 6 7 .
[23] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, RotatE: Knowledge graph embedding by relational
rotation in complex space, in: International Conference on Learning Representations,
2019. URL: https://openreview.net/forum?id=HkgEQnRqYQ.
[24] Z. Zhang, J. Cai, Y. Zhang, J. Wang, Learning hierarchy-aware knowledge graph
embeddings for link prediction, Proceedings of the AAAI Conference on Artificial
Intelligence 34 (2020) 3065–3072. URL: https://ojs.aaai.org/index.php/AAAI/article/view/5701.
doi:10.1609/aaai.v34i03.5701.
[25] I. Balažević, C. Allen, T. Hospedales, Multi-relational Poincaré graph embeddings, in:</p>
      <p>Advances in Neural Information Processing Systems, 2019.
[26] S. He, K. Liu, G. Ji, J. Zhao, Learning to represent knowledge graphs with gaussian
embedding, in: Proceedings of the 24th ACM International on Conference on Information
and Knowledge Management, CIKM ’15, Association for Computing Machinery, New York,
NY, USA, 2015, pp. 623–632. URL: https://doi.org/10.1145/2806416.2806502. doi:10.1145/2806416.2806502.
[27] M. Nickel, V. Tresp, H.-P. Kriegel, A three-way model for collective learning on
multirelational data, in: Proceedings of the 28th International Conference on International
Conference on Machine Learning, ICML’11, Omnipress, Madison, WI, USA, 2011, pp. 809–816.
[28] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, G. Bouchard, Complex embeddings for simple
link prediction, in: Proceedings of the 33rd International Conference on International
Conference on Machine Learning - Volume 48, ICML’16, JMLR.org, 2016, pp. 2071–2080.
[29] S. Zhang, Y. Tay, L. Yao, Q. Liu, Quaternion Knowledge Graph Embeddings, Curran</p>
      <p>Associates Inc., Red Hook, NY, USA, 2019.
[30] I. Balazevic, C. Allen, T. Hospedales, TuckER: Tensor factorization for knowledge graph
completion, in: Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong,
China, 2019, pp. 5185–5194. URL: https://aclanthology.org/D19-1522. doi:10.18653/v1/D19-1522.
[31] B. Yang, W. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and
inference in knowledge bases, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference
on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference
Track Proceedings, 2015. URL: http://arxiv.org/abs/1412.6575.
[32] S. M. Kazemi, D. Poole, Simple embedding for link prediction in knowledge graphs,
in: Proceedings of the 32nd International Conference on Neural Information Processing
Systems, NIPS’18, Curran Associates Inc., Red Hook, NY, USA, 2018, pp. 4289–4300.
[33] B. Shi, T. Weninger, ProjE: Embedding projection for knowledge graph completion, in:
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, AAAI
Press, 2017, pp. 1236–1242.
[34] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun,
W. Zhang, Knowledge vault: A web-scale approach to probabilistic knowledge
fusion, in: Proceedings of the 20th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD ’14, Association for Computing Machinery,
New York, NY, USA, 2014, pp. 601–610. URL: https://doi.org/10.1145/2623330.2623623. doi:10.1145/2623330.2623623.
[35] R. Socher, D. Chen, C. D. Manning, A. Y. Ng, Reasoning with neural tensor networks for
knowledge base completion, in: Proceedings of the 26th International Conference on
Neural Information Processing Systems - Volume 1, NIPS’13, Curran Associates Inc., Red
Hook, NY, USA, 2013, pp. 926–934.
[36] S. Ji, S. Pan, E. Cambria, P. Marttinen, P. S. Yu, A survey on knowledge graphs:
Representation, acquisition, and applications, IEEE Transactions on Neural Networks and Learning
Systems 33 (2022) 494–514. doi:10.1109/TNNLS.2021.3070843.
[37] M. Ali, H. Jabeen, C. T. Hoyt, J. Lehmann, The KEEN Universe, in: C. Ghidini, O. Hartig,
M. Maleshkova, V. Svátek, I. Cruz, A. Hogan, J. Song, M. Lefrançois, F. Gandon (Eds.), The
Semantic Web – ISWC 2019, Springer International Publishing, Cham, 2019, pp. 3–18.
[38] W. W. Denham, The detection of patterns in Alyawarra nonverbal behavior, 2014.
[39] R. J. Rummel, Dimensionality of nations project: Nation attribute data, 1950-1965, 1992.</p>
      <p>URL: https://doi.org/10.3886/ICPSR05020.v1. doi:1 0 . 3 8 8 6 / I C P S R 0 5 0 2 0 . v 1 .
[40] A. Bordes, X. Glorot, J. Weston, Y. Bengio, A semantic matching energy function for
learning with multi-relational data, Machine Learning 94 (2014) 233–259. URL: https://doi.org/10.1007/s10994-013-5363-6. doi:10.1007/s10994-013-5363-6.
[41] K. Toutanova, D. Chen, Observed versus latent features for knowledge base and text
inference, in: Proceedings of the 3rd Workshop on Continuous Vector Space Models and
their Compositionality, Association for Computational Linguistics, Beijing, China, 2015,
pp. 57–66. URL: https://aclanthology.org/W15-4007. doi:10.18653/v1/W15-4007.
[42] K. Toutanova, D. Chen, Observed versus latent features for knowledge base and text
inference, in: Proceedings of the 3rd Workshop on Continuous Vector Space Models and
their Compositionality, Association for Computational Linguistics, Beijing, China, 2015,
pp. 57–66. URL: https://aclanthology.org/W15-4007. doi:10.18653/v1/W15-4007.
[43] M. Berrendorf, E. Faerman, L. Vermue, V. Tresp, Interpretable and fair comparison of link
prediction or entity alignment methods with adjusted mean rank, CoRR abs/2002.06914
(2020). URL: https://arxiv.org/abs/2002.06914. arXiv:2002.06914.
[44] N. Fuhr, Some common mistakes in IR evaluation, and how they can be avoided, SIGIR</p>
    </sec>
    <sec id="sec-6">
      <title>A. Appendix</title>
      <p>Result tables: H@10 and MR scores for the CompGCN, NodePiece, PairRE, and TorusE models.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <article-title>Knowledge graph completion: A review</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>192435</fpage>
          -
          <lpage>192456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>29</volume>
          (
          <year>2017</year>
          )
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          .
          doi:10.1109/TKDE.2017.2754499.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Al-Aswadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gaurav</surname>
          </string-name>
          ,
          <article-title>Recent trends in knowledge graphs: theory and practice</article-title>
          ,
          <source>Soft Computing</source>
          <volume>25</volume>
          (
          <year>2021</year>
          )
          <fpage>8337</fpage>
          -
          <lpage>8355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. Q.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Q.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Phung</surname>
          </string-name>
          ,
          <article-title>A novel embedding model for knowledge base completion based on convolutional neural network</article-title>
          ,
          <source>in: NAACL-HLT (2)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>333</lpage>
          . URL: https://aclanthology.info/papers/N18-2053/n18-2053.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Minervini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Convolutional 2d knowledge graph embeddings</article-title>
          ,
          <source>in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence</source>
          , AAAI'18/IAAI'18/EAAI'18, AAAI Press,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>