Data Augmented Knowledge Graph Completion via Pre-trained Language Models

Shixuan Liu1, Pouya Ghiasnezhad Omran2 and Kerry Taylor2
1 China Telecom Corporation Ltd. Data&AI Technology Company, 1301, 13/F, Building 1, No. 31 Financial Street, Xicheng District, Beijing, China
2 Australian National University, ACT 2601, Canberra, Australia

Abstract
Knowledge graphs provide significant assistance for many artificial intelligence tasks, but they are usually incomplete. Techniques for knowledge graph completion can improve the coverage of Knowledge Graphs (KGs) by inducing new facts. Traditional methods for completion use structural representations in embedding space, but textual information can also be helpful. Recently, pre-trained language models have shown impressive performance on natural language processing tasks. KG-BERT is a pre-trained language model applied to knowledge graph completion, and it achieves appealing performance. However, KG-BERT struggles when the number of facts is inadequate. We consider the inadequacy of data for various relations and compensate for this sparsity via data augmentation. We propose two knowledge graph data augmentation methods that generate facts with novel relations. Specifically, multi-hop relations between two entities are extracted to form multi-hop facts, and implicit relations are generated from Horn rules. Moreover, we find that multi-hop facts are useful in few-shot learning scenarios. Our system improves the accuracy of KG-BERT on the link prediction task. The experimental results demonstrate that our models significantly enhance KG-BERT on several knowledge graph completion benchmarks (e.g., WN18RR and UMLS).

Keywords
Knowledge Graph Completion, Pre-trained Language Models, Data Augmentation

ISWC 2023 Posters and Demos: 22nd International Semantic Web Conference, November 6-10, 2023, Athens, Greece
liusx14@chinatelecom.cn (S. Liu); p.g.omran@anu.edu.au (P. G. Omran); kerry.taylor@anu.edu.au (K. Taylor)
ORCID: 0000-0002-4473-3877 (P. G. Omran); 0000-0003-2447-1088 (K. Taylor)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, http://ceur-ws.org), ISSN 1613-0073.

1. Introduction
Knowledge graphs (KGs) are effective structures for machine learning, composed of 'triples': a pair of entities linked by a relation. Because KGs are inherently incomplete, efforts are made to complete them by inferring plausible triples. Transformer-based models like BERT [2] have excelled at transferring pre-training knowledge to specific tasks. KG-BERT [1] is a notable development in this field: it employs pre-trained language models for KG Completion (KGC) by fine-tuning BERT on the concatenated text descriptions of triples and using the contextualized embedding to evaluate the plausibility of a potential fact.

Although KG-BERT performs well on the link prediction task, it can be further improved. Due to the incompleteness of knowledge graphs, some potential relationships remain unexplored. For example, consider the two triples (Anthony Albanese, born_in_city, Sydney) and (Sydney, located_in, Australia). From these two triples, the relationship between Anthony Albanese and Australia can be represented directly by an alternative relation that may not appear in the knowledge graph, i.e., nationality_of.
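To make this intuition concrete, the short sketch below (purely illustrative; the entities, relations, and the composed relation name are toy examples, not drawn from any benchmark) joins triples on their shared entity to surface such candidate two-hop relationships:

from collections import defaultdict

# A tiny illustrative knowledge graph as (head, relation, tail) triples.
triples = [
    ("Anthony Albanese", "born_in_city", "Sydney"),
    ("Sydney", "located_in", "Australia"),
]

# Index outgoing edges by head entity so two-hop paths can be followed.
out_edges = defaultdict(list)
for h, r, t in triples:
    out_edges[h].append((r, t))

# Compose facts that share an intermediate entity: (h, r1, e) and (e, r2, t)
# suggest a candidate relation "r1 / r2" holding directly between h and t.
for h, r1, e in triples:
    for r2, t in out_edges[e]:
        print((h, r1 + " / " + r2, t))
# -> ('Anthony Albanese', 'born_in_city / located_in', 'Australia')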
Besides, the potential relations exposed by rules can also provide further information. For example, consider the rule co_work(X, Y) ← work_at(X, A) ∧ work_at(Y, A), where X, Y, A can be any entities that satisfy this rule and co_work and work_at are existing relations. Not all entities satisfying the rule's body will satisfy its head, so a rule usually carries a confidence score indicating its reliability. Suppose we want a relation that describes (X, Y) and is semantically similar to co_work, such as a relation expressing working in the same company. This kind of relation is a loose, implicit expression of the original one, and it is worthwhile to investigate how to generate relations like it. While rule learners can mine rules with relatively low confidence, we cannot apply all these rules to the KG directly and treat all inferred facts as valid.

In this paper, we improve Knowledge Graph-BERT (KG-BERT) with two innovative data augmentation techniques. First, a multi-hop data augmentation approach boosts overall performance and offers an effective solution for few-shot learning. Second, we create implicit facts using AnyBURL-generated Horn rules; such facts express implicit information by constructing similar relations via dropout. Together, these strategies significantly improve the robustness and performance of KG-BERT.

2. Related work
KG-BERT: BERT [2], a landmark pre-trained language model, has achieved great performance on multiple NLP tasks. KG-BERT is an application of pre-trained language models to the knowledge graph completion task [1]. KG-BERT adapts the fine-tuning process to suit knowledge graph completion. For a triple $(h, r, t)$, we concatenate the head entity, relation, and tail entity as a sequence of tokens and encode them into word embeddings, represented as $W_h = (w^h_1, w^h_2, \ldots, w^h_{l_h})$, $W_r = (w^r_1, w^r_2, \ldots, w^r_{l_r})$ and $W_t = (w^t_1, w^t_2, \ldots, w^t_{l_t})$ respectively, where $l_h$, $l_r$ and $l_t$ are the lengths of the head entity, relation, and tail entity. Eventually, we concatenate the sequences with the special tokens [CLS] and [SEP] as $\tilde{W} = [w_{[CLS]}, W_h, w_{[SEP]}, W_r, w_{[SEP]}, W_t, w_{[SEP]}]$. Meanwhile, different segment ids (0 and 1) are used to distinguish entities from the relation, as $[w_{[CLS]}(0), W_h(0), w_{[SEP]}(0), W_r(1), w_{[SEP]}(1), W_t(0), w_{[SEP]}(0)]$. The head and tail entities share the same segment embeddings (type 0), which differ from the segment embeddings of the relation (type 1). The pooled output of [CLS] in the last hidden state is taken as the representation of the whole triple and fed into a binary classifier to determine whether the triple is valid. In prediction, the positive probability of triples is used for ranking.
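As a minimal sketch of this input construction (not the official KG-BERT implementation; it uses the Hugging Face transformers library with bert-base-uncased, invented entity and relation text, and an untrained classification head that would still need fine-tuning), a triple can be packed into a single sequence with segment id 0 for the entities and 1 for the relation:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def encode_triple(head_text, rel_text, tail_text):
    # Tokenize each component of the triple separately.
    h = tokenizer.tokenize(head_text)
    r = tokenizer.tokenize(rel_text)
    t = tokenizer.tokenize(tail_text)
    # [CLS] head [SEP] relation [SEP] tail [SEP]
    tokens = ["[CLS]"] + h + ["[SEP]"] + r + ["[SEP]"] + t + ["[SEP]"]
    # Segment ids: 0 for both entities, 1 for the relation span.
    segments = [0] * (len(h) + 2) + [1] * (len(r) + 1) + [0] * (len(t) + 1)
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    return torch.tensor([input_ids]), torch.tensor([segments])

input_ids, segment_ids = encode_triple("Anthony Albanese", "born in city", "Sydney")
# The [CLS] representation feeds a binary classifier; after fine-tuning,
# the positive-class probability is used to rank candidate triples.
logits = model(input_ids=input_ids, token_type_ids=segment_ids).logits
score = torch.softmax(logits, dim=-1)[0, 1]

The sketch only shows how a triple becomes a BERT input; in practice the whole model is fine-tuned on positive triples and corrupted negatives before the score is meaningful.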
Rule Learning: Rule-based models such as AnyBURL [3] can not only complete the knowledge graph but also mine rules, which may themselves be rich in information [4]. AnyBURL is based on sampling paths and generalizing them to Horn rules. Each generated ground path rule has the form $h(c_0, c_1) \leftarrow b_1(c_1, c_2), \ldots, b_n(c_n, c_{n+1})$, where $h, b_1, \ldots, b_n \in \mathcal{R}$ and $c_0, c_1, \ldots, c_{n+1} \in \mathcal{E}$. The head of the rule is $h(\ldots)$, while the body is $b_1(\ldots)$ to $b_n(\ldots)$. There are two types of generated rules, cyclic ($c_0 = c_{n+1}$) and acyclic ($c_0 \neq c_{n+1}$). The quality of early sampling can be improved by reinforcement learning. Meanwhile, the summarized rules and their corresponding confidence scores are applied for knowledge graph completion.

Figure 1: Flowchart showing how augmented data complements language model-based knowledge graph completion.

3. Augmentation for Language Model-based KGC
The flowchart in Figure 1 shows how the original data and the two kinds of augmented data are fed into language models. For an LM-based KGC model such as KG-BERT, we directly input the original knowledge graph $KG$ into the model. For data augmentation via multi-hops, the multi-hop triples are extracted as a knowledge graph $KG_{MT}$ and fed into the model. For data augmentation via rules, the implicit triples are extracted as a knowledge graph $KG_I$. Unlike $KG$ and $KG_{MT}$, it is first converted to word embeddings, dropout is applied, and the result is then fed into the model.

Augmentation via Multi-hop Facts: For entities $h$ and $t$, assume a series of $n$ triples such as $\{(h, r_1, e_1), (e_1, r_2, e_2), \ldots, (e_{n-1}, r_n, t)\}$. We can present their relation in a multi-hop format, $h \xrightarrow{r_1} e_1 \xrightarrow{r_2} e_2 \cdots e_{n-1} \xrightarrow{r_n} t$. In order to make multi-hop triples adaptable to pre-trained language models, we ignore the intermediate entities in the relation and obtain the triple $(h, r_1 \oplus r_2 \oplus \ldots \oplus r_n, t)$. We convert such triples into a suitable format and call this method Data Augmentation via Multi-Hop (DAMH). Assume a multi-hop triple has $k$ relations, $(h, r_1 \oplus \ldots \oplus r_k, t)$; then the text sequence $\tilde{W}_{\mathrm{DAMH}}$ can be represented as shown in formula 1. We then input $\tilde{W}_{\mathrm{DAMH}}$ into a pre-trained language model and take the pooled embedding $C_{\mathrm{DAMH}}$ as the representation of the entire multi-hop triple. Moreover, constructing 2-hop triples for relations that appear infrequently in the training dataset can effectively improve performance, which we call few-shot learning in knowledge graph completion [5].

$\tilde{W}_{\mathrm{DAMH}} = [w_{[CLS]}, W_h, w_{[SEP]}, W_{r_1}, w_{[SEP]}, \ldots, W_{r_k}, w_{[SEP]}, W_t, w_{[SEP]}]$   (1)

Augmentation via Horn Rules: Rule-based models such as AnyBURL can find potential triples by exploring Horn rules. The information implied in these rules can also be used to generate augmented triples. By exploring implicit patterns in a knowledge graph, a rule-based model can obtain rules such as co_work(X, Y) ← work_at(X, A) ∧ work_at(Y, A), where co_work(X, Y) is the head and work_at(X, A) ∧ work_at(Y, A) is the body. However, not all entities X, Y, A satisfying the body yield the head. If we need to find an implicit relation that can describe these entity pairs, it should have some similarity to the original relation. Inspired by SimCSE [6], there is a simple and effective way to construct implicit relations by dropout. We directly apply dropout to the relation embeddings to create implicit relations, which we refer to as Data Augmentation via Implicit relations (DAI). As shown in formulas 2 and 3, dropout is applied to the relation token embeddings, and the resulting token embeddings are added to the segment and position embeddings to form the new input embeddings. We find that even for graphs with only low-confidence rules, this approach yields improvements.

$\tilde{W}_{\mathrm{token}} = [w_{[CLS]}, W_h, w_{[SEP]}, \mathrm{Dropout}(W_r, w_{[SEP]}), W_t, w_{[SEP]}]$   (2)

$\tilde{W}_{\mathrm{DAI}} = \tilde{W}_{\mathrm{token}} + \tilde{W}_{\mathrm{segment}} + \tilde{W}_{\mathrm{position}}$   (3)
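A possible realization of formulas 2 and 3 is sketched below. It is a hedged illustration rather than our released code: the dropout rate, the bert-base-uncased checkpoint, and the entity and relation text are assumptions made for the example. It applies dropout only to the relation span of the word embeddings and lets BERT's embedding layer add the segment and position embeddings:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
dropout = torch.nn.Dropout(p=0.1)  # the dropout rate is an assumption for illustration

def dai_forward(head_text, rel_text, tail_text):
    # Same [CLS] h [SEP] r [SEP] t [SEP] layout as the original KG-BERT input.
    h = tokenizer.tokenize(head_text)
    r = tokenizer.tokenize(rel_text)
    t = tokenizer.tokenize(tail_text)
    tokens = ["[CLS]"] + h + ["[SEP]"] + r + ["[SEP]"] + t + ["[SEP]"]
    ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    # Word (token) embeddings for the whole sequence.
    word_emb = bert.embeddings.word_embeddings(ids)
    # Dropout over the relation span and its trailing [SEP] produces the
    # perturbed "implicit" relation (formula 2).
    start, end = len(h) + 2, len(h) + 2 + len(r) + 1
    word_emb = torch.cat(
        [word_emb[:, :start], dropout(word_emb[:, start:end]), word_emb[:, end:]], dim=1)
    # Passing inputs_embeds lets BERT add segment and position embeddings (formula 3).
    segments = torch.tensor([[0] * (len(h) + 2) + [1] * (len(r) + 1) + [0] * (len(t) + 1)])
    return bert(inputs_embeds=word_emb, token_type_ids=segments)

output = dai_forward("Alice", "work at", "Acme Corp")  # hypothetical triple text

The pooled [CLS] output of such a perturbed triple would then be treated like that of any other augmented triple during fine-tuning.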
4. Experiments and Conclusion
We conduct a set of experiments to assess the performance of our data augmentation methods. We evaluate them on four datasets: two are the open-source benchmarks WN18RR [7] and UMLS [7], and two are reconstructed for few-shot learning. FB15k-237 (few-shot) takes the subset of test triples of FB15k-237 [8] whose relations are among the 50 least frequent in the training dataset. NELL-ONE (few-shot) [9] is reconstructed as follows: for each relation, its triples are split in the proportions (0.7, 0.1, 0.2) into training, validation, and test sets, respectively.

We have released our code and detailed results¹, together with the generated augmented training datasets; 2-hop data extraction is favored for optimal performance. DAMH expands the training datasets of WN18RR and UMLS by 158% and 307%, respectively. Statistics of the datasets are also published at the same link. We adopted KG-BERT's hyper-parameter settings due to high computational costs. Our methods are compared with other baselines on the UMLS and WN18RR datasets. Negative test samples were created by replacing the head or tail entity with other entities, such that the corrupted triples do not appear in the training set.

¹ Detailed experimental results and code can be found at https://github.com/LSX-Sneakerprogrammer/KG-Augmentation

Table 1
Experimental results on WN18RR, UMLS, and few-shot learning on the FB15k-237 and reformatted NELL-ONE datasets (Hits@10 in %; MR = mean rank)

Models              | WN18RR          | UMLS            | FB15k-237 (few-shot) | NELL-ONE (few-shot)
                    | Hits@10   MR    | Hits@10   MR    | Hits@10   MR         | Hits@10   MR
KG-BERT             | 52.4      97    | 99.0      1.47  | 50.7      142        | 17.4      218
KG-BERT+DAI         | 56.8      90    | 99.3      1.46  | -         -          | -         -
KG-BERT+DAMH        | 62.4      85    | 99.5      1.39  | 57.8      139        | 21.9      258
KG-BERT+DAMH+DAI    | 63.0      156   | 99.5      1.42  | -         -          | -         -

From the comparison of experimental results in Table 1, it can be seen that KG-BERT+DAMH, KG-BERT+DAI, and KG-BERT+DAMH+DAI all outperform the original KG-BERT on both the WN18RR and UMLS datasets. For the Hits@10 metric, the most significant improvement on WN18RR is from KG-BERT+DAMH+DAI (+10.6% compared with KG-BERT), while for UMLS it is from KG-BERT+DAMH (+0.5% compared with KG-BERT). For the mean rank metric, KG-BERT+DAMH outperforms the official result of KG-BERT by a significant 12 on WN18RR and a smaller 0.08 on UMLS. These results establish that our augmentation approaches make significant progress on these well-studied tasks.

For few-shot learning, we evaluate models on two datasets, FB15k-237 (few-shot) and NELL-ONE (few-shot), and compare them to KG-BERT. As shown in Table 1 for FB15k-237 (few-shot), the improvements of KG-BERT+DAMH are 7.1% and 3, respectively, for Hits@10 and mean rank. For the NELL-ONE (few-shot) dataset, KG-BERT+DAMH surpasses KG-BERT on four metrics. These experimental results demonstrate the effectiveness of our method for few-shot learning.

Our study proposes generating augmented triples via multi-hop connections and Horn rules, effectively enhancing KG-BERT, particularly with the multi-hop method. Both methods, and their combination, markedly improve performance, with the multi-hop approach notably strengthening few-shot learning tasks. Future research should investigate integrating the augmented data with other knowledge graph completion methods.

References
[1] L. Yao, C. Mao, Y. Luo, KG-BERT: BERT for knowledge graph completion, arXiv preprint arXiv:1909.03193 (2020).
[2] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: NAACL-HLT, 2019, pp. 4171-4186.
[3] C. Meilicke, M. W. Chekol, M. Fink, H. Stuckenschmidt, Reinforced anytime bottom up rule learning for knowledge graph completion, arXiv preprint arXiv:2004.04412 (2020).
[4] P. G. Omran, Z. Wang, K. Wang, Knowledge graph rule mining via transfer learning, in: Advances in Knowledge Discovery and Data Mining: 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14-17, 2019, Proceedings, Part III, Springer, 2019, pp. 489-500.
[5] C. Zhang, H. Yao, C. Huang, M. Jiang, Z. Li, N. V. Chawla, Few-shot knowledge graph completion, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 3041-3048.
[6] T. Gao, X. Yao, D. Chen, SimCSE: Simple contrastive learning of sentence embeddings, in: EMNLP, 2021, pp. 6894-6910.
[7] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D knowledge graph embeddings, in: AAAI, 2018, pp. 1811-1818.
[8] K. Toutanova, D. Chen, P. Pantel, H. Poon, P. Choudhury, M. Gamon, Representing text for joint embedding of text and knowledge bases, in: EMNLP, 2015, pp. 1499-1509.
[9] W. Xiong, M. Yu, S. Chang, X. Guo, W. Y. Wang, One-shot relational learning for knowledge graphs, in: EMNLP, 2018, pp. 1980-1990.