=Paper=
{{Paper
|id=Vol-3818/paper4
|storemode=property
|title=Knowledge Base-enhanced Multilingual Relation Extraction with Large Language Models
|pdfUrl=https://ceur-ws.org/Vol-3818/paper4.pdf
|volume=Vol-3818
|authors=Tong Chen,Procheta Sen,Zimu Wang,Zhengyong Jiang,Jionglong Su
|dblpUrl=https://dblp.org/rec/conf/lkm/ChenSWJS24
}}
==Knowledge Base-enhanced Multilingual Relation Extraction with Large Language Models==
Tong Chen¹,², Procheta Sen², Zimu Wang²,³, Zhengyong Jiang¹,* and Jionglong Su¹,*
¹ School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
² Department of Computer Science, University of Liverpool, Liverpool, L69 3BX, UK
³ School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
Abstract
Relation Extraction (RE) is an essential task that involves comprehending relational facts between entities in
natural language texts. However, existing RE approaches, particularly those based on large language models
(LLMs), have been shown to fall short in the task due to their context unawareness (lack of fine-grained understanding),
schema misalignment (misalignment with human-defined schemas), and world knowledge ignorance (reliance solely on
internal parametric knowledge). In this paper, we propose a novel framework to address the aforementioned
challenges. The framework consists of two stages, including 1) entity linking and 2) relation inference, by
fully leveraging the efficacy of external knowledge bases (KBs) and LLMs in this task. We conduct extensive
experiments in a multilingual setting and achieve state-of-the-art performance on the experimented datasets.
The LLMs with external knowledge can typically outperform those without knowledge by a significant margin,
indicating the effectiveness of our proposed framework.
Keywords
Multilingual, Relation Extraction, Knowledge Bases, Large Language Models, Natural Language Processing
1. Introduction
Relation Extraction (RE) is an essential task in information extraction (IE) that aims to comprehend
relational facts between entities in natural language texts [1]. For the first example in Table 1, given
an original input and an entity pair of interest (Apple Inc., iPhone), an RE model should be able to
predict the relationship between them, i.e., Apple Inc. --(product produced)--> iPhone. The structured knowledge
obtained from RE models can support a variety of downstream applications, such as knowledge graph
construction or completion [1], question answering [2], and dialogue systems [3].
Previous research usually formulates RE as a pairwise classification task with pre-trained language
models (PLMs), in which novel methods have been proposed [4, 5]. Recently, large language models
(LLMs) demonstrate promising performance in a variety of downstream tasks [6, 7] across several
paradigms, such as in-context learning (ICL) [8], chain-of-thought (CoT) prompting [9], and fine-tuning.
However, they fall short in multiple specification-heavy tasks, including RE, where their performance,
particularly under ICL, lags far behind state-of-the-art PLM-based methods [10]. Table 1 gives some examples
of mispredicted entity relationships using LLMs. Overall, the reasons why LLMs cannot perform well
in RE include their context unawareness, schema misalignment, and world knowledge ignorance:
1. Context Unawareness. The completion of RE requires a thorough and fine-grained comprehension
of the information in given contexts. However, LLMs with ICL usually lack fine-grained
context awareness, which results in missed or erroneous relation predictions [10]. In the
first example in Table 1, LLMs should first thoroughly comprehend the context and the connection
between "Apple Inc.", "device", and "iPhone"; otherwise, they are unable to determine the
relationship between "Apple Inc." and "iPhone".
The First International OpenKG Workshop: Large Knowledge-Enhanced Models, August 03, 2024, Jeju Island, South Korea
* Corresponding author.
Zhengyong.Jiang02@xjtlu.edu.cn (Z. Jiang); Jionglong.Su@xjtlu.edu.cn (J. Su)
ORCID: 0000-0001-8873-4073 (Z. Jiang); 0000-0001-5360-6493 (J. Su)
Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Table 1
Examples of mispredicted relationships by large language models (LLMs), consisting of three categories: context
unawareness, schema misalignment, and world knowledge ignorance.

Context Unawareness
  Input: Apple Inc. is an American multinational corporation [...] Devices include the iPhone, iPad, Mac, [...].
  Entities: Apple Inc., iPhone
  Prediction: N/A
  Label: product produced

Schema Misalignment
  Input: Armstrong joined the NASA Astronaut Corps in the second group, which was selected in 1962.
  Entities: Armstrong, NASA Astronaut Corps
  Prediction: work for
  Label: part of

World Knowledge Ignorance
  Input: The theory of relativity usually encompasses two interrelated physics theories by Albert Einstein.
  Entities: theory of relativity, Albert Einstein
  Prediction: inventor
  Label: discoverer
2. Schema Misalignment. RE models are required to predict the relationships between entities
from a human-labeled, pre-defined schema. However, the list of candidate relationships is
typically lengthy, and some relation types are misaligned between LLMs and human expectations
[10, 11]. In the second example in Table 1, LLMs may confuse the two relation types "work
for" and "part of" and make incorrect predictions on the relationship between "Armstrong" and
"NASA Astronaut Corps".
3. World Knowledge Ignorance. World knowledge usually plays a vital role in RE, particularly in
understanding implicit relationships [12] and domain-specific knowledge [13]. However, LLMs
suffer in tasks that require rich world knowledge [14] and solely rely on their internal parametric
knowledge [10]. In the third example in Table 1, LLMs may predict the relationship as "inventor"
rather than "discoverer" without thoroughly understanding the knowledge of "Albert Einstein"
and the "theory of relativity".
Knowledge bases (KBs) have been extensively employed in previous RE research. For example,
researchers leverage the relationships obtained from Freebase [15] and Wikipedia infoboxes [16] to
classify the relationships between entities in texts. However, such relationships are typically noisy
and are not faithful to what is described in the given contexts [17]. Follow-up research focuses on
denoising and learning context-dependent relationships, for example by utilizing natural language inference
(NLI) with entailment prediction [18]. Nevertheless, as LLMs have demonstrated their abilities in
NLI [19] and natural language reasoning [20], the combination of KBs and LLMs
requires further exploration to design contextual, aligned, and knowledgeable RE models. Moreover,
previous research on knowledge-enhanced RE primarily focuses on English corpora, which limits
the adaptability of RE models to different linguistic contexts. This shortage hinders the development of
comprehensive IE systems in multilingual settings.
In this paper, we propose a novel framework for RE that addresses the aforementioned challenges by
making the process contextually aware, schema-aligned, and world knowledge-considered. The framework
consists of two stages, entity linking and relation inference, that fully leverage the efficacy of KBs
and LLMs in this task. As shown in Figure 1, given an original document and two entities of interest,
we first link the entities to Wikidata [21], a large-scale multilingual KB, to ascertain the relationship
between the entities in the world knowledge and regard it as the candidate relationship in the document.
Subsequently, in the second stage, we use the ICL strategy on LLMs to determine whether the candidate
relationship actually takes place in the given context.
We conduct extensive experiments in a multilingual setting using three widely used RE datasets:
DocRED [22], REBEL [18], and REDFM [23], with three LLMs: GPT-3.5, Llama 2 [24], and Flan-T5-XL
[25]. Experimental results demonstrate the effectiveness of our framework on all datasets, where the
performance of zero-shot RE on the models outperforms the cases without knowledge by a significant
margin. Additionally, it also achieves state-of-the-art performance on all three datasets and outperforms
fine-tuned PLM-based methods, validating the efficacy of our proposed framework. We also conduct
additional analysis on the effectiveness of knowledge, the impact of scaling up model parameters, and the
coverage of knowledge in multilingualism to further demonstrate the effectiveness and generalizability
of our proposed method.
The key contributions of this work are summarized as follows:
• We thoroughly review the key literature on LLM-based RE, and we argue that well-behaved RE
models should be contextually aware, schema-aligned, and world knowledge-considered.
• We propose a novel framework for RE, consisting of two stages, entity linking and relation
inference, to fully leverage the efficacy of KBs and LLMs in the RE task.
• Experimental results under a multilingual setting demonstrate the effectiveness and generalizability
of our method across diverse linguistic contexts, with substantial improvements over
state-of-the-art baselines.
2. Related Work
2.1. Relation Extraction
RE has been extensively studied over the past years due to its potency in various downstream applications.
Early research in RE focuses on sentence-level RE [26, 27], while some later approaches shift to the
document level, aiming to comprehend the relationships between entities across multiple sentences [22].
The most commonly used methods for RE are sequence-based techniques, which essentially rely on
LSTM- or Transformer-based architectures [4, 5], modeling complicated interactions between entities
while implicitly capturing long-distance relationships. Furthermore, graph neural networks (GNNs) are
also employed in RE due to their efficacy in representing and interacting with structured data. In this
process, researchers construct relevant graphs using words, mentions, entities, or sentences as nodes
and predict relationships by reasoning on the graph [28, 29].
2.2. Knowledge-enhanced RE
Knowledge-enhanced RE incorporates external knowledge information to comprehensively understand
the relations between entities. Some existing work utilizes external knowledge bases like Freebase
and Wikidata to improve representations using entity and relation information. Liu et al. [30]
injects triples from knowledge graphs into texts, transforming sentences into knowledge-enhanced
sentence trees. Chen et al. [31] proposes a knowledge-aware prompt-tuning approach with synergistic
optimization that incorporates knowledge from relation labels into RE. External knowledge can also bridge
the gap between general-domain and domain-specific data when general-domain RE methods
are applied to specific domains: Roy and Pan [32] uses an entity-level knowledge graph with pre-trained
BERT for clinical RE, integrating medical information.
2.3. LLM-based RE
LLM-based RE has also been studied by researchers motivated by the generalized intelligence of LLMs in
various downstream tasks, such as information extraction [33], machine translation [7], and adversarial
attacks [6]. However, previous research concludes that LLMs typically fall short in the RE task, whose
performance is much behind PLM-based approaches [10, 34, 35]. To overcome this, Zhang et al. [36]
proposes QA4RE, a framework to improve the performance of LLMs by aligning RE with question
answering (QA) tasks.

Figure 1: Overall framework of the proposed method, consisting of two stages: (1) entity linking and (2) relation
inference using large language models (LLMs). [The figure depicts the source text about Apple Inc. and the entity
pair (Apple Inc., iPhone) being linked to Wikidata via a SPARQL query, yielding the candidate triple (Apple Inc.,
product produced, iPhone); the relation inference stage then asks the LLM whether the relationship "product
produced" holds between "Apple Inc." and "iPhone" in the text, which it confirms.]

Wan et al. [37] proposes GPT-RE that utilizes task-aware representations and
reasoning logic to improve entity-relationship relevance and the capability of explaining input-label
mapping. Li et al. [38] suggests integrating LLM with an NLI module to construct relation triples in
response to the abundance of pre-defined relation types and the uncontrollability of LLMs.
3. Methodology
3.1. Problem Formulation
We define our RE task as follows: Given a document D consisting of n sentences {s_1, s_2, ..., s_n} (n is
the number of sentences within the document, and n = 1 indicates sentence-level RE) and an entity
pair of interest (e_h, e_t), in which e_h represents the head entity and e_t refers to the tail entity, the RE
model aims to determine the potential relationship between e_h and e_t from a pre-defined schema. In
our task, a KB K provides world knowledge, and an LLM is utilized to identify whether the
relationship r_K retrieved from K holds between e_h and e_t in the given document.
3.2. Entity Linking and Querying
In the first stage of our proposed framework, we conduct entity linking and querying to obtain the
candidate relationships between the entities of interest, which serve as world-knowledge supervision
for the given entity pair. Entity linking is the process of linking recognized entity mentions to
entities in a KB, a foundational step in extracting structured information from unstructured text
[39]. In our framework, we link the labeled entity mentions to Wikidata [21], a large-scale multilingual
KB. Once the entities are linked, we introduce a query based on SPARQL¹ to retrieve the relationship
between the linked entities and regard it as the candidate relationship between them. For the datasets
whose entities are annotated with coreference chains, we iterate the head and tail entities until a pair of
entities can be linked to Wikidata.
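As a concrete illustration of this stage, the sketch below builds such a SPARQL query in Python. The QIDs, endpoint, and helper name are our own assumptions for illustration; the paper does not publish its exact query.

```python
# Hypothetical sketch of the entity-linking/querying stage (not the authors'
# exact implementation): given two entities already resolved to Wikidata QIDs,
# build a SPARQL query that retrieves the property linking them.

def build_relation_query(head_qid: str, tail_qid: str) -> str:
    """SPARQL query asking which Wikidata property connects head to tail."""
    return f"""
SELECT ?prop ?propLabel WHERE {{
  wd:{head_qid} ?p wd:{tail_qid} .
  ?prop wikibase:directClaim ?p .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

# Example: Apple Inc. (Q312) and iPhone (Q2766); run against the public
# endpoint https://query.wikidata.org/sparql to obtain the candidate property.
query = build_relation_query("Q312", "Q2766")
```

The property label returned by such a query (e.g., "product or material produced") is then treated as the candidate relationship to be verified in the second stage.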
3.3. Relation Inference using LLMs
After obtaining the candidate relationship between the entity pair of interest, in the second stage of
our proposed framework, we adopt LLMs to identify whether the relationship actually occurs in the
given context. Specifically, we leverage the ICL strategy [9] that conditions LLMs on a natural language
instruction and formulate the task as a QA task due to the capacity of LLMs to answer natural questions.
In accordance with the entity linking results in the first stage, we design separate prompts for the entity
¹ https://www.w3.org/TR/sparql12-query/
Table 2
Instruction and an example for relation inference for the entity pair with world knowledge retrieved from
Wikidata.
Instruction:
Given information: {source_text}
Is there such a relationship {relationship} between {head_entity} and {tail_entity}?
Example:
Coburg Peak is the rocky peak rising to 783m in Erul Heights on Trinity Peninsula in Graham Land, Antarctica.
Head Entity: Trinity Peninsula
Tail Entity: Graham Land
Relationship: part of
Output:
Yes.
Answer:
(Trinity Peninsula, part of, Graham Land)
Table 3
Instruction and an example for relation inference for the entity pair without world knowledge retrieved from
Wikidata.
Instruction:
Given information: {source_text}
Options of relations: {relation_list}
Which relationship between {head_entity} and {tail_entity} can be inferred from given options? (Please
answer in English and only output the option)
Example:
Source Text: Utus Peak is the rocky peak rising to 1217m in Trakiya Heights on Trinity Peninsula in Graham
Land, Antarctica. The peak is named after the ancient Roman town of Utus in Northern Bulgaria.
Relation List: head of government, country, place of death, sibling, [...]
Head Entity: Trakiya Heights
Tail Entity: Antarctica
Output:
continent
Answer:
(Trakiya Heights, continent, Antarctica)
pairs for which a candidate relationship has or has not been found, and the prompts with separate examples
are illustrated in Tables 2 and 3. For entity pairs with a candidate relationship found in the
KB, we ask the LLMs to determine whether it actually holds in the given context. Otherwise, we ask the
LLMs to classify the relationship between the entities directly from the schema.
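The two prompt variants can be sketched as simple string templates; the wording below follows Tables 2 and 3, while the helper names are our own illustration rather than the authors' code.

```python
# Sketch of the two relation-inference prompts (wording follows Tables 2
# and 3; the helper names are illustrative, not from the paper).

def knowledge_prompt(source_text: str, relationship: str,
                     head: str, tail: str) -> str:
    """Prompt for entity pairs whose candidate relationship was found in the KB."""
    return (f"Given information: {source_text}\n"
            f"Is there such a relationship {relationship} "
            f"between {head} and {tail}?")

def fallback_prompt(source_text: str, relations: list[str],
                    head: str, tail: str) -> str:
    """Prompt for entity pairs with no candidate relationship in the KB."""
    return (f"Given information: {source_text}\n"
            f"Options of relations: {', '.join(relations)}\n"
            f"Which relationship between {head} and {tail} can be inferred "
            f"from given options? (Please answer in English and only output "
            f"the option)")
```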
This framework enables us to carry out a contextual, aligned, and knowledgeable RE process: it
regards the knowledge in KBs as supervision, and the inference with LLMs makes the predictions with
respect to the given contexts. Furthermore, since KBs are human-constructed world knowledge, their
candidate knowledge also conforms to human-defined schemas.
4. Experiments and Analysis
4.1. Datasets
We conduct our experiments on the following three datasets, whose dataset statistics are organized in
Table 4:
Table 4
Dataset statistics.

Dataset     #Types   Train   Val     Test
DocRED      96       3,053   1,000   1,000
REBEL       1,146    3.13M   173K    174K
REDFM-EN    32       1.88K   449     446
REDFM-ES    32       1.87K   228     281
REDFM-DE    32       2.07K   252     285
• DocRED [22] is a document-level human-annotated RE dataset constructed from Wikipedia and
Wikidata. Since at least 40.7% of relational facts in DocRED can only be extracted from multiple
sentences, it requires models to comprehensively model the whole document to determine the
relationships between entities.
• REBEL [18] is a distantly supervised dataset for relation extraction, built by hyperlinking
Wikipedia text with Wikidata. It employs an NLI model to filter noise, discarding relations that
are not entailed by the Wikipedia text through entailment prediction.
• REDFM [23] is constructed for multilingual RE and involves seven languages. Different from the
REBEL dataset, REDFM not only applies NLI to filter noise but also conducts manual filtering to
ensure annotation quality. We select the English (EN), Spanish (ES), and German (DE) subsets
to validate the performance of our framework in a multilingual setting.
Following the previous work in LLM-based RE [10, 34], we sample a subset from the validation set
of DocRED and the test set of REBEL and REDFM to validate the performance of our method against
baselines. We evaluate the performance of the experimented models using micro F1-score.
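For reference, the micro F1-score pools true positives, predictions, and gold labels over the whole evaluation set before computing precision and recall; a minimal sketch (our own illustration of the standard metric) is:

```python
# Minimal sketch of the micro F1-score over (head, relation, tail) triples,
# pooled across all documents (our own illustration of the standard metric).

def micro_f1(predicted: list[set], gold: list[set]) -> float:
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if tp == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)

# One document, two gold triples, one correct prediction:
# precision = 1.0, recall = 0.5, F1 = 2/3
score = micro_f1([{("A", "part of", "B")}],
                 [{("A", "part of", "B"), ("A", "country", "C")}])
```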
4.2. Baselines
We compare the performance of our proposed method on RE against the following baselines:
• KD-DocRE [4] is a semi-supervised framework for document-level RE that incorporates axial
attention, adaptive focal loss, and knowledge distillation to capture the interdependency among
entity pairs. It addresses the class imbalance problem and the differences between human-annotated
and distantly supervised data in document-level RE.
• DREEAM [5] is a memory-efficient approach that improves document-level RE by incorporating
evidence and offering a self-training strategy, addressing the high memory consumption and limited
annotated-data availability in document-level RE.
4.3. Experimental Setup
We conduct our experiments on three commonly used multilingual LLMs: GPT-3.5, Llama 2 [24], and
Flan-T5 [25], and we access the models through different approaches and settings. For GPT-3.5, we call the
OpenAI API² and select the gpt-3.5-turbo-instruct checkpoint due to its ability to interpret and
execute human instructions seamlessly. For Llama 2 (Llama-2-7b-chat-hf) and Flan-T5 (flan-t5-xl),
the models are retrieved from the HuggingFace repository³. To mimic the randomness of human
reasoning while producing relatively stable outputs, we set the temperature of GPT-3.5 and Llama 2 to 0.2.
All experiments are conducted on a single NVIDIA GeForce RTX 4090 graphics card.
² https://platform.openai.com/
³ https://huggingface.co/
Table 5
Experimental results on F1-score of our proposed method under different large language models (LLMs) with
and without external knowledge against baselines, in which the best and the second-best results are highlighted
in bold and underlined, respectively.
Model DocRED REBEL REDFM-EN REDFM-ES REDFM-DE
KD-DocRE 68.79 β β β β
DREEAM 69.55 β β β β
GPT-3.5 22.45 23.65 19.22 9.88 10.03
w/ Knowledge 62.52 56.68 68.17 60.27 69.39
LLaMA 2 0.00 0.72 0.00 0.00 0.00
w/ Knowledge 27.24 54.51 52.83 33.33 51.53
Flan-T5 62.79 60.65 70.76 61.05 67.86
w/ Knowledge 73.90 70.40 79.32 73.84 81.97
4.4. Main Results
The experimental results of our proposed framework under different LLMs with and without external
knowledge are given in Table 5. From the table, we make the following observations:
First, without the incorporation of external knowledge, LLMs fall short in the RE task, and their
performance lags far behind that of the state-of-the-art baseline models. These results are also
consistent with previous work [10], indicating the correctness of our implementation. Among the
three LLMs, Flan-T5 achieves the best performance, remarkably close to the carefully designed
baseline models, indicating its excellent document-level understanding and relation reasoning ability.
Llama 2 achieves the worst performance, with results close to zero. We sample 50 outputs of Llama
2 and compare them with the ground truths, concluding that this phenomenon is attributable to the
excessively uncontrollable and flexible nature of its output compared with the other models.
Second, after incorporating external knowledge, the LLMs exhibit remarkable performance across all
datasets, with average improvements of 52.90, 45.90, and 11.82 points for GPT-3.5, Llama 2, and
Flan-T5, respectively. Notably, Flan-T5 under the zero-shot setting achieves state-of-the-art results
on all datasets, outperforming even the carefully designed, fine-tuned PLM-based methods. GPT-3.5
improves the most among the models, but a gap remains between its performance and that of the
PLM-based methods. These results demonstrate that LLMs with external knowledge in a zero-shot
setting can match or even surpass fine-tuned PLM-based methods on the RE task. They also underscore
the effectiveness of our approach in multilingual settings, which is not limited to the English context.
Finally, the performance of the experimented models is consistent regardless of the language and
the presence of external knowledge. Flan-T5 consistently achieves the best performance across all
datasets, while Llama 2 exhibits comparatively lower performance. This indicates that Flan-T5 offers
both strong performance and robust generalization on the RE task and can be regarded as an ideal
model for real-world applications, whereas Llama 2 requires further improvement to reach higher
performance.
4.5. Additional Analysis
Effectiveness of External Knowledge First, we analyze the effectiveness of the external KB in our
proposed method. Since not all entity pairs can be linked to Wikidata, we calculate the percentage
of correct predictions of LLMs with and without the incorporation of external knowledge, denoted as
P_w/know and P_wo/know and calculated as:

P_w/know = (# of Correct Predictions) / (# of Entity Pairs Linked to Wikidata),    (1)
Figure 2: Percentage of correct relation prediction with and without external knowledge on the DocRED dataset.

Figure 3: Percentage of correct relation prediction with and without external knowledge on the REBEL dataset.

[In both figures, the models with external knowledge all exceed 80% correct predictions (83.53%, 84.43%, and
83.51% on DocRED and 86.65%, 89.32%, and 87.99% on REBEL for GPT-3.5, Flan-T5, and LLaMA 2, respectively),
while without knowledge the percentages drop sharply, falling to 0.00% for LLaMA 2 on both datasets.]
Figure 4: Scaling law of Flan-T5 on RE performance. [F1-score versus number of model parameters (80M, 250M,
780M, 3B), plotted with and without external knowledge.]
P_wo/know = (# of Correct Predictions) / (# of Entity Pairs Not Linked to Wikidata).    (2)
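Equations (1) and (2) can be computed directly from per-pair outcomes; the sketch below uses our own naming and illustrative counts.

```python
# Sketch of Eqs. (1) and (2): correct-prediction percentages for entity pairs
# that could vs. could not be linked to Wikidata (names are illustrative).

def knowledge_effect(results: list[tuple[bool, bool]]) -> tuple[float, float]:
    """results holds (linked_to_wikidata, prediction_correct) per entity pair;
    returns (P_w/know, P_wo/know) as percentages."""
    linked = [correct for is_linked, correct in results if is_linked]
    unlinked = [correct for is_linked, correct in results if not is_linked]
    p_w = 100.0 * sum(linked) / len(linked) if linked else 0.0
    p_wo = 100.0 * sum(unlinked) / len(unlinked) if unlinked else 0.0
    return p_w, p_wo

# e.g. 3 linked pairs (2 correct) and 2 unlinked pairs (1 correct)
# gives roughly 66.7% for P_w/know and 50.0% for P_wo/know.
```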
We visualize the calculation results in Figures 2 and 3, respectively. Our findings show a significant
difference in performance with and without incorporated knowledge across all datasets and LLMs.
Specifically, the correct-prediction rate of LLMs with external knowledge exceeds 80%, while the
results without knowledge are far inferior, among which only Flan-T5 can exceed 40%. Because of the
flexible and uncontrollable nature of Llama 2, its correct predictions without external knowledge are
nearly zero, while after incorporating knowledge, its results improve to over 80%. This difference
indicates that a performance gap exists across models: while all models can achieve similar
performance with external knowledge, their overall results are dominated by the relation classification
performance without external knowledge, indicating that LLMs are good inferencers but poor classifiers for
entity relationships. Moreover, although LLMs perform better on the REBEL dataset when external
knowledge is incorporated, they perform worse without it due to the dataset's large relation schema.
Designing better methods for entity pairs that cannot be linked to KBs remains a challenge for
future research.
Scaling Law We also analyze whether the performance of LLM-based RE can benefit from scaling up
the model parameters. Specifically, we select the Flan-T5 series models in four sizes: Flan-T5-Small
(80M), Flan-T5-Base (250M), Flan-T5-Large (780M), and Flan-T5-XL (3B), and evaluate the
performance of the models with and without external knowledge. As shown in Figure 4, a clear positive
scaling effect exists in LLM-based RE, i.e., larger models achieve better performance in the
RE task. We can also observe the role of external knowledge: after incorporating external knowledge
into the LLM, increasing the number of model parameters has a smaller impact on the results.
Moreover, Flan-T5-Small with external knowledge can surpass Flan-T5-Large without it, and Flan-T5-Base
with knowledge can exceed Flan-T5-XL's performance without external knowledge. This validates the
effectiveness of both LLMs and external knowledge in handling the RE task.

Figure 5: Knowledge coverage in the portion of the dataset we chose for the three languages. [Linked to
Wikidata: REDFM-EN 85.5%, REDFM-ES 84.1%, REDFM-DE 84.5%; the remainder could not be linked.]
Coverage of Knowledge in Multilingualism Given the multilingual support of the chosen LLMs,
we extend our investigation to include multilingual RE experiments using the REDFM dataset. The
experimental outcomes, as summarized in Table 5, reveal subpar performance when the LLMs attempt
multilingual RE tasks directly. However, integrating external knowledge significantly enhances perfor-
mance, prompting us to explore the coverage of Wikidata for the selected multilingual dataset. To this
end, we conduct supplementary experiments on REDFM-EN, REDFM-DE, and REDFM-ES to assess
the percentage of samples that could be linked to Wikidata for external knowledge, as illustrated in
Figure 5. The results indicate that a relatively high proportion of samples across the three languages
could be covered by Wikidata, with coverages nearing or exceeding 85%, specifically 85.5%, 84.5%,
and 84.1% for REDFM-EN, REDFM-DE, and REDFM-ES, respectively. The remaining 14.5%, 15.5%,
and 15.9% are attributed to entries not indexed by Wikidata, with a small fraction being inaccessible
due to unstable network connections.
5. Conclusion and Future Work
In this paper, we propose a novel framework to address the current challenges of LLMs in RE, namely
their context unawareness, schema misalignment, and world knowledge ignorance.
It consists of two stages, entity linking and relation inference, fully leveraging the efficacy of KBs and
LLMs in this task. We conduct experiments in a multilingual setting using three datasets and three
LLMs to validate the effectiveness of our framework: zero-shot RE with world knowledge
outperforms the settings without it by a significant margin and achieves state-of-the-art performance on
all experimented datasets, even surpassing fine-tuned PLM-based methods, indicating the effectiveness
of our proposed framework. We also conduct additional analysis on the effectiveness of knowledge, the
impact of scaling up model parameters, and the coverage of knowledge in multilingualism to further
demonstrate the effectiveness and generalizability of our proposed method. In the future, we will
conduct more detailed analysis on other related tasks, such as event relation extraction, to further
validate the effectiveness and generalizability of our proposed method.
6. Acknowledgments
This research is funded by the Postgraduate Research Scholarship (PGRS) at Xi'an Jiaotong-Liverpool
University, contract number FOSSP221001.
References
[1] H. Peng, T. Gao, X. Han, Y. Lin, P. Li, Z. Liu, M. Sun, J. Zhou, Learning from Context or Names? An
Empirical Study on Neural Relation Extraction, in: Proceedings of EMNLP, 2020, pp. 3661β3672.
URL: https://aclanthology.org/2020.emnlp-main.298. doi:10.18653/v1/2020.emnlp-main.298.
[2] X. Li, F. Yin, Z. Sun, X. Li, A. Yuan, D. Chai, M. Zhou, J. Li, Entity-relation extraction as multi-turn
question answering, in: Proceedings of ACL, 2019, pp. 1340β1350. URL: https://aclanthology.org/
P19-1129. doi:10.18653/v1/P19-1129.
[3] A. Madotto, C.-S. Wu, P. Fung, Mem2Seq: Effectively incorporating knowledge bases into end-
to-end task-oriented dialog systems, in: Proceedings of ACL, 2018, pp. 1468β1478. URL: https:
//aclanthology.org/P18-1136. doi:10.18653/v1/P18-1136.
[4] Q. Tan, R. He, L. Bing, H. T. Ng, Document-level relation extraction with adaptive focal loss and
knowledge distillation, in: Findings of ACL, 2022, pp. 1672β1681. URL: https://aclanthology.org/
2022.findings-acl.132. doi:10.18653/v1/2022.findings-acl.132.
[5] Y. Ma, A. Wang, N. Okazaki, DREEAM: Guiding attention with evidence for improving document-
level relation extraction, in: Proceedings of EACL, 2023, pp. 1971β1983. URL: https://aclanthology.
org/2023.eacl-main.145. doi:10.18653/v1/2023.eacl-main.145.
[6] Z. Wang, W. Wang, Q. Chen, Q. Wang, A. Nguyen, Generating valid and natural adversarial
examples with large language models, 2023. arXiv:2311.11861.
[7] H. Na, Z. Wang, M. Maimaiti, T. Chen, W. Wang, T. Shen, L. Chen, Rethinking human-like
translation strategy: Integrating drift-diffusion model with large language models for machine
translation, 2024. arXiv:2402.10699.
[8] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam,
G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh,
D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark,
C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot
learners, in: Proceedings of NeurIPS, volume 33, 2020, pp. 1877β1901. URL: https://proceedings.
neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
[9] J. Wei, X. Wang, D. Schuurmans, M. Bosma, b. ichter, F. Xia, E. Chi, Q. V. Le, D. Zhou, Chain-
of-thought prompting elicits reasoning in large language models, in: Proceedings of NeurIPS,
volume 35, 2022, pp. 24824β24837. URL: https://proceedings.neurips.cc/paper_files/paper/2022/
file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf.
[10] H. Peng, X. Wang, J. Chen, W. Li, Y. Qi, Z. Wang, Z. Wu, K. Zeng, B. Xu, L. Hou, J. Li,
When does in-context learning fall short and why? a study on specification-heavy tasks, 2023.
arXiv:2311.08993.
[11] C. Si, D. Friedman, N. Joshi, S. Feng, D. Chen, H. He, Measuring inductive biases of in-context
learning with underspecified demonstrations, in: Proceedings of ACL, 2023, pp. 11289–11310. URL:
https://aclanthology.org/2023.acl-long.632. doi:10.18653/v1/2023.acl-long.632.
[12] P. Cao, X. Zuo, Y. Chen, K. Liu, J. Zhao, Y. Chen, W. Peng, Knowledge-enriched event causality
identification via latent structure induction networks, in: Proceedings of ACL-IJCNLP, 2021, pp.
4862–4872. URL: https://aclanthology.org/2021.acl-long.376. doi:10.18653/v1/2021.acl-long.
376.
[13] T. Lai, H. Ji, C. Zhai, Q. H. Tran, Joint biomedical entity and relation extraction with knowledge-
enhanced collective inference, in: Proceedings of ACL-IJCNLP, 2021, pp. 6248–6260. URL: https:
//aclanthology.org/2021.acl-long.488. doi:10.18653/v1/2021.acl-long.488.
[14] A. Mallen, A. Asai, V. Zhong, R. Das, D. Khashabi, H. Hajishirzi, When not to trust language
models: Investigating effectiveness of parametric and non-parametric memories, in: Proceedings
of ACL, 2023, pp. 9802–9822. URL: https://aclanthology.org/2023.acl-long.546. doi:10.18653/v1/
2023.acl-long.546.
[15] M. Mintz, S. Bills, R. Snow, D. Jurafsky, Distant supervision for relation extraction without labeled
data, in: Proceedings of ACL-AFNLP, 2009, pp. 1003–1011. URL: https://aclanthology.org/P09-1113.
[16] R. Hoffmann, C. Zhang, D. S. Weld, Learning 5000 relational extractors, in: Proceedings of ACL,
2010, pp. 286–295. URL: https://aclanthology.org/P10-1030.
[17] M. Chen, L. Huang, M. Li, B. Zhou, H. Ji, D. Roth, New frontiers of information extraction,
in: Proceedings of NAACL-HLT (Tutorials), 2022, pp. 14–25. URL: https://aclanthology.org/2022.
naacl-tutorials.3. doi:10.18653/v1/2022.naacl-tutorials.3.
[18] P.-L. Huguet Cabot, R. Navigli, REBEL: Relation extraction by end-to-end language generation, in:
Findings of EMNLP, 2021, pp. 2370–2381. URL: https://aclanthology.org/2021.findings-emnlp.204.
doi:10.18653/v1/2021.findings-emnlp.204.
[19] X. Zhao, M. Zhang, M. Ma, C. Su, Y. Liu, M. Wang, X. Qiao, J. Guo, Y. Li, W. Ma, HW-TSC
at SemEval-2023 task 7: Exploring the natural language inference capabilities of ChatGPT and
pre-trained language model for clinical trial, in: Proceedings of SemEval-2023, 2023, pp. 1603–1608.
URL: https://aclanthology.org/2023.semeval-1.221. doi:10.18653/v1/2023.semeval-1.221.
[20] L. Pan, A. Albalak, X. Wang, W. Wang, Logic-LM: Empowering large language models with sym-
bolic solvers for faithful logical reasoning, in: Findings of EMNLP, 2023, pp. 3806–3824. URL: https:
//aclanthology.org/2023.findings-emnlp.248. doi:10.18653/v1/2023.findings-emnlp.248.
[21] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Commun. ACM 57
(2014) 78–85. URL: https://doi.org/10.1145/2629489. doi:10.1145/2629489.
[22] Y. Yao, D. Ye, P. Li, X. Han, Y. Lin, Z. Liu, Z. Liu, L. Huang, J. Zhou, M. Sun, DocRED: A large-scale
document-level relation extraction dataset, in: Proceedings of ACL, 2019, pp. 764–777. URL:
https://aclanthology.org/P19-1074. doi:10.18653/v1/P19-1074.
[23] P.-L. Huguet Cabot, S. Tedeschi, A.-C. Ngonga Ngomo, R. Navigli, REDFM: a filtered and multilingual
relation extraction dataset, in: Proceedings of ACL, 2023, pp. 4326–4343. URL: https://aclanthology.
org/2023.acl-long.237. doi:10.18653/v1/2023.acl-long.237.
[24] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhar-
gava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes,
J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan,
M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril,
J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton,
J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan,
B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur,
S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, T. Scialom, Llama 2: Open foundation and fine-tuned
chat models, 2023. arXiv:2307.09288.
[25] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani,
S. Brahma, A. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen, A. Chowdhery, A. Castro-Ros,
M. Pellat, K. Robinson, D. Valter, S. Narang, G. Mishra, A. Yu, V. Zhao, Y. Huang, A. Dai, H. Yu,
S. Petrov, E. H. Chi, J. Dean, J. Devlin, A. Roberts, D. Zhou, Q. V. Le, J. Wei, Scaling instruction-
finetuned language models, 2022. arXiv:2210.11416.
[26] P. Verga, E. Strubell, A. McCallum, Simultaneously self-attending to all mentions for full-abstract
biological relation extraction, in: Proceedings of NAACL-HLT, 2018, pp. 872–884. URL: https:
//aclanthology.org/N18-1080. doi:10.18653/v1/N18-1080.
[27] L. Baldini Soares, N. FitzGerald, J. Ling, T. Kwiatkowski, Matching the blanks: Distributional
similarity for relation learning, in: Proceedings of ACL, 2019, pp. 2895–2905. URL: https://
aclanthology.org/P19-1279. doi:10.18653/v1/P19-1279.
[28] S. Zeng, R. Xu, B. Chang, L. Li, Double graph based reasoning for document-level relation
extraction, in: Proceedings of EMNLP, 2020, pp. 1630–1640. URL: https://aclanthology.org/2020.
emnlp-main.127. doi:10.18653/v1/2020.emnlp-main.127.
[29] S. Zeng, Y. Wu, B. Chang, SIRE: Separate intra- and inter-sentential reasoning for document-level
relation extraction, in: Findings of ACL-IJCNLP, 2021, pp. 524–534. URL: https://aclanthology.org/
2021.findings-acl.47. doi:10.18653/v1/2021.findings-acl.47.
[30] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, P. Wang, K-BERT: Enabling language rep-
resentation with knowledge graph, Proceedings of AAAI 34 (2020) 2901–2908. URL: https:
//ojs.aaai.org/index.php/AAAI/article/view/5681. doi:10.1609/aaai.v34i03.5681.
[31] X. Chen, N. Zhang, X. Xie, S. Deng, Y. Yao, C. Tan, F. Huang, L. Si, H. Chen, Know-
prompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction,
in: Proceedings of WWW, 2022, pp. 2778–2788. URL: https://doi.org/10.1145/3485447.3511998.
doi:10.1145/3485447.3511998.
[32] A. Roy, S. Pan, Incorporating medical knowledge in BERT for clinical relation extraction, in:
Proceedings of EMNLP, 2021, pp. 5357–5366. URL: https://aclanthology.org/2021.emnlp-main.435.
doi:10.18653/v1/2021.emnlp-main.435.
[33] H. Peng, X. Wang, F. Yao, Z. Wang, C. Zhu, K. Zeng, L. Hou, J. Li, OmniEvent: A comprehensive, fair,
and easy-to-use toolkit for event understanding, in: Proceedings of EMNLP (Demo), 2023, pp.
508–517. URL: https://aclanthology.org/2023.emnlp-demo.46. doi:10.18653/v1/2023.emnlp-demo.46.
[34] R. Han, T. Peng, C. Yang, B. Wang, L. Liu, X. Wan, Is information extraction solved by ChatGPT? An
analysis of performance, evaluation criteria, robustness and errors, 2023. arXiv:2305.14450.
[35] B. Li, G. Fang, Y. Yang, Q. Wang, W. Ye, W. Zhao, S. Zhang, Evaluating ChatGPT's information
extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness,
2023. arXiv:2304.11633.
[36] K. Zhang, B. Jimenez Gutierrez, Y. Su, Aligning instruction tasks unlocks large language models
as zero-shot relation extractors, in: Findings of ACL, 2023, pp. 794–812. URL: https://aclanthology.
org/2023.findings-acl.50. doi:10.18653/v1/2023.findings-acl.50.
[37] Z. Wan, F. Cheng, Z. Mao, Q. Liu, H. Song, J. Li, S. Kurohashi, GPT-RE: In-context learning for
relation extraction using large language models, in: Proceedings of EMNLP, 2023, pp. 3534–3547.
URL: https://aclanthology.org/2023.emnlp-main.214. doi:10.18653/v1/2023.emnlp-main.214.
[38] J. Li, Z. Jia, Z. Zheng, Semi-automatic data enhancement for document-level relation extraction with
distant supervision from large language models, in: Proceedings of EMNLP, 2023, pp. 5495–5505.
URL: https://aclanthology.org/2023.emnlp-main.334. doi:10.18653/v1/2023.emnlp-main.334.
[39] W. Shen, J. Wang, J. Han, Entity linking with a knowledge base: Issues, techniques, and solutions,
IEEE Transactions on Knowledge and Data Engineering 27 (2015) 443–460. doi:10.1109/TKDE.
2014.2327028.