<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Fine-Tuning vs. Prompting: Evaluating the Knowledge Graph Construction with LLMs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hussam</forename><surname>Ghanem</surname></persName>
							<email>hussam.ghanem@u-bourgogne.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">ICB</orgName>
								<orgName type="laboratory" key="lab2">UMR 6306</orgName>
								<orgName type="institution" key="instit1">CNRS</orgName>
								<orgName type="institution" key="instit2">Université de Bourgogne</orgName>
								<address>
									<postCode>21000</postCode>
									<settlement>Dijon</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">DAVI The Humanizers</orgName>
								<address>
									<settlement>Puteaux</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christophe</forename><surname>Cruz</surname></persName>
							<email>christophe.cruz@u-bourgogne.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory" key="lab1">ICB</orgName>
								<orgName type="laboratory" key="lab2">UMR 6306</orgName>
								<orgName type="institution" key="instit1">CNRS</orgName>
								<orgName type="institution" key="instit2">Université de Bourgogne</orgName>
								<address>
									<postCode>21000</postCode>
									<settlement>Dijon</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Fine-Tuning vs. Prompting: Evaluating the Knowledge Graph Construction with LLMs</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C4DD0769A476769AD0B797157BB4E275</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Text-to-Knowledge Graph</term>
					<term>Large Language Models</term>
					<term>Zero-Shot Prompting</term>
					<term>Few-Shot Prompting</term>
					<term>Fine-Tuning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper explores Text-to-Knowledge Graph (T2KG) construction, assessing Zero-Shot Prompting (ZSP), Few-Shot Prompting (FSP), and Fine-Tuning (FT) methods with Large Language Models (LLMs). Through comprehensive experimentation with Llama2, Mistral, and Starling, we highlight the strengths of FT, emphasize the role of dataset size, and introduce nuanced evaluation metrics. Promising perspectives include synonym-aware metric refinement and data augmentation with LLMs. The study contributes valuable insights to KG construction methodologies, setting the stage for further advancements.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The term "knowledge graph" has been around since 1972, but its current definition can be traced back to Google in 2012. This was followed by similar announcements from companies such as Airbnb, Amazon, eBay, Facebook, IBM, LinkedIn, Microsoft, and Uber, among others, leading to an increase in the adoption of Knowledge graphs(KGs) by various industries. As a result, academic research in this field has seen a surge in recent years, with an increasing number of scientific publications on KGs <ref type="bibr" target="#b0">[1]</ref>. These graphs utilize a graph-based data model to effectively manage, integrate, and extract valuable insights from large and diverse datasets <ref type="bibr" target="#b1">[2]</ref>.</p><p>KGs serve as repositories for structured knowledge, organized into a collection of triples, denoted as 𝐾𝐺 = (ℎ, 𝑟, 𝑡) ⊆ 𝐸 × 𝑅 × 𝐸, where E represents the set of entities, and R represents the set of relations <ref type="bibr" target="#b0">[1]</ref>. Within a graph, nodes represent various levels, entities, or concepts. These nodes encompass diverse types, including person, book, or city, and are interconnected by relationships such as located in, lives in, or works with. The essence of a KG emerges when it incorporates multiple types of relationships rather than being confined to a single type. The overarching structure of a KG constitutes a network of entities, featuring their semantic types, properties, and interconnections. Thus, constructing a KG necessitates information about entities (along with their types and properties) and the semantic relationships that bind them. 
For the extraction of entities and relationships, practitioners often turn to NLP tasks like Named Entity Recognition (NER), Coreference Resolution (CR), and Relation Extraction (RE).</p><p>KGs play a crucial role in organizing complex information across diverse domains, such as question answering, recommendations, semantic search, etc. However, the ongoing challenge persists in constructing them, particularly as the primary sources of knowledge are embedded in unstructured textual data such as press articles, emails, and scientific journals. This challenge can be addressed by adopting an information extraction approach, sometimes implemented as a pipeline. It involves taking textual inputs, processing them using Natural Language Processing (NLP) techniques, and leveraging the acquired knowledge to construct or enhance the KG.</p><p>If we envision the Text-to-Knowledge Graph (T2KG) construction task as a black box, the input is textual data, and the output is a knowledge graph. Achieving this can be approached through methods that directly convert text into a graph or by implementing NLP tasks in two ways: 1) through an information extraction pipeline incorporating the mentioned tasks independently, or 2) by adopting an end-to-end approach, also known as joint prediction, using Large Language Models (LLMs) for example. In the realm of LLMs and KGs, their mutual enhancement is evident. LLMs can assist in the construction of KGs. Conversely, KGs can be employed to validate outputs from LLMs or provide explanations for them <ref type="bibr" target="#b2">[3]</ref>. LLMs can be adapted to KG construction task (T2KG) through various approaches, such as fine-tuning <ref type="bibr" target="#b3">[4]</ref> (FT), zero-shot prompting <ref type="bibr" target="#b4">[5]</ref> (ZSP), or few-shot prompting (FSP) <ref type="bibr" target="#b5">[6]</ref> with a limited number of examples. 
Each of these approaches has its pros and cons with respect to performance, computational resources, training time, domain adaptation, and the training data required.</p><p>In-context learning, as discussed by <ref type="bibr" target="#b6">[7]</ref>, coupled with prompt design, involves instructing a model to execute a new task by presenting it with only a few demonstrations of input-output pairs during inference. Instruction fine-tuning methods, exemplified by InstructGPT <ref type="bibr" target="#b7">[8]</ref> and Reinforcement Learning from Human Feedback (RLHF) <ref type="bibr" target="#b8">[9]</ref>, markedly enhance the model's ability to comprehend and follow a diverse range of written instructions. Numerous LLMs have been introduced in the last year, as highlighted by <ref type="bibr" target="#b2">[3]</ref>, particularly ChatGPT-like <ref type="bibr" target="#b9">[10]</ref> models, which include GPT-3 <ref type="bibr" target="#b10">[11]</ref>, LLaMA <ref type="bibr" target="#b11">[12]</ref>, BLOOM <ref type="bibr" target="#b12">[13]</ref>, PaLM <ref type="bibr" target="#b13">[14]</ref>, Mistral <ref type="bibr" target="#b14">[15]</ref>, Starling <ref type="bibr" target="#b15">[16]</ref> and Zephyr <ref type="bibr" target="#b16">[17]</ref>. These models can be readily repurposed for KG construction from text by employing a prompt design that incorporates instructions and contextual information.</p><p>This study does not entail a comparison with traditional methods of constructing KGs; rather, it delves into the developments and challenges associated with KG construction methodologies and aims to provide a formal evaluation of the T2KG task. Specifically, we focus on the utilization of LLMs and explore the three approaches mentioned above: zero-shot prompting, few-shot prompting, and fine-tuning (Fig. <ref type="figure" target="#fig_0">1</ref>). 
Each of these approaches addresses specific challenges, contributing significantly to the evolution of T2KG construction techniques.</p><p>The present study is organized as follows. Section 2 presents a comprehensive overview of the current state-of-the-art approaches for Text-to-KG (T2KG) construction. In Section 3, we present the general architecture of our proposed implementation (method), with datasets, metrics, and experiments. Section 4 then encapsulates the findings and discussions, presenting the culmination of results. Finally, Section 5 critically examines the strengths and limitations of these techniques.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>This section reviews the current state of research on knowledge graph construction using LLMs. Three main approaches are identified: Zero-Shot, Few-Shot, and Fine-Tuning. Each approach has its own challenges, such as maintaining accuracy without specific training data or ensuring the robustness of models in diverse real-world scenarios. Evaluation metrics used to assess the quality of constructed KGs are also discussed, including semantic consistency and linguistic coherence. The section thus highlights both the methods used to construct KGs and the metrics used to evaluate the result.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates the black-box joint prediction of the T2KG construction process using LLMs. It shows how two French examples on the left are converted into an expected result (KG) on the right using the ZSP, FSP, or FT approaches with LLMs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Zero Shot</head><p>Zero Shot methods enable KG construction without task-specific training data, leveraging the inherent capabilities of large language models. <ref type="bibr" target="#b17">[18]</ref> introduce an innovative approach using large language models (LLMs) for knowledge graph construction, employing iterative zero-shot prompting for scalable and flexible KG construction. <ref type="bibr" target="#b18">[19]</ref> evaluate the performance of LLMs, specifically GPT-4 and ChatGPT, in KG construction and reasoning tasks, introducing the Virtual Knowledge Extraction task and the VINE dataset, but they do not take into account open-source LLMs such as LLaMA <ref type="bibr" target="#b11">[12]</ref>. <ref type="bibr" target="#b19">[20]</ref> assess ChatGPT's abilities in information extraction tasks, identifying overconfidence as an issue and releasing annotated datasets. <ref type="bibr" target="#b20">[21]</ref> tackle zero-shot information extraction using ChatGPT, achieving impressive results in entity relation triple extraction. <ref type="bibr" target="#b21">[22]</ref> propose a method for Knowledge Graph Construction (KGC) using an analogy-based approach, demonstrating superior performance on Wikidata. <ref type="bibr" target="#b22">[23]</ref> address the limitations of existing generative knowledge graph construction methods by leveraging large generative language models trained on structured data. Most of these approaches share the same limitation: they rely on closed, very large LLMs such as ChatGPT or GPT-4 for this task. Challenges in this area include maintaining accuracy without specific training data and addressing nuanced relationships between entities in untrained domains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Few Shot</head><p>Few Shot methods focus on constructing KGs with limited training examples, aiming to achieve accurate knowledge representation with minimal data. <ref type="bibr" target="#b5">[6]</ref> introduce PiVe, a framework enhancing the graph-based generative capabilities of LLMs, in which the authors create a verifier responsible for checking the LLM results over multiple iterations. <ref type="bibr" target="#b23">[24]</ref> explore the potential of LLMs for knowledge graph completion, treating triples as text sequences and utilizing LLM responses for predictions. <ref type="bibr" target="#b24">[25]</ref> automate the process of generating structured knowledge graphs from natural language text using foundation models. <ref type="bibr" target="#b25">[26]</ref> present OpenBG, an open business knowledge graph derived from Alibaba Group, containing 2.6 billion triples with over 88 million entities. <ref type="bibr" target="#b26">[27]</ref> explore the integration of LLMs with semantic technologies for reasoning and inference. <ref type="bibr" target="#b27">[28]</ref> investigate LLMs' application in relation labeling for e-commerce Knowledge Graphs (KGs). Like the ZSP approaches, the FSP approaches rely on closed, very large LLMs such as ChatGPT or GPT-4 <ref type="bibr" target="#b9">[10]</ref> for this task. Challenges in this area include achieving high accuracy with minimal training data and ensuring the robustness of models in diverse real-world scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Fine-Tuning</head><p>Fine-Tuning methods involve adapting pre-trained language models to specific knowledge domains, enhancing their capabilities for constructing KGs tailored to particular contexts. <ref type="bibr" target="#b3">[4]</ref> present a case study automating KG construction for compliance using BERT-based models. This study emphasizes the importance of machine learning models in interpreting rules for compliance automation. <ref type="bibr" target="#b28">[29]</ref> propose an approach for knowledge extraction and analysis from biomedical clinical notes, utilizing the BERT model and a Conditional Random Field layer, showcasing the effectiveness of leveraging BERT models for structured biomedical knowledge graphs. <ref type="bibr" target="#b29">[30]</ref> propose Knowledge Graph-Enhanced Large Language Models (KGLLMs), enhancing LLMs with KGs for improved factual reasoning capabilities. The approaches that apply FT do not use the newer generations of LLMs, in particular decoder-only LLMs such as Llama and Mistral. Challenges in this domain include ensuring the scalability, interpretability, and robustness of fine-tuned models across diverse knowledge domains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Evaluation metrics</head><p>As we employ LLMs to construct KGs, and given that LLMs function as Natural Language Generation (NLG) models, it becomes imperative to discuss NLG criteria. In NLG, two criteria <ref type="bibr" target="#b30">[31]</ref> are used to assess the quality of the produced answers (triples in our context).</p><p>The first criterion is semantic consistency or Semantic Fidelity which quantifies the fidelity of the data produced against the input data. The most common indicators are :</p><p>• Hallucination: It is manifested by the Presence of information (facts) in the generated text that is absent in the input data. In our scenario, hallucination exists if the generated triples (GT) contain triples not present in the ground truth triples (ET) (T in GT and not in ET);</p><p>• Omission: It is manifested by the omission of one of the pieces of information (facts) in the generated text. In our case, omission occurs if a triple is present in ET but not in GT;</p><p>• Redundancy: This is manifested by the repetition of information in the generated text.</p><p>In our case, the redundancy exists if a triple appears more than once in GT;</p><p>• Accuracy: The lack of accuracy is manifested by the modification of information such as the inversion of the subject and the direct object complement in the generated text. Accuracy increases if there is an exact match between ET and GT. ;</p><p>• Ordering: It occurs when the sequence of information is different from the input data.</p><p>In our case, the ordering of GT is not considered.</p><p>The second criterion is linguistic coherence or Output Fluency to evaluate the fluidity of the text and the linguistic constructions of the generated text, the segmentation of the text into different sentences, the use of anaphoric pronouns to reference entities and to have linguistically correct sentences. 
However, in our evaluation, we do not take into account the second criterion.</p><p>In their experiments, <ref type="bibr" target="#b2">[3]</ref> calculated three hallucination metrics (subject hallucination, relation hallucination, and object hallucination) using certain preprocessing steps such as stemming. They used the ground truth ontology alongside the ground truth test sentence to determine if an entity or relation is present in the text. However, a limitation could arise when there is a disparity in entities or relations between the ground truth ontology and the ground truth test sentence. If the generated triples contain entities or relations not present in the ground truth text, even if they exist in the ground truth ontology, it will be considered a hallucination.</p><p>The authors of <ref type="bibr" target="#b5">[6]</ref> evaluate their experiments using several evaluation metrics, including Triple Match F1 (T-F1), Graph Match F1 (G-F1), G-BERTScore (G-BS) from <ref type="bibr" target="#b31">[32]</ref>, which extends BERTScore <ref type="bibr" target="#b32">[33]</ref> for graph matching, and Graph Edit Distance (GED) from <ref type="bibr" target="#b33">[34]</ref>. The GED metric measures the distance between the predicted graph and the ground-truth graph, which is equivalent to computing the number of edit operations (addition, deletion, or replacement of nodes and edges) needed to transform the predicted graph into a graph that is identical to the ground-truth graph. However, GED does not provide the specific path of these operations, which is needed to count each type of operation exactly. To adhere to the semantic consistency criterion, we use the terms "omission" and "hallucination" in place of "addition" and "deletion," respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Propositions</head><p>This section describes our approach to evaluate the quality of generated KGs. We explain how we use evaluation metrics such as T-F1, G-F1, G-BS, GED, Bleu-F1 <ref type="bibr" target="#b34">[35]</ref> and ROUGE-F1 <ref type="bibr" target="#b35">[36]</ref> to assess the quality of the generated KGs in comparison to ground-truth KGs. Additionally, we discuss the use of Optimal Edit Paths (OEP) metric <ref type="foot" target="#foot_0">1</ref> to determine the precise number of operations required to transform the predicted graph into an identical representation of the ground-truth graph. This metric serves as a basis for calculating omissions and hallucinations in the generated graphs. We employ examples from the WebNLG+2020 dataset <ref type="bibr" target="#b36">[37]</ref> for testing with ZSP and FSP techniques. Additionally, we utilize the training dataset of WebNLG+2020 to train LLMs using the FT technique. Subsequent subsections delve into a detailed discussion of each phase. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Overall experimentation process</head><p>We leverage the WebNLG+2020 dataset, specifically the version curated by <ref type="bibr" target="#b5">[6]</ref>. Their preparation of graphs as lists of triples proves beneficial for evaluation purposes. We utilize these lists and employ NetworkX <ref type="bibr" target="#b37">[38]</ref> to transform them back into graphs, facilitating evaluations on the resultant graphs. This step is instrumental in applying ZSP, FSP, and FT to LLMs on this dataset.</p><p>Figure <ref type="figure" target="#fig_1">2</ref> illustrates the different stages of our experimentation process, including data preparation, model selection, training, validation, and evaluation. The process begins with data preparation, where the WebNLG+2020 dataset is preprocessed and split into training, validation, and test sets. Next, the learning type is selected, and different models are trained using the training set. The trained models are then assessed on the validation set. Finally, the best-performing model is selected and evaluated on the test set to estimate its generalization ability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Prompting learning</head><p>During this phase, we employ the ZSP and FSP techniques on LLMs to evaluate their proficiency in extracting triples (i.e., construction of the KG). The application of these techniques involves merging examples from the test dataset of WebNLG+2020 with our adapted prompt. Our prompt is strategically modified to provide contextual guidance to the LLMs, facilitating the effective extraction of triples, without the inclusion of a support ontology description, as demonstrated in <ref type="bibr" target="#b2">[3]</ref>. The specific prompts used for ZSP and FSP are illustrated in Figure <ref type="figure" target="#fig_3">3</ref>.</p><p>In our approach for ZSP, we began with the methodology outlined in <ref type="bibr" target="#b5">[6]</ref>, initiating our prompt with the directive "Transform the text into a semantic graph." However, we enhanced the prompt with additional sentences.</p><p>For FSP, we executed 7-shot learning. The rationale behind employing 7-shot learning lies in the fact that the maximum KG size in WebNLG+2020 is 7 triples. Consequently, we fed our prompt with 7 examples of varying sizes: example 1 with size 1, example 2 with size 2, example 3 with size 3, and so forth. In Figure <ref type="figure" target="#fig_3">3</ref>-b, we depict a prompt containing two examples.</p><p>To demonstrate the efficacy of our refined prompt (including additional sentences), we conducted zero-shot experiments on ChatGPT <ref type="bibr" target="#b9">[10]</ref>, comparing the outcomes with those of <ref type="bibr" target="#b5">[6]</ref>. Our results consistently reveal that our prompt yields more coherent answers in terms of structure.</p></div>
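<div xmlns="http://www.tei-c.org/ns/1.0"><p>The prompt assembly described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact prompt used in the experiments: only the opening directive is taken from the text, while the helper name build_prompt, the "Text:" and "Semantic graph:" labels, and the example sentences are hypothetical.</p><p>
```python
import json

# Directive quoted in the text; everything else below is an illustrative assumption.
DIRECTIVE = "Transform the text into a semantic graph."


def build_prompt(text, shots=()):
    """Build a zero-shot prompt (no shots) or a few-shot prompt.

    `shots` is a sequence of (example_text, example_triples) pairs; the
    "Text:" / "Semantic graph:" labels are hypothetical, not the paper's.
    """
    parts = [DIRECTIVE]
    for example_text, example_triples in shots:
        parts.append("Text: " + example_text)
        parts.append("Semantic graph: " + json.dumps(example_triples))
    parts.append("Text: " + text)
    parts.append("Semantic graph:")
    return "\n".join(parts)


# Hypothetical demonstrations of increasing size (1 triple, then 2 triples).
shots = [
    ("Alan Bean was born in Wheeler.", [["Alan Bean", "birthPlace", "Wheeler"]]),
    (
        "Dijon is a city in France, whose capital is Paris.",
        [["Dijon", "country", "France"], ["France", "capital", "Paris"]],
    ),
]
zero_shot = build_prompt("Ada Lovelace was born in London.")
few_shot = build_prompt("Ada Lovelace was born in London.", shots)
```
</p><p>Ordering the demonstrations by graph size mirrors the 7-shot setup, where example k contains k triples; a full 7-shot prompt would simply pass seven such pairs.</p></div>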
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Finetuning</head><p>If the initial results from the ZSP and FSP on LLMs prove reasonable, we proceed to the FT phase. This phase aims to provide the LLMs with a more specific context and knowledge related to the task of extracting triples within the domains covered by the WebNLG+2020 dataset. Using the example "a)" illustrated in Fig. <ref type="figure" target="#fig_3">3</ref>, we pass in the FT prompt, for each line of the training dataset, both the input text and the corresponding KG (the list of triples). For this phase (FT), we employ QLoRA <ref type="bibr" target="#b38">[39]</ref>, a methodology that integrates quantization <ref type="bibr" target="#b39">[40]</ref> and Low-Rank Adapters (LoRA) <ref type="bibr" target="#b40">[41]</ref>. The LLM is loaded with 4-bit precision using bitsandbytes <ref type="bibr" target="#b41">[42]</ref>, and the training process incorporates LoRA through the PEFT library (Parameter-Efficient Fine-Tuning) <ref type="bibr" target="#b42">[43]</ref> provided by Hugging Face.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Postprocessing</head><p>Given our focus on KG construction, our evaluation process involves assessing the generated KGs against ground-truth KGs. To facilitate this evaluation, we apply a cleaning step to the LLM output. This involves transforming the graphs generated by LLMs into organized lists of triples, which are subsequently written to text files.</p><p>The transformation is executed through rule-based processing. This step removes corrupted text (outside the lists of triples) from the whole text generated by LLMs in the preceding step. The output is then presented as a list of lists of triples, which simplifies our evaluation process. This format proves especially effective when calculating metrics such as G-F1, GED, and OEP, as we will see in more detail in Section 3.5.</p><p>A potential problem arises when instructing LLMs to produce lists of triples (KGs), as there may be instances where the generated text lacks the desired structure. In such cases, we address this issue by substituting the generated text with an empty list of triples, represented as '[["","",""]]', allowing us to effectively evaluate omissions. However, this approach tends to underestimate hallucinations compared to the actual occurrences.</p></div>
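<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cleaning step can be sketched as follows. The concrete rules are not given in the text, so this is only one plausible rule set: it assumes the model emits the triples as a single bracketed list of lists somewhere in its answer, and the function name extract_triples and the regular expression are hypothetical. The fallback to '[["","",""]]' is the one described above.</p><p>
```python
import ast
import re

# Placeholder used in the text when the output has no usable structure.
EMPTY_GRAPH = [["", "", ""]]

# Assumption: the triples appear as one bracketed list of lists, e.g.
# [["s", "r", "o"], ...], possibly surrounded by unrelated text.
LIST_OF_LISTS = re.compile(r"\[\s*\[.*?\]\s*\]", re.DOTALL)


def extract_triples(generated):
    """Return the first well-formed list of triples found in `generated`,
    or EMPTY_GRAPH when nothing parseable is found."""
    match = LIST_OF_LISTS.search(generated)
    if match is None:
        return EMPTY_GRAPH
    try:
        candidate = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return EMPTY_GRAPH
    # Keep only items that really are triples and normalise them to strings.
    triples = [
        [str(part).strip() for part in item]
        for item in candidate
        if isinstance(item, (list, tuple)) and len(item) == 3
    ]
    return triples if triples else EMPTY_GRAPH
```
</p><p>For instance, an answer such as 'Sure! [["Dijon", "country", "France"]] Hope it helps.' is reduced to the single triple, whereas a free-text refusal is mapped to the empty-graph placeholder.</p></div>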
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Experiment evaluation</head><p>For assessing the quality of the generated graphs in comparison to ground-truth graphs, we adopt evaluation metrics as employed in <ref type="bibr" target="#b5">[6]</ref>. These metrics encompass T-F1, G-F1, G-BS <ref type="bibr" target="#b31">[32]</ref>, and GED <ref type="bibr" target="#b33">[34]</ref>. Additionally, we incorporate the Optimal Edit Paths (OEP) metric, a tool aiding in the calculation of omissions and hallucinations within the generated graphs.</p><p>Our evaluation procedure aligns with the methodology outlined in <ref type="bibr" target="#b5">[6]</ref>, particularly in the computation of GED and G-F1. This involves constructing directed graphs from lists of triples, referred to as linearized graphs, utilizing NetworkX <ref type="bibr" target="#b37">[38]</ref>.</p><p>In contrast to <ref type="bibr" target="#b2">[3]</ref>, our methodology does not rely on the ground-truth test sentence or on an ontology. As previously mentioned, we instead assess omissions and hallucinations in the generated graphs using the OEP metric. Unlike the global edit distance provided by GED, OEP gives the precise path of the edit, enabling the exact quantification of omissions and hallucinations, either in absolute terms or as a percentage across the entire test dataset.</p><p>For example, in the nodes path labeled 'a)', we observe 2 omissions, while the edges path in Fig. <ref type="figure">4-(a)</ref> exhibits 1 hallucination. In our evaluation, a generated graph counts toward the global hallucination (respectively omission) metric as soon as it contains at least one hallucination (respectively omission). 
This approach ensures a comprehensive assessment of the presence of omissions and hallucinations across the entirety of the generated graphs.</p><p>As mentioned earlier, the evaluation of the three methods is conducted using examples sourced from the test dataset of WebNLG+2020. The primary goal is to enhance G-F1, T-F1, G-BS, Bleu-F1, and ROUGE-F1 metrics, while reducing GED, Hallucination, and Omission.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">Mathematical representation of the metrics used</head><p>We mathematically represent the metrics used as follows:</p><p>Graph Matching (𝐺-𝐹1). Let 𝑀𝑐ℎ be the number of matches between predicted and gold graphs, and let 𝑇𝑜𝐺𝑟𝑎𝑝ℎ𝑠 be the total number of predicted graphs. Then, the accuracy for entire graph matches, 𝐴𝑐𝑐 with subscript 𝑔𝑟𝑎𝑝ℎ, can be calculated as:</p><formula xml:id="formula_0">\mathit{Acc}_{\mathit{graph}} = \frac{\mathit{Mch}}{\mathit{ToGraphs}}</formula><p>Triple Matching (𝑇-𝐹1). The 𝐹1 score for triple matches, 𝑇-𝐹1, is calculated as follows:</p><formula xml:id="formula_1">T\text{-}F1 = \frac{2 \times TP}{2 \times TP + FP + FN}</formula><p>Where:</p><p>• TP is the number of true positive triple matches.</p><p>• FP is the number of false positive triple matches.</p><p>• FN is the number of false negative triple matches.</p></div>
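<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two formulas above can be illustrated with a short sketch. It makes assumptions the text does not state: triples are compared by exact string equality, TP, FP and FN are accumulated over the whole dataset (micro-averaging), and a graph "match" is taken to mean equality of the two triple sets. The function names and the toy data are hypothetical.</p><p>
```python
def as_triple_set(triples):
    """Normalise a list of [subject, relation, object] lists to a set of tuples."""
    return {tuple(t) for t in triples}


def triple_f1(predicted_graphs, gold_graphs):
    """Micro-averaged F1 over exact triple matches: 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for predicted, gold in zip(predicted_graphs, gold_graphs):
        pred_set, gold_set = as_triple_set(predicted), as_triple_set(gold)
        tp += len(pred_set.intersection(gold_set))
        fp += len(pred_set.difference(gold_set))
        fn += len(gold_set.difference(pred_set))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def graph_accuracy(predicted_graphs, gold_graphs):
    """Share of predicted graphs whose triple set equals the gold triple set."""
    if not predicted_graphs:
        return 0.0
    matches = sum(
        as_triple_set(p) == as_triple_set(g)
        for p, g in zip(predicted_graphs, gold_graphs)
    )
    return matches / len(predicted_graphs)


# Toy data: the second predicted graph gets one of its two triples wrong.
gold = [[["Dijon", "country", "France"]],
        [["France", "capital", "Paris"], ["Paris", "country", "France"]]]
pred = [[["Dijon", "country", "France"]],
        [["France", "capital", "Paris"], ["Paris", "mayor", "France"]]]
t_f1 = triple_f1(pred, gold)
acc_graph = graph_accuracy(pred, gold)
```
</p><p>On this toy data TP = 2, FP = 1 and FN = 1, so the triple-level F1 is 4/6, while only one of the two graphs matches entirely, giving a graph accuracy of 1/2.</p></div>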
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Graph Edit Distance (GED).</head><p>The following equation calculates the GED between two given graphs:</p><formula xml:id="formula_2">\mathrm{GED}(g_1, g_2) = \min_{(e_1, \ldots, e_k) \in \gamma(g_1, g_2)} \sum_{i=1}^{k} c(e_i)</formula><p>Where:</p><p>• GED(𝑔1, 𝑔2) is the graph edit distance between graphs 𝑔1 and 𝑔2, and 𝛾(𝑔1, 𝑔2) is the set of edit paths that transform 𝑔1 into 𝑔2.</p><p>• The summation calculates the sum of the costs of each individual edit operation 𝑒𝑖 in the selected edit path. The cost function 𝑐(𝑒𝑖) measures the cost or strength of each edit operation. The objective is to find the edit path with the minimum total cost, which represents the least amount of transformation required to convert 𝑔1 into 𝑔2.</p><p>In our experiments, we calculate the overall GED, which is computed as follows:</p><formula xml:id="formula_3">\mathrm{overall\_ged} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{GED}_i</formula><p>Where:</p><p>• 𝑁 is the total number of graphs.</p><p>• GED with subscript 𝑖 is the graph edit distance for the 𝑖th graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Graph BERTScore (G-BS).</head><p>G-BS treats each graph as a set of edges and solves a matching problem that finds the best alignment between the edges in the predicted graph and those in the ground-truth graph. Each edge is considered as a sentence, and BERTScore is used to calculate the score between a pair of predicted and ground-truth edges. Based on the best alignment and the overall matching score, the computed F1 score is used as the final G-BERTScore. Considering 𝑥𝑖 as a reference token (entity or relation) and 𝑥̂𝑗 as a generated token (entity or relation), the complete score matches each token in 𝑥 to a generated token in 𝑥̂ to compute recall, and each token in 𝑥̂ to a token in 𝑥 to compute precision. A greedy matching is used to maximize the matching similarity score, where each token is matched to the most similar token in the other graph. Then precision and recall are combined to compute an F1 measure. For a reference 𝑥 and candidate 𝑥̂, the recall, precision, and F1 scores are:</p><formula xml:id="formula_4">R_{\mathrm{BERT}} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} x_i^{\top} \hat{x}_j, \quad P_{\mathrm{BERT}} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} x_i^{\top} \hat{x}_j, \quad F1_{\mathrm{BERT}} = \frac{2 \cdot P_{\mathrm{BERT}} \cdot R_{\mathrm{BERT}}}{P_{\mathrm{BERT}} + R_{\mathrm{BERT}}}</formula><p>Bleu-F1 Score (𝐹1 with subscript 𝐵𝑙𝑒𝑢). Let 𝐶𝑔𝑒𝑛 be the count of 4-grams in the generated graph, let 𝐶𝑟𝑒𝑓 be the count of 4-grams in the reference graph, and let 𝐶𝑚𝑎𝑡𝑐ℎ be the count of matching 4-grams in both texts. Then:</p><formula xml:id="formula_5">P_{\mathit{Bleu}} = \frac{C_{\mathit{match}}}{C_{\mathit{gen}}}, \quad R_{\mathit{Bleu}} = \frac{C_{\mathit{match}}}{C_{\mathit{ref}}}, \quad F1_{\mathit{Bleu}} = \frac{2 \times P_{\mathit{Bleu}} \times R_{\mathit{Bleu}}}{P_{\mathit{Bleu}} + R_{\mathit{Bleu}}}</formula><p>ROUGE-F1 Score (𝐹1 with subscript 𝑅𝑂𝑈𝐺𝐸). In our experiments, we calculate the F1 score for ROUGE-2 (bigrams), which is given by the following equations:</p><formula xml:id="formula_6">P_{\mathit{ROUGE}} = \frac{|\mathit{bigram}_{\mathit{cand}} \cap \mathit{bigram}_{\mathit{ref}}|}{|\mathit{bigram}_{\mathit{cand}}|}, \quad R_{\mathit{ROUGE}} = \frac{|\mathit{bigram}_{\mathit{cand}} \cap \mathit{bigram}_{\mathit{ref}}|}{|\mathit{bigram}_{\mathit{ref}}|}, \quad F1_{\mathit{ROUGE}} = \frac{2 \cdot R_{\mathit{ROUGE}} \cdot P_{\mathit{ROUGE}}}{R_{\mathit{ROUGE}} + P_{\mathit{ROUGE}}}</formula></div>
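<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Bleu-F1 and ROUGE-F1 definitions above share the same n-gram overlap structure, which the following sketch makes explicit. It follows the simplified definitions given in this section (no brevity penalty or multi-order averaging as in standard BLEU) and assumes whitespace tokenisation of the linearized graphs and clipped n-gram counts; the function names and toy strings are hypothetical.</p><p>
```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(candidate, reference, n):
    """Precision, recall and F1 over n-gram overlap (clipped counts).

    n=4 mirrors the Bleu-F1 definition in the text and n=2 the ROUGE-2 one;
    whitespace tokenisation is an assumption of this sketch.
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    precision = matched / sum(cand.values()) if cand else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy linearized graphs that differ only in the last token.
reference = "Dijon country France France capital Paris"
candidate = "Dijon country France France capital Lyon"
rouge2 = ngram_f1(candidate, reference, n=2)
bleu4 = ngram_f1(candidate, reference, n=4)
```
</p><p>With these toy strings, 4 of the 5 bigrams and 2 of the 3 four-grams match, so the longer n-grams used for Bleu-F1 penalise the single wrong token more strongly than the bigrams used for ROUGE-F1.</p></div>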
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Hallucination and Omission.</head><p>As mentioned before, we calculate hallucination and omission using OEP, which is the optimal edit paths between the gold and predicted graphs. Each edit operation 𝑒𝑖 in OEP represents an action required to transform the predicted graph into the gold graph.</p><p>• Hallucination: An edit operation 𝑒𝑖 is considered a hallucination if it involves deleting an entity or a relation that exists in the predicted graph but is not present in the gold graph. In our work, we take into account the overall hallucination 𝐻𝑎𝑙𝑙., which is given by the following equation:</p><formula xml:id="formula_7">\mathit{Hall.} = \frac{\mathit{hall}}{\mathit{ToGrs}}</formula><p>Where ℎ𝑎𝑙𝑙 is the number of graphs with hallucination, and 𝑇𝑜𝐺𝑟𝑠 is the total number of generated graphs.</p><p>• Omission: An edit operation 𝑒𝑖 is considered an omission if it involves adding an entity or a relation that exists in the gold graph but is missing in the predicted graph. In our work, as for hallucination, we calculate the overall omission 𝑂𝑚𝑖𝑠., given by the following equation:</p><formula xml:id="formula_8">\mathit{Omis.} = \frac{\mathit{omiss}}{\mathit{ToGrs}}</formula><p>Where 𝑜𝑚𝑖𝑠𝑠 is the number of graphs with omission.</p></div>
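<div xmlns="http://www.tei-c.org/ns/1.0"><p>The graph-level rates defined above can be approximated with the sketch below. It is deliberately not an implementation of optimal edit paths: as a simplifying assumption, it replaces OEP by plain set differences over entities and edges, which flags the same graphs whenever an element is entirely absent from one side but does not model substitutions. The function name and toy data are hypothetical.</p><p>
```python
def hallucination_omission_rates(predicted_graphs, gold_graphs):
    """Return (Hall., Omis.): the share of generated graphs with at least one
    hallucinated element and with at least one omitted element."""
    def elements(triples):
        # Entities and relations of a graph, tagged by role.
        # Empty strings from the [["", "", ""]] placeholder are ignored.
        nodes = {("node", x) for s, _, o in triples for x in (s, o) if x}
        edges = {("edge", (s, r, o)) for s, r, o in triples if s and r and o}
        return nodes.union(edges)

    hall = omis = 0
    for predicted, gold in zip(predicted_graphs, gold_graphs):
        pred_el, gold_el = elements(predicted), elements(gold)
        hall += bool(pred_el.difference(gold_el))   # in prediction, not in gold
        omis += bool(gold_el.difference(pred_el))   # in gold, not in prediction
    total = len(predicted_graphs)
    return (hall / total, omis / total) if total else (0.0, 0.0)


# Toy data: one exact graph, one wrong object, one empty-graph placeholder.
gold = [[["Dijon", "country", "France"]],
        [["France", "capital", "Paris"]],
        [["Alan Bean", "birthPlace", "Wheeler"]]]
pred = [[["Dijon", "country", "France"]],
        [["France", "capital", "Lyon"]],
        [["", "", ""]]]
hall_rate, omis_rate = hallucination_omission_rates(pred, gold)
```
</p><p>In this toy run one graph out of three contains a hallucination and two contain an omission. The placeholder graph contributes an omission but no hallucination, which illustrates why substituting empty graphs tends to underestimate hallucinations.</p></div>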
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>This section describes the LLMs used in our study for ZSP, FSP, or FT, and then presents our experimental results.</p><p>Our selection criteria focused on small, open-source, and easily accessible LLMs. All models were sourced from the HuggingFace platform<ref type="foot" target="#foot_1">2</ref>.</p><p>• Llama 2 <ref type="bibr" target="#b11">[12]</ref> is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. In our experiments, we deploy the 7B and 13B pretrained models, which have been converted to the Hugging Face Transformers format.</p><p>• Introduced by <ref type="bibr" target="#b14">[15]</ref>, Mistral-7B-v0.1 is a pretrained generative text model with 7 billion parameters. Notably, Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks in the authors' experiments.</p><p>• Starling-7B, introduced by <ref type="bibr" target="#b15">[16]</ref>, is an open LLM trained through Reinforcement Learning from AI Feedback (RLAIF). This model leverages the GPT-4 labeled ranking dataset, berkeley-nest/Nectar, and employs a novel reward training and policy tuning pipeline.</p><p>In our review of the state of the art, we observed that, apart from <ref type="bibr" target="#b2">[3]</ref>, which incorporates hallucination evaluation in its experiments, other studies primarily focus on metrics such as precision, recall, F1 score, triple matching, or graph matching. In our evaluation, we also consider hallucination and omission through a linguistic lens.</p><p>Upon examining Table <ref type="table" target="#tab_1">1</ref>, we observe the superior performance of the FT method compared to ZSP and FSP for the T2KG construction task. 
A notable exception is Llama2-7b: applying ZSP to the fine-tuned Llama2-7b results in worse performance than FSP on the original Llama2-7b. Overall, the table shows the relative performance of each method and highlights the strengths and limitations of each approach for T2KG construction.</p><p>Furthermore, better results are achieved by providing more examples (more shots) to the same model, whether original or fine-tuned. The results underscore the positive correlation between the quantity of examples and the model's performance. The fine-tuned Mistral and the fine-tuned Starling exhibit similar performance when given 7 shots, surpassing the two Llama2 models by a significant margin. The standout performer with ZSP among the fine-tuned LLMs is Mistral, with a considerable lead over the other LLMs, including Starling. To corroborate these findings, in future versions of this study we plan to assess our fine-tuned models on an alternative dataset covering diverse domains.</p><p>As depicted in Figure <ref type="figure" target="#fig_1">2</ref>, Hall. represents Hallucinations, while Omis. denotes Omissions.</p><p>Taking into account our strategy of introducing an empty graph when LLMs fail to produce triples, we note that although Llama2-13b with ZSP yields among the least favorable results across the metrics, it displays the fewest hallucinations. It is therefore important to recognize that the model with the fewest hallucinations is not necessarily the most suitable choice. To overcome this limitation in our evaluation metric, we aim to improve it by considering the prevalence of empty graphs in the generated results before assessing them against the ground truth graphs.</p><p>The G-BS consistently remains high, indicating that LLMs frequently generate text with words (entities or relations) very similar to those in the ground truth graphs. 
Among the models, the fine-tuned Starling with 7 shots achieves the highest G-F1, a metric that considers the graph as a whole and measures how many generated graphs exactly match the ground truth; it generates approximately 36% of the graphs identically to the ground truth. On several metrics, the fine-tuned Mistral with 7 shots performs exceptionally well, particularly on T-F1, where F1 scores are computed for all test samples and averaged to give the final Triple Match F1 score. It also excels on metrics such as "Omis.", F1-Bleu, and F1-Rouge. F1-Bleu and F1-Rouge are n-gram-based metrics encompassing precision (Bleu), recall (Rouge), and F-score (Bleu and Rouge). These metrics could potentially yield even better results if synonyms of entities or relations were considered exact matches.</p><p>The authors in <ref type="bibr" target="#b5">[6]</ref> conduct evaluations using WebNLG+2020. Consequently, we adopt their approach (PiVE) as a baseline for comparison with our experiments. The results show that nearly all fine-tuned LLMs outperform PiVE, which is applied to both ChatGPT and GPT-4 as mentioned before.</p><p>In Table <ref type="table" target="#tab_2">2</ref>, we present the evaluation results of the original LLMs with 7 shots and the fine-tuned LLMs with zero-shot and 7 shots on the KELM-sub dataset prepared by <ref type="bibr" target="#b5">[6]</ref>, building upon <ref type="bibr" target="#b43">[44]</ref>. Note that these experiments used the same prompts as previously described. The 7-shot experiments sourced examples from the WebNLG+2020 training dataset. These new experiments aim to assess the generalization ability of original LLMs with 7 shots and fine-tuned LLMs with zero-shot and 7 shots across diverse domains in the T2KG construction task. The results in Table <ref type="table" target="#tab_2">2</ref> indicate that our fine-tuned LLMs perform less effectively than the original LLMs with 7 shots. Furthermore, all LLMs' results on KELM-sub are inferior to those on WebNLG+2020. This disparity can be attributed to the presence of different relation types, some of which are expressed differently in KELM-sub, using synonyms that the current metrics do not take into account. 
To address this, we aim to refine the metrics in future versions to accommodate synonyms of entities and relations.</p><p>We also observe that PiVE achieves better results on KELM-sub. It leverages examples from the KELM-sub training dataset in its few-shot experiments, which gives the LLMs insight into certain relation types.</p><p>One future experiment will be to use examples from KELM-sub in the few-shot prompts, to investigate whether the generalization issue stems from the WebNLG domains, the relation types, or prompts that need improvement so that the models disregard the relation types provided by the examples.</p></div>
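<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the graph-level and triple-level scores discussed in this section concrete, the following minimal Python sketch computes an exact-match rate over whole graphs and a per-sample triple F1 averaged over the test set. It is a simplified reading of G-F1 and T-F1 under stated assumptions, not the evaluation code used in the experiments: graphs are assumed to be sets of (subject, relation, object) triples, a triple matches only if all three elements are identical strings, and a failed generation is represented by an empty set. The function names are hypothetical.</p>

```python
def triple_f1(predicted, gold):
    """F1 over exactly matching triples for one (predicted, gold) sample."""
    if not predicted and not gold:
        return 1.0
    match = len(predicted.intersection(gold))
    if match == 0:
        return 0.0
    precision = match / len(predicted)
    recall = match / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs):
    """Return (graph exact-match rate, mean triple F1) over (predicted, gold) pairs."""
    total = len(pairs)
    # Graph level: a sample counts only if the whole triple set is identical.
    exact = sum(p == g for p, g in pairs) / total
    # Triple level: F1 is computed per sample, then averaged.
    mean_f1 = sum(triple_f1(p, g) for p, g in pairs) / total
    return exact, mean_f1
```

<p>Because matching in this sketch is exact string equality, a predicted relation that is a synonym of the gold relation scores zero, which illustrates why accommodating synonyms could raise these scores.</p></div>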
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and perspectives</head><p>This study examines the Text-to-Knowledge Graph (T2KG) construction task, exploring the efficacy of three distinct approaches: Zero-Shot Prompting (ZSP), Few-Shot Prompting (FSP), and Fine-Tuning (FT) of Large Language Models (LLMs). Our experiments, employing models such as Llama2, Mistral, and Starling, shed light on the strengths and limitations of each approach. The results demonstrate the strong performance of the FT method compared to ZSP and FSP across various models. Notably, the fine-tuned Llama2-7b with ZSP gave worse results than FSP with the original Llama2-7b. Additionally, the positive correlation between the quantity of examples and model performance underscores the significance of dataset size in training. An essential part of our study involves the evaluation metrics employed to assess the generated graphs. In particular, we introduced refinements to these metrics to measure hallucination and omission in the generated graphs, offering insight into the fidelity of the constructed knowledge graphs.</p><p>Looking forward, there are promising perspectives for further enhancement. One is to refine the evaluation metrics to accommodate synonyms of entities or relations in generated graphs, employing advanced methods or tools for synonym detection. Furthermore, leveraging LLMs for data augmentation in the T2KG construction task shows promise. Notably, during experimentation, LLMs, particularly Starling, exhibited the ability to continue their generated results for T2KG, proposing new texts alongside the corresponding KGs (triples).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: T2KG Task</figDesc><graphic coords="3,89.29,242.39,416.68,165.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Overall experimentation's process</figDesc><graphic coords="6,89.29,177.42,416.69,132.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Prompting examples</figDesc><graphic coords="7,89.29,84.19,416.70,223.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Results examples</figDesc><graphic coords="11,91.37,403.13,412.52,221.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>The 7-shot experiments sourced examples from the WebNLG+2020 training dataset. These new experiments aim to assess the generalization ability of original LLMs with 7 shots and fine-tuned LLMs with zero-shot and 7 shots across diverse domains in the T2KG construction task. The results in Table</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Comparison of performance metrics and models</figDesc><table>
<row><cell>Model | Metric</cell><cell>G-F1</cell><cell>T-F1</cell><cell>G-BS</cell><cell>GED</cell><cell>F1-Bleu</cell><cell>F1-Rouge</cell><cell>Hall.</cell><cell>Omis.</cell></row>
<row><cell>PiVE</cell><cell>14.00</cell><cell>18.57</cell><cell>89.82</cell><cell>11.22</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row>
<row><cell>Mistral-0</cell><cell>2.30</cell><cell>0.00</cell><cell>77.87</cell><cell>15.93</cell><cell>54.97</cell><cell>55.15</cell><cell>20.63</cell><cell>31.48</cell></row>
<row><cell>Mistral-7</cell><cell>18.72</cell><cell>28.44</cell><cell>87.54</cell><cell>10.13</cell><cell>55.09</cell><cell>63.94</cell><cell>17.88</cell><cell>21.14</cell></row>
<row><cell>Mistral-FT-0</cell><cell>31.93</cell><cell>44.08</cell><cell>86.89</cell><cell>8.25</cell><cell>63.88</cell><cell>69.08</cell><cell>13.55</cell><cell>18.27</cell></row>
<row><cell>Mistral-FT-7</cell><cell>34.68</cell><cell>49.11</cell><cell>91.99</cell><cell>6.69</cell><cell>71.78</cell><cell>77.43</cell><cell>15.01</cell><cell>14.45</cell></row>
<row><cell>Starling-0</cell><cell>5.23</cell><cell>7.83</cell><cell>86.29</cell><cell>13.35</cell><cell>34.64</cell><cell>14.61</cell><cell>17.48</cell><cell>33.24</cell></row>
<row><cell>Starling-7</cell><cell>21.30</cell><cell>33.77</cell><cell>90.41</cell><cell>8.96</cell><cell>60.47</cell><cell>69.34</cell><cell>17.31</cell><cell>14.61</cell></row>
<row><cell>Starling-FT-0</cell><cell>21.47</cell><cell>28.29</cell><cell>72.86</cell><cell>11.87</cell><cell>44.07</cell><cell>47.69</cell><cell>10.17</cell><cell>42.78</cell></row>
<row><cell>Starling-FT-7</cell><cell>35.69</cell><cell>48.49</cell><cell>91.95</cell><cell>6.60</cell><cell>71.51</cell><cell>76.67</cell><cell>11.35</cell><cell>18.27</cell></row>
<row><cell>Llama2-7b-0</cell><cell>0.00</cell><cell>0.46</cell><cell>54.20</cell><cell>18.29</cell><cell>20.23</cell><cell>17.98</cell><cell>4.83</cell><cell>81.53</cell></row>
<row><cell>Llama2-7b-7</cell><cell>11.80</cell><cell>20.88</cell><cell>82.78</cell><cell>12.66</cell><cell>45.48</cell><cell>54.29</cell><cell>20.74</cell><cell>30.02</cell></row>
<row><cell>Llama2-7b-FT-0</cell><cell>3.82</cell><cell>15.41</cell><cell>59.19</cell><cell>15.78</cell><cell>16.82</cell><cell>17.95</cell><cell>6.07</cell><cell>79.20</cell></row>
<row><cell>Llama2-7b-FT-7</cell><cell>18.77</cell><cell>32.63</cell><cell>87.19</cell><cell>10.16</cell><cell>58.48</cell><cell>66.35</cell><cell>25.24</cell><cell>18.66</cell></row>
<row><cell>Llama2-13b-0</cell><cell>0.00</cell><cell>0.79</cell><cell>57.42</cell><cell>17.79</cell><cell>20.50</cell><cell>18.23</cell><cell>4.78</cell><cell>81.23</cell></row>
<row><cell>Llama2-13b-7</cell><cell>13.49</cell><cell>23.99</cell><cell>84.89</cell><cell>11.59</cell><cell>50.18</cell><cell>58.71</cell><cell>26.36</cell><cell>19.06</cell></row>
<row><cell>Llama2-13b-FT-0</cell><cell>20.52</cell><cell>32.18</cell><cell>75.88</cell><cell>11.38</cell><cell>46.53</cell><cell>50.78</cell><cell>11.64</cell><cell>39.63</cell></row>
<row><cell>Llama2-13b-FT-7</cell><cell>23.55</cell><cell>37.29</cell><cell>88.77</cell><cell>8.94</cell><cell>63.26</cell><cell>70.12</cell><cell>23.55</cell><cell>16.19</cell></row>
</table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Results on KELM-sub</figDesc><table>
<row><cell>Model | Metric</cell><cell>G-F1</cell><cell>T-F1</cell><cell>G-BS</cell><cell>GED</cell><cell>F1-Bleu</cell><cell>F1-Rouge</cell><cell>Hall.</cell><cell>Omis.</cell></row>
<row><cell>PiVE</cell><cell>23.11</cell><cell>7.50</cell><cell>87.70</cell><cell>11.35</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell></row>
<row><cell>Mistral-7</cell><cell>5.61</cell><cell>10.89</cell><cell>71.29</cell><cell>14.28</cell><cell>56.56</cell><cell>61.11</cell><cell>2.33</cell><cell>77.33</cell></row>
<row><cell>Mistral-FT-0</cell><cell>2.28</cell><cell>8.02</cell><cell>69.29</cell><cell>14.92</cell><cell>24.24</cell><cell>35.70</cell><cell>2.06</cell><cell>77.22</cell></row>
<row><cell>Mistral-FT-7</cell><cell>2.83</cell><cell>8.73</cell><cell>68.55</cell><cell>14.54</cell><cell>26.35</cell><cell>38.76</cell><cell>1.78</cell><cell>78.17</cell></row>
<row><cell>Starling-7</cell><cell>5.61</cell><cell>13.82</cell><cell>83.16</cell><cell>12.85</cell><cell>65.79</cell><cell>71.20</cell><cell>5.33</cell><cell>59.44</cell></row>
<row><cell>Starling-FT-0</cell><cell>2.00</cell><cell>5.76</cell><cell>64.87</cell><cell>16.51</cell><cell>17.64</cell><cell>24.29</cell><cell>0.72</cell><cell>79.39</cell></row>
<row><cell>Starling-FT-7</cell><cell>3.11</cell><cell>9.82</cell><cell>67.79</cell><cell>14.53</cell><cell>27.37</cell><cell>39.49</cell><cell></cell><cell>78.67</cell></row>
<row><cell>Llama2-7b-7</cell><cell>5.06</cell><cell>6.20</cell><cell>67.49</cell><cell>15.55</cell><cell>52.18</cell><cell>56.71</cell><cell>2.28</cell><cell>76.83</cell></row>
<row><cell>Llama2-7b-FT-0</cell><cell>0.22</cell><cell>1.71</cell><cell>58.85</cell><cell>18.84</cell><cell>6.54</cell><cell>7.81</cell><cell>0.56</cell><cell>80.28</cell></row>
<row><cell>Llama2-7b-FT-7</cell><cell>5.28</cell><cell>8.33</cell><cell>67.29</cell><cell>15.09</cell><cell>26.86</cell><cell>38.75</cell><cell>3.67</cell><cell>75.33</cell></row>
<row><cell>Llama2-13b-7</cell><cell>5.17</cell><cell>7.82</cell><cell>71.66</cell><cell>15.12</cell><cell>55.39</cell><cell>60.06</cell><cell>3.44</cell><cell>75.56</cell></row>
<row><cell>Llama2-13b-FT-0</cell><cell>1.72</cell><cell>7.73</cell><cell>63.37</cell><cell>15.80</cell><cell>20.59</cell><cell>29.53</cell><cell>1.56</cell><cell>79.44</cell></row>
<row><cell>Llama2-13b-FT-7</cell><cell>4.50</cell><cell>8.63</cell><cell>67.44</cell><cell>14.81</cell><cell>26.33</cell><cell>38.09</cell><cell>2.06</cell><cell>77.22</cell></row>
</table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">NetworkX, optimal edit paths: https://networkx.org/documentation/stable/index.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Hugging Face: https://huggingface.co/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The authors thank the French company DAVI (Davi The Humanizers, Puteaux, France) for their support, and the French government for the plan France Relance funding.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Knowledge graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Blomqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cochez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">D</forename><surname>Melo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gutierrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E L</forename><surname>Gayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Neumaier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (Csur)</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="1" to="37" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Industry-scale knowledge graphs: Lessons and challenges: Five diverse technology companies show how it&apos;s done</title>
		<author>
			<persName><forename type="first">N</forename><surname>Noy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Narayanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Patterson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Taylor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Queue</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="48" to="75" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Text2kgbench: A benchmark for ontology-driven knowledge graph generation from text</title>
		<author>
			<persName><forename type="first">N</forename><surname>Mihindukulasooriya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tiwari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F</forename><surname>Enguix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="247" to="265" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Ershov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.01842</idno>
		<title level="m">A case study for compliance as code with graphs and language models: Public release of the regulatory knowledge graph</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Caufield</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hegde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Emonet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">L</forename><surname>Harris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Joachimiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Matentzoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Moxon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Reese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Haendel</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.02711</idno>
		<title level="m">Structured prompt interrogation and recursive extraction of semantics (spires): A method for populating knowledge bases using zero-shot learning</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Collier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Buntine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Shareghi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.12392</idno>
		<title level="m">Pive: Prompting with iterative verification improving graph-based generative capability of llms</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Holtzman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Artetxe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2202.12837</idno>
		<title level="m">Rethinking the role of demonstrations: What makes in-context learning work?</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Training language models to follow instructions with human feedback</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Almeida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wainwright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Slama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ray</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="27730" to="27744" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Learning to summarize with human feedback</title>
		<author>
			<persName><forename type="first">N</forename><surname>Stiennon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Christiano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="3008" to="3021" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">GPT-4</title>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">technical report</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rozière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hambro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Azhar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.13971</idno>
		<title level="m">Llama: Open and efficient foundation language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><surname>BigScience Workshop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Akiki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pavlick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ilić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hesslow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Castagné</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Luccioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yvon</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2211.05100</idno>
		<title level="m">Bloom: A 176b-parameter open-access multilingual language model</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Palm: Scaling language modeling with pathways</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Barham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gehrmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1" to="113" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Q</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sablayrolles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mensch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bamford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Chaplot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>de las Casas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bressand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lengyel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Saulnier</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.06825</idno>
		<title level="m">Mistral 7b</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jiao</surname></persName>
		</author>
		<title level="m">Starling-7b: Improving llm helpfulness &amp; harmlessness with rlaif</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Tunstall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Beeching</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lambert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Rajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rasul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Belkada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>von Werra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fourrier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Habib</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.16944</idno>
		<title level="m">Zephyr: Direct distillation of lm alignment</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Carta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Giuliani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Piano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Podda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pompianu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">G</forename><surname>Tiddia</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.01128</idno>
		<title level="m">Iterative zero-shot llm prompting for knowledge graph construction</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Qiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.13168</idno>
		<title level="m">Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.11633</idno>
		<title level="m">Evaluating chatgpt&apos;s information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.10205</idno>
		<title level="m">Zero-shot information extraction via chatting with chatgpt</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Relevant entity selection: Knowledge graph bootstrapping via zero-shot analogical pruning</title>
		<author>
			<persName><forename type="first">L</forename><surname>Jarnac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Couceiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Monnin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</title>
				<meeting>the 32nd ACM International Conference on Information and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="934" to="944" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Codekgc: Code language model for generative knowledge graph construction</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Bi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Asian and Low-Resource Language Information Processing</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Luo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.13916</idno>
		<title level="m">Exploring large language models for knowledge graph completion</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Khorashadizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mihindukulasooriya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tiwari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Groppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Groppe</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.08804</idno>
		<title level="m">Exploring in-context learning capabilities of foundation models for generating knowledge graphs from text</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Construction and applications of billion-scale pre-trained multimodal business knowledge graph</title>
		<author>
			<persName><forename type="first">S</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 39th International Conference on Data Engineering (ICDE)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="2988" to="3002" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Trajanoska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stojanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Trajanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.04676</idno>
		<title level="m">Enhancing knowledge graph construction using large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Thakurdesai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Korpeoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Achan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.09858</idno>
		<title level="m">Knowledge graph completion models are few-shot learners: An empirical study of relation labeling in e-commerce with llms</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Bert based clinical knowledge extraction for biomedical knowledge graph construction and analysis</title>
		<author>
			<persName><forename type="first">A</forename><surname>Harnoune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rhanoui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mikram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yousfi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Elkaimbillah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>El Asri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Methods and Programs in Biomedicine Update</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">100042</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.11489</idno>
		<title level="m">Chatgpt is not enough: Enhancing large language models with knowledge graphs for fact-aware language modeling</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">C</forename><surname>Ferreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>van der Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>van Miltenburg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Krahmer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.09022</idno>
		<title level="m">Neural data-to-text generation: A comparison between pipeline and end-to-end architectures</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Yadav</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bansal</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2104.07644</idno>
		<title level="m">Explagraphs: An explanation graph generation task for structured commonsense reasoning</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kishore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Artzi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.09675</idno>
		<title level="m">Bertscore: Evaluating text generation with bert</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">An exact graph edit distance algorithm for solving pattern recognition problems</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Abu-Aisheh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Raveaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Ramel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martineau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">4th International Conference on Pattern Recognition Applications and Methods</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Bleu: a method for automatic evaluation of machine translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Papineni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roukos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-J</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th annual meeting of the Association for Computational Linguistics</title>
				<meeting>the 40th annual meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="311" to="318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Rouge: A package for automatic evaluation of summaries</title>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Text summarization branches out</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="74" to="81" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">The webnlg challenge: Generating text from rdf data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Gardent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shimorina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Perez-Beltrachini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th international conference on natural language generation</title>
				<meeting>the 10th international conference on natural language generation</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="124" to="133" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<title level="m" type="main">Exploring network structure, dynamics, and function using NetworkX</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hagberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Swart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schult</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<pubPlace>Los Alamos, NM (United States)</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Los Alamos National Lab. (LANL)</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Qlora: Efficient finetuning of quantized llms</title>
		<author>
			<persName><forename type="first">T</forename><surname>Dettmers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pagnoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Holtzman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Du</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.00361</idno>
		<title level="m">Adaptive precision training: Quantify back propagation in neural networks with fixed-point numbers</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b40">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.09685</idno>
		<title level="m">Lora: Low-rank adaptation of large language models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b41">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Belkada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Dettmers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pagnoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mangrulkar</surname></persName>
		</author>
		<title level="m">Making llms even more accessible with bitsandbytes, 4-bit quantization and qlora</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Mangrulkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Belkada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Paul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bossan</surname></persName>
		</author>
		<title level="m">Peft: State-of-the-art parameter-efficient fine-tuning methods</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods</note>
</biblStruct>

<biblStruct xml:id="b43">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shakeri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.12688</idno>
		<title level="m">Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
