<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Instruct Large Language Models for Public Administration Document Information Extraction</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Salvatore</forename><surname>Carta</surname></persName>
							<email>salvatore@unica.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<addrLine>via Ospedale 72</addrLine>
									<postCode>09124</postCode>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alessandro</forename><surname>Giuliani</surname></persName>
							<email>alessandro.giuliani@unica.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<addrLine>via Ospedale 72</addrLine>
									<postCode>09124</postCode>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marco</forename><forename type="middle">Manolo</forename><surname>Manca</surname></persName>
							<email>marcom.manca@unica.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<addrLine>via Ospedale 72</addrLine>
									<postCode>09124</postCode>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Leonardo</forename><surname>Piano</surname></persName>
							<email>leonardo.piano@unica.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<addrLine>via Ospedale 72</addrLine>
									<postCode>09124</postCode>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alessia</forename><surname>Pisu</surname></persName>
							<email>alessia.pisu96@unica.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<addrLine>via Ospedale 72</addrLine>
									<postCode>09124</postCode>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sandro</forename><forename type="middle">Gabriele</forename><surname>Tiddia</surname></persName>
							<email>sandrog.tiddia@unica.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Cagliari</orgName>
								<address>
									<addrLine>via Ospedale 72</addrLine>
									<postCode>09124</postCode>
									<settlement>Cagliari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Instruct Large Language Models for Public Administration Document Information Extraction</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">89706BE108445AB4C7EDBCBFD838F9C7</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Large Language Models</term>
					<term>Public Administration</term>
					<term>Tenders</term>
					<term>Italian Open Information Extraction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With the rapid digitization of institutions, there is an ever-increasing problem of effectively organizing and accessing information. Public Administrations (PAs) manage large volumes of disparate data from a variety of sources. Thus, these organizations would greatly benefit from AI, particularly Natural Language Processing solutions that help organize, structure, and search for information effectively. In the context of Italian PA, which we address in this paper, there are two main challenges: the lack of ontologies and the limited tools available for Italian information extraction. In this paper, we attempt to advance Information Extraction for Italian PAs by instructing a Large Language Model on a set of automatically labeled triplets of public tenders.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The pervasive impact of Information and Communication Technologies (ICT) on our society over the past two decades is undeniable. This technological revolution has permeated every aspect of society, including Public Administrations (PAs), radically transforming how these entities operate and interact with citizens. Digital technologies have enabled PAs to streamline processes, improve service access, and increase transparency. However, along with these opportunities, significant challenges also arise in terms of data management and internal organization. Public administrations handle vast amounts of sensitive and often disparate data from various sources. The lack of data standardization, together with information security and citizen privacy, is a crucial issue to be addressed. In addition, data fragmentation among different systems and departments can inhibit effective information sharing and analysis. For these reasons, PAs would benefit from technology solutions based on Machine Learning and, in particular, Natural Language Processing (NLP) to improve the organization of such fragmented information.</p><p>However, there are two major challenges. The first is the lack of appropriate resources to adequately organize PA documents. Indeed, it is crucial to organize, access, understand, and utilize information with proper structures, such as knowledge graphs or ontologies, which represent a powerful solution in many domains, e.g., in online news platforms <ref type="bibr" target="#b0">[1]</ref>, health and life sciences <ref type="bibr" target="#b1">[2]</ref>, or cultural heritage <ref type="bibr" target="#b2">[3]</ref>. In this context, Open Information Extraction (OIE) <ref type="bibr" target="#b3">[4]</ref> is a natural solution to structure and organize PA information. 
OIE systems usually adopt a domain-agnostic method and can extract entities and relationship triples (the main components of knowledge graphs) from any sentence written in natural language.</p><p>The second challenge is that most research on OIE targets the English language. While advancements in OIE have been notable, they often fail to encompass the complexities inherent in non-English languages. This linguistic bias significantly hinders the widespread applicability and effectiveness of OIE systems in multilingual contexts.</p><p>In this paper, we aim to advance the research on Open Information Extraction applied to PA by testing and exploiting the potential offered by Large Language Models (LLMs). In particular, an LLM is instructed with a two-stage strategy that employs Italian PA data.</p><p>The rest of the paper is structured as follows: Section 2 gives an overview of the state of the art; our methodology is detailed in Section 3, whereas the experiments are described in Section 4. Section 5 reports and discusses the results, and Section 6 ends the paper with the conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head><p>The advent of Open Information Extraction (OIE) made it possible to transcend the domain-specific constraints inherent in conventional IE methodologies. OIE methods aim to identify linguistic extraction patterns, either hand-crafted or automatically learned from the data <ref type="bibr" target="#b4">[5]</ref>. Therefore, they are subdivided into rule-based and neural methods. The former include ClausIE <ref type="bibr" target="#b5">[6]</ref>, an OIE framework that uses dependency parsing to detect the clauses in an input sentence and subsequently extract propositions. REVERB <ref type="bibr" target="#b6">[7]</ref> extracts tuples by isolating relation phrases that satisfy syntactic and lexical constraints. Similarly, TEXTRUNNER <ref type="bibr" target="#b7">[8]</ref> first identifies a pair of noun phrases that are not too far apart, and then applies a classifier to determine whether or not to extract a relationship. Further works rely on a strategy for combining different OIE tools for triplet generation and filtering <ref type="bibr" target="#b8">[9]</ref>. A pioneering proposal among the more recent neural methods is the work of Stanovsky et al. <ref type="bibr" target="#b9">[10]</ref>, wherein OIE is treated as a sequence labeling problem, and an LSTM-transducer automatically extracts triplets. Zhan and Zhao <ref type="bibr" target="#b10">[11]</ref> introduced a span model for n-ary Open Information Extraction. More recently, Kolluru et al. <ref type="bibr" target="#b11">[12]</ref> introduced IMOJIE, a neural Open Information Extraction system that follows an iterative approach in which triplet extraction is conditioned on the previously retrieved triplets, with the aim of reducing redundancy.</p><p>The methods above have been developed or tested specifically on English text corpora. 
Regarding the Italian language, no significant research had been conducted on Open IE until the last decade. To date, only a few works have addressed this challenge. Damiano et al. proposed ItalIE <ref type="bibr" target="#b12">[13]</ref>, a clause-based OIE system inspired by ClausIE and aimed at extracting n-ary coherent propositions from simple sentences. Sentences are analyzed to identify and categorize clauses based on seven predefined patterns specific to the Italian language. Guarasci et al. <ref type="bibr" target="#b13">[14]</ref> presented an OIE method for Italian single-verb sentences based on Lexicon-Grammar tables. The system employs linguistic structures and patterns of verbal behavior to identify arguments, match patterns, and generate propositions, demonstrating effectiveness in generating syntactically and semantically valid propositions for the Italian language. Finally, Siciliani et al. <ref type="bibr" target="#b14">[15]</ref> proposed OIE4PA, an Open IE framework that can identify facts from Public Administration documents. Building on the proposal of Siciliani et al. <ref type="bibr" target="#b14">[15]</ref>, in this work we propose an instructed Large Language Model for Italian Open Information Extraction specialized in Public Administration documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>We propose a novel model for automated Information Extraction for Italian PAs by instructing an LLM on a set of automatically labeled triplets of public tenders. To this end, we devise a proper strategy to train an LLM with a suitable set of triplets and instructions. The entire process is depicted in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>Our method involves two stages. In detail, the process first performs a step aimed at obtaining a correctly annotated set of triplets (Triplet Auto-Labeling), which is subsequently used to train the LLM (Instruction Tuning). Each step is described in the following.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Triplet Auto-Labeling</head><p>The first step of our methodology is training a Sequence Classifier Language Model to identify meaningful triplets within the PA context. To accomplish this, we leveraged the OIE4PA dataset, a collection of triplets extracted from Italian tenders of the Apulia region <ref type="bibr" target="#b14">[15]</ref>, in which each triplet is extracted with the WikiOIE framework <ref type="bibr" target="#b15">[16]</ref>. The dataset is organized into two sets: a labeled set ℒ, which contains 2,000 binary triplets labeled by humans as valid or not, and an unlabeled set 𝒰 of 14,096 triplets, together with the original sentences. At this stage, we exploited the ℒ set to train a classifier that distinguishes between valid and invalid triplets. We treated this task as a sentence classification problem, concatenating each triplet into a single sentence in which subject, predicate, and object are separated by semicolons. We identified three suitable Language Models (LMs) for this task, namely Italian-bert, LegalBert <ref type="bibr" target="#b16">[17]</ref>, and BureauBERTo <ref type="bibr" target="#b17">[18]</ref>. The first is a BERT base model <ref type="bibr" target="#b18">[19]</ref> fine-tuned on an Italian corpus, the second is a version of Italian BERT fine-tuned on Italian civil law corpora, and the last is an UmBERTo model fine-tuned on PA, banking, and insurance corpora. Table <ref type="table" target="#tab_1">1</ref> outlines the results obtained by these three Language Models on the triplet classification task. Finally, the most accurate trained classifier was employed to label the triplets of the 𝒰 set, forming a new 𝒜ℒ (Auto-Labeled) set, which in turn is exploited to instruct the Large Language Model for the OIE task.</p></div>
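<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the serialization of a triplet into a single sentence for the classifier could be sketched as follows. The function name, the exact separator spacing, and the Italian example triplet are assumptions made for this sketch and are not taken from the OIE4PA dataset or from our implementation.</p>
<p>
```python
def triplet_to_sentence(subject, predicate, obj, sep="; "):
    """Join subject, predicate, and object into one string.

    The semicolon separator follows the description in the text;
    the trailing space in the default separator is an assumption.
    """
    parts = [subject.strip(), predicate.strip(), obj.strip()]
    return sep.join(parts)


# Hypothetical triplet, invented for illustration.
example = triplet_to_sentence("il Comune", "indice", "una gara d'appalto")
print(example)
```
</p>
<p>The resulting string can then be passed to any sentence classifier together with its valid/invalid label.</p></div>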
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Instruction Tuning</head><p>Instruction tuning is a strategy that guides a language model through human-like instructions to improve its performance on a specific task. Unlike traditional methods that rely solely on large-scale training data, instruction tuning provides targeted guidance, allowing the model to adapt and refine its behaviour toward desired outcomes. Incorporating human-like instructions enhances the model's understanding and improves its ability to generate contextually relevant responses. In summary, given a source text and task-specific instructions, the model is trained to generate the sequence of tokens representing the desired output.</p><p>To instruct an LLM to perform Open Information Extraction, we transformed the 𝒜ℒ triplet set into an instruction dataset. In particular, each auto-labeled triplet is mapped to an example with three components, described in the following: a task instruction, an input text, and a response.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Task instruction</head><p>Task instructions provide a detailed statement on how to accomplish the desired task and how to structure the output. In detail, we formulated the following instruction to query the LLM: &lt;Trova quali triple semantiche esistono nel testo. Formatta l'output come [Soggetto;Predicato;Oggetto]&gt;.</p><p>We formulated the instruction in Italian so that the model immediately understands that we are referring to the Italian language. The English translation of the instruction is: "Find which semantic triples exist in the text. Format the output as [Subject;Predicate;Object]".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Input text</head><p>The input text is the sentence on which the LLM has to perform the task defined by the instruction. In detail, each sentence is the original text excerpt from which a triplet belonging to the OIE4PA dataset has been extracted.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Response</head><p>The response represents the desired output, which in our case is the open triplet extracted from the input sentence. To teach the model to distinguish sentences from which a triplet can be extracted from sentences that contain no useful triplet, we used the triplet as the response if the classifier labeled it as valid; otherwise, the response is an empty string.</p></div>
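<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three components described above can be combined into a single training record, as in the following minimal sketch. The field names (instruction, input, response), the bracketed serialization of the response, and the example sentence are assumptions made for illustration; only the instruction text and the empty-string convention for invalid triplets come from the description above.</p>
<p>
```python
INSTRUCTION = (
    "Trova quali triple semantiche esistono nel testo. "
    "Formatta l'output come [Soggetto;Predicato;Oggetto]"
)


def build_example(sentence, triplet, is_valid):
    """Build one instruction-tuning record from an auto-labeled triplet.

    Valid triplets become the response; invalid ones yield an empty
    string, so the model also learns when nothing should be extracted.
    """
    if is_valid:
        # Assumed serialization, mirroring the format named in the instruction.
        response = "[" + ";".join(triplet) + "]"
    else:
        response = ""
    return {"instruction": INSTRUCTION, "input": sentence, "response": response}


# Hypothetical sentence and triplet, invented for illustration.
record = build_example(
    "Il Comune indice una gara d'appalto.",
    ("il Comune", "indice", "una gara d'appalto"),
    is_valid=True,
)
print(record["response"])
```
</p></div>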
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental settings</head><p>We adopted the Flan-T5 family <ref type="bibr" target="#b19">[20]</ref> as the instruction model. This choice is motivated by two reasons. First, prior research <ref type="bibr" target="#b20">[21]</ref> has demonstrated the potential of such models in Information Extraction tasks, where they can outperform larger models such as Llama 2, offering a favorable trade-off between inference speed and prediction quality. Second, Flan-T5 is a multilingual model, which makes it suitable for tasks that require understanding Italian. We tested two Flan-T5 sizes, flan-xxl (11B) and flan-xl (3B), using the OIE4PA dataset for both, with a split of 80% and 20% for training and test, respectively. For efficiency and hardware reasons, we fine-tuned the models with QLoRA<ref type="foot" target="#foot_0">1</ref> and 4-bit quantization, which allows faster training and saves GPU memory. All experiments were conducted on a machine with an Nvidia RTX A6000 GPU with 48 GB of VRAM. We trained both models for one epoch, adopting the following QLoRA settings and hyperparameters: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lora</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Evaluation Metrics</head><p>To evaluate the extracted triplets, we considered as true positive (TP) a non-empty triplet that matches the corresponding triplet in the ground truth (i.e., the triples belonging to the 𝒜ℒ set), as true negative (TN) a triple returned as an empty string by the model and labeled as invalid in the ground truth, as false positive (FP) a triplet that was labeled as invalid but retrieved by the model, and as false negative (FN) the case in which the model returned an empty string rather than a valid triplet.</p><p>In doing so, we can evaluate performance in terms of the classical confusion matrix metrics, i.e., accuracy (a), precision (p), recall (r), and F1 score (F1), whose formulae are:</p><formula xml:id="formula_0">a = \frac{TP + TN}{TP + TN + FP + FN}, \qquad p = \frac{TP}{TP + FP}, \qquad r = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot p \cdot r}{p + r}</formula></div>
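<div xmlns="http://www.tei-c.org/ns/1.0"><p>For concreteness, the four metrics can be computed from the confusion counts as in the sketch below. The function name, the zero-denominator convention (returning 0.0), and the counts in the usage example are assumptions made for illustration; the counts are not taken from our experiments.</p>
<p>
```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Returning 0.0 when a denominator is zero is a convention chosen
    for this sketch, not something prescribed by the definitions.
    """
    total = tp + tn + fp + fn
    a = (tp + tn) / total if total else 0.0
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"a": a, "p": p, "r": r, "f1": f1}


# Hypothetical counts, invented for illustration.
metrics = confusion_metrics(tp=8, tn=1, fp=2, fn=0)
print({name: round(value, 3) for name, value in metrics.items()})
```
</p></div>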
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Instructed model training.</figDesc><graphic coords="3,89.29,84.20,416.63,115.19" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>reports the comparison of three different Italian BERT models on the triplet classification task. In detail, the selected models are LegalBERT-ITA<ref type="foot" target="#foot_1">2</ref>, BertBase-ITA<ref type="foot" target="#foot_2">3</ref>, and BureauBERTo<ref type="foot" target="#foot_3">4</ref>. The best model turns out to be BureauBERTo, probably because it is the only model pre-trained on Public Administration corpora.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1</head><label>1</label><figDesc>BERT triplet classification results in terms of accuracy (𝑎), precision (𝑝), recall (𝑟), and F1 score (𝐹1).</figDesc><table><row><cell>Model</cell><cell>𝑎</cell><cell>𝑝</cell><cell>𝑟</cell><cell>𝐹1</cell></row><row><cell>LegalBERT-ITA</cell><cell>0.935</cell><cell>0.953</cell><cell>0.897</cell><cell>0.919</cell></row><row><cell>BertBase-ITA</cell><cell>0.927</cell><cell>0.935</cell><cell>0.894</cell><cell>0.911</cell></row><row><cell>BureauBERTo</cell><cell>0.945</cell><cell>0.963</cell><cell>0.901</cell><cell>0.932</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table <ref type="table" target="#tab_3">2</ref> outlines the results of the two fine-tuned Flan-T5 models on extracting triplets from procurement texts. Both model sizes show strong results on all metrics; in particular, recall is very high (0.97 and 0.99), demonstrating that the models are effective in finding a large number of true positives (i.e., valid triplets). Notably, all values are higher for the model with the larger number of parameters. Therefore, these promising results support the thesis of leveraging Instruction Tuning to build strong Open Information Extraction models for Italian public administrations.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>FLAN-OpenIE results on the OIE4PA dataset in terms of accuracy (𝑎), precision (𝑝), recall (𝑟), and F1 score (𝐹1).</figDesc><table><row><cell>Model</cell><cell>𝑎</cell><cell>𝑝</cell><cell>𝑟</cell><cell>𝐹1</cell></row><row><cell>T5-xl</cell><cell>0.78</cell><cell>0.74</cell><cell>0.97</cell><cell>0.84</cell></row><row><cell>T5-xxl</cell><cell>0.82</cell><cell>0.78</cell><cell>0.99</cell><cell>0.87</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>Considering the significant gap between the information extraction resources available for English and those available for resource-constrained languages such as Italian, we explored in this paper an Instruction Tuning approach to perform Open Information Extraction on Italian public tenders. An LLM is instructed with a two-stage strategy, in which a language-model-based classifier is first trained on an Italian PA dataset to obtain a set of valid triplets, which are then used to instruct the LLM. The promising experimental results support the assumptions made in the paper and encourage future work on new datasets and models capable of understanding and structuring technical texts in Italian in the form of semantic triplets. To this end, we plan to create new datasets in the future to develop a new set of foundational models for information extraction in Italian, with a particular focus on PAs and other administrative entities.</p></div>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/artidoro/qlora</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/dlicari/Italian-Legal-BERT</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://huggingface.co/dbmdz/bert-base-italian-uncased</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://huggingface.co/colinglab/BureauBERTo</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially carried out thanks to the Ministerial Decree no. 351 of 9th April 2022, based on the NRRP - funded by the European Union - NextGenerationEU - Mission 4 "Education and Research", Component 1 "Enhancement of the offer of educational services: from nurseries to universities" - Investment 4.1, which provided financial support for Leonardo Piano's doctoral pathway. Also, Alessia Pisu acknowledges MUR and EU-FSE for financial support of the PON Research and Innovation 2014-2020 (D.M. 1061/2021). Furthermore, we acknowledge financial support under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.5 - Call for tender No. 3277 published on December 30, 2021 by the Italian Ministry of University and Research (MUR), funded by the European Union - NextGenerationEU. Project Code ECS0000038 - Project Title eINS Ecosystem of Innovation for Next Generation Sardinia - CUP F53C22000430001 - Grant Assignment Decree No. 1056 adopted on June 23, 2022 by the Italian Ministry of University and Research (MUR).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Searching news articles using an event knowledge graph leveraged by wikidata</title>
		<author>
			<persName><forename type="first">C</forename><surname>Rudnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ehrhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Ferret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Teyssou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tannier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Companion of The 2019 World Wide Web Conference, WWW 2019</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Amer-Yahia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mahdian</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Goel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Houben</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Lerman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Mcauley</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Baeza-Yates</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Zia</surname></persName>
		</editor>
		<meeting><address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019">May 13-17, 2019</date>
			<biblScope unit="page" from="1232" to="1239" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Knowlife: A knowledge graph for health and life sciences</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ernst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICDE.2014.6816754</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 30th International Conference on Data Engineering</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1254" to="1257" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Empowering digital transformation in tourism through intelligent methods for representation and exploitation of cultural heritage knowledge</title>
		<author>
			<persName><forename type="first">S</forename><surname>Carta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fenu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Giuliani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Manca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Marras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Piano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Podda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pompianu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">G</forename><surname>Tiddia</surname></persName>
		</author>
		<ptr target="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85177612618&amp;partnerID=40&amp;md5=7e8334f126d9385a733fbfb0d1674f19" />
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3536</biblScope>
			<biblScope unit="page" from="83" to="91" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Open information extraction from the web</title>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Broadhead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI&apos;07</title>
				<meeting>the 20th International Joint Conference on Artifical Intelligence, IJCAI&apos;07<address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Morgan Kaufmann Publishers Inc</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="2670" to="2676" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A survey on open information extraction</title>
		<author>
			<persName><forename type="first">C</forename><surname>Niklaus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cetto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Freitas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Handschuh</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/C18-1326" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th International Conference on Computational Linguistics</title>
				<editor>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Bender</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Isabelle</surname></persName>
		</editor>
		<meeting>the 27th International Conference on Computational Linguistics<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="3866" to="3878" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">ClausIE: clause-based open information extraction</title>
		<author>
			<persName><forename type="first">L</forename><surname>Del Corro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gemulla</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd international conference on World Wide Web</title>
				<meeting>the 22nd international conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="355" to="366" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Identifying relations for open information extraction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Fader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on Empirical Methods in Natural Language Processing</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">TextRunner: Open information extraction on the web</title>
		<author>
			<persName><forename type="first">A</forename><surname>Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Broadhead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:1455080" />
	</analytic>
	<monogr>
		<title level="m">North American Chapter of the Association for Computational Linguistics</title>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Sailgenie: Sailing expertise to knowledge graph through open information extraction</title>
		<author>
			<persName><forename type="first">S</forename><surname>Carta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fariello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Giuliani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Piano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Podda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">G</forename><surname>Tiddia</surname></persName>
		</author>
		<idno type="DOI">10.1016/J.PROCS.2023.10.213</idno>
		<ptr target="https://doi.org/10.1016/j.procs.2023.10.213" />
	</analytic>
	<monogr>
		<title level="m">Knowledge-Based and Intelligent Information &amp; Engineering Systems: Proceedings of the 27th International Conference KES-2023</title>
		<title level="s">Procedia Computer Science</title>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">A</forename><surname>Tsihrintzis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Toro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Ríos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Howlett</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Jain</surname></persName>
		</editor>
		<meeting><address><addrLine>Athens, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>Elsevier</publisher>
			<date type="published" when="2023-09-08">6-8 September 2023</date>
			<biblScope unit="volume">225</biblScope>
			<biblScope unit="page" from="2224" to="2233" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Supervised open information extraction</title>
		<author>
			<persName><forename type="first">G</forename><surname>Stanovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Dagan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">North American Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Span model for open information extraction on accurate corpus</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhao</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:208138002" />
	</analytic>
	<monogr>
		<title level="m">AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Kolluru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Rathore</surname></persName>
		</author>
		<author>
			<persName><surname>Mausam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakrabarti</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.08178</idno>
		<ptr target="https://api.semanticscholar.org/CorpusID:218674382" />
		<title level="m">IMoJIE: Iterative memory-based joint open information extraction</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Open information extraction for Italian sentences</title>
		<author>
			<persName><forename type="first">E</forename><surname>Damiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Minutolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Esposito</surname></persName>
		</author>
		<idno type="DOI">10.1109/WAINA.2018.00165</idno>
	</analytic>
	<monogr>
		<title level="m">2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA)</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="668" to="673" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Lexicon-grammar based open information extraction from natural language sentences in Italian</title>
		<author>
			<persName><forename type="first">R</forename><surname>Guarasci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Damiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Minutolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Esposito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Pietro</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2019.112954</idno>
		<ptr target="https://doi.org/10.1016/j.eswa.2019.112954" />
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">143</biblScope>
			<biblScope unit="page">112954</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">OIE4PA: open information extraction for the public administration</title>
		<author>
			<persName><forename type="first">L</forename><surname>Siciliani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ghizzota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent Information Systems</title>
		<imprint>
			<biblScope unit="page" from="1" to="22" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Siciliani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cassotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Gemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moro</surname></persName>
		</author>
		<title level="m">Extracting relations from Italian Wikipedia using self-training</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law</title>
		<author>
			<persName><forename type="first">D</forename><surname>Licari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Comandè</surname></persName>
		</author>
		<idno type="ISSN">1613-0073</idno>
	</analytic>
	<monogr>
		<title level="m">Companion Proceedings of the 23rd International Conference on Knowledge Engineering and Knowledge Management</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">D</forename><surname>Symeonidou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Yu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Ceolin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Poveda-Villalón</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Audrito</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Di Caro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Grasso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Nai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Sulis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Ekaputra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Kutz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Troquard</surname></persName>
		</editor>
		<meeting><address><addrLine>Bozen-Bolzano, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3256</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">BureauBERTo: adapting UmBERTo to the Italian bureaucratic language</title>
		<author>
			<persName><forename type="first">S</forename><surname>Auriemma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Madeddu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Miliani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bondielli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Passaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:262088765" />
		<imprint>
			<date type="published" when="2023">2023</date>
			<pubPlace>Ital-IA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:52967399" />
	</analytic>
	<monogr>
		<title level="m">North American Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Longpre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zoph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fedus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dehghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brahma</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2210.11416</idno>
		<title level="m">Scaling instruction-finetuned language models</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Revisiting relation extraction in the era of large language models</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wadhwa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Amir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Wallace</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:258564662" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the conference. Association for Computational Linguistics. Meeting 2023</title>
				<meeting>the conference. Association for Computational Linguistics. Meeting 2023</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="15566" to="15589" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
