<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Leveraging Large Language Models for Fact Verification in Italian</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Antonio</forename><surname>Scaiella</surname></persName>
							<email>scaiella@revealsrl.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Enterprise Engineering</orgName>
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stefano</forename><surname>Costanzo</surname></persName>
							<email>stefano.costanzo@students.uniroma2.eu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Enterprise Engineering</orgName>
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Elisa</forename><surname>Passone</surname></persName>
							<email>passone@ing.uniroma2.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Enterprise Engineering</orgName>
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Danilo</forename><surname>Croce</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Enterprise Engineering</orgName>
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giorgio</forename><surname>Gambosi</surname></persName>
							<email>giorgio.gambosi@uniroma2.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Enterprise Engineering</orgName>
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Leveraging Large Language Models for Fact Verification in Italian</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CD1EE257F552159C85A15C4548137EA2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Automatic Fact Checking</term>
					<term>Fact Checking in Italian</term>
					<term>Resource in Italian</term>
					<term>Large Language Model for Fact Verification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In recent years, Automatic Fact Checking has become a crucial tool for combating fake news by leveraging AI to verify the accuracy of information. Despite significant advancements, most datasets and models are predominantly available in English, posing challenges for other languages. This paper presents an Italian resource based on the dataset made available in the FEVER evaluation campaign, created to train and evaluate fact-checking models in Italian. The dataset comprises approximately 240k examples, with over 2k test examples manually validated. Additionally, we fine-tuned a state-of-the-art LLM, namely LLaMA3, on both the original English and translated Italian datasets, demonstrating that fine-tuning significantly improves model performance. Our results suggest that the fine-tuned models achieve comparable accuracy in both languages, highlighting the value of the proposed resource.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, Automatic Fact Checking (AFC) has assumed a significant role as an instrument to identify fake news. AFC is a process that verifies the truthfulness and accuracy of information, claims, and data contained in a text or speech. The focus is on debunking disinformation and misinformation, intercepting errors, and verifying sources and facts.</p><p>Automated fact-checking uses AI tools to identify, verify, and respond to misleading claims, using techniques based on natural language processing, machine learning, knowledge representation, and databases to automatically predict the truthfulness of claims <ref type="bibr" target="#b0">[1]</ref>. This is a complex process that involves searching, interpreting, and assessing information. As discussed in <ref type="bibr" target="#b0">[1]</ref>, an NLP framework for automated fact-checking consists of three stages: claim detection to identify claims that require verification; evidence retrieval to find sources supporting or refuting the claim; and claim verification to assess the truthfulness of the claim based on the retrieved evidence.</p><p>Automating the fact-checking process was first discussed in the context of computational journalism in works such as <ref type="bibr" target="#b1">[2]</ref>, and has since received significant attention in the computational linguistics and, more broadly, artificial intelligence communities, as surveyed in <ref type="bibr" target="#b0">[1]</ref> and more recently in <ref type="bibr" target="#b2">[3]</ref> and <ref type="bibr" target="#b3">[4]</ref>. 
In particular, <ref type="bibr" target="#b0">[1]</ref> presents a survey of the topic, describing the early developments previously covered in <ref type="bibr" target="#b4">[5]</ref>, an exhaustive overview of the subject.</p><p>As with most machine learning paradigms <ref type="bibr" target="#b0">[1]</ref>, state-of-the-art methods require datasets and benchmarks.</p><p>One of the most impactful campaigns for collecting a large-scale benchmark is FEVER (Fact Extraction and VERification) <ref type="bibr" target="#b5">[6]</ref>. In this context, fact-checking involves verifying whether a claim is supported by one or more pieces of evidence. FEVER is a publicly available dataset designed for claim verification against textual sources. It comprises about 180K claims generated by altering sentences extracted from Wikipedia. The claims are classified into three categories: Supported (a piece of evidence exists that supports the claim), Refutes (a piece of evidence exists that contradicts the claim), or NotEnoughInfo (there is insufficient evidence to verify the claim). The challenge, therefore, is to retrieve the relevant evidence and verify the accuracy of the claims, categorizing them with the correct label.</p><p>Many works following FEVER have recently focused on building datasets for the task of Fact Verification, achieving very good results <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref>. However, all of these datasets are designed for the English language. 
Although multilingual models exist (e.g., <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>), fine-tuning a model for a specific task and use case in one language can lead to a significant decline in quality when the model is applied to another language. Few studies have worked on training models for languages other than English. An example is the work presented in <ref type="bibr" target="#b14">[15]</ref>, which focuses on developing automated claim detection for Dutch-language fact-checkers.</p><p>In this work, we propose FEVER-IT, a dataset in which FEVER has been translated into Italian to train fact-checking models for the Italian language. Inspired by SQUAD-IT <ref type="bibr" target="#b15">[16]</ref> and MSCOCO-IT <ref type="bibr" target="#b16">[17]</ref>, we worked to obtain quality data. Although the training set may be affected by translation errors, the test set is not, as it is composed of manually validated data. Furthermore, while the original FEVER dataset contained evidence only for Supports and Refutes, in this work we have also added and translated examples for the NotEnoughInfo category using the heuristics proposed in <ref type="bibr" target="#b17">[18]</ref>. This work extends the experience described in <ref type="bibr" target="#b18">[19]</ref>, where translations were produced with the Google API, by using publicly available models <ref type="bibr" target="#b19">[20]</ref> and adding data for the NotEnoughInfo category.</p><p>The contribution of this work is twofold. Firstly, we release FEVER-IT, a corpus with 228K claims, each associated with at least one (possibly useful) piece of evidence, including a test set of 2,000 manually validated claims. In addition, we fine-tuned and validated a state-of-the-art model, LLaMA3 <ref type="bibr" target="#b13">[14]</ref>, on both the original English dataset and the Italian dataset. 
While this provides a high-performance model ready for the task in both languages, the primary goal is to assess whether the quality of the Italian data is comparable to that of the English data. By training the model separately on each dataset, we can evaluate its stability: if the model performs similarly on the manually validated Italian test set and the English test set, we can conclude that the quality of the Italian data is on par with the English data.</p><p>Additionally, we want to assess whether using an Italian training dataset, despite the noise from automatic translation, is truly beneficial. LLMs like LLaMA3 can already perform tasks in other languages through zero-shot or few-shot learning, without requiring fine-tuning on a specific dataset, especially if that dataset is noisy. Therefore, we compare the performance on the test set of a LLaMA3 model that has not been fine-tuned on the noisy Italian data against one that has, to determine whether fine-tuning actually improves results or whether the model performs on par or better without it.</p><p>The experimental results show that the model without fine-tuning achieves an average accuracy of only about 45%. Fine-tuning on the English dataset yields about 90% mean accuracy, while fine-tuning on the Italian dataset yields accuracy very close to that of the English fine-tuned model and far higher than the accuracy obtained without fine-tuning<ref type="foot" target="#foot_0">1</ref>.</p><p>The remainder of the paper is organized as follows: Section 2 discusses related work, Section 3 presents FEVER-IT, Section 4 details the experimental measures, and Section 5 provides the conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>One of the pioneering works in automated fact-checking was conducted by <ref type="bibr" target="#b20">[21]</ref>, which proposed creating publicly available datasets and developing automated systems using natural language processing technologies. Recent challenges such as CheckThat! at CLEF <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref> and FEVER <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref> from 2018 have advanced fact-checking tasks by leveraging advanced approaches and integrating Large Language Models (LLMs) like BERT and GPT. These models represent the current state of the art in many Natural Language Processing tasks, including fact-checking. Notable examples of such technology include FacTeR-Check <ref type="bibr" target="#b21">[22]</ref>, a multilingual architecture for semi-automated fact-checking and hoax propagation analysis using the XLM-RoBERTa Transformer <ref type="bibr" target="#b12">[13]</ref>, and FACT-GPT <ref type="bibr" target="#b22">[23]</ref>, a framework that automates the claim-matching phase of fact-checking using LLMs to identify social media content that supports or contradicts claims previously debunked by fact-checkers.</p><p>The success of these systems is largely due to the capabilities of LLMs, as summarized in <ref type="bibr" target="#b2">[3]</ref>, which are neural models based on the Transformer architecture. Specifically, decoder-based architectures, such as GPT <ref type="bibr" target="#b23">[24]</ref>, GPT-3 <ref type="bibr" target="#b24">[25]</ref>, and LLaMA <ref type="bibr" target="#b13">[14]</ref>, generate output sequences in an auto-regressive manner. These models have demonstrated impressive capabilities following pre-training on large collections of documents. 
One notable outcome is few-shot learning, where models can adapt to new tasks with only a few examples <ref type="bibr" target="#b24">[25]</ref>, greatly enhancing their flexibility and applicability.</p><p>When new annotated data is available, fine-tuning further enhances a model's capabilities. This process involves taking the pre-trained base model and training it on a smaller, specialized dataset relevant to the desired task. Parameter Efficient Fine-Tuning (PEFT) is an optimized technique that involves training only a small portion of the weights, typically by adding a new layer to the model. One widely used technique is LoRA <ref type="bibr" target="#b25">[26]</ref>, which adds an adapter consisting of two weight matrices that are relatively small compared to the original model. ExtremITA <ref type="bibr" target="#b26">[27]</ref> is an example of a decoder-based model fine-tuned with LoRA in Italian for multi-task execution.</p><p>Several benchmark datasets have been developed to fine-tune and evaluate fact-checking systems, typically collected by organizations like Snopes, FullFact, and PolitiFact. The FEVER challenge has produced four major datasets: FEVER (2018) <ref type="bibr" target="#b5">[6]</ref>, FEVER 2.0 (2019) <ref type="bibr" target="#b7">[8]</ref>, FEVEROUS (2021) <ref type="bibr" target="#b8">[9]</ref>, and AVeriTeC (2024) <ref type="bibr" target="#b27">[28]</ref>. These datasets range from labeled claim-evidence associations to verified claims with structured and unstructured evidence. Despite the wealth of resources available, there is a lack of large benchmark datasets in Italian. This work addresses this gap by providing a large-scale Italian resource.</p></div>
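To make the parameter savings of LoRA concrete: for a single d × k weight matrix, full fine-tuning updates all d·k weights, whereas a LoRA adapter freezes them and trains only the two factors B (d × r) and A (r × k), whose product is the learned update. A minimal arithmetic sketch in Python; the 4096-dimension and rank-16 figures are illustrative assumptions, not the configuration used in this work:

```python
def lora_trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """Compare trainable parameters for one d x k weight matrix.

    Full fine-tuning updates all d * k weights; a LoRA adapter trains
    only B (d x r) and A (r x k), so r * (d + k) weights in total.
    """
    full = d * k
    lora = r * (d + k)
    return full, lora

# Illustrative numbers: a 4096 x 4096 projection with rank r = 16
# (assumed values, not the paper's actual hyper-parameters).
full, lora = lora_trainable_params(4096, 4096, 16)
print(full, lora, f"{lora / full:.2%}")  # the adapter is under 1% of the layer
```

With small ranks the adapter stays a tiny fraction of the frozen layer, which is why a single A100 suffices for fine-tuning an 8B-parameter model, as done in Section 4.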
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Fact Verification in Italian</head><p>As in <ref type="bibr" target="#b5">[6]</ref>, the original FEVER dataset is composed of claims that can potentially be verified against an encyclopedic resource, in this case, Wikipedia. The claims are classified into three categories: Supported, Refutes and NotEnoughInfo. For the first two categories, each claim is associated with one or more passages from Wikipedia, each specifying the page from which it was extracted. For the NotEnoughInfo category, no passages are provided because no information was found on Wikipedia to support or refute the claim. For instance, the sentence "Dan Brown is illiterate." is a claim associated with pieces of evidence such as: "Angels and Demons is a 2000 bestselling mystery-thriller novel written by American author Dan Brown and published by Pocket Books and then by Corgi Books.". These pieces of evidence prove that the claim is incorrect, so it can be classified with the label Refutes. In FEVER, a claim is thus a sentence that expresses information (true or mutated) about a target entity.</p><p>To generate the Italian dataset, we started from the dataset version<ref type="foot" target="#foot_1">2</ref> proposed in <ref type="bibr" target="#b28">[29]</ref>, which consists of 260k claims. This version extends the original FEVER by adding evidence associated with claims justified as NotEnoughInfo in FEVER, using the heuristics in <ref type="bibr" target="#b17">[18]</ref>. The approach involved using a search engine to retrieve potential evidence and a textual entailment system based on GPT <ref type="bibr" target="#b23">[24]</ref>. Claims not judged as Supports or Refutes were classified as NotEnoughInfo.</p><p>This gives us examples of sentences that are closely related to the claim (according to the search engine) but neither support nor refute it. 
This makes it more straightforward and efficient to train and/or evaluate a classifier, even though some of the derived examples might be somewhat noisy, as they were generated through heuristics.</p><p>For the automatic translation process, we utilized MADLAD400 <ref type="bibr" target="#b19">[20]</ref>, a machine translation system based on the Transformer architecture<ref type="foot" target="#foot_2">3</ref>, trained on MADLAD, a manually audited, general-domain 3T-token multilingual dataset based on CommonCrawl, spanning 419 languages. Since the Italian data are obtained through machine translation, and thus potentially incorrect as suggested in <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref>, we needed validated test data to obtain a realistic benchmark. Our hypothesis is that an LLM is robust enough to generalize from the 228k examples and recognize the relationships involved in FEVER without inheriting translation errors. However, to prevent these errors from being inherited by the model, we manually corrected the translations of the test set.</p><p>Out of the approximately 16k available test examples, three annotators verified and corrected 2,063 translations from the test set. The annotators focused on correcting mistakes related to proper sentence structure in Italian, the accurate meaning of specific English words that MADLAD had translated literally, any misunderstandings of the intended meaning in Italian, and a few grammatical errors.</p><p>In some cases, translation errors do not completely undermine the examples with respect to the task's purpose. For instance, the English sentence from an evidence passage, "he was booked to win a third world championship at a WWE event on the night of his death" was translated into Italian as "era stato prenotato per vincere un terzo titolo mondiale in un evento della WWE la notte della sua morte". 
A more accurate translation would be "si pensava avrebbe vinto un terzo titolo mondiale in un evento della WWE la notte della sua morte", better capturing the verb's meaning. In other, more problematic cases, translation errors, loss of information, or introduction of hallucinations could even change the classification in the fact verification task. For example, in the claim "The Thin Red Line (1998 film) has an all-British cast.", the automatic translation was "La sottile linea rossa (The Thin Red Line) è un film del 1998.", which is incorrect because it omits the information about the cast. This detail is crucial, as its absence could lead to incorrect labeling.  A quantitative analysis of the translation quality suggests that MADLAD performs well in translating simple assertive sentences such as claims. In fact, 91% of the claims were not altered by the validators, who considered them completely correct. This percentage is lower for the Wikipedia passages, dropping to 76%. This discrepancy may be due to the greater complexity of the evidence compared to the simpler sentence structures in the claims. Additionally, we reported the results in terms of BLEU score <ref type="bibr" target="#b29">[30]</ref> for the corrected translations compared to the originals, as shown in Table <ref type="table" target="#tab_0">1</ref>. It should be noted that measuring the translation quality after correcting the sentences introduces a strong bias in the measurements; however, it provides a more specific idea of the translation quality, especially in understanding the potential noisiness of the training and development sentences. In this case, results of over 95% for BLEU-1 and over 92% for BLEU-4 suggest that very few terms were altered during validation, and even the grammatical patterns remained largely unchanged. 
At most, a few mistranslated terms needed updating, as indicated by the qualitative analysis.</p><p>Table <ref type="table" target="#tab_1">2</ref> summarizes the number of examples created for the Italian dataset. In line with the original English material, the dataset is divided into training, development, and test sets, with claims categorized into Supports, Refutes, and NotEnoughInfo (NEI). The table also distinguishes between silver data (automatically translated) and gold data (manually validated). The training set consists of 228,277 claims, the development set contains 15,935 claims, and the test set has 2,063 claims. Each Italian claim or piece of evidence is aligned with its English counterpart, facilitating future research in cross-lingual fact verification. Language Models for Fact Verification. Large Language Models can be applied to Fact Verification through In-Context Learning techniques <ref type="bibr" target="#b30">[31]</ref> or by directly fine-tuning the model for specific downstream tasks. In-context learning relies on the model's pre-existing knowledge acquired during pretraining and on instructions provided in natural language at inference time. This method does not involve additional training and can be categorized based on the number of examples provided: i) 0-shot Learning, where no examples are given, and the model generates responses based solely on its pre-existing knowledge and the provided instructions; ii) 1-shot Learning, where one example per class is added to provide a more precise context, helping the model better understand the task by offering a concrete reference point; iii) Few-shot Learning, where more than one example per class is provided to give the model additional contextual information during decision-making. When the model's pre-existing knowledge is insufficient, we can fine-tune it on the downstream task. 
Fine-tuning involves training the model in a traditional manner using input-output pairs (training data) to adjust its parameters. This process improves the model's performance on specific tasks, allowing it to learn from a more extensive set of examples. As a result, the model becomes more adept at handling similar queries in the future, with a focus on the specific task at hand. We thus evaluated the application of a state-of-the-art LLM, namely LLaMA3 <ref type="bibr" target="#b31">[32]</ref>, by providing just the definition of the task (zero-shot), adding an example per class (one-shot), or performing fine-tuning, to demonstrate the necessity of a training dataset like the one constructed in this work, as discussed in the following section.</p></div>
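The BLEU-based analysis of translation quality reported above (Table 1) can be illustrated with the clipped unigram-precision component of BLEU-1. This is a deliberately simplified sketch (single reference, whitespace tokenisation, no brevity penalty), not the full BLEU metric of [30]; the example sentences are taken from the qualitative discussion earlier in this section:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the core of BLEU-1 (simplified:
    one reference, whitespace tokens, no brevity penalty)."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    # Each candidate token is matched at most as often as it occurs
    # in the reference ("clipping").
    matched = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return matched / len(cand) if cand else 0.0

# The automatic translation scored against its manual correction,
# mirroring the validation step described above.
auto = "era stato prenotato per vincere un terzo titolo mondiale"
fixed = "si pensava avrebbe vinto un terzo titolo mondiale"
print(f"{unigram_precision(auto, fixed):.2f}")  # -> 0.44
```

The paper's reported scores (over 95% BLEU-1 on the corrected test set) would correspond to near-1.0 values of this quantity, indicating that validators changed very few tokens.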
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Evaluation</head><p>The goal of our experimentation is to assess the performance of a state-of-the-art LLM applied to Fact Verification. Specifically, we aim to determine whether a multilingual model maintains consistent quality when applied to both the English FEVER dataset and our Italian dataset. We utilize LLaMA3-Instruct<ref type="foot" target="#foot_3">4</ref>, an instruction-tuned generative text model from Meta with 8 billion parameters, released in April 2024. This model is trained to execute specific instructions or prompts across various tasks. To ensure alignment, we evaluate the systems on the manually validated Italian test set and the same subset of 2,063 claims in the English counterpart. The model is evaluated in 0-shot and 1-shot settings to assess its capability without fine-tuning. The prompts used in English and Italian are provided in Appendix A. Additionally, we fine-tuned LLaMA3 on the English datasets from <ref type="bibr" target="#b28">[29]</ref> and separately on the Italian datasets obtained via machine translation. Fine-tuning was conducted on an NVIDIA A100 using the LoRA technique<ref type="foot" target="#foot_4">5</ref>.</p><p>In FEVER, the title of the document associated with each claim often provides crucial context. For example, the claim "The University of Leicester discovered and identified the remains of a king." relies on the document titled "University of Leicester" to correctly classify the claim as Supports. To assess the model's generalization, we evaluate the impact of including document titles in prompts. The metrics used to analyze the results are recall, precision, accuracy, and F1 score, calculated globally and for each label (Supports, Refutes, NotEnoughInfo).</p><p>The results are reported in Tables <ref type="table" target="#tab_3">3 and 4</ref> for the English and Italian datasets, respectively. 
Each table shows whether the model underwent fine-tuning (column FT), whether a prompt without examples (0-shot) or with one example per class (1-shot) was used (column Prompt), and whether the document title was included (column Doc). Notably, if no fine-tuning was performed, the original LLaMA3-Instruct model was used. Given that the system's response can consist of multiple words, we search the output for the mention of one of the classes and associate the example with that class. If no class is identified, the result is classified as NotEnoughInfo. In general, the fine-tuned model is extremely stable, consistently outputting one of the three categories for every request. The non-fine-tuned model, on rare occasions (a few dozen times out of 2,000), produces responses that do not correspond to any of the required classes. This highlights the inherent stability of LLaMA3 while also supporting the soundness of the results achieved.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>Performance in terms of Accuracy, Precision, Recall and F1-measure of our systems on the FEVER-IT dataset.</p><p>A key finding is that the multilingual model generally achieves similar, though modest, results on English and Italian datasets without fine-tuning, with accuracy values around 0.40-0.50 and average F1 scores in the range of 0.35-0.55. This performance is relatively unstable, and the addition of an example in the prompt does not lead to significant improvements. In English, there are some improvements, but in Italian, there are fewer. We believe this is because, although LLaMA is multilingual, the percentage of Italian examples observed during training is less than 1%, making it less performant and less stable in this language.</p><p>However, when fine-tuning is applied, the results improve dramatically, with accuracy exceeding 90% in both languages. This demonstrates the utility of the translated dataset, even if it contains some noise. In this scenario, adding an example in the prompt leads to negligible but consistent improvements. Additionally, the inclusion of the document title, while sometimes causing inconsistencies in zero-shot learning, is better utilized by the fine-tuned model, leading to slight but not significant improvements. This is interesting because it suggests that a model that does not rely on document titles is more broadly applicable. Overall, the fine-tuned models perform significantly better, highlighting the importance of the translated dataset for achieving high accuracy in fact verification tasks in both English and Italian.</p><p>The error analysis suggests that the model sometimes inherits the mathematical reasoning limitations of the LLM. For example, the claim "Il Castello di Praga attira oltre 18 milioni di visitatori ogni anno. 
<ref type="foot" target="#foot_5">6</ref> " was given the evidence "Il castello è tra le attrazioni turistiche più visitate di Praga che attira oltre 1,8 milioni di visitatori all'anno. <ref type="foot" target="#foot_6">7</ref> " The model's predicted label was Refutes, while the true label was Supports. Here, the true label should be Supports since 18 million is indeed greater than 1.8 million, but the model found the numbers inconsistent. In another case, the claim "Ned Stark è stato introdotto nel 1996 in Tempesta di spade. <ref type="foot" target="#foot_7">8</ref> " was paired with the evidence "Introdotto nel 1996 in Il Trono di Spade, Ned è l'onorevole signore di Winterfell, un'antica fortezza nel nord del continente immaginario di Westeros. <ref type="foot" target="#foot_8">9</ref> " The model predicted Refutes, although the true label was Supports. The confusion here is due to the difference in the book titles, which are from the same series but are distinct works. The error analysis revealed that the model occasionally struggled with mathematical reasoning and contextual understanding, highlighting areas for future enhancement. Larger models and further fine-tuning could potentially address these issues, which remain open questions for future research.</p></div>
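The label-extraction step described in this section (scan the model's free-form response for a class mention, fall back to NotEnoughInfo when none is found) can be sketched as follows. This is an illustrative re-implementation: the paper does not specify its exact matching rules beyond this description, so the ordering of the checks is an assumption.

```python
def extract_label(model_output: str) -> str:
    """Map a free-form model response to one of the three FEVER classes.

    Mirrors the heuristic of Section 4: look for an explicit class
    mention; any response naming no class defaults to NOT ENOUGH INFO.
    The SUPPORTS-before-REFUTES ordering is an assumption of this sketch.
    """
    text = model_output.upper()
    if "SUPPORTS" in text:
        return "SUPPORTS"
    if "REFUTES" in text:
        return "REFUTES"
    return "NOT ENOUGH INFO"

print(extract_label("Answer: REFUTES, the evidence contradicts the claim."))
# prints "REFUTES"
```

The fallback matters mainly for the non-fine-tuned model, which occasionally produces answers naming no class; the fine-tuned model reliably emits one of the three labels.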
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this work, we have introduced FEVER-IT, an Italian version of the FEVER dataset, designed to improve the training and evaluation of models for fact verification in the Italian language. Using a machine translation system, we translated a large-scale dataset of 228,000 claim/evidence pairs and manually validated 2,000 test instances to ensure meaningful evaluations. This enabled us to fine-tune a state-of-the-art LLM, specifically LLaMA3, and assess its performance in both English and Italian.</p><p>Our experiments demonstrated that the multilingual model, without fine-tuning, performed similarly on both English and Italian datasets, though the accuracy and stability were limited. Fine-tuning significantly improved the model's performance, achieving over 90% accuracy in both languages. This underscores the importance and effectiveness of the translated dataset, even if it contains some noise.</p><p>Future work will explore the performance of larger models and further refinement of the dataset to enhance accuracy and generalization capabilities, or explore more complex settings such as those described in <ref type="bibr" target="#b8">[9]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Prompt Engineering</head><p>This appendix contains the prompts used in the experiments. The prompts are provided in both Italian and English, reflecting the task-specific nature of the experiments. Each prompt begins with an explanation of the task and the meaning of the classes. In the different variants, the 0-shot setting does not include any examples, unlike the 1-shot setting. Where necessary, the name of the document from which the evidence is taken is also specified.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1. Prompts in English</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1.1. 0-shot Setting</head><p>The following prompt is used for 0-shot learning, where the task and classes are presented without additional information. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1.2. 1-shot Setting</head><p>The following prompt is used for 1-shot learning, where the task and classes are explained, and one example per class is provided. Notice that only the evidence is reported without the title of the original document. The following prompt is used for 0-shot learning, where the task and classes are explained without additional information. Each input evidence is provided with the title of its original document. The following prompt is used for 1-shot learning, where the task and classes are explained, and one example per class is provided. Each input evidence is provided with the title of its original document. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>#</head><label></label><figDesc># # I n s t r u c t i o n E v a l u a t e i f t h e c l a i m i s s u p p o r t e d by t h e e v i d e n c e p r o v i d e d . D e f i n i t i o n s f o r key t e r m s u s e d i n t h i s t a s k a r e : − Claim : A s t a t e m e n t o r a s s e r t i o n un der e x a m i n a t i o n . − E v i d e n c e : I n f o r m a t i o n t h a t e i t h er s u p p o r t s o r o p p o s e s t h e c l a i m . Answer w i t h one o f t h e f o l l o w i n g j u d g m e n t s b a s e d on t h e e v i d e n c e p r o v i d e d : − SUPPORTS : i f t h e e v i d e n c e s u b s t a n t i a t e s t h e c l a i m . − REFUTES : i f t h e e v i d e n c e d i r e c t l y c o n t r a d i c t s t h e c l a i m . − NOT ENOUGH INFO : i f t h e r e i s i n s u f f i c i e n t e v i d e n c e t o d e t e r m i n e t h e c l a i m ' s v a l i d i t y # # # I n p u t − Claim : [ CLAIM HERE ] − E v i d e n c e : [ EVIDENCE HERE ] # # # Answer : [ANSWER HERE ]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>#</head><label></label><figDesc># # I n s t r u c t i o n E v a l u a t e i f t h e c l a i m i s s u p p o r t e d by t h e e v i d e n c e p r o v i d e d . D e f i n i t i o n s f o r key t e r m s u s e d i n t h i s t a s k a r e : − Claim : A s t a t e m e n t o r a s s e r t i o n un der e x a m i n a t i o n . − E v i d e n c e : I n f o r m a t i o n t h a t e i t h e r s u p p o r t s o r o p p o s e s t h e c l a i m . Answer w i t h one o f t h e f o l l o w i n g j u d g m e n t s b a s e d on t h e e v i d e n c e p r o v i d e d : − SUPPORTS : i f t h e e v i d e n c e s u b s t a n t i a t e s t h e c l a i m . − REFUTES : i f t h e e v i d e n c e d i r e c t l y c o n t r a d i c t s t h e c l a i m . − NOT ENOUGH INFO : i f t h e r e i s i n s u f f i c i e n t e v i d e n c e t o d e t e r m i n e t h e c l a i m ' s v a l i d i t y # # # Examples These e x a m p l e s d e m o n s t r a t e how t o a p p l y t h e e v a l u a t i o n c r i t e r i a : − Claim : The Germanic p e o p l e s a r e a l s o c a l l e d G o t h i c . − E v i d e n c e : The Germanic p e o p l e s ( a l s o r e f e r r e d t o a s T e u t o n i c , S u e b i a n , o r G o t h i c i n o l d e r l i t e r a t u r e ) a r e an Indo − European ethno − l i n g u i s t i c group o f N o r t h e r n European o r i g i n . − Answer : SUPPORTS − Claim : T e n n i s i s n o t a s p o r t . − E v i d e n c e : T e n n i s i s p l a y e d by m i l l i o n s o f r e c r e a t i o n a l p l a y e r s and i s a l s o a p o p u l a r w o r l d w i d e s p e c t a t o r s p o r t . − Answer : REFUTES − Claim : Kick − Ass i s a h o r r o r f i l m . − E v i d e n c e : Kick − Ass i s a 2 0 1 0 B r i t i s h − American f i l m b a s e d on t h e comic book o f t h e same name by Mark M i l l a r and John Romita , J r . 
− Answer : NOT ENOUGH INFO # # # I n p u t − Claim : [ CLAIM HERE ] − E v i d e n c e : [ EVIDENCE HERE ] # # # Answer : [ANSWER HERE ] A.1.3. 0-shot Setting with Document Title</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc># # # I n s t r u c t i o n E v a l u a t e i f t h e c l a i m i s s u p p o r t e d byt h e e v i d e n c e p r o v i d e d . D e f i n i t i o n s f o r key t e r m s u s e d i n t h i s t a s k a r e : − Claim : A s t a t e m e n t o r a s s e r t i o n u nd e r e x a m i n a t i o n . − E v i d e n c e : I n f o r m a t i o n t h a t e i t h e r s u p p o r t s o r o p p o s e s t h e c l a i m . − Document : d e n o t e s t h e s o u r c e document f o r t h e e v i d e n c e . Answer w i t h one o f t h e f o l l o w i n g j u d g m e n t s b a s e d on t h e e v i d e n c e p r o v i d e d : − SUPPORTS : i f t h e e v i d e n c e s u b s t a n t i a t e s t h e c l a i m . − REFUTES : i f t h e e v i d e n c e d i r e c t l y c o n t r a d i c t s t h e c l a i m . − NOT ENOUGH INFO : i f t h e r e i s i n s u f f i c i e n t e v i d e n c e t o d e t e r m i n e t h e c l a i m ' s v a l i d i t y # # # I n p u t − Claim : [ CLAIM HERE ] − E v i d e n c e : [ EVIDENCE HERE ] − Document : [DOCUMENT HERE ] # # # Answer : [ANSWER HERE ] A.1.4. 1-shot Setting with Document Title</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>#</head><label></label><figDesc># # I n s t r u c t i o n E v a l u a t e i f t h e c l a i m i s s u p p o r t e d byt h e e v i d e n c e p r o v i d e d . D e f i n i t i o n s f o r key t e r m s u s e d i n t h i s t a s k a r e : − Claim : A s t a t e m e n t o r a s s e r t i o n u nd e r e x a m i n a t i o n . − E v i d e n c e : I n f o r m a t i o n t h a t e i t h e r s u p p o r t s o r o p p o s e s t h e c l a i m . − Document : d e n o t e s t h e s o u r c e document f o r t h e e v i d e n c e .Answer w i t h one o f t h e f o l l o w i n g j u d g m e n t s b a s e d on t h e e v i d e n c e p r o v i d e d : − SUPPORTS : i f t h e e v i d e n c e s u b s t a n t i a t e s t he c l a i m . − REFUTES : i f t h e e v i d e n c e d i r e c t l y c o n t r a d i c t s t h e c l a i m . − NOT ENOUGH INFO : i f t h e r e i s i n s u f f i c i e n t e v i d e n c e t o d e t e r m i n e t h e c l a i m ' s v a l i d i t y # # # Examples These e x a m p l e s d e m o n s t r a t e how t o a p p l y t h e e v a l u a t i o n c r i t e r i a : − Claim : The Germanic p e o p l e s a r e a l s o c a l l e d G o t h i c . − E v i d e n c e : The Germanic p e o p l e s ( a l s o r e f e r r e d t o a s T e u t o n i c , S u e b i a n , o r G o t h i c i n o l d e r l i t e r a t u r e ) a r e an Indo − European ethno − l i n g u i s t i c group o f N o r t h e r n European o r i g i n . − Document : Germanic p e o p l e s − Answer : SUPPORTS − Claim : T e n n i s i s n o t a s p o r t . − E v i d e n c e : T e n n i s i s p l a y e d by m i l l i o n s o f r e c r e a t i o n a l p l a y e r s and i s a l s o a p o p u l a r w o r l d w i d e s p e c t a t o r s p o r t . − Document : T e n n i s − Answer : REFUTES − Claim : Kick − Ass i s a h o r r o r f i l m . 
− E v i d e n c e : Kick − Ass i s a 2 0 1 0 B r i t i s h − American f i l m b a s e d on t h e comic book o f t h e same name by Mark M i l l a r and John Romita , J r . − Document : Kick − Ass ( f i l m ) − Answer : NOT ENOUGH INFO # # # I n p u t − Claim : [ CLAIM HERE ] − E v i d e n c e : [ EVIDENCE HERE ] − Document : [DOCUMENT HERE ] # # # Answer : [ANSWER HERE ]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 BLEU</head><label>1</label><figDesc></figDesc><table><row><cell>Metric</cell><cell>BLEU-1</cell><cell>BLEU-2</cell><cell>BLEU-3</cell><cell>BLEU-4</cell></row><row><cell>Claim</cell><cell>0,9776</cell><cell>0,9695</cell><cell>0,9623</cell><cell>0,9544</cell></row><row><cell>Evidence</cell><cell>0,9529</cell><cell>0,9411</cell><cell>0,9309</cell><cell>0,9207</cell></row><row><cell></cell><cell>Train (S)</cell><cell>Dev (S)</cell><cell>Test (G)</cell><cell>Total</cell></row><row><cell>Supports</cell><cell>114,801</cell><cell>4,638</cell><cell>654</cell><cell>120,095</cell></row><row><cell>Refutes</cell><cell>47,096</cell><cell>4,887</cell><cell>643</cell><cell>52,626</cell></row><row><cell>NEI</cell><cell>66,380</cell><cell>6,410</cell><cell>766</cell><cell>73,556</cell></row><row><cell>Total</cell><cell>228,277</cell><cell>15,935</cell><cell>2,063</cell><cell>246,275</cell></row></table><note>score metrics of Claim and Evidence manually validated (gold) respect automatic translation version (silver)</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Number</figDesc><table /><note>of claims and evidence in the Italian dataset. (S) indicates silver data (automatically translated), and (G) indicates gold data (manually validated).</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Performance in terms of Accuracy, Precision, Recall and F1-measure of our systems on Fever-EN dataset</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>Support</cell><cell></cell><cell></cell><cell>Refutes</cell><cell></cell><cell cols="3">Not enough info</cell><cell>Macro Average</cell></row><row><cell cols="3">FT Prompt Doc</cell><cell>Acc</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell></cell><cell>P</cell><cell>R</cell><cell>F1</cell><cell>P</cell><cell>R</cell><cell>F1</cell><cell>P</cell><cell>R</cell><cell>F1</cell><cell>P</cell><cell>R</cell><cell>F1</cell></row><row><cell>No</cell><cell>0-shot 1-shot</cell><cell>No Yes No Yes</cell><cell cols="10">0.462 0.411 0.951 0.574 0.607 0.457 0.522 0.585 0.050 0.092 0.534 0.486 0.396 0.507 0.463 0.942 0.620 0.587 0.663 0.622 0.800 0.005 0.010 0.617 0.537 0.418 0.425 0.376 0.963 0.541 0.671 0.333 0.445 0.478 0.043 0.079 0.508 0.446 0.355 0.462 0.403 0.968 0.569 0.632 0.361 0.459 0.698 0.115 0.197 0.578 0.481 0.409</cell></row><row><cell>Yes</cell><cell>0-shot 1-shot</cell><cell cols="11">No Yes No Yes 0.905 0.913 0.942 0.927 0.924 0.854 0.888 0.883 00.897 0.897 0.940 0.918 0.924 0.845 0.882 0.877 0.903 0.890 0.899 0.896 0.897 0.901 0.899 0.936 0.917 0.923 0.855 0.888 0.887 0.910 0.898 0.903 0.900 0.901 0.895 0.891 0.947 0.918 0.919 0.843 0.879 0.881 0.894 0.887 0.897 0.895 0.895</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>.915 0.899 0.907 0.904 0.905</head><label></label><figDesc></figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The resource, fine-tuned models, and code will be released on a dedicated repository: https://github.com/crux82/FEVER-it</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/datasets/copenlu/fever_gold_evidence</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/google-research/google-research/tree/master/ madlad_400</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">The following hyperparameters were used: a learning rate of 0.0001, two epochs, LoRA_R set to 8, LoRA_alpha set to 16, and LoRA_dropout at 0.05. The micro-batch size was 2, and gradient accumulation steps were set to 8.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">In English: "The Prague Castle attracts over 18 million visitors every year."</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">In English: "The castle is among the most visited tourist attractions in Prague, attracting over 1.8 million visitors every year."</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">In English: "Ned Stark was introduced in 1996 in A Storm of Swords."</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">In English: "Introduced in 1996 in A Game of Thrones, Ned is the honorable lord of Winterfell, an ancient fortress in the north of the imaginary continent of Westeros."</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The team would like to thank Monika Kakol for her invaluable support in the validation of the translations. This work was supported by Project ECS 0000024 Rome Technopole, -CUP B83C22002820006, NRP Mission 4 Component 2 Investment 1.5, Funded by the European Union -NextGenerationEU.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. Prompts in Italian</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2.1. 0-shot Setting</head><p>The following prompt is used for 0-shot learning, where the task and classes are presented without additional information.</p><p># # # I s t r u z i o n i V a l u t a s e l ' a f f e r m a z i o n e è s u p p o r t a t a d a l l e p r o v e f o r n i t e . Le d e f i n i z i o n i d e i t e r m i n i c h i a v e u t i l i z z a t i i n q u e s </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2.2. 1-shot Setting</head><p>The following prompt is used for 1-shot learning, where the task and classes are explained, and one example per class is provided. Notice that only the evidence is reported without the title of the original document. The following prompt is used for 0-shot learning, where the task and classes are explained without additional information. Each input evidence is provided with the title of its original document. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2.4. 1-shot Setting with Document Title</head><p>The following prompt is used for 1-shot learning, where the task and classes are explained, and one example per class is provided. Each input evidence is provided with the title of its original document. </p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A survey on automated fact-checking</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Schlichtkrull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Trans. Assoc. Comput. Linguistics</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="178" to="206" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The promise of computational journalism</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Terry Flew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christina</forename><surname>Spurgeon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Swift</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journalism Practice</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="157" to="171" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Combating misinformation in the age of llms: Opportunities and challenges</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shu</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2311.05656.arXiv:2311.05656" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Multimodal automated factchecking: A survey</title>
		<author>
			<persName><forename type="first">M</forename><surname>Akhtar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schlichtkrull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Cocarascu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Simperl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2305.13507.arXiv:2305.13507" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Automated fact checking: Task formulations, methods and future directions</title>
		<author>
			<persName><forename type="first">J</forename><surname>Thorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/C18-1283" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 27th International Conference on Computational Linguistics, Association for Computational Linguistics<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="3346" to="3359" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">FEVER: a large-scale dataset for fact extraction and VERification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Thorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Christodoulopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mittal</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N18-1074</idno>
		<ptr target="https://aclanthology.org/N18-1074.doi:10.18653/v1/N18-1074" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Walker</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Ji</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Stent</surname></persName>
		</editor>
		<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>New Orleans, Louisiana</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="809" to="819" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The fact extraction and VERification (FEVER) shared task</title>
		<author>
			<persName><forename type="first">J</forename><surname>Thorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Cocarascu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Christodoulopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mittal</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W18-5501</idno>
		<ptr target="https://aclanthology.org/W18-5501.doi:10.18653/v1/W18-5501" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics</title>
				<meeting>the First Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The FEVER2.0 shared task</title>
		<author>
			<persName><forename type="first">J</forename><surname>Thorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Cocarascu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Christodoulopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mittal</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-6601</idno>
		<ptr target="https://aclanthology.org/D19-6601.doi:10.18653/v1/D19-6601" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics</title>
				<meeting>the Second Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Mittal, The fact extraction and VERification over unstructured and structured information (FEVER-OUS) shared task</title>
		<author>
			<persName><forename type="first">R</forename><surname>Aly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Schlichtkrull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Thorne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Christodoulopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Cocarascu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.fever-1.1</idno>
		<ptr target="https://aclanthology.org/2021.fever-1.1.doi:10.18653/v1/2021.fever-1.1" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics</title>
				<meeting>the Fourth Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics<address><addrLine>Dominican Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">D S</forename><surname>Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Babulkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-72240-1_75</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/978-3-030-72240-1_75" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd European Conference on Information Retrieval, ECIR &apos;21</title>
				<meeting>the 43rd European Conference on Information Retrieval, ECIR &apos;21<address><addrLine>Lucca, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="639" to="649" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The clef-2022 checkthat! lab on fighting the covid-19 infodemic and fake news detection</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kutlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zaghouani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mubarak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Babulkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename><surname>Kartal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Beltrán</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="416" to="428" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The clef-2024 checkthat! lab: Check-worthiness, subjectivity, persuasion, roles, authorities, and adversarial robustness</title>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Chakraborty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Przybyła</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ruggeri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Suwaileh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Goharian</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Tonellotto</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Lipani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Mcdonald</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Macdonald</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Ounis</surname></persName>
		</editor>
		<meeting><address><addrLine>Nature Switzerland, Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="449" to="458" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.02116</idno>
		<title level="m">Unsupervised cross-lingual representation learning at scale</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rozière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hambro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Azhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.13971</idno>
		<title level="m">LLaMA: Open and efficient foundation language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">FactRank: Developing automated claim detection for Dutch-language fact-checkers</title>
		<author>
			<persName><forename type="first">B</forename><surname>Berendt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Burger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hautekiet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jagers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pleijter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Van Aelst</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.osnem.2020.100113</idno>
		<ptr target="https://doi.org/10.1016/j.osnem.2020.100113" />
	</analytic>
	<monogr>
		<title level="j">Online Social Networks and Media</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page">100113</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Enabling deep learning for large scale question answering in Italian</title>
		<author>
			<persName><forename type="first">D</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zelenanska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Basili</surname></persName>
		</author>
		<idno type="DOI">10.3233/IA-190018</idno>
		<ptr target="https://doi.org/10.3233/IA-190018" />
	</analytic>
	<monogr>
		<title level="j">Intelligenza Artificiale</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="49" to="61" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Large scale datasets for image and video captioning in Italian</title>
		<author>
			<persName><forename type="first">A</forename><surname>Scaiella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Basili</surname></persName>
		</author>
		<ptr target="http://www.ai-lc.it/IJCoL/v5n2/IJCOL_5_2_3___scaiella_et_al.pdf" />
	</analytic>
	<monogr>
		<title level="j">Italian Journal of Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="49" to="60" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Team papelo: Transformer networks at FEVER</title>
		<author>
			<persName><forename type="first">C</forename><surname>Malon</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W18-5517</idno>
		<ptr target="https://aclanthology.org/W18-5517" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Thorne</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Cocarascu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Christodoulopoulos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Mittal</surname></persName>
		</editor>
		<meeting>the First Workshop on Fact Extraction and VERification (FEVER)<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="109" to="113" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Experimenting AI technologies for disinformation combat: the IDMO project</title>
		<author>
			<persName><forename type="first">L</forename><surname>Canale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Messina</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.11097</idno>
		<ptr target="https://arxiv.org/abs/2310.11097" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">MADLAD-400: A multilingual and document-level large audited dataset</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kudugunta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Caswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Xin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kusupati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bapna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Firat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="67284" to="67296" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Fact checking: Task definition and dataset construction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/W14-2508</idno>
		<ptr target="https://aclanthology.org/W14-2508" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Danescu-Niculescu-Mizil</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Eisenstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>McKeown</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</editor>
		<meeting>the ACL 2014 Workshop on Language Technologies and Computational Social Science<address><addrLine>Baltimore, MD, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="18" to="22" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">FacTeR-Check: Semi-automated fact-checking through semantic similarity and natural language inference</title>
		<author>
			<persName><forename type="first">A</forename><surname>Martín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huertas-Tato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Álvaro</forename><surname>Huertas-García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Villar-Rodríguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Camacho</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.knosys.2022.109265</idno>
		<ptr target="https://doi.org/10.1016/j.knosys.2022.109265" />
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">251</biblScope>
			<biblScope unit="page">109265</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Automated claim matching with large language models: Empowering fact-checkers in the fight against misinformation</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ferrara</surname></persName>
		</author>
		<idno type="DOI">10.1145/3589335.3651910</idno>
		<ptr target="https://doi.org/10.1145/3589335.3651910" />
	</analytic>
	<monogr>
		<title level="m">Companion Proceedings of the ACM on Web Conference 2024, WWW &apos;24</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="1441" to="1449" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Improving language understanding by generative pre-training</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Narasimhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Salimans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Henighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sigler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Litwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Berner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>McCandlish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Balcan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</editor>
		<meeting>NeurIPS</meeting>
		<imprint>
			<date type="published" when="2020-12">December 6-12, 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno>CoRR abs/2106.09685</idno>
		<ptr target="https://arxiv.org/abs/2106.09685" />
		<title level="m">LoRA: Low-rank adaptation of large language models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">ExtremITA at EVALITA 2023: Multi-task sustainable scaling to large language models at its extreme</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Hromei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Basili</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3473/paper13.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)<address><addrLine>Parma, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">September 7th-8th, 2023</date>
			<biblScope unit="volume">3473</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">AVeriTeC: A dataset for real-world claim verification with evidence from the web</title>
		<author>
			<persName><forename type="first">M</forename><surname>Schlichtkrull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Naumann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Globerson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Saenko</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="65128" to="65167" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Generating label cohesive and well-formed adversarial claims</title>
		<author>
			<persName><forename type="first">P</forename><surname>Atanasova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Augenstein</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.256</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.256" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3168" to="3177" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Bleu: a method for automatic evaluation of machine translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Papineni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roukos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-J</forename><surname>Zhu</surname></persName>
		</author>
		<idno type="DOI">10.3115/1073083.1073135</idno>
		<ptr target="https://doi.org/10.3115/1073083.1073135" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL &apos;02</title>
				<meeting>the 40th Annual Meeting on Association for Computational Linguistics, ACL &apos;02<address><addrLine>USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="311" to="318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">A survey on in-context learning</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Sui</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2301.00234</idno>
		<ptr target="https://arxiv.org/abs/2301.00234" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author><orgName>AI@Meta</orgName></author>
		<title level="m" type="main">Llama 3 model card</title>
		<ptr target="https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md" />
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
