<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gabriele</forename><surname>Sarti</surname></persName>
							<email>g.sarti@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tommaso</forename><surname>Caselli</surname></persName>
							<email>t.caselli@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
							<email>m.nissim@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Arianna</forename><surname>Bisazza</surname></persName>
							<email>a.bisazza@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses</title>
					</analytic>
					<monogr>
						<meeting>Tenth Italian Conference on Computational Linguistics<address><addrLine>Dec 04-06, 2024</addrLine><settlement>Pisa</settlement><country key="IT">Italy</country></address></meeting>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">290F50854597AF065AEC9D2728AE0361</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Large language models</term>
					<term>Sequential reasoning</term>
					<term>Puzzle</term>
					<term>Rebus</term>
					<term>Crosswords</term>
					<term>Enigmistica Italiana</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as LLaMA-3 and GPT-4o perform poorly on this task, ad-hoc fine-tuning seems to improve models' performance. However, we find that performance gains from training are largely motivated by memorization. Our results suggest that rebus solving remains a challenging test bed to evaluate large language models' linguistic proficiency and sequential instruction-following skills.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Complex games such as chess and Go have long been a source of inspiration to develop more flexible and robust AI systems <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Recent developments in NLP suggested that creative language games could be exploited as promising benchmarks for quantifying the ability of large language models (LLMs) to carry out multi-step knowledge-intensive reasoning tasks under pre-specified constraints <ref type="bibr" target="#b2">[3]</ref>. While crossword puzzles have historically been the main focus of such efforts <ref type="bibr" target="#b3">[4]</ref>, other categories of linguistic games received only marginal attention, especially for languages other than English. A prominent example of less-studied language games is the rebus, a visual puzzle combining images and graphic signs to encode a hidden phrase. Indeed, rebus solving is a complex, multi-step process requiring factual knowledge, contextual understanding, vocabulary usage, and reasoning within pre-defined constraints, a set of fundamental skills to address a variety of real-world tasks.</p><p>In this work, we conduct the first open evaluation of LLMs' rebus-solving capabilities, focusing specifically on the Italian language. We propose a novel strategy to derive text-only verbalized rebuses from transcribed intermediate rebus solutions and use it to produce a large collection with more than 80k verbalized rebuses. We then evaluate the rebus-solving skills of state-of-the-art LLMs, including open-source systems and proprietary models, via few-shot prompting. Moreover, we fine-tune a small but capable LLM on verbalized rebus solving, outperforming state-of-the-art systems by a wide margin. Finally, we conduct a fine-grained assessment of LLMs' sequential reasoning steps, explaining model performance in terms of word complexity and memorization. Beyond rebus solving, our evaluation sheds light on the limits of current LLMs in multi-step reasoning settings, highlighting challenges with their application to complex sequential instruction-following scenarios.<ref type="foot" target="#foot_0">1</ref></p><p>Gabriele Sarti et al. CEUR Workshop Proceedings 1-10</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and Related Work</head><p>Italian Enigmistica and Rebuses The Italian language is characterized by a rich and long-standing tradition of puzzle games, including rebuses, dating back to the 19th century <ref type="bibr" target="#b4">[5]</ref>.<ref type="foot" target="#foot_1">2</ref> In Italian rebuses, a first pass (prima lettura) representing an intermediate solution of the puzzle is produced by combining graphemes with underlying image elements in a left-to-right direction (Figure <ref type="figure" target="#fig_2">1</ref>). Then, the letters and words of the first pass undergo a re-segmentation (cesura) according to a solution key (chiave di lettura<ref type="foot" target="#foot_2">3</ref>), which specifies the length of words in the solution (frase risolutiva). The verbalized rebuses we introduce in this work are variants of textual rebuses (rebus descritto or verbis), where the text-based puzzle is crafted by replacing first pass words with their crossword definitions in a templated format (Figure <ref type="figure" target="#fig_2">1</ref>).</p></div>
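<div xmlns="http://www.tei-c.org/ns/1.0"><p>The re-segmentation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names resegment and solution_key are hypothetical, and the sketch assumes that a first pass is a space-separated string whose letters, read left to right with the spacing removed, are cut into words of the lengths given by the solution key.</p>
<p><code lang="python"><![CDATA[
```python
def solution_key(solution):
    """Derive the solution key (word lengths) from a solution phrase."""
    return [len(word) for word in solution.split()]


def resegment(first_pass, key):
    """Cut the letters of a first pass into words of the lengths given by key."""
    letters = "".join(first_pass.split())  # drop the first-pass spacing
    if len(letters) != sum(key):
        raise ValueError("first pass length does not match the solution key")
    words, start = [], 0
    for length in key:
        words.append(letters[start:start + length])
        start += length
    return words


# Figure 1 example: first pass "M ali N coni cane NIA", solution key 11 5.
words = resegment("M ali N coni cane NIA", [11, 5])
solution = " ".join(words).lower().capitalize()
print(words)     # ['MaliNconica', 'neNIA']
print(solution)  # Malinconica nenia
```
]]></code></p>
<p>The length check makes the role of the key explicit: a first pass whose letter count differs from the sum of the key values cannot yield a valid solution, whatever words were guessed.</p></div>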
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Linguistic Puzzles as NLP Progress Metrics</head><p>Language games have recently been adopted as challenging tasks for LLM evaluation <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref>. While works in this area have historically focused on English crosswords <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b12">12,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b13">13]</ref>, recent tests focus on a more diverse set of games such as the New York Times' "Connections" <ref type="bibr" target="#b14">[14]</ref> and "Wordle" <ref type="bibr" target="#b15">[15]</ref>. Automatic crossword solvers were also developed for French <ref type="bibr" target="#b16">[16]</ref>, German <ref type="bibr" target="#b17">[17]</ref> and Italian <ref type="bibr" target="#b18">[18,</ref><ref type="bibr" target="#b19">19]</ref>, while didactic crossword generators are available for Italian <ref type="bibr" target="#b20">[20]</ref> and Turkish <ref type="bibr" target="#b21">[21]</ref>. Relatedly, the Italian evaluation campaign EVALITA <ref type="foot" target="#foot_3">4</ref> recently hosted two shared tasks focusing on the word-guessing game "La Ghigliottina" (The Guillotine) <ref type="bibr" target="#b22">[22,</ref><ref type="bibr" target="#b23">23]</ref>. To our knowledge, our work is the first to attempt the computational modeling and evaluation of rebus-solving systems. Importantly, language games such as rebuses are not easily translatable into other languages due to their structural and cultural elements. This makes them a scarce but valuable resource for language-specific evaluations of language processing systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>LLMs as Sequential Reasoners</head><p>State-of-the-art LLMs were shown to struggle to follow sequential instructions presented in a single query <ref type="bibr" target="#b24">[24]</ref>, but their performance improved significantly with ad-hoc training <ref type="bibr" target="#b25">[25]</ref>. This acts as an initial motivation for our rebus-solving</p><p>Models We fine-tune Phi-3 Mini 3.8B 4K <ref type="bibr" target="#b28">[28]</ref>, the most capable LLM below 4B parameters for a wide range of Italian language tasks.<ref type="foot" target="#foot_7">8</ref> We use quantized low-rank adapters (QLoRA; 29, 30) for efficient fine-tuning with Unsloth<ref type="foot" target="#foot_8">9</ref> and Transformers <ref type="bibr" target="#b31">[31]</ref>, training the model for 5,000 steps with a batch size of 16 over 81k examples. As points of comparison for our model, we select GPT-4o <ref type="bibr" target="#b32">[32]</ref> and Claude-3.5 Sonnet <ref type="bibr" target="#b33">[33]</ref> as the current state-of-the-art for proprietary LLMs and the instruction-tuned variants of Qwen-2 72B <ref type="bibr" target="#b34">[34]</ref> and LLaMA-3 70B <ref type="bibr" target="#b35">[35]</ref> as the best-performing open-source LLMs according to the Invalsi Italian benchmark <ref type="bibr" target="#b36">[36]</ref>. These four systems are used as untrained baselines thanks to their instruction-following abilities and are prompted for rebus solving in a few-shot setting.</p><p>Format Table <ref type="table" target="#tab_0">1</ref> presents an example in the templated format used for fine-tuning Phi-3.<ref type="foot" target="#foot_9">10</ref> The model is prompted to reason step-by-step by 1) solving crossword definitions sequentially (definition resolution); 2) producing a first pass copying letters and definitions' words; 3) re-segmenting it into solution words based on the solution key (solution segmentation); and finally 4) producing the solution by copying re-segmented words. We automatically convert rebuses in this format by deriving the solution key from solution word lengths and dynamically infilling the available information into the template. We use a similar format for prompting experiments, with five in-context step-by-step demonstrations and an explicit instruction asking the model to stick to the previous examples' format to streamline solution parsing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Prompt</head><p>Risolvi gli indizi tra parentesi per ottenere una prima lettura, e usa la chiave di lettura per ottenere la soluzione del rebus.</p><p>Rebus: U [Lo è il passacavallo] LO [È fatta di vimini] F F [Decimi di chilo] S [Disusato soprabito] A [Un rampicante dei Tropici]</p><p>Chiave di lettura: 3 6 12 8</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Generation</head><p>Procediamo alla risoluzione del rebus passo per passo:</p><p>Definition resolution:</p><formula xml:id="formula_0">- U = U
- [Lo è il passacavallo] = nave
- L O = L O
- [È fatta di vimini] = cesta
- F F = F F
- [Decimi di chilo] = etti
- S = S
- [Disusato soprabito] = tait
- A = A
- [Un rampicante dei Tropici] = liana</formula><p>First pass:</p><p>Prima lettura: U nave LO cesta F F etti S tait A liana</p><p>Ora componiamo la soluzione seguendo la chiave risolutiva:</p><p>Solution segmentation:</p><formula xml:id="formula_1">3 = Una
6 = veloce
12 = staffettista
8 = italiana</formula><p>Solution:</p><p>Soluzione: Una veloce staffettista italiana</p></div>
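<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of how a rebus could be infilled into the templated format of Table 1, the sketch below renders the four reasoning stages from a list of letters and resolved definitions. It is an assumption-laden sketch, not the authors' conversion code: the function name build_generation, the (text, answer) input structure, and the exact line breaks are hypothetical, and only the Italian template sentences are taken from the example above.</p>
<p><code lang="python"><![CDATA[
```python
def build_generation(items, solution):
    """Render the step-by-step target text for one verbalized rebus.

    items: (text, answer) pairs in left-to-right order; answer is None for
    plain letters and the first-pass word for a crossword definition.
    """
    steps, first_pass = [], []
    for text, answer in items:
        if answer is None:  # letters are copied unchanged
            steps.append(f"- {text} = {text}")
            first_pass.append(text)
        else:  # definitions are resolved to a first-pass word
            steps.append(f"- [{text}] = {answer}")
            first_pass.append(answer)
    # The solution key is implied by the lengths of the solution words.
    segments = [f"{len(word)} = {word}" for word in solution.split()]
    lines = ["Procediamo alla risoluzione del rebus passo per passo:"]
    lines += steps
    lines.append("Prima lettura: " + " ".join(first_pass))
    lines.append("Ora componiamo la soluzione seguendo la chiave risolutiva:")
    lines += segments
    lines.append("Soluzione: " + solution)
    return "\n".join(lines)


items = [
    ("U", None), ("Lo è il passacavallo", "nave"),
    ("LO", None), ("È fatta di vimini", "cesta"),
    ("F F", None), ("Decimi di chilo", "etti"),
    ("S", None), ("Disusato soprabito", "tait"),
    ("A", None), ("Un rampicante dei Tropici", "liana"),
]
text = build_generation(items, "Una veloce staffettista italiana")
print(text)
```
]]></code></p>
<p>Keeping every stage on its own labeled line is what makes the generated text easy to parse back into definition, first pass, and solution predictions for evaluation.</p></div>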
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>Table <ref type="table" target="#tab_1">2</ref> presents our evaluation results. We observe that all prompted models perform poorly on the task, with the overall best prompted system (Claude 3.5 Sonnet) obtaining the correct solution only for 24% of the 2k tested examples. Notably, open-source systems perform significantly worse than proprietary ones, producing correct first passes only for 4% of the examples, and next to no correct solutions. Our fine-tuned system largely outperforms all state-of-the-art prompted models, predicting the correct solution in 51% of cases. From first pass metrics, it is evident these results can be largely explained by the poor word-guessing capabilities of the models, which are greatly improved with fine-tuning. For prompted models, the slight decrease in scores between Def. and FP Words also highlights issues with copying predicted words in the expected format. Finally, we observe that fine-tuning strongly improves the constraint-following abilities of our system, with prompted systems being less strict with applying length and letter-choice constraints for their solutions (Key/FP Match).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">What Motivates Model Performances?</head><p>In light of the strong performances achieved by our relatively small fine-tuned system, this section conducts an in-depth investigation to identify factors motivating such performance improvements.</p><p>Word Complexity and Frequency Affect LLM Fine-tuning Performance For every word in the first passes and solutions of test set examples, we measure LLMs' overall accuracy in predicting it for the full test set. We then correlate this score to various quantities that could motivate LLMs' performances. More specifically, we use 1) the word frequency in the training set; 2) the word frequency in Paisà <ref type="bibr" target="#b38">[38]</ref>, a large web Italian corpus; and 3) the length of the word (number of characters). We find a significant positive correlation (𝜌 = 0.44) between first pass word prediction accuracy and training frequency for the fine-tuned Phi-3 model, suggesting that model performance is strongly related to training coverage. Word length is also found to negatively affect our model's performance, albeit to a smaller extent (𝜌 = −0.11). The performance of prompted models is unrelated to both properties for first pass words, indicating that these results are the product of fine-tuning.<note place="foot" n="12">Paisà frequency is never found to correlate significantly. Full correlation results are available in Table <ref type="table" target="#tab_6">6</ref>.</note></p><p>LLM Fine-Tuning Fails to Generalize to Unseen Words To further confirm the importance of fine-tuning word coverage in defining model performances, we evaluate our fine-tuned model in out-of-distribution settings. For this evaluation, the 2k examples of the test set from previous sections are divided into two subsets: one in which all first pass words were seen during fine-tuning by Phi-3 (Test ID, 1061 examples) and one in which, for every example, at least one first pass word was unseen in training (Test OOD, 939 examples). Intuitively, if Phi-3 performance is mainly motivated by memorizing fine-tuning data, introducing OOD words should produce a significant drop in model performances. Results shown in Table <ref type="table" target="#tab_2">3</ref> confirm that this is indeed the case. We find Phi-3 performances to be near-perfect on seen first pass words (FP W. ID = 0.96) in both test sets, with a major drop for OOD words (FP W. OOD = 0.20). This produces second-order effects on subsequent steps, causing the FP EM results to drop by 71 percentage points (FP EM Test ∆), while significantly impacting downstream solution accuracies. On the contrary, GPT-4o few-shot prompting performances remain nearly identical on both splits, confirming that these results are not the product of a skewed data selection process. Overall, these results strongly suggest that memorization is the main factor behind the strong rebus-solving performance of our fine-tuned LLM.</p></div>
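<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two analyses of this section can be sketched as follows. The text does not state which correlation coefficient 𝜌 denotes, so the sketch assumes a Spearman rank correlation; the function names, the fp_words field, and all numeric values are hypothetical and serve only to illustrate the computation. The second function partitions test examples by whether every first pass word was seen in training, mirroring the in-distribution versus out-of-distribution split described above.</p>
<p><code lang="python"><![CDATA[
```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


def split_id_ood(examples, train_vocab):
    """Split examples by whether every first pass word was seen in training."""
    test_id, test_ood = [], []
    for example in examples:
        seen = all(word in train_vocab for word in example["fp_words"])
        (test_id if seen else test_ood).append(example)
    return test_id, test_ood


# Toy values for illustration only; they are not taken from the paper.
train_freq = [120, 45, 3, 0]
word_accuracy = [0.95, 0.80, 0.40, 0.10]
rho = spearman(train_freq, word_accuracy)

train_vocab = {"nave", "cesta", "dente"}
examples = [
    {"id": 1, "fp_words": ["nave", "cesta"]},
    {"id": 2, "fp_words": ["dente", "salice"]},
]
test_id, test_ood = split_id_ood(examples, train_vocab)
```
]]></code></p>
<p>A single unseen first pass word is enough to place an example in the out-of-distribution subset, which is why one missing word can affect the exact-match scores of a whole example.</p></div>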
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Manual Inspection</head><p>We conclude by manually evaluating some generations produced by the best-performing LLMs. Table <ref type="table" target="#tab_4">4</ref> presents two examples with definitions (D) and solution (S) words predicted by three LLMs, with more examples provided in Appendix C. We use naw as short-hand for "Not A Word" to mark nonsensical terms.</p><p>Rebus (first example): SAP [La porta della breccia] D1 TE [La pinza del granchio] D2 SBA [Si legge su alcuni orologi] D3 G [Le sue coccole sono aromatiche] D4 V [Un gioco con dadi e pedine] D5 D [Sono verdi in gioventù] D6</p><p>Chiave di lettura: 8 3 2 12 7 5</p><p>In the first example, Phi-3 correctly predicts all first pass and solution words. On the contrary, other models make several mistakes in the first pass, leading to incorrect solutions. Both prompted models tend to ignore first pass words when these cannot be assembled to form sensical, length-fitting solution words. For example, for D1 GPT-4o predicts p (naw), which would lead to the solution word "SAPpTE" (naw), but the model instead predicts S8 = "Spettacolo" (show). In particular, GPT-4o appears to prioritize grammatically correct solutions at the cost of ignoring first pass words and solution key length constraints, while Claude 3.5S shows an improved ability to follow these constraints, as confirmed by Key/FP Match results of Table <ref type="table" target="#tab_1">2</ref>.</p><p>In the second example, the first pass word D2 = salice (willow) is OOD for Phi-3. Consequently, the model produces the incorrect prediction aro (naw), and the error is propagated to all solution words, as previously observed in the Test OOD column of Table <ref type="table" target="#tab_2">3</ref>. Prompted models also underperform in this example, with errors on D1 and D2 propagating to most solution words. However, we note that D1 and D2 incorrect predictions for Claude 3.5S satisfy the provided definitions, suggesting that access to more explicit information about the given constraints could further boost LLMs' performance on this task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion and Conclusion</head><p>This work introduced a verbalized rebus-solving task and dataset for evaluating LLMs' sequential instruction-following skills for the Italian language. We crafted a large collection of 83k verbalized rebuses by combining rebus transcriptions with crossword definitions and used it to evaluate the rebus-solving skills of state-of-the-art LLMs. Our experiments revealed the challenging nature of this task, with even the most capable prompted models achieving only 24% accuracy on solutions.</p><p>While fine-tuning a smaller LLM dramatically improved performance to 51% solution accuracy, our analysis uncovered that these gains were largely driven by memorization and do not generalize to out-of-distribution examples. These results suggest important limitations in the generalization capabilities of current systems for sequential instruction-following tasks. Our manual analysis further shows that LLMs seldom account for length constraints when solving definitions, despite the fundamental role of these cues in restricting the pool of possible words. These results suggest that search-based approaches accounting for constraints more explicitly might improve puzzle structure adherence, as previously shown by Chen et al. <ref type="bibr" target="#b39">[39]</ref>. Other augmentation techniques employing LLM reformulation skills can also be explored to mitigate overfitting.</p><p>Future work in this area should focus on expanding similar evaluations to a wider set of languages, input modalities, and puzzle categories, creating a comprehensive benchmark to test LLMs' puzzle-solving skills. Importantly, the task of solving visual rebuses and their more convoluted variants<ref type="foot" target="#foot_11">13</ref> remains far beyond the current capabilities of vision-language models. Hence, solving these puzzles automatically can be considered an important milestone in developing multimodal AI systems for constrained multi-step reasoning tasks. Our results confirm that the challenging nature of rebuses, even in their verbalized form, makes this task valuable for assessing future progress in LLMs' linguistic proficiency and sequential reasoning abilities. Finally, our rebus-solving LLM can facilitate future interpretability work investigating the mechanisms behind factual recall and multi-step reasoning in transformer models <ref type="bibr" target="#b40">[40]</ref>.</p><p>Limitations Our analysis was limited to a relatively small set of models, and a single prompt template obtained after minimal tuning. Further experiments are needed to verify that memorization patterns after fine-tuning remain relevant for other model sizes, prompt formats, and training regimes, particularly for full-weight training approaches.  <ref type="bibr" target="#b41">[41]</ref> the pool of available definitions for every word.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Additional Data Information</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset statistics</head><p>First pass/Solution word distribution Figure <ref type="figure">2</ref> shows the distribution of first pass and solution words for the filtered EurekaRebus subset used in our work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Additional Experimental Results</head><p>Table <ref type="table" target="#tab_6">6</ref> presents the correlations between model accuracy and the properties presented in Section 5. Table <ref type="table">7</ref> presents the full ID/OOD performances for all tested models, showing consistent results with Table <ref type="table" target="#tab_2">3</ref> for all prompted models. Table <ref type="table">8</ref> presents Phi-3 Mini performances across rebus-solving fine-tuning steps.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Ali</head><label></label><figDesc></figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>First Pass: M ali - N coni - cane NIA. Verbalized Rebus: M [Due calciatori attaccanti] (Two attacking footballers) N [Usati per mangiare il gelato] (Used for eating ice cream) [Abbaia e morde] (Barks and bites) NIA. Solution key (# of chars/word): 11 5. Solution: Malinconica nenia (melancholic lullaby).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An example of a verbalized rebus crafted by combining a rebus first pass (intermediate solution) with crossword definitions. We use verbalized rebuses to test LLMs' sequential instruction following capabilities. Image from Settimana Enigmistica n. 4656, © Bresi S.r.l.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Template used in fine-tuning and prompting experiments with highlighted reasoning stages. Example rebus by Il Piacentino, Settimana Enigmistica n. 2942.</figDesc><table><row><cell>Metrics For our granular evaluation of rebus-solving performance, we adopt the following set of metrics focusing on the first passes (FP) and solutions (S) generated by LLMs:</cell></row><row><cell>• Definition (Def.): Proportion of correctly guessed words during definition resolution.</cell></row></table><note>3) re-segmenting it into solution words based on the solution key (solution segmentation); and finally 4) producing the solution by copying re-segmented words. We automatically convert rebuses in this format by deriving the solution key from solution word lengths and dynamically infilling the available information into the template. We use a similar format for prompting experiments, with five in-context step-by-step demonstrations and an explicit instruction asking the model to stick to the previous examples' format to streamline solution parsing.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Fine-grained verbalized rebus solving performances of various LLMs. Bold denotes best overall performances, and underline marks best training-free results.</figDesc><table><row><cell>Model</cell><cell>Setup</cell><cell>Def.</cell><cell>FP Words</cell><cell>FP Letters</cell><cell>FP EM</cell><cell>S Key Match</cell><cell>S FP Match</cell><cell>S Words</cell><cell>S EM</cell></row><row><cell>LLaMA-3 70B</cell><cell>5-shot prompt</cell><cell>0.22</cell><cell>0.20</cell><cell>0.60</cell><cell>0.04</cell><cell>0.16</cell><cell>0.51</cell><cell>0.03</cell><cell>0.00</cell></row><row><cell>Qwen-2 72B</cell><cell>5-shot prompt</cell><cell>0.28</cell><cell>0.25</cell><cell>0.76</cell><cell>0.04</cell><cell>0.20</cell><cell>0.52</cell><cell>0.04</cell><cell>0.00</cell></row><row><cell>GPT-4o</cell><cell>5-shot prompt</cell><cell>0.55</cell><cell>0.51</cell><cell>0.83</cell><cell>0.15</cell><cell>0.53</cell><cell>0.74</cell><cell>0.27</cell><cell>0.11</cell></row><row><cell>Claude-3.5 Sonnet</cell><cell>5-shot prompt</cell><cell>0.66</cell><cell>0.62</cell><cell>0.90</cell><cell>0.28</cell><cell>0.83</cell><cell>0.82</cell><cell>0.43</cell><cell>0.24</cell></row><row><cell>Phi-3 3.8B (ours)</cell><cell>fine-tuned</cell><cell>0.84</cell><cell>0.84</cell><cell>1.00</cell><cell>0.56</cell><cell>0.86</cell><cell>0.94</cell><cell>0.68</cell><cell>0.51</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Model performances for test subsets containing only in-domain (Test ID), or some out-of-domain (Test OOD) first pass words. W. ID and W. OOD are accuracies for ID and OOD words for first pass (FP) and solution (S) sequences. Test Δ = Test ID - Test OOD performance.</figDesc><table><row><cell></cell><cell cols="3">GPT-4o</cell><cell cols="3">Phi-3 (ours)</cell></row><row><cell>Metric</cell><cell>Test ID</cell><cell>Test OOD</cell><cell>Test Δ</cell><cell>Test ID</cell><cell>Test OOD</cell><cell>Test Δ</cell></row><row><cell>FP W. ID</cell><cell>0.52</cell><cell>0.51</cell><cell>-0.01</cell><cell>0.96</cell><cell>0.96</cell><cell>0.00</cell></row><row><cell>FP W. OOD</cell><cell>-</cell><cell>0.44</cell><cell>-</cell><cell>-</cell><cell>0.20</cell><cell>-</cell></row><row><cell>FP EM</cell><cell>0.16</cell><cell>0.14</cell><cell>-0.02</cell><cell>0.89</cell><cell>0.18</cell><cell>-0.71</cell></row><row><cell>S W. ID</cell><cell>0.29</cell><cell>0.26</cell><cell>-0.03</cell><cell>0.92</cell><cell>0.49</cell><cell>-0.43</cell></row><row><cell>S W. OOD</cell><cell>0.18</cell><cell>0.16</cell><cell>-0.02</cell><cell>0.63</cell><cell>0.20</cell><cell>-0.40</cell></row><row><cell>S EM</cell><cell>0.12</cell><cell>0.09</cell><cell>-0.03</cell><cell>0.82</cell><cell>0.16</cell><cell>-0.66</cell></row></table><note>Word Complexity and Frequency Affect LLM Fine-tuning Performance For every word in the first passes and solutions of test set examples, we measure LLMs' overall accuracy in predicting it for the full test set. We then correlate this score to various quantities that could motivate LLMs' performances. More specifically, we use 1) the word frequency in the training set;</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc>evaluate our fine-tuned model in out-of-distribution settings. For this evaluation, the 2k examples of the test set from previous sections are divided into two subsets: one in which all first pass words were seen during finetuning by Phi-3 (Test id, 1061 examples) and one in which, for every example, at least one first pass word was unseen in training (Test ood, 939 examples). Intuitively, if Phi-3 performance is mainly motivated by memorizing fine-tuning data, introducing OOD words should produce a significant drop in model performances. Results shown in Table</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Examples of LLM generations for rebuses by Slam, Nuova Enigmistica Tascabile n. 2802 (top) and Grizzly, Domenica Quiz n. 2 (bottom). Correct guesses and errors are denoted for predicted first pass definitions (D 1,...,𝑁 ) and solution words (S 𝑖 , with 𝑖 being the 𝑖-th solution key value).</figDesc><table><row><cell>Step</cell><cell>GPT-4o</cell><cell>Claude 3.5S</cell><cell>Phi-3</cell></row><row><cell>D1</cell><cell>p</cell><cell>one</cell><cell>pia</cell></row><row><cell>D2</cell><cell>chela</cell><cell>chela</cell><cell>chela</cell></row><row><cell>D3</cell><cell>ora</cell><cell>data</cell><cell>data</cell></row><row><cell>D4</cell><cell>ginepro</cell><cell>lio</cell><cell>ginepro</cell></row><row><cell>D5</cell><cell>ludo</cell><cell>oca</cell><cell>oca</cell></row><row><cell>D6</cell><cell>acerbi</cell><cell>anni</cell><cell>anni</cell></row><row><cell>S8</cell><cell>Spettacolo</cell><cell>Saponate</cell><cell>Sappiate</cell></row><row><cell>S3</cell><cell>che</cell><cell>che</cell><cell>che</cell></row><row><cell>S2</cell><cell>fa</cell><cell>la</cell><cell>la</cell></row><row><cell>S12</cell><cell>sognare</cell><cell>sbadataggine</cell><cell>sbadataggine</cell></row><row><cell>S7</cell><cell>ogni</cell><cell>vocando</cell><cell>provoca</cell></row><row><cell>S5</cell><cell>sera</cell><cell>danni</cell><cell>danni</cell></row><row><cell cols="4">Soluzione: SAPpiaTE che la SBAdataGgine proVoca Danni</cell></row><row><cell cols="4">Rebus: STU [Si salva otturandolo] D1 S [Ha foglie seghettate] D2 AL [Lo è l'operaio che lavora in cantiere] D3 G [Un uomo... non all'altezza] D4</cell></row><row><cell cols="4">Chiave di lettura: 11 7 2 7</cell></row><row><cell>Step</cell><cell>GPT-4o</cell><cell>Claude 3.5S</cell><cell>Phi-3</cell></row><row><cell>D1</cell><cell>tappo</cell><cell>falla</cell><cell>dente</cell></row><row><cell>D2</cell><cell>acero</cell><cell>ortica</cell><cell>aro</cell></row><row><cell>D3</cell><cell>edile</cell><cell>edile</cell><cell>edile</cell></row><row><cell>D4</cell><cell>nano</cell><cell>nano</cell><cell>nano</cell></row><row><cell>S11</cell><cell>Stupaccerone</cell><cell>Stufallassor</cell><cell>Studentesaro</cell></row><row><cell>S7</cell><cell>salendo</cell><cell>ticale</cell><cell>aledile</cell></row><row><cell>S2</cell><cell>al</cell><cell>di</cell><cell>gi</cell></row><row><cell>S7</cell><cell>genano</cell><cell>Legnano</cell><cell>nanano</cell></row><row><cell cols="4">Soluzione: STUdenteSsa liceALe di LeGnano</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5</head><label>5</label><figDesc>Table 5 presents statistics for the EurekaRebus dataset and the filtered subset we use for composing verbalized rebuses. The ItaCW dataset contains a total of 125,202 definitions for 40,963 unique words, with the most frequent words having hundreds of different definitions, e.g. 173 for re (king), 155 for te (you). Definitions used for verbalization are randomly sampled from Statistics for the full EurekaRebus dataset and the crosswords-filtered subset used in this work. Avg./SD = Average/standard deviation.</figDesc><table><row><cell>Statistic</cell><cell>EurekaRebus</cell><cell>ItaCW-filtered</cell></row><row><cell># examples</cell><cell>222089</cell><cell>83157</cell></row><row><cell># authors</cell><cell>8138</cell><cell>5046</cell></row><row><cell>Year range</cell><cell>1800-2024</cell><cell>1869-2024</cell></row><row><cell cols="3">First pass</cell></row><row><cell># unique words</cell><cell>38977</cell><cell>8960</cell></row><row><cell>Avg./SD words/ex.</cell><cell>3.50/1.48</cell><cell>3.08/1.00</cell></row><row><cell>Avg./SD word len.</cell><cell>6.51/1.96</cell><cell>5.70/1.60</cell></row><row><cell>Avg./SD FP len.</cell><cell>26.45/11.19</cell><cell>25.74/8.73</cell></row><row><cell cols="3">Solution</cell></row><row><cell># unique words</cell><cell>75718</cell><cell>42558</cell></row><row><cell>Avg./SD words/ex.</cell><cell>3.02/1.60</cell><cell>2.80/1.21</cell></row><row><cell>Avg./SD word len.</cell><cell>8.07/2.30</cell><cell>7.79/2.23</cell></row><row><cell>Avg./SD Sol. len.</cell><cell>19.47/8.44</cell><cell>18.81/6.06</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 6</head><label>6</label><figDesc>Spearman's correlation with average word accuracies for metrics computed on first pass (top) and solution (bottom) words. Bold scores are significant with Bonferroni-corrected 𝑝 &lt; 1e−5.</figDesc><table><row><cell>Model</cell><cell># Char.</cell><cell>Paisà Freq.</cell><cell>Train Freq.</cell></row><row><cell>GPT-4o</cell><cell>-0.01</cell><cell>0.01</cell><cell>0.02</cell></row><row><cell>Claude-3.5</cell><cell>-0.02</cell><cell>-0.02</cell><cell>0.00</cell></row><row><cell>Phi-3 (ours)</cell><cell>-0.11</cell><cell>-0.05</cell><cell>0.44</cell></row><row><cell>GPT-4o</cell><cell>-0.18</cell><cell>0.14</cell><cell>0.19</cell></row><row><cell>Claude-3.5</cell><cell>-0.15</cell><cell>0.08</cell><cell>0.13</cell></row><row><cell>Phi-3 (ours)</cell><cell>-0.02</cell><cell>0.08</cell><cell>0.22</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Code, data and models are available on GitHub and Hugging Face.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Refer to Miola <ref type="bibr" target="#b5">[6]</ref>, Bartezzaghi <ref type="bibr" target="#b6">[7]</ref>, and Ichino <ref type="bibr" target="#b7">[8]</ref> for a comprehensive overview of peculiarities and norms in modern Italian rebuses.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Referred to as diagramma in jargon.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://www.evalita.it</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://www.eureka5.it, additional details in Appendix A. Rebus illustrations are not available in Eureka5.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">Since verbalized rebuses are produced from textual contents only, crossword definitions may refer to different word meanings (e.g. [Two soccer players] is used to represent the word "wings" in Figure 1, despite not matching the word sense "bird wings" of the original image). This does not affect the validity of our task.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">Composed of Test id and Test ood, described in Section 5.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://hf.co/spaces/FinancialSupport/open_ita_llm_leaderboard</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">https://github.com/unslothai/unsloth</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">An English example is available in Table 9.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">In practice, we define this as 1 − CER(FP, S), where CER is the character error rate<ref type="bibr" target="#b37">[37]</ref> between the two sequences (lowercased, whitespace removed), computed with Jiwer.</note>
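			<note xmlns="http://www.tei-c.org/ns/1.0" type="other">The quantity 1 − CER(FP, S) from the footnote above can be illustrated with a short, dependency-free Python sketch. The footnote states that the score is computed with Jiwer; the code below instead uses a plain Levenshtein distance as a stand-in, so it should be read as an approximation of the idea rather than the authors' implementation. The function names, the choice to normalize by the length of the first argument, and the example solution string (obtained here by applying the key 11 5 to the first pass of the English example) are assumptions.

```python
def normalize(text: str) -> str:
    """Lowercase the text and drop all whitespace, as described in the footnote."""
    return "".join(text.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def first_pass_solution_similarity(first_pass: str, solution: str) -> float:
    """Return 1 - CER, treating the first pass as the reference sequence (assumption)."""
    reference = normalize(first_pass)
    hypothesis = normalize(solution)
    if not reference:
        raise ValueError("the first pass must contain at least one character")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


# First pass of the English example, and the solution derived from its key (11 5).
similarity = first_pass_solution_similarity("M ali N coni cane NIA", "Malinconica nenia")
print(similarity)
```

Because the solution only re-segments the letters of the first pass, the two normalized strings coincide and the score is 1.0. Note that under this definition the score can become negative when the solution is much longer than the first pass.</note>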
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_11">For example, rebuses requiring first pass anagrams (anarebus) or dynamic relations derived from multi-scene analysis (stereorebus)</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Gabriele Sarti and Arianna Bisazza acknowledge the support of the Dutch Research Council (NWO) for the project InDeep (NWA.1292.19.399). Arianna Bisazza is further supported by the NWO Talent Programme (VI.Vidi.221C.009). We are grateful to the Associazione Culturale "Biblioteca Enigmistica Italiana - G. Panini" for making its rebus collection freely accessible on the Eureka5 platform, and to Valeriya Zelenkova for her valuable comments on the first version of this work. We also thank the CLiC-it 2024 reviewers for their valuable feedback.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Additional Model Generations</head><p>Table <ref type="table">9</ref> presents an English translation of the example in Figure <ref type="figure">1</ref> using the prompt format adopted in this study.</p><p>Tables <ref type="table">10 and 11</ref> provide additional examples of LLM generations for tested rebuses. The example in Table 11 (bottom) is OOD due to the word manovella (crank) in D2, while the others are ID for the fine-tuned model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Prompt</head><p>Solve the clues provided between parentheses to obtain a first pass, and use the solution key to obtain the rebus' solution.</p><p>Rebus: M [Two attacking footballers] N [Used for eating ice cream] [Barks and bites] NIA</p><p>Solution key: 11 5</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Generation</head><p>Let's proceed to solve the rebus step by step:</p><p>- M = M</p><p>- [Two attacking footballers] = ali (wings)</p><p>- N = N</p><p>- [Used for eating ice cream] = coni (cones)</p><p>- [Barks and bites] = cane (dog)</p><p>- N I A = N I A</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Mastering the game of Go with deep neural networks and tree search</title>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Maddison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sifre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Van Den Driessche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schrittwieser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Antonoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Panneershelvam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lanctot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dieleman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Grewe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kalchbrenner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lillicrap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Leach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Graepel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hassabis</surname></persName>
		</author>
		<idno type="DOI">10.1038/nature16961</idno>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">529</biblScope>
			<biblScope unit="page" from="484" to="489" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A general reinforcement learning algorithm that masters chess, shogi, and go through self-play</title>
		<author>
			<persName><forename type="first">D</forename><surname>Silver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schrittwieser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Antonoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lanctot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sifre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kumaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Graepel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lillicrap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hassabis</surname></persName>
		</author>
		<idno type="DOI">10.1126/science.aar6404</idno>
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">362</biblScope>
			<biblScope unit="page" from="1140" to="1144" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Decrypting cryptic crosswords: Semantically complex wordplay puzzles as a target for nlp</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rozner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Potts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mahowald</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2021/file/5f1d3986fae10ed2994d14ecd89892d7-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Beygelzimer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Dauphin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Vaughan</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="11409" to="11421" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Automated crossword solving</title>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tomlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pathak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ginsberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Klein</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.acl-long.219</idno>
		<ptr target="https://aclanthology.org/2022.acl-long.219" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Villavicencio</surname></persName>
		</editor>
		<meeting>the 60th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="3073" to="3085" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Tolosani</surname></persName>
		</author>
		<title level="m">Enimmistica</title>
		<imprint>
			<publisher>Hoepli</publisher>
			<pubPlace>Milan</pubPlace>
			<date type="published" when="1901">1901</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Miola</surname></persName>
		</author>
		<title level="m">Che cos&apos;è un rebus</title>
				<imprint>
			<publisher>Carocci</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Parole in gioco: Per una semiotica del gioco linguistico</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bartezzaghi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>Bompiani</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">L&apos;ora desiata vola: guida al mondo del rebus per solutori (ancora) poco abili</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ichino</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Bompiani</publisher>
			<pubPlace>Milan</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Riddle me this: Evaluating large language models in solving word-based games</title>
		<author>
			<persName><forename type="first">R</forename><surname>Manna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Di Buono</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Monti</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.games-1.11" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Madge</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chamberlain</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Fort</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Lukin</surname></persName>
		</editor>
		<meeting>the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024<address><addrLine>Torino, Italia</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA and ICCL</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="97" to="106" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Puzzle solving using reasoning of large language models: A survey</title>
		<author>
			<persName><forename type="first">P</forename><surname>Giadikiaroglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lymperaiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Filandrianos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stamou</surname></persName>
		</author>
		<idno>ArXiv</idno>
		<ptr target="https://arxiv.org/abs/2402.11291" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A probabilistic approach to solving crossword puzzles</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Littman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">A</forename><surname>Keim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0004-3702(01)00114-X</idno>
		<ptr target="https://doi.org/10.1016/S0004-3702(01)00114-X" />
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">134</biblScope>
			<biblScope unit="page">23</biblScope>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Webcrow: A web-based system for crossword solving</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<idno type="DOI">10.1007/11590323_37</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/11590323_37" />
	</analytic>
	<monogr>
		<title level="m">AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Are llms good cryptic crossword solvers?</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Boda</forename><surname>Sadallah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kotova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kochmar</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2403.12094" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Todd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Merino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Earle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Togelius</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2404.11730" />
		<title level="m">Missed connections: Lateral thinking puzzles for large language models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Finding the optimal human strategy for wordle using maximum correct letter probabilities and reinforcement learning</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Meyer</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2202.00557" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">The webcrow french crossword solver</title>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Iaquinta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Stehlé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Simoes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-55722-4_14</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/978-3-031-55722-4_14" />
	</analytic>
	<monogr>
		<title level="m">Intelligent Technologies for Interactive Entertainment</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Die rätselrevolution: Automated german crossword solving</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Röthenbacher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Klede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Eskofier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zanca</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3596" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</title>
				<meeting>the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Solving italian crosswords using the web</title>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<idno type="DOI">10.1007/11558590_40</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/11558590_40" />
	</analytic>
	<monogr>
		<title level="m">International Conference of the Italian Association for Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Clue-instruct: Text-based clue generation for educational crossword puzzles</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Kadali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.lrec-main.297" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Hoste</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Sakti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Xue</surname></persName>
		</editor>
		<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)<address><addrLine>Torino, Italia</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA and ICCL</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="3347" to="3356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Italian crossword generator: Enhancing education through interactive word puzzles</title>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Iaquinta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zanollo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3596" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</title>
				<meeting>the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">A Turkish educational crossword puzzle generator</title>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">G</forename><surname>Keptig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2405.07035</idno>
		<ptr target="https://arxiv.org/abs/2405.07035v2" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Ghigliottin-AI@EVALITA2020: Evaluating artificial players for the language game &quot;La Ghigliottina&quot;</title>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lovetere</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Monti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pascucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sangati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Siciliani</surname></persName>
		</author>
		<idno type="DOI">10.4000/books.aaccademia.7488</idno>
		<ptr target="https://doi.org/10.4000/books.aaccademia.7488" />
	</analytic>
	<monogr>
		<title level="m">EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>short paper</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Solving a complex language game by using knowledge-based word associations discovery</title>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Gemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<idno type="DOI">10.1109/TCIAIG.2014.2355859</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Computational Intelligence and AI in Games</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="13" to="26" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">The SIFo benchmark: Investigating the sequential instruction following ability of large language models</title>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Eustratiadis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Monz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bisazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Rijke</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2406.19999" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Fine-tuning large language models with sequential instructions</title>
		<author>
			<persName><forename type="first">H</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Ponti</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2403.07794" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Henighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sigler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Litwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Berner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>McCandlish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Balcan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Chain-of-thought prompting elicits reasoning in large language models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ichter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Koyejo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Mohamed</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Agarwal</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Belgrave</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="24824" to="24837" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Abdin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Jacobs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Awan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Aneja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Awadallah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Awadalla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bahree</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bakhtiari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Behl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Benhaim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bilenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bjorck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bubeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Cai</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2404.14219" />
		<title level="m">Phi-3 technical report: A highly capable language model locally on your phone</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv</note>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">LoRA: Low-rank adaptation of large language models</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=nZeVKeeFYf9" />
	</analytic>
	<monogr>
		<title level="m">The Tenth International Conference on Learning Representations (ICLR 2022)</title>
				<meeting><address><addrLine>OpenReview, Online</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">QLoRA: Efficient finetuning of quantized LLMs</title>
		<author>
			<persName><forename type="first">T</forename><surname>Dettmers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pagnoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Holtzman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2023/file/1feb87871436031bdc0f2beaa62a049b-Paper-Conference.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Naumann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Globerson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Saenko</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="10088" to="10115" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>von Platen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jernite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Le Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Drame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Lhoest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rush</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-demos.6</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-demos.6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
				<editor>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schlangen</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations<address><addrLine>Online</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<ptr target="https://openai.com/index/hello-gpt-4o" />
		<title level="m">Hello GPT-4o</title>
				<imprint>
			<publisher>Website</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<ptr target="https://www.anthropic.com/news/claude-3-5-sonnet" />
		<title level="m">Claude 3.5 Sonnet</title>
				<imprint>
			<publisher>Website</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>Anthropic</note>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<title level="m" type="main">Qwen2 technical report</title>
		<author>
			<persName><forename type="first">A</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Dang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Men</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Fan</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2407.10671" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">Qwen2 technical report</note>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><surname>Meta AI</surname></persName>
		</author>
		<ptr target="https://ai.meta.com/blog/meta-llama-3" />
		<title level="m">Introducing Meta Llama 3: The most capable openly available LLM to date</title>
				<imprint>
			<publisher>Website</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<title level="m" type="main">Disce aut deficere: Evaluating LLMs proficiency on the INVALSI Italian benchmark</title>
		<author>
			<persName><forename type="first">F</forename><surname>Mercorio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mezzanzanica</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Potertì</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Serino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Seveso</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2406.17535" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<title level="m" type="main">From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition</title>
		<author>
			<persName><forename type="first">A</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Maier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Green</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">The PAISÀ corpus of Italian web texts</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lyding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stemle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Borghetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brunello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Castagnoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'Orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Dittmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pirrelli</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/W14-0406</idno>
		<ptr target="https://aclanthology.org/W14-0406" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th Web as Corpus Workshop (WaC-9)</title>
				<editor>
			<persName><forename type="first">F</forename><surname>Bildhauer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Schäfer</surname></persName>
		</editor>
		<meeting>the 9th Web as Corpus Workshop (WaC-9)<address><addrLine>Gothenburg, Sweden</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="36" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Crossword puzzle resolution via Monte Carlo tree search</title>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Song</surname></persName>
		</author>
		<idno type="DOI">10.1609/icaps.v32i1.19783</idno>
		<ptr target="https://ojs.aaai.org/index.php/ICAPS/article/view/19783" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Automated Planning and Scheduling</title>
		<meeting>the International Conference on Automated Planning and Scheduling</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="35" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Ferrando</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bisazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Costa-jussà</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2405.00208" />
		<title level="m">A primer on the inner workings of transformer-based language models</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv</note>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Teoria statistica delle classi e calcolo delle probabilità</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bonferroni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pubblicazioni del R. Istituto Superiore di Scienze Economiche e Commerciali di Firenze</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="3" to="62" />
			<date type="published" when="1936">1936</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
