<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">EurekaRebus - Verbalized Rebus Solving with LLMs: A CALAMITA Challenge</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gabriele</forename><surname>Sarti</surname></persName>
							<email>g.sarti@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tommaso</forename><surname>Caselli</surname></persName>
							<email>t.caselli@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Arianna</forename><surname>Bisazza</surname></persName>
							<email>a.bisazza@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
							<email>m.nissim@rug.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Language and Cognition (CLCG)</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<addrLine>Oude Kijk in &apos;t Jatstraat 26</addrLine>
									<postCode>9712EK</postCode>
									<settlement>Groningen</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">EurekaRebus - Verbalized Rebus Solving with LLMs: A CALAMITA Challenge</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0B7CE57944EC2733BD3C52282D569247</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Large language models</term>
					<term>Sequential reasoning</term>
					<term>Puzzle</term>
					<term>Rebus</term>
					<term>Crosswords</term>
					<term>Enigmistica Italiana</term>
					<term>CALAMITA</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Language games can be valuable resources for testing the ability of large language models (LLMs) to conduct challenging multi-step, knowledge-intensive inferences while respecting predefined constraints. Our proposed challenge prompts LLMs to reason step-by-step to solve verbalized variants of rebus games recently introduced with the EurekaRebus dataset <ref type="bibr" target="#b0">[1]</ref>. Verbalized rebuses replace visual cues with crossword definitions to create an encrypted first pass, making the problem entirely text-based. We introduce a simplified task variant with word length hints and adopt a comprehensive set of metrics to obtain a granular overview of models' performance in knowledge recall, constraint adherence, and re-segmentation abilities across reasoning steps.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Challenge: Introduction and Motivation</head><p>Language games have been adopted as testbeds for measuring NLP progress in recent years <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>, with a particular focus on (cryptic) crossword solving in English <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref>. For the Italian language, initial efforts focused on crossword solving and generation <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref> and clue-based word guessing <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b8">9]</ref>. Recently, Sarti et al. <ref type="bibr" target="#b0">[1]</ref> introduced an extensive collection of text-adapted Italian rebus puzzles to evaluate large language models' (LLMs) knowledge and sequential reasoning abilities. Rebuses are complex puzzles combining visual elements and graphic signs to encode a hidden phrase. Italian boasts a rich and long-standing rebus tradition dating back to the 19th century <ref type="bibr" target="#b13">[14]</ref>, popularized by widely circulated magazines such as La Settimana Enigmistica 1 . The structure of Italian rebuses has, with time, been formalized into aesthetic canons <ref type="bibr" target="#b14">[15]</ref>, and their peculiarities and design principles were analyzed by several authors <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>.</p><p>In Italian rebuses, solving begins by combining graphemes with their underlying visual elements in a left-to-right fashion, composing a first pass (prima lettura) representing an intermediate solution of the puzzle. 
Then, first pass elements are re-segmented according to the lengths specified by the solution key to produce the final solution phrase.</p><p>This work proposes to adopt the EurekaRebus dataset introduced by Sarti et al. <ref type="bibr" target="#b0">[1]</ref> to extend their evaluation of LLMs' multi-step reasoning and linguistic/cultural awareness to the systems evaluated as part of the CALAMITA evaluation campaign <ref type="bibr" target="#b18">[19]</ref>. We believe the task is particularly relevant since the crossword definitions that compose verbalized rebuses rely heavily on idiomatic expressions, wordplay, and cultural references specific to Italian. Hence, the results of this task could provide valuable insights into the linguistic and cultural competence of LLMs trained on the Italian language. Moreover, the task is especially appealing since it is framed in a templated reasoning format, enabling us to disentangle the various components required to successfully solve a verbalized rebus step-by-step. More specifically, several metrics will be employed to assess LLMs' factual recall, textual concatenation and re-segmentation capabilities, and, finally, constraint satisfaction given the provided cues.</p><p>In light of the results reported by <ref type="bibr" target="#b0">[1]</ref> for state-of-the-art proprietary LLMs, we expect all tested open-source systems to perform very poorly, with final solution accuracies well below 30%. We also note that the highest reported overall performance in previous work 2 was found by the original authors to be primarily the product of memorization. We anticipate that this challenge will highlight significant limitations in LLMs' current factual recall and multi-step reasoning ability and act as a catalyst for future improvements in these areas.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Challenge: Description</head><p>The proposed challenge aims to evaluate the capabilities of existing LLMs in solving verbalized Italian rebuses via prompting at various granularity levels. More specifically, LLMs will be evaluated in a few-shot prompting setting with two fixed in-context learning examples pre-selected at random from the available pool of verbalized rebuses in EurekaRebus, in two settings:</p><p>• Regular, matching the example in Table <ref type="table">1</ref> and the original input format used by Sarti et al. <ref type="bibr" target="#b0">[1]</ref>. • Hints, in which the number of characters of every hidden word is provided alongside the definitions in the verbalized rebus to help the model identify the correct choice. This variant was not tested by Sarti et al. <ref type="bibr" target="#b0">[1]</ref>.</p><p>Refer to Section 3.3 for the respective example formats. Models will be evaluated on their performance at each step required to successfully solve the verbalized rebus and on their overall ability to produce correct final solutions.</p><p>2 Namely, 58% Solution Exact Match for a LLaMA-3.1 8B model LoRA-tuned on 80k EurekaRebus examples <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref>.</p></div>
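The two settings differ only in whether a character-count hint is appended inside each bracketed definition. As a minimal sketch (not the challenge's actual implementation; the function names and prompt-assembly details are our own assumptions), the Hints variant and the 2-shot prompt could be constructed as follows:

```python
import re

def add_length_hints(verbalized_rebus: str, word_lengths: list[int]) -> str:
    """Append a hint like ' (5)' inside each bracketed definition,
    in order of occurrence (Hints setting)."""
    lengths = iter(word_lengths)
    return re.sub(
        r"\[([^\]]+)\]",
        lambda m: f"[{m.group(1)} ({next(lengths)})]",
        verbalized_rebus,
    )

def build_prompt(task_description: str, examples: list[str],
                 rebus: str, solution_key: str) -> str:
    """Assemble a few-shot prompt: task description, the two fixed
    in-context examples, then the target puzzle."""
    shots = "\n\n".join(f"# Esempio {i + 1}:\n{ex}" for i, ex in enumerate(examples))
    return (f"{task_description}\n\n{shots}\n\n# Ora tocca a te!\n"
            f"Rebus: {rebus}\nChiave di lettura: {solution_key}")
```

Applied to the example of Table 1, `add_length_hints` turns `[Un mollusco nell'insalata di mare]` into `[Un mollusco nell'insalata di mare (5)]`, matching the bracketed hints shown in the prompt template.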
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data description</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Origin of data</head><p>The dataset used for this challenge is an extended version of EurekaRebus <ref type="bibr" target="#b0">[1]</ref>, a collection of 222,089 unique Italian rebuses extracted from the Eureka5 platform<ref type="foot" target="#foot_0">3</ref>, an open database of rebuses and other linguistic puzzles maintained by the Associazione Culturale "Biblioteca Enigmistica Italiana - G. Panini"<ref type="foot" target="#foot_1">4</ref>. Among these, 83,157 were converted by the original authors into verbalized form by leveraging the crossword definitions from the ItaCW collection <ref type="bibr" target="#b9">[10]</ref>, which includes 125,202 definition-solution pairs. While Sarti et al. <ref type="bibr" target="#b0">[1]</ref> evaluated the performance of prompted and tuned LLMs on rebuses released up to June 17th, 2024, the current test set includes 168 new unseen examples released on Eureka5 after that date.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Annotation details</head><p>We employ the same procedure as Sarti et al. <ref type="bibr" target="#b0">[1]</ref> for verbalizing the available rebuses. More specifically, only rebuses whose words all appear in lowercased or camel-cased form among the ItaCW solutions are selected, and every word is replaced by sampling one of its available crossword definitions at random. <ref type="foot" target="#foot_2">5</ref> Moreover, only regular rebuses containing at least two hidden words are selected, avoiding examples requiring a single definition-solving step as well as those with more complex templates (e.g., anarebuses, which use anagrams of hidden words for the solution).</p></div>
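The filtering and sampling procedure described above can be sketched as follows. The encoding of a rebus first pass as alternating literal letter runs and hidden words is a hypothetical representation of our own; the original implementation may differ:

```python
import random

def eligible(hidden_words, definitions_by_word):
    """A rebus is verbalized only if it contains at least two hidden
    words and every hidden word appears among the ItaCW solutions."""
    return len(hidden_words) >= 2 and all(w in definitions_by_word for w in hidden_words)

def verbalize(first_pass_items, definitions_by_word, rng=random):
    """Replace each hidden word with one of its crossword definitions,
    sampled at random; leave literal letter runs untouched.
    `first_pass_items` alternates ('letters', 'AC') / ('word', 'cozza')
    pairs (a hypothetical encoding of the first pass)."""
    out = []
    for kind, value in first_pass_items:
        if kind == "word":
            out.append("[" + rng.choice(definitions_by_word[value]) + "]")
        else:
            out.append(value)
    return " ".join(out)
```

With the definitions of the example in Table 1, the first pass `AC cozza GLI edile S TO fanti` verbalizes to `AC [Un mollusco nell'insalata di mare] GLI [Lo è l'operaio che lavora in cantiere] S TO [Soldati da trincea]`.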
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Data format</head><p>Each example in the dataset consists of:</p><p>• The verbalized rebus (verbalized_rebus) containing letters from the original rebus and crossword-style definitions enclosed in square brackets. • The solution words obtained after re-segmenting the first pass according to the solution key, provided as a semicolon-separated string in order of occurrence (solution_words). • The solution of the verbalized rebus used as the final prediction target for the LLM (solution).</p><p>An example is provided in Listing 1.</p></div>
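Assuming the field layout above, a single record and a consistency check between the `solution` and `solution_words` fields might look like this. The record contents follow the example of Table 1; the checking helper is our own illustration, not part of the released dataset:

```python
def check_example(example: dict) -> bool:
    """Verify that a record's fields are mutually consistent: the
    whitespace-separated `solution` must match the semicolon-separated
    `solution_words` in order of occurrence."""
    return example["solution"].split() == example["solution_words"].split(";")

# Illustrative record, following the example rebus of Table 1
example = {
    "verbalized_rebus": "AC [Un mollusco nell'insalata di mare] GLI "
                        "[Lo è l'operaio che lavora in cantiere] S TO "
                        "[Soldati da trincea]",
    "solution_words": "accozzaglie;di;lestofanti",
    "solution": "accozzaglie di lestofanti",
}
```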
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Prompting</head><p>Table <ref type="table">1</ref> shows the 2-shot prompting template adopted for generating a templated solution with the tested LLMs.</p><p>The second in-context example used in the template, omitted for brevity, corresponds to the one shown in Listing 1.</p><p>The task description provided to the model was derived from a trial-and-error process starting from the original prompt by Sarti et al. <ref type="bibr" target="#b0">[1]</ref>. Notably, compared to the original authors, our task description describes the individual components of the rebus in more detail to provide a clearer overview of the task to the LLM. We opted for a 2-shot setting, as opposed to the 5-shot prompting employed by Sarti et al. <ref type="bibr" target="#b0">[1]</ref>, to accommodate the limited context length of some of the tested LLMs, thus ensuring that the total length after model generation does not exceed 1024 tokens (the LLaMA 3 tokenizer was used to perform this estimate). The two in-context examples remain the same as those shown here to simplify evaluation and ensure consistent results.</p><p>Verbalized rebus solving steps Table <ref type="table">1</ref> provides labels for the steps necessary to solve the verbalized rebus that are considered in this challenge. The model receives a problem input including a verbalized rebus (possibly with length hints) and a solution key (chiave di lettura). The first step involves resolving crossword definitions in order (Definition resolution), exploiting only the model's parametric knowledge. Then, the resolved words need to be infilled into the original rebus to compose the first pass, and re-segmented in the Solution segmentation step. Finally, the individual solution words are reassembled into a single solution string. 
While prompted models should achieve similar performance across all test subsets, the aforementioned division will enable further comparisons with previously trained systems.</p></div>
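The three solving steps described above (definition resolution, first-pass composition, solution segmentation) can be sketched programmatically; here a hypothetical `answers` lookup table stands in for the model's parametric knowledge, and the function is our illustrative reconstruction rather than the challenge's evaluation code:

```python
import re

def solve(verbalized_rebus: str, solution_key: str, answers: dict) -> str:
    """Sketch of the templated solving steps:
    1. Definition resolution: replace each bracketed definition with its
       answer (looked up in `answers`, standing in for model knowledge).
    2. First pass: strip spaces to obtain the letter sequence.
    3. Solution segmentation: cut the sequence at the lengths given by
       the solution key and rejoin with spaces."""
    # Step 1: resolve each [definition] to its hidden word
    first_pass = re.sub(r"\[([^\]]+)\]", lambda m: answers[m.group(1)], verbalized_rebus)
    # Step 2: concatenate the first pass into one letter sequence
    letters = first_pass.replace(" ", "").lower()
    # Step 3: re-segment according to the solution key
    words, pos = [], 0
    for n in map(int, solution_key.split()):
        words.append(letters[pos:pos + n])
        pos += n
    return " ".join(words)
```

On the example of Table 1, resolving the definitions yields the first pass `AC cozza GLI edile S TO fanti`, which the key `11 2 10` re-segments into `accozzaglie di lestofanti`.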
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Detailed data statistics</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Metrics</head><p>The challenge employs a comprehensive set of metrics adapted from the original evaluation of <ref type="bibr" target="#b0">[1]</ref>:</p></div>
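While the exact metric definitions follow <ref type="bibr" target="#b0">[1]</ref>, plausible implementations of the accuracy-style metrics reported in Table 3 (word-level accuracy, exact match on the first pass or final solution, and word length adherence to the solution key) could look as follows. These are our own sketches under stated assumptions, not the official scoring code:

```python
def word_accuracy(pred_words, gold_words):
    """Fraction of positions where the predicted word matches the gold
    one (usable for definition answers and for solution words)."""
    return sum(p == g for p, g in zip(pred_words, gold_words)) / max(len(gold_words), 1)

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings coincide (first pass / solution)."""
    return float(pred.strip().lower() == gold.strip().lower())

def length_adherence(pred_solution: str, solution_key: str) -> float:
    """Fraction of predicted solution words whose length matches the
    corresponding entry of the solution key."""
    lengths = [int(n) for n in solution_key.split()]
    words = pred_solution.split()
    return sum(len(w) == n for w, n in zip(words, lengths)) / max(len(lengths), 1)
```

For instance, a prediction resolving only two of the three definitions of the Table 1 example correctly would score a word accuracy of 2/3 while still potentially satisfying all length constraints.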
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Prompt template</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sei un'esperto risolutore di giochi enigmistici. Il seguente gioco contiene una frase (Rebus) nella quale alcune parole sono state sostituite da indizi tra parentesi quadre. I numeri in ogni indizio rappresentano la lunghezza della parola nascosta.</head><p>Il tuo compito è quello di identificare le parole nascoste e sostituirle agli indizi nel Rebus, producendo una prima lettura dalla quale poi si deriverà una frase risolutiva. La chiave di lettura è una sequenza di numeri che rappresentano la rispettive lunghezze delle parole che compongono la frase risolutiva. La tua risposta deve essere una frase risolutiva sensata e che rispetti le lunghezze definite nella chiave di lettura.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>First example</head><formula xml:id="formula_0"># Esempio 1: Problem input</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Rebus: AC [Un mollusco nell'insalata di mare (5)] GLI [Lo è l'operaio che lavora in cantiere (5)] S TO [Soldati da trincea (5)]</head><p>Chiave di lettura: 11 2 10 Procediamo alla risoluzione del rebus passo per passo: ... (same format as the first example) The Solution Match metric will be used as the primary metric of correctness, since it captures the model's ability to fully solve the verbalized rebus. While no baseline evaluation was conducted for the new test set used in this challenge, we expect the performance of the most capable open-source systems to align with that of the 5-shot prompted LLaMA-3 70B and Qwen-2 72B models reported by Sarti et al. <ref type="bibr" target="#b0">[1]</ref>. </p><formula xml:id="formula_1">Definition resolution: -A C = A C -[Un mollusco nell'insalata di mare] = cozza -G L I = G L I -[Lo è l'</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>operaio che lavora in cantiere] = edile -S T O = S T O -[Soldati da trincea] = fanti</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Baseline results for LLaMA-3 70B and Qwen-2 72B on the original test set, adapted from Sarti et al. <ref type="bibr" target="#b0">[1]</ref>.</p><p>The results show that current models struggle to complete the task, primarily due to incorrect word guesses, with errors propagating across resolution steps and ultimately resulting in a final accuracy of 0%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Limitations</head><p>Several limitations should be considered when interpreting the results of this challenge:</p><p>Verbalization Simplification The use of verbalized rebuses, while necessary for text-based LLMs, simplifies the original visual puzzle. It does not fully capture the complexity of solving traditional rebuses, which rely on visual cues and cultural knowledge, making verbalized rebus solving a much simpler proxy for the multi-step reasoning required by regular rebuses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cultural Specificity</head><p>The selected rebuses and crossword definitions rely heavily on Italian-specific linguistic and cultural background. Performance on this task may not generalize to other languages or puzzle types, and it might be unrealistic to expect general-purpose LLMs to possess the specific lexicon and knowledge used for rebus solving.</p><p>Prompt Sensitivity While the selected prompt template was observed to perform well for capable proprietary LLMs in preliminary tests, there are no guarantees that the instructions provided in the prompt are sufficient for smaller open-source models to perform verbalized rebus solving proficiently. Moreover, alternative prompt formulations could lead to potentially better results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lack of Human Baseline</head><p>The challenge currently lacks a clear human performance baseline, which would be valuable for contextualizing model performance on verbalized rebus solving.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Ethical issues</head><p>While this challenge focuses on a relatively benign task of puzzle-solving, there are some ethical considerations to keep in mind. First, the dataset captures a very narrow subset of Italian language and culture. Hence, evaluation findings should not be overgeneralized to Italian language competence as a whole or to other cultures. This dataset's rebuses and crossword definitions are derived from commercially available published sources. While efforts have been made to ensure this data's exclusive, fair usage for research purposes, there may be copyright considerations to address.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Data license and copyright issues</head><p>As reported by the original EurekaRebus dataset license, the data is redistributed for research purposes only with the explicit approval of the Associazione Culturale "Biblioteca Enigmistica Italiana - G. Panini" (henceforth, the Association), and the rights to each entry in the EurekaRebus collection are the property of the respective copyright holders. The usage and redistribution of these data are allowed only for users providing appropriate attribution to the original copyright holders and the Association, and the creation of derivative works is permitted only for research purposes, under terms no less restrictive than the EurekaRebus license. Researchers are encouraged to contact the challenge organizers with any questions or concerns about data usage and licensing.</p><p>• The Associazione Culturale "Biblioteca Enigmistica Italiana - G. Panini" for making their rebus collection freely accessible on Eureka5. • The creators of the ItaCW dataset for enabling the creation of verbalized rebuses. • The puzzle creators whose work is represented in this dataset.</p><p>Gabriele Sarti and Arianna Bisazza acknowledge the support of the Dutch Research Council (NWO) for the project InDeep (NWA.1292.19.399). Arianna Bisazza is further supported by the NWO Talent Programme (VI.Vidi.221C.009). We hope this challenge will contribute to the diffusion of the art of Italian enigmistica among computational linguistics and artificial intelligence researchers.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Example of a verbalized rebus crafted by combining a rebus first pass (intermediate solution) with crossword definitions. Rebus by Lionello, art by Laura Neri.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 from</head><label>2</label><figDesc>Sarti et al.<ref type="bibr" target="#b0">[1]</ref> reports statistics for the full and verbalized subsets of the EurekaRebus dataset.</figDesc><table><row><cell>Train set contents The training set contains 80,158</cell></row><row><cell>examples, which are ignored for the purpose of the</cell></row><row><cell>CALAMITA campaign, given that no adaptation meth-</cell></row><row><cell>ods are evaluated.</cell></row><row><cell>Test set contents The test set contains 3,167 examples</cell></row><row><cell>divided as follows, in order of appearance:</cell></row><row><cell>• 2000 examples matching the in-domain setting</cell></row><row><cell>for models trained by [1], i.e. containing only first</cell></row><row><cell>pass words seen by all available trained models.</cell></row><row><cell>• 999 examples matching the out-of-distribution</cell></row><row><cell>setting for models trained by [1], i.e. containing</cell></row><row><cell>at least one first pass word unseen during training</cell></row><row><cell>by available trained models.</cell></row><row><cell>• 168 new verbalized rebuses added in EurekaRe-</cell></row><row><cell>bus v1.1, added to the Eureka5 platform after</cell></row><row><cell>June 17th, 2024. These can be either in-domain</cell></row><row><cell>or out-of-distribution for models trained on the</cell></row><row><cell>EurekaRebus's training set.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Statistics for the full EurekaRebus dataset and the crossword-filtered subset used in this work. Avg./SD = Average/standard deviation. Table adapted from Sarti et al. [1].</figDesc><table><row><cell>Answer</cell><cell># Ora tocca a te!</cell></row><row><cell>prefix</cell><cell></cell></row><row><cell></cell><cell>Completa il rebus seguendo il procedimento</cell></row><row><cell></cell><cell>descritto, rispondendo esattamente nello</cell></row><row><cell></cell><cell>stesso formato utilizzato dagli esempi prece-</cell></row><row><cell></cell><cell>denti.</cell></row><row><cell></cell><cell>Rebus: {{verbalized_rebus}} or {{verbal-</cell></row><row><cell></cell><cell>ized_rebus_with_length_hints}}</cell></row><row><cell></cell><cell>Chiave di lettura: {{solution_key}}</cell></row><row><cell>Table 1</cell><cell></cell></row><row><cell cols="2">2-shot prompt used for the CALAMITA evaluation. Blue text</cell></row><row><cell cols="2">represents additions for the evaluation in the Hints setting.</cell></row><row><cell cols="2">Template elements are highlighted next to the first in-context</cell></row><row><cell cols="2">example. Example rebus by Parodi E., Domenica Quiz n. 7</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Model Word Acc. FP Acc. Solution Word Acc. Solution Word Len. Solution Acc.</head><label></label><figDesc>Baseline results for 5-shot prompted LLaMA-3 70B and Qwen-2 72B models on the original test set, adapted from Sarti et al. [1].</figDesc><table><row><cell>LLaMA-3 70B</cell><cell>0.22</cell><cell>0.04</cell><cell>0.03</cell><cell>0.16</cell><cell>0.00</cell></row><row><cell>Qwen-2 72B</cell><cell>0.28</cell><cell>0.04</cell><cell>0.04</cell><cell>0.20</cell><cell>0.00</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">http://www.eureka5.it</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">http://www.enignet.it/home</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">Words in ItaCW can be associated to multiple definitions.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We would like to express our gratitude to the following individuals and organizations:</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Homepages: https://gsarti.com (G. Sarti); https://cs.rug.nl/~bisazza (A. Bisazza); https://malvinanissim.github.io (M. Nissim). ORCID: 0000-0001-8715-2987 (G. Sarti); 0000-0003-2936-0256 (T. Caselli); 0000-0003-1270-3048 (A. Bisazza); 0000-0001-5289-0971 (M. Nissim).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Non verbis, sed rebus: Large language models are weak solvers of Italian rebuses</title>
		<author>
			<persName><forename type="first">G</forename><surname>Sarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nissim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bisazza</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2408.00584" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)</title>
				<editor>
			<persName><forename type="first">F</forename><surname>Dell'Orletta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Montemagni</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Sprugnoli</surname></persName>
		</editor>
		<meeting>the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)<address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Puzzle solving using reasoning of large language models: A survey</title>
		<author>
			<persName><forename type="first">P</forename><surname>Giadikiaroglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lymperaiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Filandrianos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stamou</surname></persName>
		</author>
		<idno>ArXiv</idno>
		<ptr target="https://arxiv.org/abs/2402.11291" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Finding the optimal human strategy for Wordle using maximum correct letter probabilities and reinforcement learning</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Meyer</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2202.00557" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">Arxiv</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Todd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Merino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Earle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Togelius</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2404.11730" />
		<title level="m">Missed connections: Lateral thinking puzzles for large language models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">Arxiv</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Webcrow: A web-based system for crossword solving</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<idno type="DOI">10.1007/11590323_37</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/11590323_37" />
	</analytic>
	<monogr>
		<title level="m">AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Decrypting cryptic crosswords: Semantically complex wordplay puzzles as a target for NLP</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rozner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Potts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mahowald</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2021/file/5f1d3986fae10ed2994d14ecd89892d7-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Beygelzimer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Dauphin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Vaughan</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="11409" to="11421" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Automated crossword solving</title>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tomlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pathak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ginsberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Klein</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.acl-long.219</idno>
		<ptr target="https://aclanthology.org/2022.acl-long.219" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Villavicencio</surname></persName>
		</editor>
		<meeting>the 60th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="3073" to="3085" />
		</imprint>
	</monogr>
	<note>Volume 1: Long Papers. Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Clue-instruct: Text-based clue generation for educational crossword puzzles</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Kadali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.lrec-main.297" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
				<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)<address><addrLine>Torino, Italia</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA and ICCL</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="3347" to="3356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Riddle me this: Evaluating large language models in solving word-based games</title>
		<author>
			<persName><forename type="first">R</forename><surname>Manna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Di Buono</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Monti</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.games-1.11" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Madge</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chamberlain</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Fort</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Lukin</surname></persName>
		</editor>
		<meeting>the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024<address><addrLine>Torino, Italia</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA and ICCL</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="97" to="106" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Italian crossword generator: Enhancing education through interactive word puzzles</title>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Iaquinta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zanollo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3596" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</title>
				<meeting>the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Solving Italian crosswords using the web</title>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<idno type="DOI">10.1007/11558590_40</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/11558590_40" />
	</analytic>
	<monogr>
		<title level="m">International Conference of the Italian Association for Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Ghigliottin-AI@EVALITA2020: Evaluating artificial players for the language game &quot;la ghigliottina&quot;</title>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lovetere</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Monti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pascucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sangati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Siciliani</surname></persName>
		</author>
		<idno type="DOI">10.4000/books.aaccademia.7488</idno>
		<ptr target="https://doi.org/10.4000/books.aaccademia.7488" />
	</analytic>
	<monogr>
		<title level="m">EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>short paper</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Solving a complex language game by using knowledge-based word associations discovery</title>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Gemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<idno type="DOI">10.1109/TCIAIG.2014.2355859</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Computational Intelligence and AI in Games</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="13" to="26" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Tolosani</surname></persName>
		</author>
		<title level="m">Enimmistica</title>
				<meeting><address><addrLine>Hoepli, Milan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1901">1901</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Brighenti</surname></persName>
		</author>
		<ptr target="http://win.cantodellasfinge.net/portale/leonardo/articoli/langense/pag2.asp" />
		<title level="m">I canoni di bellezza nel rebus, Labirinto - Mensile di cultura enigmistica</title>
				<imprint>
			<date type="published" when="1974">1974</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Miola</surname></persName>
		</author>
		<title level="m">Che cos&apos;è un rebus</title>
				<imprint>
			<publisher>Carocci</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Parole in gioco: Per una semiotica del gioco linguistico</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bartezzaghi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>Bompiani</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">L&apos;ora desiata vola: guida al mondo del rebus per solutori (ancora) poco abili</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ichino</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Bompiani</publisher>
			<pubPlace>Milan</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">CALAMITA: Challenge the Abilities of LAnguage Models in ITAlian</title>
		<author>
			<persName><forename type="first">G</forename><surname>Attanasio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Borazio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Francis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gili</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Musacchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nissim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Scalena</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)<address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024-12-06">December 4-6, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<orgName type="collaboration">Meta AI</orgName>
		</author>
		<ptr target="https://ai.meta.com/blog/meta-llama-3" />
		<title level="m">Introducing Meta Llama 3: The most capable openly available LLM to date</title>
				<imprint>
			<publisher>Website</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">LoRA: Low-rank adaptation of large language models</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=nZeVKeeFYf9" />
	</analytic>
	<monogr>
		<title level="m">The Tenth International Conference on Learning Representations (ICLR 2022)</title>
				<meeting><address><addrLine>OpenReview, Online</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
