<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriele Sarti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommaso Caselli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malvina Nissim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arianna Bisazza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Language and Cognition (CLCG), University of Groningen, Oude Kijk in 't Jatstraat 26 Groningen</institution>
          ,
          <addr-line>9712EK</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as LLaMA-3 and GPT-4o perform poorly on this task, ad-hoc fine-tuning seems to improve models' performance. However, we find that performance gains from training are largely motivated by memorization. Our results suggest that rebus solving remains a challenging test bed to evaluate large language models' linguistic proficiency and sequential instruction-following skills.</p>
      </abstract>
      <kwd-group>
        <kwd>Large language models</kwd>
        <kwd>Sequential reasoning</kwd>
        <kwd>Puzzle</kwd>
        <kwd>Rebus</kwd>
        <kwd>Crosswords</kwd>
        <kwd>Enigmistica Italiana</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>including open-source systems and proprietary models,
via few-shot prompting. Moreover, we fine-tune a small
but capable LLM on verbalized rebus solving,
outperforming state-of-the-art systems by a wide margin. Finally, we
conduct a fine-grained assessment of LLMs’ sequential
reasoning steps, explaining model performance in terms
of word complexity and memorization.</p>
      <p>Beyond rebus solving, our evaluation sheds light on the limits of current LLMs in multi-step reasoning settings, highlighting challenges with their application to complex sequential instruction-following scenarios.<sup>1</sup></p>
      <p>In our evaluation, we also adopt few-shot prompting [26] and chain-of-thought reasoning [27], both of which were shown to strongly improve LLMs’ abilities when solving complex multi-step tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <p>Italian Enigmistica and Rebuses. The Italian language is characterized by a rich and long-standing tradition of puzzle games, including rebuses, dating back to the 19th century [<xref ref-type="bibr" rid="ref24">5</xref>].<sup>2</sup> In Italian rebuses, a first pass (prima lettura) representing an intermediate solution of the puzzle is produced by combining graphemes with underlying image elements in a left-to-right direction (Figure 1). Then, the letters and words of the first pass undergo a re-segmentation (cesura) according to a solution key (chiave di lettura<sup>3</sup>), which specifies the length of words in the solution (frase risolutiva). The verbalized rebuses we introduce in this work are variants of textual rebuses (rebus descritto or verbis), where the text-based puzzle is crafted by replacing first pass words with their crossword definitions in a templated format (Figure 1).</p>
      <p>Data. We begin by extracting all rebuses’ first passes and solutions available on Eureka5,<sup>5</sup> an online repository of Italian puzzles. We refer to the resulting dataset, containing 223k unique rebuses sourced from various publications, as EurekaRebus. For crossword definitions, we use ItaCW [20], containing 125k unique definition-word pairs. We select only EurekaRebus examples in which all first pass words match an existing ItaCW definition to enable verbalization, retaining 83,157 examples for our modeling experiments.<sup>6</sup> Since several ItaCW words are associated with multiple definitions, we randomly sample definitions to promote diversity in the resulting verbalized rebuses. A test set of 2k examples<sup>7</sup> is kept aside for evaluation, and the remaining 81k examples are used for model training.</p>
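      <p>The re-segmentation step described above, in which the first pass is cut into solution words according to the solution key, can be illustrated with a minimal sketch. The function name and the uppercase normalization are assumptions made for illustration and are not taken from the released code; the example rebus is the one reported in Appendix C.</p>
      <preformat><![CDATA[
```python
def resegment(first_pass_tokens, solution_key):
    """Join first-pass letters and words, then cut them by the key lengths.

    first_pass_tokens: letters and solved definition words, left to right.
    solution_key: list of word lengths (the "chiave di lettura").
    """
    # Assumption: casing and spaces carry no information for the cut.
    letters = "".join(first_pass_tokens).replace(" ", "").upper()
    if sum(solution_key) != len(letters):
        raise ValueError("solution key does not cover the first pass")
    words, start = [], 0
    for length in solution_key:
        words.append(letters[start:start + length])
        start += length
    return words


# MOR + Talia + L + luci + NO + geni, with solution key 7 12.
tokens = ["MOR", "Talia", "L", "luci", "NO", "geni"]
print(resegment(tokens, [7, 12]))  # ['MORTALI', 'ALLUCINOGENI']
```
]]></preformat>
      <p>The sketch only covers the final cut. The hard part of the task, resolving each definition into the right first pass word, is what the models are evaluated on.</p>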
      <p>Linguistic Puzzles as NLP Progress Metrics. Language games have recently been adopted as challenging tasks for LLM evaluation [<xref ref-type="bibr" rid="ref2 ref35 ref85">3, 9, 10</xref>]. While works in this area have historically focused on English crosswords [11, 12, 4, 13], recent tests focus on a more diverse set of games such as the New York Times’ “Connections” [<xref ref-type="bibr" rid="ref88">14</xref>] and “Wordle” [15]. Automatic crossword solvers were also developed for French [16], German [<xref ref-type="bibr" rid="ref52">17</xref>] and Italian [18, 19], while didactic crossword generators are available for Italian [20] and Turkish [21]. Relatedly, the Italian evaluation campaign EVALITA<sup>4</sup> recently hosted two shared tasks focusing on the word-guessing game “La Ghigliottina” (The Guillotine) [22, 23]. To our knowledge, our work is the first to attempt the computational modeling and evaluation of rebus-solving systems.</p>
      <p>Models. We fine-tune Phi-3 Mini 3.8B 4K [28], the most capable LLM below 4B parameters for a wide range of Italian language tasks.<sup>8</sup> We use quantized low-rank adapters (QLoRA; 29, 30) for efficient fine-tuning with Unsloth<sup>9</sup> and Transformers [31], training the model for 5,000 steps with a batch size of 16 over 81k examples. To compare our model's performance, we select GPT-4o [32] and Claude-3.5 Sonnet [33] as the current state-of-the-art for proprietary LLMs and the instruction-tuned variants of Qwen-2 72B [34] and LLaMA-3 70B [35] as the best-performing open-source LLMs according to the Invalsi Italian benchmark [36]. These four systems are used as untrained baselines thanks to their instruction-following abilities and prompted for rebus solving in a few-shot setting.</p>
      <p>Importantly, language games such as rebuses are not easily translatable into other languages due to their structural and cultural elements. This makes them a scarce but valuable resource for language-specific evaluations of language processing systems.</p>
      <p>Format. Table 1 presents an example in the templated format used for fine-tuning Phi-3.<sup>10</sup> The model is prompted to reason step-by-step by 1) solving crossword definitions sequentially (definition resolution); 2) producing a first pass copying letters and definitions’ words;</p>
      <sec id="sec-3-1">
        <title>LLMs as Sequential Reasoners</title>
        <p>State-of-the-art LLMs were shown to struggle to follow sequential instructions presented in a single query [24], but their performances improved significantly with ad-hoc training [25]. This acts as an initial motivation for our rebus-solving fine-tuning experiments.</p>
        <p><sup>1</sup> Code, data and models are available on Github and Huggingface.</p>
        <p><sup>2</sup> Refer to Miola [6], Bartezzaghi [7], Ichino [<xref ref-type="bibr" rid="ref56 ref94">8</xref>] for a comprehensive overview of peculiarities and norms in modern Italian rebuses.</p>
        <p><sup>3</sup> Referred to as diagramma in jargon.</p>
        <p><sup>4</sup> https://www.evalita.it</p>
        <p><sup>5</sup> http://www.eureka5.it, additional details in Appendix A. Rebus illustrations are not available in Eureka5.</p>
        <p><sup>6</sup> Since verbalized rebuses are produced from textual contents only, crossword definitions may refer to different word meanings (e.g. [Two soccer players] is used to represent the word “wings” in Figure 1 despite not matching the word sense “bird wings” of the original image). This does not affect the validity of our task.</p>
        <p><sup>7</sup> Composed of Test id and Test ood, described in Section 5.</p>
        <p><sup>8</sup> https://hf.co/spaces/FinancialSupport/open_ita_llm_leaderboard</p>
        <p><sup>9</sup> https://github.com/unslothai/unsloth</p>
        <p><sup>10</sup> An English example is available in Table 9.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Prompt</title>
        <sec id="sec-3-2-1">
          <title>Risolvi gli indizi tra parentesi per ottenere una prima lettura, e usa la chiave di lettura per ottenere la soluzione del rebus.</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>Rebus: U [Lo è il passacavallo] LO [È fatta di vimini] F F [Decimi di chilo] S [Disusato soprabito] A [Un rampicante dei Tropici] Chiave di lettura: 3 6 12 8</title>
          <list list-type="bullet">
            <list-item><p>First Pass Words/Letter Accuracy: Proportion of correct words and letters in the generated first pass. Lower scores may indicate issues with assembling a first pass from previous information.</p></list-item>
            <list-item><p>First Pass Exact Match (EM): Proportion of generated first passes matching the gold reference.</p></list-item>
            <list-item><p>Solution Key Match: Proportion of generated solution words matching the lengths specified by the solution key. Lower scores may indicate difficulty in respecting the given length constraints.</p></list-item>
            <list-item><p>Solution First Pass Match: Proportion of first pass characters employed to construct solution words. Lower scores indicate issues with using generated first pass characters in the solution.<sup>11</sup></p></list-item>
            <list-item><p>Solution Words Accuracy: Proportion of correct words in the generated solution.</p></list-item>
            <list-item><p>Solution Exact Match (EM): Proportion of generated solutions matching the gold reference.</p></list-item>
          </list>
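          <p>The metrics listed above can be approximated with a few short functions. This is a hedged sketch rather than the evaluation code used in the paper: the function names, the letters-only normalization and the position-wise alignment of words and characters are all assumptions made for illustration.</p>
          <preformat><![CDATA[
```python
def normalize(text):
    """Keep letters only, uppercased, so 'aLluciNOgeni' equals 'allucinogeni'."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def word_accuracy(predicted_words, gold_words):
    """Share of gold positions whose predicted word is correct.

    Assumption: words are aligned by position.
    """
    if not gold_words:
        return 0.0
    hits = sum(
        normalize(p) == normalize(g) for p, g in zip(predicted_words, gold_words)
    )
    return hits / len(gold_words)


def exact_match(predicted_words, gold_words):
    """True when every word matches, in the same order."""
    predicted = [normalize(w) for w in predicted_words]
    gold = [normalize(w) for w in gold_words]
    return predicted == gold


def solution_key_match(solution_words, solution_key):
    """Share of key positions whose solution word has the requested length."""
    if not solution_key:
        return 0.0
    hits = sum(len(normalize(w)) == n for w, n in zip(solution_words, solution_key))
    return hits / len(solution_key)


def solution_first_pass_match(first_pass, solution):
    """Share of first-pass characters reused, in order, by the solution.

    Assumption: characters are compared position by position.
    """
    fp, sol = normalize(first_pass), normalize(solution)
    if not fp:
        return 0.0
    hits = sum(a == b for a, b in zip(fp, sol))
    return hits / len(fp)


print(solution_key_match(["Mortali", "allucinogeni"], [7, 12]))
print(solution_first_pass_match("MOR Talia L luci NO geni", "Mortali allucinogeni"))
```
]]></preformat>
          <p>Under these assumptions, a solution such as “Spettacolo” proposed for a key slot of length 8 would count as a key mismatch, because the word has ten letters.</p>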
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>Model</title>
        <p>LLaMA-3 70B
Qwen-2 72B
GPT-4o
Claude-3.5 Sonnet
Phi-3 3.8B (ours)</p>
      </sec>
      <sec id="sec-4-2">
        <title>Setup</title>
        <p>5-shot prompt
5-shot prompt
5-shot prompt
5-shot prompt
fine-tuned</p>
        <p>For this out-of-distribution evaluation of our fine-tuned model, the 2k examples of the test set from previous sections are divided into two subsets: one in which all first pass words were seen during fine-tuning by Phi-3 (Test id, 1061 examples) and one in which, for every example, at least one first pass word was unseen in training (Test ood, 939 examples). Intuitively, if Phi-3 performance is mainly motivated by memorizing fine-tuning data, introducing OOD words should produce a significant drop in model performances.</p>
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Model performances for test subsets containing only in-domain (Test ID), or some out-of-domain (Test OOD) first pass words. W. ID and W. OOD are accuracies for ID and OOD words for first pass (FP) and solution (S) sequences. Test Δ = Test ID - Test OOD performance.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Metric</th><th>GPT-4o Test id</th><th>GPT-4o Test ood</th><th>GPT-4o Test Δ</th><th>Phi-3 (ours) Test id</th><th>Phi-3 (ours) Test ood</th><th>Phi-3 (ours) Test Δ</th></tr>
            </thead>
            <tbody>
              <tr><td>FP W. ID</td><td>0.52</td><td>0.51</td><td>-0.01</td><td>0.96</td><td>0.96</td><td>0.00</td></tr>
              <tr><td>FP W. OOD</td><td>-</td><td>0.44</td><td>-</td><td>-</td><td>0.20</td><td>-</td></tr>
              <tr><td>FP EM</td><td>0.16</td><td>0.14</td><td>-0.02</td><td>0.89</td><td>0.18</td><td>-0.71</td></tr>
              <tr><td>S W. ID</td><td>0.29</td><td>0.26</td><td>-0.03</td><td>0.92</td><td>0.49</td><td>-0.43</td></tr>
              <tr><td>S W. OOD</td><td>0.18</td><td>0.16</td><td>-0.02</td><td>0.63</td><td>0.20</td><td>-0.40</td></tr>
              <tr><td>S EM</td><td>0.12</td><td>0.09</td><td>-0.03</td><td>0.82</td><td>0.16</td><td>-0.66</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Results shown in Table 3 confirm that this is indeed the case. We find Phi-3 performances to be near-perfect on seen first pass words (FP W. ID = 0.96) in both test sets, with a major drop for OOD words (FP W. OOD = 0.20). This produces second-order effects on subsequent steps, causing the FP EM results to drop by 71 points, from 0.89 to 0.18 (FP EM Test Δ), while significantly impacting downstream solution accuracies. On the contrary, GPT-4o few-shot prompting performances remain nearly identical on both splits, confirming that these results are not the product of a skewed data selection process. Overall, these results strongly suggest that memorization is the main factor behind the strong rebus-solving performance of our fine-tuned LLM.</p>
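        <p>The in-domain and out-of-domain subsets described in this section can be derived with a simple vocabulary check. The sketch below is illustrative only: the record layout (a <monospace>first_pass_words</monospace> field) and the case-insensitive comparison are hypothetical choices, not details confirmed by the paper.</p>
        <preformat><![CDATA[
```python
def split_id_ood(test_examples, training_examples):
    """Split test examples by whether all first pass words were seen in training.

    Each example is assumed to be a dict with a 'first_pass_words' list
    (a hypothetical field name used only for this sketch).
    """
    train_vocab = {
        word.lower()
        for example in training_examples
        for word in example["first_pass_words"]
    }
    test_id, test_ood = [], []
    for example in test_examples:
        all_seen = all(w.lower() in train_vocab for w in example["first_pass_words"])
        # A single unseen first pass word is enough to mark the example as OOD.
        (test_id if all_seen else test_ood).append(example)
    return test_id, test_ood


train = [{"first_pass_words": ["talia", "luci", "geni"]}]
test = [
    {"first_pass_words": ["Talia", "luci"]},
    {"first_pass_words": ["luci", "salice"]},
]
in_domain, out_of_domain = split_id_ood(test, train)
print(len(in_domain), len(out_of_domain))  # 1 1
```
]]></preformat>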
      </sec>
      <sec id="sec-4-3">
        <title>Word Complexity and Frequency Affect LLM Fine-tuning Performance</title>
        <p>For every word in the first passes and solutions of test set examples, we measure LLMs’ overall accuracy in predicting it for the full test set. We then correlate this score to various quantities that could motivate LLMs’ performances. More specifically, we use 1) the word frequency in the training set; 2) the word frequency in Paisà [38], a large web Italian corpus; and 3) the length of the word (number of characters). We find a significant positive correlation (coefficient of 0.44) between first pass word prediction accuracy and training frequency for the fine-tuned Phi-3 model, suggesting that model performance is strongly related to training coverage. Word length is also found to negatively affect our model’s performance, albeit to a smaller extent (coefficient of −0.11). The performance of prompted models is unrelated to both properties for first pass words, indicating that these results are the product of fine-tuning.<sup>12</sup></p>
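        <p>The analysis above can be sketched in two steps: computing a per-word accuracy over the test set, then correlating it with a word property such as training frequency or length. The text does not name the correlation coefficient in this version, so Spearman's rank correlation is used here purely as an assumption; all function names are illustrative.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from math import sqrt


def per_word_accuracy(predictions):
    """predictions: iterable of (gold_word, predicted_word) pairs over the test set."""
    totals, hits = defaultdict(int), defaultdict(int)
    for gold, predicted in predictions:
        totals[gold] += 1
        hits[gold] += int(gold == predicted)
    return {word: hits[word] / totals[word] for word in totals}


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of the ranks; undefined when either input is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy data: accuracy per word, correlated with word length.
accuracy = per_word_accuracy([("re", "re"), ("re", "te"), ("salice", "aro")])
words = sorted(accuracy)
print(spearman([len(w) for w in words], [accuracy[w] for w in words]))
```
]]></preformat>
        <p>In practice, the same call would be repeated with training frequency and corpus frequency in place of word length, once per model.</p>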
      </sec>
      <sec id="sec-4-4">
        <title>LLM Fine-Tuning Fails to Generalize to Unseen Words</title>
        <p>To further confirm the importance of fine-tuning word coverage in defining model performances, we evaluate our fine-tuned model in out-of-distribution settings.</p>
        <p><sup>12</sup> Paisà frequency is never found to correlate significantly. Full correlation results are available in Table 6.</p>
        <p>Manual Inspection We conclude by manually
evaluating some generations produced by the best-performing
LLMs. Table 4 presents two examples with definitions
(D) and solution (S) words predicted by three LLMs, with
more examples provided in Appendix C. We use naw as
short-hand for “Not A Word” to mark nonsensical terms.</p>
        <p>In the first example, Phi-3 correctly predicts all first pass and solution words. On the contrary, other models make several mistakes in the first pass, leading to incorrect solutions. Both prompted models tend to ignore first pass words when these cannot be assembled to form sensical, length-fitting solution words. For example, GPT-4o predicts p (naw) for D1, which would lead to the solution word “SAPpTE” (naw), but the model instead predicts S8 = “Spettacolo” (show). In particular, GPT-4o appears to prioritize grammatically correct solutions at the cost of ignoring first pass words and solution key length constraints, while Claude 3.5S shows an improved ability to follow these constraints, as confirmed by Key/FP Match results of Table 2.</p>
        <p>In the second example, the first pass word D2 = salice
(willow) is OOD for Phi-3. Consequently, the model
produces the incorrect prediction aro (naw), and the error is
propagated to all solution words, as previously observed
in the Test OOD column of Table 3. Prompted models
also underperform in this example, with errors on D1 and
D2 propagating to most solution words. However, we
note that D1 and D2 incorrect predictions for Claude 3.5S
satisfy the provided definitions, suggesting that access
to more explicit information about the given constraints
could further boost LLMs’ performance on this task.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Discussion and Conclusion</title>
      <p>This work introduced a verbalized rebus-solving task and dataset for evaluating LLMs’ sequential instruction-following skills for the Italian language. We crafted a large collection of 83k verbalized rebuses by combining rebus transcriptions with crossword definitions and used it to evaluate the rebus-solving skills of state-of-the-art LLMs. Our experiments revealed the challenging nature of this task, with even the most capable prompted models achieving only 24% accuracy on solutions.</p>
      <p>While fine-tuning a smaller LLM dramatically improved performance to 51% solution accuracy, our analysis uncovered that these gains were largely driven by memorization and do not generalize to out-of-distribution examples. These results suggest important limitations in the generalization capabilities of current systems for sequential instruction-following tasks. Our manual analysis further shows that LLMs seldom account for length constraints when solving definitions, despite the fundamental role of these cues in restricting the pool of possible words. These results suggest that search-based approaches accounting for constraints more explicitly might improve puzzle structure adherence, as previously shown by Chen et al. [39]. Other augmentation techniques employing LLM reformulation skills can also be explored to mitigate overfitting.</p>
      <p>Future work in this area should focus on expanding similar evaluations to a wider set of languages, input modalities, and puzzle categories, creating a comprehensive benchmark to test LLMs’ puzzle-solving skills. Importantly, the task of solving visual rebuses and their more convoluted variants<sup>13</sup> remains far beyond the current capabilities of vision-language models. Hence, solving these puzzles automatically can be considered an important milestone in developing multimodal AI systems for constrained multi-step reasoning tasks. Our results confirm that the challenging nature of rebuses, even in their verbalized form, makes this task valuable for assessing future progress in LLMs’ linguistic proficiency and sequential reasoning abilities. Finally, our rebus-solving LLM can facilitate future interpretability work investigating the mechanisms behind factual recall and multi-step reasoning in transformer models [40].</p>
      <p>Limitations. Our analysis was limited to a relatively small set of models, and a single prompt template obtained after minimal tuning. Further experiments are needed to verify that memorization patterns after fine-tuning remain relevant for other model sizes, prompt formats, and training regimes, particularly for full-weight training approaches.</p>
      <p><sup>13</sup> For example, rebuses requiring first pass anagrams (anarebus) or dynamic relations derived from multi-scene analysis (stereorebus).</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Gabriele Sarti and Arianna Bisazza acknowledge the
support of the Dutch Research Council (NWO) for the
project InDeep (NWA.1292.19.399). Arianna Bisazza
is further supported by the NWO Talent Programme
(VI.Vidi.221C.009). We are grateful to the Associazione
Culturale “Biblioteca Enigmistica Italiana - G. Panini”
for making its rebus collection freely accessible on the
Eureka5 platform, and to Valeriya Zelenkova for her
valuable comments on the first version of this work. We also
thank the CLiC-it 2024 reviewers for their valuable
feedback.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Additional Data Information</title>
      <p>Dataset statistics. Table 5 presents statistics for the EurekaRebus dataset and the filtered subset we use for composing verbalized rebuses. The ItaCW dataset contains a total of 125,202 definitions for 40,963 unique words, with the most frequent words having hundreds of different definitions, e.g. 173 for re (king), 155 for te (you). Definitions used for verbalization are randomly sampled from</p>
      <p>Table 6 presents the correlations between model accuracy and the properties presented in Section 5. Table 7 presents the full ID/OOD performances for all tested models, showing consistent results with Table 3 for all prompted models. Table 8 presents Phi-3 Mini performances across rebus-solving fine-tuning steps.</p>
    </sec>
    <sec id="sec-8">
      <title>C. Additional Model Generations</title>
      <sec id="sec-8-1">
        <title>Prompt</title>
        <sec id="sec-8-1-1">
          <title>Solve the clues provided between parentheses to obtain a first pass, and use the solution key to obtain the rebus’ solution.</title>
        </sec>
        <sec id="sec-8-1-2">
          <title>Rebus: M [Two attacking footballers] N [Used for eating icecream] [Barks and bites] NIA</title>
        </sec>
        <sec id="sec-8-1-3">
          <title>Solution key: 11 5</title>
        </sec>
      </sec>
      <sec id="sec-8-2">
        <title>Generation</title>
        <sec id="sec-8-2-1">
          <title>Let’s proceed to solve the rebus step by step:</title>
        </sec>
        <sec id="sec-8-2-2">
          <title>Rebus: MOR [Musa della commedia] D1 L [Si accendono per vedere] D2 NO [Uomini di intelligenza superiore] D3</title>
        </sec>
        <sec id="sec-8-2-3">
          <title>Chiave di lettura: 7 12</title>
        </sec>
      </sec>
      <sec id="sec-8-3">
        <title>Step</title>
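        <table-wrap>
          <table>
            <thead>
              <tr><th>Step</th><th>GPT-4o</th><th>Claude 3.5S</th><th>Phi-3</th></tr>
            </thead>
            <tbody>
              <tr><td>D1</td><td>Talia</td><td>Talia</td><td>Talia</td></tr>
              <tr><td>D2</td><td>luci</td><td>luci</td><td>luci</td></tr>
              <tr><td>D3</td><td>geni</td><td>geni</td><td>genii</td></tr>
              <tr><td>S7</td><td>Mortali</td><td>Mortali</td><td>Mortali</td></tr>
              <tr><td>S12</td><td>allucinogeni</td><td>allucinogeni</td><td>allucinogeni</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-8-4">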
        <sec id="sec-8-4-1">
          <title>Soluzione: MORTali aLluciNOgeni</title>
          <p>Phi-3 Mini.</p>
        </sec>
        <sec id="sec-8-4-2">
          <title>Rebus: PRI [Ricoperto di sudore] D1 MIN [Gli altari del tempio] D2 DO [Un ordigno bellico] D3 [Possono essere “di serie” in certi tornei] D4 SSO</title>
          <p>Chiave di lettura: 5 2 8 6 2 6</p>
        </sec>
        <sec id="sec-8-4-3">
          <title>Rebus: B [Una figura geometrica] D1 [La si impugna per far girare un congegno] D2 DA [Le produce il rovo] D3</title>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Milan</surname>
          </string-name>
          ,
          <year>2021</year>
          . [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Manna</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. P.</surname>
          </string-name>
          di Buono, J. Monti, Riddle me
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>of the 10th Workshop on Games and Natural</source>
          Lan-
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>guage Processing @ LREC-COLING</source>
          <year>2024</year>
          , ELRA
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>and</surname>
            <given-names>ICCL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torino</surname>
          </string-name>
          , Italia,
          <year>2024</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>106</lpage>
          . URL:
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          https://aclanthology.org/
          <year>2024</year>
          .games-
          <volume>1</volume>
          .
          <fpage>11</fpage>
          . [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Giadikiaroglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lymperaiou</surname>
          </string-name>
          , G. Filandrianos,
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>language models: A survey</article-title>
          ,
          <source>ArXiv</source>
          (
          <year>2024</year>
          ). URL:
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          https://arxiv.org/abs/2402.11291. [11]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Littman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Keim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          , A
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>probabilistic approach to solving crossword puz</article-title>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Maddison</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Guez, zles,
          <source>Artificial Intelligence</source>
          <volume>134</volume>
          (
          <year>2002</year>
          )
          <fpage>23</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          , G. van den Driessche, J. Schrittwieser, 55. URL: https://www.sciencedirect.com/science/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>I.</given-names>
            <surname>Antonoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Panneershelvam</surname>
          </string-name>
          , M. Lanctot, article/pii/S000437020100114X. doi:https://doi.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Dieleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grewe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nham</surname>
          </string-name>
          , N. Kalchbrenner, org/10.1016/S0004-
          <volume>3702</volume>
          (
          <issue>01</issue>
          )
          <fpage>00114</fpage>
          -
          <lpage>X</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lillicrap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          , [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ernandes</surname>
          </string-name>
          , G. Angelini,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          , We-
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <volume>529</volume>
          (
          <year>2016</year>
          )
          <fpage>484</fpage>
          -
          <lpage>489</lpage>
          . doi:
          <volume>10</volume>
          .1038/nature16961. telligence,
          <year>2005</year>
          . URL: https://link.springer.com/ [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          , I. Antonoglou, chapter/10.1007/11590323_
          <fpage>37</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanctot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kumaran</surname>
          </string-name>
          , [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Boda</surname>
          </string-name>
          , Sadallah,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kotova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kochmar</surname>
          </string-name>
          , S. Yao,
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>A general reinforcement learning algorithm that</article-title>
          K. N.
          <year>2023</year>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yousefi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Betthauser</surname>
          </string-name>
          , H. Hasan-
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>ence 362</source>
          (
          <year>2018</year>
          )
          <fpage>1140</fpage>
          -
          <lpage>1144</lpage>
          . doi:
          <volume>10</volume>
          .1126/science. garini, T. Röthenbacher,
          <string-name>
            <given-names>K.</given-names>
            <surname>Klede</surname>
          </string-name>
          , M. Ernandes,
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>aar6404. B. M. Eskofier</surname>
            ,
            <given-names>D. Z.</given-names>
          </string-name>
          <year>2023</year>
          , Are llms good cryp[3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rozner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mahowald</surname>
          </string-name>
          , Decrypting tic crossword solvers?,
          <source>ArXiv</source>
          (
          <year>2024</year>
          ). URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>cryptic crosswords: Semantically complex word- //arxiv.org/abs/2403.12094.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>play puzzles as a target for nlp</article-title>
          , in: M. Ranzato, [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Todd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Merino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Earle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          , Missed
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          (Eds.),
          <source>Advances in Neural Information Processing guage models, Arxiv</source>
          (
          <year>2024</year>
          ). URL: https://arxiv.org/
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          Systems, volume
          <volume>34</volume>
          , Curran Associates, Inc.,
          <year>2021</year>
          , abs/2404.11730.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          pp.
          <fpage>11409</fpage>
          -
          <lpage>11421</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2021/file/5f1d3986fae10ed2994d14ecd89892d7-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <article-title>Finding the optimal human strategy for wordle using maximum correct letter probabilities and reinforcement learning</article-title>
          ,
          <source>Arxiv</source>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2202.00557.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tomlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Ginsberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          , Automated crossword solv- [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Angelini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ernandes</surname>
          </string-name>
          , T. Iaquinta, C. Stehlé,
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <article-title>Proceedings of the 60th Annual Meeting of the As- The webcrow french crossword solver</article-title>
          , in: In-
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <article-title>sociation for Computational Linguistics (Volume 1: telligent Technologies for Interactive Entertain-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          Long Papers)
          ,
          <source>Association for Computational Lin- ment</source>
          ,
          <year>2023</year>
          . URL: https://link.springer.com/chapter/
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>guistics</surname>
          </string-name>
          , Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>3073</fpage>
          -
          <lpage>3085</lpage>
          . URL: 10.1007/978-3-031-55722-4_14.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          https://aclanthology.org/2022.acl-long.219. doi:10.18653/v1/2022.acl-long.219.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tolosani</surname>
          </string-name>
          ,
          <source>Enimmistica</source>
          , Hoepli, Milan,
          <year>1901</year>
          . [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Miola</surname>
          </string-name>
          ,
          <source>Che cos'è un rebus</source>
          , Carocci,
          <year>2020</year>
          . [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bartezzaghi</surname>
          </string-name>
          ,
          <source>Parole in gioco: Per una semiotica del gioco linguistico</source>
          , Bompiani,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zugarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rothenbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Klede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Eskofier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zanca</surname>
          </string-name>
          ,
          <article-title>Die rätselrevolution: Automated german crossword solving</article-title>
          , in:
          <source>Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</source>
          ,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3596. [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ichino</surname>
          </string-name>
          , L'ora desiata vola:
          <source>guida al mondo del</source>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Angelini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          , Solving ital-
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <article-title>Conference of the Italian Association for Artificial Inc</article-title>
          .,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Intelligence</surname>
          </string-name>
          ,
          <year>2005</year>
          . URL: https://link.springer.com/ neurips.cc/paper_files/paper/2020/file/
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          chapter/10.1007/11558590_40. 1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf. [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zugarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zeinalipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Kadali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maggini</surname>
          </string-name>
          , [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          , M. Bosma,
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2024 Joint In- A. Agarwal</source>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Belgrave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oh</surname>
          </string-name>
          (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>tics</surname>
          </string-name>
          , Language Resources and
          <string-name>
            <surname>Evaluation (LREC- Systems</surname>
          </string-name>
          , volume
          <volume>35</volume>
          ,
          Curran Associates, Inc.,
          <year>2022</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <surname>COLING</surname>
          </string-name>
          <year>2024</year>
          ),
          ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          . URL: https://proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          pp.
          <fpage>3347</fpage>
          -
          <lpage>3356</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          . neurips.cc/paper_files/paper/2022/file/
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          lrec-main.297. 9d5609613524ecf4f15af0f7b31abca4-Paper-Conference. [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zeinalipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Iaquinta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zanollo</surname>
          </string-name>
          , G. Angelini, pdf.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Rigutini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maggini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          , Italian crossword [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Awan</surname>
          </string-name>
          , J. Aneja,
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <article-title>tive word puzzles</article-title>
          ,
          <source>in: Proceedings of the 9th Italian A</source>
          .
          <string-name>
            <surname>Bakhtiari</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Behl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Benhaim</surname>
          </string-name>
          , M. Bilenko,
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <surname>Conference on Computational Linguistics (CLiC-it J. Bjorck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bubeck</surname>
            ,
            <given-names>Q. C.</given-names>
          </string-name>
          et al., Phi-
          <volume>3</volume>
          <fpage>techni</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <year>2023</year>
          ),
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3596.
          <article-title>cal report: A highly capable language model lo</article-title>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zeinalipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. G.</given-names>
            <surname>Keptig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maggini</surname>
          </string-name>
          , L. Rigutini, cally on your phone,
          <source>Arxiv</source>
          (
          <year>2024</year>
          ). URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          ,
          <article-title>A turkish educational crossword puzzle //arxiv</article-title>
          .org/abs/2404.14219.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          generator,
          <source>ArXiv</source>
          abs/2405.07035 (
          <year>2024</year>
          ). URL: https: [29]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          , yelong shen, P. Wallis,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          //arxiv.org/abs/2405.07035v2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          , W. Chen, LoRA: Low-rank adap[22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lovetere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Monti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pascucci</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <article-title>San- tation of large language models</article-title>
          , in: The Tenth
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <surname>gati</surname>
          </string-name>
          , L. Siciliani, Ghigliottin-ai@evalita2020: Eval- International Conference on Learning Representa-
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <article-title>uating artificial players for the language game tions (ICLR 2022), OpenReview</article-title>
          , Online,
          <year>2022</year>
          . URL:
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <article-title>"la ghigliottina" (short paper)</article-title>
          , EVALITA Evalua- https://openreview.net/forum?id=nZeVKeeFYf9.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <article-title>tion of NLP and Speech Tools for Italian -</article-title>
          Decem- [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pagnoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          , L. Zettle-
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <source>ber 17th</source>
          ,
          <year>2020</year>
          (
          <year>2020</year>
          ). URL: https://doi.org/10.4000/ moyer, Qlora: Efficient finetuning of quantized
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          books.aaccademia.7488. llms, in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Globerson</surname>
          </string-name>
          , [23]
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , M. de Gemmis, P. Lops, G. Semeraro,
          <string-name>
            <surname>Solv- K. Saenko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Hardt</surname>
          </string-name>
          , S. Levine (Eds.), Advances
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          <article-title>based word associations discovery</article-title>
          ,
          <source>IEEE Trans-</source>
          volume
          <volume>36</volume>
          ,
          Curran Associates, Inc.,
          <year>2023</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          <source>actions on Computational Intelligence and AI</source>
          in pp.
          <fpage>10088</fpage>
          -
          <lpage>10115</lpage>
          . URL: https://proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          Games
          <volume>8</volume>
          (
          <year>2016</year>
          )
          <fpage>13</fpage>
          -
          <lpage>26</lpage>
          . doi:10.1109/TCIAIG.2014. neurips.cc/paper_files/paper/2023/file/
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          2355859. 1feb87871436031bdc0f2beaa62a049b-Paper-Conference. [24]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eustratiadis</surname>
          </string-name>
          , C. Monz, pdf.
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Bisazza</surname>
          </string-name>
          , M. de Rijke, The sifo benchmark: Inves- [31]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , C. De-
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          <source>ity of large language models</source>
          ,
          <year>2024</year>
          . URL: https: towicz, J. Davison,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          , P. von Platen, C. Ma,
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          //arxiv.org/abs/2406.19999. arXiv:2406.19999.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Le</given-names>
            <surname>Scao</surname>
          </string-name>
          , S. Gugger, [25]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Ponti</surname>
          </string-name>
          ,
          <string-name>
            <surname>Fine-tuning M. Drame</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Lhoest</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rush</surname>
          </string-name>
          , Transformers:
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          <string-name>
            <surname>Arxiv</surname>
          </string-name>
          (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2403.07794.
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          , D. Schlangen (Eds.), Proceedings of the [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. D.</surname>
          </string-name>
          <year>2020</year>
          <article-title>Conference on Empirical Methods in Natu-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Ramesh, line,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . URL: https://aclanthology.
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          , M. Chen, org/2020.emnlp-demos.6. doi:10.18653/v1/2020.
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Sigler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          , J. Clark, emnlp-demos.6.
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , [32]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          , Hello gpt-4o, Website,
          <year>2024</year>
          . URL: https:
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , //openai.com/index/hello-gpt-4o.
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , [
          <volume>33</volume>
          ]
          <string-name>
            <surname>Anthropic</surname>
          </string-name>
          , Claude
          <volume>3</volume>
          .5 sonnet, Website,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          <source>Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          ,
          <fpage>claude</fpage>
          -3-5-sonnet. [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <surname>Statistic EurekaRebus</surname>
          </string-name>
          ItaCW-filtered
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          , G. Dong,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <source># examples 222089 83157</source>
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Ma, #
          <source>authors 8138 5046</source>
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <source>Year range</source>
          <year>1800</year>
          - 2024
          <fpage>1869</fpage>
          -
          <lpage>2024</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref72">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          , #
          <source>unique words 38977 8960</source>
        </mixed-citation>
      </ref>
      <ref id="ref73">
        <mixed-citation>
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          , Avg./SD words/ex. 3.
          <issue>50</issue>
          /1/48 3.
          <issue>08</issue>
          /1.00
        </mixed-citation>
      </ref>
      <ref id="ref74">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          , Avg.
          <source>/SD word len. 6.51/1.96 5.70/1</source>
          .60
        </mixed-citation>
      </ref>
      <ref id="ref75">
        <mixed-citation>
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <source>Qwen2 technical report</source>
          ,
          <year>2024</year>
          . Avg./
          <source>SD FP len. 26.45/11.19 25.74/8</source>
          .73
        </mixed-citation>
      </ref>
      <ref id="ref76">
        <mixed-citation>
          URL: https://arxiv.org/abs/2407.10671. Solution [35]
          <string-name>
            <surname>M. AI</surname>
          </string-name>
          ,
          <article-title>Introducing meta llama 3: The most capable # unique words 75718 42558</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref77">
        <mixed-citation>
          <article-title>openly available llm to date</article-title>
          , Website,
          <year>2024</year>
          . URL: Avg./SD words/
          <source>ex. 3.02/1.60 2.80/1</source>
          .21
        </mixed-citation>
      </ref>
      <ref id="ref78">
        <mixed-citation>
          https://ai.meta.com/blog/meta-llama-3. Avg./
          <source>SD word len. 8.07/2.30 7.79/2</source>
          .23 [36]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mercorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mezzanzanica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Potertì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Serino</surname>
          </string-name>
          , Avg./SD Sol.
          <source>len. 19.47/8.44 18.81/6</source>
          .06
        </mixed-citation>
      </ref>
      <ref id="ref79">
        <mixed-citation>
          <source>ciency on the invalsi italian benchmark</source>
          ,
          <year>2024</year>
          . URL:
          <article-title>TStaabtliseti5cs for the full EurekaRebus dataset and the crosswords-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref80">
        <mixed-citation>
          https://arxiv.org/abs/2406.17535.
          <article-title>filtered subset used in this work</article-title>
          . Avg./SD = Average/standard [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Maier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <article-title>From wer and ril deviation</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref81">
        <mixed-citation>
          <article-title>connected speech recognition</article-title>
          .,
          <year>2004</year>
          . [38]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lyding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stemle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Borghetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brunello</surname>
          </string-name>
          , Model # Char. Paisà Freq. Train Freq.
        </mixed-citation>
      </ref>
      <ref id="ref82">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Castagnoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dittmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <source>GPT-4o -0.01 0.01 0</source>
          .
          <fpage>02</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref83">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          ,
          <source>The PAISÀ corpus of Italian web texts, Claude-3.5 -0.02 -0.02 0</source>
          .
          <fpage>00</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref84">
        <mixed-citation>
          <string-name>
            <surname>in: F. Bildhauer</surname>
          </string-name>
          , R. Schäfer (Eds.),
          <source>Proceedings of Phi-3 (ours) -0.11 -0.05 0</source>
          .
          <fpage>44</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref85">
        <mixed-citation>
          <source>the 9th Web as Corpus Workshop (WaC-9)</source>
          ,
          <source>Associ- GPT-4o -0.18 0.14 0</source>
          .
          <fpage>19</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref86">
        <mixed-citation>
          <article-title>ation for Computational Linguistics</article-title>
          , Gothenburg,
          <source>Claude-3.5 -0.15 0.08 0</source>
          .
          <fpage>13</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref87">
        <mixed-citation>
          <string-name>
            <surname>Sweden</surname>
          </string-name>
          ,
          <year>2014</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          . URL: https://aclanthology. Phi-3
          <source>(ours) -0.02 0.08 0</source>
          .
          <fpage>22</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref88">
        <mixed-citation>
          <source>org/W14-0406</source>
          . doi:
          <volume>10</volume>
          .3115/v1/
          <fpage>W14</fpage>
          -0406. [39]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          , Table 6
        </mixed-citation>
      </ref>
      <ref id="ref89">
        <mixed-citation>
          <source>mated Planning and Scheduling</source>
          <volume>32</volume>
          (
          <year>2022</year>
          )
          <fpage>35</fpage>
          -
          <lpage>43</lpage>
          .  &lt; 1 −
          <volume>5</volume>
          [41]
        </mixed-citation>
      </ref>
      <ref id="ref90">
        <mixed-citation>
          <source>view/19783</source>
          . doi:
          <volume>10</volume>
          .1609/icaps.v32i1.19783.
          <article-title>the pool of available definitions for every word</article-title>
          . [40]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ferrando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bisazza</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. R.</surname>
          </string-name>
          <article-title>Costa-jussà,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref91">
        <mixed-citation>
          <article-title>A primer on the inner workings of transformer- First pass/Solution word distribution Figure 2</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref92">
        <mixed-citation>
          <source>based language models</source>
          ,
          <source>Arxiv</source>
          (
          <year>2024</year>
          ).
          <article-title>URL: https: shows the distribution of first pass and solution words</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref93">
        <mixed-citation>
          //arxiv.org/abs/2405.00208.
          <article-title>for the filtered EurekaRebus subset used in our work</article-title>
          . [41]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bonferroni</surname>
          </string-name>
          , Teoria statistica delle classi e calcolo
        </mixed-citation>
      </ref>
      <ref id="ref94">
        <mixed-citation>
          <issue>Firenze 8</issue>
          (
          <year>1936</year>
          )
          <fpage>3</fpage>
          -
          <lpage>62</lpage>
          . Results
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>