<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Harnessing LLMs for Educational Content-Driven Italian Crossword Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kamyar Zeinalipour</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Achille Fusco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asya Zanollo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Maggini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Gori</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IUSS Pavia</institution>
          ,
          <addr-line>Piazza della Vittoria 15, 27100 Pavia, PV</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Siena, DIISM</institution>
          ,
          <addr-line>Via Roma 56, 53100 Siena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In this work, we present a novel tool for generating Italian crossword puzzles from text, using large language models such as GPT-4o, Mistral-7B-Instruct-v0.3, and Llama3-8b-Instruct. Designed specifically for educational applications, the generator relies on the comprehensive Italian-Clue-Instruct dataset, which comprises over 30,000 entries including diverse texts, solutions, and clue types. This carefully assembled dataset is designed to support the creation of contextually relevant clues in various styles for specific texts and keywords. The study examines four distinct styles of crossword clues: clues without format constraints, clues formed as definite determiner phrases, copular sentences, and bare noun phrases. Each style introduces a distinct linguistic structure that diversifies clue presentation. Given the lack of sophisticated educational tools tailored to the Italian language, this project seeks to enhance learning experiences and cognitive development through an engaging, interactive platform. By combining state-of-the-art AI with contemporary educational strategies, our tool can dynamically generate crossword puzzles from Italian educational materials, thereby providing an enjoyable and interactive learning environment. With this approach, we aim to set a new benchmark for interactive and cognitive language learning solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Italian Educational Puzzles</kwd>
        <kwd>Interactive Learning</kwd>
        <kwd>Italian Educational Crosswords</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>While traditionally valued for their challenge and entertainment, crossword puzzles are increasingly recognized for their educational benefits. They provide an interactive learning environment that enhances the retention of both technical terms and general language skills, thereby facilitating learning across various disciplines, improving language acquisition, and supporting cognitive development through critical thinking and memory retention [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. The integration of Natural Language Processing (NLP) and Large Language Models (LLMs) has further enhanced their effectiveness by providing sophisticated, contextually relevant clues for educational crosswords.</p>
      <p>This paper presents a novel tool that uses LLMs to generate tailored Italian educational crossword puzzles from texts, offering various clue types. By integrating user-provided texts or keywords and applying fine-tuning techniques, the tool produces high-quality clues and answers, giving educators a resource for developing more interactive and effective instructional methods. Furthermore, a new dataset called Italian-Clue-Instruct has been compiled and will be released to the scientific community.</p>
      <p>The remainder of this paper is organized as follows. Section 2 surveys the relevant literature. Section 3 describes the methods used for dataset collection and curation, as well as the computational techniques employed in our study. Section 4 reports the results of our experimental analysis. Finally, Section 5 presents concluding insights and the broader implications of our findings.</p>
      <sec id="sec-1-1">
        <title>2. Related Works</title>
        <p>… clues [15]. On a different front, Arora et al. developed SEEKH, a system that integrates statistical and linguistic analyses to generate crossword puzzles in multiple Indian languages. Their approach emphasizes the identification of keywords to structure the puzzles [16].</p>
        <p>Recent progress in crossword puzzle generation has been notably advanced by the work of Zeinalipour et al. [17, 18, 19, 20], who used large language models to develop puzzles in English as well as in languages with more limited support, such as Italian, Arabic and Turkish. Their research highlights the potential of computational linguistics for crafting puzzles that are both engaging and linguistically rich. Initially, they employed few-shot and zero-shot learning techniques to generate new crossword clues from text [17, 18]. Furthermore, Zugarini et al. [21] introduced a method for generating educational crossword clues from a provided English text.</p>
        <p>In their study on Italian crossword puzzle generation [18], Zeinalipour et al. used large language models as-is, with few-shot learning. The current project goes a step further by introducing a dataset specifically designed for this task in Italian. Additionally, we have developed open-source models that have been fine-tuned to significantly enhance performance on this specific application. By applying state-of-the-art language modeling to the generation of Italian crossword puzzles from given texts, this research enriches the toolkit for language education and pushes forward the development of Italian crossword puzzles.</p>
      </sec>
      <sec id="sec-1-2">
        <title>3. Methodology</title>
        <p>We have developed an automated system that generates educational Italian crossword puzzles using LLMs, with the Italian-Clue-Instruct dataset at its core. Our approach leverages the adaptability of LLMs such as GPT-4o to create puzzles from text, with human validation for accuracy. Additionally, we fine-tuned Llama3-8b-Instruct and Mistral-7B-Instruct-v0.3 to improve clue accuracy and relevance. The methodology, illustrated in Figure 1, is described in detail in the following.</p>
        <p><bold>Data Collection Methodology</bold> We began the data collection process by extracting the introductory portions of Italian Wikipedia articles, using the Wikipedia API and Beautiful Soup to retrieve the pages automatically. The main focus was placed on the bolded keywords, which highlight the primary topic and other significant terms within each article. Beyond keyword identification, we also gathered a variety of essential metadata, including view counts, relevance assessments, brief narrative summaries, central headlines, related terms, categories, and URLs.<sup>2</sup> The uniform structure of the Italian Wikipedia significantly aids this process: by drawing on the introductory sections, which are particularly information-rich, we could systematically extract and outline the key concepts needed. This approach yields a comprehensive data repository that captures critical elements and insights from a diverse array of articles.</p>
        <p><sup>2</sup> Wikipedia: Lists of popular pages by WikiProject</p>
        <p><bold>Data Enhancement</bold> To ensure the reliability and effectiveness of our data, we filtered it according to several criteria. First, we prioritized the most important pages and those with the highest number of views, selecting articles based on their popularity and relevance. To keep the dataset balanced and manageable, we also discarded articles that were either too long or too short, the latter being those with fewer than 50 words. Additionally, we removed keywords longer than two words to maintain the clarity and relevance of the crossword clues. Finally, we restricted keywords to between 3 and 20 characters in length, free of special characters and numerals. Multi-word expressions were nevertheless retained as valid keywords, as they are quite common in crossword puzzles.</p>
        <p><bold>Formulation of Various Prompts</bold> Crafting specialized prompts was pivotal for producing Italian crossword clues from a given text with GPT-4o. The prompts were designed to generate clues that are both informative and engaging, by incorporating crucial details and background context from the articles. Apart from clues without format constraints, we also aimed to elicit three specific types of clues that vary in their syntactic structure:</p>
        <list list-type="bullet">
          <list-item>
            <p>definite determiner phrases: nominal clues headed by a definite article and usually modified by adjectives, prepositional phrases (PPs) or relative clauses (RCs), like &lt;La repubblica asiatica con capitale Tashkent, Uzbekistan&gt; (‘The Asian republic with Tashkent as its capital’, ‘Uzbekistan’). Such clues are examples of definite descriptions, which have traditionally been analyzed as carrying a uniqueness presupposition when singular [22] and a maximality presupposition when plural [23]. In the context of crosswords, clues of this kind refer to their solution as the single entity, or the maximal plural entity, satisfying the description.</p>
          </list-item>
          <list-item>
            <p>bare noun phrases [24]: the clue consists of a simple noun phrase (NP) with no determiner, typically modified by adjectives, PPs or RCs, for example &lt;Grande centro commerciale di lusso con sede a Londra, Harrods&gt; (‘Large luxury shopping mall based in London’, ‘Harrods’). In Italian, NPs are taken to denote a predicate that can be true of one or more individuals [22, 25].<sup>3</sup> Given the absence of the definite determiner, bare NP clues do not specify whether the referent of the solution uniquely satisfies the description [22]; thus, more than one solution could in principle be possible.</p>
          </list-item>
          <list-item>
            <p>copular sentences [26]: copular clues are clausal definitions structured as &lt;copula predicate&gt; with an elliptical subject, as in &lt;è una salsa piccante tipica della Tunisia, Harissa&gt; (‘(It) is a spicy sauce typical of Tunisia’, ‘Harissa’). Copulas, like Italian essere (‘to be’), connect a subject with a non-verbal predicate, such as an adjectival phrase (AP), a PP or another nominal phrase (NP/DP). In crossword puzzles, the solution targets the precopular position of such sentences, i.e. the elliptical subject.<sup>4</sup></p>
          </list-item>
        </list>
        <p><sup>3</sup> Bare NPs are also known to denote natural kinds [22]. However, given that NP clues occur in isolation, it is rather difficult to distinguish between the two senses; we therefore assume the more general reading of NPs as predicates and leave this discussion to future analyses.</p>
        <p><sup>4</sup> Copular sentences are known to be differentiated into canonical and inverse structures [26]. In crossword clues, canonical structures are usually more frequent, but inverse copular clues are not excluded. We leave this question open for further, purely linguistic research.</p>
        <p>To accomplish this, we created three distinct prompts, one for each clue structure, and one prompt that does not specify the structure. This allows us to test the syntactic sensitivity of the models employed and, more importantly, gives us the possibility of manipulating the structure to create variation not only in subject matter but also in the syntactic complexity of the clues. Moreover, generating clues with specific structures is an interesting resource for the educational characterization of puzzles. Indeed, it is well known from psycholinguistic research that different structures can elicit different reactions in processing, which can be correlated with factors such as age or linguistic disorders; this can be exploited when creating puzzles tailored to a solver’s needs.</p>
        <p>As for prompt engineering, the required structure is made explicit in a dedicated step of the prompt chain. For the copular structure, which is widespread and used with different formulations, we also include an example in the prompt (as shown in Figure 9) to ensure that the output has the required structure. During the prompt trials, we observed that the validity of precise clue structures strongly depends on the type of input text. The prompts used for clue generation in this study are presented in Figures 6, 7, 8 and 9 in the Appendix.</p>
        <p><bold>Generation of Educational Italian Clues</bold> Guided by the self-instruct framework [27], we devised a method to automate the generation of educational crossword clues in Italian with LLMs. Central to our approach is GPT-4o<sup>5</sup>, an advanced LLM renowned for its efficiency. A key differentiator of our strategy is the integration of contextual information into the clues produced. To achieve this, we carefully curated the content and keywords of the Wikipedia texts extracted in the previous steps. We used four distinct types of prompts, each designed to generate a different category of clues: clues without format constraints, bare noun phrases, definite determiner phrases, and copular sentences. These prompts were crafted to create diverse types of clues, in line with our objectives for educational content in Italian.</p>
        <p><bold>Overview of the Italian-Clue-Instruct Dataset</bold> Our research began with downloading 88,403 articles from the Italian Wikipedia, which we filtered down to 11,413 relevant entries. From this refined set, we selected 5,000 articles spanning 29 thematic categories for clue generation. We then used GPT-4o to generate at least three diverse clues per Wikipedia article, depending on the text length. This effort resulted in a collection of 15,000 unique clues.</p>
        <p>The dataset shows considerable variability in context length, ranging from 10 to 1,512 tokens, with most texts falling between 100 and 600 tokens. Figure 2 shows the token distribution for contexts and clues, computed with the Llama3 tokenizer. The generated clues typically range from 4 to 55 tokens in length. Figure 3 illustrates the distribution of data across categories: the dataset is dominated by the categories "Entertainment", "Geography", and "History", whereas categories such as "Mathematics", "Architecture", and "Languages" are underrepresented.</p>
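        <p>The data collection step relies on the Wikipedia API and Beautiful Soup, with a focus on the bolded keywords of each article’s introduction. As a rough, standard-library-only illustration, the sketch below swaps Beautiful Soup for Python’s <monospace>html.parser</monospace>: it builds a MediaWiki <monospace>action=parse</monospace> request restricted to the lead section (<monospace>section=0</monospace>) of an Italian Wikipedia page and collects the text of bold elements as candidate keywords. The function and class names and the User-Agent string are hypothetical, and this is not the authors’ pipeline.</p>
        <preformat preformat-type="code">
```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://it.wikipedia.org/w/api.php"
USER_AGENT = "crossword-keyword-sketch/0.1 (example)"  # hypothetical identifier


def lead_section_url(title: str) -> str:
    # MediaWiki "parse" action; section=0 restricts the output to the lead section.
    params = {"action": "parse", "page": title, "prop": "text",
              "section": 0, "format": "json", "formatversion": 2}
    return API_URL + "?" + urlencode(params)


class BoldTermParser(HTMLParser):
    """Collects the text found inside b/strong elements of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting level of currently open bold elements
        self.current = []    # text pieces of the bold run being read
        self.terms = []

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self.depth:
            self.depth -= 1
            if self.depth == 0:
                term = " ".join("".join(self.current).split())  # normalize whitespace
                if term:
                    self.terms.append(term)
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def bold_terms(html: str) -> list:
    parser = BoldTermParser()
    parser.feed(html)
    parser.close()
    return parser.terms


def fetch_bold_terms(title: str) -> list:
    # Network call: downloads the lead section as HTML and extracts its bold terms.
    request = Request(lead_section_url(title), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as resp:
        return bold_terms(json.load(resp)["parse"]["text"])
```
        </preformat>
        <p>Only <monospace>fetch_bold_terms</monospace> needs network access; <monospace>bold_terms</monospace> can be applied offline to any HTML string, which makes the keyword-extraction logic easy to test separately from the download.</p>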
      </sec>
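      <p>The Data Enhancement filters described above can be expressed as two simple predicates over articles and keywords. The Python sketch below is illustrative only: the <monospace>Article</monospace> record, the popularity rule (keeping the <monospace>top_n</monospace> most viewed pages) and the upper word limit <monospace>MAX_WORDS</monospace> are our assumptions, since the text specifies only the 50-word minimum, the two-word limit and the 3 to 20 character range for keywords.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field, replace

MIN_WORDS = 50     # from the text: articles with fewer than 50 words are discarded
MAX_WORDS = 1000   # hypothetical cutoff for "too lengthy" articles (not given in the text)


@dataclass
class Article:
    # Hypothetical record; field names are illustrative.
    title: str
    text: str
    views: int
    keywords: list = field(default_factory=list)


def keep_article(article: Article) -> bool:
    n_words = len(article.text.split())
    return MAX_WORDS >= n_words >= MIN_WORDS


def keep_keyword(keyword: str) -> bool:
    words = keyword.split(" ")
    return (
        20 >= len(keyword) >= 3                # 3 to 20 characters
        and len(words) in (1, 2)               # at most two words
        and all(w.isalpha() for w in words)    # letters only: no digits or special characters
    )


def filter_corpus(articles: list, top_n: int) -> list:
    # Prioritize popular pages (assumed rule: keep the top_n by view count),
    # then apply the article-length filter and the keyword filters.
    popular = sorted(articles, key=lambda a: a.views, reverse=True)[:top_n]
    kept = []
    for article in popular:
        if not keep_article(article):
            continue
        keywords = [k for k in article.keywords if keep_keyword(k)]
        if keywords:
            kept.append(replace(article, keywords=keywords))
    return kept
```
      </preformat>
      <p>Sorting by view count is only one possible reading of “prioritizing the most important pages”; a relevance score could be combined with it inside <monospace>filter_corpus</monospace> without changing the keyword predicates.</p>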
      <sec id="sec-1-3">
        <title>Evaluating Quality of the Italian-Clue-Instruct Dataset</title>
        <sec id="sec-1-3-1">
          <p>Producing accurate and engaging Italian educational crossword clues is hindered by the absence of a reference corpus, which makes it difficult to draw comparisons using standard measures such as ROUGE scores.</p>
        </sec>
        <sec id="sec-1-3-2">
          <p><sup>5</sup> <ext-link ext-link-type="uri" xlink:href="https://openai.com/index/hello-gpt-4o/">https://openai.com/index/hello-gpt-4o/</ext-link></p>
          <p>[Figure 1, panels (a) to (e): Data Collection; Italian Clue Creation (using GPT-4 Turbo).]</p>
          <p>Our evaluation strategy is adapted to the requirements of the task: effective clues should be contextually accurate paraphrases of the information in the text. To account for this, we adopted an extractive approach, using ROUGE scores, and ROUGE-L in particular, to gauge how well the clues reflect the input context extracted from Wikipedia. By comparing the input sentences with the generated clues, the evaluation aimed for high scores, ensuring strict adherence to the original text and minimizing irrelevant content, while avoiding clues that merely replicate the input or improperly introduce the target keyword. The results indicated a substantial connection between the context and the clues, with average ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.159, 0.114, and 0.146, respectively.</p>
          <p>Since ROUGE merely compares the n-grams of the generated clues with those of the reference text from Wikipedia, it is not a reliable metric and does not assess the semantic quality of the generated clues. It does, however, provide a general picture of them.</p>
          <p>The integrity of the generated clues was therefore further examined through human evaluation. We assessed a randomly chosen subset of clues generated from a sample of 100 articles, with at most three clues per article; duplicate clues were removed to avoid repetitions. The evaluation employed a five-level rating scheme, analogous to the methodology of [27], with the following levels:</p>
          <list list-type="bullet">
            <list-item>
              <p>RATING-A: The clue is coherent and valid, align…</p>
            </list-item>
          </list>
          <p>[Figure 3: Content Categories.]</p>
        </sec>
        <sec id="sec-1-3-3">
          <p>The evaluation was carried out by a native Italian speaker with a background in linguistics (master’s and PhD studies), following the criteria described above. Table 2 reports examples of clues and their respective ratings.</p>
        </sec>
        <sec id="sec-1-3-4">
          <p>The distribution of the evaluation outcomes is shown in Figure 4: the majority of the generated clues were of high quality and rated ‘A’, and only a small fraction were rated ‘C’, ‘D’, or ‘E’.</p>
        </sec>
        <sec id="sec-1-3-5">
          <p>By combining quantitative metrics with qualitative assessments, the study aimed to validate the educational utility and contextual accuracy of the clues created for Italian educational crosswords.</p>
        </sec>
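        <p>ROUGE-N counts overlapping n-grams, while ROUGE-L is based on the longest common subsequence (LCS) of a clue and its reference. The sketch below computes the F1 variants of both for a single clue and its context. It is a simplified re-implementation with our own whitespace tokenization and no stemming, so its values will not exactly reproduce those of standard ROUGE packages.</p>
        <preformat preformat-type="code">
```python
from collections import Counter

PUNCT = ".,;:!?'\"()"


def tokenize(text: str) -> list:
    # Simplified tokenization (assumption): lowercase, whitespace split, trim punctuation.
    tokens = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in tokens if t]


def ngrams(tokens: list, n: int) -> Counter:
    # Multiset of contiguous n-grams; empty if the sequence is shorter than n.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, n_candidate: int, n_reference: int) -> float:
    # Harmonic mean of precision (overlap / candidate) and recall (overlap / reference).
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_candidate, overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    cand, ref = ngrams(tokenize(candidate), n), ngrams(tokenize(reference), n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list, b: list) -> int:
    # Dynamic programming over prefixes, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```
        </preformat>
        <p>For the clue “La repubblica asiatica con capitale Tashkent” against the sentence “L’Uzbekistan è una repubblica asiatica con capitale Tashkent.”, five of the six clue tokens appear in order in the reference, so ROUGE-1 and ROUGE-L are both 5/7 (about 0.714) and ROUGE-2 is 2/3.</p>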
      </sec>
      <sec id="sec-1-5">
        <title>Enhancing LLMs for Italian Text-Based Educational Crossword Puzzle Generation</title>
        <p>To develop crossword puzzle clues from Italian texts using advanced LLM capabilities, we employed three models: GPT-4o (for data generation), Mistral-7B-Instruct-v0.3, and Llama3-8b-Instruct, which are known for their strong text generation and Italian language support [28, 29].</p>
        <sec id="sec-1-5-2">
          <p>We fine-tuned Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct on the Italian-Clue-Instruct dataset, which is rich in relevant material. This calibration was vital to improve the models’ proficiency in generating Italian clues while accurately reflecting the intricate grammar and vocabulary of Italian within educational contexts.</p>
        </sec>
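        <p>Fine-tuning is performed with LoRA [30] (see the Training Setup in Section 4), which freezes a pretrained weight matrix W<sub>0</sub> and learns a low-rank update, so that a layer computes W<sub>0</sub>x + (α/r)BAx, where A is an r × d<sub>in</sub> matrix and B a d<sub>out</sub> × r matrix. The pure-Python sketch below applies this idea to a single toy layer. The defaults r = 16 and α = 32 reflect our reading of the two LoRA values reported in the Training Setup; the class name and the initialization scale are hypothetical, and real fine-tuning would use a library implementation on GPUs.</p>
        <preformat preformat-type="code">
```python
import random


def matvec(m: list, v: list) -> list:
    # Plain matrix-vector product for lists of lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """A frozen linear layer W0 plus a trainable low-rank update scaled by alpha / r."""

    def __init__(self, w0: list, r: int = 16, alpha: float = 32, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0  # frozen pretrained weights (d_out x d_in)
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so the adapted layer initially computes exactly W0 x.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: list) -> list:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))  # B (A x): the product B A is never formed
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are trained: r * d_in + d_out * r values.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```
        </preformat>
        <p>For a hypothetical 4096 × 4096 projection with r = 16, the update adds 16 × (4096 + 4096) = 131,072 trainable parameters, about 0.78% of the 16,777,216 weights in W<sub>0</sub>, which is what makes this kind of fine-tuning affordable for 7B and 8B models.</p>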
        <sec id="sec-1-5-3">
          <p>To further refine the models, we optimized their parameters during fine-tuning, with the aim of reducing errors specific to our task and better aligning the models’ output with Italian educational materials.</p>
        </sec>
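        <p>At inference time, clues are generated by sampling with a temperature of 0.1 and top-p and top-k values of 0.95 and 50 (our reading of the sampling settings in the Training Setup, Section 4). The sketch below shows how these three settings are commonly combined in one decoding step over a toy vocabulary: logits are divided by the temperature, only the k highest-scoring tokens are kept, and the next token is drawn from the smallest set of top tokens whose cumulative probability reaches p. The function name, the order in which the filters are applied and the toy logits are assumptions; libraries may differ in such details.</p>
        <preformat preformat-type="code">
```python
import math
import random


def sample_next_token(logits: dict, temperature: float = 0.1, top_k: int = 50,
                      top_p: float = 0.95, rng=random):
    # 1) Temperature scaling: values below 1 sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # 2) Top-k: keep the k highest-scoring tokens.
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtracting the max for numerical stability).
    best = ranked[0][1]
    weights = [math.exp(v - best) for _, v in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 3) Top-p (nucleus): smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for (tok, _), p in zip(ranked, probs):
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, nucleus_probs = zip(*nucleus)
    return rng.choices(tokens, weights=nucleus_probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = {"Harissa": 2.0, "Salsa": 1.5, "Pesto": 0.1}  # hypothetical scores
    print(sample_next_token(toy_logits, rng=random.Random(0)))
```
        </preformat>
        <p>With these toy logits and a temperature of 0.1, the top token already holds about 99.3% of the probability mass, so the nucleus contains a single token and sampling behaves like greedy decoding; the top-p and top-k cutoffs matter only at higher temperatures.</p>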
        <sec id="sec-1-5-4">
          <p>Ultimately, the specialized tuning of these LLMs with a dedicated dataset was intended to strengthen their ability to generate high-quality crossword clues from Italian texts.</p>
        </sec>
        <sec id="sec-1-5-5">
          <p>The goal was to ensure that the resulting clues were not only linguistically sound but also relevant within an educational framework.</p>
        </sec>
      </sec>
      <sec id="sec-1-6">
        <title>4. Experimental Results</title>
        <p>This section presents the experiments conducted in the study. It begins with the training setup for the Italian-Clue-Instruct LLMs, including key parameters and computational resources. The performance of the models is then evaluated with automated metrics, such as ROUGE, to compare configurations and identify areas for improvement. This is followed by an in-depth analysis of human evaluations, focusing on relevance, coherence, and content quality, to provide insights beyond automated metrics. Additionally, an example of a generated crossword puzzle is presented to demonstrate practical usability. The goal is to highlight the robustness and versatility of the proposed approach.</p>
        <p><bold>Training Setup</bold> The models Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct, which have 7 and 8 billion parameters respectively, were fine-tuned using LoRA [30], with rank <italic>r</italic> = 16 and scaling factor <italic>α</italic> = 32, for three training epochs with a total batch size of 64. The full experimental setup was run on a server equipped with four NVIDIA A6000 GPUs, using DeepSpeed [31] and FlashAttention-2 [32]. The initial learning rate was set to 3 × 10<sup>−4</sup>. During inference, clues were generated for both Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct by sampling from the model distribution with a temperature of 0.1; the top-<italic>p</italic> and top-<italic>k</italic> sampling parameters were set to 0.95 and 50, respectively. Among the three epoch checkpoints, we selected the one with the minimum loss, which in our case was the second checkpoint.</p>
        <p><bold>Evaluation Results with the Automatic Metrics</bold> We evaluated the similarity between the clues produced by different models (details in Table 1) and those generated by GPT-4o on a test set of 200 educational contexts, using ROUGE scores. Our results indicate that the fine-tuned Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct models exhibit a closer similarity to GPT-4o, whereas the base Llama3-8b-Instruct model shows significantly lower similarity, with minimal overlap. These outcomes highlight the efficacy of fine-tuning, showing that training on the Italian-Clue-Instruct dataset enhances the ability of Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct to generate clues from Italian educational texts.</p>
        <p>[Table 1 (models compared): Mistral-7B and Llama3-8b, base and fine-tuned.]</p>
        <p><bold>Evaluation Results with the Human Evaluator</bold> Using a set of 100 Italian contexts with three clues each, a human evaluation was conducted on the outputs of both the base and the fine-tuned models, employing the five-level rating scheme described in Section 3. The results are depicted in Figure 5. The comparison covers Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct in both their base and fine-tuned configurations. After fine-tuning, Mistral-7B-Instruct-v0.3 shows a significant improvement, emerging as the top performer in category "A" and surpassing Llama3-8b-Instruct in terms of performance gain. These findings underscore the impact of fine-tuning on model capabilities: fine-tuning with the introduced dataset significantly increased the ability of both Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct to generate Italian clues from the given text, illustrating the quality and effectiveness of the Italian-Clue-Instruct dataset.</p>
        <p>[Figure 5: Counts of Ratings by Model; y-axis: Counts; models: mistral_base, llama3_base, mistral_finetuned, llama3_finetuned.]</p>
        <p>We also explored how the methodology for generating Italian crossword clues from educational texts enables customized clues. This would allow educators to select the clues that best match their teaching needs; the selected clues could in turn be used to automatically generate a crossword schema, as discussed by Zeinalipour et al. [17]. Figure 10 in the Appendix shows an example puzzle, demonstrating the system’s application.</p>
      </sec>
      <sec id="sec-1-7">
        <title>5. Conclusion</title>
        <p>We have introduced a novel system for generating crossword clues from Italian text, built on the newly developed Italian-Clue-Instruct dataset. This dataset, which includes texts, keywords, categories, and related crossword clues in Italian, is pioneering in this field. By fine-tuning two large language models (LLMs), Mistral-7B-Instruct-v0.3 and Llama3-8b-Instruct, on this dataset, we achieved substantial improvements in the models’ ability to generate crossword clues from a given text. Both the Italian-Clue-Instruct dataset and the fine-tuned models are now publicly available, providing valuable tools for students and teachers to create educational crossword puzzles from Italian text.</p>
        <p>Future research will aim to develop models capable of generating other types of crossword clues, including fill-in-the-blank clues.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Acknowledgments</title>
      <sec id="sec-2-1">
        <p>The funding for this paper was provided by the TAILOR project and the HumanE-AI-Net project, both supported by the EU Horizon 2020 research and innovation program under GA No 952215 and No 952026, respectively.</p>
      </sec>
      <sec id="sec-2-2">
        <title>References</title>
        <ref-list>
          <ref id="bib11"><label>11</label><mixed-citation>… thinking and problem solving skills in engineering education, J Engin Educ Trans 30 (2017) 103–13.</mixed-citation></ref>
          <ref id="bib12"><label>12</label><mixed-citation>B. Ranaivo-Malançon, T. Lim, J.-L. Minoi, A. J. R. Jupit, Automatic generation of fill-in clues and answers from raw texts for crosswords, in: 2013 8th International Conference on Information Technology in Asia (CITA), IEEE, 2013, pp. 1–5.</mixed-citation></ref>
          <ref id="bib13"><label>13</label><mixed-citation>L. Rigutini, M. Diligenti, M. Maggini, M. Gori, A fully automatic crossword generator, in: 2008 Seventh International Conference on Machine Learning and Applications, IEEE, 2008, pp. 362–367.</mixed-citation></ref>
          <ref id="bib14"><label>14</label><mixed-citation>L. Rigutini, M. Diligenti, M. Maggini, M. Gori, Automatic generation of crossword puzzles, International Journal on Artificial Intelligence Tools 21 (2012) 1250014.</mixed-citation></ref>
          <ref id="bib15"><label>15</label><mixed-citation>J. Esteche, R. Romero, L. Chiruzzo, A. Rosá, Automatic definition extraction and crossword generation from Spanish news text, CLEI Electronic Journal 20 (2017).</mixed-citation></ref>
          <ref id="bib16"><label>16</label><mixed-citation>B. Arora, N. Kumar, Automatic keyword extraction and crossword generation tool for Indian languages: SEEKH, in: 2019 IEEE Tenth International Conference on Technology for Education (T4E), IEEE, 2019, pp. 272–273.</mixed-citation></ref>
          <ref id="bib17"><label>17</label><mixed-citation>K. Zeinalipour, T. Iaquinta, G. Angelini, L. Rigutini, M. Maggini, M. Gori, Building bridges of knowledge: Innovating education with automated crossword generation, in: 2023 International Conference on Machine Learning and Applications (ICMLA), IEEE, 2023, pp. 1228–1236.</mixed-citation></ref>
          <ref id="bib18"><label>18</label><mixed-citation>K. Zeinalipour, A. Zanollo, G. Angelini, L. Rigutini, M. Maggini, M. Gori, et al., Italian crossword generator: Enhancing education through interactive word puzzles, arXiv preprint arXiv:2311.15723 (2023).</mixed-citation></ref>
          <ref id="bib19"><label>19</label><mixed-citation>K. Zeinalipour, M. Saad, M. Maggini, M. Gori, Arabicros: AI-powered Arabic crossword puzzle generation for educational applications, in: Proceedings of ArabicNLP 2023, 2023, pp. 288–301.</mixed-citation></ref>
          <ref id="bib20"><label>20</label><mixed-citation>K. Zeinalipour, Y. G. Keptiğ, M. Maggini, L. Rigutini, M. Gori, A Turkish educational crossword puzzle generator, in: International Conference on Artificial Intelligence in Education, Springer, 2024, pp. 226–233.</mixed-citation></ref>
          <ref id="bib21"><label>21</label><mixed-citation>A. Zugarini, K. Zeinalipour, S. S. Kadali, M. Maggini, M. Gori, L. Rigutini, Clue-instruct: Text-based clue generation for educational crossword puzzles, arXiv preprint arXiv:2404.06186 (2024).</mixed-citation></ref>
          <ref id="bib22"><label>22</label><mixed-citation>G. Chierchia, Reference to kinds across language, Natural Language Semantics 6 (1998) 339–405.</mixed-citation></ref>
          <ref id="bib23"><label>23</label><mixed-citation>G. Link, The logical analysis of plurals and mass terms: A lattice theoretical approach, Meaning, Use, and Interpretation of Language, Walter de Gruyter (1983).</mixed-citation></ref>
          <ref id="bib24"><label>24</label><mixed-citation>G. Longobardi, Reference and proper names: A theory of N-movement in syntax and logical form, Linguistic Inquiry (1994) 609–665.</mixed-citation></ref>
          <ref id="bib25"><label>25</label><mixed-citation>R. Zamparelli, Layers in the determiner phrase, Ph.D. thesis, University of Rochester, 1995 (published by Garland, 2000).</mixed-citation></ref>
          <ref id="bib26"><label>26</label><mixed-citation>A. Moro, Copular sentences, The Blackwell Companion to Syntax (2006) 1–23.</mixed-citation></ref>
          <ref id="bib27"><label>27</label><mixed-citation>Y. Wang, Y. Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, H. Hajishirzi, Self-instruct: Aligning language models with self-generated instructions, arXiv preprint arXiv:2212.10560 (2022).</mixed-citation></ref>
          <ref id="bib28"><label>28</label><mixed-citation>T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901.</mixed-citation></ref>
          <ref id="bib29"><label>29</label><mixed-citation>H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al., LLaMA: Open and efficient foundation language models, arXiv preprint arXiv:2302.13971 (2023).</mixed-citation></ref>
          <ref id="bib30"><label>30</label><mixed-citation>E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, arXiv preprint arXiv:2106.09685 (2021).</mixed-citation></ref>
          <ref id="bib31"><label>31</label><mixed-citation>J. Rasley, S. Rajbhandari, O. Ruwase, Y. He, DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2020, pp. 3505–3506.</mixed-citation></ref>
          <ref id="bib32"><label>32</label><mixed-citation>T. Dao, FlashAttention-2: Faster attention with better parallelism and work partitioning, arXiv preprint arXiv:2307.08691 (2023).</mixed-citation></ref>
        </ref-list>
      </sec>
      <sec id="sec-2-8">
        <title>A. Appendix</title>
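        <p>Each prompt below is a template with the placeholders {keyword} and {text}, and its final step asks for the clues as a JSON list under the key 'clues'. As a minimal sketch of how such a template might be filled and its output checked, the Python code below uses an abbreviated, hypothetical <monospace>PROMPT_TEMPLATE</monospace> (not the full prompt) and a simple keyword-leakage filter inspired by step 6; the call to the language model itself is omitted.</p>
        <preformat preformat-type="code">
```python
import json

# Heavily abbreviated stand-in for the prompts in this Appendix (assumption):
# only the role line, the two placeholders and the output instruction are kept.
PROMPT_TEMPLATE = (
    "You are a crossword expert.\n"
    "KEYWORD: {keyword}\n"
    "TEXT: {text}\n"
    "Compile the three best clues into a JSON object under the key 'clues'."
)


def build_prompt(keyword: str, text: str) -> str:
    return PROMPT_TEMPLATE.format(keyword=keyword, text=text)


def parse_clues(reply: str, keyword: str) -> list:
    """Read the 'clues' list from a JSON reply and drop clues that leak the keyword.

    Following step 6 of the prompt, a clue is rejected if it contains any word of
    the keyword (case-insensitive substring match, a deliberately simple check).
    """
    data = json.loads(reply)
    parts = keyword.lower().split()
    clues = []
    for clue in data.get("clues", []):
        if not isinstance(clue, str):
            continue  # ignore malformed entries
        if any(part in clue.lower() for part in parts):
            continue  # keyword leaked into the clue
        clues.append(clue.strip())
    return clues
```
        </preformat>
        <p>Checking the model output in code, rather than relying only on the instructions in the prompt, catches replies in which the keyword still appears despite steps 4 and 6.</p>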
        <p>You are a crossword expert.</p>
        <p>Generate concise and clever clues in Italian for educational crossword puzzles based on a specified Keyword and its relation to an
assigned Text. To execute this task properly, replicate the guidelines below:
KEYWORD: {keyword}
TEXT: {text}
Observe the following steps:
1. Substitute every pronoun in the text with full phrases expressing their referents.
2. Split the text into small independent sentences that could be understood out of context.
3. Pinpoint three concise sentences that contain the Keyword and best characterize the keyword. Try to select sentences from
different parts of the Text.
4. Generate short and clever crossword clues in Italian from the selected sentences. Make sure that the keyword remains absent
from the clues. If the Keyword is not the subject of the sentence, make sure that it is substituted with an appropriate clitic,
possessive or demonstrative pronoun. Generate clues from all the parts of the text and use all of the information provided to
generate the clues.
5. Ensure that each clue functions as a description or definition of the keyword rather than a query, focusing on details about
the keyword.
6. Make sure that each clue's information can be traced back to the text. Make sure that the clues are relevant and that they are
sufficient to identify the keyword. Make sure that the keyword does not appear in the clues. Make sure that any part of the
keyword is not present in the clues.
7. Select only the three best clues for educational purposes.
8. Compile these clues into a list formatted as follows: [clue1, clue2, clue3] into a JSON file under the key: 'clues'. Make sure
the output is in the requested format and do not include the whole process in the output, but only the clues.</p>
        <p>Observe the following steps:
1. Substitute every pronoun in the text with full phrases expressing their referents.
2. Split the text into small independent sentences that could be understood out of context.
3. Pinpoint three concise sentences that contain the Keyword and best characterize the keyword. Try to select sentences from
different parts of the Text.
4. Generate short and clever crossword clues in Italian from the selected sentences. Make sure that the keyword remains absent
from the clues. Each clue must have the syntax of a determiner phrase with the definite article (followed by a noun and possibly
adjectives). It can be followed by a relative clause or other complements or adjuncts. Generate clues from all the parts of the
text and use all of the information provided to generate the clues.
5. Ensure that each clue functions as a description or definition of the keyword rather than a query, focusing on details about
the keyword.
6. Make sure that each clue's information can be traced back to the text. Make sure that the clues are relevant and that they are
sufficient to identify the keyword. Make sure that the keyword does not appear in the clues. Make sure that any part of the
keyword is not present in the clues.
7. Select only the three best clues for educational purposes.
8. Compile these clues into a list formatted as follows: [clue1, clue2, clue3] into a JSON file under the key: 'clues'. Make sure
the output is in the requested format and do not include the whole process in the output, but only the clues.</p>
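        <p>The templates above are filled in with a keyword and a source text, and each asks the model to return a JSON object whose 'clues' key holds exactly three clues. The Python sketch below shows one way to fill the placeholders and read the reply back; it is an illustration, not code released with this work. The template string is abbreviated, and build_prompt, parse_clues and starts_with_definite_article are hypothetical helper names. The last function is only a rough surface test for the definite determiner phrase style: it checks that a clue opens with an Italian definite article (il, lo, la, i, gli, le, or the elided l') and says nothing about the rest of the phrase.</p>
        <preformat>
```python
import json

# Abbreviated copy of the prompt header; the full template also carries the
# numbered steps. Only the {keyword} and {text} placeholders are filled in.
TEMPLATE = (
    "Generate concise and clever clues in Italian for educational crossword "
    "puzzles based on a specified Keyword and its relation to an assigned Text.\n"
    "KEYWORD: {keyword}\n"
    "TEXT: {text}\n"
)

# Full-form Italian definite articles; the elided form l' is handled separately.
DEFINITE_ARTICLES = ("il", "lo", "la", "i", "gli", "le")


def build_prompt(keyword, text):
    """Fill the template placeholders with a keyword and its source text."""
    return TEMPLATE.format(keyword=keyword, text=text)


def parse_clues(reply):
    """Return the list stored under 'clues' in the model's JSON reply.

    Raises ValueError unless the reply is a JSON object holding exactly three
    strings under 'clues' (json.JSONDecodeError is itself a ValueError).
    """
    data = json.loads(reply)
    clues = data.get("clues") if isinstance(data, dict) else None
    if not isinstance(clues, list) or len(clues) != 3:
        raise ValueError("expected a list of three clues under 'clues'")
    if not all(isinstance(clue, str) for clue in clues):
        raise ValueError("every clue must be a string")
    return clues


def starts_with_definite_article(clue):
    """Rough surface check for the definite determiner phrase style."""
    lowered = clue.strip().lower()
    if lowered.startswith(("l'", "l\u2019")):  # elided article, e.g. l'isola
        return True
    first_word = lowered.split(" ", 1)[0]
    return first_word in DEFINITE_ARTICLES


if __name__ == "__main__":
    prompt = build_prompt("Firenze", "Firenze è il capoluogo della Toscana.")
    print(prompt)
    # A made-up reply in the requested format, not real model output.
    reply = '{"clues": ["Il capoluogo della Toscana", "La città di Dante", "La culla del Rinascimento"]}'
    for clue in parse_clues(reply):
        print(clue, starts_with_definite_article(clue))
```
        </preformat>
        <p>Keeping the parser strict means that a reply which also echoes the intermediate steps, something the prompt explicitly forbids, fails with an error instead of being passed on as clues.</p>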
        <p>Generate concise and clever clues in Italian for educational crossword puzzles based on a specified Keyword and its relation to an
assigned Text. To execute this task properly, replicate the guidelines below:
KEYWORD: {keyword}
TEXT: {text}
Observe the following steps:
1. Substitute every pronoun in the text with full phrases expressing their referents.
2. Split the text into small independent sentences that could be understood out of context.
3. Pinpoint three concise sentences that contain the Keyword and best characterize the keyword. Try to select sentences from
different parts of the Text.
4. Generate short and clever crossword clues in Italian from the selected sentences. Make sure that the keyword remains absent
from the clues. Each clue must be a copular sentence, in which the keyword constitutes the subject. The syntax of each clue then
must correspond to a copular sentence without the subject. For example: "è &lt;clue&gt;". Generate clues from all the parts of the text
and use all of the information provided to generate the clues.
5. Ensure that each clue functions as a description or definition of the keyword rather than a query, focusing on details about
the keyword.
6. Make sure that each clue's information can be traced back to the text. Make sure that the clues are relevant and that they are
sufficient to identify the keyword. Make sure that the keyword does not appear in the clues. Make sure that any part of the
keyword is not present in the clues.
7. Select only the three best clues for educational purposes.
8. Compile these clues into a list formatted as follows: [clue1, clue2, clue3] into a JSON file under the key: 'clues'. Make sure
the output is in the requested format and do not include the whole process in the output, but only the clues.</p>
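        <p>In the copular style the keyword is the omitted subject, so placing the keyword in front of a clue should yield a complete Italian sentence. The Python sketch below states that relation directly and adds a rough check that a clue opens with a form of essere. The accepted verb forms, the helper names is_copular_clue and as_full_sentence, and the example clue are assumptions made for illustration; plural keywords, such as some band names, may take sono instead of è.</p>
        <preformat>
```python
# Forms of "essere" that may open a subjectless copular clue. The list is an
# illustrative assumption; the prompt itself only gives the example "è".
COPULA_FORMS = {"è", "sono", "era", "erano", "fu", "furono"}


def is_copular_clue(clue):
    """True if the clue's first word is a form of 'essere' (subject omitted)."""
    words = clue.strip().split()
    return bool(words) and words[0].lower() in COPULA_FORMS


def as_full_sentence(keyword, clue):
    """Restore the omitted subject by placing the keyword before the clue."""
    return f"{keyword} {clue.strip()}"


if __name__ == "__main__":
    clue = "è il capoluogo della Toscana"
    print(is_copular_clue(clue))              # True
    print(as_full_sentence("Firenze", clue))  # Firenze è il capoluogo della Toscana
```
        </preformat>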
        <sec id="sec-2-8-1">
          <title>Answer</title>
          <p>Quadrophenia
Paramore
Pixies</p>
        </sec>
        <sec id="sec-2-8-2">
          <title>Rating</title>
        </sec>
        <sec id="sec-2-8-3">
          <title>Explanation</title>
          <p>A
B
C
D
E</p>
          <p>Definite determiner is not appropriate: there are
other boroughs in Lancashire.</p>
          <p>The clue provides accurate but incomplete
information: the band was a duo for a limited period.
The clue is too generic.</p>
          <p>The clue contains part of the answer.</p>
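          <p>Step 6 of the prompts asks that no part of the keyword appear in a clue, and the explanation above describes exactly that failure; it is one of the few rating criteria that can be screened mechanically. The Python sketch below flags a clue when it contains the answer or any run of at least four consecutive letters from it. The four-letter threshold, the case folding, and the names answer_fragments and leaks_answer are illustrative assumptions, not the rating procedure used in this work. Because short fragments such as "more" also occur in ordinary words, a check of this kind can only mark clues for human review, not reject them outright.</p>
          <preformat>
```python
def answer_fragments(answer, min_len=4):
    """Return the whole answer plus every substring of it with min_len or more letters.

    Spaces are removed first, because crossword answers are entered as one block.
    The default of 4 letters is an illustrative assumption.
    """
    letters = answer.replace(" ", "").casefold()
    if not letters:
        return set()
    fragments = {letters}  # the complete answer is always checked
    for size in range(min_len, len(letters)):
        for start in range(len(letters) - size + 1):
            fragments.add(letters[start:start + size])
    return fragments


def leaks_answer(clue, answer, min_len=4):
    """True if the clue contains the answer or any fragment of it."""
    text = clue.casefold()
    return any(fragment in text for fragment in answer_fragments(answer, min_len))


if __name__ == "__main__":
    print(leaks_answer("Canzone d'amore", "Paramore"))         # True: contains "amor" and "more"
    print(leaks_answer("Gruppo rock statunitense", "Pixies"))  # False
```
          </preformat>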
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>