<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ECWCA - Educational CrossWord Clues Answering: A CALAMITA Challenge</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Zugarini</surname></persName>
							<email>azugarini@expert.ai</email>
							<affiliation key="aff0">
								<orgName type="department">expert.ai</orgName>
								<address>
									<settlement>Siena</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kamyar</forename><surname>Zeinalipour</surname></persName>
							<email>kamyar.zeinalipour2@unisi.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Siena</orgName>
								<address>
									<addrLine>DIISM, Via Roma 56</addrLine>
									<postCode>53100</postCode>
									<settlement>Siena</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Achille</forename><surname>Fusco</surname></persName>
							<email>achille.fusco@iusspavia.it</email>
							<affiliation key="aff2">
								<orgName type="institution">IUSS Pavia</orgName>
								<address>
									<addrLine>Piazza della Vittoria 15</addrLine>
									<postCode>27100</postCode>
									<settlement>Pavia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Asya</forename><surname>Zanollo</surname></persName>
							<email>zanolloasya@gmail.com</email>
							<affiliation key="aff2">
								<orgName type="institution">IUSS Pavia</orgName>
								<address>
									<addrLine>Piazza della Vittoria 15</addrLine>
									<postCode>27100</postCode>
									<settlement>Pavia</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ECWCA - Educational CrossWord Clues Answering: A CALAMITA Challenge</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">35332E42D819AA2554EB6390E784DAE2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Educational Crosswords Dataset</term>
					<term>Large Language Models</term>
					<term>CALAMITA</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents ECWCA (Educational CrossWord Clues Answering), a novel challenge designed to evaluate the knowledge and reasoning capabilities of large language models (LLMs) through crossword clue answering. The challenge consists of two tasks: a standard question-answering format in which the LLM has to solve crossword clues, and a variation in which the model receives hints about the word lengths of the answers, which is expected to help models with reasoning abilities. To construct the ECWCA dataset, synthetic clues were generated from entities and facts extracted from Italian Wikipedia. The generated clues were then selected manually to ensure high-quality examples with factually correct and unambiguous clues.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>LLM to reply with the correct answer. In the second case, the goal is analogous, but we assist the model with hints about the length of the words in the answer. Such hints reduce the number of possible answers, so models with reasoning skills are expected to take advantage of them.</p><p>To build ECWCA, we created a dataset of synthetic clues grounded in entities and facts extracted from Italian Wikipedia pages. Clue-answer pairs were generated following the same methodology as clue-instruct <ref type="bibr" target="#b12">[13]</ref>. In a nutshell, we create multiple clues for a given answer. The generation is grounded in a text about the given answer and in a topic. A sketch of the method is outlined in Figure <ref type="figure" target="#fig_0">1</ref>. Since the approach produces multiple definitions for a single answer, and not all of them may be of sufficient quality, we perform a manual selection step to preserve only high-quality clues.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data description</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Origin of data</head><p>The dataset was constructed following the clue-instruct <ref type="bibr" target="#b12">[13]</ref> approach. Clue-instruct addressed a clue generation problem: the task was to generate multiple clues given an answer, its context and its category. Here, instead, we exploit the approach to build a QA dataset of clue-answer pairs. This happens in two steps: first we generate a set of examples, each consisting of an answer and its generated clues (as in clue-instruct); then we manually select the most suitable clue-answer pairs (see Section 3.2 for further details).</p><p>To construct the examples with clue-instruct, we identified the most visited Italian Wikipedia <ref type="foot">1</ref> pages. To count visits, we considered the period between September 10, 2023 and May 31, 2024 and gathered statistics from the Wikimedia APIs <ref type="foot">2</ref>. We took the page title as the answer. Titles containing non-alphabetic characters, or with fewer than two or more than 20 characters, were excluded. We then extracted the content of the remaining pages. Unlike in clue-instruct, the category information was not available, so we generated it by querying GPT-4o <ref type="bibr" target="#b5">[6]</ref>, asking it to choose the category of the answer, given its page content, from a set of 20 predefined categories. We then randomly sampled the pages and prompted GPT-4o to create three clues for each answer. Finally, those examples underwent the manual selection process, which keeps only one clue among the three. The dataset is publicly available <ref type="foot">3</ref>.</p></div>
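<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration, the title filter described above could be sketched as follows. This is a minimal sketch, not the authors' code: the function name is hypothetical, and it assumes that spaces between words are allowed (since answers may span several words) and that the two-to-20 character bound is counted on the whole title, spaces included.</p>
<p>
```python
def is_valid_answer(title: str) -> bool:
    """Return True if a page title could serve as a crossword answer."""
    # Assumption: words are separated by whitespace, and every word
    # must consist of alphabetic characters only.
    words = title.split()
    if not words:
        return False
    if not all(word.isalpha() for word in words):
        return False
    # Length bounds from the text: at least two, at most 20 characters.
    # Assumption: the bound is applied to the full title, spaces included.
    return len(title) in range(2, 21)


if __name__ == "__main__":
    for candidate in ["Roma", "Serie A 2023", "A", "Leonardo DiCaprio"]:
        print(candidate, is_valid_answer(candidate))
```
</p></div>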
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Annotation details</head><p>The clue-instruct method produces three different clues for each answer and its context. To keep only one clue, we add a human selection step. In this way, we avoid multiple occurrences of the same answer and ensure high-quality definitions and answers. The selection was carried out by three native Italian-speaking annotators. Examples were split into 18 chunks of 100 examples each, equally distributed among the annotators.</p><p>Each example was presented with the answer, the three generated clues and the Wikipedia page paragraph that was used to create the clues. Annotators were tasked with selecting the best clue, if any, based on the following criteria:</p><p>Truthfulness and Accuracy. The content of the selected clue had to be factually correct. Annotators verified the clue against the provided Wikipedia page content to ensure that it did not contain misleading or false information, thereby preserving the integrity of the dataset.</p><note place="foot" n="1">https://it.wikipedia.org/</note><note place="foot" n="2">wikimedia.org</note><note place="foot" n="3">https://huggingface.co/datasets/azugarini/crossword-clues-QA</note></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Answerability.</head><p>Annotators were instructed to choose a clue that could be answered without a high degree of ambiguity. The focus was on clues that provided enough information to infer the correct answer with confidence. Clues that left room for multiple interpretations or guesses were rejected. For example, a generic definition such as 'a large mammal' does not meet this criterion, since many species fit it.</p><p>No clue-answer overlap. Clues that include the answer, or a significant portion of it, were discarded.</p><p>When more than one clue satisfied all the criteria, annotators were directed to select the clue that provided the most relevant information with the greatest clarity and simplicity. When no clue met the criteria, the whole example was discarded.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Data format</head><p>Each example includes the clue-answer pair, the word length hint, some additional metadata (such as the category and the page views) and the URL of the Wikipedia page whose content was used to generate the clue. More precisely, the dataset has the following columns: clue, answer, answer_len, url, content, views, category, length_hint, raw_entity. A few examples are shown in Table <ref type="table">1</ref>, where, for the sake of simplicity, we only report the clue-answer pair, the hint and the category of each example.</p></div>
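<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the word length hint concrete, the sketch below derives a hint from an answer and checks a candidate against it. The exact string stored in the length_hint column is not specified here, so the space-separated format and both function names are assumptions made for illustration only.</p>
<p>
```python
def make_length_hint(answer: str) -> str:
    """Return the character length of each word in the answer."""
    # Hypothetical format: one number per word, separated by spaces.
    return " ".join(str(len(word)) for word in answer.split())


def matches_hint(candidate: str, hint: str) -> bool:
    """Check whether a candidate answer satisfies a length hint."""
    return make_length_hint(candidate) == hint


if __name__ == "__main__":
    hint = make_length_hint("leonardo dicaprio")
    print(hint)
    print(matches_hint("dicaprio", hint))
```
</p>
<p>A hint of this kind encodes both the number of words and the length of each, so a candidate with the wrong number of words is rejected even when its total length matches.</p></div>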
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Examples of prompts used for zero and/or few shots</head><p>We defined two different prompts, one with and one without indications of the word lengths of the answer. The two prompts are presented in Figure <ref type="figure" target="#fig_3">4</ref> and Figure <ref type="figure" target="#fig_2">3</ref>, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Some examples of generated clues in the dataset, their answers, the hint giving the character length of each word in the answer, and the category representing the topic of the clue.</p><p>Task without hints. We construct a 2-shot prompt (Figure <ref type="figure" target="#fig_2">3</ref>) for the task. First, we instruct the model to act as an expert in solving crossword clues, without any additional hints about the structure of the answer (such as word lengths). The format is clear and concise, focusing on the core task: solving the crossword definition and providing only the solution. Then, two static demonstration examples illustrate how to approach the task. Finally, following the same layout, we present a new clue and expect the model to complete it with the answer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Task with word length hints. This prompt (see Figure <ref type="figure" target="#fig_3">4</ref>) is very similar to the first one, but introduces a hint indicating the word lengths of the expected answer.</p><p>The hint is a constraint that reduces the number of valid answers by indicating both how many words the answer has and how long each one is; ideally, therefore, it should aid the language model.</p><p>characters. Sports, Geography, History and Society are also well represented, whereas the remaining categories are less frequent, with some, like Applied Science, Philosophy and Education, being rare.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Detailed data statistics</head><p>The pages from which clue-answer pairs were built have about 234 thousand views each on average, ranging from a minimum of 1,108 to almost five million views. However, only a few examples exceed one million views, and the vast majority have fewer than half a million, as we can observe from Figure <ref type="figure" target="#fig_1">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Metrics</head><p>To evaluate performance on the tasks we rely on the following metrics: Edit Distance (ED), Exact Match (EM), and average F1 score on words (F1).</p><p>Edit Distance. Edit Distance (also known as Levenshtein distance) measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one sequence into another. In this context, ED measures how close the generated response is to the ground truth answer. A lower ED indicates better performance, as it means that the predicted text is more similar to the target text.</p><p>Exact Match. Exact Match (EM) is a binary metric that evaluates whether the generated answer exactly matches the ground truth. We report the EM score averaged over all examples as a percentage, which corresponds to the percentage of correctly predicted answers.</p><p>F1 score. The F1 score evaluates how well the predicted words overlap with the ground truth answer. For example, if the ground truth is "leonardo dicaprio" and the model predicts "dicaprio", the model has perfect precision but imperfect recall (50%), resulting in a 66.67% F1 score.</p><p>Preliminary Results. We establish baseline results on ECWCA by testing some of the models in the Llama family.</p><p>In particular, we consider Llama3 8B and Llama3.1 8B in both instructed and non-instructed versions, and Llama3.1 70B-instruct, to observe how model size affects the results. Table <ref type="table" target="#tab_3">2</ref> illustrates the performance of the LLMs on the two tasks (with and without word-length hints), evaluated with the metrics defined above. We can observe that Llama3.1 8B consistently outperforms its predecessor across all the metrics, both with and without hints. 
The gap between the smaller LLMs and Llama3.1 70B-instruct is remarkable, indicating once again that larger LLMs retain much more knowledge. Word-length hints, instead, generally do not help the models, and actually harm performance in non-instructed models. For example, the F1 score of Llama3.1 8B drops from 37.35 without hints to 27.51 with hints, and EM similarly decreases from 34.16 to 25.72. Instructed models, instead, are not affected in this way: for them the hints lead to a small increase in all the metrics. Only for Llama3.1 70B-instruct can we observe some statistically significant improvement. This may suggest that constraints are beneficial only for models with stronger understanding capabilities.</p><p>In Figure <ref type="figure" target="#fig_6">6</ref>, we show how the performance of the Llama3.1 family models varies with the number of page views. We group examples into page-view intervals ([10^3, 10^4), [10^4, 10^5), [10^5, 10^6), and 10^6 or more), then compute the metrics on each group. Edit distance shows no significant trend, whereas EM and F1 increase on more visited pages for the 8B models, while the behaviour of the 70B model seems uncorrelated with the number of views. This suggests that the larger number of weights in the 70B model stores broader and deeper knowledge about world facts and entities, covering less popular ones as well, whereas smaller LLMs retain only the most popular factual knowledge seen during training.</p></div>
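<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three metrics defined in this section can be sketched per example as follows. This is a minimal illustration rather than the official evaluation code: the function names are hypothetical, words are assumed to be whitespace-separated, and no text normalisation (such as lowercasing) is applied, since none is specified here.</p>
<p>
```python
from collections import Counter


def edit_distance(pred: str, gold: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(gold) + 1))
    for i, p_char in enumerate(pred, start=1):
        current = [i]
        for j, g_char in enumerate(gold, start=1):
            cost = 0 if p_char == g_char else 1
            current.append(min(
                previous[j] + 1,         # delete a character of pred
                current[j - 1] + 1,      # insert a character of gold
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def exact_match(pred: str, gold: str) -> float:
    """100.0 if the prediction equals the ground truth, else 0.0."""
    return 100.0 if pred == gold else 0.0


def word_f1(pred: str, gold: str) -> float:
    """Word-overlap F1 between prediction and ground truth, in percent."""
    pred_words, gold_words = pred.split(), gold.split()
    gold_counts = Counter(gold_words)
    # Count each shared word at most as often as it occurs in both strings.
    overlap = sum(min(count, gold_counts[word])
                  for word, count in Counter(pred_words).items())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_words)
    recall = overlap / len(gold_words)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(round(word_f1("dicaprio", "leonardo dicaprio"), 2))
    print(edit_distance("dicaprio", "leonardo dicaprio"))
```
</p>
<p>Dataset-level scores would then be obtained by averaging each function over all examples; with the F1 example from the text, word_f1 gives precision 1 and recall 0.5.</p></div>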
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Limitations</head><p>Large Language Models have all been exposed to vast amounts of data. The clues proposed in this dataset were created from Wikipedia pages that were definitely seen by the LLMs during training. The clues also adhere closely to the page content, since they were generated from it. Indeed, one of the goals of the benchmark is to assess the models' memorization of facts that are likely to be well known to them. However, the proposed dataset is new, hence the clue-answer pairs themselves could not have been part of the training set of such LLMs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Data license and copyright issues</head><p>The data is released under the Apache-2.0 license.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Sketch of the clue-instruct method. Picture taken from [13].</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Page views distribution (the very few examples above one million visits were excluded).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Prompt task without hints.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Prompt task with word length hints.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Distribution of the examples across the categories.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: ED, EM and F1 score as a function of the number of page views for the Llama3.1 models.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>Performance on the task with and without word length hints.</figDesc><table><row><cell>Model</cell><cell>Hint</cell><cell>ED ↓</cell><cell>EM</cell><cell>F1</cell></row><row><cell>Llama3 8B</cell><cell>No</cell><cell>11.43</cell><cell>14.82</cell><cell>16.37</cell></row><row><cell>Llama3 8B</cell><cell>Yes</cell><cell>11.52</cell><cell>10.82</cell><cell>11.91</cell></row><row><cell>Llama3 8B-instruct</cell><cell>No</cell><cell>11.43</cell><cell>14.82</cell><cell>16.37</cell></row><row><cell>Llama3 8B-instruct</cell><cell>Yes</cell><cell>12.07</cell><cell>14.48</cell><cell>16.07</cell></row><row><cell>Llama3.1 8B</cell><cell>No</cell><cell>6.99</cell><cell>34.16</cell><cell>37.35</cell></row><row><cell>Llama3.1 8B</cell><cell>Yes</cell><cell>8.01</cell><cell>25.72</cell><cell>27.51</cell></row><row><cell>Llama3.1 8B-instruct</cell><cell>No</cell><cell>7.31</cell><cell>39.69</cell><cell>44.47</cell></row><row><cell>Llama3.1 8B-instruct</cell><cell>Yes</cell><cell>6.14</cell><cell>40.80</cell><cell>44.58</cell></row><row><cell>Llama3.1 70B-instruct</cell><cell>No</cell><cell>3.32</cell><cell>66.61</cell><cell>70.16</cell></row><row><cell>Llama3.1 70B-instruct</cell><cell>Yes</cell><cell>3.27</cell><cell>67.89</cell><cell>71.24</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Crossword puzzles and lexical memory</title>
		<author>
			<persName><forename type="first">R</forename><surname>Nickerson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Attention and performance VI</title>
				<imprint>
			<publisher>Routledge</publisher>
			<date type="published" when="1977">1977</date>
			<biblScope unit="page" from="699" to="718" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Crossword puzzles for chemistry education: learning goals beyond vocabulary</title>
		<author>
			<persName><forename type="first">E</forename><surname>Yuriev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Capuano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Short</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Chemistry education research and practice</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="532" to="554" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The use of crossword puzzles as a strategy to teach maritime English vocabulary</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sandiuc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Balagiu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific Bulletin &quot;Mircea cel Batran&quot; Naval Academy</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="236A" to="242" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rozière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hambro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Azhar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.13971</idno>
		<title level="m">Llama: Open and efficient foundation language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Achiam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Adler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Akkaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">L</forename><surname>Aleman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Almeida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Altenschmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Altman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Anadkat</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<title level="m">GPT-4 technical report</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">A multi-strategy approach to crossword clue answer retrieval and ranking</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tomlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pathak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ginsberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Klein</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2205.09665</idno>
		<title level="m">Automated crossword solving</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Die rätselrevolution: Automated german crossword solving</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rothenbacher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Klede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Eskofier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zanca</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLiC-it</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The webcrow french crossword solver</title>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Iaquinta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Stehlé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Simões</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Intelligent Technologies for Interactive Entertainment</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="193" to="209" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakraborty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Garain</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2406.09043</idno>
		<title level="m">Language models are crossword solvers</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Italian crossword generator: Enhancing education through interactive word puzzles</title>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Iaquinta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zanollo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Angelini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Clue-instruct: Text-based clue generation for educational crossword puzzles</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zeinalipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Kadali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.lrec-main.297" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
				<meeting>the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)<address><addrLine>Torino, Italia</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA and ICCL</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="3347" to="3356" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">CALAMITA: Challenge the Abilities of LAnguage Models in ITAlian</title>
		<author>
			<persName><forename type="first">G</forename><surname>Attanasio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Borazio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Francis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gili</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Musacchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nissim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Scalena</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)<address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024-12-06">December 4 - December 6, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
