<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Assessing the Asymmetric Behaviour of Italian Large Language Models across Different Syntactic Structures</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Elena</forename><forename type="middle">Sofia</forename><surname>Ruzzetti</surname></persName>
							<email>sofia.ruzzetti@uniroma2.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Federico</forename><surname>Ranaldi</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dario</forename><surname>Onorati</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Sapienza University of Rome</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Davide</forename><surname>Venditti</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Leonardo</forename><surname>Ranaldi</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">School of Informatics</orgName>
								<orgName type="institution">University of Edinburgh</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tommaso</forename><surname>Caselli</surname></persName>
							<affiliation key="aff3">
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fabio</forename><forename type="middle">Massimo</forename><surname>Zanzotto</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Rome Tor Vergata</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="department">Tenth Italian Conference on Computational Linguistics</orgName>
								<address>
									<addrLine>Dec 04 -06</addrLine>
									<postCode>2024</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Assessing the Asymmetric Behaviour of Italian Large Language Models across Different Syntactic Structures</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BE726947CE9089DCEEBE4AC4A3AEB29D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>LLMs</term>
					<term>Natural Language Understanding</term>
					<term>Syntax</term>
					<term>Attributions</term>
					<term>Localization</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>As LLMs become more proficient at solving tasks and generating sentences, we investigate the role that different syntactic structures play in models' performance on a battery of Natural Language Understanding tasks. We analyze the performance of five LLMs on semantically equivalent sentences characterized by different syntactic structures. To correctly solve the tasks, a model is implicitly required to correctly parse the sentence. We found that LLMs struggle with more complex syntactic structures, with an average drop of 16.13(±11.14) accuracy points on the Q&amp;A tasks.</p><p>Additionally, we propose a method based on token attribution to spot which areas of the LLMs encode syntactic knowledge, by identifying the model heads and layers responsible for the generation of a correct answer.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Large Language Models (LLMs) excel at understanding and generating text that appears human-written. It is thus intriguing to determine whether the models' text comprehension aligns in some way with human cognitive processes. A peculiarity of natural languages is that the same meaning can be encoded by multiple syntactic constructions. In Italian, for instance, the unmarked sentence follows a subject-verb-object (SVO) word order. However, inversions of this ordering do not necessarily lead to ungrammatical sentences. A case in point is represented by cleft sentences, i.e., sentences where the unmarked SVO sequence is violated. This serves a specific communicative function, namely emphasizing a component, and it is obtained by putting one element in a separate clause. In particular, Object Relative Clauses - where the emphasized element is the object of the sentence - are difficult to understand <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. For example, the sentence "Sono i professori che i presidi hanno elogiato alla riunione d'istituto" is more challenging for an Italian speaker than its semantically equivalent unmarked version "I presidi hanno elogiato i professori alla riunione d'istituto", where the SVO order is restored. Similarly, in Nominal Copular constructions, the inversion of the subject and the verb clause is documented to cause difficulties in understanding the meaning of the sentence <ref type="bibr" target="#b2">[3]</ref>.</p><p>Hence, syntax plays a crucial role not only in the general construction of language but also in native speakers' ability to comprehend sentences: a correct syntactic parse of a sentence is necessary to understand its meaning, and some syntactic structures are preferred over others. 
To what extent this preference is replicated by LLMs needs to be further explored.</p><p>If the model shows some knowledge of syntax, there should be an area of the model responsible for it, and we aim to detect that area. Extensive work has been devoted to understanding how Transformer-based architectures encode information, and one main objective is to localize which area of the model is responsible for a certain behavior <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. Although its use as an explanation mechanism is debated <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>, the attention mechanism is an interesting starting point given its central role in the Transformer architecture. While attention weights alone cannot be used as an explanation of a model's behavior <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>, an analysis that includes multiple components of the attention module has been shown to be beneficial for interpreting how a model processes an input sentence <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>.</p><p>Probing is a common method to detect the presence of linguistic properties in models <ref type="bibr" target="#b11">[12]</ref>. Probing consists of training an auxiliary classifier on top of a model's internal representation, which could be the output of a specific layer, to determine which linguistic properties the model has learned and encoded. In particular, it has been proposed to probe Transformer-based models to reconstruct syntactic representations, such as dependency parse trees, from their hidden states <ref type="bibr" target="#b12">[13]</ref>. Probing studies concluded that syntactic features are encoded in the middle layers <ref type="bibr" target="#b13">[14]</ref>. 
Correlation analysis of the weight matrices of monolingual BERT models confirmed the localization of syntactic information in the middle layers, showing that models trained on syntactically similar languages were similar in their middle layers <ref type="bibr" target="#b14">[15]</ref>. While an altered word order seems to play a crucial role in Transformer-based models' ability to process language <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref>, the correlation between LLMs' downstream performance and the encoding of syntax needs to be further explored.</p><p>In this paper, we first examine how syntax influences the LLMs' capability of understanding language. To this end, we analyze five open-weight LLMs - trained on the Italian language either from scratch or during a fine-tuning phase - and measure their performance on question-answering (Q&amp;A) tasks that require an implicit parsing of the roles of words in the sentence to provide the correct answer. We use an available set of Q&amp;A tasks designed for Italian speakers <ref type="bibr" target="#b0">[1]</ref> and propose similar template-based questions for two other datasets of Italian sentences characterized by different syntactic structures (Section 2.1). The results show that the models are affected by the different syntactic structures when solving the proposed tasks (Section 3.1): LLMs struggle when more complex syntactic structures are present, with an average drop in accuracy of 16.13(±11.14) points.</p><p>We then propose a method - based on norm-based attribution <ref type="bibr" target="#b9">[10]</ref> - to localize where syntactic knowledge is encoded by identifying the models' attention heads and layers that are responsible for the generation of a correct answer (Section 2.2). Although some differences can be observed across the five LLMs, we notice that syntactic information is mostly encoded in the middle and top layers of the models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methods and Data</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Question-answering Tasks to Assess LLMs' Syntactic Abilities</head><p>In this Section, we introduce the dataset we collected - largely extracted from the AcCompl-It task <ref type="bibr" target="#b17">[18]</ref> at EVALITA 2020 <ref type="bibr" target="#b18">[19]</ref> - to assess LLMs' syntactic abilities. The dataset is split into three subdatasets. Each subdataset is composed of pairs of sentences that share the same meaning but differ in word order. One sentence in each pair is characterized by a simpler structure, easier to understand also for humans, while the second is characterized by an alternative - but still correct - syntactic structure. We aim to understand whether a different structure can influence the model's performance in processing these similar sentences. For each subdataset, we define a Q&amp;A task to assess the LLMs' capability of understanding sentences when their syntactic structure makes them more complex. The Q&amp;A task requires the model to implicitly parse the role of the words in the sentence to get the correct answer: for this reason, we identify some important words that the model should attend to while generating the correct answer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Object Cleft constructions</head><p>The first subdataset is derived from Chesi and Canal <ref type="bibr" target="#b0">[1]</ref>: it contains 128 sentences characterized by Object Cleft (OC) constructions. The OC sentences in this dataset all share the same structure (see Table <ref type="table">1</ref>): the object and subject are words indicating either a person or a group of people, and the predicate describes an action that the subject performs towards the object. The object is always introduced as the first element of the sentence, in a left-peripheral position. The displacement of the object to the left-peripheral position makes the OC harder to understand <ref type="bibr" target="#b1">[2]</ref>. We compare those sentences with semantically equivalent ones that preserve the unmarked SVO word order.</p><p>To assess whether the difficulty humans have in understanding Object Cleft sentences is also registered in LLMs for the Italian language, we tested them on the same Q&amp;A task that Chesi and Canal <ref type="bibr" target="#b0">[1]</ref> proposed to human subjects. Given one OC sentence, the model is prompted with a yes-or-no question asking whether one of the participants (subject or object) was involved in the action described by the predicate (see Table <ref type="table">1</ref> for an example). The ability of a model to comprehend cleft sentences can be measured as the accuracy it obtains on this Q&amp;A task. 
Moreover, we perform the same Q&amp;A task on SVO sentences that we directly derived from the OC clauses in Chesi and Canal <ref type="bibr" target="#b0">[1]</ref>: in this case, we restored the SVO order and produced sentences that are semantically equivalent to the corresponding OC (see Table <ref type="table">1</ref>).</p><p>To correctly solve the task, the model must interpret which nouns in the sentence play the role of subject and object in order to answer the comprehension question. Hence, the model should implicitly parse the sentence and focus on those relevant words during the generation of the answer.</p></div>
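The comparison above boils down to measuring yes/no accuracy separately on the OC version and on the restored-SVO version of each pair. A minimal sketch in Python; the prediction lists below are invented toy data, not results from the paper:

```python
def accuracy(preds, golds):
    """Fraction of yes/no answers matching the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

# Toy yes/no answers for the same comprehension questions asked on the
# OC version and on the restored-SVO version of each sentence (invented).
golds     = ["si", "no", "si", "no"]
oc_preds  = ["si", "si", "si", "si"]
svo_preds = ["si", "no", "si", "si"]

# Accuracy points lost when the object is clefted.
drop = accuracy(svo_preds, golds) - accuracy(oc_preds, golds)
print(drop)
```

The same comparison, averaged over models, yields the accuracy gaps reported in Section 3.1.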
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The Copular Constructions</head><p>The second subdataset - which includes 64 pairs of sentences - is derived from a study involving Nominal Copular constructions (NC) from Greco et al. <ref type="bibr" target="#b19">[20]</ref>. The NC sentences are composed of two main constituents: a Determiner Phrase (𝐷𝑃 𝑠𝑢𝑏𝑗 ) and a Verbal Phrase (𝑉 𝑃 ). The verbal phrase contains a copula and another Determiner Phrase that acts as the nominal part of the predicate (𝐷𝑃 𝑝𝑟𝑒𝑑 ). In this dataset, the effect of the position of the subject with respect to the copular predicate is studied.</p><p>Table <ref type="table">1</ref>: Examples from the dataset under investigation. For each subdataset, an example is composed of two semantically equivalent sentences, which differ from the syntactic point of view, and a comprehension question on them.</p><p>Two semantically equivalent sentences are presented for each example. In one case, the sentence presents a canonical structure (NC canonical), with the subject (𝐷𝑃 𝑠𝑢𝑏𝑗 ) preceding the copular predicate. In the second case, an inverse structure (NC inverse) - with the subject following the predicate and the 𝐷𝑃 𝑝𝑟𝑒𝑑 introduced as the first element of the sentence - is presented (see Table <ref type="table">1</ref>). NC inverse sentences are syntactically correct but are harder for humans to understand than NC canonical ones <ref type="bibr" target="#b2">[3]</ref>.</p><p>The structure of the sentences in this dataset is enriched by two Prepositional Phrases, one in each Determiner Phrase. The 𝐷𝑃 𝑠𝑢𝑏𝑗 includes a subject accompanied by an article and augmented with a Prepositional Phrase (𝑃 𝑃 𝑠𝑢𝑏𝑗 ) that features a complement referring to the subject. Similarly, the 𝐷𝑃 𝑝𝑟𝑒𝑑 consists not only of a noun and an article but is further enriched with another Prepositional Phrase (𝑃 𝑃 𝑝𝑟𝑒𝑑 ). 
The 𝑃 𝑃 𝑝𝑟𝑒𝑑 gives more information about the relation between the subject noun and the nominal part of the predicate.</p><p>We exploit the different roles of the two Prepositional Phrases to design a Q&amp;A task on NC canonical and NC inverse sentences and hence assess whether a more complex syntactic structure can influence LLMs' capabilities. Given an NC sentence, the model is asked to correctly interpret the meaning of the sentence by examining its predicate: in particular, the model is asked to predict the additional information related to the nominal predicate - which is included in the 𝑃 𝑃 𝑝𝑟𝑒𝑑 - by answering a "wh-" question (in Italian, "Di cosa", see the example in Table <ref type="table">1</ref>). While both Prepositional Phrases answer a wh-question, only the 𝑃 𝑃 𝑝𝑟𝑒𝑑 is related to the predicate of the sentence; hence the model should predict the 𝑃 𝑃 𝑝𝑟𝑒𝑑 and ignore the 𝑃 𝑃 𝑠𝑢𝑏𝑗 .</p><p>To solve the proposed task and properly understand NC sentences, humans and LLMs are required to implicitly parse the sentence and accurately identify the nominal part of the verbal phrase and, in particular, the Prepositional Phrase that it contains (𝑃 𝑃 𝑝𝑟𝑒𝑑 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Minimal Verbal Structure with Inversion of Subject and Verb</head><p>Finally, the last subdataset we investigate is derived from Greco et al. <ref type="bibr" target="#b19">[20]</ref> and contains sentences characterized by a minimal verbal structure (MVP). MVP sentences are composed of a subject, a predicate and - for sentences with transitive predicates - an object (see Table <ref type="table">1</ref>). In this subdataset, the inversion of the subject and the verb is studied: the pairs of sentences under investigation have the same meaning (and lexicon), but in one case the subject of the sentence follows the predicate (MVP post) while in the other the subject precedes the predicate (MVP pre). The latter configuration is more common in Italian than the former: we aim to investigate whether this syntactic variation can alter the performance of an LLM.</p><p>We define, for each pair of sentences, a question that asks the model to predict which element of the sentence is involved in a certain action, either as the subject or the object. In particular, for sentences that contain intransitive verbs, the model is always asked to predict the subject of the sentence, while in transitive cases (like the one in Table <ref type="table">1</ref>) the model is asked to predict either the subject or the object of the sentence. For this subdataset, while the original data included both declarative and interrogative sentences, we retained only the declarative ones: we test the models on a total of 192 sentence pairs.</p><p>To answer those questions, the relevant words - both for humans and LLMs - are the nouns that play the role of subject, or object if present, in the sentences. In the next Section, we describe how to quantify whether a model is able to identify the role of those words during the generation of the answer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Models' accuracy on the different subdatasets for the proposed Q&amp;A tasks. Models tend to produce less accurate answers when exposed to rarer syntactic structures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Localizing Syntactic Knowledge via Attributions</head><p>Knowing which sentence structures are easier or harder for a model to analyze is not enough. Considering the black-box nature of these models, it is essential to understand which layers are responsible for encoding syntax, thus making the models more interpretable. We hypothesize that there is an area of the model responsible for correctly analyzing the sentence from the syntactic point of view in order to answer the Q&amp;A task. In fact, as discussed in the previous Section, to answer correctly, the model needs to implicitly parse the roles of the words in the sentence and identify the words relevant for the response (subjects and objects in the questions on OC, SVO and MVP sentences, and the correct prepositional phrases in NC sentences). Hence, knowledge of syntax is required to identify the relevant words and, consequently, generate the correct answer.</p><p>In generating the answer, we expect the model to "focus" on those relevant words. We can identify which tokens the model focuses on during generation by measuring token-to-token attributions <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b9">10]</ref>. In fact, token-to-token attribution methods quantify the influence of one token on the generation of another. We argue that the part of the model architecture most aware of syntax is the one that systematically focuses on relevant words when the model is prompted to answer syntax-related questions. Kobayashi et al. <ref type="bibr" target="#b9">[10]</ref> demonstrate that a mechanism - called norm-based attribution - that also incorporates the dense output layer of the attention mechanism is an accurate metric for token-to-token attribution. We will refer to the matrix 𝐴 ℎ (𝑋) - computed for the attention head ℎ on a sequence 𝑋 - as an attribution matrix. 
Some examples and a more detailed description of norm-based attribution can be found in the Appendix (A.1). The attribution matrix 𝐴 ℎ (𝑋), for each sequence of tokens 𝑋, describes where the model focuses during the generation of each token. By examining all the attention heads, some of them may focus more often on the subject, the object, or the prepositional phrase in the predicate while generating the answer for the task. In particular, for each attention head ℎ, we consider the attributions of the tokens in the answer produced by the model: for each correct answer generated by the model, we count how often the tokens with the largest attribution value are the relevant ones. This measures the accuracy of the attention head ℎ in recognizing the relevant words for generating the answer.</p><p>The more often an attention head focuses on the relevant words, the more syntactic knowledge the head encodes. For each downstream task presented in Section 2.1, we collect the accuracy of all heads at all layers. Then, we identify a head as "responsible" for generating the target word in a task if its score is significantly higher than the average score for that task. Specifically, we assume a Gaussian distribution of scores for each task and identify a head as responsible if the probability of observing a value at least as extreme as the one observed is below a threshold 𝛼 = 0.05. We also consider responsible all heads that obtain an excellent accuracy score (greater than 0.9) in focusing on the relevant words. With this procedure, for each layer and task, we can localize the responsible heads and determine where the model encodes syntax the most.</p></div>
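The selection procedure above can be sketched as follows. The per-head accuracy scores are toy values, and the Gaussian upper-tail test uses the standard library's NormalDist; this is our own minimal rendering of the criterion, not the paper's code:

```python
from statistics import NormalDist

def responsible_heads(scores, alpha=0.05, high_acc=0.9):
    """Flag attention heads as 'responsible' for a task.

    scores: dict mapping (layer, head) -> accuracy of that head in
    focusing on the relevant words (a value in [0, 1]).
    A head is responsible if its score is an upper-tail outlier under a
    Gaussian fit of all scores (p < alpha), or if it exceeds high_acc.
    """
    values = list(scores.values())
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    dist = NormalDist(mu, sigma or 1e-9)  # guard against zero variance
    flagged = {}
    for (layer, head), s in scores.items():
        p = 1.0 - dist.cdf(s)  # probability of a score at least this extreme
        if p < alpha or s > high_acc:
            flagged.setdefault(layer, []).append(head)
    return flagged

# Toy example: 2 layers x 4 heads, one clearly stronger head.
scores = {(l, h): 0.3 for l in range(2) for h in range(4)}
scores[(1, 2)] = 0.95
print(responsible_heads(scores))
```

Counting the flagged heads per layer gives the layer-level profiles shown in Figure 1.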
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Models and Prompting Method</head><p>We focus on instruction-tuned LLMs, all of comparable size, trained - either from scratch or only fine-tuned - on the Italian language. The models<ref type="foot" target="#foot_0">1</ref> under investigation are Qwen2-7B [22], LLaMAntino-3-ANITA-8B <ref type="bibr" target="#b21">[23]</ref>, Llama-2-7b <ref type="bibr" target="#b22">[24]</ref>, modello-italia <ref type="bibr" target="#b23">[25]</ref>, and Meta-Llama-3-8B <ref type="bibr" target="#b24">[26]</ref>. To solve the Q&amp;A tasks, we prompted each model with four different - but semantically equivalent - instructions. The complete list of prompts is in Appendix A.2. All prompts ask the model to solve the task zero-shot by answering with only one or two words. At most 128 tokens are generated, with greedy decoding. Once the generation is completed, a manual check of the responses is performed to obtain a simplified response to be compared with the gold answer. For the subsequent analysis, for each model and task, only the prompt that obtains the highest accuracy is considered. </p></div>
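The per-task selection of the best prompt can be sketched as below; `fake_ask` is a hypothetical stand-in for the actual LLM call, and the templates and items are abbreviated toy data, not the paper's materials:

```python
def best_prompt_accuracy(prompts, items, ask):
    """Return the prompt template with the highest accuracy on a task.

    items: (sentence, question, gold answer) triples.
    ask(prompt): returns the model's simplified answer as a string.
    """
    best = (None, -1.0)
    for template in prompts:
        correct = 0
        for sentence, question, gold in items:
            prompt = template.format(Item=sentence, Question=question)
            if ask(prompt).strip().lower() == gold.lower():
                correct += 1
        acc = correct / len(items)
        if acc > best[1]:
            best = (template, acc)
    return best

prompts = [
    'Data la frase "{Item}", rispondi: "{Question}" Rispondi SOLO con SI o NO.',
    'Considera la frase: "{Item}". {Question} Rispondi con SI o NO.',
]
items = [("frase 1", "domanda?", "SI"), ("frase 2", "domanda?", "NO")]

def fake_ask(prompt):  # stand-in for a real greedy-decoded model call
    return "SI"

template, acc = best_prompt_accuracy(prompts, items, fake_ask)
print(template, acc)
```

In the real pipeline, `ask` would wrap a greedy-decoded generation capped at 128 new tokens, followed by the manual normalization step described above.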
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments and Results</head><p>We first review the models' accuracy on the question-answering tasks and assess their capabilities when different syntactic structures are involved (Section 3.1). Then, we aim to spot the layers responsible for the correct syntactic understanding of the sentences (Section 3.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Models' accuracy on the question-answering tasks</head><p>Results on each of the subdatasets show that the syntactic structure of a sentence influences the models' understanding of that sentence (see Table <ref type="table">2</ref>): across all tasks, LLMs tend to obtain higher accuracy on sentences characterized by an unmarked syntactic structure.</p><p>On the first task, on OC and SVO sentences, the models tend to struggle, especially on the OC sentences. On OC sentences, in fact, some models do not perform far from the random baseline of 50% accuracy ("yes" and "no" answers are balanced). When comparing OC and SVO sentences, on average, model accuracy drops by 11.88(±3.84) points when the sentence presents the object in the left-peripheral position. This result aligns with the difficulty that humans encounter in understanding those sentences. The model that achieves the highest accuracy on OC sentences is LLaMAntino-3-ANITA-8B, with an accuracy of 76.56. It is important to note that its performance increases by 11.72 points with respect to the corresponding Meta-Llama-3-8B (which achieves an accuracy of 64.84): these results stress the effectiveness of the fine-tuning for the Italian language. Across the LLaMA-based models, LLaMAntino-3-ANITA-8B is still the best-performing model, followed by Meta-Llama-3-8B and, with a larger gap, by Llama-2-7b. The Qwen2-7B model is the best at answering the task on unmarked sentences.</p><p>On the NC sentences, patterns similar to those observed in the previous subdataset emerge. In particular, the NC inverse sentences are harder than the corresponding NC canonical ones: the average model accuracy is 81.88(±11.78) on NC canonical sentences, while the accuracy on NC inverse sentences is much lower, with an average value of 64.06(±28.26). Also in this case, the results demonstrate that models are affected by different syntactic patterns. 
The model that best captures the right information to extract is modello-italia-9b, on both NC inverse and NC canonical sentences. Although the performance of Llama-2-7b is rather low on NC inverse sentences (the model very often tends to generate the 𝑃 𝑃 𝑠𝑢𝑏𝑗 ), the remaining LLaMA-based models achieve better performance on both tasks. Finally, results on the MVP task further confirm the models' behavior observed on the previous two tasks: the inversion of the subject and verb positions causes the models to perform worse on MVP post sentences (68.23(±10.37) average accuracy) than on MVP pre sentences (87.5(±19.38) average accuracy). The average drop in performance is larger than in the previous subtasks: these results confirm that the inversion of the subject, even in basic sentences, can degrade models' understanding. Modello-italia-9b - probably due to the limited length of the input sentences - tends to replicate the input sentences. The other models solve the task with excellent accuracy on the MVP pre sentences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Localizing Layers responsible for Syntax</head><p>After quantifying the impact of different syntactic structures on model performance, we can identify the attention heads and layers of the models that most encode syntax. In Figure <ref type="figure" target="#fig_0">1</ref>, the number of responsible heads at each layer of the models is reported for the Q&amp;A task on NC sentences (the remaining tasks are in Appendix A.3). The general trend is that the layers most active in identifying relevant words during response generation lie between layers 19 and 25. Moreover, for all models, the layers we identify as responsible often handle multiple syntactic structures. The most noticeable result is that, for the same task, the same activation trend emerges across all sentences.</p><p>A large number of responsible attention heads appear around layers 19 to 27 in LLaMAntino-3-ANITA-8B and Meta-Llama-3-8B. Layer 21, in particular, is the layer with the most responsible heads in both the NC and MVP tasks. This layer is predominant also in the OC task, together with layers 19 and 22 (Figure <ref type="figure" target="#fig_2">3a</ref>). For Llama-2, we observe the same pattern, as the most active layers are between 18 and 25. For the Qwen2-7B model and modello-italia-9b, the active layers are higher in the architecture: from layer 18 to 24 for Qwen2-7B (with layer 23 being the most active in the NC and MVP tasks) and from layer 21 to 31 on NC and MVP sentences for modello-italia-9b. This finding suggests a different interpretation of LLMs' layers from that previously observed in BERT <ref type="bibr">[27]</ref>.</p><p>While we could expect some correlation between the accuracy on the task and the capability of the model to identify the correct word in the sentence, the responsible heads appear to be shared across different syntactic structures. 
Those results suggest that some layers, more than others, encode syntactic information about the role of a word in a sentence. Moreover, different models and architectures seem to share a rather similar organization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>In this paper, we have investigated how semantically equivalent Italian sentences are processed by LLMs when their syntax differs. We tested LLMs trained on Italian - or with Italian data in the pre-training material - and measured their capabilities on a battery of Q&amp;A tasks that rely on parsing the correct role of words in a sentence to be solved. Our findings confirm that cleft sentences and constructions with an inversion of subject and verb are difficult to understand for LLMs as well - similarly to what is observed for humans. Furthermore, using token-to-token attribution, we have systematically identified that syntactic information tends to be encoded in the middle and top layers of LLMs. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Appendix</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1. Token-to-token norm-based attribution</head><p>As described in Section 2.2, we adopt norm-based token-to-token attribution to spot the most relevant words during the generation of the answer by LLMs on our tasks. The norm-based approach is proposed in Kobayashi et al. <ref type="bibr" target="#b9">[10]</ref>. Given the query weight matrix 𝑊 ℎ 𝑄 , key weight matrix 𝑊 ℎ 𝐾 , value weight matrix 𝑊 ℎ 𝑉 and the attention output weight matrix 𝑊 ℎ 𝑂 of an attention head ℎ, the norm-based attribution for each token of a sequence 𝑋 is calculated as the product of the attention weights and the norm of the projected token representation 𝑋𝑊 ℎ 𝑉 𝑊 ℎ 𝑂 (see the original work, Kobayashi et al. <ref type="bibr" target="#b9">[10]</ref>, for a detailed discussion).</p><formula xml:id="formula_0">𝐴 ℎ (𝑋) := 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 (𝑋𝑊 ℎ 𝑄 (𝑋𝑊 ℎ 𝐾 ) ⊤ / √ 𝑑 𝑣 ) • ‖𝑋𝑊 ℎ 𝑉 𝑊 ℎ 𝑂 ‖</formula><p>For our analysis, we consider all rows relative to a token in the answer generated by the model. If a model understands the syntactic relationship between words, it must focus on the relevant words during the generation; in particular, the token with the highest attribution should belong to the relevant word. For example, in Figure <ref type="figure" target="#fig_1">2</ref>, the attribution of Meta-Llama-3-8B on one NC sentence is presented. During the generation of the answer (whose tokens index the rows in the figure), the most attributed tokens belong to the relevant words in the input (whose tokens index the columns).</p></div>
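As a rough illustration, the attribution matrix above can be computed with NumPy as follows. This is a sketch for a single head with random toy weights, without causal masking or bias terms:

```python
import numpy as np

def norm_based_attribution(X, Wq, Wk, Wv, Wo):
    """Norm-based token-to-token attribution for one attention head
    (after Kobayashi et al.): attention weights scaled by the norm of
    each source token's value-output projection ||x W_V W_O||."""
    d = Wk.shape[1]  # per-head dimension used in the softmax scaling
    logits = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over source tokens
    norms = np.linalg.norm(X @ Wv @ Wo, axis=-1)   # one norm per source token
    return attn * norms                            # attribution matrix A_h(X)

# Toy setup: 5 tokens, model dim 16, head dim 4 (random, for illustration).
rng = np.random.default_rng(0)
n, d_model, d_head = 5, 16, 4
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
Wo = rng.standard_normal((d_head, d_model))
A = norm_based_attribution(X, Wq, Wk, Wv, Wo)
print(A.shape)  # one attribution row per generated token
```

In our analysis, only the rows of A corresponding to the answer tokens are inspected, and the column with the largest value is compared against the relevant words.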
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. Prompts to Instruction-Tuned LLMs for the Italian Language</head><p>Each model has been prompted with four different prompts for each Q&amp;A task (as described in Section 2.1). Here is the complete list of the prompt templates used in our experiments: in each template, {Item} is replaced with the sentence to be analyzed and {Question} with the corresponding comprehension question.</p><p>OC and SVO sentences:</p><p>• Data la frase "{Item}", rispondi alla seguente domanda:"{Question}" Rispondi SOLAMENTE con SI o NO. • Considera la frase: "{Item}". Rispondi con 'SI' o 'NO' alla seguente domanda:"{Question}" • Considera la frase: "{Item}". {Question} Rispondi brevemente, SOLAMENTE con 'SI' o 'NO'. • Considera la frase: "{Item}". Rispondi con 'SI' o 'NO'. {Question}</p><p>NC sentences:</p><p>• Data la frase "{Item}", rispondi alla seguente domanda:"{Question}" Rispondi in due parole. • Considera la frase: "{Item}". Rispondi solo con le due parole che rispondono alla seguente domanda:"{Question}" • Considera la frase: "{Item}". {Question} Rispondi SOLO con le due parole che rispondono alla seguente domanda. • Considera la frase: "{Item}". Rispondi solo con due parole. {Question}</p><p>MVP sentences:</p><p>• Data la frase "{Item}", rispondi alla seguente domanda:"{Question}" Rispondi solo con un nome. • Considera la frase: "{Item}". Rispondi solo con il nome che risponde alla seguente domanda:"{Question}" • Considera la frase: "{Item}". {Question} Rispondi SOLO con il nome che risponde alla domanda. • Considera la frase: "{Item}". Rispondi solo con un nome. {Question}</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.3. Responsible Attention Heads per Layer in each subtask</head><p>In Figure <ref type="figure" target="#fig_2">3</ref>, the number of responsible attention heads per layer is depicted. As described in Section 3.2, some layers contain a high number of attention heads responsible for the generation. In particular, the layers around layer 20 seem to focus more than the others on the words relevant for the correct generation of the answer. Since a correct generation implies that the model understands the role of the different words, we claim that those layers encode some kind of syntactic information.</p><p>It is worth noting that similar layers are responsible across the different subtasks, in particular for the LLaMA-based models and for the Qwen-2-7B model.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Number of responsible heads per layer in the Q&amp;A task defined over NC sentences. The higher the number of responsible heads, the more the layer as a whole focuses on syntax.</figDesc><graphic coords="5,89.29,84.19,416.69,128.13" type="bitmap" /></figure>
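The per-layer counting described above can be sketched as follows. The precise responsibility criterion is given in Section 3.2; purely for illustration, this sketch assumes a head counts as responsible when its highest-attributed input token belongs to the relevant words.

```python
import numpy as np

def responsible_heads_per_layer(attributions, relevant_token_ids):
    """Count, for each layer, the attention heads whose peak attribution
    falls on a relevant input token.

    attributions: array of shape (num_layers, num_heads, num_input_tokens),
        e.g. norm-based attributions averaged over the generated answer tokens.
    relevant_token_ids: indices of the input tokens of the relevant words.
    """
    # index of the most attributed input token for each (layer, head)
    top_tokens = attributions.argmax(axis=-1)
    # boolean mask: did the head peak on a relevant token?
    responsible = np.isin(top_tokens, relevant_token_ids)
    # number of responsible heads in each layer
    return responsible.sum(axis=-1)
```

Plotting the returned vector against the layer index reproduces the kind of per-layer profile shown in Figures 1 and 3.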
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Norm-based attribution matrix of Meta-Llama-3-8B on one example of the task presented in Section 2.1 on NC sentences.</figDesc><graphic coords="9,89.29,260.67,416.69,212.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Number of responsible heads per layer in the Q&amp;A task defined over two tasks: OC and SVO sentences (3a) and MVP sentences (3b).</figDesc><graphic coords="10,89.29,386.29,416.69,131.05" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">All model parameters are available via Hugging Face's transformers library <ref type="bibr" target="#b20">[21]</ref>.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Person features and lexical restrictions in italian clefts</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chesi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Psychology</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">2105</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Individual differences in syntactic processing: The role of working memory</title>
		<author>
			<persName><forename type="first">J</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Just</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of memory and language</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="580" to="602" />
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Asymmetries in extraction from nominal copular sentences: a challenging case study for nlp tools</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lorusso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Greco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chesi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth Italian Conference on Computational Linguistics CLiC-it 2019</title>
				<meeting>the Sixth Italian Conference on Computational Linguistics CLiC-it 2019<address><addrLine>Bari</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR</publisher>
			<date type="published" when="2019">November 13-15, 2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A primer in BERTology: What we know about how BERT works</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kovaleva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rumshisky</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00349</idno>
		<ptr target="https://aclanthology.org/2020.tacl-1.54" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="842" to="866" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">A primer on the inner workings of transformer-based language models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ferrando</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bisazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Costa-Jussà</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2405.00208" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Attention is not Explanation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Wallace</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1357</idno>
		<ptr target="https://aclanthology.org/N19-1357" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="3543" to="3556" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Attention is not not explanation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wiegreffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pinter</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1002</idno>
		<ptr target="https://aclanthology.org/D19-1002" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="11" to="20" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">What does BERT look at? an analysis of BERT&apos;s attention</title>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W19-4828</idno>
		<ptr target="https://aclanthology.org/W19-4828" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Linzen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Chrupała</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Belinkov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Hupkes</surname></persName>
		</editor>
		<meeting>the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="276" to="286" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Is attention interpretable?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Serrano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1282</idno>
		<ptr target="https://aclanthology.org/P19-1282" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Traum</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</editor>
		<meeting>the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2931" to="2951" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Attention is not only a weight: Analyzing transformers with vector norms</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kobayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kuribayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yokoi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.574</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.574" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="7057" to="7075" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Incorporating Residual and Normalization Layers into Analysis of Masked Language Models</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kobayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kuribayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yokoi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.emnlp-main.373</idno>
		<ptr target="https://aclanthology.org/2021.emnlp-main.373" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and</title>
				<editor>
			<persName><forename type="first">M.-F</forename><surname>Moens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">W.-T</forename><surname>Yih</surname></persName>
		</editor>
		<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and<address><addrLine>Punta Cana, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="4547" to="4568" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Analysis methods in neural language processing: A survey</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Belinkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Glass</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00254</idno>
		<ptr target="https://aclanthology.org/Q19-1004" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="49" to="72" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A structural probe for finding syntax in word representations</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hewitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1419</idno>
		<ptr target="https://aclanthology.org/N19-1419" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4129" to="4138" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">What does BERT learn about the structure of language?</title>
		<author>
			<persName><forename type="first">G</forename><surname>Jawahar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Seddah</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1356</idno>
		<ptr target="https://aclanthology.org/P19-1356" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Traum</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</editor>
		<meeting>the 57th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3651" to="3657" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Exploring linguistic properties of monolingual BERTs with typological classification among languages</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Ruzzetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ranaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Logozzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mastromattei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ranaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Zanzotto</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.findings-emnlp.963</idno>
		<ptr target="https://aclanthology.org/2023.findings-emnlp.963" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="14447" to="14461" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Masked language modeling and the distributional hypothesis: Order word matters pretraining for little</title>
		<author>
			<persName><forename type="first">K</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hupkes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pineau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.emnlp-main.230</idno>
		<ptr target="https://aclanthology.org/2021.emnlp-main.230" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and</title>
				<editor>
			<persName><forename type="first">M.-F</forename><surname>Moens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">W.-T</forename><surname>Yih</surname></persName>
		</editor>
		<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and<address><addrLine>Punta Cana, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2888" to="2913" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Word order does matter and shuffled language models know it</title>
		<author>
			<persName><forename type="first">M</forename><surname>Abdou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ravishankar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kulmizev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Søgaard</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.acl-long.476</idno>
		<ptr target="https://aclanthology.org/2022.acl-long.476" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<editor>
			<persName><forename type="first">S</forename><surname>Muresan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Villavicencio</surname></persName>
		</editor>
		<meeting>the 60th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="6907" to="6919" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Accompl-it@ evalita2020: Overview of the acceptability &amp; complexity evaluation task for italian</title>
		<author>
			<persName><forename type="first">D</forename><surname>Brunato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chesi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Montemagni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Venturi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zamparelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR WORK-SHOP PROCEEDINGS</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>CEUR-WS. org</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title/>
		<author>
			<persName><surname>Evalita</surname></persName>
		</author>
		<ptr target="https://www.evalita.it/campaigns/evalita-2020/" />
		<imprint>
			<date type="published" when="2020">2020. 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Asymmetries in nominal copular sentences: Psycholinguistic evidence in favor of the raising analysis</title>
		<author>
			<persName><forename type="first">M</forename><surname>Greco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lorusso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chesi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lingua</title>
		<imprint>
			<biblScope unit="volume">245</biblScope>
			<biblScope unit="page">102926</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">HuggingFace&apos;s Transformers: Stateof-the-art Natural Language Processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Brew</surname></persName>
		</author>
		<idno>ArXiv abs/1910.0</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Polignano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2405.07101</idno>
		<title level="m">Advanced natural-based interaction for the italian language: Llamantino-3-anita</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Albert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Almahairi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Babaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bashlykov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Batra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bhargava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhosale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bikel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Blecher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Ferrer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Cucurull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Esiobu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Fuller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Goswami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hartshorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hosseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Inan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kardas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kerkez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khabsa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kloumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korenev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Koura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liskovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mihaylov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Molybog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poulton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Reizenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rungta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Saladi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schelten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">E</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Taylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">X</forename><surname>Kuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Zarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kambadur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stojnic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Edunov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Scialom</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2307.09288" />
		<idno type="arXiv">arXiv:2307.09288</idno>
		<title level="m">Llama 2: Open foundation and fine-tuned chat models</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<ptr target="https://www.igenius.ai/it/language-models" />
		<title level="m">iGenius | Large Language Model - iGenius</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<orgName type="collaboration">AI@Meta</orgName>
		</author>
		<title level="m">Llama 3 model card</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
