<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="it">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploring Italian sentence embeddings properties through multi-tasking</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vivi</forename><surname>Nastase</surname></persName>
							<email>nastase@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giuseppe</forename><surname>Samo</surname></persName>
							<email>giuseppe.samo@idiap.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chunyang</forename><surname>Jiang</surname></persName>
							<email>chunyang.jiang42@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">University of Geneva</orgName>
								<address>
									<settlement>Geneva</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paola</forename><surname>Merlo</surname></persName>
							<email>paola.merlo@unige.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">University of Geneva</orgName>
								<address>
									<settlement>Geneva</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department" key="dep1">10th</orgName>
								<orgName type="department" key="dep2">Italian Conference on Computational Linguistics</orgName>
								<address>
									<addrLine>Dec 04 -06</addrLine>
									<postCode>2024</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploring Italian sentence embeddings properties through multi-tasking</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">A3628BF77B11EF2C93F3A4F552D30257</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>synthetic structured data</term>
					<term>multi-task</term>
					<term>diagnostic studies of deep learning models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We investigate to what degree existing LLMs encode abstract linguistic information in Italian in a multi-task setting. We exploit curated synthetic data on a large scale - several Blackbird Language Matrices (BLM) problems in Italian - and use them to study how sentence representations built using pre-trained language models encode specific syntactic and semantic information. We use a two-level architecture to model separately a compression of the sentence embeddings into a representation that contains relevant information for a task, and a BLM task. We then investigate whether we can obtain compressed sentence representations that encode syntactic and semantic information relevant to several BLM tasks. While we expected that the sentence structure - in terms of sequences of phrases/chunks - and chunk properties could be shared across tasks, performance and error analysis show that the clues for the different tasks are encoded in different manners in the sentence embeddings, suggesting that abstract linguistic notions such as constituents or thematic roles do not seem to be present in the pretrained sentence embeddings.</p><p>The goal of this work is to investigate to what extent current LLMs learn abstract linguistic representations in multi-task configurations. Using curated synthetic data on a large scale from several BLM problems in Italian, we study how sentence representations built by pre-trained language models encode semantic and syntactic information. We used a two-level architecture to model separately, on the one hand, the compression of the input sentence embeddings into a representation that contains information relevant to the BLM tasks and, on the other hand, the BLM task itself. We then tested whether it is possible to obtain compressed sentence representations that encode syntactic and semantic information relevant to the different BLM tasks.
Contrary to the prediction that sentence structure - in terms of sequences of phrases/chunks - and chunk properties could be shared across tasks, the results and the error analysis show that the clues for the different tasks are encoded differently in the sentence embeddings. This result suggests that abstract linguistic notions such as constituents or thematic roles do not seem to be present in them.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="it">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Driven by increasing computational scale and progress in deep learning techniques, NLP models can rival human capabilities on established benchmarks. New benchmarks that capture deeper levels of language understanding must therefore be created and analysed <ref type="bibr" target="#b0">[1]</ref>.</p><p>Blackbird Language Matrices (BLM) <ref type="bibr" target="#b1">[2]</ref> is a recent task inspired by visual tests of analytic intelligence (Raven Progressive Matrices/RPMs, <ref type="bibr" target="#b2">[3]</ref>). The BLM tasks have cast light on whether the correct predictions in previously studied linguistic problems, e.g. number agreement or verb alternations, stem from sentence embeddings that encode deeper linguistic information, such as syntactic structure and semantic properties of phrases <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>. We found that higher-level information - syntactic structure and argument structure - can be assembled from the information encoded in the sentence embeddings. This, however, may not be due to a deeper understanding of such information encoded by LLMs, but rather to useful surface indicators <ref type="bibr" target="#b6">[7]</ref>.</p><p>In this paper, we adopt BLMs to investigate whether current pretrained models encode abstract linguistic notions, such as constituents, and are able to do so in a manner that comprises both functional elements, such as pronouns and demonstratives, and lexical elements, such as nominal constituents.</p><p>We concentrate on Italian, and study several grammatical problems whose solutions can theoretically help each other, in a multi-task setting. We adopt a two-level architecture developed specifically to model what we know about how humans solve puzzles similar to BLMs <ref type="bibr" target="#b7">[8]</ref>.
Level 1 aims to obtain compressed sentence representations that capture information about constituents and their properties; level 2 uses the compressed sentence representations to solve a BLM problem. This architecture provides a tool to study how LLMs encode different types of syntactic and semantic information.</p><p>We make two contributions: (i) an initial core BLM dataset for Italian that covers linguistic problems of a different nature; (ii) single and multi-task experiments that provide new insights into the information encoded by LLMs. The datasets are available at https://www.idiap.ch/dataset/(blm-agri|blm-causi|blm-odi) and the code at https://github.com/CLCL-Geneva/BLM-SNFDisentangling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Multi-task learning has long been used to improve NLP systems' performance by exploiting knowledge shared across multiple tasks <ref type="bibr" target="#b8">[9]</ref>.</p><p>Multi-task learning architectures include parallel, hierarchical, and modular designs <ref type="bibr" target="#b9">[10]</ref>. Parallel architectures share intermediate layers across tasks, conducive to efficient knowledge transfer <ref type="bibr" target="#b10">[11]</ref>. Hierarchical architectures capture task dependencies by layering task-specific modules on shared bases. Modular approaches selectively share components among tasks to balance generalisation and task-specific optimisation <ref type="bibr" target="#b11">[12]</ref>. These training strategies are not mutually exclusive and can be combined.</p><p>Multi-task learning can be used efficiently in resource-constrained environments, to counter data scarcity and overfitting: aggregating training data and sharing parameters across related tasks acts as a form of data augmentation <ref type="bibr" target="#b12">[13]</ref>.</p><p>Effective multi-task learning depends on the relatedness of the tasks involved. Tasks that are similar or have similar objectives tend to benefit more from shared representations. This observation has been used in various NLP tasks, including named entity recognition <ref type="bibr" target="#b13">[14]</ref>, text generation <ref type="bibr" target="#b14">[15]</ref>, and machine translation <ref type="bibr" target="#b15">[16]</ref>, among others. 
Selecting related tasks that contribute positively to the shared model's training is important and remains an active area of research <ref type="bibr" target="#b8">[9]</ref>.</p><p>Pretrained large language models exhibit general-purpose abilities and knowledge, achieving high results with little or no fine-tuning on downstream tasks <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18]</ref>. We can then regard these language models as the result of "multi-task" learning, and our aim here is to test whether sentence embeddings obtained from these models encode syntactic and semantic information consistently, such that different BLM problems that rely on similar linguistic information draw on the same clues from these representations. In particular, we will use BLM tasks on subject-verb agreement - which relies on chunk structure and the chunks' grammatical number properties - and on verb alternations - which rely on chunk structure and the chunks' semantic role properties - to test whether chunk structure is encoded in a manner that allows it to be shared by the two tasks. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The BLM task and the BLM Italian datasets</head><p>Raven's progressive matrices are multiple-choice completion IQ tests, whose solution requires discovering the underlying generative rules of a sequence of images <ref type="bibr" target="#b2">[3]</ref>.</p><p>A similar task has been developed for linguistic problems, called Blackbird Language Matrices (BLMs) <ref type="bibr" target="#b1">[2]</ref>, as given in Figure <ref type="figure">1</ref>, which illustrates the template of a BLM agreement matrix. A BLM comprises a context and an answer set. The context is a sequence of sentences generated following the relevant rules of the linguistic phenomenon under investigation, thereby implicitly illustrating these grammatical properties. This sequence also follows some extra-linguistic progression rules. Each context is paired with a set of candidate answers. The answer sets contain minimally contrastive examples built by corrupting some of the generating rules.</p><p>The BLM Italian datasets consist of BLMs focused on the property of subject-verb agreement and two transitive-intransitive alternations: the change-of-state alternation and the object-drop alternation.</p></div>
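To make the task format concrete, the sketch below shows how a solver can pick an answer from a contrastive candidate set: it predicts an embedding for the missing sentence and selects the candidate closest to it by cosine similarity. This is illustrative code under assumed names and sizes (`choose_answer`, 8-dimensional vectors), not the released implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def choose_answer(predicted, candidates):
    """Index of the candidate embedding most similar to the predicted one."""
    scores = [cosine(predicted, c) for c in candidates]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
candidates = [rng.normal(size=8) for _ in range(6)]    # 6 contrastive answers
predicted = candidates[2] + 0.01 * rng.normal(size=8)  # prediction near answer 2
assert choose_answer(predicted, candidates) == 2
```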
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">BLM-AgrI - subject-verb agreement in Italian</head><p>The BLM-AgrI dataset was created by having a native speaker, one of the authors, manually translate the seed French sentences <ref type="bibr" target="#b3">[4]</ref> into Italian, and then generating the full dataset following the same process of lexical augmentation and sentence shuffling among instances described in <ref type="bibr" target="#b3">[4]</ref>. The internal nominal structure of the two languages is very similar, so the translations are almost parallel. An illustrative, simplified example for Italian is provided in Figure <ref type="figure">7</ref>, in the appendix. The dataset comprises three subsets of increasing lexical complexity (called Type I, Type II and Type III) to test the ability of the system to handle item novelty.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">BLM-CausI and BLM-OdI</head><p>While BLM-AgrI tests information about a formal grammatical property, agreement, the Causative (Caus) and Object-drop (Od) alternation datasets test a lexical semantic property of verbs, namely whether or not they can enter a causative alternation. Caus represents the causative/inchoative alternation, where the object of the transitive verb bears the same semantic role (Patient) as the subject of the intransitive verb (L'artista ha aperto la finestra/La finestra si è aperta 'The artist opened the window'/'The window opened'). The transitive form of the verb has a causative meaning. In contrast, the subject in Od bears the same semantic role (Agent) in both the transitive and intransitive forms (L'artista dipingeva la finestra/L'artista dipingeva 'the artist painted the window'/'the artist painted') and the verb does not have a causative meaning <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>BLM-CausI context and answers</head><p>The context set of the verb alternation varies depending on the presence of one or two arguments and their attributes (agents, Ag; patients, Pat) and on the active (Akt) or passive (Pass) voice of the verb. The non-linguistic factor that structures the sequence is an alternation every two items between a prepositional phrase introduced by any preposition (e.g., in pochi secondi, P-NP) and a PP introduced by the agentive da-NP (e.g., dall'artista, da-Ag/da-Pat).</p><p>The answer set is composed of one correct answer and contrastive wrong answers, all formed by the same four elements: a verb, two nominal constituents and a prepositional phrase. Figure <ref type="figure" target="#fig_1">2</ref> shows the template.<ref type="foot" target="#foot_0">1</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>BLM-OdI Context and Answers</head><p>The BLM for Od is the same as for Caus, but here the passive voice serves as a confounding element, and one of the contrastive answers for Caus is, in fact, the correct answer here.</p><p>The template is also in Figure <ref type="figure" target="#fig_1">2</ref>. Due to the asymmetry between the two classes of verbs, the contexts of the BLMs minimally differ in the intransitive followed by P-NP (sentence 7). The correct answer also varies across the two groups, although in both cases it is an intransitive form with a da-NP. Examples are shown in the appendix: Figure <ref type="figure">8</ref> illustrates the data with the Italian change-of-state verb chiudere 'close'.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lexicalisation</head><p>In line with previous work on BLMs, each dataset also contains a varying amount of lexicalisation. In type I, the lexical material of the sentences within a single context does not change; in type II, only the verb remains the same; in type III, all words can change (Figure <ref type="figure" target="#fig_8">9</ref>, in the appendix).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Dataset statistics</head><p>Each subset is split 90:20:10 into train:dev:test subsets. The training and test sets are disjoint (the agreement data is split based on the correct answer, the alternation data based on the verb). Agreement has 230 test instances for type I and 4121 for types II and III. The verb alternations have 240 test instances for all subsets. We randomly sample a number of training instances, depending on the experimental set-up.</p></div>
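The verb-based disjoint split mentioned above can be sketched as follows. `split_by_verb` is a hypothetical helper, not the released code, but it shows the key property: no verb used in training appears at test time.

```python
import random

def split_by_verb(instances, test_ratio=0.1, seed=0):
    """instances: list of (verb, blm_instance) pairs.
    Splits by verb, so train and test share no verbs."""
    verbs = sorted({v for v, _ in instances})
    rng = random.Random(seed)
    rng.shuffle(verbs)
    n_test = max(1, int(len(verbs) * test_ratio))
    test_verbs = set(verbs[:n_test])
    train = [x for x in instances if x[0] not in test_verbs]
    test = [x for x in instances if x[0] in test_verbs]
    return train, test

# Toy data: (verb, instance-id) pairs
data = [("aprire", 1), ("chiudere", 2), ("dipingere", 3), ("mangiare", 4)]
train, test = split_by_verb(data, test_ratio=0.25)
assert not {v for v, _ in train} & {v for v, _ in test}  # disjoint verbs
```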
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Multi-task representations</head><p>Sentence embeddings encode much information from the input sentence - lexical, syntactic, semantic, and possibly other types of information. Previous experiments have shown that sentence embeddings can be compressed into very small representations (vectors of size 5) that encode information about the structure of the sentence in terms of chunks and their properties, such that they contribute to finding the sequence patterns in BLMs <ref type="bibr" target="#b5">[6]</ref>. In this work, we investigate whether several BLM tasks can share the same structural information from a sentence embedding. Towards this end, we built a multi-task version of a two-level system, illustrated in Figure <ref type="figure" target="#fig_2">3</ref>. In this system, one level processes individual sentences and learns to compress them into small vectors that retain information pertinent to a task, and the other level uses the compressed sentence representations to find patterns across an input sequence to solve a BLM task. The multi-task variation consists of a single shared sentence-level component and multiple task components, one for each of the BLM tasks.</p><p>The BLM problems encode a linguistic phenomenon through data that has structure on multiple levels - within sentences, and across a sequence of sentences. We can exploit this structure to develop an indirectly supervised approach to discover and use these different levels of structure. We thus model the solving of a BLM task as a two-step process: (i) compress individual sentences into a representation that emphasizes the sentence structure relevant to the BLM problem (e.g. chunks and their grammatical number for the subject-verb agreement task); (ii) use the compressed representations to detect the sequence-level pattern and solve the BLM task. 
This two-step process has been shown to be used by people solving visual intelligence tests <ref type="bibr" target="#b20">[21]</ref>. In our case, this setup allows us to investigate whether the sentence level can be guided to learn shared information relevant to the different linguistic tasks described in section 3.</p><p>We implement this approach in the two-level intertwined architecture illustrated in Figure <ref type="figure" target="#fig_2">3</ref>, and described in detail elsewhere <ref type="bibr" target="#b5">[6]</ref>. The data is pre-encoded with Electra <ref type="bibr" target="#b17">[18]</ref>.<ref type="foot" target="#foot_1">2</ref> The sentence representation is provided by the embedding of the [CLS] token.<ref type="foot" target="#foot_2">3</ref> We chose Electra because of its stronger sentence-level supervision signal, which leads to higher results when testing the encoding of structural information compared to BERT, RoBERTa, and models tuned for semantic similarity <ref type="bibr" target="#b5">[6]</ref>.</p><p>The two levels are learned together. The input is a BLM instance, which is processed on the fly to produce training instances for the sentence level for each sentence 𝑖𝑛 𝑘 in the input sequence 𝑆. The compressed sentence representations on the latent layer 𝑧𝑖𝑛 𝑘 are stacked and passed as input to the task level, which produces a sentence representation 𝑎𝑛𝑠𝑤 as output; this is compared to the answer set 𝐴 of the respective BLM instance.</p><p>The sentence level uses a variational encoder-decoder architecture to learn to compress, on the latent layer, a representation that captures relevant structural information. We guide the system towards this representation by constructing a contrastive set of candidates for comparison with the reconstructed input. 
The correct output (out+) is the same as the input (in), and a selection of other sentences from the input sequence serves as the contrastive negative outputs Out− = {out−_i, i = 1, ..., Nnegs}, with Nnegs = 7 (note that the sentences of an input sequence all follow patterns different from each other - Figures <ref type="figure" target="#fig_1">1 and 2</ref>). We use a max-margin loss function to take advantage of the contrastive candidates, where în is the input sentence reconstructed from the sampled latent vector z_in:</p><formula xml:id="formula_0">loss_sent(in) = maxM(în, out+, Out−) + KL(z_in || 𝒩(0,1))

maxM(în, out+, Out−) = max(0, 1 − cos(în, out+) + (1/Nnegs) ∑_{out−_i ∈ Out−} cos(în, out−_i))</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The loss at the task level for input sequence S is computed in a similar manner for the constructed answer answ, but relative to the answer set A and the correct answer ac of the task:</p><formula xml:id="formula_1">loss_task(S) = maxM(answ, ac, A ∖ {ac}) + KL_seq(z_S || 𝒩(0,1))</formula><p>The loss of the two-level system is:</p><formula xml:id="formula_2">loss(S) = ∑_{in_k ∈ S} loss_sent(in_k) + loss_task(S)</formula><p>The input batches are shuffled to alternate between tasks during training, and to avoid getting stuck in a local minimum for one of the tasks.</p></div>
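The max-margin term of loss_sent can be written out directly from the formula: reward similarity to the positive reconstruction target and penalise the average similarity to the Nnegs contrastive negatives. This is an illustrative re-implementation, not the authors' code.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_margin(in_hat, out_pos, out_negs):
    """max(0, 1 - cos(in_hat, out+) + mean_i cos(in_hat, out-_i))"""
    neg_term = float(np.mean([cos(in_hat, n) for n in out_negs]))
    return max(0.0, 1.0 - cos(in_hat, out_pos) + neg_term)

rng = np.random.default_rng(0)
pos = rng.normal(size=16)
negs = [rng.normal(size=16) for _ in range(7)]  # Nnegs = 7 in the paper
# A reconstruction aligned with the target scores lower than a misaligned one:
assert max_margin(pos, pos, negs) < max_margin(-pos, pos, negs)
```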
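The overall two-level forward pass can be sketched with plain matrix operations. The shapes (768-dimensional sentence embeddings, size-5 latents, 7-sentence contexts) come from the text; the linear maps and names below are illustrative stand-ins for the variational encoder-decoder and the task-level network, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Z, SEQ = 768, 5, 7          # embedding size, latent size, context length
W_enc = rng.normal(scale=0.02, size=(D, Z))         # sentence-level compressor
W_task = rng.normal(scale=0.02, size=(SEQ * Z, D))  # task-level predictor

def solve(context):
    """context: (SEQ, D) sentence embeddings -> predicted (D,) answer embedding."""
    latents = context @ W_enc      # (SEQ, Z): one small vector per sentence
    stacked = latents.reshape(-1)  # (SEQ*Z,): stacked input to the task level
    return stacked @ W_task        # (D,): guess for the missing sentence

context = rng.normal(size=(SEQ, D))
assert solve(context).shape == (D,)
```

The predicted vector would then be compared against the answer-set embeddings, as in the candidate-selection step.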
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Multi-task results</head><p>Previously published work from our group and current ongoing work have benchmarked the problems generated by some of these datasets <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. This work has shown that information about the syntactic phrases in a sentence and their properties can be obtained from sentence embeddings, and that this information is helpful in solving the BLM tasks. We studied these tasks separately; here we investigate whether such structure is encoded in the sentence embeddings, or whether it is assembled based on shallower patterns within the sentence representations. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>We expect that if the multi-task setup succeeds in sharing information across tasks, then the results on the individual test data will be at least as good as when learning the tasks individually, given that the multi-task setup uses a larger training set - the union of the training sets of the individual tasks. Overall, however, this does not seem to be the case. As the results in Figure <ref type="figure" target="#fig_3">4</ref> show (and also the detailed results in Tables 1-2 for the Italian Electra pretrained model, and in Tables 3-4 for a multilingual Electra pretrained model), single-task training outperforms multi-tasking in the agreement and verb alternation subtasks. The drop suggests that the multi-task model is not able to learn shared properties for these tasks, and forcing it to do so leads to a model that is not optimal for either of them. Both tasks require information about the syntactic structure (or sequence of phrases), while each requires different phrase properties - grammatical number for the agreement task, and semantic properties for the verb alternation. While the system is able to distil all this information from sentence embeddings in the single-task setting, it is not able to compress it into a shared representation when learning the tasks together.</p><p>The Od single-task and multi-task models have comparable performance, probably because the Od task involves a simpler alternation than the Caus task: Od verbs do not have a causative meaning and do not require a change in the semantic role of the subject. 
The comparison of all the tasks suggests that some syntactic and semantic regularities - such as constituents, grammatical number and semantic roles - cannot be encoded together, as they compete with each other when the system learns to distil them from the pretrained sentence embeddings.</p><p>Error Analysis For the agreement task, errors on the grammatical number of the attractor nouns (WN1, WN2) are high under both paradigms. These are "sequence errors", indicating that the system was not able to detect the patterns in the input sequence, possibly because individual sentence structures were not properly detected. Previous experiments have shown, though, that in the single-task setting the sentence level does manage to compress the desired information <ref type="bibr" target="#b5">[6]</ref>. The fact that both these errors increase in the multi-task setting indicates that the information compression on the sentence level is less successful than in the single-task setting.</p><p>For the alternation tasks, error patterns vary, although their distributions remain similar between single-task and multi-task environments. We observe an overall increase of error proportions in the multi-task environment. Specifically, mistakes of type I-Int are frequent in type III data for the Caus task. These errors incorrectly map the thematic roles onto the syntax of the arguments (e.g. L'artista si è chiuso 'the artist closed' or La carbonara mangiava 'the carbonara was eating'). In the same dataset, we also note an increase in errors related to the last constituent in type I and type II data (errors of type E-WrBy, e.g. La finestra si chiuse dall'artista 'the window closed by the artist'). 
Finally, for the Od task, we remark that R-Trans errors - the errors that result in standard transitive clauses (e.g., L'artista dipinse un paesaggio 'the artist painted a landscape') - are not the most prominent and do not increase in the multi-task environment, suggesting that the chosen answer is not derived from some form of transitive bias <ref type="bibr" target="#b21">[22]</ref>.</p><p>An overall comparison shows that the error patterns vary across subtasks. This variety in error patterns confirms that the different dimensions (type of alternation, level of lexicalisation, and single- vs. multi-task learning) are separate, uncorrelated dimensions. It also indicates that the differences in the F1 results shown in Figure <ref type="figure" target="#fig_3">4</ref> are real, despite the more homogeneous trends exhibited by these aggregated F1 numbers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>In this paper, we have presented curated synthetic datasets of Italian for two linguistic phenomena of a heterogeneous nature - agreement and the transitive/intransitive verb alternation - embedded in the BLM task.</p><p>The performance results and the error analysis of a tailored two-level architecture have shown that multi-task environments do not help, suggesting that abstract linguistic notions, such as constituents or thematic roles, do not seem to be present in the learning process.</p><p>Current work is developing new analyses and architectures to probe further into the encoding of information in sentence embeddings, and is creating new BLM problems across various languages and linguistic phenomena.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>da-Ag P-NP 4 Pat Pass da-Ag da-NP 5 da-NP Correct 2 Ag Akt da-NP I-Int 3 Pat Pass da-Ag ER-Pass 4 Ag Pass da-Pat IER-Pass 5 Pat Akt Ag R-Trans 6 Ag Akt Pat IR-Trans 7 Pat Akt da-Ag E-WrBy 8 Ag Akt da-Pat IE-WrBy Od context 1 Ag Akt Pat P-NP 2 Ag Akt Pat da-NP 3 Pat Pass da-Ag P-NP 4 Pat Pass da-Ag da-NP 5 Pat Pass P-NP 6 Pat Pass da-NP 7 Ag Akt P-NP ? ??? Od answers 1 Pat Akt da-NP I-Int 2 Ag Akt da-NP Correct 3 Pat Pass da-Ag IER-Pass 4 Ag Pass da-Pat ER-Pass 5 Pat Akt Ag IR-Trans 6 Ag Akt Pat R-Trans 7 Pat Akt da-Ag IE-WrBy 8 Ag Akt da-Pat E-WrBy</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: BLM contexts, answers, and their error locations (see text) for the change-of-state (Caus) and object-drop (Od) classes.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: A two-level VAE: the sentence level learns to compress a sentence into a representation useful to solve the BLM problem on the task level.</figDesc><graphic coords="4,89.29,101.03,208.35,73.83" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Performance comparison across single-task and multi-task training paradigms for the three subtasks (single task darker shade of each colour, multi-task lighter shade), trained on type-I data, tested on the three types, and averaged over three independent runs. Results obtained using the Italian Electra pretrained model.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Error analysis for agreement: multi- vs. single-task, training on type I data, testing on all.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>Od task error analysis</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Error analysis between single- and multi-task training paradigms trained on type-I data, tested on the three types, as averages over three runs (single task darker shade of each colour, multi-task lighter shade). For the Caus and Od tasks, we report only three representative error types (I, E and R).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Examples of Od BLMs for type I, type II and type III</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Multi-task learning results as F1 averages over three runs (and standard deviation). Training with 3000 instances - 1000 from each task.</figDesc><table><row><cell cols="2">train on test on</cell><cell></cell><cell>task</cell><cell></cell></row><row><cell></cell><cell></cell><cell>agreement</cell><cell>Caus</cell><cell>Od</cell></row><row><cell>type I</cell><cell>type I</cell><cell>0.772 (0.011)</cell><cell>0.910 (0.002)</cell><cell>0.996 (0.003)</cell></row><row><cell></cell><cell>type II</cell><cell>0.660 (0.016)</cell><cell>0.849 (0.022)</cell><cell>0.938 (0.007)</cell></row><row><cell></cell><cell>type III</cell><cell>0.483 (0.042)</cell><cell>0.870 (0.027)</cell><cell>0.893 (0.010)</cell></row><row><cell>type II</cell><cell>type I</cell><cell>0.504 (0.056)</cell><cell>0.917 (0.012)</cell><cell>0.993 (0.004)</cell></row><row><cell></cell><cell>type II</cell><cell>0.519 (0.027)</cell><cell>0.872 (0.007)</cell><cell>0.981 (0.007)</cell></row><row><cell></cell><cell>type III</cell><cell>0.406 (0.018)</cell><cell>0.907 (0.004)</cell><cell>0.950 (0.009)</cell></row><row><cell>type III</cell><cell>type I</cell><cell>0.274 (0.012)</cell><cell>0.946 (0.003)</cell><cell>0.994 (0.002)</cell></row><row><cell></cell><cell>type II</cell><cell>0.330 (0.004)</cell><cell cols="2">0.929 (0.003) 0.983 (0.003)</cell></row><row><cell></cell><cell>type III</cell><cell>0.325 (0.008)</cell><cell>0.889 (0.014)</cell><cell>0.967 (0.007)</cell></row><row><cell cols="2">train on test on</cell><cell></cell><cell>task</cell><cell></cell></row><row><cell></cell><cell></cell><cell>agreement</cell><cell>Caus</cell><cell>Od</cell></row><row><cell>type I</cell><cell>type I</cell><cell>0.909 (0.007)</cell><cell>0.919 (0.005)</cell><cell>1.000 (0.000)</cell></row><row><cell></cell><cell>type II</cell><cell>0.760 (0.030)</cell><cell>0.906 (0.017)</cell><cell>0.971 (0.003)</cell></row><row><cell></cell><cell>type III</cell><cell>0.707 (0.028)</cell><cell>0.926 (0.005)</cell><cell>0.940 (0.010)</cell></row><row><cell>type II</cell><cell>type I</cell><cell>0.881 (0.013)</cell><cell>0.932 (0.007)</cell><cell>1.000 (0.000)</cell></row><row><cell></cell><cell>type II</cell><cell>0.784 (0.007)</cell><cell>0.903 (0.010)</cell><cell>0.983 (0.003)</cell></row><row><cell></cell><cell>type III</cell><cell>0.714 (0.005)</cell><cell>0.956 (0.005)</cell><cell>0.975 (0.009)</cell></row><row><cell>type III</cell><cell>type I</cell><cell>0.296 (0.011)</cell><cell>0.960 (0.005)</cell><cell>0.998 (0.002)</cell></row><row><cell></cell><cell>type II</cell><cell>0.345 (0.002)</cell><cell cols="2">0.950 (0.007) 0.993 (0.004)</cell></row><row><cell></cell><cell>type III</cell><cell>0.336 (0.005)</cell><cell>0.918 (0.010)</cell><cell>0.994 (0.004)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Single-task learning results as F1 averages over three runs (and standard deviation). Training with 2160 instances for Caus and Od for all types; for agreement, 2052 instances for type I (the maximum available) and 3000 instances for types II and III.</figDesc><table /><note>B.2.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Results with the multilingual Electra pretrained model: google/electra-base-discriminator</head><label></label><figDesc></figDesc><table><row><cell cols="2">train on test on</cell><cell></cell><cell>task</cell><cell></cell></row><row><cell></cell><cell></cell><cell>agreement</cell><cell>Caus</cell><cell>Od</cell></row><row><cell>type I</cell><cell>type I</cell><cell>0.664 (0.053)</cell><cell>0.543 (0.011)</cell><cell>0.714 (0.012)</cell></row><row><cell></cell><cell>type II</cell><cell>0.733 (0.018)</cell><cell>0.407 (0.023)</cell><cell>0.561 (0.002)</cell></row><row><cell></cell><cell>type III</cell><cell>0.586 (0.022)</cell><cell>0.483 (0.016)</cell><cell>0.656 (0.016)</cell></row><row><cell>type II</cell><cell>type I</cell><cell>0.599 (0.025)</cell><cell>0.610 (0.035)</cell><cell>0.646 (0.010)</cell></row><row><cell></cell><cell>type II</cell><cell>0.660 (0.019)</cell><cell>0.536 (0.004)</cell><cell>0.601 (0.004)</cell></row><row><cell></cell><cell>type III</cell><cell>0.518 (0.025)</cell><cell cols="2">0.601 (0.011) 0.686 (0.019)</cell></row><row><cell>type III</cell><cell>type I</cell><cell>0.320 (0.047)</cell><cell>0.551 (0.014)</cell><cell>0.729 (0.015)</cell></row><row><cell></cell><cell>type II</cell><cell>0.401 (0.058)</cell><cell>0.450 (0.021)</cell><cell>0.661 (0.020)</cell></row><row><cell></cell><cell>type III</cell><cell>0.378 (0.052)</cell><cell>0.413 (0.012)</cell><cell>0.618 (0.005)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3</head><label>3</label><figDesc>Multi-task learning results as F1 averages over three runs (and standard deviation). Training with 3000 instances -1000 from each task.</figDesc><table><row><cell cols="2">train on test on</cell><cell></cell><cell>task</cell><cell></cell></row><row><cell></cell><cell></cell><cell>agreement</cell><cell>Caus</cell><cell>Od</cell></row><row><cell>type I</cell><cell>type I</cell><cell>0.875 (0.031)</cell><cell>0.599 (0.040)</cell><cell>0.749 (0.030)</cell></row><row><cell></cell><cell>type II</cell><cell>0.886 (0.005)</cell><cell>0.425 (0.019)</cell><cell>0.579 (0.037)</cell></row><row><cell></cell><cell>type III</cell><cell>0.815 (0.016)</cell><cell>0.529 (0.020)</cell><cell>0.660 (0.014)</cell></row><row><cell>type II</cell><cell>type I</cell><cell>0.841 (0.024)</cell><cell>0.543 (0.027)</cell><cell>0.651 (0.007)</cell></row><row><cell></cell><cell>type II</cell><cell>0.881 (0.003)</cell><cell>0.486 (0.005)</cell><cell>0.596 (0.010)</cell></row><row><cell></cell><cell>type III</cell><cell>0.814 (0.008)</cell><cell cols="2">0.582 (0.026) 0.685 (0.013)</cell></row><row><cell>type III</cell><cell>type I</cell><cell>0.826 (0.022)</cell><cell cols="2">0.632 (0.023) 0.761 (0.023)</cell></row><row><cell></cell><cell>type II</cell><cell>0.878 (0.005)</cell><cell cols="2">0.557 (0.013) 0.697 (0.009)</cell></row><row><cell></cell><cell>type III</cell><cell>0.874 (0.006)</cell><cell>0.475 (0.010)</cell><cell>0.592 (0.024)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4</head><label>4</label><figDesc>Single-task learning results as F1 averages over three runs (and standard deviation). Training with 2160 instances for Caus and Od for all types; for agreement, 2052 instances for type I (the maximum available) and 3000 instances for types II and III.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Following BLM formal specifications<ref type="bibr" target="#b1">[2]</ref>, we build the errors representing violations of the internal (I), external (E) and relational (R) rules of the BLM, and their combinations (e.g. IE, IER, etc.). This information forms the first part of the error acronym. The second part of the error label indicates the structure the sentence represents: intransitive (Int), passive (Pass), transitive (Trans) or, in some cases, the NP introduced by the da preposition (WrBy).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Italian Electra (E-It) pretrained model: dbmdz/electra-base-italianxxl-cased-discriminator. Multilingual Electra (E-M) model: google/electra-base-discriminator.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">To simplify the discussion of the method, we write "sentence" instead of "sentence embedding" when discussing the system.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We gratefully acknowledge the partial support of this work by the Swiss National Science Foundation, through SNF Advanced Grant TMAG-1_209426 to PM.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Appendix</head><p>A.1. An Italian example for the subject-verb agreement BLM Context 1 Il vaso con il fiore si è rotto. 2 I vasi con il fiore si sono rotti. 3 Il vaso con i fiori si è rotto. 4 I vasi con i fiori si sono rotti. 5 Il vaso con il fiore del giardino si è rotto. 6 I vasi con il fiore del giardino si sono rotti. 7 Il vaso con i fiori del giardino si è rotto. 8 ???</p><p>Answer set 1 Il vaso con i fiori e il giardino si è rotto. coord 2 I vasi con i fiori del giardino si sono rotti. correct 3 Il vaso con il fiore si è rotto. WNA 4 I vasi con il fiore del giardino si sono rotti. WN1 5 I vasi con i fiori dei giardini si sono rotti. WN2 6 Il vaso con il fiore del giardino si sono rotti. AEV 7 Il vaso con i fiori del giardino si sono rotti. AEN1 8 Il vaso con il fiore dei giardini si sono rotti. AEN2 </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. Verb alternation examples</head><p>Caus -Context 1 Una stella del cinema chiuse la sua carriera con forza 2 Una stella del cinema chiuse la sua carriera da pochissimo tempo 3 La sua carriera fu chiusa da una stella del cinema con forza 4 La sua carriera fu chiusa da una stella del cinema da pochissimo tempo 5 La sua carriera fu chiusa con forza 6 La sua carriera fu chiusa da pochissimo tempo 7 La sua carriera si chiuse con forza 8 ???</p><p>Caus -Answers 1 La sua carriera si chiuse da pochissimo tempo 2 Una stella del cinema si chiuse da pochissimo tempo 3 La sua carriera fu chiusa da una stella del cinema 4 Una stella del cinema fu chiusa dalla sua carriera 5 La sua carriera chiuse una stella del cinema 6 Una stella del cinema chiuse la sua carriera 7 La sua carriera si chiuse da una stella del cinema 8 Una stella del cinema si chiuse dalla sua carriera Od, typeI -Context 1 La turista mangia una carbonara in un secondo 2 La turista mangia una carbonara da mezz'ora 3 Una carbonara è mangiata dalla turista in un secondo 4 Una carbonara è mangiata dalla turista da mezz'ora 5 Una carbonara è mangiata in un secondo 6 Una carbonara è mangiata da mezz'ora 7 La turista mangia in un secondo 8 ???</p><p>Od, typeI -Answers 1 Una carbonara mangia da mezz'ora 2 La turista mangia da mezz'ora 3 Una carbonara è mangiata dalla turista 4 La turista è mangiata da una carbonara 5 Una carbonara mangia la turista 6 La turista mangia una carbonara 7 Una carbonara mangia dalla turista 8 La turista mangia da una carbonara Od, typeII -Context 1 La zia mangia una bistecca nella sala grande 2 La presidente può mangiare una bistecca da programma 3 La specialità della casa deve essere mangiata dalla turista nella sala grande 4 Una bistecca fu mangiata dalla presidente da sola 5 La specialità della casa deve essere mangiata in un secondo 6 Una bistecca deve poter essere mangiata da sola 7 La turista deve mangiare con fame 8 ???</p><p>Od, 
typeII -Answers 1 La specialità della casa può mangiare da sola 2 La squadra di calcio deve mangiare da mezz'ora 3 Una bistecca è mangiata dalla turista 4 La squadra di calcio può essere mangiata da una carbonara 5 La pasta col pomodoro può mangiare la squadra di calcio 6 La squadra di calcio mangia una bistecca 7 La specialità della casa deve poter mangiare dalla turista 8 La presidente mangia da una bistecca Od, typeIII -Context 1 L'attore deve canticchiare un motivetto dopo il festival 2 L'amica di mia mamma deve cucire la tasca da qualche giorno 3 L'inno nazionale può essere cantato dal vincitore del festival con solo pianoforte 4 Una bistecca deve essere mangiata dalla turista da sola 5 Il manuale è insegnato nell'aula magna 6 Questi attrezzi devono essere intagliati da manuale 7 I due fratelli studiano con molta attenzione 8 ???</p><p>Od, typeIII -Answers 1 La pasta frolla deve impastare da sola 2 L'autrice deve poter scrivere da qualche giorno 3 I libri di testo devono poter essere studiati dai candidati 4 Questi stilisti devono poter essere tessuti dai vestiti per la parata 5 Questi motivi greci possono tessere questi stilisti 6 L'idraulico saldò i cavi del lampadario 7 La stanza pulisce da una delle propretarie dell'albergo 8 Le sommozzatrici pescarono da delle trote</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</author>
		<ptr target="http://www.ruder.io/nlp-benchmarking" />
		<title level="m">Challenges and Opportunities in NLP Benchmarking</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Blackbird language matrices (BLM), a new task for rule-like generalization in neural networks: Motivations and formal specifications</title>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2306.11444</idno>
		<idno>ArXiv cs.CL 2306.11444</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2306.11444" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Standardization of progressive matrices</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Raven</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">British Journal of Medical Psychology</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="137" to="150" />
			<date type="published" when="1938">1938</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">BLM-AgrF: A new French benchmark to investigate generalization of agreement in neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>An</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Nastase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2023.eacl-main.99" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Dubrovnik, Croatia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1363" to="1374" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Grammatical information in BERT sentence embeddings as two-dimensional arrays</title>
		<author>
			<persName><forename type="first">V</forename><surname>Nastase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.repl4nlp-1.3</idno>
		<ptr target="https://aclanthology.org/2023.repl4nlp-1.3" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Can</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mozes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Cahyawijaya</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Saphra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Kassner</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Ravfogel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Ravichander</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Zhao</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Augenstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Voita</surname></persName>
		</editor>
		<meeting>the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023), Association for Computational Linguistics<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="22" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Are there identifiable structural parts in the sentence embedding whole?</title>
		<author>
			<persName><forename type="first">V</forename><surname>Nastase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.blackboxnlp-1.3" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Belinkov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Kim</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jumelet</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mohebbi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Mueller</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</editor>
		<meeting>the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP<address><addrLine>Miami, Florida, US</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="23" to="42" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Understanding natural language understanding systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</author>
		<idno type="DOI">10.1422/107438</idno>
		<ptr target="https://www.rivisteweb.it/doi/10.1422/107438" />
	</analytic>
	<monogr>
		<title level="j">Sistemi intelligenti. Rivista quadrimestrale di scienze cognitive e di intelligenza artificiale</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="277" to="302" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Carpenter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Just</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shell</surname></persName>
		</author>
		<idno type="DOI">10.1037/0033-295X.97.3.404</idno>
	</analytic>
	<monogr>
		<title level="j">Psychological Review</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page" from="404" to="431" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A survey of multi-task learning in natural language processing: Regarding task relatedness and training methods</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jiang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.eacl-main.66</idno>
		<ptr target="https://aclanthology.org/2023.eacl-main.66" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Augenstein</surname></persName>
		</editor>
		<meeting>the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Dubrovnik, Croatia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="943" to="956" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Multi-task learning in natural language processing: An overview</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.05098</idno>
		<title level="m">An overview of multi-task learning in deep neural networks</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Pfeiffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Vulić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Ponti</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.11529</idno>
		<title level="m">Modular deep learning</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Which tasks should be learned together in multi-task learning?</title>
		<author>
			<persName><forename type="first">T</forename><surname>Standley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zamir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Guibas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Malik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Savarese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9120" to="9132" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">An end-to-end progressive multi-task learning framework for medical named entity recognition and normalization</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yuan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.acl-long.485</idno>
		<ptr target="https://aclanthology.org/2021.acl-long.485" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Zong</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</editor>
		<meeting>the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="6214" to="6224" />
		</imprint>
	</monogr>
	<note>: Long Papers), Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">MOCHA: A multi-task training approach for coherent text generation from cognitive perspective</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">P</forename><surname>Chan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Huang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.emnlp-main.705</idno>
		<ptr target="https://aclanthology.org/2022.emnlp-main.705" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Kozareva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</editor>
		<meeting>the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Abu Dhabi, United Arab Emirates</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="10324" to="10334" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Multi-task learning for multilingual neural machine translation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hassan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.75</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.75" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1022" to="1034" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
		<ptr target="https://aclanthology.org/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Electra: Pre-training text encoders as discriminators rather than generators</title>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICLR</title>
		<imprint>
			<biblScope unit="page" from="1" to="18" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">English Verb Classes and Alternations: A Preliminary Investigation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Levin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1993">1993</date>
			<publisher>University of Chicago Press</publisher>
			<pubPlace>Chicago and London</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Automatic verb classification based on statistical distributions of argument structure</title>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Stevenson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="373" to="408" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Carpenter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Just</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychological Review</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page">404</biblScope>
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Verb argument structure alternations in word and sentence embeddings</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Warstadt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Bowman</surname></persName>
		</author>
		<idno type="DOI">10.7275/q5js-4y86</idno>
		<ptr target="https://aclanthology.org/W19-0129" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Society for Computation in Linguistics (SCiL) 2019</title>
				<meeting>the Society for Computation in Linguistics (SCiL) 2019</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="287" to="297" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
