<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vivi</forename><surname>Nastase</surname></persName>
							<email>nastase@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chunyang</forename><surname>Jiang</surname></persName>
							<email>chunyang.jiang42@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">University of Geneva</orgName>
								<address>
									<settlement>Geneva</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giuseppe</forename><surname>Samo</surname></persName>
							<email>giuseppe.samo@idiap.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paola</forename><surname>Merlo</surname></persName>
							<email>paola.merlo@unige.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Idiap Research Institute</orgName>
								<address>
									<settlement>Martigny</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">University of Geneva</orgName>
								<address>
									<settlement>Geneva</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C8A3E542D00EC21D66316FAED0131086</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>syntactic information</term>
					<term>synthetic structured data</term>
					<term>multi-lingual</term>
					<term>cross-lingual</term>
					<term>diagnostic studies of deep learning models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, our goal is to investigate to what degree multilingual pretrained language models capture cross-linguistically valid abstract linguistic representations. We take the approach of developing curated synthetic data on a large scale, with specific properties, and using them to study sentence representations built using pretrained language models. We use a new multiple-choice task and datasets, Blackbird Language Matrices (BLMs), to focus on a specific grammatical structural phenomenon - subject-verb agreement across a variety of sentence structures - in several languages. Finding a solution to this task requires a system detecting complex linguistic patterns and paradigms in text representations. Using a two-level architecture that solves the problem in two steps - detect syntactic objects and their properties in individual sentences, and find patterns across an input sequence of sentences - we show that despite having been trained on multilingual texts in a consistent manner, multilingual pretrained language models have language-specific differences, and syntactic structure is not shared, even across closely related languages.</p><p>This work asks whether multilingual pretrained language models capture abstract linguistic representations that are valid across many languages. Our approach develops curated synthetic data on a large scale, with specific properties, and uses it to study sentence representations built with pretrained language models. We use a new multiple-choice task and its associated data, the Blackbird Language Matrices (BLMs), to focus on a specific grammatical structural phenomenon - subject-verb agreement - in several languages. Finding the correct solution to this task requires a system that detects complex linguistic patterns and paradigms in textual representations. 
Using a two-level architecture that solves the problem in two phases - it first learns the syntactic objects and their properties in individual sentences, and then extracts the elements they have in common - we show that, despite having been trained on multilingual texts in a consistent manner, multilingual pretrained language models exhibit language-specific differences, and that syntactic structure is not shared, even between typologically very close languages.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Large language models, trained on huge amounts of text, have reached a level of performance that rivals human capabilities on a range of established benchmarks <ref type="bibr" target="#b0">[1]</ref>. Despite high performance on high-level language processing tasks, it is not yet clear what kind of information these language models encode, and how. For example, transformer-based pretrained models have shown excellent performance on tasks that seem to require that the model encode syntactic information <ref type="bibr" target="#b1">[2]</ref>.</p><p>All the knowledge that LLMs encode comes from unstructured texts and the shallow regularities they are very good at detecting, which they are able to leverage into information that correlates with higher-level structures in language. Most notably, <ref type="bibr" target="#b2">[3]</ref> have shown that from the unstructured textual input, BERT <ref type="bibr" target="#b3">[4]</ref> is able to infer POS, structural, entity-related, syntactic and semantic information at successively higher layers of the architecture, mirroring the classical NLP pipeline <ref type="bibr" target="#b4">[5]</ref>. We ask: How is this information encoded in the output layer of the model, i.e. the embeddings? Does it rely on surface information - such as inflections and function words - assembled on demand by the task/probes <ref type="bibr" target="#b5">[6]</ref>, or does it reflect something deeper that the language model has built through the progressive transformation of the input through its many layers?</p><p>To investigate this question, we use a seemingly simple task - subject-verb agreement. 
Subject-verb agreement is often used to test the syntactic abilities of deep neural networks <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref> because, while apparently simple and linear, it is in fact structurally and theoretically complex, and requires connecting the subject and the verb across arbitrarily long or complex structural distance. It has an added useful dimension: it relies on syntactic structure and grammatical number information that many languages share.</p><p>In previous work we have shown that simple structural information - the chunk structure of a sentence - which can be leveraged to determine subject-verb agreement, or to contribute towards more semantic tasks, can be detected in the sentence embeddings obtained from a pretrained model <ref type="bibr" target="#b10">[11]</ref>. This result, though, does not cast light on whether the discovered structure is deeper and more abstract, or rather just a reflection of surface indicators, such as function words or morphological markers.</p><p>To tease apart these two options, we set up an experiment covering four languages: English, French, Italian and Romanian. These languages, while different, have shared properties that make sharing of syntactic structure a reasonable expectation, if the pretrained multilingual model does indeed discover and encode syntactic structure. 
We use parallel datasets in the four languages, built by (approximately) translating the BLM-AgrF dataset <ref type="bibr" target="#b11">[12]</ref>, a multiple-choice linguistic test inspired by the Raven Progressive Matrices visual intelligence test, previously used to explore subject-verb agreement in French.</p><p>Our work offers two contributions: (i) four parallel datasets - in English, French, Italian and Romanian - focused on subject-verb agreement; (ii) cross-lingual and multilingual testing of a multilingual pretrained model, to explore the degree to which syntactic structure information is shared across different languages. Our cross-lingual and multilingual experiments show poor transfer across languages, even closely related ones like Italian and French. This result indicates that pretrained models encode syntactic information based on shallow and language-specific clues, from which they are not yet able to take the step towards abstracting grammatical structure. The datasets are available at https://www.idiap.ch/dataset/(blm-agre|blm-agrf|blm-agri|blm_agrr) and the code at https://github.com/CLCL-Geneva/BLM-SNFDisentangling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">BLM task and BLM-Agr datasets</head><p>Inspired by existing IQ tests - Raven's progressive matrices (RPMs) - we have developed a framework called Blackbird Language Matrices (BLMs) <ref type="bibr" target="#b12">[13]</ref> and several datasets <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b13">14]</ref>. RPMs consist of a sequence of images, called the context, connected in a logical sequence by underlying generative rules <ref type="bibr" target="#b14">[15]</ref>. The task is to determine the missing element in this visual sequence, the answer. The candidate answers are constructed to be similar enough that the solution can be found only if the rules are identified correctly.</p><p>Solving an RPM problem is usually done in two steps: (i) identify the relevant objects and their attributes; (ii) decompose the main problem into subproblems, based on object and attribute identification, in a way that allows detecting the global pattern or underlying rules <ref type="bibr" target="#b15">[16]</ref>. Such an approach can be very useful for probing language models, as it allows us to test whether they indeed detect the relevant linguistic objects and their properties, and whether (or to what degree) they use this information to find larger patterns. We have developed BLMs as a linguistic test. Figure <ref type="figure">1</ref> illustrates the template of a BLM subject-verb agreement matrix, with the different linguistic objects - chunks/phrases - and their relevant properties, in this case grammatical number. Examples in all languages under investigation are provided in Appendix B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>BLM-Agr datasets</head><p>A BLM problem for subject-verb agreement consists of a context set of seven sentences that share the subject-verb agreement phenomenon, but differ in other aspects - e.g. the number of linearly intervening noun phrases between the subject and the verb (called attractors because they can interfere with the agreement), the grammatical numbers of these attractors, and the clause structures. The sequence is generated by a rule of progression in the number of attractors, and alternation in the grammatical number of the different phrases. Each context is paired with a set of candidate answers generated from the correct answer by altering it to produce minimally contrastive error types. We have two types of errors (see Figure <ref type="figure">1</ref>): (i) sequence errors - these candidate answers are grammatically correct, but they are not the correct continuation of the sequence; (ii) agreement errors - these candidate answers are grammatically erroneous, because the verb is in agreement with one of the intervening attractors. By constructing candidate answers with such specific error types, we can investigate the kind of information and structure learned.</p><p>The seed data for French was created by manually completing previously published data <ref type="bibr" target="#b16">[17]</ref>. From this initial data, we generated a dataset that comprises three subsets of increasing lexical complexity (details in <ref type="bibr" target="#b11">[12]</ref>): Types I, II, III, corresponding to different amounts of lexical variation within a problem instance. Each subset contains three clause structures uniformly distributed within the data. 
The dataset used here is a variation of the BLM-AgrF dataset <ref type="bibr" target="#b11">[12]</ref> that separates sequence-based errors from other types of errors, to enable deeper analyses of the behaviour of pretrained language models.</p><p>The datasets in English, Italian and Romanian were created by manually translating the seed French sentences into the other languages by native (Italian and Romanian) and near-native (English) speakers. The internal structure in these languages is very similar, so translations are approximately parallel. The differences lie in the treatment of preposition and determiner sequences, which in some cases must be conflated into one word in Italian and French, but not in English. French and Italian use number-specific determiners and inflections, while Romanian and English encode grammatical number exclusively through inflections. In English most plural forms are marked by a suffix. Romanian has more variation, and noun inflections also encode case. Determiners are separate tokens, which are overt indicators of grammatical number and of phrase boundaries, whereas inflections may or may not be tokenized separately.</p><p>Table <ref type="table" target="#tab_1">1</ref> shows the dataset statistics for the four BLM problems. After splitting each subset 90:10 into train:test subsets, we randomly sample 2000 instances as train data. 20% of the train data is used for development.</p><p>A sentence dataset. From the seed files for each language we build a dataset to study sentence structure independently of a task. The seed files contain noun, verb and prepositional phrases, with singular and plural variations. From these chunks, we build sentences with all (grammatically correct) combinations of np [pp1 [pp2]] vp<ref type="foot" target="#foot_0">1</ref>. For each chunk pattern 𝑝 of the 14 possibilities (e.g., 𝑝 = "np-s pp1-s vp-s"), all corresponding sentences are collected into a set 𝑆𝑝.</p></div>
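The split procedure described above (90:10 train:test, 2000 sampled training instances, 20% of those held out for development) can be sketched as follows. This is an illustrative sketch, not the authors' code; the instance ids and pool size are placeholders.

```python
import random

# Placeholder pool of instance ids (the real data are BLM instances).
random.seed(1)
instances = list(range(5000))
random.shuffle(instances)

# 90:10 split into a training pool and a held-out test set.
cut = int(0.9 * len(instances))
train_pool, test = instances[:cut], instances[cut:]

# Sample 2000 training instances; 20% of them become the dev set.
sampled = random.sample(train_pool, 2000)
n_dev = int(0.2 * len(sampled))
dev, train = sampled[:n_dev], sampled[n_dev:]
```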
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The dataset consists of triples (𝑖𝑛, 𝑜𝑢𝑡 + , 𝑂𝑢𝑡 − ), where 𝑖𝑛 is an input sentence and 𝑜𝑢𝑡 + is the correct output - a sentence different from 𝑖𝑛 but with the same chunk pattern. 𝑂𝑢𝑡 − are 𝑁𝑛𝑒𝑔𝑠 = 7 incorrect outputs, randomly chosen from the sentences that have a chunk pattern different from 𝑖𝑛. For each language, we sample uniformly approx. 4000 instances from the generated data based on the pattern of the input sentence, randomly split 80:20 into train:test. The train part is split 80:20 into train:dev, resulting in a 2576:630:798 split for train:dev:test.</p></div>
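The triple construction just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the chunk-pattern names and sentences are invented placeholders, and the number of negatives is reduced for readability.

```python
import random

random.seed(0)

# Hypothetical sentences grouped by chunk pattern (placeholders).
sentences_by_pattern = {
    "np-s vp-s": ["The cat sleeps.", "The dog barks."],
    "np-p vp-p": ["The cats sleep.", "The dogs bark."],
    "np-s pp1-s vp-s": ["The cat near the door sleeps."],
}

def build_triples(by_pattern, n_negs=7):
    """For each sentence `in`, pick a different sentence with the same
    chunk pattern as out+, and n_negs sentences with other patterns as Out-."""
    triples = []
    for pattern, sents in by_pattern.items():
        others = [s for p, ss in by_pattern.items() if p != pattern for s in ss]
        for s in sents:
            positives = [t for t in sents if t != s]
            if not positives or not others:
                continue  # the pattern needs another sentence, plus negatives
            out_pos = random.choice(positives)
            out_neg = random.sample(others, k=min(n_negs, len(others)))
            triples.append((s, out_pos, out_neg))
    return triples

triples = build_triples(sentences_by_pattern, n_negs=3)
```

With this toy data, the single-sentence pattern yields no triple (it has no same-pattern positive), while each two-sentence pattern yields one triple per sentence.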
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Probing the encoding of syntax</head><p>We aim to test whether the syntactic information detected in multilingual pretrained sentence embeddings is based on shallow, language-specific clues, or whether it is more abstract structural information. Using the subject-verb agreement task and the parallel datasets in four languages provides clues to the answer.</p><p>The datasets all share sentences with the same syntactic structures, as illustrated in Figure <ref type="figure">1</ref>. However, there are language-specific differences, such as the structure of the chunks (noun, verb or prepositional phrases), and each language has different ways of encoding grammatical number (see Section 2).</p><p>If the grammatical information in the sentences of our dataset - i.e. the sequences of chunks with specific properties relevant to the subject-verb agreement task (Figure <ref type="figure">1</ref>) - is an abstract form of knowledge within the pretrained model, it will be shared across languages. We would then see a high level of performance for a model trained on one of these languages and tested on any of the others. Additionally, when training on a dataset consisting of data in the four languages, the model should detect a shared parameter space that would lead to high results when testing on data for each language.</p><p>If, however, the grammatical information is a reflection of shallow language indicators, we expect to see higher performance on languages that have overt grammatical number and chunk indicators, such as French and Italian, and a low rate of cross-language transfer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">System architectures</head><p>A sentence-level VAE. To test whether chunk structure can be detected in sentence embeddings, we use a VAE-like system which encodes a sentence and decodes a different sentence with the same chunk structure, using a set of contrastive negative examples - sentences that have different chunk structures from the input - to encourage the latent to encode the chunk structure.</p><p>The architecture of the sentence-level VAE is similar to a previously proposed system <ref type="bibr" target="#b17">[18]</ref>: the encoder consists of a CNN layer with a 15x15 kernel, which is applied to a 32x24-shaped sentence embedding, followed by a linear layer that compresses the output of the CNN into a latent layer of size 5. The decoder mirrors the encoder.</p><p>An instance consists of a triple (𝑖𝑛, 𝑜𝑢𝑡 + , 𝑂𝑢𝑡 − ), where 𝑖𝑛 is an input sentence with embedding 𝑒𝑖𝑛 and chunk structure 𝑝, 𝑜𝑢𝑡 + is a sentence with embedding 𝑒 𝑜𝑢𝑡 + and the same chunk structure 𝑝, and 𝑂𝑢𝑡 − = {𝑠 𝑘 |𝑘 = 1, 𝑁𝑛𝑒𝑔𝑠} is a set of 𝑁𝑛𝑒𝑔𝑠 = 7 sentences with embeddings 𝑒𝑠 𝑘 , each with a chunk pattern different from 𝑝 (and from each other). The input 𝑒𝑖𝑛 is encoded into latent representation 𝑧𝑖, from which we sample a vector 𝑧 ˜𝑖, which is decoded into the output 𝑒 ˆ𝑖𝑛. To encourage the latent to encode the structure of the input sentence, we use a max-margin loss function that pushes the similarity score of 𝑒 ˆ𝑖𝑛 with the sentence that has the same chunk pattern as the input (𝑒 𝑜𝑢𝑡 + ) to be higher than with the ones that do not. At prediction time, the sentence from the {𝑜𝑢𝑡 + } ∪ 𝑂𝑢𝑡 − options that has the highest score relative to the decoded answer is taken as correct.</p></div>
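A max-margin objective of the kind described above can be sketched as follows. This is a schematic numpy version, not the paper's implementation: the function names, the cosine scoring, and the margin value are our assumptions, and the embeddings are random placeholders.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_margin_loss(e_hat, e_pos, e_negs, margin=1.0):
    """Hinge loss: the decoded embedding e_hat should score higher with the
    positive (same chunk pattern) than with each negative, by `margin`."""
    pos = cosine(e_hat, e_pos)
    return sum(max(0.0, margin - pos + cosine(e_hat, n)) for n in e_negs)

# Placeholder embedding; with a perfect positive and an opposite negative,
# the hinge is inactive and the loss is zero.
rng = np.random.default_rng(0)
e_hat = rng.normal(size=8)
loss_same = max_margin_loss(e_hat, e_hat, [-e_hat], margin=1.0)
```

In the real system the loss is backpropagated through the decoder and encoder, so the latent is pushed to retain exactly the information that separates chunk patterns.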
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Two-level VAE for BLMs</head><p>We use a two-level system illustrated in Figure <ref type="figure" target="#fig_0">2</ref>, which separates the solving of the BLM task on subject-verb agreement into two steps: (i) compress sentence embeddings into a representation that captures the sentence chunk structure and the relevant chunk properties (on the sentence level); (ii) use the compressed sentence representations to solve the BLM agreement problems, by detecting the pattern across the sequence of structures (on the task level). This architecture allows us to test whether sentence structure - in terms of chunks - is shared across languages in a pretrained multilingual model. All reported experiments use Electra <ref type="bibr" target="#b18">[19]</ref> (pretrained model: google/electra-base-discriminator), with the embedding of the [CLS] token as the sentence representation (details in <ref type="bibr" target="#b10">[11]</ref>).</p><p>An instance for a BLM problem consists of an ordered context sequence 𝑆 of sentences, 𝑆 = {𝑠𝑖|𝑖 = 1, 7}, as input, and an answer set 𝐴 with one correct answer 𝑎𝑐 and several incorrect answers 𝑎𝑒𝑟𝑟. Every sentence is embedded using the pretrained model. To simplify the discussion, in the sections that follow, when we say sentence we actually mean its embedding.</p><p>The two-level VAE system takes a BLM instance as input, decomposes its context sequence 𝑆 into sentences and passes them individually as input to the sentence-level VAE. For each sentence 𝑠𝑖 ∈ 𝑆, the system builds on the fly the candidate answers for the sentence level: the same sentence 𝑠𝑖 from the input is used as the correct output, and a random selection of sentences from 𝑆 are the negative answers. 
After an instance is processed by the sentence level, for each sentence 𝑠𝑖 ∈ 𝑆, we obtain its representation from the latent layer 𝑙𝑠 𝑖 , and reassemble the input sequence as 𝑆 𝑙 = 𝑠𝑡𝑎𝑐𝑘[𝑙𝑠 𝑖 ], and pass it as input to the task-level VAE. The loss function combines the losses on the two levels -a max-margin loss on the sentence level that contrasts the sentence reconstructed on the sentence level with the correct answer and the erroneous ones, and a max-margin loss on the task level that contrasts the answer constructed by the decoder with the answer set of the BLM instance (details in <ref type="bibr" target="#b10">[11]</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Experiments</head><p>To explore how syntactic information - in particular chunk structure - is encoded, we perform cross-language and multi-language experiments, first using the sentences dataset, and then the BLM agreement task. We report F1 averages over three runs.</p><p>Cross-lingual experiments - train on data from one language, test on all the others - show whether the patterns detected in sentence embeddings that encode chunk structure are transferable across languages. The results when testing on the same language as training provide support for the experimental set-up: the high scores show that the pretrained language model does encode the necessary information, and that the system architecture is adequate to distill it.</p><p>The multilingual experiments, where we learn a model from data in all the languages, provide additional clues: if the performance when testing on individual languages is comparable to that of training on each language alone, it means some information is shared across languages and can be beneficial.</p></div>
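The cross-lingual protocol just described amounts to a train-on-one, test-on-all grid. The sketch below shows its shape only; `train_model` and `evaluate` are hypothetical placeholders standing in for the real training and evaluation, and the scores are invented stand-ins, not results from the paper.

```python
LANGS = ["en", "fr", "it", "ro"]

def train_model(train_lang):
    # placeholder: returns a tag instead of a trained model
    return {"trained_on": train_lang}

def evaluate(model, test_lang):
    # placeholder scores: high on the training language, low elsewhere
    return 0.95 if model["trained_on"] == test_lang else 0.20

# Cross-lingual grid: one model per source language, tested on every language.
grid = {src: {tgt: evaluate(train_model(src), tgt) for tgt in LANGS}
        for src in LANGS}

# Multilingual setting: a single model trained on the union of all languages,
# tested separately on each language's test set.
multi_model = train_model("all")
multi_scores = {tgt: evaluate(multi_model, tgt) for tgt in LANGS}
```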
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Syntactic structure in sentences</head><p>We use only the sentence level of the system illustrated in Figure <ref type="figure" target="#fig_0">2</ref> to explore chunk structure in sentences, using the data described in Section 2. For the cross-lingual experiments, the training dataset for each language is used to train a model that is then tested on each test set. For the multilingual setup, we assemble a common training set from the training data for all languages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Solving the BLM agreement task</head><p>We solve the BLM agreement task using the two-level system, where a compacted sentence representation learned on the sentence level should help detect patterns in the input sequence of a BLM instance. Because the datasets are parallel, with shared sentence and sequence patterns, we test whether the added learning signal from the task level can help push the system to learn to map an input sentence into a representation that captures structure shared across languages. We perform cross-lingual experiments, where a model is trained on data from one language and tested on all the test sets, and a multilingual experiment, where for each of the type I/II/III data we assemble a training dataset from the training sets of the same type in all the languages. The model is then tested on the separate test sets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Evaluation</head><p>For each training set we build three models, and plot the average F1 score. The standard deviation is very small, so we do not include it in the plot; it is reported in the result tables in Appendix C.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>Structure in sentences. Figure <ref type="figure" target="#fig_1">3</ref> shows the results for the experiments on detecting chunk structure in sentence embeddings, in cross-lingual and multilingual training setups, for comparison (detailed results in Table <ref type="table" target="#tab_3">3</ref>). Two observations are relevant to our investigation: (i) while training and testing on the same language leads to good performance - indicating that Electra sentence embeddings do contain relevant information about chunks, and that the system does detect the chunk pattern in these representations - there is very little transfer effect. A slight effect is detected for the model learned on Italian and tested on French; (ii) learning using multilingual training data leads to a deterioration in performance, compared to learning in a monolingual setting. This again indicates that the system could not detect a shared parameter space for the information that is being learned, the chunk structure, and thus this information is encoded differently in the languages under study.</p><p>An additional interesting insight comes from the analysis of the latent layer representations. Figure <ref type="figure" target="#fig_2">4</ref> shows the tSNE projection of the latent representations of the sentences in the training data after multilingual training. Different colours show different chunk patterns, and different markers show different languages ("x" for French, "+" for Italian, "*" for Romanian). Had the information encoding syntactic structure been shared, the clusters for the same pattern in the different languages would overlap. Instead, we note that while representations cluster by pattern, the clusters for the different languages are disjoint: each language has its own quite separate pattern clusters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Structure in sentences for the BLM agreement task</head><p>When the sentence structure detection is embedded in the system for solving the BLM agreement task, where an additional supervision signal comes from the task, we note a similar result to when processing the sentences individually. Figure <ref type="figure" target="#fig_3">5</ref> shows the results for the multilingual and monolingual training setups for the type I data. Complete results are in Tables 4-5 in the appendix.</p><p>Discussion and related work. Pretrained language models are learned from shallow co-occurrences through a lexical prediction task. The input information is transformed through several transformer layers, various parts boosting each other through self-attention. Analyses of the architecture of transformer models like BERT <ref type="bibr" target="#b3">[4]</ref> have localised and followed the flow of specific types of linguistic information through the system <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b2">3]</ref>, to the degree that the classical NLP pipeline seems to be reflected in the succession of the model's layers. Analysis of contextualized token embeddings shows that they can encode specific linguistic information, such as sentence structure <ref type="bibr" target="#b20">[21]</ref> (including in a multilingual set-up <ref type="bibr" target="#b21">[22]</ref>), predicate argument structure <ref type="bibr" target="#b22">[23]</ref>, subjecthood and objecthood <ref type="bibr" target="#b23">[24]</ref>, among others. 
Sentence embeddings have also been probed using classifiers, and determined to encode specific types of linguistic information, such as subject-verb agreement <ref type="bibr" target="#b8">[9]</ref>, word order, tree depth, constituent information <ref type="bibr" target="#b24">[25]</ref>, auxiliaries <ref type="bibr" target="#b25">[26]</ref> and argument structure <ref type="bibr" target="#b26">[27]</ref>.</p><p>Generative models like LLAMA seem to use English as the latent language in the middle layers <ref type="bibr" target="#b27">[28]</ref>, while other analyses of internal model parameters have uncovered language-agnostic and language-specific networks of parameters <ref type="bibr" target="#b29">[29]</ref>, or neurons encoding cross-language number agreement information across several internal layers <ref type="bibr" target="#b30">[30]</ref>. It has also been shown that subject-verb agreement information is not shared by BiLSTM models <ref type="bibr" target="#b31">[31]</ref> or multilingual BERT <ref type="bibr" target="#b32">[32]</ref>. Testing the degree to which word/sentence embeddings are multilingual has usually been done using a classification probe, for tasks like NER and POS tagging <ref type="bibr" target="#b33">[33]</ref>, language identification <ref type="bibr" target="#b34">[34]</ref>, or more complex tasks like question answering and sentence retrieval <ref type="bibr" target="#b35">[35]</ref>. There are contradictory results on various cross-lingual model transfers, some of which can be explained by factors such as the domain and size of the training data and the typological closeness of the languages <ref type="bibr" target="#b36">[36]</ref>, or by the power of the classification probes. 
Generative or classification probes do not provide insights into whether the pretrained model finds deeper regularities and encodes abstract structures, or whether the predictions are based on shallower features that the probe assembles for the specific test it is used for <ref type="bibr" target="#b37">[37,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>We aimed to answer this question by using a multilingual setup, and a simple syntactic structure detection task in an indirectly supervised setting. The datasets used - in English, French, Italian and Romanian - are (approximately) lexically parallel, and are parallel in syntactic structure. The property of interest is grammatical number, and the task is subject-verb agreement. The languages chosen share commonalities - French, Italian and Romanian are all Romance languages, and English and French share much lexical material - but there are also differences: French and Italian encode grammatical number in a similar manner, mainly through articles that can also signal phrase boundaries. English has a very limited form of nominal plural morphology, but determiners are useful for signaling phrase boundaries. In Romanian, number is expressed through inflection, suffixation and case, and articles are also often expressed through specific suffixes, so overt phrase boundaries are less common than in French, Italian and English. These commonalities and differences help us interpret the results, and provide clues on how the targeted syntactic information is encoded.</p><p>Previous experiments have shown that syntactic information - chunk sequences and their properties - can be accessed in transformer-based pretrained sentence embeddings <ref type="bibr" target="#b10">[11]</ref>. 
In this multilingual setup, we test whether this information has been identified based on language-specific shallow features, or whether the system has uncovered and encoded more abstract structures.</p><p>The low rate of transfer in the monolingual training setup, and the decreased performance in the multilingual training setup, in both our experimental configurations, indicate that the chunk sequence information is language specific and is assembled by the system from shallow features. A further clue comes from the fact that the only transfer happens between French and Italian, which encode phrases and grammatical number in a very similar manner. Embedding sentence structure detection into a larger system, where it receives an additional learning signal shared across languages, does not push the model towards a shared sentence representation space that uniformly encodes the sentence structure common to all the languages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>We have aimed to add evidence to the question How do state-of-the-art systems «know» what they «know»? <ref type="bibr" target="#b37">[37]</ref> by projecting the subject-verb agreement problem into a multilingual space. We chose languages that share syntactic structures but also have specific differences, which can provide clues about whether the learned models rely on shallow indicators or whether the pretrained models encode deeper knowledge. Our experiments show that pretrained language models do not encode abstract syntactic structures; rather, this information is assembled "upon request" -by the probe or task -from language-specific indicators. Understanding how information is encoded in large language models can help determine the next necessary steps towards making language models truly deep.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.1. Chunk sequence detection in sentences</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: A two-level VAE: the sentence level learns to compress a sentence into a representation useful to solve the BLM problem on the task level.</figDesc><graphic coords="4,89.29,492.43,204.18,72.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Cross-language testing for detecting chunk structure in sentence embeddings.</figDesc><graphic coords="5,89.29,447.09,208.34,110.03" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: tSNE projection of the latent representation of sentences from the training data, coloured by their chunk pattern. Different markers indicate the languages: "o" for English, "x" for French, "+" for Italian, "*" for Romanian. We note that while representations cluster by the pattern, the clusters for different languages are disjoint.</figDesc><graphic coords="5,295.16,148.67,218.28,166.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Average F1 performance on training on type I data over three runs -cross-language and multi-language</figDesc><graphic coords="6,85.83,101.03,211.81,107.58" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Parallel examples of a type I data instance in English, French, Italian and Romanian</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Test data statistics, per language (EN, FR, IT, RO). The amount of training data is always 2000 instances.</figDesc><table><row><cell></cell><cell>EN</cell><cell>FR</cell><cell>IT</cell><cell>RO</cell></row><row><cell>Type I</cell><cell>230</cell><cell>252</cell><cell>230</cell><cell>230</cell></row><row><cell>Type II</cell><cell>4052</cell><cell>4927</cell><cell>4121</cell><cell>4571</cell></row><row><cell>Type III</cell><cell>4052</cell><cell>4810</cell><cell>4121</cell><cell>4571</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Average F1 scores (standard deviation) for sentence chunk detection in sentences</figDesc><table><row><cell cols="3">C.2. Results on the BLM Agr* data</cell><cell></cell><cell></cell><cell></cell></row><row><cell>train on</cell><cell>test on</cell><cell>type_I_EN</cell><cell>type_I_FR</cell><cell>type_I_IT</cell><cell>type_I_RO</cell></row><row><cell>type_I</cell><cell></cell><cell>0.839 (0.007)</cell><cell>0.938 (0.011)</cell><cell cols="2">0.868 (0.021) 0.462 (0.023)</cell></row><row><cell>type_II</cell><cell></cell><cell>0.696 (0.006)</cell><cell>0.944 (0.003)</cell><cell>0.759 (0.004)</cell><cell>0.409 (0.031)</cell></row><row><cell>type_III</cell><cell></cell><cell>0.558 (0.013)</cell><cell>0.791 (0.026)</cell><cell>0.641 (0.023)</cell><cell>0.290 (0.027)</cell></row><row><cell></cell><cell></cell><cell>type_II_EN</cell><cell>type_II_FR</cell><cell>type_II_IT</cell><cell>type_II_RO</cell></row><row><cell>type_I</cell><cell></cell><cell cols="4">0.748 (0.001) 0.873 (0.006) 0.851 (0.015) 0.448 (0.015)</cell></row><row><cell>type_II</cell><cell></cell><cell>0.642 (0.002)</cell><cell>0.871 (0.012)</cell><cell>0.802 (0.002)</cell><cell>0.394 (0.012)</cell></row><row><cell>type_III</cell><cell></cell><cell>0.484 (0.023)</cell><cell>0.760 (0.027)</cell><cell>0.691 (0.023)</cell><cell>0.299 (0.010)</cell></row><row><cell></cell><cell></cell><cell>type_III_EN</cell><cell>type_III_FR</cell><cell>type_III_IT</cell><cell>type_III_RO</cell></row><row><cell>type_I</cell><cell></cell><cell>0.643 (0.003)</cell><cell></cell><cell>(0.022)</cell><cell>0.236 (0.004)</cell></row><row><cell>type_II</cell><cell></cell><cell>0.585 (0.010)</cell><cell>0.797 (0.008)</cell><cell>0.693 (0.009)</cell><cell>0.240 (0.006)</cell></row><row><cell>type_III</cell><cell></cell><cell>0.480 (0.026)</cell><cell>0.739 (0.027)</cell><cell>0.691 (0.017)</cell><cell>0.262 
(0.002)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Multilingual learning results for the BLM agreement task in terms of average F1 over three runs, and standard deviation.</figDesc><table><row><cell>test on</cell><cell>train on</cell><cell>type_I_EN</cell><cell>type_I_FR</cell><cell>type_I_IT</cell><cell>type_I_RO</cell></row><row><cell>type_I_EN</cell><cell></cell><cell>0.884 (0.002)</cell><cell>0.123 (0.032)</cell><cell>0.125 (0.046)</cell><cell>0.106 (0.034)</cell></row><row><cell>type_I_FR</cell><cell></cell><cell>0.103 (0.032)</cell><cell>0.948 (0.009)</cell><cell>0.466 (0.010)</cell><cell>0.164 (0.029)</cell></row><row><cell>type_I_IT</cell><cell></cell><cell>0.113 (0.033)</cell><cell>0.341 (0.018)</cell><cell>0.845 (0.010)</cell><cell>0.183 (0.021)</cell></row><row><cell>type_I_RO</cell><cell></cell><cell>0.113 (0.026)</cell><cell>0.186 (0.014)</cell><cell>0.188 (0.015)</cell><cell>0.733 (0.027)</cell></row><row><cell>type_II_EN</cell><cell></cell><cell>0.757 (0.015)</cell><cell>0.119 (0.009)</cell><cell>0.129 (0.029)</cell><cell>0.103 (0.019)</cell></row><row><cell>type_II_FR</cell><cell></cell><cell>0.132 (0.024)</cell><cell>0.868 (0.010)</cell><cell>0.433 (0.008)</cell><cell>0.187 (0.011)</cell></row><row><cell>type_II_IT</cell><cell></cell><cell>0.100 (0.020)</cell><cell>0.386 (0.016)</cell><cell>0.875 (0.004)</cell><cell>0.196 (0.009)</cell></row><row><cell>type_II_RO</cell><cell></cell><cell>0.088 (0.007)</cell><cell>0.174 (0.005)</cell><cell>0.173 (0.006)</cell><cell>0.726 (0.009)</cell></row><row><cell>type_III_EN</cell><cell></cell><cell>0.638 (0.025)</cell><cell>0.117 (0.007)</cell><cell>0.129 (0.028)</cell><cell>0.108 (0.013)</cell></row><row><cell>type_III_FR</cell><cell></cell><cell>0.114 (0.007)</cell><cell>0.820 (0.013)</cell><cell>0.406 (0.013)</cell><cell>0.169 (0.017)</cell></row><row><cell>type_III_IT</cell><cell></cell><cell>0.091 (0.009)</cell><cell>0.337 
(0.016)</cell><cell>0.806 (0.009)</cell><cell>0.170 (0.013)</cell></row><row><cell>type_III_RO</cell><cell></cell><cell>0.086 (0.008)</cell><cell>0.170 (0.007)</cell><cell>0.174 (0.003)</cell><cell>0.314 (0.010)</cell></row><row><cell></cell><cell></cell><cell>type_II_EN</cell><cell>type_II_FR</cell><cell>type_II_IT</cell><cell>type_II_RO</cell></row><row><cell>type_I_EN</cell><cell></cell><cell>0.772 (0.030)</cell><cell>0.154 (0.023)</cell><cell>0.103 (0.014)</cell><cell>0.090 (0.007)</cell></row><row><cell>type_I_FR</cell><cell></cell><cell>0.151 (0.006)</cell><cell>0.972 (0.006)</cell><cell>0.484 (0.015)</cell><cell>0.143 (0.018)</cell></row><row><cell>type_I_IT</cell><cell></cell><cell>0.106 (0.014)</cell><cell>0.417 (0.018)</cell><cell>0.791 (0.004)</cell><cell>0.151 (0.034)</cell></row><row><cell>type_I_RO</cell><cell></cell><cell>0.107 (0.002)</cell><cell>0.177 (0.020)</cell><cell>0.170 (0.009)</cell><cell>0.625 (0.014)</cell></row><row><cell>type_II_EN</cell><cell></cell><cell>0.670 (0.002)</cell><cell>0.158 (0.015)</cell><cell>0.106 (0.006)</cell><cell>0.100 (0.010)</cell></row><row><cell>type_II_FR</cell><cell></cell><cell>0.188 (0.009)</cell><cell>0.903 (0.007)</cell><cell>0.434 (0.010)</cell><cell>0.146 (0.013)</cell></row><row><cell>type_II_IT</cell><cell></cell><cell>0.100 (0.010)</cell><cell>0.448 (0.011)</cell><cell>0.840 (0.003)</cell><cell>0.152 (0.020)</cell></row><row><cell>type_II_RO</cell><cell></cell><cell>0.093 (0.013)</cell><cell>0.182 (0.008)</cell><cell>0.159 (0.011)</cell><cell>0.636 (0.006)</cell></row><row><cell>type_III_EN</cell><cell></cell><cell>0.620 (0.005)</cell><cell>0.150 (0.012)</cell><cell>0.116 (0.007)</cell><cell>0.092 (0.009)</cell></row><row><cell>type_III_FR</cell><cell></cell><cell>0.168 (0.007)</cell><cell>0.870 (0.005)</cell><cell>0.386 (0.008)</cell><cell>0.127 (0.012)</cell></row><row><cell>type_III_IT</cell><cell></cell><cell>0.091 (0.005)</cell><cell>0.387 (0.002)</cell><cell>0.770 (0.008)</cell><cell>0.132 
(0.016)</cell></row><row><cell>type_III_RO</cell><cell></cell><cell>0.082 (0.014)</cell><cell>0.175 (0.007)</cell><cell>0.172 (0.003)</cell><cell>0.311 (0.017)</cell></row><row><cell></cell><cell></cell><cell>type_III_EN</cell><cell>type_III_FR</cell><cell>type_III_IT</cell><cell>type_III_RO</cell></row><row><cell>type_I_EN</cell><cell></cell><cell>0.739 (0.012)</cell><cell>0.174 (0.023)</cell><cell>0.154 (0.013)</cell><cell>0.059 (0.009)</cell></row><row><cell>type_I_FR</cell><cell></cell><cell>0.160 (0.007)</cell><cell>0.923 (0.013)</cell><cell>0.434 (0.005)</cell><cell>0.196 (0.029)</cell></row><row><cell>type_I_IT</cell><cell></cell><cell>0.132 (0.011)</cell><cell>0.384 (0.016)</cell><cell>0.797 (0.009)</cell><cell>0.197 (0.005)</cell></row><row><cell>type_I_RO</cell><cell></cell><cell>0.091 (0.011)</cell><cell>0.164 (0.023)</cell><cell>0.170 (0.022)</cell><cell>0.280 (0.010)</cell></row><row><cell>type_II_EN</cell><cell></cell><cell>0.662 (0.008)</cell><cell>0.164 (0.009)</cell><cell>0.142 (0.015)</cell><cell>0.076 (0.010)</cell></row><row><cell>type_II_FR</cell><cell></cell><cell>0.202 (0.013)</cell><cell>0.883 (0.001)</cell><cell>0.454 (0.010)</cell><cell>0.203 (0.010)</cell></row><row><cell>type_II_IT</cell><cell></cell><cell>0.111 (0.004)</cell><cell>0.425 (0.005)</cell><cell>0.840 (0.002)</cell><cell>0.203 (0.006)</cell></row><row><cell>type_II_RO</cell><cell></cell><cell>0.086 (0.007)</cell><cell>0.158 (0.006)</cell><cell>0.158 (0.012)</cell><cell>0.379 (0.013)</cell></row><row><cell>type_III_EN</cell><cell></cell><cell>0.654 (0.010)</cell><cell>0.155 (0.006)</cell><cell>0.140 (0.016)</cell><cell>0.082 (0.007)</cell></row><row><cell>type_III_FR</cell><cell></cell><cell>0.183 (0.003)</cell><cell>0.860 (0.004)</cell><cell>0.431 (0.004)</cell><cell>0.191 (0.003)</cell></row><row><cell>type_III_IT</cell><cell></cell><cell>0.106 (0.003)</cell><cell>0.373 (0.003)</cell><cell>0.836 (0.005)</cell><cell>0.182 
(0.004)</cell></row><row><cell>type_III_RO</cell><cell></cell><cell>0.082 (0.001)</cell><cell>0.156 (0.007)</cell><cell>0.155 (0.007)</cell><cell>0.353 (0.006)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5</head><label>5</label><figDesc>Results as average F1 (sd) over three runs, for the BLM subject-verb agreement task, in the monolingual training setting.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">pp1 and pp2 are optional; pp2 may be included only if pp1 is also included.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We gratefully acknowledge the partial support of this work by the Swiss National Science Foundation, through grant SNF Advanced grant TMAG-1_209426 to PM.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Generating data from a seed file</head><p>To build the sentence data, we use the seed file that was used to generate the subject-verb agreement data. A seed consists of noun, prepositional and verb phrases with different grammatical numbers, which can be combined to build sentences containing different sequences of such chunks. Table <ref type="table">2</ref> includes a partial line from the seed file. To produce the data in the four languages, we translate the seed file, from which the sentences and BLM data are then constructed. The computers with the programs of the experiment are broken.</p><p>The computers with the programs of the experiments are broken.</p><p>The computers with the program of the experiment are broken.</p><p>The computers with the program of the experiment is broken. ...</p></div>
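The combination step described above can be sketched as follows. This is an illustrative sketch, not the authors' actual generation code: the seed fields, the example forms, and the slot names (subj, pp1, pp2, verb) are hypothetical, chosen to mirror the "computers/programs/experiment" example in Table 2 and the constraint from footnote 1 that pp1 and pp2 are optional, with pp2 allowed only if pp1 is present.

```python
from itertools import product

# Hypothetical seed: each chunk slot offers singular and plural alternatives.
seed = {
    "subj": {"sg": "The computer", "pl": "The computers"},
    "pp1":  {"sg": "with the program", "pl": "with the programs"},
    "pp2":  {"sg": "of the experiment", "pl": "of the experiments"},
    "verb": {"sg": "is broken", "pl": "are broken"},
}

def generate(seed):
    """Combine chunk alternatives into sentences with different chunk
    sequences; pp1 and pp2 are optional, pp2 only when pp1 is included."""
    sentences = []
    for pp_slots in ([], ["pp1"], ["pp1", "pp2"]):
        slots = ["subj"] + pp_slots + ["verb"]
        # every combination of grammatical numbers over the chosen slots
        for numbers in product(("sg", "pl"), repeat=len(slots)):
            text = " ".join(seed[s][n] for s, n in zip(slots, numbers)) + "."
            # the sentence is grammatical iff subject and verb numbers agree
            sentences.append((text, numbers[0] == numbers[-1]))
    return sentences
```

One seed line thus yields 4 + 8 + 16 = 28 sentences across the three chunk sequences, including ungrammatical ones such as "The computers with the program of the experiment is broken.", which serve as contrastive answer candidates.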
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>A line from the seed file on top, and a set of individual sentences built from it, as well as one BLM instance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Example of data for the agreement BLM B.1. Example of BLM instances (type I) in different languages</head><p>English -Context 1 The owner of the parrot is coming. 2 The owners of the parrot are coming. 3 The owner of the parrots is coming. 4 The owners of the parrots are coming. 5 The owner of the parrot in the tree is coming. 6 The owners of the parrot in the tree are coming. 7 The owner of the parrots in the tree is coming. ? ??? English -Answers 1 The owners of the parrots in the tree are coming. 2 The owners of the parrots in the trees are coming. 3 The owner of the parrots in the tree is coming. 4 The owners of the parrots in the tree are coming. 5 The owners of the parrot in the tree are coming. 6 The owners of the parrots in the trees are coming. 7 The owners of the parrots and the trees are coming. ? The owners of the parrots in the tree in the gardens are coming.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>French</head></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Superglue: A stickier benchmark for general-purpose language understanding systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pruksachatkun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nangia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bowman</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2019/file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Beygelzimer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Alché-Buc</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Fox</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">32</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Emergent linguistic structure in artificial neural networks trained by self-supervision</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hewitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<imprint>
			<biblScope unit="volume">117</biblScope>
			<biblScope unit="page" from="30046" to="30054" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A primer in BERTology: What we know about how BERT works</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kovaleva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rumshisky</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00349</idno>
		<ptr target="https://aclanthology.org/2020.tacl-1.54" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="842" to="866" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
		<ptr target="https://aclanthology.org/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">BERT rediscovers the classical NLP pipeline</title>
		<author>
			<persName><forename type="first">I</forename><surname>Tenney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pavlick</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1452</idno>
		<ptr target="https://aclanthology.org/P19-1452" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Traum</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</editor>
		<meeting>the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4593" to="4601" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Designing and interpreting probes with control tasks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hewitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1275</idno>
		<ptr target="https://aclanthology.org/D19-1275" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2733" to="2743" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Assessing the ability of LSTMs to learn syntax-sensitive dependencies</title>
		<author>
			<persName><forename type="first">T</forename><surname>Linzen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dupoux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00115</idno>
		<ptr target="https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00115" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association of Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="521" to="535" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Colorless green recurrent networks dream hierarchically</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gulordava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Linzen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N18-1108</idno>
		<ptr target="http://aclweb.org/anthology/N18-1108" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1195" to="1205" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Assessing BERT&apos;s syntactic abilities</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1901.05287</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Syntactic structure from deep learning</title>
		<author>
			<persName><forename type="first">T</forename><surname>Linzen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<idno type="DOI">10.1146/annurev-linguistics-032020-051035</idno>
	</analytic>
	<monogr>
		<title level="j">Annual Review of Linguistics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="195" to="212" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Are there identifiable structural parts in the sentence embedding whole?</title>
		<author>
			<persName><forename type="first">V</forename><surname>Nastase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on analyzing and interpreting neural networks for NLP (BlackBoxNLP)</title>
				<meeting>the Workshop on analyzing and interpreting neural networks for NLP (BlackBoxNLP)</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">BLM-AgrF: A new French benchmark to investigate generalization of agreement in neural networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>An</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Nastase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2023.eacl-main.99" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Dubrovnik, Croatia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1363" to="1374" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Blackbird language matrices (BLM), a new task for rule-like generalization in neural networks: Motivations and formal specifications</title>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2306.11444</idno>
		<idno>ArXiv cs.CL 2306.11444</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2306.11444" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">BLM-s/lE: A structured dataset of English spray-load verb alternations for testing generalization in LLMs</title>
		<author>
			<persName><forename type="first">G</forename><surname>Samo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Nastase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the 2023 Conference on Empirical Methods in Natural Language Processing</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Standardization of progressive matrices</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Raven</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">British Journal of Medical Psychology</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="137" to="150" />
			<date type="published" when="1938">1938</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">What one intelligence test measures: a theoretical account of the processing in the raven progressive matrices test</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Carpenter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Just</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychological Review</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page">404</biblScope>
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Subject-verb agreement errors in French and English: The role of syntactic hierarchy</title>
		<author>
			<persName><forename type="first">J</forename><surname>Franck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Vigliocco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nicol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language and Cognitive Processes</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="371" to="404" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Grammatical information in BERT sentence embeddings as two-dimensional arrays</title>
		<author>
			<persName><forename type="first">V</forename><surname>Nastase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.repl4nlp-1.3</idno>
		<ptr target="https://aclanthology.org/2023.repl4nlp-1.3" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Can</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mozes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Cahyawijaya</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Saphra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Kassner</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Ravfogel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Ravichander</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Zhao</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Augenstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Voita</surname></persName>
		</editor>
		<meeting>the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="22" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">ELECTRA: Pre-training text encoders as discriminators rather than generators</title>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICLR</title>
		<imprint>
			<biblScope unit="page" from="1" to="18" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">What do you learn from context? probing for sentence structure in contextualized word representations</title>
		<author>
			<persName><forename type="first">I</forename><surname>Tenney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poliak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>Mccoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Durme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Bowman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Das</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Seventh International Conference on Learning Representations (ICLR)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="235" to="249" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A structural probe for finding syntax in word representations</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hewitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1419</idno>
		<ptr target="https://aclanthology.org/N19-1419" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4129" to="4138" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Finding universal grammatical relations in multilingual BERT</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hewitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.493</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.493" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5564" to="5577" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Semantic role labeling meets definition modeling: Using natural language to describe predicate-argument structures</title>
		<author>
			<persName><forename type="first">S</forename><surname>Conia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Barba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Scirè</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.findings-emnlp.313</idno>
		<ptr target="https://aclanthology.org/2022.findings-emnlp.313" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2022</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Kozareva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</editor>
		<meeting><address><addrLine>Abu Dhabi, United Arab Emirates</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="4253" to="4270" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Deep subjecthood: Higher-order grammatical features in multilingual BERT</title>
		<author>
			<persName><forename type="first">I</forename><surname>Papadimitriou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Futrell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mahowald</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.eacl-main.215</idno>
		<ptr target="https://aclanthology.org/2021.eacl-main.215" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tiedemann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Tsarfaty</surname></persName>
		</editor>
		<meeting>the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2522" to="2532" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">What you can cram into a single $&amp;!#* vector: Probing sentence embeddings for linguistic properties</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kruszewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Barrault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P18-1198</idno>
		<ptr target="https://aclanthology.org/P18-1198" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
				<editor>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Miyao</surname></persName>
		</editor>
		<meeting>the 56th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Melbourne, Australia</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2126" to="2136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Fine-grained analysis of sentence embeddings using auxiliary prediction tasks</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Adi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kermany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Belinkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=BJh6Ztuxl" />
	</analytic>
	<monogr>
		<title level="m">5th International Conference on Learning Representations, ICLR 2017</title>
		<title level="s">Conference Track Proceedings</title>
		<meeting><address><addrLine>Toulon, France; OpenReview</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">April 24-26, 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">How abstract is linguistic generalization in large language models? experiments with argument structure</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Petty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Frank</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00608</idno>
		<ptr target="https://aclanthology.org/2023.tacl-1.78" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1377" to="1395" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Wendler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Veselovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Monea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>West</surname></persName>
		</author>
		<title level="m">Do llamas work in English? On the latent language of multilingual transformers</title>
				<editor>
			<persName><forename type="first">L.-W</forename><surname>Ku</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Martins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Srikumar</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<idno type="DOI">10.18653/v1/2024.acl-long.820</idno>
		<ptr target="https://aclanthology.org/2024.acl-long.820" />
		<title level="m">Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
				<editor>
			<persName><forename type="first">L.-W</forename><surname>Ku</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Martins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Srikumar</surname></persName>
		</editor>
		<meeting>the 62nd Annual Meeting of the Association for Computational Linguistics<address><addrLine>Bangkok, Thailand</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="15366" to="15394" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Language-specific neurons: The key to multilingual capabilities in large language models</title>
		<author>
			<persName><forename type="first">T</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-R</forename><surname>Wen</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2024.acl-long.309</idno>
		<ptr target="https://aclanthology.org/2024.acl-long.309" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
				<editor>
			<persName><forename type="first">L.-W</forename><surname>Ku</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Martins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Srikumar</surname></persName>
		</editor>
		<meeting>the 62nd Annual Meeting of the Association for Computational Linguistics<address><addrLine>Bangkok, Thailand</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="5701" to="5715" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Data-driven crosslingual syntax: An agreement study with massively multilingual models</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>De Varda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Marelli</surname></persName>
		</author>
		<idno type="DOI">10.1162/coli_a_00472</idno>
		<ptr target="https://aclanthology.org/2023.cl-2.1" />
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="261" to="299" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Understanding cross-lingual syntactic transfer in multilingual recurrent neural networks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Dhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bisazza</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.nodalida-main.8" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Dobnik</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Øvrelid</surname></persName>
		</editor>
		<meeting>the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)<address><addrLine>Reykjavik, Iceland (Online)</addrLine></address></meeting>
		<imprint>
			<publisher>Linköping University Electronic Press</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="74" to="85" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Cross-linguistic syntactic evaluation of word prediction models</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mueller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nicolai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Petrou-Zeniou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Talmina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Linzen</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.490</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.490" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5523" to="5539" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">How multilingual is multilingual BERT?</title>
		<author>
			<persName><forename type="first">T</forename><surname>Pires</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schlinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garrette</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1493</idno>
		<ptr target="https://aclanthology.org/P19-1493" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Traum</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</editor>
		<meeting>the 57th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4996" to="5001" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Language models are few-shot multilingual learners</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">I</forename><surname>Winata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madotto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yosinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fung</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.mrl-1.1</idno>
		<ptr target="https://aclanthology.org/2021.mrl-1.1" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Workshop on Multilingual Representation Learning</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Ataman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Birch</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Firat</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">G</forename><surname>Sahin</surname></persName>
		</editor>
		<meeting>the 1st Workshop on Multilingual Representation Learning<address><addrLine>Punta Cana, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1" to="15" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siddhant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Firat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Johnson</surname></persName>
		</author>
		<ptr target="https://proceedings.mlr.press/v119/hu20b.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th International Conference on Machine Learning</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Daumé III</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</editor>
		<meeting>the 37th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">119</biblScope>
			<biblScope unit="page" from="4411" to="4421" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Towards a common understanding of contributing factors for cross-lingual transfer in multilingual language models: A review</title>
		<author>
			<persName><forename type="first">F</forename><surname>Philippy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Haddadan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.acl-long.323</idno>
		<ptr target="https://aclanthology.org/2023.acl-long.323" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Boyd-Graber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Okazaki</surname></persName>
		</editor>
		<meeting>the 61st Annual Meeting of the Association for Computational Linguistics<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="5877" to="5891" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Understanding natural language understanding systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</author>
		<idno type="DOI">10.1422/107438</idno>
		<ptr target="https://www.rivisteweb.it/doi/10.1422/107438" />
	</analytic>
	<monogr>
		<title level="j">Sistemi intelligenti. Rivista quadrimestrale di scienze cognitive e di intelligenza artificiale</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="277" to="302" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
