<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards an Automatic Evaluation of (In)coherence in Student Essays</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Filippo</forename><surname>Pellegrino</surname></persName>
							<email>filippo.pellegrino.job@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Eurac Research Institute Bozen -Südtirol</orgName>
								<address>
									<addrLine>Viale Druso Drususallee, 1</addrLine>
									<postCode>39100</postCode>
									<settlement>Bolzano, Autonome Provinz</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jennifer</forename><forename type="middle">Carmen</forename><surname>Frey</surname></persName>
							<email>jennifercarmen.frey@eurac.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">Eurac Research Institute Bozen -Südtirol</orgName>
								<address>
									<addrLine>Viale Druso Drususallee, 1</addrLine>
									<postCode>39100</postCode>
									<settlement>Bolzano, Autonome Provinz</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lorenzo</forename><surname>Zanasi</surname></persName>
							<email>lorenzo.zanasi@eurac.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">Eurac Research Institute Bozen -Südtirol</orgName>
								<address>
									<addrLine>Viale Druso Drususallee, 1</addrLine>
									<postCode>39100</postCode>
									<settlement>Bolzano, Autonome Provinz</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards an Automatic Evaluation of (In)coherence in Student Essays</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BB4FE9398D66280A303B8AB18ADDC277</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Coherence modelling</term>
					<term>data perturbation</term>
					<term>transformers</term>
					<term>education</term>
					<term>student essays</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Coherence modeling is an important task in natural language processing (NLP) with potential impact on other NLP tasks such as Natural Language Understanding or Automated Essay Scoring. Automatic approaches in coherence modeling aim to distinguish coherent from incoherent (often synthetically created) texts or to identify the correct continuation for a given sample of texts, as demonstrated for Italian in the DisCoTex task of EVALITA 2023. While early work on coherence modelling has focused on exploring definitions of the phenomenon, exploring the performance of neural models has dominated the field in recent years. However, coherence modelling can also offer interesting linguistic insights with pedagogical implications. In this article, we target coherence modeling for the Italian language in a strongly domain-specific scenario, i.e. education. We use a corpus of student essays collected to analyse students' text coherence in combination with data perturbation techniques to experiment with the effect of various linguistically informed features of incoherent writing on current coherence modelling strategies used in NLP. Our results show the capabilities of encoder models to capture features of (in)coherence in a domain-specific scenario discerning natural from artificially corrupted texts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Argumentative essay writing is a fundamental objective in education for both vocational schools and high schools in Italy, as indicated in <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. It requires students to present arguments supported by personal knowledge or external sources in a coherent and convincing manner. However, writing coherent texts poses both cognitive and linguistic challenges to novice writers, and the related textual competences are frequently claimed to be insufficient, putting pressure on the educational system. Automatically discerning incoherent texts or passages could help teachers better understand students' problems and give targeted instructions, while students would benefit from more frequent and more timely feedback. However, to date, most NLP research in automatic coherence modelling has focused on semantic similarity between two parts of a text, using mostly well-formed newspaper or Wikipedia texts and thus offering little information for educational contexts. In this study, we explore coherence from an educational perspective, utilizing recent language models and data perturbation techniques to probe their value for linguistically informed and informative automatic coherence evaluation of student essays. While large language models have been used successfully in domain-general coherence modelling before, we test their effectiveness for text analysis in this domain-specific scenario, taking into account both surface and non-standard language features. 
We discuss:</p><p>• data perturbation techniques to artificially reproduce real-life incoherence in textual data • a custom probing task design • automatic evaluation of coherence using different encoding models</p><p>The results of our experiments show the performance of encoder models in recognizing patterns of (in)coherence in a domain-specific educational context such as upper secondary school student essays. The paper is organized as follows: Section 2 provides an overview of previous approaches to coherence modelling and NLP data perturbation with a focus on Italian NLP. Section 3 introduces the data we used for this study, giving information on the research project it originates from as well as on the corpus design and annotation. Section 4 provides a detailed description of our methodology, introducing our custom probing tasks (Section 4.1), the models used (Section 4.2.1) and the text encoding (Section 4.3), as well as a description of the two analyses performed (Section 4.4 and Section 4.5). Sections 5 and 6 present and discuss our results, and Section 7 concludes the article with final considerations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Coherence modelling</head><p>Coherence modeling is an important task in natural language processing (NLP) with potential impact on other NLP tasks such as Natural Language Understanding or automated essay scoring. Early work on coherence modelling focused on the definition of the phenomenon <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref> and provided valuable frameworks such as Centering Theory <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref> and the Entity-Grid approach <ref type="bibr" target="#b9">[10]</ref>.</p><p>Following the rapid development of neural network systems in recent years, many works such as <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14]</ref> explored coherence modelling, implementing increasingly sophisticated solutions for the English language.</p><p>Recently, the Italian NLP community has approached the topic from an engineering point of view, using Italian pre-trained neural models to distinguish coherent from (mainly synthetically constructed) non-coherent texts <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>. Some efforts were also made for multilingual scenarios <ref type="bibr" target="#b18">[19]</ref>, demonstrating the encoding capabilities of multilingual models for coherence features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Data perturbation</head><p>In data perturbation, dataset entries are corrupted with specific computational operations to simulate noise conditions and test model performance under real-world conditions <ref type="bibr" target="#b19">[20]</ref>. Many studies on data perturbation and data augmentation in NLP focus on model-agnostic methods <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23]</ref>, using random deletion, random swap, synonym replacement, random insertion and punctuation insertion techniques for text classification with limited amounts of data. More sophisticated and task-oriented data augmentation approaches have been proposed for sentiment analysis <ref type="bibr" target="#b23">[24]</ref>, hate speech classification <ref type="bibr" target="#b24">[25]</ref>, hypernymy detection <ref type="bibr" target="#b25">[26]</ref> and domain-specific classification <ref type="bibr" target="#b26">[27]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data</head><p>The data used in this study originates from a research project conducted in South Tyrol between 2020 and 2024. The project, named ITACA: Coerenza nell'ITAliano Accademico <ref type="bibr" target="#b27">[28]</ref>, aimed to study the textual competences of students in their first language, Italian, with a particular focus on aspects of text coherence. The project produced various outcomes: a corpus of Italian student essays collected in Italian South Tyrolean upper secondary schools, a validated rating scale to evaluate coherence in student essays, and coherence ratings for texts in the corpus from three independent raters using the previously developed rating scale. These products are described in the following sections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">ITACA Corpus</head><p>The ITACA corpus<ref type="foot" target="#foot_0">1</ref> is an annotated learner corpus created within the project ITACA: Coerenza nell'ITAliano Accademico <ref type="bibr" target="#b27">[28]</ref>. It consists of a total of 636 argumentative essays from Italian L1 upper secondary school students from the autonomous province of Bolzano/Bozen<ref type="foot" target="#foot_1">2</ref>, collected during the school year 2021/2022. The texts were collected by asking 12th grade students to type an argumentative essay following precise indications of writing time, text length and topic. The full assignment can be consulted in Appendix B. While the assignment asked for a minimum text length of 600 words, the average number of tokens per essay is 668, just slightly above the minimum length requirement.</p><p>The totality of the 636 collected texts constitutes 382,964 tokens. All data were collected digitally and anonymously and underwent subsequent control and cleaning procedures, partly manual, to ensure their integrity and to guarantee the anonymity of the participants. Essays were collected by asking students to type them into an input field in an online form; additional metadata was gathered through a subsequent online questionnaire asking for basic socio-demographic information, students' language background, and reading and writing habits. 
The whole corpus was automatically tokenized, lemmatized and annotated for part-of-speech and syntactic dependencies with the support of project collaborators from Fondazione Bruno Kessler, who also supported the project in the setup of an interface for manual annotation based on Inception <ref type="bibr" target="#b28">[29]</ref>.</p><p>A manual annotation of a subset of 388 texts was performed by two trained annotators and offers detailed descriptions of the text's structure, with a focus on the use of various linguistic features (such as punctuation, connectives, agreements, anaphora, contradictions) that enhance or limit the text's cohesion and coherence. The manual annotation of the corpus was guided by the three sections elaborated in <ref type="bibr" target="#b29">[30]</ref> and contained annotations for traits of incoherence referring to 1. segmentation (e.g. splice comma, added comma, not-signed parenthetical clause) 2. logic-argumentative plan (e.g. issues in the use of connectives, contradictions) 3. thematic-referential plan (e.g. critical agreement, critical anaphora, not-expanded comment)</p><p>The corpus is accessible through an ANNIS search interface <ref type="foot" target="#foot_2">3</ref> and can be downloaded in various formats from the Eurac Research Clarin Center (ERCC) under the CLARIN ACADEMIC END-USER LICENCE ACA-BY-NC-NORED 1.0 licence 4 . Downloads and further documentation can also be accessed via Eurac Research's PORTA platform 5 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Manual coherence ratings</head><p>Each essay was additionally evaluated manually, in a double-blind manner, by a panel of six experts who applied a specially created and subsequently validated rating scale to assess textual coherence. The items were rated on a Likert scale from one to ten and referred to three dimensions of coherence (structure, comprehensibility, segmentation). The average structure score 𝜇 is attested at 4.55 with standard deviation 𝜎 = 5. For comprehensibility, 𝜇 = 6.29 and 𝜎 = 1.65, while for segmentation 𝜇 = 5.99 and 𝜎 = 1.79.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Methodology</head><p>In this study, we focus on NLP data perturbation <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref> and custom probing tasks <ref type="bibr" target="#b30">[31]</ref> to evaluate the ability of Italian BERT models to discern features of coherence under different pre-training conditions and fine-tuning.</p><p>In our analysis, we aim to evaluate automatic coherence modelling techniques, applying them to student essays with varying degrees of well-formedness and coherence.</p><p>We conducted a number of experiments probing whether state-of-the-art coherence modelling techniques based on BERT encodings would be able to distinguish between original, i.e. allegedly coherent, texts and those containing features of incoherence previously identified in student writing. In our case study, we use data perturbation techniques to reproduce specific student errors observed during the textual analysis of the ITACA project <ref type="bibr" target="#b27">[28]</ref> (see Section 3), in order to apply text modifications in a fully controlled fashion. We used representations obtained from BERT <ref type="bibr" target="#b31">[32]</ref> models to demonstrate the ability of automatic systems to encode patterns of (in)coherence in a specialized scenario such as Italian student essays and to evaluate their potential for educational purposes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Custom Probing Tasks</head><p>Using data perturbation techniques, we aim to reproduce both general-purpose coherence modelling perturbation strategies and modifications inspired by some of the most salient features of textual (in)coherence observed in the annotation process for the ITACA project. These include an incoherent order of arguments and sentences, incorrect use of connectives, overuse of polyfunctional connectives, unresolved co-reference, the use of the splice comma, and an overuse of paratactical constructions. Although data perturbation can also operate on the character level, we opted for token- and sentence-level approaches to keep the parameters in a controlled setting.</p><p>We implemented the following custom probing tasks:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sentence Order Perturbation [SHUFF]:</head><p>As in other synthetic datasets for coherence modelling <ref type="bibr" target="#b14">[15]</ref>, this data perturbation technique randomly shuffles the sentences within each text.</p></div>
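A minimal sketch of this sentence-order perturbation, assuming a naive period-based sentence splitter (the paper does not specify its splitting tool):

```python
import random

def shuffle_sentences(text, seed=0):
    """SHUFF: randomly reorder the sentences of a text."""
    # Naive split on the period; a real pipeline would use a sentence tokenizer.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.Random(seed).shuffle(sentences)
    return ". ".join(sentences) + "."

text = ("Stamattina io sono andato al mercato. "
        "Ho comprato delle mele e delle arance. "
        "Poi sono tornato a casa e ho preparato una torta.")
print(shuffle_sentences(text))
```

The perturbed text keeps exactly the original sentences, only in a different order, which makes the original/perturbed pairs trivially balanced in content.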
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Connective Perturbation [LICO]:</head><p>In order to imitate texts in which the logical connection between phrases is erroneous, we randomly substituted the connectives used in the text, exploiting both manual and automatic processing with Stanza<ref type="foot" target="#foot_3">6</ref>. To identify the connectives to substitute, we used string matching against all connectives listed in the Lexicon of Italian Connectives (LICO) <ref type="bibr" target="#b32">[33]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Polyfunctional Connective Perturbation [POLYFUNCT]:</head><p>Based on the ITACA corpus annotation scheme, we implement a probing task imitating young writers' tendency to use simple polyfunctional connectives instead of semantically loaded ones. For this, we substitute all connectives in the text with the polyfunctional connective "e".</p></div>
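A sketch of this substitution, using a small illustrative connective list (an assumption for brevity; the actual task matches connectives from the full LICO lexicon):

```python
import re

# Illustrative subset only; the paper matches the full LICO lexicon.
CONNECTIVES = ["poi", "quindi", "tuttavia", "inoltre", "invece"]

def polyfunct(text):
    """POLYFUNCT: replace every matched connective with the polyfunctional 'e'."""
    pattern = r"\b(" + "|".join(CONNECTIVES) + r")\b"
    return re.sub(pattern, "e", text, flags=re.IGNORECASE)

print(polyfunct("Poi sono tornato a casa e ho preparato una torta."))
# → e sono tornato a casa e ho preparato una torta.
```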
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Pronoun Perturbation [PRON]:</head><p>For a very simplistic approximation of corrupted anaphoric references, we identified pronouns with Stanza and randomly replaced them with other pronouns isolated from the corpus. To ensure a minimum of correct pronouns, only 50% of the pronouns in each text were corrupted.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Splice Comma Perturbation [SPLICE]:</head><p>A splice comma is the use of a comma to join two independent clauses, where the comma substitutes a period, a colon, or a semicolon <ref type="bibr" target="#b33">[34,</ref><ref type="bibr" target="#b34">35,</ref><ref type="bibr" target="#b35">36,</ref><ref type="bibr" target="#b36">37]</ref>. In our case, long pause markers such as periods, colons, or semicolons were substituted with a comma. We apply the perturbation to just 50% of these markers in each text to partially keep the punctuation unaltered.</p></div>
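This perturbation might be sketched as follows (parameter names are ours; the paper does not publish its implementation):

```python
import random

def splice_comma(text, ratio=0.5, seed=0):
    """SPLICE: turn a fraction of long-pause markers (. : ;) into commas."""
    chars = list(text)
    # Sentence-internal long pauses only; the final stop is left intact.
    idxs = [i for i, c in enumerate(chars[:-1]) if c in ".:;"]
    for i in random.Random(seed).sample(idxs, int(len(idxs) * ratio)):
        chars[i] = ","
    return "".join(chars)

print(splice_comma("Sono andato al mercato. Ho comprato delle mele; poi sono tornato a casa."))
```

With `ratio=0.5`, half of the internal long-pause markers are replaced, matching the paper's choice of corrupting only 50% of the markers per text.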
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Example Sentences under Text Perturbations. The example corresponds to the English "This morning I went to the market. I bought some apples and oranges. Then I went back home and baked a cake"</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Parataxis Perturbation [PARATAX]:</head><p>Coordinating conjunctions extracted with Stanza are substituted with punctuation marks taken from a list to create paratactic sentences. We apply the perturbation to just 50% of the conjunctions in the text to keep some conjunctions untouched.</p><p>Text perturbation examples can be consulted in Table <ref type="table">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Models</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Pre-trained Models</head><p>For our experiments, we test three different BERT-based models to obtain vector representations for our probing tasks; the third, a fine-tuned version of BERT-ita, is described in Section 4.2.2.</p><p>1. BERT-ita base <ref type="bibr" target="#b37">[38]</ref>: trained with Italian data from the OPUS corpora collection <ref type="foot" target="#foot_4">7</ref> and Wikipedia <ref type="foot" target="#foot_5">8</ref>. The final training corpus has a size of 13GB and 2,050,057,573 tokens. 2. GilBERTo<ref type="foot" target="#foot_6">9</ref>: a RoBERTa-based model <ref type="bibr" target="#b38">[39]</ref>. The model was trained with the subword masking technique for 100k steps on 71GB of Italian text comprising 11,250,012,896 words <ref type="bibr" target="#b39">[40]</ref>. The team adopted a vocabulary of 32k BPE subwords, generated with the SentencePiece tokenizer <ref type="bibr" target="#b40">[41]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">BERT-ita Fine-tuning</head><p>Inspired by the works of <ref type="bibr" target="#b41">[42]</ref> and <ref type="bibr" target="#b42">[43]</ref>, the BERT-ita model was fine-tuned using a dataset of high school essays.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Text Encoding</head><p>We retrieved vector representations and performed a binary text classification experiment for each perturbation technique <ref type="foot" target="#foot_8">11</ref>. All texts in the set are fed to the model with batch size = 1. To overcome the 512-token input limit imposed by BERT models and process the entire text with no loss of contextual information, we split each text into two segments when the maximum input length is reached. Furthermore, we adopted a mean-pooling strategy, calculating the mean of the last hidden states of all contextualized token embeddings across the input sequence length.</p><p>The final text representation is the mean of all segment embeddings in the batch.</p></div>
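The pooling logic described above can be sketched as follows; the random arrays stand in for the last hidden states a BERT model would return for each (at most 512-token) segment, so no model download is needed for the sketch:

```python
import numpy as np

HIDDEN = 768  # hidden size of the BERT-base models used

def pool_segments(segments):
    """Mean-pool each segment's token embeddings, then average the segments."""
    segment_vecs = [seg.mean(axis=0) for seg in segments]  # one vector per segment
    return np.mean(segment_vecs, axis=0)                   # final text representation

rng = np.random.default_rng(0)
# A ~700-token essay split into a 512-token and a 188-token segment.
essay_states = [rng.normal(size=(512, HIDDEN)), rng.normal(size=(188, HIDDEN))]
text_vector = pool_segments(essay_states)
print(text_vector.shape)  # (768,)
```

Each essay thus ends up as a single fixed-size vector regardless of its length, ready for the downstream classifier.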
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Model Performance Analysis</head><p>We first perform a model performance analysis, comparing the classification performance of the three models on each of the custom probing tasks. Classification is performed with a Random Forest classifier <ref type="bibr" target="#b43">[44]</ref>, with each experiment defined as a binary classification between original and perturbed texts. The classes were balanced across the entire dataset. To make the most of the available data for training and testing, we use 10-fold cross-validation for evaluation. We compare model performance against a majority-class baseline (0.5 for balanced binary classification) and against each other using f1 scores.</p></div>
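This evaluation setup can be sketched with scikit-learn; the synthetic features below stand in for the pooled BERT text vectors, and the injected shift is only a toy signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))   # stand-in for the pooled text representations
y = np.repeat([0, 1], 100)       # 0 = original, 1 = perturbed (balanced classes)
X[y == 1] += 0.8                 # inject a toy, separable signal

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")  # 10-fold CV, f1 per fold
print(round(float(scores.mean()), 2))
```

With balanced classes, a mean f1 near 0.5 indicates chance-level performance, which is the baseline the paper compares against.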
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Error Analysis</head><p>In a subsequent analysis, we compare the predictions of our best-performing model with the human coherence ratings provided for the corpus. In order to obtain a single coherence score for each essay, the scores were averaged over the different annotators and the three components (structure, comprehensibility and segmentation; see Section 3). We perform an error analysis by comparing the predictions for unmodified texts with the highest and lowest coherence scores, using a random forest classifier trained on representations from the model that achieved the best results in the model comparison. Assuming that all tasks carry equal weight, we select the best-performing model according to the average f1 score achieved in the model performance analysis (see Section 4.4). The train set for this evaluation corresponds to 90% of the data, while the test set comprises the 5% of essays with the highest (𝜇 = 8.28, 𝜎 = 0.36) and the 5% with the lowest coherence scores (𝜇 = 2.63, 𝜎 = 0.51). Finally, we interpret the results, manually investigating texts from both tails of the test set that were misclassified as modified texts.</p></div>
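Selecting the two tails of the score ranking might look like the following (`tail_indices` is a hypothetical helper of ours, not the authors' code; the toy scores are illustrative):

```python
import numpy as np

def tail_indices(scores, frac=0.05):
    """Return indices of the lowest and highest `frac` of essays by mean score."""
    order = np.argsort(scores)           # ascending by averaged coherence score
    k = max(1, round(len(scores) * frac))
    return order[:k], order[-k:]

scores = np.array([2.5, 8.1, 5.0, 6.7, 3.2, 7.9, 4.4, 8.3, 2.9, 6.1])
low, high = tail_indices(scores, frac=0.2)
print(low, high)  # indices of the lowest two and highest two essays
```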
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>The classification experiments show the ability of the BERT models to encode the features of (in)coherence represented by the perturbation techniques introduced in Section 4.1. The following sections illustrate our findings for the BERT model comparison and the error analysis conducted on a selected subset of non-modified texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Models Comparison Analysis</head><p>F1 scores were very similar, with only small differences between the three models. On average, GilBERTo was found to be the best-performing model for most tasks, probably due to its larger amount of training data and its lighter model architecture. However, we do not expect these differences to be significant. Except for the improvement in the shuffling task after fine-tuning, the ITACA-bert model remains comparable to its base version, probably due to the scarcity of domain-specific training data. Results showed that models achieved better performance on semantic tasks such as the polyfunctional conjunction perturbation or the pronoun perturbation, while struggling with syntactic probing tasks such as the shuffling and splice comma perturbations. For the shuffling task, a considerable improvement can be observed after fine-tuning (+0.12, from F1 = 0.38 to F1 = 0.50). However, none of the models performs better than a random baseline on the shuffling task, while on the splice comma experiment the models performed slightly better, with BERT-ita and GilBERTo marginally beating the baseline of 0.5. A graphical comparison between model performances can be seen in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>A detailed overview of the classification results for single tasks and models can be found in Appendix A. The tables provide the f1 score for each experiment and model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Error analysis on evaluation set</head><p>To better observe the encoding and classification performance of BERT, we isolate the texts with the highest and the lowest average coherence scores, as specified in Section 4.5. The resulting test set corresponds to roughly 10% of the total number of texts in the corpus. Our expectation is that texts with lower coherence scores have a higher chance of being misclassified as modified texts, while texts with higher coherence scores should not lead the classifiers to identify traits of incoherence as specified in the custom probing tasks. We perform all analyses using the GilBERTo model for text encoding, as it proved to be the best-performing model when averaging f1 scores over all tasks of the model performance analysis (see Section 4.4). However, we exclude the shuffling task, as model performance was below the baseline and therefore too low for interpretation. Thus, we train a random forest classifier on the 90% train split for all custom probing tasks described in Section 4.1.</p><p>Our results show that the distribution of misclassified labels is generally skewed toward texts with lower coherence scores, but misclassifications of texts with higher coherence scores were also found. While the splice comma and polyfunctional conjunction (see Figure <ref type="figure" target="#fig_1">2</ref>) probing tasks showed clearly more misclassifications on the lower tail of the dataset, well-rated texts were also occasionally misclassified as perturbed. In contrast, the small number of misclassifications on the parataxis and pronoun perturbation probing tasks might suggest that the operationalizations taken in this work are too simplistic to be representative of students' mistakes and, therefore, not able to pick up on traits of incoherence present in the students' essays. 
The results of the experiment can be consulted in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>Although data perturbation cannot fully reproduce the variability of real-world students' mistakes, our results offer valuable insights into the ability of BERT encoders to capture degrees of coherence on both the syntactic and the semantic level. Of course, the efficiency of the data perturbation might be influenced by several factors, such as the fact that the original texts used for our experiments already naturally contain errors of the same or other types. However, we argue that this is the case for any dataset of unknown quality that is subject to automatic coherence evaluation. Indeed, the texts were not reviewed before evaluation and, excluding other external factors, they reproduce real-world writing conditions. The results of language encoding and classification depend on the difficulty of the perturbation task and on the original training of the BERT model. However, despite the fact that BERT-ita base and GilBERTo exploit different training strategies, no drastic performance fluctuations have been observed on our selected language tasks. Even though the effect of fine-tuning with domain-specific data is limited by the amount of available data, it can already be observed in the increased shuffling task performance.</p><p>The classification of the evaluation set highlighted the potential of data perturbation techniques for the encoding of (in)coherence features. Previous approaches to coherence modelling implemented solutions inspired by theoretical intuitions. In our case, we decided to start from natural textual errors and check the ability of the model to capture the same features presented in the text. 
For a more transparent interpretation of results and the explanation of individual classifications, it would be of interest to check how attention maps change according to the tuning of the model <ref type="bibr" target="#b44">[45]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>In this paper, we presented an evaluation of coherence modelling techniques for detecting incoherence in student essays based on surface-level features of incoherence. We used the ITACA corpus of Italian upper secondary school essays to perform a number of classification experiments using data perturbation and BERT-based text encoding methods. After a preliminary comparison between pre-trained and fine-tuned models, we adopted the best-performing one according to our results. The results of the chosen tasks are influenced by the implementation of the perturbation technique, the encoding ability of the model, and the amount and quality of the data the model is pre-trained on. The best performances were achieved by the model pre-trained on the largest amount of data (GilBERTo). We based our evaluation on simple f1 measures, considering these sufficiently indicative of the encoding ability of the model applied to each specific probing task. Since we mainly tested custom perturbation techniques and the encoding abilities of BERT models, future research directions might involve the enhancement of data perturbation techniques, XAI techniques for model behaviour analysis <ref type="bibr" target="#b45">[46,</ref><ref type="bibr" target="#b44">45]</ref> and the exploitation of state-of-the-art generative one-shot and few-shot models in a highly domain-specific scenario such as school essay writing. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Appendix A</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Model performances comparison on single probing tasks</figDesc><graphic coords="5,302.62,84.19,203.36,181.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Classification results on evaluation set. The figure shows the amount of misclassified labels for the essays that lie in the highest and lowest tail of the score ranking ITACA dataset.</figDesc><graphic coords="6,89.29,84.19,203.37,123.28" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>Example Sentence None Stamattina io sono andato al mercato. Ho comprato delle mele e delle arance. Poi sono tornato a casa e ho preparato una torta. Sentence Order Perturbation Poi sono tornato a casa e ho preparato una torta. Stamattina io sono andato al mercato. Ho comprato delle mele e delle arance. LICO Connective Perturbation Stamattina io sono andato al mercato. Ho comprato delle mele e delle arance. Poi sono tornato a casa invece di ho preparato una torta. Polyfunctional Connective Perturbation Stamattina io sono andato al mercato. Ho comprato delle mele e delle arance. e sono tornato a casa e ho preparato una torta. Pronoun Perturbation Stamattina noi sono andato al mercato. Ho comprato delle mele e delle arance. Poi sono tornato a casa e ho preparato una torta. Splice Comma Perturbation Stamattina io sono andato al mercato, Ho comprato delle mele e delle arance, Poi sono tornato a casa e ho preparato una torta. Parataxis Perturbation Stamattina io sono andato al mercato. Ho comprato delle mele, delle arance. Poi sono tornato a casa. ho preparato una torta.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>Model comparison on F1 score for each task. Each probe is run as a binary classification task on 636 dataset entries. The baseline is set at 0.5. "In base all'esperienza maturata durante la pandemia di Covid-19, il Ministro dell'Istruzione ha proposto di estendere permanentemente, a partire dal prossimo anno scolastico, la Didattica Digitale Integrata (DDI, modalità didattica che combina momenti di insegnamento a distanza e attività svolte in classe) al triennio delle scuole superiori [...]. Immagina di dover scrivere una lettera al Ministro in cui esponi le tue ragioni a favore o contro questa possibilità, argomentandole in modo da convincerlo della bontà delle tue idee [...]. Durante lo svolgimento del testo ricordati di: 1. Chiarire la tesi che intendi difendere. 2. Spiegare le motivazioni a sostegno della tesi. 3. Prendere in considerazione il punto di vista alternativo e illustrare le ragioni per cui non sei d'accordo. 4. Arrivare a una conclusione. 5. Prima di consegnare, ricordati di rileggere con cura il testo che hai scritto. Il tuo obiettivo è convincere il Ministro della bontà della tesi che sostieni. Hai 100 minuti di tempo per scrivere un testo di almeno 600 parole. 
"</figDesc><table><row><cell>Techniques</cell><cell cols="2">GilBERTo F1 Score</cell><cell cols="2">ITACA-bert F1 Score</cell><cell>BERT-base-italian F1 Score</cell></row><row><cell>SHUFF</cell><cell>0.43</cell><cell></cell><cell>0.5</cell><cell></cell><cell>0.38</cell></row><row><cell>LICO</cell><cell>0.97</cell><cell></cell><cell>0.96</cell><cell></cell><cell>0.95</cell></row><row><cell>POLYFUNCT</cell><cell>0.88</cell><cell></cell><cell>0.88</cell><cell></cell><cell>0.89</cell></row><row><cell>PRON</cell><cell>1.0</cell><cell></cell><cell>0.99</cell><cell></cell><cell>0.99</cell></row><row><cell>SPLICE</cell><cell>0.56</cell><cell></cell><cell>0.49</cell><cell></cell><cell>0.55</cell></row><row><cell>PARATAX</cell><cell>0.99</cell><cell></cell><cell>0.95</cell><cell></cell><cell>0.97</cell></row><row><cell cols="2">Aug Techniques</cell><cell cols="2">Train Dataset Len</cell><cell>Num Labels</cell><cell>Baseline</cell><cell>Accuracy</cell></row><row><cell>LICO</cell><cell></cell><cell>575</cell><cell></cell><cell>2</cell><cell>0.5</cell><cell>0.96</cell></row><row><cell cols="2">POLYFUNCT</cell><cell>575</cell><cell></cell><cell>2</cell><cell>0.5</cell><cell>0.78</cell></row><row><cell>PRON</cell><cell></cell><cell>575</cell><cell></cell><cell>2</cell><cell>0.5</cell><cell>0.98</cell></row><row><cell>SPLICE</cell><cell></cell><cell>575</cell><cell></cell><cell>2</cell><cell>0.5</cell><cell>0.7</cell></row><row><cell>PARATAX</cell><cell></cell><cell>575</cell><cell></cell><cell>2</cell><cell>0.5</cell><cell>0.98</cell></row><row><cell>Table 3</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Error analysis</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>B. Appendix B</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.porta.eurac.edu/lci/itaca/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Texts were collected in Bolzano, Bressanone, Merano and Brunico.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://commul.eurac.edu/annis/itaca</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3">https://stanfordnlp.github.io/stanza/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_4">https://opus.nlpl.eu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_5">https://it.wikipedia.org/wiki/Pagina_principale</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_6">https://github.com/idb-ita/GilBERTo?tab=readme-ov-file</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_7">https://colab.research.google.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_8">The code for this part of the project was written with the help of the AI tool ChatGPT.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We thank Fondazione Bruno Kessler Trento for their support on the ITACA corpus and for allowing us to use their student essay dataset for fine-tuning.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Ministero dell&apos;Istruzione</title>
	</analytic>
	<monogr>
		<title level="m">Indicazioni nazionali per i licei, Ministero dell&apos;Istruzione</title>
				<meeting><address><addrLine>Roma, Italia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
		<respStmt>
			<orgName>Università e della Ricerca</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Ministero dell&apos;Istruzione, Istituti tecnici: linee guida per il passaggio al nuovo ordinamento, Ministero dell&apos;Istruzione</title>
		<imprint>
			<date type="published" when="2010">2010</date>
			<pubPlace>Roma, Italia</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Università e della Ricerca</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Context and cognition: Knowledge frames and speech act comprehension</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Van Dijk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of pragmatics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="211" to="231" />
			<date type="published" when="1977">1977</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Conditions for text coherence</title>
		<author>
			<persName><forename type="first">T</forename><surname>Reinhart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Poetics today</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="161" to="180" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Functional sentence perspective and the organization of the text</title>
		<author>
			<persName><forename type="first">F</forename><surname>Danes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Papers on functional sentence perspective</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="106" to="128" />
			<date type="published" when="1974">1974</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">On the status of theme in English: Arguments from discourse</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">H</forename><surname>Fries</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Micro and macro connexity of texts</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Coherence and coreference</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Hobbs</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognitive science</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="67" to="90" />
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Grosz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Weinstein</surname></persName>
		</author>
		<title level="m">Centering: a framework for modelling the coherence of discourse</title>
				<imprint>
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Di</forename><surname>Eugenio</surname></persName>
		</author>
		<idno>arXiv preprint cmp-lg/9608007</idno>
		<title level="m">Centering in Italian</title>
				<imprint>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Modeling local coherence: An entity-based approach</title>
		<author>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="1" to="34" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Farag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yannakoudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Briscoe</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1804.06898</idno>
		<title level="m">Neural automated essay scoring and coherence modeling for adversarially crafted input</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A neural local coherence model for text quality assessment</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mesgar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Strube</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 conference on empirical methods in natural language processing</title>
				<meeting>the 2018 conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="4328" to="4339" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A model of coherence based on distributed sentence representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2039" to="2048" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A neural local coherence model</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Joty</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 55th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1320" to="1330" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">DisCoTex at EVALITA 2023: Overview of the assessing discourse coherence in Italian texts task</title>
		<author>
			<persName><forename type="first">D</forename><surname>Brunato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Colla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Dini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Radicioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ravelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR WORKSHOP PROCEEDINGS</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3473</biblScope>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">MPG at DisCoTex: Predicting text coherence by tree-based modelling of linguistic features</title>
		<author>
			<persName><forename type="first">M</forename><surname>Galletti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gravino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Prevedello</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop</title>
				<meeting>the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop<address><addrLine>EVALITA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023. 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Hromei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Basili</surname></persName>
		</author>
		<title level="m">extremITA at EVALITA 2023: Multi-task sustainable scaling to large language models at its extreme</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">IUSSNets at DisCoTex: A fine-tuned approach to coherence</title>
		<author>
			<persName><forename type="first">E</forename><surname>Zanoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Barbini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chesi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop</title>
				<meeting>the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop<address><addrLine>EVALITA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023. 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Coherent or not? Stressing a neural language model for discourse coherence in multiple languages</title>
		<author>
			<persName><forename type="first">D</forename><surname>Brunato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Dini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ravelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL 2023</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="10690" to="10700" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Evaluating the robustness of neural language models to input perturbations</title>
		<author>
			<persName><forename type="first">M</forename><surname>Moradi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Samwald</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2108.12237</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2110.07159</idno>
		<title level="m">Interpreting the robustness of neural nlp models to textual perturbations</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1901.11196</idno>
		<title level="m">EDA: Easy data augmentation techniques for boosting performance on text classification tasks</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Karimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rossi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Prati</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2108.13230</idno>
		<title level="m">AEDA: An easier data augmentation technique for text classification</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Toward text data augmentation for sentiment analysis</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Q</forename><surname>Abonizio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Paraiso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Barbon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="657" to="668" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Augment to prevent: short-text data augmentation in deep learning for hate-speech classification</title>
		<author>
			<persName><forename type="first">G</forename><surname>Rizos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hemker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schuller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th ACM international conference on information and knowledge management</title>
				<meeting>the 28th ACM international conference on information and knowledge management</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="991" to="1000" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Kober</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weeds</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bertolini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Weir</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.01854</idno>
		<title level="m">Data augmentation for hypernymy detection</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Detecting environmental, social and governance (ESG) topics using domain-specific language models and data augmentation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Nugent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Stelea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Leidner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Flexible Query Answering Systems: 14th International Conference, FQAS 2021</title>
				<meeting><address><addrLine>Bratislava, Slovakia</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">September 19-24, 2021. 2021</date>
			<biblScope unit="page" from="157" to="169" />
		</imprint>
	</monogr>
	<note>Proceedings 14</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">In viaggio verso itaca: la coerenza testuale come meta della scrittura scolastica. proposta di una griglia di valutazione</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bienati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Vettori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zanasi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Italiano a scuola</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="55" to="70" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">The INCEpTION platform: Machine-assisted and knowledge-oriented interactive annotation</title>
		<author>
			<persName><forename type="first">J.-C</forename><surname>Klie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bugert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Boullosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>De Castilho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th international conference on computational linguistics: System demonstrations</title>
				<meeting>the 27th international conference on computational linguistics: System demonstrations</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="5" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Ferrari</surname></persName>
		</author>
		<title level="m">Linguistica del testo. Principi, fenomeni, strutture</title>
				<imprint>
			<publisher>Carocci</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">151</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kruszewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Barrault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.01070</idno>
		<title level="m">What you can cram into a single vector: Probing sentence embeddings for linguistic properties</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">LICO: A lexicon of Italian connectives</title>
		<author>
			<persName><forename type="first">A</forename><surname>Feltracco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jezek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Magnini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stede</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CLiC-it</title>
		<imprint>
			<biblScope unit="page">141</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Una varietà dell&apos;italiano tra scritto e parlato: la scrittura degli apprendenti, Ferrari A</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Roggia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">De Cesare AM</title>
		<imprint>
			<biblScope unit="page" from="197" to="224" />
			<date type="published" when="2010">2010. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Didattica della scrittura e linguistica del testo: tre priorità di intervento, Ostinelli M. (a cura di)</title>
		<author>
			<persName><forename type="first">L</forename><surname>Cignetti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">La didattica dell&apos;italiano. Problemi e prospettive</title>
				<meeting><address><addrLine>Locarno</addrLine></address></meeting>
		<imprint>
			<publisher>DFA SUPSI</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="14" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Colombo</surname></persName>
		</author>
		<title level="m">Dubbi, errori, correzioni nell&apos;italiano scritto</title>
				<imprint>
			<publisher>FrancoAngeli</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Scritto e parlato, il parlato nello scritto. per una didattica della consapevolezza diamesica</title>
		<author>
			<persName><forename type="first">M</forename><surname>Prada</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Italiano LinguaDue</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="232" to="260" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<title level="m" type="main">Italian BERT and ELECTRA models</title>
		<author>
			<persName><forename type="first">S</forename><surname>Schweter</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.4263142</idno>
		<ptr target="https://doi.org/10.5281/zenodo.4263142" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<title level="m">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Abadji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">O</forename><surname>Suarez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2201.06642</idno>
		<title level="m">Towards a cleaner document-oriented multilingual crawled corpus</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b40">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Kudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Richardson</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1808.06226</idno>
		<title level="m">SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Italian-Legal-BERT: A pre-trained transformer language model for Italian law</title>
		<author>
			<persName><forename type="first">D</forename><surname>Licari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Comandè</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EKAW (Companion)</title>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3256</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Beltagy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cohan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1903.10676</idno>
		<title level="m">SciBERT: A pretrained language model for scientific text</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">Random forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="5" to="32" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1906.04341</idno>
		<title level="m">What does BERT look at? An analysis of BERT&apos;s attention</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b45">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Danilevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Aharonov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Katsis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kawas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.00711</idno>
		<title level="m">A survey of the state of explainable AI for natural language processing</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
