<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating and Evaluating Multi-Level Text Simplification: A Case Study on Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michele Papucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulia Venturi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ItaliaNLP Lab @ Institute for Computational Linguistics, National Research Council</institution>
          ,
          <addr-line>Pisa</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Recent advances in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, yet controlling model outputs remains a challenge. In this study, we explore the use of LLMs to generate high-quality synthetic data for Automatic Text Simplification (ATS), evaluating the ability of models fine-tuned on Italian to produce multiple simplified versions of the same original sentence that vary in readability and in their lexical and (morpho-)syntactic characteristics. The approach is tested across two domains, Wikipedia and Public Administration, allowing us to explore domain sensitivity. Additionally, we compare the linguistic phenomena observed in the generated data with those found in ATS resources previously created through manual or semi-automatic methods. Our results suggest that the best-performing LLM can generate linguistically diverse simplifications that align with known simplification patterns, offering a promising direction for building reliable ATS resources, including simplifications suited to varying levels of reader proficiency.</p>
      </abstract>
      <kwd-group>
        <kwd>Automatic Text Simplification</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Synthetic Data</kwd>
        <kwd>Linguistic Complexity</kwd>
        <kwd>Sentence Readability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Automatic Text Simplification (ATS) aims to reduce the linguistic complexity of a text while preserving its meaning. Given that the dominant approach is data-driven, where models learn simplification operations from examples of complex-simple sentence pairs [<xref ref-type="bibr" rid="ref1">1</xref>], the availability and nature of resources for ATS play a crucial role in determining the quality of these models.</p>
      <p>Traditionally, manually constructed resources have been favored for their reliability and controllability [<xref ref-type="bibr" rid="ref2">2</xref>]. However, the cost and labor-intensiveness of such efforts limit their scalability, domain coverage, and language diversity. To address these limitations, researchers have explored unsupervised methods for resource construction, including mining sentence pairs from aligned corpora, primarily Wikipedia and Simple Wikipedia [<xref ref-type="bibr" rid="ref3">3</xref>], or exploiting crowdsourcing approaches [<xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>]. In light of concerns about the suitability of Wikipedia as an ATS resource [<xref ref-type="bibr" rid="ref6">6</xref>], and to tackle the broader scarcity of parallel simplification data, especially for low-resource languages, researchers have also proposed methods to automatically create parallel resources, inspired for example by paraphrase generation [<xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>] or machine translation [<xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>].</p>
      <p>More recently, Large Language Models (LLMs) have introduced a new paradigm for ATS, also opening the possibility of generating synthetic resources whose quality still requires thorough assessment [<xref ref-type="bibr" rid="ref2">2</xref>]. This trend aligns with broader efforts to leverage LLMs for alleviating the limitations of real-world data through synthetic data generation [<xref ref-type="bibr" rid="ref11">11</xref>]. Evaluation initiatives such as BLESS [12] have demonstrated that LLMs, under a few-shot setting, are capable of generating simplified sentences across multiple datasets, languages, and prompts. Yet, research to date has primarily focused on English and has relied on a limited set of evaluation metrics, leaving open questions about model behavior across different domains, languages, and target user needs. Notable exceptions for the Italian language include [13] and [14], who assessed the ability of both open and proprietary LLMs to produce simplified sentences. The former focused on increased sentence readability, while the latter examined both readability and semantic similarity, comparing model-generated simplifications with those written by human simplifiers. Interestingly, both studies targeted the administrative domain.</p>
      <p>Starting from these premises, this paper introduces a multifaceted approach to assess the ability of three small LLMs fine-tuned on the Italian language to generate sentence simplifications along a gradient of complexity. After identifying the best-performing model, we examined its output along three main dimensions: i) its ability to produce multiple simplifications for the same input sentence with increasing levels of readability; ii) the extent to which the linguistic characteristics of the simplified sentences differ from those of the original; and iii) the relationship between the distribution of linguistic features and the readability level. This in-depth linguistic analysis of LLM-generated simplifications aims to achieve two main objectives. First, it investigates whether small, open LLMs can reliably produce multiple simplifications with varying degrees of linguistic complexity, thereby offering a scalable strategy for creating resources tailored to different target populations, which remain scarce [<xref ref-type="bibr" rid="ref2">2</xref>]. Second, it aims to explore whether specific linguistic patterns observed in original–simplified sentence pairs are influenced by the approach used to construct ATS resources, as discussed in [15].</p>
      <p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24–26, 2025, Cagliari, Italy. https://michelepapucci.github.io/ (M. Papucci); http://www.italianlp.it/people/giulia-venturi/ (G. Venturi); http://www.italianlp.it/people/felice-dellorletta/ (F. Dell'Orletta). ORCID: 0000-0003-4251-7254 (M. Papucci); 0000-0001-5849-0979 (G. Venturi). © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <sec id="sec-1-2">
        <title>2. Methodology</title>
        <p>The approach we propose for assessing the ability of LLMs to automatically generate sentence simplifications along a gradient of linguistic complexity is articulated in three main steps: 1. selection of an LLM fine-tuned on the Italian language, capable of reliably generating sentences in the target language, and identification of a corpus of human-written sentences to be used as original inputs; 2. prompting the selected LLM to generate multiple simplified versions of each original sentence to obtain diverse outputs per input; 3. evaluation of the resulting sentence pairs in terms of their linguistic feature diversity and variation in readability levels.</p>
        <p>The main objective of the first two steps, described in Section 3, is to construct a parallel corpus composed of human-written original sentences and multiple automatically generated simplified versions. This allows for capturing a range of sentence transformations characterized by different linguistic phenomena. In this respect, the proposed methodology is particularly suitable for low-resource languages, where simplified corpora remain scarce, especially those addressing multiple reader profiles, domains, or textual genres.</p>
        <p>The evaluation of the generated simplifications, which constitutes the main focus of this study, is presented in Section 4. Our multifaceted evaluation methodology aims to assess not only how readability levels vary across the multiple simplifications and relative to the original sentence, but also how the lexical, morpho-syntactic, and syntactic characteristics of the sentence pairs change. A further contribution of this study lies in a comparative analysis designed to explore whether specific linguistic phenomena observed in the LLM-generated simplifications resemble those found in existing Italian ATS resources, specifically two created manually [16] and one semi-automatically [<xref ref-type="bibr" rid="ref7">7</xref>].</p>
      </sec>
      <sec id="sec-1-3">
        <title>3. Experimental Settings</title>
        <p>LLM selection. To identify the most suitable LLM for the task of generating simplified sentences, we considered three models specifically developed for the Italian language, which differ in terms of architecture and number of parameters: ANITA [17] (HuggingFace handle: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA), LLaMAntino-2 [18] (swap-uniba/LLaMAntino-2-7b-hf-dolly-ITA), and Italia (iGeniusAI/Italia-9B-Instruct-v0.1). All models were tested in a 0-shot setting. The models' performance was evaluated against the test splits of the following Italian sentence simplification datasets: 51 paired original/simplified sentences from SIMPITIKI [19] (from SIMPITIKI we took only the Wikipedia sentence pairs and excluded the Administrative domain ones, since those are the same sentences already present in ADMIN-it), 994 sentence pairs filtered from PaCCSS–IT [<xref ref-type="bibr" rid="ref7">7</xref>], 101 sentence pairs from the Terence corpus and 17 from the Teacher corpus [16], and 49 sentence pairs extracted from ADMIN-it [20], for a total of 1,212 sentence pairs.</p>
        <p>As evaluation metrics, we selected a set of complementary measures addressing different aspects of sentence simplification. Specifically, we included i) two metrics widely used in the literature that focus on surface-level properties related to writing style, i.e. BLEU [21] and SARI [22], and ii) two semantic similarity metrics used to assess meaning preservation, i.e. BertScore [23] and SentenceTransformer Similarity [24, 25]. In addition, we evaluated the simplified sentences in terms of variation in readability computed by READ-IT [26], the first machine-learning-based automatic readability assessment tool developed for Italian, combining traditional surface features with lexical, morpho-syntactic, and syntactic information correlated with linguistic complexity.</p>
        <p>All models were evaluated on a single generation for each input. Each model was prompted using its respective system prompt, combined with a shared task-specific instruction to simplify the text while preserving the original meaning (see Appendix A for more details). The results are reported in Table 1, where it should be noted that the evaluation metrics follow an increasing trend, meaning that higher scores correspond to more simplified sentences. In contrast, READ-IT scores exhibit the opposite trend: they range from 0 (most readable sentence) to 100 (least readable sentence), as they reflect the level of linguistic complexity of the input. Notably, LLaMAntino-2 consistently outperformed the other LLMs across all evaluation metrics, generating sentences that are simpler than the original inputs in both surface-level properties and semantic content. Moreover, its outputs had the lowest READ-IT scores, indicating that they are the least linguistically complex among those produced by the tested models. As a result, it was selected for the second step of our methodology.</p>
      </sec>
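      <p>Of the surface-level metrics named above, BLEU [21] scores n-gram overlap between a system output and a reference. Purely as an illustration (the function name and toy smoothing are ours, not the paper's; real evaluations use corpus-level toolkit implementations, and plain BLEU has no add-one smoothing), a sentence-level variant can be sketched as:</p>
      <preformat>
```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty. Add-one smoothing is applied to
    higher-order precisions so one missing n-gram does not zero the score."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(1, sum(cand_ngrams.values()))
        if n == 1:
            precisions.append(overlap / total)
        else:  # smoothed higher-order precision
            precisions.append((overlap + 1) / (total + 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(1, len(cand)))
    return bp * math.exp(log_avg)
```
      </preformat>
      <p>On identical sentences the score is 1.0, while the brevity penalty lowers the score of an overly short candidate even when its low-order n-gram precision is perfect.</p>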
      <sec id="sec-1-1">
        <title>Table 1: model comparison (ANITA, LLaMAntino-2, Italia)</title>
        <p>Textual domains. We tested the full experimental setting on two corpora representative of two Italian language varieties that are widely acknowledged to exhibit significantly different linguistic features. Specifically, we selected a collection of sentences downloaded from Wikipedia pages, as it is the most frequently addressed domain in the literature on ATS [<xref ref-type="bibr" rid="ref2">2</xref>]. As a counterpart, we included the "PaWaC – Public Administration Web Corpus" (PaWaC [27]), which contains a wide range of administrative texts (resolutions, circular letters, etc.) and represents the Italian language used in public administration, a language variety well-known for its high level of multilevel linguistic complexity [28]. For both domains, we randomly sampled 10,000 sentences to serve as the original texts for generating multiple simplified variants.</p>
        <p>Generation of multiple simplifications. Step two of our methodology was performed by prompting LLaMAntino-2 with the same prompt introduced previously to generate multiple simplified versions for the collection of the original 10,000 sentences for the Wikipedia and administrative domains. To this end, we employed the Diverse Beam Search decoding technique [29] to obtain multiple simplifications for each original sentence. Through manual inspection of the outputs generated under different decoding settings, we found that using 20 beams divided into 10 groups, with a diversity penalty of 0.7, provided the best results in terms of diversity of the simplifications and text fluency.</p>
        <p>Using this decoding strategy, we obtained 10 simplifications for each original sentence. The resulting resource was automatically revised by removing duplicate simplifications and cases where the original and simplified sentences were identical. After this clean-up, we obtained 71,837 original/simplified sentence pairs for Wikipedia and 78,184 pairs for PaWaC.</p>
        <p>Table 2 reports two examples randomly extracted from the generated resource. Concerning the administrative domain, we can see that the least simplified PaWaC sentences (i.e. those with the higher READ-IT scores) are simplified primarily through the deletion of informational content (e.g. non automaticamente rinnovabili 'not automatically renewable' is removed). In contrast, the most simplified sentences display linguistic features typically associated with more readable sentence structures while keeping the original information content. For instance, the simplest sentence (i.e. the sentence with the lowest READ-IT score) is characterized by a reduced distance between the nominal subject (le concessioni 'the concessions') and the main verb (devono essere considerate 'must be considered'). In addition, the main verb undergoes i) a lexical simplification, since the simpler considerare 'to consider' replaces the more complex original verb intendersi 'to understand', and ii) a morphological simplification, since the epistemic future is replaced by a more straightforward present-tense form. Also in the case of the Wikipedia example, the most simplified sentences are the result of structural transformations. Namely, the two versions with the lowest READ-IT scores contain the main verb in the active voice instead of the passive, and feature shorter syntactic dependency links among words.</p>
        <p>Linguistic profiling. Our evaluation step includes a comparative analysis of the distribution of multilevel linguistic features automatically extracted from the original and the LLaMAntino-2–generated simplified sentences. To this end, we adopted Profiling-UD [30], a web-based tool designed to linguistically profile multilingual texts using the Universal Dependencies (UD) formalism [31]. The full set of features is detailed in Table 3. They can be grouped into nine categories, each corresponding to linguistic phenomena possibly related to sentence complexity. These range from raw text properties, such as sentence length, to more complex features, such as the distribution of UD Parts-of-Speech (POS), dependency relations, and verbal inflectional properties (e.g. mood, tense). Additional features capture global and local syntactic structure, such as the structure of verbal predicates, the order of nuclear sentence elements (subject and object) relative to the verb, and the use of subordination. The set also includes features modeling the lexical variety of sentences, specifically in terms of the i) distribution of word forms and lemmas belonging to the New Basic Italian Vocabulary (NBIV) [32], further classified into the three usage categories, and ii) distribution of lemmas based on classes of frequency computed using the Italian section of the MC4 corpus [33, 34], according to a logarithmic frequency-class function. (For the list of UD POS refer to https://universaldependencies.org/u/pos/index.html, for syntactic relations to https://universaldependencies.org/u/dep/index.html, and for verbal morphological tags to https://universaldependencies.org/u/feat/index.html.)</p>
        <sec id="sec-1-1-4">
          <title>PaWaC</title>
          <p>Stabilito pertanto che le concessioni rilasciate in base al presente bando dovranno
intendersi come decennali e non automaticamente rinnovabili (It is therefore
established that the concessions granted under this call shall be understood decennial and not
automatically renewable)
Stabilito pertanto che le concessioni rilasciate in base al presente bando dovranno
intendersi come decennali. (It is therefore established that the concessions granted under
this call shall be understood as decennial.)
Stabilito pertanto che le concessioni rilasciate in base al presente bando dovranno
intendersi come decennali e non rinnovabili. (It is therefore established that the
concessions granted under this call shall be understood as decennial and not renewable.)
Il presente bando stabilisce che le concessioni devono essere considerate decennali e
non automaticamente rinnovabili. (This call establishes that the concessions must be
considered decennial and not automatically renewable.)
Il presente bando stabilisce che le concessioni devono essere considerate decennali
e non rinnovabili automaticamente (This call establishes that the concessions must be
considered decennial and not renewable automatically.)
Quindi le concessioni basate sul presente bando devono essere considerate decennali
e non automaticamente rinnovabili. (Therefore the concessions based on this call must
be considered decennial and not automatically renewable)</p>
        </sec>
        <sec id="sec-1-1-5">
          <title>Wikipedia</title>
          <p>Alcuni composti aromatici più pesanti, come lo xilene, possono essere utilizzati al
posto del toluene ottenendo rese comparabili. (Some heavier aromatic compounds, such
as xylene, can be used instead of toluene obtaining comparable yields.)
Alcuni composti aromatici più pesanti possono essere utilizzati al posto del toluene
ottenendo rese comparabili. (Some heavier aromatic compounds can be used instead of
toluene obtaining comparable yields.)
La maggior parte degli aromi più pesanti possono essere utilizzati al posto di toluene
ottenendo rese comparabili. (The majority of heavier aromatics can be used in place of
toluene obtaining comparable results.)
La maggior parte degli aromi più pesanti possono essere utilizzati al posto di toluene.
(The majority of heavier aromatics can be used in place of toluene.)
È possibile utilizzare xilene invece di toluene per ottenere un prodotto finale simile. (It
is possible to use xylene instead of toluene to obtain a similar end product.)
È possibile utilizzare xilene invece di toluene per ottenere una resa simile. (It is possible
to use xylene instead of toluene to obtain a comparable yield.)
The frequency class of each lemma CL is computed as C = ⌊log2(freq(MFL) / freq(CL))⌋, where MFL is the most frequent lemma in the corpus and CL is the considered lemma.</p>
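          <p>The frequency-class function can be sketched in a few lines (our own toy illustration; the paper computes frequencies over the Italian section of the MC4 corpus, replaced here by a tiny lemma list):</p>
          <preformat>
```python
import math
from collections import Counter

def frequency_classes(lemmas):
    """Assign each lemma the class C = floor(log2(freq(MFL) / freq(CL))),
    where MFL is the most frequent lemma in the corpus: class 0 for the
    most frequent lemma, higher classes for progressively rarer ones."""
    freq = Counter(lemmas)
    max_freq = max(freq.values())
    return {lemma: int(math.floor(math.log2(max_freq / f))) for lemma, f in freq.items()}
```
          </preformat>
          <p>For instance, in a corpus where the most frequent lemma occurs 8 times, a lemma occurring twice falls in class 2 and a lemma occurring once in class 3.</p>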
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Linguistic Analysis of Simplified Sentences</title>
      <p>The evaluation of the LLaMAntino-2–generated simplified sentences was conducted both in terms of readability scores (see Section 4.1) and linguistic profiles (see Section 4.2) in comparison to their corresponding original sentences. In addition, we investigated whether there is a relationship between the changes in linguistic features and the variation in readability levels across original/simplified sentence pairs, with the aim of identifying which linguistic phenomena are most associated with variation in linguistic complexity (see Section 4.3). All evaluations were conducted considering a randomly sampled subset of 2,000 paired original/simplified sentences for each domain (the dataset is freely available at https://github.com/michelepapucci/multilevel-text-simplification-italian). Finally, Section 4.4 presents the results of a comparative analysis designed to examine whether different approaches to the construction of ATS resources influence the linguistic characteristics of simplified texts.</p>
      <sec id="sec-4-1">
        <title>4.1. Sentence Readability</title>
        <p>The first evaluation step was conducted by considering, for each original sentence, three representative cases among the multiple automatically generated simplifications: the Most simplified sentence, i.e. the one with the lowest READ-IT score, the Least simplified sentence, with the highest score, and a Randomly-selected simplification, selected from the remaining simplifications. The comparison was computed adopting Kernel Density Estimation (KDE), a probability distribution estimate obtained by smoothing out the READ-IT data points to create a continuous curve. Results are reported in Figure 1, where we can see that for both domains, all three types of simplifications exhibit a higher frequency of data points with lower READ-IT scores, confirming that the simplified sentences are generally easier to read. However, the shape of the distributions indicates that readability improvements vary depending on the source domain. Specifically, Wikipedia original sentences show a more uniform distribution across READ-IT scores, while PaWaC sentences are more concentrated at the higher end of the readability spectrum. This indicates that the simplified sentences in the administrative corpus remain less accessible than Wikipedia simplified sentences, reflecting the intrinsically higher linguistic complexity of administrative texts. Looking at the multiple simplifications, the Most simplified sentences exhibit a strongly left-skewed distribution in both domains, indicating that at least one version per original achieves significantly lower READ-IT scores. For the Randomly-selected simplifications, the KDE curve for Wikipedia shows a marked shift toward lower scores, suggesting that model-generated simplifications are generally simpler than their originals. A similar trend is observed for the PaWaC domain, although the distribution is flatter and less uniform, indicating greater variability across the simplified outputs.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Linguistic Features</title>
        <p>The linguistic profile–based evaluation is twofold. The first level focuses on analyzing the differences between each of the three types of generated simplifications and their corresponding original sentence, in terms of linguistic profile. To this end, we applied a Multivariate Analysis of Variance (MANOVA), which, unlike traditional ANOVA that considers only a single dependent variable, evaluates whether the mean vectors</p>
      </sec>
      <sec id="sec-3-1">
        <title>Original vs. Least Simplified, Original vs. Randomly-Selected, Original vs. Most Simplified</title>
        <p>of multiple dependent variables differ significantly between groups, making it well-suited to our multi-feature linguistic profiling. To quantify the degree of difference in each comparison, we report Pillai's Trace, one of the statistics derived from MANOVA. Pillai's Trace is particularly robust, especially in situations where assumptions like homogeneity of covariance matrices may be violated.</p>
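        <p>Pillai's Trace is computed from the between-group (H) and within-group (E) sums-of-squares-and-cross-products matrices as trace(H(H+E)^-1). The dependency-free sketch below (our own illustration for exactly two dependent variables; a real analysis would rely on a statistics package such as statsmodels) makes the definition concrete:</p>
        <preformat>
```python
def mean_vec(rows):
    """Component-wise mean of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def pillai_trace_2d(groups):
    """Pillai's Trace = trace(H (H + E)^-1) for two dependent variables.
    groups: list of groups, each a list of [x, y] observations."""
    all_rows = [r for g in groups for r in g]
    gm = mean_vec(all_rows)
    H = [[0.0, 0.0], [0.0, 0.0]]  # between-group SSCP
    E = [[0.0, 0.0], [0.0, 0.0]]  # within-group SSCP
    for g in groups:
        m = mean_vec(g)
        for i in range(2):
            for j in range(2):
                H[i][j] += len(g) * (m[i] - gm[i]) * (m[j] - gm[j])
                E[i][j] += sum((r[i] - m[i]) * (r[j] - m[j]) for r in g)
    T = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    Tinv = [[T[1][1] / det, -T[0][1] / det], [-T[1][0] / det, T[0][0] / det]]
    # trace of the matrix product H @ Tinv
    return sum(H[i][k] * Tinv[k][i] for i in range(2) for k in range(2))
```
        </preformat>
        <p>Identical group means give H = 0 and hence a trace of 0, while well-separated groups push the statistic toward its upper bound (1 for two groups), matching the interpretation that higher values mean greater multivariate differences.</p>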
        <p>Higher values of Pillai's Trace indicate greater multivariate differences between groups.</p>
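        <p>The feature-based analysis reported later in this section pairs the Wilcoxon signed-rank test with a rank-biserial effect size; by the simple difference formula, r = (T+ - T-) / (T+ + T-), where T+ and T- are the rank sums of positive and negative paired differences. A minimal sketch (our own helper, not the paper's code; zero differences are discarded and ties receive average ranks, while the significance test itself would use scipy.stats.wilcoxon):</p>
        <preformat>
```python
def rank_biserial(original, simplified):
    """Matched-pairs rank-biserial correlation: +1 when the feature value
    is always higher in the original sentence, -1 when always higher in
    the simplified one. Uses r = (T_plus - T_minus) / (T_plus + T_minus)."""
    diffs = [o - s for o, s in zip(original, simplified) if o != s]  # drop zeros
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i != len(order):  # assign average ranks to ties in |diff|
        j = i
        while j != len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1..j
        i = j
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_total = sum(ranks)
    # T_plus - T_minus equals 2 * T_plus - (T_plus + T_minus).
    return (2.0 * t_plus - t_total) / t_total
```
        </preformat>
        <p>For example, four sentence pairs where the original value exceeds the simplified one in three cases (tied differences) and falls below it once yield r = 0.8, a large effect in the direction of reduction under simplification.</p>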
        <p>The results, summarized in Table 4, show that all comparisons yield statistically significant differences (p ≤ 10−4) in both domains. Among the three sets, the Least Simplified sentences consistently yield the smallest Pillai's Trace values (.12 for Wikipedia and .16 for PaWaC), indicating the greatest similarity to the original sentences. In contrast, the Most Simplified sentences show the highest values (.44 and .46), indicating that the simplification process led to substantial transformations in their linguistic profiles. The Randomly-Selected simplifications fall in between, though they are closer to the least simplified set, indicating that they retain a considerable degree of the original sentences' linguistic characteristics. This aligns with the trend observed in Figure 1, where the KDE curve for the Randomly-Selected simplifications peaks at lower READ-IT scores, similar to the most simplified set, but also shows a broader tail, indicating that some of these sentences remain close in readability to the originals. This trend is shared across domains, even with some differences that highlight domain-specific characteristics of the simplification process.</p>
        <p>Notably, we generally observe slightly higher Pillai's Trace values for the PaWaC dataset. This suggests that, although simplified sentences in the administrative domain tend to have higher READ-IT scores than those from Wikipedia, the MANOVA results indicate that their generation involves more substantial transformations, possibly affecting multiple linguistic features, pointing to more articulated simplification processes in this domain. Consequently, even the Least Simplified PaWaC sentences display a more distinct linguistic profile compared to their originals.</p>
        <p>Feature-based Analysis. This second level of analysis focuses on the set of Randomly-selected Simplifications, which serve as representative examples of typical simplifications, as they were randomly selected from the pool excluding the extremes. Specifically, we applied the Wilcoxon signed-rank test (with p &lt; 0.05) to compare the distribution of each feature between the original sentence and its corresponding simplification. In addition, to quantify the strength of the observed differences, we computed their rank-biserial correlation score r [35], which ranges between +1 (when the value of the feature occurring in the original sentence is higher than in the simplified sentence) and −1 (in the opposite case). By capturing the effect size of the Wilcoxon test, the r score reflects the magnitude of statistically significant distributional differences. Tables 5 and 6 show features with |r| ≥ 0.4 and their mean and standard deviation for the Wikipedia and PaWaC domains (the full list of features is reported in Appendix C).</p>
        <p>Quite interestingly, a subset of the reported features is shared across the two domains. This suggests that these features correspond to linguistic phenomena highly related to sentence complexity, regardless of the textual domain, and are typically modified to improve sentence readability. As expected, among these features we find sentence length (sent_len), which displays the highest r score in Wikipedia and the second highest in PaWaC. However, by inspecting the differences across domains, we observe that administrative sentences are particularly shortened compared to their originals. Since the majority of the features considered are closely tied to sentence length, this outcome may impact the distribution of the other most varying features.</p>
        <p>Nevertheless, we can see that several features modeling different syntactic properties of sentences are highly ranked in terms of r score for both domains. One such feature is the distribution of verbal heads (verbal_head), i.e. tokens POS-tagged as verbs that function as the syntactic head in dependency relations, which is notably reduced in the simplified sentences. This reduction is closely linked to the decreased use of subordination, as indicated by lower values of a set of related features capturing this phenomenon. The set includes: the overall distribution of subordinate clauses (subord_prop), their position relative to the principal clause (subord_post), and their organization into sequences of embedded subordinate clauses (avg_Schain_len). Among these, we can also include a feature from the verb inflectional morphology group that is closely related to reduced subordination: the lower distribution of subjunctives (aux_Sub). Additionally, features modeling both global and local aspects of syntactic tree structure vary significantly in both domains. These include syntactic tree depth (tree_depth), indicative of sentence complexity [36], as well as two features associated with long-distance dependencies, well-known sources of cognitive load [37, 38]: the length of the longest dependency link (links_len_max) and the number of embedded sequences of prepositional complements (n_prep_chains). A similar pattern is observed in the lower frequency of subjects and objects in non-canonical position occurring in simplified sentences, specifically pre-verbal objects (obj_pre) and post-verbal subjects (subj_post), both known to be harder to process. On the lexical side, simplified sentences in both domains exhibit a reduced proportion of lemmas from the highest frequency class (highest_class). Interestingly, both domains display negative r scores for the distribution of auxiliary verbs (upos_AUX and dep_aux), indicating an increase in auxiliary usage in simplified versions. An in-depth analysis of verb forms reveals that this may reflect a higher prevalence of 'passato prossimo' tenses (roughly present perfect tenses) and a corresponding reduction of 'passato remoto' (roughly simple pasts), particularly in Wikipedia.</p>
        <p>When focusing on features that vary significantly and with |r| ≥ 0.4 in only one domain, we find that they capture finer-grained phenomena. They predominantly involve the distribution of specific verb tenses, such as present tense forms (*_Pres) in Wikipedia (whereas in PaWaC they show only |r| = 0.15), and future (*_Fut) and imperfect (*_Imp) tenses in PaWaC (but not significantly varying in Wikipedia). A similar trend is observed for specific verb moods such as participles (*_Part), which vary above our threshold only in Wikipedia, and conditionals (*_Cond), varying significantly in PaWaC.</p>
        <p>4.3. Linguistic Features and Readability</p>
        <p>As a third level of analysis, we investigated which linguistic phenomena characterize automatically simplified sentences in relation to the differences in readability between the original and simplified versions. To this end, considering the Randomly-selected simplification, we computed Spearman correlations between the differences in the distribution of the linguistic features, extracted using Profiling-UD, and the corresponding differences in their READ-IT scores. The results are reported in Appendix B, where we compare the correlation scores for</p>
        <p>to exhibit a relatively high level of linguistic complexity even after simplification (see Figure 1). It is therefore plausible that a surface-level transformation such as reducing sentence length is less predictive of changes in readability scores in this domain. This interpretation is also consistent with the MANOVA results, which indicate that simplified PaWaC sentences differ more substantially from their original versions across multiple linguistic features, suggesting a more articulated simplification process.</p>
        <p>Among the top-ranked correlated features, we find several that, while sensitive to sentence length, also reflect deeper, linguistically motivated transformations involved in the simplification process. This is the case of the distribution of verbal heads (verbal_head_per_sent) and of a subset of related features modeling the subordination. These include: the overall distribution of subordinate clauses (subordinate_proposition_dist); their organization in recursively embedded subordinate clause chains within a top-level subordinate clause
the Wikipedia and PaWac domains. We focus on the set (avg_subordinate_chain_len_dif ); their relative order
of linguistic features that show statistically significant with respect to the principal clause (subordinate_post), a
correlations (i.e.  &lt; 0.05). characteristic associated with diferences in cognitive
pro</p>
        <p>As can be seen, most of the correlation scores are pos- cessing dificulty [ 39]; and a specific type of subordinate
itive. This suggests that an increase in the diference clauses, i.e. relative clauses (dep_dist_acl:relcl), which are
of specific linguistic features between original and sim- well-known sources of processing dificulty. In addition,
plified sentences is often directly proportional to the we find two features related to long-distance
construcincrease in their readability diference. This is the case, tions: the length of the longest dependency link in a
for example, for the distribution of subordinate clauses sentence (max_links_len) and the number of embedded
(subordinate_proposition) in both domains, which tend sequences of prepositional complements governed by a
to be significantly reduced in the simplified sentences, nominal head (n_prepositional_chains).
leading to lower syntactic complexity and, consequently, Focusing on lexical variation, the reduction in the
proa lower READ-IT score. By contrast, the diference in the portion of lemmas belonging to the highest frequency
distribution of auxiliary verbs (upos_dist_AUX ) shows class (highest_class) shows a positive correlation with
a negative correlation with the diference in READ-IT readability improvement, particularly in PaWac ( =
scores for both domains, as the distribution of auxiliaries 0.20) compared to Wikipedia ( = 0.16). Conversely,
increases in the simplified sentences. a slight increase in the use of ‘high availability words’
Cross-Domain Correlation Patterns. When ranking (lower-frequency lemmas referring to everyday objects
the linguistic features in decreasing order of correlation, or actions and well known to speakers), as identified in
we observe that the most strongly correlated features the NBIV (in_AD_types), is negatively correlated in both
are shared across both domains, despite diferences in domains.
correlation scores. Notably, many of the top-ranked ones
correspond to those discussed in the previous section. 4.4. Comparing Simplification
This seems to support the hypothesis that the linguistic
phenomena mostly involved in the transformations of Approaches
original sentences are also those that have the greatest
impact on sentence readability.</p>
        <p>
          As expected, the most strongly correlated feature is
sentence length (tokens_per_sent), which is considerably
reduced in the simplified sentences. Interestingly, even
if this pattern holds across both domains, the
correlation is stronger for Wikipedia ( = 0.51) than for PaWac
( = 0.42). This seems to align with and complement the
intuition that simplifying administrative texts is
particularly challenging, as many of the PaWac sentences tend
We complemented the linguistic profiling of the
LLaMAntino-2–generated simplified sentences with a
comparative analysis aimed at identifying whether
certain linguistic phenomena are specific to the LLM-based
approach to ATS resource construction or are shared
across diferent simplification methodologies. To this
end, we started from the findings of [ 15], who compared
two Italian ATS resources created manually, “Teacher”
and “Terence” [16], and one semi-automatically,
PaCCSSIT [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], focusing on the distribution of a set of linguistic
features comparable to those used in the present study. [15]. This aligns with observations about the insertion
Our main goal is to assess whether some linguistic fea- of explicit arguments to reduce the inference load
associtures are characteristic of simplified sentences regardless ated with null-subject constructions [40]. Interestingly,
of the simplification method adopted. While prelimi- however, the tendency to favor the canonical Italian
arnary, our results provide initial insights into whether an gument order, with subjects preceding the verb and
obLLM-based method yields simplified sentences with char- jects following it, is not consistently observed across
acteristics similar to those produced by human experts. resources. While unmarked word orders are generally
        </p>
        <p>The first characteristic shared by sentences simplified preferred in simplification, as they are known to ease
by both human experts and automatically generated con- processing in free word-order languages [41], a higher
cerns their sentence length. Simplified sentences are proportion of pre-verbal subjects is found only in the
always shorter than their original counterparts. This PaWac LLaMAntino-2-generated simplifications and in
could be expected since sentence length has been con- the Teacher corpus. An even less consistent pattern
sidered as a shallow proxy of sentence complexity and emerges for post-verbal objects, whose distribution
difis widely used by traditional readability assessment for- fers across original and simplified sentences without a
mulas. However, the diferent average length in original- systematic direction.
simplified sentence pairs may difer according to textual
genre, as shown in our analysis and discussed in [15].</p>
        <p>A second group of features common to all ATS re- 5. Conclusion
sources includes those modeling the morpho-syntactic
profile of the simplified sentences 9. Similarly to manu- This study investigated the ability of small LLMs
fineally and semi-automatically built simplifications, the sen- tuned on the Italian language to generate sentence
simtences automatically generated by LLaMAntino-2 tend plifications in a zero-shot setting, focusing on two
linto contain fewer pronouns, adverbs, and punctuation guistically distinct domains: Wikipedia and Public
Admarks, and a higher proportion of determiners. However, ministration. All tested models were able to produce
simin contrast to the findings reported in [ 15], which were plified sentences that preserved the surface-level
propalso based on the Wilcoxon signed-rank test ( &lt; 0.05), erties and semantic content of the original inputs while
the LLM-generated simplified sentences exhibit a higher improving readability. Among them, LLaMAntino-2
confrequency of nouns, and the variation in the distribution sistently outperformed the other models across all
evalof adjectives compared to the original sentences is not uation metrics. Beyond single-sentence simplification,
statistically significant. We leave to future work the in- we also showed that prompting the model to generate
vestigation of whether this trend may be influenced by multiple outputs for the same input sentence results in a
the textual genre of the original sentences. meaningful gradient of linguistic complexity.</p>
        <p>Among the features common across approaches, we Domain-specific analyses revealed that, although
simifnd those capturing global and local syntactic structure. plified sentences in the administrative domain remain
As also observed in Section 4.2, simplified sentences tend less accessible than their Wikipedia counterparts,
simto have shallower syntactic trees and shorter dependency plifying administrative texts involves more substantial
links, suggesting that reducing syntactic depth and de- linguistic transformations, as suggested by MANOVA
pendency length is a broadly adopted simplification strat- results, thus pointing to more complex simplification
egy. However, when examining finer-grained syntactic strategies in this domain. These findings highlight the
properties, some diferences emerge. A first example potential of this approach to support the development
concerns the use of subordination. While previous stud- of ATS resources tailored to specific reader profiles and
ies suggest that subordinate clauses following the main domains. Despite a few cross-domain diferences, our
clause are easier to process [39], only the “Terence” cor- analysis of the linguistic features most afected by
simpus and PaCCSS-IT show a higher percentage of post- plification shows that many transformations are shared
verbal subordinates. By contrast, an opposite trend is across domains and closely align with known
simplificaobserved in the sentences automatically generated by tion patterns found in manually constructed ATS corpora.
LLaMAntino-2 as well as in the manually built “Teacher” These findings support two key directions for future
corpus, where post-verbal subordinates are less frequent. work. First, the generation of synthetic simplifications
A second example is the distribution of subjects. All re- using small, language-specific LLMs ofers a promising
sources show an increased presence of overt subjects in method for building ATS resources in low-resource
setsimplified sentences, particularly in the “Teacher” cor- tings. Second, the linguistic properties characterizing
pus, representing an intuitive manual simplification in LLM-generated simplifications can inform Controllable
Text Generation approaches [42], enabling models to be
guided toward specific simplification strategies aligned
with the needs of diferent reader populations.
9The values of some linguistic features are not reported in Tables 6
and 5, as their rank-biserial correlation scores are || ≤ 0.4.</p>
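<p>The per-feature comparison used throughout this section (Wilcoxon signed-rank test, matched-pairs rank-biserial effect size, and Spearman correlation with READ-IT score differences) can be sketched as follows. This is an illustrative SciPy-based reimplementation, not the authors’ code; the feature values, variable names, and the helper rank_biserial are placeholders of our own choosing.</p>

```python
# Illustrative sketch (not the authors' code) of the per-feature analysis:
# Wilcoxon signed-rank test plus a matched-pairs rank-biserial effect size.
import numpy as np
from scipy.stats import wilcoxon, rankdata, spearmanr

def rank_biserial(original, simplified):
    """Matched-pairs rank-biserial r: +1 when the feature is always higher
    in the original sentence, -1 in the opposite case."""
    d = np.asarray(original, float) - np.asarray(simplified, float)
    d = d[d != 0]                    # the Wilcoxon test drops zero differences
    ranks = rankdata(np.abs(d))      # rank the absolute differences
    w_plus = ranks[d > 0].sum()      # rank mass where the original is higher
    return (2.0 * w_plus - ranks.sum()) / ranks.sum()

# Values of one linguistic feature in original vs. simplified sentences
# (placeholder numbers, one pair per sentence).
orig = [3.0, 5.0, 4.0, 6.0, 8.0]
simp = [1.0, 2.0, 2.0, 7.0, 3.0]

stat, p = wilcoxon(orig, simp)       # significance of the distributional shift
r = rank_biserial(orig, simp)        # effect size; reported when |r| is large

# Section 4.3 then correlates per-sentence feature differences with
# READ-IT score differences via Spearman's rho (placeholder scores below).
rho, p_rho = spearmanr(np.subtract(orig, simp), [0.4, 0.6, 0.3, -0.1, 0.9])
```

<p>Features whose Wilcoxon p-value falls below the significance threshold and whose |r| exceeds the chosen cut-off would then be retained for the tables, mirroring the selection criteria described above.</p>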
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work has been supported by the project “XAI-CARE”
funded by the European Union - Next Generation EU
NRRP M6C2 “Investment 2.1 Enhancement and
strengthening of biomedical research in the NHS”
(PNRR-MAD2022-12376692_VADALA’ – CUP F83C22002470001) and
by the PRIN 2022 project TEAMING-UP - Teaming up
with Social Artificial Agents (20177FX2A7) funded by the
Italian Ministry of University and Research.</p>
<p>18653/v1/2024.findings-acl.658.</p>
      <p>[12] T. Kew, A. Chi, L. Vásquez-Rodríguez, S. Agrawal, D. Aumiller, F. Alva-Manchego, M. Shardlow, BLESS: Benchmarking large language models on sentence simplification, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, 2023, pp. 13291–13309. URL: https://aclanthology.org/2023.emnlp-main.821/. doi:10.18653/v1/2023.emnlp-main.821.</p>
      <p>[13] D. Nozza, G. Attanasio, Is it really that simple? prompting large language models for automatic text simplification in Italian, in: F. Boschetti, G. E. Lebani, B. Magnini, N. Novielli (Eds.), Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023), CEUR Workshop Proceedings, Venice, Italy, 2023, pp. 322–333. URL: https://aclanthology.org/2023.clicit-1.39/.</p>
      <p>[14] M. Russodivito, V. Ganfi, G. Fiorentino, R. Oliveto, AI vs. human: Effectiveness of LLMs in simplifying Italian administrative documents, in: F. Dell’Orletta, A. Lenci, S. Montemagni, R. Sprugnoli (Eds.), Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), CEUR Workshop Proceedings, Pisa, Italy, 2024, pp. 842–853. URL: https://aclanthology.org/2024.clicit-1.91/.</p>
      <p>[15] D. Brunato, F. Dell’Orletta, G. Venturi, Linguistically-based comparison of different approaches to building corpora for text simplification: A case study on Italian, Frontiers in Psychology 13 (2022). URL: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2022.707630. doi:10.3389/fpsyg.2022.707630.</p>
      <p>[16] D. Brunato, F. Dell’Orletta, G. Venturi, S. Montemagni, Design and annotation of the first Italian corpus for text simplification, in: A. Meyers, I. Rehbein, H. Zinsmeister (Eds.), Proceedings of the 9th Linguistic Annotation Workshop, Association for Computational Linguistics, Denver, Colorado, USA, 2015, pp. 31–41. URL: https://aclanthology.org/W15-1604/. doi:10.3115/v1/W15-1604.</p>
      <p>[17] M. Polignano, P. Basile, G. Semeraro, Advanced natural-based interaction for the Italian language: Llamantino-3-anita, 2024. arXiv:2405.07101.</p>
      <p>[18] P. Basile, E. Musacchio, M. Polignano, L. Siciliani, G. Fiameni, G. Semeraro, Llamantino: Llama 2 models for effective text generation in Italian language, 2023. arXiv:2312.09993.</p>
      <p>[19] S. Tonelli, A. P. Aprosio, F. Saltori, Simpitiki: a simplification corpus for Italian, Proceedings of CLiC-it (2016).</p>
      <p>[20] M. Miliani, S. Auriemma, F. Alva-Manchego, A. Lenci, Neural readability pairwise ranking for sentences in Italian administrative language, in: Y. He, H. Ji, S. Li, Y. Liu, C.-H. Chang (Eds.), Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online only, 2022, pp. 849–866. URL: https://aclanthology.org/2022.aacl-main.63/. doi:10.18653/v1/2022.aacl-main.63.</p>
      <p>[21] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: P. Isabelle, E. Charniak, D. Lin (Eds.), Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002, pp. 311–318. URL: https://aclanthology.org/P02-1040/. doi:10.3115/1073083.1073135.</p>
      <p>[22] W. Xu, C. Napoles, E. Pavlick, Q. Chen, C. Callison-Burch, Optimizing statistical machine translation for text simplification, Transactions of the Association for Computational Linguistics 4 (2016) 401–415. URL: https://aclanthology.org/Q16-1029/. doi:10.1162/tacl_a_00107.</p>
      <p>[23] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, Bertscore: Evaluating text generation with bert, in: International Conference on Learning Representations, 2020. URL: https://openreview.net/forum?id=SkeHuCVFDr.</p>
      <p>[24] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. URL: https://arxiv.org/abs/1908.10084.</p>
      <p>[25] N. Reimers, I. Gurevych, Making monolingual sentence embeddings multilingual using knowledge distillation, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2020. URL: https://arxiv.org/abs/2004.09813.</p>
      <p>[26] F. Dell’Orletta, S. Montemagni, G. Venturi, READ-IT: Assessing readability of Italian texts with a view to text simplification, in: N. Alm (Ed.), Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, Association for Computational Linguistics, Edinburgh, Scotland, UK, 2011, pp. 73–83. URL: https://aclanthology.org/W11-2308/.</p>
      <p>[27] L. C. Passaro, A. Lenci, PaWaC - Public Administration Web as Corpus (Processed), http://data.europa.eu/88u/dataset/elrc_1282, 2019. [Data set].</p>
      <p>[28] M. Cortelazzo, Il linguaggio amministrativo: principi e pratiche di modernizzazione, Carocci, 2021.</p>
      <p>[29] A. K. Vijayakumar, M. Cogswell, R. R. Selvaraju, Q. Sun, S. Lee, D. J. Crandall, D. Batra, Diverse beam search: Decoding diverse solutions from neural sequence models, CoRR abs/1610.02424 (2016). URL: http://arxiv.org/abs/1610.02424. arXiv:1610.02424.</p>
      <p>[30] D. Brunato, A. Cimino, F. Dell’Orletta, G. Venturi, S. Montemagni, Profiling-UD: a tool for linguistic profiling of texts, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 7145–7151. URL: https://aclanthology.org/2020.lrec-1.883/.</p>
      <p>[31] M.-C. De Marneffe, C. D. Manning, J. Nivre, D. Zeman, Universal dependencies, Computational Linguistics 47 (2021) 255–308.</p>
      <p>[32] T. De Mauro, I. Chiari, Il nuovo vocabolario di base della lingua italiana, Internazionale [accessed on 03/03/2023] (2016). URL: https://www.internazionale.it/opinione/tullio-de-mauro/2016/12/23/il-nuovo-vocabolario-di-base-della-lingua-italiana.</p>
      <p>[33] G. Sarti, M. Nissim, IT5: Text-to-text pretraining for Italian language understanding and generation, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italy, 2024, pp. 9422–9433. URL: https://aclanthology.org/2024.lrec-main.823.</p>
      <p>[34] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel, mT5: A massively multilingual pre-trained text-to-text transformer, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 483–498. URL: https://aclanthology.org/2021.naacl-main.41. doi:10.18653/v1/2021.naacl-main.41.</p>
      <p>[35] H. W. Wendt, Dealing with a common problem in social science: A simplified rank-biserial coefficient of correlation based on the U statistic, European Journal of Social Psychology (1972).</p>
      <p>[36] L. Frazier, Syntactic complexity, in: D. Dowty, L. Karttunen, A. Zwicky (Eds.), Natural Language Parsing, Cambridge University Press, Cambridge, UK, 1985.</p>
      <p>[37] E. Gibson, Linguistic complexity: Locality of syntactic dependencies, Cognition 24 (1998) 1–76.</p>
      <p>[38] V. Demberg, F. Keller, Data from eye-tracking corpora as evidence for theories of syntactic processing complexity, Cognition 109 (2008) 193–210.</p>
      <p>[39] J. Miller, R. Weinert, Spontaneous spoken language. Syntax and discourse, Oxford University Press, 1998.</p>
      <p>[40] G. Barlacchi, S. Tonelli, Ernesta: A sentence simplification tool for children’s stories in Italian, in: Computational Linguistics and Intelligent Text Processing: 14th International Conference, CICLing 2013, Springer Berlin Heidelberg, 2013, pp. 476–487.</p>
      <p>[41] M. Haspelmath, Against markedness (and what to replace it with), Journal of Linguistics 42 (2006) 25–70. doi:10.1017/S0022226705003683.</p>
      <p>[42] Z. Li, M. Shardlow, How do control tokens affect natural language generation tasks like text simplification, Natural Language Engineering 30 (2024) 915–942. doi:10.1017/S1351324923000566.</p>
    </sec>
<sec id="sec-5">
      <title>A. Prompt Template for Sentence Simplification</title>
      <p>Each model was prompted using its respective system
prompt provided in the Hugging Face documentation.
We also provided a task-specific prompt to instruct the
model to perform the Sentence Simplification task. The
following prompt pattern was used:
### Istruzione: Semplifica la seguente frase mantenendo il più possibile intatto il significato.
### Input: {original_sentence}
### Output:
English translation: “Instruction: Simplify the following sentence while keeping the meaning the same as much as possible.”</p>
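<p>As a minimal illustration, the template can be instantiated programmatically for each input sentence; the helper name build_prompt below is our own illustrative choice and not part of any released code, and the task-specific prompt would be appended after each model’s own system prompt as described above.</p>

```python
# Minimal sketch: filling the simplification prompt for one input sentence.
# The function name is illustrative, not the authors' code.
PROMPT_TEMPLATE = (
    "### Istruzione: Semplifica la seguente frase "
    "mantenendo il più possibile intatto il significato.\n"
    "### Input: {original_sentence}\n"
    "### Output:"
)

def build_prompt(original_sentence: str) -> str:
    # Substitute the sentence to be simplified into the template.
    return PROMPT_TEMPLATE.format(original_sentence=original_sentence)
```

<p>Generation then proceeds from the text following “### Output:”, so the model’s continuation is the simplified sentence.</p>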
    </sec>
<sec id="sec-7">
      <title>B. Linguistic Features and Readability Correlation Heatmap</title>
    </sec>
    <sec id="sec-10">
      <title>C. Linguistic Features of Original and Simplified Sentences</title>
      <p>During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to: Paraphrase
and reword, Improve writing style, and Grammar and spelling check. After using these
tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>[1] F. Alva-Manchego, C. Scarton, L. Specia, Data-driven sentence simplification: Survey and benchmark, Computational Linguistics 46 (2020) 135–187. URL: https://aclanthology.org/2020.cl-1.4/. doi:10.1162/coli_a_00370.</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>[2] M. J. Ryan, T. Naous, W. Xu, Revisiting non-English text simplification: A unified multilingual benchmark, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 4898–4927. URL: https://aclanthology.org/2023.acl-long.269/. doi:10.18653/v1/2023.acl-long.269.</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>[3] D. Kauchak, Improving text simplification language modeling using unsimplified text data, in: H. Schuetze, P. Fung, M. Poesio (Eds.), Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Sofia, Bulgaria, 2013, pp. 1537–1546. URL: https://aclanthology.org/P13-1151/.</mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>[4] D. Pellow, M. Eskenazi, An open corpus of everyday documents for simplification tasks, in: S. Williams, A. Siddharthan, A. Nenkova (Eds.), Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), Association for Computational Linguistics, Gothenburg, Sweden, 2014, pp. 84–93. URL: https://aclanthology.org/W14-1210/. doi:10.3115/v1/W14-1210.</mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>[5] F. Alva-Manchego, L. Martin, A. Bordes, C. Scarton, B. Sagot, L. Specia, ASSET: A dataset for tuning and evaluation of sentence simplification models with multiple rewriting transformations, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 4668–4679. URL: https://aclanthology.org/2020.acl-main.424/. doi:10.18653/v1/2020.acl-main.424.</mixed-citation>
      </ref>
      <ref id="ref6">
<mixed-citation>[6] W. Xu, C. Callison-Burch, C. Napoles, Problems in current text simplification research: New data can help, Transactions of the Association for Computational Linguistics 3 (2015) 283–297. URL: https://aclanthology.org/Q15-1021/. doi:10.1162/tacl_a_00139.</mixed-citation>
      </ref>
      <ref id="ref7">
<mixed-citation>[7] D. Brunato, A. Cimino, F. Dell'Orletta, G. Venturi, PaCCSS-IT: A parallel corpus of complex-simple sentences for automatic text simplification, in: J. Su, K. Duh, X. Carreras (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Austin, Texas, 2016, pp. 351–361. URL: https://aclanthology.org/D16-1034/. doi:10.18653/v1/D16-1034.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            <surname>de la Clergerie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          ,
          <article-title>MUSS: Multilingual unsupervised sentence simplification by mining paraphrases</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Béchet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Isahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Piperidis</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>1651</fpage>
          -
          <lpage>1664</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.176/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Palmero Aprosio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Turchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Negri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Di Gangi</surname>
          </string-name>
          ,
          <article-title>Neural text simplification in low-resource conditions using weak supervision</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Celikyilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>44</lpage>
          . URL: https://aclanthology.org/W19-2305/. doi: 10.18653/v1/W19-2305.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alva-Manchego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <article-title>Simplifying administrative texts for Italian L2 readers with controllable transformers models: A data-driven approach</article-title>
          , in:
          <string-name>
            <given-names>F.</given-names>
            <surname>Boschetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Lebani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Novielli</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)</source>
          , CEUR Workshop Proceedings, Venice, Italy,
          <year>2023</year>
          , pp.
          <fpage>303</fpage>
          -
          <lpage>315</lpage>
          . URL: https://aclanthology.org/2023.clicit-1.37/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>On LLMs-driven synthetic data generation, curation, and evaluation: A survey</article-title>
          , in:
          <string-name>
            <given-names>L.-W.</given-names>
            <surname>Ku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2024</source>
          , Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>11065</fpage>
          -
          <lpage>11082</lpage>
          . URL: https://aclanthology.org/2024.findings-acl.658/. doi: 10.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>