                                AI vs. Human: Effectiveness of LLMs in Simplifying Italian
                                Administrative Documents
                                Marco Russodivito1,† , Vittorio Ganfi1,*,† , Giuliana Fiorentino1 and Rocco Oliveto1
                                1
                                    University of Molise, Italy


                                                   Abstract
This study investigates the effectiveness of Large Language Models (LLMs) in simplifying Italian administrative texts compared to human informants. We evaluate the performance of several well-known LLMs, namely GPT-3.5-Turbo, GPT-4, LLaMA 3, and Phi 3, on s-ItaIst, a representative sample of Italian administrative documents drawn from the ItaIst corpus. To accurately compare the simplification abilities of humans and LLMs, six parallel corpora of this subsection of ItaIst were collected. These parallel corpora were analyzed using both complexity and similarity metrics to assess the outcomes of LLMs and human participants. Our findings indicate that while LLMs perform comparably to humans in many respects, there are notable differences in structural and semantic changes. The results of our study underscore the potential and limitations of using AI for administrative text simplification, highlighting areas where LLMs need improvement to achieve human-level proficiency.

                                                   Keywords
                                                   Automatic Text Simplification, Large Language Models, Italian Administrative language



1. Introduction

Due to the increasing popularity of generative Artificial Intelligence (AI) language tools [1, 2], significant attention has been devoted to the use of LLMs for text simplification [3]. Several studies have addressed the application of LLMs to simplify texts, particularly focusing on administrative documents, including those in Italian [4, 5, 6]. Italian administrative texts are often notably complex and obscure [7, 8, 9], which restricts a large segment of the population from fully accessing the content produced by the Italian public administration [10, 11].

This work aims to (a) evaluate the quality of automatic text simplification performed by several well-known LLMs, and (b) compare LLM-based simplification with human-based simplification. To address these research questions, the following procedures were undertaken:

    1. From an empirical perspective, a large corpus of Italian administrative texts was collected (i.e., ItaIst). A parallel simplified counterpart of the corpus was created using different LLMs. Additionally, a shorter version of the administrative corpus was manually simplified by two annotators.

    2. From an analytical perspective, several statistical analyses were conducted to measure the semantic and complexity closeness between human- and LLM-generated data. The comparison of scores for both LLM and human datasets highlights significant differences and similarities in manual and AI-driven simplification.

The results concerning readability indexes (e.g., Gulpease) and semantic and structural similarities (e.g., edit distance) reveal that LLMs generally perform comparably to human informants. However, AI-simplified texts are slightly less similar to the original documents than those generated by human simplifiers. LLMs tend to introduce more changes in the simplified corpora than human annotators. The empirical study indicates that texts simplified by AI exhibit more structural and lexical dissimilarities from the original documents than those simplified by humans.

Replication package. All the code and data are available on Figshare at https://figshare.com/s/4d927fe648c6f1cb4227.
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding author.
† These authors contributed equally.
Emails: marco.russodivito@unimol.it (M. Russodivito); vittorio.ganfi@unimol.it (V. Ganfi); giuliana.fiorentino@unimol.it (G. Fiorentino); rocco.oliveto@unimol.it (R. Oliveto)
ORCID: 0009-0004-8860-1739 (M. Russodivito); 0000-0002-0892-7287 (V. Ganfi); 0000-0002-0392-9056 (G. Fiorentino); 0000-0002-7995-8582 (R. Oliveto)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


2. Related Work

Several researchers have investigated the accountability of LLMs in text simplification and the metrics employed to measure the quality of LLM-based text simplification [12, 13, 14, 15, 16]. In particular, numerous studies have focused on assessing the use of LLMs to simplify Italian administrative texts, highlighting the potential of these models to enhance text readability. Some studies have specifically evaluated the readability of simplified administrative texts




by comparing parallel corpora of simplified documents and adopting a qualitative interpretative approach [17]. Other contributions have assessed the outputs of LLMs in simplification tasks, particularly focusing on models partially trained on Italian [18].

Our paper analyzes the differences between LLM and human simplification of Italian administrative texts, following a quantitative approach. By examining these differences, our study aims to highlight the similarities and dissimilarities that emerge during the simplification of administrative documents by humans and AI.


3. Study Design

Our study aims to analyze the effectiveness of modern LLMs in simplifying administrative text. To achieve this, we address the following Research Question (RQ):

    How effective are AI systems at simplifying administrative texts compared to humans?

This question evaluates whether modern AI can achieve a level of quality comparable to that of human experts, who serve as our reference, by analyzing how well LLMs can reduce complexity while preserving the original meaning of the texts.

The study has been conducted on a sub-corpus of ItaIst, utilizing several LLMs to support the text simplification process.

3.1. Corpus

The ItaIst corpus has been created as part of the VerbACxSS research project. It was composed by linguists and jurists to create a representative linguistic resource for contemporary administrative Italian [19, 20]. ItaIst was assembled by collecting recent official documents from local and regional public administration websites of eight Italian regions (Basilicata, Calabria, Campania, Lazio, Lombardy, Molise, Tuscany, and Veneto), covering topics such as garbage, healthcare, and public services. The corpus includes a variety of text types, such as Tender Notices, Planning Acts, and Service Charters.

The reliability of the corpus design was ensured by (a) linguists, who checked that the corpus represents administrative Italian in terms of textual and diatopic features, and (b) jurists, who selected and validated each document included in ItaIst. The resulting corpus, comprising 208 documents, consists of around 2,000,000 tokens and 45,000 types¹. More information about the ItaIst corpus can be found in Appendix A.

¹ https://huggingface.co/datasets/VerbACxSS/ItaIst

To make a fair comparison between humans and AI, a sub-corpus of ItaIst (hereinafter, s-ItaIst) was extracted. The s-ItaIst sub-corpus was composed by selecting representative documents from each region, balancing the topics and text types of the main corpus. Table 1 provides a summary of s-ItaIst.

Table 1
An overview of the main metrics of the s-ItaIst corpus.

    Metrics        Value
    # documents        8
    # sentences    1,314
    # tokens      33,295
    # types        5,622

3.2. LLMs

To investigate both open-source and commercial models, the s-ItaIst corpus was simplified using four distinct LLMs, namely GPT-3.5-Turbo [21] and GPT-4 [22] by OpenAI, LLaMA 3 [23] by Meta, and Phi 3 [23] by Microsoft. For the open-source models, we used the LLaMA 3 8B² and Phi 3 3.8B³ variants, both fine-tuned on large Italian corpora. This selection explores models of various sizes while ensuring optimal performance for Italian tasks.

² https://huggingface.co/DeepMount00/Llama-3-8b-Ita (last seen 07-21-2024)
³ https://huggingface.co/e-palmisano/Phi3-ITA-mini-4K-instruct (last seen 07-21-2024)

A detailed prompt was formulated to instruct each model to perform the simplification task properly, avoiding summarization and applying state-of-the-art simplification rules [9]. The full prompt can be found in Appendix B.

The OpenAI models were accessed via APIs⁴, while the open-source models were hosted on an AWS EC2 G6⁵ instance equipped with a single Nvidia L4 GPU with 24 GB of vRAM.

⁴ https://openai.com/api/ (last seen 07-21-2024)
⁵ https://aws.amazon.com/it/ec2/instance-types/g6/ (last seen 07-21-2024)
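For illustration, the following minimal sketch shows how a single section of s-ItaIst could be submitted to one of the commercial models through the OpenAI chat completions API. The system prompt shown here is a generic placeholder, not the actual prompt reported in Appendix B, and the temperature value is an assumption rather than a setting stated in the paper.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder instruction: the real simplification prompt used in the study
# is the one reported in Appendix B.
SYSTEM_PROMPT = (
    "Semplifica il seguente testo amministrativo italiano senza riassumerlo, "
    "preservandone integralmente il significato."
)

def simplify_section(section_text: str, model: str = "gpt-4") -> str:
    """Send one 2-6 sentence section to the model and return the simplified text."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,  # assumption: low temperature for more deterministic output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": section_text},
        ],
    )
    return response.choices[0].message.content
```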
3.3. Experimental Procedure

To address our research question, we conducted an empirical study to compare automatic and manual simplifications. Our study, illustrated in Figure 1, can be summarized in three main steps: (i) constructing a corpus of administrative documents (i.e., s-ItaIst), (ii) simplifying this corpus using four LLMs and two human annotators, and (iii) comparing the LLM-simplified corpora with the human-simplified corpora.

It is worth noting that the s-ItaIst corpus was subdivided into small sections (2-6 sentences) to avoid exceeding the context windows of the LLMs and to facilitate human informants during simplification⁶.

⁶ The s-ItaIst corpus was segmented into a total of 619 sections of text. Each section was then assigned to human annotators and LLMs for simplification.
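The paper does not detail the exact segmentation procedure; the sketch below shows one plausible way to group consecutive sentences into sections of 2-6 sentences with spaCy. The Italian pipeline name and the greedy grouping strategy are assumptions made for illustration only.

```python
import spacy

nlp = spacy.load("it_core_news_sm")  # Italian spaCy pipeline; model choice is an assumption

def split_into_sections(text: str, min_sents: int = 2, max_sents: int = 6) -> list[str]:
    """Greedily group consecutive sentences into sections of min_sents to max_sents sentences."""
    sentences = [s.text.strip() for s in nlp(text).sents]
    sections, current = [], []
    for sentence in sentences:
        current.append(sentence)
        if len(current) == max_sents:
            sections.append(" ".join(current))
            current = []
    if current:
        if sections and len(current) < min_sents:
            # attach a too-short tail to the previous section
            sections[-1] += " " + " ".join(current)
        else:
            sections.append(" ".join(current))
    return sections
```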
[Figure 1: the s-ItaIst corpus feeds a manual simplification branch (Human1, Human2) and an automatic simplification branch (GPT-4, GPT-3.5-Turbo, LLaMA 3, Phi 3), each producing a parallel corpus; complexity metrics (Gulpease Index, Flesch-Vacca Index, NVdB %, passive verbs %) and similarity metrics (Semantic Similarity %, Edit Distance %) are then extracted from the parallel corpora.]

Figure 1: Experimental design schema: The s-ItaIst corpus was simplified both automatically and manually by two humans and four LLMs. The resulting parallel corpora were analyzed using complexity and similarity metrics.



Human annotators with strong backgrounds in linguistics and deep knowledge about administrative text simplification simplified the corpus following common simplification rules identified in the literature [24, 25, 8, 9]. They exploited a custom web application that (i) assigned sections of the document to simplify and (ii) tracked the time they spent during such an activity. Similarly, each LLM was instructed to automatically simplify every document in the corpus one section at a time.

This approach provided a comprehensive comparison dataset of six distinct parallel corpora. We analyzed these data to compare human and automatic simplifications by extracting features such as complexity and similarity metrics to measure the quality of the simplified texts and their relatedness to the original text. Furthermore, we computed the Wilcoxon Signed-Rank Test [26] to statistically evaluate the difference between LLM and human metrics and Cliff's Delta [27, 28] to provide a measure of the effect size.
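This statistical comparison can be reproduced with standard tooling. The sketch below is a minimal illustration: the paired scores are made-up placeholders, and the magnitude thresholds are the commonly used conventional cut-offs, not values stated in the paper.

```python
from scipy.stats import wilcoxon

def cliffs_delta(xs, ys):
    """Cliff's delta: (#{x > y} - #{x < y}) / (|xs| * |ys|)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

def magnitude(delta):
    """Commonly used interpretation thresholds for |delta|."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

# Illustrative paired per-section scores for one metric (e.g., Gulpease).
llm_scores   = [51.2, 48.7, 53.1, 49.9, 50.4, 47.8, 52.6, 51.0]
human_scores = [49.8, 50.1, 52.0, 48.5, 49.0, 48.2, 51.1, 49.6]

stat, p_value = wilcoxon(llm_scores, human_scores)   # paired, non-parametric test
delta = cliffs_delta(llm_scores, human_scores)
print(p_value, delta, magnitude(delta))
```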
3.4. Metrics

To assess the quality of the simplifications, we employed both complexity and similarity metrics from the literature. Complexity metrics compare the ease of the original and simplified text, while similarity metrics measure the distance between them. We implemented these metrics according to the state-of-the-art, leveraging natural language processing (NLP) techniques (e.g., tokenization, POS tagging⁷).

⁷ The process of tokenization and tagging was conducted using the spaCy natural language processing tool: https://spacy.io (last seen 07-21-2024)

In the literature, several simplicity measures (for instance, SAMSA [29] and SARI [30]) are employed, although their results may vary depending on the level of analysis examined and, of course, on the design of the metrics. Specifically, SAMSA aims to measure structural simplicity by monitoring sentence-splitting accuracy, while SARI was developed to measure the simplicity advantage when only lexical paraphrasing is evaluated. Furthermore, some studies show that, when calculated using multi-operation manual references, both a generic metric like BLEU [31] and an operation-specific one like SARI have low associations with assessments of overall simplicity [32]. Thus, to measure the readability of the investigated corpora we selected:

    1. the Flesch Vacca Index, the Gulpease Index and READ-IT, since they are advanced instruments designed to investigate the degree of simplicity of Italian texts, and

    2. the percentages of some lexical and structural features (i.e., the amount of most common lexical items and active verb forms) that increase the readability of texts.

Also for similarity metrics, the computational literature offers several resources aiming to measure the structural or semantic proximity of texts. Some of these operate at the level of n-gram overlap (e.g., BLEU [31] and METEOR [33]), while others consider other features. For this analysis, we selected Semantic Similarity to quantify the degree of semantic closeness between corpora and Edit distance to measure structural similarities between the investigated corpora.

To support future research, we have made our metrics
implementation publicly available⁸.

⁸ https://pypi.org/project/italian-ats-evaluator (last seen 07-21-2024)

The complexity metrics we considered are detailed below; an illustrative implementation sketch follows the list.

    • Gulpease Index [34]: This metric evaluates the readability of an Italian text and assesses the education level required to fully comprehend it. It is calculated using the following formula:

        89 + (300 · sentences − 10 · characters) / tokens        (1)

    • Flesch Vacca Index [35]: This is an adaptation of the original Flesch Reading Ease formula for evaluating the readability of Italian texts, computed as follows:

        217 − 1.3 · (tokens / sentences) − 0.6 · (100 · syllables / tokens)        (2)

    • READ-IT [36]: The tool is the first advanced readability evaluation instrument for Italian, combining traditional raw text features with lexical, morpho-syntactic, and syntactic information. Four different readability models are included in the tool: READ-IT BASE includes only raw features, calculating sentence length (average number of words per sentence) and word length (average number of characters per word); READ-IT LEXICAL combines raw (e.g., word length) and lexical (e.g., Type/Token Ratio) features; READ-IT SYNTACTIC employs raw text (e.g., sentence length) and morpho-syntactic (e.g., average number of clauses per sentence) properties; READ-IT GLOBAL includes all the other features, combining raw text, lexical, morpho-syntactic and syntactic (e.g., the depth of the whole parse tree) features⁹.

    ⁹ http://www.italianlp.it/demo/read-it (last seen 04-10-2024)

    • NVdB (%): "Il Nuovo vocabolario di base della lingua italiana" [37] consists of fundamental and commonly used words representing the essential lexicon of the Italian language. The ease of a text can be roughly estimated by the number of its words listed in the basic vocabulary [38].

    • Passive (%): Overuse of the passive voice can lead to ambiguity and complexity, especially for readers who may struggle with comprehension [24, 25, 9]. It is calculated by identifying verbs with an aux:pass relation in the dependency parse tree.
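As a rough illustration of how these complexity metrics can be computed with spaCy, consider the following sketch. It is not the authors' italian-ats-evaluator implementation: the Italian pipeline name, the vowel-group syllable heuristic, and the token filtering are assumptions, and the NVdB basic vocabulary word list must be supplied by the caller.

```python
import re
import spacy

nlp = spacy.load("it_core_news_lg")  # Italian spaCy pipeline; model choice is an assumption

def complexity_metrics(text: str, basic_vocabulary: set) -> dict:
    doc = nlp(text)
    sentences = list(doc.sents)
    tokens = [t for t in doc if not t.is_punct and not t.is_space]
    characters = sum(len(t.text) for t in tokens)

    # Gulpease Index, Eq. (1): higher values mean easier text
    gulpease = 89 + (300 * len(sentences) - 10 * characters) / len(tokens)

    # Flesch Vacca Index, Eq. (2); syllables are approximated by counting
    # vowel groups, which is only a rough heuristic for Italian
    syllables = sum(len(re.findall(r"[aeiouàèéìíòóùú]+", t.text.lower())) for t in tokens)
    flesch_vacca = 217 - 1.3 * (len(tokens) / len(sentences)) \
                       - 0.6 * (100 * syllables / len(tokens))

    # NVdB (%): share of tokens whose lemma belongs to the basic vocabulary list
    nvdb = 100 * sum(1 for t in tokens if t.lemma_.lower() in basic_vocabulary) / len(tokens)

    # Passive (%): verbs that govern an "aux:pass" auxiliary in the dependency tree
    passive_verbs = {t.head for t in doc if t.dep_ == "aux:pass"}
    all_verbs = [t for t in doc if t.pos_ == "VERB"]
    passive = 100 * len(passive_verbs) / max(len(all_verbs), 1)

    return {"gulpease": gulpease, "flesch_vacca": flesch_vacca,
            "nvdb_pct": nvdb, "passive_pct": passive}
```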
The similarity metrics we considered are detailed below; an illustrative implementation sketch follows the list.

    • Semantic Similarity (%) [39]: This metric measures the distance between the semantic meanings of two documents. It can be computed exploiting relevant methodologies from the literature, such as BERTScore [40] and SBERT [41]. We opted for the latter approach, which leverages cosine similarity between contextual embeddings (obtained through sentence-transformers and an open-source multilingual model¹⁰) to evaluate similarity at the sentence level, encapsulating the overall contextual meaning [42].

    ¹⁰ https://huggingface.co/intfloat/multilingual-e5-base (last seen 07-21-2024)

    • Edit distance (%) [43]: This metric measures the similarity between two strings based on the number of single-character edits (insertions, deletions, or substitutions) required to transform one text into the other. A value close to zero indicates a relatively minor difference between the two texts, while a high value indicates significant rephrasing.
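A minimal sketch of the two similarity measures is shown below. The embedding model name comes from footnote 10; normalising the edit distance by the length of the longer string is an assumption, since the paper does not state how the percentage is obtained.

```python
import Levenshtein                                  # pip install python-Levenshtein
from sentence_transformers import SentenceTransformer, util

# Multilingual embedding model referenced in footnote 10
model = SentenceTransformer("intfloat/multilingual-e5-base")

def semantic_similarity_pct(original: str, simplified: str) -> float:
    """Cosine similarity between contextual embeddings, expressed as a percentage."""
    embeddings = model.encode([original, simplified], normalize_embeddings=True)
    return 100 * float(util.cos_sim(embeddings[0], embeddings[1]))

def edit_distance_pct(original: str, simplified: str) -> float:
    """Character-level Levenshtein distance, normalised by the longer string (assumption)."""
    distance = Levenshtein.distance(original, simplified)
    return 100 * distance / max(len(original), len(simplified))
```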
3.5. Threats to validity

We analyze the validity of our study by examining construct, internal, and external validity. This evaluation helps us understand the strengths and limitations of our methodology and the generalizability of our findings.

Construct validity: The two linguistic experts involved in the manual simplification of the s-ItaIst corpus may have produced divergent variants due to their subjective approaches. Despite differences in seniority, both experts have strong linguistic backgrounds (holding PhDs) and several years of experience. Nevertheless, involving two human simplifiers allowed us to explore distinct simplification approaches and compare automatic simplification against two varied benchmarks.

Internal validity: The LLMs used for automatic text simplification, particularly those from HuggingFace, may have been trained on non-administrative texts, potentially introducing issues in the simplified text. However, we relied on state-of-the-art models tested against several benchmarks [44, 45, 46, 47]. Additionally, the embeddings for calculating Semantic Similarity were obtained through a multilingual model chosen for its high ranking on the MTEB leaderboard¹¹, particularly for its performance on the STS22 (it) benchmark [48].

¹¹ https://huggingface.co/spaces/mteb/leaderboard (last seen 07-21-2024)

External validity: Our study focuses on s-ItaIst, a sub-corpus of ItaIst consisting of eight administrative documents. Although the number of documents is relatively small, the corpus includes over 1,000 sentences. Manual simplification of the corpus took Human1 and Human2 15 and 23 hours, respectively. Extending our study to the entire ItaIst corpus would have been infeasible. However, the documents of the sub-corpus were not chosen randomly; they were selected to represent the variety of administrative texts.
Table 2
Metrics evaluated across the original corpus and the human and LLM simplified corpora.
                                 Original     Human1       Human2            GPT-3.5-Turbo         GPT-4       LLaMA 3        Phi 3
 Tokens                            33,295       34,135       29,755                  30,032        31,722         36,035      36,056
 Sentences                           1,314       1,506        1,744                   1,515         1,840          1,944       1,900
 Tokens per Sentence                  25.33       22.66        17.06                   19.53         17.24          18.53       18.97
 Sentences per Document              164.25      188.25       218.00                  189.37        230.00         243.00      237.50
 Gulpease Index                      44.31       49.72        50.64                   48.49         51.34          50.26       50.16
 Flesch Vacca Index                  19.97       34.23        33.63                   30.33         36.75          34.09       33.75
 NVdB (%)                            73.28       80.44        76.89                   78.28         81.07          80.18       80.16
 Passive (%)                         20.87       15.78        17.71                   13.99         12.00          15.81       15.72
 READ-IT BASE (%)                    75.91       68.62        51.00                   66.61         55.00          58.37       57.69
 READ-IT LEXICAL (%)                 93.64       85.37        89.71                   91.96         90.29          77.13       75.74
 READ-IT SYNTACTIC (%)               63.72       53.14        40.09                   38.42         29.92          40.97       41.24
 READ-IT GLOBAL (%)                  86.48       69.24        61.34                   68.69         54.60          59.26       58.37
 Semantic Similarity (%)                 -       96.52        97.26                   96.06         95.80          94.96       94.96
 Edit distance (%)                       -       35.84        29.20                   49.21         52.14          55.48       55.44



4. Results and Discussion

A preliminary analysis of our results, summarized in Table 2, reveals several significant similarities and differences between the human and LLM datasets. For instance, the variation in the number of tokens is similar across both human and LLM corpora, although LLMs generally increase the number of sentences more prominently than human annotators.

Regarding complexity metrics, all the parallel corpora (both human and LLM) exhibit a general increase in readability compared to the original texts. For example, the majority of the corpora improve the Gulpease Index readability metric, shifting the difficulty level from very difficult to difficult for middle school reading levels [34] (except for Human1 and GPT-3.5-Turbo). Additionally, complexity metrics vary similarly across both human and LLM groups, with differences between manual and AI simplifiers not significantly greater than those between Human1 and Human2 or among GPT-3.5-Turbo, GPT-4, LLaMA 3, and Phi 3.

The analysis of semantic and structural distance metrics from the original s-ItaIst shows more pronounced differences between the human and LLM datasets. In terms of semantic similarity (Semantic Similarity), the Human1 and Human2 corpora are closer to the original meaning than the LLM-simplified corpora. These differences are even more pronounced when considering edit distance (Edit distance). The percentage of edit distance is higher in the LLM group, with each LLM corpus exceeding the human ones by at least 10%.

Higher degrees of Semantic Similarity and lower degrees of Edit distance in the human corpora indicate that human annotators tend to make fewer changes to the original text compared to LLMs.

As reported in Table 2, GPT-4 achieved the best results across the majority of metrics (except for READ-IT LEXICAL). To validate our outcomes, we performed the Wilcoxon Signed-Rank Test and calculated Cliff's Delta effect size to analyze the difference between GPT-4 and human metrics. By examining the results in Table 3, we can assert that:

    GPT-4 simplifications can be comparable to human simplifications. GPT-4 simplifications are negligibly better for complexity metrics, moderately worse for similarity, and largely rephrased compared to human simplifications.

The results of the Wilcoxon Signed-Rank Test and Cliff's Delta Effect Size for the other models, though not fully significant, are listed in Appendix C.

A brief extract taken from the Original, Human1, Human2 and GPT-4 parallel corpora, representing the same phrase simplified by the two human annotators and GPT-4, is shown below¹²:

    Original: fatturato minimo annuo, per gli ultimi tre esercizi, pari o superiore al valore stimato del presente appalto

    Human1: Guadagno in un anno (fatturato minimo annuo) negli ultimi 3 anni di valore uguale o superiore al valore di questo bando

    Human2: l'ammontare di fatture emesse annualmente, per gli ultimi tre anni, deve essere pari o superiore al valore stimato del presente appalto

    GPT-4: un fatturato annuo minimo, negli ultimi tre anni, uguale o maggiore al valore stimato dell'appalto

¹² A more extensive example of the human and LLM simplifications collected in the parallel corpora designed for this study can be found in Appendix D.
Table 3
Results of the Wilcoxon Signed-Rank Test and Cliff's Delta Effect Size performed on GPT-4, Human1, and Human2 metrics.

    Comparison   Metrics               p-value      Effect Size
    Human1       Gulpease Index        < 0.0001     negligible  ↗
    Human1       Flesch Vacca Index    < 0.0001     negligible  ↗
    Human1       NVdB                    0.0108     negligible  ↗
    Human1       Passive                 0.0004     negligible  ↘
    Human1       READ-IT BASE          < 0.0001     small       ↘
    Human1       READ-IT LEXICAL       < 0.0001     negligible  ↗
    Human1       READ-IT SYNTACTIC     < 0.0001     small       ↘
    Human1       READ-IT GLOBAL        < 0.0001     small       ↘
    Human1       Semantic Similarity   < 0.0001     small       ↘
    Human1       Edit distance         < 0.0001     large       ↗
    Human2       Gulpease Index          0.0092     negligible  ↗
    Human2       Flesch Vacca Index    < 0.0001     negligible  ↗
    Human2       NVdB                  < 0.0001     small       ↗
    Human2       Passive               < 0.0001     negligible  ↘
    Human2       READ-IT BASE            0.0292     negligible  ↗
    Human2       READ-IT LEXICAL              -     -
    Human2       READ-IT SYNTACTIC     < 0.0001     negligible  ↘
    Human2       READ-IT GLOBAL        < 0.0001     negligible  ↘
    Human2       Semantic Similarity   < 0.0001     medium      ↘
    Human2       Edit distance         < 0.0001     large       ↗

In the above syntagmas, the similarities between the simplifications are quite obvious: for example, the technical term esercizio and the more ambiguous word pari are replaced by the more common lexical equivalents anno and uguale, respectively.


5. Conclusion

In this study, we investigated the automatic simplification of Italian administrative documents. Our results demonstrate that LLMs can effectively simplify these texts, performing comparably to humans¹³.

¹³ Further evidence showing that LLM simplifications preserve the meaning of the original texts was obtained in a study conducted on the same data. The unpublished research indicated that experienced evaluators, i.e., jurists with administrative competence, agree that LLM simplifications of administrative texts maintain the legal integrity of the original documents [49].

Among the models examined, GPT-4 shows superior performance in text simplification, exhibiting significant improvements in complexity metrics. Nonetheless, it is noteworthy that humans tend to achieve higher Semantic Similarity and lower Edit distance, ensuring the preservation of the original meaning and structure of the text. In other words, humans, aware of the importance of precise language for these documents, mostly preserved the original meaning and structure, whereas LLMs, while simplifying, tended to rephrase extensively. This rephrasing, although effective in reducing complexity, might inadvertently alter the legal nuances, which are critical in administrative texts.

Despite this limitation, LLMs can serve as valuable support tools for text simplification, significantly accelerating a process that typically requires hours of manual work. By generating initial drafts, LLMs can reduce the workload of human experts, who would then review and refine the AI-generated drafts, ensuring the preservation of the overall meaning and legal integrity of the text. The results achieved in our study indicate that modern LLMs can simplify administrative documents almost as effectively as humans. However, our findings also indicate that LLMs are not fully capable of preserving the semantic meaning of the text, tending to rephrase more extensively than humans. This could introduce legal issues into the simplified text. Further study could be conducted to evaluate the juridical equivalence of automatically simplified documents. A manual investigation of our parallel corpus, supervised by expert jurists, may reveal important implications in this sensitive context.

Another promising direction for future research is to investigate the impact of automatic simplification on text comprehension. An additional empirical study could be designed to evaluate whether automatically simplified documents are easier to understand than their original versions.

Additionally, it would be worthwhile to explore different prompting strategies to further improve simplification quality. For instance, few-shot prompting [50] with some manually simplified gold samples could better align LLMs with human style.


Acknowledgments

This contribution is a result of the research conducted within the framework of the PRIN 2020 (Progetti di Rilevante Interesse Nazionale) project "VerbACxSS: on analytic verbs, complexity, synthetic verbs, and simplification. For accessibility" (Prot. 2020BJKB9M), funded by the Italian Ministry of Universities and Research.

Giuliana Fiorentino and Rocco Oliveto are responsible for research question identification, study design, research supervision and data analysis. However, for academic reasons, Section 2, Section 3.1, Section 3.3, Section 4, and Section 5 are attributed to Vittorio Ganfi; and Section 1, Section 3, Section 3.2, Section 3.4 and Section 3.5 to Marco Russodivito.


References

 [1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems (NIPS), volume 30, 2017.
 [2] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP), 2020, pp. 38–45.
 [3] M. J. Ryan, T. Naous, W. Xu, Revisiting non-English text simplification: A unified multilingual benchmark, Association for Computational Linguistics (ACL) (2023).
 [4] D. Brunato, F. Dell'Orletta, G. Venturi, S. Montemagni, Design and Annotation of the First Italian Corpus for Text Simplification, in: Linguistic Annotation Workshop (LAW), 2015, pp. 31–41.
 [5] M. Miliani, S. Auriemma, F. Alva-Manchego, A. Lenci, Neural readability pairwise ranking for sentences in Italian administrative language, in: Asia-Pacific Chapter of the Association for Computational Linguistics (AACL) and International Joint Conference on Natural Language Processing (IJCNLP), 2022, pp. 849–866.
 [6] M. Miliani, M. S. Senaldi, G. Lebani, A. Lenci, Understanding Italian Administrative Texts: A Reader-Oriented Study for Readability Assessment and Text Simplification, in: Workshop on AI for Public Administration (AIxPA), 2022, pp. 71–87.
 [7] S. Lubello, La lingua del diritto e dell'amministrazione, Il mulino, Bologna, 2017.
 [8] M. Cortelazzo, Il linguaggio amministrativo. Principi e pratiche di modernizzazione, Carocci, Roma, 2021.
 [9] G. Fiorentino, V. Ganfi, Parametri per semplificare l'italiano istituzionale: Revisione della letteratura, Italiano LinguaDue 16 (2024) 220–237.
[10] E. Piemontese (Ed.), Il dovere costituzionale di farsi capire. A trent'anni dal Codice di stile, Carocci, Roma, 2023.
[11] S. Lubello, Da dembsher al codice di stile e oltre: un bilancio sul linguaggio burocratico, in: E. Piemontese (Ed.), Il dovere costituzionale di farsi capire. A trent'anni dal Codice di stile, Carocci, Roma, 2023, pp. 54–70.
[12] G. Gonzalez Delgado, B. Navarro Colorado, The Simplification of the Language of Public Administration: The Case of Ombudsman Institutions, in: Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context, 2024, pp. 125–133.
[13] R. Doshi, K. Amin, P. Khosla, S. Bajaj, S. Chheang, H. P. Forman, Utilizing large Language Models to Simplify Radiology Reports: a comparative analysis of ChatGPT3.5, ChatGPT4.0, Google Bard, and Microsoft Bing, medRxiv (2023). doi:10.1101/2023.06.04.23290786.
[14] P. Mavrepis, G. Makridis, G. Fatouros, V. Koukos, M. M. Separdani, D. Kyriazis, Xai for all: Can large language models simplify explainable ai?, arXiv preprint arXiv:2401.13110 (2024).
[15] Y. Ma, S. Seneviratne, E. Daskalaki, Improving Text Simplification with Factuality Error Detection, in: Workshop on Text Simplification, Accessibility, and Readability (TSAR), 2022, pp. 173–178.
[16] F. Alva-Manchego, C. Scarton, L. Specia, Data-Driven Sentence Simplification: Survey and Benchmark, Computational Linguistics 46 (2020) 135–187.
[17] M. Miliani, F. Alva-Manchego, A. Lenci, Simplifying Administrative Texts for Italian L2 Readers with Controllable Transformers Models: A Data-driven Approach, in: CLiC-it, 2023.
[18] D. Nozza, G. Attanasio, et al., Is it really that simple? Prompting language models for automatic text simplification in Italian, in: CEUR Workshop Proceedings, 2023.
[19] D. Vellutino, et al., L'italiano istituzionale per la comunicazione pubblica, Il mulino, Bologna, 2018.
[20] D. Vellutino, N. Cirillo, Corpus «ItaIst»: Note per lo sviluppo di una risorsa linguistica per lo studio dell'italiano istituzionale per il diritto di accesso civico, Italiano LinguaDue 16 (2024) 238–250.
[21] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems (NIPS) 33 (2020) 1877–1901.
[22] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al., GPT-4 technical report, arXiv preprint arXiv:2303.08774 (2023).
[23] AI@Meta, Llama 3 model card (2024). URL: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md.
[24] E. Piemontese, Criteri e proposte di semplificazione, in: Codice di stile delle comunicazioni scritte a uso delle pubbliche amministrazioni, Istituto Poligrafico e Zecca dello Stato, Roma, 1994.
[25] A. Fioritto, Manuale di stile. Strumenti per semplificare il linguaggio delle amministrazioni pubbliche, Il mulino, Bologna, 1997.
[26] F. Wilcoxon, Probability tables for individual comparisons by ranking methods, Biometrics 3 (1947) 119–122.
[27] N. Cliff, Dominance statistics: Ordinal analyses to answer ordinal questions, Psychological Bulletin 114 (1993) 494–509.
[28] N. Cliff, Ordinal methods for behavioral data analysis, Psychology Press, New York, 2014.
[29] E. Sulem, O. Abend, A. Rappoport, Semantic structural evaluation for text simplification, in: M. Walker, H. Ji, A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 685–696. URL: https://aclanthology.org/N18-1063. doi:10.18653/v1/N18-1063.
[30] W. Xu, C. Napoles, E. Pavlick, Q. Chen, C. Callison-Burch, Optimizing Statistical Machine Translation for Text Simplification, Transactions of the Association for Computational Linguistics 4 (2016) 401–415. URL: https://doi.org/10.1162/tacl_a_00107. doi:10.1162/tacl_a_00107.
[31] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, Association for Computational Linguistics, USA, 2002, pp. 311–318. URL: https://doi.org/10.3115/1073083.1073135. doi:10.3115/1073083.1073135.
[32] F. Alva-Manchego, C. Scarton, L. Specia, The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification, Computational Linguistics 47 (2021) 861–889. URL: https://doi.org/10.1162/coli_a_00418. doi:10.1162/coli_a_00418.
[33] S. Banerjee, A. Lavie, Meteor: An automatic metric for MT evaluation with improved correlation with human judgments, in: Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005, pp. 65–72.
[34] P. Lucisano, M. E. Piemontese, Gulpease: una formula per la predizione della leggibilità di testi in lingua italiana, Scuola e città (1988) 110–124.
[35] V. Franchina, R. Vacca, Adaptation of Flesch readability index on a bilingual text written by the same author both in Italian and English languages, Linguaggi 3 (1986) 47–49.
[36] F. Dell'Orletta, S. Montemagni, G. Venturi, READ-IT: Assessing readability of Italian texts with a view to text simplification, in: Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, 2011, pp. 73–83.
[37] T. De Mauro, I. Chiari, Il nuovo vocabolario di base della lingua italiana (2016). URL: https://www.internazionale.it/opinione/tullio-de-mauro/2016/12/23/il-nuovo-vocabolario-di-base-della-lingua-italiana.
[38] D. Brunato, F. Dell'Orletta, G. Venturi, Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian, Frontiers in Psychology 13 (2022). doi:10.3389/fpsyg.2022.707630.
[39] D. Chandrasekaran, V. Mago, Evolution of semantic similarity—A survey, ACM Computing Surveys (CSUR) 54 (2021) 1–37.
[40] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating text generation with BERT, in: International Conference on Learning Representations, 2020. URL: https://openreview.net/forum?id=SkeHuCVFDr.
[41] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, in: Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 2019.
[42] A. Barayan, J. Camacho-Collados, F. Alva-Manchego, Analysing zero-shot readability-controlled sentence simplification, arXiv preprint arXiv:2409.20246 (2024).
[43] F. P. Miller, A. F. Vandome, J. McBrewster, Levenshtein distance: Information theory, computer science, string (computer science), string metric, Damerau-Levenshtein distance, spell checker, Hamming distance, Alpha Press, Orlando, 2009.
[44] D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, J. Steinhardt, Measuring massive multitask language understanding, International Conference on Learning Representations (ICLR) (2021).
[45] R. Zellers, A. Holtzman, Y. Bisk, A. Farhadi, Y. Choi, HellaSwag: Can a machine really finish your sentence?, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 4791–4800.
[46] P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, O. Tafjord, Think you have solved question answering? Try ARC, the AI2 reasoning challenge, arXiv preprint arXiv:1803.05457 (2018).
[47] D. Dua, Y. Wang, P. Dasigi, G. Stanovsky, S. Singh, M. Gardner, DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 2368–2378.
[48] N. Muennighoff, N. Tazi, L. Magne, N. Reimers, MTEB: Massive text embedding benchmark, in: European Chapter of the Association for Computational Linguistics (EACL), 2023, pp. 2014–2037.
[49] G. Fiorentino, M. Russodivito, V. Ganfi, R. Oliveto, Validazione e confronto tra semplificazione automatica e semplificazione manuale di testi in italiano istituzionale ai fini dell'efficacia comunicativa, in: "Automated Texts In the ROMance Languages and Beyond" (AI-ROM-II), 2nd International Conference, to appear.
[50] J. Wang, K. Liu, Y. Zhang, B. Leng, J. Lu, Recent advances of few-shot learning methods and applications, Science China Technological Sciences 66 (2023) 920–944.
Table 5
Results of the Wilcoxon Signed-Rank Test and Cliff’s Delta Effect Size performed on GPT-3.5-Turbo, Human1, and Human2 metrics.

           Metrics               p-value     Effect Size
  Human1   Gulpease Index        < 0.0001    negligible   ↘
           Flesch Vacca Index    < 0.0001    negligible   ↘
           NVdB                  < 0.0001    negligible   ↘
           Passive
           READ-IT BASE            0.0052    negligible   ↘
           READ-IT LEXICAL       < 0.0001    negligible   ↗
           READ-IT SYNTACTIC     < 0.0001    small        ↘
           READ-IT GLOBAL
           Semantic Similarity   < 0.0001    small        ↘
           Edit distance         < 0.0001    medium       ↗
  Human2   Gulpease Index        < 0.0001    small        ↘
           Flesch Vacca Index    < 0.0001    negligible   ↘
           NVdB                  < 0.0001    negligible   ↗
           Passive                 0.0072    negligible   ↘
           READ-IT BASE          < 0.0001    small        ↗
           READ-IT LEXICAL         0.0091    negligible   ↗
           READ-IT SYNTACTIC
           READ-IT GLOBAL          0.0003    negligible   ↗
           Semantic Similarity   < 0.0001    medium       ↘
           Edit distance         < 0.0001    large        ↗

Table 7
Results of the Wilcoxon Signed-Rank Test and Cliff’s Delta Effect Size performed on Phi 3, Human1, and Human2 metrics.

           Metrics               p-value     Effect Size
  Human1   Gulpease Index          0.0134    negligible   ↗
           Flesch Vacca Index
           NVdB
           Passive
           READ-IT BASE          < 0.0001    small        ↘
           READ-IT LEXICAL       < 0.0001    negligible   ↘
           READ-IT SYNTACTIC     < 0.0001    small        ↘
           READ-IT GLOBAL        < 0.0001    small        ↘
           Semantic Similarity   < 0.0001    medium       ↘
           Edit distance         < 0.0001    large        ↗
  Human2   Gulpease Index
           Flesch Vacca Index
           NVdB                  < 0.0001    small        ↗
           Passive
           READ-IT BASE          < 0.0001    negligible   ↗
           READ-IT LEXICAL       < 0.0001    small        ↘
           READ-IT SYNTACTIC
           READ-IT GLOBAL
           Semantic Similarity   < 0.0001    large        ↘
           Edit distance         < 0.0001    large        ↗
Table 6
Results of the Wilcoxon Signed-Rank Test and Cliff’s Delta Effect Size performed on LLaMA 3, Human1, and Human2 metrics.

           Metrics               p-value     Effect Size
  Human1   Gulpease Index          0.0077    negligible   ↗
           Flesch Vacca Index
           NVdB
           Passive
           READ-IT BASE          < 0.0001    small        ↘
           READ-IT LEXICAL       < 0.0001    negligible   ↘
           READ-IT SYNTACTIC     < 0.0001    small        ↘
           READ-IT GLOBAL        < 0.0001    small        ↘
           Semantic Similarity   < 0.0001    medium       ↘
           Edit distance         < 0.0001    large        ↗
  Human2   Gulpease Index
           Flesch Vacca Index
           NVdB                  < 0.0001    small        ↗
           Passive
           READ-IT BASE          < 0.0001    negligible   ↗
           READ-IT LEXICAL       < 0.0001    small        ↘
           READ-IT SYNTACTIC
           READ-IT GLOBAL
           Semantic Similarity   < 0.0001    large        ↘
           Edit distance         < 0.0001    large        ↗

A. Corpus ItaIst

The ItaIst corpus is a comprehensive collection of Italian administrative documents. Table 4 provides an overview of the topics and regions from which these documents were collected. The corpus has been assembled to represent the diversity and complexity of contemporary administrative Italian, ensuring its relevance for linguistic and computational analysis.

Table 4
Topics and regions of documents collected in ItaIst

               Garbage   Healthcare   Public services
  Basilicata         8            3                 9
  Calabria          11            5                 9
  Campania          14            7                 9
  Lazio              9            3                 9
  Lombardia         15            3                11
  Molise            10            7                 9
  Toscana           19            4                12
  Veneto             9            5                10
B. Prompt engineering

In the context of LLMs, the term prompt refers to the instructions provided to a language model to generate a specific response. Prompt engineering is the process of designing a clear and detailed prompt that instructs the model to generate the desired response. The prompt we used to ask the models to simplify administrative text is:

Sei un dipendente pubblico che deve scrivere dei documenti istituzionali italiani per renderli semplici e comprensibili per i cittadini. Ti verrà fornito un documento pubblico e il tuo compito sarà quello di riscriverlo applicando regole di semplificazione senza però modificare il significato del documento originale. Ad esempio potresti rendere le frasi più brevi, eliminare le perifrasi, esplicitare sempre il soggetto, utilizzare parole più semplici, trasformare i verbi passivi in verbi di forma attiva, spostare le frasi parentetiche alla fine del periodo.

In English: "You are a public employee who has to write Italian institutional documents so that they are simple and understandable for citizens. You will be given a public document, and your task will be to rewrite it by applying simplification rules without changing the meaning of the original document. For example, you could make the sentences shorter, remove periphrases, always make the subject explicit, use simpler words, turn passive verbs into active ones, and move parenthetical clauses to the end of the sentence."
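As an illustrative sketch only, the prompt above can be sent as a system message, with the document to simplify as the user message, of a chat-based model. The client library, model name, and decoding parameters below are assumptions for the example and do not reproduce the exact pipeline used in the study.

    # Sketch (assumed setup): the Appendix B prompt as a system message,
    # the administrative document as the user message.
    from openai import OpenAI

    SYSTEM_PROMPT = "Sei un dipendente pubblico che deve scrivere dei documenti istituzionali italiani ..."  # full prompt from Appendix B

    def simplify(document: str, model: str = "gpt-3.5-turbo") -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": document},
            ],
            temperature=0.0,  # illustrative choice, not reported in this appendix
        )
        return response.choices[0].message.content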
C. Tests

Table 5, Table 6, and Table 7 report the results of the statistical analyses conducted to compare the simplification performance of the various LLMs against human experts. The Wilcoxon Signed-Rank Test and Cliff’s Delta effect size were employed to evaluate the metrics of the GPT-3.5-Turbo, LLaMA 3, and Phi 3 models in comparison to the two human simplifiers, labelled as Human1 and Human2. These analyses provide insights into the relative effectiveness of AI-driven simplifications versus human efforts.
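As a sketch of this procedure, the example below pairs SciPy's Wilcoxon signed-rank test with a direct computation of Cliff's Delta; the thresholds used to label the effect size (0.147, 0.33, 0.474) are the commonly used cut-offs and are an assumption of this example rather than a detail stated in the tables.

    # Sketch: compare paired metric values (e.g., scores of an LLM's and a
    # human's simplifications of the same documents).
    from scipy.stats import wilcoxon

    def cliffs_delta(xs, ys):
        # Cliff's Delta: P(x > y) - P(x < y) over all pairs of observations.
        greater = sum(1 for x in xs for y in ys if x > y)
        smaller = sum(1 for x in xs for y in ys if x < y)
        return (greater - smaller) / (len(xs) * len(ys))

    def effect_size_label(delta):
        d = abs(delta)
        if d < 0.147:
            return "negligible"
        if d < 0.330:
            return "small"
        if d < 0.474:
            return "medium"
        return "large"

    def compare(llm_scores, human_scores):
        _, p_value = wilcoxon(llm_scores, human_scores)   # paired, non-parametric
        delta = cliffs_delta(llm_scores, human_scores)
        direction = "↗" if delta > 0 else "↘"             # illustrative convention
        return p_value, effect_size_label(delta), direction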
D. Examples

Table 8 provides several examples of text simplification. For each example, we present the original text alongside its simplified versions; the values of the complexity and similarity metrics are reported for each text.
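For reference, the sketch below shows how two of the reported metrics could be computed under common definitions; these are assumptions for illustration and may differ from the exact implementations used to produce Table 8.

    # Sketch: Gulpease readability index and a character-level edit distance
    # normalized by the length of the longer text (both assumed definitions).
    import re

    def gulpease(text: str) -> float:
        # Gulpease = 89 + (300 * sentences - 10 * letters) / words
        words = re.findall(r"\w+", text)
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        letters = sum(len(w) for w in words)
        return 89 + (300 * sentences - 10 * letters) / max(1, len(words))

    def normalized_edit_distance(a: str, b: str) -> float:
        # Classic Levenshtein dynamic programming, normalized to [0, 1].
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = curr
        return prev[-1] / max(1, len(a), len(b))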
Table 8
Examples of simplifications.

Original
L’operatore di Polizia Locale, quindi, rappresenta un importante punto di riferimento per la collettività. Nell’ambito delle sue funzioni, esso svolge i propri compiti in maniera autorevole, dando prova di preparazione professionale e sensibilità nel contatto relazionale. La sua attività, inoltre, è caratterizzata dal costante sforzo teso alla migliore interpretazione delle situazioni e delle problematiche incontrate, applicando un approccio educativo e orientato alla più adeguata risposta ai bisogni della cittadinanza.
Gulpease Index: 38 · Flesch Vacca Index: 12 · NVdB: 77% · Passive: 28% · Semantic Similarity: - · Edit distance: -

Human1
La Polizia Locale è un punto di riferimento per i cittadini. La Polizia Locale ha autorevolezza, professionalità e sensibilità nel contatto con i cittadini. La Polizia Locale cerca sempre di interpretare al meglio situazioni e problematiche incontrate. La Polizia Locale si comporta in modo da educare e rispondere adeguatamente ai bisogni dei cittadini.
Gulpease Index: 55 · Flesch Vacca Index: 33 · NVdB: 67% · Passive: 0% · Semantic Similarity: 93% · Edit distance: 56%

Human2
L’operatore di Polizia Locale, quindi, è un importante punto di riferimento per la collettività. Quando è in servizio, esso svolge i propri compiti in maniera autorevole. Dimostra preparazione professionale e sensibilità nel contatto con gli utenti. Cerca sempre di interpretare le situazioni e i problemi nel migliore dei modi. Applica un approccio educativo e cerca di adeguarsi ai bisogni della cittadinanza.
Gulpease Index: 58 · Flesch Vacca Index: 42 · NVdB: 83% · Passive: 0% · Semantic Similarity: 98% · Edit distance: 35%

GPT-4
L’operatore di Polizia Locale è un punto di riferimento importante per la comunità. Svolge i suoi compiti con autorità, mostrando competenza professionale e sensibilità nelle relazioni. Inoltre, lavora sempre per capire al meglio le situazioni e i problemi che incontra, usando un approccio educativo per rispondere adeguatamente ai bisogni dei cittadini.
Gulpease Index: 48 · Flesch Vacca Index: 32 · NVdB: 84% · Passive: 0% · Semantic Similarity: 97% · Edit distance: 48%

GPT-3.5-Turbo
L’operatore di Polizia Locale è un punto di riferimento importante per la comunità. Nell’esercizio delle sue funzioni, svolge i compiti con autorevolezza, dimostrando professionalità e sensibilità nei rapporti con le persone. Si impegna costantemente a comprendere al meglio le situazioni e le problematiche affrontate, adottando un approccio educativo e orientato a rispondere in modo adeguato ai bisogni dei cittadini.
Gulpease Index: 45 · Flesch Vacca Index: 27 · NVdB: 78% · Passive: 0% · Semantic Similarity: 98% · Edit distance: 45%

LLaMA 3
L’operatore di Polizia Locale è un punto di riferimento importante per la comunità. Esegue i suoi compiti con autorità, professionalità e sensibilità nel rapporto con la gente.
La sua attività è caratterizzata dal costante impegno per comprendere meglio le situazioni e i problemi, e per rispondere in modo educativo ai bisogni dei cittadini, con un approccio orientato alla loro assistenza.
Gulpease Index: 50 · Flesch Vacca Index: 37 · NVdB: 85% · Passive: 28% · Semantic Similarity: 96% · Edit distance: 54%

Phi 3
L’operatore di Polizia Locale è un punto di riferimento importante per la comunità. Esegue i suoi compiti con autorità, professionalità e sensibilità nel trattare le persone. La sua attività è guidata dal desiderio di capire meglio le situazioni e le problematiche, e di rispondere in modo appropriato ai bisogni dei cittadini, con un approccio educativo.
Gulpease Index: 52 · Flesch Vacca Index: 38 · NVdB: 82% · Passive: 28% · Semantic Similarity: 96% · Edit distance: 56%