On the Reform of the Italian Constitution: an Interdisciplinary Text Readability Analysis

Calogero Jerik Scozzaro1,2,*, Matteo Delsanto1, Antonio Mastropaolo2, Enrico Mensa1, Luisa Revelli2 and Daniele Paolo Radicioni1,*

1 Università degli Studi di Torino, Italy
2 Università della Valle d’Aosta, Italy

Abstract
This work can be considered as an instant paper: on June 18, 2024 the Constitutional Reform Bill on the "premiership", presented by the Italian Government last November and reviewed by the Senate’s Constitutional Affairs Committee, received its first approval by the Italian Senate. We present an analysis aimed at linguistically and computationally characterizing the readability of the text amendments now being discussed. It brings together evidence from different perspectives: legal and linguistic analysis, traditional readability indices, and a novel attempt to define readability through the prediction of reading times. All such perspectives are compared with the output obtained by prompting GPT, so that the output of that language model is also taken into consideration. The proposed analyses are intended as a technical contribution to the reflection on issues fundamental to democracy in Italy and beyond, concerning the need to analyze the quality of the writing of documents that are so fundamental for the democratic life of states.

Keywords
Text Readability, Text Simplification, Reading Times, Constitutional Reform Analysis.

1. Introduction
The Italian Constitution is the cornerstone of the country’s legal and political system: it establishes the framework for government, delineates the separation of powers, and guarantees the fundamental rights and freedoms of individuals. Its role in the Italian legislative system is multifaceted, serving as the supreme law and acting as a source of legitimacy for all laws and regulations: any law or regulation that contradicts the Constitution can be declared unconstitutional and void by the Constitutional Court (Corte Costituzionale).
Such supremacy ensures that all legislative and executive actions conform to constitutional principles. The Constitution also embeds the principles of democracy, ensuring that the government is elected by the people and that sovereignty is exercised by citizens through the Italian Parliament. Additionally, it includes provisions to protect political pluralism and to prevent the concentration of power.

This study proposes an analysis of the Constitutional Reform Bill, under deliberation by the Italian Parliament since mid-November 2023: the proposed reform affects Articles 59, 88, 92, and 94 of the Constitution, touches on relevant topics such as the direct election of the President of the Council of Ministers, and deeply modifies the role of the President of the Republic.1 Ensuring the readability of this text is of the utmost importance, since the comprehensibility of the basic democratic elements being modified is a pillar of the democratic system. This study thus provides an analysis of how clear and readable the set of articles is, as reformulated in the present Constitutional Reform Bill.

In general, legislative and regulatory documents contain complex, highly specialized language and lengthy sentences that are typically considered difficult to understand. Such language is characterized by specific semiotic and linguistic conventions, vocabulary, semantics, syntax and morphology that may be difficult to understand for laypeople with no domain expertise. The drafting process of the Italian Constitution, however, also reflects a linguistic refinement process, targeted at simplifying the text as much as possible: normative clarity was considered a necessary device for making the constitutional text understandable and knowable according to democratic ideals [2].

In the following we present a pool of analyses and measures to assess the readability of the Constitutional Reform Bill. We start by briefly recalling a bit of history of the Italian Constitution (Section 2); in Section 3 our analyses are presented. We first provide a linguistic and legal analysis of the meaning of the proposed amendments, to lay the foundations for comparisons with automatic approaches (Section 3.1). In Section 3.2 we illustrate a first assessment of the readability of the proposed amendments based on well-known readability indices, and introduce a novel measure of text readability based on the prediction of reading times; in Section 3.3 we report and comment on the output of a prompting session involving Chat-GPT4o [3, 4].

The main contributions of the work are the following. First, it provides converging evidence that, unfortunately, the proposed text is poor as regards the quality of the writing. Second, it provides a working example of how different disciplines may join efforts, each providing its own (possibly complementary) analysis. Finally, we introduce a novel readability metric, whereby the readability of a text is estimated based on the prediction of reading times. This metric differs substantially from existing approaches, as it relies on training a system on eye-tracking (low-level behavioral) data, also refined through the adoption of a language model fine-tuned on a corpus of Regional regulations, under the assumption that reading times are a direct function of the cognitive load implied in text understanding, and can thus be employed as a proxy for text readability.

NL4AI 2024: Eighth Workshop on Natural Language for Artificial Intelligence, November 26-27th, 2024, Bolzano, Italy [1]
* Corresponding author.
calogerojerik.scozzaro@unito.it (C. J. Scozzaro); daniele.radicioni@unito.it (D. P. Radicioni)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1 https://www.senato.it/leg/19/BGT/Schede/Ddliter/testi/57694_testi.htm
2. Background: History and Main Linguistic Traits of the Italian Constitution
After the election of the Constituent Assembly in June 1946, the process of creating the Italian Constitution began with the establishment of a Committee tasked with drafting and proposing the initial Constitution project. This Committee was supported by a drafting committee responsible for refining the text, addressing both legal and linguistic issues, and ensuring the normative clarity necessary to make the constitutional text understandable and knowable according to democratic ideals [2, 136]. The Italian Constitution is the result of this legal and linguistic clarification effort, aimed at reaching a community of speakers that was still largely dialect-speaking and poorly literate [5].

From a syntactic perspective, sentences in the Italian Constitution are brief, averaging 19 words, and structures are deliberately kept as simple as possible. Sentences composed of a single clause are predominant, and exhibit a regular arrangement of components [6]. In cases of complex sentences, coordination is preferred over subordination [7]. The commitment of the drafters to create a linear and accessible text is also evidenced by the sparing use of the subjunctive mood, employed only 26 times, and primarily in contexts where it is mandatory [8].

Regarding the lexicon, 1,002 of the 1,357 lemmas used in the Italian Constitution belong to the basic vocabulary, which comprises the most frequent and familiar words for Italian speakers. Their overall recurrence, accounting for 92.13% of the text (9,369 tokens in the original text; 10,717 in the currently effective version), to a good extent ensures lexical readability [9]. The overall semantic accessibility also depends on collocations and contextual meanings.
The presence of legal phrases with technical-scientific connotations and a significant number of redefinitions –common words used with specific sectoral meanings– may create comprehension difficulties or interpretive misunderstandings, particularly for younger or less competent speakers [10, 11, 12]. The following analysis focuses on the set of articles with the proposed amendments, as presently (as of June, 2024) approved by the Italian Senate.2 2 These are available at the URL https://www.senato.it/service/PDF/PDFServer/BGT/01414450.pdf. 3. Analysis of the Constitutional Reform Bill The proposed amendments to the five articles of the Italian Constitution result in an overall textual and structural increase. The total number of paragraphs increases from 14 to 20, the number of words from 279 to 581, and the number of clauses from 16 to 26 (excluding transitional provisions). The average sentence length of 22.3 words is higher than for the corresponding portion of the original text (18.4 words per sentence), though it still adheres to the controlled writing guidelines, which recommend not exceeding 20-25 words per sentence [13, 14]. The deviation from the brevity principles that guided the original Constitution drafters is more evident when considering that 4 of the newly proposed clauses are composed of more than 40 words, and 2 of these exceed 50 words. 3.1. Analysis of the Amendments to Articles 92 and 94 The first clause of the second paragraph of Article 92 consists of 41 words. It requires the reader to understand the legal phrase “universal and direct suffrage.”3 Within the same Article 92, the third paragraph (53 words) provides details on the procedures for the election of the Chambers and the President of the Council of Ministers through a long clause (26 words) introduced by a gerund and marked by a closing comma, which disrupts the syntactic-semantic linearity of the sentence. 
The twenty paragraphs included in the amendment proposals range from a minimum of 13 words (Art. 88, paragraph 1) to a maximum of 69 words (Art. 94, paragraph 3). The paragraphs containing the main innovations compared to the original constitutional text are designed to introduce procedural methods intended to be legally unambiguous. Such an objective is pursued in some cases through the use of redundancy mechanisms: role designations are therefore repeated in their complete formulation, considering that the title of ‘president’ is attributed to multiple offices. The resulting cumbersome effect of this solution is notable in paragraph 3 of Art. 94, presented below along with a possible simplified rewrite.

[Art. 94, paragraph 3: 3 propositions; 68 words; 381 chars]4
Within ten days since its formation, the Government presents itself to the Chambers to obtain their trust vote. If the trust in the Government led by the elected President is not voted, the President of the Republic renews the mandate to the elected President to form the Government. If, even in this case, the Government does not obtain the trust vote of the Chambers, the President of the Republic proceeds to dissolve the Chambers.

Simplified rewrite (3 propositions; 59 words; 301 characters):5
After being formed, the Government has ten days to present itself to the Chambers and seek their trust vote. If trust is not granted, the President of the Republic once again tasks the elected President with forming the Government. If, even in this second case, the Government does not obtain a favourable trust vote, the President of the Republic dissolves the Chambers.
From a methodological point of view, one of the authors (expert in linguistic sciences) produced the simplified version of the proposed amendment, while the remaining authors approved the simplified rewrite as semantically equivalent to the original.6 3 This is an unavoidable technical expression, already present in four other articles of the original text (Article 56, Article 58, Article 122, and Article 126). 4 The original Italian formulation is: “Entro dieci giorni dalla sua formazione il Governo si presenta alle Camere per ottenerne la fiducia. Nel caso in cui non sia approvata la mozione di fiducia al Governo presieduto dal Presidente eletto, il Presidente della Repubblica rinnova l’incarico al Presidente eletto di formare il Governo. Qualora anche in quest’ultimo caso il Governo non ottenga la fiducia delle Camere, il Presidente della Repubblica procede allo scioglimento delle Camere”. 5 “Dopo essere stato formato il Governo ha dieci giorni di tempo per presentarsi alle Camere e chiedere la loro fiducia. Se la fiducia non viene concessa il Presidente della Repubblica incarica di nuovo il Presidente eletto di formare il Governo. Se anche in questo secondo caso il Governo non ottiene la fiducia il Presidente della Repubblica scioglie le Camere”. 6 More specifically, the simplified texts were drafted by trying to keep them as structurally faithful as possible to the original. At the same time the simplified version was modified at different levels: from a lexical point of view, technical or potentially ambiguous terminology was replaced by lexemes belonging to the basic vocabulary of standard Italian; from a morpho- syntactic point of view, complex formulations were reworked through structurally more linear solutions and dislocation of components. A different example is provided in the fifth paragraph of Art. 
92, which states: “The President of the Republic entrusts the elected President of the Council of Ministers with the task of forming the Government; appoints and dismisses ministers upon this proposal.” In this case, the implicitness resulting from the anaphoric use of “this” makes it grammatically acceptable for “this” to refer to the antecedent “Government” rather than to the intended referent, the “elected President of the Council of Ministers,” which is presumably the substitution intended by the Legislator.7 The following paragraph consists of a single proposition (30 words) that includes two parenthetical statements, a relative clause, and two anaphoric references (‘who’, [orders] ‘it’): In the event of the resignation of the elected President of the Council of Ministers, following parliamentary notification, he may request the dissolution of the Chambers to the President of the Republic, who orders it. The content of the paragraph could be presented through a simpler utterance (24 words): In the event of resignation, the elected President of the Council of Ministers has seven days to ask the President of the Republic to dissolve the Chambers. The failure to exercise the power to which the Legislator presumably refers is the proposal to dissolve the Chambers by the elected President of the Council of Ministers, and not the dissolution of the Chambers by the President of the Republic, as a literal interpretation of the text might suggest. This reading would even leave open the possibility that the President of the Republic might not accept the dissolution request made by the President of the Council of Ministers. 
The first sentence could be integrated and reformulated as follows, with the parenthetical statement moved forward to preserve the unity of the phrase “conferire l’incarico” (to assign the task): If the resigning President of the Council of Ministers does not exercise this power, the President of the Republic may, once during the legislative term, assign to him or another parliamentarian elected in connection with him the task of forming the Government. The solutions to be provided in cases of “death, permanent disability, or removal from office,” which present completely different scenarios and are incompatible with the prospect of a second mandate for the elected President of the Council of Ministers, should be separated and included in a separate paragraph. The phrase illustrating a permanent disability can be considered an example of under- determination [15], as it is subject to various plausible interpretations and therefore a potential source of disputes during the application phase. From a lexical standpoint, the overall examination of the amendment texts –leaving apart semantic re- definitions and specialized locutions (‘seduta comune’, ‘revoca della fiducia’, ‘informativa parlamentare’: respectively “joint session”, “revocation of trust”, “information to lawmakers”) that cannot be replaced in the legal field– reveals defects that could be avoided, such as the use of bureaucratic terms (‘avere luogo’ instead of ‘svolgersi’, both meaning ‘to take place’; ‘conferire’ instead of ‘assegnare’ or ‘attribuire’, both meaning ‘to give’) and collocations that are not part of the standard Italian language, such as the transitive use of the verb ‘importare’ in the phrase ‘importare obbligo’ (‘to involve an obligation’) (Art. 94, paragraph 4: “Il voto contrario [...] non importa obbligo di dimissioni”). 3.2. Readability Analysis based on the Prediction of Reading Times 3.2.1. 
Readability Indices
Readability indices are used to estimate the difficulty of reading a text [16]. These indices are calculated based on various linguistic elements, including the number of syllables, words, and sentences. The most popular readability index is the Flesch Reading Ease Score [17], which assigns a score between 0 (hardest) and 100 (easiest) based on the number of syllables per 100 words and the number of words per sentence. This reading ease score can be converted into a grade level, resulting in the Flesch-Kincaid Grade Level [18]. In 1972, Franchina and Vacca created an Italian adaptation of the Flesch Reading Ease Score [19]. Additionally, in 1986, another readability index was developed for the Italian language, the GulpEase Index [20].

7 We tried to preserve in the translation the ambiguity stemming from the original sentence “Il Presidente della Repubblica conferisce al Presidente del Consiglio eletto l’incarico di formare il Governo; nomina e revoca, su proposta di questo, i ministri”, where the demonstrative pronoun “di questo” has been translated with “this”.

Table 1
Indices assessing the readability of the considered articles of the Italian Constitution along with their proposed amendments. While both Flesch-Vacca and GulpEase are readability scores (such that a higher score is preferable, which is shown by the symbol ↑), the READ-IT score captures the difficulty, so a lower score is preferable in this case (↓). By ‘in force’ we indicate an article currently in force; ‘proposed reform’ refers to an article amended according to the Government's proposal; ‘simplified prop.ref.’ indicates our simplification of the proposed reform.

Articles                        Flesch-Vacca ↑   GulpEase ↑   READ-IT ↓
Art. 92 in force                43.32            49.26        0.10%
Art. 92 proposed reform         32.81            45.34        100.00%
Art. 94 in force                49.44            55.97        48.00%
Art. 94 proposed reform         35.62            47.36        95.10%
Art. 94 simplified prop.ref.    45.03            52.26        52.40%
Both indices follow a scoring scale similar to the Flesch Reading Ease Score, where higher scores indicate greater readability. More recently, a readability index specifically designed for text simplification has been devised: READ–IT [21]. This index combines traditional raw text features with lexical, morpho-syntactic and syntactic information, and allows computing the readability (namely, a difficulty) score for entire texts, and sentences therein. The Vacca Index scores computed for the texts at hand show a decrease in readability for the proposed versions of the Constitution articles with respect to the in force versions. Specifically, the readability score for Article 92 drops from 43.32 (version currently in force) to 32.81 for the newly proposed version. Likewise, the readability score for the Article 94 decreased from 49.44 in the former version to 35.62, characterizing the amended version of the article.8 The Vacca index scores computed for articles 92 and 94 are reported in Table 1. The GulpEase Index scores are consistent with the Vacca Index, showing a significant reduction in readability, for both Article 92 and 94: the scores of the former drop from 49.26 in the version in force to 45.34 in the amended version, and from 55.97 to 47.36 in the amended version for the latter article.9 Moreover, the simplified version of Article 94 introduced in Section 3.1 shows an increase in readability compared to the new version on both the Vacca Index and GulpEase, with values of 45.03 and 52.26, respectively. In Table 1 we also provide the READ-IT difficulty scores that, similar to previous indices, show that the proposed amended texts contain elements worsening the overall readability. Finally, we note that the proposed simplified text consistently received more favorable scores, showing that the readability associated to the proposed amendments can be substantially improved. 
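To make the surface-level nature of these indices concrete, the following is a minimal sketch of the GulpEase computation, using its commonly cited formulation: 89 + (300 × sentences − 10 × letters) / words. The tokenization rules (a word is a run of letters, a sentence ends at `.`, `!` or `?`) and the function name `gulpease` are illustrative assumptions; the scores in Table 1 were obtained with the authors' own tooling, so this sketch is not expected to reproduce them exactly.

```python
import re


def gulpease(text: str) -> float:
    """Compute a GulpEase-style score (higher means easier to read).

    Assumed tokenization: words are maximal runs of letters, and
    sentences are delimited by '.', '!' or '?'.
    """
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    letters = sum(len(word) for word in words)
    return 89 + (300 * len(sentences) - 10 * letters) / len(words)


if __name__ == "__main__":
    sample = "Il Governo si presenta alle Camere."
    print(round(gulpease(sample), 2))
```

Since the formula only counts letters, words, and sentences, it cannot see the anaphoric ambiguities or bureaucratic collocations discussed in Section 3.1, which is one reason to complement it with other measures.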
Figure 1 illustrates the Flesch-Vacca and GulpEase scores for all articles in the Constitution, along with the average score for each index. The plotted points collectively describe the readability of the whole Constitution, and provide a context for the scores computed for Article 92 (for which we propose a comparison between the in-force and proposed version) and Article 94, for which we additionally report the readability score of our simplified text. For the Vacca Index, both the in-force and proposed versions of Articles 92 and 94 are above the average value. Conversely, for the GulpEase Index, the in-force versions are above the average, while the proposed versions fall below it, thereby resulting in a reduced readability.

Figure 1: Flesch-Vacca Index scores (on the left) and GulpEase Index scores (on the right) for all the 139 articles in the Constitution. As regards Articles 92 and 94, points marked in orange report the values obtained for the proposed modifications, while scores for the in-force versions are marked in black. Green points report the values associated with the simplified version (only available for Article 94).

8 According to the Vacca Index, text associated with scores between 30-50 is understandable for university students, while scores between 50-60 characterize text suited for high school students.
9 According to the GulpEase Index, texts with scores below 40 are difficult to understand for high school graduates, while those with scores below 60 are difficult for people with a middle school diploma.

3.2.2. Reading Times as a Proxy for Readability
Today, eye-tracking systems make it possible to collect precise data in the form of timestamped fixations that describe, and to a good extent allow reconstructing, readers’ behavior and difficulty throughout the reading task.
On the other hand, the refinement and spread of language models make it possible to automatically perform subtle forms of linguistic analysis, such as determining the semantic coherence between a term and its surrounding context, thereby determining the predictability of words given their preceding context. To give some background on how eye-tracking works, two main eye movements are commonly identified throughout the reading task: fixations and saccades. Fixations are brief stops (with duration ranging from 50 to 1500 ms) that typically occur at each word; sometimes more stops are needed, depending on word length and difficulty. Saccades are fast movements (ranging from 10 to 100 ms) between two consecutive fixations, used to reposition the point of focus. Based on these underpinnings we explored a novel approach to assess the readability of text documents; in essence, this approach relies on the following intuition. Reading times can be employed as a proxy for different significant stages in linguistic processing. In particular, the total reading time (TRT), that is, the overall duration of eye fixations for each word, including the backward regression movements, is largely acknowledged to capture the time taken by the overall semantic integration [22]. Moreover, two partial and finer-grained measures have also been proposed: the duration of the first fixation (FFD), which allows estimating the effort underlying lexical access [23], and the number of fixations (NF), which is typically associated with the integration of words in the frame of what has been read so far [24]. In this setting, higher reading times are a function of a higher cognitive load, and signal a less readable text excerpt.
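The three word-level measures can be derived from a chronological stream of fixations, as in the minimal sketch below. The input format (a list of `(word_index, duration_ms)` pairs) and the function name `word_level_measures` are hypothetical and chosen only for illustration; real eye-tracking exports are richer and require mapping gaze coordinates to words first.

```python
def word_level_measures(fixations):
    """Aggregate chronological fixations into per-word FFD, TRT and NF.

    `fixations` is an assumed format: a list of (word_index, duration_ms)
    pairs in the order in which the fixations occurred. A word that is
    fixated again after a regression simply appears more than once.
    """
    measures = {}
    for word_index, duration_ms in fixations:
        if word_index not in measures:
            # The first time a word is fixated defines its FFD.
            measures[word_index] = {"FFD": duration_ms, "TRT": 0, "NF": 0}
        measures[word_index]["TRT"] += duration_ms  # includes regressions
        measures[word_index]["NF"] += 1
    return measures


if __name__ == "__main__":
    # Word 1 is fixated, skipped past, and then revisited (a regression).
    stream = [(0, 200), (1, 180), (2, 250), (1, 120)]
    for index, values in sorted(word_level_measures(stream).items()):
        print(index, values)
```

In this toy stream the revisited word accumulates a TRT larger than its FFD, which is the pattern the text associates with a higher integration effort.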
A model was trained and tested on eye-tracking data collected from 60 subjects reading a Regional Regulation from the Aosta Valley; it relies on a LightGBM regressor that incorporates word-related statistics known to influence sentence and word processing (such as word frequency, word length, word position within the sentence, previous word frequency, and previous word length). This model was also refined through surprisal scores, computed with a fine-tuned version of an Italian GPT-2 model [25]. This fine-tuning step was performed by exposing the language model to 2,950 Regional laws and 131 Regional regulations from the Aosta Valley Region. The LightGBM regressor is based on the gradient boosting framework, an ensemble learning technique that utilizes a pool of weak learners (decision trees). Its algorithm is characterized by a leaf-wise tree growth strategy: the tree is grown by expanding the leaf with the maximum delta loss, instead of growing it level by level in depth-wise fashion, so as to find optimal split points more quickly. A comprehensive search for optimal hyperparameters was performed using a grid search technique. The hyperparameters considered for optimization are: the maximum number of leaves in each tree; the learning rate; the number of estimators (trees) to be built, which tunes the balance between under- and over-fitting; and the maximum depth of each tree. The optimization process targeted the mean absolute error (MAE). The evaluation of different parameter combinations was performed through a 5-fold cross-validation strategy during the grid search.
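As an illustration of the word-related statistics listed above, the following sketch builds one feature row per token. The feature names, the frequency table, and the choice of 0 for the "previous word" features of the first token are assumptions made for this example, not details taken from the original system; in the described pipeline, rows of this kind (together with surprisal scores) would be passed to the regressor.

```python
def word_features(sentence_tokens, frequency):
    """Build per-token features: frequency, length, position, and the
    frequency and length of the previous token.

    `frequency` is a hypothetical lookup table mapping lowercased words
    to relative frequencies; unknown words get 0.0.
    """
    rows = []
    prev_length, prev_frequency = 0, 0.0  # assumed defaults for the first token
    for position, token in enumerate(sentence_tokens):
        token_frequency = frequency.get(token.lower(), 0.0)
        rows.append({
            "token": token,
            "frequency": token_frequency,
            "length": len(token),
            "position": position,
            "prev_frequency": prev_frequency,
            "prev_length": prev_length,
        })
        prev_length, prev_frequency = len(token), token_frequency
    return rows


if __name__ == "__main__":
    toy_frequency = {"il": 0.05, "governo": 0.001, "si": 0.02}  # made-up values
    for row in word_features(["Il", "Governo", "si", "presenta"], toy_frequency):
        print(row)
```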
From a methodological standpoint we are of course aware of the differences between the text properties of the Italian Constitution and those of a Regional Regulation,10 but since there are no available datasets that include eye-tracking data associated with the reading of the Constitution, we resorted to data originally conceived to predict the reading times associated with Regional norms from the Aosta Valley [26]. The adopted model implements an approach successfully employed for the CMCL 2021 Shared Task on Eye-Tracking Prediction [27, 28]. As mentioned, a key element in our model is the adoption of surprisal scores. We briefly recall this feature, which is illustrated in more detail in [26]. Further details on the application of the closely related metric of perplexity may be found in [29, 30, 31]. In the last few years neural language models have gained a central role in analyzing reading as well, since they are able to acquire conditional probability distributions over the lexicon that are to a good extent predictive of human processing times. Probabilistic language modeling, as a device to describe the incremental mechanisms underlying language processing, is acknowledged as helpful to account (at a high level) for basic cognitive strategies [32, 33]. Such strategies are primarily concerned with planning and handling expectations on what follows, and with evaluating how these match actual stimuli [34]. One chief assumption is that word predictability should be intended as a function of the probability of a word given the context, and that the probability of that word may work, in turn, as a main predictor of reading times [35]. In essence, the less likely the emission of a word, the higher the surprisal associated with that word, and (what counts more for our present concerns) the longer the time readers require to process it:

\[ \mathrm{effort}(t) \propto \mathrm{surprisal}(w_t) = -\log P(w_t \mid w_1, \ldots, w_{t-1}). \]
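The surprisal definition can be illustrated with a deliberately small stand-in for the language model. The sketch below swaps the fine-tuned GPT-2 model for a toy bigram model with add-one smoothing, so the context is truncated to the single preceding word; the corpus, the `<s>` start symbol, and the use of the natural logarithm are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus_sentences):
    """Count context words and bigrams over tokenized sentences."""
    context_counts, bigram_counts, vocabulary = Counter(), Counter(), set()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence  # "<s>" marks the sentence start
        vocabulary.update(tokens)
        for previous, current in zip(tokens, tokens[1:]):
            context_counts[previous] += 1
            bigram_counts[(previous, current)] += 1
    return context_counts, bigram_counts, len(vocabulary)


def surprisal(previous, current, context_counts, bigram_counts, vocab_size):
    """Return -log P(current | previous) under add-one smoothing."""
    probability = (bigram_counts[(previous, current)] + 1) / (
        context_counts[previous] + vocab_size
    )
    return -math.log(probability)


if __name__ == "__main__":
    corpus = [
        ["il", "governo", "si", "presenta"],
        ["il", "governo", "ottiene", "la", "fiducia"],
        ["il", "presidente", "scioglie", "le", "camere"],
    ]
    model = train_bigram(corpus)
    print(surprisal("il", "governo", *model))  # frequent continuation
    print(surprisal("il", "camere", *model))   # unseen continuation
```

A continuation that was seen often after its context receives a lower surprisal than an unseen one, which is the property the reading-time model exploits as a feature.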
Surprisal scores were thus plugged into our model to support the prediction of reading times by also accounting for the difficulty of predicting words.

Results
The average predicted reading times for the articles, measured in milliseconds, do not show significant differences. Narrowing the analysis to tokens without stop words, the amended version of Article 92 exhibits slightly shorter reading times, while in Article 94 the proposed amendments result in longer reading times. In Table 2 (‘average’ section: top of the table) we report the average total reading times (TRTs) associated with tokens in the original and in the amended version of Article 92, as well as in the original, amended, and simplified version of Article 94. In the ‘sum’ section (at the bottom of the same table) we display the sum of TRTs predicted for tokens in the articles. Unlike the average, this measure is not normalized by the number of tokens in the text, and thus reflects more closely the increased difficulty stemming from lengthy text sequences. As expected, the reading times predicted for the simplified version of Article 94 are slightly lower compared to the proposed reform, both when considering the whole text and when filtering stop words. In Figure 2 we provide a comparison of the predicted TRTs for Articles 92 and 94 against the rest of the Italian Constitution: in particular we report the average predicted TRTs and the sum of predicted TRTs for the tokens in each article of the Constitution. Articles 92 and 94 (both the in-force and proposed versions) show a lower average TRT compared to the average over the entire Constitution. Additionally, the sum of TRTs (reported in the right sub-figure of Figure 2) indicates that, while the in-force versions of these two articles align with the rest of the Constitution, the proposed versions are significantly longer, ranking among the longest articles in the Constitution.
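The difference between the two aggregations (average versus sum, each with and without stop words) can be sketched as follows. The stop-word list and the TRT values are invented for illustration, and the use of the population standard deviation is an assumption, since the text does not state which variant was reported.

```python
from statistics import mean, pstdev

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"il", "la", "le", "di", "del", "della", "alle", "si", "per"}


def summarize_trt(tokens, predicted_trts, stop_words=STOP_WORDS):
    """Aggregate per-token predicted TRTs (ms) over the whole text and
    over the text with stop words filtered out."""
    filtered = [
        trt for token, trt in zip(tokens, predicted_trts)
        if token.lower() not in stop_words
    ]

    def stats(values):
        return {"average": mean(values), "std": pstdev(values), "sum": sum(values)}

    return {"whole": stats(predicted_trts), "filtered": stats(filtered)}


if __name__ == "__main__":
    tokens = ["Il", "Governo", "si", "presenta", "alle", "Camere"]
    trts = [100.0, 400.0, 120.0, 380.0, 110.0, 390.0]  # made-up predictions
    print(summarize_trt(tokens, trts))
```

Two texts with similar per-token averages can have very different sums when one is much longer, which is why the sum is better suited to capturing the cost of lengthy articles.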
10 The language of the Constitution tends to be more formal, precise, and abstract, and operationalizes broad principles and fundamental rights such as ‘freedom’ and ‘democracy’. Regional Legislation, on the other hand, is more specific and practical, dealing with concrete issues and administrative matters that are the fields in which regions produce norms and regulations. These may use more technical jargon relevant to specific sectors like health, environment, transportation, or education.

Table 2
Predicted reading times for Articles 92 and 94 (‘in force’), and their amended text (‘proposed reform’ rows); we also report the TRT predicted for the simplified version of Art. 94, ‘simplified prop.ref.’. TRTs associated with the whole text, with no filtering, and with the text after stop-word filtering are reported. In the top subtable (‘average’) the average TRTs (complemented by their standard deviations) are reported; the bottom subtable illustrates figures computed as the sum of all TRTs predicted for the tokens in the considered article.

average                     TRT - whole text    TRT - filtered text
92 in force                 260.01 (146.8)      395.14 (45.09)
92 proposed reform          258.32 (156.26)     381.22 (102.75)
94 in force                 257.24 (133.03)     351.14 (71.85)
94 proposed reform          261.63 (141.1)      360.19 (87.91)
94 simplified prop.ref.     254.89 (141.37)     357.44 (82.82)

sum                         TRT - whole text    TRT - filtered text
92 in force                 9,620.55            7,122.48
92 proposed reform          41,589.73           32,022.29
94 in force                 22,380.21           18,259.32
94 proposed reform          62,528.91           50,427.21
94 simplified prop.ref.     55,566.57           44,322.79

Let us inspect more closely the TRTs predicted for Article 94 at the sentence level, reported in Table 3. These are also computed as averaged figures and as the sum of TRTs of the tokens in the considered sentence, as formerly described.
All newly proposed sentences (namely sentences number 4, 5, 8, 9, and 10) are characterized by TRTs higher than the mean TRT of the version of Article 94 presently in force, that is 257.24 ms (please refer to Table 2). The Pearson correlation index between the sum of TRTs predicted for the three versions of Article 94 and the READ-IT scores amounts to 0.69 (𝑝 < 0.2), 0.60 (𝑝 < 0.068), and 0.62 (𝑝 < 0.057) for the current version, its proposed amendment, and the simplified rewrite of the amendment, respectively. This result indicates that reading time predictions are able to capture what has been described at the linguistic and legal level, and through the indices surveyed in Table 1.

Figure 2: Average total reading times (on the left) and sum of total reading times (on the right) for all the 139 articles in the Constitution. As regards Articles 92 and 94, points marked in orange report the values obtained for the proposed modifications, while scores for the in-force versions are marked in black. Green points report the values associated with the simplified version (only available for Article 94).

Table 3
Predicted total reading times (TRTs) for Article 94. Reported results refer to values computed at the sentence level for the original formulation, the proposed modifications, and the simplified version: ‘in force’, ‘proposed reform’, and ‘simplified prop.ref.’ columns, respectively. Dashes in the ‘in force’ column indicate that the specific sentence is not currently present in the Italian Constitution (that is, a new sentence was added in the proposed reform); dashes in the rightmost columns indicate that no TRT was computed for the text of the corresponding sentence, since it was left unaltered in the proposed reform.

Art. 94       in force              proposed reform        simplified prop.ref.
Sent. num.    average (sum)         average (sum)          average (sum)
1             274.03 (2,192.22)     –                      –
2             309.75 (4,646.2)      –                      –
3             259.95 (4,159.26)     –                      –
4             –                     257.97 (7,739.01)      258.06 (5,161.16)
5             –                     271.39 (5,970.62)      243.78 (4,631.85)
6             245.29 (4,660.5)      –                      –
7             231.79 (6,722.03)     –                      –
8             –                     268.35 (5,635.35)      294.34 (6,769.90)
9             –                     270.63 (7,848.38)      242.82 (5,827.80)
10            –                     257.82 (12,890.76)     244.36 (10,018.65)

To refine our analysis at the word level, we manually annotated the words identified as difficult, poorly readable, or demanding in the proposed reform sentences, and calculated their average predicted TRTs.11 Tokens from Article 92 annotated as difficult are associated with a predicted TRT that on average reaches 412.15 ms (235.25 ms for tokens not marked as difficult), while for Article 94 we found that the average TRT is 391.80 ms (and 240.77 ms for tokens not marked as difficult). Moreover, we partitioned the words in the amended versions into two groups: those whose predicted TRT is above the mean (258.32 ms for Article 92, and 261.32 for Article 94) and those below the mean. We then calculated the correlation with the manually annotated words, finding a Pearson correlation of 0.27 (Article 92) and 0.30 (Article 94), with significance of 𝑝 < 0.0006 and 𝑝 < 0.000003 respectively. Both trials reveal a reasonable fit between the human annotation and the predicted TRTs, thus corroborating the proposed approach as consistent with human annotation. Also, our initial assumption that longer reading times signal an increased cognitive load seems to be confirmed, based on the correlation both with human introspection on what counts as readable (or not), and with more traditional indices.

3.3. GPT-based Analysis
We prompted Chat-GPT4o to analyze and compare the readability of the articles in force and their proposed amendments. Specifically, the following prompt was used (here reported in English, but originally used in Italian):12

I will send you two versions of an article from the Constitution. I would like to know if one is more complicated than the other or not, adding which parts you find more complex and why.
Focus on linguistic complexity and not on meaning. Present your comments point by point. Version 1:
Version 2:
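As an illustration of how the two versions of an article might be slotted into the prompt above, the sketch below rebuilds the English wording reported in the text as a template. The function name `build_prompt` and the placeholder article texts are hypothetical, and the original prompt was issued in Italian, so this only mirrors its structure.

```python
# English rendering of the prompt, with two slots for the article versions.
PROMPT_TEMPLATE = (
    "I will send you two versions of an article from the Constitution. "
    "I would like to know if one is more complicated than the other or not, "
    "adding which parts you find more complex and why. "
    "Focus on linguistic complexity and not on meaning. "
    "Present your comments point by point.\n"
    "Version 1: {version_1}\n"
    "Version 2: {version_2}"
)


def build_prompt(in_force: str, proposed: str) -> str:
    """Fill the template with the in-force text and the proposed amendment."""
    return PROMPT_TEMPLATE.format(
        version_1=in_force.strip(),
        version_2=proposed.strip(),
    )


# Placeholder strings stand in for the actual article texts.
prompt = build_prompt(
    "<text of the article in force>",
    "<text of the proposed amendment>",
)
```

Keeping the wording fixed and varying only the two slots makes the comparisons for Articles 92 and 94 differ in the article text alone.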
Regarding Article 92, GPT's answers are in line with our linguistic analysis. GPT remarks that the proposed amendment has longer and more intricate sentences, packed with information, redundancies and technical language, which make it difficult to follow. These evaluations are consistent with the results obtained by applying the Flesch-Vacca, GulpEase and READ-IT indices to the sentences of Article 92 (in both its old and amended versions). Such scores are reported in Table 4: we note that the amended sentence number 6, corresponding to sentence 2 in the version currently in force, received reduced readability scores with respect to its former wording. We observe that the newly added paragraphs in the amended version are characterized by reduced readability (or, equivalently, increased reading times), on average, over all considered indices.

11 A single annotation was collected, performed by an expert linguist.
12 The transcript of the full interaction with GPT is available at the URL https://github.com/mensae/costituzione-analysis/.

Table 4
Comparison of the Flesch-Vacca, GulpEase, and READ-IT indices, along with the sum of TRTs, for the six sentences of Article 92. Reported figures characterize the version currently in force (only containing two sentences, 1 and 6) and the proposed amendments, where the first sentence is kept unaltered, sentences 2 to 5 are newly introduced, and sentence 6 modifies the second sentence of the original formulation. The symbol ↑ indicates that a higher score is preferable, while ↓ indicates that a lower score is preferable.

              Flesch-Vacca ↑         GulpEase ↑             READ-IT ↓              TRT
Sent. num.    in force   proposed    in force   proposed    in force   proposed    in force    proposed
1             39.75      39.75       46.5       46.5        77.90%     77.90%      5,094.65    5,094.65
2             –          30.09       –          47.81       –          47.8%       –           10,111.83
3             –          53          –          56.89       –          0.30%       –           3,240.81
4             –          6.35        –          39.93       –          100.00%     –           13,033.98
5             –          52.33       –          57.67       –          98.30%      –           3,510.4
6             53.58      35.4        54.26      47.4        3.20%      97.40%      4,525.89    6,598.06
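For reference, the GulpEase index reported in Table 4 follows a simple closed formula, 89 + (300 · sentences − 10 · letters) / words, where higher values indicate easier text. The sketch below is only an illustration and not the tool used to produce the reported scores: the function name `gulpease`, the regex-based word extraction, and the naive sentence splitting are simplifying assumptions, so its output may differ from the figures in the table.

```python
import re


def gulpease(text: str) -> float:
    """GulpEase index: 89 + (300 * sentences - 10 * letters) / words."""
    # Assumption: a word is a maximal run of letters, so "dell'Italia" counts as two words.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        raise ValueError("text contains no words")
    n_letters = sum(len(w) for w in words)
    # Assumption: sentences end at '.', '!' or '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = max(1, len(sentences))
    return 89 + (300 * n_sentences - 10 * n_letters) / len(words)


# Opening sentence of Article 94, used here only as sample input.
sample = "Il Governo deve avere la fiducia delle due Camere."
score = gulpease(sample)
```

Since the formula relies only on counts of letters, words, and sentences, it can be applied to single sentences, although it cannot capture the lexical or syntactic factors that READ-IT and the predicted TRTs are meant to reflect.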
Similarly, for Article 94 we note that the use of domain-specific language, difficult for non-expert readers, is highlighted also for the sentences in the version currently in force. In this case, too, the comments collected by prompting GPT single out the overall amount of information and the length and complexity of the sentences in the proposed paragraphs as possibly confusing factors that negatively affect the readability of the text. These notes are corroborated by the indices reported in Tables 1, 2, and 3. Finally, also consistently with previous results, GPT highlighted how our simplified version of the proposed amendments makes use of more direct and common language, employs shorter sentences and fewer subordinate and parenthetical clauses, and exhibits less redundancy and repetition.

4. Conclusions

Recently, there has been a growing interest in the Italian legal field among researchers in computational linguistics, as demonstrated by the works in [36] and [37]. Our work contributes to this field from a different angle. We presented a multi-layered analysis of the Constitutional Reform Bill. This analysis integrates legal and linguistic perspectives, traditional readability indices, and a novel approach employing predictive methods for reading times. From the viewpoint of linguistic and legal experts, the proposed amendments exhibit defects in the quality of the writing: numerous issues arise from both syntactic and semantic perspectives. These aspects contribute to a text that is not only more challenging to follow and comprehend than the original, but also susceptible to multiple plausible interpretations, potentially becoming a source of disputes. These observations are substantiated through readability analyses, which also confirm improved readability in our simplified version of the amendment.
Furthermore, correlations derived from our analysis using TRTs and READ-IT indices suggest that predictions of reading times capture the linguistic and legal complexities described. Like READ-IT, our method may be computed at the sentence level. In addition, our system may be trained with eye-tracking data from different sorts of readers, and the employed language model may be fine-tuned on various kinds of text, thus targeting a more flexible notion of readability, associated with a specific group of readers and with a specific kind of text. More specifically, provided that eye-tracking data on texts as close as possible to those of interest are available, the proposed analytical approach may be extended to novel domains and different sorts of text (such as, e.g., textbooks, newspapers, code, assembly instructions for general manufactured items, and so forth). Likewise, since our approach does not depend on specific (possibly arbitrary) parameters, it may be employed to predict the reading times of specific target groups such as, e.g., children, domain experts, and laypeople in domain-specific settings. Along this axis, again, it would be necessary to record reading times from such profiled reader groups.

Finally, a brief examination using a modern Large Language Model, GPT-4o, aligns with the previous findings: it identifies complexities in the proposed amendments, highlighting both general problems, like text length and intricate text structure (e.g., due to overuse of subordinate clauses), and the adoption of specialist jargon, in accordance with the analysis offered by jurists and linguists. These outcomes indicate that future research can benefit from the automated and combined use of LLMs alongside more specialized tools to identify critical components within texts.

References

[1] G. Bonetta, C. D. Hromei, L. Siciliani, M. A.
Stranisci, Preface to the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI), in: Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with 23th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024), 2024.
[2] G. Busia, Il percorso di elaborazione del testo costituzionale, Bologna: Il Mulino, 1998, pp. 129–164.
[3] OpenAI, J. Achiam, S. A. et al., Gpt-4 technical report, 2024. URL: https://arxiv.org/abs/2303.08774. arXiv:2303.08774.
[4] OpenAI, Hello GPT4o Web page, 2024. URL: https://openai.com/index/hello-gpt-4o/.
[5] T. De Mauro, Storia linguistica dell’Italia unita, Biblioteca di cultura moderna, Laterza, 1963. URL: https://books.google.it/books?id=1l0mAAAAMAAJ.
[6] B. M. Garavelli, L’italiano della repubblica: caratteri linguistici della costituzione, in: V. Coletti (Ed.), L’italiano dalla nazione allo Stato, Le Lettere, Firenze, 2011, p. 211.
[7] L. Cignetti, Sfondi e rilievi testuali nella costituzione della repubblica italiana, in: Le (Ed.), Rilievi, 2005.
[8] M. A. Cortelazzo, Un elogio linguistico [della costituzione], LID’O. Lingua italiana d’oggi VI (2009) 43–52.
[9] T. De Mauro, Il linguaggio della costituzione, Lid’O: lingua italiana d’oggi: VI, 2009 (2009) 31–42.
[10] E. Corino, La costituzione italiana è ancora un testo facile?, in: A. Ferrari, L. Lala, F. Pecorari (Eds.), L’italiano dei testi costituzionali, Edizioni dell’Orso, Alessandria, 2022, pp. 293–318.
[11] E. Leso, 27 dicembre 1947: Lingua della costituzione e lingua di tutti, in: F. Bambi (Ed.), Un secolo per la Costituzione (1848-1948). Concetti e parole nello svolgersi del lessico costituzionale italiano, Accademia della Crusca, Firenze, 2012, pp. 277–290.
[12] G. Rovere, Annotazioni metodologiche sulla comprensibilità del lessico costituzionale italiano, in: A. Ferrari, L. Lala, F. Pecorari (Eds.), L’italiano dei testi costituzionali, Edizioni dell’Orso, Alessandria, 2022, pp. 271–292.
[13] M. E. Piemontese, M. Piemontese, et al., Capire e farsi capire. teorie e tecniche della scrittura controllata (1996).
[14] Accademia della Crusca in collaborazione con il CLIEO e l’ITTIG, Guida alla redazione degli atti amministrativi, CLIEO e ITTIG, Firenze, 2011. URL: https://www.ittig.cnr.it/Ricerca/Testi/GuidaAttiAmministrativi.pdf, documento online.
[15] L. Revelli, A. Mastropaolo, R. D. Paolo, et al., La sottodeterminazione nei testi giuridici: verso un’analisi linguistico-computazionale, in: Fare linguistica applicata con le digital humanities, volume 14, Officinaventuno, 2022, pp. 131–144.
[16] A. Siddharthan, Syntactic simplification and text cohesion, Research on Language and Computation 4 (2006) 77–109. URL: http://dx.doi.org/10.1007/s11168-006-9011-1. doi:10.1007/s11168-006-9011-1.
[17] R. F. Flesch, A new readability yardstick, The Journal of applied psychology 32 3 (1948) 221–33. URL: https://api.semanticscholar.org/CorpusID:39344661.
[18] R. Flesch, Marks of readable style; a study in adult education, Teachers College Contributions to Education (1943).
[19] V. Franchina, R. Vacca, Adaptation of Flesh readability index on a bilingual text written by the same author both in Italian and English languages, Linguaggi 3 (1986) 47–49.
[20] P. Lucisano, M. E. Piemontese, et al., GulpEase: una formula per la predizione della leggibilità di testi in lingua italiana, Scuola e città (1988) 110–124.
[21] F. Dell’Orletta, S. Montemagni, G. Venturi, READ–IT: Assessing readability of Italian texts with a view to text simplification, in: Proceedings of the second workshop on speech and language processing for assistive technologies, 2011, pp. 73–83.
[22] R. Radach, A. Kennedy, Eye movements in reading: Some theoretical context, The Quarterly journal of experimental psychology 66 (2013) 429–452.
[23] M. J. Hofmann, S. Remus, C. Biemann, R. Radach, L. Kuchinke, Language models explain word reading times better than empirical predictability, Frontiers in Artificial Intelligence 4 (2022) 730570.
[24] L. Frazier, K. Rayner, Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences, Cognitive psychology 14 (1982) 178–210.
[25] W. de Vries, M. Nissim, As Good as New. How to Successfully Recycle English GPT-2 to Make Models for Other Languages, in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, 2021. URL: http://dx.doi.org/10.18653/v1/2021.findings-acl.74. doi:10.18653/v1/2021.findings-acl.74.
[26] C. J. Scozzaro, D. Colla, M. Delsanto, A. Mastropaolo, E. Mensa, L. Revelli, D. P. Radicioni, et al., Legal text reader profiling: Evidences from eye tracking and surprisal based analysis, in: Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context@ LREC-COLING 2024, ELRA and ICCL, 2024, pp. 114–124.
[27] N. Hollenstein, E. Chersoni, C. L. Jacobs, Y. Oseki, L. Prévot, E. Santus, CMCL 2021 shared task on eye-tracking prediction, in: E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, Association for Computational Linguistics, Online, 2021, pp. 72–78. URL: https://aclanthology.org/2021.cmcl-1.7. doi:10.18653/v1/2021.cmcl-1.7.
[28] Y. Bestgen, LAST at CMCL 2021 shared task: Predicting gaze data during reading with a gradient boosting decision tree approach, in: E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, Association for Computational Linguistics, Online, 2021, pp. 90–96. URL: https://aclanthology.org/2021.cmcl-1.10. doi:10.18653/v1/2021.cmcl-1.10.
[29] D. Colla, M. Delsanto, M. Agosto, B. Vitiello, D. P. Radicioni, Semantic coherence markers: The contribution of perplexity metrics, Artificial Intelligence in Medicine 134 (2022) 102393.
[30] D. Colla, M. Delsanto, D. P. Radicioni, Semantic coherence dataset: Speech transcripts, Data in Brief 46 (2023) 108799.
[31] F. Sigona, D. P. Radicioni, B. G. Fivela, D. Colla, M. Delsanto, E. Mensa, A. Bolioli, P. Vigorelli, A computational analysis of transcribed speech of people living with dementia: The anchise 2022 corpus, Computer Speech & Language 89 (2025) 101691.
[32] E. G. Wilcox, J. Gauthier, J. Hu, P. Qian, R. Levy, On the predictive power of neural language models for human real-time comprehension behavior, CoRR abs/2006.01912 (2020). URL: http://dblp.uni-trier.de/db/journals/corr/corr2006.html#abs-2006-01912.
[33] E. G. Wilcox, T. Pimentel, C. Meister, R. Cotterell, R. P. Levy, Testing the predictions of surprisal theory in 11 languages, Transactions of the Association for Computational Linguistics 11 (2023) 1451–1470.
[34] R. Levy, Expectation-based syntactic comprehension, Cognition 106 (2008) 1126–1177.
[35] I. F. Monsalve, S. L. Frank, G. Vigliocco, Lexical surprisal as a general predictor of reading time, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 398–408.
[36] D. Licari, G. Comandè, ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law, in: D. Symeonidou, R. Yu, D. Ceolin, M. Poveda-Villalón, D. Audrito, L. D. Caro, F. Grasso, R. Nai, E. Sulis, F. J. Ekaputra, O. Kutz, N. Troquard (Eds.), Companion Proceedings of the 23rd International Conference on Knowledge Engineering and Knowledge Management, volume 3256 of CEUR Workshop Proceedings, CEUR, Bozen-Bolzano, Italy, 2022. URL: https://ceur-ws.org/Vol-3256/#km4law3, ISSN: 1613-0073.
[37] S. Auriemma, M. Madeddu, M. Miliani, A. Bondielli, L. C. Passaro, A. Lenci, BureauBERTo: adapting UmBERTo to the Italian bureaucratic language, in: F. Falchi, F. Giannotti, A. Monreale, C. Boldrini, S. Rinzivillo, S. Colantonio (Eds.), Proceedings of the Italia Intelligenza Artificiale - Thematic Workshops co-located with the 3rd CINI National Lab AIIS Conference on Artificial Intelligence (Ital IA 2023), volume 3486 of CEUR Workshop Proceedings, CEUR-WS.org, Pisa, Italy, 2023, pp. 240–248. URL: https://ceur-ws.org/Vol-3486/42.pdf.