

AMELIA - Argument Mining Evaluation on Legal documents in ItAlian: A CALAMITA Challenge

Giulia Grundler

giulia.grundler2@unibo.it 0 1 4

Andrea Galassi

a.galassi@unibo.it 1 2 4

Piera Santin

1 3 4

Alessia Fidelangeli

0 1 4

Federico Galli

0 1 4

Elena Palmieri

1 2 4

Francesca Lagioia

0 1 3 4

Giovanni Sartor

0 1 3 4

Paolo Torroni

1 2 4

0 CIRSFID Alma-AI, Faculty of Law, University of Bologna, Italy
1 CLiC-it 2024: Tenth Italian Conference on Computational Linguistics
2 DISI, Alma-AI, University of Bologna, Italy
3 European University Institute, Law Department, Italy
4 Keywords: LLM, Argument Mining, Legal Analytics, VAT, CALAMITA, CLiC-it

This challenge consists of three classification tasks, in the context of argument mining in the legal domain. The tasks are based on a dataset of 225 Italian decisions on Value Added Tax, annotated to identify and categorize argumentative text. The objective of the first task is to classify each argumentative component as premise or conclusion, while the second and third tasks aim at classifying the type of premise: legal vs factual, and its corresponding argumentation scheme. The classes are highly unbalanced, hence evaluation is based on the macro F1 score.


1. Challenge: Introduction and Motivation

Whether Large Language Models (LLMs) are capable of reasoning, as opposed to simply recognizing patterns from vast amounts of data, is an open research question and the subject of a lively ongoing debate [1]. A way to describe human reasoning is through its ability to understand, evaluate, and invent arguments composed of claims, evidence, and conclusions meaningfully connected with one another [2]. For this reason, the ability to recognize arguments could be considered a first step in a sequence of reasoning tasks of increasing complexity, which goes from the detection and classification of argumentative discourse units or argument components, through argument structure prediction, reconstruction, and evaluation, down to argument generation. Automating these tasks is the object of argument mining [3, 4, 5]. We believe that gauging the ability of LLMs to address even basic argument mining tasks would provide meaningful cues as to these models’ ability to process and understand logical relations expressed in natural language.

0000-0002-7255-9343 (G. Grundler); 0000-0001-9711-7042 (A. Galassi); 0000-0002-0734-9657 (P. Santin); 0000-0003-3739-5387 (F. Galli); 0000-0001-5176-8843 (E. Palmieri); 0000-0001-7083-3487 (F. Lagioia); 0000-0003-2210-0398 (G. Sartor); 0000-0002-9253-8638 (P. Torroni)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

While several datasets for argument mining in English have been developed over the last decade [6, 7, 8, 9, 10], resources for other languages remain scarce. To the best of our knowledge, only a few works exist for Italian. In [11], the authors use the CorEA corpus of user comments to assign a relation (support or attack) to pairs of arguments. In [12], the authors propose a new model for stance detection, trained and evaluated on a corpus of Italian tweets in which users discuss a highly polarized political debate.

Among the many domains of interest for argument mining, our focus is on the legal domain, where argumentation is fundamental for the decision-making process. Legal reasoning relies heavily on well-structured arguments, as legal professionals must construct and deconstruct arguments within formal documents, providing a challenging setting for assessing an LLM’s ability to engage in complex reasoning tasks. Despite its relevance, little attention has been given to argument mining in the legal domain in Italian. Most existing work in legal NLP for Italian has focused on tasks such as law article retrieval [13, 14], outcome prediction [15], analysis of contracts [16, 17], and summarization [18, 19].

Our challenge for CALAMITA [20] consists of three classification tasks over argumentative texts. We mostly follow the setting used in Demosthenes [21, 22], a corpus for argument mining on legal documents in English. Since we leverage real legal documents, not synthetic or artificially constructed case studies, our dataset reflects the real complexity and nuances of legal argumentation. It is therefore particularly relevant for a robust assessment of LLMs’ abilities in real-world applications. To the best of our knowledge, we are the first to propose a challenge of argument mining over legal documents in Italian.

The challenge requires understanding not only the Italian language but also domain-specific technical language.

Such language uses complex syntactic structures and specialized terminology. Besides language, the challenge tests LLMs’ ability to recognize and interpret legal arguments by recognizing typical argumentation schemes [23], i.e., patterns of reasoning used in human discourse, which offer a principled approach to argument analysis and evaluation. Identifying schemes is challenging as there are many possible schemes, and arguments are often only partially laid out in the text, leaving many important parts implicit for brevity or because they are considered common knowledge. Nonetheless, this task lends itself to generalization beyond the legal domain, making the insights transferable to other fields where structured reasoning plays a critical role.

2. Challenge: Description

We consider an argument as a set of interconnected portions of texts called argument components. The connections between components form a specific pattern of relationships that represents a reasoning paradigm.

The following tasks presume that argument components have already been identified from the source documents. Argument components can therefore be classified according to their role in the connections (such as Premises or Conclusions), according to their content (such as Legal or Factual), and according to the relationship pattern they contribute to (the Argumentative Scheme).

This challenge proposes three classification tasks, in the context of argument mining in the legal domain:

• Argument Component classification: given an argumentative component, classify it as premise or conclusion.
• Premise Type classification: given a premise, classify it as factual or legal.
• Argument Scheme classification: given a predetermined set of argument schemes, classify a legal premise as belonging to one or more such schemes.

The following paragraphs contain a definition of each class, along with an example extracted from the dataset. The translated version of the examples is available in Appendix A.

Argument Component classification. Binary classification: given an argumentative component, classify it as premise or conclusion.

• Argument premise: a proposition that provides a reason or support for the argument.
Example: Si osserva poi che ritenere che la mancata possibilità di detrazione a favore di soggetti come il ricorrente comporti un aiuto di Stato in favore degli ospedali pubblici, in quanto le perdite degli stessi vengono ripianate dalle USL e dalla Regioni trascura di considerare l’accessibilità, indiscriminata, ai servizi dei nosocomi pubblici da parte dei soggetti iscritti al SSN, rispetto a quella ad un libero professionista sanitario che, in quanto tale, ben potrebbe rifiutarsi di prestare i propri servigi al pare di un normale contraente.
• Argument conclusion: the statement that follows logically from the premise(s) and represents the final point being argued for.
Example: Dunque, l’ufficio ha riconosciuto la non imponibilità IVA delle cessioni all’esportazione, così cessando sul punto la materia del contendere.

Argument components can be involved in more than one relationship, therefore a component may be the conclusion of other premises, as well as a premise of other arguments. In that case, the component is to be classified as a premise.

Premise Type classification. Multi-label classification: classify an argumentative premise as factual or legal (or both).

• Factual premise: a premise that describes factual situations and events, pertaining to the substance or the procedure of the case.
Example: Indubbiamente, la contribuente ha impugnato la sentenza di prime cure, rappresentando nuovamente di non aver potuto proporre appello avverso la pronuncia di condanna di primo grado, per causa di forza maggiore.
• Legal premise: a premise that specifies the legal content (legal rules, precedents, interpretation of applicable laws and principles).
Example: La giurisprudenza citata, alla motivazione della quale si fa rinvio, ha tra l’altro preso posizione espressamente e positivamente sulla conformità della normativa italiana rispetto a quella dell’Unione Europea, risultando così confutata anche la doglianza della difesa sul punto che ha chiesto la sospensione del procedimento, con investitura della Corte di Giustizia Europea della questione.

Since a premise could be both factual and legal, this task is framed as multi-label binary classification.
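To make the multi-label binary framing concrete, the following is a minimal sketch of how a Premise Type annotation could be turned into a two-position indicator vector and back. The function names and the fixed label order (F, L) are assumptions made for illustration; they are not part of the challenge definition.

```python
# Label order is an assumption of this sketch, not prescribed by the challenge.
PREMISE_TYPES = ("F", "L")


def encode_premise_type(labels):
    """Map a label list such as ['F', 'L'] to a binary vector ordered as (F, L)."""
    unknown = set(labels) - set(PREMISE_TYPES)
    if unknown:
        raise ValueError(f"unknown premise type(s): {sorted(unknown)}")
    return [int(t in labels) for t in PREMISE_TYPES]


def decode_premise_type(vector):
    """Inverse mapping: keep the labels whose position is set to 1."""
    return [t for t, bit in zip(PREMISE_TYPES, vector) if bit]


print(encode_premise_type(["L"]))       # [0, 1]
print(encode_premise_type(["F", "L"]))  # [1, 1]
print(decode_premise_type([1, 0]))      # ['F']
```

Each position of the vector is an independent binary decision, which is why a premise can be factual, legal, or both.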

Argument Scheme classification. Legal premises determine the nature of the legal reasoning they support, hence they are labeled with the corresponding reasoning pattern, called argument scheme. We define five schemes relevant for tax law. Each legal premise may be assigned multiple schemes, therefore we frame this task as multilabel multi-class classification.

Given a legal premise, classify it as belonging to one or more of the following schemes: (established) rule, precedent, classification, interpretative, or principle.

• Rule (or established rule) scheme: it is used whenever an explicit reference to codified law is present. This can be a reference to a certain article or a quotation of the text of a certain article.
Example: Infatti, è ben vero che, ai sensi del combinato disposto dagli articoli 54 e 23 D.Lgs. n. 546/1992, il convenuto in appello deve costituirsi entro 60 giorni dal giorno in cui ricorso è stato notificato.
• Precedent scheme: it is used whenever there is an explicit reference to a previous decision. In the dataset we considered only references to decisions of either the Court of Cassation or the European Court of Justice.
Example: L’Amministrazione “ha l’onere di provare ed allegare gli elementi probatori su cui si fondi la contestazione, tra i quali possono rilevare, in via indiziaria, quali elementi sintomatici della mancata esecuzione della prestazione dal fatturante, l’assenza della minima dotazione personale e strumentale, l’immediatezza dei rapporti (cedente/prestatore fatturante interposto e cessionario/committente), una conclamata inidoneità allo svolgimento dell’attività economica e la non corrispondenza tra i cedenti e la società coinvolta nell’operazione”.
• Classification scheme: it is used whenever a legal concept is defined, its properties are listed, and a certain fact or legal deed must be qualified as having those properties.
Example: In conclusione, per quanto fin qui esposto, i “compro oro” possono essere definiti come “esercizi commerciali che acquistano, commerciano o rivendono oggetti d’oro, di metalli preziosi o recanti pietre preziose usati e li cedono nella forma di materiale, di rottami d’oro o di metalli preziosi alle fonderie o ad altre aziende specializzate nel recupero di materiali preziosi”. Trattano esclusivamente prodotti finiti e non possono, congiuntamente, acquistare oro da gioielleria usato, fonderlo (per proprio conto o con incarico a terzi) e cedere il prodotto finito ottenuto.
• Interpretative scheme: it is used whenever the Court expresses new interpretative assertions (that may depend on previous case law), thereby creating new precedents.
Example: Si vuole dire, in sostanza, che la finalità del contraddittorio anticipato è quella di mettere il contribuente nella condizione di potere fare valere le proprie osservazioni prima che la decisione sia adottata e, quindi, di far sì che l’Amministrazione possa tener conto di tutti gli elementi del caso nell’adottare (o non adottare) il provvedimento ovvero nel dare a questo un contenuto piuttosto che un altro.
• Principle scheme: it is used whenever the Court explicitly refers to a principle of law (e.g. the Principle of proportionality).
Example: Nell’ordinamento unionale, pertanto, il principio del contraddittorio in ambito tributario prescinde dalla natura del tributo e deve trovare applicazione ogni qualvolta l’amministrazione sulla base della documentazione esibita ritenga dovere dare alla stessa documentazione interpretazione diversa da quella data dal contribuente invitandolo, come detto, a fornire nel corso del contraddittorio le ragioni della propria scelta.

3. Data description

3.1. Origin of data

The data consist of argumentative portions of text extracted from 225 Italian decisions on Value Added Tax (VAT) by the Regional Tax Commissions from various judicial districts. The decisions were downloaded partially from the open Giustizia Tributaria database1 and from other judicial databases accessed through university licensing agreements. The decisions range from 2010 to 2022 and concern taxable transactions, exemptions, out-of-scope transactions, and the right to obtain a deduction.

The argumentative components were extracted from the sections “Motivi della decisione”, “Diritto” or “Fatto e diritto”, depending on the format of each decision.

The collected data were anonymised by modifying any identification data of natural or legal persons involved in the proceedings. In particular, the names of the parties in the proceedings and, to provide the highest privacy standards, also the names of the companies have been replaced with initials (e.g., “Mario Rossi” becomes “MR”, “Company s.r.l.” becomes “C s.r.l.”). The names of the judges composing the judicial panel have been replaced by “giu1, giu2, [...] giuN”. Also, addresses and places were replaced with ‘XXX’, and dates were changed to show only the year in the following format: DD/MM/2015.

1 Tax Justice database accessible at: https://www.giustizia-tributaria.it/.

3.2. Annotation details

The dataset was annotated by four tax law experts. Annotation guidelines are largely based on our previous work on the Demosthenes corpus [21], a dataset of English documents from the Court of Justice of the European Union. The guidelines were adapted to the Italian decisions, and refined through an iterative process of validation and discussion, to solve conflicts between annotators. In particular, the annotation is based on the same classes used in Demosthenes. However, the structure of the decisions is different: while in the English corpus the annotation is done at the sentence level, it is not always possible to meet this criterion in Italian decisions. Therefore, the constraint has been relaxed, allowing a single annotation to cover multiple sentences and a single sentence to contain multiple annotations. The tagged decisions are available in our GitHub repository.2

2 https://github.com/adele-project/AMELIA/

3.3. Data format

Data are available as a Hugging Face Dataset,3 divided into three splits: train, val and test. Each row represents an argumentative component, with the following columns:

• Text: the text of the component
• Document: the document it belongs to
• Component: whether it is a premise (prem) or a conclusion (conc)
• Type: a list value representing the type of a premise; the list contains F for a Factual premise and L for a Legal one.
• Scheme: a list value representing the argumentative schemes of a legal premise. The values are: Rule, Prec, Class, Itpr and Princ.
• Chain_id: unique within each document, it specifies the argumentative chain the component belongs to (e.g. A1, A2,..., B1, B2,...)
• Id: a unique numerical id

3 https://huggingface.co/datasets/nlp-unibo/AMELIA

3.4. Example of prompts used for zero and few shots

For each task, we propose both a zero-shot and a few-shot prompt. For the few-shot version, we have selected some particularly representative examples from the training set, some of which are included in Section 2. Here we report the zero-shot version. The translation of the zero-shot prompts is available in Appendix B. The few-shot version is available in Appendix C.

Argument Component classification: given an argumentative text, classify it as premise or conclusion.
Prompt: “Classifica il seguente testo argomentativo come premessa ‘prem’ o conclusione ‘conc’. Per premessa (prem) si intende una proposizione che fornisce una ragione o un supporto per l’argomentazione. Per conclusione (conc) si intende l’affermazione che segue logicamente dalle premesse e rappresenta il punto finale che viene argomentato. Testo: ”

Premise Type classification: given a premise, classify it as factual, legal or both.
Prompt: “Classifica la seguente premessa come di fatto ‘F’, legale ‘L’ o entrambe. Le premesse di fatto (F) descrivono situazioni ed eventi fattuali relativi al caso di specie. Le premesse legali (L) specificano il contenuto giuridico (norme giuridiche, precedenti, interpretazione delle leggi e dei principi applicabili). L’output atteso è una lista con tutte le label applicabili. Ad esempio: [‘F’, ‘L’]. Testo: ”

Argument Scheme classification: given a legal premise, classify it as one or more of the following argumentative schemes: Rule, Prec, Class, Itpr, Princ.
Prompt: “Classifica la seguente premessa legale in uno o più dei seguenti schemi argomentativi: Rule, Prec, Class, Itpr, Princ. Rule: se esiste un riferimento esplicito o implicito a un articolo di legge o la citazione del testo di una norma. Prec: se esiste un riferimento ad una precedente pronuncia della Corte di Cassazione o della Corte di Giustizia dell’Unione Europea. Class: se c’è la definizone di un concetto giuridico o degli elementi costitutivi dello stesso. Itpr: se c’è il riferimento a uno dei criteri interpretativi contenuti all’art. 12 delle preleggi (letterale, teleologica, psicologica, sistematica) al codice civile. Princ: se c’è un riferimento espresso a un prinicpio generale del diritto (es. principio di proporzionalità). L’output atteso è una lista con tutte le label applicabili. Ad esempio: [‘Prec’, ‘Princ’, ‘Rule’]. Testo: ”

3.5. Detailed data statistics

The composition of the dataset is summarized in Table 1. The splitting between train, validation, and test data was done at the document level so that components of the same document belong to the same split. It was performed manually, with a ratio of approximately 60:20:20, and the aim of balancing the Scheme classes as much as possible. We adopt the train/val/test format to make the results comparable with as many methods as possible, such as fine-tuned transformer-based models.

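Table 1. Composition of the dataset. [Table body not recoverable; surviving headers: Split, N docs, Prem; surviving rows: Train, Validation, Test, Total.]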
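As an illustration of the row format described in Section 3.3, the sketch below derives the examples for the three tasks from a single row. The capitalised column names follow the list in Section 3.3 and may differ from the keys actually used in the released files. The rows shown are hypothetical and hand-written, and the function name `task_examples` is introduced here for illustration only. The sketch also assumes that conclusions carry no Type and that only legal premises carry a Scheme.

```python
def task_examples(row):
    """Return the (task, text, gold) triples that one dataset row gives rise to."""
    # Every component is an example for Argument Component classification.
    examples = [("component", row["Text"], row["Component"])]
    if row["Component"] == "prem":
        # Only premises are examples for Premise Type classification.
        premise_types = sorted(row["Type"])
        examples.append(("type", row["Text"], premise_types))
        if "L" in premise_types:
            # Only legal premises are examples for Argument Scheme classification.
            examples.append(("scheme", row["Text"], sorted(row["Scheme"])))
    return examples


# Hypothetical rows written by hand for illustration (not taken from the dataset).
rows = [
    {"Text": "premise text", "Document": "doc1", "Component": "prem",
     "Type": ["L"], "Scheme": ["Rule", "Prec"], "Chain_id": "A1", "Id": 1},
    {"Text": "conclusion text", "Document": "doc1", "Component": "conc",
     "Type": [], "Scheme": [], "Chain_id": "A1", "Id": 2},
]

for row in rows:
    for task, text, gold in task_examples(row):
        print(task, gold)
```

With this layout, a legal premise contributes to all three tasks, a purely factual premise to the first two, and a conclusion only to the first.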

4. Metrics

Due to the heavy unbalance between the classes, we evaluate the results using the macro F1 score. Additionally, we evaluate the F1 score of each class to provide further insights.

As a reference, in Demosthenes [21] the best macro F1 results for the three tasks are 0.88 for Argument Component classification, 0.85 for Premise Type classification, and 0.75 for Argument Scheme classification. It is important to specify that these scores are not directly comparable and we provide them only as a reference of the difficulty of the tasks.

5. Limitations

The original documents, along with the argument mining annotation, are already available as part of the Adele tool.4 The original documents, annotated according to the task of outcome prediction instead of argument mining, are also published in [15].

The dataset is limited in size, consisting of only 225 legal decisions on Value Added Tax (VAT). While this provides a valuable resource for testing argument mining models in the Italian tax legal domain, the relatively small dataset may not capture the full diversity of argumentative structures present in the broader Italian tax legal system or other legal domains. This could limit the scalability of models trained on this dataset. Also, given that the legal decisions are from a specific time frame (2010-2022), the dataset may not reflect more recent developments or changes in legal reasoning or tax law.

Secondly, the dataset has been anonymised to protect the privacy of individuals and legal entities. While this is necessary to comply with data protection regulations, the anonymisation process may have removed certain contextual details (e.g., names of places or entities) that could be relevant for understanding the nuances of certain legal arguments. As a result, models may not fully capture or leverage such contextual details that would otherwise aid in more accurate argument classification.

Another limitation is the manual annotation process, which, despite efforts to ensure consistency through expert annotators and conflict resolution, may still be subject to human bias or interpretation inconsistencies. These subjective elements could affect the quality and reproducibility of the tasks.

6. Ethical issues

The dataset comprises legal decisions that have been anonymised to protect the privacy of the individuals. However, it is important to acknowledge the potential risks related to re-identification, even with anonymisation efforts, especially in legal contexts where case details could be cross-referenced with external sources. Care was taken to remove any personal identifiers, such as names, addresses, and dates, but residual risks may remain.

Additionally, the use of this dataset raises questions regarding the deployment of AI systems in legal contexts. AI systems used by a judicial authority in researching and interpreting facts and the law are considered high-risk by the AI Act.5 Those systems must conform to the essential requirements (e.g. data governance, user transparency, human oversight, etc.) and the conformity must be documented.

Finally, a critical aspect is the transparency and accountability of AI systems when applied in sensitive domains like law. Users of the models should understand their limitations, especially in tasks involving nuanced reasoning like legal argumentation. Furthermore, ensuring that legal professionals and stakeholders have the ability to audit and interpret the decisions made by AI models is crucial to avoid undermining trust in legal institutions.

4 https://adele-tool.eu/
5 https://eur-lex.europa.eu/eli/reg/2024/1689/oj.
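Returning to the evaluation described in Section 4, the following is a minimal sketch of a macro F1 computation that covers both the single-label and the multi-label tasks. It assumes that each gold annotation and each prediction is represented as a set of labels (a singleton set for Argument Component classification) and that, for the multi-label tasks, the macro average is taken over the per-label F1 scores. The text does not spell out this last point, so it is an assumption of this sketch rather than the official scoring procedure. The function names are illustrative.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives (0.0 if undefined)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred, labels):
    """gold, pred: lists of label sets, one per example. Returns (macro F1, per-label F1)."""
    per_label = {}
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        per_label[label] = f1(tp, fp, fn)
    return sum(per_label.values()) / len(labels), per_label


# Toy single-label example: three premises and one conclusion, one premise misclassified.
gold = [{"prem"}, {"prem"}, {"prem"}, {"conc"}]
pred = [{"prem"}, {"prem"}, {"conc"}, {"conc"}]
score, per_label = macro_f1(gold, pred, ["prem", "conc"])
print(round(score, 4), per_label)
```

Because every class contributes equally to the average, a model that ignores a rare class is penalised regardless of how few examples that class has, which is the reason for preferring macro F1 on unbalanced data.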

7. Data license and copyright issues

The dataset used in this challenge consists of legal decisions on Value Added Tax (VAT) made by the Regional Tax Commissions in Italy, available and downloaded from the Giustizia Tributaria and other judicial databases accessed through university licensing agreements. These legal texts, being official public documents, are generally not subject to copyright restrictions. The dataset consists of a non-substantial part of the respective databases. Moreover, the use of data is compliant with the text and data mining exception under the EU Copyright Directive and implementing national law.6

Since the data has been processed and annotated, the annotations and derived data are subject to copyright by the authors of this challenge. To promote transparency and further research, the dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license allows others to share, use, and adapt the data, as long as appropriate credit is given to the creators, and any modifications are explicitly indicated.

Acknowledgments

This work was partially supported by the following projects: “ADELE – Analytics for DEcision of LEgal cases” (Justice Programme, GA. No. 101007420); PRIN2022 PRIMA - PRivacy Infringements Machine-Advice (Ref. Prot. n.: 20224TPEYC - CUP J53D23005130001); “FAIR - Future Artificial Intelligence Research” – Spoke 8 “Pervasive AI”, under the European Commission’s NextGeneration EU programme, PNRR – M4C2 – Investimento 1.3, Partenariato Esteso (PE00000013).

References

[5] J. Lawrence, C. Reed, Argument Mining: A Survey, Computational Linguistics 45 (2020) 765–818. doi:10.1162/coli_a_00364.
[6] I. Habernal, D. Faber, N. Recchia, S. Bretthauer, I. Gurevych, I. S. genannt Döhmann, C. Burchard, Mining legal arguments in court decisions, Artif. Intell. Law 32 (2024) 1–38.
[7] V. Niculae, J. Park, C. Cardie, Argument mining with structured svms and rnns, in: ACL (1), Association for Computational Linguistics, 2017, pp. 985–995.
[8] P. Poudyal, J. Savelka, A. Ieven, M. F. Moens, T. Goncalves, P. Quaresma, ECHR: Legal corpus for argument mining, in: E. Cabrio, S. Villata (Eds.), Proceedings of the 7th Workshop on Argument Mining, Association for Computational Linguistics, Online, 2020, pp. 67–75. URL: https://aclanthology.org/2020.argmining-1.8.
[9] T. Mayer, S. Marro, E. Cabrio, S. Villata, Enhancing evidence-based medicine with natural language argumentative analysis of clinical trials, Artif. Intell. Medicine 118 (2021) 102098.
[10] P. Accuosto, H. Saggion, Mining arguments in scientific abstracts with discourse-level embeddings, Data Knowl. Eng. 129 (2020) 101840.
[11] P. Basile, V. Basile, E. Cabrio, S. Villata, Argument Mining on Italian News Blogs, volume 1749 of CEUR Workshop Proceedings, CEUR-WS.org, 2016. URL: https://ceur-ws.org/Vol-1749/paper8.pdf.
[12] M. Lai, V. Patti, G. Ruffo, P. Rosso, Stance evolution and twitter interactions in an italian political debate, in: M. Silberztein, F. Atigui, E. Kornyshova, E. Métais, F. Meziane (Eds.), Natural Language Processing and Information Systems, Springer International Publishing, Cham, 2018, pp. 15–27.
[13] A. Tagarelli, A. Simeri, Unsupervised law article mining based on deep pre-trained language representation models with application to the italian civil code, Artificial Intelligence and Law 30 (2021) 417–473. doi:10.1007/s10506-021-09301-8.

A. Translated Examples

Argument Component classification.

Argument premise: It should be noted that viewing the inability to deduct expenses for individuals such as the plaintiff as state aid to public hospitals overlooks the indiscriminate accessibility of public hospital services for individuals registered with the National Health Service (SSN). In contrast, a self-employed healthcare professional may refuse to provide services as an ordinary contractor.

Argument conclusion: Thus, the office recognized the non-taxability for VAT purposes of the export supplies, so that there are no longer any grounds to proceed on the matter.

Premise Type classification.

Factual premise: Undoubtedly, the taxpayer appealed the first instance ruling, again representing that she could not appeal against the first instance decision due to force majeure.

Legal premise: The cited case law, to which reference is made for its reasoning, has explicitly and positively addressed the conformity of Italian legislation with that of the European Union. This effectively refutes the defense’s objection on this point, which requested the suspension of the proceedings and the referral of the issue to the European Court of Justice.

Argument Scheme classification.

Rule Scheme: In fact, it is true that under Articles 54 and 23 of Legislative Decree No. 546/1992, the defendant on appeal must enter an appearance within 60 days from the day on which the appeal was served.

Precedent Scheme: The Administration “has the burden of proving and attaching the evidence on which the dispute is based, among which the absence of the minimum personal and instrumental equipment, the immediacy of the relationships (transferor/interposed invoicing provider and transferee/buyer), an overt unsuitability to carry out the economic activity and the mismatch between the transferors and the company involved in the transaction may be circumstantial.”

Classification Scheme: In conclusion, given what has been said so far, “gold shops” can be defined as “business establishments that buy, trade or resell used objects of gold, precious metals or bearing precious stones and dispose of them in the form of material, scrap gold or precious metals to foundries or other companies specialising in the recovery of precious materials”. They deal only in finished products and may not purchase used jewellery gold, melt it down (for their account or by commissioning a third party) and dispose of the resulting finished product.

Interpretative Scheme: It means that, in essence, the purpose of the right to be heard is to put the taxpayer in the position of being able to make his or her observations before the decision is made and, therefore, to ensure that the administration can take into account all the elements of the case in adopting (or not adopting) the measure or in giving this one content rather than another.

Principle Scheme: In the European Union system, therefore, the right to be heard in tax matters is independent of the nature of the tax and must be applied whenever the administration on the basis of the documentation exhibited deems it necessary to give the same documentation an interpretation that differs from that given by the taxpayer, inviting him, as mentioned, to provide in the exercise of the right to be heard the reasons for his choice.

B. Translated Prompts

Argument Component classification.

“Classify the following argumentative text as premise ‘prem’ or conclusion ‘conc’. A premise (prem) is a proposition that provides a reason or support for the argument. A conclusion (conc) is the statement that follows logically from the premise(s) and represents the final point being argued for.

Text:”

Premise Type classification.

“Classify the following premise as factual ‘F’, legal ‘L’ or both. Factual premises (F) describe factual situations and events, pertaining to the substance or the procedure of the case. Legal premises (L) specify the legal content (legal rules, precedents, interpretation of applicable laws and principles). The expected output is a list with all applicable labels. For example: [‘F’, ‘L’]. Text:”

Argument Scheme classification.

“Classify the following legal premise as one or more of the following argumentative schemes: Rule, Prec, Class, Itpr, Princ. Rule: if there is an explicit or implicit reference to an article of law or a citation of the text of a certain article. Prec: if there is a reference to a previous ruling of the Court of Cassation or the Court of Justice of the European Union. Class: if there is a definition of a legal concept or its constituent elements. Itpr: if there is a reference to one of the interpretative criteria (literal, teleological, psychological, systematic) contained in Article 12 of the Preliminary Provisions to the Civil Code. Princ: if there is an explicit reference to a general principle of law (e.g. principle of proportionality). The expected output is a list with all applicable labels. For example: [‘Prec’, ‘Princ’, ‘Rule’]. Text:”
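The prompts above ask for either a single label or a bracketed list of labels. As an illustration of how such replies might be mapped back to label sets for scoring, here is a minimal sketch. The parsing strategy (read the first bracketed span if there is one, otherwise the whole reply, and keep only known labels) and the name `parse_labels` are assumptions of this sketch, not the procedure used by the challenge organisers.

```python
import re

COMPONENTS = ("prem", "conc")
PREMISE_TYPES = ("F", "L")
SCHEMES = ("Rule", "Prec", "Class", "Itpr", "Princ")


def parse_labels(reply, allowed):
    """Extract the allowed labels from a reply such as "[‘Prec’, ‘Rule’]" or "prem"."""
    # Prefer the first bracketed span, so that words outside the list are ignored.
    match = re.search(r"\[(.*?)\]", reply, flags=re.DOTALL)
    span = match.group(1) if match else reply
    found = []
    for token in re.findall(r"[A-Za-z]+", span):
        # Matching is exact and case-sensitive; duplicates are dropped.
        if token in allowed and token not in found:
            found.append(token)
    return found


print(parse_labels("[‘Prec’, ‘Princ’, ‘Rule’]", SCHEMES))  # ['Prec', 'Princ', 'Rule']
print(parse_labels("Risposta: ['L']", PREMISE_TYPES))      # ['L']
print(parse_labels("conc", COMPONENTS))                    # ['conc']
```

Replies that contain no known label yield an empty list, which a scoring script would then have to treat explicitly, for example as a wrong prediction.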

C. Few-shot prompts

Argument Component classification.

“Classifica il seguente testo argomentativo come premessa ‘prem’ o conclusione ‘conc’. Per premessa (prem) si intende una proposizione che fornisce una ragione o un supporto per l’argomentazione. Per conclusione (conc) si intende l’affermazione che segue logicamente dalle premesse e rappresenta il punto finale che viene argomentato.

Esempi:
Testo: Si osserva poi che ritenere che la mancata possibilità di detrazione a favore di soggetti come il ricorrente comporti un aiuto di Stato in favore degli ospedali pubblici, in quanto le perdite degli stessi vengono ripianate dalle USL e dalla Regioni trascura di considerare l’accessibilità, indiscriminata, ai servizi dei nosocomi pubblici da parte dei soggetti iscritti al SSN, rispetto a quella ad un libero professionista sanitario che, in quanto tale, ben potrebbe rifiutarsi di prestare i propri servigi al pare di un normale contraente
Risposta: prem
Testo: L’appello è infondato e va respinto
Risposta: conc
Testo: Va osservato che la motivazione dell’atto di accertamento non può esaurirsi nel rilievo dello scostamento, ma deve essere integrata con la dimostrazione dell’applicabilità in concreto dello ‘standard’ prescelto e con le ragioni per le quali sono state disattese le contestazioni sollevate dal contribuente. (cfr. Cass. S.U. 26635/2009, Cass. 12558/2010, Cass. 12428/2012, Cass. 23070/2012)
Testo: Dunque, l’ufficio ha riconosciuto la non imponibilità IVA delle cessioni all’esportazione, così cessando sul punto la materia del contendere
Testo: Risulta d’altronde dalle osservazioni scritte del governo spagnolo che quest’ultimo non riesce a discernere tale differenza ad un esame delle pertinenti norme dell’ordinamento spagnolo.
Testo: Il Collegio, esaminata l’eccezione preliminare svolta nel suo appello dall’Ufficio e relativa alla richiesta nullità della sentenza per mancata instaurazione del contraddittorio, la respinge

“Classifica la seguente premessa come di fatto ‘F’, legale ‘L’ o entrambe. Le premesse di fatto (F) descrivono situazioni ed eventi fattuali relativi al caso di specie. Le premesse legali (L) specificano il contenuto giuridico (norme giuridiche, precedenti, interpretazione delle leggi e dei principi applicabili). L’output atteso è una lista con tutte le label applicabili. Ad esempio: [‘F’, ‘L’].

Testo: Per i primi giudici nel caso di specie questa esenzione non poteva essere applicata perché la complessiva attività di ‘A’ srl era un’attività commerciale svolta in concorrenza con altre imprese operanti nel settore
Risposta: [‘F’]

Testo: In assenza di siffatti elementi, che in via presuntiva avrebbero potuto fare giungere questo giudice a conclusioni diverse in via logica, si deve confermare l’esito cui è giunta la commissione provinciale
Risposta: [‘F’]

Testo: Su questo si osserva che si deve condividere la circostanza dedotta dal giudice di prime cure per cui deve essere il contribuente, ove sia contestata la inerenza e verità della rappresentazione ricavabile dal documento contabile, a dare la dimostrazione della fondatezza e della correttezza del comportamento tenuto
Risposta: [‘L’]

Testo: L’Ufficio non potrà impedire ad un imprenditore, per esempio, di cedere immobili con prezzi bassi o nulli per ricavare liquidità a fronte di nuovi impegni, ma dovrà rilevare la condotta antieconomica dello stesso sulla base dell’utile di esercizio
Risposta: [‘L’]

Testo: Invero l’avviso di accertamento è fondato sul mancato rispetto, da parte del contribuente, nel calcolo del ROL, delle disposizioni dell’articolo 96, secondo comma, del TUIR, che ne definisce le modalità
Risposta: [‘F’, ‘L’]

Testo: La società ‘A’, per quanto previsto dall’art. 4, comma 18 del Regolamento CEE n. 2913/1992, riveste il ruolo di ‘dichiarante in Dogana’, soggetto passivo della obbligazione
Risposta: [‘F’, ‘L’]

“Classifica la seguente premessa legale in uno o più dei seguenti schemi argomentativi: Rule, Prec, Class, Itpr, Princ. Rule: se esiste un riferimento esplicito o implicito a un articolo di legge o la citazione del testo di una norma. Prec: se esiste un riferimento ad una precedente pronuncia della Corte di Cassazione o della Corte di Giustizia dell’Unione Europea. Class: se c’è la definizione di un concetto giuridico o degli elementi costitutivi dello stesso. Itpr: se c’è il riferimento a uno dei criteri interpretativi contenuti all’art. 12 delle preleggi (letterale, teleologica, psicologica, sistematica) al codice civile. Princ: se c’è un riferimento espresso a un principio generale del diritto (es. principio di proporzionalità). L’output atteso è una lista con tutte le label applicabili. Ad esempio: [‘Prec’, ‘Princ’, ‘Rule’].

Testo: Infatti, è ben vero che, ai sensi del combinato disposto dagli articoli 54 e 23 D.Lgs. n. 546/1992, il convenuto in appello deve costituirsi entro 60 giorni dal giorno in cui ricorso è stato notificato.
Risposta: [‘Rule’]

Testo: L’Amministrazione “ha l’onere di provare ed allegare gli elementi probatori su cui si fondi la contestazione, tra i quali possono rilevare, in via indiziaria, quali elementi sintomatici della mancata esecuzione della prestazione dal fatturante, l’assenza della minima dotazione personale e strumentale, l’immediatezza dei rapporti (cedente/prestatore fatturante interposto e cessionario/committente), una conclamata inidoneità allo svolgimento dell’attività economica e la non corrispondenza tra i cedenti e la società coinvolta nell’operazione”

Testo: In conclusione, per quanto fin qui esposto, i “compro oro” possono essere definiti come “esercizi commerciali che acquistano, commerciano o rivendono oggetti d’oro, di metalli preziosi o recanti pietre preziose usati e li cedono nella forma di materiale, di rottami d’oro o di metalli preziosi alle fonderie o ad altre aziende specializzate nel recupero di materiali preziosi”. Trattano esclusivamente prodotti finiti e non possono, congiuntamente, acquistare oro da gioielleria usato, fonderlo (per proprio conto o con incarico a terzi) e cedere il prodotto finito ottenuto
Risposta: [‘Class’]

Testo: Si vuole dire, in sostanza, che la finalità del contraddittorio anticipato è quella di mettere il contribuente nella condizione di potere fare valere le proprie osservazioni prima che la decisione sia adottata e, quindi, di far sì che l’Amministrazione possa tener conto di tutti gli elementi del caso nell’adottare (o non adottare) il provvedimento ovvero nel dare a questo un contenuto piuttosto che un altro.
Risposta: [‘Itpr’]

Testo: Nell’ordinamento unionale, pertanto, il principio del contraddittorio in ambito tributario prescinde dalla natura del tributo e deve trovare applicazione ogni qualvolta l’amministrazione sulla base della documentazione esibita ritenga dovere dare alla stessa documentazione interpretazione diversa da quella data dal contribuente invitandolo, come detto fornire nel corso del contraddittorio le ragioni della propria scelta
Risposta: [‘Princ’]

Testo: In sintesi per esterovestizione si intende la fittizia localizzazione della residenza fiscale di un soggetto all’estero, in particolare in un Paese con un trattamento fiscale più vantaggioso di quello nazionale, che la giurisprudenza configura in termini di abuso del diritto riconosciuto, in via tendenziale, come principio generale anche nel diritto dei singoli Stati membri (v. Cass., Sez. Un., n. 30055 del 2008, secondo la quale il divieto di abuso del diritto si traduce in un principio generale antielusivo che trova fondamento, in tema di tributi non armonizzati, nei principi costituzionali di capacità contributiva e di progressività dell’imposizione).
Risposta: [‘Prec’, ‘Class’, ‘Princ’]

Testo: La denuncia, infatti, non codificata nel codice di procedura penale (a differenza della notizia di reato di cui all’articolo 347 c.p.p.), può definirsi come qualunque atto con il quale chiunque abbia notizia di un reato perseguibile d’ufficio ne informa il pubblico ministero o un ufficiale di polizia giudiziaria.
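The few-shot prompts in this appendix share one pattern: an instruction, a series of solved ‘Testo’/‘Risposta’ pairs, and finally the text to classify. A minimal sketch of how such a prompt might be assembled is given below. The function name `build_few_shot_prompt`, the line-based layout, the ‘Esempi:’ header on every prompt, and the trailing empty ‘Risposta:’ are assumptions made for illustration; the exact formatting used in the challenge may differ.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, solved examples, and the unanswered query text.

    `examples` is a sequence of (text, answer) pairs. The layout is a
    hypothetical one modelled on the Testo/Risposta pattern of the prompts.
    """
    parts = [instruction.strip(), "", "Esempi:"]
    for text, answer in examples:
        parts.append(f"Testo: {text}")
        parts.append(f"Risposta: {answer}")
    # The query is appended last, leaving the answer for the model to fill in.
    parts.append(f"Testo: {query}")
    parts.append("Risposta:")
    return "\n".join(parts)


# Usage with one example taken from the component classification prompt.
prompt = build_few_shot_prompt(
    "Classifica il seguente testo argomentativo come premessa 'prem' "
    "o conclusione 'conc'.",
    [("L'appello è infondato e va respinto", "conc")],
    "Testo da classificare",
)
```

Ending the prompt on an empty ‘Risposta:’ is one common way to cue the model to continue with a label only; it is a design choice of this sketch rather than something stated in the prompts.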
