Legal Drafting supported by AI: enhancing LEOS

Monica Palmirani¹∗, Fabio Vitali², Generoso Longo¹, Emanuele Di Sante¹, Aurora Brega¹, Andrea D’Arpa¹, Michele Corazza¹

¹ University of Bologna, ALMA-AI, via Galliera 3, Bologna, 40121, Italy
² University of Bologna, DISI, via Mura Anteo Zamboni 7, Bologna, 40126, Italy

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author. All authors contributed equally.
Emails: monica.palmirani@unibo.it (M. Palmirani); fabio.vitali@unibo.it (F. Vitali); generoso.longo@studio.unibo.it (G. Longo); emanuele.disante@studio.unibo.it (E. Di Sante); aurora.brega@studio.unibo.it (A. Brega); andrea.darpa@studio.unibo.it (A. D’Arpa); michele.corazza@unibo.it (M. Corazza)
ORCID: 0000-0002-8557-8084 (M. Palmirani); 0000-0002-7562-5203 (F. Vitali); 0000-0002-7288-6635 (M. Corazza)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Legal drafting is a complex activity that involves different actors and end-users, usually belonging to the administration staff. AI tools could support this activity by providing useful aid for various tasks. This paper presents scenarios in which an AI add-on supports legal drafting carried out with the LEOS web editor, developed by the EU Commission for EU legislation. The scenarios are the following: i) retrieving the relevant existing normative definitions connected with the ongoing bill, by using algorithms based on semantic similarity; ii) suggesting the most pertinent normative references when some information is missing (e.g., the year); iii) aiding the drafter in following templates that improve the clarity and regularity of the norms (e.g., modifications). Additionally, it is crucial to model a user interface capable of guaranteeing some foundational principles: accessibility, transparency, usability, user experience, and explicability. This paper presents the output of this project, conducted in collaboration with the DG Informatics of the EU Commission.

Keywords

Akoma Ntoso, LEOS, similarity, AI

1. Introduction

Legal drafting is a crucial task in the legislative procedure of any deliberative assembly. Its goals are many: i) to support political decision-makers; ii) to standardize the language in line with the legal tradition, adopting multilingual translations when necessary; iii) to apply drafting rules that improve quality and clarity; iv) to guarantee the Rule of Law and the principles of the theory of law; v) to track the modifications that occur over time during the legislative process. In the last 15 years many specialized editors have been developed [13], [5], [3], [1] to support these goals using natural language processing technology [6]. Among the proposed solutions, some use the Semantic Web approach [2], while others apply symbolic AI based on rules [12]. LEOS [5], [10] is one of the most promising web editors for legal drafting. It was developed by the EU Commission to support its internal legal drafting activities, but also with the aim of serving the Member States. LEOS is an open-source web editor specific to legal drafting; it is written in Angular and is designed to manage the whole law-making process [15].

The aim of this work is to develop a framework architecture capable of enhancing LEOS with add-ons, developed with AI technologies, that improve the quality of the legal content, help the legal drafters, and manage the law-making process. The two add-ons provide the following features [7], [4], [14]:
(i) suggest the pertinent normative provision using similarity with the bill topic;
(ii) suggest the pertinent normative reference using thematic similarity with the bill;
(iii) take into consideration the temporal information and the nested normative references;
(iv) use the metadata of ELI² and EUROVOC³ to improve the similarity.

² ELI: https://eur-lex.europa.eu/eli-register/about.html
³ EUROVOC: https://eur-lex.europa.eu/browse/eurovoc.html?locale=it

The aim is also to create a user interface capable of:
(i) reducing manual, error-prone work when typing normative references, also avoiding repetitions in legislative citations;
(ii) maximising reuse of similar legal concepts (e.g., definitions);
(iii) increasing transparency and searchability of the existing legal knowledge included in the corpora.

2. Methodology

The adopted methodology is based on hybrid AI [11] and uses multiple techniques to achieve its goals. First, we do not generate new text (e.g., using LLMs or generative AI); instead, we suggest pertinent, contextual, and significant existing legal knowledge extracted from the legal corpora, using a similarity index computed from the parameters of the bill that the legal drafter is writing. We also use the EUROVOC classification and other contextual information provided by the experts during the drafting process (e.g., the type of provision).

Secondly, the approach takes into consideration the temporal validity of the normative provisions, excluding those that are repealed, or suggesting the appropriate version of the consolidated text according to the view date typed by the end-user. If the author seeks the normative definition of “privacy” before the GDPR, they can set the view date before 24 May 2016 (the date of entry into force of the act) and the system will respect this setting.

Thirdly, we resolve the normative references so that the indexing model also includes the cited text, recursively but only to the first level. This allows us to capture more information, especially when the definition is limited in the text and consists only of a normative citation to another provision (e.g., “For the purposes of this Directive, the definitions laid down in Article 2 of Directive 2000/60/EC shall apply”).

Fourthly, the context is important for providing relevant suggestions. A definition depends on the topic of the bill. For example, there are many definitions of ‘accuracy’, and the appropriate one depends on the topic of the document.

Fifthly, the user interface is a fundamental pillar for guaranteeing good usability, transparency, and explicability of the AI behaviors and output [8].

Finally, we use the Akoma Ntoso [9] serialization to represent the structure of the legal documents, the normative references, and the metadata of the document lifecycle: the date of entry into force, the date of entry into operation, and the date of repeal.

3. Dataset

The dataset is composed of European legislation from 2010 to 2021, about 15,000 regulations and directives. It was provided by the European Publication Office in Formex 3.0 XML format. We converted all the documents into Akoma Ntoso and, using a natural language processing approach, annotated the definitions and the normative references.

The dataset includes about 899 documents with definitions. For definitions, we considered only the explicit provisions usually titled “Definitions”, or those where a regular pattern reliably identifies the relationship between a term (definiendum) and its description (definiens), following the pattern ‘definiendum’ means definiens (e.g., “‘domain’ means one or several data sets that cover specific topics;”). Definitions that include normative references are managed by navigating the link to include the complete information (e.g., ‘personal data’ means personal data as defined in point (1) of Article 4 of Regulation (EU) 2016/679).

4. Use Cases

4.1. Normative References

Normative references are qualified citations used to mention other documents or provisions relevant to the normative discourse. Errors made while typing normative references produce incorrect links and additional effort in the control phases.

The system lets the user type incomplete normative references, and it retrieves and ranks the existing, in-force references that are similar to the information requested by the end-user. In the case of a citation of the form “Regulation 406”, for example, the system returns all the Regulations that are valid, in force, numbered 406, and pertinent to the EUROVOC classification of the bill. The system completes the reference (e.g., Regulation 406/2010) and also returns the title of the document and other information for identifying the act.

Due to the evolution of the European institutions, the references have changed syntax and patterns over time. For this reason, the end-user can easily make a mistake in the citation format. Our tool helps the end-user to compose the reference according to the historical period of the document cited. For example, a Regulation before 1968 is cited using number/yy/EEC (e.g., Regulation No 1009/67/EEC); after 1968 we have number/yy (e.g., Regulation (EEC) No 2195/91); and after 2009 we have yyyy/number (e.g., Regulation (EU) 2016/679).

4.2. Legal Definitions

Legal definitions are a sensitive part of the law because they define new legal concepts, new terminologies, equivalences between different definitions, and exceptions for specific cases. In EU legislation, there is usually a clear article called “Definitions”, but technical definitions can sometimes also be found in the last part of the act or in the annexes.

Additionally, definitions can be organized in a long list of points, which might be connected to each other. Definitions are composed of three main parts: definiendum (term); definiens (description); legal concept (abstract class of concept). The use of the same term for multiple definitions is not infrequent, and the term might have a completely different meaning in different domains (e.g., pollution has different definitions according to the domain, like water, energy, industry, etc.).

For this reason, the tool calculates the similarity of a given term (which can also be composed of multiple words) with the existing, valid, and updated definitions in the legal corpus (those present in the consolidated versions of documents), using the similarity index as a criterion.

5. Architecture

The overall architecture (see Figure 1) is composed of an XML database that includes the Akoma Ntoso XML documents and an SQL database containing the correspondence between each document and its EUROVOC categorization. Each EUROVOC descriptor is associated with the average of the Word2Vec [16] embeddings of the words composing it. The eXist database⁴, which holds all the AKN-XML documents, can also use the Lucene Java library to index the document text, and in particular the definitions (defBody elements). When a new document enters the eXist database, it is also indexed in the SQL database and the Word2Vec representation of its definitions is stored. If the document does not have EUROVOC tags, we extract them from CELLAR and serialize the information in the metadata of the Akoma Ntoso documents.

⁴ eXist is an XML database that is indexed using Lucene and queryable with XQuery.

During legal drafting, if the end-user wants to get a suggestion (e.g., a normative reference or a definition), they need to provide some parameters as input, such as the title and the EUROVOC keywords of the bill (proposal of law), in order to calculate the corresponding indexes. The dynamic input typed by the end-user (e.g., an incomplete normative reference or definition keywords) is parsed and compared with the existing document collection in eXist. A first filter uses traditional Information Retrieval techniques to select the relevant documents. The similarity score is then calculated on the retrieved text: for EUROVOC values, it is compared with the embeddings of the input parameters stored in the SQL database; for definitions, the similarity algorithm of Lucene is used. The ranking is based on the index score and the temporal parameters, and also considers the normative citations included in the retrieved provision.

The Lucene Similarity class implements the scoring model. The library offers several ready-made implementations of the Similarity class, which reflect different scoring models developed in the field of Information Retrieval. Our implementation adopts the DefaultSimilarity class, which combines the Boolean model, used to filter the documents matching the query, with an adjusted Vector Space Model (VSM), based on TF-IDF weights, for scoring the results. In particular, Lucene refines the VSM by taking into account the corpus statistics contained in the inverted index, the number of terms that match the query, and the boost factors specified in the search query. This class is also used by the indexing chain, since it computes the normalization factors, which depend on the length of the fields and on the boost factors specified in the configuration (see the Similarity class in the Lucene 3.6.1 API documentation, apache.org).

6. User-interface

The user interface (see Figure 2) is a fundamental part of this application. LEOS is enriched with an add-on that enables these functionalities in a selective way. The suggestions are offered in a portion of the window that allows the end-user to confirm or discard the output, or to integrate the results in the drafting text.

Our custom components are organised in a dedicated application folder, comprising new components (stored in .component.ts, .component.html, and .component.scss files), new classes (.ts files), and a service (in a .service.ts file). This service manages the essential methods and global variables used by our approach.

To maintain consistency, we adopted a style for our extension that closely imitates the original application's design. Many of the components used were taken from the eUI library, and we followed the guidelines and suggestions provided by the eUI framework. The version of the eUI library used is 14, the same one adopted by LEOS and used in its native components. Therefore, both the shape and color of the interface elements are consistent with those indicated by the framework.

The components we added always provide feedback to the user: they display results when generated, an error message if the service response raises an issue, and an alert if the user's request is not executed correctly, in accordance with the functionality we aim to provide. We designed them so that the user knows the reasons for an incomplete or incorrect request and is given the opportunity to make any necessary corrections. We also strive to maintain consistency in the terms used in the labels, ensuring that each element is identified by a unique name and avoiding multiple elements with the same name.

The end-user of the service is an expert in legislative matters, so we prioritised making the interface simple and intuitive but also very specific to professional drafting tasks, considering that the user has clear knowledge of the subject matter being addressed. We created mockups of the interface to evaluate it before implementation, ensuring that it is indeed usable and effective. The end-user is constantly involved in the evaluation through regular meetings where the usability is tested and feedback is incorporated into the software.

7. Conclusions

This paper presents two add-ons integrated into the LEOS web editor to enhance legal drafting tasks using AI applications. The user interface is a fundamental component of this work and is designed to incorporate the principles of transparency, accessibility, user experience, and explicability. The methodology deliberately does not generate new text (as LLMs do), in order to avoid hallucinations, which could affect the democratic rules of the law-making process.

We aim to extract and offer to the legal drafters the legal knowledge stored in the corpus, which is sometimes difficult to find due to the large volume of documents, and to return the relevant information accompanied by an index score based on temporal parameters and on the textual similarity of qualified legal provisions such as definitions and normative references. The first results were evaluated by legal experts, who found them promising and pertinent to the text being drafted. Moreover, the end-users appreciated the provided suggestions, which retrieve pertinent information using topic similarity, cut repetitive work, and let them focus on higher-level tasks.

Acknowledgements

This project is co-funded by DG Informatics of the European Commission within the larger LEOS project, and with the support of European Commission funds within ERC HyperModeLex, Grant agreement ID: 101055185.

References

[1] Agnoloni Tommaso, E. Francesconi, P. Spinosa, xmLegesEditor: an opensource visual XML editor for supporting legal national standards, Proceedings of the V Legislative XML Workshop, 239-251, 2007.
[2] Casanovas Pompeu, Palmirani Monica, Peroni Silvio, Van Engers Tom, Vitali Fabio, Semantic Web for the Legal Domain: The next step, Semantic Web, 2016, 7, pp. 213-227.
[3] Grant Vergottini, https://xcential.com/legispro/standards/
[4] Griglio Elena, Marchetti Carlo, La "specialità" delle sfide tecnologiche applicate al drafting parlamentare: dal quadro comparato all'esperienza del Senato italiano, Osservatorio sulle fonti, 2022, n. 3, pp. 361-386.
[5] LEOS, https://joinup.ec.europa.eu/collection/justice-law-and-security/solution/leos-open-source-software-editing-legislation
[6] Lesmo Leonardo, Alessandro Mazzei, Monica Palmirani, Daniele Paolo Radicioni, TULSI: an NLP system for extracting legal modificatory provisions, Artif. Intell. Law 21(2): 139-172 (2013).
[7] Lorello Laura, La qualità della legislazione diventa un’esigenza bicamerale: considerazioni sul nuovo Comitato per la legislazione del Senato della Repubblica nel primo semestre di attività, Federalismi.it: rivista di diritto pubblico italiano, comunitario e comparato, 2023, n. 27, pp. 17-45.
[8] Palmirani M., Vitali F., Legislative Drafting Systems, in: Usability in Government Systems, New York, Morgan Kaufmann, 2012, pp. 133-151.
[9] Palmirani Monica, Legislative change management with Akoma Ntoso, in: Legislative XML for the Semantic Web, Springer, 2011, pp. 101-130.
[10] Palmirani Monica et al., Drafting legislation in the era of AI and digitisation, https://joinup.ec.europa.eu/sites/default/files/document/2022-06/Drafting%20legislation%20in%20the%20era%20of%20AI%20and%20digitisation%20%E2%80%93%20study.pdf
[11] Palmirani Monica, F. Sovrano, D. Liga, S. Sapienza, F. Vitali, Hybrid AI Framework for Legal Analysis of the EU Legislation Corrigenda, Legal Knowledge and Information Systems, 68-75.
[12] Palmirani Monica, Luca Cervone, Octavian Bujor, Marco Chiappetta, RAWE: An Editor for Rule Markup of Legal Texts, RuleML (2) 2013.
[13] Palmirani Monica, Raffaella Brighi, An XML Editor for Legal Information Management, EGOV 2003: 421-429.
[14] Tafani Laura, Federico Ponte, Le tecniche legislative statali, regionali e dell'Unione europea a confronto: per un auspicabile ravvicinamento, Osservatorio sulle fonti, 2022, n. 1, pp. 447-497.
[15] LEOS Manual, https://joinup.ec.europa.eu/collection/justice-law-and-security/solution/leos-open-source-software-editing-legislation/releases
[16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean, Distributed Representations of Words and Phrases and their Compositionality, in: Proceedings of NIPS, 2013.

Figure 1 – Architecture of the system.
Figure 2 – Interface of LEOS with the add-on.
Figure 3 – Interface of LEOS with the add-on results.

CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073
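
Illustrative sketch (temporal filter and topic ranking). The filtering and ranking steps described in Sections 2 and 5 (temporal validity first, then topic similarity through averaged word embeddings of EUROVOC labels) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the `Definition` record, all function names, and the two-dimensional toy vectors are hypothetical; an exact term match stands in for the Lucene retrieval step; and plain Python lists replace trained Word2Vec vectors.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt
from typing import Dict, List, Optional, Tuple


@dataclass
class Definition:
    """Hypothetical record for one definition extracted from an act."""
    term: str
    text: str
    eurovoc: List[str]              # EUROVOC labels of the source act
    in_force_from: date
    repealed_on: Optional[date] = None


def mean_vector(words: List[str], embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of the words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def label_vector(labels: List[str], embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Represent a set of EUROVOC labels as the mean of their word vectors."""
    words = [w for label in labels for w in label.lower().split()]
    return mean_vector(words, embeddings)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def valid_on(d: Definition, view_date: date) -> bool:
    """True if the definition is in force and not repealed at the view date."""
    return d.in_force_from <= view_date and (
        d.repealed_on is None or view_date < d.repealed_on
    )


def rank_definitions(term, bill_eurovoc, view_date, corpus, embeddings) -> List[Tuple[float, Definition]]:
    """Keep definitions of `term` valid at `view_date`, ranked by topic similarity."""
    bill_vec = label_vector(bill_eurovoc, embeddings)
    ranked = []
    for d in corpus:
        # Assumption: exact term match replaces the full-text retrieval step.
        if d.term.lower() != term.lower() or not valid_on(d, view_date):
            continue
        doc_vec = label_vector(d.eurovoc, embeddings)
        score = cosine(bill_vec, doc_vec) if bill_vec and doc_vec else 0.0
        ranked.append((score, d))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Toy data, invented for illustration only.
toy_embeddings = {"water": [1.0, 0.0], "energy": [0.0, 1.0], "policy": [0.5, 0.5]}
corpus = [
    Definition("pollution", "introduction of substances into water (toy text)",
               ["water"], date(2001, 1, 1)),
    Definition("pollution", "emissions from energy production (toy text)",
               ["energy"], date(2012, 1, 1)),
    Definition("pollution", "older water definition (toy text)",
               ["water"], date(1995, 1, 1), repealed_on=date(2010, 1, 1)),
]
results = rank_definitions("pollution", ["water policy"], date(2015, 1, 1),
                           corpus, toy_embeddings)
for score, d in results:
    print(round(score, 3), d.text)
```

With a bill tagged "water policy" and a view date in 2015, the repealed definition is dropped and the water-related definition ranks above the energy-related one. Moving the view date before the repeal date would bring the older definition back into the candidate set, which mirrors the view-date behaviour described in Section 2.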
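
Illustrative note (scoring model). For readers unfamiliar with the scoring model mentioned in Section 5, the Lucene 3.6 documentation gives the practical scoring function of DefaultSimilarity in roughly the following form. The notation below is a transcription for illustration only: the symbols freq, numDocs, and docFreq follow the Lucene Javadoc, while the paper does not state how the add-on sets boosts or any other parameter.

```latex
\begin{align}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d) \\
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}(t,d)} \\
\mathrm{idf}(t) &= 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}
\end{align}
```

Here coord(q,d) rewards documents that match more of the query terms, idf(t) carries the corpus statistics from the inverted index, boost(t) is the query-time boost of a term, and norm(t,d) combines the field-length normalization with the index-time boosts. These correspond, respectively, to the "number of terms that match the query", the "corpus statistics", the "boost factors", and the "normalization factors" named in Section 5.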