SAVIA: Artificial Intelligence in support of the lawmaking process

Michele Visciarelli1,*,†, Giovanni Guidi1,†, Laura Morselli1,†, Domitilla Brandoni1, Giuseppe Fiameni2, Luisa Monti3, Stefano Bianchini3 and Cosimo Tommasi3

1 CINECA, via Magnanelli 6/3, Casalecchio di Reno (BO), 40033, Italy
2 NVIDIA AI Technology Center, Milan, Italy
3 Assemblea Legislativa Emilia-Romagna, viale Aldo Moro 50, Bologna, 40127, Italy

Abstract
We explore the use of open-source Large Language Models (LLMs) to support legal professionals, lawmakers, and citizens in accessing information on the current and past legislation of the Emilia-Romagna region. We develop a generative AI tool based on the Retrieval-Augmented Generation (RAG) technique to answer questions related to regional laws and their implementing acts, retrieving relevant information from the Emilia-Romagna law corpus. To adapt pre-trained LLMs to this downstream task, we follow a multi-step approach. First, we use the QLoRA technique to quantize and adapt the pre-trained LLMs to the regional legal text dataset. Next, we fine-tune the domain-adapted models on an ad-hoc instruction-based dataset. We then implement a module to retrieve relevant contextual information from the legal documents dataset. Finally, we align the models with domain-specific instructions using RAG-based prompting. We evaluate the performance of the domain-adapted models using the perplexity metric, while the outputs of the final fine-tuned models are assessed by domain experts, focusing on the quality of the generated text and the relevance of the answers. Our results show that domain adaptation on domain-specific text is a crucial step for enhancing the quality of the generated text in expert domains, such as legal texts, which contain a vast amount of specialized vocabulary and expressions. This approach leads to higher performance compared to models fine-tuned only on small Question-Answer datasets.
Additionally, our findings highlight the importance of the retrieval module, which must be able to reliably find the most relevant documents to provide useful and up-to-date insights to lawmakers and citizens.

Keywords: Generative AI, LLM, Legal AI, NLP

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
m.visciarelli@cineca.it (M. Visciarelli); g.guidi@cineca.it (G. Guidi); l.morselli@cineca.it (L. Morselli); d.brandoni@cineca.it (D. Brandoni); gfiameni@nvidia.com (G. Fiameni); luisa.monti@regione.emilia-romagna.it (L. Monti); stefano.bianchini@regione.emilia-romagna.it (S. Bianchini); cosimo.tommasi@regione.emilia-romagna.it (C. Tommasi)
ORCID: 0000-0003-0753-2571 (M. Visciarelli); 0000-XXX (G. Guidi); 0000-0003-0753-2571 (L. Morselli); 0000-0002-8157-1459 (D. Brandoni); 0000-0001-8687-6609 (G. Fiameni)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

In the last few years, interest in Generative Artificial Intelligence (Generative AI) applications has grown considerably in the research and industry communities, thanks to the introduction of Foundation Models in different AI domains, such as text generation (GPT series [1, 2, 3], LLaMA series [4, 5, 6], MEGATRON [7]), image generation (DALL·E [8, 9, 10], Stable Diffusion [11]), and video generation (Sora [12]). Progress in Deep Learning modelling has been fostered by important advancements in neural network (NN) research, such as the introduction of the Transformer architecture in Natural Language Processing (NLP) [13], by improvements in hardware acceleration for linear algebra, which allowed model sizes to grow to several billions of parameters, by the introduction of quantization techniques that make it possible to train large NNs even on consumer GPUs [14, 15], and by the release of large, high-quality, open-source datasets [16].

Large Language Models (LLMs) for text generation have achieved remarkable performance and attracted great interest even outside the research and industry communities, in particular after the release of ChatGPT to the public [17]. Despite their success, the use of LLMs on domain-specific Question-Answering (QA) tasks still faces several challenges that hinder their adoption beyond the research community, especially in tasks for which explainability and high-quality responses are of paramount importance [18]. Some of the challenges that LLMs still face are the following:

• difficulty of maintaining up-to-date knowledge;
• costs of training and inference of large models, and the cost and difficulty of collecting large amounts of high-quality domain-specific data;
• hallucinated answers, i.e. answers that provide false information without warning;
• out-of-date or generic answers, even when the user expects a specific, current response.

Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges [19]. In particular, RAG combines a language model with an information retrieval system to dynamically fetch relevant external information and enhance the model's responses: the user's question is encoded into a dense representation, passages relevant to the question are retrieved from an indexed data source, and this information is added to the LLM prompt. Different studies have shown that RAG enhances the quality of the generation process, leading to higher accuracy, better robustness, reduced hallucinations, higher interpretability, and even the possibility of performing open-domain QA simply by updating the knowledge base [20, 21]. RAG also offers a balanced approach in terms of customization and resource requirements, being more flexible and cost-effective than full fine-tuning, although it still requires labeled data and a supervised training phase.

In this work we present SAVIA, a project developed by CINECA and the Assemblea Legislativa of Emilia-Romagna. The project, started in Autumn 2023 and expected to end in March 2025, has the goal of creating a model capable of answering questions on the Region's laws and their respective implementing acts, as well as on the related "ex-ante" and "ex-post" reports on the laws' impact. In Section 2 we present the data used for this project and the workflow that has been adopted. In Section 3 we describe the procedure and the details of the experiments and tests conducted, and in Section 4 we show the obtained results. Our conclusions are then presented in Section 5.

2. Methodology

To obtain a model capable of understanding the Italian language in the law domain and responding to questions related to laws enacted in the Emilia-Romagna region, we followed a multi-step approach. We started from an open-source LLM and adapted it to the legal language through unsupervised domain adaptation (Section 2.1). The resulting domain-adapted model was then fine-tuned for question-answering (Q&A) on an instruction-based dataset prepared by domain experts for this purpose (Section 2.2). Finally, we implemented a domain-adapted retrieval model (Section 2.3) to enrich the answers with relevant information from the law corpus.

The full workflow was reproduced starting from different open-source LLMs:

• LLaMAntino-2-7b-hf-ITA: a 7B model, based on LLaMA-2, specifically fine-tuned for the Italian language [22].
• Mistral-7B-v0.1: a 7B model that implements grouped-query and sliding-window attention and Rotary Position Embeddings, and can handle contexts of arbitrary size [23].
• Mixtral-8x7B-Instruct-v0.1: a 46.7B mixture-of-experts model, trained on instructions in English, French, Italian, German and Spanish, with a maximum context length of 32k tokens [24].

Domain experts qualitatively evaluated the performance of the final models obtained from the different pre-trained LLMs.

2.1. Unsupervised Domain Adaptation

The first step in the procedure was the domain adaptation of the model on legal text. We collected the PDFs of the regional laws of Emilia-Romagna, as well as the relative implementing acts at the regional level (e.g. "atto del dirigente" and "atto di giunta") and the available reports on the expected and measured impact of a given law (e.g. "clausola valutativa", "ex-ante" and "ex-post" reports). We split the legal documents into chunks, and we implemented a cleaning pipeline to remove typos, bad characters, and irrelevant parts of the documents such as headers and footers. We also added mii-llm/gazzetta-ufficiale [25] to the training dataset, given its affinity to our application in language, semantics and type of documents. We did not perform domain-adaptive tokenization [26], using instead the pre-trained models' native tokenizers to tokenize the legal corpus.

Not all three models under investigation underwent domain adaptation. LLaMAntino-2-7b-hf-ITA and Mistral-7B-v0.1 were adapted, while Mixtral-8x7B-Instruct-v0.1, after tests regarding its native capability of producing adequate Italian legal text, was not domain adapted.

2.2. Model Alignment on an Instruction-Based Dataset

With the support of domain experts, we generated a Q&A dataset mimicking different levels of domain-language proficiency, ranging from questions that could be written by non-expert users to those that may be asked by experts in the legal domain. We developed a semi-automatic procedure to further enrich this Q&A dataset using legal document metadata. The following is an example included in the instruction-based dataset:

• Q: "Da quando è stata istituita la regione, quali normative sono state adottate per incentivare la partecipazione?"
• A: "La prima legge regionale riguardante la partecipazione ad essere stata approvata è la legge numero 3 del 2010. In seguito, la legge numero 3 del 2010 è stata abolita e sostituita con la legge regionale numero 15 del 2018."

To fine-tune the domain-adapted LLMs, we used the instruction-based dataset prepared by the domain experts. For the loss function computation, we removed the portion of the text containing the prompt, as in many cases the prompt added by the RAG module can account for up to 50% of the total text length. This approach helped to optimize the training process more effectively.

2.3. Domain-Adapted Retrieval Model

To enrich the user's question with relevant information from the legal documents database, we developed a retrieval module based on a semantic-similarity search technique. We used a Sentence-BERT model [27] to populate a vector store with embeddings generated from the legal documents' text chunks. The content most similar to a user's question is retrieved using the semantic search library FAISS [28, 29].

3. Experiment

The project has been carried out exploiting the computational resources of the supercomputer LEONARDO, hosted by CINECA. Each node in the booster partition is equipped with four NVIDIA A100 SXM 64GB GPUs and a single 32-core Intel Ice Lake CPU.

For all models, only data parallelism has been employed, since all these models fit adequately in the VRAM of the GPUs at our disposal. For the same reason, LLaMAntino-2-7b-hf-ITA and Mistral-7B-v0.1 were not quantized during domain adaptation and instruction fine-tuning, preserving the weights' precision. Mixtral-8x7B-Instruct-v0.1 instead underwent 4-bit quantization [30], due to its size. For domain adaptation and instruction fine-tuning, we applied LoRA adapters on the Q, K, V layers of the models [15]. The training procedure for the models under study was the following:

• causal language modelling of the pre-trained LLMs on the legal text chunks. This was performed on LLaMAntino-2-7b-hf-ITA and Mistral-7B-v0.1; each model needed, on average, 400 GPU hours (approximately 4 days on a single LEONARDO booster node) of training for four epochs;
• model alignment of the domain-adapted LLaMAntino-2-7b-hf-ITA and Mistral-7B-v0.1, and of the base pre-trained Mixtral-8x7B-Instruct-v0.1, on the QA dataset. This step required approximately 96 GPU hours, or 24 node hours (4 GPUs per node), to complete a 12-epoch training run on a single LEONARDO node.

4. Results

To evaluate their quality and select the candidates for instruction-based fine-tuning, all domain-adapted models were evaluated using the perplexity metric (PPL) on a held-out evaluation dataset based on laws. The metric is reported in Table 1.

Table 1
Perplexity for base and domain-adapted models under study.

Model                                      Perplexity
LLaMAntino-2-7b-hf-ITA                     5.2447
LLaMAntino-2-7b-hf-ITA domain-adapted      1.1789
Mistral-7B-v0.1                            5.2254
Mistral-7B-v0.1 domain-adapted             2.3618

Three domain experts (lawmakers of the Assemblea Legislativa) were asked to evaluate the answers generated by the final instruction fine-tuned models on a set of 25 questions. The qualitative analysis of the experts reported that, in general, the answers provided by the LLaMAntino-based model were considered too short and dry, while the answers provided by the Mixtral-based model were considered the most complete, clear and satisfactory in terms of the quality of the specific vocabulary used. Below we report an example of the answers provided by the different final models to a given question. For context, we also include the answer of ChatGPT (3.5) to the same question.

• Question: Sul tema della partecipazione, quali leggi sono state fatte in Emilia-Romagna?
• Answer of Mixtral-8x7B-Instruct-v0.1 fine-tuned: La prima legge regionale approvata in tema di partecipazione è la legge regionale 9 febbraio 2010, n. 3, intitolata "NORME PER LA DEFINIZIONE, RIORDINO E PROMOZIONE DELLE PROCEDURE DI CONSULTAZIONE E PARTECIPAZIONE ALLA ELABORAZIONE DELLE POLITICHE REGIONALI E LOCALI". Questa legge è stata successivamente abrogata e sostituita con la legge regionale 22 ottobre 2018, n. 15, denominata "LEGGE SULLA PARTECIPAZIONE ALL'ELABORAZIONE DELLE POLITICHE PUBBLICHE. ABROGAZIONE DELLA LEGGE REGIONALE 9 FEBBRAIO 2010, N. 3". La legge regionale attualmente in vigore in materia di partecipazione è la legge regionale 22 ottobre 2018, n. 15.
• Answer of LLaMAntino-2-7b-hf-ITA domain-adapted + instruction fine-tuned: - Legge Regionale 9 febbraio 2010, n. 3 - Legge Regionale 22 ottobre 2018, n. 15
• Answer of Mistral-7B-v0.1 domain-adapted + instruction fine-tuned: In relazione alla partecipazione, le leggi regionali sono state le seguenti: - Legge Regionale 3 del 2010 - Legge Regionale 15 del 2018
• Answer of ChatGPT-3.5: Non ho accesso a informazioni in tempo reale o a dati specifici riguardanti leggi promulgate dopo gennaio 2022. Tuttavia, posso fornirti un'informazione generale sul tema della partecipazione politica in Italia. In Italia, molte regioni hanno promulgato leggi e normative per favorire la partecipazione dei cittadini alla vita politica e amministrativa. Queste leggi possono riguardare vari aspetti, come la consultazione pubblica su decisioni amministrative, la promozione della trasparenza nelle istituzioni, o l'istituzione di forme di democrazia partecipativa a livello locale. Per quanto riguarda l'Emilia-Romagna, una delle regioni più attive nel promuovere la partecipazione cittadina, potrebbero essere state adottate diverse leggi e iniziative in questo ambito. Alcune di queste potrebbero riguardare la partecipazione ai processi decisionali locali, la promozione della trasparenza e dell'accesso alle informazioni pubbliche, o la creazione di strumenti e piattaforme per coinvolgere attivamente i cittadini nelle decisioni che li riguardano. Per ottenere informazioni specifiche sulle leggi relative alla partecipazione in Emilia-Romagna dopo il 2022, ti consiglio di consultare le fonti ufficiali della Regione Emilia-Romagna, come il sito web istituzionale o i comunicati stampa delle autorità regionali. In alternativa, potresti contattare direttamente gli uffici regionali competenti per ottenere informazioni aggiornate sulle leggi e le iniziative in materia di partecipazione politica e amministrativa.

5. Conclusions

We explored different approaches to adapt open-source LLMs for question-answering on the Emilia-Romagna law corpus. We adapted the different LLMs on a corpus composed of the Emilia-Romagna regional laws and the relative implementing acts, and we further refined the domain-adapted models on a custom QA dataset provided by domain experts. Finally, we exploited RAG to enrich the user's question with relevant contextual information extracted from the law database.

We experimented with different open-source LLMs, namely Mistral-7B-v0.1, LLaMAntino-2-7b-hf-ITA and Mixtral-8x7B-Instruct-v0.1. Our results show that domain-adapted LLMs able to answer specific domain questions can be a helpful tool to support decision-making in specialized fields such as the legal domain, which often need to retrieve exact, concise and easy-to-understand information from large and unstructured data sources.

Given the scope and length of the project, several improvements to the workflow are foreseen in the near future, as well as the possibility of testing more pre-trained open-source models, for example the new Italian-native models that will be developed in the near future, and the domain adaptation of Mixture-of-Experts models (such as Mixtral-8x7B-v0.1). Our future work will also focus on further improving the retrieval module with better embedding models, and on applying more powerful techniques to train the LLMs, such as Direct Preference Optimization (DPO, [31]) and Reinforcement Learning from Human Feedback (RLHF, [32]).

Acknowledgments

We are extremely grateful to the President of the Assemblea Legislativa Emilia-Romagna, Emma Petitti, for the far-sighted vision that created the conditions to launch the project, and to the Director General of the Assemblea Legislativa Emilia-Romagna, Leonardo Draghetti, for strategically setting up the project and ensuring the necessary human and material resources.

Besides, this endeavour would not have been possible without the commitment of the President of CINECA, Francesco Ubertini, and of the Director of the Supercomputing Applications and Innovation department of CINECA, Sanzio Bassini.

Special thanks go to Giovanna Favero of the Assemblea Legislativa Emilia-Romagna for her efforts in making available laws, implementing acts, as well as the related "ex-ante" and "ex-post" reports.

References

[1] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, et al., Language Models are Few-Shot Learners, arXiv:2005.14165 (2020).
[2] M. Chen, J. Tworek, H. Jun, Q. Yuan, et al., Evaluating Large Language Models Trained on Code, arXiv:2107.03374 (2021).
[3] OpenAI, J. Achiam, S. Adler, S. Agarwal, et al., GPT-4 Technical Report, arXiv:2303.08774 (2023).
[4] H. Touvron, T. Lavril, G. Izacard, X. Martinet, et al., LLaMA: Open and Efficient Foundation Language Models, arXiv:2302.13971 (2023).
[5] H. Touvron, L. Martin, K. Stone, P. Albert, et al., Llama 2: Open Foundation and Fine-Tuned Chat Models, arXiv:2307.09288 (2023).
[6] B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, et al., Code Llama: Open Foundation Models for Code, arXiv:2308.12950 (2023).
[7] D. Narayanan, M. Shoeybi, J. Casper, P. LeGresley, et al., Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM, arXiv:2104.04473 (2021).
[8] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, I. Sutskever, Zero-Shot Text-to-Image Generation, arXiv:2102.12092 (2021).
[9] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, M. Chen, Hierarchical Text-Conditional Image Generation with CLIP Latents, arXiv:2204.06125 (2022).
[10] Z. Shi, X. Zhou, X. Qiu, X. Zhu, Improving Image Captioning with Better Use of Captions, arXiv:2006.11807 (2020).
[11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-Resolution Image Synthesis with Latent Diffusion Models, arXiv:2112.10752 (2021).
[12] OpenAI, Video generation models as world simulators, Tech. rep., OpenAI (2024).
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention Is All You Need, arXiv:1706.03762 (2017).
[14] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-Rank Adaptation of Large Language Models, arXiv:2106.09685 (2021).
[15] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: Efficient Finetuning of Quantized LLMs, arXiv:2305.14314 (2023).
[16] Hugging Face, Hugging Face Datasets (2016). URL: https://huggingface.co/datasets
[17] OpenAI, Introducing ChatGPT, Tech. rep., OpenAI (2022).
[18] L. Huang, W. Yu, W. Ma, W. Zhong, et al., A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, arXiv:2311.05232 (2023).
[19] P. Lewis, E. Perez, A. Piktus, F. Petroni, et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, arXiv:2005.11401 (2020).
[20] S. Siriwardhana, R. Weerasekera, E. Wen, T. Kaluarachchi, et al., Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering, arXiv:2210.02627 (2022).
[21] P. Zhao, H. Zhang, Q. Yu, Z. Wang, et al., Retrieval-Augmented Generation for AI-Generated Content: A Survey, arXiv:2402.19473 (2024).
[22] P. Basile, E. Musacchio, M. Polignano, L. Siciliani, G. Fiameni, G. Semeraro, LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language, arXiv:2312.09993 (2023).
[23] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, et al., Mistral 7B, arXiv:2310.06825 (2023).
[24] Mistral AI team, Mixtral of experts, Tech. rep., Mistral AI (2023).
[25] E. Federici, M. Ferraretto, N. Landro, Gazzetta Ufficiale: A dataset of legislative texts, public and private acts (2024). URL: https://huggingface.co/datasets/mii-llm/gazzetta-ufficiale
[26] M. Liu, T.-D. Ene, R. Kirby, C. Cheng, et al., ChipNeMo: Domain-Adapted LLMs for Chip Design, arXiv:2311.00176 (2023).
[27] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, arXiv:1908.10084 (2019).
[28] J. Johnson, M. Douze, H. Jégou, Billion-scale similarity search with GPUs, IEEE Transactions on Big Data 7 (3) (2019) 535–547.
[29] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, H. Jégou, The Faiss library (2024). arXiv:2401.08281.
[30] Hugging Face, bitsandbytes, Tech. rep., Hugging Face (2023).
[31] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, C. Finn, Direct Preference Optimization: Your Language Model is Secretly a Reward Model, arXiv:2305.18290 (2023).
[32] L. Ouyang, J. Wu, X. Jiang, D. Almeida, et al., Training language models to follow instructions with human feedback, arXiv:2203.02155 (2022).