1. Introduction

SAVIA: Artificial Intelligence in support of the lawmaking process

Michele Visciarelli

m.visciarelli@cineca.it 1

Giovanni Guidi

g.guidi@cineca.it 1

Laura Morselli

l.morselli@cineca.it 1

Domitilla Brandoni

d.brandoni@cineca.it 1

Giuseppe Fiameni

gfiameni@nvidia.com 2

Luisa Monti

luisa.monti@regione.emilia-romagna.it 0

Stefano Bianchini

stefano.bianchini@regione.emilia-romagna.it 0

Cosimo Tommasi

cosimo.tommasi@regione.emilia-romagna.it 0 0 Assemblea Legislativa Emilia Romagna , viale Aldo Moro 50, Bologna, 41127 , Italy 1 CINECA , via Magnanelli 6/3, Casalecchio di Reno (B0), 40033 , Italy 2 NVIDIA AI Technology Center , Milan , Italy

We explore the use of open-source Large Language Models (LLMs) to support legal professionals, lawmakers, and citizens in accessing information on the current and past legislation of the Emilia-Romagna region. We develop a generative AI tool based on the Retrieval-Augmented Generation (RAG) technique to answer questions related to regional laws and their implementing acts, retrieving relevant information from the Emilia-Romagna law corpus. To adapt pre-trained LLMs to this downstream task, we follow a multi-step approach. First, we use the QLoRa technique to quantize and adapt the pre-trained LLMs to the regional legal text dataset. Next, we fine-tune the domain-adapted models using an "ad-hoc" instruction-based dataset. We then implement a module to retrieve relevant contextual information from the legal documents dataset. Finally, we align the models with domain-specific instructions using RAG-based prompting. We evaluate the performance of the domain-adapted models using the perplexity metric, and the results of the final fine-tuned models are assessed by domain experts, focusing on the quality of the generated text and the relevance of the answers. Our results show that domain adaptation on domain-specific text is a crucial step for enhancing the quality of the generated text in expert domains, such as legal texts, which contain a vast amount of specialized vocabulary and expressions. This approach leads to higher performance compared to models fine-tuned only on small Question-Answer datasets. Additionally, our findings highlight the importance of the retrieval module, which must be able to reliably find the most relevant documents to provide useful and up-to-date insights to lawmakers and citizens.

eol>Generative AI LLM Legal AI NLP

1. Introduction

• dificulty of maintaining up-to-date knowledge • costs of training and inference of large models, costs and dificulty to collect large amount of highquality domain-specific data • hallucinated answers, i.e. answers that provided false information without warning • out-of-date or generic answers, even when the user expects a specific, current response • Mistral-7B-v0.1: a 7B model, that implements grouped-query and sliding window attention, Rotary Position Embedding, that can handle context of arbitrary size [23]. • Mixtral-8x7B-Instruct-v0.1: a 46.7B mixture of experts model, trained on instructions in English, French, Italian, German and Spanish, with maximum context length of 32k [24].

Retrieval-Augmented Generation or RAG has recently

emerged as a paradigm to address such challenges [19].

In particular, RAG combines a language model with an information retrieval system to dynamically fetch relevant external information to enhance the model’s responses, by encoding the user’s question into a dense representation, and retrieving passages relevant to the Domain experts qualitatively evaluated the perforquestion from an indexed data source, adding this infor- mances of the final models obtained from the diferent mation to the LLM prompt. Diferent studies have shown pre-trained LLMs. that RAG enhances the quality of the generation process, leading to higher accuracy, better robustness, reduced 2.1. Unsupervised Domain-Adaptation hallucinations, higher interpretability, and even the possibility to perform open-domain QA just by updating The first step in the procedure was the domain-adaptation the knowledge-base [20, 21]. RAG also ofers a balanced of the model on legal text. We collected the PDFs of the approach in terms of customization and resource require- regional laws of Emilia-Romagna, as well as the relative ments, being more flexible and cost-efective than full implementing acts at the regional level (e.g. "atto del ifne-tuning, although still requiring labeled data and a dirigente" and "atto di giunta") and the available reports supervised training phase. on the expected and measured impact of a given law (e.g.

In this work we present SAVIA, a project developed by "clausola valutativa", "ex-ante and "ex-post" reports). We CINECA and Assemblea Legislativa of Emilia-Romagna. split the legal documents in chunks, and we implemented This project, started in Autumn 2023 and expected to end a cleaning pipeline to remove typos, bad characters, and in march 2025, has the goal of creating a model capable irrelevant parts of the documents such as headers and of answering questions on the Region’s law and their footers. We also added mii-llm/gazzetta-uficiale [ 25] in respective implementing acts, as well as on related "ex- the training dataset, given the afinity of this dataset to ante" and "ex-post" reports of the laws’ impact. In Section our application, both in language, semantics and type 2 we present the data used for this project and the work- of documents. We did not perform domain-adaptive tolfow that has been adopted. In Section 3, we describe the kenization [26], using instead the pre-trained models procedure and the details of the experiments and tests native tokenizers to tokenize the legal corpus. conducted, and in Section 4 we show the obtained results. Not all the three model under investigation underOur conclusion are then presented in Section 5. went domain adaptation. LLaMAntino-2-7b-hf-ITA and Mistral-7B-v0.1 were adapted, while Mixtral-8x7BInstruct-v0.1, after tests regarding its native capabilities 2. Methodology of producing adequate Italian legal text, has not been domain adapted.

2.2. Model Alignment on Instruction-Based dataset With the support of domain experts, we generated an

Q&A dataset mimicking diferent levels of domain language proficiency, ranging from questions that could by written by non-expert users, to the ones that may be asked by experts in the legal domain. We developed a semi-automatic procedure to further enrich this Q&A dataset, using legal documents metadata. The following is an example included in the instruction-based dataset: • Q: "Da quando è stata istituita la regione, quali normative sono state adottate per incentivare la partecipazione?" • A: "La prima legge regionale riguardante la partecipazione ad essere stata approvata è la legge numero 3 del 2010. In seguito, la legge numero 3

To obtain a model capable of understanding Italian lan

guage in the law domain and responding to questions related to laws enacted in the Emilia-Romagna region, we followed a multi-step approach. We started from an open-source LLM and adapted it to the legal language through unsupervised domain adaptation (Section 2.1). The resulting domain-adapted model was then fine-tuned for question-answering (Q&A) on an instruction-based dataset prepared by domain experts for this purpose (Section 2.2). Finally, we implemented a domain-adapted retrieval model (Section 2.3) to enrich the answers with relevant information from the law corpus.

The full workflow was reproduced starting from different open-source LLMs: • LLaMAntino-2-7b-hf-ITA: a 7B model, based on LLama-2, specifically fine-tuned for the Italian language [22]. del 2010 è stata abolita e sostituita con la legge regionale numero 15 del 2018.",

To fine-tune the domain-adapted LLMs, we used the

instruction-based dataset prepared by the domain experts. For the loss function computation, we removed the portion of the text containing the prompt, as in many cases the prompt added by the RAG module can account for up to 50% of the total text length. This approach helped to optimize the training process more efectively.

4. Results

2.3. Domain-Adapted Retrieval Model To evaluate the quality and proceed to select the candidate for instruction-based fine-tuning, all domainTo enrich the user’s question with relevant information adapted models were evaluated using the perplexity metfrom the legal documents database, we develop a retrieval ric (PPL) on an held-out evaluation dataset based on laws. module based on Semantic Similarity search technique. The metric is reported in Table 1.

We used a Sentence-BERT model [27] to populate a vec- Three diferent domain experts (lawmakers of Assemtor store with embedding generated from the legal docu- blea Legislativa) were asked to evaluate the answers genments text chunks. The content similar to a user’s ques- erated by the final instruction fine-tuned models to a set tion is retrieved using the semantic search library FAISS of 25 questions. The qualitative analysis of the experts [28, 29]. reported that, in general, the answers provided by the LLaMAntino-based model were considered too short and 3. Experiment dry. The answers provided by the Mixtral-based model were considered the most complete, clear and satisfactory The project has been carried out exploiting the compu- in terms of quality of the used specific words. Below we tational resources of the supercomputer LEONARDO, report an example of answers provided by the diferent hosted by CINECA. Each node in the booster partition is ifnal models on a given question. For context, we also equipped with four NVidia A100 SXM6 64GB GPUs and include the answer of chatGPT (3.5) to the same question. a single 32-cores Intel Ice Lake CPU.

For all models, only data parallelism has been employed, given the fact that all these models could adequately fit in the VRAM of the GPUs at our disposal. For the same reason, LLaMAntino-2-7b-hf-ITA and Mistral7B-v0.1 have not been quantized during domain adaptation and instruction fine-tuning, opting to preserve the weights’ precision. Mixtral-8x7B-Instruct-v0.1 underwent 4-bit quantization instead [30], due to its size.

For domain adaptation and instruction fine-tuning, we applied LoRA adapters on Q, K, V layers of the models [15]. The training procedure for the models under study were the following: • pre-trained LLMs causal Language Modelling on the legal text chunks. This has been performed on LLaMAntino-2-7b-hf-ITA and Mistral7B-v0.1 and needed for each model, on average, 400 GPU hours (approximately 4 days on a single LEONARDO booster node) training for four epochs; • model alignment on domain adapted

LLaMAntino-2-7b-hf-ITA and Mistral-7B-v0.1, and on base pre-trained Mixtral-8x7B-Instructv0.1, on the QA dataset. This step required approximately 96 GPU hours, or 24 node hours (4 GPUs per node), to complete a 12 epochs training run, on a single LEONARDO node.

• Question: Sul tema della partecipazione, quali

leggi sono state fatte in Emilia-Romagna? • Answer of Mixtral-8x7B-Instruct-v0.1 fine-tuned :

La prima legge regionale approvata in tema di partecipazione è la legge regionale 9 febbraio 2010, n. 3, intitolata "NORME PER LA DEFINIZIONE, RIORDINO E PROMOZIONE DELLE PROCEDURE DI CONSULTAZIONE E PARTECIPAZIONE ALLA ELABORAZIONE DELLE POLITICHE REGIONALI E LOCALI".

Questa legge è stata successivamente abrogata e sostituita con la legge regionale 22 ottobre 2018, n. 15, denominata "LEGGE SULLA PARTECIPAZIONE ALL’ELABORAZIONE DELLE POLITICHE PUBBLICHE. ABROGAZIONE DELLA LEGGE REGIONALE 9 FEBBRAIO 2010, N. 3". La legge regionale attualmente in vigore in materia di partecipazione è la legge regionale 22 ottobre 2018, n. 15. • Answer of LLaMAntino-2-7b-hf-ITA domainadapted + instruction fine-tuned : - Legge Regionale 9 febbraio 2010, n. 3 - Legge Regionale 22 ottobre 2018, n. 15 • Answer of Mistral-7B-v0.1 domain-adapted + instruction fine-tuned : In relazione alla partecipazione, le leggi regionali sono state le seguenti: - Legge Regionale 3 del 2010 - Legge Regionale 15 sources.

del 2018 Given the scope and the length of the project, sev• Answer of chatGPT-3.5: Non ho accesso a in- eral improvements to the workflow are foreseen in the formazioni in tempo reale o a dati specifici near future, as well as the possibility to test with more riguardanti leggi promulgate dopo gennaio 2022. pre-trained open source models, for example new ItalianTuttavia, posso fornirti un’informazione generale native models that will be developed in the near future sul tema della partecipazione politica in Italia. In and domain adaptation of Mixture-of-Experts models Italia, molte regioni hanno promulgato leggi e (such as Mixtral-8x7B-v0.1). Our future work will also normative per favorire la partecipazione dei cit- focus on further improving the retrieval module with bettadini alla vita politica e amministrativa. Queste ter embedding models, and on applying more powerful leggi possono riguardare vari aspetti, come la con- techniques to train the LLMs, such as Direct Preference sultazione pubblica su decisioni amministrative, Optimization (DPO, [31]) and Reinforcement Learning la promozione della trasparenza nelle istituzioni, from Human Feedback (RLHF, [32]). o l’istituzione di forme di democrazia partecipativa a livello locale. Per quanto riguarda l’Emilia- Acknowledgments Romagna, una delle regioni più attive nel promuovere la partecipazione cittadina, potrebbero essere We are extremely grateful to the President of Assemstate adottate diverse leggi e iniziative in questo blea Legislativa Emilia-Romagna, Emma Petitti, for the ambito. Alcune di queste potrebbero riguardare long-eyed vision that created the conditions to launch la partecipazione ai processi decisionali locali, la the project – and to the Director General of Assemblea promozione della trasparenza e dell’accesso alle Legislativa Emilia-Romagna, Leonardo Draghetti, to set informazioni pubbliche, o la creazione di stru- strategically the project and ensure the necessary human menti e piattaforme per coinvolgere attivamente and material resources. i cittadini nelle decisioni che li riguardano. Per Besides, this endeavour would not have been possible ottenere informazioni specifiche sulle leggi rela- without the commitment of the President of CINECA, tive alla partecipazione in Emilia-Romagna dopo Francesco Ubertini, and of the Director of the Suil 2022, ti consiglio di consultare le fonti uficiali percomupting applications and innovation Director of della Regione Emilia-Romagna, come il sito web CINECA, Sanzio Bassini. istituzionale o i comunicati stampa delle autorità Special thanks goes to Giovanna Favero of Assemblea regionali. In alternativa, potresti contattare di- Legislativa Emilia-Romagna for her eforts in making rettamente gli ufici regionali competenti per ot- available laws, implementing acts, as well as related ”extenere informazioni aggiornate sulle leggi e le ante” and ”ex-post” reports. iniziative in materia di partecipazione politica e amministrativa.

5. Conclusions We explored diferent approaches to adapt open-source

LLMs for question-answering on the Emilia-Romagna law corpus. We adapted the diferent LLMs on a corpus composed of the Emilia-Romagna regional laws and the relative implementing acts, and we further refined the domain-adapted models on a custom QA dataset provided by domain experts. Finally, we exploited RAG to enrich the user’s question with relevant contextual information extracted from the law database.

We experimented with diferent open-source LLMs, such as Mistral-7B-v0.1, LLaMAntino-2-7b-hf-ITA, Mixtral-8x7B-Instruct-v0.1. Our results show that domain-adapted LLMs that are able to answer specific domain questions can be a helpful tool to support decisionmaking in specialized fields such as the legal domain, that often need to retrieve exact, concise and easy-tounderstand information from large and unstructured data [5] H. Touvron, L. Martin, K. Stone, P. Al- [17] OPENAI, Introducing chatgpt, Tech. rep., OpenAI bert, et al., Llama 2: Open Foundation and (2022).

Fine-Tuned Chat Models, arXiv e-prints [18] L. Huang, W. Yu, W. Ma, W. Zhong, et (2023) arXiv:2307.09288arXiv:2307.09288, al., A Survey on Hallucination in Large Landoi:10.48550/arXiv.2307.09288. guage Models: Principles, Taxonomy, Chal[6] B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, lenges, and Open Questions, arXiv e-prints et al., Code Llama: Open Foundation (2023) arXiv:2311.05232arXiv:2311.05232, doi: Models for Code, arXiv e-prints (2023) 10.48550/arXiv.2311.05232. arXiv:2308.12950arXiv:2308.12950, [19] P. Lewis, E. Perez, A. Piktus, F. Petroni, doi:10.48550/arXiv.2308.12950. et al., Retrieval-Augmented Generation for [7] D. Narayanan, M. Shoeybi, J. Casper, Knowledge-Intensive NLP Tasks, arXiv e-prints P. LeGresley, et al., Eficient Large-Scale (2020) arXiv:2005.11401arXiv:2005.11401, Language Model Training on GPU Clus- doi:10.48550/arXiv.2005.11401. ters Using Megatron-LM, arXiv e-prints [20] S. Siriwardhana, R. Weerasekera, E. Wen, (2021) arXiv:2104.04473arXiv:2104.04473, T. Kaluarachchi, et al., Improving the Dodoi:10.48550/arXiv.2104.04473. main Adaptation of Retrieval Augmented [8] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, Generation (RAG) Models for Open DoA. Radford, M. Chen, I. Sutskever, Zero-Shot main Question Answering, arXiv e-prints Text-to-Image Generation, arXiv e-prints (2022) arXiv:2210.02627arXiv:2210.02627, (2021) arXiv:2102.12092arXiv:2102.12092, doi:10.48550/arXiv.2210.02627. doi:10.48550/arXiv.2102.12092. [21] P. Zhao, H. Zhang, Q. Yu, Z. Wang, et al., [9] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, Retrieval-Augmented Generation for AIM. Chen, Hierarchical Text-Conditional Image Generated Content: A Survey, arXiv e-prints Generation with CLIP Latents, arXiv e-prints (2024) arXiv:2402.19473arXiv:2402.19473, (2022) arXiv:2204.06125arXiv:2204.06125, doi: doi:10.48550/arXiv.2402.19473. 10.48550/arXiv.2204.06125. [22] P. Basile, E. Musacchio, M. Polignano, L. Sicil[10] Z. Shi, X. Zhou, X. Qiu, X. Zhu, Im- iani, G. Fiameni, G. Semeraro, LLaMAntino: proving Image Captioning with Better LLaMA 2 Models for Efective Text GenUse of Captions, arXiv e-prints (2020) eration in Italian Language, arXiv e-prints arXiv:2006.11807arXiv:2006.11807, (2023) arXiv:2312.09993arXiv:2312.09993, doi:10.48550/arXiv.2006.11807. doi:10.48550/arXiv.2312.09993. [11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, [23] A. Q. Jiang, A. Sablayrolles, A. Mensch, B. Ommer, High-Resolution Image Synthesis C. Bamford, et al., Mistral 7B, arXiv e-prints with Latent Difusion Models, arXiv e-prints (2023) arXiv:2310.06825arXiv:2310.06825, (2021) arXiv:2112.10752arXiv:2112.10752, doi: doi:10.48550/arXiv.2310.06825. 10.48550/arXiv.2112.10752. [24] M. A. team, Mixtral of experts, Tech. rep., Mistral [12] OpenAI, Video generation models as world simula- AI (2023).

tors, Tech. rep., OpenAI (2024). [25] E. Federici, M. Ferraretto, N. Landro, Gazzetta [13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkor- Uficiale: A dataset of legislative texts, public and eit, L. Jones, A. N. Gomez, L. Kaiser, I. Polo- private acts (2024). sukhin, Attention Is All You Need, arXiv e-prints URL https://huggingface.co/datasets/mii-llm/ (2017) arXiv:1706.03762arXiv:1706.03762, doi: gazzetta-uficiale 10.48550/arXiv.1706.03762. [26] M. Liu, T.-D. Ene, R. Kirby, C. Cheng, et [14] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, al., ChipNeMo: Domain-Adapted LLMs S. Wang, L. Wang, W. Chen, LoRA: Low-Rank Adap- for Chip Design, arXiv e-prints (2023) tation of Large Language Models, arXiv e-prints arXiv:2311.00176arXiv:2311.00176, (2021) arXiv:2106.09685arXiv:2106.09685, doi: doi:10.48550/arXiv.2311.00176. 10.48550/arXiv.2106.09685. [27] N. Reimers, I. Gurevych, Sentence-BERT: [15] T. Dettmers, A. Pagnoni, A. Holtzman, Sentence Embeddings using Siamese L. Zettlemoyer, QLoRA: Eficient Finetun- BERT-Networks, arXiv e-prints (2019) ing of Quantized LLMs, arXiv e-prints arXiv:1908.10084arXiv:1908.10084, (2023) arXiv:2305.14314arXiv:2305.14314, doi:10.48550/arXiv.1908.10084. doi:10.48550/arXiv.2305.14314. [28] J. Johnson, M. Douze, H. Jégou, Billion-scale simi[16] H. Face, Hugging face datasets (2016). larity search with GPUs, IEEE Transactions on Big

URL https://huggingface.co/datasets Data 7 ( 3 ) (2019) 535–547. [29] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, H. Jégou,

The faiss library (2024). arXiv:2401.08281. [30] H. Face, bitsandbytes, Tech. rep., Hugging Face

(2023). [31] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, C. Finn, Direct Preference Optimization: Your Language Model is Secretly a Reward Model, arXiv e-prints (2023) arXiv:2305.18290arXiv:2305.18290, doi: 10.48550/arXiv.2305.18290. [32] L. Ouyang, J. Wu, X. Jiang, D. Almeida, et al., Training language models to follow instructions with human feedback, arXiv e-prints (2022) arXiv:2203.02155arXiv:2203.02155, doi: 10.48550/arXiv.2203.02155.

[1] T. B. Brown , B.

Mann , N.

Ryder , M.

Subbiah , et al., Language Models are FewShot Learners , arXiv e-prints ( 2020 ) arXiv: 2005 .14165arXiv: 2005 .14165, doi:10.48550/arXiv. 2005 . 14165 .

[2]

Chen ,

Tworek ,

Jun ,

Yuan , et l., Evaluating Large Language Models Trained on Code , arXiv e-prints ( 2021 ) arXiv: 2107 .03374arXiv: 2107 .03374, doi:10.48550/arXiv.2107.03374.

[3] OpenAI , J.

Achiam , S.

Adler , S.

Agarwal , et al., GPT-4 Technical Report , arXiv e-prints ( 2023 ) arXiv: 2303 .08774arXiv: 2303 .08774, doi: 10.48550/arXiv.2303.08774.

[4]

Touvron ,

Lavril ,

Izacard ,

Martinet , et al., LLaMA: Open and Eficient Foundation Language Models , arXiv e-prints ( 2023 ) arXiv: 2302 .13971arXiv: 2302 .13971, doi:10.48550/arXiv.2302.13971.