<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Daniele</forename><surname>Licari</surname></persName>
							<email>d.licari@santannapisa.it</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">EMbeDS</orgName>
								<orgName type="department" key="dep2">Sant&apos;Anna School of Advanced Studies</orgName>
								<address>
									<postCode>56127</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giovanni</forename><surname>Comandè</surname></persName>
							<email>g.comande@santannapisa.it</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">EMbeDS</orgName>
								<orgName type="department" key="dep2">Sant&apos;Anna School of Advanced Studies</orgName>
								<address>
									<postCode>56127</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">6C281082B440B7B089AE7BB2B0B2D273</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T04:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Legal artificial intelligence</term>
					<term>Pre-trained language model</term>
					<term>Italian Legal BERT</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The state of the art in natural language processing is based on transformer models that are pre-trained on general knowledge and enable efficient transfer learning in a wide variety of downstream tasks, even with limited data sets. However, these models suffer a significant performance decrease when operating in specific and sectoral domains. This is problematic in the Italian legal context, as there are many discrepancies between the language found in generic open-source corpora (e.g., Wikipedia and news articles) and legal language, which can be cryptic, Latin-laden, and full of domain-specific idiolectal formulas.</p><p>In this paper, we introduce the ITALIAN-LEGAL-BERT model, obtained by additional pre-training of the Italian BERT model on Italian civil law corpora. It achieves better results than the 'general-purpose' Italian BERT in different domain-specific tasks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In many domains, specialized models have performed better than models pre-trained on general domains <ref type="bibr">[1,</ref><ref type="bibr">2,</ref><ref type="bibr">3,</ref><ref type="bibr">4,</ref><ref type="bibr">5]</ref>. In general, the more semantically distant a domain-specific language is from the common language, the greater the advantages of using specialized models, especially in complex tasks.</p><p>In the Italian legal context, the discrepancy between specialized and general language is even more pronounced. The Italian legal language has its unavoidable complexity, like all technical languages, but it is made even more obscure by needless stylistic expedients that often force a continuity with the languages of the past (Latin or old Italian). The full understanding of judicial texts is the exclusive prerogative of domain experts. Legal language contains technicalities with specific and unambiguous meanings ("contumacia", "anticresi", "anatocismo", "sinallagma"). It also makes extensive use of terms in general use that are often employed with their own specific meanings, if not meanings entirely different from those in common use. For example, "nullità", "annullabilità", "inefficacia", and "inutilizzabilità", which outside of legal language are synonyms of annulment, denote entirely distinct concepts and situations. Locutions such as "buon padre di famiglia" (good family man) and "possessore di buona fede" (possessor in good faith) indicate concepts different from those of common use <ref type="bibr" target="#b29">[6]</ref>.</p><p>Chalkidis et al. <ref type="bibr" target="#b30">[7]</ref> developed the first transformer-based model for the English legal domain (LEGAL-BERT), improving the performance of the general-purpose model (BERT-BASE) in several prediction tasks. 
The basic idea is that a model with legal domain knowledge can classify legal documents better than a model with only general knowledge.</p><p>Taking inspiration from LEGAL-BERT, we report on the development of the ITALIAN-LEGAL-BERT model, capable of understanding the semantic meaning of Italian legal texts, obtained by additional pre-training of ITALIAN XXL BERT (available on the Hugging Face hub <ref type="bibr" target="#b31">[8]</ref>) on Italian civil law corpora.</p><p>In this work, we make the following contributions:</p><p>1. We publicly release <ref type="foot" target="#foot_0">1</ref> ITALIAN-LEGAL-BERT to assist Italian legal NLP research. It is, to the best of our knowledge, the first pre-trained language model further trained on a large corpus of Italian civil cases. 2. We demonstrate that ITALIAN-LEGAL-BERT outperforms its generalized equivalent in terms of perplexity (PPL) and end results in downstream tasks such as sequence classification, semantic similarity, and named entity recognition in the Italian legal domain. 3. We also evaluated the model on anonymized datasets to explore whether it is biased toward demographic information and personal data.</p></div>
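Assuming the released checkpoint is published on the Hugging Face hub, querying it might look like the following sketch; the model id "dlicari/Italian-Legal-BERT" is our assumption of the published name, not confirmed by this paper, and should be replaced with the actual hub id.

```python
def format_predictions(results, k=5):
    """Reduce fill-mask pipeline output (a list of dicts with "token_str"
    and "score" keys) to (token, probability) pairs for easy inspection."""
    return [(r["token_str"].strip(), round(r["score"], 4)) for r in results[:k]]

def legal_fill_mask(text, model_id="dlicari/Italian-Legal-BERT", k=5):
    """Run the Hugging Face fill-mask pipeline on a [MASK]-containing
    sentence; downloads the checkpoint on first use (hypothetical id)."""
    from transformers import pipeline
    fill = pipeline("fill-mask", model=model_id)
    return format_predictions(fill(text), k)
```

For example, `legal_fill_mask("Il padre può vedere il figlio a [MASK] alternati")` would return the model's top five candidates for the masked word, as in Table 2.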
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Legal writing differs greatly from generic text, with many domain-specific peculiarities. Several researchers have demonstrated that domain-specific pre-trained models can improve the performance of downstream tasks in the legal domain. Chalkidis et al. <ref type="bibr" target="#b30">[7]</ref> proposed the LEGAL-BERT model, pre-trained from scratch on 11.5 GB of legal texts, and a variant obtained by further pre-training BERT-BASE on legal corpora. Their experiments indicated more substantial improvements in the most challenging end-tasks (i.e., multi-label classification in ECHR-CASES and contract header and lease details in CONTRACTS-NER), where in-domain knowledge is more important. In addition, no significant differences in performance were found between the two LEGAL-BERT variants.</p><p>Similar evidence is reported by Zheng et al. <ref type="bibr" target="#b32">[9]</ref>. They also trained LEGAL-BERT models, both with additional pre-training from BERT-BASE and with pre-training from scratch, using a 37 GB legal text collection. They compared their LEGAL-BERT and BERT-BASE models on downstream NLP tasks of varying difficulty and domain specificity, and they suggest using domain-specific pre-trained models for highly difficult legal tasks. Their models performed better than BERT-BASE in complex downstream tasks, such as identifying whether contract terms are potentially unfair <ref type="bibr" target="#b33">[10]</ref>. In contrast, additional domain pre-training adds little value over BERT in simpler tasks.</p><p>The recent works of Zhang et al. <ref type="bibr" target="#b34">[11,</ref><ref type="bibr" target="#b35">12]</ref> on legal argument mining confirm this trend: domain-specific BERT variants have achieved strong performance in many tasks. 
No significant differences were found between the two methods of domain adaptation.</p><p>Success in this area encouraged researchers to create pre-trained language models on legal corpora in different languages <ref type="bibr" target="#b36">[13]</ref>. Masala et al. <ref type="bibr" target="#b37">[14]</ref> released the jurBERT model, pre-trained on a large Romanian legal corpus, which outperformed several strong baselines for legal judgment prediction. In the same year, Douka et al. <ref type="bibr" target="#b38">[15]</ref> created a language model adapted to French legal text, demonstrating that their model works better in the French legal domain than its generalized equivalents. In China, researchers <ref type="bibr" target="#b39">[16]</ref> have improved many predictive tasks on long Chinese legal documents through a language model pre-trained on millions of documents published by the Chinese government.</p><p>In Italy, Tagarelli and Simeri <ref type="bibr" target="#b40">[17]</ref> proposed the LamBERTa models for retrieving law articles, developed by further pre-training BERT on the Italian civil code (ICC, a few megabytes of data). Their models outperformed pre-BERT text classification models (BiLSTM, TextCNN, TextRCNN, Seq2Seq, Transformer) on prediction tasks over ICC articles. Unfortunately, they did not provide a direct comparison with the Italian BERT model on which the domain adaptation was performed, so it was not possible to evaluate the advantages of domain adaptation over the equivalent generalized model in the reported downstream tasks.</p><p>The work cited above differs from ours in terms of the reference corpus, problems addressed, and analysis of results. First, our model was trained on a large collection of decrees, ordinances, and judgments of Italian courts. 
These may include, in addition to the cited laws of the civil code, the judge's reasons, facts, decisions, proposals of the parties, medico-legal information, legal rules, verified evidence, witness statements, etc. Second, this variety of information and the size of the training dataset allowed us to create a language model that better represents the Italian legal context by capturing the complex semantic interactions between facts, reasons, and laws. Therefore, our model can be applied to more complex general tasks, such as identifying rhetorical roles, retrieving similar cases, argument mining, legal reading comprehension, and legal question answering. Third, our analysis focused on directly comparing the generalized Italian BERT model and the domain-adapted ITALIAN-LEGAL-BERT model, to assess the improvements achieved in several downstream tasks. Finally, our model is shared on the Hugging Face platform to maximize usability and make a concrete contribution to the growth of NLP applications in the Italian legal context.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Italian Legal BERT</head><p>Background. BERT (Bidirectional Encoder Representations from Transformers <ref type="bibr" target="#b41">[18]</ref>) is a contextual word embedding model based on the transformer architecture <ref type="bibr" target="#b42">[19]</ref> that creates a context-sensitive embedding for each word in a given sentence, which can then be used for downstream tasks. BERT can be embedded in a downstream task and developed as a task-specific integrated architecture.</p><p>Italian BERT. The Italian XXL BERT model (cased, 12-layer, 768-hidden, 12-heads, 110M parameters) has the Bidirectional Encoder Representations from Transformers architecture and was trained on a large Italian corpus (81 GB) derived from the Italian Wikipedia, various texts from the OPUS corpora collection (opus.nlpl.eu), and data from the Italian part of the OSCAR corpus (oscar-corpus.com). It is available on the Hugging Face model hub <ref type="bibr" target="#b31">[8]</ref> and was trained by the MDZ Digital Library team at the Bavarian State Library.</p><p>Training procedure. We initialized ITALIAN-LEGAL-BERT with ITALIAN XXL BERT and pre-trained it for an additional 4 epochs on 3.7 GB of text from the National Jurisprudential Archive using the Hugging Face PyTorch-Transformers library <ref type="bibr" target="#b31">[8]</ref>. We used the BERT architecture with a language modeling head on top, the AdamW optimizer, an initial learning rate of 5e-5 (with linear learning rate decay, ending at 2.525e-9), sequence length 512, batch size 10 (imposed by GPU capacity), and 8.4 million training steps, on 1 GPU (V100, 16 GB). More details on the hyperparameters we considered for each training phase can be found in the appendix.</p><p>Training Dataset. The National Jurisprudential Archive (Archivio Giurisprudenziale Nazionale, pst.giustizia.it) is a public repository containing millions of legal documents (decrees, orders, and civil judgments) from Italian courts and courts of appeal. 
We downloaded about 235,000 documents as PDF files. The documents were converted to plain text using the Tika framework <ref type="bibr" target="#b43">[20]</ref>.</p><p>Preprocessing Dataset. We preprocessed the case law corpus with several cleaning functions. We compacted whitespace and newlines using a regular expression. The sentence segmentation process was customized by adding new tokenization rules to the spaCy model for the Italian language; the added exceptions concern abbreviations and acronyms used in Italian legal texts <ref type="foot" target="#foot_1">2</ref>. Segmented sentences were cleaned up by removing all special characters through an additional regular expression rule. The final corpus contains 21,004,500 sentences and 498,002,402 words (3.7 GB). The final model input was created by applying the Italian BERT tokenizer to the corpus sentences, truncating them to the maximum length (512 tokens).</p><p>Evaluation Dataset. We downloaded an additional 20,000 civil cases from the National Jurisprudential Archive. We applied the same preprocessing procedure as for the training set to create a corpus containing 566,000 sentences and 17,936,466 words. In order to evaluate performance in the criminal context, we also downloaded 21,000 criminal cases from italgiureweb (italgiure.giustizia.it), a corpus containing 702,677 sentences and 20,164,194 words. Finally, we applied random masking (15% of tokens) to the sentences in both datasets.</p><p>MLM Evaluation. Perplexity (PPL) is one of the most common metrics for evaluating language models. It is the exponential of the cross-entropy loss; a lower perplexity indicates a better model. 
The perplexity for the MLM objective is computed by making predictions for the masked tokens (which represent 15% of the total here) while having access to the rest of the tokens.</p><p>The results in Table <ref type="table" target="#tab_0">1</ref> show that ITALIAN-LEGAL-BERT reduced perplexity by 18.2% on civil cases and by 15.4% on criminal cases with respect to Italian XXL BERT. The lower perplexity scores on criminal cases could indicate greater use of commonly used notions than in civil cases.</p><p>Fill Mask. A further qualitative investigation was conducted by asking judges for some domain sentences and making an inference about a masked word contained in each sentence. We used the mask filling pipeline of the Hugging Face Transformers library to return the top 5 suggestions for the masked word. Table 2 reports the results of the Italian BERT and ITALIAN-LEGAL-BERT models; the strikethrough words have been masked to be predicted by the models.</p><p>This analysis helps us to better study the implicit knowledge that the ITALIAN-LEGAL-BERT model has accumulated during pre-training. As can be seen in Table <ref type="table" target="#tab_1">2</ref>, the correct word always appears in the top three in the inference made with ITALIAN-LEGAL-BERT, indicating that our model captures the specific context better than the general model.</p></div>
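Concretely, the PPL reported above is the exponential of the average cross-entropy over the masked positions only. A minimal sketch in plain Python, assuming the per-token negative log-likelihoods have already been extracted from the model's logits at the masked positions:

```python
import math

def mlm_perplexity(masked_token_nlls):
    """Perplexity of a masked language model: the exponential of the mean
    negative log-likelihood over the masked positions (15% of tokens in
    this setup); the remaining tokens stay visible to the model."""
    return math.exp(sum(masked_token_nlls) / len(masked_token_nlls))
```

For instance, a model that assigns each masked token a probability of 1/9 has `mlm_perplexity` equal to 9; a perfect model (probability 1 everywhere) has perplexity 1, the lower bound.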
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Downstream evaluation task</head><p>The Italian BERT and ITALIAN-LEGAL-BERT models were evaluated and compared on three domain-specific downstream tasks. In the first task, we trained the models with an additional sequence tagging layer on top, using spaCy <ref type="bibr" target="#b44">[21]</ref>, to recognize the names/roles of the actors involved in a trial. For the second task, we trained the models with a sequence classification head (a linear layer on top of the pooled output) for the classification of sentence type. In the last downstream task, we tested the models on textual semantic similarity using sentence embeddings (mean pooling on the last layer of the models) and cosine similarity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Named Entity Recognition</head><p>We trained and evaluated the ITALIAN-LEGAL-BERT and Italian BERT models on a Named Entity Recognition (NER) task to identify named entities by the type of person found in the text of judgments. We defined 7 entity types, as shown in Table 3.</p><p>Dataset. We selected 118 judgments from the civil law database of the Court of Genoa, with which we have a scientific collaboration agreement. Given the significant experience of our research group on these issues, the selected judgments are all the personal injury judgments (no. 59) contained in the database, plus an equal number of family judgments selected stratified by text length. Next, we converted the PDF files to plain text using Tika <ref type="bibr" target="#b43">[20]</ref>, applied some text cleaning functions (removal of multiple blank lines and extra spaces), and converted the texts to an annotatable data structure (jsonl format) to import them into the Doccano annotation tool <ref type="bibr" target="#b45">[22]</ref>. We set up and used the Doccano tool for quick and easy manual annotation of texts with the 7 predefined entities. The experts found and annotated 6,355 entities; Table <ref type="table" target="#tab_2">3</ref> shows the distribution of entities in the dataset. Finally, the dataset was split 80% for model training (with 10% of the training set for validation) and 20% for model evaluation, in a stratified fashion to preserve the distribution of entities in the two subsets.</p><p>Model architecture. We created our NER models using spaCy's v3.2 Named Entity Recognition system <ref type="bibr" target="#b44">[21]</ref>. The model architecture consists of a two-tier pipeline: the contextual embedding layer and the transition-based chunking model <ref type="bibr" target="#b46">[23]</ref>. The first uses pre-trained language models to encode tokens into continuous vectors based on their context. 
The second predicts text structure by mapping it onto a set of state transitions. It uses the output (contextual word embeddings) from the previous step to incrementally construct states from the input sequence and assign them entity labels using a multilayer neural network. We trained and compared two spaCy-based entity recognition pipelines, using Italian BERT and ITALIAN-LEGAL-BERT as the contextual embedding layer.</p><p>Training procedure. We trained two named entity recognition pipelines, Italian BERT + spaCy's NER and ITALIAN-LEGAL-BERT + spaCy's NER, using the AdamW optimizer, an initial learning rate of 5e-5 (with linear decay), a maximum of 20,000 steps, 250 warm-up steps, early stopping with patience on the validation F1 score, and batch size 128 (see Table <ref type="table" target="#tab_0">10</ref> in the Appendix for more details).</p><p>Evaluation. We compared the two NER pipelines using the exact match criterion against gold-standard entities (both entity boundary and type must be correct) on the test set. Precision, recall, and F-score were used to evaluate and compare performance. The results in Table <ref type="table" target="#tab_3">4</ref> show that the NER pipeline with the ITALIAN-LEGAL-BERT contextual encoder outperforms the one with Italian BERT in recognizing most entities. </p></div>
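The exact-match scoring used in the evaluation above can be sketched as follows; this is a minimal illustration in which entities are (start, end, label) character spans, and the labels used in the test are hypothetical placeholders rather than the paper's 7 types.

```python
def exact_match_prf(gold, predicted):
    """Micro precision/recall/F1 under the exact-match criterion: a
    predicted entity counts as correct only if both its boundaries and its
    type equal a gold entity. Entities are (start, end, label) triples."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set.intersection(pred_set))  # exact span-plus-label matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    # tp greater than 0 guarantees precision + recall is nonzero
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

A prediction with the right boundaries but the wrong type, or the right type on shifted boundaries, scores as both a false positive and a false negative, which is why exact match is a strict criterion.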
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Sentence Classification</head><p>Unlike in the English legal context, there are no public datasets on which to test models on downstream NLP tasks in the Italian legal context. Therefore, we created a new benchmark dataset for sentence classification tasks. A common civil judgment has 5 basic parts:</p><p>1. INTRODUCTION: an indication of the judge who pronounced it; an indication of the parties and their lawyers; 2. CONCLUSION OF THE PARTIES: the conclusions of the prosecutor (if any) and those of the parties; 3. DEVELOPMENT OF THE TRIAL: summary of the appealed judgment and reasons of appeal; 4. REASON: the concise statement of the factual and legal reasons for the decision (the statement of reasons); 5. CONCLUSION: the decisional content of the judgment.</p><p>We want to evaluate the ITALIAN-LEGAL-BERT model on a sentence classification task by predicting the section to which a sentence belongs. Although this downstream task was created as a benchmark, it could have practical utility, because Italian judgments do not follow a precise standard: sections are often merged or identified by a variety of headers, making it difficult to apply rules based on regular expressions.</p><p>Benchmark Dataset. We randomly selected 6,190 sentences from documents with 5 sections (identified using regular expressions) from the Italian Civil Law DB (pst.giustizia.it), stratified on section length (Table <ref type="table" target="#tab_4">5</ref>). Finally, the dataset was split 80% for model training and 20% for model evaluation, in a stratified fashion on the section name to preserve the distribution of sentences across both subsets. The training set was further divided, using 10% of it for validation.</p><p>Training procedure. 
We trained Italian BERT and ITALIAN-LEGAL-BERT models with a sequence classification head on top (a linear layer on top of the pooled output), using the same hyperparameter configuration for both (Table <ref type="table" target="#tab_0">11</ref> in the Appendix). The final models were trained to the best epoch, i.e., the one with the highest validation MCC (Matthews correlation coefficient) score in the range of 1 to 7 epochs (5 was the best epoch for Italian BERT and 3 for ITALIAN-LEGAL-BERT).</p><p>Evaluation. We compared the results on the test set of the two models, Italian BERT and ITALIAN-LEGAL-BERT, trained with the same configuration (Table <ref type="table" target="#tab_0">11</ref>). The models' performance was evaluated with the macro F1 and MCC scores on the test set. The results in Table <ref type="table" target="#tab_5">6</ref> show that the model pre-trained on the Italian legal domain (0.89 F1, 0.83 MCC) outperforms the "general-purpose" model (0.869 F1, 0.806 MCC) in this sentence classification task.</p><p>Model Bias. Similar to Chalkidis et al. <ref type="bibr" target="#b47">[24]</ref>, we investigated how sensitive our model is to personal data. The main information may concern "parties", "witnesses", "important companies", "identifiers", "dates", or "places". The purpose is to understand whether the model overfits on these data and makes decisions based on demographic and personal information: e.g., 'Mario Rossi' is a judge, therefore it is a 'Decision' sentence, or 'Daniele Licari' is a defendant, therefore it is a 'Conclusion of the parties' sentence. The following experiments focused on the sensitivity of our models to such information by training and evaluating the models on an anonymized version of the dataset.</p><p>To recognize the entities to be anonymized, we used the model from our previous work <ref type="bibr" target="#b48">[25]</ref>, based on pre-trained Transformer embeddings and the transition-based chunking model of spaCy. 
It found 6,393 entities to be anonymized in the dataset (6,190 sentences). We applied two different anonymization strategies: OMISSIS and TAGGING. The OMISSIS strategy replaces named entities with a fixed value (e.g., "Daniele lives in Milan" -&gt; "OMISSIS lives in OMISSIS"). The TAGGING strategy replaces named entities with the entity name (e.g., "Daniele lives in Milan" -&gt; "PERSON lives in LOCATION").</p><p>The two versions of the anonymized dataset were used to train the two sentence classification models, with Italian BERT and ITALIAN-LEGAL-BERT, using the same configuration and training procedure as for the raw data. Table <ref type="table" target="#tab_6">7</ref> shows the comparison of results for the classification models trained on the raw and anonymized datasets. The results of the models on the anonymized dataset and the original dataset are very similar, which might indicate that personal data are not relevant for section prediction.</p></div>
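The two strategies described above amount to a span-replacement function; a minimal sketch, assuming the character offsets and labels come from the NER model's output:

```python
def anonymize(text, entities, strategy="OMISSIS"):
    """Apply the OMISSIS or TAGGING anonymization strategy to a sentence.
    `entities` holds (start, end, label) character spans; replacing from the
    rightmost span backwards keeps the earlier offsets valid."""
    for start, end, label in sorted(entities, reverse=True):
        replacement = "OMISSIS" if strategy == "OMISSIS" else label
        text = text[:start] + replacement + text[end:]
    return text
```

With entities `(0, 7, "PERSON")` and `(17, 22, "LOCATION")`, the sentence "Daniele lives in Milan" becomes "OMISSIS lives in OMISSIS" under OMISSIS and "PERSON lives in LOCATION" under TAGGING, matching the examples in the text.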
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Semantic Similarity</head><p>We tested the ability of the models on the task of determining whether two pieces of text are similar in terms of meaning. The strong assumption is that two contiguous sentences within a specific section are semantically related and refer to the same context, whereas two sentences taken randomly from two different documents and different sections refer to different contexts.</p><p>Dataset. We built the dataset by taking, from a subset of 1,000 judgments from the Italian Civil Law DB, pairs of contiguous portions of text (of 5 sentences) in the "CONCLUSION OF THE PARTIES" and "DEVELOPMENT OF THE TRIAL" sections, and text pairs from two different documents and sections. We labeled as 'similar' the contiguous pairs from the same document and as 'unsimilar' the pairs from different documents. The final dataset contains 2,000 text pairs (1,000 labeled as 'similar' and the other 1,000 as 'unsimilar'). The choice of taking similar sentences from these two sections was made on the basis that the "CONCLUSION OF THE PARTIES" and "DEVELOPMENT OF THE TRIAL" sections are more descriptive and contain more self-contained concepts than other sections, such as 'REASON' or 'CONCLUSION', which contain many references to the previous sections.</p><p>Similarity Procedure. The semantic similarity between the text pairs in the dataset was evaluated using both the Italian BERT and ITALIAN-LEGAL-BERT models to obtain the context vectors of the two texts to be compared (using mean pooling on the last layer) and, then, cosine similarity for the similarity score between the pair of vectors: cos(x, y) = (x · y) / (||x|| · ||y||). For each model, a similarity threshold was established to identify similar and non-similar texts. 
Figure <ref type="figure" target="#fig_0">1</ref> shows the distribution of similarity scores over the groups of 'similar' and 'unsimilar' pairs, calculated using the Italian BERT and ITALIAN-LEGAL-BERT models as contextual sentence encoders.</p><p>Optimized threshold. A similarity threshold is a numerical value applied to the similarity scores to separate the two classes ('similar' and 'unsimilar'). Different thresholds produce different results in terms of precision, recall, and F1-score when compared with the annotated dataset. A threshold that is too low classifies all pairs as 'similar'; conversely, a value that is too high leads to classifying all pairs as 'unsimilar'. The choice of a correct similarity threshold depends on the data under consideration and on the specific vector space of a model. Therefore, we optimized its value independently for both models by selecting the value that maximizes the F1-score on the dataset. The values tested are in the range of 0 to 1 with step 0.001. The experiments suggest 0.897 as the best threshold for Italian BERT and 0.981 for ITALIAN-LEGAL-BERT (the red dashed lines in Figure <ref type="figure" target="#fig_0">1</ref>).</p><p>Evaluation. The performance of text similarity classification with the optimized thresholds was evaluated with precision, recall, and F1 score against the true labels. The experimental results, reported in Table <ref type="table" target="#tab_7">8</ref>, show that the ITALIAN-LEGAL-BERT model outperformed the Italian BERT model in this downstream semantic similarity task. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Limitations</head><p>The main limitations come from the limited computational resources with which our models were trained. We are aware that a larger batch size, extended hyperparameter optimization, and a larger dataset could lead to better results.</p><p>Another limitation concerns the use of a single data source. Unlike for English, it is not easy to find large legal corpora on which to train domain-specific models for Italian. Although the dataset contains decrees, orders, and judgments from all Italian courts, we did not consider criminal law in our training. However, we evaluated mask-filling perplexity on more than 20,000 criminal cases, obtaining results similar to those in the civil context. This suggests that the model might work well in the criminal context as well, but further investigation on downstream legal tasks is needed. In addition, although the model was evaluated on a dataset different from the pre-training data, the civil evaluation dataset could still contain some documents written by the same judges, which could inflate the gain of ITALIAN-LEGAL-BERT. We believe this effect is small, since the gain on the criminal-case evaluation dataset (written by different judges) over the generic Italian BERT model is still significant.</p><p>Moreover, the type of downstream task could be a limiting factor in model performance. ITALIAN-LEGAL-BERT is designed to improve current performance in complex Italian legal tasks, where domain knowledge is very important. As suggested by the experiments on the English LEGAL-BERT <ref type="bibr" target="#b32">[9]</ref>, using the model in simple downstream tasks may not lead to improvements over a model trained on general knowledge, or may even worsen performance.</p><p>Finally, a common limitation of all deep learning systems is that they are not easily interpreted and maintain the biases present in the data on which they were trained. 
In particular, biases in the data can lead the model to generate stereotypical or biased content. We explored whether the models are biased toward demographic and personal information via data anonymization, but the analysis depends on the specific downstream task and deserves further investigation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Future Direction</head><p>In this article, we introduced ITALIAN-LEGAL-BERT, which aims to improve the outcomes of downstream NLP tasks in the Italian legal domain and to contribute to the advancement of legal NLP research, computational law, and legal technology applications. It is a pre-trained linguistic representation for Italian law based on ITALIAN XXL BERT with additional pre-training on 235,000 civil cases (domain-adaptive pre-training). We compared the ITALIAN-LEGAL-BERT and Italian BERT models on the downstream tasks of identifying named entities by person type, semantic similarity, and classifying sentences by section class. We demonstrated that it can improve the performance of the 'general-purpose' model on downstream tasks in the Italian legal domain. In the future, we plan to exploit ITALIAN-LEGAL-BERT's potential and test it on more complex tasks, such as rhetorical role identification (e.g., evidence, legal rule, reasoning, decision) <ref type="bibr" target="#b49">[26]</ref>, similar case retrieval, legal reading comprehension, and legal question answering. In addition, we are working to test it in combination with other deep learning architectures (LSTM, CNN) to achieve better results. Finally, we intend to release new versions of ITALIAN-LEGAL-BERT pre-trained from scratch on large Italian legal corpora.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Box plots showing the different ranges of semantic similarity scores over the groups of 'similar' and 'unsimilar' pairs of sentences using the Italian BERT and ITALIAN-LEGAL-BERT models. The red dashed lines show the optimized similarity thresholds on the results of the two models.</figDesc><graphic coords="10,99.21,473.48,396.85,195.19" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>ITALIAN-LEGAL-BERT and Italian XXL BERT perplexity scores on the evaluation datasets. Lower perplexity indicates better performance.</figDesc><table><row><cell>Eval Dataset</cell><cell cols="2">N. Documents N. Sentences</cell><cell>Model</cell><cell>Perplexity</cell></row><row><cell>Civil cases (pst.giustizia.it)</cell><cell>20,000</cell><cell>566,000</cell><cell>ITALIAN-LEGAL-BERT Italian XXL BERT</cell><cell>8.9892 10.9891</cell></row><row><cell>Criminal cases (italgiureweb)</cell><cell>21,000</cell><cell>702,677</cell><cell>ITALIAN-LEGAL-BERT Italian XXL BERT</cell><cell>5.0518 6.0515</cell></row></table><note>MDZ Digital Library team at the Bavarian State Library.</note></figure>
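The perplexity scores in Table 1 are the exponential of the mean per-token cross-entropy of the masked language model on the held-out sentences. A minimal sketch of that final step, assuming the per-token negative log-likelihoods have already been collected from the model (the input values here are illustrative, not the paper's):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    nlls = list(token_nlls)
    if not nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(nlls) / len(nlls))

# A model assigning each token probability ~1/9 has perplexity ~9,
# the same range as ITALIAN-LEGAL-BERT's 8.99 on civil cases.
print(perplexity([math.log(9.0)] * 100))  # ≈ 9.0
```

A lower score means the model spreads less probability mass over wrong tokens, which is why the legal-domain model wins on both evaluation corpora.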
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Results of the Italian BERT and ITALIAN-LEGAL-BERT mask-filling pipeline on the prediction of a single mask (strikethrough words). The probability of each token is reported in parentheses.</figDesc><table><row><cell cols="2">Sentence (Mask is strikethrough) ITA BERT</cell><cell>ITA LEGAL BERT</cell></row><row><cell>Il padre può vedere il figlio a week-end alternati en: The father can see his son on alternate weekends</cell><cell>1. 'genitore' (53.61%) 2. 'padre' (27.70%) 3. 'papà' (6.81%) 4. 'marito' (2.19%) 5. 'proprietario' (0.62%)</cell><cell>1. 'padre' (99.24%) 2. 'genitore' (0.56%) 3. 'ricorrente' (0.05%) 4. 'resistente' (0.03%) 5. 'papà' (0.03%)</cell></row><row><cell>viene stabilita una collocazione paritetica dei figli en: an equal placement of the children is established.</cell><cell>1. 'garantita' (24.1%) 2. 'meno' (10.72%) 3. 'proposta' (6.66%) 4. 'stabilita' (4.52%) 5. 'assicurata' (4.09%)</cell><cell>1. 'prevista' (40.48%) 2. 'stabilita' (21.81%) 3. 'disposta' (12.74%) 4. 'assicurata' (6.32%) 5. 'garantita' (1.77%)</cell></row><row><cell>assegno di mantenimento comprensivo di spese straordinarie en: maintenance allowance including extraordinary expenses.</cell><cell>1. '. ' (38.58%) 2. 'mediche' (17.01%) 3. ';' (6.62%) 4. 'legali' (4.55%) 5. 'generali' (3.35%)</cell><cell>1. 'straordinarie' (69.25%) 2. ':' (7.61%) 3. 'extra' (4.86%) 4. 'mediche' (4.30%) 5. '. ' (4.20%)</cell></row><row><cell>viene stabilito il mantenimento diretto en: direct maintenance is established</cell><cell>1. 'trattamento' (8.58%) 2. 'prezzo' (7.43%) 3. 'contratto' (5.08%) 4. 'contributo' (4.23%) 5. 'lavoro' (4.06%)</cell><cell>1. 'pagamento' (48.93%) 2. 'versamento' (23.89%) 3. 'mantenimento' (5.20%) 4. 'trasferimento' (2.54%) 5. 'rimborso' (2.09%)</cell></row><row><cell>cambiamento di sesso senza operazione chirurgica en: sex change without surgery</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Entity type descriptions and their distribution over the train and test sets</figDesc><table><row><cell>Entity</cell><cell>Description</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Precision (P), recall (R) and F1-score (F1) for the ITALIAN LEGAL BERT+Spacy NER and ITALIAN BERT+Spacy NER models, evaluated using the exact-match criterion on individual entities and macro-averaged on the test set. Support is the number of samples of each entity type.</figDesc><table><row><cell></cell><cell cols="6">ITALIAN LEGAL BERT+Spacy NER ITALIAN BERT+Spacy NER</cell><cell></cell></row><row><cell>Type</cell><cell>P</cell><cell>R</cell><cell>F1</cell><cell>P</cell><cell>R</cell><cell>F1</cell><cell>Support</cell></row><row><cell>Person</cell><cell cols="2">76.41 93.74</cell><cell>84.19</cell><cell cols="2">78.60 89.57</cell><cell>83.73</cell><cell>771</cell></row><row><cell>Person-Judge</cell><cell cols="2">97.53 98.75</cell><cell>98.14</cell><cell cols="2">95.24 100</cell><cell>97.56</cell><cell>80</cell></row><row><cell>Person-Lawyer</cell><cell cols="2">88.46 94.54</cell><cell>91.39</cell><cell cols="2">85.90 91.78</cell><cell>88.74</cell><cell>74</cell></row><row><cell>Person-Expert</cell><cell cols="2">73.08 79.17</cell><cell>76.00</cell><cell cols="2">66.33 79.17</cell><cell>70.37</cell><cell>24</cell></row><row><cell>Person-Witness</cell><cell cols="2">79.49 55.36</cell><cell>65.26</cell><cell cols="2">69.23 32.14</cell><cell>43.90</cell><cell>57</cell></row><row><cell>Person-Family</cell><cell cols="2">41.46 10.56</cell><cell>16.83</cell><cell cols="2">38.46 6.21</cell><cell>10.70</cell><cell>162</cell></row><row><cell cols="3">Person-Family-Children 60.00 47.37</cell><cell>52.94</cell><cell cols="2">46.41 74.74</cell><cell>57.26</cell><cell>95</cell></row><row><cell>Macro</cell><cell cols="2">73.78 68.50</cell><cell>69.25</cell><cell cols="2">68.17 67.66</cell><cell>64.61</cell><cell></cell></row></table></figure>
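The per-entity and macro-averaged scores in Table 4 follow the standard exact-match definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), F1 is their harmonic mean, and the macro score is the unweighted mean over entity types. A small sketch of the computation; the counts below are made-up for illustration, not the paper's:

```python
def prf1(tp, fp, fn):
    # Exact-match precision, recall and F1 for one entity type.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro(per_type_counts):
    # Unweighted mean of P, R and F1 over entity types.
    scores = [prf1(*c) for c in per_type_counts.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# Hypothetical (tp, fp, fn) counts per entity type:
counts = {"Person": (90, 10, 10), "Person-Judge": (8, 0, 2)}
print(macro(counts))
```

Macro averaging weights rare types (e.g. Person-Expert, Support 24) the same as frequent ones, which is why the low Person-Family scores pull the macro F1 down for both models.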
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Distribution of sentences over the 5 sections</figDesc><table><row><cell>SECTION NAME</cell><cell>N. SENTENCES</cell></row><row><cell>INTRODUCTION</cell><cell>560</cell></row><row><cell>CONCLUSION OF THE PARTIES</cell><cell>1,862</cell></row><row><cell>DEVELOPMENT OF THE TRIAL</cell><cell>949</cell></row><row><cell>REASON</cell><cell>1,810</cell></row><row><cell>CONCLUSION</cell><cell>1,009</cell></row><row><cell>TOTAL</cell><cell>6,190</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6</head><label>6</label><figDesc>F1 and MCC for sentence classification using Italian BERT and ITALIAN-LEGAL-BERT models.</figDesc><table><row><cell>Model</cell><cell>F1</cell><cell>MCC</cell></row><row><cell>Italian BERT</cell><cell>0.869</cell><cell>0.806</cell></row><row><cell>ITALIAN-LEGAL-BERT</cell><cell>0.890</cell><cell>0.830</cell></row></table></figure>
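The Matthews correlation coefficient (MCC) reported in Tables 6 and 7 summarizes classification quality on a scale from -1 to 1 and is more robust to class imbalance than accuracy. The section-classification task itself is 5-class, but MCC is easiest to see in its binary closed form; a sketch under that simplification, with made-up confusion counts:

```python
import math

def mcc_binary(tp, tn, fp, fn):
    # Binary Matthews correlation coefficient; a zero denominator maps to 0.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

print(mcc_binary(tp=45, tn=40, fp=5, fn=10))  # moderate positive correlation
```

A perfect classifier gives 1.0, random guessing gives about 0.0, and total disagreement gives -1.0; the multiclass generalization used for 5-way section classification follows the same idea over the full confusion matrix.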
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7</head><label>7</label><figDesc>F1 and MCC for sentence classification on raw and anonymized text using the Italian BERT and ITALIAN-LEGAL-BERT models.</figDesc><table><row><cell>Model</cell><cell>Anonymization Strategy</cell><cell>F1</cell><cell>MCC</cell></row><row><cell></cell><cell>NO</cell><cell>0.869</cell><cell>0.806</cell></row><row><cell>Italian BERT</cell><cell>OMISSIS</cell><cell>0.861</cell><cell>0.795</cell></row><row><cell></cell><cell>TAGGING</cell><cell>0.862</cell><cell>0.795</cell></row><row><cell></cell><cell>NO</cell><cell>0.890</cell><cell>0.830</cell></row><row><cell>Italian Legal BERT</cell><cell>OMISSIS</cell><cell>0.890</cell><cell>0.830</cell></row><row><cell></cell><cell>TAGGING</cell><cell>0.866</cell><cell>0.827</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 8</head><label>8</label><figDesc>Precision (P), recall (R) and F1-score (F1) for the text similarity classification task using the ITALIAN-LEGAL-BERT and Italian BERT models.</figDesc><table><row><cell>Model</cell><cell>P</cell><cell>R</cell><cell>F1</cell></row><row><cell>Italian BERT</cell><cell cols="2">0.791 0.789</cell><cell>0.789</cell></row><row><cell cols="4">ITALIAN-LEGAL-BERT 0.825 0.822 0.822</cell></row></table></figure>
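The similarity task behind Table 8 and Figure 1 labels a sentence pair as similar when the cosine similarity of the two sentence embeddings exceeds an optimized threshold (the red dashed line in Figure 1). A minimal sketch with toy vectors; the threshold value and embeddings are illustrative, not the paper's:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def classify_pair(emb_a, emb_b, threshold=0.5):
    # Label a pair 'similar' if cosine similarity exceeds the threshold.
    return "similar" if cosine(emb_a, emb_b) > threshold else "dissimilar"

print(classify_pair([1.0, 0.0, 1.0], [0.9, 0.1, 1.1]))  # near-parallel vectors
print(classify_pair([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors
```

Tuning the threshold on held-out pairs is what separates the two score distributions shown as box plots in Figure 1.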
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">on huggingface.co/dlicari/Italian-Legal-BERT</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The full list is available at https://huggingface.co/dlicari/Italian-Legal-BERT/blob/main/abbreviazioni.csv</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Settings and Hyperparameters</head></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">BioBERT: a pre-trained biomedical language representation model for biomedical text mining</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yoon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>So</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kang</surname></persName>
		</author>
		<idno type="DOI">10.1093/bioinformatics/btz682</idno>
		<idno type="arXiv">arXiv:1901.08746</idno>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="page">682</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Publicly Available Clinical BERT Embeddings</title>
		<author>
			<persName><forename type="first">E</forename><surname>Alsentzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Murphy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Boag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-H</forename><surname>Weng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jindi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mcdermott</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W19-1909</idno>
		<ptr target="https://aclanthology.org/W19-1909" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Clinical Natural Language Processing Workshop, Association for Computational Linguistics</title>
				<meeting>the 2nd Clinical Natural Language Processing Workshop, Association for Computational Linguistics<address><addrLine>Minneapolis, Minnesota, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="72" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">SciBERT: A Pretrained Language Model for Scientific Text</title>
		<author>
			<persName><forename type="first">I</forename><surname>Beltagy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cohan</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1371</idno>
		<ptr target="https://aclanthology.org/D19-1371" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3615" to="3620" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">HateBERT: Retraining BERT for abusive language detection in English</title>
		<author>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mitrović</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Granitzer</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.woah-1.3</idno>
		<ptr target="https://aclanthology.org/2021.woah-1.3" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), Association for Computational Linguistics</title>
				<meeting>the 5th Workshop on Online Abuse and Harms (WOAH 2021), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="17" to="25" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Alberto: Italian bert language understanding model for nlp challenging tasks based on tweets</title>
		<author>
			<persName><forename type="first">M</forename><surname>Polignano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Degemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLiC-it</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Forte e chiaro: Il linguaggio del giudice</title>
		<author>
			<persName><forename type="first">M</forename><surname>Rosati</surname></persName>
		</author>
		<ptr target="https://www.uniba.it/ricerca/dipartimenti/sistemi-giuridici-ed-economici/edizioni-digitali/i-quaderni/Quaderni62017Triggiani.pdf" />
	</analytic>
	<monogr>
		<title level="m">LINGUAGGIO DEL PROCESSO</title>
				<imprint>
			<publisher>IL</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="115" to="119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">LEGAL-BERT: The Muppets straight out of Law School</title>
		<author>
			<persName><forename type="first">I</forename><surname>Chalkidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fergadiotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Malakasiotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Aletras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.findings-emnlp.261</idno>
		<ptr target="https://aclanthology.org/2020.findings-emnlp.261" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2898" to="2904" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-Art Natural Language Processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Von Platen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jernite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Le</forename><surname>Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Drame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Lhoest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rush</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-demos.6</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-demos.6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset</title>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Guha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Henderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Ho</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2104.08671</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lippi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pałka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Contissa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lagioia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-W</forename><surname>Micklitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sartor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Torroni</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10506-019-09243-2</idno>
		<ptr target="https://doi.org/10.1007/s10506-019-09243-2" />
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence and Law</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="117" to="139" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<title level="m" type="main">Can Domain Pre-training Help Interdisciplinary Researchers from Data Annotation Poverty? A Case Study of Legal Argument Mining with BERT-based Transformers</title>
		<author>
			<persName><forename type="first">G</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lillis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nulty</surname></persName>
		</author>
		<imprint>
			<biblScope unit="page">10</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Enhancing Legal Argument Mining with Domain Pre-training and Neural Networks</title>
		<author>
			<persName><forename type="first">G</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nulty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lillis</surname></persName>
		</author>
		<idno type="DOI">10.46298/jdmdh.9147</idno>
		<ptr target="https://jdmdh.episciences.org/9147" />
	</analytic>
	<monogr>
		<title level="j">Journal of Data Mining &amp; Digital Humanities NLP4DH</title>
		<imprint>
			<date type="published" when="2022">2022. 9147</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2204.04859</idno>
		<idno type="arXiv">arXiv:2204.04859</idno>
		<ptr target="http://arxiv.org/abs/2204.04859" />
		<title level="m">A Survey on Legal Judgment Prediction: Datasets, Metrics, Models and Challenges</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">jurBERT: A Romanian BERT Model for Legal Judgement Prediction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Masala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Iacob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Uban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Cidotã</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Velicu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rebedea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Popescu</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.nllp-1.8</idno>
	</analytic>
	<monogr>
		<title level="j">NLLP</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">JuriBERT: A Masked-Language Model Adaptation for French Legal Text</title>
		<author>
			<persName><forename type="first">S</forename><surname>Douka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Abdine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vazirgiannis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Hamdani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Amariles</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.nllp-1.9</idno>
	</analytic>
	<monogr>
		<title level="j">NLLP</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Lawformer: A pre-trained language model for Chinese legal long documents</title>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.aiopen.2021.06.003</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S2666651021000176" />
	</analytic>
	<monogr>
		<title level="j">AI Open</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="79" to="84" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Unsupervised law article mining based on deep pre-trained language representation models with application to the italian civil code</title>
		<author>
			<persName><forename type="first">A</forename><surname>Tagarelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Simeri</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10506-021-09301-8</idno>
		<ptr target="https://doi.org/10.1007/s10506-021-09301-8" />
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence and Law</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="417" to="473" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<monogr>
		<title level="m" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1810.04805</idno>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<ptr target="http://arxiv.org/abs/1810.04805" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<monogr>
		<title level="m" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<idno>CoRR abs/1706.03762</idno>
		<ptr target="http://arxiv.org/abs/1706.03762" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Mattmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Zitting</surname></persName>
		</author>
		<idno>OCLC: ocn731912756</idno>
		<title level="m">Tika in action</title>
				<meeting><address><addrLine>Shelter Island, NY</addrLine></address></meeting>
		<imprint>
			<publisher>Manning Publications</publisher>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<monogr>
		<title level="m" type="main">spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Honnibal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Montani</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>To appear</note>
</biblStruct>

<biblStruct xml:id="b45">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Nakayama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kubo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kamura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Taniguchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liang</surname></persName>
		</author>
		<ptr target="https://github.com/doccano/doccano" />
		<title level="m">doccano: Text annotation tool for human</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<monogr>
		<title level="m" type="main">Neural architectures for named entity recognition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kawakami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
		<idno>CoRR abs/1603.01360</idno>
		<ptr target="http://arxiv.org/abs/1603.01360" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<analytic>
		<title level="a" type="main">Neural Legal Judgment Prediction in English</title>
		<author>
			<persName><forename type="first">I</forename><surname>Chalkidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Aletras</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1424</idno>
		<ptr target="https://aclanthology.org/P19-1424" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4317" to="4323" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">Anonymization of italian legal textual documents using deep learning</title>
		<author>
			<persName><forename type="first">D</forename><surname>Licari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Romano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Comandé</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Conference on Statistical Analysis of Textual Data (JADT22)</title>
				<meeting>Proceedings of the 16th International Conference on Statistical Analysis of Textual Data (JADT22)<address><addrLine>Naples</addrLine></address></meeting>
		<imprint>
			<publisher>VADISTAT Press / Edizioni Erranti</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="552" to="559" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<analytic>
		<title level="a" type="main">Automatic Classification of Rhetorical Roles for Sentences: Comparing Rule-Based Scripts with Machine Learning</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">R</forename><surname>Walker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Pillaipakkamnatt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Linares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Pesce</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ASAIL@ICAIL</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
