
Exploring domain and task adaptation of LamBERTa models for article retrieval on the Italian Civil Code

Andrea Simeri

Andrea Tagarelli

Dept. Computer Engineering, Modeling, Electronics, and Systems Engineering (DIMES), University of Calabria, 87036 Rende (CS), Italy

This paper is concerned with AI-based NLP solutions to the law article retrieval problem, with application to the Italian legal domain and, particularly, to the Italian Civil Code. Building on the current state-of-the-art on this topic, we revise our early LamBERTa framework in two ways concerning its domain-adaptation feature: replacing the general-domain pre-trained model with a legal-specific one to fine-tune for the task of article retrieval, and delving into the injection of out-of-vocabulary legal terms into the models' tokenizer. Extensive experimental evaluation based on different collections of query sets, along with qualitative analysis of the models' prediction interpretability, has unveiled interesting findings about the combined effect of domain- and task-adaptation of an Italian BERT model on the task of law article retrieval.

Keywords: law article retrieval, domain adaptation, legal language models, artificial intelligence and law

1. Introduction

Artificial Intelligence (AI) is increasingly used in the legal domain, motivated mainly by the huge amount of information produced and by the involvement of different actors, such as legal professionals, law courts, legislators, law firms, and even citizens [1].

Starting with BERT [2], deep contextualized pre-trained language models (PLMs) have emerged in the NLP field, showing outstanding performance in several discriminative and generative tasks. BERT and BERT-like models have also represented a breakthrough for the legal domain, especially concerning classification problems (e.g., [3, 4, 5, 6, 7, 8, 9]).

Early applications of such models to the legal domain include approaches that make PLMs adaptive to a specific legal data analysis task, i.e., they directly fine-tune a general-domain pre-trained model on the task at hand. In contrast to such task-adaptive methods, domain-adaptive pre-training allows for deeply tailoring a pre-trained model to the domain of the target task [10]. To specialize a pre-trained model on the legal domain, two main strategies stand as alternatives to the direct application of an out-of-the-box pre-trained model to the downstream task: either continuing to pre-train the model on a legal corpus, or pre-training the model from scratch on a legal corpus.

Our study in this paper concerns the above topic contextualized to the Italian legal domain. In this respect, it should be noted that, although a number of Italian BERT models exist (e.g., [11, 12, 13]), they mostly refer to general-domain language. In particular, no study leveraging BERT for Italian civil law had been proposed until LamBERTa [14], the first BERT-based framework for law article retrieval as a prediction problem. LamBERTa is in fact designed to learn prediction models by fine-tuning an Italian pre-trained BERT on the Italian Civil Code (ICC), and to answer natural language queries by retrieving the most relevant ICC articles. Much more recently, a new contribution to the Italian legal domain has been offered by the release of the first Italian BERT pre-trained on legal corpora, named ITALIAN-LEGAL-BERT [15].

Given this premise, in this paper we aim to answer the following research questions:

• RQ1: How does the behavior of LamBERTa models change when fine-tuning a legal Italian BERT rather than a general-domain Italian BERT?
• RQ2: What is the impact of injecting out-of-vocabulary legal terms into LamBERTa models during the fine-tuning stage? Does it depend on how such terms' representation is initialized?
• RQ3: What aspects arise from the explanation of the different LamBERTa models through the interpretation of their predictions?
• RQ4: Overall, is the combined effect of domain-adaptation and task-adaptation of a pre-trained Italian BERT model helpful to improve performance on the task of article retrieval from the Italian Civil Code?

To answer the above questions, we provide the following main contributions. We advance research on AI-based NLP for the Italian legal domain by updating the current state-of-the-art of PLMs for law article retrieval as a prediction task. Starting from our early LamBERTa framework, we develop a new variant of LamBERTa, which makes it domain-adaptive besides task-adaptive; we accomplish this by designing LamBERTa so as to learn ICC article classification models by fine-tuning an Italian legal pre-trained BERT on the ICC (Section 4). We further investigate the domain-adaptation of LamBERTa models by gaining insights into the effect of injecting into them a few domain-specific terms, selected from the target legal corpus and previously unseen in the pre-trained model's vocabulary (Section 5). Moreover, we perform a qualitative analysis of the different LamBERTa models by explaining their underlying behaviors on a number of query instances (Section 6). We finally discuss the main findings drawn for our LamBERTa model variants, based on an extensive collection of query sets of varying length and lexical complexity (Section 7).

2. Background

In this section, we provide background concepts on the Italian Civil Code, the LamBERTa framework [14], and ITALIAN-LEGAL-BERT [15].

2.1. The Italian Civil Code

The Italian Civil Code (ICC) is divided into six books, each of which provides rules for a specific theme in civil law. Book-1 (on Persons and the Family, articles 1-455) contains the discipline of the juridical capacity of persons, of the rights of the personality, of collective organizations, of the family; Book-2 (on Successions, articles 456-809) contains the discipline of succession due to death and the donation contract; Book-3 (on Property, articles 810-1172) contains the discipline of ownership and other real rights; Book-4 (on Obligations, articles 1173-2059) contains the discipline of obligations and their sources, that is mainly of contracts and illicit facts (civil liability); Book-5 (on Labor, articles 2060-2642) contains the discipline of the company in general, of subordinate and self-employed work, of profit-making companies and of competition; Book-6 (on the Protection of Rights, articles 2643-2969) contains the discipline of the transcription, of the proofs, of the debtor’s financial liability and of the causes of pre-emption, of the prescription.
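Since the book-specific models considered later each cover one of these article ranges, it can help to see the partition as a lookup table. The following is a minimal sketch in Python, assuming plain integer article numbers (variants such as "bis" articles are ignored); the names ICC_BOOKS and book_of are hypothetical and introduced only for illustration.

```python
from bisect import bisect_right

# Article ranges of the six ICC books, as listed in the paragraph above.
# (first_article, last_article, book name)
ICC_BOOKS = [
    (1, 455, "Book-1: Persons and the Family"),
    (456, 809, "Book-2: Successions"),
    (810, 1172, "Book-3: Property"),
    (1173, 2059, "Book-4: Obligations"),
    (2060, 2642, "Book-5: Labor"),
    (2643, 2969, "Book-6: Protection of Rights"),
]
_FIRST_ARTICLES = [first for first, _, _ in ICC_BOOKS]


def book_of(article: int) -> str:
    """Return the ICC book that contains the given article number."""
    if not 1 <= article <= ICC_BOOKS[-1][1]:
        raise ValueError(f"article {article} is outside the ICC range")
    # Index of the last book whose first article is <= the requested article.
    return ICC_BOOKS[bisect_right(_FIRST_ARTICLES, article) - 1][2]


if __name__ == "__main__":
    for art in (320, 463, 540):
        print(art, "->", book_of(art))
```

Under this sketch, a query whose ground-truth article is Art. 463 would be routed to the Book-2 model, which is how a book-specific setting would choose the classifier to evaluate.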

For an analysis of the ICC article citation network and a related visualization tool, the interested reader may refer to [16] and [17].

2.2. The LamBERTa framework

Figure 1 shows the conceptual architecture of LamBERTa [14, 18]. The starting point is ITALIAN-XXL-UNCASED (bert-base-italian-xxl-uncased, available at https://huggingface.co/dbmdz/), a pre-trained Italian BERT model whose data source consists of a large Wikipedia dump, various texts from the OPUS corpora collection, and the Italian part of the OSCAR corpus; the final training corpus has a size of 81GB and 13 138 379 147 tokens.

LamBERTa models are generated by fine-tuning the pre-trained BERT models on a sequence classification task (i.e., BERT with a single linear classification layer on top), given in input the articles of the ICC or a portion of it. This fine-tuning is accomplished by using a typical configuration of BERT for masked language modeling, with 12 attention heads and 12 hidden layers, and an initial (i.e., pre-trained) vocabulary of 31 102 tokens. Each model is trained for 10 epochs, using cross-entropy as loss function, the AdamW optimizer, and an initial learning rate selected within [1e-5, 5e-5], on batches of 256 examples.

Notably, LamBERTa is flexible w.r.t. two peculiar modeling aspects: (i) the training-instance labeling scheme for a given set of ICC articles, and (ii) the learning approach. The former will be discussed later in Section 3, whereas the latter concerns the possibility of training models either on the individual books or on the entire ICC corpus; due to space limitations of this paper, we shall focus on the book-specific models.

Another feature of LamBERTa is the injection of previously unseen legal terms, selected from the task-specific corpus (i.e., the ICC), that are out-of-vocabulary for the Italian pre-trained model. This way, the BERT tokenizer is enabled to recognize those terms appearing in the ICC while fine-tuning on it, and hence to avoid breaking them down into subwords. To select such terms to be added as new tokens in LamBERTa, the text of each book in the ICC is processed to remove Italian stopwords and filter out overly frequent terms (i.e., those occurring in more than 50% of the articles in the book) as well as hapax terms. Table 1 reports the number of added tokens and the final number of tokens, for each book of the ICC.

2.3. ITALIAN-LEGAL-BERT

ITALIAN-LEGAL-BERT [15] follows the typical BERT architecture, with a language modeling head on top, AdamW optimizer, and initial learning rate 5e-5. ITALIAN-LEGAL-BERT was built upon ITALIAN-XXL-CASED by further pre-training the latter for 4 additional epochs on a corpus extracted from the National Jurisprudential Archive (pst.giustizia.it), a repository containing millions of legal documents, such as decrees, orders, and civil judgments, from Italian courts and courts of appeal. The corpus used to train ITALIAN-LEGAL-BERT is 3.7 GB of text containing more than 21M sentences and 498M words. The trained ITALIAN-LEGAL-BERT was evaluated on named entity recognition, sentence classification, and sentence similarity tasks, using 20K civil cases from the National Jurisprudential Archive and more than 21K criminal cases from italgiureweb (italgiure.giustizia.it).
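Returning to the term-selection step used for token injection in LamBERTa (Section 2.2), the filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex-based word splitting, the function name select_new_tokens, and the toy stopword list and vocabulary are all hypothetical; only the three filters (stopwords, terms occurring in more than 50% of the articles, hapax terms) and the out-of-vocabulary condition come from the text.

```python
import re
from collections import Counter


def select_new_tokens(articles, stopwords, vocab, max_df=0.5):
    """Pick out-of-vocabulary terms of a book that are neither too frequent nor hapax.

    articles  -- list of article texts of one ICC book
    stopwords -- set of (Italian) stopwords to discard
    vocab     -- set of tokens already known to the pre-trained tokenizer
    max_df    -- maximum fraction of articles a term may occur in
    """
    term_freq = Counter()  # total occurrences of each term in the book
    doc_freq = Counter()   # number of articles containing each term
    for text in articles:
        # Assumption: words are maximal runs of letters, lowercased.
        words = [w for w in re.findall(r"[^\W\d_]+", text.lower())
                 if w not in stopwords]
        term_freq.update(words)
        doc_freq.update(set(words))

    n_articles = len(articles)
    return sorted(
        term for term in term_freq
        if term not in vocab                        # out-of-vocabulary only
        and term_freq[term] > 1                     # drop hapax terms
        and doc_freq[term] / n_articles <= max_df   # drop overly frequent terms
    )


if __name__ == "__main__":
    toy_articles = [
        "Il coniuge superstite ha diritto di abitazione",
        "Al coniuge superstite spetta la quota",
        "Il coniuge eredita la quota",
        "I figli nascituri",
    ]
    toy_stopwords = {"il", "al", "la", "di", "ha", "i"}
    toy_vocab = {"diritto", "quota", "figli"}
    print(select_new_tokens(toy_articles, toy_stopwords, toy_vocab))
```

In the toy data, 'coniuge' is discarded because it occurs in three of the four articles, while terms seen only once are discarded as hapax, leaving 'superstite' as the only candidate.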

3. Training and evaluation data

One important modeling aspect of LamBERTa corresponds to the unsupervised article-labeling schemes that are used to produce a training set for each of the books in the ICC. This is not trivial since two main requirements need to be satisfied: (i) a one-to-one association must hold for classes and articles, since a LamBERTa model is designed to be a classifier at article level, i.e., class labels correspond to the articles in the book(s) covered by the model, and (ii) the entire ICC must be used to fully embed its knowledge. Therefore, a key issue is how to create as many training instances as possible for each article to make LamBERTa learn effectively. To this purpose, in [14], we defined different strategies for selecting and combining portions from each article to build the training set for any specific book, also paying attention to balancing the contributions of the articles, which originally vary in length. Given a minimum number of training units per article, by default set to 32, each of the article labeling schemes implements a round-robin (RR) method that iterates over replicas of the same group of training units per article until at least that minimum number of units is generated. The most effective scheme turned out to be the unigram with parameterized emphasis on the title, which builds the set of training units for each article as comprised of two subsets: one containing the article's sentences with round-robin selection, and the other containing only replicas of the article's title.
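The round-robin generation of training units with emphasis on the title can be sketched as below. This is a simplified reading of the scheme, under two assumptions that the text does not spell out: whole replicas of the sentence group are appended until the minimum is reached, and the title emphasis is a fixed number of title copies. The names round_robin_units, build_training_set, and title_replicas are hypothetical.

```python
from math import ceil


def round_robin_units(units, min_units=32):
    """Replicate the group of training units until at least min_units exist."""
    if not units:
        raise ValueError("an article needs at least one training unit")
    replicas = ceil(min_units / len(units))
    return list(units) * replicas


def build_training_set(articles, min_units=32, title_replicas=0):
    """Build (text, label) pairs, one class label per article.

    articles -- dict mapping an article label to
                {"title": str, "sentences": list of str}
    """
    examples = []
    for label, article in articles.items():
        # Subset 1: the article's sentences, selected round-robin.
        units = round_robin_units(article["sentences"], min_units)
        # Subset 2: replicas of the article's title (assumed fixed count).
        units += [article["title"]] * title_replicas
        examples.extend((unit, label) for unit in units)
    return examples


if __name__ == "__main__":
    toy = {"Art. 540": {"title": "titolo", "sentences": ["s1", "s2", "s3"]}}
    training = build_training_set(toy, min_units=32, title_replicas=4)
    print(len(training))  # 33 sentence units (11 replicas of 3) + 4 titles
```

Replicating whole groups keeps the sentences of an article equally represented, which is one way to balance articles of different length; other stopping rules are equally compatible with the description.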

In [14], LamBERTa models are assessed through extensive experiments by considering single-label and multi-label evaluation tasks, based on different types of queries, which vary by source, length and lexical characteristics. In this work, we shall use the following query sets, each defined for any specific book of the ICC:

• (QT1) Randomly selected sentences from the articles of the book;
• (QT2) Same as QT1, but the sentences are paraphrased through an Italian-English-Italian translation of the queries;
• (QT3) Comments on the articles of the book, i.e., annotations about the interpretation of the meanings and law implications associated with an article (laleggepertutti.it);
• (QT4) Case law decisions from the civil section of the Italian Court of Cassation that contain jurisprudential sentences associated with the articles of the book.

It should be noted that the above query sets represent different testbeds, whose "difficulty" varies widely, from lower (QT1) to higher (QT3 and QT4). Due to space limitations of this paper, we refer the reader to [14] for further details on the characteristics of the query sets.

As concerns the assessment criteria, here we consider single-label evaluation criteria only. For each article A_i, we start by measuring the precision for A_i (P_i), i.e., the number of times (queries) A_i was correctly predicted out of all predictions of A_i, the recall for A_i (R_i), i.e., the number of times (queries) A_i was correctly predicted out of all queries actually pertinent to A_i, and the F-measure for A_i (F_i). Then, we average over all articles to obtain the per-article average precision (P), recall (R), micro-averaged F-measure (F^μ) as the average over all F_i values, and macro-averaged F-measure (F^M) as the harmonic mean of P and R. In addition, we account for the top-k predictions and the position (rank) of the correct article in predictions: the former is the fraction of correct article labels that are found in the top-k predictions (i.e., the top-k-probability results in response to each query), averaged over all queries, which is the recall@k (R@k); the latter is the mean reciprocal rank (MRR), considering for each query the rank of the correct prediction over the classification probability distribution, and averaging over all queries.
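The single-label criteria just described can be computed as in the following sketch. It assumes that each query comes with its ground-truth article and a list of articles ranked by decreasing predicted probability, that the per-article F-measure is the usual harmonic mean of per-article precision and recall, that averages run over the articles having at least one pertinent query, and that precision is 0 for an article that is never predicted. The function name evaluate and the dictionary keys are hypothetical; the definitions of the two averaged F-measures follow the wording of the paragraph above.

```python
def evaluate(queries, k=3):
    """Compute per-article averages, recall@k and MRR.

    queries -- list of (true_article, ranked_articles) pairs, where
               ranked_articles is sorted by decreasing predicted probability
    """
    articles = sorted({true for true, _ in queries})
    precisions, recalls, f_scores = [], [], []
    for art in articles:
        predicted = sum(1 for _, ranked in queries if ranked[0] == art)
        pertinent = sum(1 for true, _ in queries if true == art)
        correct = sum(1 for true, ranked in queries
                      if true == art and ranked[0] == art)
        p = correct / predicted if predicted else 0.0  # assumption: 0 if never predicted
        r = correct / pertinent
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)

    n = len(articles)
    avg_p, avg_r = sum(precisions) / n, sum(recalls) / n
    return {
        "P": avg_p,
        "R": avg_r,
        # Average of the per-article F-measures.
        "F_micro": sum(f_scores) / n,
        # Harmonic mean of the averaged precision and recall.
        "F_macro": 2 * avg_p * avg_r / (avg_p + avg_r) if avg_p + avg_r else 0.0,
        # Fraction of queries whose correct article is among the top-k predictions.
        "R@k": sum(1 for true, ranked in queries if true in ranked[:k]) / len(queries),
        # Mean of 1/rank of the correct article (0 if it is not ranked at all).
        "MRR": sum(1 / (ranked.index(true) + 1) if true in ranked else 0.0
                   for true, ranked in queries) / len(queries),
    }


if __name__ == "__main__":
    toy_queries = [
        ("463", ["463", "540", "320"]),
        ("463", ["540", "463", "320"]),
        ("540", ["540", "548", "463"]),
        ("320", ["463", "540", "320"]),
    ]
    for name, value in evaluate(toy_queries, k=2).items():
        print(f"{name}: {value:.3f}")
```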


4. Rebuilding LamBERTa based on ITALIAN-LEGAL-BERT

To answer our first research question (RQ1), we develop a new version of LamBERTa by replacing the general-domain pre-trained Italian model (i.e., ITALIAN-XXL-UNCASED) with a legal-specific pre-trained Italian model (i.e., ITALIAN-LEGAL-BERT); recall that the latter model is the result of a further pre-training of the former, although in its cased version. We hereinafter refer to this version of LamBERTa as LamBERTa-V2, to distinguish it from the original in [14], hereinafter denoted as LamBERTa-V1. Table 2 summarizes results of the comparison between the two versions based on their evaluation through all query sets. Note that results by original LamBERTa models are borrowed from [14]. All results shown in Tables 2–4 correspond to the use of the same seed setting (for handling computation randomness) and hardware configuration for all LamBERTa models.

At first glance, it can be noticed that although there is no absolute winner, LamBERTa-V1 generally achieves better performance than LamBERTa-V2. For all query types, LamBERTa-V2 appears to lose more when evaluated on queries pertaining to the largest books (i.e., Book-4 and Book-5). Moreover, regardless of the book, the gap of LamBERTa-V2 is particularly evident for the most difficult query sets, i.e., QT3 and QT4, which contain queries that are the most distant, both lexically and semantically, from the language used in the training instances. On average over all books, LamBERTa-V2 has indeed a percentage decrease of above 40% on case queries (QT4) and above 27% on comment queries (QT3); remarkably, while this holds for all criteria, the negative peaks are reached for the top-3 and top-10 predictions (e.g., -46.4% R@3). Overall, it stands out that using a legal pre-trained model does not bring advantage over a general-domain pre-trained model to fine-tune on the downstream task of ICC article prediction, and actually the legal pre-trained model can often achieve worse performance.

[Table: LamBERTa-V1 (V1) vs. LamBERTa-V2 (V2) on query sets QT1–QT4; caption fragment: "best model for each book, evaluation criterion, and query set".]

5. Investigating on the domain-specific token injection

Effect of token injection removal. A major goal of this work is to delve into the effect of the domain-specific token injection into LamBERTa models. To answer our RQ2, we first analyze the changes in the behavior of LamBERTa when no out-of-vocabulary tokens are added. We shall use the suffix NoDST to distinguish this setting from the original one using token injection.

Results obtained by LamBERTa-V1-NoDST and LamBERTa-V2-NoDST are shown in Table 3. First, we notice that the performance difference of LamBERTa-V2-NoDST w.r.t. LamBERTa-V1-NoDST is reduced, though still remaining negative, with the exception of QT4, where LamBERTa-V2-NoDST achieves an average percentage increase of about 3% up to 9%. More interesting is to compare the obtained results against those in Table 2. The new setting leads to an improvement of both versions of LamBERTa in most cases, with the ITALIAN-LEGAL-BERT based version benefiting the most. More precisely, the two versions of LamBERTa improve slightly on QT1 and QT2, and more significantly on QT3. By contrast, on QT4, while LamBERTa-V2-NoDST achieves an average percentage increase vs. LamBERTa-V2 (from 65% to about 90%), taking a slight advantage over the other models in terms of three of the evaluation criteria, LamBERTa-V1-NoDST tends to be worse than the original LamBERTa-V1, which remains the absolute winner according to the R@k and MRR criteria.

Effect of embedding initialization for token injection. The above results prompted us to further investigate the effect of injecting out-of-vocabulary legal terms into LamBERTa models, by focusing now on the initialization of the added tokens. In fact, it should be noted that in the original setting of LamBERTa, the selected domain-specific tokens are added to the Italian pre-trained tokenizer using a random initialization. Therefore, to provide a more exhaustive answer to our RQ2, we define an enhanced setting for the domain-specific tokens to be added. Our goal is to compute initial embeddings for the new tokens that are not random but incorporate proper knowledge of the legal language. One approach we tried is to initialize each word w to be added by getting the [CLS] output embedding computed when prompting the Italian pre-trained model, or alternatively ITALIAN-LEGAL-BERT, with just w. Similarly, we tried averaging the output embeddings of the tokens corresponding to the subwords of w detected by the BERT tokenizer. Unfortunately, in both cases, exploiting the output embeddings proved to be inappropriate, which might be due to the fact that these contextualized representations also incorporate the segment and position embeddings. Then, we shifted our attention to vectors extracted from the token embeddings matrix (i.e., the first level of BERT input representation). Given a word w to be added, we get the token embedding of each of its subwords (excluding [CLS] and [SEP] embeddings) and try different pooling strategies. The one leading to the best results is initializing w's embedding with the initial embedding of its root subword. This setting is hereinafter referred to as ReDST.
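To make the ReDST idea concrete, the sketch below injects a new token into a toy vocabulary and copies the input embedding of its root subword. It is only an illustration under explicit assumptions: the "root subword" is taken to be the first WordPiece of the word, the tokenizer is a simplified greedy longest-match WordPiece, and the vocabulary, the subword split of 'nascituri', and the two-dimensional embeddings are invented toy values. The helper names wordpiece and inject_token are hypothetical and do not correspond to any library API.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation subword
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no subword matches: whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def inject_token(word, vocab, embeddings):
    """Add `word` as a new token, initialized from its root (first) subword.

    vocab      -- dict mapping token -> row index in `embeddings`
    embeddings -- list of vectors (the token embedding matrix)
    """
    if word in vocab:
        return vocab[word]
    root = wordpiece(word, vocab)[0]
    new_id = len(embeddings)
    vocab[word] = new_id
    # Copy (not alias) the root subword's input embedding for the new token.
    embeddings.append(list(embeddings[vocab[root]]))
    return new_id


if __name__ == "__main__":
    toy_vocab = {"[UNK]": 0, "nasc": 1, "##itur": 2, "##i": 3, "figli": 4}
    toy_embeddings = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    print(wordpiece("nascituri", toy_vocab))   # split before injection
    inject_token("nascituri", toy_vocab, toy_embeddings)
    print(wordpiece("nascituri", toy_vocab))   # single token after injection
    print(toy_embeddings[toy_vocab["nascituri"]])
```

With a real pre-trained model, the same step would amount to extending the tokenizer vocabulary, resizing the token embedding matrix, and overwriting the new row with the chosen subword's row before fine-tuning.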

Results for this new setting are reported in Table 4. A first remark that stands out is the benefit brought by this new setting of initialization of the injected tokens w.r.t. a random initialization, for both versions of LamBERTa. This always holds, with the exception of QT4 according to the R@k and MRR criteria, whereby LamBERTa-V1 is the absolute winner over all models. Besides, LamBERTa-V1-ReDST and LamBERTa-V2-ReDST actually perform comparably to or better than LamBERTa-V1-NoDST and LamBERTa-V2-NoDST on QT1, and clearly better than them on QT4 according to R@k and MRR.

6. Explainability aspects

As for any machine learning or deep learning model, explainability of PLMs is central to understanding the solutions they provide for a given NLP task. This becomes even more crucial when artificial intelligence meets a challenging field like law (e.g., [19, 20]).

Since our earlier study [14], we have investigated the explainability of our LamBERTa models, with a focus on how they form complex relationships between the textual tokens, and on their distinctive attention patterns. In this paper, we take a different perspective, which is more suited for providing interpretation of the models' predictions.

[Table; caption fragment: "best model for each book, evaluation criterion, and query set".]

To this purpose, we use LIME (Local Interpretable Model-Agnostic Explanations) [21]. LIME aims to explain the behavior of the underlying classifier "around" a query instance: by perturbing interpretable parts of the input query (i.e., words, for textual data), LIME weighs these perturbed data points by their proximity to the original query instance, and observes the associated predictions by the underlying classifier to determine which of those changes have the most impact on the prediction of the original query. The explanation is accomplished by approximating the underlying classifier locally by an interpretable one, such as a linear model. It should be noted that, even if the original classifier deals with non-linear complexities, like our LamBERTa models, in the neighborhood of an instance it behaves roughly as a linear model; therefore, the local linear approximation can reasonably be assumed to be correct, since LIME looks at a very small region around the query instance.
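The mechanism described above (perturb the query, weigh perturbed samples by proximity, fit a local linear surrogate) can be illustrated with the following self-contained sketch. It is not the LIME library and not the setup used in our experiments: the distance (fraction of removed words), the exponential kernel width, the small ridge term, and the toy classifier standing in for a LamBERTa model are all assumptions made for illustration, and the names solve, lime_explain, and toy_classifier are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(words, predict_proba, num_samples=300,
                 kernel_width=0.75, ridge=1e-3, seed=0):
    """Return (word, weight) pairs of a local weighted linear surrogate."""
    rng = random.Random(seed)
    n = len(words)
    dim = n + 1  # one coefficient per word plus an intercept
    xtwx = [[0.0] * dim for _ in range(dim)]
    xtwy = [0.0] * dim
    for s in range(num_samples):
        # Binary mask: 1 keeps a word, 0 removes it; sample 0 is the original query.
        mask = [1] * n if s == 0 else [rng.randint(0, 1) for _ in range(n)]
        kept = [w for w, keep in zip(words, mask) if keep]
        y = predict_proba(kept)
        distance = 1.0 - sum(mask) / n  # fraction of removed words
        weight = math.exp(-(distance ** 2) / kernel_width ** 2)
        x = mask + [1]  # last feature is the intercept
        for i in range(dim):
            if x[i]:
                xtwy[i] += weight * y
                for j in range(dim):
                    xtwx[i][j] += weight * x[j]
    for i in range(n):  # small ridge penalty on the word coefficients only
        xtwx[i][i] += ridge
    beta = solve(xtwx, xtwy)
    return sorted(zip(words, beta[:n]), key=lambda t: abs(t[1]), reverse=True)


def toy_classifier(kept_words):
    """Stand-in for a model's probability of the ground-truth article."""
    return 0.9 if "testamento" in kept_words else 0.1


if __name__ == "__main__":
    query = ["Chi", "può", "essere", "escluso", "dal", "testamento"]
    for word, weight in lime_explain(query, toy_classifier):
        print(f"{word:12s} {weight:+.3f}")
```

Because the toy classifier depends on a single word, the surrogate assigns that word a weight close to the probability gap it causes and near-zero weights elsewhere; with a real classifier the weights would instead spread over the words the model actually relies on.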

Figures 2–4 show LIME explanations of the LamBERTa models for various use-case queries. It should be noted that the queries have been defined in the form of questions and do not have strong syntactical patterns matching sentences of the corresponding relevant articles.

In line with the quantitative results discussed in the previous section, token injection may be unnecessary to get the correct prediction. In this regard, one example is given by the query Chi può essere escluso dal testamento? (Who can be excluded from the testament?), shown in Figure 2, whose words all appear in the pre-trained Italian BERT vocabulary. Indeed, the ground-truth article (Art. 463) is correctly predicted by LamBERTa-V2-NoDST and LamBERTa-V1-NoDST, where the latter leverages more terms, as can be noticed from the LIME explainer's highlighting of the important words.

Nonetheless, token injection can still be helpful in some cases. For instance, given the query Al coniuge superstite è assicurato un trattamento preferenziale della porzione disponibile di patrimonio ereditario in aggiunta ai diritti di uso e abitazione? (Is the surviving spouse granted preferential treatment of the available portion of the hereditary patrimony in addition to the rights of use and habitation?), we find that it is correctly answered by both LamBERTa-V1 and LamBERTa-V2 models that exploit ICC-specific token injection; for the sake of brevity, in Figure 3 we report explanations for models with random initialization of the injected tokens. This query contains the terms 'superstite' and 'ereditario', which were added to the tokenizer and hence are fully recognized by such models; otherwise, they are split into subwords by the models equipped with the original pre-trained Italian BERT vocabulary. Notably, although LamBERTa-V2 achieves higher prediction probability for the ground-truth article (Art. 540), LamBERTa-V1 behaves better overall, as it is also able to retrieve the second most relevant article for the query (i.e., Art. 548). In this regard, we notice that while LamBERTa-V1's predictions are also explained by means of the terms 'superstite' and 'ereditario', these are not recognized as essential by LamBERTa-V2.

The benefit of using token injection is also evident for the query I figli nascituri sono rappresentati dai genitori congiuntamente? (Are the unborn children represented by the parents jointly?), where 'nascituri' is missing from the pre-trained Italian BERT vocabulary, and hence it is split into subwords (3 tokens). The ground-truth article (Art. 320) is strongly predicted by LamBERTa-V1 and LamBERTa-V1-ReDST, but not by the counterpart LamBERTa-V2 models: looking at the LIME explanation (Figure 4), the injected token 'nascituri' is well-recognized in LamBERTa-V1 as an important feature for prediction along with other neighboring words in the query, but this does not hold for LamBERTa-V2. This may hint at a phenomenon of fresh-knowledge acquisition by a model (i.e., LamBERTa-V1) that fine-tunes a general-domain one, as opposed to the early-knowledge update exhibited by a model (i.e., LamBERTa-V2) that fine-tunes a domain-specific one.

7. Discussion

Here we summarize main findings according to our previously stated research questions.

Concerning RQ1, the behavior of LamBERTa models changes depending on the Italian pre-trained BERT model, especially on more difficult query testbeds. However, for the task of ICC article retrieval, fine-tuning an Italian legal pre-trained model does not bring particular advantage w.r.t. fine-tuning an Italian general-domain pre-trained model; therefore, the original LamBERTa turns out to be preferable in most query scenarios. We point out that this should not be surprising: after all, in [22], Legal-BERT was shown to achieve only slightly better performance than BERT, on both classification and entity recognition tasks. More importantly, our result is in line with some recent studies that demonstrated how domain-adaptive pre-training leads to significant improvements only with low-resource downstream tasks [23].

Injecting domain-specific tokens, i.e., out-of-vocabulary legal terms (RQ2), provides little or no benefit in most cases; however, a non-random initialization of the new tokens to be injected significantly improves LamBERTa performance. Moreover, remarks drawn from the explanation of the LamBERTa models' predictions (RQ3) have confirmed that different domain-specific token-injection settings might be required to successfully address different query scenarios.

To sum up, thus answering our RQ4, domain adaptation turns out to be less decisive than task adaptation for the task of article retrieval w.r.t. the Italian Civil Code. Nonetheless, there are specific situations, relating to the underlying lexical/semantic aspects of the input queries, that need to be handled by different variants of LamBERTa to successfully accomplish the retrieval task.

8. Conclusions

We presented a follow-up of our research on Italian BERT models for the task of law article retrieval, with application to the Italian Civil Code (ICC), which updates the current state-of-the-art of PLMs for prediction tasks in the Italian legal domain. Our goal was to delve into the effects of domain-adaptation in combination with task-adaptation to learn Italian BERT models (LamBERTa) for the task of ICC article prediction. To this purpose, we investigated the role of a recently defined Italian legal BERT in our framework, as well as the effects of enhancing the tokenizer with new terms selected from the target legal corpus, by varying the setting for the initial token embeddings of such terms.

We expect that our work can pave the way for further exploration of domain-/task-adaptation for (Italian) legal BERT models. On our side, we plan to define effective methods to integrate task-oriented knowledge into a pre-trained (domain-adaptive) model. Also, we believe that a from-scratch pre-trained Italian legal BERT model is worth developing.

References

[1] Surden, Artificial intelligence and law: An overview, 35 GA. ST. U. L. REV. 1305 (2019).

[2] Devlin, Chang, Lee, Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in: Proc. NAACL-HLT, 2019, pp. 4171–4186.

[3] Rabelo, Kim, Goebel, Combining similarity and transformer methods for case law entailment, in: Proc. Int. Conf. on Artificial Intelligence and Law (ICAIL), 2019, pp. 290–296.

[4] I. Chalkidis, I. Androutsopoulos, N. Aletras, Neural legal judgment prediction in English, in: Proc. ACL, Association for Computational Linguistics, 2019, pp. 4317–4323.

[5] Sanchez, He, Manotumruksa, Albakour, Martinez, Lipani, Easing legal news monitoring with learning to rank and BERT, in: Proc. ECIR, volume 12036 of Lecture Notes in Computer Science, Springer, 2020, pp. 336–343.

[6] Shao, Mao, Liu, W. Ma, Satoh, Zhang, S. Ma, BERT-PLI: modeling paragraph-level interactions for legal case retrieval, in: Proc. IJCAI, 2020, pp. 3501–3507.

[7] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: the muppets straight out of law school, CoRR abs/2010.02559 (2020).

[8] Nguyen, P. M. Nguyen, Vuong, Q. M. Bui, C. M. Nguyen, T. B. Dang, V. Tran, M. L. Nguyen, K. Satoh, JNLP team: Deep learning approaches for legal processing tasks in COLIEE 2021, CoRR abs/2106.13405 (2021). URL: https://arxiv.org/abs/2106.13405. arXiv:2106.13405.

[9] Yoshioka, Aoki, Suzuki, BERT-based ensemble methods with data augmentation for legal textual entailment in COLIEE statute law task, in: Proc. Int. Conf. on Artificial Intelligence and Law (ICAIL), ACM, 2021, pp. 278–284.

[10] Gururangan, Marasovic, Swayamdipta, Lo, Beltagy, Downey, N. A. Smith, Don't stop pretraining: Adapt language models to domains and tasks, in: Proc. Annual Meeting of the Association for Computational Linguistics (ACL), ACL, 2020, pp. 8342–8360.

[11] Polignano, Basile, M. de Gemmis, G. Semeraro, V. Basile, AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets, in: Proc. 6th Italian Conf. on Computational Linguistics (CLiC-it), volume 2481 of CEUR Workshop Proceedings, CEUR-WS.org, 2019.

[12] Puccinelli, Demartini, R. E. D'Aoust, Fixing comma splices in Italian with BERT, in: Proc. 6th Italian Conf. on Computational Linguistics (CLiC-it), volume 2481 of CEUR Workshop Proceedings, CEUR-WS.org, 2019.

[13] Tamburini, How “BERTology” Changed the State-of-the-Art also for Italian NLP, in: Proc. 7th Italian Conf. on Computational Linguistics (CLiC-it), volume 2769 of CEUR Workshop Proceedings, CEUR-WS.org, 2020.

[14] A. Tagarelli, A. Simeri, Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code, Artif. Intell. Law 30 (3) (2022) 417–473. Published: 15 September 2021. doi:10.1007/s10506-021-09301-8.

[15] Licari, G. Comandè, ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law, in: Proc. KM4LAW Workshop with the 23rd Int. Conf on Knowledge Engineering and Knowledge Management, volume 3256 of CEUR Workshop Proceedings, CEUR, 2022.

[16] L. La Cava, A. Simeri, A. Tagarelli, The Italian Civil Code network analysis, in: Proc. RELATED – Relations in the Legal Domain Workshop @ICAIL 2021, volume 2896 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 3–16.

[17] L. La Cava, A. Simeri, A. Tagarelli, LawNet-Viz: A Web-based System to Visually Explore Networks of Law Article References, in: Proc. 45th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2022, pp. 3300–3305. doi:10.1145/3477495.3531668.

[18] A. Tagarelli, A. Simeri, LamBERTa: Law Article Mining Based on Bert Architecture for the Italian Civil Code, in: Proc. 18th Italian Research Conference on Digital Libraries, volume 3160 of CEUR Workshop Proceedings, CEUR-WS.org, 2022.

[19] K. Branting, B. Weiss, B. Brown, C. Pfeifer, A. Chakraborty, L. Ferro, M. Pfaf, A. S. Yeh, Semi-supervised methods for explainable legal prediction, in: Proc. Int. Conf. on Artificial Intelligence and Law (ICAIL), 2019, pp. 22–31.

[20] P. Hacker, R. Krestel, S. Grundmann, F. Naumann, Explainable AI under contract and tort law: legal incentives and technical challenges, Artif. Intell. Law 28 (2020) 415–439.

[21] M. T. Ribeiro, S. Singh, C. Guestrin, “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, in: Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, ACM, 2016, pp. 1135–1144. doi:10.1145/2939672.2939778.

[22] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: the muppets straight out of law school, CoRR abs/2010.02559 (2020).

[23] S. Geng, R. Lebret, K. Aberer, Legal Transformer Models May Not Always Help, CoRR abs/2109.06862 (2021).