<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring domain and task adaptation of LamBERTa models for article retrieval on the Italian Civil Code</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Simeri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Tagarelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. Computer Engineering</institution>
          ,
          <addr-line>Modeling, Electronics, and Systems Engineering (DIMES)</addr-line>
          ,
          <institution>University of Calabria</institution>
          ,
          <addr-line>87036 Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper is concerned with AI-based NLP solutions to the law article retrieval problem, with application to the Italian legal domain and, in particular, to the Italian Civil Code. Based on the current state of the art on this topic, we revise our early LamBERTa framework in two ways related to its domain-adaptation feature: replacing the general-domain pre-trained model with a legal-specific one to be fine-tuned for the task of article retrieval, and delving into the injection of out-of-vocabulary legal terms into the models' tokenizer. An extensive experimental evaluation based on different collections of query sets, along with a qualitative analysis of the interpretability of the models' predictions, has unveiled interesting findings about the combined effect of domain and task adaptation of an Italian BERT model on the task of law article retrieval.</p>
      </abstract>
      <kwd-group>
        <kwd>law article retrieval</kwd>
        <kwd>domain adaptation</kwd>
        <kwd>legal language models</kwd>
        <kwd>artificial intelligence and law</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial Intelligence (AI) is increasingly used in the legal domain, mainly motivated by the huge amount of information produced and by the involvement of different actors, such as legal professionals, law courts, legislators, law firms, and even citizens [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Starting with BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], deep contextualized pre-trained language models (PLMs) have emerged
in the NLP field showing outstanding performance in several discriminative and generative
tasks. BERT and BERT-like models have also represented a breakthrough for the legal domain,
especially concerning classification problems (e.g., [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7 ref8 ref9">3, 4, 5, 6, 7, 8, 9</xref>
        ]).
      </p>
      <p>
        Early applications of such models to the legal domain include approaches that make PLMs adaptive to a specific legal data analysis task, i.e., they directly fine-tune a general-domain pre-trained model on the task at hand. In contrast to such task-adaptive methods, domain-adaptive pre-training allows for deeply tailoring a pre-trained model to the domain of the target task [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
To specialize a pre-trained model on the legal domain, there are two main strategies that stand as alternatives to the direct application of an out-of-the-box pre-trained model to the downstream task: either continuing the pre-training of the model on a legal corpus, or pre-training the model from scratch on a legal corpus.
      </p>
      <p>
        Our study in this paper addresses the above topic in the context of the Italian legal domain. In this respect, it should be noted that, although a number of Italian BERT models exist (e.g., [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]), they mostly refer to general-domain language. In particular, no study leveraging BERT for Italian civil law had been proposed before LamBERTa [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the first BERT-based framework for law article retrieval as a prediction problem. LamBERTa is designed to learn prediction models by fine-tuning an Italian pre-trained BERT on the Italian Civil Code (ICC), and to answer natural language queries by retrieving the most relevant ICC articles. More recently, a new contribution to the Italian legal domain has been offered by the release of the first Italian BERT pre-trained on legal corpora, named ITALIAN-LEGAL-BERT [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>Given this premise, in this paper we aim to answer the following research questions:
• RQ1: How does the behavior of LamBERTa models change when fine-tuning a legal Italian BERT rather than a general-domain Italian BERT?
• RQ2: What is the impact of injecting out-of-vocabulary legal terms into LamBERTa models during the fine-tuning stage? Does it depend on how the representations of such terms are initialized?
• RQ3: What aspects emerge from explaining the different LamBERTa models through the interpretation of their predictions?
• RQ4: Overall, is the combined effect of domain adaptation and task adaptation of a pre-trained Italian BERT model helpful to improve performance on the task of article retrieval from the Italian Civil Code?</p>
      <p>To answer the above questions, we provide the following main contributions. We advance research on AI-based NLP for the Italian legal domain by updating the current state of the art of PLMs for law article retrieval as a prediction task. Starting from our early LamBERTa framework, we develop a new variant of LamBERTa that is domain-adaptive besides task-adaptive; we accomplish this by designing LamBERTa to learn ICC article classification models by fine-tuning an Italian legal pre-trained BERT on the ICC (Section 4). We further investigate the domain adaptation of LamBERTa models by gaining insights into the effect of injecting into them a few domain-specific terms, selected from the target legal corpus and previously unseen in the pre-trained model’s vocabulary (Section 5). Moreover, we perform a qualitative analysis of the different LamBERTa models by explaining their underlying behaviors on a number of query instances (Section 6). Finally, we discuss the main findings drawn for our LamBERTa model variants from an extensive collection of query sets with varying degrees of length and lexical complexity (Section 7).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        In this section, we provide background concepts on the Italian Civil Code, the LamBERTa
framework [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and ITALIAN-LEGAL-BERT [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>2.1. The Italian Civil Code</title>
        <p>The Italian Civil Code (ICC) is divided into six books, each of which provides rules for a specific theme in civil law. Book-1 (on Persons and the Family, articles 1-455) regulates the juridical capacity of persons, personality rights, collective organizations, and the family; Book-2 (on Successions, articles 456-809) regulates succession upon death and the donation contract; Book-3 (on Property, articles 810-1172) regulates ownership and other real rights; Book-4 (on Obligations, articles 1173-2059) regulates obligations and their sources, that is, mainly contracts and illicit facts (civil liability); Book-5 (on Labor, articles 2060-2642) regulates the company in general, subordinate and self-employed work, profit-making companies, and competition; Book-6 (on the Protection of Rights, articles 2643-2969) regulates transcription, proofs, the debtor’s financial liability and the causes of pre-emption, and prescription.</p>
        <p>For an analysis of the ICC article citation network and a related visualization tool, the interested reader may refer to [16] and [17].</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The LamBERTa framework</title>
        <p>
          Figure 1 shows the conceptual architecture of LamBERTa [
          <xref ref-type="bibr" rid="ref14">14, 18</xref>
          ]. The starting point is ITALIAN-XXL-UNCASED (bert-base-italian-xxl-uncased, available at https://huggingface.co/dbmdz/), a pre-trained Italian BERT model whose data source consists of a large Wikipedia dump, various texts from the OPUS corpora collection, and the Italian part of the OSCAR corpus; the final training corpus has a size of 81GB and 13 138 379 147 tokens.
        </p>
        <p>LamBERTa models are generated by fine-tuning the pre-trained BERT models on a sequence classification task (i.e., BERT with a single linear classification layer on top), given as input the articles of the ICC or a portion of it. This fine-tuning is accomplished by using a typical BERT configuration for masked language modeling, with 12 attention heads, 12 hidden layers, and an initial (i.e., pre-trained) vocabulary of 31 102 tokens. Each model is trained for 10 epochs, using cross-entropy as loss function, the AdamW optimizer, and an initial learning rate selected within [1e-5, 5e-5], on batches of 256 examples.</p>
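        <p>The fine-tuning settings listed above can be collected in a small configuration object. The following is a minimal sketch, not LamBERTa's actual code: the class name <monospace>LamBERTaFineTuningConfig</monospace> and its field names are hypothetical (the architectural fields mirror the attribute names of the Hugging Face <monospace>BertConfig</monospace>), and the default learning rate of 5e-5 is only a placeholder within the stated range.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LamBERTaFineTuningConfig:
    """Fine-tuning settings stated in the text (class and field names are hypothetical)."""
    num_attention_heads: int = 12
    num_hidden_layers: int = 12
    vocab_size: int = 31102          # size of the pre-trained vocabulary
    epochs: int = 10
    batch_size: int = 256
    learning_rate: float = 5e-5      # placeholder; must lie within [1e-5, 5e-5]
    loss: str = "cross-entropy"
    optimizer: str = "AdamW"

    def __post_init__(self):
        # Reject learning rates outside the range explored in the text.
        if self.learning_rate > 5e-5 or 1e-5 > self.learning_rate:
            raise ValueError("learning_rate must lie within [1e-5, 5e-5]")

    def with_added_tokens(self, n_added: int) -> "LamBERTaFineTuningConfig":
        """Return a copy whose vocabulary also counts `n_added` injected tokens."""
        return replace(self, vocab_size=self.vocab_size + n_added)
```
        </preformat>
        <p>For example, <monospace>LamBERTaFineTuningConfig().with_added_tokens(n)</monospace> gives the vocabulary size after injecting <monospace>n</monospace> domain-specific tokens, under the assumption that the final number of tokens is simply the pre-trained vocabulary size plus the added tokens.</p>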
        <p>Notably, LamBERTa is flexible w.r.t. two specific modeling aspects: (i) the training-instance labeling scheme for a given set of ICC articles, and (ii) the learning approach. The former is discussed in Section 3, whereas the latter concerns the possibility of training models either on the individual books or on the entire ICC corpus; due to space limitations, in this paper we focus on the book-specific models.</p>
        <p>
          Another feature of LamBERTa is the injection of previously unseen legal terms, selected from the task-specific corpus (i.e., the ICC), that are out-of-vocabulary for the Italian pre-trained model. This way, the BERT tokenizer is enabled to recognize those terms appearing in the ICC while fine-tuning on it, and hence to avoid breaking them down into subwords. To select such terms to be added as new tokens in LamBERTa, the text of each book in the ICC is processed to remove Italian stopwords and to filter out overly frequent terms (i.e., terms occurring in more than 50% of the articles in the book) as well as hapax terms. Table 1 reports the number of added tokens and the final number of tokens for each book of the ICC.
        </p>
        <p>
          ITALIAN-LEGAL-BERT [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] follows the typical BERT architecture, with a language modeling head on top, the AdamW optimizer, and an initial learning rate of 5e-5. ITALIAN-LEGAL-BERT was built upon ITALIAN-XXL-CASED by further pre-training the latter for 4 additional epochs on a corpus extracted from the National Jurisprudential Archive (pst.giustizia.it), a repository containing millions of legal documents, such as decrees, orders, and civil judgments, from Italian courts and courts of appeal. The corpus used to train ITALIAN-LEGAL-BERT consists of 3.7 GB of text, with over 21M sentences and 498M words. The trained ITALIAN-LEGAL-BERT was evaluated on named entity recognition, sentence classification, and sentence similarity tasks, using 20K civil cases from the National Jurisprudential Archive and over 21K criminal cases from italgiureweb (italgiure.giustizia.it).
        </p>
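        <p>The term-selection step used by LamBERTa for token injection (stopword removal, a 50% document-frequency cut-off, and hapax removal) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name <monospace>select_new_tokens</monospace> is hypothetical, words are obtained by lowercasing and a simple regular expression, and <monospace>vocab</monospace> is treated as a set of whole-word vocabulary entries; the actual tokenization and stopword list used for LamBERTa may differ.</p>
        <preformat preformat-type="code">
```python
import re
from collections import Counter


def select_new_tokens(articles, vocab, stopwords, max_df=0.5):
    """Pick out-of-vocabulary candidate terms from the articles of one ICC book.

    articles  -- list of article texts (one string per article)
    vocab     -- set of whole-word entries already in the pre-trained vocabulary
    stopwords -- set of Italian stopwords to discard
    max_df    -- drop terms occurring in more than this fraction of the articles
    """
    docs = [re.findall(r"\w+", text.lower()) for text in articles]
    term_freq = Counter(t for doc in docs for t in doc)       # occurrences in the book
    doc_freq = Counter(t for doc in docs for t in set(doc))   # articles containing t
    n_articles = len(docs)
    selected = []
    for term, tf in term_freq.items():
        if term in stopwords or term in vocab:
            continue                     # stopword, or already a whole token
        if doc_freq[term] > max_df * n_articles:
            continue                     # overly frequent term
        if tf == 1:
            continue                     # hapax term (occurs only once)
        selected.append(term)
    return sorted(selected)
```
        </preformat>
        <p>With a Hugging Face tokenizer, the returned terms could then be registered through <monospace>tokenizer.add_tokens</monospace>.</p>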
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Training and evaluation data</title>
      <p>
        An important modeling aspect of LamBERTa concerns the unsupervised article-labeling schemes used to produce a training set for each book of the ICC. This is not trivial, since two main requirements must be satisfied: (i) a one-to-one association must hold between classes and articles, since a LamBERTa model is designed to be a classifier at article level, i.e., class labels correspond to the articles in the book(s) covered by the model, and (ii) the entire ICC must be used to fully embed its knowledge. Therefore, a key issue is how to create as many training instances as possible for each article to make LamBERTa learn effectively. To this end, in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we defined different strategies for selecting and combining portions of each article to build the training set for any specific book, also taking care to balance the contributions of the articles, which vary in length. Given a minimum number of training units per article, by default set to 32, each of the article-labeling schemes implements a round-robin (RR) method that iterates over replicas of the same group of training units per article until at least that many units are generated. The most effective scheme turned out to be the unigram with parameterized emphasis on the title, which builds the set of training units for each article from two subsets: one containing the article’s sentences with round-robin selection, and the other containing only replicas of the article’s title.
      </p>
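      <p>The round-robin replication and the unigram scheme with emphasis on the title can be sketched as below. This is a hypothetical illustration, not the code from [14]: we assume that each training unit is a single sentence and that whole groups of units are replicated until the minimum is reached, and we model the title emphasis with a hypothetical <monospace>title_replicas</monospace> parameter, since its exact definition is not given here.</p>
      <preformat preformat-type="code">
```python
import math


def round_robin(units, min_units=32):
    """Replicate the whole group of units until at least `min_units` are collected."""
    if not units:
        return []
    replicas = max(1, math.ceil(min_units / len(units)))
    return units * replicas


def title_emphasis_units(title, sentences, min_units=32, title_replicas=8):
    """Unigram scheme with title emphasis: one sentence per training unit.

    title_replicas is a hypothetical stand-in for the emphasis parameter.
    """
    sentence_units = round_robin(sentences, min_units)   # subset 1: RR-selected sentences
    title_units = [title] * title_replicas               # subset 2: title replicas only
    return sentence_units + title_units


def build_training_set(book, **kwargs):
    """book: {article_id: (title, [sentences])} -> list of (text, label) pairs."""
    return [(text, art_id)
            for art_id, (title, sentences) in book.items()
            for text in title_emphasis_units(title, sentences, **kwargs)]
```
      </preformat>
      <p>For an article with three sentences and the default minimum of 32 units, the sentences are replicated 11 times (33 units), to which the title replicas are appended; every unit carries the article identifier as its class label.</p>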
      <p>
        In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], LamBERTa models are assessed through extensive experiments considering single-label and multi-label evaluation tasks, based on different types of queries, which vary by source, length, and lexical characteristics. In this work, we use the following query sets, each defined for a specific book of the ICC:
∙ (QT1) Randomly selected sentences from the articles of the book;
∙ (QT2) Same as QT1, but the sentences are paraphrased through an Italian-English-Italian translation of the queries;
∙ (QT3) Comments on the articles of the book, i.e., annotations about the interpretation of the meanings and legal implications associated with an article (laleggepertutti.it);
∙ (QT4) Case law decisions from the civil section of the Italian Court of Cassation that contain jurisprudential sentences associated with the articles of the book.
      </p>
      <p>
        It should be noted that the above query sets represent different testbeds, whose “difficulty” varies widely, from lower (QT1) to higher (QT3 and QT4). Due to space limitations, we refer the reader to [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for further details on the characteristics of the query sets.
      </p>
      <p>As concerns the assessment criteria, here we consider single-label evaluation criteria only. For each article a, we start by measuring the precision for a, P(a), i.e., the fraction of the predictions of a that are correct; the recall for a, R(a), i.e., the fraction of the queries actually pertinent to a for which a was correctly predicted; and the F-measure for a, F(a). Then, we average over all articles to obtain the per-article average precision (P) and recall (R), the micro-averaged F-measure as the average over all F(a), and the macro-averaged F-measure as the harmonic mean of P and R. In addition, we account for the top-k predictions and for the position (rank) of the correct article in the predictions: the former is the fraction of correct article labels found in the top-k predictions (i.e., the top-k-probability results in response to each query), averaged over all queries, which is the recall@k (R@k); the latter is the mean reciprocal rank (MRR), which considers for each query the rank of the correct prediction over the classification probability distribution, averaged over all queries.</p>
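      <p>The criteria defined above can be computed from the ranked predictions of a model as in the following sketch. It assumes hypothetical inputs (one pair of true article and ranked list of predicted articles per query) and averages the per-article scores over the articles that occur as a true or predicted label; the micro- and macro-averaged F-measures follow the definitions given in this section, which differ from some other common usages of these names.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def evaluate(results, k=3):
    """Single-label evaluation following the definitions in the text.

    results -- list of (true_article, ranked_articles) pairs, one per query, where
               ranked_articles is sorted by decreasing predicted probability.
    """
    predicted = Counter(ranked[0] for _, ranked in results)   # top-1 predictions per article
    pertinent = Counter(true for true, _ in results)          # queries pertinent to each article
    correct = Counter(true for true, ranked in results if ranked[0] == true)
    articles = set(predicted) | set(pertinent)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = {a: ratio(correct[a], predicted[a]) for a in articles}
    recall = {a: ratio(correct[a], pertinent[a]) for a in articles}
    f1 = {a: ratio(2 * precision[a] * recall[a], precision[a] + recall[a]) for a in articles}

    p = sum(precision.values()) / len(articles)
    r = sum(recall.values()) / len(articles)
    n = len(results)
    return {
        "P": p,
        "R": r,
        "micro_F": sum(f1.values()) / len(articles),   # average of per-article F
        "macro_F": ratio(2 * p * r, p + r),            # harmonic mean of P and R
        f"R@{k}": sum(true in ranked[:k] for true, ranked in results) / n,
        "MRR": sum(1.0 / (ranked.index(true) + 1)
                   for true, ranked in results if true in ranked) / n,
    }
```
      </preformat>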
    </sec>
    <sec id="sec-4">
      <title>4. Rebuilding LamBERTa based on ITALIAN-LEGAL-BERT</title>
      <p>
        To answer our first research question (RQ1), we develop a new version of LamBERTa by replacing the general-domain pre-trained Italian model (i.e., ITALIAN-XXL-UNCASED) with a legal-specific pre-trained Italian model (i.e., ITALIAN-LEGAL-BERT); recall that the latter was obtained by further pre-training the cased counterpart of the former (i.e., ITALIAN-XXL-CASED). We hereinafter refer to this version of LamBERTa as LamBERTa-V2, to distinguish it from the original version in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], hereinafter denoted as LamBERTa-V1. Table 2 summarizes the results of the comparison between the two versions, based on their evaluation through all query sets. Note that the results of the original LamBERTa models are borrowed from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. All results shown in Tables 2–4 were obtained with the same seed setting (for handling computation randomness) and the same hardware configuration for all LamBERTa models.
      </p>
      <p>At first glance, it can be noticed that, although there is no absolute winner, LamBERTa-V1 generally achieves better performance than LamBERTa-V2. For all query types, LamBERTa-V2 appears to lose more when evaluated on queries pertaining to the largest books (i.e., Book-4 and Book-5). Moreover, regardless of the book, the gap of LamBERTa-V2 is particularly evident for the most difficult query sets, i.e., QT3 and QT4, which contain the queries that are most distant, both lexically and semantically, from the language used in the training instances. On average over all books, LamBERTa-V2 indeed shows a percentage decrease of above 40% on case queries (QT4) and above 27% on comment queries (QT3); remarkably, while this holds for all criteria, the negative peaks are reached for the top-3 and top-10 predictions (e.g., -46.4% in terms of R@3). It stands out that using a legal pre-trained model brings no advantage over a general-domain pre-trained model when fine-tuning on the downstream task of ICC article prediction; in fact, the legal pre-trained model can often achieve worse performance.</p>
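      <p>The percentage variations reported above are relative changes of a LamBERTa-V2 score w.r.t. the corresponding LamBERTa-V1 score. A minimal sketch follows; since the text does not specify the aggregation order, we assume that the per-book percentage changes are averaged, and the function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
def percentage_change(new, old):
    """Relative change of `new` w.r.t. `old`, in percent (negative means a decrease)."""
    return 100.0 * (new - old) / old


def mean_percentage_change(per_book_new, per_book_old):
    """Average of the per-book percentage changes (one possible aggregation)."""
    changes = [percentage_change(per_book_new[b], per_book_old[b]) for b in per_book_old]
    return sum(changes) / len(changes)
```
      </preformat>
      <p>For instance, with toy scores (not taken from Table 2), <monospace>mean_percentage_change({"Book-1": 0.25, "Book-2": 0.75}, {"Book-1": 0.5, "Book-2": 0.75})</monospace> returns -25.0, i.e., a 25% average decrease.</p>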
    </sec>
    <sec id="sec-5">
      <title>5. Investigating on the domain-specific token injection</title>
      <p>Effect of token injection removal. A major goal of this work is to delve into the effect of domain-specific token injection in LamBERTa models. To answer our RQ2, we first analyze the changes in the behavior of LamBERTa when no out-of-vocabulary tokens are added. We use the suffix NoDST to distinguish this setting from the original one using token injection.</p>
      <p>Results obtained by LamBERTa-V1-NoDST and LamBERTa-V2-NoDST are shown in Table 3. First, we notice that the performance difference of LamBERTa-V2-NoDST w.r.t. LamBERTa-V1-NoDST is reduced, though still negative, with the exception of QT4, where LamBERTa-V2-NoDST achieves an average percentage increase ranging from about 3% up to 9%, depending on the criterion. More interesting is the comparison of these results against those in Table 2. The new setting leads to an improvement of both versions of LamBERTa in most cases, with the ITALIAN-LEGAL-BERT-based version benefiting the most. More precisely, both versions of LamBERTa improve slightly on QT1 and QT2, and more significantly on QT3. By contrast, on QT4, while LamBERTa-V2-NoDST achieves an average percentage increase vs. LamBERTa-V2 (from 65% to about 90%), gaining a slight advantage over the other models in terms of some of the criteria, LamBERTa-V1-NoDST tends to be worse than the original LamBERTa-V1, which remains the absolute winner according to the R@k and MRR criteria.</p>
      <p>Effect of embedding initialization for token injection. The above results prompted us to further investigate the effect of injecting out-of-vocabulary legal terms into LamBERTa models, now focusing on the initialization of the added tokens. Indeed, in the original setting of LamBERTa, the selected domain-specific tokens are added to the tokenizer of the Italian pre-trained model with a random initialization of their embeddings. Therefore, to provide a more exhaustive answer to our RQ2, we define an enhanced setting for the domain-specific tokens to be added. Our goal is to compute initial embeddings for the new tokens that are not random but incorporate proper knowledge of the legal language. One approach we tried is to initialize each word to be added with the [CLS] output embedding computed when prompting the Italian pre-trained model, or alternatively ITALIAN-LEGAL-BERT, with that word alone. Similarly, we tried averaging the output embeddings of the tokens corresponding to the subwords of the word detected by the BERT tokenizer. Unfortunately, in both cases, exploiting the output embeddings turned out to be inappropriate, possibly because these contextualized representations also incorporate the segment and position embeddings. We then shifted our attention to vectors extracted from the token embedding matrix (i.e., the first level of the BERT input representation). Given a word to be added, we get the token embedding of each of its subwords (excluding the [CLS] and [SEP] embeddings) and tried different pooling strategies. The one leading to the best results initializes the word's embedding with the initial embedding of its root subword. This setting is hereinafter referred to as ReDST.</p>
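      <p>The ReDST initialization can be illustrated on a toy vocabulary. The sketch below is hypothetical: it uses a greedy longest-match WordPiece split, interprets the root subword as the first subword of the word, and represents the token embedding matrix as a list of rows; mean pooling is included only for comparison. In a Hugging Face model, the same vector would presumably be written into the new row of <monospace>model.get_input_embeddings().weight</monospace> after calling <monospace>tokenizer.add_tokens</monospace> and <monospace>model.resize_token_embeddings(len(tokenizer))</monospace>.</p>
      <preformat preformat-type="code">
```python
CLS, SEP = "[CLS]", "[SEP]"


def wordpiece(word, vocab):
    """Greedy longest-match-first WordPiece split, wrapped in [CLS]/[SEP] (toy version)."""
    pieces, start = [], 0
    while start != len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:
            raise ValueError(f"cannot split {word!r} with this vocabulary")
        pieces.append(piece)
        start = end
    return [CLS] + pieces + [SEP]


def init_embedding(word, vocab, embeddings, pooling="root"):
    """Initial input embedding for a new token, taken from the token-embedding matrix."""
    pieces = [p for p in wordpiece(word, vocab) if p not in (CLS, SEP)]
    rows = [embeddings[vocab[p]] for p in pieces]
    if pooling == "root":
        return list(rows[0])                                  # copy the root (first) subword
    if pooling == "mean":
        return [sum(col) / len(rows) for col in zip(*rows)]   # alternative pooling
    raise ValueError(f"unknown pooling: {pooling}")


def inject_token(word, vocab, embeddings, pooling="root"):
    """Add `word` as a whole token, initialized from its former subwords."""
    vector = init_embedding(word, vocab, embeddings, pooling)  # computed before adding
    embeddings.append(vector)
    vocab[word] = len(embeddings) - 1
    return vocab[word]
```
      </preformat>
      <p>With the toy vocabulary <monospace>{"[CLS]": 0, "[SEP]": 1, "nas": 2, "##ci": 3, "##turi": 4}</monospace>, the word ‘nascituri’ is split into three subwords, and <monospace>inject_token</monospace> gives it its own row initialized with the embedding of <monospace>nas</monospace>; afterwards, the tokenizer keeps the word whole.</p>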
      <p>Results for this new setting are reported in Table 4. A first remark that stands out is the benefit brought by this non-random initialization of the injected tokens w.r.t. a random initialization, for both versions of LamBERTa. This holds in all cases except QT4 according to the R@k and MRR criteria, for which LamBERTa-V1 is the absolute winner over all models. Moreover, LamBERTa-V1-ReDST and LamBERTa-V2-ReDST perform comparably to or better than LamBERTa-V1-NoDST and LamBERTa-V2-NoDST on QT1, and clearly better than them on QT4 according to R@k and MRR.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Explainability aspects</title>
      <p>As for any machine learning and deep learning model, explainability of PLMs is central to understanding the solutions they provide for a given NLP task. This becomes even more crucial when artificial intelligence meets a challenging field like law (e.g., [19, 20]).</p>
      <p>
        In our earlier study [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we investigated the explainability of our LamBERTa models, focusing on how they form complex relationships between textual tokens and on their distinctive attention patterns. In this paper, we take a different perspective, which is better suited to interpreting the models' predictions.
      </p>
      <p>To this end, we use LIME (Local Interpretable Model-agnostic Explanations) [21]. LIME aims to explain the behavior of the underlying classifier “around” a query instance: it perturbs interpretable parts of the input query (i.e., words, for textual data), weighs the perturbed data points by their proximity to the original query instance, and observes the associated predictions of the underlying classifier to determine which of those changes have the most impact on the prediction for the original query. The explanation is then obtained by approximating the underlying classifier locally with an interpretable model, such as a linear model. It should be noted that, even if the original classifier captures non-linear complexities, as our LamBERTa models do, LIME assumes that it behaves approximately linearly in the neighborhood of an instance; since LIME looks at a very small region around the query instance, this local linear approximation can reasonably be assumed to be faithful.</p>
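      <p>The LIME procedure described above can be sketched in plain Python to make its steps explicit; in practice one would typically rely on the reference <monospace>lime</monospace> package (e.g., <monospace>LimeTextExplainer.explain_instance</monospace>). In this hypothetical sketch, <monospace>toy_classifier</monospace> stands in for a LamBERTa model, perturbations remove random subsets of words, proximity is an exponential kernel on the cosine distance with an assumed width, and the interpretable model is a weighted ridge regression.</p>
      <preformat preformat-type="code">
```python
import math
import random


def toy_classifier(words):
    """Hypothetical stand-in for a LamBERTa model: probability of one article label."""
    score = -2.0 + 2.0 * ("escluso" in words) + 1.5 * ("testamento" in words)
    return 1.0 / (1.0 + math.exp(-score))


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain(query, predict, num_samples=500, kernel_width=0.25, ridge=1e-3, seed=0):
    """LIME-style explanation: one local linear weight per query word."""
    rng = random.Random(seed)
    words = [w.strip("?.,;:").lower() for w in query.split()]
    d = len(words)
    masks = [[1] * d]                                   # the original query itself
    for _ in range(num_samples - 1):
        removed = set(rng.sample(range(d), rng.randint(1, d - 1)))
        masks.append([0 if i in removed else 1 for i in range(d)])
    xs, ys, ws = [], [], []
    for z in masks:
        kept = [w for w, keep in zip(words, z) if keep]
        ys.append(predict(kept))                        # black-box prediction
        dist = 1.0 - sum(z) / math.sqrt(sum(z) * d)     # cosine distance to the original
        ws.append(math.exp(-dist ** 2 / kernel_width ** 2))  # proximity weight
        xs.append([1.0] + [float(v) for v in z])        # intercept + word indicators
    k = d + 1
    # Weighted ridge regression: (X^T W X + ridge I) beta = X^T W y
    a = [[sum(w * x[i] * x[j] for x, w in zip(xs, ws)) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(w * x[i] * y for x, w, y in zip(xs, ws, ys)) for i in range(k)]
    beta = solve(a, b)
    return dict(zip(words, beta[1:]))
```
      </preformat>
      <p>For the query Chi può essere escluso dal testamento?, the toy classifier depends only on ‘escluso’ and ‘testamento’, so these two words receive the largest positive weights, while the remaining words get much smaller ones.</p>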
      <p>Figures 2–4 show LIME explanations of the LamBERTa models for various use-case queries.
It should be noted that the queries have been defined in the form of questions and do not have
strong syntactical patterns matching sentences of the corresponding relevant articles.</p>
      <p>In line with the quantitative results discussed in the previous section, token injection may be unnecessary to obtain the correct prediction. One example is the query Chi può essere escluso dal testamento? (Who can be excluded from the testament?), shown in Figure 2, all of whose words appear in the pre-trained Italian-BERT vocabulary. Indeed, the ground-truth article (Art. 463) is correctly predicted by both LamBERTa-V2-NoDST and LamBERTa-V1-NoDST, with the latter leveraging more terms, as shown by the words highlighted as important by the LIME explainer.</p>
      <p>Nonetheless, token injection can still be helpful in some cases. For instance, given the query Al coniuge superstite è assicurato un trattamento preferenziale della porzione disponibile di patrimonio ereditario in aggiunta ai diritti di uso e abitazione? (Is the surviving spouse granted preferential treatment of the available portion of the hereditary patrimony in addition to the rights of use and habitation?), we find that it is correctly answered by both the LamBERTa-V1 and LamBERTa-V2 models that exploit ICC-specific token injection; for the sake of brevity, in Figure 3 we report explanations for the models with random initialization of the injected tokens. This query contains the terms ‘superstite’ and ‘ereditario’, which were added to the tokenizer and hence are fully recognized by such models; otherwise, they would be split into subwords by models equipped with the original pre-trained Italian-BERT vocabulary. Notably, although LamBERTa-V2 achieves a higher prediction probability for the ground-truth article (Art. 540), LamBERTa-V1 behaves better overall, as it is also able to retrieve the second most relevant article for the query (i.e., Art. 548). In this regard, we notice that while LamBERTa-V1’s predictions are also explained by the terms ‘superstite’ and ‘ereditario’, these are not recognized as essential by LamBERTa-V2.</p>
      <p>The benefit of token injection is also evident for the query I figli nascituri sono rappresentati dai genitori congiuntamente? (Are the unborn children represented by the parents jointly?), where ‘nascituri’ is missing from the pre-trained Italian-BERT vocabulary and hence is split into 3 subword tokens. The ground-truth article (Art. 320) is strongly predicted by LamBERTa-V1 and LamBERTa-V1-ReDST, but not by their LamBERTa-V2 counterparts: looking at the LIME explanation (Figure 4), the injected token ‘nascituri’ is well recognized by LamBERTa-V1 as an important feature for prediction, along with other neighboring words in the query, but this does not hold for LamBERTa-V2. This may hint at a phenomenon of fresh-knowledge acquisition by a model that fine-tunes a general-domain PLM (i.e., LamBERTa-V1), as opposed to an early-knowledge update exhibited by a model that fine-tunes a domain-specific PLM (i.e., LamBERTa-V2).</p>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>Here we summarize the main findings according to our previously stated research questions.</p>
      <p>Concerning RQ1, the behavior of LamBERTa models changes depending on the Italian pre-trained BERT model, especially on the more difficult query testbeds. However, for the task of ICC article retrieval, fine-tuning an Italian legal pre-trained model brings no particular advantage over fine-tuning an Italian general-domain pre-trained model; therefore, the original LamBERTa turns out to be preferable in most query scenarios. We point out that this should not be surprising: after all, in [22], Legal-BERT was shown to achieve only slightly better performance than BERT on both classification and entity recognition tasks. More importantly, our result is in line with recent findings showing that domain-adaptive pre-training leads to significant improvements only on low-resource downstream tasks [23].</p>
      <p>Injecting domain-specific tokens, i.e., out-of-vocabulary legal terms (RQ2), can provide little or no benefit in most cases; however, a non-random initialization of the new tokens to be injected significantly improves LamBERTa performance. Moreover, the remarks drawn from the explanation of the LamBERTa models’ predictions (RQ3) have confirmed that different domain-specific token-injection settings might be required to successfully address different query scenarios.</p>
      <p>To sum up, thus answering our RQ4, domain adaptation turns out to be less decisive than task adaptation for the task of article retrieval w.r.t. the Italian Civil Code. Nonetheless, there are specific situations, related to the underlying lexical/semantic aspects of the input queries, that require different variants of LamBERTa to successfully accomplish the retrieval task.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <p>We presented a follow-up of our research on Italian BERT models for the task of law article retrieval, with application to the Italian Civil Code (ICC), which updates the current state of the art of PLMs for prediction tasks in the Italian legal domain. Our goal was to delve into the effects of domain adaptation in combination with task adaptation to learn Italian BERT models (LamBERTa) for the task of ICC article prediction. To this end, we investigated the role of a recently proposed Italian legal BERT within our framework, as well as the effects of enhancing the tokenizer with new terms selected from the target legal corpus, by varying the setting for the initial token embeddings of such terms.</p>
      <p>We expect that our work can pave the way for further exploration of domain and task adaptation for (Italian) legal BERT models. On our side, we plan to define effective methods to integrate task-oriented knowledge into a pre-trained (domain-adaptive) model. Also, we believe that an Italian legal BERT model pre-trained from scratch is worth developing.</p>
      <p>[16] L. La Cava, A. Simeri, A. Tagarelli, The Italian Civil Code network analysis, in: Proc. RELATED – Relations in the Legal Domain Workshop @ICAIL 2021, volume 2896 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 3–16.</p>
      <p>[17] L. La Cava, A. Simeri, A. Tagarelli, LawNet-Viz: A Web-based System to Visually Explore Networks of Law Article References, in: Proc. 45th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, 2022, pp. 3300–3305. doi:10.1145/3477495.3531668.</p>
      <p>[18] A. Tagarelli, A. Simeri, LamBERTa: Law Article Mining Based on Bert Architecture for the Italian Civil Code, in: Proc. 18th Italian Research Conference on Digital Libraries, volume 3160 of CEUR Workshop Proceedings, CEUR-WS.org, 2022.</p>
      <p>[19] K. Branting, B. Weiss, B. Brown, C. Pfeifer, A. Chakraborty, L. Ferro, M. Pfaff, A. S. Yeh, Semi-supervised methods for explainable legal prediction, in: Proc. Int. Conf. on Artificial Intelligence and Law (ICAIL), 2019, pp. 22–31.</p>
      <p>[20] P. Hacker, R. Krestel, S. Grundmann, F. Naumann, Explainable AI under contract and tort law: legal incentives and technical challenges, Artif. Intell. Law 28 (2020) 415–439.</p>
      <p>[21] M. T. Ribeiro, S. Singh, C. Guestrin, “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, in: Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, ACM, 2016, pp. 1135–1144. doi:10.1145/2939672.2939778.</p>
      <p>[22] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, LEGAL-BERT: the muppets straight out of law school, CoRR abs/2010.02559 (2020).</p>
      <p>[23] S. Geng, R. Lebret, K. Aberer, Legal Transformer Models May Not Always Help, CoRR abs/2109.06862 (2021).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Surden</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence and law: An overview, 35 GA</article-title>
          . ST. U. L. REV.
          <volume>1305</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proc. NAACL-HLT</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rabelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Goebel</surname>
          </string-name>
          ,
          <article-title>Combining similarity and transformer methods for case law entailment</article-title>
          ,
          <source>in: Proc. Int. Conf. on Artificial Intelligence and Law (ICAIL)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>290</fpage>
          -
          <lpage>296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          , I. Androutsopoulos,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aletras</surname>
          </string-name>
          ,
          <article-title>Neural legal judgment prediction in english</article-title>
          ,
          <source>in: Proc. ACL</source>
          , Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>4317</fpage>
          -
          <lpage>4323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sanchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Manotumruksa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Albakour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <article-title>Easing legal news monitoring with learning to rank and BERT</article-title>
          , in
          <source>: Proc. ECIR</source>
          , volume
          <volume>12036</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>336</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , W. Ma,
          <string-name>
            <given-names>K.</given-names>
            <surname>Satoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ma, BERT-PLI:
          <article-title>modeling paragraphlevel interactions for legal case retrieval</article-title>
          ,
          <source>in: Proc. IJCAI</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>3501</fpage>
          -
          <lpage>3507</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fergadiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Malakasiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aletras</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Androutsopoulos</surname>
          </string-name>
          ,
          <article-title>LEGAL-BERT: the muppets straight out of law school</article-title>
          , CoRR abs/
          <year>2010</year>
          .02559 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vuong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. M.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Satoh</surname>
          </string-name>
          , JNLP team:
          <article-title>Deep learning approaches for legal processing tasks in COLIEE 2021</article-title>
          , CoRR abs/2106.13405 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2106.13405. arXiv:
          <volume>2106</volume>
          .
          <fpage>13405</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yoshioka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aoki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          ,
          <article-title>BERT-based ensemble methods with data augmentation for legal textual entailment in COLIEE statute law task</article-title>
          ,
          <source>in: Proc. Int. Conf. on Artificial Intelligence and Law (ICAIL)</source>
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gururangan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marasovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith,</surname>
          </string-name>
          <article-title>Don't stop pretraining: Adapt language models to domains and tasks</article-title>
          , in: Proc.
          <article-title>Annual Meeting of the Association for Computational Linguistics (ACL)</article-title>
          ,
          <source>ACL</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8342</fpage>
          -
          <lpage>8360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , M. de Gemmis, G. Semeraro, V. Basile,
          <article-title>AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets</article-title>
          ,
          <source>in: Proc. 6th Italian Conf. on Computational Linguistics (CLiC-it)</source>
          , volume
          <volume>2481</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Puccinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. E. D'Aoust</surname>
          </string-name>
          ,
          <article-title>Fixing comma splices in italian with BERT</article-title>
          ,
          <source>in: Proc. 6th Italian Conf. on Computational Linguistics (CLiC-it)</source>
          , volume
          <volume>2481</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          ,
          <article-title>How “BERTology” Changed the State-of-the-Art also for Italian NLP</article-title>
          ,
          <source>in: Proc. 7th Italian Conf. on Computational Linguistics (CLiC-it)</source>
          , volume
          <volume>2769</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Simeri</surname>
          </string-name>
          ,
          <article-title>Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code</article-title>
          ,
          <source>Artif. Intell. Law</source>
          <volume>30</volume>
          (
          <issue>3</issue>
          ) (
          <year>2022</year>
          )
          <fpage>417</fpage>
          -
          <lpage>473</lpage>
          . Published:
          <issue>15</issue>
          <year>September 2021</year>
          . doi:
          <volume>10</volume>
          .1007/s10506-021-09301-8.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Licari</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Comandè, ITALIAN-LEGAL-BERT: A Pre-trained Transformer Language Model for Italian Law</article-title>
          ,
          <source>in: Proc. KM4LAW Workshop with the 23rd Int. Conf on Knowledge Engineering and Knowledge Management</source>
          , volume
          <volume>3256</volume>
          <source>of CEUR Workshop Proceedings</source>
          , CEUR,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>