<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Knowledge-Based Service Architecture for Legal Document Building</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvana Castano</string-name>
          <email>silvana.castano@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfio Ferrara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Montanelli</string-name>
          <email>stefano.montanelli@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Picascia</string-name>
          <email>sergio.picascia@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Riva</string-name>
          <email>davide.riva1@unimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Milano, Department of Computer Science</institution>,
          <addr-line>Via Celoria, 18 - 20133 Milano</addr-line>,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose a knowledge-based service architecture for legal document building based on Natural Language Processing and learning techniques, to semantically analyze a database of ingested legal documents and propose the most prominent and pertinent textual suggestions for new document composition. After describing the proposed NLP services for knowledge extraction and textual suggestion selection and proposition, we describe the application of the proposed document builder architecture by considering a case study of Italian civil judgements.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital Justice</kwd>
        <kwd>Knowledge Extraction</kwd>
        <kwd>Legal Concept Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Legal documents constantly produced by Parliaments, Courts, and other institutional bodies
constitute a prominent source of information and knowledge not only for legal actors, like
judges or lawyers, but also for general subjects, like citizens or private and public organizations.
To improve both efficiency and effectiveness of courthouses and legal record offices, and to
foster digital justice, a significant effort is being devoted in almost all countries to digital
transformation projects, by developing legal information systems and modular architectures
providing a variety of services for acquisition, management, classification, exploration, and
retrieval of legal documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A main issue fostering digital transformation and courthouse
efficiency is related to the availability of tools to support legal actors in the complex process
of producing a new document for the case at hand, by making available databases of previous
documentation as well as workflow management services for the different stages of the legal
proceedings [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In particular, advanced legal document building environments are required,
to assist legal actors in the production of a new document, like judgements or lawyer acts, by
relying on predefined document templates and knowledge-based services to properly extract
useful information available in previously produced legal documents, to be proposed in the form of
suggestions. The availability of predefined document templates [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and rules [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] promotes a
disciplined structuring of legal documents, which will be eventually ingested in the document
repository for subsequent processing. The availability of a legal document repository where
ingested documents have a very homogeneous structure can, in turn, facilitate the application
of AI techniques, for semantic processing and classification of document contents.
      </p>
      <p>In this paper, we propose a knowledge-based service architecture for legal document building
based on Natural Language Processing and Zero-Shot Learning techniques, to semantically
analyze a database of ingested legal documents and propose the most prominent and pertinent
textual suggestions for the composition of a new document. In particular, the document
building process relies on a predefined document template organized in sections, according
to a segmentation schema, and on a knowledge-extraction service and a suggestion-extraction
service to enforce the document building process. The knowledge-extraction service is responsible
for i) mining a set of featuring concepts that provide a topic-oriented description of textual
contents of ingested documents and ii) implementing a fine-grained semantic classification of
textual contents, where each concept is connected to the document portions from which the
concept emerged. The suggestion-extraction service is responsible for i) Search-by-Content, to
retrieve text portions of ingested documents that are most pertinent to the current text of the
document section under preparation; ii) Search-by-Concept, to retrieve text portions of ingested
documents that are most pertinent to one or more concepts, and possibly document section(s) of
interest, specified by the legal actor, that is, text portions that contain terminological occurrences
of the considered concept(s), coming from specified section(s), if any. The proposed service
architecture supports the legal document building process according to a “human-in-the-loop
approach”, where the document author, namely the legal actor, drafts the document under
preparation following the document template, by actively working on the text portions provided
by the suggestion service.</p>
      <p>The paper is organized as follows. In Section 2 we introduce the proposed service architecture
for a legal document builder. Section 3 describes the knowledge-extraction service, while Section
4 illustrates the suggestion-extraction service. In Section 5 we present a case study applying the
architecture on a corpus of Italian case law decisions in the framework of the NGUPP project.
Section 6 is devoted to related work. Finally, Section 7 provides the final remarks.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Legal Document Builder: the Proposed Architecture</title>
      <p>Document building, sometimes referred to as document assembly, is a process that aims at
producing a textual document following a predefined schema with the support of digital,
automated tools. The task, even when entirely performed by humans, can be thought of as
a sequence of three phases: (1) definition of the document format and structure; (2) content
draft of each part or section; (3) editing of the document. Each of these phases is susceptible to
automated support to a variable extent, ranging from interactive, human-in-the-loop approaches
to fully automated services. For instance, document structure can be totally or partially based on
predefined templates, possibly tailored to user needs. At the same time, editing may be supported
by solutions ranging from error detection to automated rephrasing and text generation.</p>
      <p>Our proposed architecture is designed to support experts in producing documents in specialized
domains that require (or would benefit from) a well-defined document structure, such as the
legal domain, which is the context we consider in this paper. A main requirement in designing
a document builder architecture for the legal domain is to preserve the central role of the
legal actor: the architecture should provide automated support while preserving the
uniqueness and autonomy of the assessment and judgment capability of legal actors in document
writing. As a consequence, a fully automated approach is neither viable nor desirable, and interactive
tool environments should be provided, to keep the legal actor “in-the-loop” and also to reduce
the risk of excessive standardization of legal documents, which is a requirement too, given the
value and purposes of legal documents.</p>
      <p>The architecture provides a set of services to support the document assembly process by
relying on predefined document template(s). This way, document building promotes a
disciplined approach to document generation, enhancing the readability, homogeneity, and
quality of the produced documentation, which is also very important for subsequent analysis and
knowledge-extraction purposes.</p>
      <p>The proposed service architecture for legal document building is shown in Figure 1. The
architecture aims to support legal actors like judges and judicial officers in the preparation
of official legal documents. In particular, the legal actor, namely the document author, starts
working by selecting a template among those available and she/he is supported in building the
document by receiving suggestions in the form of pertinent chunks of text that can be reused
“as-is” or adjusted/edited for insertion in the document under preparation. Suggestions are
extracted from a corpus of legal documents (e.g., judgements, case-law decisions) that have
been previously ingested and processed. To this end, a storage layer is defined to maintain i) the
document database for the raw ingested documents and corresponding texts; ii) the document
chunks and related embeddings extracted from documents for classification; iii) the concept
graph to store entries for the concepts extracted from documents.</p>
      <p>Two main service pipelines are defined: the knowledge-extraction service and the
suggestion-extraction service. In the knowledge-extraction service, appropriate components are defined
to exploit the ingested documents for mining a set of featuring concepts that provide a
topic-oriented description of underlying textual contents. The concepts extracted from the documents
are organized in a graph, where a pair of similar concepts is linked by an edge. Each concept is
also connected to the document portions from which the concept emerged, meaning that we
can retrieve the pertinent document segments where a certain concept occurs. For knowledge
extraction, the involved components are document segmentation and data preparation, and
concept mining.</p>
      <p>The suggestion-extraction service enforces two modalities to detect pertinent chunks to propose
to the user for document drafting: i) Search-by-Content, which retrieves similar chunks as
suggestions by considering the current text of the document section under preparation; ii)
Search-by-Concept, which is based on a concept specified by the legal actor to retrieve and
provide as suggestions those phrases that contain terminological occurrences of the considered
concept.</p>
      <p>In the following sections, we describe each service in more detail.</p>
    </sec>
    <sec id="sec-4">
      <title>3. The Knowledge-Extraction Service</title>
      <p>Ingested legal documents are submitted to a knowledge-extraction pipeline based on document
annotation/segmentation, and fine-grained indexing and classification techniques.</p>
      <sec id="sec-4-1">
        <title>3.1. Document Segmentation</title>
        <p>Document segmentation is concerned with the identification of relevant sections of a document
which are exploited to construct new documents and to filter suggestions for the document
drafting. The idea is that the legal actor receives suggestions extracted from the same, or strictly
related, document section that is currently under construction.</p>
        <p>
          Document segmentation is a type of document annotation that consists in dividing a document
into parts (known as segments) that “display local coherence” [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], optionally assigning a label
to each of them. Since in our case we focus only on textual data, such parts may be contiguous
sequences of words, clauses or sentences. Local coherence is here interpreted from a functional
point of view, i.e. a text segment plays a single and indivisible function in the structure of the
document, e.g. introduction, argumentation, conclusion, and others.
        </p>
        <p>The first problem to solve is therefore the definition of a segmentation schema comprising
a set of functional segments and the descriptions of the respective text characteristics, which
serve as guidelines for annotation. Binding rules over the annotation output can be specified by
an additional set of axioms, including for instance a lower and/or upper bound on the number
of segments of a certain type. In most cases, definition of segment functions and characteristics
and axioms over them requires expert knowledge of the domain at hand, and shall be general
enough to encompass the variety of documents in the corpus of interest.</p>
        <p>The second problem is the actual annotation of documents, complying with the schema.
Given a segmentation schema and a corpus to annotate, the annotation can be performed either
manually (by human annotators), automatically (by rule-based or machine learning techniques),
or by a hybrid approach.</p>
        <p>Manual Segmentation. In the manual case, the annotation is performed by a group of human
annotators, generally domain experts, who split each document into segments and
assign a label from the segmentation schema to each segment using a digital text annotation
tool. Leveraging expert knowledge can benefit annotation accuracy, especially in case of
complex documents or domain corpora like in the legal domain, but it hinders scalability,
since manual annotation is notoriously a labour-intensive, time-consuming activity. Moreover,
despite increasing accuracy, expert knowledge does not ensure agreement among annotators,
which is often necessary to define a unique and homogeneous ground truth.</p>
        <p>Automated Segmentation. Automated systems for text segmentation typically rely on either
rule-based or machine learning models. Such models ensure two main advantages with respect
to manual annotation: they can achieve higher scalability, making it possible to process large
corpora, and, if machine learning models are employed, text characteristics can be learned
automatically instead of being a-priori defined. The main disadvantage is lower accuracy,
especially on complex documents. The problem may intensify in case of totally automated
systems, which rely on pre-trained models only, in the absence of any manually annotated data.</p>
        <p>Hybrid Segmentation. By hybrid segmentation we indicate a text segmentation system that
receives as input a limited corpus of manually annotated documents and, by training and validating
an automated system on such data, extends the annotation to other documents.</p>
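<p>A minimal sketch of the hybrid approach follows. It assumes a scikit-learn TF-IDF plus logistic-regression pipeline as the trainable classifier; the paper does not prescribe a specific model, so both the pipeline and the toy labels are illustrative only.</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_segment_classifier(texts, labels):
    """Train a sentence-level section classifier on a small manually annotated
    corpus; the fitted model can then extend the annotation to unlabeled
    documents (the hybrid segmentation setting)."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model

# Toy annotated sentences with labels from an illustrative segmentation schema.
texts = [
    "the court composed of judges",
    "the tribunal panel of judges",
    "the plaintiff claims damages",
    "the defendant counterclaims costs",
]
labels = [
    "court and parties",
    "court and parties",
    "claims and arguments",
    "claims and arguments",
]
model = train_segment_classifier(texts, labels)
prediction = model.predict(["the panel of judges of the court"])[0]
```

In a real deployment the training corpus would be the expert-annotated documents and the classifier would be validated before extending the annotation.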
        <p>In Section 5, we discuss the legal document segmentation activity we performed for prototyping
a first release of a legal document builder in the framework of an ongoing research project
on digital justice in Italy. In this context, legal document segmentation has been manually
performed by the legal experts of the project on a dataset case study of judgements. Since rule-based
models usually fall short in generalization, we deem machine learning models preferable to
process large and possibly heterogeneous legal corpora. We plan experimentation of different
classifiers as discussed in the concluding remarks.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Data Preparation</title>
        <p>Data preparation is concerned with the tokenization of ingested documents, with the goal of
splitting them into document chunks. A document chunk represents the text unit to consider for
classification that can be associated with a concept. Document chunks associated with concepts
represent the suggestions that can be provided to the author while building a document. We
stress that the size of the document chunk should be large enough that the context can be
captured, but not so large as to produce segments that are lengthy to read and potentially
noisy due to the presence of multiple concepts. In this paper, we choose to tokenize documents
by defining a chunk for each few sentences/phrases detected in a document, up to a maximum size
of 128 tokens [1]. This is particularly appropriate for legal actors, who are typically interested in
retrieving precise suggestions in which a given concept of interest appears and that can be rapidly
read/assimilated.</p>
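<p>The chunking policy described above can be sketched as follows. This is a simplified stand-in: it counts whitespace-delimited tokens, whereas the actual bound comes from the embedding model's own tokenizer; the function name and the 128-token cap mirror the text.</p>

```python
import re

MAX_TOKENS = 128  # cap imposed by the embedding model, as stated in the paper

def chunk_document(text, max_tokens=MAX_TOKENS):
    """Split a document into chunks of one or more consecutive sentences,
    never exceeding max_tokens whitespace-delimited tokens per chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk when adding this sentence would overflow it.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A sentence longer than the cap still becomes its own (oversized) chunk; a production version would defer to the model tokenizer and truncate.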
        <p>
          As a further preparation step, the terms appearing in document chunks are lemmatized and
a vector-based representation of each document chunk is finally built. The use of embedding
techniques to represent chunks allows mapping the document contents onto a semantic vector
space, where the similarity of two chunks can be measured by comparing the corresponding
vector representations through a similarity metric (e.g., cosine similarity). For embedding
construction, Sentence-BERT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a modification of the original BERT model based on siamese
and triplet networks, is employed to derive a semantically meaningful embedding for a given
sentence/phrase. As such, a document chunk c is associated with the set of terms T_c therein
contained. Any term is described as t = (l_t, d_t, t̄), where l_t is the label of the term (i.e., the
lemma), d_t is a description of the term meaning taken from a reference dictionary/vocabulary
(e.g., WordNet), and t̄ is the corresponding vector-based representation according to
Sentence-BERT. A document chunk c has the form c = (s_c, x_c, c̄), where s_c is the section
of the document where the chunk occurs, x_c is the original textual content of the chunk, and
c̄ is the corresponding vector-based representation calculated as the mean of the term vectors t̄
with t ∈ T_c. Embedding models have the capability to represent and compare the meaning of
entire text blocks like document chunks. For such a task, context-aware embedding models
fine-tuned on document similarity tasks, like Sentence-BERT, are appropriate. In the legal field,
the phrase structure can be highly articulated, and some common terms can have a precise
technical meaning when used in a court (e.g., citation, clemency, designation). Sentence-BERT
can handle such situations, which may strongly deviate from everyday conversation.
[1] The size of the document chunk is determined by the maximum number of tokens that can be
processed by the considered embedding model.
        </p>
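<p>The chunk representation c̄ (mean of the term vectors t̄) and the cosine comparison between chunks can be sketched as follows. The three-dimensional vectors are toy stand-ins for Sentence-BERT term embeddings, kept small so the sketch stays self-contained.</p>

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def chunk_vector(term_vectors):
    """c̄ is the mean of the term vectors t̄ with t in T_c, as defined above."""
    return np.mean(np.stack(term_vectors), axis=0)

# Toy term embeddings standing in for Sentence-BERT output (illustrative only).
terms_a = [np.array([1.0, 0.0, 0.0]), np.array([0.8, 0.2, 0.0])]
terms_b = [np.array([0.9, 0.1, 0.0])]
sim = cosine(chunk_vector(terms_a), chunk_vector(terms_b))
```

With real embeddings, `chunk_vector` would average the lemmatized terms' Sentence-BERT vectors and `cosine` would drive both classification and suggestion ranking.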
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Concept Mining</title>
        <p>
          Our solution to concept mining is called ASKE and it is based on zero-shot learning techniques
and context-aware embedding models to enforce concept extraction [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>Zero-shot learning is an unsupervised classification technique, characterized by the capability
to enforce classification without requiring any pre-existing annotation of the considered
documents. Initially, a seed knowledge is defined as a set of textual descriptions, each one featuring a
concept of interest, namely a seed concept, to consider for classification. Typically, for a seed
concept, a basic, coarse-grained description is provided as a short text (e.g., one or two phrases)
or a list of keywords. As an example, for a seed concept about banking contract, a corresponding
textual description used for embedding is bank deposit, safe deposit box, bank credit opening, bank
advance, bank account, bank discount. Further concepts are derived from seed ones during the
extraction process, and they usually provide a more fine-grained description of the concept
instances occurring in the document chunks. A concept k, either seed or derived, is defined as a
pair k = (l_k, k̄), where l_k is a label featuring the meaning of the concept expressed in a synthetic
and human-understandable way, and k̄ is a vector-based concept representation. Each concept k
is initially associated with the set of terms T_k extracted from the textual description of k. The
concept vector k̄ is built as the mean of the vectors of all the terms in T_k. Finally, the label l_k
corresponds to the label l_t of the term t ∈ T_k whose vector representation t̄ is closest to the
concept vector k̄. Concept extraction is defined as a progressive, iterative process articulated in
the following three steps:</p>
        <p>Zero-Shot Classification. Given a set of concepts (i.e., the seed concepts at the beginning
of the process), the document chunks are classified through zero-shot learning. A similarity
measure sim, e.g., cosine similarity, is calculated over any pair of embeddings between chunks
and concepts. A document chunk c is classified with the concept k when the similarity value
satisfies sim(c̄, k̄) ≥ th, with th defined as a similarity threshold configured in the system. The value
of th is empirically determined according to experimental results. In this paper, the value th = 0.3
is employed in the proposed case studies and experiments.</p>
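<p>The classification rule sim(c̄, k̄) ≥ th can be sketched as follows, with th = 0.3 as in the paper; the function names, dictionary layout, and toy two-dimensional vectors are illustrative, not part of the ASKE implementation.</p>

```python
import numpy as np

TH = 0.3  # similarity threshold, empirically set to 0.3 in the paper

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify(chunks, concepts, th=TH):
    """Zero-shot classification: assign each chunk every concept k whose
    embedding satisfies sim(chunk, k) >= th; no annotated data required."""
    labels = {}
    for chunk_id, c_vec in chunks.items():
        labels[chunk_id] = [
            concept_id
            for concept_id, k_vec in concepts.items()
            if cosine(c_vec, k_vec) >= th
        ]
    return labels
```

A chunk below the threshold for every concept simply remains unclassified, which is why later iterations with derived concepts can refine the result.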
        <p>Terminology Enrichment. Given a document chunk c classified with the concept k, the terms
in T_c are exploited for enriching the term set T_k. The idea is that the initial description of
the concept k can become more detailed if we add terminology taken from chunks that are
pertinent to (i.e., classified with) k. This is done by summing, for each t ∈ T_c, the similarities sim(t̄, k̄)
and sim(t̄, c̄_k), where c̄_k denotes the average embedding of the document chunks classified with k.
Terms t ∈ T_c satisfying a system-defined similarity threshold are inserted in T_k.</p>
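<p>The enrichment score for a candidate term, the sum of its similarity to the concept vector and to the average embedding of the concept's chunks, can be sketched as follows; the helper name and the toy vectors are illustrative.</p>

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def enrichment_score(t_vec, k_vec, avg_chunk_vec):
    """Score a candidate term t for concept k as sim(t, k) + sim(t, avg chunk of k);
    terms whose score passes a system-defined threshold enter T_k."""
    return cosine(t_vec, k_vec) + cosine(t_vec, avg_chunk_vec)
```

Because each cosine term lies in [-1, 1], the score lies in [-2, 2], and the threshold is calibrated on that range.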
        <p>Concept Derivation. By enriching the term set T_k, it is possible that more fine-grained concepts
emerge from k, and they can be generated as new concepts. The discovery of possible new
concepts emerging from k is enforced by clustering the embedding vectors t̄ of the terms in T_k.
The Affinity Propagation (AP) algorithm is adopted to this end, since it allows detecting the
emergence of sub-groups of similar terms within T_k without requiring the number of clusters
to generate to be defined a priori. A new concept k′ is created for each cluster returned by AP
on the terms T_k of a concept k. A link is defined between a concept k′ and k to denote that
k′ is derived from k and they are somehow similar/related in content. The concept k is then
updated, since the terms in T_k can be changed due to enrichment. As a consequence, l_k and k̄ are
re-calculated.</p>
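<p>The derivation step can be sketched with scikit-learn's Affinity Propagation implementation; the function name, the cluster-mean concept vectors, and the toy term vectors are illustrative assumptions, not the ASKE code.</p>

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def derive_concepts(term_vectors, random_state=0):
    """Cluster the term vectors t̄ of a concept with Affinity Propagation
    (no a-priori cluster count); each cluster seeds a new derived concept,
    represented here by the mean vector of its member terms."""
    X = np.stack(term_vectors)
    ap = AffinityPropagation(random_state=random_state).fit(X)
    derived = {}
    for label in sorted(set(ap.labels_)):
        members = X[ap.labels_ == label]
        derived[int(label)] = members.mean(axis=0)
    return derived
```

Each returned cluster would become a concept k′ linked to its parent k; labelling k′ would follow the same closest-term rule used for seed concepts.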
        <p>The set of concepts obtained after derivation can trigger the execution of a new cycle based
on the above three steps. New derived concepts can contribute to improve the classification of
chunks with more fine-grained concepts. Further new concepts can be also discovered through
a new execution of enrichment and derivation on the basis of a refined classification result.
As such, concept extraction is characterized by a predefined endpoint condition based on a
termination threshold. When the number of new concepts created in the derivation step is lower
than the threshold, the concept extraction process is concluded. A concept graph providing a
topic-based description of the underlying document corpus is finally stored.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Suggestion-Extraction for Document Drafting</title>
      <p>This service supports the legal document building process by implementing a
human-in-the-loop approach, where the document author, namely the legal actor, drafts the document
under preparation and receives suggestions extracted by the ASKE engine according to
the results of the knowledge-extraction service.</p>
      <p>The Document Builder Environment is characterized by an editing area composed of a list of
sections to be drafted, and a suggestion area where the author can retrieve pertinent chunks to
be re-used and optionally changed in the currently-edited section (see Figure 2). The author is
mainly focused on the editing area with the aim to write and create the document. The structure
of the document follows a template that the author has to choose within a predefined set. A
document template is articulated in a list of sections s_1, …, s_n. In the following, we call d∗ the
document under editing (i.e., the active document), and s∗ the section on which the author is
currently focused for drafting (i.e., the active section).</p>
      <p>Two modalities are defined to provide suggestions to the author during document drafting:
• Search-by-Content, which enforces similarity-based retrieval of text suggestions for the
section s∗ by exploiting the chunks of ingested documents;
• Search-by-Concept, which enforces concept-based retrieval of text suggestions for the
section s∗ by exploiting the concept graph.</p>
      <sec id="sec-5-0">
        <title>4.1. Search-by-Content</title>
        <p>In the Search-by-Content modality, suggestions are represented by document chunks belonging
to ingested documents that are considered most similar to the active one; the degree of similarity
is computed taking into consideration document chunks belonging to a section preceding s∗.
The following steps are executed to determine the most pertinent suggestions for an active
section s∗ with text x∗.</p>
        <p>Detection of similar documents on previous sections. Call s_1, …, s_m the sections of the active
document d∗ that precede the active section s∗. The ASKE engine is invoked to determine the
ingested documents that contain similar chunks with respect to the texts x_1, …, x_m of the sections
s_1, …, s_m, respectively. For each section s_i of the active document d∗ with i ≤ m, the corresponding
section content x_i is passed to the ASKE engine. The ASKE embedding model, namely the model
used for knowledge extraction, is queried with the aim of extracting a vector representation x̄_i of
x_i [2]. ASKE employs the cosine similarity function to compare x̄_i against the document chunks of
the s_i section belonging to any ingested document d. For each document d, a similarity degree
σ_d is defined by summing all the similarity values provided by the matching chunks in s_i. For
a section s_i, a set of similar documents D_i is returned as a result, where the similarity degree σ_d
of a document d ∈ D_i is over a prefixed, system-defined threshold. An overall set of similar
documents D = ⋂_{i=1..m} D_i is finally returned to use as input for the next step.</p>
        <p>Detection of pertinent chunks on the active section. Given the active section s∗, we retrieve
a set of pertinent suggestions among the chunks of the similar documents in D. As a default
behavior, the chunks of section s∗ in the documents of D are retrieved as suggestions to support
the author in drafting the active section. Moreover, to obtain a refined set of suggestions, the
author can start drafting the active section s∗ by inserting an initial section content x∗. In this
case, ASKE is invoked to derive the vector representation x̄∗ of x∗. The embeddings of chunks
in the documents of D and section s∗ are compared against x̄∗. A ranked list of similar chunks is
returned as suggestions to show in the right-hand panel of Figure 2 (descending order).
[2] When the section content x_i exceeds the size of a document chunk, a tokenization mechanism is employed to split
x_i. For the sake of clarity, in the paper, we consider x_i as a single textual element with a corresponding vector-based
representation x̄_i.</p>
      </sec>
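<p>The similar-document detection step, summing chunk similarities per document, thresholding, and intersecting the per-section sets D = ⋂ D_i, can be sketched as follows. The data layout, function name, and the threshold value are illustrative; the paper leaves the threshold system-defined.</p>

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similar_documents(section_vecs, corpus, th=1.0):
    """For each drafted section i, score every ingested document d by summing
    the cosine similarities of its chunks in that section against the section
    vector; keep documents above th and intersect the per-section sets."""
    per_section = []
    for section_id, x_vec in section_vecs.items():
        scores = {}
        for doc_id, sections in corpus.items():
            chunks = sections.get(section_id, [])
            scores[doc_id] = sum(cosine(x_vec, c) for c in chunks)
        per_section.append({d for d, s in scores.items() if s >= th})
    # D = intersection of the per-section similar-document sets
    return set.intersection(*per_section) if per_section else set()
```

Summing (rather than averaging) chunk similarities means documents with many pertinent chunks in the matching section rank higher, which suits the suggestion use case.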
      <sec id="sec-5-1">
        <title>4.2. Search-by-Concept</title>
        <p>In the Search-by-Concept modality, suggestions represent the document chunks where an
occurrence of a given concept of interest appears according to the classification results of ASKE.</p>
        <p>In the suggestion area, the author can select a concept of interest k∗ among a set of available
ones, namely the set of concepts derived by ASKE during knowledge extraction from the
ingested documents. The concept graph of ASKE is queried to retrieve the set of document
chunks C_k∗ classified with k∗, which are then returned as a result.</p>
        <p>A further filtering option is provided in the suggestion area to enable the author to choose a
target section of interest s∗. When a target section is selected, only the chunks of C_k∗ belonging
to the section s∗ in the ingested documents are finally shown as suggestions in the right-hand
panel of Figure 2.</p>
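<p>The Search-by-Concept lookup with the optional section filter can be sketched as a simple query over the stored classification results; the dictionary-based concept graph and field names here are illustrative stand-ins for the actual storage layer.</p>

```python
def search_by_concept(concept_graph, concept, section=None):
    """Return the chunks classified with the selected concept; when a target
    section is given, keep only the chunks occurring in that section."""
    chunks = concept_graph.get(concept, [])
    if section is not None:
        chunks = [c for c in chunks if c["section"] == section]
    return chunks
```

In the full architecture this query would run against the concept graph in the storage layer, with each chunk carrying its section label s_c from data preparation.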
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Application to the Italian Legal Domain</title>
      <p>The proposed service architecture has been applied to a legal case study in the framework of the
Next Generation UPP Italian project, which aims at providing artificial intelligence and advanced
information management techniques for digital transformation of Italian legal processes and
digital law in general.</p>
      <p>We first introduce the dataset employed in the case study, explaining document preparation
and segmentation. Then, we describe the scenario in which a legal actor, i.e. the judge, makes use
of the document builder functionalities in order to generate a new judgement, with examples of
Search-by-Content and Search-by-Concept text suggestions. The case study has been conducted
on documents written in Italian; the examples reported in the paper have been translated into
English for ease of understanding. Also, for the sake of brevity, we report only an excerpt of
the text suggestions.</p>
      <sec id="sec-6-1">
        <title>5.1. Dataset</title>
        <p>
          The dataset consists of a corpus of 50 Italian case law decisions, retrieved from 12 different
Courts located in Northern Italy. All the documents concern first degree civil law judgments
regarding the matter of unfair competition. Such documents come in PDF files of
different formats, depending on the issuing court; plain text has been extracted from these
files in order to be used in the document segmentation and data preparation processes.
For document segmentation, our legal partners of the project provided a document schema
defined on the basis of the Rules of the Court (March 2022). This set of rules has the function
of giving a rigid and predetermined structure to the introductory acts of the parties and to
the decision of the Court. In particular, for example, Rule 74 states that “any judgment of the
European Court of Human Rights must contain some basic information (such as the names of
the parties, agents, lawyers or advisers of the parties and a report of the procedure followed)
followed by some well-defined steps of the decision: the facts of the case; a summary of the
arguments of the parties; the legal reasons; operational provisions”. Similar criteria are also
established for the drafting of the appeals of lawyers. Based on Rule 74 recommendations and
related literature, our law project partners developed a document segmentation schema for
Italian juridical judgements/case law decisions [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which has been adopted in our proposed
architecture for both document segmentation and document building. The segmentation schema
comprises five different structural sections with the following corresponding labels (identified
directly by the domain experts): court and parties, providing information on the court, the panel
of judges and the parties in the trial; background information, relating to the proceedings of the
trial and to the reconstruction of the facts; claims and arguments, the claims made by the plaintiffs
and the counterclaims made by the defendants; reasoning, the reasoning for each decision on an
individual claim; final decisions, the final decision for each individual claim. For the purposes of
the case study, document segmentation has been performed manually by the legal domain experts
involved in the project.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Search-by-Content</title>
        <p>In the Search-by-Content scenario, the legal actor is required to provide the system with basic
knowledge about the case at hand, filling in at least one section s of the chosen document template,
in order to receive suggestions for an active section s∗. These suggestions are retrieved from
previous relevant case law decisions stored in the database of ingested documents and sorted
by the computed similarity scores. For instance, the legal actor could write down the
sections s1 and s2 regarding background information and claims and arguments in order to receive
suggestions for the reasoning section s∗. We exclude from the analysis the first section of the
aforementioned annotation schema, court and parties, since it is auto-compiled by the software
tools currently in use by legal actors and it includes general information that ends up not being
relevant to the search process.</p>
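The retrieval and ranking step described above can be sketched as follows. This is a minimal illustration, not the prototype's implementation: a toy bag-of-words vector stands in for the contextual sentence embeddings actually used, and the names (suggest, embed, ingested) are our own.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector standing in for the contextual sentence
    # embeddings used by the prototype; illustrative only.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def suggest(filled_sections, ingested, target_label, k=2):
    """Rank the chunks of the target sections of ingested judgements by
    similarity to the text of the filled sections of the draft."""
    query = embed(" ".join(filled_sections.values()))
    chunks = [c for doc in ingested for c in doc.get(target_label, [])]
    return sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)[:k]
```

Here the filled sections play the role of s1 and s2, and target_label selects the active section s∗ whose chunks are suggested.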
        <p>As an example of document building, suppose we are working on the generation of a new
judgement regarding the counterfeiting of machinery, for which the following two sections have
already been drafted:</p>
        <sec id="sec-6-2-1">
          <title>Background Information</title>
          <p>The Supreme Court confirmed what it had already ruled in previous decisions, ruling
definitively on the validity of the XXXX patent and the intervening counterfeiting by YYYY;
the plaintiff then filed a petition for the continuation of the suspended trial.</p>
          <p>Claims and Arguments: XXXX seeks relief for the contraction of revenues relating to the
machinery, resulting from the presence of the substitute machinery of YYYY.</p>
          <p>The reasoning section is the active section to be generated. By invoking the
Search-by-Content modality for drafting the reasoning section, the document builder retrieves as
suggestions the most relevant document chunks from the reasoning sections of the judgements
most similar to the active counterfeiting-of-machinery document at hand. The top-2 retrieved
document chunks, ranked by similarity score, are shown below.</p>
          <p>(...) objected that such conduct challenged to XXXX would not be configurable since the
corporate purpose of the two companies is different, in the sense that XXXX is limited to
carrying out only service activities for the repair of cleaning machinery, while YYYY carries
out as its prominent activity the production and sale of machinery, which XXXX does not deal
with; consequently, there could be no talk of unfair competition since the activity carried out
by the defendant company is merely marketing and not production, pertaining exclusively to
the plaintiff company.</p>
          <p>The plaintiff initially alleged only pecuniary damage by referring to the royalties that
the defendants would have to pay for the use of the trademark online, then to the cost of
advertising investments for the launch of the products then not sold due to unfair competition,
and alternatively the retroversion of profits. Then referring to the fact that through unfair
competition XXXX’s turnover would have unlawfully grown by increasing its holding YYYY
at the expense of ZZZZ with a reduction for the latter in the number of shares.</p>
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>5.3. Search-by-Concept</title>
        <p>In the Search-by-Concept scenario, it is possible to ask for text suggestions pertinent to one
or more concepts c∗ among those extracted by ASKE, which are representative of the actual
content of the ingested documents. Besides the concept(s) of pertinence, the legal actor can also
choose the section(s) s∗ to which the retrieved text suggestions should belong.</p>
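The concept- and section-based filtering can be sketched as follows; a minimal illustration under assumed data shapes (chunks annotated with section labels and concept lists), with names of our own choosing rather than the prototype's API.

```python
def search_by_concept(ingested, concepts, sections=None):
    """Return chunk texts annotated with at least one requested concept,
    optionally restricted to the chosen section labels.

    `ingested` is assumed to be a list of documents, each holding a list of
    chunk records {"section": ..., "concepts": [...], "text": ...}."""
    hits = []
    for doc in ingested:
        for chunk in doc["chunks"]:
            if sections and chunk["section"] not in sections:
                continue
            if set(concepts) & set(chunk["concepts"]):
                hits.append(chunk["text"])
    return hits
```

For example, asking for the concept "public service" restricted to background information sections would return only the matching chunks of those sections.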
        <p>Below, we report an example of retrieved text suggestions pertinent to the concept public
service, located in the background information section of the ingested documents.</p>
        <sec id="sec-6-3-1">
          <title>Background Information</title>
          <p>XXXX sued YYYY and ZZZZ, claiming: that it carries out funeral activities; that, by virtue of
Article 5, paragraph 2, R.L. no. 19 of 2004, "in the event that the manager of public cemetery
and necropsy services also carries out the funeral activity referred to in Article 13 of this law,
corporate separation is mandatory..."; that, in implementation of this legislation, on March
27, 2008 the company YYYY had been established, a company that performs on behalf of the
Municipality of Rimini the cemetery and necropsy activity as provided for by Art. 1 point 3
letter c of R.L. n. 19 of 2004 i.e. funeral transport for indigent people, funeral collection and
transport on call of the Judicial Authority or for hygienic-sanitary needs, observation depot,
morgue, mortuary sanitary service, necroscopic medicine activities; by notarial deed dated
30/09/2009, YYYY conferred the commercial activity concerning funeral services to ZZZZ. (...)
The company XXXX reported: that it was dedicated to the provision of logistics and material
transport services for the ”MotoGP” world championship races and other motorcycle sports;
that it had entered into a contract with the organizing company YYYY, thus securing the
status of ”Express Supplier” and the ability to carry out all transports not only for YYYY, but
also for the various teams and suppliers participating in the MotoGP championship; that in
the structure of ZZZZ Mr. A. B. has always played a key role, following the conclusion of
a project collaboration contract, renewed until 12/31/2008, by which he had been given the
position of project manager organizing logistics for the MotoGP and other motorsport world
championships, including the contract acquisition stages, with qualifications general director
adviser and general director marketing and full decision-making and signing powers in the
name and on behalf of ZZZZ; (...)</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Related Work</title>
      <p>
        Work related to legal document building concerns legal document assembly, legal
information retrieval, and document segmentation. The increasing interest in AI applications
in the legal field has driven significant work on legal document assembly [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], typically relying
on a predefined document structure. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] the authors draw a clear distinction between
a document-oriented approach and an issue-oriented approach to document assembly. The
former consists of the automatic selection of document components and the instantiation of
user-provided values for certain variables. The latter makes use of an explicit representation of
legal rules, where the truth values of the predicates are provided by the judge in the justification.
Such rules have been explicitly defined over the years in different forms, such as decision trees [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
and interchange languages [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Different collections of legal documents, such as Serbian case
law [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the GDPR [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], have been modeled using these approaches.
      </p>
      <p>
        Legal information retrieval (LIR) is the discipline that aims at extracting information from a
corpus of legal documents, including case law decisions and legal codes. The digitization of
these documents produced a significant boost to the field of LIR, with many methodologies being
proposed for the problem: from boolean and rule-based approaches [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and the exploitation
of thesauri [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to ontologies [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Recent studies [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] have focused on the use of
large language models (LLMs) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which are capable of capturing the contextual representation
of the text rather than simply focusing on the occurrences of certain terms. There have also
been studies applying NLP techniques directly to Italian legal documents: ontology learning
systems [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], article prediction [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and fine-tuned LLMs [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        The segmentation of legal documents is a task that can be performed either manually or
automatically, with the respective advantages and disadvantages. Manual segmentation [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
relies on annotations performed by field experts on corpora that do not contain many documents,
due to the complexity of the task, although the results are more precise than those of any automated
system. Automatic approaches to text segmentation have been developed for general corpora in
order to detect a shift in the topic discussed by subsequent sentences. In [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] semantic relatedness
graphs, in which sentences are nodes and edges represent their relatedness, are exploited
to find maximal cliques. More recent studies approach the task employing global [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
and contextual [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] vector representations of sentences. Automatic approaches have also been
developed for the segmentation of legal documents, such as US Court Decisions [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], and
Terms-of-Service documents [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>7. Concluding Remarks</title>
      <p>
        In this paper, we presented a reference service architecture for knowledge-based legal document
building. After discussing components and techniques composing the knowledge-extraction
service and the suggestion-extraction service of the proposed architecture, some examples
of the experimentation of a first legal document builder prototype on a corpus case study in
the Italian legal domain are discussed. Ongoing work is devoted to the consolidation of the
functionalities of the suggestion-extraction service of the document builder, with development
of a graphical interface for the knowledge-based document building functionalities. Future
work will be devoted to studying automated document segmentation techniques. In particular,
we envision a clause-level classification model, which first maps each clause into a vector space
exploiting a pre-trained contextual embedding model (e.g. Sentence-BERT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) and then applies
a classifier to the clause embedding vectors to determine the segment label of each clause. We
plan to experiment with different classifiers, operating on single clauses, clause sequences,
or clause networks.
      </p>
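The envisioned clause-level pipeline can be sketched as follows. This is a hedged illustration, not the planned implementation: a toy bag-of-words vector stands in for the Sentence-BERT embeddings, and a simple nearest-centroid rule stands in for the classifiers under study; all names are our own.

```python
import math
from collections import Counter, defaultdict

def embed(clause):
    # Toy bag-of-words stand-in for a contextual embedding model such as
    # Sentence-BERT; illustrative only.
    return Counter(clause.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def fit_centroids(labeled_clauses):
    """Build one centroid per segment label from (label, clause) pairs."""
    centroids = defaultdict(Counter)
    for label, clause in labeled_clauses:
        centroids[label].update(embed(clause))
    return centroids

def classify(clause, centroids):
    """Assign the segment label whose centroid is most similar to the clause."""
    v = embed(clause)
    return max(centroids, key=lambda lab: cosine(v, centroids[lab]))
```

Classifiers operating on clause sequences or clause networks would replace the per-clause decision above with one that also considers neighbouring clauses.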
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This work is partially supported by i) the Next Generation UPP project within the PON
programme of the Italian Ministry of Justice, and ii) the project SERICS (PE00000014) under the
MUR NRRP funded by the EU - NextGenerationEU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Oswald</surname>
          </string-name>
          ,
          <article-title>Algorithm-assisted decision-making in the public sector: framing the issues using administrative law rules governing discretionary power</article-title>
          ,
          <source>Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences</source>
          <volume>376</volume>
          (
          <year>2018</year>
          )
          <fpage>20170359</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <article-title>A theory construction approach to legal document assembly</article-title>
          , in: Pre-Proceedings of the Third International Conference on Logic, Informatics, and Law, Citeseer,
          <year>1989</year>
          , pp.
          <fpage>485</fpage>
          -
          <lpage>498</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pinotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Santosuosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fazio</surname>
          </string-name>
          ,
          <article-title>A rule 74 for italian judges and lawyers</article-title>
          , in: Advances in Conceptual Modeling: ER 2022 Workshops, CMLS, EmpER, and JUSMOD, Hyderabad, India, October 17-20,
          <year>2022</year>
          , Proceedings, Springer,
          <year>2023</year>
          , pp.
          <fpage>112</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Governatori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rotolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tabet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Boley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paschke</surname>
          </string-name>
          ,
          <article-title>Legalruleml: Xml-based rules and norms</article-title>
          ,
          <source>RuleML America</source>
          <volume>7018</volume>
          (
          <year>2011</year>
          )
          <fpage>298</fpage>
          -
          <lpage>312</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kozima</surname>
          </string-name>
          ,
          <article-title>Text segmentation based on similarity between words</article-title>
          ,
          <source>CoRR cmp-lg/9601005</source>
          (
          <year>1996</year>
          ). URL: http://arxiv.org/abs/cmp-lg/9601005.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>CoRR abs/1908.10084</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bellandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ceravolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Damiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montanelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Picascia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polimeno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Riva</surname>
          </string-name>
          ,
          <article-title>Knowledge-Based Legal Document Retrieval: A Case Study on Italian Civil Court Decisions</article-title>
          ,
          <source>in: Proc. of the 1st Int. Knowledge Management for Law Workshop (KM4LAW)</source>
          , Bozen-Bolzano, Italy,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Branting</surname>
          </string-name>
          ,
          <article-title>An issue-oriented approach to judicial document assembly</article-title>
          ,
          <source>in: Proceedings of the 4th international conference on Artificial intelligence and law</source>
          ,
          <year>1993</year>
          , pp.
          <fpage>228</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence and document assembly</article-title>
          ,
          <source>Law Prac. Mgmt.</source>
          <volume>16</volume>
          (
          <year>1990</year>
          )
          <fpage>18</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Marković</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gostojić</surname>
          </string-name>
          ,
          <article-title>Knowledge-based legal document assembly</article-title>
          ,
          <source>arXiv preprint arXiv:2009.06611</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Robaldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bartolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lenzini</surname>
          </string-name>
          ,
          <article-title>The dapreco knowledge base: representing the gdpr in legalruleml</article-title>
          ,
          <source>in: Proceedings of The 12th Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5688</fpage>
          -
          <lpage>5697</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Mok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Mok</surname>
          </string-name>
          ,
          <article-title>Legal machine-learning analysis: First steps towards a.i. assisted legal research</article-title>
          ,
          <source>in: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law</source>
          , ICAIL '19, Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>266</fpage>
          -
          <lpage>267</lpage>
          . doi:10.1145/3322640.3326737.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Van Steenbergen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Uijttenbroek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Lodder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <article-title>Thesaurus-based retrieval of case law</article-title>
          .,
          <source>Frontiers in Artificial Intelligence and Applications</source>
          <volume>152</volume>
          (
          <year>2006</year>
          )
          <fpage>61</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Castano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Falduti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montanelli</surname>
          </string-name>
          ,
          <article-title>Crime knowledge extraction: An ontology-driven approach for detecting abstract terms in case law decisions</article-title>
          ,
          <source>in: Proc. of the 17th Int Conference on Artificial Intelligence and Law</source>
          ,
          <source>ICAIL '19</source>
          , ACM, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>179</fpage>
          -
          <lpage>183</lpage>
          . doi:10.1145/3322640.3326730.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          , L. Ma,
          <article-title>BERT_LF: A similar case retrieval method based on legal facts</article-title>
          ,
          <source>Wireless Communications and Mobile Computing</source>
          <year>2022</year>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . URL: https://doi.org/10.1155/2022/2511147. doi:10.1155/2022/2511147.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Picascia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Riva</surname>
          </string-name>
          ,
          <article-title>Context-Aware Knowledge Extraction from Legal Documents through Zero-Shot Classification</article-title>
          ,
          <source>in: Proc. of the 1st ER Int. Workshop on Digital Justice, Digital Law, and Conceptual Modeling (JUSMOD22)</source>
          , Hyderabad, India,
          <year>2022</year>
          , p.
          <fpage>81</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montemagni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          , G. Venturi,
          <article-title>Ontology learning from italian legal texts</article-title>
          , in: Law, Ontologies and the Semantic Web, IOS Press,
          <year>2009</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tagarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Simeri</surname>
          </string-name>
          ,
          <article-title>Unsupervised law article mining based on deep pre-trained language representation models with application to the italian civil code</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Licari</surname>
          </string-name>
          , G. Comandè,
          <article-title>Italian-legal-bert: A pre-trained transformer language model for italian law</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kalamkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Modi</surname>
          </string-name>
          ,
          <article-title>Corpus for automatic structuring of legal documents</article-title>
          ,
          <source>arXiv preprint arXiv:2201.13125</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nulty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lillis</surname>
          </string-name>
          ,
          <article-title>A decade of legal argumentation mining: Datasets and approaches</article-title>
          ,
          <source>in: Natural Language Processing and Information Systems: 27th International Conference on Applications of Natural Language to Information Systems, NLDB</source>
          <year>2022</year>
          , Valencia, Spain, June 15-17,
          <year>2022</year>
          , Proceedings, Springer,
          <year>2022</year>
          , pp.
          <fpage>240</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Glavaš</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          ,
          <article-title>Unsupervised text segmentation using semantic relatedness graphs</article-title>
          ,
          <source>in: Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics</source>
          , Association for Computational Linguistics,
          <year>2016</year>
          , pp.
          <fpage>125</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>O.</given-names>
            <surname>Koshorek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rotman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <article-title>Text segmentation as a supervised learning task</article-title>
          ,
          <source>arXiv preprint arXiv:1803.09337</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Solbiati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Heffernan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Damaskinos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poddar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cali</surname>
          </string-name>
          ,
          <article-title>Unsupervised topic segmentation of meetings with bert embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:2106.12978</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Savelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Ashley</surname>
          </string-name>
          ,
          <article-title>Segmenting U.S. court decisions into functional and issue specific parts</article-title>
          ,
          <source>in: JURIX</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>D.</given-names>
            <surname>Aumiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Almasian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lackner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gertz</surname>
          </string-name>
          ,
          <article-title>Structural text segmentation of legal documents</article-title>
          ,
          <source>in: Proceedings of the Eighteenth International Conference on Artificial</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>