<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>JusBuild: a RAG-based Architecture for Legal Document Building - Discussion Paper</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvana Castano</string-name>
          <email>silvana.castano@unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfio Ferrara</string-name>
          <email>alfio.ferrara@unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Montanelli</string-name>
          <email>stefano.montanelli@unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Picascia</string-name>
          <email>sergio.picascia@unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Riva</string-name>
          <email>davide.riva@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino, Department of Control and Automation Engineering</institution>
          ,
          <addr-line>Corso Duca degli Abruzzi, 24 - 10129 Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Milano, Department of Computer Science</institution>
          ,
          <addr-line>Via Celoria, 18 - 20133 Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>33</volume>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
<p>The creation of legal documents often requires the generation of text that adheres to a predefined schema, a process that can be significantly enhanced through the use of digital and automated tools. This paper presents JusBuild, a Retrieval-Augmented Generation (RAG)-based document builder architecture designed to assist legal practitioners in drafting new legal documents. JusBuild operates through a multi-layered framework: the Document Segmentation Layer partitions legal documents into functional sections based on a predefined schema; the Storage Layer collects semantically meaningful vector representations of these sections, generated by an embedding model; and the Retrieval and RAG Layers provide real-time suggestions to the practitioner during the drafting process. To evaluate its versatility, JusBuild was tested on two distinct datasets, varying in document template, language, and judicial matter, demonstrating its adaptability and applicability across diverse legal contexts. The results highlight JusBuild's potential to streamline legal document drafting while maintaining full user control over document production, leveraging its “human-in-the-loop” approach.</p>
      </abstract>
      <kwd-group>
<kwd>Retrieval-Augmented Generation</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Legal Information Retrieval</kwd>
        <kwd>Digital Justice</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>JusBuild adopts a human-in-the-loop design, ensuring that legal professionals maintain full control over the final output by selecting and refining
AI-generated suggestions.</p>
      <p>To validate JusBuild, we consider a dataset of first-instance civil law judgments from 12 courts of
Northern Italy. Furthermore, we test the adaptability of JusBuild across jurisdictions using a second
dataset with a different language, structure, and legal topic, i.e., a dataset of court decisions from the
US Securities and Exchange Commission. JusBuild is currently a prototype tool under continuous
development and refinement. An extended presentation of JusBuild as well as a discussion on possible
applications of the proposed architecture are described in [2].</p>
      <p>The paper is organized as follows: Related work is reviewed in Section 2. Sections 3 and 4 present the
JusBuild framework architecture and related layers. Section 5 evaluates its effectiveness across different
legal contexts, and Section 6 concludes with future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>Our study integrates Retrieval-Augmented Generation (RAG) with legal document building, two evolving
areas in legal technology.</p>
      <p>Retrieval-Augmented Generation RAG enhances text generation by incorporating retrieved
domain knowledge [3], compensating for LLMs’ knowledge limitations. A typical RAG pipeline indexes
domain data, retrieves relevant text chunks via semantic search, and augments user queries with
retrieved information to prompt the LLM and generate responses.</p>
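<p>The pipeline just described (index the domain data, retrieve relevant chunks via semantic search, then augment the query before prompting the LLM) can be illustrated with a minimal sketch. The bag-of-words retriever and all helper names below are our own toy stand-ins; a real deployment would use dense embeddings and an actual LLM call on the assembled prompt:</p>

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": a token-count dictionary.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank indexed chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    # Augment the user query with the retrieved domain knowledge.
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "trade secret misappropriation requires proof of reasonable secrecy measures",
    "unfair competition rulings often cite article 2598 of the civil code",
    "the court awards damages for breach of contract",
]
query = "what must a trade secret plaintiff prove?"
prompt = build_prompt(query, retrieve(query, corpus))
```

The assembled prompt would then be sent to the generative model in place of the bare query.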
      <p>RAG has been successfully applied across domains [4], including legal applications like legal question
answering and document drafting. CBR-RAG [5] explores embedding models for legal question
answering, while CaseGPT [6] employs case-based reasoning in legal and medical contexts. Addressing legal
language complexity, ChatLaw [7] leverages a multi-agent model for professional legal consultations,
and [8] focuses on generating accessible legal explanations. HyPA-RAG [9] is a hybrid system combining
sparse (BM25) and dense (vector) methods with a knowledge graph retriever [10] for legal and policy
applications.</p>
      <p>For legal document drafting, LexDrafter [11] generates Definition articles for EUR-Lex, while
CLERC [12] provides a dataset for retrieving and generating legal citations.</p>
      <p>
        Legal Document Building Legal document building (or assembly) involves structured text
generation using Knowledge Bases [13], LLMs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], or a combination of the two.
      </p>
      <p>Text Segmentation (TS) is a key component, partitioning documents into coherent sections based
on semantic, structural, or functional criteria. Methods range from linear (that predict segments in a
sequence) [14] to hierarchical (that recursively split the input document) [15] and from region-oriented
(that predict segment boundaries) [16] to class-oriented approaches (that predict segment classes given
the linguistic unit, e.g. sentences) [17]. Functional TS, crucial for isolating argumentative sections,
has been successfully applied using Conditional Random Fields (CRFs) [17, 18], statistical models that
combine good performance with exceptionally low complexity.</p>
      <p>Legal Information Retrieval ensures relevance in automated document building [19]. In this area,
approaches evolved from rule-based systems [20] and ontologies [21] to LLM-driven approaches [22],
enhancing tasks like named entity recognition [23] and case law retrieval [24].</p>
    </sec>
    <sec id="sec-3">
      <title>3. The JusBuild architecture</title>
      <p>Legal document building is a complex process requiring careful structuring, research, and refinement.
Traditionally, it involves three key phases: (i) defining the document format, (ii) drafting content for
each section, and (iii) editing. While human expertise remains essential, AI-driven tools can significantly
enhance efficiency by providing direct support with template-based structuring, semantic document
retrieval, and automated text generation.</p>
      <p>Judicial decisions, particularly those requiring detailed reasoning, exemplify the intricacies of legal
drafting. The process is influenced by interpretative frameworks, legal precedents, and subjective factors,
including a judge’s perspective, legal realism, and social contexts. A legal document builder must,
therefore, maintain human oversight while leveraging AI to streamline drafting. In this perspective,
integrating AI into document generation fosters structure and readability of legal texts, improving both
efficiency and accessibility.</p>
      <p>JusBuild is designed as a document builder architecture to assist judges in drafting court orders and
decisions while preserving their autonomy. A key design principle of JusBuild is the human-in-the-loop
approach, ensuring that the judge retains full control over the drafting process. Users can either select
suggested textual content or refine it based on legal and factual considerations. The JusBuild architecture
is illustrated in Figure 1.</p>
<p>[Figure 1: The JusBuild architecture. Legal Documents feed the Document Segmentation Layer, the Storage Layer, the Retrieval Layer, and the RAG Layer, which serve the Document Builder Environment (Suggestions and Editing Area) used by Legal Practitioners.]</p>
      <p>JusBuild employs AI-driven techniques to facilitate the document assembly process. It relies on a
predefined legal document template and a structured corpus of legal documents, which are systematically
segmented into sections that are, in turn, indexed in a database. These data serve as the foundation
for retrieving and generating section-specific suggestions to support the drafting process.</p>
      <p>The Document Builder Environment consists of two primary components: an editing area, where users
draft sections of a legal document, and a suggestion area, which provides relevant text recommendations
that can be reused or modified. The drafting process follows a structured template, composed of sections
t_1, . . . , t_M. Here, we define d* as the active document currently being edited and t* as the active section
under revision. Each section t_i (i = 1, . . . , M) is assigned a section label l_i from a predefined set of
labels L.</p>
<p>JusBuild is characterized by the following key functionalities:
• A Document Segmentation Layer, based on the model proposed in [18], which automatically
partitions legal documents into functional sections aligned with the predefined template using
supervised classification techniques.
• A Retrieval-Augmented Generation (RAG) Layer, which expands the set of textual
suggestions retrieved from the legal document corpus by incorporating AI-generated content. This
hybrid approach enriches the drafting process by blending relevant judicial precedents with
AI-generated text, offering judges a broader range of suggestions. The RAG pipeline ensures that
generated content remains contextually relevant, aligns with past legal decisions, and maintains
consistency with prior sections of the document.</p>
<p>To assist in drafting the active section, JusBuild provides two types of suggestions:
• The Retrieved Sections, denoted as t̂_1, . . . , t̂_k, are obtained by retrieving sections
labeled l_M from past documents in which the preceding sections (t_1, . . . , t_{M−1}) closely resemble
t*_1, . . . , t*_{M−1}.
• The Generated Sections, represented as t̃_1, . . . , t̃_g, are produced by a generative large
language model (LLM). These are created using an analogy-based prompt that leverages the Retrieved
Sections to generate new, contextually relevant content.</p>
<p>Suggestions are extracted from a corpus D of legal documents (e.g., court decisions), each denoted as
d. Individual sections within these documents are referred to as t, and sentences within sections are
labeled as s. Embedding vectors for sections—whether from the active document or from D—as well as
for sentences, are represented using boldface notation.</p>
<p>At initialization, the corpus D undergoes processing by the Document Segmentation Layer, which
applies text segmentation based on sentence embeddings s ∈ R^e. The segmented sections, along with
their corresponding Section Embeddings z ∈ R^d, computed using a dedicated embedding model, are
then stored in the Storage Layer. This layer leverages a Vector Database to facilitate efficient retrieval
within the Document Builder Environment.</p>
<p>During query execution, the Storage Layer interacts with both the Document Retrieval Layer
and the RAG Layer, which respectively return the Retrieved Sections t̂_1, . . . , t̂_k and the Generated
Sections t̃_1, . . . , t̃_g, providing the user with a comprehensive set of suggestions for drafting the active
section.</p>
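<p>The end-to-end flow described above (segment a corpus into labeled sections, store them, then query by section label) can be illustrated with a minimal, self-contained sketch. The class and helper names below are our own placeholders, not JusBuild's actual interfaces, and the in-memory list stands in for the vector database:</p>

```python
def segment(document):
    # Placeholder for the Document Segmentation Layer: here each line
    # is already prefixed with its functional label ("LABEL: text").
    sections = []
    for line in document.strip().splitlines():
        label, text = line.split(":", 1)
        sections.append((label.strip(), text.strip()))
    return sections

class StorageLayer:
    # Placeholder Storage Layer: an in-memory list standing in for a
    # vector database of (doc_id, position, label, text) records.
    def __init__(self):
        self.records = []

    def add(self, doc_id, sections):
        for pos, (label, text) in enumerate(sections):
            self.records.append(
                {"doc": doc_id, "pos": pos, "label": label, "text": text}
            )

    def by_label(self, label):
        # Label-filtered lookup used to build suggestions.
        return [r for r in self.records if r["label"] == label]

store = StorageLayer()
store.add("d1", segment("INTRO: the parties appeared\nDECISION: the court rules for the plaintiff"))
store.add("d2", segment("INTRO: facts of the case\nDECISION: the claim is dismissed"))
suggestions = [r["text"] for r in store.by_label("DECISION")]
```

In the real architecture the stored records also carry section embeddings, so that `by_label` becomes a similarity search rather than an exact filter.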
    </sec>
    <sec id="sec-4">
      <title>4. The JusBuild layers</title>
      <p>The JusBuild architecture is composed of four main layers, namely the Document Segmentation Layer,
the Storage Layer, the Document Retrieval Layer, and the RAG Layer described in the following.</p>
      <sec id="sec-4-1">
        <title>4.1. Document Segmentation Layer</title>
        <p>The Document Segmentation Layer partitions legal documents into functionally coherent sections,
recognizing that judicial texts serve distinct roles such as Introduction, Argumentation, and
Conclusions. Many older documents lack standardized formatting, making automated segmentation
essential.</p>
        <p>We frame functional Text Segmentation as a supervised classification task at the sentence level,
assuming each sentence serves a single function. However, this approach presents challenges, including
data scarcity (due to the need for expert annotation), label imbalance (as sections vary in length),
legal-specific jargon, and potential discontinuities in section structure.</p>
<p>[Figure 2: The segmentation model of [18]. A sentence sequence S1, . . . , Sn is split and passed through data preprocessing, a Vectorizer, a Pair Classifier (predicting same-label vs. different-label pairs Si, Sj), graph rewiring, and a CRF-based node classifier, producing a label sequence l1, . . . , ln.]</p>
<p>To address these issues, Ferrara et al. [18] proposed a Conditional Random Field (CRF)-based model
(Figure 2) that enhances traditional segmentation by incorporating an auxiliary Pair Classifier. This
classifier estimates the likelihood p that two sentences (s_i, s_j) belong to the same section, modifying
the CRF’s graph structure: adding links for p ≥ τ+ and pruning for p &lt; τ− (with thresholds τ− &lt; τ+).
The final classification is obtained using a CRF model:</p>
<p>P = σ(ΨPΛ + SM)
(1)
where the i-th row P_i ∈ Σ_L (Σ_L being the simplex of dimension L) is the vector of probabilities of each section
label for sentence s_i, Ψ ∈ [0; 1]^{n×n} denotes the transition matrix of the rewired graph, S ∈ R^{n×e}
is the feature matrix of all n sentence nodes in the graph, i.e. the matrix of sentence embeddings, and
Λ ∈ R^{L×L} and M ∈ R^{e×L} are weight matrices to be learnt.</p>
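<p>Reading σ as a row-wise softmax (our assumption) with the rewired transition matrix Ψ, the weight matrices Λ and M, and the sentence embeddings S, one forward computation of Eq. (1) can be sketched as follows; the dimensions n, e, L and all values are illustrative:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, e, L = 5, 8, 3                        # sentences, embedding dim, labels

S = rng.normal(size=(n, e))              # sentence embedding matrix
Psi = np.eye(n, k=1) + np.eye(n, k=-1)   # rewired transition matrix (plain chain here)
Lam = rng.normal(size=(L, L))            # weight matrix Λ (would be learnt)
M = rng.normal(size=(e, L))              # weight matrix M (would be learnt)

def softmax(x):
    # Row-wise softmax, so each row lies on the simplex Σ_L.
    ex = np.exp(x - x.max(axis=1, keepdims=True))
    return ex / ex.sum(axis=1, keepdims=True)

# Eq. (1) is recursive in P; iterate it from a uniform initialization
# (positional knowledge injection could replace this initialization).
P = softmax(np.zeros((n, L)))
for _ in range(10):
    P = softmax(Psi @ P @ Lam + S @ M)

labels = P.argmax(axis=1)                # predicted section label per sentence
```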
<p>The model improves segmentation accuracy through:
• Pairwise classification, leveraging sentence pairs (n(n − 1)/2 per document) to combat data scarcity
and enable few-shot segmentation;
• Graph rewiring, handling section discontinuities common in judicial texts;
• Domain-specific embedding and/or Feature engineering, ensuring sound legal text
representation;
• Loss weighting, emphasizing rare section labels to mitigate class imbalance;
• Positional knowledge injection, initializing P to enforce known section order (e.g.,
Introduction first, Conclusions last).</p>
        <p>Labeled sentences are finally grouped to reconstruct the full sections.</p>
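<p>The final grouping step can be sketched as follows, merging runs of consecutive sentences that share a predicted label back into full sections (labels and sentences below are invented for illustration):</p>

```python
from itertools import groupby

def group_sections(sentences, labels):
    # Merge consecutive sentences sharing a label into (label, text) sections.
    pairs = list(zip(labels, sentences))
    sections = []
    for label, run in groupby(pairs, key=lambda p: p[0]):
        text = " ".join(sentence for _, sentence in run)
        sections.append((label, text))
    return sections

secs = group_sections(
    ["The parties appeared.", "Facts follow.", "The court rules."],
    ["INTRO", "INTRO", "DECISION"],
)
```

Note that `groupby` only merges adjacent runs, so a label that reappears later (a discontinuous section) yields a separate segment, which downstream layers may or may not re-merge.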
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Storage Layer</title>
<p>The Storage Layer manages Document Sections and their embeddings, supporting CRUD operations.
For each section t, it stores its document ID, position, text, section label l, and embedding vector
z ∈ R^d.</p>
<p>To facilitate section-level retrieval, we employ a specialized embedding model designed for
long-context processing using LSG (Local, Sparse, Global) attention [25]. This enables capturing section
semantics over tens of thousands of tokens, reducing them to a single vector of dimension d ≈ 10^3.</p>
<p>Efficient querying is enabled by a Vector Database, which supports fast similarity searches based on
a predefined distance metric δ. This is critical for both the Document Retrieval and RAG Layers, ensuring
relevant sections from past documents are retrieved for reference.</p>
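<p>The label-filtered similarity search performed against the Storage Layer can be sketched with an in-memory stand-in for the vector database (records, vectors, and names below are illustrative; a real deployment would delegate this to the vector DBMS):</p>

```python
import numpy as np

# Stand-in for stored section records: label, embedding z, and text.
records = [
    {"label": "DECISION", "z": np.array([1.0, 0.0]), "text": "claim dismissed"},
    {"label": "DECISION", "z": np.array([0.0, 1.0]), "text": "plaintiff prevails"},
    {"label": "INTRO",    "z": np.array([1.0, 0.0]), "text": "the parties appeared"},
]

def search(x, label, k=1):
    # Euclidean nearest neighbours among sections carrying the given label.
    candidates = [r for r in records if r["label"] == label]
    candidates.sort(key=lambda r: float(np.linalg.norm(x - r["z"])))
    return candidates[:k]

hits = search(np.array([0.9, 0.1]), "DECISION", k=1)
```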
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Document Retrieval Layer</title>
<p>[Figure 3: The Document Retrieval Layer. A vector search filtered by section label returns document IDs; the target section M of each document is then fetched, sorted by embedding distance, and its text is returned as suggestions.]</p>
<p>The Document Retrieval Layer (Figure 3) identifies stored documents d_1, . . . , d_k whose sections
t_1, . . . , t_{M−1} are similar to the active document’s written sections t*_1, . . . , t*_{M−1}. Each section is
independently searched in parallel, as section order may vary.</p>
<p>The process involves embedding t*_1, . . . , t*_{M−1} into vectors x_1, . . . , x_{M−1} ∈ R^d and querying the
Storage Layer. For each section label l_i, the database returns the k closest stored sections:</p>
<p>t̂_1^{l_i}, . . . , t̂_k^{l_i} = arg min over stored sections t with label l_i of δ(x_i, z_t). (2)</p>
<p>To rank entire documents, we aggregate section distances. Given the distance δ_{i,d} between x_i and the
section labeled l_i in document d, we sum distances for each document, penalizing missing sections with the k-th closest distance:</p>
<p>δ(d, {t*_1, . . . , t*_{M−1}}) = Σ_{i=1}^{M−1} δ̂_i, with δ̂_i = δ_{i,d} if a section labeled l_i appears in d, and δ̂_i equal to the k-th closest distance for label l_i otherwise. (3)</p>
<p>This approach efficiently approximates true section distances while leveraging the vector database’s
computational power. The top k closest documents are selected, and their M sections (e.g., Conclusions)
are extracted as t̂_1, . . . , t̂_k for downstream processing in the Document Builder Environment and
RAG Layer.</p>
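<p>The document-ranking rule described above (summing per-section distances and penalizing a missing section with the k-th closest distance for its label) can be sketched as follows; the distances and per-label penalties are invented for illustration:</p>

```python
def doc_distance(section_dists, penalty):
    # section_dists maps label to this document's section distance,
    # or None when the document lacks a section with that label.
    total = 0.0
    for label, dist in section_dists.items():
        total += dist if dist is not None else penalty[label]
    return total

docs = {
    "d1": {"INTRO": 0.2, "FACTS": 0.3},
    "d2": {"INTRO": 0.1, "FACTS": None},   # FACTS section missing
}
penalty = {"INTRO": 0.9, "FACTS": 0.8}     # k-th closest distance per label
ranking = sorted(docs, key=lambda d: doc_distance(docs[d], penalty))
```

Here d1 ranks first (0.5) despite d2's closer INTRO, because d2 pays the penalty for its missing FACTS section.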
      </sec>
      <sec id="sec-4-4">
        <title>4.4. RAG Layer</title>
<p>The RAG Layer expands the set of suggested sections for the user by leveraging a generative LLM to
produce new section candidates t̃_1, . . . , t̃_g. To ensure coherence with the draft document d*—i.e., its
existing sections t*_1, . . . , t*_{M−1}—and align with patterns observed in similar cases, we employ a tailored
prompting strategy.</p>
<p>At the core of this approach is one-shot learning, where the most similar retrieved document d_1
(excluding its final section t̂_1) is presented as an example. This balances context relevance and prompt
efficiency, avoiding the computational overhead of multi-shot prompts. The LLM receives the following
structured prompt:</p>
<p>Given this document: d_1. Given its next section: t̂_1. Now, generate the next section for this
new document: t*_1, . . . , t*_{M−1}.</p>
<p>Repeating this process g times yields multiple Generated Sections, t̃_1, . . . , t̃_g. These are combined
with the Retrieved Sections and presented to the user in the Document Builder Environment, offering
flexible drafting options.</p>
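<p>The one-shot analogy prompt described above can be assembled as follows; the exact wording and the function name are ours, for illustration only:</p>

```python
def build_rag_prompt(example_doc, example_next_section, drafted_sections):
    # One-shot prompt: show a similar past document and its next section,
    # then ask the model to continue the active draft analogously.
    draft = "\n".join(drafted_sections)
    return (
        f"Given this document: {example_doc}\n"
        f"Given its next section: {example_next_section}\n"
        f"Now, generate the next section for this new document:\n{draft}"
    )

prompt = build_rag_prompt(
    "INTRO ... FACTS ...",
    "DECISION: the claim is dismissed",
    ["INTRO: the parties appeared", "FACTS: the defendant copied designs"],
)
```

Calling the generative model g times on such a prompt (with sampling enabled) yields the g candidate sections presented alongside the retrieved ones.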
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental results</title>
      <p>To ensure robustness across diverse legal applications, we validate our architecture on two distinct
datasets, differing in language, document structure, and legal domain. The evaluation focuses on
generating the final court decision section, assessing three key components: Document Segmentation,
Document Retrieval, and RAG.</p>
      <p>For segmentation, we benchmark our approach against a linear-chain CRF, a common baseline for
text segmentation. In retrieval, we measure the similarity between retrieved and actual conclusions
using ROUGE-L (favouring recall) and BLEURT (favouring precision), alongside the Mean Reciprocal
Rank (MRR) to evaluate ranking accuracy. For the RAG Layer, we conduct three comparisons using
these similarity metrics: (i) different LLMs, (ii) retrieved vs. generated conclusions, and (iii) conclusions
generated with vs. without the RAG pipeline.</p>
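<p>For reference, the Mean Reciprocal Rank used in the retrieval evaluation can be computed with a short, self-contained function (the ranked lists below are toy examples):</p>

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    # MRR: average over queries of 1/rank of the first relevant item
    # (0 when the relevant item never appears in the ranking).
    total = 0.0
    for ranks, rel in zip(ranked_lists, relevant):
        rr = 0.0
        for pos, item in enumerate(ranks, start=1):
            if item == rel:
                rr = 1.0 / pos
                break
        total += rr
    return total / len(ranked_lists)

# Two queries: relevant item first in one ranking, second in the other.
mrr = mean_reciprocal_rank([["a", "b", "c"], ["b", "a", "c"]], ["a", "a"])
```

In our setting the "relevant" item per query is the conclusion judged closest under ROUGE-L or BLEURT, giving the two MRR variants reported below.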
      <sec id="sec-5-1">
        <title>5.1. Data and Setup</title>
<p>The first dataset [17] consists of 173 U.S. trade secret court decisions annotated with functional segments,
while the second [26] includes 38 Italian unfair competition rulings labeled by legal experts. Despite
their differences, both datasets follow a structured format, with text segmentation based on predefined
section labels. The evaluation considers entire sections for retrieval and generation, focusing on the
concluding segment.</p>
<p>Experiments are run on a Zorin 16.3 machine (32GB RAM, 16-core CPU). Some architectural features
remain metadata-independent (e.g., segmentation hyperparameters, retrieval count), while others
depend on document metadata (e.g., language-specific embeddings and prompts). The following is a
full list of the hyperparameter configurations:
• Segmentation: Uses a feedforward neural network (256 hidden units, ReLU activation), trained
with AdamW and cyclic learning rates [10^−3, 10^−2]. We test τ+ ∈ {0.65, 0.80, 0.95} and τ− = τ+/2.
Sentence embedding employs Legal-BERT for English and Italian-Legal-BERT for Italian texts.
• Storage and Retrieval: Chroma DBMS, ℓ2 distance metric, and HNSW indexing. LSG-BART
provides section embeddings. We retrieve k = 10 sections, targeting “Conclusions” (English) or
“Decision” (Italian).
• RAG: We generate g = 1 section per request, comparing GPT-4o-Mini and Mistral-7B [27] to
evaluate performance.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Results</title>
<p>We evaluate the performance of segmentation, retrieval, and generation separately.
Document Segmentation. Our approach is benchmarked against a linear-chain CRF, with F1 scores
reported in Table 1. While our model surpasses the baseline on Italian data, it underperforms on US
cases, possibly due to class imbalance (e.g., frequent “Analysis” sections) and limitations in sentence
vectorization. Notably, our model remains consistent across different label sets, suggesting robustness
despite data scarcity.</p>
<p>[Table 1: F1 scores of the baseline (linear-chain CRF) and our model (τ+ = 0.65, 0.80, 0.95) on the Italian, US (5 sections), and US (7 sections) label sets; the numeric values are not recoverable from this copy.]</p>
<p>Document Retrieval. Retrieval is assessed via Mean Reciprocal Rank (MRR) based on ROUGE-L
and BLEURT scores. For Italian cases, the top-ranked retrieved conclusion averages between the first
and second position (MRR: 0.556 with ROUGE-L, 0.567 with BLEURT). For US cases, retrieval is
more challenging due to substantial variation in “Conclusions” sections, reflected in lower scores
(MRR: 0.357 with ROUGE-L, 0.517 with BLEURT). The reliance of ROUGE-L on exact word
sequences contributes to this drop.</p>
<p>RAG Performance. We compare two LLMs (Mistral-7B and GPT-4o-Mini) in generating conclusions,
evaluating retrieval vs. generation and the impact of prompt augmentation. Figure 4 shows that
GPT-4o-Mini consistently outperforms Mistral-7B, though the latter benefits from being open-source and
locally executable. Generated conclusions often match or surpass retrieved ones in quality, suggesting
their viability for legal drafting. Prompt augmentation aids generation in Italian cases, especially for
GPT-4o-Mini, likely due to its larger context length. However, results for US cases remain ambiguous,
with increased variance in pure generation scenarios.</p>
<p>In analysing the generated text, we find that RAG-produced conclusions often encapsulate key
legal determinations, aligning with the true decisions despite differences in specificity. While human
refinement remains necessary, the system effectively supports legal drafting by providing structured,
relevant content.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Concluding remarks</title>
      <p>We present JusBuild, an NLP-powered architecture designed to assist legal practitioners in drafting
case law decisions. JusBuild integrates: i) a CRF-based segmentation model to structure legal texts
into functional sections, ii) a vector database for efficient semantic search, and iii) a hybrid retrieval
and generation system suggesting precedent-based and LLM-generated content, always maintaining a
human-in-the-loop approach.</p>
      <p>Future work includes fine-tuning LLMs with legal-specific data, leveraging user feedback for model
improvement, and exploring tailored prompting techniques to enhance document generation.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is supported in part by PNRR-NGEU program under MUR 118/2023, and in part by project
SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU. Views and opinions
expressed are those of the authors only, and do not necessarily reflect those of the European Union or
the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
<p>During the preparation of this work, the author(s) used ChatGPT and Claude in order to: check grammar and
spelling, and paraphrase and reword. After using these tools, the author(s) reviewed and edited
the content as needed and take(s) full responsibility for the publication’s content.</p>
</sec>
<sec id="sec-9">
<title>References</title>
<p>[2] M. Buffa, A. Ferrara, S. Picascia, D. Riva, S. Castano, Enhancing legal document building with
retrieval-augmented generation, Submitted to: Computer Law and Security Review (2025).</p>
<p>
[3] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. tau Yih,
T. Rocktäschel, S. Riedel, D. Kiela, Retrieval-augmented generation for knowledge-intensive nlp
tasks, 2021. URL: https://arxiv.org/abs/2005.11401. arXiv:2005.11401.
[4] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, M. Wang, H. Wang, Retrieval-augmented
generation for large language models: A survey, 2024. URL: https://arxiv.org/abs/2312.10997.
arXiv:2312.10997.
[5] N. Wiratunga, R. Abeyratne, L. Jayawardena, K. Martin, S. Massie, I. Nkisi-Orji, R. Weerasinghe,
A. Liret, B. Fleisch, Cbr-rag: Case-based reasoning for retrieval augmented generation in llms
for legal question answering, in: J. A. Recio-Garcia, M. G. Orozco-del Castillo, D. Bridge (Eds.),
Case-Based Reasoning Research and Development, Springer Nature Switzerland, Cham, 2024, pp.
445–460.
[6] R. Yang, Casegpt: a case reasoning framework based on language models and retrieval-augmented
generation, 2024. URL: https://arxiv.org/abs/2407.07913. arXiv:2407.07913.
[7] J. Cui, M. Ning, Z. Li, B. Chen, Y. Yan, H. Li, B. Ling, Y. Tian, L. Yuan, Chatlaw: A multi-agent
collaborative legal assistant with knowledge graph enhanced mixture-of-experts large language
model, 2024. URL: https://arxiv.org/abs/2306.16092. arXiv:2306.16092.
[8] A. Louis, G. van Dijck, G. Spanakis, Interpretable long-form legal question answering with
retrieval-augmented large language models, Proceedings of the AAAI Conference on Artificial
Intelligence 38 (2024) 22266–22275. URL: https://ojs.aaai.org/index.php/AAAI/article/view/30232.
doi:10.1609/aaai.v38i20.30232.
[9] R. Kalra, Z. Wu, A. Gulley, A. Hilliard, X. Guan, A. Koshiyama, P. C. Treleaven, HyPA-RAG: A hybrid
parameter adaptive retrieval-augmented generation system for AI legal and policy applications,
in: S. Kumar, V. Balachandran, C. Y. Park, W. Shi, S. A. Hayati, Y. Tsvetkov, N. Smith, H. Hajishirzi,
D. Kang, D. Jurgens (Eds.), Proceedings of the 1st Workshop on Customizable NLP: Progress and
Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U),
Association for Computational Linguistics, Miami, Florida, USA, 2024, pp. 237–256. URL: https:
//aclanthology.org/2024.customnlp4u-1.18/. doi:10.18653/v1/2024.customnlp4u-1.18.
[10] D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, D. Metropolitansky, R. O. Ness,
J. Larson, From local to global: A graph rag approach to query-focused summarization, 2025. URL:
https://arxiv.org/abs/2404.16130. arXiv:2404.16130.
[11] A. Chouhan, M. Gertz, LexDrafter: Terminology drafting for legislative documents using retrieval
augmented generation, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.),
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language
Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 10448–
10458. URL: https://aclanthology.org/2024.lrec-main.913.
[12] A. B. Hou, O. Weller, G. Qin, E. Yang, D. Lawrie, N. Holzenberger, A. Blair-Stanek, B. V. Durme,
Clerc: A dataset for legal case retrieval and retrieval-augmented analysis generation, 2024. URL:
https://arxiv.org/abs/2406.17186. arXiv:2406.17186.
[13] M. Marković, S. Gostojić, Legal document assembly system for introducing law students with
legal drafting, Artificial Intelligence and Law 31 (2022) 829–863. URL: https://doi.org/10.1007/
s10506-022-09339-2. doi:10.1007/s10506-022-09339-2.
[14] O. Koshorek, A. Cohen, N. Mor, M. Rotman, J. Berant, Text segmentation as a supervised learning
task, in: M. Walker, H. Ji, A. Stent (Eds.), Proceedings of the 2018 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume
2 (Short Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp.
469–473. URL: https://aclanthology.org/N18-2075. doi:10.18653/v1/N18-2075.
[15] G. Glavaš, A. Ganesh, S. Somasundaran, Training and domain adaptation for supervised text
segmentation, in: J. Burstein, A. Horbach, E. Kochmar, R. Laarmann-Quante, C. Leacock, N. Madnani,
I. Pilán, H. Yannakoudakis, T. Zesch (Eds.), Proceedings of the 16th Workshop on Innovative Use
of NLP for Building Educational Applications, Association for Computational Linguistics, Online,
2021, pp. 110–116. URL: https://aclanthology.org/2021.bea-1.11.
[16] A. Solbiati, K. Heffernan, G. Damaskinos, S. Poddar, S. Modi, J. Cali, Unsupervised topic
segmentation of meetings with bert embeddings, 2021. URL: https://arxiv.org/abs/2106.12978.
arXiv:2106.12978.
[17] J. Savelka, K. D. Ashley, Segmenting US court decisions into functional and issue specific
parts, in: JURIX, 2018, pp. 111–120.
[18] A. Ferrara, S. Picascia, D. Riva, Few-shot legal text segmentation via rewiring conditional random
fields: A preliminary study, in: International Conference on Conceptual Modeling, Springer, 2023,
pp. 141–150.
[19] C. Sansone, G. Sperlí, Legal information retrieval systems: State-of-the-art and open issues,
Information Systems 106 (2022) 101967. URL: https://www.sciencedirect.com/science/article/pii/
S0306437921001551. doi:https://doi.org/10.1016/j.is.2021.101967.
[20] W. Y. Mok, J. R. Mok, Legal machine-learning analysis: First steps towards a.i. assisted legal
research, in: Proceedings of the Seventeenth International Conference on Artificial Intelligence
and Law, ICAIL ’19, Association for Computing Machinery, New York, NY, USA, 2019, p. 266–267.
doi:10.1145/3322640.3326737.
[21] S. Castano, A. Ferrara, M. Falduti, S. Montanelli, Crime knowledge extraction: An
ontology-driven approach for detecting abstract terms in case law decisions, in: Proc. of the 17th Int
Conference on Artificial Intelligence and Law, ICAIL ’19, ACM, New York, NY, USA, 2019, p.
179–183. doi:10.1145/3322640.3326730.
[22] A. Ferrara, S. Picascia, D. Riva, Context-Aware Knowledge Extraction from Legal Documents
through Zero-Shot Classification, in: Proc. of the 1st ER Int. Workshop on Digital Justice, Digital
Law, and Conceptual Modeling (JUSMOD22), Hyderabad, India, 2022, p. 81 – 90.
[23] A. Elnaggar, R. Otto, F. Matthes, Deep learning for named-entity linking with transfer learning
for legal documents, in: Proceedings of the 2018 Artificial Intelligence and Cloud Computing
Conference, AICCC ’18, Association for Computing Machinery, New York, NY, USA, 2018, p. 23–28.</p>
      <p>URL: https://doi.org/10.1145/3299819.3299846. doi:10.1145/3299819.3299846.
[24] Y. Shao, J. Mao, Y. Liu, W. Ma, K. Satoh, M. Zhang, S. Ma, Bert-pli: Modeling paragraph-level
interactions for legal case retrieval, in: C. Bessiere (Ed.), Proceedings of the Twenty-Ninth
International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences
on Artificial Intelligence Organization, 2020, pp. 3501–3507. URL: https://doi.org/10.24963/ijcai.
2020/484. doi:10.24963/ijcai.2020/484, main track.
[25] C. Condevaux, S. Harispe, Lsg attention: Extrapolation of pretrained transformers to long
sequences, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2023,
pp. 443–454.
[26] E. Zanoli, M. Barbini, D. Riva, S. Picascia, E. Furiosi, S. D’Ancona, C. Chesi, et al.,
Annotators-in-the-loop: testing a novel annotation procedure on Italian case law, in: Proceedings of the 17th
Linguistic Annotation Workshop (LAW-XVII), Association for Computational Linguistics, 2023,
pp. 118–128.
[27] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. d. l. Casas, F. Bressand,
G. Lengyel, G. Lample, L. Saulnier, et al., Mistral 7b, arXiv preprint arXiv:2310.06825 (2023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Castano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montanelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Picascia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Riva</surname>
          </string-name>
          ,
<article-title>A knowledge-based service architecture for legal document building</article-title>
          ,
          <source>in: 2nd International Workshop on Knowledge Management and Process Mining for Law</source>
          , volume
          <volume>3637</volume>
          ,
Sherbrooke, Quebec, Canada,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>