<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Legal Document Query Language: Conceptualizing Linguistic Commands for AI Assistants in Civil Appeal Proceedings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ruben Agazzi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Batini</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Palmonari</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Monica Vitali</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>In Italy's civil appeal proceedings, judges rely on legal repositories, case management databases, and case filings for decision-making. Generative AI offers promising support for these cognitive tasks but struggles to account for their specificity, limiting effectiveness. In this paper, we present early results of an effort to improve solutions based on generative AI to support second-instance civil proceedings by proposing a Legal Document Query Language (LDQL) for Civil Appeal Proceedings. Inspired by structured query languages, LDQL specifies recurring operations and primitives for such operations. It can be viewed as a conceptual layer to guide the selection of prompts optimized for specific cognitive tasks, the completion of these prompts with textual elements specific to the user request, and the specification of constraints on the features that responses should satisfy. As a first contribution, the language helps better clarify the variety of the underlying operations. We discuss a use case where LDQL is employed to interact through specialized prompts with an LLM or an LLM-based system, and report on the quality (e.g., accuracy, efficiency, usefulness) of the responses in relation to the overall task. Preliminary results suggest that a language like LDQL can support a better orchestration of LLM-based linguistic services, thus making it worth proceeding with its implementation using a multi-agent architecture.</p>
      </abstract>
      <kwd-group>
<kwd>Legal documents</kwd>
        <kwd>AI assistants</kwd>
        <kwd>Query languages</kwd>
        <kwd>Document management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>In Italy’s second-instance civil proceedings, also referred to here as civil appeal proceedings, judges
consult legal document repositories, case management databases, and key case filings to track
procedural events, access precedents, and analyze case materials in order to reach and motivate their decisions.</p>
<p>Solutions based on generative AI, in short, AI assistants, are promising means to support
the cognitive tasks underpinning this consultation and decision-making process, which are
left to judges. However, current solutions based on long-document processing or even RAG
architectures do not explicitly consider the variety of the cognitive tasks performed in the process
and hardly capture the specificity of these tasks, which constrain specific linguistic operations
on certain documents and data. In this paper, we present early results on an effort to evaluate
and improve AI assistants to support civil appeal proceedings by proposing a Legal Document
Query Language (LDQL) for Civil Appeal Proceedings.</p>
<p>Inspired by structured query languages, LDQL specifies recurring operations and primitives
for such operations. It can be viewed as a conceptual layer to guide the selection of prompts
optimized for specific cognitive tasks, the completion of these prompts with textual elements
specific to the user request, and the specification of constraints on the documents to consider or
on the features that responses should satisfy. As a first contribution, the conceptual language helps
better clarify the variety and specificity of the underlying operations. As a second contribution,
we use this conceptualization for an early evaluation, in terms of reliability and expected utility,
of AI assistants on an Appeal use case, obtained with the popular ChatGPT assistant and with
an on-premise prototype assistant based on a smaller model. Based on the early findings, we
discuss our plans to extend the prototype by exploiting the LDQL conceptualization.</p>
      <p>The paper is organized as follows. In Section 2, we discuss the related work, while in Section 3 we introduce
a use case in the area of labour that allowed us to identify cognitive activities that can be
expressed in terms of verbal commands. Such commands were then expressed with
a variety of prompts that were shown to a judge, who evaluated them in terms of various
qualities. The results of the evaluation and the takeaways of the experiment are discussed in Section 4,
while in Section 5 we provide a description of the functional architecture of
a prototype currently under development. Section 6 briefly addresses future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Languages adopted in document management systems are surveyed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Speech act theory [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] investigates speech acts; a speech act is created when a speaker/writer makes an utterance
to a hearer/reader in a given context. Various classifications of speech acts are provided, and for
each item in the classification a list of acts is given. A paper close to ours is [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where a set
of commands, some of them similar to ours, is defined to compare question answering over a set
of documents, with approaches spanning from traditional search engines to LLMs. Through a survey, they find
that users prefer search engines over LLMs for high-stakes queries, where concerns regarding
information provenance outweigh the perceived utility of LLM responses. Furthermore, they
define a set of qualities associated with commands similar to ours.
      </p>
      <p>
        Think-aloud techniques are widely adopted to elicit various aspects of human activity,
including in the legal domain. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], lawyers’ interactive information behaviour is discussed; in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
specialist legal expertise is measured through a think-aloud verbal protocol analysis.
      </p>
      <p>
        Various surveys on prompt engineering have appeared recently in the literature; e.g., a large
number of techniques are discussed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The literature on specific commands is rich, especially
for the Search and Summarize commands; e.g., in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the two classical
Extractive and Abstractive techniques are compared on different aspects. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a command
resulting from the composition of our Summarize and Integrate commands is applied to build
a unique text from several overlapping and contradictory documents.
      </p>
      <p>AI-powered assistants such as Copilot are now appearing, providing some of our commands,
such as Summarize, integrated into platforms, as in the case of Copilot in Microsoft 365.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experimentation phase: eliciting the commands with a think-aloud approach, and conceiving and experimenting with the commands</title>
      <p>To elicit the linguistic commands that are most relevant in expressing the cognitive activities of a judge
in a civil procedure, we focused on one case study in the area of labour and one judge, limiting
our investigation to the documents shown in Figure 1. For each of the First degree Judgement,
Second degree Appeal and Second degree Brief, we consider the text of the document, in which
we assume several precedent judgements related to similar cases from Courts,
Courts of Appeal and the Court of Cassation are mentioned (see Figure 1).</p>
      <p>By means of interviews, we were able to reconstruct all the actions (Figure 2) related to the
drafting of the judgment, expressed in terms of text fragments corresponding to the statement of
facts, precedent citations, legal reasoning (ground) and rulings. Such text fragments can be
inherited from documents in the case file, or from citations of judgements of the Court, Court of Appeal,
and Court of Cassation, or else, as for the ruling, be the result of the autonomous development of
the judge’s reasoning.</p>
      <p>Adopting a think-aloud approach, we associated one or more cognitive activities with each
action, which we then expressed in terms of linguistic commands; see the list of the eleven
commands in the left-hand part of Figure 3.</p>
      <p>At this point, we asked ChatGPT Scholar to produce a list of speech acts for document
management in the legal domain; the result was a list that included, e.g., Search and Summarize,
but also procedural acts such as Register (a judgment). Using a technique based on in-context
prompting, with a further prompt we asked it to produce a list including, e.g., Create and excluding, e.g.,
Register, resulting in the fourteen speech acts in the right-hand side of Figure 3.</p>
      <p>We also mapped the two lists of commands and acts. It may be seen that the two speech
acts Argue and Deduce have no counterpart in our list; this is reasonable, since we excluded,
coherently with the rules of the European AI Act, actions in which generative AI can play an
active and autonomous role in sentencing. The Order command is expressed in LDQL with
the lexicographic condition. There are three linguistic commands, namely Calculate, Expand
and Integrate, that have no counterpart among the speech acts: Calculate computes simple
mathematical expressions, Expand is the complementary command w.r.t. Summarize, and Integrate
fuses several documents, or parts of them, into a unique text.</p>
      <p>To proceed with the experimentation, we first produced a syntax for the eleven
commands. Following the syntax of the Select instruction in relational databases, SELECT &lt;what
(among columns)&gt; FROM &lt;where (among relations)&gt; WHERE &lt;condition&gt;, we
defined a similar general syntax for the commands of LDQL, shown in the left-hand part of Figure 4.
In relational databases, SELECT refers to a unique command, which is replaced in LDQL by
the set of different commands shown in Figure 3. Notice that the What part (the output of
the command) spans a significant set of categories: Entity and Concept are the typical
categories of the targets of semantic enrichment and search; the other categories cover the most
relevant documents and semantic parts of documents resulting from the experimentation on
the case considered. The Where part refers to all categories in the What part, except Entity and
Concept, including the three documents of the Appeal case file mentioned in Figure 1. Finally,
the How (condition) part in this first specification of LDQL corresponds to logical, mathematical,
temporal, spatial and organizational operators; a formal definition of the syntax is out of the scope
of this paper.</p>
      <p>Figure 4 shows the syntax of atomic commands (Search, Summarize, etc.). Besides atomic
commands, LDQL supports composite commands, which have a functional nested syntax, in
which the Where part of a command C1 is expressed in terms of a second command C2, e.g.,
Summarize(Search "..."). A formal representation of atomic and composite commands in terms
of a grammar, e.g., in Backus–Naur Form, could easily be provided, but it is out of the scope of
this paper.</p>
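      <p>The nested functional syntax can be sketched in code. The following Python fragment is a minimal, hypothetical encoding of LDQL's three-part structure; the class and field names are our own illustrative choices and not part of any LDQL specification.</p>

```python
from dataclasses import dataclass
from typing import Optional, Union

# A hypothetical encoding of LDQL's structure:
# <Command> <What> FROM <Where> WHERE <How(condition)>.
@dataclass
class Command:
    verb: str                                   # e.g. "Search", "Summarize"
    what: str                                   # target category, e.g. "Entity", "Precedent"
    where: Union[str, "Command", None] = None   # a document, or a nested command
    how: Optional[str] = None                   # logical/temporal/spatial condition

    def render(self) -> str:
        # A nested command in the Where part yields the composite syntax.
        where = self.where.render() if isinstance(self.where, Command) else self.where
        parts = [self.verb, self.what]
        if where:
            parts.append(f"FROM {where}")
        if self.how:
            parts.append(f"WHERE {self.how}")
        return " ".join(parts)

# Composite command: summarize the reasoning of precedents found by a nested Search.
inner = Command("Search", "Precedent", where="S1", how="year >= 2018")
outer = Command("Summarize", "LegalReasoning", where=inner)
```

      <p>Rendering the outer command produces a flat composite expression in which the nested Search plays the role of the Where part, mirroring the Summarize(Search "...") example above.</p>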
      <p>To proceed with the experimentation, we focused on the first four commands, Search,
Summarize, Extract, and Compare, and for each of them we produced from six to ten prompts.
Prompts for the Search and Summarize commands are shown in Tables 2 and 3, whose
evaluation we discuss below. Notice that prompts 1, 2, and 6 of the Summarize command are
composite commands, and in one of them we used the Calculate command.</p>
      <p>A significant aspect of LDQL is the set of qualities that each prompt should respect. We
distinguish two types of properties, respectively “command-driven” and “user-driven”
properties, see Table 1.
Command-driven properties are accuracy, namely the precision of the information provided
in output (e.g., an excerpt is faithful to the original), and completeness, meaning that all requested
texts have been produced in output. Command-driven properties are defined for the Search
command and have to be evaluated by testing the prompt with different examples of Where and
What. User-driven properties are evaluated by the user. Efficiency and utility are common to all
commands; efficiency is the perceived time saved when using the command, while utility is the
perceived added value for the user in his/her activity. Effectiveness is defined for Summarize,
Extract and Compare, and represents the perceived adherence of the output to user needs; e.g.,
for Summarize, the summary must contain all the relevant aspects of the original document,
while hiding the details.</p>
      <p>In the following, we report preliminary results of system and user evaluations. Tables 2 and 3
show the evaluations for the Search (nine prompts) and Summarize (seven prompts) commands,
referring to their qualities as indicated in Table 1. Accuracy and completeness are evaluated on a
0–1 scale, while the remaining qualities are evaluated on a scale where 1 corresponds to “not at
all”, 2 to “a little”, 3 to “somewhat”, and 4 to “a lot”.</p>
      <p>Examples of Summarize prompts from Table 3 include: “Search in S1 and summarize the text quoted between " " from the Supreme Court judgment Cass. October 17, 2018, No. 26017”; “Calculate the number of words in the citation (50) and summarize the citation text in fewer than 30 words”; “Summarize the rulings of Judgment S1 in fewer than 50 words”; “Summarize very long sentences associated with quoted judgments in S1”; “Calculate the number of words in the phrases between " " related to the judgments cited in S1, search for the judgment number containing the phrase ‘contested, in the case,’ and summarize in fewer than 100 words the sentence in S2 related to the judgment”; “Search for all the following precedents of merit in the Appeal, extract the quoted sentences, reporting them in quotation marks, and for each, summarize the legal reasoning”.</p>
      <p>In Table 4, we show the aggregated results for the entire set of four commands in terms of the
user-evaluated criteria. Contrary to our initial intuition, the Compare command is considered
less efficient and useful w.r.t. the other commands, and contributes significantly to lowering the global
evaluations. Interestingly, the Effectiveness quality has the highest average score, meaning that,
in the perception of the user, ChatGPT “did a very good job”, independently of the time saved
and of the value of the commands for the user. A more comprehensive analysis appears in the next
section.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Takeaways of the experimentation phase</title>
      <p>Beyond the assessments provided, the experimentation phase was experienced by the judge-user
as a process of learning, acquiring knowledge, and practicing a new language for interacting
with the available knowledge within civil proceedings. This experience has led to the following
reflections.</p>
      <p>The ultimate purpose of applying Generative AI to civil proceedings is to extract knowledge
from the procedural documents submitted by the parties and the evidence produced during the
trial, thereby assisting the judge in formulating the final decision.</p>
      <p>In line with this objective, the Search command, whose utility and efficiency are
evident, demonstrates enhanced effectiveness over traditional keyword-based search due to its
ability to incorporate temporal conditions (e.g., retrieving the most recent decisions of the Court
of Cassation or the Court of Justice of the European Union from a specified year onward) as well
as spatial/organizational conditions (e.g., filtering decisions issued by Courts of Appeal within
one or more regions). The combination of these two types of conditions further enhances its
efficacy.</p>
      <p>Regarding the Summarize command, its usefulness and efficiency have been significantly enhanced
following the entry into force of Ministerial Decree No. 110 of August 7, 2023. This decree,
issued by the Ministry of Justice, establishes criteria for drafting, setting word limits for, and
structuring judicial documents by defining the fields necessary for entering information into the
procedural database. The decree imposes word limits on party submissions (such as petitions
and pleadings) and similarly guides judges in drafting their judgments. Consequently, in the
process of selecting documents for inclusion in judgments, it has become extremely beneficial to
use functionalities that enable the extraction of summarized portions of procedural documents
within a specified word limit. Even more useful is a functionality that, by using the Summarize
and Expand commands together, iteratively generates versions with increasing or decreasing
word limits, allowing the judge to select the most appropriate word count.</p>
      <p>Both the Extractive and Abstractive variants of the Summarize command have proven to be
useful, depending on the specific needs they serve. For example, Extractive summarization is
particularly beneficial during the case study phase, while Abstractive summarization is more
appropriate during the drafting of the judgment itself, which is inherently dialogical. Also,
Extractive summarization may be preferable when dealing with novel cases, whereas Abstractive
summarization may be better suited for cases where prior rulings exist.</p>
      <p>In this regard, combining the Summarize command with Search to target specific sets of
documents, as described above, further enhances its utility.</p>
      <p>Since the Extract command is designed to retrieve aspects of a document relevant to its
meaning by analyzing words and phrases linked by syntactic rules, it is particularly useful
for extracting the facts of the case from party submissions and the first-instance judgment, as
well as for identifying the syllogism or logical reasoning proposed by the parties for the judge’s
consideration.</p>
      <p>When used in conjunction with the Compare command, the Extract command proves to be
both efficient and useful for addressing specific judicial needs. In particular, it aids in:
• Assessing the necessity of an evidentiary investigation: if the factual accounts extracted
from the parties’ submissions align, the need for fact-finding at the first-instance stage is
obviated.
• Second-instance proceedings: if discrepancies exist between the facts stated in the
first-instance judgment and those presented in the appeal (including the syllogisms or the
logical reasoning of the parties), the Compare command can help pinpoint the exact point
of divergence and extract the decisive logical aspect of the opposing arguments.</p>
      <p>The preliminary conclusion of this experimentation phase is that, beyond the ability to use
linguistic commands individually, these commands should be regarded as a toolbox that can be
utilized in various sequences depending on the scenario and the specific objectives of the judge.</p>
      <p>This approach should consider:
• The quantity, quality, and diversity of the knowledge to be processed;
• The homogeneity or divergence of the legal materials;
• The maturity, emerging nature, stability, instability, or obsolescence of the legal framework
governing rulings both at the preliminary and at the adjudicative stages.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Prototype</title>
      <p>In this experiment, a human used a top-tier LLM with an exceptionally long context via a
chatbot, deciding which document(s) should be passed to the LLM for each prompt. Documents
were anonymized beforehand.</p>
      <p>Our goal is to develop a system that exploits the conceptualization provided by LDQL to
handle different prompts with dedicated prompt processing. We want a system that can operate
with an LLM hosted on-premises, avoiding sharing sensitive data with third-party companies
and meeting reasonable efficiency constraints (an LLM as small as possible). We are developing
such a system as an evolution of DAVE, a prototype web application that combines semantic
search and a conversational interface, and which can exploit an on-premise small LLM and a Retrieval-Augmented
Generation (RAG) framework.</p>
      <p>
        DAVE is designed to analyze document collections in knowledge-intensive domains, mixing
features that cover both directions of the extractive-abstractive spectrum discussed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
It provides a graphical user interface (GUI) that facilitates search, visualization, exploration
of, and interaction with the documents in the knowledge base. The system adopts an
entity-centric approach, annotating the input documents with entities found using entity extraction
pipelines. Describing these algorithms in detail is out of the scope of this paper; the reader
can refer to our previous work, where different versions of these pipelines are defined in
detail [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. As a result of this pre-processing step, documents are annotated with named
entities of different types (e.g., Person, Organization, Location, Date, Money, etc.). On top of
these annotated documents, DAVE offers the following key functionalities.
      </p>
      <p>1. Entity-Driven Faceted Search: The search process begins with a textual keyword
search, retrieving an initial set of relevant documents. Users can then refine their queries
by leveraging named entities associated with the documents in the annotations, allowing
for more precise filtering and structured retrieval.
2. Conversational Interface: DAVE includes a chatbot, i.e., a conversational interface
that enables users to interact with the document collection through natural language
prompts. The underlying system, initially developed to provide question-answering
functionalities, is based on the Retrieval-Augmented Generation (RAG) framework. The
retrieval module incorporates named entities to enhance the retrieval of relevant chunks,
which are subsequently used for answer generation. The user can select which retrieval
strategy is used to answer the question (e.g., with or without entities), including a "no retrieval"
option.
3. Document Viewer: DAVE includes a built-in document viewer that allows users to read
the full text of a document along with its corresponding entity annotations. Users can
inspect named entity mentions, modify entity links to the Wikipedia knowledge base,
and view or edit clusters of mentions that refer to the same entity, enabling more refined
entity management and knowledge integration. Therefore, users can correct mistakes
made by the entity extraction process and consolidate the knowledge base interactively.
4. Integration of Functionalities: The system enables seamless integration between
faceted search and the conversational interface. For instance, users can first identify a set
of relevant documents using the entity-driven faceted search and then use the chatbot
on top of the selected set of documents; in this case, only chunks from the selected
documents are considered for generating the system’s responses (filters are applied by the
RAG’s retrieval module). This helps the user narrow the focus of the chatbot interactively.
A special case of this integration is when the chatbot is activated from the Document
Viewer: in this case, the chatbot only looks at chunks within the document that the user
is exploring.</p>
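      <p>The interplay between entity-driven faceted search and the chatbot (functionalities 1 and 4 above) can be sketched as follows. This is a simplified illustration with invented data structures and names; DAVE's actual implementation differs.</p>

```python
# Toy document collection annotated with named entities (hypothetical data).
docs = [
    {"id": "S1", "text": "First degree judgment ...", "entities": {"Court of Milan", "2018"}},
    {"id": "S2", "text": "Second degree appeal ...", "entities": {"Court of Cassation", "2018"}},
    {"id": "S3", "text": "Second degree brief ...", "entities": {"Court of Cassation", "2021"}},
]

def faceted_search(keyword: str, required_entities: set[str]) -> list[dict]:
    """Keyword search refined by entity facets, as in DAVE's faceted search."""
    return [
        d for d in docs
        if keyword.lower() in d["text"].lower()
        and required_entities <= d["entities"]      # all facets must be present
    ]

# The chatbot then only considers chunks from the selected documents:
# the ids act as a filter applied by the RAG retrieval module.
selected = faceted_search("appeal", {"Court of Cassation"})
allowed_ids = {d["id"] for d in selected}
```

      <p>Narrowing first by facets and then passing the resulting document ids as a retrieval filter is what lets the user interactively restrict the chatbot's focus.</p>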
      <p>To use DAVE on the Appeal use case, we added a new feature: the chatbot can bypass the
RAG module and consider the whole document(s) as context for responding to the user’s prompt.
Obviously, this feature can be used only if the documents fit within the
context window of the underlying LLM; if a document exceeds the context limit,
the RAG module is activated.</p>
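      <p>The bypass logic amounts to a simple budget check, sketched below. The token counting is a crude whitespace proxy (a real system would use the LLM's tokenizer), and the retriever is a placeholder for the hybrid RAG module; all names are our own.</p>

```python
def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real system would use the LLM's tokenizer.
    return len(text.split())

def retrieve_relevant_chunks(prompt: str, documents: list[str]) -> list[str]:
    # Placeholder for the hybrid retriever (vector + full-text + entity search).
    return [d[:500] for d in documents]

def build_context(documents: list[str], prompt: str, max_context_tokens: int) -> str:
    """Pass whole documents if they fit the context window; otherwise fall back to RAG."""
    total = count_tokens(prompt) + sum(count_tokens(d) for d in documents)
    if total <= max_context_tokens:
        return "\n\n".join(documents)                       # full-document context
    return "\n\n".join(retrieve_relevant_chunks(prompt, documents))  # RAG fallback
```
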
      <sec id="sec-5-1">
        <title>5.1. Early experiments on the Appeal use case and considerations on LDQL</title>
        <p>
          In this section, we report on preliminary experiments conducted to use DAVE in the Appeal use
case. The objective of these experiments is to obtain preliminary insights into the capabilities
that a small on-premise LLM exhibits on the same conditions in which we tested ChatGPT. The
response generation in our current prototype is performed using the Phi-3.5-mini-ITA model, a
ifne-tuned Italian-language version of Microsoft’s Phi-3.5-mini [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], with a 128k-token context
window. The model is deployed on a server equipped with a single NVIDIA Tesla T4 GPU,
featuring 16 GB of VRAM. We employed a quantized version of Phi-3.5-mini-ITA to optimize
VRAM usage. While we are aware of novel and promising LLMs that work with Italian, we
selected this model because we empirically found that it achieves a good trade-of between
quality of responses, speed and size, especially when quantized.
        </p>
        <p>
          Preprocessing. Initially, all documents were processed using the document annotation
pipeline to extract named entities (NER), link entities to Wikipedia where possible (NEL),
identify entities without a corresponding Wikipedia entry (NIL Prediction), and cluster entity
mentions referring to the same entity [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. After processing, the documents were indexed
in an Elasticsearch database, segmented into chunks with a maximum length of 500 tokens,
and assigned embeddings computed using the gte-multilingual-base embedding model from
Alibaba-NLP [14].
        </p>
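        <p>The chunking step of this preprocessing can be sketched as follows. A simplified whitespace tokenizer stands in for the model tokenizer, and the function name is our own; the 500-token limit mirrors the one mentioned above.</p>

```python
def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split a document into consecutive chunks of at most max_tokens tokens."""
    tokens = text.split()  # proxy for real tokenization
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

# Each chunk would then receive an embedding (e.g., from gte-multilingual-base)
# and be indexed in Elasticsearch together with its entity annotations.
```
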
        <p>DAVE usage. After processing and indexing all documents, we tested DAVE’s conversational
interface by submitting the same set of questions previously posed to ChatGPT, with exactly
the same prompts, and collecting the responses. To use a setting as similar as possible to the
one used in the experiments with ChatGPT (GPT-4o), we used the newly introduced functionality,
whereby DAVE uses the full document as context if it fits the context window, and
RAG otherwise. We identified the relevant documents with search and constrained the chatbot
to work with the selected documents. We employed Retrieval-Augmented Generation (RAG)
for only 5 out of the 16 prompts, specifically those related to the appeal document, which is the
longest.</p>
        <p>Focusing on the evaluation that can be performed by system developers (see Table 1), we
found that most of the responses were reasonably accurate and complete, though their quality
was generally lower compared to ChatGPT’s GPT-4o model, particularly for the summarization
and extraction commands. For certain tasks, such as the Calculate command, the model did not
produce satisfactory results. This limitation is likely due to the inherent architecture of the
model: Phi-3.5-mini is significantly smaller than ChatGPT’s GPT-4o, with only 3.82 billion
parameters compared to the estimated 1.76 trillion parameters of GPT-4o. Additionally, we used
a quantized model due to hardware memory constraints.</p>
        <p>A challenge we encountered was the limited context length available due to hardware memory
constraints. Although Phi-3.5-mini supports a 128k-token context window, we were only able
to utilize a maximum of 20,000 tokens. This is a significant disadvantage compared to ChatGPT,
which has the computational resources to process the full text of all documents within its
context for answer generation. Despite this limitation, we were able to fit almost all documents
within the available context. For the longest document (the appeal document), whose full
text exceeded the context limit, we used the RAG framework, retrieving the most
relevant chunks with a hybrid retrieval method that combines vector search, full-text search,
and entity-based search. While this constraint posed a challenge, the model still handled these
cases quite effectively.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. What’s next? LDQL and multi-agent RAG</title>
        <p>The preliminary experiments suggest that ChatGPT and DAVE return promising responses
when fed with whole documents as context, but this naive prompting method is not reliable
enough for some tasks (e.g., it does not retrieve all precedents in S2; see Table 2) and can hardly
scale to more complex cases with more and longer documents to consider. Observe that in
the experiments, the user selected the documents to consider for generating the response.
However, we would like a system that determines where information should be searched. Also,
no specific strategy other than some heuristics has been used to optimize the prompts. While this
specific use case has documents of moderate length (yet, one exceeded the memory constraints
in our infrastructure), appeal decisions may need to work with longer documents. This means
that some sort of RAG framework needs to be considered. However, previous work has found
that pushing the user prompt to the retrieval module as-is is frequently ineffective [15, 16].
Indeed, several approaches have been proposed to process user prompts and documents with
more complex pipelines, e.g., [17, 18]. LDQL provides a good conceptualization to handle
these challenges, by capturing the user needs in a structured format. This format suggests
that extracting specific elements from the user prompt could be beneficial: different commands
may be associated with different prompts; the What part may be used by the retrieval module;
and the Where part may be translated into filters on the documents to consider for the reply.</p>
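        <p>This envisaged use of the LDQL elements can be sketched as follows. The parsed-command structure, template texts and function names are hypothetical illustrations of our own, not part of the current prototype; only the What/Where/How roles come from LDQL.</p>

```python
# A parsed LDQL command, e.g., as an interpreter would extract it from a user prompt.
parsed = {
    "command": "Summarize",          # selects the prompt template
    "what": "rulings",               # feeds the retrieval query
    "where": ["S1"],                 # becomes a filter on documents
    "how": "fewer than 50 words",    # becomes a constraint on the response
}

# Hypothetical prompt templates, one per LDQL command.
TEMPLATES = {
    "Search": "Find {what} in the given context. Constraint: {how}.",
    "Summarize": "Summarize {what} from the given context. Constraint: {how}.",
}

def build_query(parsed: dict) -> tuple[str, list[str]]:
    """Return (LLM prompt, document filter) from a parsed LDQL command."""
    prompt = TEMPLATES[parsed["command"]].format(
        what=parsed["what"], how=parsed["how"]
    )
    return prompt, parsed["where"]
```

        <p>The What part becomes the body of the retrieval query, while the Where part is handed to the retrieval module as a document filter, as described in the text.</p>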
        <p>We plan to address all these observations by moving from a naive RAG architecture to a
modular RAG architecture backed by a multi-agent system [19], which is depicted, together
with two screenshots, in Figure 5. In this architecture, users’ prompts are interpreted by an
agent that, using LDQL as a reference, extracts specific elements and generates and routes
queries (e.g., to the RAG module) optimized for specific commands. Other agents may then
verify and cross-check the responses, another pattern applied by RAG systems in other
domains [17, 18, 19].</p>
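        <p>The intended division of labour between agents can be sketched as follows. This is a purely illustrative pipeline under our own assumptions: the agent interfaces, the keyword heuristic of the interpreter, and the placeholder bodies are all invented for exposition.</p>

```python
def interpreter_agent(user_prompt: str) -> dict:
    """Map the user prompt to an LDQL command (here: a trivial keyword heuristic)."""
    verb = "Summarize" if "summarize" in user_prompt.lower() else "Search"
    return {"command": verb, "raw": user_prompt}

def rag_agent(command: dict) -> str:
    # Placeholder: would route an optimized query to the RAG module.
    return f"draft answer for {command['command']}"

def verifier_agent(draft: str, command: dict) -> str:
    # Placeholder: would cross-check the draft against the source documents.
    return draft + " [verified]"

def answer(user_prompt: str) -> str:
    # Orchestration: interpret, retrieve-and-generate, then verify.
    command = interpreter_agent(user_prompt)
    draft = rag_agent(command)
    return verifier_agent(draft, command)
```
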
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this work, we have presented a project for defining a verbal command language that extends
the structure of the Select statement for databases to documents. We extend this statement into a
set of eleven verbal commands that express the cognitive needs of a judge in a civil appeal trial.
We present the outcomes of an experiment in which a judge evaluated prompts based on
five of the eleven commands in terms of effectiveness, efficiency and utility. We finally describe
the architecture of a prototype currently under development. The next phases of the project
will be the completion of the experimentation with the remaining six commands and the full
development of the prototype.</p>
      <p>Preliminary results suggest that a language like LDQL can support better orchestration of
LLM-based linguistic services, making it worth proceeding with its implementation using a
multi-agent architecture. While the current LDQL proposal targets Civil Appeal Proceedings,
we believe that similarly complex cognitive tasks arise in other domains, both within and
beyond the justice system.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is part of a research project conducted with the CINI consortium and funded by
the Italian Ministry of Justice. It is also partially funded by the European innovation actions
enRichMyData (HE 101070284) and DataPACT (HE 101189771), and the Italian project Discount
Quality for Responsible Data Science (PRIN 202248FWFS), funded by the European Community
- Next Generation EU.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.-O.</given-names>
            <surname>Truică</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-S.</given-names>
            <surname>Apostol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Darmont</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <article-title>The forgotten document-oriented database management systems: An overview and benchmark of native XML DODBMSes in comparison with JSON DODBMSes</article-title>
          ,
          <source>Big Data Research</source>
          <volume>25</volume>
          (
          <year>2021</year>
          )
          <fpage>100205</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Searle</surname>
          </string-name>
          ,
          <article-title>Expression and meaning: Studies in the theory of speech acts</article-title>
          , Cambridge University Press,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Allan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamarque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Asher</surname>
          </string-name>
          ,
          <article-title>Speech act theory: Overview</article-title>
          ,
          <source>Concise encyclopedia of philosophy of language</source>
          (
          <year>1997</year>
          )
          <fpage>454</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Worledge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>The extractive-abstractive spectrum: Uncovering verifiability trade-offs in LLM generations</article-title>
          ,
          <source>arXiv preprint arXiv:2411.17375</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Makri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blandford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <article-title>This is what i'm doing and why: Methodological reflections on a naturalistic think-aloud study of interactive information behaviour</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>47</volume>
          (
          <year>2011</year>
          )
          <fpage>336</fpage>
          -
          <lpage>348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Macmillan</surname>
          </string-name>
          ,
          <article-title>Thinking like an Expert Lawyer: Measuring Specialist Legal Expertise through Think-Aloud Problem Solving and Verbal Protocol Analysis</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Bond University,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mondal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chadha</surname>
          </string-name>
          ,
          <article-title>A systematic survey of prompt engineering in large language models: Techniques and applications</article-title>
          ,
          <source>arXiv preprint arXiv:2402.07927</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Automatic text summarization methods: A comprehensive review</article-title>
          ,
          <source>SN Computer Science</source>
          <volume>4</volume>
          (
          <year>2022</year>
          )
          <fpage>33</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caciularu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cattan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Shapira</surname>
          </string-name>
          , G. Stanovsky,
          <article-title>SEAM: A stochastic benchmark for multi-document tasks</article-title>
          ,
          <source>arXiv preprint arXiv:2406.16086</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rubini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernasconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>Named entity recognition and linking for entity extraction from italian civil judgements</article-title>
          , in: R. Basili, D. Lembo, C. Limongelli, A. Orlandini (Eds.),
          <source>AIxIA 2023 - Advances in Artificial Intelligence - XXIInd International Conference of the Italian Association for Artificial Intelligence</source>
          ,
          <source>AIxIA</source>
          <year>2023</year>
          , Rome, Italy, November 6-
          <issue>9</issue>
          ,
          <year>2023</year>
          , Proceedings, volume
          <volume>14318</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2023</year>
          , pp.
          <fpage>187</fpage>
          -
          <lpage>201</lpage>
          . URL: https://doi.org/10.1007/978-3-031-47546-7_13. doi:10.1007/978-3-031-47546-7_13.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bellandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bernasconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ripamonti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Siccardi</surname>
          </string-name>
          ,
          <article-title>An entity-centric approach to manage court judgments based on natural language processing</article-title>
          ,
          <source>Computer Law &amp; Security Review</source>
          <volume>52</volume>
          (
          <year>2024</year>
          )
          <fpage>105904</fpage>
          . doi:10.1016/j.clsr.2023.105904.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Barbera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Alva</given-names>
            <surname>Principe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Giardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rubini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>Combining knowledge graphs and nlp to analyze instant messaging data in criminal investigations</article-title>
          , in: M. Barhamgi, H. Wang, X. Wang (Eds.),
          <source>Web Information Systems Engineering - WISE 2024</source>
          , Springer Nature Singapore, Singapore,
          <year>2025</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aneja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Awadalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awadallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Awan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bahree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtiari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Behl</surname>
          </string-name>
          , et al.,
          <article-title>Phi-3 technical report: A highly capable language model locally on your phone</article-title>
          ,
          <source>arXiv preprint arXiv:2404.14219</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] X. Zhang, Y. Zhang, D. Long, W. Xie, Z. Dai, J. Tang, H. Lin, B. Yang, P. Xie, F. Huang, et al.,
          <article-title>mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval</article-title>
          , in:
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1393</fpage>
          -
          <lpage>1412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, H. Wang, H. Wang,
          <article-title>Retrieval-augmented generation for large language models: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2312.10997</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] Y. Zhou, Y. Liu, X. Li, J. Jin, H. Qian, Z. Liu, C. Li, Z. Dou, T.-Y. Ho, P. S. Yu,
          <article-title>Trustworthiness in retrieval-augmented generation systems: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2409.10102</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] S. Semnani, V. Yao, H. Zhang, M. Lam,
          <article-title>WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia</article-title>
          , in: H. Bouamor, J. Pino, K. Bali (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2023</source>
          , Association for Computational Linguistics, Singapore,
          <year>2023</year>
          , pp.
          <fpage>2387</fpage>
          -
          <lpage>2413</lpage>
          . URL: https://aclanthology.org/2023.findings-emnlp.157.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] Z. Xu, M. J. Cruz, M. Guevara, T. Wang, M. Deshpande, X. Wang, Z. Li,
          <article-title>Retrieval-augmented generation with knowledge graphs for customer service question answering</article-title>
          , in:
          <source>Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>2905</fpage>
          -
          <lpage>2909</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] A. Singh, A. Ehtesham, S. Kumar, T. T. Khoei,
          <article-title>Agentic retrieval-augmented generation: A survey on agentic RAG</article-title>
          ,
          <source>arXiv preprint arXiv:2501.09136</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>