<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Gender Violence in Numbers: Prompting Italian LLMs to Characterize Crimes Against Women</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giulia Rizzi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Scalena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elisabetta Fersini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Groningen</institution>
          ,
          <addr-line>CLCG, Groningen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper investigates the application of various prompting strategies and Italian-language large language models (LLMs) to extract salient characteristics of gender-based crimes from judicial courtroom decisions. Recognizing the complex linguistic and legal structures inherent in such documents, we evaluate several types of prompting across multiple LLMs fine-tuned or pretrained on Italian corpora. Our approach focuses on identifying key elements such as crime typology, victim-perpetrator relationships, modus operandi, and main motivations behind crimes against women. We present a comparative analysis of LLM performance on a small set of judicial court rulings, highlighting the impact of prompt design on the extraction of legally and socially relevant information. The findings demonstrate the potential of prompt engineering to enhance the ability of LLMs to support socio-legal research and policy development in the context of gender-based violence.</p>
      </abstract>
      <kwd-group>
        <kwd>Gender violence</kwd>
        <kwd>Information extraction</kwd>
        <kwd>Italian court rulings</kwd>
        <kwd>Language Models</kwd>
        <kwd>CLiC-it</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent years, large language models (LLMs) have demonstrated remarkable capabilities in a variety of natural language processing (NLP) tasks, showing potential for transforming domains that rely heavily on unstructured textual data [
        <xref ref-type="bibr" rid="ref24 ref9">1</xref>
        ]. In this field, the legal sector is distinguished by its unique challenges and opportunities, which can be attributed to the complexity, formalism, and high-stakes nature of judicial language.</p>
      <p>Despite their general proficiency, LLMs remain largely untested in such highly specialized applications, where linguistic nuances and factual accuracy are paramount. The extraction of structured information from legal documents, such as the personal information of the accused, necessitates not only an advanced understanding of the language, but also strict adherence to domain-specific taxonomies and ethical considerations regarding data sensitivity. The anonymised and variable structure of legal texts further complicates this task, necessitating the development of tailored strategies for effective model deployment. Beyond their technical relevance, such advancements are of considerable societal value given their potential to underpin large-scale analyses of sociological and criminological trends.</p>
      <p>This work investigates the use of LLMs to automate the extraction of key information from anonymised court rulings in the Italian judicial system. The study’s primary objectives are, firstly, to explore the role of prompt engineering in guiding the model’s behaviour and improving output fidelity and, secondly, to evaluate the feasibility of using these extracted outputs to generate statistical analyses of juridical court rulings. A thorough evaluation of multiple models and prompt strategies has been undertaken, enabling the identification of both the capabilities and limitations of state-of-the-art LLMs in the context of complex, structured information retrieval within the legal domain.</p>
      <p>The contributions of this study can be summarised as follows:
• Prompt Evaluation – We perform a systematic evaluation and selection of prompts tailored to a legal taxonomy, identifying the linguistic and semantic limitations that affect model performance.
• Empirical Assessment of LLM Outputs – We perform a detailed analysis of model behavior across multiple dimensions of a legal information extraction task, highlighting typical failure modes and model biases.
• Data-Driven Legal Insights – We uncover statistical trends in Italian criminal justice, while emphasizing the importance of post-extraction validation due to the inherent risks of misinterpretation or hallucination, especially on such anonymised data.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Information extraction. Information Extraction (IE) is a foundational task in natural language processing that aims to automatically extract structured information such as named entities, events, and relationships from unstructured text. Traditional IE pipelines often rely on rules or shallow machine learning models [
        <xref ref-type="bibr" rid="ref14">2, 3</xref>
        ], but recent advances have significantly improved the field, introducing more sophisticated training procedures and complex pipelines that leverage models’ embedding capabilities [4, 5]. With the advent of large language models, especially generative ones, there is a growing shift toward end-to-end approaches that require minimal task-specific supervision.</p>
      <sec id="sec-2-1">
        <title>In legal domain</title>
        <p>The legal domain presents unique challenges for information extraction due to its specialized terminology, complex document structures, and domain-specific entity types and relationships [6, 7, 8]. Recent studies have examined the potential of LLMs for legal IE tasks [
          <xref ref-type="bibr" rid="ref12 ref29 ref35">9, 10</xref>
          ]. These works highlight the difficulty of identifying entities such as case participants, legal concepts, and procedural events due to the prevalence of cross-references, frequent amendments, and highly specialized jargon [11, 12].</p>
        <p>Legal documents from different jurisdictions or legal systems introduce further complications, as they may follow distinct conventions, terminologies, and structural norms, making domain transfer particularly challenging [13]. Most current language models are primarily trained on English-language data, largely sourced from Western, English-speaking jurisdictions (e.g., the United States and the United Kingdom). Research has shown that LLM performance on legal IE tasks can vary significantly between in-domain and out-of-domain contexts, with performance degradation often linked to differences in document formality, legal drafting templates, and jurisdiction-specific clauses [14]. The intricate nature of legal texts adds another layer of complexity, as legal terminology and document structures can vary widely across legal systems and languages, necessitating specialized methods for handling non-English legal texts.</p>
        <p>Most existing work has focused on English legal documents. To the best of our knowledge, while some attempts have been made in the Italian legal domain [
          <xref ref-type="bibr" rid="ref31">15, 16</xref>
          ], no prior work has specifically addressed Italian court rulings, whose structure and terminology differ significantly from those of the Anglo-Saxon legal tradition.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>Modern language models are typically trained on vast amounts of data to capture various linguistic patterns. However, especially in the case of smaller models, the training data is often heavily skewed toward English, resulting in reduced performance on other languages. As discussed in Section 2, relatively few studies have investigated the intersection of non-English languages and legal domains. For this reason, we began by selecting models whose pre-training process or fine-tuning includes at least some Italian-language data, so as to guarantee a minimal level of competence in Italian. In particular, we evaluated three instruction-tuned checkpoints: (i) LLaMA 3.1 8B<sup>1</sup> [17]; (ii) Anita<sup>2</sup> [
        <xref ref-type="bibr" rid="ref22">18</xref>
        ], a further Italian-specific fine-tune of LLaMA 3.1 8B; and (iii) Phi-3-mini (4B parameters), in its instruction-tuned variant.</p>
      <p>All three models were probed on a representative subset of prompts designed to test instruction-following and the ability to emit precisely structured text suitable for information extraction. Despite being the smallest model and having predominantly English training data, Phi-3-mini consistently produced the best-structured Italian outputs and therefore emerged as the top performer in this preliminary screening.</p>
      <sec id="sec-3-1">
        <title>3.2. Prompts</title>
        <p>A campaign was designed to study several prompt engineering techniques to optimise the model’s responses to the extraction task. The following prompt types have been investigated:
1. Direct Instruction Prompt: This type of prompt directly asks for specific information or task completion, with clear, unambiguous instructions. It is straightforward and expects a precise answer. For example: “What is the victim’s name?”.
2. Socratic Prompt: This type of prompt encourages Socratic reasoning by asking a sequence of follow-up questions. The goal is to guide the model toward discovering information or coming to conclusions. For example: “What is the victim’s name?” followed by “What is &lt;name&gt;’s gender?”.
3. Structured Prompt: This type of prompt provides a specific framework or format in which the response should be structured. The adopted JSON-like format includes predefined fields into which the information should be extracted. This ensures consistency and organization in the answers. For example: “Extract the following details: {victim_name: ?, victim_gender: ?}”.</p>
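        <p>The three prompt types can be illustrated with a minimal sketch. The helper names (direct_prompt, socratic_prompts, structured_prompt) and the answer “Maria Rossi” are hypothetical and serve only as an illustration; they are not the prompts or the code used in the study.</p>
        <preformat><![CDATA[
```python
import json


def direct_prompt(question):
    """Direct instruction: a single, unambiguous question."""
    return question


def socratic_prompts(first_question, follow_up_template, first_answer):
    """Socratic: the follow-up question is built from the previous answer."""
    return [first_question, follow_up_template.format(name=first_answer)]


def structured_prompt(fields):
    """Structured: ask the model to fill a JSON-like template of named fields."""
    template = {field: "?" for field in fields}
    return "Extract the following details: " + json.dumps(template)


# "Maria Rossi" is a made-up answer used only to show the follow-up question.
prompts = {
    "direct": direct_prompt("What is the victim's name?"),
    "socratic": socratic_prompts(
        "What is the victim's name?", "What is {name}'s gender?", "Maria Rossi"
    ),
    "structured": structured_prompt(["victim_name", "victim_gender"]),
}

for kind, prompt in prompts.items():
    print(kind, "->", prompt)
```
]]></preformat>
        <p>In this sketch the Socratic variant needs the model’s first answer before the second question can be formed, which is why it returns a list of turns rather than a single string.</p>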
        <sec id="sec-3-1-1">
          <p>In this section we describe the pipeline introduced to extract information from Italian criminal court rulings.</p>
          <p><sup>1</sup> meta-llama/Llama-3.1-8B-Instruct</p>
          <p><sup>2</sup> swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA</p>
          <p>According to the selected types, 145 prompts have been defined, both manually and by utilising Large Language Models (LLMs)<sup>3</sup>.</p>
          <sec>
            <title>3.3. Dataset</title>
            <p>To construct a suitable dataset for our study, 2,000 anonymized judicial court rulings were extracted from the DeJure corpus<sup>4</sup> based on the presence of references to specific norms related to gender-based crimes, i.e. art. 609-quinquies, art. 572, art. 582, art. 609-bis, art. 609-octies, art. 609-ter, and art. 612-bis of the Italian Penal Code. We engaged 5 judicial experts to finally select only those judicial court rulings effectively relevant for the considered case study. This targeted extraction strategy was employed to ensure the relevance of the selected court rulings to the legal domain under investigation. From this initial pool, a subset of 1,000 court rulings was subjected to manual evaluation by legal domain experts. The experts assessed each ruling for its appropriateness and relevance, ultimately identifying 865 court rulings as suitable for inclusion in the final dataset. This process ensured both the domain specificity and the quality of the data used in subsequent analyses. The dataset obtained has been used for the identification of pertinent information and for the extraction of statistics to finally model the gender-based violence phenomenon.</p>
            <p>Furthermore, in order to assess the ability of the selected models to extract salient information from the court rulings, we created a subset of de-anonymised judicial court rulings. This process was aimed at reconstructing the removed or obscured information, such as proper names, places, entities or other identifying references, by relying exclusively on the available textual content. The de-anonymization process was aimed at creating a small benchmark for qualitative analysis to compare the performance of the Italian large language models. Specifically, the original anonymised court rulings have been annotated to introduce pseudo-real information that the models could extract, in order to simulate a plausible context of application of the model itself. The de-anonymised court rulings are utilised to evaluate the capabilities of the selected models, as well as to identify the most effective prompts for the task of extracting the information included in the taxonomy.</p>
            <p>De-anonymisation. A subset of anonymized court rulings was initially subjected to a de-anonymization process using the considered language models. Each model was prompted to infer the missing information, such as names of individuals, organizations, locations, and other identifying details, based on the surrounding textual context, with the goal of filling in the fields marked as “OMISSIS”. However, an analysis of the model outputs revealed an overall unsatisfactory quality of de-anonymization. While the models demonstrated certain inferential capabilities, the generated outputs frequently proved to be inaccurate, incomplete, or contextually inconsistent. The most critical issues arose in the reconstruction of personal names: models frequently suggested names that were inconsistent with the grammatical gender used in the text, leading to incoherent court rulings. For instance, masculine names were used where feminine pronouns or adjectives were employed, thereby compromising the document’s natural flow and readability. Furthermore, the models demonstrated inconsistency in the attribution of names throughout the document, frequently assigning different names to the same individual across multiple mentions. The absence of global coherence indicated a restricted contextual awareness, thereby diminishing the dependability of the automated procedure. In light of the aforementioned limitations, manual de-anonymization was ultimately deemed the preferred approach in order to ensure both accuracy and internal consistency.</p>
            <p>The manual de-anonymisation process enabled the introduction of specific cases, designed to provide a thorough and robust evaluation of the models. Foreign names were introduced to assess the models’ ability to handle information that deviates from conventional paradigms. The incorporation of such cases into the study was intended to assess the models’ capacity to process unconventional information and to ensure consistency and accuracy, even in the presence of elements that fall outside the more prevalent data categories utilised during their training. Additionally, complex cases involving multiple individuals sharing the same surname were included to assess the models’ ability to disambiguate identities, especially in cases where roles differ, such as a victim and defendant with the same surname. This required the models to correctly infer identities based on contextual details. Lastly, a case without any personal data was included with the objective of evaluating the efficacy of the selected models in discerning instances wherein the requested data is notably absent. The inclusion of this particular type of input makes it possible to assess the models’ ability to handle situations in which information is either completely missing or deliberately omitted.</p>
            <p>The de-anonymisation procedure, enriched by these particular cases, results in a small dataset of 10 judicial courtroom decisions that is well-suited for the evaluation of the models’ performance in challenging and incomplete scenarios.</p>
          </sec>
        </sec>
        <sec id="sec-3-1-2">
          <p><sup>3</sup> Manually generated prompts have been included as examples in the definition of a few-shot instruction asking ChatGPT to generate new ones.</p>
          <p><sup>4</sup> www.dejure.it</p>
          <p>The first dataset (composed of 865 anonymized judicial court rulings) was used to extract statistical insights on gender-based violence in Italian court rulings, while the second one, composed of 10 de-anonymized court rulings, served to evaluate the models’ ability in the task of automatic information extraction and to select the most promising prompts to adopt for the extraction task.</p>
          <p>Understanding crimes against women from judicial courtroom decisions presents significant challenges, primarily due to the inherent complexity of legal language, which often involves dense, formal phrasing and domain-specific terminology. Additionally, judicial court rulings typically span from 3 to 15 pages (averaging about 21,000 characters, with the longest surpassing 137,000), resulting in lengthy and unstructured documents that demand robust document-level understanding. Compounding the difficulty is the frequent occurrence of multiple crimes described across different temporal contexts within a single sentence, requiring fine-grained temporal reasoning and event disentanglement to accurately identify and extract relevant legal information.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.4. Taxonomy</title>
        <sec id="sec-3-2-1">
          <p>A taxonomy has been defined in order to model all the relationships that are useful for the definition of the offence and the relevant entities. The objective is to obtain a complete and valid characterisation of the analysed court rulings. In order to achieve the desired taxonomy, the various classifications defined and proposed by the Istituto Nazionale di Statistica (ISTAT) were adopted and subsequently grouped into categories. Additional information about the identified categories, along with a schematic representation, is reported in Appendix A.</p>
          <p>The proposed taxonomy has been adopted in the
definition of the prompts for the extraction of salient
characteristics of gender-based violence.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.5. Inference pipeline description</title>
        <sec id="sec-3-3-1">
          <p>We prompt the selected models to extract relevant information from court rulings. To ensure reproducibility, we use greedy decoding and apply the model’s original chat template from its instructed version.</p>
          <p>A key challenge in prompting models with court rulings is their length in tokens, which can significantly slow down the generation process. Since we query the same model multiple times on the same ruling using different prompts, we leverage the decoder-only nature of language models by precomputing the key-value cache for each token in the ruling. At inference time, this allows us to avoid redundant computation of internal states during each forward pass.</p>
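          <p>The saving obtained by reusing the cached ruling can be sketched with a toy stand-in for a decoder-only model. ToyModel, PrefixState and the token counts below are hypothetical; a real model would store per-layer key and value tensors rather than a tuple of tokens, and the sketch assumes that the ruling precedes the question in the input sequence.</p>
          <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixState:
    """Stand-in for the key-value cache of an already processed token prefix."""
    tokens: tuple


class ToyModel:
    """Counts how many tokens it has to process, mimicking forward-pass cost."""

    def __init__(self):
        self.tokens_processed = 0

    def encode(self, tokens, past=None):
        # Only the new tokens are processed; the cached prefix is reused as is.
        self.tokens_processed += len(tokens)
        previous = past.tokens if past is not None else ()
        return PrefixState(previous + tuple(tokens))


def run_prompts(model, ruling_tokens, prompts):
    """Encode the ruling once, then extend the cached state for each prompt."""
    ruling_state = model.encode(ruling_tokens)
    return [model.encode(prompt, past=ruling_state) for prompt in prompts]


model = ToyModel()
ruling = ["tok"] * 1000
prompts = [["q1"] * 10, ["q2"] * 12, ["q3"] * 8]

states = run_prompts(model, ruling, prompts)
cached_cost = model.tokens_processed
naive_cost = sum(len(ruling) + len(prompt) for prompt in prompts)
print(cached_cost, naive_cost)
```
]]></preformat>
          <p>With these made-up sizes the ruling is processed once instead of three times, so the cost is dominated by the ruling length only once, regardless of how many prompts are asked about it.</p>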
          <p>Each prompt includes a predefined set of labels from
which the model is expected to choose based on the
extracted information. The model should output at least
one label, optionally accompanied by an explanation or
the relevant text span. For evaluation, we perform an
exact string match between the stripped model output
and the set of possible labels.</p>
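          <p>The matching step can be sketched as follows. The label set is taken from the example prompt for the victim’s gender; the function names and the sample outputs are hypothetical, and how an optional explanation is separated from the label before matching is not described here, so the sketch simply treats any extra text as a mismatch.</p>
          <preformat><![CDATA[
```python
LABELS = {"maschio", "femmina", "non specificato"}


def match_label(model_output, labels=LABELS):
    """Return the label if the stripped output equals one exactly, else None."""
    cleaned = model_output.strip()
    return cleaned if cleaned in labels else None


def accuracy(outputs, gold):
    """Fraction of outputs whose matched label equals the gold label."""
    hits = sum(match_label(output) == label for output, label in zip(outputs, gold))
    return hits / len(gold)


# Made-up model outputs: the third one adds a rationale and fails the match.
outputs = ["  femmina\n", "maschio", "Femmina, perché il testo usa il femminile"]
gold = ["femmina", "maschio", "femmina"]
print(accuracy(outputs, gold))
```
]]></preformat>
          <p>Exact matching is strict by design: an answer that contains the right label but also an unrequested rationale counts as a failure, which is consistent with the constraint violations discussed in the prompt evaluation.</p>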
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <sec id="sec-4-1">
        <p>The selected models have been evaluated on the de-anonymized subset of court rulings, focusing both on model performance and computational requirements. Furthermore, the analysis of the results allowed for the selection of the most promising prompts.</p>
        <sec id="sec-4-1-1">
          <title>4.1. Prompts Evaluation</title>
          <p>The selection of prompts played a pivotal role in determining the effectiveness of the selected language models in extracting structured information from juridical court rulings. This phase of experimentation revealed not only the variability in the interpretative capabilities of large language models (LLMs), but also several intrinsic limitations related to prompt design and the models’ generalization ability when confronted with legal language.</p>
          <p>Preliminary analyses were conducted on the manually de-anonymised subset of court rulings, which permitted the empirical identification of prompt configurations that were optimally suited to the information extraction task. This experiment shed light on a number of difficulties encountered by the models. In many cases, LLMs exhibited a fundamental misunderstanding of the semantic scope required by the prompt, often retrieving information that, while contextually related, diverged significantly from the specific data fields defined by the taxonomy (e.g., returning descriptive actions instead of categorical labels like profession or relationship type).</p>
          <p>One of the primary limitations encountered was the
ambiguity in natural language and its impact on the
LLMs’ reasoning process. This was especially evident
when models were asked to infer information indirectly
stated or entirely absent from the text. Instead of
indicating the lack of evidence, models frequently hallucinated
responses, fabricating plausible but unfounded details.
This behavior critically undermines the reliability of
extracted data, particularly in legally sensitive contexts.</p>
          <p>Another noteworthy limitation was the tendency of models to prioritize certain lexical or structural cues over deeper contextual understanding. This resulted in erroneous classification of attributes such as gender, age, and relationship roles, particularly in complex or non-standardized sentence structures. Furthermore, despite clear instructions embedded in the prompt (e.g., limiting response length or choosing from a set of predefined options), the outputs regularly violated these constraints by including a rationale that justifies the provided answer, revealing the models’ limited capacity for controlled generation. Such an explanation is not only not requested, but is also frequently illogical or based on spurious correlations, thereby accentuating the interpretability issue.</p>
          <p>Figure 1, panel (a): pie chart representing the victims’ gender distribution (Female 79%, Not Specified 19%, Male 2%). Panel (b) values: Male 52%, Not Specified 29%, Female 19%.</p>
          <p>The comparison of the selected prompts demonstrated that the adoption of direct instruction prompts, which explicitly instructed the model to select from provided options or adhere to strict syntactic patterns<sup>5</sup>, resulted in a substantial enhancement in performance stability. Nevertheless, the more general limitations in comprehension and factual accuracy persist, particularly in circumstances where information is partial or ambiguous.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.2. Extracted Statistics</title>
          <p>The statistical analysis was carried out on a set of 607 anonymized judicial rulings. This final number resulted from a filtering process that excluded rulings exceeding the token limits of the models used, as well as those containing errors introduced during the OCR extraction of the original documents. After applying these cleaning steps, 607 out of the original 865 rulings were deemed suitable for analysis.</p>
          <p>As discussed in Section 3.1, we focus on the results obtained from the best-performing model, Phi-3-Mini (4B), which demonstrated strong performance while maintaining low computational requirements. All generations are produced using greedy decoding to allow reproducibility, with the maximum number of tokens set to 512. The extraction process was guided by the adoption of the prompts selected in the prompt evaluation phase, with the objective of capturing relevant characteristics and extracting statistics and trends that would encompass the entire taxonomy area.</p>
          <p>Figure 1: Gender distribution of victims and culprits. Panel (b): pie chart representing the culprits’ gender distribution.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Demographic Trends</title>
        <p>A significant skew emerged in the gender distribution of both victims and culprits. As shown in Figure 1a, the inferred victims were predominantly female, comprising approximately 79% of the identified cases. In contrast, as shown in Figure 1b, the majority of culprits were male, accounting for 52% of the dataset. These figures align with established criminological patterns observed in domestic and gender-based violence cases. A notable proportion of records (19% for victims and 29% for perpetrators) lacked sufficient information to determine gender, reflecting the limitations imposed by anonymization and the challenges in automatic extraction.</p>
        <p>A similar phenomenon was observed in the data pertaining to nationality. The majority of individuals identified as both victims and culprits were of Italian origin (89% and 90% respectively). Only a small proportion of the subjects belonged to minority groups, with Nigerian, Chinese, and Albanian nationals being the most frequently mentioned among non-Italian individuals. In some cases (1.3% and 2.1% for culprits and victims), the nationality of the subjects could not be established due to the absence of explicit references within the anonymised texts.</p>
      </sec>
      <sec id="sec-4-3">
        <p><sup>5</sup> As an example, when asking for the victim’s gender: Qual è il genere della vittima? Rispondi con "maschio", "femmina" o "non specificato", which translates to: What is the victim’s gender? Reply with "male", "female" or "not specified".</p>
        <p>Nature of Relationships. A thorough analysis of interpersonal relationships indicated that the majority of crimes occurred within family or intimate settings. As represented in Figure 2, conjugal relationships were the most frequently identified type of relationship (over 30% of cases), followed closely by cohabiting arrangements (over 21% of cases). These findings underscore the imperative for meticulous examination of domestic environments as pivotal contexts for violent offences. A small yet noteworthy proportion of cases (around 2%) exhibited ambiguous or non-identifiable relationships, further emphasising the complexity involved in disambiguating personal information within anonymised legal documents, which frequently report such information in an indirect form.</p>
        <p>Crime Scene and Modus Operandi. The most frequent locations linked to criminal acts were private res…</p>
        <p>[Chart residue: 21%, 50%, 15%; labels not recoverable.]</p>
        <p>[Figure 2: distribution of victim-culprit relationship types; axis labels not recoverable.]</p>
        <p>Typologies of Crime and Motivation. The most prevalent offence detected within the corpus is homicide (around 36% of the cases), constituting over one-third of all analysed court rulings. Other prevalent categories included personal injury, physical assault, and threats (12%, 9% and 7% respectively), which often co-occur with domestic or interpersonal conflict. Finally, in terms of motive, quarrels/futile motives, insanity and grudges (38%, 24.2%, and 23.9% respectively) emerge as the most frequent.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <sec id="sec-6-1">
        <p>The work of Daniel Scalena has been partially funded by MUR under the grant ReGAInS, Dipartimenti di Eccellenza 2023-2027 of the Department of Informatics, Systems and Communication at the University of Milano-Bicocca.</p>
        <sec>
          <title>Declaration on Generative AI</title>
          <p>During the preparation of this work, the author(s) used ChatGPT (OpenAI) and Grammarly in order to paraphrase and reword, and to check grammar and spelling. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          org/
          <year>2020</year>
          .findings-emnlp.
          <volume>261</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2020.findings-emnlp.
          <volume>261</volume>
          . [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Naveed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. U.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saqib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>An-</surname>
          </string-name>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mamakas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tsotsi</surname>
          </string-name>
          , I. Androutsopoulos,
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Technology</surname>
          </string-name>
          (
          <year>2023</year>
          ). C. Goant, ă, D. Preot, iuc-Pietro (Eds.), Proceedings of [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <surname>Making</surname>
          </string-name>
          pre-trained lan- the
          <source>Natural Legal Language Processing Workshop</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>guage models better few-shot learners</article-title>
          , in: C. Zong,
          <year>2022</year>
          , Association for Computational Linguistics,
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Abu Dhabi, United Arab Emirates (Hybrid)</source>
          ,
          <year>2022</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          130-142. URL: https://aclanthology.org/2022.nllp-1.11/. doi:10.18653/v1/2022.nllp-1.11.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>59th Annual Meeting of the Association for Computational Linguistics and the 11th International</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>Joint Conference on Natural Language Processing</source>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barale</surname>
          </string-name>
          ,
          <article-title>Information extraction for planning court cases</article-title>
          , in: N. Aletras,
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          (Volume
          <volume>1</volume>
          : Long Papers),
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          Association for Computational Linguistics
          , Online,
          <year>2021</year>
          , pp.
          <fpage>3816</fpage>
          -
          <lpage>3830</lpage>
          .
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrett</surname>
          </string-name>
          , C. Goanță, D. Preoțiuc-Pietro, G. Spanakis (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          URL: https://aclanthology.org/2021.acl-long.295/. doi:10.18653/v1/2021.acl-long.295.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>Proceedings of the Natural Legal Language Processing Workshop</source>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Carin</surname>
          </string-name>
          ,
          <year>2024</year>
          , Association for Computational Linguistics, Miami, FL, USA,
          <year>2024</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          URL: https://aclanthology.org/2024.nllp-1.8/. doi:10.18653/v1/2024.nllp-1.8.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>What makes good in-context examples for GPT-3?</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          , in: E. Agirre,
          <string-name>
            <given-names>M.</given-names>
            <surname>Apidianaki</surname>
          </string-name>
          , I. Vulić (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge</source>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Barale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rovatsos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhuta</surname>
          </string-name>
          , Automated
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          Dublin, Ireland and Online,
          <year>2022</year>
          , pp.
          <fpage>100</fpage>
          -
          <lpage>114</lpage>
          . URL: https://aclanthology.org/2022.deelio-1.10/. doi:10.18653/v1/2022.deelio-1.10.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          Graber, N. Okazaki (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2023</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          , Association for Computational Linguistics, [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Learning to retrieve in-context examples for large language models</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>2992</fpage>
          -
          <lpage>3005</lpage>
          . URL: https://aclanthology.org/2023.findings-acl.187/. doi:10.18653/v1/2023.findings-acl.187.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          , in:
          <string-name>
            <given-names>Y.</given-names>
            <surname>Graham</surname>
          </string-name>
          , M. Purver (Eds.),
          <source>Proceedings of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>18th Conference of the European Chapter of the</source>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cemri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Çukur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koç</surname>
          </string-name>
          ,
          <article-title>Unsupervised simplification of legal texts</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2209.00557. arXiv:2209.00557.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <source>Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          Association for Computational Linguistics, St. Julian's, Malta,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <year>2024</year>
          , pp.
          <fpage>1752</fpage>
          -
          <lpage>1767</lpage>
          . [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rusnachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Legal_try at SemEval-2023 task 6: Voting heterogeneous models for entities identification in le-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          URL: https://aclanthology.org/2024.eacl-long.105/. [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          , J. Han,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A survey on deep learning</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <source>Knowledge and Data Engineering</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>50</fpage>
          -
          <lpage>70</lpage>
          . URL: http://dx.doi.org/10.1109/TKDE.2020.2981314. doi:10.1109/tkde.2020.2981314.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>Proceedings of the 17th International Workshop on Semantic Evaluation</source>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Michos</surname>
          </string-name>
          ,
          <article-title>Extracting contract elements</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          (SemEval-2023), Association for Computational Linguistics, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>1282</fpage>
          -
          <lpage>1286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          URL: https://aclanthology.org/2023.semeval-1.178/. doi:10.18653/v1/2023.semeval-1.178.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          , in:
          <source>Proceedings of the 16th Edition of the International Conference on Artificial Intelligence and Law</source>
          , ICAIL '17,
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          Association for Computing Machinery, New York, [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Niklaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Matoshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stürmer</surname>
          </string-name>
          , I. Chalkidis,
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          NY, USA,
          <year>2017</year>
          , p.
          <fpage>19</fpage>
          -
          <lpage>28</lpage>
          . URL: https://doi.org/10.1145/3086512.3086515. doi:10.1145/3086512.3086515.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          D. Ho,
          <article-title>MultiLegalPile: A 689GB multilingual legal corpus</article-title>
          , in: L.-W. Ku, A. Martins, V. Srikumar (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <source>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics</source>
          (Volume
          <volume>1</volume>
          :
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          guistics, Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>15077</fpage>
          -
          <lpage>15094</lpage>
          . URL: https://aclanthology.org/2024.acl-long.805/. doi:10.18653/v1/2024.acl-long.805.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fergadiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Malakasiotis</surname>
          </string-name>
          , N. Ale-
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <article-title>pets straight out of law school</article-title>
          , in: T. Cohn,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          , Y. Liu (Eds.),
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <source>Findings of the Association for Computational Linguistics: EMNLP 2020</source>
          , Association for Computational Linguistics, Online, [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Masala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C. A.</given-names>
            <surname>Iacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Uban</surname>
          </string-name>
          , M. Cidota,
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <year>2020</year>
          , pp.
          <fpage>2898</fpage>
          -
          <lpage>2904</lpage>
          . URL: https://aclanthology.
          <string-name>
            <given-names>H.</given-names>
            <surname>Velicu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rebedea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popescu</surname>
          </string-name>
          , jurBERT: A
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>