<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Architectural Framework for the Construction of a Crime Narrative Corpus from Judicial Records</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanni Acampora</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bilal Ahmed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Autilia Vitiello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Physics “Ettore Pancini”, University of Naples Federico II</institution>
          ,
          <addr-line>via Cintia 21, 80126 Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Forensic investigation relies on reconstructing how criminal events unfold in time, space, and human interaction. This task is inherently complex due to the involvement of multiple sources of evidence; however, it could benefit from information reported in judicial judgments of previous criminal cases, as such documents often contain detailed reconstructions of events that can support investigative reasoning and hypothesis generation. Unfortunately, judicial judgments embed the narrative of criminal events within legally dense and procedurally driven text, which hinders their straightforward reuse for forensic reasoning purposes. In this context, Artificial Intelligence (AI) techniques could facilitate the generation of clear and structured crime narratives from judicial records, making them suitable for use by law enforcement officers and, in the future, by AI-assisted investigative systems. Based on these considerations, the goal of this paper is to present an architectural proposal in which judicial records can be responsibly repurposed into a forensic crime narrative corpus through a human-centered pipeline that prioritizes transparency, traceability, and selective expert oversight. In detail, this framework combines lexicon-guided recall, transformer-based functional classification, uncertainty-driven active learning, and lightweight event and timeline structuring. By generating a publicly available corpus that curates crime narratives from judicial records, this framework establishes a solid foundation for future research on AI-assisted investigative reasoning and mixed-reality crime-scene support systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Forensic Natural Language Processing</kwd>
        <kwd>Judicial Records</kwd>
        <kwd>Criminal Event Extraction</kwd>
        <kwd>Crime Narrative Corpus</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Crime Scene Investigation (CSI) is a cognitively demanding process in which investigators synthesize
fragmented observations, witness accounts, and physical evidence to form coherent hypotheses about
past events [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Effective investigation depends not only on isolated facts, but also on reasoning about
sequences of actions, temporal progression, spatial relations, and interactions among actors. Because
this task is complex for law enforcement officers, Artificial Intelligence (AI) techniques are emerging
as promising methods to support forensic investigations [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2, 3, 4, 5, 6</xref>
        ]. However, current AI deployments in
the criminal justice ecosystem focus primarily on downstream tasks such as forensic laboratory analysis,
crime pattern analysis, sentencing support, or prediction of legal outcomes [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. These systems rarely
assist investigators during the early reasoning stages of a case, when decisions about evidence collection
and hypothesis generation have the greatest operational consequences. A central obstacle to progress
is the scarcity of accessible training data that captures investigative narratives while respecting ethical,
legal, and privacy constraints [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Based on this consideration, publicly available judicial judgments could be a valuable and
underused source of structured narrative information for forensic investigation and AI-assisted forensic
reasoning. Indeed, many criminal judgments contain factual accounts that describe the conduct of
the offense, victim-offender interactions, and investigative observations of previous criminal events
that could support forensic reasoning and hypothesis generation. However, such narratives are
typically interwoven with legal reasoning, precedent discussion, and procedural commentary. Extracting
investigator-relevant narratives from these documents therefore requires careful linguistic separation
and systematic validation, based on insights from rhetorical role labeling and legal discourse analysis
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ].
      </p>
      <p>
        In order to bridge this gap, the goal of this work is to propose a human-centered path toward a forensic
crime narrative corpus derived from judicial records, together with a hybrid Human-in-the-Loop (HITL)
modeling workflow that can isolate, validate, and structure narrative content with high precision and
feasibility. The core claim is not that the process should be fully automated, but that it should be
designed around selective human oversight to ensure reliability, accountability, and practical utility
[
        <xref ref-type="bibr" rid="ref13 ref9">13, 9</xref>
        ]. In detail, we outline a hybrid HITL framework that combines lexicon-guided recall to prioritize
narrative-bearing text, transformer-based functional classification that builds on advances in legal NLP
[
        <xref ref-type="bibr" rid="ref10 ref14">10, 14</xref>
        ], uncertainty-driven active learning to reduce annotation burden [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ], and lightweight event
and timeline assembly to provide incident-level representations [17, 18]. The proposed design aims to
maximize narrative coverage while controlling annotation effort and maintaining auditability, consistent
with best practices in corpus construction and HITL systems [
        <xref ref-type="bibr" rid="ref13 ref15 ref16">15, 16, 13</xref>
        ]. Unlike conventional legal
Natural Language Processing (NLP) work, which often targets judgment outcomes or doctrinal structure
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the proposed corpus and workflow are geared toward modeling crime narratives to support forensic
reasoning and investigative decision support.
      </p>
      <p>
        The intended outcome is a dataset with sentence-level functional annotations that
distinguish narrative events, contextual descriptions, and procedural or legal discourse. In addition,
it is expected to include event templates that capture actors, actions, targets, locations, and temporal
ordering, drawing on principles from event extraction and narrative schema learning [19, 20]. Each
case is also expected to support a canonical narrative reconstruction with explicit links to the source
text in order to preserve provenance and auditability [
        <xref ref-type="bibr" rid="ref13">21, 13</xref>
        ]. The resulting dataset is intended to
enable a range of downstream research tasks that are currently limited by the lack of narrative-centric
forensic data. These include event-sequence modeling of criminal behavior, comparative analysis of
offense patterns across cases, temporal reasoning over investigative actions, and knowledge grounding
for interactive decision-support systems. In particular, such structured crime narratives can serve as
training and evaluation material for AI agents designed to assist law enforcement personnel during
crime-scene investigations, including mixed-reality and situated AI systems that provide contextual
guidance, procedural prompts, and hypothesis support during evidence collection. Within this scope,
the planned release of the dataset is accompanied by a reproducible baseline sentence-level functional
classification model and annotation protocols, intended to establish a shared reference point for future
forensic NLP and HITL investigative AI research.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The Proposed Framework</title>
      <p>
        The proposed architecture follows an incremental and auditable pipeline designed to balance narrative
coverage, annotation efficiency, and robustness across heterogeneous judicial writing styles. The core
design principle is to maximize the recall of crime narrative content while limiting human effort through
selective verification and iterative refinement. Human oversight is incorporated where automated
confidence is insufficient, supporting scalability without sacrificing traceability [
        <xref ref-type="bibr" rid="ref13 ref15">15, 13</xref>
        ]. The general
workflow is illustrated in Figure 1.
      </p>
      <p>In the following, all components of our framework are discussed in detail.</p>
      <sec id="sec-2-1">
        <title>2.1. Judicial Case Records</title>
        <p>
          The source material consists of publicly available criminal judgments in which the conduct of the offense
and the interactions between victims and offenders are described in sufficient detail to support narrative
reconstruction. These documents vary substantially in length, rhetorical structure, and format. Some
judgments provide explicit sections such as Facts or The Offending, whereas others distribute narrative
information across multiple paragraphs without clear demarcation. In this work, extraction is restricted
to factual descriptions of the crime incident and its immediate context. Legal principles, sentencing
rationales, and procedural arguments are treated as contextual metadata and excluded from narrative
modeling. This separation reduces the risk of conflating factual accounts with judicial interpretation
and aligns with established practices in rhetorical role labeling of legal documents [
          <xref ref-type="bibr" rid="ref10 ref12">10, 12</xref>
          ]. At
the same time, judicial narratives are post hoc reconstructions and should not be treated as exhaustive
accounts of crime-scene dynamics. The goal is therefore not to recreate all investigative detail, but to
provide a consistent narrative signal that enables event-oriented analysis, comparative study across
cases, and grounded downstream modeling.
        </p>
        <p>Figure 1: Workflow of the proposed framework. Components shown: Judicial Case Records (criminal judgments and appeals); Document Segmentation and Windowing (paragraphs, section cues, sliding windows); Lexicon-Guided Recall Filter (behavioural and event triggers), fed by the Lexicon Dictionary L0; Transformer-Based Sentence Classifier (model v_t; labels NE/CS/PL/OT); a prediction-confidence branch in which high-confidence outputs go to Lightweight Event Extraction and Schema Generation (actor, action, target, location, time) and then to the Released Forensic Crime-Narrative Dataset, while high-uncertainty outputs go to Human-in-the-Loop Review (expert validation), which drives Lexicon Expansion (L_t ∪ Δ_t).</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Document Segmentation</title>
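        <p>Each judicial judgment is segmented into paragraphs. Let a document <inline-formula><tex-math>D</tex-math></inline-formula> be represented as an ordered
sequence of paragraphs <inline-formula><tex-math>(p_1, p_2, \ldots, p_n)</tex-math></inline-formula>. To preserve local narrative coherence while limiting contextual
drift, overlapping paragraph windows of fixed length <inline-formula><tex-math>k</tex-math></inline-formula> are constructed:</p>
        <p><disp-formula><tex-math>\mathcal{W}(D) = \{\, w_i = (p_i, p_{i+1}, \ldots, p_{i+k-1}) \mid 1 \le i \le n - k + 1 \,\}.</tex-math></disp-formula></p>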
        <p>Where available, section headings such as Facts, The Offending, or Background are treated as weak
structural signals to prioritize narrative-dense regions. When headings are unreliable or absent, sliding
windows preserve coverage throughout the document.</p>
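        <p>As a minimal sketch of the windowing step, assuming the judgment has already been split into a list of paragraph strings, the overlapping windows could be built as follows. The function name paragraph_windows, the parameter name k, and the toy document are illustrative choices and are not part of the framework itself.</p>
        <preformat>
```python
from typing import List, Sequence, Tuple


def paragraph_windows(paragraphs: Sequence[str], k: int) -> List[Tuple[str, ...]]:
    """Return every window of k consecutive paragraphs, in document order."""
    if 1 > k:
        raise ValueError("window length k must be at least 1")
    n = len(paragraphs)
    # 0-based start positions 0 .. n-k correspond to 1 .. n-k+1 in the text.
    # range() is empty when the document has fewer than k paragraphs.
    return [tuple(paragraphs[i:i + k]) for i in range(n - k + 1)]


# Toy document with four paragraphs (placeholder strings).
doc = ["p1", "p2", "p3", "p4"]
windows = paragraph_windows(doc, k=2)
print(windows)
```
        </preformat>
        <p>Under this strict reading, a document shorter than the window length yields no windows; a practical pipeline would probably fall back to a single shortened window in that case, which is a design decision the text leaves open.</p>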
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Lexicon-Guided Recall</title>
        <p>
          An initial domain-aware lexicon <inline-formula><tex-math>L_0</tex-math></inline-formula> is constructed through forensic domain inspection. The lexicon includes
behavioral verbs, motion indicators, references to force or instruments, and expressions commonly
associated with crime narratives. The lexicon is used as a recall-oriented prior rather than as a strict
rule-based filter, which aligns with previous work on weak supervision and lexicon-guided annotation
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
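        <p>A candidate window <inline-formula><tex-math>w</tex-math></inline-formula> is retained if it contains at least one lexical trigger:</p>
        <p><disp-formula><tex-math>g(w; L_t) = \mathbb{1}\left[\exists\, \ell \in L_t : \ell \in w\right].</tex-math></disp-formula></p>
        <p>The retained window set at iteration <inline-formula><tex-math>t</tex-math></inline-formula> is defined as:</p>
        <p><disp-formula><tex-math>\mathcal{W}_t(D) = \{\, w \in \mathcal{W}(D) \mid g(w; L_t) = 1 \,\}.</tex-math></disp-formula></p>
        <p>To reduce selection bias, windows that lack lexical triggers may still be sampled under the active
learning regime so that previously unseen narrative expressions can be identified and incorporated.</p>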
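        <p>The retention rule can be illustrated with a short sketch. It assumes that a window is a sequence of paragraph strings and that a trigger matches when it occurs as a whole word or phrase, ignoring case; the matching rule, the function names, and the three-entry lexicon are assumptions made for illustration, since the text does not specify them.</p>
        <preformat>
```python
import re
from typing import List, Sequence, Set


def contains_trigger(window: Sequence[str], lexicon: Set[str]) -> bool:
    """Indicator function: True if any lexicon entry occurs in the window."""
    text = " ".join(window).lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
        for term in lexicon
    )


def retained_windows(windows: List[Sequence[str]], lexicon: Set[str]) -> List[Sequence[str]]:
    """Keep only the windows for which the indicator equals 1."""
    return [w for w in windows if contains_trigger(w, lexicon)]


# Hypothetical seed lexicon and two toy windows.
lexicon = {"struck", "fled", "knife"}
windows = [
    ("The appellant struck the victim.", "He then fled the scene."),
    ("The court considered the sentencing guidelines.",),
]
kept = retained_windows(windows, lexicon)
print(len(kept))
```
        </preformat>
        <p>Because the filter is a recall-oriented prior, a real system would additionally sample some windows that fail this test and send them to the active learning stage.</p>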
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Functional Classification</title>
        <p>
          Sentences within retained windows are classified according to their functional role in the document,
following rhetorical role labeling traditions [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ]. Each sentence <inline-formula><tex-math>s</tex-math></inline-formula> is assigned to one of four categories:
Narrative Event (NE), Contextual Setting (CS), Procedural Legal (PL), or Other (OT).
        </p>
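        <p>A transformer-based encoder produces a contextual representation <inline-formula><tex-math>h = f(s)</tex-math></inline-formula>, which is passed to a
linear classification head:</p>
        <p><disp-formula><tex-math>P(y = c \mid s) = \mathrm{softmax}(W h + b)_c.</tex-math></disp-formula></p>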
        <p>
          The model is trained using cross-entropy loss over manually verified samples, consistent with standard
fine-tuning approaches in legal NLP [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
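        <p>To make the classification head concrete, the following sketch applies a linear layer and a softmax to a sentence representation. It assumes the encoder output is already available as a plain list of floats; the three-dimensional vector, the weight matrix, and the bias are toy values chosen for illustration, not trained parameters.</p>
        <preformat>
```python
import math
from typing import Dict, List

LABELS = ["NE", "CS", "PL", "OT"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h: List[float], W: List[List[float]], b: List[float]) -> Dict[str, float]:
    """Linear head followed by softmax: one probability per functional label."""
    logits = [
        sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
        for row, b_i in zip(W, b)
    ]
    return dict(zip(LABELS, softmax(logits)))


# Toy encoder output and toy head parameters (one row of W per label).
h = [0.5, -1.0, 2.0]
W = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
b = [0.0, 0.0, 0.0, 0.0]

probs = classify(h, W, b)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```
        </preformat>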
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Human Verification and Active Learning</title>
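        <p>Human review is focused on sentences with high predictive uncertainty, estimated using entropy:
<disp-formula><tex-math>H(s) = -\sum_{c} P(y = c \mid s) \log P(y = c \mid s).</tex-math></disp-formula></p>
        <p>Sentences whose entropy exceeds a predefined uncertainty threshold <inline-formula><tex-math>\eta</tex-math></inline-formula> are routed for expert annotation. Validated
narrative expressions are incorporated into the lexicon through incremental updates:
<disp-formula><tex-math>L_{t+1} = L_t \cup \Delta_t.</tex-math></disp-formula></p>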
        <p>
          This iterative design follows established HITL and active learning practices that aim to improve
annotation efficiency while maintaining quality [
          <xref ref-type="bibr" rid="ref13 ref15 ref16">15, 16, 13</xref>
          ].
        </p>
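        <p>The routing and lexicon-update steps can be sketched as below. The sketch assumes that each sentence arrives with a probability vector over the four functional labels; the threshold value of 0.7, the example sentences, and the validated term are hypothetical and serve only to show the mechanics.</p>
        <preformat>
```python
import math
from typing import List, Sequence, Set, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route(items: List[Tuple[str, Sequence[float]]], threshold: float):
    """Split sentences into auto-accepted and expert-review queues."""
    auto, review = [], []
    for sentence, probs in items:
        if entropy(probs) > threshold:
            review.append(sentence)
        else:
            auto.append(sentence)
    return auto, review


def update_lexicon(lexicon: Set[str], validated_terms: Set[str]) -> Set[str]:
    """Incremental update: union of the current lexicon and validated terms."""
    return set(lexicon) | set(validated_terms)


items = [
    ("The appellant struck the victim.", [0.97, 0.01, 0.01, 0.01]),
    ("He was seen near the premises.", [0.25, 0.25, 0.25, 0.25]),
]
auto, review = route(items, threshold=0.7)  # 0.7 is a hypothetical threshold
lexicon = update_lexicon({"struck", "fled"}, {"was seen near"})
print(auto, review, sorted(lexicon))
```
        </preformat>
        <p>With four classes the entropy ranges from 0 to log 4 (about 1.39), so any threshold has to be chosen within that interval, for example by calibrating against the available annotation budget.</p>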
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Event Extraction and Timelines</title>
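        <p>Sentences classified as Narrative Event are converted into structured event templates:
<disp-formula><tex-math>e_j = (a_j, v_j, o_j, \ell_j, \tau_j),</tex-math></disp-formula>
where <inline-formula><tex-math>a_j</tex-math></inline-formula> denotes the agent, <inline-formula><tex-math>v_j</tex-math></inline-formula> the action, <inline-formula><tex-math>o_j</tex-math></inline-formula> the target, <inline-formula><tex-math>\ell_j</tex-math></inline-formula> the location reference, and <inline-formula><tex-math>\tau_j</tex-math></inline-formula> the temporal
index. This representation is based on the principles of semantic role labeling and temporal event
extraction [17, 18, 20].</p>
        <p>Events are assembled into a case-level timeline:
<disp-formula><tex-math>T = (e_1, e_2, \ldots, e_m), \qquad \tau_1 \le \tau_2 \le \cdots \le \tau_m.</tex-math></disp-formula></p>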
        <p>The representation is intentionally lightweight to ensure robustness across diverse judicial writing
styles and to support scalable validation.</p>
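        <p>One possible in-memory form of the event template and timeline is sketched below. The field names, the integer temporal index, and the source_sentence_id provenance field are assumptions introduced for illustration; the text only fixes the five slots (agent, action, target, location, time).</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    agent: str
    action: str
    target: Optional[str]
    location: Optional[str]
    time_index: int          # relative order within the case, not a timestamp
    source_sentence_id: int  # hypothetical provenance link to the source text


def build_timeline(events: List[Event]) -> List[Event]:
    """Order events by temporal index; sorted() is stable, so ties keep input order."""
    return sorted(events, key=lambda e: e.time_index)


events = [
    Event("the offender", "struck", "the victim", "the kitchen", 2, 14),
    Event("the offender", "entered", "the house", None, 1, 12),
    Event("the offender", "fled", None, "the scene", 3, 15),
]
timeline = build_timeline(events)
print([e.action for e in timeline])
```
        </preformat>
        <p>Optional slots are left as None rather than guessed, which keeps the template lightweight when a judgment does not state a location or target.</p>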
      </sec>
      <sec id="sec-2-7">
        <title>2.7. Forensic Crime Narrative Dataset</title>
        <p>For each judicial case, the proposed framework returns a crime narrative dataset that includes structured
metadata, sentence-level functional labels with confidence and verification status, event templates with
temporal ordering, and narrative reconstructions linked to the source text. The release is intended to
include annotation guidelines, inter-annotator agreement statistics [22], offense-type coverage analysis,
and baseline benchmarks.</p>
        <p>In addition to the annotated data, a frozen sentence-level functional classification model trained on the
final verified corpus will be released as a reference artifact. This model is provided to support reproducibility
and comparative evaluation, and is not intended to represent an optimized or deployment-oriented
system.</p>
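        <p>To give a sense of what one case-level record might look like, the sketch below builds a small JSON-serializable structure and checks that every event and narrative step points back to an existing sentence. All field names, identifiers, and values are hypothetical; the actual release schema is not specified in the text.</p>
        <preformat>
```python
import json

# Hypothetical record layout: metadata, labelled sentences, events, narrative.
record = {
    "case_id": "EXAMPLE-0001",
    "metadata": {"jurisdiction": "unspecified", "offense_type": "unspecified"},
    "sentences": [
        {"id": 12, "text": "The offender entered the house.", "label": "NE",
         "confidence": 0.95, "verified": False},
        {"id": 14, "text": "The offender struck the victim.", "label": "NE",
         "confidence": 0.61, "verified": True},
    ],
    "events": [
        {"agent": "the offender", "action": "entered", "target": "the house",
         "location": None, "time_index": 1, "source_sentence_id": 12},
        {"agent": "the offender", "action": "struck", "target": "the victim",
         "location": None, "time_index": 2, "source_sentence_id": 14},
    ],
    "narrative": [
        {"text": "The offender entered the house and struck the victim.",
         "source_sentence_ids": [12, 14]},
    ],
}


def links_are_valid(rec: dict) -> bool:
    """Check that all provenance links refer to sentences present in the record."""
    known = {s["id"] for s in rec["sentences"]}
    events_ok = all(e["source_sentence_id"] in known for e in rec["events"])
    narrative_ok = all(
        set(step["source_sentence_ids"]).issubset(known)
        for step in rec["narrative"]
    )
    return events_ok and narrative_ok


serialized = json.dumps(record, indent=2)
print(links_are_valid(json.loads(serialized)))
```
        </preformat>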
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Validation and Limitations</title>
      <p>
        The evaluation is expected to cover functional classification performance, event extraction quality,
and annotation efficiency. Classification can be assessed using macro-averaged precision, recall, and
F1 score. Event extraction quality can be measured via slot-level accuracy for event arguments, and
efficiency can be quantified as the reduction in human annotation effort relative to non-HITL baselines
[
        <xref ref-type="bibr" rid="ref13 ref15 ref16">15, 16, 13</xref>
        ].
      </p>
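      <p>The two quality measures named above can be computed as in the following sketch. It assumes gold and predicted labels are aligned sentence by sentence, and that gold and predicted events are already aligned one-to-one, which is a simplification; the label sequences and the event dictionaries are toy data.</p>
      <preformat>
```python
from typing import Dict, List, Sequence, Tuple

SLOTS = ("agent", "action", "target", "location", "time_index")


def macro_prf(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1 over the given labels."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def slot_accuracy(gold_events: List[Dict], pred_events: List[Dict]) -> float:
    """Fraction of event slots whose predicted value equals the gold value."""
    correct = total = 0
    for g, p in zip(gold_events, pred_events):
        for slot in SLOTS:
            total += 1
            correct += int(g.get(slot) == p.get(slot))
    return correct / total if total else 0.0


labels = ["NE", "CS", "PL", "OT"]
gold = ["NE", "NE", "CS", "PL", "OT", "NE"]
pred = ["NE", "CS", "CS", "PL", "OT", "NE"]
precision, recall, f1 = macro_prf(gold, pred, labels)

gold_events = [{"agent": "the offender", "action": "struck", "target": "the victim",
                "location": "the kitchen", "time_index": 2}]
pred_events = [{"agent": "the offender", "action": "struck", "target": "the victim",
                "location": None, "time_index": 2}]
accuracy = slot_accuracy(gold_events, pred_events)
print(round(precision, 3), round(recall, 3), round(f1, 3), accuracy)
```
      </preformat>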
      <p>The resulting corpus is intended to support research on investigative decision support,
precedent-based crime pattern comparison, and knowledge grounding for mixed-reality systems that assist
investigators in reconstructing event sequences. These applications should be treated as advisory rather
than as determinative. The framework is designed to preserve provenance links to enable inspection
and to discourage ungrounded inference.</p>
      <p>
        Judicial narratives are post hoc reconstructions and may reflect legal priorities rather than
investigative completeness. They may also encode biases embedded in judicial processes. These constraints
require careful interpretation and appropriate safeguards in downstream use [
        <xref ref-type="bibr" rid="ref9">9, 23</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The goal of this paper is to propose a human-centered approach to constructing a forensic crime
narrative corpus from judicial records. By combining lexicon-guided recall, transformer-based functional
classification, and uncertainty-driven selective human oversight, the proposed framework aims to
address data scarcity while supporting transparency and auditability. The design is intended to provide
a foundation for future empirical work on forensic narrative modeling and AI-assisted investigative
reasoning.</p>
    </sec>
    <sec id="sec-5">
      <title>Author Contributions</title>
      <p>Bilal Ahmed is the main contributor to this work and led the conceptualization of the study, the design
of the proposed methodology, and the writing of the manuscript. Giovanni Acampora and Autilia
Vitiello provided supervision, methodological guidance and critical feedback, with particular support
on research formulation, machine learning design, and evaluation plan. All authors reviewed the
manuscript.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[17] O. Kolomiyets, M.-F. Moens, Extracting narrative timelines as temporal dependency structures, in: Proceedings of ACL, 2012.</p>
      <p>[18] W. Yao, B. Haghighi, H. Poon, S. Riedel, Temporal event knowledge acquisition via identifying before/after relations, in: Proceedings of ACL, 2018.</p>
      <p>[19] N. Chambers, D. Jurafsky, Unsupervised learning of narrative event chains, in: Proceedings of EMNLP, 2008.</p>
      <p>[20] Z. Zhang, E. Strubell, E. Hovy, Transfer learning from semantic role labeling to event argument extraction, in: Proceedings of EMNLP, 2022.</p>
      <p>[21] P. Kalamkar, U. Sreejith, B. Nayak, K. Shrivastava, Corpus for automatic structuring of legal documents, in: Proceedings of the Language Resources and Evaluation Conference (LREC), 2022.</p>
      <p>[22] K. Krippendorff, Reliability in content analysis, Human Communication Research 30 (2004) 298–310.</p>
      <p>[23] S. Barocas, A. D. Selbst, Big data’s disparate impact, California Law Review 104 (2016) 671–732.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Gambino</surname>
          </string-name>
          ,
          <article-title>Crime scene investigation</article-title>
          ,
          <source>in: Handbook of Forensic Psychology</source>
          , 2 ed., Wiley-Blackwell,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Garofano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ricci</surname>
          </string-name>
          , G. Acampora,
          <article-title>Bloodstain pattern analysis as optimisation problem</article-title>
          , Forensic science international
          <volume>266</volume>
          (
          <year>2016</year>
          )
          <fpage>e79</fpage>
          -
          <lpage>e85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Acampora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Garofano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitiello</surname>
          </string-name>
          ,
          <article-title>Applying density-based clustering for bloodstain pattern analysis</article-title>
          ,
          <source>in: 2021 IEEE International Conference on Systems, Man, and Cybernetics</source>
          (SMC), IEEE,
          <year>2021</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Acampora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saliva</surname>
          </string-name>
          , L. Garofano,
          <article-title>Bloodstain pattern analysis-a new challenge for computational intelligence community</article-title>
          ,
          <source>in: International Conference on Fuzzy Computation Theory and Applications</source>
          , volume
          <volume>2</volume>
          , SCITEPRESS,
          <year>2014</year>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Galante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotroneo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Furci</surname>
          </string-name>
          , G. Lodetti,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Casali</surname>
          </string-name>
          ,
          <article-title>Applications of artificial intelligence in forensic sciences: Current potential benefits, limitations and perspectives</article-title>
          ,
          <source>International Journal of Legal Medicine</source>
          <volume>137</volume>
          (
          <year>2023</year>
          )
          <fpage>445</fpage>
          -
          <lpage>458</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Acampora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Garofano</surname>
          </string-name>
          ,
          <article-title>Towards automatic bloodstain pattern analysis through cognitive robots</article-title>
          ,
          <source>in: 2015 IEEE International Conference on Systems, Man, and Cybernetics</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>2447</fpage>
          -
          <lpage>2452</lpage>
          . doi:10.1109/SMC.2015.428.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Bommarito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Blackman</surname>
          </string-name>
          ,
          <article-title>A general approach for predicting the behavior of the supreme court of the united states</article-title>
          ,
          <source>PLOS ONE</source>
          <volume>12</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          , I. Androutsopoulos,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aletras</surname>
          </string-name>
          ,
          <article-title>Neural legal judgment prediction in english</article-title>
          ,
          <source>in: Proceedings of EMNLP-IJCNLP</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Jaidka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sap</surname>
          </string-name>
          ,
          <article-title>Investigating biases in legal ai systems</article-title>
          ,
          <source>in: AAAI/ACM Conference on AI, Ethics, and Society (AIES)</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Rhetorical role labelling for legal judgements using transformer-based models</article-title>
          ,
          <source>in: FIRE Workshop on Artificial Intelligence for Legal Assistance (AILA)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Cope</surname>
          </string-name>
          ,
          <article-title>Rhetorical roles of legal documents: A survey</article-title>
          ,
          <source>International Journal of Legal Information</source>
          <volume>42</volume>
          (
          <year>2016</year>
          )
          <fpage>198</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <article-title>Rhetorical role labeling of legal documents using transformers and graph neural networks</article-title>
          ,
          <source>in: SemEval-2023 Task 6</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Minervini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Benckendorf</surname>
          </string-name>
          , et al.,
          <article-title>A human-in-the-loop improves annotation error detection</article-title>
          ,
          <source>in: Findings of ACL</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of NAACL-HLT</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>U.</given-names>
            <surname>Hahn</surname>
          </string-name>
          , E. Buyko,
          <string-name>
            <given-names>R.</given-names>
            <surname>Landefeld</surname>
          </string-name>
          , et al.,
          <article-title>Active learning-based corpus annotation: the PathoJen corpus</article-title>
          ,
          <source>Journal of Biomedical Semantics</source>
          <volume>3</volume>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Clancy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cote</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seering</surname>
          </string-name>
          ,
          <article-title>Active Learning with a Human in the Loop</article-title>
          ,
          <source>Technical Report, MITRE</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>