<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Hersonissos, Greece
Corresponding author.
Email: leila.feddoul@uni-jena.de (L. Feddoul); sarah.bachinger@uni-jena.de (S. T. Bachinger);
marianne.mauch@uni-jena.de (M. Mauch)
URL: https://www.fusion.uni-jena.de/people/details/leila-feddoul (L. Feddoul);
https://www.zedif.uni-jena.de/de/team/sarah-bachinger.html (S. T. Bachinger)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>GerPS-NER: A Dataset for Named Entity Recognition to Support Public Service Process Creation in Germany</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leila Feddoul</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarah T. Bachinger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Clara Lachenmaier</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Apel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pirmin Karg</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norman Klewer</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denys Forshayt</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robin Erd</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marianne Mauch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City Administration Jena</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Competence Center Digital Research (zedif), Friedrich Schiller University Jena</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Computational Linguistics, Bielefeld University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>For the end-to-end digitization of public administration in Germany, reliable entity recognition in legal texts is essential to support the creation of corresponding processes (legal norm analysis). To the best of our knowledge, datasets that can serve to train and test machine learning models on this specific entity recognition task with our domain-specific categories do not exist. Therefore, we present GerPS-NER, a dataset for entity recognition to support the legal norm analysis, consisting of 24k sentences from German law documents annotated with ten categories (e.g., main actor). We showcase the dataset generation workflow, including the data collection, the annotation guidelines, and the annotation phases. The dataset [1] is publicly available under a CC-BY 4.0 license.</p>
      </abstract>
      <kwd-group>
        <kwd>Dataset</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Legal Norm Analysis</kwd>
        <kwd>Annotation</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Public Administration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Public administration institutions follow processes based on legislation (e.g., laws and
ordinances) when delivering a public service, such as car registration, to citizens or companies. In
Germany, the Federal Information Management (FIM) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides standardized methods for
analyzing such legal bases. Trained public administration employees mark relevant terms or
sentences with categories, such as the main actor or the result receiver, and create a list of
discovered process steps together with the related data fields<sup>1</sup>. Finally, they assemble all these
elements into a process using a restricted and extended BPMN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] notation. This process is
used to create web forms that allow potential applicants to use the service online. In
this way, the service, the associated process, and the data fields matching specific process steps
are all derived from the underlying legal basis. This annotation of legal texts with
categories is called the FIM legal norm analysis. It is the first step in the creation
of administrative services and identifies all involved process elements (e.g., steps or
actors).
      </p>
      <p>
        The Canaréno [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (“Computer-assisted analysis of electronically available legal norms”) project
is one of three research projects of the “Open Design of Digital Administrative Architectures
(openDVA)”<sup>2</sup> working group. The group investigates the path from the legal text to its digital
implementation and how this path can be simplified and partially automated, for both new and existing legal texts.
The project aims to support the manual and time-consuming FIM analysis of German legal
norms by automatically generating suggestions that assign categories<sup>3</sup> to relevant
terms or sentences, allowing users to review them (accept, edit, or delete), and capturing corrections to
continuously improve the system (Human-in-the-Loop [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). We aim to leverage techniques
from natural language processing, such as Named Entity Recognition (NER) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], for the
category detection and investigate different methods for this task. Supervised
machine learning approaches are promising, but some methods need a relatively large amount
of annotated training data. Since such data is not available, we create the GerPS-NER corpus,
which can serve as training and evaluation data for any model that aims to annotate German
legal texts with our categories. In the following, we focus on explaining the workflow for the
creation of this corpus. Our contributions can be summarized as follows:
• Describing the dataset creation workflow, including our iterative approach for designing
annotation guidelines.
• Proposing GerPS-NER, a dataset consisting of 24k sentences annotated with ten categories
of the legal norm analysis.
      <p>
        The dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is publicly available under a CC-BY 4.0 licence, together with the code used
for law crawling, data processing, and the conversion of the annotated corpus into the final
standard format (IOB2 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Some datasets for NER on German legal texts exist. For example, Leitner et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] created LER [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a dataset of German court decisions annotated with 19 entity types (e.g., judge). Darji et al. [10]
published a dataset of legal references in German law (e.g., Buch (book)) that were annotated by
law experts. Wrzalik and Krechel [11] created GerDaLIR, a dataset for German legal information
retrieval that is built from case documents of the open legal platform Open Legal Data and
provides labelled queries. To the best of our knowledge, no corpus exists that is annotated
with the categories needed for legal norm analysis on German legal texts.
      </p>
      <p><sup>1</sup> Data fields represent the different pieces of information in online forms (e.g., first name, profession) that
the result receiver must provide to apply for a service.</p>
      <p><sup>2</sup> www.opendva.de</p>
      <p><sup>3</sup> Refer to Table 1 for the definitions.</p>
      <p>[Figure 1: dataset creation workflow. Box labels: Data Collection; Annotation Guidelines; Corpus; Pilot Annotation Phase; Real Annotation Phase; Annotation Agreement; Adjudication; GerPS-NER.]</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Creation Workflow</title>
      <p>The pipeline for creating the training corpus is depicted in Figure 1 and is based on [12]. First,
a conceptual model was defined to set the categories used for annotation and their potential
relations (refer to [13, 14] for more details about the created ontology). Next, relevant text data
was collected to construct the corpus, and a first draft of the annotation guidelines was written
(refer to Section 4). After the annotators had been instructed, the actual annotation phase started. We
divided it into two parts: (1) A pilot annotation phase with a small number of documents, in which the
agreement between the different annotators (Inter-Annotator Agreement (IAA)<sup>4</sup>) is measured
after each iteration. Here, the task definitions and guidelines are evaluated and improved based
on the annotators’ feedback. In an adjudication step, annotation mismatches are resolved and a first
annotated corpus (gold standard) is created. (2) A real annotation phase, in which the improved
guidelines are applied but no longer fundamentally changed. Each remaining document is
annotated by a single annotator only, so no IAA is calculated.</p>
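      <p>To illustrate how the IAA between two annotators could be measured, the following minimal sketch computes Cohen’s kappa [15] over token-level labels. The function name, the token-level granularity, and the example labels are assumptions made for this illustration; they are not taken from the project code.</p>
      <preformat>
```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty token list")
    n = len(labels_a)
    # Observed agreement: share of tokens that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, derived from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Illustrative labels only; the category assignment is made up for this example.
ann_1 = ["O", "B-Bedingung", "O", "O", "B-Handlungsgrundlage", "O"]
ann_2 = ["O", "B-Bedingung", "O", "B-Handlungsgrundlage", "B-Handlungsgrundlage", "O"]
kappa = cohens_kappa(ann_1, ann_2)
```
      </preformat>
      <p>A value of 1 indicates perfect agreement, while a value near 0 indicates agreement at chance level. Span-level rather than token-level comparison would be an equally plausible design choice.</p>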
      <sec id="sec-3-1">
        <title>3.1. Concept Model and Data Collection</title>
        <p>The categories used for GerPS-NER are based on those defined by the FIM legal norm analysis.
They were extended during the project, in agreement with the experts, to allow a more concise
process description. A basic definition of the categories is given in Table 1. For a more detailed
description, refer to the latest annotation guideline<sup>5</sup>.</p>
        <p>After the definition and semantic description of the categories that should be detected in
the law texts, relevant data needs to be collected. For this purpose, we gather law paragraphs
that are used as a basis for the creation of public service processes in Germany.</p>
        <sec id="sec-3-1-1">
          <p>The website FIM-Portal<sup>6</sup> gives an overview of available services. Where available, links to the relevant law
paragraphs are accessible via the “Rechtsgrundlage” (legal basis) field of each service.</p>
          <p><sup>4</sup> A measure of the agreement between two annotators, e.g., Cohen’s kappa [15].</p>
          <p><sup>5</sup> [1] in folder annotation_guidelines/10-01-2024_Annotation_Guide_V4.pdf</p>
          <p>We started with an internal initial pilot phase that involved three human annotators and
10 documents randomly selected from the data we had collected from the FIM-Portal. An
analysis of the service law paragraphs collected with the first version of the
data collection code revealed a non-uniform distribution of the services with respect to their
groups<sup>7</sup> (e.g., health), meaning that some service groups were represented more often than others. In
addition, we noticed that we covered only 33 of the 160 available service groups (21%). For this
reason, we used a different data collection strategy for the real annotation phase. It
covers all groups in a relatively uniform way and thus creates a diverse and balanced
training corpus. To create this corpus, a list of services provided by the FIM-Portal is used as input.</p>
        </sec>
        <sec id="sec-3-1-2">
          <p><sup>6</sup> https://fimportal.de/</p>
          <p><sup>7</sup> Refer to https://www.xrepository.de/api/xrepository/urn:de:fim:leika:leistungsgruppierung_20231229/download/FIMLeiKaLeistungsgruppierungen_20231229.xlsx for the different service groups.</p>
          <p><sup>8</sup> Downloads_im_CSV-Format__LeiKa-plus__Alle_Leistungen_inkl._inhaltlicher_Beschreibungen__mit_HTML.csv</p>
          <p>
            This list is given as a CSV file<sup>8</sup> that includes the content of the corresponding HTML pages. The file
can be downloaded after registration from the internal section of the FIM-Portal catalogue [16]
and contains the links to the law paragraphs corresponding to the different services. Only links
that point to a paragraph on the website gesetze-im-internet.de<sup>9</sup> are considered<sup>10</sup>. For each
of those links, we crawl the content from gesetze-im-internet.de. We select a fixed number (10)
of services from each group type. Each collection of paragraphs related to a specific service is
stored in a separate document and identified by the service ID (LeiKa [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]). Ideally, there would
be 10 documents for each type of service, but some types did not have enough services, so
fewer documents are available for them. In total, we collected 1020 documents from 141 service
types. Note that we first removed the 10 documents already annotated during the
initial pilot phase, which means that 1010 documents were available for the real annotation
phase. The code for data crawling [17] and the balanced corpus without annotations [18] are
published on Zenodo under a CC BY 4.0 licence.
          </p>
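          <p>The selection strategy described above can be sketched as follows. This is a minimal illustration and not the published crawling code [17]: the record layout (keys service_id, group, links), the regular expression for paragraph links, and the fixed random seed are all assumptions.</p>
          <preformat>
```python
import random
import re
from collections import defaultdict

# Assumed pattern: paragraph pages end in "__ID.html", as in the example
# https://www.gesetze-im-internet.de/betrsichv_2015/__19.html
PARAGRAPH_LINK = re.compile(
    r"^https://www\.gesetze-im-internet\.de/[^/]+/__[0-9a-z]+\.html$"
)


def paragraph_links(links):
    """Keep only links that point to a single paragraph, not to an entire law."""
    return [link for link in links if PARAGRAPH_LINK.match(link)]


def sample_services(services, per_group=10, seed=0):
    """Pick up to `per_group` services with usable links from every service group."""
    by_group = defaultdict(list)
    for service in services:
        if paragraph_links(service["links"]):
            by_group[service["group"]].append(service)
    rng = random.Random(seed)
    selected = []
    for group in sorted(by_group):
        candidates = by_group[group]
        # Groups with too few candidates contribute all of them.
        selected.extend(rng.sample(candidates, min(per_group, len(candidates))))
    return selected


# Hypothetical input: 12 services with a paragraph link, 1 with a whole-law link.
services = [
    {"service_id": f"svc-{i}", "group": "health",
     "links": ["https://www.gesetze-im-internet.de/betrsichv_2015/__19.html"]}
    for i in range(12)
] + [
    {"service_id": "svc-law-only", "group": "traffic",
     "links": ["https://www.gesetze-im-internet.de/betrsichv_2015/"]}
]
picked = sample_services(services)
```
          </preformat>
          <p>Capping the sample per group, rather than sampling from the whole list, is what keeps frequent groups from dominating the corpus.</p>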
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Annotation Phase</title>
        <p>With the aim of iteratively creating an annotation guideline for a larger annotation campaign,
we started an initial pilot phase with a set of 10 documents. The annotation was performed by
three annotators using the annotation tool INCEpTION [19], followed by a calculation of the
IAA, an adjudication step to resolve mismatches, and a discussion of open issues with domain
experts. In the following, we give more details about the adjudication step of the initial
pilot phase and about the real annotation phase.</p>
        <p>3.2.1. Adjudication (Initial pilot phase)</p>
        <p>The adjudication, i.e., the consolidation of the annotations of all three
annotators into one annotated gold standard document, was performed as follows using the curator
(adjudicator) interface of INCEpTION [19]:
• Consider only the lines with annotations.
• Automatically accept annotations on which at least two annotators agree.
• If only two out of three annotators annotated a sentence (meaning the third judged the
sentence not relevant), consider the annotations that were marked.
• Correct annotations based on the newest version of the annotation guidelines.
• For annotations on which fewer than two annotators agree (each of the three annotators has
annotated differently):
– Try to converge and agree on one annotation.
– If this is not possible, record it as an open issue to discuss with the expert. We grouped open
questions per document<sup>11</sup>.
• If possible guideline extensions arise during the process, write them down and add them
to the guidelines in the corresponding place.</p>
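        <p>The automatic part of these rules amounts to a majority vote per annotated span. The sketch below illustrates that idea only; in the project, adjudication was carried out manually in the INCEpTION curation interface. The function name, the data layout, and the choice to flag a span annotated by a single annotator for discussion are assumptions.</p>
        <preformat>
```python
from collections import Counter


def adjudicate(span_labels):
    """Resolve one span from the labels of three annotators.

    `span_labels` holds one entry per annotator; None means "not annotated".
    Returns a pair (label, needs_discussion).
    """
    marked = [label for label in span_labels if label is not None]
    if not marked:
        return None, False  # nobody annotated the span, nothing to curate
    label, votes = Counter(marked).most_common(1)[0]
    if votes >= 2:
        return label, False  # at least two annotators agree: accept automatically
    # No two annotators agree: curators try to converge, otherwise ask the expert.
    return None, True


accepted = adjudicate(["Bedingung", "Bedingung", None])
open_issue = adjudicate(["Bedingung", "Handlungsgrundlage", "Dokument"])
```
        </preformat>
        <p>Here, the first call accepts the label shared by two annotators, while the second call marks the span as an open issue.</p>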
        <sec id="sec-3-2-1">
          <p><sup>9</sup> www.gesetze-im-internet.de</p>
          <p><sup>10</sup> E.g., https://www.gesetze-im-internet.de/betrsichv_2015/__19.html, which links to paragraph 19, but not https://www.gesetze-im-internet.de/betrsichv_2015/, which links to the entire law.</p>
          <p><sup>11</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder adjudication/expert_template.md</p>
          <p>[Figure 2: overview of the projects in the real annotation phase. Recoverable text for Project 0: Long-Short-Span-Annotation (Guideline V1); 3 annotators per document; 20 documents; annotators S1, S2, E; no curation; status: finished.]</p>
          <p>The guideline additions that originated from the adjudication phase are part of the
“Annotation_Guide_V0.2”<sup>12</sup>.</p>
          <p>The adjudication of the 10 documents of the initial pilot phase resulted in a first partial gold
standard corpus. We provide a short<sup>13</sup> and an extended<sup>14</sup> version of the same corpus, depending
on the considered annotation span.</p>
          <p>3.2.2. Real Annotation Phase</p>
          <p>A larger annotation campaign involved four annotators: three law students (S1–3) and an
employee of the municipal administration of the city of Jena (E), as displayed in Figure 2. It also
started with a pilot phase involving a small fraction of the remaining documents. This training
phase ensures that all new annotators have the same understanding of the developed annotation
guideline and can annotate consistently. The annotation campaign consists of four phases,
each of which is organized as an individual project in INCEpTION [19]. Each annotator received
individual training at the beginning of their working time, which involved several individual
meetings. The annotators were required to record questions, comments, and the time they spent
annotating for each annotated document. The individual phases are discussed in detail below.</p>
          <p><sup>12</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder adjudication/curation_guidelines_addition.md</p>
          <p><sup>13</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder intermediate_corpora/adjudication/short</p>
          <p><sup>14</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder intermediate_corpora/adjudication/extended</p>
          <p>Project 0. With three annotators (S1, S2, and E), the first 20 documents were annotated
according to the first version of the annotation guidelines<sup>15</sup>. The aim was to create two versions
of the same corpus depending on the considered annotation span (long and short<sup>16</sup>). The
weekly discussions revealed several points that called for a revision of the
annotation guidelines, mainly through additions to the “general rules” and “specific rules”
sections, which can be summarized as follows:
• Allow the annotation of nested occurrences (entity mentions embedded in longer entity
mentions, e.g., a condition mentioning a required document).
• Annotate only the start and end of longer spans consisting of more than one word.
• If there is an “or” between words of the same type (e.g., “Prüfung oder
Befähigungsnachweise” (examination or certificates of qualification)), these should be annotated as single
units.
• If there is a definition relevant to the service in the text, the entire definition is annotated
using the category that is also used for the defined concept.
• If the title of the service description mentions “Intended for cancellation”<sup>17</sup> or something
similar indicating that this service is not up to date, the text should not be annotated.
• Negatively formulated actions or conditions, such as “Eine Approbation wird nicht erteilt”
(A licence is not granted), must be marked as negative. For this purpose, there is a “Negation”
checkbox, which has “No” selected by default. If there is a negation, the annotators must
select “Yes”.
          </p>
          <p>Each of the annotators annotated all 20 documents. These were adjudicated (curated)
by E, initially just as a test round, following the previously described adjudication process of the
initial pilot phase. This can be seen as training for E, who takes over the adjudication in
the next phase. Note that this phase corresponds to the first iteration (iteration 0) of the internal
initial pilot phase as described in Section 3.2.1: it was essentially a test phase, and the same
documents are annotated again in the next iteration.</p>
          <p>Project 0.1. In this phase, the same first 20 documents were annotated again according to the
second version of the annotation guidelines<sup>18</sup>, with the same aim of creating two corpus
versions. This time, however, a start-end annotation was used, in which only the
start and the end of relevant spans that consist of more than one word are annotated. The weekly
discussions revealed that the annotation of long spans in particular led to problems because,
despite the continuous expansion of the guidelines, the scope of what counts as a long-span
annotation could not be clearly defined. The guidelines were therefore adapted again to consider only
short spans, and the following information about the annotation scope replaced all sections
that describe long span annotations:</p>
          <p>As a rule, only one word is annotated per category. For example, let’s consider the document
“Bescheinigung über die Wohnberechtigung” (certificate of entitlement to residence). Although
there is a phrase that describes the certificate in more detail, the core of the entire phrase and
the word that can be considered as a document on its own is the certificate (“Bescheinigung”),
which should therefore be annotated alone. Exceptions are the categories Bedingung (condition)
and Handlungsgrundlage (legal basis).</p>
          <p>Each of the annotators annotated all 20 documents.
The documents were curated by E, who continued to follow the previous adjudication
process and thus only intervened if there were fewer than two matches in an annotation. This
project provides an intermediate gold standard with 20 documents<sup>19</sup>.</p>
          <p><sup>15</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/30-03-2023_Annotation_Guide_V1.pdf</p>
          <p><sup>16</sup> For the long span annotations, we consider more information than for the short span; refer to the aforementioned
annotation guideline for examples.</p>
          <p><sup>17</sup> E.g., in German: “Vorgesehen zur Löschung”</p>
          <p><sup>18</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/13-10-2023_Annotation_Guide_V2.pdf</p>
          <p>Project 1. With the three annotators, 30 further documents were annotated according to
the third version of the annotation guidelines<sup>20</sup>. From this project on, only a short span version of
the corpus with single-word annotations (with few exceptions) is created. Each
document was annotated by two of the annotators. For this purpose, each document was
assigned one of the three combinations S1+S2, S1+S3, and S2+S3 to ensure that each person
annotated equally often with each of the other two. The adjudication was performed by E. Here,
the guideline underwent only minor changes, such as the correction of typos, the removal of unneeded rules<sup>21</sup>,
the extension or adaptation of some passages<sup>22</sup>, and the removal of the negation checkbox because
it was only rarely used. In the end, one document was removed<sup>23</sup> because the legal basis
corresponding to the service had changed. This document was therefore not annotated, and the final
number of annotated documents is 29. This project provides an intermediate gold standard with
29 documents<sup>24</sup>.</p>
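          <p>One simple way to obtain such a balanced pairing is to rotate through all annotator pairs. The following sketch assumes this round-robin scheme and uses made-up document IDs; the source does not state how the pairs were actually assigned.</p>
          <preformat>
```python
from itertools import combinations, cycle


def assign_pairs(document_ids, annotators=("S1", "S2", "S3")):
    """Assign each document to one annotator pair, rotating through all pairs."""
    pairs = cycle(combinations(annotators, 2))
    return {doc_id: next(pairs) for doc_id in document_ids}


# Hypothetical IDs for the 30 documents of Project 1.
assignment = assign_pairs([f"doc-{i:02d}" for i in range(30)])
```
          </preformat>
          <p>With 30 documents and three pairs, every pair receives 10 documents, so each annotator works on 20 documents and shares 10 with each colleague.</p>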
          <p>Project 2. With the three annotators, the remaining 960 documents are being annotated
according to the fourth version of the annotation guidelines<sup>25</sup>. Each document is now
processed by only one annotator, so adjudication is no longer necessary. From December
1st, 2023, annotation continued with only two annotators (S2 and S3). This project provides an
intermediate gold standard with the 775 of the 960 documents that had been annotated by January
31st, 2024<sup>26</sup>.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Annotation Guidelines</title>
      <p>A crucial aspect in the creation of training corpora with human annotations is a set of precise and
comprehensive annotation guidelines. They define the task more precisely and aim to ensure
consistent annotations by different annotators, which is important when training models. We
created our annotation guidelines with the following process:</p>
      <p>
        1. Creation of a rudimentary set of guidelines
2. Annotation of a small number of documents by more than one annotator
3. Calculation of the IAA to show areas of high and low agreement
4. Discussion of issues and clarification
5. Refinement of the guidelines
6. Start again with Step 2
      </p>
      <p>The initial pilot phase (10 documents from the unbalanced data collection) started with a very
basic and short version of the guidelines, “Annotation_Guide_V0”<sup>27</sup>. This first iteration
(iteration 0) led to a more detailed annotation guideline, “Annotation_Guide_V0.1”<sup>28</sup>,
which was used to annotate the same 10 documents again (iteration 1). This second iteration
made it possible to refine the annotation guideline again and to generate the first draft of the main
version, “Annotation_Guide_V0.2”<sup>29</sup>, which consists of additions made after the interview with the expert
(a FIM coach) and the corresponding adjudication phase. This resulted in the first main version,
“Annotation_Guide_V1”<sup>30</sup>, which is used in the real annotation phase with four different annotators
(three law students and one employee of the municipal administration of the city of Jena, refer to
Figure 2). This version was refined during the real annotation phase based on the comments
and requirements of the new annotators, but not fundamentally changed. This resulted in
three further versions of the annotation guideline, each of which is used in a specific project of the real
annotation phase as described in Section 3.2.2.</p>
      <p><sup>19</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder code/intermediate_corpora/annotation/Normenanalyse_0.1</p>
      <p><sup>20</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/13-10-2023_Annotation_Guide_V3.pdf</p>
      <p><sup>21</sup> E.g., “If there is a definition relevant to the service in the text, the entire definition is annotated using the category
that is also used for the defined concept”</p>
      <p><sup>22</sup> E.g., extended the explanation of the scope of annotation part</p>
      <p><sup>23</sup> a99040004076000.txt</p>
      <p><sup>24</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder code/intermediate_corpora/annotation/Normenanalyse_1</p>
      <p><sup>25</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/10-01-2024_Annotation_Guide_V4.pdf</p>
      <p><sup>26</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder code/intermediate_corpora/annotation/Normenanalyse_2</p>
    </sec>
    <sec id="sec-5">
      <title>5. GerPS-NER Dataset</title>
      <sec id="sec-5-1">
        <title>5.1. Conversion Scripts</title>
        <p>
          A commonly used data format for annotated data is IOB, short for Inside-Outside-Beginning. It consists
of a file in which each line contains a token and the assigned label, separated by whitespace. Here,
we use IOB2, which was defined in the CoNLL-2002 shared task [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. To indicate the boundaries
of an annotation, the label of its first token is prefixed with B-, and the label of every further
token of the same annotation is prefixed with I-. In contrast, IOB1 uses the B- prefix only if
an annotation immediately follows another annotation of the same type. Tokens without an annotation are
labelled O.
        </p>
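        <p>As an illustration of the IOB2 scheme, the following sketch converts token-index spans into IOB2 lines. It is not one of the project’s conversion scripts; the function name, the span representation, the example tokens, and the label “Dokument” are assumptions made for this example.</p>
        <preformat>
```python
def to_iob2(tokens, spans):
    """Convert token-index spans to IOB2 lines.

    `spans` is a list of (start, end, label) with `end` exclusive; the spans
    are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the annotation
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # every further token of the same annotation
    return [f"{token} {tag}" for token, tag in zip(tokens, tags)]


# Made-up example sentence fragment with one single-token and one multi-token span.
tokens = ["Bescheinigung", "nach", "§", "19", "BetrSichV"]
lines = to_iob2(tokens, [(0, 1, "Dokument"), (2, 5, "Handlungsgrundlage")])
# lines == ["Bescheinigung B-Dokument", "nach O", "§ B-Handlungsgrundlage",
#           "19 I-Handlungsgrundlage", "BetrSichV I-Handlungsgrundlage"]
```
        </preformat>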
        <p>The intermediate corpora were generated using scripts that transform the files in WebAnno TSV
3.3 format<sup>31</sup> exported from INCEpTION [19] into the IOB2 format needed by the models for training
and testing. As the projects in the real annotation phase differ in the style of annotation,
an individual conversion script was necessary for each project<sup>32</sup>. Although more information
was collected in the annotation phases, in case of overlapping annotations we consider only the
largest annotated entities when generating the IOB2 file, because the IOB2 format does not support
overlapping annotations. Overlap was allowed starting from “Project 0.1” of the real annotation
phase. This allows capturing as much information as possible during annotation and,
as future work, testing the prediction of nested entities with machine learning models. In
addition, we do not consider negation during the IOB2 file generation.</p>
        <p><sup>27</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/Annotation_Guide_V0.md</p>
        <p><sup>28</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/Annotation_Guide_V0.1.md</p>
        <p><sup>29</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/Annotation_Guide_V0.2.md</p>
        <p><sup>30</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder annotation_guidelines/30-03-2023_Annotation_Guide_V1.pdf</p>
        <p><sup>31</sup> Refer to https://inception-project.github.io/releases/31.3/docs/user-guide.html#sect_webannotsv for more
information.</p>
        <p><sup>32</sup> [<xref ref-type="bibr" rid="ref1">1</xref>] in folder code/conversion_to_iob</p>
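        <p>Keeping only the largest annotation among overlapping ones can be sketched as follows. This is an assumed, simplified version of that step and not the published conversion code; the span representation and the tie-breaking rule (for equally long spans, the one listed first wins) are choices made for this example.</p>
        <preformat>
```python
def keep_largest(spans):
    """Drop every span that overlaps a longer, already kept span.

    `spans` is a list of (start, end, label) with `end` exclusive.
    """
    kept = []
    # Visit longer spans first so that they win against the spans they overlap.
    for span in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        start, end, _ = span
        # Two spans are disjoint if one starts at or after the end of the other.
        if all(k_start >= end or start >= k_end for k_start, k_end, _ in kept):
            kept.append(span)
    return sorted(kept)


# A condition (tokens 0 to 5) that contains a nested document mention (token 3).
flat = keep_largest([(0, 6, "Bedingung"), (3, 4, "Dokument"), (8, 9, "Dokument")])
```
        </preformat>
        <p>In this example, the nested mention inside the longer condition is discarded, while the separate mention at token 8 is kept. The resulting non-overlapping spans can then be written in IOB2.</p>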
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Final Dataset</title>
        <p>GerPS-NER combines the following intermediate corpora: the short span version
from the adjudication phase (10 documents), Project 0.1 (20), Project 1 (29), and Project 2 (775).
It therefore consists of 834 documents with 24,613 sentences and 495,303 tokens. Of these tokens,
120,517 (24.3%) are part of an annotation (total annotated tokens). There are 29,701 annotations in
total (total annotations), as one annotation can consist of multiple tokens. Table 2 shows the
distribution of the annotations over the different categories.</p>
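        <p>The reported totals can be cross-checked with a few lines of arithmetic. The snippet below only recomputes figures stated above; the variable names are illustrative.</p>
        <preformat>
```python
# Document counts of the intermediate corpora, as reported in the text.
documents = {
    "adjudication (short span)": 10,
    "project 0.1": 20,
    "project 1": 29,
    "project 2": 775,
}
total_documents = sum(documents.values())  # 834

tokens = 495_303
annotated_tokens = 120_517
# Share of tokens that are part of an annotation, in percent.
annotated_share = round(100 * annotated_tokens / tokens, 1)  # 24.3
```
        </preformat>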
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>We present GerPS-NER, a corpus of German law texts annotated with the extended FIM
norm analysis categories. It consists of 24,613 sentences with 495,303 tokens in total and
29,701 annotations (each consisting of one or more annotated tokens). We also describe the
annotation process and guidelines in detail. In future work, we will extend the corpus with the documents
that are still being annotated and provide a further evaluation of its content, including metrics and
applications, when comparing different techniques for annotating legal texts. We will use
the GerPS-NER corpus to test the effectiveness of rule-based methods (e.g., [20]), the fine-tuning of
different machine learning models, and prompting with large language models (LLMs) [21].
As a start, Bachinger et al. [22] used a first corpus version (as described in Section 3.2.1) to
examine the effect of different prompt variations on the performance of NER with LLMs. For
the optimal scenario, they report a micro F1-score of 0.82 for LeoLM [23].</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The research projects Canaréno, simpLEX, and KollOM-Fit of the working group openDVA
were funded by the federal digitization budget and the federal government’s fund in the scope of
OZG implementation<sup>33</sup> with the support of the Federal Ministry of the Interior and Community,
FITKO<sup>34</sup>, and the Thuringian Ministry of Finance in Germany<sup>35</sup>. We would like to thank all
employees, project partners, and supporters of the openDVA working group who could not
be mentioned here by name for their great support, helpful comments, discussions, and good
cooperation.</p>
      <p><sup>33</sup> https://www.bmi.bund.de/SharedDocs/pressemitteilungen/DE/2021/02/ozg-konjunkturmittelverteilung.html</p>
      <p><sup>34</sup> https://www.fitko.de</p>
      <p><sup>35</sup> https://finanzen.thueringen.de</p>
      <p>Conference, European Language Resources Association, Marseille, France, 2020, pp. 4478–4485. URL: https://aclanthology.org/2020.lrec-1.551.
[10] H. Darji, J. Mitrović, M. Granitzer, A dataset of German legal reference annotations (2023).
[11] M. Wrzalik, D. Krechel, GerDaLIR: A German dataset for legal information retrieval, in:
N. Aletras, I. Androutsopoulos, L. Barrett, C. Goanta, D. Preotiuc-Pietro (Eds.), Proceedings
of the Natural Legal Language Processing Workshop 2021, Association for Computational
Linguistics, Punta Cana, Dominican Republic, 2021, pp. 123–128. URL: https://aclanthology.org/2021.nllp-1.13. doi:10.18653/v1/2021.nllp-1.13.
[12] J. Pustejovsky, A. Stubbs, Natural Language Annotation for Machine Learning: A Guide to
Corpus-Building for Applications, O’Reilly Media, Incorporated,
2013. URL: https://books.google.de/books?id=QtzmqamXxx4C.
[13] L. Feddoul, M. Raupach, F. Löffler, S. Babalou, J. Hoyer, M. Mauch, B. König-Ries, On which
legal regulations is a public service based? Fostering transparency in public administration
by using knowledge graphs, in: INFORMATIK 2023 - Designing Futures: Zukünfte
gestalten, Gesellschaft für Informatik e.V., Bonn, 2023, pp. 1035–1040. doi:10.18420/inf2023_115.
[14] J. Hoyer, Die Erstellung einer Prozessontologie zur Modellierung von
Verwaltungsprozessen, Bachelor’s thesis, Friedrich-Schiller-Universität Jena, 2023. URL: https://www.db-thueringen.de/receive/dbt_mods_00057705.
[15] J. Cohen, A coefficient of agreement for nominal scales,
Educational and Psychological Measurement 20 (1960) 37–46. URL: https://doi.org/10.1177/001316446002000104. doi:10.1177/001316446002000104.
[16] Kataloge - katalog download leistungen, https://fimportal.de/kataloge, 2023. Accessed:
05.01.2024.
[17] S. Bachinger, C. Lachenmaier, L. Feddoul, Data-collection-fim-laws, 2024. URL: https://doi.org/10.5281/zenodo.10450554. doi:10.5281/zenodo.10450554.
[18] S. Bachinger, C. Lachenmaier, L. Feddoul, Corpus-fim-laws, 2023. URL: https://doi.org/10.5281/zenodo.7900297. doi:10.5281/zenodo.7900297.
[19] J.-C. Klie, M. Bugert, B. Boullosa, R. E. de Castilho, I. Gurevych, The INCEpTION platform:
Machine-assisted and knowledge-oriented interactive annotation, in: Proceedings of
the 27th International Conference on Computational Linguistics: System
Demonstrations, Association for Computational Linguistics, 2018, pp. 5–9. URL: http://tubiblio.ulb.tu-darmstadt.de/106270/.
[20] T. Eftimov, B. Koroušić Seljak, P. Korošec, A rule-based named-entity recognition method
for knowledge extraction of evidence-based dietary recommendations, PLOS ONE 12
(2017) 1–32. URL: https://doi.org/10.1371/journal.pone.0179488. doi:10.1371/journal.pone.0179488.
[21] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong,
Y. Du, C. Yang, Y. Chen, Z. Chen, J. Jiang, R. Ren, Y. Li, X. Tang, Z. Liu, P. Liu, J.-Y.
Nie, J.-R. Wen, A Survey of Large Language Models, Technical Report, 2023. URL: http://arxiv.org/abs/2303.18223. doi:10.48550/arXiv.2303.18223. arXiv:2303.18223.
[22] S. T. Bachinger, L. Feddoul, M. Mauch, B. König-Ries, Extracting Legal Norm Analysis
Categories from German Law Texts with Large Language Models, in: 25th Annual
International Conference on Digital Government Research (dg.o 2024), June 11–14, 2024,
Taipei, Taiwan, 2024. doi:10.1145/3657054.3657277.
[23] B. Plüster, LeoLM: Igniting German-language LLM Research, https://laion.ai/blog/leo-lm/,
2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Feddoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Bachinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Karg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Klewer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Forshayt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Erd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lachenmaier</surname>
          </string-name>
          , GerPS-NER: Dataset and code,
          <year>2024</year>
          . doi:10.5281/zenodo.10822682.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] https://fimportal.de/, Normenanalyse, https://fimportal.de/glossar,
          <year>2024</year>
          . Accessed: 02.01.2024.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>The Object Management Group</surname>
          </string-name>
          ,
          <article-title>Business process model and notation</article-title>
          , https://www.omg.org/spec/BPMN/2.0/About-BPMN,
          <year>2010</year>
          . Accessed: 02.01.2024.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Bachinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bornheimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Breidenbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ehrhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Feddoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Legner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Löffler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Löffler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raupach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schröder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>König-Ries</surname>
          </string-name>
          ,
          <article-title>From legal texts to digitized services for public administration</article-title>
          ,
          <source>in: Language Models: Legal Parrots or more? Proceedings of the 27th International Legal Informatics Symposium IRIS</source>
          <year>2024</year>
          ,
          <year>2024</year>
          . URL: https://easychair.org/publications/preprint/PsVv.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Human-in-the-loop based named entity recognition</article-title>
          ,
          <source>in: 2021 International Conference on Big Data Engineering and Education (BDEE)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>170</fpage>
          -
          <lpage>176</lpage>
          . doi:10.1109/BDEE52938.2021.00037.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A survey on deep learning for named entity recognition</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>50</fpage>
          -
          <lpage>70</lpage>
          . doi:10.1109/TKDE.2020.2981314.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Tjong Kim Sang</surname>
          </string-name>
          ,
          <article-title>Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition</article-title>
          ,
          <source>in: COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)</source>
          ,
          <year>2002</year>
          . URL: https://aclanthology.org/W02-2024.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Leitner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moreno-Schneider</surname>
          </string-name>
          ,
          <article-title>Fine-grained Named Entity Recognition in Legal Documents</article-title>
          , in: M. Acosta, P. Cudré-Mauroux, M. Maleshkova, T. Pellegrini, H. Sack, Y. Sure-Vetter (Eds.),
          <source>Semantic Systems. The Power of AI and Knowledge Graphs. Proceedings of the 15th International Conference (SEMANTiCS 2019)</source>
          ,
          <source>number 11702 in Lecture Notes in Computer Science</source>
          , Springer, Karlsruhe, Germany,
          <year>2019</year>
          , pp.
          <fpage>272</fpage>
          -
          <lpage>287</lpage>
          . 10/11 September 2019.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Leitner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moreno-Schneider</surname>
          </string-name>
          ,
          <article-title>A dataset of German legal documents for named entity recognition</article-title>
          ,
          <source>in: Proceedings of the Twelfth Language Resources and Evaluation</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>