<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Occupations in German Texts: Challenges and Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Reiser</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Dörpinghaus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petra Steiner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Federal Institute for Vocational Education and Training (BIBB)</institution>
          ,
          <addr-line>Bonn</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Linnaeus University, Department of Computer Science and Media Technology</institution>
          ,
          <addr-line>Växjö</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Koblenz, Department of Computer Science</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper is concerned with the detection and classification of occupational titles in German texts, with a focus on the linguistic and structural challenges that are unique to the German language. The study utilizes an extensive dataset on job title variants to assess rule-based and language model-based methodologies across diverse corpora, encompassing historical documents and parliamentary proceedings. The findings indicate that rule-based methods demonstrate robust performance, particularly in structured texts, while large language models exhibit complementary strengths in recognizing complex terms. The work makes a significant contribution to the field by providing valuable annotated data and methodological insights.</p>
      </abstract>
      <kwd-group>
        <kwd>Text analysis</kwd>
        <kwd>NER</kwd>
        <kwd>name detection</kwd>
        <kwd>computational social sciences</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In addressing the second research question, the focus is directed towards two distinct datasets:
German parliamentary debates and GDR job descriptions (Berufsbilder). The principal contribution of
this paper is a substantial collection of novel training data and a systematic analysis of the
challenges associated with detecting German job titles according to the KldB.</p>
      <p>The present document is organized into five sections.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The integration of diverse labor market data sources is widely acknowledged as a multifaceted
undertaking [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, the present study focuses on the mapping and automated classification of job titles
in the German language. While dictionary-based methods are commonly employed, machine learning
(ML)-based approaches have also been explored. Existing training datasets are often compiled from
survey responses or classification systems, including KldB, ESCO, and other synonym collections.
      </p>
      <p>
        This work builds on our study of automated job title classification presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Numerous classification systems exist for occupations. The International Standard
Classification of Occupations (ISCO) was developed by the International Labour Organization (ILO) and
published in 1958, 1968, 1988, and most recently in 2008<sup>1</sup>. ISCO-08 is also used within the
European Union (EU), and some German-speaking countries (Germany, Austria, and Switzerland) have
developed customized versions of it. ISCO is structured by skill level and linked to the “European
Skills, Competences, Qualifications and Occupations” (ESCO) ontology, which adds a further hierarchy
level to the data. In Germany, the Classification of Occupations (Klassifikation der Berufe, KldB)
serves as the reference classification for the Federal Employment Agency (BA) and its research
institute (IAB)<sup>2</sup>. In the KldB, occupations are structured at the task level. The most
recent version is the 2020 revision of the KldB 2010. The KldB 2010 was a comprehensive redesign that
superseded the previous versions from 1988 and 1992, and it was developed to be compatible with
ISCO-08. The study of job titles and taxonomies has a long history that predates even the advent of
computer technology [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Part of the research has focused on classifying online job advertisements (OJAs) according to the O*NET framework
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This has included the application of normalization approaches [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and similarity-based methods
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The classification of job titles is also employed in the context of online job recruitment [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Only a limited number of publications address German job titles, in particular their
classification according to the KldB. For instance, a technical report based on OJAs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] reports
difficulties at level 4 but promising results at level 1. Malte Schierholz’s 2018 publication [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
introduced the concept of auxiliary classifications in the field of occupational coding. For further
research on occupational coding in surveys, we refer to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. A master’s thesis [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] attempts
to predict 5-digit KldB codes from survey data and highlights the persistent challenges of this task.
Similarly, a journal article compares the classification of survey data using BERT and GPT-3 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. However, the absence of a standardized
reporting methodology precludes a direct comparison of the results. Nevertheless, these studies exhibit
challenges similar to those observed elsewhere. Our previous study [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] supports
this observation, in particular the finding that large language models (LLMs) do not substantially
improve the quality of automated classifiers. Consequently, both the classification of occupational
areas and, in particular, the requirement level (5th digit) remain difficult tasks.
      </p>
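      <p>The levels mentioned above (level 1, level 4, 5th digit) refer to the hierarchical structure of KldB 2010 codes, in which each additional digit refines the preceding level and the fifth digit encodes the requirement level (Anforderungsniveau). As a minimal illustration, not part of the authors' pipeline, the following Python sketch splits a code into its hierarchy prefixes; the function name decompose_kldb and the English glosses are illustrative, and any suffix after the hyphen is simply discarded.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hierarchy levels of a 5-digit KldB 2010 code, from coarse to fine.
LEVELS = [
    (1, "Berufsbereich"),      # occupational area
    (2, "Berufshauptgruppe"),  # occupational main group
    (3, "Berufsgruppe"),       # occupational group
    (4, "Berufsuntergruppe"),  # occupational subgroup
    (5, "Berufsgattung"),      # occupational type
]

# The 5th digit encodes the requirement level (Anforderungsniveau).
REQUIREMENT_LEVELS = {
    "1": "Helfer (helper)",
    "2": "Fachkraft (skilled worker)",
    "3": "Spezialist (specialist)",
    "4": "Experte (expert)",
}


def decompose_kldb(code: str) -> dict[str, str]:
    """Split a KldB 2010 code into the prefixes for each hierarchy level."""
    digits = code.split("-")[0]  # drop a suffix such as "-901"
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"expected a 5-digit KldB code, got {code!r}")
    levels = {name: digits[:n] for n, name in LEVELS}
    levels["Anforderungsniveau"] = REQUIREMENT_LEVELS.get(digits[4], "unknown")
    return levels


print(decompose_kldb("28293-901"))
```
]]></preformat>
      <p>For the code 28293-901 quoted in Section 3.1, this yields the prefixes 2, 28, 282, 2829, and 28293, with the fifth digit 3 denoting the specialist level.</p>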
      <sec id="sec-2-1">
        <p><sup>1</sup> See https://www.ilo.org/public/english/bureau/stat/isco/isco08/.</p>
        <p><sup>2</sup> See https://statistik.arbeitsagentur.de/DE/Navigation/Grundlagen/Klassifikationen/Klassifikation-der-Berufe/KldB2010-Fassung2020/KldB2010-Fassung2020-Nav.html.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data and Methods</title>
      <p>3.1. Data</p>
      <p>To identify job titles, an extensive dataset was used that comprises 526,535 synonyms and variants
of male, female, and gender-neutral job titles. This dataset was provided by the German Federal
Employment Agency (BA)<sup>3</sup>. For example, both “Meister – Maßschneiderei” and
“Herrenschneidermeisterin” link to KldB 28293-901. However, only the first five digits are used, so
both terms are linked to 28293. Because many terms occur multiple times, the linkage is not unique.
The most prominent example is the term “Meister” (master), which is linked to nearly all crafts.
Furthermore, the dataset also contains terms that are merely associated with occupations. For
instance, the terms “Kohle” (coal), “Naturwerkstein” (natural stone), and “Anlagenführung” (plant
operation) are linked to 21212-129. Consequently, this dataset could also be used to identify implicit
references to occupations in a text. For the present approach, however, these terms were collected in
a blacklist: because they are listed in the gender-neutral section of the dataset, removing that
section instead would also remove all gender-neutral occupational titles.</p>
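      <p>A minimal sketch of how such a dictionary could be prepared for matching, assuming the data are available as (term, code) pairs: codes are truncated to five digits, non-unique links are kept as sets, and blacklisted terms are skipped. The row format and the function name build_lookup are hypothetical, as are the "Meister" rows (including the placeholder code 99999-999); only the remaining example terms and codes are those quoted above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical excerpt of the BA dataset as (term, KldB code) pairs.
# The "Meister" rows are illustrative: the real term links to nearly all
# crafts, and "99999-999" is an invented placeholder, not a real KldB code.
ROWS = [
    ("Meister – Maßschneiderei", "28293-901"),
    ("Herrenschneidermeisterin", "28293-901"),
    ("Meister", "28293-901"),
    ("Meister", "99999-999"),
    ("Kohle", "21212-129"),
    ("Naturwerkstein", "21212-129"),
    ("Anlagenführung", "21212-129"),
]

# Terms that are merely associated with an occupation (blacklist).
BLACKLIST = {"Kohle", "Naturwerkstein", "Anlagenführung"}


def build_lookup(rows, blacklist):
    """Map each term to the set of 5-digit KldB codes it links to."""
    lookup = defaultdict(set)
    for term, code in rows:
        if term in blacklist:
            continue  # occupation-related term, not a job title
        lookup[term].add(code[:5])  # keep only the first five digits
    return dict(lookup)


lookup = build_lookup(ROWS, BLACKLIST)
# Terms whose linkage is not unique, with their codes sorted for display.
ambiguous = {term: sorted(codes) for term, codes in lookup.items() if len(codes) > 1}
print(lookup["Herrenschneidermeisterin"])  # {'28293'}
print(ambiguous)  # {'Meister': ['28293', '99999']}
```
]]></preformat>
      <p>Keeping a set of codes per term, rather than a single code, makes the non-unique linkage explicit, so that a later step can decide how to resolve terms such as “Meister”.</p>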
      <p>
        The first test dataset comprises job descriptions (Berufsbilder) that were used in the former German
Democratic Republic (GDR) for vocational guidance, see [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. These documents are
available for a variety of occupations and contain eight pages of information on requirements, the
content of (vocational) education, skills, and workplaces. They also often include information on
further training opportunities (Weiterbildungsmöglichkeiten). However, as shown in Figure 1, their
layout and content vary over the years. A preliminary inspection of the job descriptions reveals
heterogeneity in the occupational requirements: while some descriptions do not mention other
occupations or training prerequisites, others do. For instance, the job description for
“Berufskraftfahrer” (professional driver) mentions qualifications such as “Fahrlehrer” (driving
instructor), “Meister für Transportbetriebstechnik”, and even university degrees. Therefore, in order to
systematically determine the relationships between different occupations, it is necessary to identify
which other occupations are referenced.
      </p>
      <p>
        The Corpus of Plenary Proceedings of the German Bundestag (CPP-BT) is the second dataset used
in this study, see [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It consists of 4,566 plenary proceedings of the German
Bundestag and comprises all plenary minutes from the first legislative period up to 24 May 2025. The
original XML data were retrieved from the Open Data Portal of the German Bundestag and the
Documentation and Information System for Parliamentary Materials (DIP) up to the respective cut-off
date.
      </p>
      <sec id="sec-3-1">
        <p><sup>3</sup> Available at https://www.arbeitsagentur.de/institutionen/dkz-downloadportal.</p>
        <p>[Figure: excerpt from a GDR job description (Berufsbild) listing further qualifications obtainable through a technical college degree (Fachschulstudium), such as “Ingenieur für Baustofftechnologie”, “Ingenieur der Bauelementeproduktion”, and “Ingenieur für Verfahrenstechnik”.]</p>
        <p>[Figure: workflow diagram with the components PhraseMatcher, FilterList, LLMs, and Occupation title.]</p>
        <p>While the first dataset does not contain any personal names, the second dataset illustrates how
difficult it is to identify individuals by their family names when these names coincide with job
titles (e.g., Schneider “tailor”, Weber “weaver”, Koch “cook”, Müller “miller”, Zimmermann
“carpenter”), as previously discussed. Examples include:
• Carsten Schneider spricht jetzt für die SPD-Fraktion. (Carsten Schneider now speaks on behalf of
the SPD parliamentary group.)
• Kollege Schneider, ich glaube, die Bürger haben richtig entschieden. (Colleague Schneider, I believe
that the citizens have made the right decision.)
• Da wird Herr Weber nicht begeistert sein! (Mr. Weber will not be pleased about that!)
• Herr Koch war sowieso dagegen. (Mr. Koch was against it anyway.)
• Stefan Müller [Erlangen] [CDU/CSU]
• Zimmermann, Sabine DIE LINKE</p>
        <sec id="sec-3-1-1">
          <title>3.2. Methods</title>
          <p>The workflow consists of two sequential steps: job titles are first identified, and it is then
determined whether each of them refers to an individual or denotes an occupation. To identify all
relevant candidates, a PhraseMatcher is applied using the dataset of job titles and synonyms. The
study uses Python 3.11.2 and the spaCy library. For a visual representation of the complete
workflow, see Figure 2.</p>
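          <p>The candidate-identification step could look roughly as follows with spaCy's PhraseMatcher. This is a minimal sketch under stated assumptions, not the authors' code: the five-entry dictionary and the example sentences are hypothetical, matching on the LOWER attribute (case-insensitive) is an assumed design choice, and overlapping matches are resolved with spacy.util.filter_spans, which keeps the longest span.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

# Hypothetical excerpt of the job-title dictionary (codes omitted).
JOB_TITLES = [
    "Ingenieur",
    "Ingenieur für Verfahrenstechnik",
    "Berufskraftfahrer",
    "Fahrlehrer",
    "Schneider",
]

nlp = spacy.blank("de")  # German tokenizer only; no trained pipeline needed
# attr="LOWER" makes matching case-insensitive (an assumption, not from the paper).
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("JOB_TITLE", [nlp.make_doc(title) for title in JOB_TITLES])


def find_candidates(text: str) -> list[str]:
    """Return job-title candidates; overlapping matches keep the longest span."""
    doc = nlp(text)
    spans = matcher(doc, as_spans=True)
    return [span.text for span in filter_spans(spans)]


print(find_candidates("Sie können sich zum Ingenieur für Verfahrenstechnik qualifizieren."))
print(find_candidates("Kollege Schneider, ich glaube, die Bürger haben richtig entschieden."))
```
]]></preformat>
          <p>In the first call, both “Ingenieur” and “Ingenieur für Verfahrenstechnik” match, and only the three-token span is kept. The second call returns “Schneider”, which the subsequent step must then classify as a family name rather than a job title.</p>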
          <p>Each candidate must first be confirmed or rejected, as certain matched terms do not denote an
occupation in the given context. Two approaches are used: a manually curated filter list, or queries
to different LLMs that determine whether the term in question denotes an occupation. In the
evaluation, the first approach is designated as "Rule based."</p>
          <p>The second task is to determine whether the term in question is a family name. The first
approach is a rudimentary rule-based strategy that inspects the neighboring words:
• If the previous word is a well-known given name (list retrieved from Wikidata), return true.
• If the previous word is a particular title (Kollege, Kollegin, Herr, Frau, or Dr.) or a party name
(SPD, CDU/CSU, FDP, ...), return true.
• If the previous or next word is a title or begins with a bracket, return true.</p>
          <p>The second approach again delegates this question to different LLMs. In the evaluation, the
first approach is again designated as "Rule based."</p>
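          <p>The filter-list check and the family-name rules described above can be sketched in plain Python. This is an illustrative reconstruction, not the authors' implementation: it assumes whitespace tokenization, small hypothetical stand-ins for the curated filter list and the Wikidata given-name list, and that "begins with a bracket" means a token starting with "[" or "(". The function names are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical, abridged resources; the study uses a given-name list
# retrieved from Wikidata and a manually curated filter list.
FILTER_LIST = {"Beratung", "Ausbildung"}
GIVEN_NAMES = {"Carsten", "Stefan", "Sabine"}
TITLES = {"Kollege", "Kollegin", "Herr", "Frau", "Dr."}
PARTIES = {"SPD", "CDU/CSU", "FDP"}


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization that strips trailing commas and similar marks."""
    return [tok.rstrip(",;:!?") for tok in text.split()]


def is_family_name(tokens: list[str], i: int) -> bool:
    """Apply the three context rules to the candidate at position i."""
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    # Rule 1: the previous word is a well-known given name.
    if prev_tok in GIVEN_NAMES:
        return True
    # Rule 2: the previous word is a title or a party name.
    if prev_tok in TITLES or prev_tok in PARTIES:
        return True
    # Rule 3: the previous or next word is a title or begins with a bracket.
    return any(tok in TITLES or tok.startswith(("[", "(")) for tok in (prev_tok, next_tok))


def classify(text: str, candidate: str) -> str:
    """Two-step decision for a candidate found by the phrase matcher."""
    if candidate in FILTER_LIST:  # step 1: reject non-occupations
        return "not an occupation"
    tokens = tokenize(text)
    i = tokens.index(candidate)  # first occurrence; raises ValueError if absent
    return "family name" if is_family_name(tokens, i) else "job title"


for sentence, term in [
    ("Carsten Schneider spricht jetzt für die SPD-Fraktion.", "Schneider"),
    ("Da wird Herr Weber nicht begeistert sein!", "Weber"),
    ("Stefan Müller [Erlangen] [CDU/CSU]", "Müller"),
    ("Der Schneider nähte einen Anzug.", "Schneider"),
]:
    print(term, "->", classify(sentence, term))
```
]]></preformat>
          <p>The rules are checked in the listed order and the first match wins. In this sketch, an inverted entry such as “Zimmermann, Sabine DIE LINKE” is not flagged, because the rules as listed only inspect the word before the candidate for given names.</p>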
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <sec id="sec-4-1">
        <title>4.1. GDR job descriptions</title>
        <p>Table 1 presents the results for the GDR job descriptions. The rule-based approach is the most
effective method for identifying job title candidates. However, it has difficulties with terms such
as “Aufbereitung” (processing).</p>
        <p>Some models have difficulties with multi-word terms such as “Ingenieur für Verfahrenstechnik”
(process engineer). This term was identified by the rule-based approach and llama3.2, but by none of
the other models. Further errors include the term “Ausbildung” (training), which the smaller models
identified as an occupation. Nevertheless, only the rule-based approach correctly identified more
exotic terms such as “Fähnrich” (officer cadet) and even “Berufskraftfahrer” (professional driver).</p>
        <p>It is not surprising that the rule-based approach is the most effective method for name
detection: since this text category contains no personal names, the rules are seldom triggered and
thus rarely produce false positives. Nevertheless, it is noteworthy that qwen3 also achieved this
optimal performance.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. German Bundestag</title>
        <p>Table 2 presents the results for an annotated subset of 20 plenary proceedings of the German
Bundestag. Once again, the rule-based approach clearly outperforms the LLM approaches in identifying
job titles. In particular, qwen3 classifies a considerable number of generic terms, such as
“Beratung” (consulting), as occupational titles. Both phi4 and qwen3 also struggle with terms such as
“Wohnungsbau” (housing construction) and “Recht” (law). Nevertheless, phi4 performs satisfactorily
overall, as evidenced by its macro and weighted F1 scores. In contrast, the scores of all other
models differ markedly between the macro and weighted averages.</p>
        <p>For family-name identification, the weighted precision and recall scores are high for all models,
and both phi4 and the rule-based approach prove effective. The annotated dataset is highly
imbalanced: nearly 94% of all candidates are not job titles, and only 0.2% are family names.
Consequently, the macro scores are difficult to interpret. Nevertheless, preliminary findings suggest
that LLM approaches may outperform the proposed rule-based approach.</p>
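        <p>To illustrate why macro scores are hard to interpret under this imbalance, the following sketch computes macro- and weighted-averaged F1 using the usual definitions (the same averaging as scikit-learn's average="macro" and average="weighted"). The data are invented for illustration and are not taken from Table 2: 1,000 hypothetical candidates roughly follow the reported proportions, and a hypothetical classifier is perfect except that it labels both family names as job titles.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """F1 score for every label occurring in the gold or predicted labels."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Macro = unweighted mean over labels; weighted = mean weighted by gold support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[label] * support[label] for label in scores) / len(y_true)
    return macro, weighted


# Hypothetical 1,000 candidates: 940 non-job titles, 58 job titles, 2 family names.
y_true = ["other"] * 940 + ["job_title"] * 58 + ["family_name"] * 2
# Hypothetical predictions: both family names are labelled as job titles.
y_pred = ["other"] * 940 + ["job_title"] * 58 + ["job_title"] * 2

macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```
]]></preformat>
        <p>Missing the two family names barely affects the weighted score (about 0.997) but pulls the macro score down to about 0.661, because the rare family-name class counts as much as the other two classes in the macro average.</p>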
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Outlook</title>
      <p>This study investigated the complex task of detecting and classifying occupational titles in German
texts, highlighting the linguistic and contextual factors that make this problem particularly
difficult. The research uses a comprehensive dataset of 526,535 job title variants and synonyms and
analyzes two distinct textual domains: GDR vocational descriptions and plenary proceedings of the
German Bundestag. This approach offers a broad perspective on the nuances involved in occupational
detection.</p>
      <p>The experimental results underscore the continued effectiveness of rule-based approaches,
particularly in structured texts such as GDR job descriptions. These methods achieved high precision
and recall in identifying occupational titles and family names, in several cases outperforming large
language models (LLMs). However, LLMs such as llama3.2 proved capable of detecting more complex,
multi-word occupational expressions that were occasionally overlooked by rule-based systems. Despite
these advances, mapping occupations to fine-grained classification codes, such as the 5-digit level
of the German Klassifikation der Berufe (KldB), remains challenging. The task is further complicated
by ambiguities arising from the overlapping use of names and job titles, as well as by
gender-specific, gender-neutral, and historically derived terms.</p>
      <p>
        In light of these findings, this work suggests several promising directions for future research.
First, other AI approaches such as BERT, see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], could improve the quality. Second, integrating rule-based methods with LLMs in hybrid systems
could strike a balance between precision and adaptability, see for example [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ]. Third, extending the training data to a more diverse range of sources, including contemporary
user-generated content from online platforms and digitized historical documents, could enhance the
robustness and applicability of the models.
      </p>
      <p>Furthermore, the implementation of cross-lingual mapping and alignment with international
classification systems such as ISCO and ESCO would expand the practical relevance of these methods
beyond national boundaries. As researchers increasingly experiment with LLMs for occupational coding,
the development of standardized benchmarks and evaluation frameworks will be essential to ensure
comparability and reproducibility of results.</p>
      <p>In sum, the present study provides a substantial dataset and a series of methodological insights
that lay the foundation for more precise and extensible systems in occupational text analysis. The
findings are particularly relevant for applications in labor market research, policy analysis, and
computational social science, where the automated identification of occupational information remains
an important concern.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DeepL for grammar and spelling checks.
After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Binnewitt</surname>
          </string-name>
          ,
          <article-title>Recognising occupational titles in German parliamentary debates</article-title>
          ,
          <source>in: Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dörpinghaus</surname>
          </string-name>
          ,
          <article-title>What is said about VET on social media in Germany? Trends, demands, and opinions</article-title>
          ,
          <source>in: NORDYRK BOOK OF ABSTRACTS</source>
          ,
          <year>2024</year>
          , p.
          <fpage>109</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dorau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hein</surname>
          </string-name>
          ,
          <article-title>Towards the automated classification of German job titles according to KldB</article-title>
          ,
          <source>in: 20th Conference on Computer Science and Intelligence Systems (FedCSIS)</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dörpinghaus</surname>
          </string-name>
          ,
          <article-title>Web mining of online resources for German labor market research and education: Finding the ground truth?</article-title>
          ,
          <source>Knowledge</source>
          <volume>4</volume>
          (
          <year>2024</year>
          )
          <fpage>51</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <collab>National Research Council, Division of Behavioral and Social Sciences, Committee on Occupational Classification and Analysis</collab>
          ,
          <article-title>Work, jobs, and occupations: A critical review of the Dictionary of Occupational Titles</article-title>
          (
          <year>1980</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Javed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McNair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Towards a job title classification system</article-title>
          ,
          <source>arXiv preprint arXiv:1606.00917</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Javed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ozturk</surname>
          </string-name>
          ,
          <article-title>Document embedding strategies for job title classification</article-title>
          ,
          <source>in: FLAIRS</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rahhal</surname>
          </string-name>
          ,
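          <string-name>
            <given-names>K. M.</given-names>
            <surname>Carley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kassou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghogho</surname>
          </string-name>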
          ,
          <article-title>Two stage job title identification system for online job advertisements</article-title>
          ,
          <source>IEEE Access 11</source>
          (
          <year>2023</year>
          )
          <fpage>19073</fpage>
          -
          <lpage>19092</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Javed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McNair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Carotene: A job title classification system for the online recruitment domain</article-title>
          ,
          <source>in: 2015 IEEE First International Conference on Big Data Computing Service and Applications</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>286</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Baskaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Classification of German job titles in online job postings using the KldB 2010 taxonomy</article-title>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schierholz</surname>
          </string-name>
          ,
          <article-title>An auxiliary classification with work activity descriptions for occupation coding</article-title>
          ,
          <source>AStA Wirtschafts-und Sozialstatistisches Archiv</source>
          <volume>12</volume>
          (
          <year>2018</year>
          )
          <fpage>285</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>The implementation of the German Classification of Occupations 2010 in the IAB Job Vacancy Survey: documentation of the implementation process</article-title>
          ,
          <source>Technical Report, IAB-Forschungsbericht</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V. P. V.</given-names>
            <surname>Karanam</surname>
          </string-name>
          ,
          <article-title>Occupation coding using a pretrained language model by integrating domain knowledge</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Safikhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Avetisyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Föste-Eggers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Broneske</surname>
          </string-name>
          ,
          <article-title>Automated occupation coding with hierarchical features: A data-centric approach to classification with pre-trained language models</article-title>
          ,
          <source>Discover Artificial Intelligence</source>
          <volume>3</volume>
          (
          <year>2023</year>
          )
          <elocation-id>6</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Reiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dörpinghaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tiemann</surname>
          </string-name>
          ,
          <article-title>Towards a dataset of digitalized historical German VET and CVET regulations</article-title>
          ,
          <source>Data</source>
          <volume>9</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Reiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dörpinghaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <article-title>Analyzing historical legal text corpora: German VET and CVET regulations</article-title>
          ,
          <source>in: INFORMATIK 2024</source>
          , Gesellschaft für Informatik eV,
          <year>2024</year>
          , pp.
          <fpage>2007</fpage>
          -
          <lpage>2018</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fobbe</surname>
          </string-name>
          ,
          <article-title>Corpus der Plenarprotokolle des Deutschen Bundestages (CPP-BT)</article-title>
          ,
          <year>2025</year>
          . URL: https://doi.org/10.5281/zenodo.15462956. doi:
          <pub-id pub-id-type="doi">10.5281/zenodo.15462956</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Laqrichi</surname>
          </string-name>
          ,
          <article-title>A hybrid framework for COSMIC measurement: Combining large language models with a rule-based system</article-title>
          ,
          <source>IWSM-Mensura</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Billi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parenti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pisano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanchi</surname>
          </string-name>
          ,
          <article-title>A hybrid approach for accessible rule-based reasoning through large language models</article-title>
          ,
          <source>in: 18th International Workshop on Juris-Informatics</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>