<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978</article-id>
      <title-group>
        <article-title>Perspectives on the Renaissance with a Knowledge Graph of Giorgio Vasari's The Lives</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sarah Rebecca Ondraszek</string-name>
          <email>sarah-rebecca.ondraszek@fiz-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe - Leibniz Institute for Information Infrastructure</institution>
          ,
          <addr-line>Eggenstein-Leopoldshafen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Applied Informatics and Formal Description Methods (AIFB) of KIT</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>3724</volume>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>In the digital humanities, semantic technologies have been recognized as providing the building blocks needed to represent the complex and often ambiguous nature of humanities data. Despite this growing interest, practical frameworks for modeling complex, usually multifaceted and multilingual, historical sources are still lacking. In this paper, we present Viewsari, an ongoing Ph.D. project aiming to build a knowledge graph based on Giorgio Vasari's Lives of the Most Excellent Painters, Sculptors, and Architects (1568), referred to as The Lives. This collection of biographies of important Renaissance artists, recounting tales of their lives and describing their artistic styles and works, is widely regarded as the first modern work of art history. With it, Vasari shaped the [...]. The Viewsari project draws on knowledge extraction and aims to contextualize content from different editions of Vasari's The Lives, addressing the challenges of working with complex, multilingual historical texts. Situated at the intersection of digital humanities and the Semantic Web, it demonstrates how modular, pattern-driven ontology development, leveraging Ontology Design Patterns and the eXtreme Design methodology, can support the structured representation and exploration of information across different editions and linguistic versions. The central goal is to generalize the Viewsari framework to address similar challenges, i.e., enriching and interconnecting textual sources in different domains.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital humanities</kwd>
        <kwd>knowledge graphs</kwd>
        <kwd>ontologies</kwd>
        <kwd>knowledge extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As an interdisciplinary field of study, the digital humanities (DH) encompasses domains such as art
history, performative arts, and literary studies, all of which provide heterogeneous data. Thus, it
represents a challenging domain for the application and evaluation of semantic technologies [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
Their application in the DH, however, often results in isolated or overly complex solutions that lack
interoperability and reusability. This partly stems from the heterogeneous nature of the data, which
frequently compels researchers to create ad hoc solutions instead of modularized approaches [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>This Ph.D. project lies at the intersection of DH and the Semantic Web. With it, we explore how
semantic technologies can enable structured, interoperable, and reusable knowledge representations
to support the modeling and analysis of complex and frequently multilingual (historical) texts. To
this end, we incorporate ontology engineering, knowledge graph (KG) construction, and knowledge
extraction using natural language processing (NLP) tools and state-of-the-art (SOTA) large language
models (LLMs).</p>
      <p>
        The chosen case study to demonstrate these aspects is Giorgio Vasari’s Renaissance literary work
Lives of the Most Excellent Painters, Sculptors, and Architects (The Lives). A collection of
biographies of prominent artists of the time, it is considered one of the founding works of art history as
a discipline. Over the past 450 years, various efforts have contributed to its analysis, including scholarly
editions, commentaries, annotations, and reinterpretations. These efforts have typically followed the
tradition of “close reading,” focusing on detailed, line-by-line interpretations [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
      <p>
        Recent years have shown a growing interest in Franco Moretti’s concept of ‘distant reading’, which
scales literary analysis across larger corpora of texts to a bird’s-eye perspective and analyzes patterns using
computational methods to uncover broader connections [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Despite the shift, to this day, The Lives, among other seminal works, remains a source of unstructured
text. However, a structured representation is necessary to capture the multifaceted content of many
sources, be they historical, scientific, or fantastical in nature. In Vasari’s case, this can open the biographies to
interdisciplinary research questions, facilitate navigation and interpretation of the web of relationships
and contexts described, and possibly allow for new insights and perspectives on the data from both
a close and distant reading perspective. Entity extraction and linking lay the foundation for this
construction, enabling machine-understandable, interoperable representations [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        This project is twofold. One part is the overall Viewsari project (referred to simply
as Viewsari), which denotes the framework and the entire process; the name reflects its intent to
support an experience of the Renaissance through Vasari’s eyes. The other part is the Viewsari KG,
the core of this Ph.D. project. Central aspects of the Viewsari KG are the formal representations
of contextual implications for and in (historical) social networks and the representation of subjective
claims about situations to explicitly address digital hermeneutics and context sensitivity as critical
aspects in the interpretation of information [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. Furthermore, with the knowledge extraction
pipeline behind Viewsari, we investigate the use of NLP and LLMs to extract named entities from texts,
especially those with ambiguous references or mere textual descriptions, supported by the iconographic
interpretation model by Panofsky in the case of artworks [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].<sup>2</sup>
      </p>
      <p>Overall, our goal is to create a generalizable framework that can be used outside this Ph.D. project
and in other domains (outside of DH) to transform unstructured texts into ontology-based KGs. Results
shall be provided in a comprehensible interface, with a focus on the formal representation of content
and text-related information across various versions, whether linguistic, structural, or editorial.</p>
      <p>
        Problem Statement. We identify the key challenge to be a lack of repeatable methods for extracting
structured information from these sources and linking it to existing databases. Especially due to
linguistic variations and ambiguities, this remains a significant hurdle. This also concerns recognizing
entities from their description and the treatment of long-tail cultural heritage data: lesser-known
artists or vague artwork references through textual descriptions, which are often crucial for humanities
research but not represented in standard knowledge bases, so-called out-of-knowledge-base (OOKB)
entities [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Traditional NLP techniques often fail in these domains. However, recent innovations in
LLMs promise to bridge this gap [
        <xref ref-type="bibr" rid="ref16 ref17 ref9">9, 16, 17</xref>
        ].
      </p>
      <p>
        Furthermore, the creation of consistent, reusable, and comprehensible formal representations that
find application beyond a single research project (especially in DH) remains a challenge [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ].
      </p>
      <p>The challenges frequently encountered in extracting knowledge from historical texts and modeling
their content, such as ambiguous terminology, interpretive knowledge, multilinguality, lengthy
descriptions instead of direct references, and the need for detailed provenance, are also relevant in other
domains. The same holds true for knowledge representations. This points to the need for generalizable
solutions.</p>
      <p>
        Importance. The Lives provides a rich and complex data set. It has had a lasting influence on
art historical research and sets well-defined boundaries for a contained scope, making it a valuable
proof-of-concept for the domain. Although the scope allows for the focused development and evaluation
of proposed methodologies, the dataset also presents significant challenges: the extraction of unknown
entities and the representation of historical context. This includes fine-tuning prompts for LLMs and
experimenting with potential models for entity matching, especially with unseen entities [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. To
support digital hermeneutics and source criticism, Viewsari offers a structured representation. Moreover,
a key feature is the formal representation of provenance information, contextual implications, and
historical relatedness [
        <xref ref-type="bibr" rid="ref10 ref11 ref19">10, 11, 19</xref>
        ]. Additionally, the underlying ontology development process serves
as a proof-of-concept for overcoming inconsistent modeling approaches and the difficulty of aligning
formal representations with interpretively layered data, aiming to develop transferable best practices
that researchers can apply in other DH-related contexts.
        <fn id="fn2"><label>2</label><p>The sources for the Viewsari project can be found via https://github.com/ISE-FIZKarlsruhe/viewsari.</p></fn>
      </p>
      <p>
        Solving this problem has value for multiple communities beyond the humanities. As a reproducible
knowledge extraction and KG construction workflow, the methodology developed in Viewsari addresses
the aforementioned challenges, which can also be prevalent in applications such as scientific knowledge
representation, the representation of complex processes and their provenance in experiments, or
social/biographical networks [
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ]. This gives the Viewsari methodology broader relevance
and motivates its potential generalization.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Shaping the way scholars conduct research in the humanities, digital methods brought new perspectives,
unlocking new perceptions of what was previously hidden in source material at a larger scale [
        <xref ref-type="bibr" rid="ref12 ref8">8, 12</xref>
        ].
The following section provides an overview of related work in the context of the Viewsari project,
focusing on knowledge extraction with LLMs, related challenges, and ontology engineering, particularly
in the DH. It also explores cross-domain approaches to the diverse challenges mentioned in the previous
section.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Knowledge Extraction with Large Language Models</title>
        <p>
          Information extraction techniques such as named entity recognition (NER) and entity linking provide
a way to access knowledge stored in unstructured, text-based material and to transform these
sources into structured representations [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Despite the inherent heterogeneity of cultural heritage data, a commonly agreed-upon set of
information is relevant to many research questions [
          <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
          ]. Named entities, which include people, places,
and historical events, are a crucial part of this. In early attempts to extract these, as shown in a survey
by Sporleder, the pipelines for NER often involved classification tasks that required domain-specific
training with annotated data that was frequently unavailable, and therefore, the models performed
poorly.
        </p>
        <p>
          Recent work with SOTA transformer-based technologies, such as BERT- or GPT-based models,
showed that improved extraction would be possible in a zero-shot setting [
          <xref ref-type="bibr" rid="ref12">12, 25</xref>
          ]. The same applies to
text-to-KG approaches, which mitigate semantic parsing to transform unstructured sources to structured
graph representations, using abstract meaning representations [26]. The application of these techniques
in specific domains, such as art history, is still being explored, and thus continues to pose problems in
the area of artwork or motif recognition and linking from unstructured, descriptive passages [
          <xref ref-type="bibr" rid="ref9">9, 27</xref>
          ].
        </p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Ambiguities and Descriptive Passages in Entity Extraction</title>
          <p>
            Extracting vaguely referenced entities, even with SOTA approaches and LLMs, remains a challenge. In
particular, this concerns references via descriptive passages or non-named references. Furthermore,
non-standardized terminology and variation in writing styles can complicate the extraction of relevant
entities from unstructured texts [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]. Recent work like AI4DiTraRe [28] has shown that research data
frequently lacks terminological consistency and contains nested concepts. An example is the ambiguous
reference to parameters in studies: ‘Age’ might appear as ‘AGE’, ‘Age of sample’, or
‘age (years)’, making it difficult to extract as a single entity. As a normalization framework,
AI4DiTraRe proposes an LLM-based pipeline. Similarly, Viewsari applies prompt-based pipelines with
LLMs to extract and disambiguate named entities from Vasari’s text, given implicit references and cases
when no canonical label is present.
          </p>
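          <p>The normalization problem described above can be illustrated with a minimal sketch. It is a rule-based stand-in, not the LLM-based pipeline of AI4DiTraRe or Viewsari: the vocabulary CANONICAL_LABELS and the function normalize_label are hypothetical names introduced only for this example.</p>
          <preformat><![CDATA[
```python
import re

# Hypothetical canonical vocabulary; a real pipeline would use a curated list
# or an LLM-based normalization step instead of this lookup table.
CANONICAL_LABELS = {"age": "Age"}


def normalize_label(raw):
    """Map a raw parameter label to a canonical label, or None if unknown."""
    # Lowercase and drop parenthesised units such as "(years)".
    cleaned = re.sub(r"\([^)]*\)", " ", raw.lower())
    # Split into alphabetic tokens and return the first known concept.
    for token in re.findall(r"[a-z]+", cleaned):
        if token in CANONICAL_LABELS:
            return CANONICAL_LABELS[token]
    return None


variants = ["Age", "AGE", "Age of sample", "age (years)"]
normalized = {variant: normalize_label(variant) for variant in variants}
```
]]></preformat>
          <p>All four variants map to the same canonical label, while unknown labels yield None and would be passed on for manual or model-assisted review.</p>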
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Out-of-Knowledge-Base Entities</title>
          <p>
            Advances in KG construction have addressed the challenge of handling OOKB entities with graph neural
networks (GNNs), which generate embeddings for unseen entities [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. Projects like CHAD-KG exemplify
the integration with cultural heritage data, providing a reproducible pipeline to accommodate OOKB
entities with mapping rules to normalize and create IRIs for input data [29]. To expand on these works,
within this Ph.D. project, in addition to the extraction of OOKB entities, a focus is on formally describing
their provenance and uncertainty in an ontology.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ontology Engineering in the Digital Humanities</title>
        <p>Over the years, several methodologies have emerged to facilitate ontology development. One of these
is eXtreme Design (XD), a dynamic approach to ontology engineering. Different, iterative phases guide the process in
close consultation with domain experts. All steps focus on the reusability and interoperability of the developed
formalizations. At the heart of this are Ontology Design Patterns (ODPs), modular solutions that address recurring
design problems in ontologies and can be shared across domains and reused like templates [30, 31, 32].
The advantages of pattern-based approaches have been explored in several works, highlighting aspects
such as a lower overall error rate or higher understandability [31].</p>
        <p>
          In the domain of cultural heritage, XD and patterns have been applied, e.g., in the ArCo KG [33],
which integrates Italian resources. However, the broader adoption of the practices remains uncharted
territory, and there is no overarching standard for connecting diverse ontologies in the DH [
          <xref ref-type="bibr" rid="ref10">10, 34</xref>
          ].
In art history, patterns have been utilized to encode the complex interpretation process of artworks
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] or digital collections of artworks [35]. Additionally, KGs have been highlighted for encoding
contextual implications for and in (historical) social networks [
          <xref ref-type="bibr" rid="ref19">19, 36, 37</xref>
          ]. This approach relates to a
similar endeavor by Shimizu et al. (2023), who introduce the ongoing work of building a
library for modular ontology design (MODL), aiming to abstract and harmonize patterns across different
ontologies. Similarly, in Viewsari, the goal is to reduce fragmentation and bridge the gap between
pattern-based and text-based contextual annotations in a structured KG.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Cross-Domain Approaches and Shared Challenges</title>
        <p>The challenges faced when extracting knowledge from historical texts and modeling the outcome
and its underlying process are inherent in different types of data and therefore also apply to other domains.</p>
        <p>
          Existing research in the biomedical domain tackles similar issues, including provenance and process
modeling. As can be seen in ontologies like Provenance, Authoring and Versioning (PAV) [38] and
extensions of PROV-O [39, 40, 41], it is essential to formally describe the progress and interpretation of
data, given different stages of processing. The same holds for applications in materials science, where
Basic Formal Ontology (BFO)-based ontologies [42] describe complex data-processing workflows [43].
Such modeling strategies can be mapped to the knowledge extraction process in Viewsari to define
multiple interpretive stages in the extraction and analysis of textual sources. In parallel to tracking
data extraction in scientific content, which necessitates layered perspectives [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], Viewsari mirrors the
evolution of interpretation across editions or commentaries in historical sources.
        </p>
        <p>
          Likewise, social and biographical network research overlaps structurally with this work. In the Sampo
universe, BiographySampo [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] uses textual references to build social networks, i.e., graphs of people.
Similar to how citation graphs are structured, each network can be traced back to its sources; however, the approach
lacks explicit provenance modeling. This concerns, for example, the different versions of a source. In
the domain of music and cultural heritage, the Polifonia Ontology Network [44] exemplifies a modular
approach for representing music history and the cultural heritage context in which it was generated. For
example, within the module for Musical Meetups (encounters between musicians), the model supports
the annotation of relationships with additional context, e.g., the time, place, and purpose [45].
        </p>
        <p>In Viewsari, the goal is to build a generalizable methodology that extends these approaches with
detailed, paragraph-level provenance based on PROV-O, enriched with different layers of interpretation
and contextuality. Similarly, the Odeuropa model captures multifaceted information about smells [46].
The HiCO ontology (Historical Context Ontology) [47] supports provenance and interpretive assertions,
providing formal representations for differentiating between factual and interpretive assertions.
Viewsari draws on these approaches, making the interpretive provenance of entities and their relations
explicit by linking each extracted triple to the textual fragment and the extraction process.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Research Questions</title>
      <p>Three core research questions (RQs) define the scope of this Ph.D. project.</p>
      <p>RQ1: In what way can semantic technologies model the heterogeneous content, provenance, and
interpretive complexity of historical texts like Giorgio Vasari’s The Lives?</p>
      <p>We hypothesize that the heterogeneous and interpretive content of (historical) texts can be formalized
and made machine-understandable without losing interpretive depth. This includes the development
of an ontology that captures the explicit and implicit content, context, and provenance of
historical sources, as well as conceptual relationships and their grounding in specific textual passages
across different editions, translations, and modalities. This involves modeling provenance information
for extracted entities with references to the in-line paragraphs in the different versions of the text. Thus,
we assume that it is possible to represent not only the extracted entities and their relationships but
also the extraction process itself to allow for traceability of provenance across different editions and
interpretations.</p>
      <p>RQ2: How can NLP methods, and LLMs in particular, be used to extract and link both explicitly and
implicitly mentioned entities?</p>
      <p>We hypothesize that the integration of LLMs into knowledge extraction pipelines enables the
identification and linking of both explicitly and implicitly referenced entities in textual sources. This includes,
for example, iconographic themes of artworks, motifs,<sup>3</sup> or vague references to their content.</p>
      <p>Additionally, we assume that for the transformation of unstructured textual descriptions into
structured representations, for explicitly mentioned entities, a mixture of state-of-the-art NLP methods and
LLM prompting can increase recall and accuracy.</p>
      <p>RQ3: What are the common challenges in modeling complex, multilingual (historical) sources, and
how can a generalizable approach be drawn to make the methodology repeatable and transferable?</p>
      <p>The goal is to map existing practices in other endeavors, e.g., from biomedicine, to the approach in
Viewsari and to contribute to their methodological advancement by proposing solutions to challenges
in modeling historical sources, particularly through semantic representations.</p>
      <p>We hypothesize that a pattern-based, modular ontology engineering approach (XD + ODPs), combined
with prompt-based entity extraction, can generalize to similar structured representations in other
humanities and scientific domains. Using examples from Vasari’s work, the goal is to identify reusable
design patterns that encode recurring conceptual structures, such as multilinguality and multimodality.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Approach</title>
      <p>To reflect these multidisciplinary research goals, the methodology involves different steps, i.e.,
knowledge extraction, ontology development, and knowledge graph construction.</p>
      <p>
        In the current state of this Ph.D. project, we use the English translation of The Lives for all
knowledge extraction and engineering steps. It was produced by Gaston C. Du Vere in 1912, based on the
edition released by Vasari in 1568. A digital edition of Du Vere’s translation is available in ten volumes,
each containing a subset of all biographies [
        <xref ref-type="bibr" rid="ref5">5, 48</xref>
        ].
        <fn id="fn3"><label>3</label><p>Motifs are “recurring subject[s], theme[s], or idea[s] in art”. For reference, see https://blog.stephens.edu/arh101glossary/?glossary=motif.</p></fn>
      </p>
      <sec id="sec-4-1">
        <title>4.1. Knowledge Extraction</title>
        <p>
          For entity extraction from The Lives, the team of [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] experimented with different NLP models for entity
recognition and entity linking, for which they created annotation guidelines [49]. The experimental
pipeline plays a central role in this effort, using a range of models, including Universal-NER [50] for
artwork and subject recognition, and mGENRE [51] for the disambiguation of entities.
In a corresponding GitHub repository, a sample set of the following entities is available:<sup>4</sup> persons,
organizations, places, and miscellaneous (all following the CoNLL 2003 [52] dataset, partially linked
to Wikidata<sup>5</sup> and Iconclass [53]), as well as artwork references, motifs, terms, and dates. For persons,
the Index of Names was the baseline for extraction. As an appendix to The Lives, it lists all persons
who occur in the work and is included in the original Italian and translated versions. In addition,
co-reference resolution and statistical association from the co-occurrences of names in paragraphs
model relationships between artists to create the social network.
        </p>
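        <p>The co-occurrence step can be sketched as follows. This is a simplified illustration under stated assumptions: names are matched by plain substring search, co-reference resolution is omitted, and the paragraphs are invented toy data rather than quotations from The Lives. The function name cooccurrence_counts is hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(paragraphs, names):
    """Count how often each pair of names appears in the same paragraph."""
    counts = Counter()
    for text in paragraphs:
        # Sort so that each pair is always stored in the same order.
        present = sorted(name for name in names if name in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Toy paragraphs, invented for illustration only.
paragraphs = [
    "Giotto was a disciple of Cimabue in Florence.",
    "Cimabue and Giotto both worked in Assisi.",
    "Masaccio studied the works of Giotto.",
]
names = ["Cimabue", "Giotto", "Masaccio"]
edges = cooccurrence_counts(paragraphs, names)
```
]]></preformat>
        <p>Each counted pair corresponds to a weighted edge in the social network; pairs that never share a paragraph do not appear in the result.</p>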
        <p>
          The results are available in tabular form: CSV files summarize occurrences of entities with additional
information about their provenance per volume of the translation. The data basis includes 673
co-occurrences and 1,073 persons, of whom 312 appear in the relevant co-occurrences. Since this
approach is only experimental, the quantity of other extracted entities is lower. For example, only 133
artworks and 311 motifs could be correctly identified [
          <xref ref-type="bibr" rid="ref9">9, 54</xref>
          ].
        </p>
        <p>
          To expand their coverage, in ongoing work, we employ LLMs to assist in identifying and extracting
further entities from the descriptions given in The Lives, supplementing existing annotations with
OOKB entities [35, 55, 56]. A prompt-based pipeline is the methodological foundation. It integrates
different LLMs, e.g., Mistral 7B [57] or LLaMA 2 13B [58], to parse text fragments such as “a painting
of Madonna with Saint John in the wilderness” with dedicated prompts like “extract persons, locations,
events, and artworks from the following Italian Renaissance text”, and generate structured entries
describing, for example, the title, motif, type, and associated artist. When not dealing with an OOKB
entity, the pipeline links it to external identifiers. OOKB entities are assigned local identifiers. These
candidate entities need to be manually reviewed by domain experts and stored in a separate annotation
layer, as represented in the ontology. A crucial part of this pipeline is prompt engineering: for example,
crafting prompts related to artistic styles or artwork characteristics based on iconographic theories can
help guide LLMs in identifying artworks or artists based on descriptive passages [
          <xref ref-type="bibr" rid="ref13">13, 59</xref>
          ].
        </p>
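        <p>A minimal sketch of the prompt-based step is given below, under several assumptions: no LLM is actually called, the model reply is a hand-written stand-in, the JSON output format is our own choice for the example, and the identifier scheme (the prefix viewsari:local/ and the placeholder external identifier) is hypothetical.</p>
        <preformat><![CDATA[
```python
import json

PROMPT_TEMPLATE = (
    "Extract persons, locations, events, and artworks from the following "
    "Italian Renaissance text. Answer with a JSON list of objects with the "
    "keys 'label' and 'type'.\n\nText: {fragment}"
)


def build_prompt(fragment):
    """Fill the prompt template with a text fragment."""
    return PROMPT_TEMPLATE.format(fragment=fragment)


def parse_response(raw, known_ids, local_prefix="viewsari:local/"):
    """Turn a JSON reply into candidate entities awaiting expert review."""
    candidates = []
    for index, item in enumerate(json.loads(raw)):
        label = item["label"]
        external = known_ids.get(label)
        candidates.append({
            "label": label,
            "type": item["type"],
            # OOKB entities receive a local identifier instead of an external one.
            "id": external if external else f"{local_prefix}{index}",
            "ookb": external is None,
            "reviewed": False,
        })
    return candidates


fragment = "a painting of Madonna with Saint John in the wilderness"
prompt = build_prompt(fragment)
# Hand-written stand-in for a model reply; no LLM is called in this sketch.
fake_reply = (
    '[{"label": "Saint John", "type": "person"}, '
    '{"label": "Madonna with Saint John in the wilderness", "type": "artwork"}]'
)
known_ids = {"Saint John": "ext:placeholder-saint-john"}  # hypothetical identifier
candidates = parse_response(fake_reply, known_ids)
```
]]></preformat>
        <p>Keeping the reviewed flag on every candidate mirrors the separate annotation layer: nothing enters the KG as accepted knowledge before a domain expert has checked it.</p>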
        <p>Furthermore, another round of identifying relationships between entities is planned. This process
allows the pipeline to reconstruct relationships between entities beyond just persons, such as artworks
and locations, via relation extraction. To expand the scope of Viewsari, information extracted from the
original Italian version of Vasari’s The Lives will be incorporated.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Ontology Development and Knowledge Graph Construction</title>
        <p>
          We develop the Viewsari ontology following the XD methodology. As an application ontology, it
formally represents semi-automatically extracted content and related provenance, exemplified by the
use case of The Lives. Using XD ensures the correct encoding of the user requirements of the art history
community and provides a standardized approach to modeling and testing the conceptualizations, also
based on how the data basis is shaped and what results can be achieved using the aforementioned
techniques for information extraction [54, 60]. It includes both simple and complex class definitions,
based on the domain needs, e.g., differentiating between co-occurrences between artists, their artwork
production, or interpersonal influence. The Viewsari ontology extends different mid-level and domain
ontologies to ensure interoperability [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]: For provenance representation, the ontology extends PROV-O
with new classes for co-occurrences and the extraction process with LLMs. For representing the different
versions of Vasari’s work, Viewsari extends the Functional Requirements for Bibliographic Records
(FRBR) [61] and FaBiO [62] ontologies. Subsequently, the KG construction transforms the extracted
entities into a graph, using the Viewsari Ontology and Semantic Web standards such as RDF, RDFS,
OWL, and SPARQL.
          <fn id="fn4"><label>4</label><p>https://github.com/ISE-FIZKarlsruhe/vasari_nlp/</p></fn>
          <fn id="fn5"><label>5</label><p>https://www.wikidata.org/wiki/Wikidata:Main_Page</p></fn>
        </p>
        <p>For the construction of the KG, we utilize Robot [63] templates and RML (RDF Mapping Language)<sup>6</sup>
to transform the entities given in tabular form (CSV files) into RDF triples, using the ontology as a
conceptual schema. For all entities, metadata, positional information (provenance), relationships, and
external identifiers (e.g., Wikidata) are systematically integrated into the KG [29, 60].</p>
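        <p>To illustrate the idea of the tabular-to-RDF transformation, the following sketch converts CSV rows into N-Triples lines using only the Python standard library. It is a stand-in for illustration and does not reproduce the Robot templates or RML mappings used in the project. The namespace https://example.org/viewsari/, the column names, and the sample row are assumptions; only the rdfs:label and prov:wasDerivedFrom IRIs are existing vocabulary terms.</p>
        <preformat><![CDATA[
```python
import csv
import io

BASE = "https://example.org/viewsari/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"


def rows_to_ntriples(csv_text):
    """Convert rows with id, label, volume, paragraph columns to N-Triples."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}entity/{row['id']}>"
        # Positional provenance: the paragraph of the volume the entity came from.
        source = f"<{BASE}volume/{row['volume']}/paragraph/{row['paragraph']}>"
        # Escape backslashes and quotes so the literal stays valid N-Triples.
        label = row["label"].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
        lines.append(f"{subject} <{PROV_DERIVED}> {source} .")
    return lines


# Toy row, invented for illustration only.
sample = "id,label,volume,paragraph\np1,Giotto,1,12\n"
triples = rows_to_ntriples(sample)
```
]]></preformat>
        <p>Each input row yields one label triple and one provenance triple; a declarative mapping expresses the same correspondence between columns and properties without custom code.</p>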
        <p>All steps in ontology development and KG construction are iterative, ensuring constant external
validation and evaluation in close cooperation with domain experts.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Preliminary Results and Evaluation</title>
      <p>The current set of results includes: 1) a preliminary ontology to describe the complex content and
contextual references of Vasari’s work, distinguishing between descriptive, interpretive, and
provenance-related aspects, taking into account different levels of content-description, 2) a collection of extracted
entities, plus the full LLM-based pipeline for information extraction, NER, and entity linking, along
with the corresponding prompts, 3) the Viewsari KG as a domain-specific prototype.</p>
      <sec id="sec-5-1">
        <title>5.1. Findings from the Viewsari Pipeline</title>
        <p>
          As previously mentioned, the ontology-based Viewsari KG is the core part of this ongoing Ph.D.
project. The first version is based on a historical social network automatically generated as described in
section 4. Another set of extracted named entities, including places, artworks, historical events, motifs,
etc., complements this for additional contextualization and extension of the KG. To evaluate the first set
of information extraction results, precision, recall, and F1 scores are calculated against the manually
curated and annotated ground truth (as provided by [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). The Universal-NER [50] model achieves an F1
score of 55.606% for artwork detection and 60.242% for subjects (motifs), indicating promising performance
in the core domains of interest, given the noisy and historical data. In the future, art history domain experts
will assess the quality of LLM-generated outputs. Additionally, we aim to use the output coherence and
extractability of structured data to understand which prompts achieve the highest average precision
in identifying entities in art history, so they can be used and tested in other use cases. As shown in
Section 2, projects in the scientific domain such as AI4DiTraRe [28] face a comparable terminological
inconsistency, similar to the hurdles in historical texts and knowledge extraction. These similarities
suggest a transferable strategy for LLM-based extraction in heterogeneous corpora.
        </p>
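        <p>For reference, the evaluation metrics named above can be computed as in the following Python sketch. It assumes exact-match comparison of annotations represented as tuples, which may differ from the matching criterion applied to the ground truth in the project; the sample annotations are hypothetical and do not reproduce the reported scores.</p>
        <preformat>
```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 for two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted.intersection(gold))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def score_by_type(predicted, gold, entity_type):
    """Restrict both annotation sets to one entity type before scoring."""
    return precision_recall_f1(
        [a for a in predicted if a[1] == entity_type],
        [a for a in gold if a[1] == entity_type],
    )


# Hypothetical annotations: (paragraph id, entity type, surface form).
GOLD = {
    ("p1", "artwork", "fresco in the chapel"),
    ("p1", "motif", "Annunciation"),
    ("p2", "artwork", "marble pulpit"),
    ("p2", "motif", "Last Judgement"),
}
PREDICTED = {
    ("p1", "artwork", "fresco in the chapel"),
    ("p1", "motif", "Annunciation"),
    ("p2", "artwork", "marble pulpit"),
    ("p2", "motif", "Nativity"),  # false positive; "Last Judgement" is missed
}

if __name__ == "__main__":
    for entity_type in ("artwork", "motif"):
        print(entity_type, score_by_type(PREDICTED, GOLD, entity_type))
```
        </preformat>
        <p>Scoring each entity type separately, as sketched here, mirrors the per-category reporting for artworks and motifs.</p>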
        <p>
          Following the XD methodology and consulting with domain experts, we could draw different
conclusions about the concepts behind Viewsari: regarding the social network, a simple ontology would
have sufficed to conceptualize persons and their shared relationships. However, complex relationships
between entities, contextual implications of information extracted from the text, and information
provenance for extracted data require a more sophisticated representation. These aspects concern all extracted
resources and their relations since they originate from a specific segment of a specific work in a specific
edition [
          <xref ref-type="bibr" rid="ref2">2, 60</xref>
          ]. This way, it is possible to align extracted knowledge from different versions, making
them comparable on a larger scale, e.g., comparing the English translation with the Italian original
text. Accordingly, the ontology defines the formal and logical structure, covering three conceptual
dimensions of the domain: bibliographic, structural, and content components. The bibliographic layer
describes the work and its different expressions, whereas the structural layer covers
information on the document level. In the third layer, the ontology represents the extracted content
(named entities, co-occurrences). As of now, the concepts in Viewsari are based on or extend ontologies
such as FaBiO [62] and DoCO [64], as well as PROV-O [39] and the Web Annotation Ontology [65].
Extracted entities can be connected to source information (e.g., paragraphs in which a co-occurrence
appears) from the structural layer. This then links back to the appropriate bibliographic components,
allowing provenance information to be traced through detailed positional arguments and information
about the edition or translation used. Figure 1 shows a simplified visualization of the ontology.7
        </p>
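        <p>The chain from content layer to structural layer to bibliographic layer can be sketched in Python as follows. The class names, fields, and character-offset convention are hypothetical simplifications chosen for illustration; they are not the actual Viewsari classes or the terms of FaBiO, DoCO, or PROV-O.</p>
        <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """Bibliographic layer: one edition or translation of a work."""
    work: str
    label: str
    language: str


@dataclass(frozen=True)
class Paragraph:
    """Structural layer: a document component within an expression."""
    identifier: str
    expression: Expression


@dataclass(frozen=True)
class Mention:
    """Content layer: an extracted entity anchored at a text position."""
    entity: str
    paragraph: Paragraph
    start: int  # character offsets inside the paragraph (assumed convention)
    end: int


def trace_provenance(mention):
    """Follow content -> structure -> bibliography and return a readable trail."""
    paragraph = mention.paragraph
    expression = paragraph.expression
    return [
        f"entity: {mention.entity} (chars {mention.start}-{mention.end})",
        f"paragraph: {paragraph.identifier}",
        f"expression: {expression.label} [{expression.language}]",
        f"work: {expression.work}",
    ]


# Hypothetical example data.
english = Expression(work="The Lives", label="English translation", language="en")
mention = Mention(entity="Giotto", paragraph=Paragraph("p12", english), start=10, end=16)

if __name__ == "__main__":
    for step in trace_provenance(mention):
        print(step)
```
        </preformat>
        <p>Because every mention resolves to an expression, mentions of the same entity in two expressions of the same work can be placed side by side for comparison.</p>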
        <p>In line with the XD methodology, we evaluate all steps iteratively and in a user-centered manner. This includes
validation of concepts by domain experts and prototype-based validation (based on the KG). Competency
questions test the ontology’s accuracy in supporting queries for domain experts [32].</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Limitations</title>
        <p>In the current version of Viewsari, the focus is solely on one corpus (the English translation by C. du
Vere [48]) and thus on one language. This limits the application of the pipeline to a narrow view in
terms of knowledge extraction and transferability. For example, the generation of cross-connections
between different versions can only be demonstrated further once a comparison to another use case is
drawn. Additionally, we currently rely almost entirely on prompt-based extraction for implicitly named
entities and lack a properly benchmarked evaluation. Automated alignment
or reconciliation of complex iconographic references, motifs, or entity/artwork descriptions is likewise still missing.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Towards a Generalizable Methodology Based on Viewsari</title>
        <p>
          To overcome the aforementioned limitations, one solution is to broaden the application of the
methodology behind Viewsari and to consolidate various case studies. Furthermore, we address the challenges
through domain expert validation for the ontology and for the results from the knowledge extraction to
create a ground truth. Given the various challenges we address, we aim to make the developed solutions
in Viewsari transferable to other domains. In a first attempt, as shown in prior work, we systematized
reusable ODPs for DH applications [
          <xref ref-type="bibr" rid="ref2">2, 60</xref>
          ]. The goal is to further abstract and operationalize them
domain-independently.
7 For a detailed view of the ontology and alignments (T-box and A-box), please consult the Viewsari GitHub repository:
https://github.com/ISE-FIZKarlsruhe/viewsari.
        </p>
        <p>
          To assess the generalizability of this methodology, we identify three key dimensions where transfer
is possible: 1) Annotation modeling, namely provenance, as used in Viewsari [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Entities (persons,
places, artworks) are modeled with links (annotations) to their textual origin. 2) Support for
multiple interpretations and evolving knowledge, based on the context in which a statement was made, who
made it, etc. This is also relevant to scientific knowledge graphs, where experimental results,
hypotheses, and claims often require multifaceted descriptions. It also includes the representation of
the knowledge extraction process, allowing for more detailed semantic provenance, which applies equally to
process-oriented data, experimental parameters, and measurement provenance, as seen in biomedical
data or materials science. 3) The knowledge extraction pipeline, including a variety of prompts and
experiments with LLMs to see how ambiguous entities and further information can be extracted from
unstructured sources.
        </p>
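        <p>The second dimension, keeping several attributed statements about the same subject, can be illustrated with a small Python sketch. The Claim structure, its field names, and the placeholder data are assumptions for illustration only and are not taken from the Viewsari ontology.</p>
        <preformat>
```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    value: str
    source: str       # where the statement was made (edition, paragraph, dataset)
    asserted_by: str  # who or what produced it (author, annotator, extraction run)


def group_interpretations(claims):
    """Collect all claims about the same (subject, predicate) pair, keeping attribution."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[(claim.subject, claim.predicate)].append(claim)
    return dict(grouped)


def contested(grouped):
    """Return the (subject, predicate) pairs whose claims disagree on the value."""
    return [
        key
        for key, items in grouped.items()
        if len({claim.value for claim in items}) != 1
    ]


# Placeholder data: two sources disagree on an attribution, one statement is undisputed.
CLAIMS = [
    Claim("artwork_1", "attributedTo", "artist_A", "edition_1/p40", "source_author"),
    Claim("artwork_1", "attributedTo", "artist_B", "catalogue_X", "curator"),
    Claim("artwork_1", "locatedIn", "place_1", "edition_1/p40", "source_author"),
]

if __name__ == "__main__":
    for subject, predicate in contested(group_interpretations(CLAIMS)):
        print("conflicting statements for", subject, predicate)
```
        </preformat>
        <p>Keeping both statements with their sources, rather than overwriting one with the other, is what allows interpretations to coexist and to be revisited as knowledge evolves.</p>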
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Works</title>
      <p>Integrating semantic technologies into the complex research questions of the DH community
has revolutionized the way knowledge can be represented, linked, and shared; this domain, however, is only
one example out of many comparable ones. To go one step further and improve not
only the quality of the resulting ontologies but also the whole process, we aim with this project to
demonstrate how to implement a methodology that leads from unstructured textual sources to formalized
knowledge representations, independent of the interdisciplinary nature of research data.</p>
      <p>
        Viewsari addresses key challenges in DH and beyond, such as inconsistent modeling practices or the
difficulty of representing interpretively rich, multilingual data [
        <xref ref-type="bibr" rid="ref1 ref3 ref4">1, 3, 4</xref>
        ].
      </p>
      <p>Central to this project is the knowledge extraction from Vasari’s text, transforming the unstructured
biographies into structured resources, based on the schema provided by the developed Viewsari ontology.
This enables a large-scale analysis of the corpus while retaining the provenance information of extracted
entities and their relations down to the paragraph level. Additionally, the KG design bridges knowledge
gaps by providing a representation level for OOKB entities, their emergence, and document-level
provenance, highlighting the interpretive nature of historical data, where not all relevant knowledge
has yet been curated or indexed. This also includes the recognition of entities like artworks and motifs
from their description rather than by named references.</p>
      <p>The challenges addressed in Viewsari, however, affect various domains aside from DH research. Across
domains such as biomedicine, musicology, or social network research, ambiguous textual references,
the extraction of structured knowledge with LLMs, provenance modeling, and the modeling of interpretation
arise alike. Thus, the goal of Viewsari is to offer a generalizable framework that can readily be
adapted to other approaches.</p>
      <p>Next steps involve broadening the Viewsari methodology by integrating more case studies. In doing so,
we aim to support the transfer of our approach to other domains by systematizing reusable patterns and
abstracting them for wider application. This also concerns a generalization of the knowledge extraction
process to fit other domains, such as scientific knowledge representation.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>I would like to thank Prof. Dr. Harald Sack for his invaluable support and supervision. I also want to
thank Tabea Tietz for her input and moral support during all project steps.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author has employed DeepL and Grammarly, as well as Writefull, for grammar and spell checking.
The author used the GPT-4-turbo model for citation management and for drafting an outline of the structure.
The author reviewed and edited the content as needed and takes full responsibility for the publication’s
content.
      </p>
      <p>Linguistics Compass 4 (2010) 750–768. doi:10.1111/j.1749-818X.2010.00230.x, publisher: John Wiley &amp; Sons, Ltd.</p>
      <p>[25] F. De Toni, et al., Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0, in: A. Fan, et al. (Eds.), Proceedings of BigScience Episode #5 – Workshop on Challenges &amp; Perspectives in Creating Large Language Models, Association for Computational Linguistics, virtual+Dublin, 2022, pp. 75–83. doi:10.18653/v1/2022.bigscience-1.7.</p>
      <p>[26] A. Graciotti, Knowledge Extraction from Multilingual and Historical Texts for Advanced Question Answering, in: Proceedings of ISWC 2023, ISWC 2023 Doctoral Consortium, CEUR Workshop Proceedings (CEUR-WS.org), Athens, Greece, 2023.</p>
      <p>[27] A. M. Brasoveanu, et al., In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works, in: R. Fernández, T. Linzen (Eds.), Proceedings of the 24th Conference on Computational Natural Language Learning, Association for Computational Linguistics, Online, 2020, pp. 355–364. doi:10.18653/v1/2020.conll-1.28.</p>
      <p>[28] A. Jacyszyn, et al., AI4DiTraRe: Towards LLM-Based Information Extraction for Standardising Climate Research Repositories, in: First AAAI Bridge on Artificial Intelligence for Scholarly Communication AI4SC, Philadelphia, United States, 2025. doi:10.5281/zenodo.14872358.</p>
      <p>[29] S. Barzaghi, et al., CHAD-KG: A Knowledge Graph for Representing Cultural Heritage Objects and Digitisation Paradata, 2025. URL: https://arxiv.org/abs/2505.13276.</p>
      <p>[30] A. Gangemi, V. Presutti, Ontology Design Patterns, in: S. Staab, R. Studer (Eds.), Handbook on Ontologies, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 221–243. doi:10.1007/978-3-540-92673-3_10.</p>
      <p>[31] V. Presutti, et al., eXtreme design with content ontology design patterns, in: Proceedings of the 2009 International Conference on Ontology Patterns - Volume 516, WOP’09, CEUR-WS.org, Aachen, DEU, 2009, pp. 83–97. Event-place: Washington DC.</p>
      <p>[32] E. Blomqvist, et al., Engineering Ontologies with Patterns - The eXtreme Design Methodology, in: Ontology Engineering with Ontology Design Patterns - Foundations and Applications, 2016, pp. 23–50. doi:10.3233/978-1-61499-676-7-23.</p>
      <p>[33] V. Carriero, et al., ArCo: The Italian Cultural Heritage Knowledge Graph, 2019, pp. 36–52. doi:10.1007/978-3-030-30796-7_3.</p>
      <p>[34] Y. Tzitzikas, et al., CIDOC-CRM and Machine Learning: A Survey and Future Research, Heritage 5 (2022) 1612–1636. doi:10.3390/heritage5030084.</p>
      <p>[35] A. Ahola, L. Peura, H. Rantala, Using generative AI and LLMs to enrich art collection metadata for searching, browsing, and studying art history in Digital Humanities, 2024.</p>
      <p>[36] N. Ockeloen, et al., BiographyNet: managing provenance at multiple levels and from different perspectives, in: Proceedings of the 3rd International Conference on Linked Science - Volume 1116, LISC’13, CEUR-WS.org, 2013, pp. 59–71.</p>
      <p>[37] M. Kienle, Between Nodes and Edges: Possibilities and Limits of Network Analysis in Art History, Artl@s Bulletin 6 (2017).</p>
      <p>[38] P. Ciccarese, et al., PAV ontology: provenance, authoring and versioning, Journal of Biomedical Semantics 4 (2013) 37. doi:10.1186/2041-1480-4-37.</p>
      <p>[39] T. Lebo, et al., PROV-O: The PROV ontology, Technical Report, World Wide Web Consortium, 2013.</p>
      <p>[40] S. S. Sahoo, et al., Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description, in: AMIA Annual Symposium Proceedings, volume 2016, American Medical Informatics Association, 2016, pp. 1070–1079.</p>
      <p>[41] T. Procko, O. Ochoa, Mapping the W3C Provenance Ontology (PROV-O) to the Basic Formal Ontology (BFO): Epistemological Considerations and Preliminary Implementation, Social Science Research Network (2024). URL: http://dx.doi.org/10.2139/ssrn.4852748.</p>
      <p>[42] B. Smith, et al., Basic Formal Ontology (BFO) Version 2020-08-26, 2020. URL: http://purl.obolibrary.org/obo/bfo.owl.</p>
      <p>[43] B. Bayerlein, et al., PMD Core Ontology: Achieving semantic interoperability in materials science, Materials &amp; Design 237 (2024) 112603. doi:10.1016/j.matdes.2023.112603.</p>
      <p>[64] A. Constantin, et al., The Document Components Ontology (DoCO), Semantic Web 7 (2016) 167–181. doi:10.3233/SW-150177.</p>
      <p>[65] P. Ciccarese, et al., An open annotation ontology for science on web 3.0, Journal of Biomedical Semantics 2 (2011) S4. doi:10.1186/2041-1480-2-S2-S4.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Meroño-Peñuela</surname>
          </string-name>
          , et al.,
          <article-title>Ontologies in CLARIAH: Towards Interoperability in History, Language and Media</article-title>
          , CoRR abs/2004.02845 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2004.02845.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Ondraszek</surname>
          </string-name>
          , et al.,
          <article-title>One Pattern to Express Them All? Towards Generalised Patterns for Ontology Design in the Digital Humanities</article-title>
          ,
          in:
          <source>Proceedings of the 15th Workshop on Ontology Design and Patterns (WOP 2024) at ISWC 2024</source>
          ,
          <year>2024</year>
          . URL: https://tinyurl.com/mrxr7wnt.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Carriero</surname>
          </string-name>
          , et al.,
          <article-title>The Landscape of Ontology Reuse Approaches</article-title>
          , in:
          <source>Applications and Practices in Ontology Design, Extraction, and Reasoning</source>
          , IOS Press,
          <year>2020</year>
          . doi:10.3233/ssw200033.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shimizu</surname>
          </string-name>
          , et al.,
          <article-title>Modular ontology modeling</article-title>
          ,
          <source>Semantic Web</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>459</fpage>
          -
          <lpage>489</lpage>
          . doi:10.3233/SW-222886.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Vasari</surname>
          </string-name>
          ,
          <source>Le vite de' più eccellenti pittori, scultori e architettori</source>
          , 2 ed., Giunti, Florence, Italy,
          <year>1568</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pon</surname>
          </string-name>
          , Rewriting Vasari, in: The Ashgate Research Companion to Giorgio Vasari, Ashgate,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <article-title>Vasari's Renaissance and Its Renaissance Alternatives</article-title>
          , in: Renaissance Theory, Routledge,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          , Distant Reading, Verso Books,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Santini</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge Extraction for Art History: the Case of Vasari's The Lives of The Artists (1568)</article-title>
          ,
          in:
          <source>Proceedings of the Third Conference on Digital Curation Technologies (Qurator 2022)</source>
          ,
          <year>2022</year>
          . doi:10.34657/10668.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Meroño-Peñuela</surname>
          </string-name>
          , et al.,
          <article-title>Semantic technologies for historical research: A survey</article-title>
          ,
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <year>2014</year>
          )
          <fpage>539</fpage>
          -
          <lpage>564</lpage>
          . doi:10.3233/SW-140158.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jänicke</surname>
          </string-name>
          , et al.,
          <source>On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges</source>
          , The Eurographics Association,
          <year>2015</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>103</lpage>
          . doi:10.2312/eurovisstar.20151113.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrmann</surname>
          </string-name>
          , et al.,
          <article-title>Named Entity Recognition and Classification in Historical Documents: A Survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
          . doi:10.1145/3604931, publisher: Association for Computing Machinery (ACM).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sartini</surname>
          </string-name>
          , et al.,
          <article-title>ICON: An Ontology for Comprehensive Artistic Interpretations</article-title>
          ,
          <source>J. Comput. Cult. Herit.</source>
          <volume>16</volume>
          (
          <year>2023</year>
          ). doi:10.1145/3594724.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Panofsky</surname>
          </string-name>
          ,
          <article-title>Studies in Iconology: Humanistic Themes in the Art of the Renaissance</article-title>
          , Oxford University Press, New York,
          <year>1939</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hamaguchi</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge Base Completion with Out-of-Knowledge-Base Entities: A Graph Neural Network Approach</article-title>
          ,
          <source>Transactions of the Japanese Society for Artificial Intelligence</source>
          <volume>33</volume>
          (
          <year>2018</year>
          ). doi:10.1527/tjsai.f-h72, publisher: Japanese Society for Artificial Intelligence.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Peeters</surname>
          </string-name>
          , et al.,
          <source>Entity Matching using Large Language Models</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2310.11244.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Stork</surname>
          </string-name>
          ,
          <article-title>How AI is expanding art history</article-title>
          ,
          <source>Nature</source>
          <volume>623</volume>
          (
          <year>2023</year>
          )
          <fpage>685</fpage>
          -
          <lpage>687</lpage>
          . doi:10.1038/d41586-023-03604-3, publisher: Nature Publishing Group.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Steiner</surname>
          </string-name>
          , et al.,
          <source>Fine-tuning Large Language Models for Entity Matching</source>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2409.08185.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <article-title>Digital humanities on the Semantic Web: Sampo model and portal series</article-title>
          ,
          <source>Semantic Web</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>729</fpage>
          -
          <lpage>744</lpage>
          . doi:10.3233/SW-223034, publisher: IOS Press.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , et al.,
          <article-title>Large language models for scholarly ontology generation: An extensive analysis in the engineering field</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>63</volume>
          (
          <year>2025</year>
          ) 104262. doi:10.1016/j.ipm.2025.104262.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          , et al.,
          <article-title>BiographySampo - Publishing and Enriching Biographies on the Semantic Web for Digital Humanities Research</article-title>
          , in: The Semantic Web, Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>574</fpage>
          -
          <lpage>589</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nadeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sekine</surname>
          </string-name>
          ,
          <article-title>A survey of named entity recognition and classification</article-title>
          ,
          <source>Lingvisticae Investigationes</source>
          <volume>30</volume>
          (
          <year>2007</year>
          )
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          . doi:10.1075/li.30.1.03nad.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Piotrowski</surname>
          </string-name>
          ,
          <source>Natural Language Processing for Historical Texts</source>
          , Springer International Publishing, Cham,
          <year>2012</year>
          . doi:10.1007/978-3-031-02146-6.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sporleder</surname>
          </string-name>
          ,
          <source>Natural Language Processing for Cultural Heritage Domains</source>
          , Language and
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>