<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Italian Conference on Computational Logic, June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Unlocking Historical Insights: Developing a Dataset from Historical Archives</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laura Pandolfo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Pulina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DUMAS, University of Sassari</institution>
          ,
          <addr-line>via Roma 151, Sassari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>2</volume>
      <fpage>1</fpage>
      <lpage>23</lpage>
      <abstract>
<p>The proliferation of data on the Web has resulted in an increased need for effective techniques to extract relevant and valuable knowledge from this data. The intersection of the fields of Information Extraction and Semantic Web has created new opportunities to improve ontology-based information extraction tools. However, the development and evaluation of such systems have been hampered by the scarcity of annotated documents, particularly in historical domains. This article discusses the current state of our work in creating a large RDF dataset that aims to support the development of ontology-based extraction tools. The dataset was created through manual annotation by domain experts as part of the arkivo project and contains approximately 300,000 triples, which are freely available. This dataset can be used as a benchmark to evaluate systems that automatically extract entities and annotate documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>Linked Open Data</kwd>
        <kwd>Cultural Heritage</kwd>
        <kwd>Ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Historical texts are a crucial resource for scholars in the field of Digital Humanities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. They
provide a unique perspective on the social, cultural, and political contexts of the past, offering
valuable insights into the evolution of human thought and behavior. By examining them,
researchers can gain a better understanding of how people interacted with each other during
different historical periods [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>The digitization of historical texts has made them more accessible than ever before. Often
these collections are preserved in digital libraries and digital archives that allow researchers to
search and analyze materials across different time periods and geographic regions, enabling them to
identify patterns and trends that would have been difficult to discern using traditional methods.
Indeed, by leveraging digital tools and methods, it is possible to explore historical texts and gain
new insights into the complex history of humanity. While the mere digitization of historical
texts has now become an obvious and common practice, using computational techniques to
automatically analyze them is a complex and challenging process. Historical texts, in fact,
present unique challenges that must be addressed in order to automatically extract meaningful
and accurate information. In the following, we report some of the main issues when working
with this kind of document:
1. Quality. It is not uncommon for historical texts to be damaged, discolored,
or written in non-standard fonts. This poses a challenge for OCR (Optical Character
Recognition) technology, which may fail to interpret some characters, resulting in errors
that are not easy to rectify.
2. Variability. These texts may use archaic language and obsolete spelling and syntax,
which can make it difficult to analyze them automatically. In this regard, the lack of
models trained on outdated language varieties is keenly felt.
3. Inconsistency. Historical texts often contain inconsistencies and errors that can hinder
the process of extracting accurate key information.
4. Lack of standardization. They often use different conventions for formatting, citation,
and referencing, making it difficult to compare and analyze texts from different sources.
5. High expertise. Historical texts regularly contain cultural and historical references that
may not be easily understood by modern-day readers, thus requiring high expertise for
a proper understanding of the text. For example, historical texts may reference specific
events, customs, or practices that are no longer relevant or well-known in modern times.
To cope with all these issues, specialized domain experts should be involved in the manual
annotation process in order to extract relevant data and make it accessible in a structured way.
However, it is clear that manual annotation cannot be an affordable solution, since it is
a time-consuming and expensive task.</p>
      <p>
        For many years, research in the fields of Text Mining [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Information Extraction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
Natural Language Processing [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has focused on developing techniques to automatically
extract structured information from historical documents with high precision. However, despite
significant progress in these fields, computers are still far from achieving a complete
semantic understanding of human language [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Methods to automatically extract
information have also been a core topic in the context of the Semantic Web, where information
extraction techniques are especially useful to populate semantic knowledge bases. On the
other hand, Semantic Web resources, such as ontologies, languages, data, and tools, have been used
to guide and improve the information extraction process [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In particular, the application of
ontologies has proven to be beneficial in the field of information extraction, as they provide a
formal and explicit definition of domain concepts. This has led to the emergence of
Ontology-Based Information Extraction as a sub-discipline of knowledge extraction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Ontology-based
techniques are used to enhance the performance of systems by guiding algorithms toward efficient
and relevant information extraction [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Additionally, the use of formal ontologies enables
standard inference engines to reason over extracted entities, thus allowing the inference of additional
information that may not be explicitly stated in the original text [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Promising developments in the field of Machine Learning [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for historical texts have been
made. Recent studies demonstrated how Neural Networks (NNs) can be effective in processing
historical texts, since they effectively support many NLP tasks that are relevant for Digital
Humanities research, such as Named Entity Recognition (NER), Entity Linking (EL), Relation
Extraction (RE) and others – see, e.g. [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ]. Nonetheless, the use of NNs seems to be
rather limited, for different reasons. Firstly, historical texts often lack consistent annotations for
named entities such as people, places, and organizations. Currently, the amount of annotated
corpora available to train models is very scarce, making it difficult to train models that can
accurately recognize and extract entities from historical texts. Developing these labeled
corpora is not a straightforward task, as it requires a great amount of resources in terms of time,
budget, and expertise. Secondly, the lack of off-the-shelf tools represents a potential hindrance
for scholars in the digital humanities, who could provide a valuable contribution in this field.
      </p>
      <p>In this paper, we present the arkivo dataset, which was built from archival historical
documents previously annotated manually by domain experts. Currently, the dataset contains
around 300,000 triples and is freely available for use. In addition to its intrinsic value as a historical
artifact, this dataset could also be of great importance for the development of information
extraction tools and methods, serving as a benchmark to evaluate systems that automatically annotate
entities in unstructured documents, such as places, persons, and organizations. In fact, one of
the critical aspects in the development of this type of system is the evaluation phase, which
requires a ground truth, i.e. a dataset with all the relevant findings in the documents. Usually,
the output of these tools is assessed by comparing it to the reference annotation in order
to compute standard quality metrics, such as recall and precision. This dataset could also
serve as a rich source of training data for machine learning algorithms, allowing
researchers to create more accurate and efficient natural language processing systems. In this
respect, historical data provides a valuable testbed for evaluating the effectiveness of these
models, as it presents unique challenges that must be overcome in order to extract meaningful
information. Additionally, the ontology schema of the arkivo dataset is based on the OWL 2
DL profile, which makes it suitable for ontology benchmarking purposes, as there is a shortage
of expressive ontologies and language element combinations available. The dataset was created
as part of the larger arkivo project.</p>
      <p>The rest of the paper is organized as follows. Section 2 describes the research baseline for
this work, including background on ontology and Linked Data, and it discusses some related
works. Section 3 presents the source datasets and the ontology model, while Section 4 describes
the arkivo dataset and its usefulness. Section 5 concludes and presents planned future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        In the field of philosophy, ontology is often understood as the study of the nature of being,
including the nature of entities or substances, their properties, and their relations. In recent
years, the term ontology has also been used in other fields, including computer science, where
it refers to the study of conceptual models of a particular domain, and in information science,
where it refers to the study of the nature of information and knowledge. In the field of computer
science, an ontology is commonly defined as a formal and explicit specification of the concepts,
entities, and relationships within a particular domain [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The goal of an ontology is to create
a shared understanding of a particular domain among people and machines, enabling effective
communication and knowledge sharing. There are different formal languages and frameworks
used to develop ontologies in computer science, such as the Resource Description Framework
Schema (RDFS) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and the Web Ontology Language OWL [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. These languages provide a way
of representing knowledge in a machine-readable format, thus enabling automated reasoning
and inference.
      </p>
      <p>
        The most recent version of OWL is OWL 2 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which tackles the issue of complexity by
establishing profiles, namely fragments. In particular, OWL 2 has the following profiles: OWL 2
EL, OWL 2 QL, OWL 2 RL, OWL 2 DL, and OWL 2 Full. Each profile varies in terms of its
expressivity and reasoning complexity. The first three profiles (OWL 2 EL, OWL 2 QL, OWL
2 RL) are tractable fragments of OWL 2, with polynomial reasoning time. Reasoning over
OWL 2 DL ontologies has a complexity of N2EXPTIME, while OWL 2 Full is undecidable. OWL
2 DL, the version of OWL we focus on, is based on Description Logics (DL) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], a group of
formal languages for knowledge representation that model concepts, roles, individuals, and
their relationships. In DL, a database is called a knowledge base. In particular, if 𝒦 = ⟨𝒯, 𝒜⟩ is
a knowledge base, then the TBox 𝒯 is a set of inclusion assertions, i.e. concept descriptions in
𝒜ℒ𝒞 or some of its extensions, whereas the ABox 𝒜 is a set of membership assertions of the form
C(a) and R(a, b), where C is some atomic concept, R is some atomic role, and a, b are objects
of a domain. Some OWL 2 constructors with the corresponding DL syntax are listed in Table 1.
      </p>
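As a rough illustration of the TBox/ABox distinction just described, the following sketch (pure Python, with invented class and individual names, not the arkivo implementation) stores membership assertions C(a) and propagates TBox inclusions C ⊑ D to a fixpoint:

```python
# Toy DL knowledge base K = <TBox, ABox>. All names here are illustrative.
tbox = {("Collection", "CreativeThing")}   # inclusion assertion: Collection ⊑ CreativeThing
abox = {("Collection", "A701.111.003")}    # membership assertion C(a)

def saturate(abox, tbox):
    """Apply every inclusion C ⊑ D: whenever C(a) holds, add D(a), to a fixpoint."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox:
            for concept, individual in list(facts):
                if concept == sub and (sup, individual) not in facts:
                    facts.add((sup, individual))
                    changed = True
    return facts

facts = saturate(abox, tbox)   # now also contains CreativeThing(A701.111.003)
```

Real OWL 2 reasoners implement far richer calculi; this only mirrors the shape of the assertions.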
      <p>
        The standard RDF query language is SPARQL [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], whose latest version, namely SPARQL 1.1,
includes many new language features such as aggregates, sub-queries, a new suite of built-in
functions, and path expressions. SPARQL queries typically consist of various clauses and blocks,
which specify basic graph patterns to be matched along with keywords that join, filter and
extend the solution sequences to these patterns.
      </p>
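At its core, evaluating a basic graph pattern means matching triple patterns containing variables against the data. The following minimal sketch (illustrative Python over an in-memory triple set, not how SPARQL engines are actually implemented) shows the idea; prefixes and data are invented:

```python
# Toy RDF graph as a set of (subject, predicate, object) triples.
triples = {
    ("ex:item1", "ex:mentions", "ex:Roosevelt"),
    ("ex:item1", "ex:isPartOf", "ex:fileA"),
    ("ex:item2", "ex:mentions", "ex:Roosevelt"),
}

def match(pattern, triples):
    """Yield variable bindings for one triple pattern; variables start with '?'."""
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):          # variable: bind, or check prior binding
                if binding.get(p, t) != t:
                    break
                binding[p] = t
            elif p != t:                   # constant: must match exactly
                break
        else:
            yield binding

# Like: SELECT ?item WHERE { ?item ex:mentions ex:Roosevelt }
items = sorted(b["?item"] for b in match(("?item", "ex:mentions", "ex:Roosevelt"), triples))
```

Joining several such patterns and applying FILTER and the other keywords on top of these bindings gives the solution sequences mentioned above.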
      <p>
        For a more detailed description of SPARQL syntax and its operators, please refer to [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>2.1. Linked Data</title>
        <p>
          The term Linked Open Data (LOD) refers to tools or platforms that allow for the collection and
integration of data from various sources or formats, which can be accessed by both machines
and humans [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. These tools typically enable users to search for information using predefined
languages or mechanisms (e.g. SQL, HTML, SPARQL, DL, RDFS, etc.). Linked Open Data is
technically defined as a knowledge graph that is represented as a semantic web or schema using
ontologies to interconnect data. The goal of LOD is to establish a worldwide data environment
that can be accessed, shared, and reused by anyone, from any location, and for any objective.
        </p>
        <p>
          The state-of-the-art of LOD is constantly evolving, driven by advances in technologies,
standards, and best practices [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ]. In particular, it is characterized by a growing emphasis
on interoperability, quality, and reuse, as well as a focus on developing new applications and
use cases that can leverage the vast amount of interconnected data available on the Web. In
recent years, the number of datasets in the LOD cloud 1 has grown steadily, from a few
hundred in 2007 to over 1,300 as of 2021.
        </p>
        <p>
          As LOD datasets become more widely used, there is a growing emphasis on ensuring their
quality and provenance. This includes efforts to develop standards and best practices for specific
application domains concerning data cleaning and enrichment, as well as mechanisms for tracking
the source and history of data [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. In the cultural heritage domain, for example, several
studies have been conducted in order to analyze the state of application of linked data and to
develop specific actions and recommendations to be implemented to improve its effective
usage [
          <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
          ].
        </p>
        <p>
          In recent years, the use of knowledge graphs has been considered relevant for representing
and linking data in a structured and semantic way [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. In fact, by using knowledge graphs,
it is possible to connect different data sources and extract meaningful insights from large and
complex datasets. Currently, many LOD applications are using knowledge graphs to provide
more sophisticated search, recommendation, and analysis capabilities.
        </p>
        <p>
          The integration of AI and machine learning has also been investigated in the LOD field [
          <xref ref-type="bibr" rid="ref29 ref30">29,
30</xref>
          ]. For example, linked data is increasingly being used as a source of training and validation
data for AI and machine learning applications.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Related Work</title>
        <p>Cultural heritage data is an important resource for research, study, and analysis in different
fields, including history, social sciences, and the humanities in general. In the last decade, several
historical and cultural heritage datasets have been published in the LOD cloud, covering a wide
range of topics and formats. Very often these datasets have been built as part of the development
of semantic digital libraries, as for example Europeana 2, a platform that provides
access to millions of digitized cultural heritage items from museums, archives, and libraries
across Europe. Its datasets can be downloaded for free, and it also offers a range of APIs. Data is
represented in the Europeana Data Model (EDM), which re-uses some of the reference ontologies
already available (e.g. CIDOC-CRM 3, SKOS 4, FOAF 5, Dublin Core 6). It enables interoperability
without affecting the source data models, and provides queries over linked metadata from multiple
European institutions.</p>
        <p>The Smithsonian Institution 7 offers more than 3 million digital images and assets through
their open access platform, allowing users to explore and download high-resolution images of
their collection items. More than 230,000 museum object records from across the 14 institutions
have been converted to LOD and are available to be explored through one interface.
1https://lod-cloud.net/
2https://pro.europeana.eu/data
3https://www.cidoc-crm.org/
4https://www.w3.org/TR/2009/REC-skos-reference-20090818/
5http://xmlns.com/foaf/0.1/
6https://www.dublincore.org/
7https://www.si.edu/openaccess</p>
        <p>Open Heritage 3D 8 is a project that aims to digitize and make available 3D models of cultural
heritage sites and artifacts. Their datasets can be downloaded for free and used for research,
education, and preservation purposes.</p>
        <p>The British Library 9 published some of its collections as LOD by using existing RDF
vocabularies and ontologies, including BIBO, BIO, Dublin Core, FOAF, GeoNames, Schema.org and
SKOS. Over the years, the British Library extended its services, and it currently provides access
to over 150 digital collections, some of which can be queried via the SPARQL endpoint, such
as the British National Bibliography.</p>
        <p>WarSampo 10 is a shared semantic infrastructure and a LOD service for publishing data about
WW2, with a focus on Finnish military history, which contains over 14 million triples. It uses
some existing vocabularies such as CIDOC-CRM, Dublin Core and SKOS.</p>
        <p>Enslaved 11 aims to create a LOD portal that allows users to easily query and inspect integrated
historical data related to the slave trade. To effectively combine the vast array of diverse data
sources commonly found in historical research communities, an ontology schema has been
developed using the OWL 2 DL profile.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Materials &amp; Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Source Dataset</title>
        <p>The source dataset of arkivo derives from the Józef Piłsudski Institute of America (JPIA), which
is a non-profit organization dedicated to preserving and promoting the legacy of Józef Piłsudski,
a prominent Polish statesman and military leader who played a key role in the history of Poland
during the early 20th century. The Institute was founded in 1943 by a group of Polish-American
intellectuals and scholars who sought to honor Piłsudski’s contributions to Polish independence
and sovereignty.</p>
        <p>Located in New York City, the JPIA is home to an extensive collection of archives, manuscripts,
photographs, and artifacts related to Piłsudski’s life and career, as well as the history of Poland
and the Polish-American community. Its collections and resources are open to the public, and it
welcomes visitors and researchers from all over the world. Most of the archival documents are
written in Polish, but the number of documents in other languages – including Italian, English,
Russian, French, Portuguese – is significant. Table 2 reports all the heterogeneous sources of data
of the arkivo dataset. The source data come in different formats, such as PDF documents, printed
texts, letters, photographs, videos, digital images, and spreadsheets.</p>
        <p>
          In the last five years, the JPIA's collections of historical materials have been annotated,
digitized, full-text indexed, and gradually put online on the website of the Institute: the archival
collections are available at http://archiwa.pilsudski.org/index.php. The manual annotation
process of the archival collections has been carried out in the following stages.
1. Archive workers from the JPIA annotated documents manually with relevant entities – such
as title, author, date of creation, mentioned persons and/or events – and reported the
annotations in Excel spreadsheets.
2. All annotations have been rigorously inspected by domain experts in order to assess their
accuracy. The annotation of historical texts is a specialized activity that requires not only
knowledge related to the cultural context, but also linguistic knowledge related to the
evolution of language. One of the additional difficulties in this work is due to the fact
that the considered dataset is multilingual.
3. Using OpenRefine [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], the Excel tables have been transformed in order to clean and reshape
the data as our reference data model required. The data has then been converted into CSV format.
4. Using Tarql 12, which requires a practical knowledge of SPARQL, we mapped the CSV data
into RDF format.
5. The RDF data has been stored on Stardog [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], a commercial RDF triple store with fast SPARQL
query, transactions, and OWL reasoning support.
8https://artsandculture.google.com/project/openheritage
9https://www.bl.uk/
10https://www.ldf.fi/dataset/warsa
11https://enslaved.org/
        </p>
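Step 4 above uses Tarql; as a rough pure-Python analogue of such a CSV-to-RDF mapping (the column names, the namespace, and the Dublin Core predicates are chosen here for illustration and are not the actual arkivo mapping):

```python
import csv
import io

# Tiny stand-in for an annotation spreadsheet exported to CSV.
csv_data = io.StringIO(
    "id,title,year\n"
    "701.180/6216,Letter to the Embassy,1942\n"
)

def rows_to_ntriples(fh):
    """Turn each CSV row into N-Triples lines, one per mapped column."""
    base = "http://example.org/arkivo/"   # hypothetical namespace
    lines = []
    for row in csv.DictReader(fh):
        subject = f"<{base}{row['id']}>"
        lines.append(f'{subject} <http://purl.org/dc/terms/title> "{row["title"]}" .')
        lines.append(f'{subject} <http://purl.org/dc/terms/date> "{row["year"]}" .')
    return lines

triples = rows_to_ntriples(csv_data)
```

Tarql expresses the same mapping declaratively as a SPARQL CONSTRUCT query over the CSV columns, which is why step 4 requires practical knowledge of SPARQL.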
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Ontology Data Model</title>
        <p>
          The source datasets reported in Table 2 were harmonized and transformed into RDF format in
order to populate the arkivo ontology. This ontology was developed to provide a
common reference schema able to represent not only the hierarchical structure of archival
documents, but also some essential data embedded within the textual content of these documents.
In fact, it captures the standard levels of archival structure, ranging from the highest level of
a collection, which can consist of other collections or individual items, down to the smallest
indivisible unit, or single item. Additionally, the ontology models certain historical
elements referenced in archival documents, and serves as a reference schema for publishing
them as LOD. The arkivo ontology was developed using a top-down approach, which involves
identifying the most general concepts in the domain before proceeding to the more specific
ones. This methodology is closely aligned with the approach outlined in [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] and enables the
development of simple, modular, and reusable ontologies that can easily adapt to future changes
and expansions.
12"SPARQL for Tables: Turn CSV into RDF using SPARQL syntax", https://tarql.github.io/examples/, accessed:
2023-03-03.
        </p>
        <p>
          The ontology was developed using the OWL 2 DL profile, which provides support for
constructs such as universal quantification, inverse object properties, and disjunctions. This
language was chosen because it allows domain experts to encode the knowledge deemed important
for the ontology. Additionally, OWL 2 DL allows for reasoning over the ontology to ensure
consistency [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. The current version of the ontology is composed of 46 classes, 26 object
properties, 34 data properties, and 280,282 axioms. In the following, we pinpoint some of the
main classes and properties of the arkivo ontology.
        </p>
        <p>
          During the ontology development, some parts of existing ontologies were reused for the
purposes of this study, since it is widely acknowledged that promoting the integration and reuse
of existing standard metadata and vocabularies is one of the best practices in the Semantic
Web field. This approach can accelerate the ontology design process and ensure extensibility
and interoperability with other resources and applications [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. Table 3 lists the vocabularies
employed as core ontologies of arkivo.
        </p>
        <p>The main classes of arkivo are Collection, which represents a set of documents or
collections, and Item, which is the smallest indivisible unit of an archive. In order to
describe the structure of the archive, different subclasses of the class Collection are modeled,
namely Fonds, File and Series. Using an existential quantification property restriction
(owl:someValuesFrom), we defined the class Item as the class of individuals that are
linked to individuals in the class Fonds by the isPartOf property, as shown below using the
DL syntax.</p>
        <p>Item ⊑ ∃isPartOf.Fonds</p>
        <p>This means that every instance of Item is expected to be part of a collection, and
that collection is a member of the class Fonds. This is useful to capture incomplete knowledge.
For example, if we know that the individual 701.180/11884 is an item, we can infer that it is
part of at least one collection.</p>
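The kind of inference such an existential restriction licenses can be sketched as follows (illustrative Python, not an OWL reasoner; the blank-node naming scheme is an assumption made here for the example):

```python
# If Item(x) holds and no parent is asserted, the restriction guarantees that
# SOME Fonds containing x exists, which we represent with an anonymous node.
def implied_fonds(individual, types, part_of):
    """Return the Fonds `individual` is part of, if stated; otherwise, if it is
    an Item, an anonymous individual standing for the Fonds that must exist."""
    if individual in part_of:
        return part_of[individual]                # explicitly stated parent
    if "Item" in types.get(individual, set()):
        return f"_:fonds_of_{individual}"         # existential: some Fonds exists
    return None

types = {"701.180/11884": {"Item"}}
parent = implied_fonds("701.180/11884", types, {})
```

Here the reasoner-style conclusion is exactly the one described above: the item is part of at least one collection, even though no particular collection is named.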
        <p>Moreover, we defined some unions of classes for those classes that perform a specific function
in the ontology. In this case, we used the owl:unionOf constructor to combine atomic classes into
complex classes, as we describe in the following:
CreativeThing ≡ Collection ⊔ HistoricalEvent ⊔ Item</p>
        <p>This class denotes things created by agents and it includes individuals that are contained in
at least one of the classes Collection, HistoricalEvent or Item.</p>
        <p>MentionedThing ≡ Place ⊔ Date ⊔ Agent</p>
        <p>This class refers to things, such as dates, places and agents, that are related to individuals in the
class CreativeThing by the object property isMentionedIn, and it includes individuals that
belong to at least one of the classes Place, Date or Agent.</p>
        <p>
          The full ontology documentation is available at https://github.com/arkivoTeam/arkivo, and
the .owl file is available under a Creative Commons CC BY 4.0 license. The latest version builds
upon and extends some previous contributions, i.e. [
          <xref ref-type="bibr" rid="ref36 ref37 ref38 ref39">36, 37, 38, 39</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>The arkivo ontology was used to describe 12,848 collections and 28,644 items of the archival holdings
of the JPIA. Taking advantage of the reference schema provided by the ontology for publishing
LOD, an integration process of data coming from different sources was carried out. This allows
us to link the resources of the Piłsudski Archival Collections to external datasets of the LOD cloud
in order to enrich the information provided with each resource. We referred to the most common
datasets for identifying people, organizations and historical events, such as Wikidata, DBpedia,
and VIAF (Virtual International Authority File).</p>
      <p>Figure 1 reports an example of individuals and properties of the arkivo dataset and highlights
how these data have been linked to external resources, such as Wikidata (wd prefix) and DBpedia
(dbo prefix). In this particular example, individual 701.180/6216 of the class Item is related to
its title and to its date of creation. This item, which is part of the file A701.111.003, is linked,
via the object property mentions, to the person mentioned in it, i.e. Roosevelt Franklin Delano,
who is in turn linked to other external instances and data in the LOD cloud.</p>
      <p>
        The dataset is freely available under a Creative Commons CC BY 4.0 license at https://
github.com/ArkivoTeam/ARKIVO [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ]. A typical use case is the discovery of historical data
for a more comprehensive and interconnected understanding of historical events, movements,
and political and cultural developments. It can also be used as a benchmark for the evaluation
of systems that automatically annotate entities, such as places, persons and organizations, in
unstructured documents. The arkivo dataset could be especially useful for the named entity
extraction and linking task, which is devoted to identifying mentions of entities in a
text and linking them to a reference knowledge base provided as input [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This process is also
known as entity disambiguation, since it typically requires annotating a potentially ambiguous
entity mention with a link to an identifier that describes a unique entity [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ]. For example,
the arkivo dataset’s resource G11499 is linked to its Polish name Wielka Brytania via the
schema:name data property. In order to provide a disambiguation target, the resource G11499
is linked via the owl:sameAs property to the unique Wikidata identifier (wd:Q295688), which
has its own name data property Great Britain. This example is graphically depicted in Figure 2.
      </p>
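The owl:sameAs link in this example can be resolved to a canonical disambiguation target with a simple link-chasing sketch (illustrative Python; the prefixed names abbreviate the full IRIs and the dictionaries stand in for the actual RDF store):

```python
# owl:sameAs links and name data properties from the G11499 example.
same_as = {"arkivo:G11499": "wd:Q295688"}
names = {"arkivo:G11499": "Wielka Brytania", "wd:Q295688": "Great Britain"}

def canonical(resource, same_as):
    """Follow owl:sameAs links until a fixpoint (assumes no cycles)."""
    while resource in same_as:
        resource = same_as[resource]
    return resource

target = canonical("arkivo:G11499", same_as)   # disambiguation target
```

A system evaluated against the dataset can thus map a surface form in any language to the same unique entity.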
      <p>The collections of archival historical texts from which the arkivo dataset originated are available
in PDF and are published online at http://archiwa.pilsudski.org/index.php#1.</p>
      <p>In the following, we report a straightforward example to explain how the proposed dataset
can be used as a benchmark for named entity extraction. Let us suppose that we extracted entities
using a Named Entity Recognition (NER) tool from a set of documents, including the one
represented in Figure 3. In the depicted excerpt, the entities that our NER tool should be able to
detect and extract are marked in green (person entities) and red (place entities).</p>
      <p>Using our dataset as benchmark, we can obtain the actual named entities in the document by
querying it using simple SPARQL queries, such as the one reported in Figure 4.</p>
      <p>In Table 4, the named entities obtained and the classes to which they belong are reported. Note
that the SPARQL query results refer to the whole document and not only to the excerpt depicted
above.</p>
      <p>
        Furthermore, given the scarcity of expressive ontologies covering diverse combinations of
language elements, arkivo can also be used for ontology benchmarking purposes, such as those
presented in [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ], since it provides good coverage of the OWL 2 language constructs.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presented the approach we applied to build a dataset focused on historical texts
coming from the JPIA. The aim is to create a dataset that can be used not only for its intrinsic
value as a historical artifact, but also as a benchmark to evaluate information extraction tools and
methods devoted to automatically annotating entities, such as places, persons, and organizations,
in unstructured documents. Moreover, the presented dataset can also be used for ontology
benchmarking purposes.</p>
      <p>
        The main obstacle of the whole work was the manual annotation activity,
which was a very time-consuming process. In this regard, our current research direction
is the development of a semi-automatic ontology-based annotation process for texts,
exploiting some of the techniques presented in [
        <xref ref-type="bibr" rid="ref43 ref44">43, 44</xref>
]. The implemented approach will
mainly rely on a combination of natural language processing and information extraction techniques,
without extensive involvement of domain experts to validate the extracted instances.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Svensson</surname>
          </string-name>
          ,
          <article-title>Humanities Computing as Digital Humanities</article-title>
          , in: Defining Digital Humanities, Routledge,
          <year>2016</year>
          , pp.
          <fpage>175</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Adorni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maratea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <article-title>An Ontology-Based Archive for Historical Research</article-title>
          , in: Description Logics,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Adorni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maratea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <article-title>An Ontology for Historical Research Documents</article-title>
          ,
          <source>in: Web Reasoning and Rule Systems: 9th International Conference, RR 2015</source>
          , Berlin, Germany,
          <source>August 4-5</source>
          ,
          <year>2015</year>
          , Proceedings. 9, Springer,
          <year>2015</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jo</surname>
          </string-name>
          ,
          <article-title>Text Mining</article-title>
          ,
          <source>Studies in Big Data</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Grishman</surname>
          </string-name>
          , Information Extraction,
          <source>IEEE Intelligent Systems</source>
          <volume>30</volume>
          (
          <year>2015</year>
          )
          <fpage>8</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chowdhary</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing</article-title>
          ,
          <source>Fundamentals of Artificial Intelligence</source>
          (
          <year>2020</year>
          )
          <fpage>603</fpage>
          -
          <lpage>649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Adnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Akbar</surname>
          </string-name>
          ,
          <article-title>An analytical study of information extraction from unstructured and multidimensional big data</article-title>
          ,
          <source>Journal of Big Data</source>
          <volume>6</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Martinez-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lopez-Arevalo</surname>
          </string-name>
          ,
          <article-title>Information extraction meets the semantic web: a survey</article-title>
          ,
          <source>Semantic Web</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Wimalasuriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <article-title>Ontology-based information extraction: An introduction and a survey of current approaches</article-title>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Konys</surname>
          </string-name>
          ,
          <article-title>Towards knowledge handling in ontology-based information extraction systems</article-title>
          ,
          <source>Procedia computer science 126</source>
          (
          <year>2018</year>
          )
          <fpage>2208</fpage>
          -
          <lpage>2218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>de Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Rigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L. V.</given-names>
            <surname>Barbosa</surname>
          </string-name>
          ,
          <article-title>Ontology-based information extraction for juridical events with case studies in Brazilian legal realm</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          <volume>25</volume>
          (
          <year>2017</year>
          )
          <fpage>379</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I. El</given-names>
            <surname>Naqa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>What is Machine Learning?</article-title>
          , Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Suissa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elmalech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhitomirsky-Geffet</surname>
          </string-name>
          ,
          <article-title>Text Analysis Using Deep Neural Networks in Digital Humanities and Information Science</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>73</volume>
          (
          <year>2022</year>
          )
          <fpage>268</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Dereza</surname>
          </string-name>
          ,
          <article-title>Lemmatization for Ancient Languages: Rules or Neural Networks?</article-title>
          ,
          <source>in: Artificial Intelligence and Natural Language: 7th International Conference, AINL</source>
          <year>2018</year>
          ,
          St. Petersburg, Russia,
          <source>October 17-19</source>
          ,
          <year>2018</year>
          , Proceedings 7, Springer,
          <year>2018</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Mogcn: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>181629</fpage>
          -
          <lpage>181639</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oberle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <article-title>What is an Ontology?</article-title>
          ,
          <source>Handbook on Ontologies</source>
          (
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>McBride</surname>
          </string-name>
          ,
          <article-title>The Resource Description Framework (rdf) and its Vocabulary Description Language rdfs</article-title>
          ,
          <source>Handbook on Ontologies</source>
          (
          <year>2004</year>
          )
          <fpage>51</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Antoniou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. v.</given-names>
            <surname>Harmelen</surname>
          </string-name>
          ,
          <article-title>Web Ontology Language: OWL</article-title>
          ,
          <source>Handbook on Ontologies</source>
          (
          <year>2009</year>
          )
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          , et al.,
          <source>Owl 2 Web Ontology Language Primer, W3C Recommendation</source>
          <volume>27</volume>
          (
          <year>2009</year>
          )
          <fpage>123</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          ,
          <article-title>Description Logics</article-title>
          ,
          <source>Foundations of Artificial Intelligence</source>
          <volume>3</volume>
          (
          <year>2008</year>
          )
          <fpage>135</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <article-title>SPARQL 1.1 Query Language</article-title>
          ,
          <source>W3C Recommendation</source>
          <volume>21</volume>
          (
          <year>2013</year>
          )
          <fpage>778</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Idehen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <article-title>Linked Data on the Web (ldow2008)</article-title>
          ,
          <source>in: Proceedings of the 17th international conference on World Wide Web</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1265</fpage>
          -
          <lpage>1266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Kone</surname>
          </string-name>
          ,
          <article-title>State of the art in semantic organizational knowledge</article-title>
          ,
          <source>Encyclopedia of Organizational Knowledge, Administration, and Technology</source>
          (
          <year>2021</year>
          )
          <fpage>1762</fpage>
          -
          <lpage>1773</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <article-title>Large-Scale Semantic Integration of Linked Data: A Survey</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>52</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Feitosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dermeval</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ávila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. I.</given-names>
            <surname>Bittencourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Lóscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Isotani</surname>
          </string-name>
          ,
          <article-title>A Systematic Review on the Use of Best Practices for Publishing Linked Data</article-title>
          ,
          <source>Online Information Review</source>
          <volume>42</volume>
          (
          <year>2018</year>
          )
          <fpage>107</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Empowering Linked Data in Cultural Heritage Institutions: A Knowledge Management Perspective</article-title>
          ,
          <source>Data and Information Management</source>
          <volume>6</volume>
          (
          <year>2022</year>
          )
          <fpage>100013</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>E.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Heravi</surname>
          </string-name>
          ,
          <article-title>Linked Data and Cultural Heritage: a Systematic Review of Participation, Collaboration, and Motivation</article-title>
          ,
          <source>Journal on Computing and Cultural Heritage (JOCCH) 14</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vetere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <source>Exploiting Linked Data and Knowledge Graphs in Large Organisations</source>
          , Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K. D.</given-names>
            <surname>De Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>A Collection of Benchmark Datasets for Systematic Evaluations of Machine Learning on the Semantic Web, in: The Semantic Web-ISWC</article-title>
          <year>2016</year>
          : 15th International Semantic Web Conference, Kobe, Japan,
          <source>October 17-21</source>
          ,
          <year>2016</year>
          , Proceedings,
          <source>Part II 15</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>186</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <article-title>How Linked Data can Aid Machine Learning-Based Tasks</article-title>
          ,
          <source>in: Research and Advanced Technology for Digital Libraries: 21st International Conference on Theory and Practice of Digital Libraries, TPDL</source>
          <year>2017</year>
          , Thessaloniki, Greece,
          <source>September 18-21</source>
          ,
          <year>2017</year>
          , Proceedings, Springer,
          <year>2017</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Wilde</surname>
          </string-name>
          ,
          <source>Using OpenRefine</source>
          , Packt Publishing Ltd,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Stardog Union</surname>
          </string-name>
          ,
          <source>Stardog</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hammar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <article-title>Engineering Ontologies with Patterns - The eXtreme Design Methodology</article-title>
          ,
          <source>Ontology Engineering with Ontology Design Patterns</source>
          (
          <year>2016</year>
          )
          <fpage>23</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>D.</given-names>
            <surname>Riboni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bettini</surname>
          </string-name>
          ,
          <article-title>Owl 2 Modeling and Reasoning with Complex Human Activities</article-title>
          ,
          <source>Pervasive and Mobile Computing</source>
          <volume>7</volume>
          (
          <year>2011</year>
          )
          <fpage>379</fpage>
          -
          <lpage>395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>M.</given-names>
            <surname>Katsumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grüninger</surname>
          </string-name>
          , Choosing Ontologies for Reuse,
          <source>Applied Ontology</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <fpage>195</fpage>
          -
          <lpage>221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zielinski</surname>
          </string-name>
          ,
          <article-title>Towards an Ontology for Describing Archival Resources</article-title>
          , in: WHiSe@ ISWC,
          <year>2017</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zielinski</surname>
          </string-name>
          ,
          <article-title>Arkivo: an ontology for describing archival resources</article-title>
          , in: CILC,
          <year>2018</year>
          , pp.
          <fpage>112</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zieliński</surname>
          </string-name>
          ,
          <source>Exploring Semantic Archival Collections: the Case of Piłsudski Institute of America, in: Digital Libraries: Supporting Open Science: 15th Italian Research Conference on Digital Libraries, IRCDL 2019</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <article-title>Building the Semantic Layer of the Józef Piłsudski Digital Archive with an Ontology-Based Approach</article-title>
          ,
          <source>International Journal on Semantic Web and Information Systems (IJSWIS) 17</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <article-title>ARKIVO Dataset: a Benchmark for Ontology-based Extraction Tools</article-title>
          , in: WEBIST,
          <year>2021</year>
          , pp.
          <fpage>341</fpage>
          -
          <lpage>345</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Erp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gorrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Petrak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          ,
          <article-title>Analysis of Named Entity Recognition and Linking for Tweets</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>51</volume>
          (
          <year>2015</year>
          )
          <fpage>32</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>O.</given-names>
            <surname>Zamazal</surname>
          </string-name>
          ,
          <article-title>A Survey of Ontology Benchmarks for Semantic Web Ontology Tools</article-title>
          ,
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          <volume>16</volume>
          (
          <year>2020</year>
          )
          <fpage>47</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <article-title>Adnoto: A self-adaptive system for automatic ontology-based annotation of unstructured documents</article-title>
          , in:
          <source>Advances in Artificial Intelligence: From Theory to Practice: 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2017</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>495</fpage>
          -
          <lpage>501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pulina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Adorni</surname>
          </string-name>
          ,
          <article-title>A Framework for Automatic Population of Ontology-Based Digital Libraries</article-title>
          , in:
          <source>AI*IA 2016: Advances in Artificial Intelligence</source>
          , volume
          <volume>10037</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2016</year>
          , pp.
          <fpage>406</fpage>
          -
          <lpage>417</lpage>
          . URL: https://doi.org/10.1007/978-3-319-49130-1_30. doi:10.1007/978-3-319-49130-1_30.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>