=Paper=
{{Paper
|id=Vol-3152/BD2019_paper6
|storemode=property
|title=Data Exchange in Practice: Towards a Prosopographical API
|pdfUrl=https://ceur-ws.org/Vol-3152/BD2019_paper_6.pdf
|volume=Vol-3152
|authors=Georg Vogeler,Gunter Vasold,Matthias Schlögl
|dblpUrl=https://dblp.org/rec/conf/bd/VogelerVS19
}}
==Data Exchange in Practice: Towards a Prosopographical API==
Georg Vogeler, Gunter Vasold, Matthias Schlögl
Universität Graz / Österreichische Akademie der Wissenschaften, Universität Graz, Österreichische Akademie der Wissenschaften
georg.vogeler@uni-graz.at, gunter.vasold@uni-graz.at, matthias.schloegl@oeaw.ac.at
Abstract
The paper discusses the question whether methods applied in work with prosopographical data can be integrated into an "International Prosopography Interoperability Framework" (IPIF) comparable to the IIIF (International Image Interoperability Framework). It suggests basing this upon a RESTful interface defined in a publicly available definition of an API. The API is based on the tripartite factoid model, thus keeping the identification of the person, the information on the person, and the source documenting this information separate, and aggregating them in a "factoid" together with the metadata of its creation. The API definition follows the rules of OpenAPI, which allows automatic code generation. As a proof of concept, the API has been implemented in the context of the APIS project, containing the information of the Austrian biographical lexicon, and with data from monasterium.net, the world's largest database of medieval charters. As a second proof of concept, the paper introduces a wrapper for LOBID and Wikidata and a standalone server created with code generation from the OpenAPI specification. In a third proof of concept it showcases the reuse of tools originally developed for APIS in other services.
Keywords: RESTful API, OpenAPI, factoid model, APIS, monasterium.net, IPIF, data aggregation, prosopographical resources.
1 Introduction

This paper discusses whether methods applied in work with prosopographical data can be integrated into an "International Prosopography Interoperability Framework" (IPIF) comparable to the IIIF (International Image Interoperability Framework). Prosopography is a field of research in which biographical information on a selected population of individuals is aggregated to detect patterns in relationships, careers, and other attributes relevant for social, cultural, or political observations (Charle, 2015). This field of study can draw on a wide range of resources, ranging from scholarly publications of historical sources to dedicated prosopographical databases, processed with a wide range of methods, from natural language processing to network analysis, and it has a rich history of research in which the digital transformation took place early (Althoff, 1976; Imhof, 1978; Bulst, 1989; Goudriaan, 1995; Keats-Rohan, 2007). To bring this "digital prosopography" forward, we take up earlier proposals to create APIs for prosopographical data (Ebneth and Reinert, 2017) and suggest a shared RESTful API definition to facilitate communication between prosopographical data resources, web front ends, and analytical tools. This is motivated by the identification of common use cases and the need for an efficient technical infrastructure. This paper starts with a description of the motivation (section 2), describes the current proposal in detail (section 3), and demonstrates its feasibility in three prototypes (section 4).

2 Motivation

2.1 Common Use Cases

An IPIF must be based on concrete application scenarios. A major use case of the IPIF is an aggregated query on several services. Whenever a humanities scholar mentions a person, the easy look-up of existing information on the person is a standard requirement. The users want to look up and reuse information on a person of whom they have some basic information (usually a name) available. Information on this person can be provided by a wide range of resources: authority files (usually aggregated into VIAF); general-purpose databases like Wikidata; general prosopographical databases like those created from ancient and medieval sources (e.g. DPRR, PASE); specific prosopographical databases documenting a group like clergy (e.g. CCEd), parliamentarians (e.g. BIOPARL), or artists (e.g. Bayerisches Musiker-Lexikon Online, http://bmlo.de/); biographical dictionaries in prose, like the wide range of national biographical dictionaries of which (Fox, 2019) gives the most recent overview; or community efforts by family historians. Rich information is available even in resources not dedicated to prosopographical information, like scholarly editions, for example tax registers (such as the Hearth Tax series by the British Academy, http://gams.uni-graz.at/context:htx) or letters, as for instance aggregated in the correspSearch service (https://correspsearch.net/), or results of automatic text annotation, for instance in OCRed newspapers from projects like NewsEye (https://www.newseye.eu/). In this scenario the users would use an application with a search parameter and a list of URLs of arbitrary IPIF endpoints. This application can be a GUI with a search slot or part of an automatic information linking architecture. Typically, the user would use this application to search for the name in question and resolve the name to matching authority IDs. This functionality could be included in any kind of editing and annotation process; e.g. tools like Pelagios/Recogito (https://recogito.pelagios.org/) could be extended beyond geographical gazetteers, or tools for scholarly editing like ediarum (http://www.bbaw.de/telota/software/ediarum) could extend TEI-P5 person markup beyond the link to a local database.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
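As a sketch of this aggregated look-up scenario, the snippet below builds the same query for a list of IPIF endpoints and merges the person identifiers returned by each service. The endpoint pattern /persons?st=<name> follows the API description in section 3.2; the base URLs, the shape of the responses, and the function names are illustrative assumptions, not part of the specification.

```python
from urllib.parse import quote

def build_queries(endpoints, name):
    """Build the IPIF full-text person query (/persons?st=<name>)
    for every endpoint base URL (see section 3.2)."""
    return [f"{base.rstrip('/')}/persons?st={quote(name)}" for base in endpoints]

def merge_person_ids(responses):
    """Merge person objects returned by several IPIF services into a
    mapping from alternative URIs (e.g. VIAF) to (endpoint, local id)
    pairs, resolving a name to matching authority IDs."""
    merged = {}
    for endpoint, persons in responses.items():
        for person in persons:
            for uri in person.get("uris", []):
                merged.setdefault(uri, []).append((endpoint, person["@id"]))
    return merged

# Hypothetical responses of two IPIF services to /persons?st=Engelbert:
responses = {
    "https://example.org/ipif": [{"@id": "p123",
                                  "uris": ["http://viaf.org/viaf/28977956"]}],
    "https://example.net/ipif": [{"@id": "ia14328",
                                  "uris": ["http://viaf.org/viaf/28977956"]}],
}
print(merge_person_ids(responses))
```

Because both services list the same VIAF URI, a client can treat the two local ids as referring to the same individual, which is exactly the name-to-authority-ID resolution described above.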
We consider the reuse of basic analytical tools as the second major use case. Networks of connections between persons or lists of events in the life of a single person are analytical tools applied in many projects. Most prosopographical resources offer network visualisations, e.g. ego-networks on a single entry or networks of a selected population (e.g. the online portal of "Deutsche Biographie": https://www.deutsche-biographie.de/graph?id=sfz53095). The second analytical use case is the display of chronologically sorted structured information on a single person. IPIF will facilitate interfaces creating these common visualizations from a single resource or by aggregating information across distributed resources. Another common usage scenario is the enrichment of prose. Several resources add structured information to biographical prose: examples include Wikipedia infoboxes (https://en.wikipedia.org/wiki/Help:Infobox), introductory notes in biographical databases, and the standard display of cosmotool (https://cosmotool.de.dariah.eu/; an example is https://cosmotool.de.dariah.eu/cosmotool/person/Wikidata_Q5879). Again, IPIF will facilitate the creation of these enrichments from a single resource as well as aggregating them from distributed resources. These functionalities do not rely on modelling decisions affected by specific research questions or data sources but reflect common prosopographical use cases.

There are other use cases which could be addressed by IPIF, although they might be less obvious and much more dependent on individual modelling decisions. We would like to summarize them under the headings "Biographical Dictionary," "Fact Checking," "New Interpretation," and "Database Publication," but assume that this list can easily be extended. Let us describe some of them briefly: The author of any kind of biogram ("Biographical Dictionary"), e.g. in a biographical dictionary, would profit from a search interface aggregating textual fragments from primary (and secondary) sources including the reference. The "New Interpretation" scenario adds new interpretations of existing sources on a person, while "Fact Checking" looks for justifications of statements about a person in a set of digital resources. These scenarios depict both research with prosopographical data and the generation of new data.

The definition of the API has been influenced by a group of scholars working on religious orders. During a workshop held in 2017 in Vienna the group discussed how to use and improve the API for the edition of primary resources. (Participants were Thomas Wallnig (Vienna), P. Alkuin Schachenmeyer OCist (Klosterneuburg), Irene Rabl (Vienna), Hedvika Kuchařová (Prague), Jana Borovičková (Prague), Miguel Vieira (London), James Kelly (Durham), Nada Zečević (London), Daniel Jeller (Vienna), Stephan Makowski (Cologne), Ekaterini Mitsiou (Vienna), Christian Popp (Göttingen), Stefan Eichert (Vienna), Bärbel Kröger (Göttingen), Gunter Vasold (Graz), and Georg Vogeler (Graz), representing projects like CCEd, Germania Sacra, Pez-Correspondence, Jesuit Networks, "Who were the nuns?", and PRODOMO, http://prodomo.icar-us.eu/.)

2.2 Efficient technical infrastructure

The second major motivation for the proposal of IPIF is the lack of an efficient technical infrastructure to serve these use cases. We consider the Web of Data activities of the W3C to be the best conceptual approach to publishing databases and to making individual small amounts of data queryable on the web. Data publication in RDF is widely used and can be considered a well-supported technology. For the programmer's access to these data publications, the W3C proposed to make the RDF data not only available as data dumps but also via a query API, the SPARQL protocol (https://www.w3.org/TR/sparql11-protocol/). The SPARQL protocol defines a RESTful API which allows submitting queries in a common query language (SPARQL) and receiving structured data in return. SPARQL is designed to cover arbitrary levels of complexity. The API is therefore completely agnostic to the data queried.

This creates problems for the envisaged use cases: First, the use cases and the power of a full SPARQL endpoint are not balanced. While the use cases can be served by simple and efficient queries, the technical maintenance of a SPARQL endpoint is very demanding. As a SPARQL endpoint is designed to process any kind of query possible, (computationally) expensive or even harmful queries are possible, e.g. requesting huge amounts of data or filtering on a large dataset without making use of indices (the SPARQL FILTER() procedure is applied only after materialisation of the graph selected by the graph pattern in the query). This and the lack of native permission and user management specifications in SPARQL are arguments for system administrators to prefer not to offer a SPARQL endpoint. Even big institutions providing central data resources to the public, such as the German National Library (GND), decided not to maintain a SPARQL endpoint for their data.

Second, querying an RDF resource requires fundamental knowledge of SPARQL and a deep understanding of the ontology used. Wikidata, for instance, uses a set of more than 6,000 RDF properties (https://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/all, 2019-08-29).

While SPARQL offers a standardized interface to any triple store, it does not solve the problem of ontology alignment. That means that a useful federated query is only possible across triple stores that use known ontologies or map to a common ontology. In prosopography, the attempts to map data to CIDOC CRM in the data4history consortium seem the most promising, but have not yet found sufficient implementations.

Several approaches exist to solve the technical problems created by SPARQL endpoints, like moving parts of the complexity of the SPARQL request into the client via dumps; smart brokers between requests and SPARQL server, like Linked Data Fragments (Verborgh et al., 2016; https://linkeddatafragments.org/); or data virtualization services, like OpenLink Virtuoso URIBurner (http://uriburner.com/). Other approaches hide the SPARQL endpoint behind a RESTful API that transforms API requests into predefined SPARQL queries, like grlc (Meroño-Peñuela and Hoekstra, 2016) or the recently published LOBID service to GND data provided by the "Hochschulbibliothekszentrum des Landes NRW" (https://www.hbz-nrw.de/produkte/linked-open-data).

Digital humanities practice seems to prefer these intermediate solutions. Tools like OpenRefine Reconciliation use Wikidata for our first use case efficiently. OpenRefine follows the path of hiding the SPARQL endpoint behind a restricted API focussed on the information necessary for reconciliation (https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-Api). In particular, the publication of resource-specific RESTful APIs is establishing itself as an alternative technology in the digital humanities. They allow far less flexible queries compared to SPARQL endpoints, but are much more stable and can be implemented more easily. RESTful APIs are a quasi-industry standard and are supported by all web development frameworks and programming languages. Several prosopographical databases offer their data not via a SPARQL endpoint but via their own API: the Deutsche Biographie, for instance, reuses the SOLR API (http://data.deutsche-biographie.de/about/#solropen); the database of persons provided by the Germania Sacra project has defined its own API (https://adw-goe.de/forschung/forschungsprojekte-akademienprogramm/germania-sacra/schnittstellen-und-linked-data/); and the above-cited correspSearch service (https://correspsearch.net/index.xql?id=api&l=en) offers an API based on the TEI models used inside the service, a similar approach to the one taken by the large digital scholarly edition of all documents related to the composer Carl Maria von Weber (https://weber-gesamtausgabe.de/en/Help/API_Documentation.html).

All these solutions solve the technical problems of SPARQL endpoints, but cannot solve the problem of harmonising the query methods, results, or data models. They usually only provide verbal descriptions of the service. With OpenAPI (formerly Swagger) and Core API, there are proposals to describe such API definitions in a standardized way. These technologies allow for an easy implementation of APIs and, perhaps more importantly, for a (semi-)automated creation of clients.

This approach extends established practical solutions like the BEACON format for identifier alignment (https://de.wikipedia.org/wiki/Wikipedia:BEACON), which has been promoted by Wikipedia and is adopted by many prosopographical data resources. While BEACON defines just a data format, we propose a dynamic solution allowing interactions with the data source.

We conclude from the existence of common use cases and the advantages of lightweight RESTful APIs compared to SPARQL endpoints that creating a public definition of a shared RESTful API is a valid path for prosopographical data aggregation and processing. A shared API dedicated to prosopographical data should facilitate the implementation of applications for the above-mentioned use cases and make the data practically interoperable. To achieve this, the API definition has to propose a flexible but efficient data model and methods to cover the core use cases described above.

3 The API

In the following section we give a descriptive introduction to the API. A formal description according to the OpenAPI standard is available on GitHub (https://github.com/GVogeler/prosopogrAPhI; this paper describes the status of the proposal in commit https://github.com/GVogeler/prosopogrAPhI/commit/7924bd513980bd776a34761806b3d6e63d99e2f5).

3.1 The data model

The list of possible sources given in section 2.1 might already demonstrate that information is available on different levels of description, which leads to different data models. The IPIF model is the result of workshops with prosopography experts and developers held in Vienna (2017, 2019) and in the Data for History group in Leipzig 2019, which identified the diverging data models for the individual statements about individuals as the core problem of real data exchange. (Fokkens and ter Braake, 2018) have already discussed the variety of data models and suggest a repository with data models and examples to help users understand and reuse existing models. They identify the Bio-CRM proposed by (Tuominen et al., 2017) as a possible overarching model. The data model of IPIF therefore tries to be aligned with this model.

Nevertheless, IPIF does not attempt to describe a conceptual model covering the full range of prosopographical use cases and data sets. In fact, the scenarios described above do not require full harmonisation. The analytical tools discussed in section 2.2 need a simple common data model. It has to cover relationships between people ("Social Network") and dateable information about a person ("Career"). Therefore, IPIF is not meant to deliver the whole richness of data that might be available in projects. For interoperability on the ontology level there exist several approaches such as schema.org (http://schema.org), CIDOC CRM (http://www.cidoc-crm.org/), the Europeana Data Model (https://pro.europeana.eu/resources/standardization-tools/edm-documentation), etc. The model presented below has to be distinguished from these efforts to tackle interoperability issues in the digital humanities. IPIF is not meant to replace existing data models, but rather to be an easy path to access existing data for common use cases that simply require a downsized version of the data.

Still, the development of data models in prosopography has led to one common concept beyond the use cases described above. IPIF realizes a conceptual model for which
(Bradley and Short, 2005) introduced the term factoid. A factoid is formed by three main information units: statements about an individual person justified by the source. This factoid is the result of an interpretation of the source by a researcher at a specific time, and can be considered one instantiation of generic models to represent perspectives on facts (Fokkens and ter Braake, 2018). John Bradley used the conceptual model in several projects at King's College London (PASE, DPRR, CCEd). The generality of the model is proved by independent realisations of the model in other prosopographical resources: The Repertorium Academicum Germanicum uses a similar three-part model (Baeriswyl-Andresen, 2008). The Personendatenrepositorium (PDR) of the Berlin-Brandenburgische Akademie der Wissenschaften (Neumann et al., 2010) uses the same conceptual model, and by this the model spread into projects based on the PDR, e.g. MusMig (http://www.musmig.eu/home/) or the Jesuit Science Network (http://jesuitscience.net/).

This model deviates from the person databases that have established themselves as controlled vocabularies and references for Linked Data in the Digital Humanities. These usually do not take into account the process of aggregating information about a person from historical sources. The approach thus also deviates from the TEI's "personography" (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPERS), which, like the linked data resources, describes a person with a list of characteristics, although TEI could be used to express the model, as the syriaca.org project has recently demonstrated (Schwartz, 2019).

There are several attempts to transfer the factoid conceptual model into RDF. (Pasin and Bradley, 2015) proposed a CIDOC CRM based version of the factoid model and published corresponding ontologies (http://factoid-dighum.kcl.ac.uk/ and https://github.com/johnBradley501/FPO). The conceptual model has also been realized with other vocabularies in SNAP, which re-uses vocabulary from the Linking Ancient Wisdom project (Lawrence and Bodard, 2015). The King's Digital Lab has recently published the prosopographic database on the Roman Republic as a LOD resource including a SPARQL endpoint, using its own ontology to describe the factoids (http://romanrepublic.ac.uk/rdf/doc/ontology.html). Wikidata offers a data model to identify contributions as claims made by a contributor (Erxleben et al., 2014) and can therefore be considered another proposal to implement the factoid conceptual model in RDF. Finally, the W3C has proposed a vocabulary similar to the needs of the factoid model: the provenance data model (PROV, (Missier et al., 2013)) covers relationships between data and sources. In the PROV model the creation of a factoid is the activity (prov:activity) connecting the source (prov:used) and the resulting statement (prov:entity). (Ockeloen et al., 2013) have applied this to biographical data. One can also interpret the factoid as an annotation on the source and express it with W3C Web Annotation (https://www.w3.org/TR/annotation-model/). The IPIF statement maps to the annotation body, the source to the target in the Web Annotation data model. Both cover mainly the relationship between source and information. In fact, even RDF reification and named graph specifications could be used to implement factoid structures, mapping the IPIF statements directly to RDF statements.

Therefore, the model proposed by IPIF is a very simple one: IPIF suggests a resource called factoid, which is the aggregation of three objects: the combination of statements on a person justified by a source. As the factoid is created by a scholar or an algorithm, it has to contain metadata on the responsible person and date of creation (createdBy, createdWhen) and later updates (modifiedBy, modifiedWhen). The person resource needs no more properties than a local id (@id) and alternative URIs (uris). The source object carries a descriptive string (label), a local id (@id), and alternative URIs (uris) as attributes. The statements (statement) are the most complex entity in the model, as they should cover the main information structures in prosopographical descriptions.

The data model of the statement is loosely based on an event model, as every statement can include temporal information. However, following the proposal by (Tuominen et al., 2017) for a Bio-CRM, it allows for unary and binary properties as well. IPIF suggests using a statementContent property to deliver a verbal description, e.g. a phrase extracted from a source or a short biogram. This is the most generic property of a statement and serves as a fallback for all information in an existing database which cannot be mapped to other IPIF properties. Usually, the content can be described in a more structured way. The main scenarios given in section 2.1 suggest two types of information: the relationship to other persons, and events. The first is realised in the relatesToPerson property. Additionally, a social relationship can be created by membership in a group or institution, for which IPIF uses the property memberOf. The second scenario is realised in a set of properties of an event in the biography of the person: any event related to the person can be described by the role (role) of the person in the event, and its temporal (date) and geographical (place) allocation. The statement can describe the type of the event (statementType). The samples collected by (Fokkens and ter Braake, 2018) suggest considering additional properties to model lifespan, gender, and "occupation/claim of fame/person-type." From the experience in APIS it has been suggested to include creative and productive activities with a dedicated property contributesTo.

It should be noted that there is no explicit event object. This is for two reasons, the first pragmatic: many existing data structures (e.g. XML/TEI-P5 or simple CSV tables) share this flat structure of the descriptive properties of a person. The model allows for easy mapping of these resources. The second is that the event model is implicit in most of the information on a person, so it can be inferred from the current data model. Every statement can be converted into the description of an event by simply adding a date. Conceptually one could even argue that every statement is implicitly rooted in time, although sometimes with unknown dates. In practice, many users do not care for this unknown information. The practical advantages of this decision become clear when we consider the "Social Network" analysis scenario. Any kind of social network needs two persons as nodes connected via an edge. The data model covers this scenario efficiently by filtering all statements on a person with the property relatesToPerson. Still, event-based modelling suggests that most of these relationships are not stable, but can change in time. Even the biological relationship to parents is established by the birth. The model represents this by making possible a statement which includes relationship information and a date.

Practical considerations lead to an additional explicit property, the name as the main identifier of a person in spoken language. Again, the advantage of the implicit event-based model becomes clear considering the double use of names as general identifiers and as appellations which can change in time. Binary statements relating the person to others, like family relationships, are covered by the relatesToPerson property. The relationship can be specified using the role property. Unary statements like "main occupation" or "gender" are covered by a combination of the role property and statementType. The role property contains the predicate of these unary triple-like statements, and statementType the object. Thus, they can be considered a first fallback for parts of the source model which do not map directly to the IPIF data model.

Most of the properties named above are defined as objects which can be described by a URI (uri) as well as by a human-readable label (label). This model applies to roles (role), places (place), relationships to persons (relatesToPerson), group memberships (memberOf), and categories (statementType). We expect API providers to use controlled vocabularies for the URIs, e.g. the TEI guidelines on personography (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPER), Bio vocab (http://vocab.org/bio/), FOAF (http://xmlns.com/foaf/spec/; Brickley / Miller 2014), schema.org (https://schema.org/Person), or the Swiss Art Research Project (SARI, https://docs.swissartresearch.net/et/person/). IPIF suggests referencing these vocabularies in the describe resource (/describe/vocabs). Only the name and the date are properties of the statement object with specialised models: the date can be described by a verbal label and a formalized sortDate in W3C Schema format (xs:date), while name is just a string.

The factoid is the aggregation of the three objects, i.e. the combination of statements on a person justified by a source, created by a person. Therefore, it adds metadata on the responsible person and date of creation (createdBy, createdWhen) and later updates (modifiedBy, modifiedWhen).

3.2 Endpoints and parameters

The data model is reflected in the main endpoints of the API: factoids, statements, persons, sources. The sources endpoint returns the bibliographic or archival identification of the historical sources on which the information on a person is based. This can be a verbal description (label) and a list of URIs (uris) pointing to further information. The resource returns, as do all the resources of the main endpoints, the local ID (@id) and metadata on the creation and modification of the entry. IPIF suggests using pointers to machine-readable resources, e.g. other RESTful APIs. Via the persons endpoint the API consumer can retrieve identification information on the individuals. This endpoint can be addressed by the local id or any other id stored for the same person, in particular URIs from authority files. It returns nothing else but the local id (@id) and a list of alternative URIs (uris). The factoids endpoint aggregates the information on source, person, and statements made, and the creation of the statements by a researcher as metadata to the factoid.

All endpoints allow querying by parameters stored in resources linked to the current resource. The API defines for this parameters like statementId (return all resources which are related to an identified statement), sourceId (return all resources related to the source with the given id), factoidId (return all resources related to the factoid with the given id), and personId (return all resources related to the person with the given id). This makes quick traversals through the prosopographical database possible. With an API call like /statements?personId=ia14328 you can receive all statements made about an individual, e.g. to prepare a timeline by sorting the results by a date property.

In addition to the ids of resources, this generic search pattern includes full-text search in all values stored as properties of the resource. ?st filters by a full-text search in all properties of a statement related to the current resource, ?f filters by a full-text search in all properties of resources connected to the current resource via a factoid, and ?p filters the current resource by a search in the properties of the person object. /person?st=Bishop returns all identifiers of persons on which statements exist containing the word "Bishop." The precise functionalities of the full-text search are a decision of the API provider. IPIF expects URIs to be handled as single words, so that a search /factoids?p=http%3A%2F%2Fviaf.org%2Fviaf%2F28977956 returns all factoids on Engelbert I, archbishop of Cologne, as identified by his VIAF URI as stored in the person object's uris property.

As described above, the statements have the richest data model. Therefore, statements can be queried not only by id and a full-text query over all properties, but by single property values as well, i.e. relatesToPerson, place, memberOf, name, and role. Those properties carrying a URI as descriptor can therefore be used for linked data queries, as the URIs are stored for related persons, places, and even statement contents from a controlled vocabulary.

Temporal searches are facilitated by from and to query parameters. A special interpretation mode is defined to handle fragmentary data: fragments (yyyy, yyyy-mm) will be interpreted as exact time ranges if the second parameter is missing (from=yyyy-mm is interpreted as from=start of month, to=end of month). If conflicting data is present, for example from=yyyy&to=yyyy-mm-dd, the most correct interpretation will be decided on by the backend. For instance, the example above will be interpreted as from=yyyy-01-01&to=yyyy-mm-dd.

Paging of filter results is supported by parameters for page size (size), page number to be returned (page), and sorting (sortBy).

As applications might only want to traverse via a list of results, the API provider is asked to return only the id of the resource matching the search parameters and the property searched. This will in particular facilitate autocomplete services like "type in a part of a name and the application will list all the persons for which a factoid with a name statement containing this part exists," where selecting an entry can trigger other requests based on the id.

In the current definition of the API, the resources are returned as JSON objects, in which lists are represented as arrays, and properties allowing a verbal and a formal representation as objects. It has to be noted that, in combination with the REST endpoint of the respective resource (e.g. {server}/{api}/person/{id}), the local ids constitute globally unique URIs reusable as RDF resource identifiers. In fact, the responses of the API can be interpreted as RDF, e.g. by adding a context description to the JSON response,

all parameters supported by these two methods with the exception of the full-text searches p=, st=, s=, and f=. Implementing this level should be easy, as it only maps API calls to rather simple database queries. Data providers can extend existing services by this functionality or integrate a dedicated server software. We are working on a reference implementation for the API which can act as a proxy to an existing database. This server will support pluggable adapters in the form of custom simple functions which implement these mappings to the database, while the rest of the software can be used unchanged.

Compliance level 1 also only supports read-only access (HEAD and GET), but additionally allows filtering by full-text searches over multiple fields, so it implements all parameters for these two methods as documented in the API specification. Again, this level can be added to existing services or can be provided via a proxy like the reference implementation described above.

Compliance level 2 fully implements the API, which extends level 1 with the possibility to add and modify data. Accordingly, level 2 is only useful for clients which add or manage data on a server. As writing data to existing databases heavily depends on data models which will normally not fit the IPIF data model, this will mostly be useful for new projects built on the IPIF model, for projects which want to (cyclically) export data from their main database to provide it via a dedicated IPIF server, or for collecting and integrating data from different sources into a single database. The reference implementation will support level 2 and therefore the described use cases.
for which a first draft can be found on github.38
A fifth resource returns metadata on the data provider itself: 4 Prototypes
/describe gives basic information about the service and The API is currently deployed in three prototypes as proof
the implementation of the API. This includes information of concept. The first is built upon an existing prosopo-
on the service provider (provider, contact), a verbal graphical database: APIS - Austrian Prosopographical In-
description of the service (description), and in partic- formation System.40 APIS is a research project financed by
ular a statement on the level of compliance to the API defi- the Austrian “Nationalstiftung.” Its ultimate goal is the se-
nition (complianceLevel). mantic enrichment of the Austrian Biographic Dictionary
To facilitate the creation of REST interfaces implementing (ÖBL).41 During the project, a system was developed at
the suggested API as front ends to existing databases, each ACDH that turned into a generic tool for storing and cu-
end point has to provide information about the supported rating prosopographical data, called the A(ustrian) Proso-
API version(s) and compliance levels. The API is orga- pographical Information System (APIS).42 A central idea
nized in 3 levels of compliance. The supported compliance of APIS is allowing automatic tools and human researchers
level of a specific server can be queried via an API call to to work on the same data. Therefore, it was and still is
server/api/describe. Based on this information a consuming essential to provide access to the data via APIs. APIS cur-
program can find out which requests are supported by the rently features two different APIs: a hyperlinked API that
server. On the server side, compliance levels allow deci- delivers atomic data and allows the retrieval of fine-grained
sions about how much effort is to be invested in the devel- information, and a second API that delivers everything that
opment of an API-compliant interface. Implementing level APIS holds on a specific entity. For the latter, APIS uses
0 will be easy and can be done without much effort; level so-called “renderers” to convert the original Python objects
1 is a bit more complicated; level 2 will require more de- coming from the serializer into various formats. Currently,
velopment work or the usage and/or adaption a third party APIS supports a full-featured json serialization using inter-
software like the one which will become available as refer- nal attributes, a very basic XML/TEI serialization, a basic
ence implementation.39
Compliance level 0 is a minimalistic implementation which 40
http://apis.acdh.oeaw.ac.at/
only supports HEAD and GET requests. It can be used with 41
The ÖBL is a national biography. It is written at the Austrian
Academy of Sciences since the 50s and contains roughly 20.000
38
https://github.com/GVogeler/ biographies of people who had an impact on what was then Aus-
prosopogrAPhI/blob/master/context.json trian soil.
39 42
https://github.com/GVasold/papilotte https://github.com/acdh-oeaw/apis-core
CIDOC CRM serialization (in various formats), and the serialization to the prosopography API format (for an example biography serialized in this format, see https://gist.github.com/sennierer/d0dc2311b36a00d0a88541e27988ae43 and https://apis.acdh-dev.oeaw.ac.at/apis/api2/entity/43670/?format=json%2Bprosop). It has to be noted that the APIS reference implementation at the current stage only offers compliance level 1 of the IPIF definition (GET requests via identifier); the APIS-internal APIs can of course be used to query for the entities.

A second prototype has been implemented with data from the monasterium.net charters database, transferred into a stand-alone IPIF server. It uses a data set collected in the context of the FWF project "Illuminated Charters" (https://illuminierte-urkunden.uni-graz.at/), which contains a rich set of names of cardinals issuing collective indulgences (https://www.monasterium.net/mom/index/BischoefeAblaesse). Monasterium.net is an eXist-db instance using TEI and CEI schemata (http://www.cei.lmu.de) for documents and prosopographical data. The prosopographical data follows the proposal of the TEI "personography" (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPERSE) with simple properties of a person (among them floruit for the period of activity) referenced in the charter description. The data was exported from the monasterium XML database to a JSON representation following the proposed IPIF model using the XQuery language. The data has been used as a first test for Papilotte, a flexible and extensible IPIF server currently being developed at the Centre for Information Modelling of the University of Graz (https://github.com/gvasold/papilotte; the implementation is currently available at https://ginko.uni-graz.at/illurk/api/ui/).

Papilotte uses the Connexion library (https://github.com/zalando/connexion), a framework which handles HTTP requests based on a given OpenAPI specification. This means that Connexion reads in the IPIF OpenAPI specification and routes the paths defined in the specification to Python functions. It also relieves the programmer of tedious tasks such as the validation of incoming and outgoing data against the schemata defined in the specification, validation and casting of request parameters, exception handling, authorization, and the like. Connexion is a good example of how an OpenAPI specification can speed up and facilitate the development of software based on this API specification. Similar tools are available for many programming languages and can also be used for data-consuming client applications.

The core functionality of Papilotte is to provide an IPIF-conformant REST interface, implementing the full functionality of IPIF at compliance level 2. Papilotte is very flexible in terms of data sources because data is provided via configurable connectors based on a simple abstract connector class. Connectors for JSON files and relational databases, including a built-in SQLite database, are part of the software or will be added as soon as the API has settled. Adding custom connectors for other data sources is as simple as implementing a single class for each type (factoid, person, source and statement) implementing three methods: get(), search() and count(). Each method queries the data source and returns the result in a format defined by the IPIF specification. Papilotte takes care of the rest.

Custom connectors can be written to access any form of data source, for example existing relational databases, document stores like MongoDB, XML databases, graph databases or triple stores. Connectors can also act as proxies to web services like SPARQL endpoints or custom APIs to make their data easily accessible via the IPIF API.

The third proof-of-concept implementation demonstrates how IPIF could be used to deliver potential entity candidates. The application acts as a middle layer between SPARQL endpoints and REST APIs on the one side and the IPIF API on the other. Requests posted to the IPIF API are transformed into SPARQL or REST API GET queries and forwarded to the respective APIs.

The application includes an admin backend to configure the layer. Two steps need to be configured: the request and the return. The configuration of the request uses a Python format string and ingests the IPIF query attributes into the string. The formatting of the returned JSON utilises the power of the Django HTML templating engine (https://docs.djangoproject.com/en/2.2/topics/templates/). Whatever is returned from the queried APIs is ingested into these templates. The templating engine allows for simple processing such as for loops, if statements, concatenation, slicing, and more directly from within the template. The proof-of-concept application was configured for querying LOBID, a service provided by the "Hochschulbibliothekszentrum des Landes NRW" serving the GND via a REST API (https://lobid.org/gnd), and wikidata.org (via its SPARQL endpoint). Results from both APIs are provided as-is without any disambiguation. Figure 1 shows the date-of-birth statements of two returns for the query /person?f=Kreisky. The left snippet was returned from LOBID, the right from Wikidata. Both entries can be matched via the list of uris also returned from the APIs and seamlessly merged into a richer entry.

Figure 1: Results of a query service wrapping requests to LOBID (a) and WikiData (b) in IPIF

Finally, the proof of concept of the IPIF can be demonstrated by the prototypical reuse of a simple network visualisation application. The application was developed for the APIS project (see https://bit.ly/2kNFsiC for the source code and https://pmb.acdh.oeaw.ac.at/apis/entities/networks/generic/ for a live version of the visualization tool). It allows the iterative creation of network visualizations between entities (e.g. person-place, person-person) and the export of the final network to GraphML or a JSON format for further analysis in tools such as Gephi. The application was refactored to read the data from the IPIF API. It was able to read the data from the MOM-CA project and display them (documented at https://github.com/GVogeler/prosopogrAPhI/wiki/Tools:-Network-display-APIS).
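The connector mechanism described for Papilotte can be made concrete with a short sketch. This is an illustration, not Papilotte's actual code: only the three method names get(), search() and count() are taken from the description above, while the class name, the parameter names and the in-memory backing store are hypothetical.

```python
# Sketch of a Papilotte-style "person" connector (hypothetical class and
# signatures; only the method names get/search/count come from the text).
class InMemoryPersonConnector:
    def __init__(self, persons):
        # persons: list of dicts in IPIF person format, keyed here by "@id".
        self.persons = {p["@id"]: p for p in persons}

    def get(self, person_id):
        """Return a single person object in IPIF format, or None."""
        return self.persons.get(person_id)

    def search(self, size=30, page=1, **filters):
        """Return one page of persons whose properties contain the filter values."""
        hits = [
            p for p in self.persons.values()
            if all(v in str(p.get(k, "")) for k, v in filters.items())
        ]
        start = (page - 1) * size
        return hits[start:start + size]

    def count(self, **filters):
        """Return the number of persons matching the filter parameters."""
        return len(self.search(size=len(self.persons) or 1, **filters))
```

Replacing the dictionary lookups in these three methods with an SQL query, an XPath expression or a SPARQL request is all it takes to expose another data source through the same interface.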
5 Conclusion and Future Work

This paper identified the need for a lightweight system promoting the reuse of prosopographical data and tools in different contexts. This system should be much simpler than the Web of Data proposals by the W3C and describe functionalities shared by individual prosopographical solutions. The paper therefore suggests a shared RESTful API. It describes the simple data model (an outcome of two workshops with developers and researchers working on prosopographical data) as well as the definition of the API. The API is built upon the well-established "factoid" model, which combines information on the source, the person, and the statements extracted from the source in a factoid. Its publicly available definition following the OpenAPI standard allows automatic code generation.

As a proof of concept, the paper has described how an existing biographical database (the Austrian Prosopographical Information System, APIS) can easily provide its data via the API, and how a new prosopographical database can be created using Python code-generation tools and data extracted from the monasterium.net charter database. The common API consequently allows the reuse of a visualisation tool originally developed for the APIS database. This setup can be considered one prototypical use case for IPIF. We will continue the proof of concept with additional scenarios: we consider in particular data aggregation from different resources, simple autocomplete and lookup activities, and the integration into information-extraction pipelines, in which IPIF can provide ground truth with labelled statementContent, as interesting use cases for the API.

Further implementations should leverage RDF without taking on the risk of maintaining a SPARQL endpoint. We believe in taking the best of both worlds: the linked-data capabilities and openness of RDF, and the ease of use, maintainability and structure of RESTful APIs. For this we provide a preliminary JSON-LD context description (https://github.com/GVogeler/prosopogrAPhI/blob/master/context.json). If accepted by a larger community, the API could allow scholarly editions, authority files, and biographical or family-history databases to be reused in prosopographical information systems. First expressions of interest were made by the Personendaten-Repositorium of the BBAW, scholarly editors at the Cologne Center for eHumanities (from the Haller project: http://hallernet.org/), and the Correspsearch project (http://correspsearch.net). In the end, a larger community experimenting with the definitions can demonstrate how the data model and the API definition, including its compliance levels, can contribute to reconstructing the social context of the people in the past.
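The RDF reading sketched above amounts to very little code: the API's JSON response stays as it is, and a JSON-LD context is merely attached to it. The field names and the term mapping below are illustrative placeholders, not the preliminary context description published in the repository.

```python
import json

# Hypothetical IPIF-style person response (illustrative field names).
person = {
    "id": "https://example.org/ipif/person/123",
    "uris": ["https://d-nb.info/gnd/118566512"],
}

# Illustrative JSON-LD context: "id" becomes the RDF subject, and the
# "uris" list is read as owl:sameAs links by generic JSON-LD tooling.
context = {
    "id": "@id",
    "uris": {"@id": "http://www.w3.org/2002/07/owl#sameAs", "@type": "@id"},
}

response = {"@context": context, **person}
print(json.dumps(response, indent=2))
```

With such a context in place, any JSON-LD processor can turn the plain REST responses into triples without the provider having to run a SPARQL endpoint.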