=Paper=
{{Paper
|id=Vol-2532/paper1
|storemode=property
|title=Digital Edition Publishing Cooperative for Historical Accounts and the Bookkeeping Ontology
|pdfUrl=https://ceur-ws.org/Vol-2532/paper1.pdf
|volume=Vol-2532
|authors=Christopher Pollin
|dblpUrl=https://dblp.org/rec/conf/rodbh/Pollin19
}}
==Digital Edition Publishing Cooperative for Historical Accounts and the Bookkeeping Ontology ==
T. Riechert, F. Beretta, G. Bruseker (Ed.) RODBH 2019, Proceedings of the Doctoral Symposium on Research on Online Databases in History 2019

Christopher Pollin¹

Abstract: The project “Digital Edition Publishing Cooperative for Historical Accounts” (DEPCHA), an Andrew W. Mellon funded cooperation of five US partners and the Centre for Information Modelling at Graz University, aims to link the knowledge domain of economic activities to historical accounting records. For this purpose the so-called Bookkeeping Ontology is being developed. DEPCHA creates a publication hub for digital editions on the web. It converts multiple formats into RDF and publishes these in combination with the associated transcriptions. DEPCHA also provides retrieval and visualization functionalities, as well as interoperability and reuse of information in the sense of Linked Open Data.

Keywords: Web of Data; GAMS; Historical Financial Records; Bookkeeping Ontology; Knowledge Domain; History; Digital Humanities; Semantic Web; Linked Open Data

===1 Introduction===
On the first of August 1808 James Haley purchased ¼ lb of powder, 1 lb of shot and 1 lb of sugar for the price of 2 shillings and 6 pence each from Stagville Plantation in North Carolina (USA). Historical financial records contain this and countless similar pieces of information. In the 1980s, two groups emerged that took different approaches to such historical sources: the “traditionalists” and the “quantifiers”. While the “traditionalists” used a hermeneutic approach to historical sources, the “quantifiers” tried to formally describe and evaluate the historical dimensions, in order to support an intuitive process of understanding with empirically identifiable facts [JAT85].
This divide tells us that, when using formal methods on historical data, research should distinguish between the representation of the original source and its interpretation. The latter is the core knowledge domain of historical research. It is advisable to share the basic assumptions and definitions of a knowledge domain in a formal way [Th17]. In this context, the Web of Data (aka Semantic Web) and Linked Open Data are central concepts that offer technologies going hand in hand with that new understanding of historical research. Combining the hermeneutic method with the transformation of historical phenomena into formal models, and connecting those models to other domains for reuse by the scientific community, makes the work of historians more dynamic and comprehensible.

The project “Digital Edition Publishing Cooperative for Historical Accounts” (DEPCHA), a Mellon funded cooperation of five US partners and the Centre for Information Modelling at Graz University, aims to link the knowledge domain of economic activities to historical accounting records. After a discussion of common entities in historical financial records, the second part of this paper focuses on the formalization of these entities within the Bookkeeping Ontology. The third part defines a workflow to publish RDF data, as part of the digital editions of historical financial records, as Linked Open Data. In conclusion, future challenges and results concerning ontology engineering and the retrieval and visualization functionalities of the web prototype² are discussed briefly.

¹ University of Graz, Centre for Information Modelling - ACDH, Elisabethstraße 59/III, 8010 Graz, Austria, christopher.pollin@uni-graz.at

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
===2 Historical Financial Records and Relevance===
Historical financial records provide rich and highly structured data sets over long periods, containing substantial amounts of individual information. This individual information is often not at the core of research interest. Instead, the records acquire their significance through the aggregation of single entries. Pure transcription does not cover the full range of dimensions of such a source: the linguistic/textual, the quantifiable and the semantic dimension [Vo15], [Vo16]. For research purposes, historical sources are subject to a transformation process towards (linked) information sources that can be used in various research scenarios. To illustrate this, we discuss three case studies of project partners and their respective research interests, which go far beyond economic and administrative aspects.

The George Washington Financial Papers (1748-1799)³ give insight into the life of George Washington and into other topics such as material culture, social history, manufacturing and agriculture. The financial papers exist as a digital edition, created and published via a Drupal⁴ based editorial platform, and aim to make Washington’s records freely accessible. The platform allows editing and publishing financial documents and lets users perform simple analyses. Sample research questions that could be of interest to historians are: How much money did Washington spend annually, and on which specific commodities? What role did the slave trade play in his business? How did the price of certain commodities fluctuate? What did the network of partners look like, and who did business with him? How was the value of tobacco calculated across different currencies [St14]?

The Wheaton Accounts (1828-1859) contain a daybook⁵ of a store selling commodities of daily life. The digital edition follows a TEI/XML approach.
It extends the range of questions to historical narratives and geographical information. It is interesting to follow an individual or a family as they appear in the daybook over time and reconstruct their social background for a historical narrative. The same applies to geographic information, which makes it possible to track geographic relationships of people or the origin of commodities [TB13].

The digitization project of the Stagville Financial Papers (1767-1892), including daybooks and ledgers from the Stagville plantation store in North Carolina, follows an open science and crowdsourcing approach using From the Page⁶ to transcribe and encode the material. Research questions in this context include the numbers and connections of customers, as well as commodities that “go together”. Furthermore, economic dependencies (who is in debt to whom) as well as the social status of customers (e.g. free/enslaved) are of interest [BA15].

² DEPCHA Prototype, https://gams.uni-graz.at/depcha

³ The George Washington Financial Papers Project, www.financial.gwpapers.org

⁴ Open-source content management framework, www.drupal.org

⁵ Daybook of L.M. Wheaton’s Store, expenses of building houses and barns, and expenses of constructing Wheaton Female Seminary buildings, http://hdl.handle.net/11040/17982

===3 Methods===
Common structures can be derived from the research questions mentioned above. To do so, data must be prepared and structured according to a formal set of rules. The Bookkeeping Ontology⁷, a conceptual data model based on the REA model [Mc82] and CIDOC CRM⁸, is developed in an ontology engineering process involving historians, software developers and digital humanists. The ontology is published in a stable version in GAMS⁹ [SS18] and in OntoMe¹⁰ and can be further discussed by the scientific community.
The Bookkeeping Ontology formalizes the interpretation of a transaction (bk:Transaction) as a combination of transfers (bk:Transfer) of measurable objects (bk:Measurable) from one accounting object (bk:Between) to another. bk:Between is an abstract class that unites bookkeeping categories (bk:Accounts, e.g. a cash account) and actors (an individual bk:Party, e.g. Washington, or an unknown group of individuals bk:Group, e.g. four farmers). The physical representation of a transaction in a historical source is an entry in a written accounting record (bk:Entry). The bk:Entry is an information fragment of a bk:Transaction, often naming only one party, while the other party is implicit in the textual context of the entry. Further information on the temporal (bk:when) and spatial (bk:where) dimensions of a bk:Transaction, as well as its status (bk:status, e.g. “partly paid”), can be expressed optionally. With regard to the research questions named above, a bk:Transaction can be assigned to a specific context. Every transaction consists of (bk:consistsOf) at least one transfer (bk:Transfer). A single bk:Transfer describes the action of transferring a bk:Measurable in one direction (bk:from or bk:to). bk:Measurable is defined as everything that can be quantified. It has subclasses for economic goods (bk:EconomicGood, as labor: bk:Service, or as physical things: bk:Commodity) and money (bk:MonetaryValues). A bk:Measurable is described by its quantity (bk:quantity) and the unit of calculation (bk:unit). The bk:Entry is described by the transcription fragment of the original source (bk:text).
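The classes and properties introduced so far can be sketched as plain Python data classes. This is only an illustrative sketch, not the ontology itself or any DEPCHA code: the Python class and field names, the helper functions `to_pence` and `from_pence`, and the ISO date string are assumptions made for the example. Only the bk: names in the comments come from the text. The sketch also checks the arithmetic of the Stagville entry: three items at 2 shillings 6 pence each, with 12 pence to the shilling, give 90 pence, i.e. 7 shillings 6 pence.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple

PENCE_PER_SHILLING = 12  # pre-decimal sterling: 12 pence to the shilling


@dataclass
class Measurable:
    """Anything quantifiable (bk:Measurable), e.g. a commodity or money."""
    kind: str           # class label taken from the text, e.g. "bk:Commodity"
    label: str          # what is measured
    quantity: Fraction  # bk:quantity
    unit: str           # bk:unit


@dataclass
class Transfer:
    """Moves measurables in one direction (bk:Transfer)."""
    sender: str                    # bk:from
    receiver: str                  # bk:to
    measurables: List[Measurable]


@dataclass
class Transaction:
    """A bk:Transaction, which bk:consistsOf at least one bk:Transfer."""
    transfers: List[Transfer]
    when: Optional[str] = None     # bk:when (optional)
    where: Optional[str] = None    # bk:where (optional)
    status: Optional[str] = None   # bk:status (optional)

    def __post_init__(self) -> None:
        if not self.transfers:
            raise ValueError("a transaction needs at least one transfer")


def to_pence(shillings: int, pence: int) -> int:
    """Convert shillings and pence to a pence total (hypothetical helper)."""
    return shillings * PENCE_PER_SHILLING + pence


def from_pence(total: int) -> Tuple[int, int]:
    """Split a pence total back into (shillings, pence)."""
    return divmod(total, PENCE_PER_SHILLING)


# Three items at 2 shillings 6 pence each.
total_shillings, total_pence = from_pence(3 * to_pence(2, 6))

goods = Transfer("Stagville", "James Haley", [
    Measurable("bk:Commodity", "powder", Fraction(1, 4), "lb"),
    Measurable("bk:Commodity", "shot", Fraction(1), "lb"),
    Measurable("bk:Commodity", "sugar", Fraction(1), "lb"),
])
payment = Transfer("James Haley", "Stagville", [
    Measurable("bk:MonetaryValue", "money", Fraction(total_shillings), "shilling"),
    Measurable("bk:MonetaryValue", "money", Fraction(total_pence), "pence"),
])
haley_purchase = Transaction([goods, payment], when="1808-08-01", where="Stagville")
```

Modelling the purchase as two opposite transfers inside one transaction mirrors the reading given in the text: goods flow from Stagville to James Haley, and money flows back.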
bk:EconomicGoods can be categorized (what is measured) and can be assigned a price. A bk:Transfer can be carried out by (bk:by) someone who conducts the transfer process in place of the business partner (bk:Agent). When the transaction is written down in the ledger, accounting categories (bk:debit and bk:credit) are coded optionally.

The DEPCHA web prototype is realized in the GAMS infrastructure¹¹, an open source, FEDORA¹² based digital repository for storing and publishing data in the humanities. Digital objects, containing multiple data streams and methods, allow disseminating data via HTML, as archival data in XML, and via various APIs. Furthermore, GAMS implements a disseminator for RDF data via the triplestore Blazegraph¹³. An encapsulated query object stores predefined SPARQL queries, including full-text search over the RDF data sets. This provides data for retrieval, analysis and visualization [St18].

⁶ Crowdsourcing manuscript transcription platform, www.fromthepage.com

⁷ Bookkeeping Ontology in DEPCHA, https://gams.uni-graz.at/o:depcha.bookkeeping

⁸ Semantic framework for mapping cultural heritage information, www.cidoc-crm.org

⁹ Humanities Asset Management System at Graz University, https://gams.uni-graz.at

¹⁰ Ontology Management Environment, ontologies.dataforhistory.org

===4 From Digital Edition to Linked Open Data===
TEI/XML serves as an interchange format for the data from different systems (Drupal, From the Page, or generic CSV). Historical financial records are transcribed and annotated in these respective formats or directly in TEI/XML. TEI allows structuring the textual dimension of a source, marking up text-specific phenomena and normalizing places, dates or persons. The semantic relations covered by the Bookkeeping Ontology are inserted into the XML/TEI via the attribute @ana. It allows a global, multiple and outgoing annotation of any structure in the TEI markup.
During ingest into GAMS, XSLT extracts the annotated structures in the XML/TEI and transforms them into RDF. The repository stores this data in the triple store. This approach has already been successfully applied in other projects regarding historical sources, such as the Municipal accounts of the city of Basel 1535-1611¹⁴ and the Urfehdebücher of the city of Basel - digital edition¹⁵ [PV17]. The following example illustrates the workflow from the historical source to RDF data. The following figure shows the original of an entry in the Stagville Accounts.

Fig. 1: Entry in the Stagville Accounts

¹¹ GAMS and Cirilo Client: Policies, documentation and tutorial, https://gams.uni-graz.at/o:gams.doku

¹² Flexible Extensible Digital Object and Repository Architecture, https://duraspace.org/fedora

¹³ Triplestore and graph database, https://www.blazegraph.com

¹⁴ Municipal accounts of the city of Basel 1535-1611, https://gams.uni-graz.at/srbas

¹⁵ Urfehdebücher of the city of Basel - digital edition, https://gams.uni-graz.at/ufbas

In addition to the table structure of this example, historical financial records can also be represented in other textual structures, such as continuous text or lists. To make the connection to the Bookkeeping Ontology comprehensible, referencing concepts are identified by the bk-prefix in the XML annotation. The entry in the sample above can be interpreted as follows: The person James Haley (bk:Party) buys ¼ lb of powder, 1 lb of shot and 1 lb of sugar (bk:EconomicGood) from Stagville (bk:Party) and transfers in return the monetary value of 7 shillings and 6 pence to Stagville (bk:MonetaryValue). Some information is not explicitly mentioned in a bk:Entry but can be found in the header or is known to the editor of the source. As semantics are defined through the attribute @ana, the textual structure of the TEI document is not relevant for further processing and the full expressiveness of TEI can be used.
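The idea that semantics live in @ana rather than in the element structure can be illustrated with a short Python sketch. It is not the project's XSLT and does not reproduce the paper's TEI listing: the sample markup, the ana values other than `bk:entry`, the `entryN` identifiers and the function name `extract_statements` are all assumptions made for illustration. The sketch walks any TEI structure, finds elements annotated as entries, and collects one (entry, concept, text) statement per annotated descendant. Because @ana holds a whitespace-separated list of pointers, an element can carry several concepts.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, simplified markup. Only ana="bk:entry" is taken from the text;
# the other ana values and the choice of elements are illustrative assumptions.
SAMPLE_TEI = """<table xmlns="http://www.tei-c.org/ns/1.0">
  <row ana="bk:entry">
    <cell><name ana="bk:to">James Haley</name></cell>
    <cell><measure ana="bk:commodity">sugar</measure></cell>
  </row>
</table>"""


def extract_statements(tei_xml: str) -> List[Tuple[str, str, str]]:
    """Collect (entry id, ana value, text) for annotated nodes inside entries."""
    root = ET.fromstring(tei_xml)
    statements: List[Tuple[str, str, str]] = []
    entry_no = 0
    for element in root.iter():
        # @ana may hold several whitespace-separated values.
        if "bk:entry" not in element.get("ana", "").split():
            continue
        entry_no += 1
        entry_id = f"entry{entry_no}"  # made-up identifier scheme
        for node in element.iter():
            if node is element:
                continue
            text = "".join(node.itertext()).strip()
            for concept in node.get("ana", "").split():
                statements.append((entry_id, concept, text))
    return statements


statements = extract_statements(SAMPLE_TEI)
```

The function never looks at element names, so the same logic would apply to an entry encoded as a table row, a list item or running text, which is the point made above about the textual structure being irrelevant for further processing.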
The starting point is to define a container for a bk:Transaction by annotating the textual representation of a bk:Entry with ana="bk:entry". The following simplified XML/TEI snippet illustrates this very entry and its conceptual counterpart bk:Transaction, which consists of two bk:Transfers. One of them transfers powder, sugar and shot, the other transfers 7 shillings and 6 pence.

List. 1: Simplified XML/TEI-Snippet

Stagville August 1st 1808