On the description of process in digital scholarship

David De Roure¹, Pip Willcox², and Alfie Abdul-Rahman¹

¹ Oxford e-Research Centre, University of Oxford, UK
{david.deroure,alfie.abdulrahman}@oerc.ox.ac.uk
² Centre for Digital Scholarship, University of Oxford, UK
pip.willcox@bodleian.ox.ac.uk

Abstract. The techniques and tools of linked data are being successfully applied in scholarship across many disciplines. In the humanities the focus has often been on describing the content of collections and expressing datasets so that they can be linked, thus improving discovery and facilitating research. This paper suggests that descriptions of processes are also useful, be they historical processes or the process of scholarship itself, and are therefore worthy of attention at the intersection of the semantic web and digital scholarship. We explore this through an exercise in describing the provenance of a ‘born digital’ artefact and a historical one.

Keywords: algorithmic composition, First Folio, Ada Lovelace, provenance, William Shakespeare, visualization

1 Introduction

Scholars are often interested in establishing where something has come from, that is, the history of an artefact, be it analogue or digital. This knowledge facilitates interpretation and trust, and describing it digitally lets digital tools visualize, search, link, and reuse this information, thus supporting the scholarly process.

There have been several efforts to provide means of describing processes and provenance. For example, W3C PROV provides a data model for provenance information, with multiple serializations including RDF [1]. Another is CIDOC CRM, which provides the CRM Digital ontology and RDF Schema to encode metadata about the steps and methods of producing digitization products [2]. For our illustrative exercise we experiment with PROV, not least because it is attracting activity in multiple disciplines.
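As a hedged illustration of the PROV vocabulary, the sketch below hand-writes a tiny PROV-N document: one activity that used one entity, generated another, and was associated with an agent. It does not use ProvToolbox or any PROV library; the prov_n helper and all ex: identifiers are hypothetical. Only the statement keywords (entity, activity, agent, used, wasGeneratedBy, wasAssociatedWith, wasDerivedFrom) come from PROV-N, where "-" marks an omitted optional term such as a time or a plan.

```python
# A minimal sketch, not ProvToolbox: build a tiny PROV-N document by hand.
# The prov_n helper and the ex: identifiers are hypothetical placeholders.

# Relations whose PROV-N form takes an optional third term (a time or a plan);
# we write "-" to mark that term as absent.
THREE_TERM = {"used", "wasGeneratedBy", "wasAssociatedWith", "wasInvalidatedBy"}


def prov_n(entities, activities, agents, relations):
    """Serialize PROV core records as a PROV-N document string.

    `relations` holds (keyword, first, second) tuples, e.g.
    ("used", "ex:compose", "ex:sequence") or
    ("wasGeneratedBy", "ex:score", "ex:compose").
    """
    out = ["document", "  prefix ex <http://example.org/>"]
    out += [f"  entity({e})" for e in entities]
    out += [f"  activity({a})" for a in activities]
    out += [f"  agent({g})" for g in agents]
    for keyword, first, second in relations:
        tail = ", -" if keyword in THREE_TERM else ""
        out.append(f"  {keyword}({first}, {second}{tail})")
    out.append("endDocument")
    return "\n".join(out)


doc = prov_n(
    entities=["ex:sequence", "ex:score"],
    activities=["ex:compose"],
    agents=["ex:composer"],
    relations=[
        ("used", "ex:compose", "ex:sequence"),
        ("wasGeneratedBy", "ex:score", "ex:compose"),
        ("wasAssociatedWith", "ex:compose", "ex:composer"),
        ("wasDerivedFrom", "ex:score", "ex:sequence"),
    ],
)
print(doc)
```

Converting such a document to RDF or to a graph image would be left to a PROV tool; that step is not shown here.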
We do not address here the collection, sharing, and linkage of multiple process descriptions, but as proof of concept we note previous work on RDF descriptions of workflows [3].

In the next section we demonstrate provenance representation for algorithmically generated music, arising from research into the life of Ada Lovelace. Our second case study is based on the Bodleian First Folio of Shakespeare’s plays [4], a physical artefact with a more recent digital manifestation. Together these illustrate what can usefully be captured in one representation, and what we would like to be able to represent. In particular, we offer the First Folio as a hybrid physical-digital case study for future work.

80 A. Adamou, E. Daga and L. Isaksen (eds.)

2 Numbers Into Notes

December 2015 saw the 200th anniversary of the birth of Ada Lovelace. A major symposium was held to mark the occasion, including the discussion of a thought experiment: had Ada Lovelace lived longer, and had Charles Babbage successfully built the analytical engine, how might Lovelace’s observation that “the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent” (Note A in [5]) have been pursued? We called this exercise Numbers into Notes.

As part of this we developed an interactive tool for people to generate music from integer sequences. The workflow of the tool mirrors our hypothesized workflow involving the analytical engine: the machine runs a parameterized program to generate a number sequence, and parts of this sequence are then given to different instruments. Inspired by the use of punched cards in the Jacquard loom and the proposed analytical engine, we generate virtual ‘piano rolls’. The programmer and operator (or ‘attendant’) were not allowed to change the numbers generated by the machine, but had full control of the mapping from numbers to notes and then from notes to instruments.
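The workflow just described can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the tool's code: the recurrence a[i] = a[i-1] + k·a[i-2], reduced modulo a chosen modulus, is one assumed form of a generalized Fibonacci sequence, and the helper names, the MIDI-based note mapping, and the instrument slices are all hypothetical. The generator stays fixed and every choice lives in the mapping and slicing functions, mirroring the rule that the operator could not change the machine's numbers.

```python
from itertools import islice

# Hypothetical sketch of the Numbers into Notes workflow; the recurrence,
# parameter names and note mapping are assumptions, not the tool's code.


def generalized_fibonacci(n0, n1, k, modulus):
    """Yield a[0], a[1], ... where a[i] = a[i-1] + k * a[i-2] (mod modulus)."""
    a, b = n0 % modulus, n1 % modulus
    while True:
        yield a
        a, b = b, (b + k * a) % modulus


def period(n0, n1, k, modulus):
    """Length of the cycle reached by the state pair (a[i], a[i+1])."""
    seen, a, b, i = {}, n0 % modulus, n1 % modulus, 0
    while (a, b) not in seen:
        seen[(a, b)] = i
        a, b = b, (b + k * a) % modulus
        i += 1
    return i - seen[(a, b)]


NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def to_note(value, lowest_midi=48):
    """Map a residue to a note name; value 0 becomes MIDI 48, i.e. C3."""
    midi = lowest_midi + value
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def assign_parts(values, parts):
    """Give each instrument a slice (start, stop) of the fixed number sequence."""
    return {name: [to_note(v) for v in values[start:stop]]
            for name, (start, stop) in parts.items()}


numbers = list(islice(generalized_fibonacci(0, 1, 1, 35), 32))
score = assign_parts(numbers, {"piano": (0, 16), "cello": (16, 32)})
print(period(0, 1, 1, 35))  # 80 for ordinary Fibonacci mod 35
print(score["piano"][:4])   # ['C3', 'C#3', 'C#3', 'D3']
```

With seeds 0 and 1, k = 1, and modulus 35 this is the ordinary Fibonacci sequence reduced mod 35, which repeats every 80 terms (the least common multiple of its periods mod 5 and mod 7, namely 20 and 16).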
The interactive tool, a single-page web application (http://demeter.oerc.ox.ac.uk/NumbersIntoNotes/), provides several algorithms which illustrate the mathematics of the early 19th century (the primary example involves generalized Fibonacci sequences, reduced by modular arithmetic to produce periodic sequences).

The final stage of the workflow is to export the musical fragment in various formats, one of which is metadata with an automatically generated natural language description of the algorithm parameters, mapping, and selection. We did this so that someone at a later stage could understand how the fragment was generated, or indeed regenerate it using different tooling, i.e. reproduce the results of the experiment. For the same reason, another output format is W3C PROV-N, from which RDF and an SVG visualization (Figure 1) can be generated. These conversions use the ProvToolbox software (http://lucmoreau.github.io/ProvToolbox/).

1st Workshop on Humanities in the Semantic Web - WHiSe 2016 81

Fig. 1. The first part of a provenance graph generated by the Numbers Into Notes tool. [Graph: Fibonacci → IntegerSequence → ReduceByModulus → ModSequence → Select → Selection, with Select associated with Composer (type: prov:Person); parameters shown: n0: 0, n1: 1, k: 1, mod: 35, count: 22, bl: 9,G4, tr: 78,A5.]

3 The Bodleian First Folio

The Bodleian First Folio of Shakespeare’s plays is a physical book with a digital manifestation which facilitates scholarship. First we describe in narrative form the provenance that we wish to represent.

Shakespeare wrote, or co-wrote as recent scholarship suggests, plays for his friends and fellow actors, and seems to have crafted parts to suit their particular talents. The plays, in the forms in which they reach us, are generally longer than could practically have been performed in contemporary theatres. It may not be unreasonable to suggest that performances were cut according to anticipated
audiences’ preferences, and not unfeasible that versions of a play were co-created dynamically by actors responding to audience reaction.

The first collection of Shakespeare’s plays (1623) is likely to derive at least in part from prompt books of the King’s Men (as the company to which Shakespeare belonged was called from 1603). The First Folio, as it came to be known, was published as a joint venture by a consortium of printers (Edward Blount, William and later Isaac Jaggard, William Aspley, and John Smethwick) and two of Shakespeare’s fellow actors and friends, John Heminge and Henry Condell. It reprinted 18 of its 36 plays, with varying degrees of textual variance, and published the other 18 for the first time.

One copy of the First Folio’s print run (estimated at between 750 and 800 copies) was sent to the Bodleian Library in Oxford, presumably under the 1610 agreement with the Stationers’ Company. As was common, it arrived unbound, ‘in sheets’, and was sent to a local bookbinder, William Wildgoose, to be bound strongly but plainly in brown calfskin. The book remained accessible, chained to a shelf in the Library, for at least the next 40 years, and was apparently much read. It then left the Library, probably sold after being superseded by the Third Folio of 1663/4. After the book had been lost to view for about 240 years, Gladwyn Turbutt, an Oxford undergraduate, brought his family’s copy to the Bodleian Library’s enquiry desk in 1905 for advice on its dilapidated and lacklustre binding. The desire to return the book to its original owners inspired first a private and then a public funding campaign, with “Oxford men” (at whom the campaign was directed, although the donors were restricted by neither education nor gender) contributing to the local and national commons.
The successful campaign saw the book returned to the Bodleian Library, still (and apparently uniquely among First Folios) in its original binding, and made fragile through frequent use by its early readers. Its physical condition meant that access to the book was restricted, and few scholars were able to study it. In 2012 a second public campaign was launched to fund its stabilization, digitization, and free online publication (http://firstfolio.bodleian.ox.ac.uk/).

We are developing descriptions of the provenance of this First Folio. Figure 2 shows a fragment of a simplified visualization of the PROV-N description. It begins with the physical manifestation of a play, written out in parts for the respective actors and as a prompt book. The co-creational activities of rehearsal and backstage annotation of the prompt book produce a text which is performed. For the sake of this exercise, the text arising from this performance is imagined as a fair copy. Shakespeare’s friends John Heminge and Henry Condell take this fair copy to the consortium, which prints multiple copies, one of which is sent to the Bodleian Library under the agreement with the Stationers’ Company. The Library then commissions William Wildgoose to bind it.

A more elaborate graph makes use of the PROV notions of agents and plans, as with the ‘composer’ in our first example, and we can use the notions of specialization (specializationOf) and invalidation (wasInvalidatedBy) to capture modification and relocation of the physical work. Graphs are available on the lead author’s website http://www.oerc.ox.ac.uk/people/dder.

4 Discussion

The PROV output from Numbers into Notes is definitive and achieves the desired purpose. However, working through the First Folio using PROV raises many questions.
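Before turning to those questions, the chain described for Figure 2 can be sketched as a small graph with a backward lineage query. This is a hypothetical, simplified encoding: the ex: identifiers follow the node labels visible in Figure 2, agents are attached only to the printing and binding steps for brevity, and plain dictionaries stand in for the PROV relations wasGeneratedBy, used, and wasAssociatedWith rather than any PROV library's API.

```python
from collections import deque

# Hypothetical, simplified encoding of the First Folio chain; identifiers are
# illustrative and agents are recorded only for printing and binding.
GENERATED_BY = {  # entity -> activity that generated it (wasGeneratedBy)
    "ex:faircopy": "ex:copy",
    "ex:copy1": "ex:print",
    "ex:bff": "ex:deposit",
}
USED = {  # activity -> entities it used (used)
    "ex:copy": ["ex:annotatedpromptbook"],
    "ex:print": ["ex:faircopy"],
    "ex:deposit": ["ex:copy1"],
    "ex:bind": ["ex:bff"],
}
ASSOCIATED_WITH = {  # activity -> agents (wasAssociatedWith)
    "ex:print": ["Jaggard, Isaac", "Blount, Edward",
                 "Aspley, William", "Smethwick, John"],
    "ex:bind": ["Wildgoose, William"],
}


def lineage(entity):
    """Walk backwards from `entity`, returning (activity, agents) pairs, nearest first."""
    steps, seen, frontier = [], set(), deque([entity])
    while frontier:
        activity = GENERATED_BY.get(frontier.popleft())
        if activity is None or activity in seen:
            continue  # no recorded origin, or already visited
        seen.add(activity)
        steps.append((activity, ASSOCIATED_WITH.get(activity, [])))
        frontier.extend(USED.get(activity, []))
    return steps


for activity, agents in lineage("ex:bff"):
    print(activity, agents)
```

Here lineage("ex:bff") walks back through deposit, print, and copy. Replacing agent strings such as "Blount, Edward" with shared identifiers is what would allow linking across provenance descriptions.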
Our hypothetical premise was that scholars would find it useful to link across provenance descriptions via the various entities and agents we declared: the John Heminge who ostensibly co-edits the First Folio’s text and works with the printers’ consortium is also a shareholder in The Globe theatre, the company’s financial manager, husband to Rebecca Knell, a beneficiary of Shakespeare’s will, and so on. At this level, when linked to other relevant data, our description could be useful.

Given the scant contemporary record, even aspects of provenance that are generally undisputed and verified by other research can be traced to a particular scholar’s original work. To credit and attribute that scholarship, an extra field seems to be required, one step removed from the provenance itself: the provenance of the provenance. A level of certainty would also be helpful. One scholar’s claim might logically fill a gap in the provenance, but if it were unattested elsewhere it could usefully be visualized more hazily than uncontested nodes. We also note the approach of [6], which captures links to ‘spots’ in primary sources.

Fig. 2. Fragment of the provenance graph for the Bodleian First Folio. [Graph: annotatedpromptbook → copy → faircopy → print → copy1 → deposit → bff → bind; named agents: Condell, Henry; Heminge, John; Jaggard, Isaac; Blount, Edward; Aspley, William; Smethwick, John; Stationers' Company Agreement.]

Plurality repeatedly vexed our description. We have knowingly taken one of the world’s most studied books as our subject, as a form of stress test for the encoding. Although it is one book, it is made up of preliminaries and 36 plays, each with its own much-studied history. A play’s text could have more than one immediate source: a copy of a play previously printed in quarto format may have been available as its text was set in type in the print shop.
Print is plural by design, and variants in the text make each copy of this relatively common early modern book unique, yet we describe the First Folio’s provenance, up to the point at which a copy is sent to the Bodleian Library, as though it were one work. We thus collapse the many processes of book production and obscure the elements of production that make copies unique. However, the digital phase of the book’s existence lends itself well to PROV: scans of pages, transcription to TEI (including annotations based on the materiality of the physical book), renderings on screen, and downloads in XML and PDF.

We offer the Bodleian First Folio as a challenge in process description: it encompasses the analogue and the digital, involves uncertainty, and is itself the subject of scholarly process. We hope that others might encode it comprehensively in current or future representations.

Capturing the provenance, using a mixture of drawing and hand-encoding, has proven interesting in its own right, and we suggest that interactive visualizations of provenance are useful tools. This is supported by visualization work in the humanities, such as visual analytics for intertextuality [7] and poetry visualization [8]. The algorithmic composition example captures the provenance of a sonification, which is essentially a kind of visualization, and we suggest that describing the provenance of a visualization has similar utility in the interpretation and reuse of scholarship.

Acknowledgements. This work is partially supported by the project Fusing Semantic and Audio Technologies for Intelligent Music Production and Consumption, funded under EPSRC grant EP/L019981/1. We are grateful to Graham Klyne for his advice on using PROV.

References

1. Gil, Y., Miles, S., Belhajjame, K., Deus, H., Garijo, D., Klyne, G., Missier, P., Soiland-Reyes, S., Zednik, S.: PROV Model Primer. Working Group Note, W3C (2013)
2. Theodoridou, M., Tzitzikas, Y., Doerr, M., Marketakis, Y., Melessanakis, V.: Modeling and querying provenance by extending CIDOC CRM. Distributed and Parallel Databases 27(2), 169–210 (2010)
3. Newman, D., Bechhofer, S., De Roure, D.: myExperiment: An ontology for e-Research. In: Semantic Web Applications in Scientific Discourse. CEUR Workshop Proceedings, vol. 523. http://ceur-ws.org/Vol-523/ (2009)
4. Shakespeare, W., Heminge, J., Condell, H., Droeshout, M., Jaggard, I., Blount, E., Jaggard, W., Smethwicke, J., Aspley, W.: Mr. William Shakespeares comedies, histories, & tragedies. Published according to the true originall copies. Printed by Isaac Iaggard, and Ed. Blount at the charges of W. Iaggard, Ed. Blount, I. Smithweeke, and W. Aspley, London. Oxford, Bodleian Library, Arch. G c.7 (1623)
5. Lovelace, A.A.: Sketch of the analytical engine invented by Charles Babbage, with notes by the translator. In: Scientific Memoirs, Selected from the Transactions of Foreign Academies of Science and Learned Societies, vol. 3, pp. 666–731. Richard and John E. Taylor, Red Lion Street, Fleet Street, London (1843). Translation of Notions sur la machine analytique de M. Charles Babbage by Luigi Federico Menabrea, in Bibliothèque Universelle de Genève, nouvelle série 41, 352–76 (1842)
6. Pasin, M., Bradley, J.: Factoid-based prosopography and computer ontologies: towards an integrated approach. Digital Scholarship in the Humanities 30(1), 86–97 (2015)
7. Abdul-Rahman, A., Roe, G., Olsen, M., Gladstone, C., Whaling, R., Cronk, N., Morrissey, R., Chen, M.: Constructive visual analytics for text similarity detection. Computer Graphics Forum, doi:10.1111/cgf.12798 (2016)
8. Abdul-Rahman, A., Lein, A.J., Coles, K., Maguire, E., Meyer, M., Wynne, M., Johnson, C.R., Trefethen, A., Chen, M.: Rule-based visual mappings – with a case study on poetry visualization. Computer Graphics Forum 32(3), 381–390 (2013)