<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the description of process in digital scholarship</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David De Roure</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pip Willcox</string-name>
          <email>pip.willcox@bodleian.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfie Abdul-Rahman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Digital Scholarship, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oxford e-Research Centre, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>79</fpage>
      <lpage>84</lpage>
      <abstract>
        <p>The techniques and tools of linked data are being successfully applied in scholarship across many disciplines. In the humanities the focus has often been on describing the content of collections and expressing datasets so that they can be linked, thus improving discovery and facilitating research. This paper suggests that descriptions of processes are also useful, be they historical processes or the process of scholarship itself, and therefore worthy of attention at the intersection of semantic web and digital scholarship. We explore this through an exercise in describing the provenance of a ‘born digital’ artefact and a historical artefact.</p>
      </abstract>
      <kwd-group>
        <kwd>algorithmic composition</kwd>
        <kwd>First Folio</kwd>
        <kwd>Ada Lovelace</kwd>
        <kwd>provenance</kwd>
        <kwd>William Shakespeare</kwd>
        <kwd>visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Scholars are often interested in establishing where something has come from,
that is, the history of an artefact, be it analogue or digital. This knowledge
facilitates interpretation and trust, and describing it digitally enables the use
of digital tooling to visualize, search, link, and reuse this information, and thus
facilitate the scholarly process.</p>
      <p>
        There have been efforts to provide means of description of processes and of
provenance. For example, W3C PROV provides a data model for provenance
information, with multiple serializations including RDF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Another is
CIDOC CRM, which provides the CRM Digital ontology and RDF Schema to encode
metadata about the steps and methods of producing digitization products [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For
our illustrative exercise we experiment with PROV, especially as it is attracting
activity in multiple disciplines. We do not address here the collection, sharing,
and linkage of multiple process descriptions, but as proof of concept we note
previous work in RDF descriptions of workflows [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In the next section we demonstrate provenance representation for
algorithmically generated music, arising from research into the life of Ada Lovelace. Our
second case study is based on the Bodleian First Folio of Shakespeare's plays [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
a physical artefact with a more recent digital manifestation. These provide
examples of what can be usefully captured in one representation, and what we
would like to be able to represent. In particular we offer the First Folio as a
hybrid physical-digital case study for future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Numbers Into Notes</title>
      <p>
        December 2015 saw the 200th anniversary of the birth of Ada Lovelace. A
major symposium was held to mark the occasion, including the discussion of a
thought experiment: had Ada Lovelace lived longer, and had Charles Babbage
successfully built the analytical engine, what might have happened to pursue
Lovelace's observation that “the engine might compose elaborate and scientific
pieces of music of any degree of complexity or extent” (note A in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). We called
this exercise Numbers into Notes.
      </p>
      <p>As part of this we developed an interactive tool for people to generate
music from integer sequences. The workflow of the tool mirrors our hypothesized
workflow involving the analytical engine: the machine runs a parameterized
program to generate a number sequence, and parts of this sequence are then given
to different instruments. Inspired by the use of punched cards in the Jacquard
loom and the proposed analytical engine, we generate virtual ‘piano rolls’. The
programmer and operator (or ‘attendant’) were not allowed to change the
numbers generated by the machine, but had full control of the mapping from
numbers to notes and then from notes to instruments. The interactive tool, a
single-page web application (http://demeter.oerc.ox.ac.uk/NumbersIntoNotes/),
provides several algorithms which illustrate the mathematics of the early 19th
century (the primary example involves generalized Fibonacci sequences, reduced by
modular arithmetic to produce periodic sequences).</p>
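      <p>The primary example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's own code: the function names, the seed values, the modulus of 8, and the mapping onto MIDI note numbers are all hypothetical choices made for the example. It shows why reduction by a modulus yields a periodic sequence: each term depends only on the previous pair of values, and only finitely many pairs exist.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import islice


def fibonacci_mod(a, b, modulus):
    """Yield a generalized Fibonacci sequence (seeds a, b) reduced modulo `modulus`."""
    a, b = a % modulus, b % modulus
    while True:
        yield a
        a, b = b, (a + b) % modulus


def period(a, b, modulus):
    """Count the steps until the pair of consecutive values returns to its start."""
    start = (a % modulus, b % modulus)
    state = start
    steps = 0
    while True:
        # Advance one term: (x, y) becomes (y, x + y mod m).
        state = (state[1], (state[0] + state[1]) % modulus)
        steps += 1
        if state == start:
            return steps


def to_midi(values, base_note=60):
    """Hypothetical mapping: offset each value from a base MIDI note number."""
    return [base_note + v for v in values]


# Assumed parameters for illustration: classic seeds 0 and 1, modulus 8.
sequence = list(islice(fibonacci_mod(0, 1, 8), 12))
notes = to_midi(sequence)
print(sequence)
print(period(0, 1, 8))
```
]]></preformat>
      <p>In this sketch the numbers themselves are fixed by the seeds and the modulus, while the mapping function is the part an ‘attendant’ would be free to vary.</p>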
      <p>The final stage of the workflow is to export the musical fragment in
various formats, one of which is metadata with an automatically generated natural
language description of the algorithm parameters, mapping, and selection. We
did this so that someone could later understand how the
fragment was generated, or indeed regenerate the fragment using different
tooling, i.e. reproduce the results of the experiment. For this same reason,
one of the output formats is W3C PROV-N, from which RDF can be generated,
as well as an SVG visualization as shown in Figure 1. These conversions use the
ProvToolbox software (http://lucmoreau.github.io/ProvToolbox/).</p>
    </sec>
    <sec id="sec-3">
      <title>The Bodleian First Folio</title>
      <p>The Bodleian First Folio of Shakespeare's plays is a physical book with a digital
manifestation which facilitates scholarship. First we describe in narrative form
the provenance that we wish to represent.</p>
      <p>Shakespeare wrote, or co-wrote as recent scholarship suggests, plays for his
friends and fellow actors, and seems to have crafted parts to suit their particular
talents. The plays, in the forms in which they reach us, are generally longer
than could practically have been performed in contemporary theatres. It may
not be unreasonable to suggest performances were cut according to anticipated
audiences' preferences, and not unfeasible that versions of a play were co-created
dynamically, by actors responding to audience reaction.</p>
      <p>[Figure 1. Labels recovered from the visualization: Fibonacci, gen, IntegerSequence, use, ReduceByModulus.]</p>
      <p>The first collection of Shakespeare's plays (1623) is likely to derive at least
in part from prompt books of the King's Men (as the company to which
Shakespeare belonged was called from 1603). The First Folio, as it came to be known,
was published as a joint venture by a consortium of printers (Edward Blount,
William and later Isaac Jaggard, William Aspley, and John Smethwick) and
two of Shakespeare's fellow actors and friends, John Heminge and Henry
Condell. It republished 18 of its 36 plays with varying degrees of textual variance,
publishing the other 18 for the first time.</p>
      <p>One copy of the First Folio's print-run (estimated at between 750 and 800
copies) was sent to the Bodleian Library in Oxford, presumably under the 1610
agreement with the Stationers' Company. As was common, it arrived unbound,
‘in sheets’, and it was sent to a local bookbinder, William Wildgoose, to be
bound strongly but plainly in brown calfskin. The book remained accessible
(chained on shelf) in the Library for at least the next 40 years, apparently much
read.</p>
      <p>This copy left the library, probably sold after it had been superseded by
the Third Folio of 1663/4. It was lost to view for about 240 years until, in 1905, Gladwyn
Turbutt, an Oxford undergraduate, brought his family copy to the Bodleian
Library's enquiry desk for advice on its dilapidated and lacklustre binding. The
desire to return the book to its original owners inspired a private then a public
funding campaign, with “Oxford men” (at whom the campaign was directed, although
the donors were restricted by neither education nor gender) contributing to the
local and national commons. The successful campaign saw the book returned to
the Bodleian Library, still (and apparently uniquely among First Folios) in its
original binding, and made fragile through frequent use by its early readers. Its
physical condition meant access to the book was restricted, and few scholars were
able to study it. 2012 saw a second public campaign to fund its stabilization,
digitization, and publication freely online (http://firstfolio.bodleian.ox.ac.uk/).</p>
      <p>We are developing descriptions of the provenance of this First Folio.
Figure 2 shows a fragment of a simplified PROV-N visualization. It begins with the
physical manifestation of a play, written out in parts for the respective actors,
and as a prompt book. The co-creational activities of rehearsal and backstage
annotation of the prompt book produce a text which is performed. For the sake
of this exercise the text which arises from this performance is imagined as a fair
copy. By the hands of Shakespeare's friends John Heminge and Henry Condell
this fair copy is taken to the consortium which prints multiple copies, one of
which is sent to the Bodleian Library under the agreement with the Stationers'
Company. The Library then commissions its binding by William Wildgoose.</p>
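      <p>The chain just described can be written out as PROV-N statements. The following Python sketch is illustrative only and is not the encoding behind Figure 2: the namespace URI, the identifiers, and the choice to treat the printers' consortium as a single agent are assumptions made for the example, and the output of the binding step is left unnamed because the narrative does not name it. It emits one <monospace>used</monospace> and one <monospace>wasGeneratedBy</monospace> statement per step, with <monospace>wasAssociatedWith</monospace> linking each activity to the people named in the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical prefix and namespace, chosen only for this illustration.
PREFIX = "ex"
NAMESPACE = "http://example.org/firstfolio#"

# Each step: (activity, entity used, entity generated or None, associated agents).
# Identifiers are assumptions that paraphrase the narrative description.
STEPS = [
    ("copy", "annotatedpromptbook", "faircopy", ["heminge", "condell"]),
    ("print", "faircopy", "copy1", ["consortium"]),
    ("deposit", "copy1", "bff", []),
    ("bind", "bff", None, ["wildgoose"]),
]


def qname(local):
    """Return a prefixed identifier such as ex:faircopy."""
    return f"{PREFIX}:{local}"


def to_provn(steps):
    """Serialize the steps as a PROV-N document string."""
    entities, agents = [], []
    for _activity, used, generated, associated in steps:
        for entity in (used, generated):
            if entity is not None and entity not in entities:
                entities.append(entity)
        for agent in associated:
            if agent not in agents:
                agents.append(agent)

    out = ["document", f"  prefix {PREFIX} <{NAMESPACE}>"]
    out += [f"  entity({qname(e)})" for e in entities]
    out += [f"  agent({qname(a)})" for a in agents]
    for activity, used, generated, associated in steps:
        out.append(f"  activity({qname(activity)})")
        # The trailing "-" marks the optional time (or plan) as unspecified.
        out.append(f"  used({qname(activity)}, {qname(used)}, -)")
        if generated is not None:
            out.append(f"  wasGeneratedBy({qname(generated)}, {qname(activity)}, -)")
        for agent in associated:
            out.append(f"  wasAssociatedWith({qname(activity)}, {qname(agent)}, -)")
    out.append("endDocument")
    return "\n".join(out)


provn = to_provn(STEPS)
print(provn)
```
]]></preformat>
      <p>Keeping the steps as plain tuples makes it easy to see what a single linear chain can and cannot express; plural sources or uncertain attributions would need a richer structure than this sketch provides.</p>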
      <p>A more elaborate graph makes use of the notions of actors and plans, as with
the ‘composer’ in our first example, and we can use the notions of specialization
(specializationOf) and invalidation (wasInvalidatedBy) to capture modification
and relocation of the physical work. Graphs are available on the lead author's
website http://www.oerc.ox.ac.uk/people/dder.</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>The PROV output from Numbers into Notes is definitive and achieves the
desired purpose. However, working the First Folio through using PROV raises
many questions. Our hypothetical premise was the usefulness to scholars of
linking across provenances to the various entities and agents we declared: the John
Heminge who ostensibly co-edits the First Folio's text and works with the
printers' consortium is also a shareholder in The Globe theatre, the company's
financial manager, husband to Rebecca Knell, a beneficiary of Shakespeare's will, and
so on. At this level, when linked to other relevant data, our description could be
useful.</p>
      <p>
        With scant contemporary record, even aspects of provenance that are
generally undisputed and verified by other research can be traced to a scholar's original
work. Both to credit and to attribute the scholarship an extra field seems
required, one step removed from the provenance itself: indeed, the provenance of
the provenance. A level of certainty would also be helpful: one scholar's claim
might logically fill a gap in the provenance, but if it were unattested elsewhere
it could usefully be more hazily visualized than uncontested nodes. We also note
the approach of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] which captures links to ‘spots’ in primary sources.
      </p>
      <p>[Figure 2. Fragment of a simplified PROV visualization for the Bodleian First Folio. Labels recovered, in order: annotatedpromptbook, use, copy (Condell, Henry; Heminge, John), gen, faircopy, use, print (Jaggard, Isaac; Blount, Edward; Aspley, William; Smethwick, John; Stationers' Company Agreement), gen, copy1, use, deposit, gen, bff, use, bind.]</p>
      <p>Plurality repeatedly vexed our description. We have knowingly taken one of
the world's most studied books as our subject, as a form of stress test for the
encoding. While one book, it is made up of preliminaries and 36 plays, each of
which has its own, much studied, history. A play's text could have more than
one immediate source: a copy of a play previously printed in quarto format may
have been available as its text was composed in a print shop. Print is plural by
design, and variants in the text make each copy of this relatively common early
modern book unique, but we describe the First Folio's provenance up to the
point that a copy is sent to the Bodleian Library as though it were one work.
We collapse the many processes of book production and so obscure the elements
of production that make copies unique.</p>
      <p>However, the digital phase of the book's existence yields well to PROV: scans
of pages, transcription to TEI (including annotations based on the materiality of
the physical book), renderings on screen, and downloads in XML and PDF. We
offer the Bodleian First Folio as a challenge in process description encompassing
the analogue and digital, involving uncertainty and itself the subject of scholarly
process, and hope that others might encode it comprehensively in current or
future representations.</p>
      <p>
        The exercise of capturing the provenance has proven interesting in its own
right, using a mixture of drawing and hand-encoding, and we suggest that
interactive visualizations of provenance are useful tools. This is further evidenced by
visualization work in humanities, such as visual analytics for intertextuality [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and poetry visualization [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The algorithmic composition example captures the
provenance of a sonification, which is essentially a kind of visualization, and we
suggest that describing the provenance of a visualization has similar utility in
interpretation and reuse of scholarship.
      </p>
      <p>Acknowledgements This work is partially supported by Fusing Semantic and
Audio Technologies for Intelligent Music Production and Consumption funded under
EPSRC grant EP/L019981/1. We are grateful to Graham Klyne for his advice on using
PROV.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gil</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deus</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garijo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klyne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soiland-Reyes</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zednik</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>PROV Model Primer</article-title>
          . Working group note,
          <source>W3C</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Theodoridou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzitzikas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doerr</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marketakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melessanakis</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Modeling and querying provenance by extending CIDOC CRM</article-title>
          .
          <source>Distributed and Parallel Databases</source>
          ,
          <volume>27</volume>
          (
          <issue>2</issue>
          ),
          <fpage>169</fpage>
          –
          <lpage>210</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Roure</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>myExperiment: An ontology for eResearch</article-title>
          . In
          <source>Semantic Web Applications in Scientific Discourse</source>
          , volume
          <volume>523</volume>
          of CEUR Workshop Proceedings: http://ceur-ws.org/Vol-523/ (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Shakespeare</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heminge</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Condell</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Droeshout</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaggard</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blount</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaggard</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smethwicke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aspley</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          : Mr. William Shakespeares comedies, histories, &amp;
          <article-title>tragedies. Published according to the true originall copies</article-title>
          .
          <source>Printed by Isaac Iaggard</source>
          , and Ed. Blount at the charges of W. Iaggard, Ed. Blount,
          <string-name>
            <surname>I. Smithweeke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Aspley</surname>
          </string-name>
          , London. Oxford, Bodleian Library, Arch. G c.
          <volume>7</volume>
          (
          <issue>1623</issue>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lovelace</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Sketch of the analytical engine invented by Charles Babbage, with notes by the translator</article-title>
          . In Scientific Memoirs,
          <source>Selected from the Transactions of Foreign Academies of Science and Learned Societies</source>
          , Vol.
          <volume>3</volume>
          ,
          <issue>1843</issue>
          , pp.
          <fpage>666</fpage>
          -
          <lpage>731</lpage>
          , volume
          <volume>3</volume>
          . Richard and John E. Taylor, Red Lion Street, Fleet Street, London. Translation of Notions sur la machine analytique de M.
          <article-title>Charles Babbage by Luigi Federico Menabrea</article-title>
          , in Bibliothèque Universelle de Genève,
          <source>nouvelle série 41</source>
          ,
          <fpage>352</fpage>
          –
          <lpage>76</lpage>
          (
          <year>1842</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pasin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bradley</surname>
          </string-name>
          , J.:
          <article-title>Factoid-based prosopography and computer ontologies: towards an integrated approach</article-title>
          .
          <source>Digital Scholarship in the Humanities</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <fpage>86</fpage>
          –
          <lpage>97</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdul-Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Roe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Olsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gladstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Whaling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Cronk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Morrissey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Constructive visual analytics for text similarity detection</article-title>
          .
          <source>Computer Graphics Forum</source>
          , doi:10.1111/cgf.12798 (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Abdul-Rahman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lein</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coles</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maguire</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wynne</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trefethen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Rule-based visual mappings – with a case study on poetry visualization</article-title>
          .
          <source>Computer Graphics Forum</source>
          ,
          <volume>32</volume>
          (
          <issue>3</issue>
          ),
          <fpage>381</fpage>
          –
          <lpage>390</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>