<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Digital Preservation in Data-Driven Science: On the Importance of Process Capture, Preservation and Validation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Rauber</string-name>
          <email>rauber@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Department of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstrasse 9-11</institution>
          ,
          <addr-line>1040 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>7</fpage>
      <lpage>17</lpage>
      <abstract>
<p>Current digital preservation is strongly biased towards data objects: digital files of document-style objects, or encapsulated and largely self-contained objects. To provide authenticity and provenance information, comprehensive metadata models are deployed to document information on an object's context. Yet, we claim that simply documenting an object's context may not be sufficient to ensure proper provenance and to fulfill the stated preservation goals. Specifically in e-Science and business settings, capturing, documenting and preserving entire processes may be necessary to meet the preservation goals. We thus present an approach for capturing, documenting and preserving processes, and means to assess their authenticity upon re-execution. We discuss options as well as limitations and open challenges in achieving sound preservation, specifically for scientific processes.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital Preservation</kwd>
        <kwd>Processes</kwd>
        <kwd>Context</kwd>
        <kwd>eScience</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Digital preservation (DP) traditionally takes a predominantly data-centric view
of both its operations and the objects it deals with. Digital objects
considered for preservation are usually (turned into, as far as possible)
self-contained, static objects: images resulting from scans, classical
document-style objects, data sets, but also information packages containing software and
other in-principle dynamic objects as encapsulated files. Furthermore, objects
are usually "removed" from an operational life cycle into an archival life cycle
and ingested into designated repositories for long-term maintenance. When needed in an
operational manner again, they are taken out and re-inserted into a potentially new
life cycle, only to be re-ingested as new archival objects after this new life
completes. Consequently, cost estimation
and investment decisions are also largely based on aggregated item-level information,
adding up processing costs, storage costs and others.</p>
      <p>
        We claim that this traditional view is hitting severe limitations as we are
observing a set of interesting changes in the preservation community: Most
importantly, preservation is expanding beyond the traditional cultural heritage
community. While originating in the sciences, recognizing the need to maintain
the investment made into data collected electronically (as evident by the early
and strong commitment of institutions such as NASA, leading to the infamous
OAIS model) the key drivers, expertise and know how and development has come
from and taken place in the cultural heritage community. The key characteristic
of this community is the dedication to preservation of information as a, or even
the, primary mission, resulting in a holistic understanding of the scope of the
problem and its long-term implication beyond individual technical or
organizational issues. Yet, more recently we see a range of other communities facing the
need for digital preservation: back to the origins of DP, science as a whole is
becoming increasingly dependent on data as the core facilitator in virtually all
scientific disciplines, leading to trends defined as data-driven science, e-Science,
Big Data [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the Fourth Paradigm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and others. But even beyond cultural
heritage and science communities, both of which have been involved in DP for
a long time, we find entirely new players / customers in need of DP solutions,
many of them coming from a range of industrial backgrounds, and with a quite
diverse set of motivations. These may range from specific legal / compliance
requirements, via somewhat more ambiguous risk mitigation desires to serving
dedicated business needs. Thus, while in principle being similar to the cultural
heritage sector, there are some interesting challenges stretching beyond the ones
encountered in more traditional settings.
      </p>
      <p>
        This paper starts in Sec. 2 with a loose collection of observations on changes
in the DP community, highlighting three areas of focus that we deem
important, namely a shift towards risk management, viewing DP in the context of
e-Governance frameworks, and the shift towards the preservation of processes
rather than data objects. This will be followed by a more detailed look at two
key aspects, namely process preservation and a framework for evaluating the
quality of re-execution of preserved processes in Sections 3 and 4, respectively
(largely adopted from [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]), before providing a brief summary in Section 5.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Implications of Changes in Stakeholder Communities</title>
      <p>The observed expansion in stakeholder communities has some interesting
implications for the DP community. First of all, we are experiencing yet another
clash of languages and cultures: after partially succeeding in consolidating the
viewpoints of the archival and library communities, merging in the viewpoints of the
museums community, and even getting computer science to listen
and communicate on an increasingly shared level of mutual understanding, DP
is currently being recognized, interpreted and contributed to by a whole range
of new key players with completely different interpretations of the concepts widely
accepted in the traditional DP community. "Long-term" may be as short as seven
years; preservation and loss are not necessarily measured against the need
to maintain an object, but as a best-effort vs. risk trade-off, specifically not
pursued "at all costs" wherever possible, with deletion-as-early-as-permissible
being a key factor.</p>
      <p>Where DP is serving a business purpose, objects are rarely perceived
as frozen and deposited into an archive. Rather, they need to be maintained in
an operational environment. Catch phrases such as business continuity capture a
lot of the thinking behind this. More importantly, however, they also help to
identify approaches and solutions serving these (but also more traditional) DP
needs that stem from different backgrounds: life-cycle management of entire IT
environments, redundancy, (IT) security, and e-governance structures and methods
have been developed, deployed, tested, customized and improved over long time
spans in these communities. These solutions may prove valuable contributions
to the DP community at large.</p>
      <p>This integration of yet another heterogeneous set of communities poses severe
challenges to a rather tightly-knit network of DP researchers and practitioners
that has just started to evolve into a very young community of its own, with its
own jargon, events, and commonly understood basis of generic concepts. Yet, it
also offers huge benefits. Apart from contributing new competences and tested
solutions that can be adapted to serve more generic DP needs, it also broadens
the basis amongst which to share the costs of research and development in DP.
It allows us to integrate know-how from different groups and expands the market
in which this know-how can be used.</p>
      <p>In terms of new areas of activity, at least three major trends can be observed.
Preservation as a cost/benefit trade-off: The primary focus so far has
been on preserving objects because they need to be preserved. Current thinking
seems to be moving towards obsolescence as a risk, and DP as a risk
mitigation strategy. While this concept is anything but new to the DP community,
the key difference lies in the juxtaposition of risk and benefit, and a much more
pronounced and explicit willingness to sacrifice the availability of objects when the
investments necessary to maintain them would likely outweigh the business
benefits or legal sanctions. Although the same principles are applicable in traditional
settings as well, this shift in focus forces a more explicit formulation of
benefits and value, and a clearer specification of risks and costs, especially for more
immediate, short-term actions. This will also call for the application of existing
frameworks for risk identification as well as cost/benefit estimation, with potentially
strongly diverging valuations in the non-heritage domains.</p>
      <p>
        DP as capabilities and maturity evaluations: Another important shift
we may see coming is a shift from DP being something happening in a data
archive or repository setting, i.e. a designated institution or department
handling ”old” objects, and being audited on its performance via any set of audit
and certification routines to ensure proper preservation at the highest possible
level. Rather, we may view DP as a set of capabilities that an institution has, as
part of many other operational capabilities, and that are integrated with more
routine processes. We are thus currently investigating opportunities of
integrating DP in an eGovernance framework. Reference models such as TOGAF [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and
COBIT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] provide a well-tested basis for designing and managing DP activities
alongside other routine IT capabilities. It allows us to manage DP capabilities
in the framework of IT governance, benefiting from the well-structured concepts
and processes in this domain, defining drivers, constrains and controls [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. An
example of how to model maturity levels for preservation capabilities is provided
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Preserving processes rather than (only) data: The third major area
of activity currently, and probably the most relevant with respect to semantic
technologies, relates to the shift from preserving static objects to the preservation
of entire process chains. This shift is motivated by two core considerations. First
of all, in many new settings it is not static artifacts, but the need to be able
to re-run processes in an authentic manner, that constitutes the key DP requirement.
While the preservation of data as documentation may be sufficient to provide
evidence that processes have been run, it is no replacement for preserving
the actual processes.</p>
      <p>But even when the focus is on the actual data, it may be advisable to
preserve the process chain the data was subjected to. In an e-Science setting, while
preserving the data is an essential first step for any sustainable research effort,
the data alone is often not sufficient for a later analysis of how this data was
obtained, pre-processed and transformed. The results of scientific experiments are
often just the very last step of the whole process; for other parties, or anyone at
a later point in time, to interpret them correctly, these processes
need to be preserved as well.</p>
      <p>Thus, specifically in an e-Science setting, preserving processes together with
the data helps us to meet two goals at the same time: on the one hand, the
processes are essential aspects of representation information, allowing us to trace
the various (pre-)processing steps applied to the data, any bias that might have
been introduced, or errors stemming from faulty processing. On the other hand,
it also allows us to re-run these processes on new data, learning how models and
views have evolved, and discovering discrepancies from earlier analyses.</p>
      <p>
        We thus need to go beyond the classical concerns of Digital Preservation
research, and consider more than the preservation of data. The following section
takes a closer look at some of the activities centering around process preservation,
covering both process capture as well as evaluation of re-executed processes –
and the resulting requirements at the level of process capture. We will choose
examples from the e-Science/Data curation domain as an exemplary setting.
The following sections are largely adopted from a position paper summarizing
our considerations to data quality aspects in data curation for a recent workshop
by the National Science Foundation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Capturing Processes</title>
      <p>Curation of business or e-Science processes requires capturing the whole context
of the process, including enabling technologies, different system components on
both the hardware and software levels, dependencies on other computing systems
and services operated by external providers, the data consumed and generated,
and more high-level information such as the goals of the process and the different
stakeholders and parties. The context information needed for preserving processes
is considerably more complex than that of data objects, as it not only requires
dealing with the structural properties of information, but also with the dynamic
behavior of processes. Successful curation of an e-Science process requires
capturing sufficient detail of the process, as well as its context, to be able to re-run
and verify the original behavior at a later stage, under changed and evolved
conditions. We thus need to preserve the set of activities, processes and tools
which together ensure continued access to the services and software
necessary to reproduce the context within which information can be accessed,
properly rendered and validated.</p>
      <p>
        To address these challenges, we have devised a context model to
systematically capture aspects of a process that are essential for its preservation and
verification upon later re-execution [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The model consists of approximately
240 elements, structured in around 25 major groups. It corresponds to some
degree to the representation information network [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], modeling the relationships
between an information object and its related objects, be it documentation of the
object, constituent parts and other information required to interpret the object.
This is extended to understand the entire context within which a process,
potentially including human actors, is executed, forming a graph of all constituent
elements and, recursively, their representation information. Specific emphasis is
given to the identification of distributed components of a process: identifying,
for example, external web services is essential to ensure that a process can be
preserved, as specific measures need to be devised to ensure their availability in
the future. Approaches include an integration of a web service into the process,
legal agreements on the preservation of web services (at specified versions) by the
service provider, deposit regulations using ESCROW regulations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and others.
The model is implemented in the form of an ontology, which on the one hand
allows for the hierarchical categorization of aspects, and on the other hand shall
enable reasoning, e.g. over the possibility of certain preservation actions for a
specific process instance. While the model is very extensive, it should be noted
that a number of aspects can be filled automatically – especially if institutions
have well-defined and documented processes. Also, not all sections of the model
are equally important for each type of process. Therefore, not every aspect has
to be described at the finest level of granularity.
      </p>
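      <p>The recursive "representation information network" idea described above can be sketched in a few lines of Python. This is an illustrative toy only, not the actual model: the real context model is an ontology of roughly 240 elements, and all element and category names below are hypothetical.</p>
      <preformat>
```python
# Illustrative sketch only: a tiny, hypothetical fragment of a process
# context model, represented as a dependency graph with a recursive
# closure over representation information.
from collections import defaultdict

depends_on = defaultdict(set)  # element -> elements it depends on
kind = {}                      # element -> category within the model

def add(element, category, deps=()):
    kind[element] = category
    depends_on[element].update(deps)

# Hypothetical process: a classification process depending on a software
# component, which in turn invokes an external web service.
add("music-classification", "Process", ["feature-extractor", "ground-truth.csv"])
add("feature-extractor", "SoftwareComponent", ["feature-web-service"])
add("feature-web-service", "ExternalWebService")
add("ground-truth.csv", "DataObject")

def closure(element, seen=None):
    """Recursively collect everything needed to interpret/re-run `element`."""
    seen = set() if seen is None else seen
    for dep in depends_on[element]:
        if dep not in seen:
            seen.add(dep)
            closure(dep, seen)
    return seen

# Simple "reasoning" step: flag external services, which need dedicated
# preservation measures (local integration, legal agreements, escrow).
externals = [e for e in closure("music-classification")
             if kind[e] == "ExternalWebService"]
```
      </preformat>
      <p>A reasoner over the real ontology would work on RDF/OWL classes rather than Python dictionaries, but the traversal principle – recursively expanding an object into its representation information and flagging components that require special preservation measures – is the same.</p>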
      <p>
        To move towards more sustainable E-Science processes, we recommend
implementing them in workflow execution environments. For example, we currently
use the Taverna workflow engine [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Taverna is a system designed specifically
to execute scientific workflows. It allows scientists to combine services and
infrastructure for modeling their workflows. Services can for example be remote
web-services, invoked via WSDL or REST, or local services, in the form of
predefined scripts, or user-defined scripts.
      </p>
      <p>Implementing such a research workflow in a system like Taverna yields a
complete and documented model of the experiment process – each process step
is defined, as is the sequence (or parallelism) of the steps. Further, Taverna
requires the researcher to explicitly specify the data that is input and output,
both of the whole process and of each individual step. Thus, parameter
settings for specific software, such as the parameters for a machine learning tool
or feature extraction, also become explicit, either in the form of process input data
or in the script code.</p>
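      <p>The discipline this imposes – every step declaring its inputs and outputs explicitly, so that nothing is a hidden default – can be illustrated with a small Python sketch. This is not Taverna's API; the step names, parameters and authentication voucher below are invented for illustration.</p>
      <preformat>
```python
# Hypothetical sketch of an explicit-dataflow workflow: each step declares
# its inputs and outputs, and execution fails if an input was never declared.
def step(name, inputs, outputs, fn):
    return {"name": name, "inputs": inputs, "outputs": outputs, "fn": fn}

workflow = [
    step("extract_features", ["mp3_url", "voucher"], ["features"],
         lambda d: {"features": f"features({d['mp3_url']})"}),
    step("train_classifier", ["features", "ground_truth", "learner_params"],
         ["model"],
         lambda d: {"model": f"model({d['features']}, {d['learner_params']})"}),
]

def run(workflow, process_inputs):
    data = dict(process_inputs)
    for s in workflow:
        missing = [i for i in s["inputs"] if i not in data]
        if missing:  # undeclared dependencies surface immediately
            raise ValueError(f"{s['name']}: undeclared inputs {missing}")
        data.update(s["fn"]({k: data[k] for k in s["inputs"]}))
    return data

result = run(workflow, {"mp3_url": "http://example.org/song.mp3",
                        "voucher": "TOKEN",          # authentication voucher
                        "ground_truth": "labels.csv",
                        "learner_params": "k=5"})    # explicit parameters
```
      </preformat>
      <p>Note how the authentication voucher and the learner parameters become visible process inputs: omitting either makes the run fail loudly instead of silently relying on an undocumented default.</p>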
      <p>Figure 3 shows an example of the music classification experiment workflow
modeled in the Taverna workflow engine. We notice input parameters to the
process, such as the URL of the MP3 contents and the ground truth, and also
an authentication voucher which is needed to authorize the use of the feature
extraction service. The latter is a piece of information that is likely to be forgotten
in descriptions of this process, as it is a technical requirement rather
than an integral part of the scientific process transformations. However, it is
essential for allowing re-execution of the process, and may help to identify
potential licensing issues when wanting to preserve the process over longer periods
of time, which requires specific digital preservation measures.</p>
      <p>
        During an execution of the workflow, Taverna records so-called provenance
data, i.e. information about the creation of the objects, on the data
transformation happening during the experiment. Taverna uses its proprietary Janus
format, an extension on the Open-Provenance Model[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] that allows capturing
more details. Such data is recorded for the input and output of each process step.
It thus allows to trace the complete data flow from the beginning of the process
until the end, enabling verification of the results obtained. This is essential for
being able to verify system performance upon re-execution, specifically when
any component of the process (such as underlying hardware, operating systems,
software versions, etc.) has changed.
4
      </p>
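      <p>The value of per-step provenance for later verification can be sketched as follows. This toy records only hashes of each step's inputs and outputs, whereas Taverna's Janus/OPM records are far richer; the step names and data are hypothetical.</p>
      <preformat>
```python
# Illustrative sketch: wrap each step so every invocation logs hashed
# inputs/outputs, building a minimal provenance trace of the data flow.
import hashlib

provenance = []

def traced(step_name, fn):
    """Wrap a step so each invocation records hashed inputs and outputs."""
    def wrapper(*args):
        out = fn(*args)
        provenance.append({
            "step": step_name,
            "inputs": hashlib.sha256(repr(args).encode()).hexdigest(),
            "output": hashlib.sha256(repr(out).encode()).hexdigest(),
        })
        return out
    return wrapper

normalize = traced("normalize", lambda xs: [x / max(xs) for x in xs])
threshold = traced("threshold", lambda xs: [x > 0.5 for x in xs])

result = threshold(normalize([1.0, 4.0, 2.0]))

# On re-execution (possibly on changed hardware/software), matching hashes
# at every step are evidence of a faithful re-run; the first mismatching
# step localizes where behavior diverged.
```
      </preformat>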
    </sec>
    <sec id="sec-4">
      <title>Evaluating Process Re-Execution</title>
      <p>
        A critical aspect of re-using digital information in new settings is its
trustworthiness, especially its authenticity and faithful rendering (with rendering being any
form of representation or execution and effect of a digital object, be it rendering
on a screen, an acoustic output device, or state changes on ports, discs etc.).
Establishing identity or faithfulness is more challenging than commonly assumed:
current evaluation approaches frequently operate on the structural level, i.e. by
analyzing the preservation of significant properties on the file format level in case
of migration of objects. Yet, any digital object (file, process) is only perceived
and can only be evaluated properly in a well-specified rendering environment
within which faithfulness of performance need to be established. In emulation
settings, this evaluation approach is more prominently present, yet few
emulators support the requirements specific to preservation settings. We thus argue
that, actually, migration, emulation and virtually all other approaches to
logical/structural data preservation need to be evaluated in the same way, as they
are virtually no different from each other as all need to be evaluated in a given
rendering/performance environment. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        We also devise a framework for evaluating whether two versions of a
digital object are equivalent [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Important steps in this framework include (1) a
description of the original environment, (2) the identification of external events
influencing the object’s behavior, (3) the decision on what level to compare the
two objects, (4) recreating the environment, (5) applying standardized input to
both environments, and finally (6) extracting and (7) comparing the significant
properties on suitable levels of an object’s rendering. Even though the framework
focuses mostly on emulation of environments, the principles are also applicable
specifically for entire processes, and will work virtually unchanged also for
migration approaches, when complex objects are transformed e.g into a new file
format version.
      </p>
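      <p>Steps (5)-(7) of this framework can be illustrated with a minimal sketch: apply the same standardized input to the original and the re-created environment, extract significant properties at the chosen comparison level, and compare them property-wise, exactly for some properties and within a tolerance for others. All property names and values here are hypothetical.</p>
      <preformat>
```python
# Illustrative sketch of steps (5)-(7): extract significant properties of two
# renderings produced from the same standardized input, then compare them.
def extract_properties(rendering):
    """Step (6): extract significant properties at the chosen comparison level."""
    return {
        "resolution": rendering["resolution"],
        "frame_checksum": rendering["frame_checksum"],
        "duration_ms": rendering["duration_ms"],
    }

def compare(original, reexecuted, tolerance_ms=50):
    """Step (7): property-wise comparison; exact for some, tolerant for others."""
    a, b = extract_properties(original), extract_properties(reexecuted)
    deviations = {}
    if a["resolution"] != b["resolution"]:
        deviations["resolution"] = (a["resolution"], b["resolution"])
    if a["frame_checksum"] != b["frame_checksum"]:
        deviations["frame_checksum"] = (a["frame_checksum"], b["frame_checksum"])
    if abs(a["duration_ms"] - b["duration_ms"]) > tolerance_ms:
        deviations["duration_ms"] = (a["duration_ms"], b["duration_ms"])
    return deviations  # empty dict means equivalent at this level

# Step (5): the same standardized input was fed to both environments.
original   = {"resolution": (640, 480), "frame_checksum": "ab12", "duration_ms": 1000}
reexecuted = {"resolution": (640, 480), "frame_checksum": "ab12", "duration_ms": 1030}
```
      </preformat>
      <p>Whether a given deviation (here, a 30 ms timing difference) is acceptable is exactly the kind of judgment the significant-properties decision in step (3) has to encode.</p>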
      <p>
        An essential component of the framework is the identification at which levels
to measure the faithfulness of property preservation, as depicted in Figure 4. A
rendered representation of the digital object has to be extracted on (a) suitable
level(s) where the significant properties of the object can be evaluated. For some
aspects, the rendering of an object can be performed based on its representation
in specific memories (system/graphics/sound card/IO-buffer), for others the
respective state changes at the output port have to be considered while for yet
others the actual effect of a system on its environment needs to be considered,
corresponding to delineating the boundaries of the system to be evaluated. (Note
that identity on a lower level does not necessarily correspond to identity at higher
levels of the viewpath - in some cases significant effort is required to make up for
differences e.g. on the screen level when having to emulate the visual behavior
of cathode ray screens on modern LCD screens [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].) An example of applying
this framework to the evaluation of preservation actions is provided in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
      </p>
      <p>A key challenge in this context will be to come up with a comprehensive model
of what information to capture for specific types of processes and preservation
requirements, as well as guidelines on how to do this.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>This paper presents a loose collection of trends observed in digital
preservation research: a shift toward a more risk/cost/benefit-based approach
to DP, the framing of DP in IT governance principles, and specifically the
necessity to preserve entire processes rather than only data. With the growing
importance of preserving entire processes rather than sets of homogeneous, static (and
usually quite simple) objects, the requirements on techniques to capture,
document and reason across increasingly complex sets of context meta-information
are growing. This requires new approaches to context capture and representation.</p>
      <p>Still, the considerations above cover only a small subset of the quite
significant research challenges that continue to emerge in the field of digital curation.
We thus strongly encourage the community to contribute to an effort of
collecting and discussing these emerging research questions in a loosely organized form.
To this end, following the Dagstuhl Seminar on Research Challenges in Digital
Preservation1, a Digital Preservation Challenges Wiki2 has been created, where
we invite contributions and discussion. As a follow-up to the Dagstuhl seminar,
a workshop on DP Challenges3 will be held at iPRES 2012 in Toronto, focusing
on the elicitation and specification of research challenges.
1 http://www.dagstuhl.de/de/programm/kalender/semhp/?semnr=10291
2 http://sokrates.ifs.tuwien.ac.at
3 http://digitalpreservationchallenges.wordpress.com/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Becker</surname>
          </string-name>
          , G. Antunes,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barateiro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Vieira</surname>
          </string-name>
          .
          <article-title>A capability model for digital preservation: Analysing concerns, drivers, constraints, capabilities and maturities</article-title>
          .
          <source>In 8th International Conference on Preservation of Digital Objects (IPRES</source>
          <year>2011</year>
          ), Singapore,
          <year>November 2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Becker</surname>
          </string-name>
          , G. Antunes,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barateiro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Vieira</surname>
          </string-name>
          .
          <article-title>Control objectives for dp: Digital preservation as an integrated part of it governance</article-title>
          .
          <source>In Proceedings of the 74th Annual Meeting of the American Society for Information Science and Technology (ASIST)</source>
          , New Orleans, Louisiana, US,
          <year>October 2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Guttenbrunner</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          .
          <article-title>A Measurement Framework for Evaluating Emulators for Digital Preservation</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          ,
          <volume>30</volume>
          (
          <issue>2</issue>
          ),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Guttenbrunner</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          .
          <article-title>Evaluating an emulation environment: Automation and significant key characteristics</article-title>
          .
          <source>In Proceedings of the 9th conference on Preservation of Digital Objects (iPRES2012)</source>
          , Toronto, Canada, October 1-5
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Guttenbrunner</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          .
          <article-title>Evaluating emulation and migration: Birds of a feather?</article-title>
          <source>In Proceedings of the 14th International Conference on Asia-Pacific Digital Libraries</source>
          , Taipei, Taiwan, November
          <volume>12</volume>
          -15
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Hey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tansley</surname>
          </string-name>
          , and K. Tolle, editors.
          <source>The Fourth Paradigm: Data-Intensive Scientific Discovery</source>
          . Microsoft Research,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>K.</given-names>
            <surname>Hobel</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Strodl</surname>
          </string-name>
          .
          <article-title>Software escrow agreements</article-title>
          .
          <source>In Tagungsband des 15. Internationalen Rechtsinformatik Symposions IRIS</source>
          <year>2012</year>
          , pages
          <fpage>603</fpage>
          -
          <lpage>610</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. IT Governance Institute.
          <source>COBIT 4.1: Framework - Control Objectives - Management Guidelines - Maturity Models</source>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Manegold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kersten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Thanos</surname>
          </string-name>
          . Special theme:
          <article-title>Big data</article-title>
          .
          <source>ERCIM News</source>
          , (
          <volume>89</volume>
          ),
          <year>January 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Marketakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          .
          <article-title>Dependency management for digital preservation using semantic web technologies</article-title>
          .
          <source>International Journal on Digital Libraries</source>
          ,
          <volume>10</volume>
          :
          <fpage>159</fpage>
          -
          <lpage>177</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>R.</given-names>
            <surname>Mayer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          .
<article-title>Towards time-resilient MIR processes</article-title>
          .
          <source>In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012)</source>
          , Porto, Portugal, October 8-12,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>R.</given-names>
            <surname>Mayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thomson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Antunes</surname>
          </string-name>
          .
          <article-title>Preserving scientific processes from design to publication</article-title>
          .
<source>In Proceedings of the 15th International Conference on Theory and Practice of Digital Libraries (TPDL 2012)</source>
          , LNCS, Cyprus,
          <year>September 2012</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soiland-Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Owen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          ,
<string-name>
            <given-names>I.</given-names>
            <surname>Dunlop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Oinn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
.
          <article-title>Taverna, reloaded</article-title>
          .
          <source>In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, SSDBM'10</source>
          , pages
          <fpage>471</fpage>
          -
          <lpage>481</lpage>
          . Springer,
          <year>June 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
14.
          <string-name>
            <given-names>L.</given-names>
            <surname>Moreau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Futrelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>McGrath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Myers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Paulson</surname>
          </string-name>
          .
          <article-title>The Open Provenance Model: An Overview</article-title>
          . In
          <source>Provenance and Annotation of Data and Processes</source>
          , pages
          <fpage>323</fpage>
          -
          <lpage>326</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
15.
          <string-name>
            <given-names>G.</given-names>
            <surname>Phillips</surname>
          </string-name>
          .
          <article-title>Simplicity betrayed</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>53</volume>
          (
          <issue>6</issue>
          ):
          <fpage>52</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          .
          <article-title>Data quality for new science: Process curation, curation evaluation and curation capabilities</article-title>
. In
          <source>Workshop notes for the UNC/NSF Workshop Curating for Quality</source>
          , Arlington, VA, September 10-11,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
17. The Open Group.
          <source>TOGAF Version 9</source>
          . Van Haren Publishing,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>