<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From workflows to Research Objects: an architecture for preserving the semantics of science</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kevin Page</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raúl Palma</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Piotr Hołubowicz</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Graham Klyne</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stian Soiland-Reyes</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Don Cruickshank</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael González Cabero</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esteban García Cuesta</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David De Roure</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Zhao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Manuel Gómez-Pérez</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Zoology, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Electronics and Computer Science, University of Southampton</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Facultad de Informática, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Oxford e-Research Centre</institution>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Poznan Supercomputing and Networking Centre</institution>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>School of Computer Science, University of Manchester</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>iSOCO</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Research Objects (ROs) provide a flexible model to collate and describe the semantic context of science. In this position paper we describe how ROs can also provide a foundation for interoperability within RESTful architecture design, enabling the development of new services and clients alongside compatible enhancements to existing software including myExperiment. To illustrate this we introduce an infrastructure, known as the Wf4Ever Toolkit, providing services and clients to encapsulate, preserve, and re-use ROs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        While a workflow can describe an experiment, to aggregate the wider digital
context of scientific processes and their conduct (input and output data, method,
software, actors, analysis, dissemination, sharing, re-use, and the links and
relationships between these gathered resources) we need Research Objects (ROs)
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and a rich array of tools that, through ROs, can support these needs. To
achieve this goal, the Workflow4Ever (Wf4Ever) project is building
infrastructure to enable the capture, preservation and re-use of ROs, not simply as a
semantic encoding, but as building blocks for interoperability and co-ordination
in a distributed architecture of services and their APIs.
      </p>
    </sec>
    <sec id="sec-1a">
      <title>An RO-centric architecture</title>
      <p>
        [...] state and caching considerations to the fore. The RO model and its associates
are, in their RDF incarnation, the primary representation of many resources
exposed by the APIs and thereby provide a linked data interface. Beyond this
use of REST and linked data, our architecture is substantially characterized by
the choice of supplemental resources and representations that are provided. A
good example of this relationship between model and API, and its centrality
to the architecture, can be seen in the RO Storage and Retrieval (ROSR) API<sup>9</sup>,
used by many of the services and clients below.
      </p>
      <p>Services implement shared provision of functionality accessed through one
or more APIs (each API may be implemented by one or more services). The next
section outlines a number of implemented services that demonstrate the
interoperability afforded by adoption of an RO-centric architecture.</p>
    </sec>
    <sec id="sec-2">
      <title>The Wf4Ever Toolkit</title>
      <p>The business of capturing, enhancing, and preserving the scientific process is
undertaken by the Wf4Ever Toolkit<sup>10</sup>, an RO-centric architecture with services
categorised within three layers and four functional sub-groupings (figure 1; in
the following text services and APIs are in bold, conceptual models in sans serif).</p>
      <p>Storage services. The Toolkit makes use of two storage services constructed
around the core RO model. The RO Digital Library (RODL) provides
foundational capabilities within the Toolkit and a reference implementation of the
RO Storage and Retrieval (ROSR) API. Extending the dLibra Digital
Library software, RODL provides core functionality including (i) Retrieval: serving
ROs to clients in multiple representations via content negotiation, including
self-contained zip archives and RDF manifest descriptions, and provision of indexing and
query interfaces (including SPARQL) for ROs and their metadata; (ii) Storage:
creating, editing and deleting internal and external resources and annotations
that, alongside semantic metadata, comprise RO structures, minting identifiers
(URIs) as required; (iii) Maintenance: adding and removing RO resources whilst
ensuring consistency with metadata and manifests, and managing user identities
and accounts and their relationships to ROs.
        <fn id="fn9"><label>9</label><p>http://www.wf4ever-project.org/wiki/display/docs/RO+SRS+interface+6</p></fn>
        <fn id="fn10"><label>10</label><p>Implementation repositories at https://github.com/wf4ever: rosrs, workflow-runner, Stability-service-API, epnio, wf-ro, Collaboration-spheres and portal.</p></fn>
      </p>
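      <p>The retrieval behaviour described above, serving one RO in several representations chosen by content negotiation, can be sketched as follows. This is a minimal illustration and not the RODL implementation: the media types in AVAILABLE and the function names are assumptions made for the example, and the parsing covers only the q parameter of an HTTP Accept header.</p>
      <preformat>
```python
from typing import List, Optional, Tuple

# Hypothetical media types a ROSR-style service might offer for one RO.
AVAILABLE = ("application/rdf+xml", "text/turtle", "application/zip")


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an HTTP Accept header into (media type, q-value) pairs."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                quality = float(value)
        choices.append((media_type, quality))
    return choices


def choose_representation(header: str, available=AVAILABLE) -> Optional[str]:
    """Return the offered media type the client prefers, or None."""
    best, best_q = None, 0.0
    for media_type, quality in parse_accept(header):
        for offered in available:
            matches = media_type in ("*/*", offered)
            # Strict comparison keeps the earliest match on ties and
            # rejects anything the client marked with q=0.
            if matches and quality > best_q:
                best, best_q = offered, quality
    return best
```
      </preformat>
      <p>Under these assumptions a client asking for a zip archive with low priority and Turtle with default priority would receive the Turtle manifest, while a client asking only for an unsupported type would receive no representation.</p>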
      <p>
        The second, myExperiment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], is a well-established platform for the social
sharing of workflows. As a Storage Service we consider myExperiment Packs a
form of proto-RO; conversely, ROs can be considered Packs version 2. These
can be imported into the RO Digital Library, undergoing conversion to ROs
through a web-based tool, while myExperiment’s workflows and associated
social networks give input to the Recommender Service. The user interface of
myExperiment also performs a role as a client within the architecture below.
      </p>
      <p>Lifecycle services support the dynamic nature of workflow-centric RO
resources over time. Publication and Archival are states in which an
RO must be preserved: the RODL service allows creation of a duplicate with a
new state (live, snapshot, or archived) through the RO Evolution API. Using
the roevo ontology it also captures versioning metadata describing the resulting
relationships between these ROs, the history of which can be retrieved through the
API. The Workflow Runner service is built upon the Taverna workflow server,
providing remote workflow execution with each run captured as an RO.
Numerous workflow runs generate many ROs, so their lifespan may be short; should
an RO require longer-term preservation, it is exchanged between the Runner and
RODL services using the ROSR API as implemented by both components.</p>
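      <p>The evolution step described above, duplicating an RO under a new state and keeping the relationship to its source so that the history can be retrieved, can be modelled in a few lines. The sketch below is an in-memory illustration only: the class, field and function names are hypothetical, it does not use the roevo vocabulary, and it does not reflect the actual RO Evolution API.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import List, Optional

STATES = ("live", "snapshot", "archived")  # the three states named in the text


@dataclass
class ResearchObject:
    """Hypothetical in-memory stand-in for an RO and its evolution link."""
    uri: str
    state: str = "live"
    derived_from: Optional["ResearchObject"] = None
    resources: List[str] = field(default_factory=list)


def evolve(source: ResearchObject, new_uri: str, state: str) -> ResearchObject:
    """Duplicate an RO under a new URI and state, remembering its origin."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    # Copy the resource list so later edits to the source do not leak
    # into the preserved duplicate.
    return ResearchObject(new_uri, state, source, list(source.resources))


def history(ro: ResearchObject) -> List[str]:
    """Walk the derivation links back to the original RO."""
    chain = []
    current: Optional[ResearchObject] = ro
    while current is not None:
        chain.append(f"{current.uri} ({current.state})")
        current = current.derived_from
    return chain
```
      </preformat>
      <p>Calling evolve twice, first to a snapshot and then to an archived copy, yields an object whose history lists the archived copy, the snapshot and the live original in that order.</p>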
      <p>Data Management and Analysis services provide an extension layer
that augments Storage and Lifecycle capabilities: they generate, maintain and
provide access to added-value data derived from, or related to, RO resources.
Four services are implemented within the current Wf4Ever Toolkit: (i)
Checklist Evaluation performs an assessment of an RO passed by its URI (and
retrieved from either a local disk or from a remote service implementing the ROSR
API) against a minimum information model<sup>11</sup> for purposes of completeness,
repeatability, executability, etc.; (ii) Stability Evaluation is a derivative
service, monitoring the Checklist Evaluation over time as a measure of whether an
RO can maintain its original purpose whilst constituent resources change or
become unavailable; (iii) the Recommender service uses keyword, content-based,
collaborative filtering, and social network approaches, returning recommended
users, ROs, and their aggregated resources; (iv) WF-RO Transformation is a
service that, given a Taverna t2flow workflow bundle, generates or updates an RO
encapsulating that workflow, extracting workflow description using the wfdesc
and roevo ontologies respectively, and storing these resources in the RODL (or
other service implementing the ROSR API).</p>
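      <p>The relationship between Checklist Evaluation and Stability Evaluation can be illustrated with a simplified model. In the sketch below an RO is reduced to a dictionary of resource lists and the checklist to three hand-written requirements; all of these names and requirements are assumptions for the example. The Toolkit services evaluate ROs against an RDF minimum information model, which this sketch does not attempt to reproduce, and the stability measure shown (the fraction of observations that pass) is only one plausible choice.</p>
      <preformat>
```python
from typing import Callable, Dict, List

RO = Dict[str, List[str]]  # hypothetical: resource category -> resource names

# Hypothetical checklist: requirement label -> test applied to the RO.
CHECKLIST: Dict[str, Callable[[RO], bool]] = {
    "has workflow": lambda ro: len(ro.get("workflows", [])) > 0,
    "has input data": lambda ro: len(ro.get("inputs", [])) > 0,
    "all resources available": lambda ro: not ro.get("missing", []),
}


def evaluate(ro: RO) -> Dict[str, bool]:
    """Report, per requirement, whether the RO satisfies it."""
    return {label: test(ro) for label, test in CHECKLIST.items()}


def satisfied(ro: RO) -> bool:
    """An RO passes only if every requirement holds."""
    return all(evaluate(ro).values())


def stability(observations: List[RO]) -> float:
    """Fraction of observations over time in which the checklist held."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(satisfied(o) for o in observations) / len(observations)
```
      </preformat>
      <p>An RO whose input file later becomes unavailable would pass the first two requirements but fail the third, and repeated observations of it would lower the stability figure accordingly.</p>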
      <p>
        Access and Usage Clients allow users to interact with ROs and the
services enabled by them. Some, such as Collaboration Spheres [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and ROBox,
provide an interface to specific functionality (recommendations and Dropbox
compatibility respectively) while others like the RO Portal provide a general
environment for exploring and modifying ROs, and through RO-centric
interchange, value-added functions from the architecture. Two other Toolkit clients
highlight the interoperability advantages afforded by an RO-centric architecture.
myExperiment has been extended to support the RO model, to query and
retrieve ROs from ROSR-compatible services and display them alongside Packs.
Workflow heritage demonstrates this architecture in use: a parent workflow
is downloaded from myExperiment by a user, who derives offspring workflows;
these are encapsulated in ROs through WF-RO and RODL, then queried by
myExperiment from RODL using roevo; the relationship between the parent workflow
and the derivative ROs is displayed in myExperiment as an indicator of reuse.
        <fn id="fn11"><label>11</label><p>http://purl.org/net/mim/ns</p></fn>
      </p>
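      <p>The workflow heritage scenario amounts to following derivation links from a parent workflow to the ROs that descend from it. The sketch below shows that lookup over a plain list of (derived, parent) URI pairs. The URIs and function names are hypothetical, and in the Toolkit the links would be obtained by querying RODL for roevo metadata and not from an in-memory list.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def index_derivations(links: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each parent URI to the URIs directly derived from it."""
    children: Dict[str, List[str]] = defaultdict(list)
    for derived, parent in links:
        children[parent].append(derived)
    return children


def descendants(parent: str, children: Dict[str, List[str]]) -> List[str]:
    """Collect everything derived from parent, directly or indirectly."""
    found: List[str] = []
    pending = list(children.get(parent, []))
    while pending:
        uri = pending.pop(0)  # breadth-first: closest derivatives first
        if uri not in found:
            found.append(uri)
            pending.extend(children.get(uri, []))
    return found
```
      </preformat>
      <p>The length of the list returned for a parent workflow gives a simple count that a client could display as a reuse indicator.</p>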
      <p>
        RO Manager [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a command-line tool for manipulating ROs on the
filesystem of a scientist’s workstation and synchronising them (via ROSR) with those
preserved in RODL. It is the gateway between the local working practices of
the user and the network services provided by the Toolkit, presenting storage,
lifecycle, and extension functions in a local user interface, while ensuring and
utilising compatibility with the RO model and Toolkit APIs.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions and further work</title>
      <p>In this paper we have shown how the RO model provides a suitable basis for
interoperability within a service-based architecture for management of scientific
workflows, and outlined a number of implemented services from the Wf4Ever
Toolkit that support scientific preservation via this mechanism of interchange:
the RO model, its extensions, the REST APIs built around it, and the services
and clients that implement and use these APIs. The architecture and Toolkit are
a work in progress, co-evolving with user requirements as first-hand experience is
gathered. With foundation services in place and our approach validated, we will
build upon the flexibility of the RO model and the simplified client development
process afforded by our APIs to further enhance and streamline the scientist’s
experience, both through targeted applications that assist a specific task and
through the continued RO-enabling of myExperiment with Wf4Ever services.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          et al.
          <article-title>Workflow-centric research objects: First class citizens in scholarly discourse</article-title>
          .
          <source>In Proc. Workshop on the Semantic Publishing</source>
          , pages
          <fpage>1</fpage>–<lpage>12</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>De Roure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          , et al.
          <article-title>Towards the preservation of scientific workflows</article-title>
          .
          <source>In Proc. 8th Intl. Conference on Preservation of Digital Objects</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>De Roure</surname>
          </string-name>
          et al.
          <article-title>The design and realisation of the virtual research environment for social sharing of workflows</article-title>
          .
          <source>FGCS</source>
          ,
          <volume>25</volume>
          (
          <issue>5</issue>
          ):
          <fpage>561</fpage>–<lpage>567</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Fielding</surname>
          </string-name>
          .
          <article-title>Architectural Styles and the Design of Network-based Software Architectures</article-title>
          .
          <source>PhD thesis</source>
          , University of California,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>K.</given-names>
            <surname>Page</surname>
          </string-name>
          et al.
          <article-title>REST and linked data: a match made for domain driven development?</article-title>
          <source>In Proc. 2nd Intl. Workshop on RESTful Design</source>
          , pages
          <fpage>22</fpage>–<lpage>25</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>C.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          , G. Álvaro, et al.
          <article-title>A framework and implementation for secure knowledge management in large communities</article-title>
          .
          <source>In Proc. 11th Intl. Conference on Knowledge Management and Knowledge Technologies</source>
          , page
          <fpage>19</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Klyne</surname>
          </string-name>
          , et al.
          <article-title>RO-Manager: A Tool for Creating and Manipulating Research Objects to Support Reproducibility and Reuse in Sciences</article-title>
          .
          <source>In Proc. 2nd Intl. Workshop on Linked Science</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>