=Paper= {{Paper |id=None |storemode=property |title=From Workflows to Research Objects: An Architecture for Preserving the Semantics of Science |pdfUrl=https://ceur-ws.org/Vol-951/paper7.pdf |volume=Vol-951 |dblpUrl=https://dblp.org/rec/conf/semweb/PagePHKSCGGR0G12 }} ==From Workflows to Research Objects: An Architecture for Preserving the Semantics of Science== https://ceur-ws.org/Vol-951/paper7.pdf
From workflows to Research Objects:
an architecture for preserving
the semantics of science

Kevin Page (1), Raúl Palma (2), Piotr Hołubowicz (2), Graham Klyne (3),
Stian Soiland-Reyes (4), Don Cruickshank (5), Rafael González Cabero (6),
Esteban García Cuesta (7), David De Roure (1), Jun Zhao (3),
José Manuel Gómez-Pérez (7)

(1) Oxford e-Research Centre and (3) Department of Zoology, University of Oxford, UK
(2) Poznan Supercomputing and Networking Centre, Poland
(4) School of Computer Science, University of Manchester, UK
(5) Electronics and Computer Science, University of Southampton, UK
(6) Facultad de Informática, Universidad Politécnica de Madrid, Spain
(7) iSOCO, Madrid, Spain

Abstract. Research Objects (ROs) provide a flexible model to collate and
describe the semantic context of science. In this position paper we describe
how ROs can also provide a foundation for interoperability within RESTful
architecture design, enabling the development of new services and clients
alongside compatible enhancements to existing software including
myExperiment. To illustrate this we introduce an infrastructure, known as the
Wf4Ever Toolkit, providing services and clients to encapsulate, preserve, and
re-use ROs.

1      Introduction
While a workflow can describe an experiment, to aggregate the wider digital
context of scientific processes and their conduct (input and output data,
method, software, actors, analysis, dissemination, sharing, re-use, and the
links and relationships between these gathered resources) we need Research
Objects (ROs) [2] and a rich array of tools that, through ROs, can support
these needs. To achieve this goal, the Workflow4Ever (Wf4Ever) project
(footnote 8) is building infrastructure to enable the capture, preservation
and re-use of ROs, not simply as a semantic encoding, but as building blocks
for interoperability and co-ordination in a distributed architecture of
services and their APIs.


2      An RO-centric architecture
Adopting the practices outlined by Page et al. [5], our RO-centric
architecture builds interoperability through models, APIs, and services:
      Models, particularly the RO model [1] for aggregation and annotation,
are the linchpin of architecture interoperability embodied by APIs.
Specialised models supplement RO for in silico workflow definition (wfdesc),
workflow provenance (wfprov) and RO evolution (roevo, incorporating
versioning and lifecycle).
      APIs. Application Programming Interfaces follow a strict definition of
REST [4], with "follow your nose" navigation of resources for transitions in
application state, and caching considerations to the fore. The RO model and
its associates are, in their RDF incarnation, the primary representation of
many resources exposed by the APIs and thereby provide a linked data
interface. Beyond this use of REST and linked data our architecture is
substantially characterized by the choice of supplemental resources and
representations that are provided. A good example of this relationship
between model and API, and its centrality to the architecture, can be seen in
the RO Storage and Retrieval (ROSR) API (footnote 9), used by many of the
services and clients below.
      Services implement shared provision of functionality accessed through
one or more APIs (each API may be implemented by one or more services). The
next section outlines a number of implemented services that demonstrate the
interoperability afforded by adoption of an RO-centric architecture.

Footnote 8: The EU Wf4Ever project (270129) is funded under EU FP7 (ICT-2009.4.1).
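The role of the RO model as an aggregation with annotations can be
illustrated with a minimal in-memory sketch. This is not the RO model's RDF
vocabulary or the ROSR API: the class names, the dictionary-shaped manifest,
the example.org URIs, and the rule that an annotation body is itself
aggregated are all assumptions made for illustration. The point is only that
a client starting from the RO's manifest can discover resources and their
descriptions by following links rather than constructing URIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    """Links a target resource to a body resource that describes it."""
    target: str  # URI of the RO or of an aggregated resource
    body: str    # URI of the resource holding the description


@dataclass
class ResearchObject:
    """Hypothetical stand-in for an RO: an aggregation plus annotations."""
    uri: str
    aggregated: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def aggregate(self, resource_uri: str) -> None:
        # A resource is aggregated at most once.
        if resource_uri not in self.aggregated:
            self.aggregated.append(resource_uri)

    def annotate(self, target: str, body: str) -> Annotation:
        # Only the RO itself or one of its aggregated resources may be annotated.
        if target != self.uri and target not in self.aggregated:
            raise ValueError(f"{target} is not part of {self.uri}")
        # Assumption of this sketch: the annotation body is aggregated too.
        self.aggregate(body)
        annotation = Annotation(target, body)
        self.annotations.append(annotation)
        return annotation

    def manifest(self) -> Dict[str, object]:
        """Return a link-bearing description a client can navigate from."""
        return {
            "id": self.uri,
            "aggregates": list(self.aggregated),
            "annotations": [
                {"target": a.target, "body": a.body} for a in self.annotations
            ],
        }


def bodies_describing(manifest: Dict[str, object], target: str) -> List[str]:
    """Follow links in the manifest: which resources describe `target`?"""
    return [a["body"] for a in manifest["annotations"] if a["target"] == target]


WORKFLOW = "http://example.org/ros/demo/workflow.t2flow"
DESCRIPTION = "http://example.org/ros/demo/workflow.wfdesc.ttl"

ro = ResearchObject("http://example.org/ros/demo/")
ro.aggregate(WORKFLOW)
ro.annotate(WORKFLOW, DESCRIPTION)
print(bodies_describing(ro.manifest(), WORKFLOW))
```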


3      The Wf4Ever Toolkit
The business of capturing, enhancing, and preserving the scientific process
is undertaken by the Wf4Ever Toolkit (footnote 10), an RO-centric
architecture with services categorised within three layers and four
functional sub-groupings (figure 1; in the following text services and APIs
in bold, conceptual models sans serif).




                    Figure 1. The Wf4Ever Toolkit and architecture


      Storage services. The Toolkit makes use of two storage services
constructed around the core RO model. The RO Digital Library (RODL) provides
foundational capabilities within the Toolkit and a reference implementation
of the RO Storage and Retrieval (ROSR) API. Extending the dLibra Digital
Library software, RODL provides core functionality including (i) Retrieval:
serving ROs to clients in multiple representations via content-negotiation,
including self-contained zip archives and RDF manifest descriptions, and
provision of indexing and query interfaces (including SPARQL) for ROs and
their metadata; (ii) Storage: creating, editing and deleting internal and
external resources and annotations that, alongside semantic metadata,
comprise RO structures, minting identifiers (URIs) as required; (iii)
Maintenance: adding and removing RO resources whilst ensuring consistency
with metadata and manifests, and managing user identities and accounts and
their relationships to ROs.
      The second, myExperiment [3], is a well established platform for the
social sharing of workflows. As a Storage Service we consider myExperiment
Packs a form of proto-RO; conversely ROs can be considered Packs version 2.
These can be imported into the RO Digital Library, undergoing conversion to
ROs through a web based tool, while myExperiment's workflows and associated
social networks give input to the Recommender Service. The user interface of
myExperiment also performs a role as a client within the architecture below.

Footnote 9: http://www.wf4ever-project.org/wiki/display/docs/RO+SRS+interface+6
Footnote 10: Implementation repositories at https://github.com/wf4ever : rosrs,
workflow-runner, Stability-service-API, epnio, wf-ro, Collaboration-spheres
and portal.
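Serving one RO in several representations via content negotiation can be
sketched as a server-side choice driven by the client's HTTP Accept header.
The following is a simplified illustration, not RODL's implementation: the
function names and the list of media types offered are assumptions, and the
matching ignores the specificity-precedence rules of full HTTP negotiation,
ranking media ranges by q-value and header order only.

```python
from typing import List, Optional, Tuple

# Assumed media types for an RO: RDF manifest serialisations and a zip archive.
AVAILABLE = ["application/rdf+xml", "text/turtle", "application/zip"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, best first."""
    ranges = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((q, position, media_range))
    # Highest q first; ties keep the order given in the header.
    ranges.sort(key=lambda r: (-r[0], r[1]))
    return [(media_range, q) for q, _, media_range in ranges]


def matches(media_range: str, media_type: str) -> bool:
    """True if media_type falls within media_range (supports */* and type/*)."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def negotiate(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick the representation to serve, or None if nothing is acceptable."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


print(negotiate("application/zip", AVAILABLE))
print(negotiate("text/turtle;q=0.5, application/rdf+xml", AVAILABLE))
```

A client wanting a self-contained archive would ask for the zip media type,
while a linked data client would ask for an RDF serialisation of the same RO
URI; a result of None corresponds to the server answering that no acceptable
representation exists.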
      Lifecycle services support the dynamic nature of workflow-centric RO
resources over time. Publication and Archival are preservation states for
which an RO must be preserved: the RODL service allows creation of a
duplicate with a new state (live, snapshot, or archived) through the RO
Evolution API. Using the roevo ontology it also captures versioning metadata
of the subsequent relationships between these ROs, the history of which can
be retrieved through the API. The Workflow Runner service is built upon
Taverna workflow server, providing remote workflow execution with each run
captured as an RO. Numerous workflow runs generate many ROs, so their
lifespan may be short; should an RO require longer-term preservation it is
exchanged between the Runner and RODL services using the ROSR API as
implemented by both components.
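The duplicate-with-new-state behaviour and the retrievable history can be
sketched with a small in-memory library. Everything named here (State,
StoredRO, Library, copy_with_state, the derived_from link) is hypothetical;
the actual service records these relationships with the roevo ontology and
exposes them through the RO Evolution API, whose request format is not
described in this paper.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class State(Enum):
    LIVE = "live"
    SNAPSHOT = "snapshot"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class StoredRO:
    uri: str
    state: State
    resources: Tuple[str, ...]
    derived_from: Optional[str] = None  # URI of the RO this one was copied from


class Library:
    """Hypothetical in-memory library; not the RODL implementation."""

    def __init__(self) -> None:
        self._ros: Dict[str, StoredRO] = {}

    def create_live(self, uri: str, resources: Iterable[str]) -> StoredRO:
        ro = StoredRO(uri, State.LIVE, tuple(resources))
        self._ros[uri] = ro
        return ro

    def copy_with_state(self, source_uri: str, new_uri: str, state: State) -> StoredRO:
        """Duplicate an RO under a new URI and state, recording its origin."""
        if new_uri in self._ros:
            raise ValueError(f"{new_uri} already exists")
        source = self._ros[source_uri]
        copy = replace(source, uri=new_uri, state=state, derived_from=source_uri)
        self._ros[new_uri] = copy
        return copy

    def history(self, uri: str) -> List[str]:
        """Walk the derivation links from `uri` back to the original RO."""
        chain: List[str] = []
        current: Optional[str] = uri
        while current is not None:
            chain.append(current)
            current = self._ros[current].derived_from
        return chain


library = Library()
library.create_live("ro/demo", ["workflow.t2flow", "input.csv"])
library.copy_with_state("ro/demo", "ro/demo-snapshot-1", State.SNAPSHOT)
library.copy_with_state("ro/demo-snapshot-1", "ro/demo-archive-1", State.ARCHIVED)
print(library.history("ro/demo-archive-1"))
```

Making the stored record immutable mirrors the intent of a snapshot or
archive: the live RO can keep changing while earlier copies stay fixed.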
      Data Management and Analysis services provide an extension layer that
augments Storage and Lifecycle capabilities: they generate, maintain and
provide access to added-value data derived from, or related to, RO resources.
Four services are implemented within the current Wf4Ever Toolkit: (i)
Checklist Evaluation performs an assessment of an RO passed by its URI (and
retrieved from either a local disk or from a remote service implementing the
ROSR API) against a minimum information model (footnote 11) for purposes of
completeness, repeatability, executability, etc.; (ii) Stability Evaluation
is a derivative service, monitoring the Checklist Evaluation over time as a
measure of whether an RO can maintain its original purpose whilst constituent
resources change or become unavailable; (iii) the Recommender service uses
keyword, content-based, collaborative filtering, and social network
approaches, returning recommended users, ROs, and their aggregated resources;
(iv) WF-RO Transformation is a service that, given a Taverna t2flow workflow
bundle, generates or updates an RO encapsulating that workflow, extracting
workflow description and evolution information using the wfdesc and roevo
ontologies respectively, and storing these resources in the RODL (or other
service implementing the ROSR API).
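The relationship between Checklist Evaluation and the derivative Stability
Evaluation can be sketched as follows. This is an illustration under stated
assumptions, not the services' actual method: the dictionary describing an RO,
the three requirements, the score, and the stability measure (the fraction of
later evaluations that score at least as well as the first) are all
hypothetical choices made here, and they do not reproduce the minimum
information model referenced above.

```python
from typing import Callable, Dict, List, Tuple

Requirement = Tuple[str, Callable[[dict], bool]]

# Hypothetical requirements standing in for a minimum information checklist.
CHECKLIST: List[Requirement] = [
    ("has workflow definition", lambda ro: bool(ro.get("workflow"))),
    ("has input data", lambda ro: bool(ro.get("inputs"))),
    ("all resources reachable", lambda ro: not ro.get("unavailable")),
]


def evaluate(ro: dict, checklist: List[Requirement]) -> Dict[str, bool]:
    """Check the RO against every requirement and report each outcome."""
    return {name: bool(test(ro)) for name, test in checklist}


def score(result: Dict[str, bool]) -> float:
    """Fraction of requirements satisfied (0.0 for an empty checklist)."""
    return sum(result.values()) / len(result) if result else 0.0


def stability(scores: List[float]) -> float:
    """Fraction of later evaluations scoring at least the initial score."""
    if len(scores) < 2:
        return 1.0  # nothing to compare against yet
    baseline, later = scores[0], scores[1:]
    return sum(1 for s in later if s >= baseline) / len(later)


# The same RO observed at three points in time; an input later goes missing.
healthy = {"workflow": "workflow.t2flow", "inputs": ["input.csv"], "unavailable": []}
degraded = dict(healthy, unavailable=["input.csv"])

observations = [healthy, healthy, degraded]
scores = [score(evaluate(ro, CHECKLIST)) for ro in observations]
print(scores, stability(scores))
```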
      Access and Usage Clients allow users to interact with ROs and the
services enabled by them. Some, such as Collaboration Spheres [6] and ROBox,
provide an interface to specific functionality (recommendations and Dropbox
compatibility respectively) while others like the RO Portal provide a general
environment for exploring and modifying ROs, and through RO-centric
interchange, value-added functions from the architecture. Two other Toolkit
clients highlight the interoperability advantages afforded by an RO-centric
architecture. myExperiment has been extended to support the RO model, to
query and retrieve ROs from ROSR compatible services and display them
alongside Packs. Workflow heritage demonstrates this architecture in use: a
parent workflow is downloaded by a user from myExperiment, who derives
offspring workflows; these are encapsulated in ROs through WF-RO and RODL,
then queried by myExperiment from RODL using roevo; this relationship
(between the parent workflow and the derivative ROs) is displayed in
myExperiment as an indicator of reuse.
      RO Manager [7] is a command line tool for manipulating ROs on the
filesystem of a scientist's workstation and synchronising them (via ROSR)
with those preserved in RODL. It is the gateway between the local working
practices of the user and the network services provided by the Toolkit,
presenting storage, lifecycle, and extension functions in a local user
interface, while ensuring and utilising compatibility with the RO model and
Toolkit APIs.

Footnote 11: http://purl.org/net/mim/ns

4    Conclusions and further work
In this paper we have shown how the RO model provides a suitable basis for
interoperability within a service-based architecture for management of
scientific workflows, and outlined a number of implemented services from the
Wf4Ever Toolkit that support scientific preservation via this mechanism of
interchange: the RO model, its extensions, the REST APIs built around it, and
the services and clients that implement and use these APIs. The architecture
and Toolkit are a work in progress, co-evolving with user requirements as
first-hand experience is gathered. With foundation services in place and our
approach validated we will build upon the flexibility of the RO model and the
simplified client development process afforded by our APIs, to further
enhance and streamline the scientist's experience both through targeted
applications that assist a specific task, and the continued RO-enabling of
myExperiment with Wf4Ever services.


References

1. K. Belhajjame et al. Workflow-centric research objects: First class
   citizens in scholarly discourse. In Proc. Workshop on the Semantic
   Publishing, pages 1-12, 2012.
2. D. De Roure, K. Belhajjame, et al. Towards the preservation of scientific
   workflows. In Proc. 8th Intl. Conference on Preservation of Digital
   Objects, 2011.
3. D. De Roure et al. The design and realisation of the virtual research
   environment for social sharing of workflows. FGCS, 25(5):561-567, 2009.
4. R. Fielding. Architectural Styles and the Design of Network-based Software
   Architectures. PhD thesis, University of California, 2000.
5. K. Page et al. REST and linked data: a match made for domain driven
   development? In Proc. 2nd Intl. Workshop on RESTful Design, pages 22-25,
   2011.
6. C. Ruiz, G. Álvaro, et al. A framework and implementation for secure
   knowledge management in large communities. In Proc. 11th Intl. Conference
   on Knowledge Management and Knowledge Technologies, page 19, 2011.
7. J. Zhao, G. Klyne, et al. RO-Manager: A Tool for Creating and Manipulating
   Research Objects to Support Reproducibility and Reuse in Sciences. In
   Proc. 2nd Intl. Workshop on Linked Science, 2012.