RO-Manager: A Tool for Creating and
      Manipulating Research Objects to Support
       Reproducibility and Reuse in Sciences

        Jun Zhao1 , Graham Klyne1 , Piotr Ho!lubowicz2 , Raúl Palma2 , Stian
    Soiland-Reyes3 , Kristina Hettne4 , José Enrique Ruiz5 , Marco Roos4 , Kevin
     Page6 , José Manuel Gómez-Pérez7 , David De Roure6 , and Carole Goble3
              1
                Department of Zoology, University of Oxford, Oxford, UK
                         jun.zhao, graham.klyne@zoo.ox.ac.uk
          2
             Poznań Supercomputing and Networking Center, Poznań, Poland
                            piotrhol, palma@man.poznan.pl
       3
         School of Computer Science, University of Manchester, Manchester, UK
       soiland-reyes@cs.manchester.ac.uk, carole.goble@manchester.ac.uk
                    4
                       Leiden University Medical Center, Leiden, NL
                              k.m.hettne, m.roos@lumc.nl
          5
            Instituto de Astrofı́sica de Andalucı́a, Granada, Spain jer@iaa.es
             6
               Oxford eResearch Center, University of Oxford, Oxford, UK
                       kevin.page, david.deroure@oerc.ox.ac.uk
                      7
                        iSOCO, Madrid, Spain jmgomez@isoco.com


        Abstract. In this position paper we present a lightweight command-line
        tool RO Manager, which provides a straightforward way for scientists
        to assemble an aggregation of their experiment materials and methods
        which can then be published and shared with colleagues or linked to sci-
        entific publications, to enhance the reproducibility and trustworthiness
        of experiment results. The tool is currently being tested by a small group
        of scientists from two different domains, who would like to preserve suffi-
        cient materials and information along with their scientific results in order
        to improve their reproducibility in the future.


1     Reproducibility and New Form of Digital Publication
    There is a growing need for revolutionizing the existing practices of digital
publishing, to take it beyond being a replicate of the paper form. The hypothesis
is that digital papers can be greatly enhanced with additional features by making
effective use of information technology, to accelerate the turn of knowledge [2].
The goals of these activities are multi-fold, ranging from enabling more efficient
search and discovery of knowledge to supporting the reproducibility of science
by promoting the sharing of data as well as tools and methods.
    In this paper, we present an approach of aggregating and publishing a collec-
tion of auxiliary information together with experiment results, which can then
!
    The research reported in this paper is supported by the EU Wf4Ever project (270129)
    funded under EU FP7 (ICT-2009.4.1).
be shared and linked in scientific publications in order to boost the reuse and
reproducibility of these results. This aggregation of objects is represented using
our Research Object (RO) model [1], which provides an aggregation structure
for collecting essential resources related to experiment results along with publi-
cations. This includes not only the data used but also methods applied to pro-
duce and analyse that data, as well as auxiliary documents, scripts and software
used in the research process. Built upon the RO model, we create a lightweight
command-line tool called the Research Object Manager, or RO Manager. The
goal of RO Manager is twofold: to ease the process of packaging necessary ma-
terials and methods together with experiment results in order to boost their
reproducibility and hence reuse, and to ease the creation of new form of repro-
ducible publications by making these aggregation objects sharable and citable.
    Reproducibility of computational science has been widely explored in many
existing scientific domains [3]. Recent efforts 8 have focused on building the
tools and infrastructure to support the reproducibility of experiment results
in publications. However, publishing reproducible papers requires preparation
work prior to the final stage of experiment life cycle, and none of the existing
work supports reproducible science from the early stage of the cycle. Neither is
there an approach for automatically assessing and monitoring of the “health” of
published materials and methods for supporting the reproducibility.
    Existing studies have shown that the top barrier for the scientists to publish
their results in a reproducible way is the time required for creating documenta-
tion [3]. Our RO Manager tool provides a lightweight solution for scientists to
create a structured documentation about their reproducible experiment results
in an environment most familiar to them, i.e. their local file systems, by simply
executing a series of computer commands. Currently a manual validation pro-
cess is commonly employed to validate the resources submitted by the scientists.
However, this manual process is hard to scale and a continuous monitoring of the
health of the aggregation (such as the accessibility of the aggregated resources)
is entirely missing. The RO Manager tool takes one step further by providing a
means to encode the requirements for the list of digital components to be sub-
mitted with experiment results in a machine-processable format so that we can
evaluate that an RO contains all the necessary information required at the time
of submission and monitor the health of these information.
    These two gaps in supporting reproducible science and publication drove the
design of our RO Manager tool, introduced in this paper.

2     What is RO Manager
    RO Manager is a command line tool for creating, displaying and manipulating
ROs. It is meant to provide a lightweight tooling for scientists to create ROs in
an environment that is most familiar to them, i.e. their local file system, before
publishing and sharing it in the open world. A command-line tool is the most
lightweight choice for this purpose, which also provides the following additional
advantages:
8
    http://www.executablepapers.com/
 – Focus on the functionality of the tooling at the first stage of developments
   rather than graphical user interface (GUI) design.
 – Provide users access control of their aggregation object before sharing it with
   the public or friends, which is crucial for scientists who want to protect their
   experiment resources before publishing them.
 – Share the knowledge of its usage by simple shell script files, to demonstrate
   the usage of tool by executing a sequence of RO Manager comannds.
RO Manager is implemented as a Python program, using Python version 2.7
and available for download and installation at https://github.com/wf4ever/
ro-manager. To date the RO Manager provides the following functionalities:
 – Create and populate an RO: By executing the ro create command, the
   tool will automatically generate an RO structure in the local directory and
   a manifest file in RDF format to describe its content using RO ontologies 9 .
 – Annotate an RO or its component: Annotations can be provided directly,
   as values for specified attributes (title, type, etc.) or by attaching an ex-
   isting RDF file to the metadata describing an RO, using the Annotation
   Ontology10 . Some annotations can be automatically generated, describing
   who created the RO and when, while additional annotations, like document
   type, etc, have to be manually created, using the command ro annotate.
 – Display the status of an RO and its annotations: All the annotations on the
   RO as well as on each of its components can be displayed by executing ro
   annotations.
 – Evaluate the quality of the RO: We define the list of requirements for an
   RO to satisfy in a structured format, based on our Minim model [4]. Using
   this and the manifest file our evaluation component can assess whether an
   RO contains all the information required for supporting re-running an ex-
   periment or replicating a previous result, so that scientists can amend any
   missing resources before publishing their ROs.
 – Publish an RO in a public RO repository: The resulting RO can be published
   in a web-based RO repository, becoming citable via a URI, which can be
   dereferenced either as an HTML page or a set of RDF descriptions, returned
   by our RESTful service API. We currently only support publication in our
   RO repository sandbox. We are working on supporting other existing public
   repositories for sharing reproducible experiment resources.

3      User Experiences of RO Manager
    RO Manager has been presented to domain scientists as a workbench to
create and manage ROs during the investigation phase of their research. The
feedback from the scientists demonstrate the need for supporting the manage-
ment of ROs prior to the final stage of research investigation. They also show a
willingness to investigate time on RO creation in order to benefit from the evo-
lution control and quality evaluation. Compared to a web-based user interface,
9
     http://purl.org/wf4ever/ro#
10
     http://purl.org/ao/
the scientists appreciate the flexibility of managing their data locally. Because
the investigation and design phase involves a certain amount of modification to
their initial experiment designs, they are very interested in adopting the RO
evolution management functionality, in development at this moment, in order to
help them track changes made in different versions of the ROs and analyze the
impact on the reproducibility of the RO by these changes.
    Although a command-line tool requires an initial learning curve, once the
scientists got used to it they found it especially convenient for creating an RO
from a bulk of resources. However, the current support for publishing and an-
notating ROs is less satisfactory. Additional editing or annotations might take
place in a web-based space where RO is shared, which must be seamlessly syn-
chronized with its local copies. Although some annotations can be automatically
generated by the RO Manager, the majority of them must be manually created;
as a command-line tool RO Manager is not the best tooling for this purpose.

4    The Vision of RO Manager
    We position RO Manager as a local workbench for scientists to create and
manipulate ROs, which can then be shared as either a resource on the Web or
part of their newer, richer form of research publication. It is one small component
among the big picture of supporting reproducible science. We would like to make
use of existing annotation tools, to ease the creation of richer documentation.
Particularly, we would like to support annotations at various granularities, and
aggregate and retain existing annotations of an external resource (such as a
script stored in web site or a web service) by its URIs. To promote the visibility
of our resulting ROs as well reproducible science in general, we would like to
work together with publishers and existing web sites dedicated for the sharing
of reproducible experiment resources, to publish ROs and provide our enhanced
support for assessing and monitoring their fitness for supporting reproducibility.
Finally, we are working on migrating the functionalities of this tool to a Web-
based interface, for users who are less influent with command-line tools, which
will also provide some richer visualization of the content of the RO and its
evolutions.
References
1. Bechhofer, S., Buchan, I., De Roure, D., Missier, P., Ainsworth, J., Bhagat, J.,
   Couch, P., Cruickshank, D., Delderfield, M., Dunlop, I., Gamble, M., Michaelides,
   D., Owen, S., Newman, D., Sufi, S., Goble, C.: Why linked data is not enough for
   scientists. Future Generation Computer Systems (2011)
2. Goble, C.A., Roure, D.D., Bechhofer, S.: Accelerating scientists’ knowledge turns.
   In: Proceedings of The 3rd international IC3K joint conference on Knowledge Dis-
   covery, Knowledge Engineering and Knowledge Management. (2012), in press
3. Stodden, V.: The scientific method in practice: reproducibility in the computational
   sciences (2010)
4. Zhao, J., Gomez-Perez, J., Belhajjame, K., et al: Why workflows break-
   understanding and combating decay in taverna workflows. In: IEEE eScience. p.
   To appear (2012)