=Paper= {{Paper |id=Vol-2363/paper4 |storemode=property |title=Managing, Preserving and Disseminating Research Objects in Earth Science with the ROHub Science Gateway |pdfUrl=https://ceur-ws.org/Vol-2363/paper4.pdf |volume=Vol-2363 |dblpUrl=https://dblp.org/rec/conf/iwsg/PalmaMGG17 }} ==Managing, Preserving and Disseminating Research Objects in Earth Science with the ROHub Science Gateway== https://ceur-ws.org/Vol-2363/paper4.pdf
                           9th International Workshop on Science Gateways (IWSG 2017), 19-21 June 2017



Managing, Preserving and Disseminating Research Objects in Earth Science with the ROHub Science Gateway

Raul Palma and Cezary Mazurek
Poznan Supercomputing and Networking Center
Poznan, Poland
{rpalma, mazurek}@man.poznan.pl

Jose Manuel Gomez-Perez and Andrés García
Expert System
Madrid, Spain
{jmgomez, agarcia}@expertsystem.com


Abstract: Research Objects (ROs) are semantically enriched information units encapsulating all the materials and methods relevant to a particular scientific investigation, their associated metadata and the context where such resources were produced and came into play. Their purpose is to enhance the sharing, preservation and communication of data-intensive science, facilitating validation, citation and reuse by the community. For this mission, infrastructure and tools for RO governance are critical. ROHub is the platform of reference in the management of ROs and their lifecycle. It enables researchers to preserve their work and make it available to others, as well as to discover and reuse pre-existing scientific knowledge. In this paper, we introduce ROHub to the Science Gateways community and present new capabilities and extensions specific to Earth Sciences, beyond previous efforts in experimental disciplines.

Keywords: Research Objects, Earth Science

I. INTRODUCTION

Research in data-intensive disciplines is increasingly consuming and generating a variety of digital resources during the course of scientific investigations. This has steadily increased the need for means to systematically capture the lifecycle of scientific investigations, which at the same time provide a single entry point to all the related resources, including data, publications, computational resources, and the researchers involved in the investigation. In Earth Science, for example, the high-level research and information lifecycle involves tasks such as: access to data (e.g., raw data and/or a variety of added value products); sharing results (with colleagues and/or community); execution of data analytic methods and generation of models; validation and dissemination of findings; and collaboration with colleagues [1].

Research Objects (ROs) [2] provide the mechanisms to support researchers in these tasks. Originally conceived to support the scientific endeavour in experimental disciplines like Genomics or Astrophysics, ROs are rapidly being adopted in other fields, with special interest in Earth Sciences. With the necessary extensions and updates, research objects can also support earth scientists in managing the lifecycle of their scientific investigations, providing structured containers that aggregate all the resources related to a particular experiment/observation, and the means for sharing, validating and disseminating the research work as a single information unit, to be interpreted and reused by the community in the future.

Such capabilities require both an underlying (research object) model and the technological support implementing this model. The former, known as the RO model, specifies the semantic vocabulary and relations for capturing and describing ROs, their provenance and lifecycle. The latter is provided by ROHub, a holistic RO management platform implemented natively on top of the RO model. ROHub supports scientists throughout the research lifecycle to manage and to structure their resources as high-quality ROs, fostering collaboration within and across scientific communities with such ROs at the center.

In the following, we introduce the RO model with a concrete example, followed by a description of ROHub and the recently implemented extensions to both the model and the platform in support of Earth Sciences communities. Finally, we illustrate the usage of research objects and ROHub with a working example and conclude with a discussion on the ongoing work.

II. RESEARCH OBJECTS

A research object can aggregate an arbitrary number of heterogeneous resources, which can be internal or external (linked by reference) to the research object location, such as the data used or the results produced in an experimental study, the (computational) methods employed to produce and analyse that data, and the people involved in the investigation. Additionally, the resources in the research object can be organised within folders (a special type of resource), to facilitate their inspection. Similarly, the research object can encapsulate any number of annotations associated with these resources (or the research object itself), enabling the understanding and interpretation of the scientific work, such as provenance and evolution information, descriptions of the computational methods, dependency information and settings about the experiment executions.
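
The structure of a research object described in Section II (aggregated internal and external resources, folders, and annotations on resources or on the research object itself) can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the class names, fields, methods and the example.org URIs below are hypothetical and are neither the vocabulary of the RO model ontologies nor the ROHub API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    """An aggregated resource (hypothetical type, not an RO model term)."""
    uri: str
    external: bool = False        # True when the resource is linked by reference
    folder: Optional[str] = None  # optional folder used to organise resources


@dataclass
class Annotation:
    """Metadata attached to a resource or to the research object itself."""
    target: str           # URI of the annotated resource or of the research object
    body: Dict[str, str]  # e.g. provenance, description or dependency information


@dataclass
class ResearchObject:
    uri: str
    resources: List[Resource] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def aggregate(self, uri: str, external: bool = False,
                  folder: Optional[str] = None) -> Resource:
        """Add an internal or external resource to the aggregation."""
        resource = Resource(uri, external, folder)
        self.resources.append(resource)
        return resource

    def annotate(self, target: str, **body: str) -> Annotation:
        """Annotate the research object or one of its aggregated resources."""
        known = {self.uri} | {r.uri for r in self.resources}
        if target not in known:
            raise ValueError(f"{target} is not part of this research object")
        annotation = Annotation(target, dict(body))
        self.annotations.append(annotation)
        return annotation

    def annotations_for(self, target: str) -> List[Annotation]:
        """Return all annotations whose target is the given URI."""
        return [a for a in self.annotations if a.target == target]

    def folder_contents(self, folder: str) -> List[str]:
        """Return the URIs of the resources organised under a folder."""
        return [r.uri for r in self.resources if r.folder == folder]


# Illustrative usage with made-up URIs.
ro = ResearchObject("https://example.org/ROs/demo/")
ro.aggregate("https://example.org/ROs/demo/data/input.csv", folder="data")
ro.aggregate("https://example.org/ROs/demo/workflow/model.py", folder="workflow")
ro.aggregate("https://example.org/external/dataset", external=True, folder="data")
ro.annotate(ro.uri, title="Demo study")
ro.annotate("https://example.org/ROs/demo/workflow/model.py",
            description="Computational method", depends_on="input.csv")

print(ro.folder_contents("data"))
```

In this sketch an annotation is rejected unless its target is the research object or one of its aggregated resources, which mirrors the idea that annotations describe the aggregation and its contents rather than arbitrary URIs.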

                                     Figure 1 Partial view of an exemplary research object structure

To represent the rich, and potentially complex, structure of a research object, the underlying model was implemented as a suite of lightweight ontologies building upon existing vocabularies: OAI ORE (Object Reuse and Exchange) for specifying aggregation of resources, the Annotation Ontology (AO) to support the annotations, and the PROV Ontology to represent provenance information. A complete specification of the model can be found in [3].

Figure 1 depicts a partial (and simplified) view of a research object structure that illustrates the RO model with a concrete example. This research object is the result of the study elaborated in Section V.

III. ROHUB

ROHub (www.rohub.org) enables scientists to manage and preserve their research work through ROs, to make it available for publishing, to collaborate and to discover new knowledge (see [4,5] for a more detailed description of ROHub and its origin).

A. ROHub Implementation

ROHub comprises both a backend service and a frontend (client) application. The backend provides a set of REST APIs [6] implementing the RO model, which can be used to access ROHub programmatically. The two primary ones are the RO API and the RO Evolution API, which define the formats and links used to: i) create and maintain ROs, the resources aggregated and the associated annotations (metadata); ii) change the lifecycle stage of an RO, create an immutable copy (snapshot or archive) from a working (live) research object and fetch their evolution provenance. The backend also provides APIs for notifications, search, access control and user management, plus a SPARQL endpoint. The frontend exposes RO functionalities to the end-users through a web GUI. This is the main interface for researchers to interact with ROHub.

B. Key Features Overview

Create, manage and share ROs: ROHub provides different methods for creating ROs: from scratch, from a zip file or by importing resources from other repositories. It also supports different access modes for sharing ROs (open, public or private), allowing users to specify who can read/write to the RO.

Discover, explore and reuse ROs: Faceted and keyword search interfaces, or the SPARQL endpoint directly, can be used to discover ROs that can then be inspected, downloaded, and reused to create new ones.

Assess RO quality: The RO overview panel shows a progress bar of the RO quality based on a set of basic RO requirements (Figure 2). Further quality information can be found in the quality panel, where ROs can be assessed against
predefined checklist templates for specific domains or community needs.

Manage RO evolution: ROHub allows creating snapshots of the current state of the RO for sharing or release, keeping their versioning information and associated changes. RO evolution can be visualized from the History panel.

Figure 2 ROHub – RO overview panel

Nested ROs: An RO can aggregate any type of resource, including internal resources, links to external resources and other ROs. The latter allows aggregating RO bundles [7], which are self-contained ROs serialised as ZIP files and generated by third-party tools (e.g., workflow management systems).

Preserve and monitor ROs: Long-term preservation features include RO fixity checking and quality monitoring that generate notifications of changes. RO content and quality changes are shown in the notification panel, and an Atom feed is available to get automatic notifications. Additionally, the quality monitoring has an interface that can be reached from the quality panel to visualise the RO quality through time.

Semantic enrichment: An RO can be enriched automatically with structured metadata extracted from its textual content, including the main concepts, domains, lemmas and named entities, in order to facilitate its discovery via the faceted/keyword search interfaces. Such metadata complements the metadata provided explicitly by scientists, offering a richer, machine-readable description of the RO.

DOI and citation: Now a DataCite (www.datacite.org) DOI allocator, ROHub can assign a DOI to the released ROs, enabling citation and stimulating scholarly communication and sharing before actual paper publication. DOI assignment follows RO release after automatically checking that the RO follows DataCite's policies, through the checklist mechanism described above.

IV. EXTENSIONS FOR EARTH SCIENCE

ROHub is a domain-agnostic platform that has been tested in Experimental Sciences. Currently, we are extending its capabilities to support the specific needs of the Earth Science community as part of the EVER-EST project. The analysis of such needs led to new features in the model and in the platform, including support for:

Geospatial and time information. The most relevant metadata in Earth Sciences includes the geographical location and the time period covered by or associated with the RO.

Data access policies to specify more detailed information about the possible use of digital content in publishing, distribution, and consumption of digital media across all sectors and communities.

Intellectual property rights to specify detailed information on the terms of use for a given resource.

RO Fork functionality to create a new RO from an existing one to start a new line of work or extend a previous one, citing automatically the source RO.

The resulting RO model extensions are publicly available (https://github.com/wf4ever/ro/tree/earth-science) and ROHub is currently being extended to support them and to provide related user interfaces. Such new capabilities include, amongst others: access and manipulation of geopositioned ROs through a map interface, definition and enforcement of data access policies and intellectual property rights, and the creation of new ROs by forking existing ones.

V. EXEMPLARY USE CASE

In this section, we introduce an excerpt of one real scenario provided by a virtual research community from the EVER-EST project, and then highlight the current limitations in the existing technologies and practices to illustrate how the use of the research object and ROHub can contribute to the preservation, sharing and reuse of research outputs. The RO associated with this scenario (depicted in Figure 1) is available at: http://sandbox.rohub.org/rodl/ROs/SeaMonitoring01/

Sea Monitoring Scenario: A researcher needs to define the habitat extent of the Cold Water Coral in the Bari Canyon and to provide this information to assess the good environmental status related to the descriptor D1 (Biodiversity, Indicator Habitat extent) within the Marine Strategy Framework Directive for the Italian waters. To this end, the researcher needs a habitat suitability model for the Cold Water Corals. The researcher needs to search for high resolution bathymetric data and Cold Water Coral occurrence data, and to run a good model to obtain a reliable map of habitat suitability for Cold Water Corals. The researcher needs to release the results to colleagues from different institutions working on the Marine Strategy Framework Directive, to share the model with them, to reuse the model in different locations, and to re-run the model after one year using new data from the same location. For this scenario, it is very important to share data and results
within the community, to reuse the models coming from different scientists working on the same topic, to preserve the results and to publish methodologies and final maps.

Current limitations: Currently there is no reference site where a scientist can find publications on this specific topic, workflows executing the models, links to the data to be used and results (to mention a few). There are no specific repositories that are used to preserve and reuse all this information. Generally, there is no information about the quality of the models and the methodologies applied and described in the paper. Within the Marine Strategy Framework Directive (http://data.europa.eu/eli/dir/2008/56/oj) there is a big lack of communication and all the relevant information is dispersed in different repositories.

Overcoming the limitations with ROHub: ROHub allows the scientists to encapsulate the data, provenance of workflow executions, results, documentation and other resources related to the particular study, and to effectively preserve, share and reuse these resources through a single information unit. Moreover, ROHub allows the scientists to manage, track and visualize the complete scientific life cycle of the study, to collaborate throughout this process, and to disseminate the associated research object at different stages with colleagues or with the community (see Figure 3), so that other scientists can reuse the models in different locations and using different datasets. For monitoring purposes, the research object gives access to all the resources necessary to exactly re-run the same model in the same location at a different time, giving the opportunity to evaluate the differences in habitat extent when applying the same methodologies.

Figure 3 Research object lifecycle example

Some of the ROHub features used by the end-users in this scenario, in their words, include:

Semantic search: To reuse an existing research object about habitat suitability models, the researcher will pose a query with appropriate concepts (e.g. ecology, habitat suitability, habitat extent) in order to easily find the most suitable ones. Semantic search gives the opportunity to retrieve concepts from all documents in the research object and it is really effective to find the research objects containing data and information that you need for your work. Without semantic search, a normal keyword search could be performed, but it was difficult to find the effective concepts.

Checklists: To have information about the research object quality, and to select the research objects that effectively work. It is important to reuse research objects with a running workflow and real links to data. The checklist is a good tool to evaluate a research object without losing time in manually verifying its content.

DOIs: After reusing a workflow with different data input and modifying the model parameters, a new research object will be created with the selected data and the most suitable parameters. This new research object will be released with a DOI that gives the opportunity to be properly cited by other scientists in the community. The use of DOIs encourages researchers to create research objects containing new data and research outcomes and especially to share them with the community, since they enable citation and credit. This mechanism adds incentives for scientists to share their work and stimulates reuse, accelerating the incremental development of science.

Scientific lifecycle management: The researcher can keep track of the evolution of the scientific study, release preliminary results after reaching intermediate milestones in
order to share the results with other colleagues and to keep a record of a particular state in the study. Such intermediate releases can then be compared to analyze the changes, or they can be used to start alternative lines of work.

Notifications: The researcher can receive notifications regarding changes in research object content but also about changes in the quality assessment. Updates on quality drops can be particularly useful (e.g., one of the services used is no longer available) in order to take corrective actions. Similarly, team collaborators can be notified about research object editing activity to keep track of the progress in the study, or to know when their input is required.

VI. CONCLUSION

The adoption of the Research Object paradigm by the scientific enterprise can accelerate science through a better management of the scientific information. Benefits of this approach can have an immediate impact on the validation, sharing, preservation and (eventually) reuse of scientific outcomes. However, appropriate tools and infrastructure need to be in place in order to provide the necessary functionalities to manage ROs throughout their entire lifecycle across the different scientific communities. ROHub is the first and main scientific gateway to provide holistic support for the management, sharing and communication of scientific knowledge in the form of ROs. In this paper, we recap its main features and report the recently implemented and ongoing extensions that enable a variety of scientific communities, and specifically earth scientists, to adopt ROs in their daily work. It is still too early to measure the impact that this will have in terms of increased scientific productivity and scholarly communication and citation across the different scientific areas. In addition to further refinement of the methods and tools produced, future work involves piloting the approach in our scientific communities and beyond in order to collect data (e.g., bibliometrics and altmetrics, number of ROs, number of users) that allow such impact to be assessed.

ACKNOWLEDGMENT

This work is supported by the EVER-EST EU project (HORIZON2020-674907). Special thanks to Federica Foglini, from CNR-ISMAR, whose RO in sea monitoring has been used as an example across the paper.

REFERENCES

[1] EVER-EST project, D3.1 - Use Cases Description and User Needs Document. Project deliverable. 2016
[2] K. Belhajjame, O. Corcho, D. Garijo, J. Zhao, P. Missier, D. Newman, R. Palma, S. Bechhofer, E. García-Cuesta, J.M. Gomez-Perez, G. Klyne, K. Page, M. Roos, J.E. Ruiz, S. Soiland-Reyes, L. Verdes-Montenegro, D. De Roure, and C.A. Goble. Workflow-centric research objects: First class citizens in scholarly discourse. In Proceedings of SePublica2012, pages 112, 2012.
[3] K. Belhajjame, J. Zhao, D. Garijo, M. Gamble, K. Hettne, R. Palma, E. Mina, O. Corcho, J. Gómez-Pérez, S. Bechhofer, G. Klyne, C. Goble, Using a suite of ontologies for preserving workflow-centric research objects, in Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 2015. doi:10.1016/j.websem.2015.01.003
[4] R. Palma, O. Corcho, J. Gomez-Perez and C. Mazurek, ROHub – a digital library of research objects supporting scientists towards reproducible science, in: Presutti, V., et al. (eds.) SemWebEval 2014. CCIS, vol. 457, pp. 77–82, Springer, Heidelberg (2014), Crete, Greece, May 2014.
[5] R. Palma, O. Corcho, P. Hołubowicz, S. Pérez, K. Page, C. Mazurek, Digital libraries for the preservation of research methods and associated artefacts, in Proc. 1st International Workshop on the Digital Preservation of Research Methods and Artefacts (DPRMA 2013) at Joint Conference on Digital Libraries (JCDL 2013). pp. 8-15. Indianapolis, Indiana, USA, July 2013.
[6] R. Palma, P. Hołubowicz, K. Page, S. Soiland-Reyes, G. Klyne, C. Mazurek. A Suite of APIs for the Management of Research Objects, Proceedings of the Developers Workshop, ISWC. October 2014.
[7] Research Object Bundle 1.0 specification. November 2014. https://researchobject.github.io/specifications/bundle/