<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Utilising Semantic Web Ontologies To Publish Experimental Workflows</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harshvardhan J Pandit</string-name>
          <email>harshvardhan.pandit@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ensar Hadziselimovic</string-name>
          <email>ensar.hadziselimovic@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dave Lewis</string-name>
          <email>dave.lewis@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>In Reply To: https://linkedresearch.org/calls</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science &amp; Statistics, Trinity College Dublin</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>Reproducibility in experiments is necessary to verify claims and to reuse prior work in experiments that advance research. However, the traditional model of publication validates research claims through peer-review without taking reproducibility into account. Workflows encapsulate experiment descriptions and components and are suitable for representing reproducibility. Additionally, they can be published alongside traditional patterns as a form of documentation for the experiment which can be combined with linked open data. For reproducibility utilising published datasets, it is necessary to declare the conditions or restrictions for permissible reuse. In this paper, we take a look at the state of workflow reproducibility through a browser based tool and a corresponding study to identify how workflows might be combined with traditional forms of documentation and publication. We also discuss the licensing aspects for data in workflows and how it can be annotated using linked open data ontologies.</p>
      </abstract>
      <kwd-group>
        <kwd>https://github.com/coolharsh55/opmw_workflow_editor</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1   Introduction</title>
      <p>The demand for open access means researchers must share details about their experiment, such as implementation steps and datasets, in a highly accessible and structured manner. Traditional patterns of publication such as journals are reacting to this demand by providing increasingly interactive access to data that is often embedded or displayed along with the published paper. However, such methods of publication do not treat the reproducibility of the experiment as an important metric. This puts the onus of ensuring sufficient resource sharing and access on the researchers, who largely fail to take it into consideration.</p>
      <p>
        Reproducibility in scientific experiments allows other researchers to reproduce the experiment and obtain results that can confirm or dispute the original claims [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To encourage verifiability and adoption of methods, access to the original experiment and its results, along with its components or datasets, must be provided in a transparent and declarative manner. Research published through the peer-review process is seen as credible with respect to its correctness, but this credibility does not reflect its reproducibility. Approaches such as sharing source code via online repositories such as GitHub, or executable components through Docker or virtual machines, help share the technology behind the experiment, though this creates additional problems due to the sheer diversity of technologies and frameworks in the software world.
      </p>
      <p>
        Workflows capture complex methods and their interactions as a series of
steps [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and have been used successfully in several different areas of scientific
research [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3,4,5</xref>
        ]. There have been several efforts to map workflows as linked
data ontologies [a,b,c,d] along with several tools and frameworks that help
users in publishing workflows. As workflows encapsulate the experiment and
its subsequent execution, they are also useful in assessing the reproducibility
of research by including them in publications.
      </p>
      <p>Workflows can be helpful in defining and sharing experiments along with associated resources using linked open data principles, which can help streamline the process and make them more accessible. We aim to investigate how closely the adoption of workflows as a documentation mechanism matches the way researchers currently document their research, and to identify the associated challenges in augmenting existing publication mechanisms using linked open data principles. To this end, we have modelled an experiment to better understand documentation habits and publication challenges for workflows and data licenses using a browser based tool. We also present a discussion of the current state of affairs and the need for a more decentralised model of publication that augments traditional approaches.</p>
      <p>The rest of the paper is laid out as follows: In Section 2, we discuss the
background and related work with respect to workflows and data licensing. We
explain the motivation for identifying workflow documentation through a
browser based tool in Section 3, with the licensing aspect of datasets discussed
in Section 4. We conclude our discussion in Section 5 with an outlook towards
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2.1   Capturing Provenance in Experiment Workflows</title>
      <p>
        Provenance is information about the entities, activities, and people (or software) involved in producing data or a component, which can be used to assess its quality, reliability, or trustworthiness. The PROV ontology, which has been a W3C Recommendation since 30 April 2013, provides definitions for the interchange of provenance information. Using PROV, we can define entities and the various relations and operations between them, such as generation, derivation, and attribution. PROV has been successfully utilised in several domains and applications [e] including encapsulation of scientific workflows [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ] and provenance repositories [
        <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
        ].
      </p>
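      <p>The generation, derivation, and attribution relations can be illustrated with a minimal sketch. It stores provenance statements as plain (subject, predicate, object) tuples rather than using an RDF library, so that it runs without dependencies. The ex: resources are hypothetical; only the prov: terms are taken from the PROV ontology.</p>
      <preformat>
```python
# Provenance statements as (subject, predicate, object) triples.
# The ex: resources are invented for illustration; the prov: terms
# (wasGeneratedBy, wasDerivedFrom, wasAttributedTo) are PROV-O properties.
triples = [
    ("ex:cleaned_corpus", "rdf:type", "prov:Entity"),
    ("ex:raw_corpus", "rdf:type", "prov:Entity"),
    ("ex:cleaning_run", "rdf:type", "prov:Activity"),
    ("ex:alice", "rdf:type", "prov:Agent"),
    ("ex:cleaned_corpus", "prov:wasGeneratedBy", "ex:cleaning_run"),
    ("ex:cleaned_corpus", "prov:wasDerivedFrom", "ex:raw_corpus"),
    ("ex:cleaned_corpus", "prov:wasAttributedTo", "ex:alice"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("ex:cleaned_corpus", "prov:wasDerivedFrom"))
```
</preformat>
      <p>In practice these statements would be held in a triple store and queried with SPARQL; the list lookup here only shows the shape of the data.</p>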
      <p>
        PROV was designed to be generic and domain independent, and needs to be extended to represent workflow templates and executions [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. P-Plan extends PROV to represent the plans that guide the execution of scientific processes, and describes how the plans are composed and how they correspond to the provenance records that describe the execution itself. OPMW reuses the Open Provenance Model core vocabulary and extends both PROV and P-Plan to describe workflow traces and templates. OPMW is well suited to describing workflows in a manner that aligns with how researchers design and conduct experiments, and has been used in tools and frameworks to capture experimental workflows.
      </p>
      <p>OPMW allows representation of workflows at a very granular level. In OPMW, a workflow template represents the design of the workflow and contains its different steps or processes. Artifacts are part of a template and are used or generated by the processes. There are two types of artifacts: data variables and parameter variables. Data variables can be used as inputs and can also be generated by processes, whereas parameter variables act as the input parameters of workflow steps. OPMW reuses terms from Dublin Core to represent attribution for the author, contributor, rights, and license of the datasets and code used in the workflow. Workflow executions are bound to the template and each represents an execution run. Each step or process in the template has a corresponding execution process linked to it, containing provenance statements about its execution. Execution artifacts used or generated during execution are linked to their corresponding artifact from the template. Executions have terms to define the start and end of execution traces, along with metadata for artifacts such as file location and file size, and declarations of the agents that perform or are involved in the execution process, such as scripts or the tools used to design and/or execute workflows.</p>
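      <p>The relationship between a template and its executions can be sketched as follows. This is an illustrative data model only: the class and field names are hypothetical Python names that mirror the concepts described above, not identifiers from the OPMW vocabulary.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Abstract design of a workflow: steps plus the artifacts they use."""
    name: str
    steps: list = field(default_factory=list)
    data_variables: list = field(default_factory=list)
    parameter_variables: list = field(default_factory=list)


@dataclass
class Execution:
    """One run of a template; maps each executed process to its template step."""
    template: Template
    process_to_step: dict = field(default_factory=dict)
    artifact_to_variable: dict = field(default_factory=dict)

    def unexecuted_steps(self):
        """Template steps with no corresponding execution process."""
        executed = set(self.process_to_step.values())
        return [s for s in self.template.steps if s not in executed]


template = Template(
    name="train_classifier",
    steps=["tokenise", "train"],
    data_variables=["corpus", "model"],
    parameter_variables=["learning_rate"],
)
run = Execution(
    template=template,
    process_to_step={"tokenise_run_1": "tokenise"},
    artifact_to_variable={"corpus_v1.txt": "corpus"},
)
print(run.unexecuted_steps())
```
</preformat>
      <p>Keeping the execution separate from the template is what allows several runs, each with distinct outputs, to point back to one shared design.</p>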
      <p>
        There are several tools that allow the creation and consumption of workflows [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11,12,13,14</xref>
        ]. WINGS [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is an end-to-end workflow system that allows describing and instantiating high-level workflow templates and executing them in various execution environments. It uses an implementation of OPMW to model workflows into templates and executions, stores them as a catalogue, and features workflow reuse. Workflows can utilise data variables from the catalogue, while parameters are limited to literal values. WINGS can interleave metadata generated during execution to utilise it in workflow design and processes, which allows the creation of partial workflows that can be incrementally iterated towards completion and execution.
      </p>
      <p>A related tool called WorkflowExplorer allows navigating workflow templates along with their metadata and execution results. It displays information as a webpage consisting of all resources related to the template, grouped by their common type, and retrieves this data dynamically. Each resource is a link to a webpage that describes it and shows information such as whether an execution run was successful, or lists the execution instances for a template variable. Another tool for documentation of workflows is the Organic Data Science Wiki, which can generate persistent documentation for workflows automatically from the repository.</p>
      <p>
        Workflow fragments can be described as a collection of workflow components which form a subset of the workflow and represent some distinct functionality. Fragments can be shared at a more granular level than workflows, and can thus be reused more easily. Experiments that utilise the same fragments can be linked or clustered based on their metadata, though such experiments would not necessarily be considered variations of a common template. The idea of enacting reproducibility over such fragments rather than the workflow as a whole has seen some interest [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>2.2   Reproducibility</title>
      <p>
        Reproducibility is the ability to reproduce the results of an experiment with the goal of confirming or disputing the experiment's claims [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It requires access to the description of the original experiment and its results, along with workflows that capture the different settings required to accurately reproduce the execution environment. The terms repeatability and variation are commonly aligned with reproducibility; their formal definitions can be found in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Reproduction of experiments depends on the availability of resources, which may no longer be accessible or may have changed since the experiment was executed. Reproducibility in such cases becomes challenging, as comparing workflows between the original and a rerun is non-trivial and time-consuming.
      </p>
      <p>
        Research Objects [18,19] encompass initiatives that allow the bundling together of all resources and metadata associated with an experiment. Each resource is identified using a globally unique identifier, such as a DOI for publications or an ORCID for researchers. Research Objects can aggregate information related to workflows, such as the original hypothesis, inputs used in executions, and workflow definitions, along with execution traces of workflow runs. Annotations attached to the Research Object can include provenance traces and information about workflow evolution and its component elements. The TIMBUS Context Model [20] is similar in aims to Research Objects, while additionally allowing the bundling of legal metadata such as copyright licenses, patents, and intellectual property rights. Its authors have presented a mapping from the Context Model to Research Objects, making them compatible in usage and consumption. VisTrails [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] allows the creation of reproducible papers that contain a description of the experiment, links to input data and applications, and visualisations of the execution outputs. ReproZip can help with capturing provenance information, along with any environmental parameters required for execution, into a self-contained reproducible package.
      </p>
      <p>The previously mentioned approaches that mitigate these problems look at capturing all the information required to define and reproduce an experimental workflow. As this information often includes datasets, resources, and services which can change or become inaccessible, the associated workflows can no longer be successfully shared or utilised. In [21], the authors evaluate workflows and term this phenomenon ‘workflow decay’. They analysed 92 Taverna workflows and list four causes of workflow decay: missing volatile third party resources, missing example data, missing execution environment, and insufficient description of workflows. In [22], the authors examined 613 papers from ACM conferences, out of which 515 contained tools developed by the authors themselves and 231 contained accessible source code, of which only 123 could be successfully built. Common causes of failure were missing environment variables and incorrect or unspecified dependencies. In another comprehensive study [23], the authors analysed nearly 1500 workflows from the myExperiment repository that used Taverna. They found that 737 workflows were accessible and executable, out of which 341 executed without errors, while only 29.2% of 1443 datasets were usable.</p>
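      <p>To put the counts quoted above into proportion, the following short calculation derives success rates from them. It uses only the figures stated in the text; which denominator is the most meaningful is a matter of interpretation, so the choice of denominators here is an assumption.</p>
      <preformat>
```python
def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts quoted in the text for the study of ACM conference papers.
papers, accessible_code, built = 613, 231, 123
# Counts quoted in the text for the Taverna workflows study.
executable, error_free = 737, 341

print(share(built, accessible_code))  # built, among papers with accessible code
print(share(built, papers))           # built, among all examined papers
print(share(error_free, executable))  # error-free, among executable workflows
```
</preformat>
      <p>Under these assumptions, roughly half of the accessible code bases could be built, and fewer than half of the executable workflows ran without errors.</p>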
      <p>Reproducibility challenges and best practices have seen several discussions. In [23,24], the authors present six strategies for the creation of reproducible scientific workflows that focus on defining and sharing all information and data in a clear and persistent manner. [25] discusses best practices for workflow authors with a particular focus on how to prevent workflow decay. The various challenges in workflow reproducibility arising from third party services are discussed in [26,27]. In [25] the authors present seven types of (meta-)data required to make workflows reproducible, of which some need to be defined manually by the user, while the rest can be inferred from provenance data or generated automatically by the system. In [28] the authors define two types of reproduction: physical and logical. Physical reproducibility conserves workflows by packaging all their components so that an identical replica can be created and reused, whereas logical reproducibility requires workflows and components to be described with enough information for others to reproduce a similar workflow in future. [29] uses these principles to utilise Docker as a workflow environment that packages the experiment execution and services along with the required data.</p>
      <p>In [29,30], the authors investigate the probability of making a workflow reproducible. They use the decay parameter [31], a probabilistic term, to define categories of reproduction based on their probability of reproducibility: reproducible, reproducible with extra cost, approximately reproducible, reproducible with probability P, and non-reproducible. The authors also present operational definitions for various terms based on the decay parameter. Repeatability is executing the experiment again (in exactly the same manner) with the same environmental and user specific parameters, where the decay parameters are any random values such as system noise or captured timestamps. Variability is where the workflow is run on the same infrastructure with some intentional modification of the jobs. Portability is repetition in a different environment, and reproducibility is defined as being a combination of repeatable and portable.</p>
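      <p>The operational definitions above can be read as a small decision procedure. The sketch below is one possible reading, not the cited authors' formalisation: the function names, the boolean inputs, and the "unclassified" label for cases the definitions do not cover are all assumptions made for illustration.</p>
      <preformat>
```python
def classify_rerun(same_environment, same_parameters, jobs_modified):
    """Label a rerun using the operational definitions quoted above."""
    if jobs_modified:
        # Variability: same infrastructure, intentionally modified jobs.
        return "variability" if same_environment else "unclassified"
    if not same_parameters:
        return "unclassified"
    # Unmodified jobs and parameters: the environment decides the label.
    return "repeatability" if same_environment else "portability"


def is_reproducible(observed_labels):
    """Reproducible is defined as being both repeatable and portable."""
    return {"repeatability", "portability"}.issubset(observed_labels)


labels = {
    classify_rerun(True, True, False),
    classify_rerun(False, True, False),
}
print(sorted(labels), is_reproducible(labels))
```
</preformat>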
      <p>
        By considering provenance traces as acyclic graphs, it is possible to utilise graph analysis to find relationships and interactions between workflows. Data artifacts or activities are considered as nodes, with the links denoting relationships between them. By tracing data flow in a graph, it is possible to reflect and infer the production and consumption of data for workflow executions. PDIFF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] utilises this approach to determine whether an experiment has been reproduced by identifying points of divergence between graphs of differing workflows. It tries to find whether the two workflows represent the same execution trace and, if they do not, at what point they diverge. FragFlow [32] is another approach utilising graphs to obtain workflow fragments that relate workflows to each other and indicate parts that are more likely to be reused. In [33], the authors present a technique to reduce visual complexity in workflow graphs. They argue that the visualisation generated by combining the logical and structural attributes leads to a better understanding of complex and relatively unfamiliar systems.
      </p>
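      <p>The idea of locating a point of divergence between two provenance graphs can be sketched in a few lines. This is a deliberately simplified breadth-first comparison under the assumption that both traces share node names and a common start node; it is not the PDIFF algorithm, and the example graphs are hypothetical.</p>
      <preformat>
```python
from collections import deque


def first_divergence(graph_a, graph_b, start):
    """Return the first node (breadth-first from start) whose outgoing
    edges differ between the two graphs, or None if the traces match."""
    queue = deque([start])
    seen = {start}
    while queue:
        node = queue.popleft()
        next_a = set(graph_a.get(node, []))
        next_b = set(graph_b.get(node, []))
        if next_a != next_b:
            return node
        for successor in sorted(next_a):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return None


# Each graph maps a node to the nodes that consume or follow it.
original = {"input": ["tokenise"], "tokenise": ["train"], "train": ["model"]}
rerun = {"input": ["tokenise"], "tokenise": ["train"], "train": ["model_v2"]}
print(first_divergence(original, rerun, "input"))
```
</preformat>
      <p>A realistic comparison would also have to match nodes that carry different identifiers in each trace, which is the harder part of the problem.</p>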
      <p>Along with approaches that focus on enabling the creation and consumption of research, there has been a growing discussion on the principles and methods used in the publication and reproduction of workflows along with associated resources such as datasets. The Joint Declaration of Data Citation Principles [34] states that data should be machine readable and treated the same as papers in a scholarly ecosystem. The FAIR Principles [35], which stand for findable, accessible, interoperable and reusable data, encourage semantic interoperability through reuse of data. Linked Research [36] defines the requirements for a web-based ecosystem for scholarly communication which makes it possible to publish links to workflows and other related resources using existing technologies. The LERU Roadmap for Research Data [37] recommends identifying documentation and metadata requirements at the start of a project, which would then comply with existing standards for the content. It also advocates the creation, processing and sharing of data with the scientific community through a generic framework for a wide variety of research processes and outputs. OpenAIRE aims to substantially improve the discoverability and reusability of research publications and data by interconnecting large-scale collections of research outputs across Europe. The central idea of the project is to create workflows and services on top of repository content to form an interoperable network which can act as an all-purpose repository open to all researchers.</p>
      <p>The Reproducibility Enhancement Principles (REP) [38] are a set of recommendations based on the Transparency and Openness Promotion (TOP) guidelines, along with other discussions regarding data publication amongst funding agencies, publishers, journal editors, industry participants and researchers. REP argues that access to the computational steps taken to process data and generate findings is as important as access to the data themselves, which lends weight to the argument for publishing workflows and their associated resources. The authors consider the ability to reproduce an experiment through its steps on the same data as the original authors to be a minimum dissemination standard. This includes the workflow information describing the resources and their relationship to the steps used in computing the results. REP also suggests that journals should conduct a reproducibility check as part of the publication process and should enact the TOP standards at level 2 or level 3, which would ensure that all data and code are available persistently in an open trusted repository.</p>
      <p>There has been discussion [40] of weaker forms of reproducibility where, rather than replicating an entire workflow, only a few parts or components of it are fashioned to be reusable. While workflow fragments are ideal for such scenarios, this still underestimates the difficulties that may arise in reproduction due to a variety of reasons such as technical configuration or data availability and licensing. Additionally, traditional mechanisms of publication do not address these challenges in any meaningful way, which restricts the possibility of a centralised solution. Recent advances in decentralising this process [36] allow publication of research in an open and accessible format without funnelling it into centralised research repositories. Tools that help consume and annotate published papers can also be extended to reflect workflows and components for the same experiment. As the decentralisation process allows the researcher to hold sufficient control over the layout and contents of the published research, it can be utilised as a gateway in the interest of reproducibility.</p>
      <p>We extend our argument based on these recommendations to discuss various
means of disseminating existing knowledge amongst researchers to try and
identify possible drawbacks in existing approaches and to discover ways in
which traditional approaches in conducting research can benefit from LOD
principles and workflow based systems.</p>
    </sec>
    <sec id="sec-4">
      <title>2.3   Licensing</title>
      <p>When it comes to publishing datasets, there are many different variables that need to be considered. First is the need for context regarding limitations on publication, such as public or intra-institution [41]. This should be complemented with the mode of access, describing where the data is stored, and availability, describing how it can be accessed. There needs to be a clear strategy about licensing and whether it applies to a subset of the data or to the complete data. This is vital in cases where data can potentially contain personal or sensitive information. There are established mechanisms and providers for data publishing in academic circles such as Mendeley Data, PLOS, and Dryad.</p>
      <p>It is necessary to have a deeper understanding of the licensing issues along with the laws and policies that may be applicable. This includes defining rules pertaining to the intellectual property (IP) of the assets and relevant privacy policies. Without a clear understanding of what is freely available to be reproduced in an observed dataset, it is very difficult to know which data is permissible to access and under which conditions it can be used. There needs to be an effective mechanism to check the status of intellectual property or licensing issues that might arise in the process. This includes the integrity of the research ethics undertaken in conducting the original experiment that produced the data, along with replication and the generation of more datasets.</p>
      <p>Due to the nature of linked open data, it is possible to see how information related to experimental workflows can be effectively interlinked without a centralised mechanism. What remains is to find and utilise appropriate models for declaring the legalities associated with data. The best practices for publishing linked data, authored by W3C, state that licenses should be explicitly connected to the data itself. This allows for a transparent definition of the circumstances under which a third party can reuse the datasets. Creative Commons (CC) is the suggested approach for licensing associated with such declarations.</p>
      <p>There are two main mechanisms to describe and communicate the permissions of a dataset. The first is a license, which is a legal instrument for a rights holder to permit other parties to perform certain operations over data [42]. The second mechanism is a waiver, which in practice means the rights holder gives up the ability to claim rights against other parties. Commonly used conditions in licensing models are attribution, copyleft, and non-commerciality. Attribution is giving the original author credit for the work on operations such as distributing, replicating, and displaying. Copyleft requires that the derived work use the same licensing model as the work it is derived from. Non-commercial clauses restrict usage to non-commercial applications except under specified conditions.</p>
      <p>Datasets are subject to so-called attribution stacking, meaning all of the contributors to the original work must be attributed in the chain of production. As a derived work may include datasets under different licensing models, all of the derivatives' authors and licences must be taken into consideration when producing the final licensing model.</p>
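      <p>Attribution stacking amounts to walking the derivation chain and collecting every contributor and licence encountered. The following is a minimal sketch of that walk; the dataset names, authors, and licence labels are hypothetical, and deciding whether the collected licences are mutually compatible is left out entirely.</p>
      <preformat>
```python
def stacked_attributions(dataset, derived_from, authors, licences):
    """Collect every author and licence along the derivation chain."""
    seen, stack = set(), [dataset]
    all_authors, all_licences = set(), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        all_authors.update(authors.get(current, []))
        if current in licences:
            all_licences.add(licences[current])
        # Follow the chain back to the datasets this one was derived from.
        stack.extend(derived_from.get(current, []))
    return sorted(all_authors), sorted(all_licences)


derived_from = {"merged": ["corpus_a", "corpus_b"], "corpus_b": ["corpus_c"]}
authors = {
    "merged": ["Dana"],
    "corpus_a": ["Ann"],
    "corpus_b": ["Bo"],
    "corpus_c": ["Cy"],
}
licences = {
    "corpus_a": "CC-BY-4.0",
    "corpus_b": "CC-BY-SA-4.0",
    "corpus_c": "ODbL-1.0",
}
print(stacked_attributions("merged", derived_from, authors, licences))
```
</preformat>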
      <p>Licensing of datasets is a very complex issue when it comes to publishing experimental data. Most licensing mechanisms, including CC, are primarily designed to protect the published work and not necessarily the datasets. There are ongoing efforts to address this issue. Open Data Commons (ODC) is a set of legal tools that help provide and use Open Data, with the ODC Open Database License (ODbL) relating to the publishing of datasets. Science Commons, which has now merged with Creative Commons under the Open Science initiative, specifically targets the use of data in scientific environments.</p>
      <p>There are currently only a few options available to evaluate data from a legal perspective. While there are certain mechanisms that assess licensing and IP issues, specifically META-SHARE licenses, their actual usage is limited by the context of the data and the need for a manual assessment.</p>
      <p>The idea is to have all the assets in the experiment tied to certain licenses and possibly graded to describe their level of openness for repeatability and reusability. This is achievable using a Rights Expression Language (REL), which is an ontology to express rights using linked data. The Open Digital Rights Language (ODRL) is a REL developed to express rights, rules, and conditions, including permissions, prohibitions, obligations, and assertions, and the rules pertaining to IP issues. ODRL can be used to expand existing ontologies to contextualise experimental data through the use of its own semantic vocabulary. However, there needs to be an awareness of any potential limitations of using ODRL to capture the complexities of licensing issues.</p>
      <p>ODRL has an expressive vocabulary that makes it possible to describe permission-related relationships in a precise manner. Examples include the ‘grantUse’, ‘annotate’, and ‘reproduce’ permissions, among many others. Additionally, ODRL has the concept of permission inheritance, which enables granting permissions to dependent variables based on permissions inherited from the independent variables (arguments) of the experiment. It has both XML and JSON based schemas for easier integration and implementation.</p>
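      <p>As an illustration of how such permissions might be attached to a dataset, the sketch below builds a small policy as a Python dictionary and checks an action against it. The keys only loosely follow the shape of ODRL's JSON serialisation and should not be read as schema-valid ODRL; the identifiers, the prohibited action, and the is_permitted helper are assumptions made for this example.</p>
      <preformat>
```python
import json

# Hypothetical policy for one dataset; the keys loosely follow the shape of
# ODRL's JSON serialisation and the identifiers are invented for illustration.
policy = {
    "uid": "http://example.org/policy/corpus-1",
    "permission": [
        {"target": "http://example.org/data/corpus", "action": "reproduce"},
        {"target": "http://example.org/data/corpus", "action": "annotate"},
    ],
    "prohibition": [
        {"target": "http://example.org/data/corpus", "action": "commercialize"},
    ],
}


def is_permitted(policy, target, action):
    """True if the action is explicitly permitted and not prohibited."""
    def matches(rules):
        return any(r["target"] == target and r["action"] == action for r in rules)
    permitted = matches(policy.get("permission", []))
    prohibited = matches(policy.get("prohibition", []))
    return permitted and not prohibited


print(json.dumps(policy, indent=2))
print(is_permitted(policy, "http://example.org/data/corpus", "reproduce"))
```
</preformat>
      <p>Treating anything not explicitly permitted as disallowed is a conservative design choice for this sketch rather than a statement about ODRL semantics.</p>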
      <p>There can be multiple assets, assigners, and assignees associated with permission models that describe permissions, prohibitions, duties, and constraints. All the attributes can be inherited as well as passed on to another party. Translating all of this to an experimental workflow use case, it is possible to deal with an experiment’s licensing models and permission inheritance for only certain fragments or for the entire experiment. Through this, a privacy policy can be clearly set that defines a retention policy along with any IP details, which can be passed using parent-child relationships to executions or variations of that experiment.</p>
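      <p>Passing permissions along parent-child relationships can be sketched as a walk from an asset up through its ancestors. This is a simplified model under the assumption that each asset has at most one parent and that inherited permissions simply accumulate; actual ODRL inheritance rules are richer, and all names below are hypothetical.</p>
      <preformat>
```python
def effective_permissions(asset, parent_of, permissions):
    """Union of an asset's own permissions with those of all its ancestors."""
    granted = set()
    visited = set()
    current = asset
    while current is not None and current not in visited:
        visited.add(current)
        granted.update(permissions.get(current, set()))
        # Move up to the parent; None ends the walk at the root.
        current = parent_of.get(current)
    return granted


# An execution inherits from its template, a variation from its execution.
parent_of = {"execution_1": "template", "variation_a": "execution_1"}
permissions = {"template": {"reproduce"}, "execution_1": {"annotate"}}
print(sorted(effective_permissions("variation_a", parent_of, permissions)))
```
</preformat>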
      <sec id="sec-4-1">
        <title>3    Browser based tool for workflow documentation</title>
        <p>We created a browser based tool as a test-bed for our discussion and study of the current methods for workflow documentation and publication. The focus of the tool was on advancing knowledge about the use of vocabularies in facilitating the sharing and repeatability of experiments and the replication of results. The tool also focused on workflow documentation and its role in the publication of the experiment and the subsequent discovery of related work. We focused on researchers in areas aligned with Natural Language Processing (NLP) and Machine Learning (ML), as these contain a good variety of experiment workflows where executions are highly interlinked and repetitive by nature. Additionally, there have been a number of previous approaches and ontologies [43] targeting these specific areas, which provides motivation for further discussion. The target audience for the study is researchers not primarily familiar with linked open data vocabularies for describing experimental workflows.</p>
        <p>Prospective participants are first asked to fill in a questionnaire (termed the pre-questionnaire) to gauge their familiarity with experimental workflows and linked open data. The pre-questionnaire enquires about experience in sharing workflows and whether the participants are familiar with the concepts of reproducibility and workflow reuse. Academic qualification along with published research is used as a metric of experience and familiarity with the research area. The questionnaire also seeks to understand the experiences of researchers in using a variation of existing or prior work. This is enquired through questions about the use of a slight or small modification of previous research, either their own or that of other researchers. The pre-questionnaire is available online.</p>
        <p>We chose OPMW as the target vocabulary for describing workflows as it allows experimental workflows to be described in a highly descriptive manner by capturing steps, datasets and their relationships. Rather than asking users to learn the ontology or, in some cases, the concept and use of linked open data, we abstract the use of the specification and focus on the documentation aspect of workflows. Users are not required to know the underlying use of OPMW to use the tool, but are presented with simplified concepts and structure from the ontology. The explicit use of terms and metadata to define and describe resources, which can be searched or explored, forms the basis of the system. Users are given the general idea of a template being an abstract design of the experiment which contains steps and datasets interlinked to define control flow. These templates can then be instantiated into multiple executions, each containing distinct outputs and resources, similar to the notion of a generic experiment run. Users are also shown how workflows can be documented using the information provided and linked with related resources.</p>
        <p>The documentation generated within the tool follows the principles of linked
open data, where each resource has its own corresponding properties and
attributes. For example, an execution instance contains links to every resource
it is associated with, such as the template it was based on, and its execution
processes and artifacts along with their corresponding template parameters,
steps, and data variables. This gives a comprehensive overview of the entire
workflow as well as the ability to follow these links to the documentation for
a particular resource.</p>
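        <p>As a rough illustration of how such a documentation page could gather its
links, the sketch below stores triples as plain Python tuples in place of the
tool's rdflib graph. The ex: identifiers are hypothetical, and the OPMW and
OPMO property names reflect our reading of the vocabularies rather than the
tool's actual data.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples with prefixed names.
# All ex: resources are made up for this illustration.
TRIPLES = [
    ("ex:run1", "rdf:type", "opmw:WorkflowExecutionAccount"),
    ("ex:run1", "opmw:correspondsToTemplate", "ex:template1"),
    ("ex:run1_tokenise", "rdf:type", "opmw:WorkflowExecutionProcess"),
    ("ex:run1_tokenise", "opmo:account", "ex:run1"),
    ("ex:run1_tokenise", "opmw:correspondsToTemplateProcess", "ex:template1_tokenise"),
    ("ex:run1_corpus", "rdf:type", "opmw:WorkflowExecutionArtifact"),
    ("ex:run1_corpus", "opmo:account", "ex:run1"),
    ("ex:run1_corpus", "opmw:correspondsToTemplateArtifact", "ex:template1_corpus"),
]


def links_for(resource, triples):
    """Group the outgoing and incoming links of a resource by property."""
    outgoing, incoming = defaultdict(list), defaultdict(list)
    for subject, predicate, obj in triples:
        if subject == resource:
            outgoing[predicate].append(obj)
        if obj == resource:
            incoming[predicate].append(subject)
    return dict(outgoing), dict(incoming)


# Links that a documentation page for the execution ex:run1 could display.
outgoing, incoming = links_for("ex:run1", TRIPLES)
```
]]></preformat>
        <p>Here the outgoing links lead from the execution to its template, while the
incoming links collect the execution processes and artifacts that belong to
it; each of these could be rendered as a hyperlink to the corresponding
resource page.</p>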
        <p>The tool, which can be accessed here, runs in a Python virtual environment
on an internal virtual machine hosted by Trinity College Dublin. On the server
side, it uses Flask as the web framework and rdflib for interacting with RDF
data. As rdflib is backend-agnostic, and to keep the tool footprint small for
an online demonstration, we use a single-file, serverless SQLite database as
the triple-store. On the client side, it uses standard web technologies along
with some additional libraries, including JointJS for rendering the workflow as
a graph. The tool also contains a few features that are useful for testing and
for the study, such as importing and exporting workflows as JSON, which allows
workflows to be loaded or saved from within the tool. This is particularly
useful for the study as it allows users to interact with partially filled
workflows by simply importing the corresponding JSON.</p>
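        <p>To give an idea of what importing a partially filled workflow might
involve, the following sketch loads a JSON document and reports which
metadata fields are still empty. The JSON layout, the field names, and the
REQUIRED table are assumptions made for illustration; the tool's actual export
format is not reproduced here.</p>
        <preformat><![CDATA[
```python
import json

# Hypothetical export of a partially filled workflow execution.
WORKFLOW_JSON = """
{
  "template": "steps123",
  "steps": [
    {"label": "step1", "agent": "alice", "started": "2017-01-10T09:00:00", "ended": null},
    {"label": "step2", "agent": null, "started": null, "ended": null}
  ],
  "datasets": [
    {"label": "corpus", "location": null, "kind": "file"}
  ]
}
"""

# Assumed required metadata per kind of resource.
REQUIRED = {
    "steps": ("agent", "started", "ended"),
    "datasets": ("location", "kind"),
}


def missing_metadata(workflow):
    """Return (section, label, field) for every required field left empty."""
    problems = []
    for section, fields in REQUIRED.items():
        for item in workflow.get(section, []):
            for field in fields:
                if not item.get(field):
                    problems.append((section, item["label"], field))
    return problems


workflow = json.loads(WORKFLOW_JSON)
problems = missing_metadata(workflow)
for section, label, field in problems:
    print(f"{section}/{label}: missing '{field}'")
```
]]></preformat>
        <p>A list of this kind could drive the errors or warnings shown to the user
until all required information has been supplied.</p>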
        <p>The experiment contains three tasks, which together take about one hour to
complete. To test the tool and the underlying study, we propose that users be
assigned one task based on their familiarity with workflow documentation and
running executions, as gauged from their responses to the pre-questionnaire.
Users who are not familiar with linked open data or with using workflows can
start with Task 1, which asks them to search for experiments containing
specified attributes and resources using a form-based interface. For users who
are familiar with experimentation practices and workflows, Task 2 requires
completion of an execution for an existing template. Task 3 is suited to users
who are familiar with linked open data and publication of workflows, or are
experienced with the concepts of reproduction and repeatability. It asks them
to create a variation of an existing template as an example of modifying
existing research. Each task targets a different aspect of workflow
documentation and consumption. Although the three tasks are independent of one
another, they all converge on the documentation generated for the workflows,
which the users are encouraged to explore at the end of their task.</p>
        <p>In Task 1, the user is asked to search for experiments containing the
specified attributes and resources. The form-based interface (see Fig. 1)
allows search parameters to be specified using a combination of fields for each
attribute and resource, such as a substring of the template name, certain
author(s), a particular step or dataset, or template executions. Based on the
arguments supplied, the tool returns the workflow templates that match the
given criteria, shown at the bottom of the page as hyperlinks. The user is
asked to explore the results produced by the query to learn more about a
particular experiment and its execution runs and variations. This task asks the
user to think about a workflow as being documented using metadata for itself
as well as for all of its resources, and about the advantages of being able to
filter or link together queries based on this information. It also exposes
users to workflow documentation and the way different experiments and
resources can be linked or explored in automatically generated documentation.
Internally, the tool uses a SPARQL query to
retrieve templates.</p>
        <p>Task 2 involves the user completing a partially complete execution for an
existing template. Users need to fill in the missing metadata. For steps, this
could be the author information (a researcher, or a software agent for
scripts) along with the step's starting and finishing time. For datasets, the
missing metadata can be the location URI or whether the dataset is stored as a
file or a folder. The tool shows appropriate errors or warnings until the
required information is correctly filled in, after which the workflow is
published and saved in the triple-store. Users are then asked to view the
documentation (see Fig. 2) generated for their execution. Following the
displayed links allows users to explore related resources such as other
executions for the same template, and executions run by the same author or
agent or utilising the same datasets. The task allows users to interact with a
workflow system that can track execution runs and collect them under a common
experiment template. Users also see an example of how a dataset can be linked
to multiple executions through the use of a URI. Storing experiment results in
this manner and collecting the execution runs allows users to discover
execution runs or experiments with the desired results. As no specific
instructions are given to the users regarding the working of the tool, any
method of discovery or exploration is based on their understanding of how
workflows are linked together. This is deliberate owing to the nature of linked
open data and the
open world assumption.</p>
        <p>Task 3 asks the user to create a variation of an experiment by modifying an
existing template. Examples provided for variation are modifying an existing
step by changing the datasets and parameters it uses, or adding new steps
and/or datasets to modify the control flow. As the notion of variation is vague
and ambiguous, users are not given concrete instructions on what constitutes a
variation, and are free to modify the experiment as long as it remains
sufficiently comparable to the original template. Upon successful completion,
they are shown the documentation for the new template along with a description
of and link to the original template, which in turn lists their variation of
the experiment (see Fig. 3). The task helps users discover variations of
experiments that could potentially show alternate approaches towards the same
goal. The executions of each variation are only associated with that particular
template and are not shared with the original. This allows the user to query
which variation produced the desired results and under which
(parametric) conditions.</p>
        <p>As OPMW does not specify any term for denoting that a template is a
variation of another template, we introduced a placeholder term isVariationOf
based on prov:wasDerivedFrom and prov:wasRevisionOf. It associates two templates
as being variations but does not specify which resources are shared or what
exactly has been modified. Ideally, any ontology specifying such variation
should also be expressive enough to describe which resources in the workflow
have been changed or are affected by the change. More work needs to be done in
this area to specify the degree of variance between experiments and to express
the nuances between variation, forking, and iteration of experiment templates.
The example below declares the template labelled forked_2 as a variation of
the template steps123.</p>
        <preformat>@prefix opmw: &lt;http://www.opmw.org/ontology/&gt; .
@prefix prov: &lt;http://www.w3.org/ns/prov#&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix this_project: &lt;http://lvh.me/directed-study/workflow/&gt; .

this_project:forked_2 a opmw:WorkflowTemplate,
        prov:Plan ;
    rdfs:label "forked_2" ;
    this_project:isVariationOf this_project:steps123 .</preformat>
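        <p>The form-based search of Task 1 and the isVariationOf link could be brought
together in a single query. The sketch below assembles a SPARQL SELECT query
from whichever search fields are filled in. It is not the query used by the
tool: the function names, the choice of fields, and the use of
opmw:isStepOfTemplate and rdfs:label for matching steps are assumptions made
for illustration.</p>
        <preformat><![CDATA[
```python
import re

PREFIXES = """\
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX this_project: <http://lvh.me/directed-study/workflow/>
"""


def sparql_string(value):
    """Quote a Python string as a SPARQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def build_template_query(name_contains=None, step_label=None, variation_of=None):
    """Assemble a SELECT query from the search fields that were filled in."""
    lines = [
        "?template a opmw:WorkflowTemplate ;",
        "          rdfs:label ?label .",
    ]
    if name_contains:
        # Case-insensitive substring match on the template label.
        needle = sparql_string(name_contains.lower())
        lines.append(f"FILTER(CONTAINS(LCASE(STR(?label)), {needle}))")
    if step_label:
        # Assumed modelling: steps point to their template and carry a label.
        lines.append("?step opmw:isStepOfTemplate ?template ;")
        lines.append(f"      rdfs:label {sparql_string(step_label)} .")
    if variation_of:
        # Only accept simple local names so the prefixed name stays valid.
        if not re.fullmatch(r"[A-Za-z0-9_]+", variation_of):
            raise ValueError("unexpected characters in template name")
        lines.append(
            f"?template this_project:isVariationOf this_project:{variation_of} ."
        )
    body = "\n  ".join(lines)
    return f"{PREFIXES}SELECT DISTINCT ?template ?label WHERE {{\n  {body}\n}}"


# Templates whose label contains "forked" and that vary steps123.
query = build_template_query(name_contains="Forked", variation_of="steps123")
print(query)
```
]]></preformat>
        <p>The resulting string could then be passed to any SPARQL engine, such as the
query facility of an rdflib graph. User input is quoted before being placed in
the query so that form values cannot alter its structure.</p>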
        <p>The use of OPMW and the styling of documentation are inspired by previous
research and workflow tools such as WINGS and WorkflowExplorer, which show a
description of the experiment along with all of its properties and resources
that can be navigated using hyperlinks. For templates and executions, the tool
shows a graphical representation of the steps and artifacts to help the user
understand the structure of the experiment. The steps and artifacts are
structured as nodes on the graph, with connections between them depicting
control flow. Each type of resource is depicted in a visually distinct manner
so that they are easy to differentiate. The documentation is generated by
rendering the underlying RDF graph as a webpage with resources linked using
hyperlinks. Where possible, additional information is displayed about
resources to encourage discovery of related items. For example, a step
described in an experiment template contains entries for all templates it is
used in, each of which can be clicked to access the documentation page for that
template.</p>
        <p>After the end of the given task(s), a post-usage questionnaire
(post-questionnaire) is used to evaluate the responses of the participants. It
contains open-ended questions about the usefulness of the tool for workflow
documentation and for exploration and discovery of existing research
components. The post-questionnaire also enquires about participants' views on
incorporating such workflow tools in their existing research work. At the end
of the session, an optional unstructured interview is conducted to better
understand qualitative responses, such as how participants plan to incorporate
linked open data or workflows into their existing research. The
post-questionnaire along with the optional interview helps to form views about
the challenges faced in incorporating workflows as a means of documentation and
publication, and whether there are any significant areas of concern in their
adoption. These discussions are also helpful in understanding the state of
affairs in the publication of experiment data and how it can be combined with
linked open data principles. The online post-questionnaire can be found
here.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4 Licensing workflow resources</title>
        <p>Rights expression languages can be used to describe the serialisation of data
relating to an IP or privacy policy. One of the main challenges in this process
is the licensing of the data, methods, and assets used in the experiments.
Depending on how these resources are licensed, the repeatability of the
experiment changes along with the conditions for reuse. If licenses are not
declared properly, utilising published research data can become burdened with
legal issues. It is therefore important to evaluate researchers’ current
understanding of licensing related to publications and experimental
workflows.</p>
        <p>Two contexts must be considered in understanding the licensing process: one
concerns the workflow as a whole, and the other concerns specific parts of the
workflow, including but not limited to some of the steps, algorithms, or
datasets. Producing an appropriate license for experimental workflows thus
poses a challenge, as licenses are not necessarily a ‘sum of parts’; each part
has to be considered in its own context. Additionally, the workflow has to be
analysed more precisely with regard to licenses that apply under local or
regional legislation or that involve patent and ownership issues.</p>
        <p>It is possible to produce a grading depicting the potential for reuse of
resources. This can be done by focusing on individual parts of the workflow and
placing them into the above-mentioned contexts, summarising the gathered
relationships and inheritance, and then producing the final grading model.
Datasets in experimental workflows can be annotated using a colour-based
scheme, with, for example, red depicting unavailability for reuse. For
simplicity, we discuss annotating only the experimental datasets, although
annotations can be applied to any experimental resource falling under the
licensing policy.</p>
        <p>Expanding on ODRL, there are two distinct approaches for annotating workflows
depending on whether the experiment is being initiated (original experiment,
original data) or reproduced. In the first case, authors of the experiment are
looking to publish the original work and need to find an appropriate license
under which to do so. In the latter case, the person repeating or reusing the
experiment would like to understand what the attached licenses mean for the
publication of a derivative work. Annotation can thus go both ways, either
explaining the implications of the attached licenses or suggesting a new
licensing model.</p>
        <p>We discuss here the use of ODRL as the ontology for describing licenses
associated with workflows. The first step is to determine the context for the
experimental data being annotated. Authors responsible for creating the
original experiment and dataset are termed the Assigner, with the Assignee
denoting the person(s) repeating the experiment. The keywords used match ODRL
concepts, the most important being use, attribute, and reproduce. As there are
limited options for licensing datasets, most cases can be covered by CC and ODC
licenses. Assignees analyse the attached license and express the required
conditions using ODRL. Analysis of the aggregated concepts then produces the
grading, which identifies warnings or alerts based on usage. Terms and keywords
such as pay, sell, and obtain consent raise a warning flag, whereas watermark,
translate, and shareAlike raise an alert flag. Other cases and conditions can
have different flags depending on their own contexts.</p>
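        <p>The flagging step described above can be sketched as a simple lookup over
the ODRL terms extracted from a license. The two term sets follow the examples
given in the text; the function name, the "unflagged" bucket, and the spelling
obtainConsent are assumptions for illustration, and no ordering of severity
between warnings and alerts is implied.</p>
        <preformat><![CDATA[
```python
# Term sets taken from the examples in the text; a real grading model
# would need a fuller, context-dependent mapping.
WARNING_TERMS = {"pay", "sell", "obtainConsent"}
ALERT_TERMS = {"watermark", "translate", "shareAlike"}


def grade(terms):
    """Group the ODRL terms found in a license by the flag they raise."""
    flags = {"warning": [], "alert": [], "unflagged": []}
    for term in sorted(set(terms)):
        if term in WARNING_TERMS:
            flags["warning"].append(term)
        elif term in ALERT_TERMS:
            flags["alert"].append(term)
        else:
            flags["unflagged"].append(term)
    return flags


# Terms an Assignee might extract from the license attached to a dataset.
result = grade(["use", "attribute", "shareAlike", "pay"])
for flag, flagged_terms in result.items():
    print(flag, flagged_terms)
```
]]></preformat>
        <p>The grouped flags could then be mapped onto a colour-based annotation of
the dataset in the workflow documentation.</p>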
        <p>By using ODRL, it is possible to have a system that identifies potential
legal issues surrounding the availability of data for sharing and use.
Annotations make licenses and issues easier for non-legal parties to understand
through a visual grading of resources. A colour-based grading allows the
flexibility to differentiate based on the flow or usage of data and on whether
the person(s) in question are the original authors or replicators. There are
still some challenges in applying the ontology to specific instances of an
experimental workflow. Some terms used by ODRL are ambiguous in their meaning
or similar to other terms, whereas some terms might not be applicable at all.
Through a subset or a possible extension, the ontology can address the vast
majority of use cases in real-life experimental practice.</p>
      </sec>
      <sec id="sec-4-3">
        <title>5 Conclusion &amp; Future Work</title>
        <p>Adopting linked open data for disseminating experiment workflows opens new
opportunities for sharing knowledge. Sharing workflows, along with access to
experiment data and resources, helps address the reproducibility of
experiments, which is a core issue with publication. By combining these with
the documentation efforts of experiment authors, we discussed how research can
be better disseminated and shared towards the advancement of science.</p>
        <p>We adopted OPMW as the ontology for describing workflows, along with ODRL for
declaring licensing, to create a browser-based workflow tool. The tool acts as
the central theme for discussions with researchers and allows them to interact
with experiments via the generated documentation and to explore existing
research. Because the underlying ontology is abstracted away, users can focus
on the consumption of workflows and the exploration of related research through
documentation. The tool, along with the associated discussions and
questionnaires, allows us to evaluate the state of workflow publishing for
researchers not explicitly familiar with workflow ontologies and data licensing
using linked open data principles. We are currently evaluating user studies and
responses based on the tool with a focus on its documentation aspects.</p>
        <p>Our main aim in forming this study and developing the tool was to understand
the overlap between current workflow and documentation habits, particularly
for NLP and ML researchers. By studying current documentation habits and
available linked open data ontologies, we envisaged a tool through which users
can be exposed to workflows created with OPMW. The study associated with the
tool looks towards identifying areas where linked open data adoption can be
simplified and incorporated into traditional forms of publication. We also
discuss licensing using ODRL for annotating the experiment workflow and
datasets. Licensing workflows and datasets is important for reproducible
workflows, as it lays out the conditions under which the experiment may produce
further work or be evaluated. We tried to envision a novel approach for
integrating licensing in workflow documentation. One idea we found potentially
useful was colour-coding based on suitability for reuse, and we would like to
pursue this idea in future work.</p>
        <p>In terms of future work, we would like to further enhance the tool using
state-of-the-art research approaches that can help in furthering our
discussions on workflow documentation. We would particularly like to emphasise
the use of graph analysis to differentiate between experiments, to identify and
highlight variations, and to help the user visually interact with them. As
OPMW does not currently have terms associated with variation, there is an
opportunity for an extension to be created addressing the interlinking of
related workflows. We would also like to investigate means of publishing
workflows in a decentralised manner using linked data. Enabling researchers to
host their workflows themselves, while providing a central repository with
information about executing them, could potentially be helpful in increasing
reproducibility analysis for published workflows. Such information could then
be attached to the published papers as annotations that guide users to updated
information on the workflows rather than letting them decay. We would also like
to evaluate ways of making it easier for researchers to provide documentation
in a way close to how they conduct experiments, and to bundle this together in
a publication.</p>
        <p>The tool tries to visualise experiment workflows for users and generates
documentation based on OPMW to describe workflows and resources. However, some
users prefer working with other forms of documentation that do not align well
with linked open data or formal forms of publication. An example is keeping
notes in markup languages such as Markdown, where there is a distinct structure
to the document but no formal keywords to add context. It may be possible to
utilise such text-based styles to document experiments by mapping them to
ontologies such as OPMW. This would allow users the choice of using tools or
writing their own documentation, which can then be converted into linked open
data.</p>
        <p>Along with access to papers and experimental workflows, the data associated
with the experiment must also be made available in the interest of
reproducibility and furthering research. Such datasets should have licenses
that declare the terms under which the data was obtained and the conditions
under which it may be accessed or re-used. A common example in research
publications is the condition that experimental data may only be re-used in an
academic environment, expressly forbidding any commercial usage. Such clarity
in licensing is beneficial and essential for research, as it allows access to a
large corpus of shared data that can help in future experiments as well as in
the reproducibility of previous research. In cases where such data cannot be
made available, publication of its schema can allow researchers to utilise the
experiment or its components by compiling a matching dataset. The schema in
such a case would correspond to metadata that describes what kind of data the
dataset encapsulates and how it is structured, without exposing any of the
actual data. Such approaches are helpful in experiments where personalised data
is often anonymised and may not be released under any permissible license. We
would like to explore this issue through the use of a grading mechanism
utilising ODRL within the workflow tool.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Acknowledgements</title>
        <p>This work has been supported by the European Commission as part of the
ADAPT Centre for Digital Content Technology which is funded under the SFI
Research Centres Programme (Grant 13/RC/2106) and is co-funded under the
European Regional Development Fund.</p>
      </sec>
      <sec id="sec-4-5">
        <title>7 References</title>
        <p>rigor." ACM SIGPLAN Notices 47, no. 4a (2012): 30-36.
18. Sean Bechhofer, John Ainsworth, Jitenkumar Bhagat, Iain Buchan, Phillip
Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble,
Carole Goble, Danius Michaelides, Paolo Missier, Stuart Owen, David New‐
man, David De Roure, Shoaib Sufi (2013) 'Why Linked Data is Not Enough
for Scientists', Future Generation Computer Systems 29(2), February 2013,
Pages 599-611, ISSN 0167-739X, doi:10.1016/j.future.2011.08.004
19. Khalid Belhajjame, Jun Zhao, Daniel Garijo, Matthew Gamble, Kristina
Hettne, Raul Palma, Eleni Mina, Oscar Corcho, José Manuel Gómez-Pérez,
Sean Bechhofer, Graham Klyne, Carole Goble (2015) 'Using a suite of ontolo‐
gies for preserving workflow-centric research objects', Web Semantics: Science,
Services and Agents on the World Wide Web, doi:10.1016/j.web‐
sem.2015.01.003
20. R. Mayer, T. Miksa, and A. Rauber, ‘Ontologies for Describing the Con‐
text of Scientific Experiment Processes’, 2014, pp. 153–160.
21. J. Zhao et al., ‘Why workflows break—Understanding and combating decay
in Taverna workflows’, in E-Science (e-Science), 2012 IEEE 8th International
Conference on, 2012, pp. 1–9.
22. Moraila, Gina, Akash Shankaran, Zuoming Shi, and Alex M. Warren.
'Measuring Reproducibility in Computer Systems Research'. Tech Report,
2014.
23. R. Mayer and A. Rauber, ‘A Quantitative Study on the Re-executability of
Publicly Shared Scientific Workflows’, 2015, pp. 312–321.
24. Piccolo, Stephen R., and Michael B. Frampton. "Tools and techniques for
computational reproducibility." GigaScience 5, no. 1 (2016): 30.
25. A. Bánáti, P. Kacsuk, and M. Kozlovszky, ‘Minimal sufficient information
about the scientific workflows to create reproducible experiment’, in Intelligent
Engineering Systems (INES), 2015 IEEE 19th International Conference on,
2015, pp. 189–194.
26. D. De Roure, K. Belhajjame, P. Missier, J. M. Gómez-Pérez, R. Palma, J.
E. Ruiz, K. Hettne, M. Roos, G. Klyne, and C. Goble, “Towards the preservation
of scientific workflows,” in Proceedings of the 8th International Conference
on Preservation of Digital Objects (iPRES 2011), Singapore, 2011.
27. K. Belhajjame, C. Goble, S. Soiland-Reyes, and D. De Roure, “Fostering
scientific workflow preservation through discovery of substitute services,” in
E-Science (e-Science), 2011 IEEE 7th International Conference on, Dec 2011,
pp. 97–104.
28. Santana-Perez, Idafen, Rafael Ferreira da Silva, Mats Rynge, Ewa Deel‐
man, María S. Pérez-Hernández, and Oscar Corcho. "Reproducibility of execu‐
tion environments in computational science using Semantics and Clouds." Fu‐
ture Generation Computer Systems 67 (2017): 354-367.
29. A. Bánáti, P. Kárász, P. Kacsuk, and M. Kozlovszky, ‘Evaluating the aver‐
age reproducibility cost of the scientific workflows’, in Intelligent Systems and
Informatics (SISY), 2016 IEEE 14th International Symposium on, 2016, pp.
79–84.
30. A. Bánáti, P. Kacsuk, and M. Kozlovszky, ‘Evaluating the reproducibility
cost of the scientific workflows’, in Applied Computational Intelligence and In‐
formatics (SACI), 2016 IEEE 11th International Symposium on, 2016, pp.
187–190.
31. A. Banati, P. Kacsuk, M. Kozlovszky, M. 'Four level provenance support to
achieve portable reproducibility of scientific workflows'. In Information and
Communication Technology, Electronics and Microelectronics (MIPRO), 2015
38th International Convention on IEEE. unpublishe
32. Garijo, Daniel, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Di‐
nov, Paul Thompson, and Arthur W. Toga. "Fragflow automated fragment de‐
tection in scientific workflows." In e-Science (e-Science), 2014 IEEE 10th In‐
ternational Conference on, vol. 1, pp. 281-289. IEEE, 2014.
33. T. Koohi-Var and M. Zahedi, ‘Linear merging reduction: A workflow dia‐
gram simplification method’, in Information and Knowledge Technology (IKT),
2016 Eighth International Conference on, 2016, pp. 105–110.
34. Callaghan, Sarah. "Joint declaration of data citation principles." (2014).
35. Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg,
Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg et al. "The
FAIR Guiding Principles for scientific data management and stewardship."
Scientific data 3 (2016).
36. https://linkedresearch.org/
37. P. Ayris, R. D. W. Group, and others, ‘LERU Roadmap for Research
Data’, 2013.
38. V. Stodden et al., ‘Enhancing reproducibility for computational methods’,
Science, vol. 354, no. 6317, pp. 1240–1241, 2016.
39. Roure DD, Belhajjame K, Missier P, Al E. 'Towards the preservation of
scientific workflows'. Proceedings of the 8th International Conference on
Preservation of Digital Objects (iPRES 2011), Singapore, 2011; 228–231.
40. Cohen-Boulakia S, Leser U. 'Search, adapt, and reuse: the future of
scientific workflows'. SIGMOD Record 2011; 40(2):6–16. DOI:
http://doi.acm.org/10.1145/2034863.2034865.
41. Alexander, Keith, Richard Cyganiak, Michael Hausenblas, and Jun Zhao.
"Describing Linked Datasets." In LDOW. 2009.
42. Ball, A. (2014). ‘How to License Research Data’. DCC How-to Guides. Ed‐
inburgh: Digital Curation Centre. (2011)
43. McCrae, John P., Penny Labropoulou, Jorge Gracia, Marta Villegas, Víc‐
tor Rodríguez-Doncel, and Philipp Cimiano. "One ontology to bind them all:
The META-SHARE OWL ontology for the interoperability of linguistic
datasets on the Web." In European Semantic Web Conference, pp. 271-282.
Springer International Publishing, 2015.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Woodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hiden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Watson</surname>
          </string-name>
          , '
          <article-title>Provenance and data differencing for workflow reproducibility analysis: PROVENANCE AND DATA DIFFERENCING FOR REPRODUCIBILITY'</article-title>
          ,
          <source>Concurrency and Computation: Practice and Experience</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>995</fpage>
          -
          <lpage>1015</lpage>
          , Mar.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gil</surname>
          </string-name>
          , '
          <article-title>Intelligent workflow systems and provenance-aware software'</article-title>
          ,
          <source>in Proceedings of the Seventh International Congress on Environmental Modeling and Software</source>
          , San Diego, CA,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garrido</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Santander-Vela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sánchez-Expósito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Verdes-Montenegro</surname>
          </string-name>
          , ''AstroTaverna:
          <article-title>Building workflows with Virtual Observatory services'</article-title>
          ,
          <source>Astron. Comput.</source>
          ,
          <fpage>7</fpage>
          -
          <lpage>8</lpage>
          (
          <year>2014</year>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>11</lpage>
          Special Issue on The Virtual Observatory: I
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>I.D.</given-names>
            <surname>Dinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.D.V.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.M. Lozev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Magsipoc</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Petrosyan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>MacKenzie-Graham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Eggert</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          <string-name>
            <surname>Parker</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          <string-name>
            <surname>Toga</surname>
          </string-name>
          '
          <article-title>Efficient, distributed and interactive neuroimaging data analysis using the LONI Pipeline'</article-title>
          , Frontiers in Neuroinformatics, Volume
          <volume>3</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fellows</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Withers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Owen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soiland-Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dunlop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bhagat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bacall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hardisty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.N.</given-names>
            <surname>de la Hidalga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.P.B.</given-names>
            <surname>Vargas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sufi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          , '
          <article-title>The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud</article-title>
          ',
          <source>Nucleic Acids Res</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gamble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hettne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Palma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Klyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          , '
          <article-title>Using a suite of ontologies for preserving workflow-centric Research Objects</article-title>
          ',
          <source>Web Semant. Sci. Serv. Agents World Wide Web</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cuevas-Vicenttín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ludäscher</surname>
          </string-name>
          , '
          <article-title>DPROV: Extending the PROV provenance model with workflow structure</article-title>
          ',
          <source>Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance</source>
          , TaPP'13,
          <publisher-name>USENIX Association</publisher-name>
          ,
          <publisher-loc>Berkeley, CA, USA</publisher-loc>
          (
          <year>2013</year>
          ), pp.
          <fpage>9:1</fpage>
          -
          <lpage>9:7</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Víctor</given-names>
            <surname>Cuevas-Vicenttín</surname>
          </string-name>
          , Parisa Kianmajd, Bertram Ludäscher, Paolo Missier, Fernando Chirigati, Yaxing Wei, David Koop, Saumen Dey, '
          <article-title>The PBase scientific workflow provenance repository</article-title>
          ',
          <source>Int. J. Digit. Curation</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ) (
          <year>2014</year>
          ), pp.
          <fpage>28</fpage>
          -
          <lpage>38</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Khalid</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          , Jun Zhao, Daniel Garijo, Aleix Garrido,
          <string-name>
            <given-names>Stian</given-names>
            <surname>Soiland-Reyes</surname>
          </string-name>
          , Pinar Alper, Oscar Corcho, '
          <article-title>A workflow PROV-corpus based on Taverna and WINGS</article-title>
          ',
          <source>in: Proceedings of the Joint EDBT/ICDT 2013 Workshops</source>
          , Genova, Italy,
          <year>2013</year>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>332</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gil</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          , '
          <article-title>Abstract, link, publish, exploit: An end to end framework for workflow sharing</article-title>
          ',
          <source>Future Generation Computer Systems</source>
          , Jan.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>I.D.</given-names>
            <surname>Dinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Van Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.M.</given-names>
            <surname>Lozev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Magsipoc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Petrosyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>MacKenzie-Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eggert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.S.</given-names>
            <surname>Parker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.W.</given-names>
            <surname>Toga</surname>
          </string-name>
          , '
          <article-title>Efficient, distributed and interactive neuroimaging data analysis using the LONI Pipeline</article-title>
          ',
          <source>Frontiers in Neuroinformatics</source>
          , Volume
          <volume>3</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Goecks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nekrutenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , '
          <article-title>Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences</article-title>
          ',
          <source>Genome Biol.</source>
          ,
          <volume>11</volume>
          (
          <issue>8</issue>
          ) (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fellows</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Withers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Owen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soiland-Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dunlop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bhagat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bacall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hardisty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.N.</given-names>
            <surname>de la Hidalga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.P.B.</given-names>
            <surname>Vargas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sufi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goble</surname>
          </string-name>
          , '
          <article-title>The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud</article-title>
          ',
          <source>Nucleic Acids Res</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>F.</given-names>
            <surname>Chirigati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koop</surname>
          </string-name>
          , C. Silva, '
          <article-title>VisTrails provenance traces for benchmarking</article-title>
          ',
          <source>in: Proceedings of the Joint EDBT/ICDT 2013 Workshops</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ratnakar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.A.</given-names>
            <surname>Gonzalez-Calero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.T.</given-names>
            <surname>Groth</surname>
          </string-name>
          , J. Moody, E. Deelman, '
          <article-title>WINGS: Intelligent workflow-based design of computational experiments</article-title>
          ',
          <source>IEEE Intell. Syst.</source>
          ,
          <volume>26</volume>
          (
          <issue>1</issue>
          ) (
          <year>2011</year>
          ), pp.
          <fpage>62</fpage>
          -
          <lpage>72</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Harmassi</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grigori</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          '
          <article-title>Mining Workflow Repositories for Improving Fragments Reuse</article-title>
          '. In:
          <string-name>
            <surname>Cardoso</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guerra</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Houben</surname>
            <given-names>GJ.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinto</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velegrakis</surname>
            <given-names>Y</given-names>
          </string-name>
          . (eds)
          <source>Semantic Keyword-based Search on Structured Data Sources. Lecture Notes in Computer Science</source>
          , vol
          <volume>9398</volume>
          . Springer, Cham
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Vitek</surname>
            ,
            <given-names>Jan</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Kalibera</surname>
          </string-name>
          .
          <article-title>"R3: Repeatability, reproducibility and</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>