Data Bridge: solving diverse Data Access in Scientific Applications. CEUR Workshop Proceedings Vol-993, paper 13. https://ceur-ws.org/Vol-993/paper13.pdf
Data Bridge: solving diverse data access in scientific applications

Zoltán Farkas, Péter Kacsuk, Ákos Balasko and Krisztián Karóczkai
MTA SZTAKI
Hungary, 1518 Budapest, Pf. 63.
Email: zoltan.farkas@sztaki.mta.hu

Marc Santcroos and Silvia Olabarriaga
AMC
The Netherlands, 1100 DE Amsterdam, PO Box 22700
Email: m.a.santcroos@amc.uva.nl


Abstract—The nature of data for scientific computation is very diverse in the age of big data. First, it may be available at a number of locations, e.g. the scientist's machine, some institutional filesystem, a remote service, or some sort of database. Second, the size of the data may vary from a few kilobytes to many terabytes. In order to be available for computation, data has to be transferred to the location where the computation takes place. This requires a diverse set of middleware tools that are compatible both with the data and the compute resources. However, using these tools requires additional knowledge and makes running the experiments an inconvenient task. In this paper we present the Data Bridge, a high-level service that can be used easily in scientific computations to perform data transfer to and from a diverse set of storage services. The Data Bridge not only unifies access to different types of storage services, but it can also be used at different levels (e.g., single jobs, parameter sweeps, scientific workflows) in scientific computations.

I. INTRODUCTION

There are many different distributed computing infrastructures (DCIs) that users would like to access from scientific workflows and science gateways. In order to hide the different APIs needed to access these very different DCIs, such as grids, clouds, clusters, desktop grids, and supercomputers, we have developed the DCI Bridge service [1] and connected it to the WS-PGRADE/gUSE science gateway framework. As a result, application-oriented science gateways, which were developed by the customization of the WS-PGRADE/gUSE framework, can currently access all these types of DCIs transparently from the nodes of WS-PGRADE workflows. As a consequence, workflows can be ported between DCIs by simply re-configuring them for another infrastructure.

Many of these scientific applications manipulate large amounts of data of different types, for example medical images or DNA sequences. The data is stored on different types of storage systems, like a local file system, a distributed or shared file system, some sort of service catalog, or a database system. Access to these data might be difficult, as detailed knowledge and specific tools are needed to fetch or upload the data. Although easy-to-use (web) interfaces might be available for the individual systems, using them in combination is required in complex processing systems such as scientific workflows. In order to provide access to a diverse set of storage resources, the system needs to be aware of, and provide support for, data access using the different APIs of each storage service.

Our approach for solving this problem is very similar to the concept of the successful DCI Bridge. In the scope of the SCI-BUS project [2], we designed a new service called the Data Bridge service, which provides a unified interface for accessing different storage services, e.g., HTTP, FTP, GridFTP [3], SRM [4] and Amazon S3 [5]. In the current paper we describe the main concepts and features of this new service.

The organization of the paper is as follows: we first present an overview of related work, and then we gather the requirements towards the system. Afterwards we present the Data Bridge in detail and show how a complex scenario can be supported. Finally, we discuss our preliminary results, and present our current and future work.

II. RELATED WORK

OGSA-DAI [6] provides a web service for accessing different data resources such as relational or XML databases, files or web services. The data can not only be queried and updated, but modified and transformed as well. The OGSA-DAI service was used in many different projects.

The Storage Resource Broker (SRB) [7] offers a middleware that provides clients uniform access to a diverse set of storage resources. Two types of "drivers" are available: file-type drivers (for example UNIX filesystems) and DB-type drivers (for example IBM DB2 and Oracle databases).

The integrated Rule-Oriented Data-management System (iRODS) [8] is a scalable grid software solution for managing files on the order of hundreds of millions, with a total size on the order of petabytes. It is capable of making use of a number of authentication mechanisms, supports a wide range of physical storage systems, and has support for metadata attributes.

jSAGA [9], the Java implementation of the Simple API for Grid Applications (SAGA) Open Grid Forum (OGF) specification [10], provides a Java API to access different grid services, including storage services. jSAGA provides an easily extensible platform for accessing FTP, GridFTP, iRODS, SRB, and LFC services.

Globus Online [11] offers a service for managing data available on GridFTP endpoints. It has a web interface, it is also usable through command-line tools, and it provides a REST API as well. Users of Globus Online can monitor their data transfers, which have automatic error recovery.

The presented tools and services offer varying features, but because of their different scopes, all of them have their weaknesses outside of their intended purposes: for example, they support only a limited variety of storage services (OGSA-DAI, SRB, Globus Online), do not provide an API (iRODS), are not available as a service (jSAGA), or are based on a too heavy software stack (OGSA-DAI, SRB). Our aim with the Data Bridge is to combine the advantages of these tools in a lightweight service.
III. REQUIREMENTS TOWARDS THE SYSTEM

In this section we present some use cases for the Data Bridge service. The three main considerations are the following: an easy-to-use user interface to browse and manage data stored on a storage resource; transfer of data between different types of storage resources; and finally, data handling from compute resources.

Fig. 1. Storage browsing and data upload
Figure 1 covers the high-level needs of users concerning ease of use: a convenient user interface is necessary for end-users so that they can browse, download and upload data from and to the storage resource. Moreover, users should be enabled to select data for running experiments on the DCI. In the case of web portals, such interfaces are typically implemented as portlets, thus the user interface can be built as a viewing portlet (Storage View) that uses functions exposed by a Data Bridge service to access the various storage resources. The same interface can also be used to upload and download data from and to the user's own machine to and from the selected storage. The advantage of using the Data Bridge service clearly appears if storage resources of multiple types have to be accessed: instead of interfacing with multiple storage APIs from the storage browsing graphical interface, the developer of the Storage View component needs to access only the Data Bridge service.

Fig. 2. Transfer between different storage types
Figure 2 covers the case where a large amount of data needs to be transferred from a given storage resource to another storage service, considering the generic case of storage services located in different DCIs. This is particularly necessary in case the experiment, or even a single job, runs on an infrastructure that is different from the one where the data is stored. In such a case the data needs to be transferred from the original storage location to the storage location of the compute resource to ensure a high data transfer rate during data processing time. In Figure 2, the user is requesting the Data Bridge to perform the copy operation, but this can also be initiated programmatically, for example by the workflow management system. The Data Bridge has access to both Storage 1 and Storage 2, and performs the copy. However, if the Data Bridge had no access to Storage 2, it could make use of another Data Bridge in a hierarchical organization.
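To illustrate the hierarchical organization mentioned above, the following Python sketch models the decision a Data Bridge could make between performing a copy itself and delegating it to a peer bridge. The class and method names are our own illustration and do not appear in the actual service.

```python
class DataBridge:
    """Illustrative model of the copy/delegation decision (not the real service API)."""

    def __init__(self, name, reachable_storages, peers=()):
        self.name = name
        self.reachable = set(reachable_storages)  # storages this bridge has adaptors for
        self.peers = list(peers)                  # other Data Bridges it may delegate to

    def copy(self, source, destination):
        """Copy data between storages, delegating to a peer Data Bridge
        when the destination storage is out of this bridge's reach."""
        if source not in self.reachable:
            raise ValueError("%s cannot read %s" % (self.name, source))
        if destination in self.reachable:
            return "%s copied %s -> %s" % (self.name, source, destination)
        for peer in self.peers:
            if destination in peer.reachable:
                # stream the data to the peer, which performs the final transfer
                return "%s streamed %s to %s; %s" % (
                    self.name, source, peer.name, peer.store(destination))
        raise ValueError("no configured bridge can reach " + destination)

    def store(self, destination):
        """Accept streamed data and write it to a storage this bridge can reach."""
        return "%s stored data on %s" % (self.name, destination)
```

In this toy model, a master bridge that can read Storage 1 but not reach Storage 2 hands the transfer to a slave bridge that can, mirroring the scenario of Figure 2.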
The third use case covers the needs of handling data in jobs: once a job has been started on a compute node, the job may need to access data on a storage service for input or output. The Data Bridge can be used in this case as a unified interface to get or put the data. As shown in Figure 3, a wrapper script fetches input files for the job before the job is actually started, and uploads the output files of the job to the target storage after the job has been executed. Such wrapping methods can be used to make legacy applications storage-aware without the need to modify the application itself. As a consequence, using the Data Bridge with this wrapped execution offers a convenient way to provide access to a diverse set of storage resources for any kind of application.

Fig. 3. Using the Data Bridge from a Compute node

IV. DATA BRIDGE

In this section we present the Data Bridge design in detail. First we show the high-level architecture, and then we present the different components in more detail.
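The wrapper-based execution described in the third use case can be sketched as follows. This is a minimal illustration in Python rather than the actual wrapper generated by the system (the real wrappers use HTTP tools such as wget or cURL); the `fetch` and `store` callables stand in for HTTP GET and POST requests against Data Bridge temporary URLs, which are invented here for illustration.

```python
def run_wrapped(executable, inputs, outputs, fetch, store):
    """Pre-process: fetch inputs; execute; post-process: upload outputs.

    `inputs` maps local file names to (hypothetical) temporary download
    URLs and `outputs` maps local file names to temporary upload URLs;
    `fetch(url)` and `store(url, data)` stand in for HTTP GET and POST.
    """
    workspace = {}
    for name, url in inputs.items():        # pre-processing phase
        workspace[name] = fetch(url)
    results = executable(workspace)         # execution phase (legacy app untouched)
    for name, url in outputs.items():       # post-processing phase
        store(url, results[name])
    return results
```

The legacy application (`executable`) only ever sees local files, which is the point of the wrapping approach: storage awareness lives entirely in the wrapper.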
Fig. 4. Data transfer redirection

Fig. 5. Data Bridge architecture


The Data Bridge itself is implemented as a web application exposing a number of operations for managing the data located on different storage systems. For the client, two main interfaces are available: a web service interface for initiating operations, and a servlet interface for performing any necessary data up- and download.

Whenever possible, the Data Bridge makes use of data redirection to minimize its network traffic: if the destination storage is able to accept data through the plain HTTP protocol (GET, PUT or POST methods), file up- and download requests sent to the Data Bridge are redirected to the storage. Thus, the data is not transferred through the Data Bridge, but directly between the client and the storage service. The outline of this redirection in the case of a client initiating an upload to an Amazon S3 storage is shown in Figure 4: first the client connects to the Data Bridge to initiate the data upload. Next, the Data Bridge responds with a redirection to the URL where the targeted Amazon S3 storage can accept the data, and finally the client sends the data directly to the S3 storage.

Although we originally considered including support for metadata attributes of databases in the Data Bridge, the first version is limited to flat files on storage services that organize their data in a hierarchical structure, such as files and directories. In the rest of this paper we refer to the actual data as file, and to the location of the data as directory.

A. High-level architecture

Figure 5 shows the high-level architecture of the Data Bridge. Five main components or component groups can be distinguished: the Public interface and the HTTP Servlet accept requests from external components; the Adaptor Manager arranges the execution of the requested operations; the Adaptor interface, along with the different Adaptors, communicates with the different storage systems supported by the Data Bridge; and the Temporary URL queue temporarily stores incoming data put and get requests. Depending on the incoming requests, the different Threads perform the requested operation by making use of the relevant adaptor through the generic Adaptor interface. We describe the different interfaces and components in detail below.

B. Interfaces

There are two interfaces in the Data Bridge: the Public interface and the Adaptor interface. These interfaces expose the same set of operations, but through different technologies: the Public interface offers a Web service, whereas the Adaptor interface describes a Java API that the different adaptors should implement.

The following operations are supported by the interfaces: list, mkdir, delete, get, put, and copy. The operations use URIs as their arguments where applicable. The URI is an abstract Java object that all the adaptors should extend in order to contain all the necessary information to reference and handle the data to be managed on that particular type of storage service. For example, in the case of an HTTP adaptor the URI object should contain a URL, and in the case of a GridFTP adaptor the URI object should contain a GridFTP URL and all the necessary credentials (an X.509 proxy certificate) to access the data. A URI may represent either a file or a directory. All the operations return either a result or an error message depending on the success of the operation.

The list operation can be used to list the contents of a directory entry, represented as a URI, and returns a list of URIs found under the given entry. If the URI argument of the operation references a file, the operation returns a single entry with the URI of that file. Otherwise, the list of entries found in the given directory is returned.

The mkdir operation can be used to create a directory entry. The URI should represent a non-existing entry.

The delete operation can be used to recursively delete a directory or a file represented by the URI. That is, if the URI refers to a file, the file is removed. If the URI refers to a directory, the entire contents of the directory are removed. Upon termination, the function returns the list of URIs that were tried, either with a success or failure indicator (in the latter case, the appropriate error message is appended).

The get operation can be used to register a file download request. The only argument to the function is the URI, which should represent a file. The result of the operation is a temporary unique Data Bridge URL (using a UUID), that can be accessed by the HTTP GET method to get the data
belonging to the file referenced by the URI. So if one wants to fetch a file from a storage service using the Data Bridge, first the get operation should be called to get a temporary HTTP GET URL from the Data Bridge, and if the registration has been successful, HTTP GET can be used to actually fetch the data. Figure 5 shows how these temporary URLs are organized in the Data Bridge.
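As an illustration of this two-phase mechanism, the sketch below models the registration step in Python: phase one stores the URI under a fresh UUID and returns a temporary URL, and phase two resolves that URL back to the registered URI. The class name and the servlet base URL are hypothetical; the real servlet would then stream the data or redirect the client.

```python
import uuid

class TemporaryUrlQueue:
    """Toy model of get registration: URI in, temporary UUID-based URL out."""

    def __init__(self, servlet_base="http://bridge.example.org/servlet/"):
        self.servlet_base = servlet_base     # illustrative URL, not a real deployment
        self.entries = {}                    # temporary ID -> registered URI

    def register_get(self, uri):
        """Phase one: store the URI under a fresh UUID and hand back the
        temporary URL the client will later fetch with HTTP GET."""
        entry_id = str(uuid.uuid4())
        self.entries[entry_id] = uri
        return self.servlet_base + entry_id

    def resolve(self, temp_url):
        """Phase two: the servlet looks up which URI the temporary URL
        belongs to (it would then stream or redirect the data)."""
        entry_id = temp_url.rsplit("/", 1)[1]
        return self.entries.get(entry_id)    # None behaves like an HTTP 404
```

The actual data transfer is thereby decoupled from the web service call: any tool that can issue a plain HTTP GET can complete phase two.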

The detailed usage of the get operation is as follows: first the client uses the get operation of the Data Bridge's Public interface with the URI of the data to download. This results in putting the URI into the Temporary URL queue of the Data Bridge, with a unique ID. This ID, prefixed with the HTTP servlet's URL, is returned as the result of the get operation. Next, the client invokes an HTTP GET method on the Data Bridge's HTTP servlet by using the temporary URL to actually have the data downloaded: this makes the HTTP servlet look up the URI belonging to the temporary URL, and make use of the relevant adaptor to actually stream the data as a response to the HTTP GET method. If the data on the given storage can be fetched using HTTP GET, the HTTP servlet, instead of streaming the data through the Data Bridge, redirects the client to the HTTP URL where the data can be fetched from the storage directly. Thus, whenever possible, the data is streamed directly from its source, and not through the Data Bridge.

We have decided to implement the get operation following this two-phase approach due to the way data is accessed from the jobs of scientific workflows: before the jobs of the workflow are submitted, a component (for example the workflow system or the job submission component) can register the file up- and download requests with the Data Bridge. Once this is done, the component can submit the actual jobs encapsulated in simple wrappers that are capable of fetching and uploading the data using simple HTTP methods (for example, wget or cURL can be used to perform these tasks). Thus, the somewhat complex operation of invoking the web service interface is detached from the actual data transfer, so this latter task can be performed really simply.

The put operation is very similar to the get operation. The only difference is that in order to actually have the data uploaded, the client has to invoke an HTTP POST method on the temporary URL returned by the put operation of the Public interface, and stream the data to be uploaded to the given temporary URL, or to the redirected storage URL if HTTP PUT is supported by the storage.

The get and put operations in a simple scenario are shown in Figure 6. In this case, before the submission of a job that uses a data storage, the workflow system's data management component registers the job's down- and upload requests through the Data Bridge's Public Interface. After this is done, the temporary URLs associated with the input and output files are sent to the job submission component, which is responsible for preparing an appropriate wrapper that is capable of fetching input files before running the actual executable, and uploading the produced output files after the actual executable has terminated. The wrapper can be really light-weight, as it simply has to use HTTP GET and POST methods to fetch and upload input and output files, respectively.

Fig. 6. Two-phase get and put operations

The Temporary URL queue is periodically invalidated by the Data Bridge: entries older than specified in a configuration file are removed, resulting in an HTTP 404 (not found) status code.

Finally, the copy operation can be used to copy data from one location to another. It accepts three arguments: the source URI, the destination URI, and the optional URL of a destination Data Bridge service. If the optional URL of the destination Data Bridge service is not specified, the Data Bridge service serving the copy operation should be capable of handling both the source and destination URIs, that is, it should have all the relevant adaptors enabled. If the optional URL of the destination Data Bridge service is specified, the Data Bridge service serving the copy operation will make use of its Data Bridge adaptor to issue a put operation to the destination Data Bridge service using the destination URI and the data served by the source URI's adaptor to perform the copy operation. This way, Data Bridge services can be used in a master-slave scenario, where a master Data Bridge can use slaves to perform the actual transfers to the selected target storage. Of course, third-party transfer should be used whenever possible to minimize the amount of data transferred through the Data Bridge service. For example, the GridFTP protocol enables third-party transfers.

C. Components

The remaining components of the Data Bridge that haven't been explained yet are the Adaptor Manager, the Thread Pool, the Worker Threads, the Adaptors, the Temporary URL queue and the HTTP Servlet.

The task of the Adaptors is to implement the Adaptor interface for a given type of storage service; they are responsible for actually communicating with storage services of the given type. In order to minimize the necessary implementation work, we have chosen to use jSAGA wherever possible to implement the different Adaptors. The advantage of using jSAGA is that it already provides a unified API to access files located on different types of storage, like FTP, GridFTP or SRM. It is important to note that the Adaptor Interface is not the same as the interface offered by jSAGA, thus in order to implement support for a storage type not handled by jSAGA, one does not have to implement that support in jSAGA itself.

The Worker Threads are responsible for performing the
operations requested through the Public interface with the different Adaptors through the Adaptor interface. In fact, the web service framework used (JAX-WS) starts different threads for the different web service requests, thus the Worker Threads simply start processing the requested operation once a call to the web service interface comes in. It follows from this that the Thread Pool is simply a pool for the Worker Threads, also managed by the web service framework. The Adaptor Manager is responsible for managing the execution of requested Public interface operations through the Worker Threads.
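To illustrate how the Adaptor interface decouples the service from concrete storage types, the following Python sketch mimics its structure. The real interface is a Java API whose exact signatures are not spelled out in this paper, so the method names here merely follow the operation names of Section IV (copy is omitted for brevity), and the in-memory "storage" is purely hypothetical, standing in for e.g. an FTP or GridFTP adaptor.

```python
from abc import ABC, abstractmethod

class Adaptor(ABC):
    """Illustrative stand-in for the Java Adaptor interface."""

    @abstractmethod
    def list(self, uri): ...
    @abstractmethod
    def mkdir(self, uri): ...
    @abstractmethod
    def delete(self, uri): ...
    @abstractmethod
    def get(self, uri): ...
    @abstractmethod
    def put(self, uri, data): ...

class InMemoryAdaptor(Adaptor):
    """A toy 'storage service' backed by a dict."""

    def __init__(self):
        self.files = {}

    def list(self, uri):
        # a file URI yields a single entry (itself); a directory URI yields its entries
        if uri in self.files:
            return [uri]
        prefix = uri.rstrip("/") + "/"
        return [u for u in self.files if u.startswith(prefix)]

    def mkdir(self, uri): pass              # directories are implicit in this toy model
    def delete(self, uri): self.files.pop(uri, None)
    def get(self, uri): return self.files[uri]
    def put(self, uri, data): self.files[uri] = data
```

Supporting a new storage type then amounts to supplying one more such class, without touching the rest of the service (or, in the real system, jSAGA itself).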
Finally, the Temporary URL queue and the HTTP Servlet are responsible for actually serving and storing data belonging to previously registered get and put requests, as described earlier. That is, if a get or put method is requested through the Public Interface, the affected URIs are registered in the Temporary URL queue, and the temporary URL is returned as the response to the get or put methods. As described earlier, after this clients can invoke the HTTP Servlet through simple HTTP GET and POST methods to actually down- or upload the data, using the temporary URLs registered earlier.

Fig. 7. Scientific workflow execution scenario
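The periodic invalidation of the Temporary URL queue mentioned earlier can be modelled as follows. The timeout value and the injected clock are illustrative choices of ours; the paper only specifies that entries older than a configured age are removed and subsequently answered with an HTTP 404 status.

```python
class ExpiringQueue:
    """Toy model of the Temporary URL queue's periodic invalidation."""

    def __init__(self, max_age_seconds, clock):
        self.max_age = max_age_seconds   # would come from the configuration file
        self.clock = clock               # injected for testability
        self.entries = {}                # temporary ID -> (URI, registration time)

    def register(self, entry_id, uri):
        self.entries[entry_id] = (uri, self.clock())

    def invalidate_old_entries(self):
        """Drop entries older than the configured maximum age."""
        now = self.clock()
        self.entries = {eid: (uri, ts) for eid, (uri, ts) in self.entries.items()
                        if now - ts <= self.max_age}

    def serve(self, entry_id):
        """Return the registered URI, or an HTTP-style 404 if expired/unknown."""
        self.invalidate_old_entries()
        if entry_id not in self.entries:
            return 404, None
        return 200, self.entries[entry_id][0]
```

A client that waits too long between the registration phase and the transfer phase therefore simply receives a 404 and has to register again.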
                                                                    configuration, and the workflow is submitted. At this point, the
                                                                    WFI (WorkFlow Interpreter) is responsible for arranging the
V. SUPPORTING SCIENTIFIC WORKFLOWS WITH THE DATA BRIDGE

In this section we describe a complex usage scenario of the Data Bridge, in which end users of a science gateway run experiments on a local PBS cluster that processes data residing on an Amazon S3 storage. Figure 7 shows the outline of the infrastructure. As shown in the figure, WS-PGRADE/gUSE is used as the science gateway framework, where users access two user interfaces in the form of portlets: the Storage Browsing portlet for browsing the storage and selecting input and output data for the scientific experiment, and the Workflow management portlet, which can be used to parametrize and run the workflows belonging to the experiments. As shown in the figure, the Storage Browsing portlet communicates with the Data Bridge to expose the browsing and data selection functionality. Once the workflow has been configured and submitted, its nodes are processed by the WFI (WorkFlow Interpreter) component of gUSE. This component is responsible for scheduling the nodes of the workflow for execution. Once a job is about to be submitted, it is sent to the DCI Bridge job submission service, which uses the Data Bridge to register the jobs' data down- and upload requests. In

interface and HTTP servlet to browse and manage data stored on the Amazon S3 storage. Once the input data for the experiment is selected, its location is saved to the workflow's configuration, and the workflow is submitted. At this point, the WFI (WorkFlow Interpreter) is responsible for arranging the workflow's execution in the form of jobs.

Once a job is about to be submitted, the DCI Bridge checks if the job is using any input or output files that can be managed only with the help of the Data Bridge (for example, because they reside on the Amazon S3 storage). For such files, the DCI Bridge makes use of the Data Bridge's get and put operations to get the temporary URLs that can be used by simple tools (like wget or cURL) to download and upload the data. That is, the DCI Bridge doesn't do any data transfer, it simply invokes the get and put operations to have the temporary URLs registered. For all these temporary URLs, appropriate handling sections are created in the job's wrapper script, which will actually perform the input files' download and the output files' upload from and to the Amazon S3 storage through the Data Bridge's HTTP servlet on the Worker node. Finally, the job is submitted to the cluster.

Once a submitted job starts on a Worker node, the wrapper script created by the DCI Bridge will fetch any input files from the Amazon S3 storage by using the appropriate temporary URLs (if the data is directly available from the S3 service, the Data Bridge's HTTP servlet will redirect the client to S3 to
this scenario, the DCI Bridge is running wrappers to handle the     get the data directly from there). Next, as all the input files for
data, with an optional redirection to the storage to minimize the   the job are available locally, the real executable is started. And
necessary amount of network traffic through the Data Bridge.        finally, once the real executable has finished, any output files
                                                                    destinated to the S3 storage are uploaded using the appropriate
    The steps for running an experiment from the user’s point       temporary URLs.
of view are as follows: uploading input data to the storage
(Amazon S3) or search for it, configuring the experiment’s              Thus, as it can be seen, data is handled in two phases:
workflow to use the selected data, run the workflow, and upon       the DCI Bridge only registers data put and get requests to
termination, get the results. For this, the user simply has to      have temporary URLs available for data up- and download),
use the Storage Browsing portlet for data management and            and the actual data transfer always happens where the job is
selection, and the Workflow management portlet for workflow         actually running, always making directly use of the storage
configuration and experiment execution.                             service whenever possible.
    What is hidden from the user in this case is the complex            Finally, if the workflow has been processed with some
interaction of the other components presented in Figure 7           result, the user may use once again the Storage Browsing
that arrange the execution. The Storage Browsing component          portlet to check the produced results, given that the user has
in the background makes use of the Data Bridge’s Public             configured the workflow to store the results on the S3 storage.
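The temporary-URL mechanism underlying this two-phase handling can be sketched as follows. This is a minimal, illustrative model only: the class and method names (`TemporaryUrlQueue`, `register`, `resolve`) and the servlet base URL are our own inventions and are not part of the Data Bridge API.

```python
import uuid

# Hypothetical sketch of the Data Bridge's temporary URL registration.
# Phase 1: get/put requests are registered and temporary URLs are
# returned; no data is transferred yet.
class TemporaryUrlQueue:
    def __init__(self, servlet_base="http://databridge.example.org/servlet"):
        self.servlet_base = servlet_base
        self.pending = {}  # token -> (operation, storage URI)

    def register(self, operation, storage_uri):
        """Register a 'get' or 'put' request; return a temporary URL."""
        token = uuid.uuid4().hex
        self.pending[token] = (operation, storage_uri)
        return "%s?id=%s" % (self.servlet_base, token)

    def resolve(self, token):
        """Phase 2: the HTTP servlet looks up the token when the wrapper
        script on the Worker node actually performs the transfer."""
        return self.pending.pop(token)  # one-shot, volatile entry

# Example: register a download request for an S3 object.
queue = TemporaryUrlQueue()
temp_url = queue.register("get", "s3://bucket/input.dat")
token = temp_url.split("id=")[1]
assert queue.resolve(token) == ("get", "s3://bucket/input.dat")
```

In phase 2, the generated wrapper script on the Worker node would then simply run a plain HTTP client against the temporary URL (e.g. `wget "$TEMP_URL"`) before starting the real executable, which is exactly why no storage-specific tooling is needed on the compute resource.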
                     VI.   CONCLUSIONS

    In this paper we have presented a service-oriented approach, in the form of the Data Bridge, that provides a unified, easy-to-use way to manage data stored on different types of storage services. We have presented a number of data management scenarios (data browsing, data fetching/uploading, and data migration) and, based on the needs of these scenarios, designed the architecture of the Data Bridge to satisfy them.

    The Data Bridge operates as a stand-alone web service that is able to perform basic operations on storages, namely listing, creating, removing, downloading, uploading and copying data files. All these operations are available as simple web service operations, complemented by a very simple servlet in the case of the upload and download operations. The detached composition of the web service (for initiating operations) and the HTTP servlet (for performing data upload and download) enables easy usage of the service from scientific workflows, where data transfer is initiated by the workflow management system, while the actual data transfer happens where the job is actually running, with the help of simple HTTP clients (like wget or cURL).

    We have presented the architecture of the Data Bridge in detail: we have shown all the interfaces and highlighted the internal components as well. The different interfaces (the Public Interface and the Adaptor Interface) operate at different levels: the Public Interface is exposed to the Data Bridge's clients as a web service, whereas the Adaptor Interface is a Java interface that the different Adaptors, responsible for actually communicating with the storages of different types, should implement. This organization offers a pluggable architecture, where implementing new storage Adaptors becomes a relatively easy task. As already mentioned, the different Adaptors should interface with the different storage resources, and here we make use of the unified jSAGA API where possible to minimize the necessary development effort. Finally, the Temporary URL queue serves as volatile storage for registered file upload and download requests, and thus helps to decouple the actual data transfer from the invocation of the web service calls.

    We have also presented a complex scenario that covers two of the cases presented in the Requirements section. This complex scenario is about configuring and running scientific experiments in the form of WS-PGRADE/gUSE science gateway framework workflows that operate on data available in a cloud storage (Amazon S3). In this scenario users implicitly make use of the Data Bridge service: the Storage Browsing portlet is used to browse, upload, and download the data; the Workflow management portlet is used to specify which data the experiment should be run on; and finally, the DCI Bridge and its job wrappers make use of the Data Bridge service to actually handle the data (downloading the data before the job's real executable is started, and uploading the produced data after the job's real executable has finished).

    The complex scenario presented the usage of the Data Bridge from WS-PGRADE/gUSE, but the Data Bridge is not tied directly to any workflow system, meaning it can be used as a standalone service and can easily be integrated into an existing workflow system or user interface to satisfy users' needs.

    Although the Data Bridge is a usable service in its current form, the restriction to supporting storage services that organize data into a hierarchical structure (that is, files in some sort of directory structure) is rather strict. It follows from this restriction that, for example, metadata services or databases are currently not supported by the Data Bridge. Fortunately, the abstract URI data reference object of the Data Bridge can later be extended easily to support such data services as well.

                     ACKNOWLEDGMENT

    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements no 283481 (SCI-BUS) and 312579 (ER-Flow).

                       REFERENCES
[1]  P. Kacsuk, Z. Farkas, M. Kozlovszky, G. Hermann, A. Balasko, K. Karoczkai, I. Marton: WS-PGRADE/gUSE Generic DCI Gateway Framework for a Large Variety of User Communities, Journal of Grid Computing, vol. 10, no. 4, 2012
[2]  SCI-BUS Project, https://www.sci-bus.eu/
[3]  W. Allcock, J. Bresnahan, R. Kettimuthu, M. Link, C. Dumitrescu, I. Raicu, I. Foster: The Globus Striped GridFTP Framework and Server, Proceedings of Supercomputing 2005 (SC05), November 2005
[4]  A. Shoshani, A. Sim, J. Gu: Storage Resource Managers: Middleware Components for Grid Storage, Tenth Goddard Conference on Mass Storage Systems and Technologies, 2002, p. 209
[5]  Amazon Simple Storage Service (Amazon S3), http://aws.amazon.com/s3/
[6]  C. Karasavvas, M. Antonioletti, M. Atkinson, N. Chue Hong, T. Sugden, A. Hume, M. Jackson, A. Krause, C. Palansuriya: Introduction to OGSA-DAI Services, Scientific Applications of Grid Computing, Lecture Notes in Computer Science, vol. 3458, 2005, pp. 1-12
[7]  C. Baru, R. Moore, A. Rajasekar, M. Wan: The SDSC Storage Resource Broker, Proceedings of the 1998 Conference of the Centre for Advanced Studies on Collaborative Research (CASCON '98), p. 5, 1998
[8]  A. Rajasekar, R. Moore, C.-Y. Hou, C. A. Lee, R. Marciano, A. de Torcy, M. Wan, W. Schroeder, S.-Y. Chen, L. Gilbert, P. Tooby, B. Zhu: iRODS Primer: Integrated Rule-Oriented Data System, Synthesis Lectures on Information Concepts, Retrieval, and Services, vol. 2, no. 1, 2010, pp. 1-143
[9]  S. Reynaud: Uniform Access to Heterogeneous Grid Infrastructures with JSAGA, Production Grids in Asia, pp. 185-196, 2010
[10] A Simple API for Grid Applications (SAGA), http://www.ggf.org/documents/GFD.90.pdf
[11] Globus Online, https://www.globusonline.org/