=Paper= {{Paper |id=None |storemode=property |title=A Data-Centric Science Gateway for Computational Neuroscience |pdfUrl=https://ceur-ws.org/Vol-993/paper12.pdf |volume=Vol-993 |dblpUrl=https://dblp.org/rec/conf/iwsg/ShahandBHJSMGCKO13 }} ==A Data-Centric Science Gateway for Computational Neuroscience== https://ceur-ws.org/Vol-993/paper12.pdf
                             A Data-Centric Science Gateway
                             for Computational Neuroscience

                        Shayan Shahand∗‡ , Ammar Benabdelkader∗ , Jordi Huguet† , Mahdi Jaghouri∗
                                Mark Santcroos∗ , Mostapha al Mourabit∗ , Paul F.C. Groot†
                        Matthan W.A. Caan† , Antoine H.C. van Kampen∗ and Sílvia D. Olabarriaga∗‡
                   ∗ Bioinformatics Laboratory, Dept. of Clinical Epidemiology Biostatistics and Bioinformatics
                                                  † Brain Imaging Center, Dept. of Radiology

                                 Academic Medical Center, University of Amsterdam, The Netherlands


    Abstract—Science gateways provide user interfaces and high-          in the European Grid Infrastructure (EGI) Science Gateway
level services to access and manage applications and data collec-        Primer [3], where issues involved in SG design, implementa-
tions on distributed resources. They facilitate users to perform         tion and operation are presented and discussed. According to
data analysis on distributed computing infrastructures (DCIs)            this Primer, SGs are desktop or web-based interfaces to a set of
without getting involved in the technical details. The e-BioInfra        applications and data collections. SGs comprise front-end and
Gateway is a science gateway for biomedical data analysis
on a national grid infrastructure, which has been successfully
                                                                         back-end components and offer services that facilitate access to
adopted for neuroscience research. Necessary improvements in             computing and storage resources, as well as services provided
this gateway motivated the design of the next generation of the          by Distributed Computing Infrastructures (DCIs). Moreover,
e-BioInfra Gateway. In this paper we describe the motivation,            SGs support collaboration between researchers through ex-
requirements and design of this new gateway, which is based on           change of ideas, tools and datasets. From the functional
the WS-PGRADE/gUSE SG framework, allowing for support for                perspective, SGs are SG frameworks or SG instances. SG
other types of DCIs. The new gateway has additional generic              frameworks implement generic functionalities such as security,
data and meta-data management facilities to access and manage            workflow and data management, and DCI access. Examples are
(biomedical) data servers, and to provide an integrated and data-        WS-PGRADE/gUSE [4] and DARE [5] frameworks. SG in-
centric user interaction. Its first prototype is implemented and         stances are community-specific science gateways, with tailored
deployed for the computational neuroscience research community
of the Academic Medical Center of University of Amsterdam.
                                                                         interfaces and services for a specific application domain. SG
                                                                         instances can be built using SG frameworks or with custom
    Keywords—science gateway (SG), e-science, virtual laboratory         software stacks. See Section II for examples of SG instances.
(VL), problem solving environment (PSE), computational neuro-
science, medical image analysis                                              The e-BioInfra Gateway [6] is an SG instance for large-scale
                                                                         biomedical data analysis on the Dutch e-Science Grid [7].
                        I.   INTRODUCTION                                It is designed to simplify usage of this infrastructure by
                                                                         biomedical researchers for the analysis of large datasets, and
    Science Gateways (SGs), also called Problem Solving En-              implemented based on a custom framework. It is deployed
vironments (PSEs) or Virtual Laboratories (VLs), support sci-            at the Academic Medical Center (AMC) of the University
entists in e-Science endeavours. De Roure et al. [1] described           of Amsterdam (UvA), The Netherlands. It lowers barriers for
the requirements of e-Science environments as a spectrum with            users by providing services such as community Grid certificate
two ends. One end is characterized by automation, virtual or-            and automatic file transport to and from the Grid resources.
ganizations of services, and the digital world, and the other end        Since its deployment in production (early 2011), researchers
is characterized by interaction, virtual organizations of people,        have successfully performed large computations on the Dutch
and the physical world. Orthogonal to these requirements at              Grid infrastructure via the e-BioInfra Gateway with minor help
both ends is the issue of scale, for example, of virtual organiza-       from the support team. For example, Peters et al. [8], Wingen et
tions, computation, storage, and the complexity of relationships         al. [9], Rienstra et al. [10], and de Kwaasteniet et al. [11]
between them. Increasing scale demands automation and, as                have already published results of neuroscience research based
highlighted by Hey and Trefethen [2], the computer scientists            on the data analysis performed via the e-BioInfra Gateway.
have the research challenge of creating high-level intelligent           The gateway structures the system information and allows
services that genuinely support e-Science applications. Such             for extensions with new data analysis methods. This enabled
services, e.g., SGs, should go beyond straightforward access             (external) developers to extend it with ten applications, six for
to computing resources, and also include support to construct            medical imaging, three for next generation sequencing data
and manage virtual organizations, as well as to manage the sci-          analysis, and one for mass-spectrometry modelling. The e-
entific data deluge in the scholarly cycle including hypothesis,         BioInfra Gateway currently has 29 active users, and the largest
experimentation, analysis, publication, research, and learning.          usage so far is by the researchers from the Brain Imaging
   A large number of communities are therefore facing the                Center of the AMC (BIC [12]).
challenge of building SGs. A recent collaboration resulted
                                                                            Although the current e-BioInfra Gateway can be considered
  ‡ Corresponding authors: {s.shahand | s.d.olabarriaga}@amc.uva.nl      a success story, our experience indicated the need for further
improvement with respect to the following aspects: a) support            management services. These generic services are not dedicated
for data management; b) support for other types of DCIs, such            to any domain-specific data type or format and try to remove
as clusters and clouds; c) customizable interfaces to suit differ-       the burden of moving files around from the users' shoulders.
ent user expertise, roles, and preferences; and d) sustainability
of the adopted framework.                                                    III.   BACKGROUND AND MOTIVATION FOR A NEW
                                                                                                    GATEWAY
    In this paper we discuss these experiences, which motivated
the design of the next generation of the e-BioInfra Gateway.                 In a nutshell, the current e-BioInfra Gateway works as
The new gateway design is generic (i.e., it is not specific to           follows (see more details in [6]): the user authenticates with
a particular research community), and it is based on the WS-             username and password, selects the application to run, selects
PGRADE/gUSE SG framework [4], which facilitates access to                the input files and other parameters, and starts a so called
heterogeneous DCIs. Additional features of the new gateway               experiment. She/he can then monitor the experiment and, when
include data and information management, as well as support              finished, retrieve the results. The processing on grid resources
for meta-data that is used and generated during the execution of         is performed by the MOTEUR [28] workflow management
complex data processing. We describe the requirements, design            system (WfMS), and provenance information is kept about
and architecture of the new system, and discuss some initial             the experiments. Because medical imaging data files are large,
results based on the prototype implementation for computa-               their transport is not done directly via the e-BioInfra Gateway
tional neuroscience, coined NeuSG. Although we focus on the              web interface, but via an FTP directory that is located in the
neuroscience use case, the approach could be applicable to               trusted network of the hospital. Therefore, for neuroscience
other domains as well.                                                   applications the user uploads the data to the server before
                                                                         performing the steps above, and retrieves the results from the
                     II.   RELATED WORK                                   same place when the experiment is completed.

    Design, development, and usage of SGs have gained inter-                 In about two years of experience with gateway
est and attention in the past few years. Several projects and            extension, operation, and user support, we faced the challenges
initiatives have been started worldwide to develop SG frame-             discussed below.
works and SG instances for diverse user communities [13].                    A large number of errors are caused by invalid input data.
For example, see the list of SGs on the websites of XSEDE                Users typically have difficulty preparing files for processing
(Extreme Science and Engineering Digital Environment) [14],              with the gateway applications, which currently involves steps
EGI (European Grid Infrastructure) [15], and the SCI-BUS                 for file (re-)formatting, naming, transport, and also being aware
(SCIentific gateway Based User Support) [16] project.                    of the data types that can be processed by each application. Al-
                                                                         though these problems are significantly reduced after training
    The VIP (Virtual Imaging Platform) portal [17], the Charite
                                                                         or reading the user manual, the data preparation and transport
Grid portal [18], and WeNMR gateways [19] are examples
                                                                         process should be improved with further automation.
of SG instances based on custom frameworks. The Mos-
GRID SG [20], VisIVO [21], and the Swiss Grid proteomics                     Originally the e-BioInfra Gateway was meant to facilitate
(iPortal) [22] portals are examples of SG instances based on             access to grid resources. In the past years, other resources have
SG frameworks (i.e., the WS-PGRADE/gUSE [4] in the case                  become available for research, such as local clusters at the
of these three). All of these SGs typically provide data and             AMC, a High-Performance Cloud, and GPU clusters. The
information management for a specific research community                 current WfMS does not interface with clouds, so another
using custom solutions. Particularly in the field of medical             solution is required to exploit these additional resources.
imaging, two examples relate more closely to our work.
                                                                            The current gateway supports two user profiles, end-
    The data engine [23] of the CHAIN project [24] adopts                users and administrators. We noticed, however, that additional
the jSAGA implementation of the Simple API for Grid Ap-                  profiles could be better supported with (combinations of)
plications (SAGA) standard to communicate with Grid re-                  customized views of the various services [29]. For example,
sources for data storage. The related meta-data is stored in             end-users can have different levels of expertise, or application
in-house databases. The CHAIN data engine is used in the                 support can be provided by members of the user community
CHAIN SG [25] and the DECIDE SG [26]. The DECIDE SG                      (and not necessarily only system administrators). Therefore a
provides high-level services for computer-aided neurological             more flexible framework is needed to manage users, their roles
diseases diagnosis and research on the European Research and             and interaction, and viewing preferences.
Education Networks and the European Grid Infrastructure.
                                                                             Finally, we noticed the need for adopting a more sustain-
    The neuGRID for you (N4U) Science Gateway [27] pro-                  able software stack. Although our custom framework fulfilled
vides user-friendly access to N4U tools, algorithms, pipelines,          the needs at first, as a small research group it is difficult to
visualization toolkits, and resources on various DCIs (Grid,             maintain and extend it. In particular, keeping up with all the
Cloud, and Clusters) for medical imaging research, towards               developments related to DCIs requires significant effort and
the cure of brain diseases, in particular Alzheimer’s disease.           expertise that can be achieved by bundling forces across SG
The N4U Persistency Service registers distributed data from              communities, such as done in the SCI-BUS project [16].
project partners into the N4U Information Base, which are
then treated as a single data source.                                                  IV.   REQUIREMENTS ANALYSIS
    In contrast to these SGs, our approach aims at generic ser-              We described the typical phases of computational neu-
vices that are able to connect to existing data and information          roscience studies in [30], which in summary include study

[Figure 1 diagram. Layers, top to bottom: Presentation; High-level Services; Middleware Services; Resources. Left side (e-BioInfra Gateway): e-BioInfra Portal with the e-BioInfra Browser Portlet (eBrowser); e-BioInfra Generic Services with the e-BioInfra Catalogue (eCAT), Processing Manager (PM), Data Transport Service (DTS), and IMS plug-ins. Right side (WS-PGRADE/gUSE Framework): WS-PGRADE Portal with the Security Portlet; gUSE Generic Services with the Workflow Interpreter, Application Repository, gUSE Information System, and Job Submission Service (DCI-BRIDGE). Middleware Services: Information Management System (IMS), Data Services, and Cloud, Cluster, and Grid Middleware Services. Resources: Storage, Data Resources, and Cloud, Cluster, and Grid Resources.]

Fig. 1. Layered architecture of the e-BioInfra Gateway based on the WS-PGRADE/gUSE generic SG framework. Grey boxes represent existing third-party
components, and white boxes denote components added for complementary functionality. See text for more details.
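
As a reading aid, the layering shown in Fig. 1 can be written down as a small data structure. The sketch below is illustrative only: the class `Layer`, the constant `ARCHITECTURE`, and the helper functions are hypothetical names, and the assignment of components to layers reflects one reading of the figure labels rather than a specification from the gateway itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One horizontal layer of the architecture (hypothetical helper type)."""
    name: str
    components: tuple[str, ...]


# Layers from top to bottom, using labels read from Fig. 1.
# The component-to-layer assignment is an interpretation of the figure.
ARCHITECTURE = (
    Layer("Presentation",
          ("eBrowser", "Security Portlet", "WS-PGRADE Portal")),
    Layer("High-level Services",
          ("eCAT", "Processing Manager", "Data Transport Service",
           "Workflow Interpreter", "Application Repository",
           "gUSE Information System", "DCI-BRIDGE")),
    Layer("Middleware Services",
          ("IMS", "Data Services", "Cloud Middleware",
           "Cluster Middleware", "Grid Middleware")),
    Layer("Resources",
          ("Storage", "Data Resources", "Cloud Resources",
           "Cluster Resources", "Grid Resources")),
)


def layer_of(component: str) -> str:
    """Return the name of the layer that contains the given component."""
    for layer in ARCHITECTURE:
        if component in layer.components:
            return layer.name
    raise KeyError(component)


def layers_below(layer_name: str) -> list[str]:
    """Return the names of all layers underneath the given layer."""
    names = [layer.name for layer in ARCHITECTURE]
    return names[names.index(layer_name) + 1:]
```

For instance, `layer_of("eCAT")` looks up where the catalogue sits, and `layers_below("High-level Services")` lists the layers that the high-level services abstract away.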


design, data acquisition, data handling, processing, analysis,                                                       4) Automatic provenance information collection about the
and publication. Based on the analysis of these phases, the                                                             methods, parameters and input files used for processing.
actors who are involved in each phase, and the tasks that                                                            5) Single sign-on facility to authenticate and authorize trans-
they perform, in that paper we identified the properties and                                                            parently to various computing and storage resources using
functionalities of SGs to support computational neuroscience                                                            user or community credentials.
research communities. In summary, the required properties and
functionalities include: sharing of data and methodology; sat-                                                         In addition to these functionalities, we aimed for a gateway
isfying security and privacy regulations; scalable, transparent,                                                   that is:
and flexible management of storage and computing resources;                                                          1) extensible, to easily accommodate new types of data or
literature discovery; collaboration support; meta-data, data,                                                           compute resources, applications, and user groups;
workflow, and provenance management; and visualization.                                                              2) customizable, to be able to support different research
    The current gateway [6] covers a subset of these require-                                                           communities and user profiles;
ments, namely: transparent authentication and authorization                                                          3) scalable, to gracefully support the growth of user commu-
with Grid resources; flexible and efficient data transfer be-                                                           nity and its needs for resources, as well as infrastructures
tween local and Grid storage for files without user interven-                                                           capacity and heterogeneity; and
tion; workflow processing management, including logging and                                                          4) sustainable, by using a community-driven SG framework.
monitoring; and an extensible set of applications for various
biomedical domains. For the new gateway we focused on                                                                       V.    SYSTEM DESIGN AND IMPLEMENTATION
the following additional functionalities, in particular to further                                                     Figure 1 illustrates the layered architecture of the new e-
support data handling:                                                                                             BioInfra Gateway. At the bottom, the Resource layer (dark
                                                                                                                   orange) contains several DCIs (e.g., local clusters, Grid and Cloud)
 1) Unified, secure, and easy access to data and related
                                                                                                                   and data resources (e.g., Radiology research data server).
    meta-data stored on heterogeneous infrastructures and
                                                                                                                   These resources are utilized through Middleware Services con-
    repositories. Users should be able to transparently query,
                                                                                                                   tained in the second layer (light orange). High-level Services
    explore, process, and analyse data from a single interface,
                                                                                                                   contained in the third layer (blue) provide an abstraction to
    without bothering about the data location or format, or
                                                                                                                   interact with the middleware, such as workflow management
    how it is retrieved for further processing.
                                                                                                                   and data transport. Finally, the Presentation layer (green) con-
 2) Automatic data format conversion and preprocessing ac-
                                                                                                                   tains the interfaces for user interaction. The two topmost layers
    cording to pre-defined rules. For example, pseudonymisa-
                                                                                                                   (green, blue) are implemented using generic SG framework
    tion and format conversion are automatically performed
                                                                                                                   components provided by WS-PGRADE/gUSE (at the right),
    when new data is imported into the system.
                                                                                                                   as well as a new data-centric SG framework that complements
 3) Automatic and interoperable file transport and processing
                                                                                                                   the functionality of WS-PGRADE/gUSE for the specific case
    on different infrastructures (e.g., data servers, grid, cloud).
                                                                                                                   of NeuSG (at the left).
    Low level technical details are hidden from the users,
    such as different communication protocols, middleware                                                              Figure 2 illustrates the systems that host these components
    services, and authorization mechanisms.                                                                        respectively and their network location. Due to security regu-

[Figure 2 diagram: Scanner, Data Server, Firewall, e-BioInfra Gateway, DCI; users A and B.]
                                                                                      these qualities, XNAT has been deployed in the Radiology
                                                                                      department of AMC and connected to the NeuSG as first
                                                                                      supported IMS.

                                                                                      B. WS-PGRADE/gUSE SG Framework
                                                                                          WS-PGRADE/gUSE SG framework [4] is an open source,
Fig. 2.     Hosts and services of NeuSG and their network location: inside            workflow- and service-oriented framework that facilitates de-
or outside the AMC firewall. The e-BioInfra Gateway is located in the                 velopment, execution, and monitoring of scientific workflows
demilitarized zone. User A is within the firewall boundaries and can access           on DCIs. It comprises the WS-PGRADE portal, and the Grid
the data directly; user B is outside the firewall boundaries and therefore only       User Support Environment (gUSE) services. WS-PGRADE is
has access to the meta-data and processing resources.
                                                                                      based on the Liferay portal framework, which provides rich
                                                                                      facilities for community management and customizable user
lations for processing medical research data, some services are                       interfaces. gUSE provides high-level services to access various
hosted inside the hospital firewall. The data is generated by the                     DCI resources. These qualities motivated the choice for this
scanner and directly imported into a Data Server located inside                       SG framework to implement our gateway.
the firewall, which keeps both the raw data and the meta-data.                            The most relevant gUSE services for our gateway are:
The e-BioInfra Gateway is located in the demilitarized zone
(DMZ) of the AMC network, which means that only some of                                  • Job submission service or DCI-BRIDGE:1 provides flex-
its services are visible from outside the network. In Figure 2,                            ible and versatile access to a large variety of DCIs such
both users A and B can browse meta-data, start and monitor                                 as grids, desktop grids, clusters, clouds and service-based
data processing via the gateway, but only user A can download                              computational resources. It also handles authentication
and view the medical imaging data. The raw data itself can                                 and authorization to the configured DCIs transparently.
only be accessed by the user directly from the Data Server, or                           • Workflow Interpreter: parses workflows, submits jobs to
by privileged services of the e-BioInfra Gateway.                                          the DCI-BRIDGE, and retrieves their status for monitor-
                                                                                           ing and fault-tolerance.
    Below we further detail the components that are more                                 • Application Repository: stores ready-to-use tested and
relevant for a data-centric SG, namely data services and the                               configured workflows. These workflows are exported to
new components illustrated as white boxes in Figure 1. For                                 the application repository by workflow developers, from
completeness we briefly introduce the WS-PGRADE/gUSE                                       where they are imported into user space for execution.
SG framework, and finally, we describe the interaction between                           • gUSE Information System: stores configurations of gUSE
these components.                                                                          services and workflow-related information such as work-
                                                                                           flow executions and the status of their jobs.
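
The division of labour among the gUSE services listed above can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the classes `ToyDCIBridge`, `ToyApplicationRepository`, and `ToyWorkflowInterpreter` and all of their methods are hypothetical stand-ins and do not reproduce the actual gUSE or ASM APIs. A workflow is reduced to an ordered list of job names.

```python
from itertools import count


class ToyDCIBridge:
    """Stand-in for a job submission service (not the real DCI-BRIDGE API)."""

    def __init__(self):
        self._ids = count(1)
        self._status = {}

    def submit(self, job_name: str, dci: str) -> int:
        # A real service would also handle authentication to the DCI here.
        job_id = next(self._ids)
        self._status[job_id] = "submitted"
        return job_id

    def mark_finished(self, job_id: int) -> None:
        # Hook that simulates the DCI reporting job completion.
        self._status[job_id] = "finished"

    def status(self, job_id: int) -> str:
        return self._status[job_id]


class ToyApplicationRepository:
    """Stores ready-to-use workflows exported by workflow developers."""

    def __init__(self):
        self._workflows = {}

    def export_workflow(self, name: str, jobs: list[str]) -> None:
        self._workflows[name] = list(jobs)

    def import_workflow(self, name: str) -> list[str]:
        # Return a copy, mimicking an import into user space.
        return list(self._workflows[name])


class ToyWorkflowInterpreter:
    """Submits the jobs of a workflow and retrieves their status."""

    def __init__(self, bridge: ToyDCIBridge):
        self._bridge = bridge

    def run(self, jobs: list[str], dci: str) -> list[int]:
        return [self._bridge.submit(job, dci) for job in jobs]

    def monitor(self, job_ids: list[int]) -> dict[int, str]:
        return {job_id: self._bridge.status(job_id) for job_id in job_ids}
```

In this model a developer exports a workflow to the repository, a user imports it, and the interpreter hands each job to the bridge and polls the bridge for status, which mirrors the roles described in the list.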
A. Data Services
                                                                                          The WS-PGRADE/gUSE framework also provides two
    Management of biomedical research data, with its grow-                            Application Programming Interfaces (APIs) to create SG in-
ing size and complexity, requires domain-specific Information                         stances. We used the Application Specific Module (ASM) API
Management Systems (IMSs). There are several IMSs that                                to utilize gUSE services, more specifically the Application
address challenges such as management of biomedical research                          Repository, to manage and share workflows among users, and
data and meta-data, electronic data exchange, archival and                            the Workflow Interpreter, to submit workflows.
security, and the research communities usually already adopt
                                                                                          The WS-PGRADE portal also offers a set of generic
such systems routinely. Additionally, every community has its
                                                                                      portlets to interact with gUSE services via web-based graphical
own procedure to implement rules and regulations regarding
                                                                                      user interfaces. For example, users can manage their creden-
the protection of biomedical research data, as well as policies
                                                                                      tials, which are required to authenticate and authorize to DCIs,
for data sharing and archiving. Therefore, instead of repli-
                                                                                      via security portlets. See [4] for the complete description of
cating such efforts, we decided to rely on existing, external,
                                                                                      WS-PGRADE/gUSE services and portlets.
biomedical research data and meta-data resources, as well as
on their own security mechanisms and policies. In this way,                              Currently the WS-PGRADE/gUSE framework does not
the research community itself provides and manages the IMS,                           have any facility to connect to external IMS resources. More-
defining data ownership, access policies, and regulating data                         over, its current data management facilities are also limited.
confidentiality and privacy methods such as pseudonymisation.                         The data-centric e-BioInfra Gateway tries to bridge this gap
The IMS is connected to the e-BioInfra Gateway by agreement                           with additional components described below.
between the community and the gateway providers, and the
data becomes available for processing at the gateway for                              C. e-BioInfra Gateway data-centric framework
authorized users only.
                                                                                         The core of the new e-BioInfra Gateway is made of the fol-
   A popular IMS for medical imaging data and meta-data is                            lowing components: e-BioInfra Catalogue (eCAT), Data Trans-
the eXtensible Neuroimaging Archive Toolkit (XNAT) [31].                              port Service (DTS), Processing Manager (PM), and e-BioInfra
XNAT is an open source IMS that offers an integrated
                                                                                         1 According to [4], the DCI-BRIDGE has been moved out of the gUSE
framework for storage, management, electronic exchange, and
                                                                                      layer to highlight that it is directly accessible via the standard OGF BES
consumption of medical imaging data and its complementary                             job submission interface. Here we utilize a different conceptual framework to
meta-data. XNAT provides a rich communication layer based                             illustrate the architectural layers of the system, thus we consider it as part of
on a RESTful API of resource-oriented web services. Due to                            the gUSE generic services.


Browser Portlet (eBrowser). They are loosely coupled and                            E. Processing Manager (PM)
communicate via well-defined APIs, an approach that paves the
road towards a service-oriented architecture and facilitates their                      The PM takes care of submission and monitoring of data
reuse in other gateways. These components are deployed in the                       processing applications, which are defined as workflows that
same environment alongside WS-PGRADE/gUSE components                                are executed by the gUSE Workflow Interpreter. The PM
and work together to implement the NeuSG functionalities.                           instructs the DTS to transport input files from the IMSs
                                                                                    to the storage resources of the DCI on which the processing
                                                                                    is performed, and to transport the results back to the IMS.
D. e-BioInfra Catalogue (eCAT)                                                      The PM imports the workflow from the gUSE Application
                                                                                    Repository and configures it with the physical location of input
    The eCAT has been designed to facilitate the data manage-
                                                                                    data before submission.
ment functionalities at the gateway. It is a central information
store for user-specific configurations such as IMS hosts and
the user’s credentials to access them. eCAT defines and im-                         F. Data Transport Service (DTS)
plements a data model to manage system-level information,
                                                                                        The DTS transports data between IMSs and storage re-
with the following main entities: User, Project, Data,
                                                                                    sources on DCIs. This service contacts the eCAT to determine
Application, Processing, and View preferences
                                                                                    how to authenticate with the IMS on behalf of the user, how to
(see Figure 3 for their relationships).
                                                                                    authenticate with the storage resources of the DCI (possibly
    eCAT provides an aggregated and user-specific view of                           with community credentials), and how to access data on both.
Data entities that each user has access to on the given IMSs.                       It autonomously performs the data transfer using third-party
Note that eCAT is not meant to duplicate meta-data that is                          mechanisms as much as possible to avoid bottlenecks. If some
already stored on IMSs; instead, it only stores pointers to such                    data has been replicated on a DCI, the location of that replica
information on IMSs. It retrieves and stores meta-data on IMSs                      is stored in the eCAT and retrieved later.
through the respective IMS Plug-ins, which are software mod-
ules attached to eCAT to enable programmatic communications                         G. e-BioInfra Browser Portlet (eBrowser)
with a specific IMS. The only exceptions are some meta-data
that are specific to user activities on the gateway, which are                          Unlike the previous components, the eBrowser is part of
neither possible nor of direct interest to research communities to                  the presentation layer. It provides a web-based user interface
store in their IMSs. For example, location of data replicas on a                    to interact with all the e-BioInfra generic services. Instead
DCI and user View preferences are such meta-data that                               of contacting the services directly, eBrowser retrieves infor-
are only stored in the eCAT database.                                               mation from eCAT to provide a unified view to scientists to
                                                                                    browse data, projects and data processing instances. eBrowser
    Data entities are included in, and processed within, the                        essentially enables scientists to start, manage, and monitor data
scope of Project entities. When possible, Projects are                              processing (through PM), as well as to configure viewing and
also in sync with those on IMSs. Each User has access to                            interaction preferences with the gateway.
some Applications, which are tested and ready-to-use
workflows. When a User processes a certain Data with a
specific Application, the information about this activity is                        H. Component Interactions
captured by eCAT as a Processing entity. The provenance                                 Figure 4 illustrates the interactions between users and
information about the Data consumed and produced during a                           the e-BioInfra Gateway, as well as the interactions between
Processing, the parameters, and the latest status of process-                       underlying components. User actions are expressed via the
ing, are also stored in the eCAT database. eCAT also provides                       eBrowser and trigger interactions between other high-level
necessary information to transport the results produced by a                        components (i.e., PM, DTS and eCAT) and lower-level compo-
data processing to the respective IMS, if possible together with                    nents (i.e., gUSE and XNAT IMS). Details of these interactions
the provenance information. eCAT is accessed by other system                        are presented below.
components (PM, DTS, and eBrowser) through its API.
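
As an illustration of the entities described above, the following minimal Python sketch models them as plain data classes. It is not the eCAT schema: all class, field and method names (for example ims_url, replicas and start_processing) are assumptions made for this example, and Figure 3 remains the reference for the actual relationships.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Data:
    """Pointer to an item held on an external IMS; IMS meta-data is not copied."""
    ims_url: str   # hypothetical: endpoint of the IMS that holds the item
    ims_id: str    # hypothetical: identifier of the item on that IMS
    # Gateway-only meta-data: DCI name -> location of the replica on that DCI
    replicas: Dict[str, str] = field(default_factory=dict)


@dataclass
class Application:
    """A tested, ready-to-use workflow."""
    name: str


@dataclass
class Processing:
    """Record of one Application run on some Data, kept for provenance."""
    application: Application
    inputs: List[Data]
    parameters: Dict[str, str]
    status: str = "created"
    outputs: List[Data] = field(default_factory=list)


@dataclass
class Project:
    name: str
    data: List[Data] = field(default_factory=list)
    processings: List[Processing] = field(default_factory=list)


@dataclass
class User:
    name: str
    projects: List[Project] = field(default_factory=list)
    applications: List[Application] = field(default_factory=list)
    view_preferences: Dict[str, str] = field(default_factory=dict)

    def start_processing(self, project: Project, inputs: List[Data],
                         application: Application,
                         parameters: Dict[str, str]) -> Processing:
        """Create a Processing entity inside the scope of a Project."""
        if application not in self.applications:
            raise PermissionError(f"{self.name} has no access to {application.name}")
        processing = Processing(application, list(inputs), dict(parameters))
        project.processings.append(processing)
        return processing
```

In this sketch the replica locations and view preferences live only in the catalogue objects, mirroring the statement that such meta-data is stored only in the eCAT database, while everything else about a Data entity is reached through its pointer to the IMS.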
                                                                                        Upon successful authentication with the WS-PGRADE
                                                                                    portal, the user gets access to the eBrowser portlet. New users
                                                                                    need to configure an IMS endpoint by providing the URL of
                                                                                    the IMS, its type (e.g., XNAT), and recording their username
                                                                                    and password securely. These configurations are collected by
                                                                                    the eBrowser and sent to eCAT for validation and storage.
                                                                                    After this configuration step, the following takes place when
                                                                                    the user logs into the e-BioInfra Gateway:
                                                                                     1) At first the user sees a list of her projects. To display this
                                                                                        list, eBrowser sends a request to eCAT, which authen-
                                                                                        ticates on behalf of the user to all registered IMSs and
                                                                                        generates a unified list of all projects that are accessible
                                                                                        by that particular user.
                                                                                     2) Similarly, when the user selects a project, the eBrowser
Fig. 3. Simplified entity-relationship model of the information stored in the
e-BioInfra Catalogue.
                                                                                        sends a request to eCAT, which queries meta-data on the
                                                                                        IMS to produce the list of all data entries in that project.

Fig. 4. Sequence diagram illustrating the interactions between the user and the various NeuSG components. After authentication, the user can browse projects,
select data based on meta-data, select an application to run on these data and start new processing, monitor processing, and download results.
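
The part of this sequence in which a new processing is started can be sketched in a few lines of Python. The sketch assumes in-memory stand-ins for eCAT, the DTS and the PM; the class names, the copy and submit callables, and the location strings are hypothetical and do not reflect the actual component APIs. It shows only the replica logic: the DTS reuses a replica already registered in eCAT for the target DCI, and otherwise copies the data and registers the new location before the PM submits the application.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Catalogue:
    """Stand-in for eCAT: remembers where replicas of IMS data live on a DCI."""

    def __init__(self) -> None:
        self._replicas: Dict[Tuple[str, str], str] = {}

    def find_replica(self, data_id: str, dci: str) -> Optional[str]:
        return self._replicas.get((data_id, dci))

    def register_replica(self, data_id: str, dci: str, location: str) -> None:
        self._replicas[(data_id, dci)] = location


class TransportService:
    """Stand-in for the DTS: stages data to a DCI unless a replica exists."""

    def __init__(self, catalogue: Catalogue,
                 copy: Callable[[str, str], str]) -> None:
        self.catalogue = catalogue
        # Hypothetical callable (data_id, dci) -> location that performs the
        # actual IMS-to-DCI transfer.
        self.copy = copy

    def stage(self, data_id: str, dci: str) -> str:
        location = self.catalogue.find_replica(data_id, dci)
        if location is None:
            location = self.copy(data_id, dci)
            self.catalogue.register_replica(data_id, dci, location)
        return location


class ProcessingManager:
    """Stand-in for the PM: stages all inputs, then submits the application."""

    def __init__(self, dts: TransportService,
                 submit: Callable[[str, List[str], Dict[str, str]], object]) -> None:
        self.dts = dts
        # Hypothetical callable standing in for workflow submission.
        self.submit = submit

    def start(self, application: str, data_ids: List[str], dci: str,
              parameters: Dict[str, str]) -> object:
        locations = [self.dts.stage(d, dci) for d in data_ids]
        return self.submit(application, locations, parameters)
```

Passing the transfer and submission steps in as callables keeps the sketch independent of any particular IMS or workflow system, in line with the loose coupling between components described in the text.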



 3) The user then selects data entities that she wishes to             on several IMSs and described by rich meta-data, but also
    process, and browses for available applications. The               to perform large scale data processing on DCIs. This can be
    eBrowser retrieves and displays the list of applications           done without getting involved in low-level details, such as
    accessible to the user. The user selects an application and        transporting files, as was the case in the previous generation.
    the eBrowser displays configurations for that application
                                                                           The previous generation was built on the Spring
    (e.g., application parameters).
                                                                       framework; it only supported the Dutch Grid infrastructure,
 4) The user configures the application and starts a new data
                                                                       and it lacked facilities for user interface customization or
    processing. The eBrowser collects the provided configu-
                                                                       community support. In contrast, the new generation of the e-
    ration and submits a processing request to the PM. The
                                                                       BioInfra Gateway is built based on the WS-PGRADE/gUSE
    PM consults eCAT to find the details of the selected
                                                                       SG framework, which itself is built on the Liferay portal
    application, namely the DCI to run it and the arguments
                                                                       framework. Liferay provides facilities for user management,
    that need to be configured for its execution (e.g., input
                                                                       community management, and community support (e.g., on-
    files and parameters). The PM validates and creates the
                                                                       line forum). Moreover, it also facilitates the construction of
    processing entity in eCAT, from which the eBrowser
                                                                       customizable web-based user interfaces that are required to
    can later retrieve it and display it to the user for browsing,
                                                                       suit the needs of each user (community) based on their profile,
    management, and monitoring purposes.
                                                                       expertise, and roles. The WS-PGRADE/gUSE SG framework
 5) The PM further instructs the DTS to move the required
                                                                       provides high-level generic services to manage workflows,
    input data to the target DCI. The DTS contacts eCAT to
                                                                       enact them on various DCIs, and monitor their execution. These
    determine if those data already have a replica on the target
                                                                       services allow for functional scalability and interoperability
    DCI. If no replica is available, the eCAT provides DTS
                                                                       between various DCIs. Additionally, the WS-PGRADE/gUSE
    with the IMS endpoint configurations (including authenti-
                                                                       framework is an actively maintained and developed open-
    cation token) and location where it can retrieve the input
                                                                       source project, which allows the development team of the
    data. The DTS then uses this information to authenticate
                                                                       e-BioInfra Gateway to concentrate on its community-specific
    on behalf of the user to the IMS and download the input
                                                                       features, and makes the gateway operation more sustainable.
    data. Similarly, it retrieves user authentication tokens for
    the target DCI to upload input data (not shown in the                  Currently only XNAT is supported as IMS. Several other
    diagram). Finally the DTS registers in eCAT the location           data management platform alternatives meet the research re-
    of the file replica in the DCI and returns it to the PM.           quirements, although XNAT is of special interest due to its
 6) After all data have been staged to the target DCI, the PM          support for medical imaging, and its adoption by the AMC
    imports the application from gUSE via the ASM API into             neuroscience research community. It has been particularly de-
    the user-space, and configures it with the physical location       signed for managing standard medical imaging data as the core
    of input data and user-specified parameters.                       of its functionalities. In addition, its archiving and integrating
 7) Having everything in place, the PM starts the data pro-            capabilities, data model flexibility, ease of use and the highly
    cessing by submitting the configured application (work-            active community of users/developers makes it a relevant asset.
    flow) to gUSE via the ASM API, and updates the process-            Note however that the new e-BioInfra Gateway has been
    ing status in eCAT. The gUSE Workflow Interpreter parses           designed to support multiple and heterogeneous IMSs, and it
    the workflow, generates corresponding jobs, and submits            is not dependent on XNAT.
    them to DCI-BRIDGE. The DCI-BRIDGE retrieves user
                                                                           The eCAT contains much meta-data about the system level
    authentication tokens for the target DCI to submit jobs
                                                                       (viewing and processing), but it is completely dependent on
    on behalf of the user to the target DCI.
                                                                       an external IMS for the data. If the IMS is not available,
 8) The PM periodically updates the information in eCAT
                                                                       the user cannot perform any data-related activity, such as
    based on the status reports from gUSE, which is then
                                                                       browsing or selecting files. We have considered duplicating the
    reflected in the interface of the eBrowser for monitoring.
                                                                       meta-data on the eCAT, both for efficiency and fault-tolerance
 9) Typically, each processing consists of multiple data to be
                                                                       reasons, but we concluded that the synchronization of the two
    processed. When the processing of some data is finished,
                                                                       systems would be too time consuming. Moreover, we chose
    their results are immediately stored in the specific IMS
                                                                       to keep the access control to the Data Server completely in
    via the DTS. Thereby the user can check results even
                                                                       the hands of the community administrators, which, due to
    before the entire processing is complete.
                                                                       the required expertise, can be different persons than the SG
10) The user browses, manages, and monitors the processing
                                                                       administrators. This helped us build trust between the systems,
    via the eBrowser. eBrowser contacts eCAT to get infor-
                                                                       which is a known critical factor to connect such systems to
    mation about processing entities, including status.
                                                                       open infrastructures such as grids and clouds.
11) The user is forwarded to the IMS directly to access and
    download processing results via a link at the gateway                  We used WS-PGRADE/gUSE as SG framework, which
    interface.                                                         in principle provides the workflow management and portal
                                                                       functionalities needed for the NeuSG. After a learning phase,
                     VI. DISCUSSION                                    during which the concepts of the framework were better
                                                                       understood by the team, we observed that the usage model
    In the new generation of the e-BioInfra Gateway we tried           of the framework differs from our needs in some cases, which
to bridge the gap between scientists, data services, and DCIs.         has led us to develop our own processing manager component.
We aimed for a data-centric gateway in which everything is             This has the goal of translating high-level “data processing”
organized around “data”. Now scientists can use the gateway            commands into low-level data transports, which are performed
not only to browse their data, which can be potentially stored         by the data transport service, and calls to the gUSE ASM

API. At first this introduces overhead, but at the same time                      [4]   P. Kacsuk et al., “WS-PGRADE/gUSE Generic DCI Gateway Frame-
it provides sufficient isolation from aspects regarding this                            work for a Large Variety of User Communities,” Journal of Grid
particular WfMS, and allows us to consider other WfMSs in                               Computing, vol. 10, no. 4, pp. 601–630, 2012.
                                                                                  [5]   S. Maddineni et al., “Distributed Application Runtime Environment
the future.                                                                             (DARE): A Standards-based Middleware Framework for Science-
                                                                                        Gateways,” Journal of Grid Computing, vol. 10, no. 4, pp. 647–664,
    The development of eBrowser viewing portlets was also                               2012.
simplified by the decision to have all user interaction take                      [6]   S. Shahand et al., “A grid-enabled gateway for biomedical data analysis,”
place using information available on the eCAT. This approach                            Journal of Grid Computing, vol. 10, no. 4, pp. 725–742, 2012.
requires all software components to register all activity on                      [7]   “The BiG Grid Project website,” http://www.biggrid.nl.
the eCAT, but it decouples the viewer from all the other                          [8]   B. D. Peters et al., “Polyunsaturated fatty acid concentration predicts
                                                                                        myelin integrity in early-phase psychosis,” Schizophrenia Bulletin,
components accordingly. This reduces dependencies between                               2012.
the system components and simplifies its implementation and                       [9]   G. A. van Wingen et al., “Persistent and reversible consequences of
maintenance. Moreover, it makes the eCAT a natural provenance                           combat stress on the mesofrontal circuit and cognition,” Proceedings of
data repository for the activity carried out at the gateway.                            the National Academy of Sciences, vol. 109, no. 38, pp. 15 508–15 513,
                                                                                        2012.
                                                                                 [10]   A. Rienstra et al., “Symptom validity testing in memory clinics:
           VII. CONCLUSION AND FUTURE WORK                                        Hippocampal-memory associations and relevance for diagnosing mild
                                                                                        cognitive impairment,” Journal of Clinical and Experimental Neuropsy-
    The implementation is being completed, and the new gate-                            chology, 2012.
way will be released soon (April) for evaluation by AMC BIC                      [11]   B. de Kwaasteniet et al., “Relation between structural and functional
users. The portfolio of applications will be enriched (currently                        connectivity in major depressive disorder,” Biological Psychiatry, no. 0,
                                                                                        pp. –, 2013.
there are only two), and the eBrowser will be extended                           [12]   “The BIC (Brain Imaging Center) at the AMC (Academic Medical
(currently only basic browsing functionality is available). In                          Center) website,” http://www.lebic-amc.nl.
a second step, the gateway will be disseminated in training                      [13]   T. Kiss, “Science gateways for the broader take-up of distributed
events, and become open to the whole neuroscience commu-                                computing infrastructures,” Journal of Grid Computing, vol. 10, pp.
nity of the University of Amsterdam. This step will require                             599–600, 2012.
                                                                                 [14]   “The XSEDE (Extreme Science and Engineering Digital Environment)
inclusion of other IMSs, for example other XNAT instances                               website,” http://www.xsede.org.
or even other systems, as well as extending the eCAT with                        [15]   “EGI (European Grid Infrastructure) Science Gateways,”
federated services for accessing (and/or querying) multiple                             http://www.egi.eu/services/support/science-gateways/index.html.
IMSs. Increasing number of users and data will require further                   [16]   “The SCI-BUS (SCIentific gateway Based User Support) Project web-
development of instruments for strong community support,                                site,” http://www.sci-bus.eu.
                                                                                 [17]   T. Glatard et al., “A virtual imaging platform for multi-modality medical
communication and access control tools, part of which are                               image simulation,” Medical Imaging, IEEE Transactions on, vol. 32,
supported by Liferay. Moreover, semantic content annotation                             no. 1, pp. 110 –118, jan. 2013.
(ontologies), as well as adding knowledge and integrating it                     [18]   J. Wu et al., “The charité grid portal: User-friendly and secure access to
with existing data, could enable further automation of the data                         grid-based resources and services,” Journal of Grid Computing, vol. 10,
                                                                                        pp. 709–724, 2012.
processing to reduce even more human intervention in the                         [19]   T. Wassenaar et al., “WeNMR: Structural Biology on the Grid,” Journal
analysis of large quantities of biomedical data.                                        of Grid Computing, vol. 10, pp. 743–767, 2012.
                                                                                 [20]   S. Gesing et al., “A single sign-on infrastructure for science gateways
    Finally, we kept bioinformatics researchers in the loop                             on a use case for structural bioinformatics,” Journal of Grid Computing,
during the requirement analysis, design, and implementation of                          vol. 10, pp. 769–790, 2012.
the gateway to assure that the resulting SG is generic enough                    [21]   E. Sciacca et al., “VisIVO Workflow-Oriented Science Gateway for
to support the bioinformatics research community with minimal                           Astrophysical Visualization,” in Proceedings of the 21st Euromicro
                                                                                        International Conference on Parallel Distributed and Network-Based
additional effort. Although in this paper we are focused on                             Processing, 2013.
the computational neuroscience applications, the same concept                    [22]   P. Kunszt et al., “The swiss grid proteomics portal,” in Proceedings of
and software components are being used to develop an SG for                             the Second International Conference on Parallel, Distributed, Grid and
analysis of DNA sequencing data.                                                        Cloud Computing for Engineering, 2011.
                                                                                 [23]   M. Fargetta et al., “A data engine for grid science gateways enabling
                                                                                        easy transfer and data sharing,” Presentation in the EGI community
                        ACKNOWLEDGMENT                                                  Forum 2012, March 2012.
                                                                                 [24]   “The CHAIN (Co-ordination and Harmonisation of Advanced e-
    This work is financially supported by the COMMIT project                            INfrastrucures for Research and Education Data Sharing) Project web-
“e-Biobanking with imaging for healthcare” funded by the                                site,” http://www.chain-project.eu.
Nederlandse Organisatie voor Wetenschappelijk Onderzoek                          [25]   “The CHAIN Science Gateway,” http://science-gateway.chain-
(Netherlands Organisation for Scientific Research, NWO), the                            project.eu.
                                                                                 [26]   V. Ardizzone et al., “The decide science gateway,” Journal of Grid
SCI-BUS project funded by European Union Seventh Frame-                                 Computing, vol. 10, no. 4, pp. 689–707, 2012.
work Programme (FP7/2007-2013) under grant agreement no                          [27]   “The N4U (neuGRID for you) Project website,” http://neugrid4you.eu.
28348, and the HPCN UvA project “Computational Neuro-                            [28]   T. Glatard et al., “Flexible and Efficient Workflow Deployment of Data-
science Gateway” funded by the University of Amsterdam.                                 Intensive Applications On Grids With MOTEUR,” International Journal
                                                                                        of High Performance Computing Applications, vol. 22, no. 3, pp. 347–
                                                                                        360, Aug. 2008.
                            REFERENCES                                           [29]   S. Shahand et al., “Front-ends to Biomedical Data Analysis on Grids,”
                                                                                        in Proceedings of HealthGrid 2011, Bristol, UK, 2011.
 [1] D. De Roure et al., “The semantic grid: Past, present, and future,”
     Proceedings of the IEEE, vol. 93, no. 3, pp. 669 –681, march 2005.          [30]   S. Shahand et al., “Integrated Support for Neuroscience Research: from
 [2] T. Hey and A. E. Trefethen, “Cyberinfrastructure for e-science,” Sci-              Study Design to Publication,” in Proceedings of HealthGrid 2012,
     ence, vol. 308, no. 5723, pp. 817–821, 2005.                                       Amsterdam, NL, May 2012.
 [3] E. G. I. Science Gateway Virtual Team, Science Gateway Primer. EGI          [31]   D. Marcus et al., “The extensible neuroimaging archive toolkit,” Neu-
     (European Grid Infrastructure), 2012.                                              roinformatics, vol. 5, pp. 11–33, 2007.

