=Paper=
{{Paper
|id=None
|storemode=property
|title=A Data-Centric Science Gateway for Computational Neuroscience
|pdfUrl=https://ceur-ws.org/Vol-993/paper12.pdf
|volume=Vol-993
|dblpUrl=https://dblp.org/rec/conf/iwsg/ShahandBHJSMGCKO13
}}
==A Data-Centric Science Gateway for Computational Neuroscience==
Shayan Shahand∗‡, Ammar Benabdelkader∗, Jordi Huguet†, Mahdi Jaghouri∗, Mark Santcroos∗, Mostapha al Mourabit∗, Paul F.C. Groot†, Matthan W.A. Caan†, Antoine H.C. van Kampen∗ and Sílvia D. Olabarriaga∗‡

∗ Bioinformatics Laboratory, Dept. of Clinical Epidemiology, Biostatistics and Bioinformatics; † Brain Imaging Center, Dept. of Radiology; Academic Medical Center, University of Amsterdam, The Netherlands

‡ Corresponding authors: {s.shahand | s.d.olabarriaga}@amc.uva.nl

Abstract—Science gateways provide user interfaces and high-level services to access and manage applications and data collections on distributed resources. They enable users to perform data analysis on distributed computing infrastructures (DCIs) without getting involved in the technical details. The e-BioInfra Gateway is a science gateway for biomedical data analysis on a national grid infrastructure, which has been successfully adopted for neuroscience research. Necessary improvements in this gateway motivated the design of a next generation of the e-BioInfra Gateway. In this paper we describe the motivation, requirements and design of this new gateway, which is based on the WS-PGRADE/gUSE SG framework and can therefore also support other types of DCIs. The new gateway has additional generic data and meta-data management facilities to access and manage (biomedical) data servers, and to provide an integrated and data-centric user interaction. Its first prototype is implemented and deployed for the computational neuroscience research community of the Academic Medical Center of the University of Amsterdam.

Keywords—science gateway (SG), e-science, virtual laboratory (VL), problem solving environment (PSE), computational neuroscience, medical image analysis

===I. Introduction===

Science Gateways (SGs), also called Problem Solving Environments (PSEs) or Virtual Laboratories (VLs), support scientists in e-Science endeavours. De Roure et al. [1] described the requirements of e-Science environments as a spectrum with two ends. One end is characterized by automation, virtual organizations of services, and the digital world; the other end is characterized by interaction, virtual organizations of people, and the physical world. Orthogonal to these requirements at both ends is the issue of scale, for example, of virtual organizations, computation, storage, and the complexity of relationships between them. Increasing scale demands automation and, as highlighted by Hey and Trefethen [2], computer scientists face the research challenge of creating high-level intelligent services that genuinely support e-Science applications. Such services, e.g., SGs, should go beyond straightforward access to computing resources. They should also support the construction and management of virtual organizations, as well as the management of the scientific data deluge in the scholarly cycle, including hypothesis, experimentation, analysis, publication, research, and learning.

A large number of communities are therefore facing the challenge of building SGs. A recent collaboration resulted in the European Grid Infrastructure (EGI) Science Gateway Primer [3], where issues involved in SG design, implementation and operation are presented and discussed. According to this Primer, SGs are desktop or web-based interfaces to a set of applications and data collections. SGs comprise front-end and back-end components and offer services that facilitate access to computing and storage resources, as well as services provided by Distributed Computing Infrastructures (DCIs). Moreover, SGs support collaboration between researchers through exchange of ideas, tools and datasets. From the functional perspective, SGs are either SG frameworks or SG instances. SG frameworks implement generic functionalities such as security, workflow and data management, and DCI access; examples are the WS-PGRADE/gUSE [4] and DARE [5] frameworks. SG instances are community-specific science gateways, with tailored interfaces and services for a specific application domain. SG instances can be built using SG frameworks or with custom software stacks. See Section II for examples of SG instances.

The e-BioInfra Gateway [6] is an SG instance for large-scale biomedical data analysis on the Dutch e-Science Grid [7]. It is designed to simplify the use of this infrastructure by biomedical researchers for the analysis of large datasets, and it is implemented on a custom framework. It is deployed at the Academic Medical Center (AMC) of the University of Amsterdam (UvA), The Netherlands. It lowers barriers for users by providing services such as a community Grid certificate and automatic file transport to and from the Grid resources. Since its deployment in production (early 2011), researchers have successfully performed large computations on the Dutch Grid infrastructure via the e-BioInfra Gateway with minor help from the support team. For example, Peters et al. [8], Wingen et al. [9], Rienstra et al. [10], and de Kwaasteniet et al. [11] have already published results of neuroscience research based on data analysis performed via the e-BioInfra Gateway. The gateway structures the system information and allows for extensions with new data analysis methods. This enabled (external) developers to extend it with ten applications: six for medical imaging, three for next-generation sequencing data analysis, and one for mass-spectrometry modelling. The e-BioInfra Gateway currently has 29 active users, and the largest usage so far is by researchers from the Brain Imaging Center of the AMC (BIC [12]).

Although the current e-BioInfra Gateway can be considered a success story, our experience indicated the need for further improvement in the following aspects: a) support for data management; b) support for other types of DCIs, such as clusters and clouds; c) customizable interfaces to suit different user expertise, roles, and preferences; and d) sustainability of the adopted framework.

In this paper we discuss these experiences, which motivated the design of the next generation of the e-BioInfra Gateway. The new gateway design is generic (i.e., it is not specific to a particular research community), and it is based on the WS-PGRADE/gUSE SG framework [4], which facilitates access to heterogeneous DCIs. Additional features of the new gateway include data and information management, as well as support for meta-data that is used and generated during the execution of complex data processing. We describe the requirements, design and architecture of the new system, and discuss some initial results based on the prototype implementation for computational neuroscience, coined NeuSG. Although we focus on the neuroscience use case, the approach could be applicable to other domains as well.

===II. Related Work===

Design, development, and usage of SGs have gained interest and attention in the past few years. Several projects and initiatives have been started worldwide to develop SG frameworks and SG instances for diverse user communities [13]. For example, see the list of SGs on the websites of XSEDE (Extreme Science and Engineering Discovery Environment) [14], EGI (European Grid Infrastructure) [15], and the SCI-BUS (SCIentific gateway Based User Support) [16] project.

The VIP (Virtual Imaging Platform) portal [17], the Charite Grid portal [18], and the WeNMR gateways [19] are examples of SG instances based on custom frameworks. The MosGRID SG [20], VisIVO [21], and the Swiss Grid proteomics (iPortal) [22] portals are examples of SG instances based on SG frameworks (in the case of these three, WS-PGRADE/gUSE [4]). All of these SGs typically provide data and information management for a specific research community using custom solutions.

Particularly in the field of medical imaging, two examples relate more closely to our work. The data engine [23] of the CHAIN project [24] adopts the jSAGA implementation of the Simple API for Grid Applications (SAGA) standard to communicate with Grid resources for data storage. The related meta-data is stored in in-house databases. The CHAIN data engine is used in the CHAIN SG [25] and the DECIDE SG [26]. The DECIDE SG provides high-level services for computer-aided diagnosis of neurological diseases and for research on the European Research and Education Networks and the European Grid Infrastructure. The neuGRID for you (N4U) Science Gateway [27] provides user-friendly access to N4U tools, algorithms, pipelines, visualization toolkits, and resources on various DCIs (Grid, Cloud, and Clusters) for medical imaging research towards the cure of brain diseases, in particular Alzheimer's disease. The N4U Persistency Service registers distributed data from project partners into the N4U Information Base, where they are then treated as a single data source.

In contrast to these SGs, our approach aims at generic services that are able to connect to existing data and information management services. These generic services are not dedicated to any domain-specific data type or format, and they aim to lift the burden of moving files around from the users' shoulders.

===III. Background and Motivation for a New Gateway===

In a nutshell, the current e-BioInfra Gateway works as follows (see more details in [6]). The user authenticates with a username and password, selects the application to run, selects the input files and other parameters, and starts a so-called experiment. She/he can then monitor the experiment and, when it has finished, retrieve the results. The processing on grid resources is performed by the MOTEUR [28] workflow management system (WfMS), and provenance information is kept about the experiments. Because medical imaging data files are large, they are not transported directly via the e-BioInfra Gateway web interface, but via an FTP directory located in the trusted network of the hospital. Therefore, for neuroscience applications the user uploads the data to the server before performing the steps above, and retrieves the results from the same place when the experiment is completed.

In roughly two years of experience with gateway extension, operation, and user support, we faced the challenges discussed below.

A large number of errors are caused by invalid input data. Users typically have difficulty preparing files for processing with the gateway applications. This currently involves steps for file (re-)formatting, naming, and transport, as well as knowing which data types each application can process. Although these problems are significantly reduced after training or reading the user manual, the data preparation and transport process should be improved with further automation.

Originally the e-BioInfra Gateway was meant to facilitate access to grid resources. In the past years other resources have become available for research, such as local clusters at the AMC, a High-Performance Cloud, and GPU clusters. The current WfMS does not interface with clouds, so another solution is required to exploit these additional resources.

The current gateway supports two user profiles, end-users and administrators. We noticed, however, that additional profiles could be better supported with (combinations of) customized views of the various services [29]. For example, end-users can have different levels of expertise, or application support can be provided by members of the user community (and not necessarily only by system administrators). Therefore a more flexible framework is needed to manage users, their roles and interaction, and viewing preferences.

Finally, we noticed the need for adopting a more sustainable software stack. Although our custom framework fulfilled the needs at first, as a small research group we find it difficult to maintain and extend it. In particular, keeping up with all the developments related to DCIs requires significant effort and expertise, which can be achieved by bundling forces across SG communities, as done in the SCI-BUS project [16].

Fig. 1. Layered architecture of the e-BioInfra Gateway based on the WS-PGRADE/gUSE generic SG framework. Grey boxes represent existing third-party components, and white boxes denote components added for complementary functionality. See text for more details.

===IV. Requirements Analysis===

We described the typical phases of computational neuroscience studies in [30], which in summary include study
design, data acquisition, data handling, processing, analysis, and publication. In that paper we analysed these phases, the actors involved in each phase, and the tasks they perform. On that basis we identified the properties and functionalities SGs need to support computational neuroscience research communities. In summary, the required properties and functionalities include: sharing of data and methodology; satisfying security and privacy regulations; scalable, transparent, and flexible management of storage and computing resources; literature discovery; collaboration support; meta-data, data, workflow, and provenance management; and visualization.

The current gateway [6] covers a subset of these requirements, namely: transparent authentication and authorization with Grid resources; flexible and efficient data transfer between local and Grid storage for files without user intervention; workflow processing management, including logging and monitoring; and an extensible set of applications for various biomedical domains. For the new gateway we focused on the following additional functionalities, in particular to further support data handling:

1) Unified, secure, and easy access to data and related meta-data stored on heterogeneous infrastructures and repositories. Users should be able to transparently query, explore, process, and analyse data from a single interface, without bothering about the data location or format, or how it is retrieved for further processing.

2) Automatic data format conversion and preprocessing according to pre-defined rules. For example, pseudonymisation and format conversion are automatically performed when new data is imported into the system.

3) Automatic and interoperable file transport and processing on different infrastructures (e.g., data servers, grid, cloud). Low-level technical details, such as different communication protocols, middleware services, and authorization mechanisms, are hidden from the users.

4) Automatic collection of provenance information about the methods, parameters and input files used for processing.

5) A single sign-on facility to authenticate and authorize transparently to various computing and storage resources using user or community credentials.

In addition to these functionalities, we aimed for a gateway that is:

1) extensible, to easily accommodate new types of data or compute resources, applications, and user groups;

2) customizable, to be able to support different research communities and user profiles;

3) scalable, to gracefully support the growth of the user community and its needs for resources, as well as infrastructure capacity and heterogeneity; and

4) sustainable, by using a community-driven SG framework.

===V. System Design and Implementation===

Figure 1 illustrates the layered architecture of the new e-BioInfra Gateway. At the bottom is the Resource layer (dark orange), with several DCIs (e.g., local clusters, Grid and Cloud) and data resources (e.g., the Radiology research data server). These resources are utilized through Middleware Services contained in the second layer (light orange). High-level Services contained in the third layer (blue) provide an abstraction to interact with the middleware, such as workflow management and data transport. Finally, the Presentation layer (green) contains the interfaces for user interaction. The two topmost layers (green, blue) are implemented with two sets of components. On the right are the generic SG framework components provided by WS-PGRADE/gUSE. On the left is a new data-centric SG framework that complements the functionality of WS-PGRADE/gUSE for the specific case of NeuSG.

Figure 2 illustrates the systems that host these components and their network location. Due to security regulations for processing medical research data, some services are hosted inside the hospital firewall. The data is generated by the scanner and directly imported into a Data Server located inside the firewall, which keeps both the raw data and the meta-data. The e-BioInfra Gateway is located in the demilitarized zone (DMZ) of the AMC network, which means that only some of its services are visible from outside the network. In Figure 2, both users A and B can browse meta-data and start and monitor data processing via the gateway, but only user A can download and view the medical imaging data. The raw data itself can only be accessed by the user directly from the Data Server, or by privileged services of the e-BioInfra Gateway.

Fig. 2. Hosts and services of NeuSG and their network location: inside or outside the AMC firewall. The e-BioInfra Gateway is located in the demilitarized zone. User A is within the firewall boundaries and can access the data directly; user B is outside the firewall boundaries and therefore only has access to the meta-data and processing resources.

Below we further detail the components that are most relevant for a data-centric SG, namely the data services and the new components illustrated as white boxes in Figure 1. For completeness we briefly introduce the WS-PGRADE/gUSE SG framework, and finally, we describe the interaction between these components.

====A. Data Services====

Management of biomedical research data, with its growing size and complexity, requires domain-specific Information Management Systems (IMSs). Several IMSs address challenges such as the management of biomedical research data and meta-data, electronic data exchange, archival, and security. Research communities usually already use such systems routinely. Additionally, every community has its own procedure to implement rules and regulations regarding the protection of biomedical research data, as well as policies for data sharing and archiving. Therefore, instead of replicating such efforts, we decided to rely on existing, external biomedical research data and meta-data resources, as well as on their own security mechanisms and policies. In this way, the research community itself provides and manages the IMS. It defines data ownership and access policies, and it regulates data confidentiality and privacy methods such as pseudonymisation. The IMS is connected to the e-BioInfra Gateway by agreement between the community and the gateway providers, and the data becomes available for processing at the gateway for authorized users only.

A popular IMS for medical imaging data and meta-data is the eXtensible Neuroimaging Archive Toolkit (XNAT) [31]. XNAT is an open source IMS that offers an integrated framework for storage, management, electronic exchange, and consumption of medical imaging data and its complementary meta-data. XNAT provides a rich communication layer based on a RESTful API of resource-oriented web services. Due to these qualities, XNAT has been deployed in the Radiology department of the AMC and connected to NeuSG as the first supported IMS.

====B. WS-PGRADE/gUSE SG Framework====

The WS-PGRADE/gUSE SG framework [4] is an open source, workflow- and service-oriented framework that facilitates development, execution, and monitoring of scientific workflows on DCIs. It comprises the WS-PGRADE portal and the Grid User Support Environment (gUSE) services. WS-PGRADE is based on the Liferay portal framework, which provides rich facilities for community management and customizable user interfaces. gUSE provides high-level services to access various DCI resources. These qualities motivated the choice of this SG framework to implement our gateway.

The most relevant gUSE services for our gateway are:

* Job submission service or DCI-BRIDGE<sup>1</sup>: provides flexible and versatile access to a large variety of DCIs such as grids, desktop grids, clusters, clouds and service-based computational resources. It also handles authentication and authorization to the configured DCIs transparently.
* Workflow Interpreter: parses workflows, submits jobs to the DCI-BRIDGE, and retrieves their status for monitoring and fault-tolerance.
* Application Repository: stores ready-to-use, tested and configured workflows. These workflows are exported to the application repository by workflow developers, from where they are imported into user space for execution.
* gUSE Information System: stores configurations of gUSE services and workflow-related information such as workflow executions and the status of their jobs.

<sup>1</sup> According to [4], the DCI-BRIDGE has been moved out of the gUSE layer to highlight that it is directly accessible via the standard OGF BES job submission interface. Here we use a different conceptual framework to illustrate the architectural layers of the system, so we consider it part of the gUSE generic services.

The WS-PGRADE/gUSE framework also provides two Application Programming Interfaces (APIs) to create SG instances. We used the Application Specific Module (ASM) API to utilize two gUSE services: the Application Repository, to manage and share workflows among users, and the Workflow Interpreter, to submit workflows.

The WS-PGRADE portal also offers a set of generic portlets to interact with gUSE services via web-based graphical user interfaces. For example, users can manage their credentials, which are required to authenticate and authorize to DCIs, via security portlets. See [4] for the complete description of WS-PGRADE/gUSE services and portlets.

Currently the WS-PGRADE/gUSE framework does not have any facility to connect to external IMS resources. Moreover, its current data management facilities are also limited. The data-centric e-BioInfra Gateway tries to bridge this gap with the additional components described below.

====C. e-BioInfra Gateway Data-Centric Framework====

The core of the new e-BioInfra Gateway is made of the following components: e-BioInfra Catalogue (eCAT), Data Transport Service (DTS), Processing Manager (PM), and e-BioInfra Browser Portlet (eBrowser). They are loosely coupled and communicate via well-defined APIs. This approach paves the way towards a service-oriented architecture and facilitates their reuse in other gateways. These components are deployed in the same environment alongside the WS-PGRADE/gUSE components and work together to implement the NeuSG functionalities.

====D. e-BioInfra Catalogue (eCAT)====

The eCAT has been designed to facilitate the data management functionalities at the gateway. It is a central information store for user-specific configurations such as IMS hosts and the user's credentials to access them. eCAT defines and implements a data model to manage system-level information, with the following main entities: User, Project, Data, Application, Processing, and View preferences (see Figure 3 for their relationships).

eCAT provides an aggregated and user-specific view of the Data entities that each user can access on the registered IMSs. Note that eCAT is not meant to duplicate meta-data that is already stored on IMSs; instead, it only stores pointers to such information on IMSs. It retrieves and stores meta-data on IMSs through the respective IMS Plug-ins, which are software modules attached to eCAT to enable programmatic communication with a specific IMS. The only exceptions are some meta-data specific to user activities on the gateway. Research communities cannot store these in their IMSs, nor are they of direct interest to those communities. For example, the locations of data replicas on a DCI and user View preferences are such meta-data, and they are only stored in the eCAT database.

Data entities are included in, and processed within, the scope of Project entities. When possible, Projects are also kept in sync with those on IMSs. Each User has access to some Applications, which are tested and ready-to-use workflows. When a User processes a certain Data with a specific Application, the information about this activity is captured by eCAT as a Processing entity. The eCAT database also stores the provenance of each Processing: the Data consumed and produced, the parameters, and the latest processing status. eCAT also provides the information necessary to transport the results produced by a data processing to the respective IMS, if possible together with the provenance information. eCAT is accessed by other system components (PM, DTS, and eBrowser) through its API.

====E. Processing Manager (PM)====

The PM takes care of submission and monitoring of data processing applications, which are defined as workflows that are executed by the gUSE Workflow Interpreter. The PM instructs the DTS to transport input files from the IMSs to the storage resources of the DCI on which the processing is performed, and to transport the results back to the IMS. The PM imports the workflow from the gUSE Application Repository and configures it with the physical location of the input data before submission.

====F. Data Transport Service (DTS)====

The DTS transports data between IMSs and storage resources on DCIs. This service contacts eCAT to determine how to authenticate to the IMS on behalf of the user, how to authenticate with the storage resources of the DCI (possibly with community credentials), and how to access data on both. It autonomously performs the data transfer, using third-party transfer mechanisms as much as possible to avoid bottlenecks. If some data has been replicated on a DCI, the location of that replica is stored in eCAT and retrieved later.

====G. e-BioInfra Browser Portlet (eBrowser)====

Unlike the previous components, the eBrowser is part of the presentation layer. It provides a web-based user interface to interact with all the e-BioInfra generic services. Instead of contacting the services directly, eBrowser retrieves information from eCAT to provide scientists with a unified view for browsing data, projects and data processing instances. eBrowser essentially enables scientists to start, manage, and monitor data processing (through the PM), as well as to configure viewing and interaction preferences with the gateway.

====H. Component Interactions====

Figure 4 illustrates the interactions between users and the e-BioInfra Gateway, as well as the interactions between underlying components. User actions are expressed via the eBrowser. They trigger interactions between the other high-level components (i.e., PM, DTS and eCAT) and the lower-level components (i.e., gUSE and the XNAT IMS). Details of these interactions are presented below.

Upon successful authentication with the WS-PGRADE portal, the user gets access to the eBrowser portlet. New users need to configure an IMS endpoint by providing the URL of the IMS and its type (e.g., XNAT), and by recording their username and password securely. These configurations are collected by the eBrowser and sent to eCAT for validation and storage. After this configuration step, the following takes place when the user logs into the e-BioInfra Gateway:

1) At first the user sees a list of her projects.
To display this list, the eBrowser sends a request to eCAT, which authenticates on behalf of the user to all registered IMSs and generates a unified list of all projects that are accessible to that particular user.
2) Similarly, when the user selects a project, the eBrowser sends a request to eCAT, which queries meta-data on the IMS to produce the list of all data entries in that project.
3) The user then selects the data entities that she wishes to process, and browses for available applications. The eBrowser retrieves and displays the list of applications accessible to the user. The user selects an application and the eBrowser displays the configuration for that application (e.g., application parameters).
4) The user configures the application and starts a new data processing. The eBrowser collects the provided configuration and submits a processing request to the PM. The PM consults eCAT to find the details of the selected application, namely the DCI on which to run it and the arguments that need to be configured for its execution (e.g., input files and parameters). The PM validates the request and creates the processing entity in eCAT, which the eBrowser can later retrieve and display to the user for browsing, management, and monitoring purposes.
5) The PM then instructs the DTS to move the required input data to the target DCI. The DTS contacts eCAT to determine whether those data already have a replica on the target DCI. If no replica is available, eCAT provides the DTS with the IMS endpoint configuration (including an authentication token) and the location from which it can retrieve the input data. The DTS then uses this information to authenticate on behalf of the user to the IMS and download the input data. Similarly, it retrieves user authentication tokens for the target DCI to upload the input data (not shown in the diagram). Finally, the DTS registers in eCAT the location of the file replica on the DCI and returns it to the PM.
6) After all data have been staged to the target DCI, the PM imports the application from gUSE via the ASM API into the user space, and configures it with the physical location of the input data and the user-specified parameters.
7) With everything in place, the PM starts the data processing by submitting the configured application (workflow) to gUSE via the ASM API, and updates the processing status in eCAT. The gUSE Workflow Interpreter parses the workflow, generates the corresponding jobs, and submits them to DCI-BRIDGE. DCI-BRIDGE retrieves user authentication tokens for the target DCI and submits the jobs on behalf of the user to the target DCI.
8) The PM periodically updates the information in eCAT based on the status reports from gUSE, which is then reflected in the eBrowser interface for monitoring.
9) Typically, each processing covers multiple data items. As soon as some of the items have been processed, their results are stored in the corresponding IMS via the DTS. Thereby the user can check results even before the entire processing is complete.
10) The user browses, manages, and monitors the processing via the eBrowser, which contacts eCAT to get information about processing entities, including their status.
11) The user is forwarded directly to the IMS to access and download processing results via a link in the gateway interface.

Fig. 3. Simplified entity-relationship model of the information stored in the e-BioInfra Catalogue.

Fig. 4. Sequence diagram illustrating the interactions between the user and the various NeuSG components. After authentication, the user can browse projects, select data based on meta-data, select an application to run on these data and start new processing, monitor processing, and download results.

VI. DISCUSSION

In the new generation of the e-BioInfra Gateway we tried to bridge the gap between scientists, data services, and DCIs. We aimed for a data-centric gateway in which everything is organized around "data". Scientists can now use the gateway not only to browse their data, which may be stored on several IMSs and described by rich meta-data, but also to perform large-scale data processing on DCIs. This can be done without getting involved in low-level details such as transporting files, as was the case in the previous generation. The previous generation was built on the Spring framework, only supported the Dutch Grid infrastructure, and lacked facilities for user interface customization and community support. In contrast, the new generation of the e-BioInfra Gateway is built on the WS-PGRADE/gUSE SG framework, which is itself built on the Liferay portal framework. Liferay provides facilities for user management, community management, and community support (e.g., an online forum). It also facilitates the construction of customizable web-based user interfaces, which are required to suit the needs of each user (community) based on their profile, expertise, and roles. The WS-PGRADE/gUSE SG framework provides high-level generic services to manage workflows, enact them on various DCIs, and monitor their execution. These services allow for functional scalability and interoperability between various DCIs. Additionally, WS-PGRADE/gUSE is an actively maintained and developed open-source project. This allows the e-BioInfra Gateway development team to concentrate on its community-specific features, and makes the gateway operation more sustainable.

Currently only XNAT is supported as an IMS. Several other data management platforms meet the research requirements, but XNAT is of special interest because of its support for medical imaging and its adoption by the AMC neuroscience research community. It has been designed specifically for managing standard medical imaging data as the core of its functionality. In addition, its archiving and integration capabilities, data model flexibility, ease of use, and highly active community of users and developers make it a valuable asset. Note, however, that the new e-BioInfra Gateway has been designed to support multiple and heterogeneous IMSs, and it is not dependent on XNAT.

The eCAT contains extensive system-level meta-data (for viewing and processing), but it depends completely on an external IMS for the data. If the IMS is not available, the user cannot perform any data-related activity, such as browsing or selecting files. We considered duplicating the meta-data in the eCAT, for both efficiency and fault-tolerance reasons, but concluded that synchronizing the two systems would be too time-consuming. Moreover, we chose to keep access control to the Data Server completely in the hands of the community administrators, who, due to the required expertise, may be different people from the SG administrators. This helped us build trust between the systems, which is a known critical factor in connecting such systems to open infrastructures such as grids and clouds.

We used WS-PGRADE/gUSE as the SG framework, which in principle provides the workflow management and portal functionalities needed for the NeuSG. After a learning phase, during which the team came to better understand the concepts of the framework, we observed that in some cases its usage model differs from our needs. This led us to develop our own processing manager component, whose goal is to translate high-level "data processing" commands into low-level data transports, performed by the data transport service, and calls to the gUSE ASM API. This initially introduces overhead, but it also provides sufficient isolation from the specifics of this particular WfMS, and allows us to consider other WfMSs in the future.
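The code below sketches steps 4 to 8, and the processing manager's role in translating a high-level "data processing" request into data transports and workflow submissions. It is a minimal Python sketch under stated assumptions, not the gateway's implementation. `Catalogue`, `TransportService`, `run_processing`, and `fake_submit` are hypothetical in-memory stand-ins for eCAT, the DTS, the PM, and the gUSE ASM API. The application name `dti-preproc`, the DCI label `dutch-grid`, and the data identifiers are invented for illustration. The sketch does not model authentication, the actual IMS download and DCI upload, or the per-item upload of results (step 9).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Catalogue:
    """Hypothetical in-memory stand-in for eCAT."""
    # application name -> {"dci": target DCI, "params": required parameter names}
    applications: Dict[str, dict] = field(default_factory=dict)
    # (data id, DCI) -> location of a replica already staged on that DCI
    replicas: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # processing id -> processing entity that the viewer would display
    processings: Dict[int, dict] = field(default_factory=dict)

    def create_processing(self, app: str, inputs: List[str], params: dict) -> int:
        pid = len(self.processings) + 1
        self.processings[pid] = {"app": app, "inputs": list(inputs),
                                 "params": dict(params), "status": "CREATED"}
        return pid

    def set_status(self, pid: int, status: str) -> None:
        self.processings[pid]["status"] = status


class TransportService:
    """Hypothetical DTS: reuses a registered replica or stages a new one."""

    def __init__(self, catalogue: Catalogue) -> None:
        self.catalogue = catalogue
        self.transfers = 0  # number of simulated IMS -> DCI copies

    def stage(self, data_id: str, dci: str) -> str:
        key = (data_id, dci)
        if key not in self.catalogue.replicas:
            # Simulated download from the IMS and upload to the DCI,
            # followed by registering the new replica in the catalogue.
            self.transfers += 1
            self.catalogue.replicas[key] = f"{dci}://replicas/{data_id}"
        return self.catalogue.replicas[key]


Submitter = Callable[[str, List[str], dict], Iterable[str]]


def run_processing(cat: Catalogue, dts: TransportService, submit: Submitter,
                   app: str, inputs: List[str], params: dict) -> int:
    """Sketch of the PM handling one processing request (steps 4 to 8)."""
    spec = cat.applications[app]
    missing = [p for p in spec["params"] if p not in params]
    if missing:  # validate before anything is registered in the catalogue
        raise ValueError(f"missing parameters: {missing}")
    pid = cat.create_processing(app, inputs, params)
    locations = [dts.stage(d, spec["dci"]) for d in inputs]
    cat.set_status(pid, "STAGED")
    # Each status report from the workflow system is mirrored into the catalogue.
    for status in submit(app, locations, params):
        cat.set_status(pid, status)
    return pid


def fake_submit(app: str, locations: List[str], params: dict) -> Iterable[str]:
    """Stand-in for submission through the gUSE ASM API plus status polling."""
    yield "SUBMITTED"
    yield "RUNNING"
    yield "FINISHED"


cat = Catalogue(applications={"dti-preproc": {"dci": "dutch-grid", "params": ["threshold"]}})
dts = TransportService(cat)
pid1 = run_processing(cat, dts, fake_submit, "dti-preproc", ["scan-1", "scan-2"], {"threshold": 0.2})
pid2 = run_processing(cat, dts, fake_submit, "dti-preproc", ["scan-2", "scan-3"], {"threshold": 0.3})
print(cat.processings[pid2]["status"], dts.transfers)  # prints: FINISHED 3
```

Replicas are recorded in the catalogue, so the second request copies only `scan-3` and reuses the replica of `scan-2`. This mirrors the replica check in step 5. The submission step sits behind a single callable, which reflects the isolation from a particular WfMS that motivated a separate processing manager.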
The development of the eBrowser viewing portlets was also simplified by the decision to have all user interaction take place using information available in the eCAT. This approach requires all software components to register all their activity in the eCAT, but it decouples the viewer from all the other components. This reduces dependencies between the system components and simplifies their implementation and maintenance. Moreover, it makes the eCAT a natural provenance data repository for the activity carried out at the gateway.

VII. CONCLUSION AND FUTURE WORK

The implementation is being completed, and the new gateway will be released soon (April) for evaluation by AMC BIC users. The portfolio of applications will be enriched (currently there are only two), and the eBrowser will be extended (currently only basic browsing functionality is available). In a second step, the gateway will be disseminated in training events and opened to the whole neuroscience community of the University of Amsterdam. This step will require the inclusion of other IMSs, for example other XNAT instances or even other systems, as well as extending the eCAT with federated services for accessing (and/or querying) multiple IMSs. The growing number of users and data will require further development of instruments for strong community support, communication, and access control, some of which are supported by Liferay. Moreover, semantic content annotation (ontologies), together with adding knowledge and integrating it with existing data, could enable further automation of data processing to further reduce human intervention in the analysis of large quantities of biomedical data.

Finally, we kept bioinformatics researchers in the loop during the requirements analysis, design, and implementation of the gateway to ensure that the resulting SG is generic enough to support the bioinformatics research community with minimal additional effort. Although this paper focuses on computational neuroscience applications, the same concept and software components are being used to develop an SG for the analysis of DNA sequencing data.

ACKNOWLEDGMENT

This work is financially supported by the COMMIT project "e-Biobanking with imaging for healthcare" funded by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Netherlands Organisation for Scientific Research, NWO), the SCI-BUS project funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 28348, and the HPCN UvA project "Computational Neuroscience Gateway" funded by the University of Amsterdam.

REFERENCES

[1] D. De Roure et al., “The semantic grid: Past, present, and future,” Proceedings of the IEEE, vol. 93, no. 3, pp. 669–681, March 2005.
[2] T. Hey and A. E. Trefethen, “Cyberinfrastructure for e-science,” Science, vol. 308, no. 5723, pp. 817–821, 2005.
[3] EGI Science Gateway Virtual Team, Science Gateway Primer. EGI (European Grid Infrastructure), 2012.
[4] P. Kacsuk et al., “WS-PGRADE/gUSE Generic DCI Gateway Framework for a Large Variety of User Communities,” Journal of Grid Computing, vol. 10, no. 4, pp. 601–630, 2012.
[5] S. Maddineni et al., “Distributed Application Runtime Environment (DARE): A Standards-based Middleware Framework for Science-Gateways,” Journal of Grid Computing, vol. 10, no. 4, pp. 647–664, 2012.
[6] S. Shahand et al., “A grid-enabled gateway for biomedical data analysis,” Journal of Grid Computing, vol. 10, no. 4, pp. 725–742, 2012.
[7] “The BiG Grid Project website,” http://www.biggrid.nl.
[8] B. D. Peters et al., “Polyunsaturated fatty acid concentration predicts myelin integrity in early-phase psychosis,” Schizophrenia Bulletin, 2012.
[9] G. A. van Wingen et al., “Persistent and reversible consequences of combat stress on the mesofrontal circuit and cognition,” Proceedings of the National Academy of Sciences, vol. 109, no. 38, pp. 15508–15513, 2012.
[10] A. Rienstra et al., “Symptom validity testing in memory clinics: Hippocampal-memory associations and relevance for diagnosing mild cognitive impairment,” Journal of Clinical and Experimental Neuropsychology, 2012.
[11] B. de Kwaasteniet et al., “Relation between structural and functional connectivity in major depressive disorder,” Biological Psychiatry, 2013.
[12] “The BIC (Brain Imaging Center) at the AMC (Academic Medical Center) website,” http://www.lebic-amc.nl.
[13] T. Kiss, “Science gateways for the broader take-up of distributed computing infrastructures,” Journal of Grid Computing, vol. 10, pp. 599–600, 2012.
[14] “The XSEDE (Extreme Science and Engineering Discovery Environment) website,” http://www.xsede.org.
[15] “EGI (European Grid Infrastructure) Science Gateways,” http://www.egi.eu/services/support/science-gateways/index.html.
[16] “The SCI-BUS (SCIentific gateway Based User Support) Project website,” http://www.sci-bus.eu.
[17] T. Glatard et al., “A virtual imaging platform for multi-modality medical image simulation,” IEEE Transactions on Medical Imaging, vol. 32, no. 1, pp. 110–118, Jan. 2013.
[18] J. Wu et al., “The Charité grid portal: User-friendly and secure access to grid-based resources and services,” Journal of Grid Computing, vol. 10, pp. 709–724, 2012.
[19] T. Wassenaar et al., “WeNMR: Structural Biology on the Grid,” Journal of Grid Computing, vol. 10, pp. 743–767, 2012.
[20] S. Gesing et al., “A single sign-on infrastructure for science gateways on a use case for structural bioinformatics,” Journal of Grid Computing, vol. 10, pp. 769–790, 2012.
[21] E. Sciacca et al., “VisIVO Workflow-Oriented Science Gateway for Astrophysical Visualization,” in Proceedings of the 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2013.
[22] P. Kunszt et al., “The Swiss grid proteomics portal,” in Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering, 2011.
[23] M. Fargetta et al., “A data engine for grid science gateways enabling easy transfer and data sharing,” presentation at the EGI Community Forum 2012, March 2012.
[24] “The CHAIN (Co-ordination and Harmonisation of Advanced e-INfrastructures for Research and Education Data Sharing) Project website,” http://www.chain-project.eu.
[25] “The CHAIN Science Gateway,” http://science-gateway.chain-project.eu.
[26] V. Ardizzone et al., “The DECIDE science gateway,” Journal of Grid Computing, vol. 10, no. 4, pp. 689–707, 2012.
[27] “The N4U (neuGRID for you) Project website,” http://neugrid4you.eu.
[28] T. Glatard et al., “Flexible and Efficient Workflow Deployment of Data-Intensive Applications On Grids With MOTEUR,” International Journal of High Performance Computing Applications, vol. 22, no. 3, pp. 347–360, Aug. 2008.
[29] S. Shahand et al., “Front-ends to Biomedical Data Analysis on Grids,” in Proceedings of HealthGrid 2011, Bristol, UK, 2011.
[30] S. Shahand et al., “Integrated Support for Neuroscience Research: from Study Design to Publication,” in Proceedings of HealthGrid 2012, Amsterdam, NL, May 2012.
[31] D. Marcus et al., “The extensible neuroimaging archive toolkit,” Neuroinformatics, vol. 5, pp. 11–33, 2007.