Introducing The Neuroscience Gateway

Subhashini Sivagnanam, Amit Majumdar, Kenneth Yoshimoto, Vadim Astakhov, Anita Bandrowski, MaryAnn Martone
University of California San Diego, La Jolla, CA, USA

Nicholas T. Carnevale
Yale School of Medicine, New Haven, CT, USA

Abstract— The last few decades have seen the emergence of computational neuroscience as a mature field in which researchers model complex and large neuronal systems and require access to high performance computing machines and associated cyberinfrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are implemented for parallel computers and are suitable for high performance computing machines. But using these tools on such machines remains a challenge because of the need to acquire computer time on machines located at national supercomputer centers, to deal with the complex user interfaces of these machines, and to handle data management and retrieval. The Neuroscience Gateway is being developed to alleviate all of these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all of the administrative and technical barriers, makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines, and handles the running of jobs and data management and retrieval. This paper describes the architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research with high performance computing at the back end.

Keywords—computational neuroscience, science gateway, high performance computing

I. INTRODUCTION

Computational neuroscience has seen tremendous growth in recent years. This is evident from the large number of publications in prestigious neuroscience journals that are increasingly based on modeling. In the last two decades, this has motivated the development of simulation tools such as NEURON [1], GENESIS3 [2], MOOSE [3], NEST [4], PyNN [5] and Brian [6], which are mature and written for parallel computers. Complex neuroscience problems, which involve network models, optimization, or exploration of high dimensional parameter spaces, require access to high performance computing (HPC), data storage, and complex workflows. Yet accessing and using HPC resources remains difficult: users must write yearly allocation proposals to acquire computer time at national supercomputer centers, understand complex HPC architectures, learn complex operating systems and software, install neuronal software optimally, learn site policies and batch environments, manage data transfer and retrieval, and understand remote authentication issues. As a solution to this problem we have developed a community infrastructure layer, i.e. a science gateway [7], specifically for computational neuroscientists, that abstracts away the technical and administrative details of the underlying hardware. As a result, neuroscience researchers gain easy access to the best neuroscience simulation and analysis packages running on large scale HPC resources. The Neuroscience Gateway (NSG) [8] started its friendly user phase in December 2012, during which users who had been recruited earlier to help test the NSG were introduced to it. Since late December 2012 we consider it to be in production, and we continue to add more neural simulation tools. This paper describes the NSG software architecture, how it is implemented, the impact it has had so far, and future plans.
II. MOTIVATION AND BACKGROUND

Most computational neuronal modeling projects start "small" and many stay "small," in the sense of being accommodated by individual desktop systems, but many eventually outstrip the speed and/or storage capabilities of local hardware. This is most often the case with projects that involve complex models (especially large scale networks) or complex protocols (often involving learning rules), or that require high-dimensional optimization or parameter space exploration. Such projects have tremendous potential to use cyberinfrastructure (CI), but only a very few neuroscientists have been able to perform simulations on extreme scale HPC machines [9, 10, 11]. There is a broad consensus that the wider computational neuroscience community needs access to complex CI and HPC resources. Those who lack such access are at a significant disadvantage relative to the very few who have it.

This disparity is particularly telling in light of the fact that widely used simulators such as NEURON, GENESIS3, MOOSE, NEST, Brian and PyNN have been implemented on and released for parallel hardware, including HPC machines, for several years now. The typical neuroscientist - even one who is engaged in computational modeling - has limited compute resources, usually only a few desktop machines. An investigator whose project requires HPC machines must write a yearly proposal for an allocation of supercomputer time. In the US these proposals are peer reviewed for computer time on the National Science Foundation (NSF)'s HPC machines, and there is no guarantee that the requested resource will be made available. If the proposal succeeds, the next task is to install the simulation software (NEURON, GENESIS 3, MOOSE, NEST, PyNN, Brian) optimally on the supercomputer, then use the batch system to configure and run the model on the cluster, and finally deal with output data retrieval and storage. These steps involve many vexing details that are not only time consuming and require knowledge of HPC and IT, but also differ significantly from facility to facility. Investigators who want to use neuronal software directly on HPC resources have to deal with these issues themselves. The entire process constitutes a barrier that impedes effective utilization of HPC resources and distracts neuroscientists from their primary research goals. We believe that many of these issues are also encountered by neuroscientists in other parts of the world, such as in Europe and Asia. Our proposed solution enables the entire community of computational neuroscientists to access NSF (and other) funded national HPC resources transparently through a common, convenient interface that is already configured for optimal use and operates within a common spatial and semantic framework. The benefit of this infrastructure can be extended to many more end users, allowing investigators to focus on their research and fostering collaboration.

In recent years user-friendly science gateways have been developed successfully for many research fields and have resulted in tremendous adoption of CI and HPC by the broader user communities of those fields. A few examples of such gateways in the US include the nanoHUB gateway [12] for nanotechnology, the CIPRES gateway for phylogenetics research [13], the GRIDCHEM gateway [14] for computational chemistry, and the ROBETTA gateway [15] for protein structure prediction. Similarly, many gateways exist in Europe and Asia. In the US most of these gateways are funded by the NSF for specific scientific domains and primarily utilize NSF's Extreme Science and Engineering Discovery Environment (XSEDE) [16] or the Open Science Grid (OSG) [17] for accessing HPC resources. In addition to NSF funded science gateways, there are gateways funded by other organizations, such as the US Department of Energy (DOE) at various DOE laboratories. Similarly, outside of the US there are many e-infrastructures and gateways available for different domain sciences. Specific to neuroscience there is neuGRID [18], which offers a science gateway for neuroscientists to facilitate the development of image analysis pipelines using HPC resources, aiming to accelerate research on Alzheimer's and other neurodegenerative disease markers. The Blue Brain project [19] is further developing its own portal environment.
III. NSG ARCHITECTURE

The NSG architecture is based on the existing CIPRES Science Gateway framework [20], which has been very successful and popular for building the phylogenetics gateway as well as other gateways such as the PoPLAR (Portal for Petascale Lifescience Applications and Research) gateway [21]. CIPRES is a very mature framework and was implemented at the San Diego Supercomputer Center (SDSC), by SDSC researchers and programmers, using the open source Workbench Framework (WF) [22]. Below we describe the software architecture components of the WF, their functional adaptation specifically for the NSG, and the current usage monitoring and metrics of the NSG.

A. Underlying Architecture

The WF is a software development kit (SDK) designed to generically deploy analytical jobs and database searches to a generic set of computational resources and databases. The WF contains modules to manage submission of jobs to analytical tools on various computational resources, and modules to manage queries on data resources. The higher level schematic of the WF architecture, as described in a diagram by the WF project developers [22], is shown in Fig. 1. It is the basic software architecture of the NSG. The modules in the WF are as follows:

Presentation Layer: The WF Presentation Layer accesses SDK capabilities through the J2EE front controller pattern, which involves only two Java classes. As a result, the WF is neutral with respect to interface access. The presentation layer provides lightweight access through a web browser, preserves flexibility for alternative access routes, and adopts an architecture based on Linux, Apache Tomcat, MySQL, and Java Struts2. The browser interface is based on the look and feel of popular email clients and supports data and task management in user-created folders. The complexity of the gateway layer is hidden from users through this interface, which also allows users to create a login-protected personal account. Registered users can store their data and records of their activities for a defined period of time. Uploaded user data is checked for format compatibility. Users can also manually specify data types and formats.

User Module: The User Module passes user-initiated queries and tasks from the interface to the executive portions of the infrastructure via data and task management modules. It also stores all user data and task information in a MySQL database. It supports individual user roles, permitting the assignment of individual user accounts, the sharing of data between accounts, and selective access to tools and data sources that may be proprietary. Mapping of user information takes place at this layer, which helps track individual usage of computational resources.

Broker Module: The Broker Module provides access to all application-specific information in a Central Registry. This Registry contains information about all data types required as input and output for each application, along with the formats accepted by each tool. Concepts and concept relationships are formulated in XML files and read by classes that implement the Central Registry API. Defining tools and data types in a single location allows new tools and data types to be added with no impact on the functioning of the application outside the Registry.

Tool Module: The Tool Module manages the translation of tasks submitted by users into command lines and the submission of those command line strings, along with user data, for execution by appropriate compute engines. The Tool Module handles data formatting for jobs and job staging. It also keeps track of which tools can be run on which computational resources, and of the status of those resources. The design allows great flexibility in determining what assets the NSG can access for job execution. Computational resources can be added by editing the tool resource configuration file, and the application can send command line scripts and receive output via essentially any well-defined protocol (e.g. Unix command line, web services, SSH, DRMAA, GRAM, gsissh, etc.).

External Resources: The generic design of the WF architecture supports access to a wide variety of computational resources and databases, whether local or remote. Access can be accomplished through a combination of individual mechanisms, including SSH, GRAM/Globus, SOAP, and REST services.

Fig. 1. Workbench Framework architecture (from the WF project page [22]).
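To make the Broker and Tool Module roles concrete, the following minimal sketch shows how a central registry entry can drive the translation of a user task into a resource-specific command line. The real WF describes tools in PISE XML and is implemented in Java; the Python dictionary, the field names, and the command templates below are illustrative assumptions, not the NSG's actual configuration.

```python
# Illustrative stand-in for the Central Registry / tool resource configuration.
# Field names, resource names, and command templates are hypothetical.
TOOL_REGISTRY = {
    "NEURON_7.3": {
        "resources": ["trestles", "lonestar"],            # where the tool is installed
        "command": "mpirun -np {cores} nrniv -mpi {entry_script}",
        "input_format": "zip",                            # accepted input data type
    },
    "PGENESIS_2.3": {
        "resources": ["trestles"],
        "command": "mpirun -np {cores} pgenesis {entry_script}",
        "input_format": "zip",
    },
}

def build_command(tool_id: str, resource: str, cores: int, entry_script: str) -> str:
    """Translate a user task into the command line handed to the compute engine."""
    tool = TOOL_REGISTRY[tool_id]
    if resource not in tool["resources"]:
        raise ValueError(f"{tool_id} is not installed on {resource}")
    return tool["command"].format(cores=cores, entry_script=entry_script)

if __name__ == "__main__":
    print(build_command("NEURON_7.3", "trestles", 64, "init.hoc"))
    # -> mpirun -np 64 nrniv -mpi init.hoc
```

Because the registry is the single place where tools, resources, and formats are described, adding a new simulator or compute resource amounts to adding an entry of this kind rather than changing application logic.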
B. CIPRES Adaptation for the NSG

The adaptation of the CIPRES WF architecture to the NSG was done with the goal of hiding all the complexities associated with accessing and using an HPC resource, such as job submission, input data transfer, choice of machine-specific HPC parameters, and output retrieval and storage. Fig. 2 shows the high level functional diagram of this adaptation. Though the NSG's initial software design was based on the CIPRES WF architecture, our implementation contained enhancements and modifications of the existing software based on the needs of the neuroscience community. The hardware needed for setting up the NSG utilizes SDSC's reference VM server, MySQL, Cloud storage, and web services. The latest software version of the WF architecture was obtained from the SVN repository maintained by the CIPRES developers. The following is the list of key adaptations made to the existing CIPRES code base for the NSG:

1. Addition of uuencode/uudecode functionality to support upload of input files in zip format
2. Modification of the job submission environment to accommodate compilation of NEURON code
3. Storage of output files per session in SDSC's Cloud storage
4. Automatic deletion of unused user files based on length of inactivity
5. Definition of computational neuronal tools in the PISE XML format and their interfacing with the portal

Fig. 2. Functional diagram of the NSG architecture.

Some of the core functional implementation changes are discussed below.

Access: A web browser serves as the entry point to the NSG portal. The web browser offers a simple interface that allows users to upload input files or neuronal models, specify neuronal code specific input files, and specify job submission parameters such as the number of cores or nodes and the expected wall clock time for job completion. Users are able to monitor the status of submitted jobs and retrieve output files from the user-friendly portal.

Though a community gateway account is used for job submission, individual user accounts are necessary to keep track of usage and access. Some of the neuronal simulation tools, such as NEURON, require that users be able to use the higher order calculator (hoc) [23] programming language as a scripting language for neuronal models. However, because of the possibility of malicious or incorrect use or handling of hoc code, which poses a security concern, direct user registration on the NSG is not allowed. Users are required to fill out a form with identity information, which allows NSG administrators to validate the user manually ("vetting") prior to creating their account. Once registered, the NSG can track each individual user's access and usage, as well as enforce NSG usage policies. The account information and usage records are stored in the NSG MySQL database at SDSC.

Installation of computational neuroscience tools: Currently NEURON 7.2, NEURON 7.3, PGENESIS 2.3, NEST, Brian, and PyNN have been installed on SDSC's Trestles HPC machine and are being installed on TACC's (Texas Advanced Computing Center) [24] Lonestar HPC machine. These codes are available through the NSG for the neuroscience community. We are also in the process of installing the MOOSE tool on the SDSC Trestles machine. Based on input from users, additional tools will be installed in the future.

User input files and job distribution: Most computational neuroscience models have more than one input file, from sources such as ModelDB [25]. To accommodate this requirement, we have added the capability for NSG users to upload input files in zip format. Many other science gateways use flat text files as input and precompiled executables to run their jobs, and the existing WF architecture can only handle input data that is not binary. For the NSG we therefore had to implement functionality to uuencode the uploaded zip file and to uudecode it on the computational resource during input staging. The NSG allows compilation and running of users' code based on the requirements of the neuroscience application (e.g. NEURON, GENESIS 3, MOOSE, NEST, Brian and PyNN). NEURON allows custom C++ code to be used for new equations within a particular model. To accommodate this, we created a mechanism to collect all such code (located in .mod files) and compile it as a part of the job submission process. Job scripts are automatically created and submitted. Once a job completes, the working directory is compressed along with the input files, job submission scripts, and output files, and is transferred to SDSC's Cloud storage. A simplified sketch of these staging and compilation steps is shown below.
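The following sketch, which is not part of the NSG code base, illustrates the two steps just described under stated assumptions: uuencoding a binary zip archive so it can pass through a text-only staging path, and generating a batch script that decodes the archive, compiles any user-supplied .mod files, and launches the parallel run. The PBS directives, file names, and paths are placeholders rather than the actual Trestles or Lonestar configuration; nrnivmodl and "nrniv -mpi" are standard NEURON commands for compiling .mod files and running in parallel.

```python
import binascii

def uuencode_file(src_path: str, dst_path: str) -> None:
    """uuencode a binary zip archive so it survives a text-only staging path."""
    with open(src_path, "rb") as src, open(dst_path, "w") as dst:
        dst.write(f"begin 644 {src_path}\n")
        while True:
            chunk = src.read(45)              # uuencode operates on 45-byte chunks
            if not chunk:
                break
            dst.write(binascii.b2a_uu(chunk).decode("ascii"))
        dst.write("`\nend\n")

# Hypothetical batch script template; directives and file names are placeholders.
JOB_TEMPLATE = """#!/bin/bash
#PBS -l nodes={nodes}:ppn={ppn}
#PBS -l walltime={walltime}
cd $PBS_O_WORKDIR
uudecode model.zip.uu && unzip -o model.zip   # recover the user's input bundle
nrnivmodl                                     # compile user-supplied .mod files
mpirun -np {cores} nrniv -mpi {entry_script}  # launch the parallel NEURON run
"""

def make_job_script(nodes: int, ppn: int, walltime: str, entry_script: str) -> str:
    """Render the job script that the gateway generates and submits for the user."""
    return JOB_TEMPLATE.format(nodes=nodes, ppn=ppn, walltime=walltime,
                               cores=nodes * ppn, entry_script=entry_script)

if __name__ == "__main__":
    uuencode_file("model.zip", "model.zip.uu") if False else None  # example call only
    print(make_job_script(2, 32, "01:00:00", "init.hoc"))
```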
The compressed file is also made available for immediate download through the NSG portal. File staging is handled via the Java CoG Kit GridFTP API, and Java runtime exec of Globus "gsissh" is used to run commands remotely. While a job is running on the HPC cluster, an intermediate results view option is available in the portal, which gives a snapshot of the working directory created on the backend HPC cluster. Advanced users can look at the intermediate results folder to see whether their job has started or whether any output file has been written. Another notable feature is the ability to clone a job on the portal. Users are able to clone their jobs, which is helpful when they want to submit a job with the same input file but vary parameters such as the number of cores or the wall clock time. This is particularly helpful in parallel scaling performance studies of neuronal tools on HPC machines.

Storage and data retrieval: The output data is saved as a zip file and is made available on the portal. An email notification is sent to users when a job completes. This is handled by a curl command in the job submission script, which notifies a servlet in the web application when the job finishes. In case of curl failure, two daemon processes named "checkJobsD" and "loadResultsD" check which jobs have finished and transfer the results to the NSG. The NSG also moves data from the HPC resource's scratch disk space to SDSC's Cloud storage for archival storage and employs a storage policy based on when data was last accessed.
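A minimal sketch of this completion-notification pattern is given below. The callback URL, the use of qstat as the batch-status command, and the result-handling step are illustrative assumptions; the real NSG notifies a servlet in its web application and relies on the checkJobsD and loadResultsD daemons as the fallback path.

```python
import subprocess
import time

# Placeholder callback; the real NSG notifies a servlet in its web application.
CALLBACK = "curl -s https://nsg.example.org/portal/taskdone?jobid=$PBS_JOBID\n"

def add_completion_callback(job_script: str) -> str:
    """Append the curl notification as the final line of the generated batch script."""
    return job_script + CALLBACK

def poll_for_finished_jobs(tracked_job_ids, interval_s=300):
    """Fallback in the spirit of checkJobsD/loadResultsD: if the curl callback is
    lost, periodically ask the batch system which tracked jobs have left the queue."""
    while tracked_job_ids:
        queue = subprocess.run(["qstat"], capture_output=True, text=True).stdout
        for job_id in list(tracked_job_ids):
            if job_id not in queue:            # job left the queue; treat it as finished
                tracked_job_ids.remove(job_id)
                print(f"{job_id} finished: stage results back and e-mail the user")
        time.sleep(interval_s)
```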
C. Allocation and Policies

An initial allocation, called a Startup allocation within the XSEDE allocation process, was obtained for 50,000 core hours on SDSC's Trestles and 50,000 core hours on TACC's Lonestar machines. Utilizing the XSEDE allocation process (XRAC) [26], we obtained community gateway accounts on NSF high performance computing resources, which include Trestles and Lonestar. Additional computational resources will be added based on user demand.

Users of the community gateway account abide by the policies set by the NSG administrators. Currently we allow 5000 core hours per user per year. Based on the total amount of computer time acquired every year for the NSG and the total number of NSG users, we will decide what percentage of the total time can be allocated freely to each user, and we will monitor their usage. The NSG also has the capability to allow users to run jobs under their own allocations. For assessment of the impact on identifiable members of the community, an individual gateway user's tag will be propagated through the batch system, so that the final job accounting reconciliation process will report quantitative usage by those individual gateway users. A ticketing system is in place and is used to keep track of user questions and provide immediate assistance.
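The per-user cap works out to a simple bookkeeping check, sketched below. This is illustrative only: the NSG keeps account and usage records in its MySQL database rather than in the in-memory dictionary used here, and the enforcement path involves the batch-system accounting described above.

```python
# Hypothetical illustration of the 5000 core-hour per-user, per-year policy.
ANNUAL_LIMIT_CORE_HOURS = 5000

usage_core_hours = {"user_a": 4500.0, "user_b": 150.0}   # sample usage records

def can_submit(user: str, cores: int, wallclock_hours: float) -> bool:
    """Reject a job whose requested core hours would push the user past the yearly cap."""
    requested = cores * wallclock_hours
    return usage_core_hours.get(user, 0.0) + requested <= ANNUAL_LIMIT_CORE_HOURS

print(can_submit("user_a", 64, 10.0))   # 640 requested core hours -> False (4500 + 640 > 5000)
print(can_submit("user_b", 64, 10.0))   # -> True
```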
D. User Workflow

Fig. 3 shows, at a high level and from a neuroscience user's point of view, how the flow of work appears as a simple environment. It consists of the following steps: the user logs into the NSG -> the user uploads input data -> the user requests a simulation run on an HPC resource -> the NSG frontend sends the input data and job request to the remote HPC machine -> the NSG retrieves output data from the remote machine -> the NSG notifies the user of job (completion) status -> the NSG provides the user with information about the output data location and retrieval process.

Fig. 3. User's view of the NSG flow of work.

E. Education and Collaboration

As a part of the NSG education and outreach activity, a high school student created a tutorial on multiple sclerosis using the NEURON code. The tutorial is now available on the nsgportal website [27]. An undergraduate student from the University of California San Diego performed a parallel scaling study of various models, available from ModelDB, on HPC resources located at SDSC. As a part of this study it was shown that the consistency of a model was not affected by running with different numbers of processors [28]. Effort is also underway to make the output of parallel models from ModelDB available for educational purposes.

F. Usage

Early users were added to the NSG, following the "vetting" process, beginning in December 2012. Within the first three months we had 83 users, of whom 25 are from outside the US; this distribution is shown in Fig. 4. Forty-five users attended the first NSG workshop, held at SDSC in mid-March 2013, which was simultaneously broadcast over the web for remote attendees. The initial allocation of 50,000 core hours on Trestles was fully used within the first two months after December 2012, and as a result we acquired an additional 200,000 core hours for the NSG on Trestles. This demonstrates the interest in and initial success of the project. From now on we will continue to write allocation proposals for XSEDE HPC resources annually and provide the computer time to NSG users. This alleviates the need for NSG users to deal with the allocation process directly.

Fig. 4. Distribution of location of NSG users.

G. CIPRES Adaptation Experience

From the very outset of developing the NSG we decided to use the CIPRES WF as our software base. The key reasons for choosing the CIPRES WF and adapting it to create the NSG are:

1. The CIPRES WF is well established gateway software that has been developed over the past 10 years by experienced software developers and researchers
2. The CIPRES WF has been successfully used for building gateways in other domain sciences
3. The CIPRES WF developers are researchers at SDSC, and as a result we were able to get expert help when needed
4. Reuse of existing NSF funded software was considered good practice
5. Any additions or modifications made for the NSG will be contributed back to the CIPRES WF software and can be adopted by future gateway developers

Bringing the NSG up to early production took about two months, using approximately 75% of the effort of one staff person. All of the IT resources, such as the VM servers and SDSC Cloud storage, are located at SDSC, and this was beneficial to the project. Reusability of software played a key role and helped save a lot of time and effort. As a result we were able to make the portal available to computational neuroscientists within a relatively short period of time after the start of the project.

IV. FUTURE WORK

The NSG portal occasionally faces issues related to using GridFTP or the NFS server on the remote HPC cluster. During file staging from the portal to the remote cluster, or while copying results back, the task can fail either because there are too many open GridFTP connections on the remote cluster or because the NFS home directory on the HPC cluster is slow due to use by multiple users. Addressing this issue requires a restart of autofs by the system administrator of the HPC cluster. To avoid this, we are planning to use our own disk space and mount the HPC cluster's home directory on the new disk space.

We will also integrate a programmatic interface module, which will provide an interface layer for external neuroscience frameworks such as ModelDB, NIF [29], etc., and will allow mapping of user requests from these external frameworks to the NSG. The module will translate user requirements and authentication to the NSG interface. The external framework would format its request in XML for job processing, status queries, and output data retrieval. A REST API will also be incorporated to provide programmatic access to the NSG. An interface for model sharing with data provenance will be provided. Users who are willing to share their models or output will be able to do so in a collaborative environment. Validated data models will be provided for educational purposes.
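Since this programmatic interface is future work, no API exists yet; the sketch below only illustrates the intended pattern of an XML-formatted job request submitted over REST. The endpoint, XML schema, and authentication are purely hypothetical assumptions.

```python
# Illustrative only: the NSG REST endpoint and XML schema shown here are invented.
import urllib.request

JOB_REQUEST_XML = """<?xml version="1.0"?>
<jobRequest>
  <tool>NEURON_7.3</tool>
  <input>model.zip</input>
  <cores>64</cores>
  <wallclockHours>2</wallclockHours>
</jobRequest>"""

def submit_job(xml_payload: str, api_base: str = "https://nsg.example.org/api/v1") -> str:
    """POST an XML job request to a hypothetical NSG REST endpoint and return its reply."""
    req = urllib.request.Request(
        f"{api_base}/jobs",
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")   # expected to carry a job handle for status queries
```

An external framework such as ModelDB would issue a request of this kind, then poll a status resource and retrieve output data through analogous calls, with the interface module translating the request and the user's credentials into NSG-internal tasks.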
ACKNOWLEDGEMENT

This work was supported in part by National Science Foundation awards DBI (Division of Biological Infrastructure) 1146949 and DBI 1146830, and by NIH awards NS011613 and DC09977. The work was also supported by computer allocation time provided by the NSF funded XSEDE organization on SDSC's Trestles and TACC's Lonestar HPC resources, and by the XSEDE Extended Collaborative Support Services (ECSS) program, which allowed Terri Schwartz (SDSC) to collaborate with the NSG team. The authors would like to thank Mark Miller, PI of the CIPRES software, for providing advice and guidance as the NSG was implemented. The authors would also like to thank Nancy Wilkins-Diehr (SDSC) and Suresh Marru (Indiana University) for helpful discussions, and UCSD undergraduate student Prithvi Sundar and West View High School (San Diego) student Debleena Sengupta for their summer internship project work.

REFERENCES

[1] M. L. Hines and N. T. Carnevale, "Translating network models to parallel hardware in NEURON," J. Neurosci. Methods, vol. 169, pp. 425-455, 2008.
[2] http://genesis-sim.org/
[3] http://moose.sourceforge.net/
[4] http://www.nest-initiative.uni-freiburg.de/index.php/Software:About_NEST
[5] http://neuralensemble.org/trac/PyNN/
[6] http://brainsimulator.org
[7] N. Wilkins-Diehr, D. Gannon, G. Klimeck, S. Oster, and S. Pamidighantam, "TeraGrid science gateways and their impact on science," IEEE Computer, vol. 41, no. 11, pp. 32-41, November 2008.
[8] http://www.nsgportal.org
[9] R. Ananthanarayanan, S. Esser, H. D. Simon, and D. S. Modha, SC09 Proceedings, Nov. 14-20, 2009, Portland, Oregon, USA, ACM 978-1-60558-744-8/09/11, 2009.
[10] S. Kumar, P. Heidelberger, D. Chen, and M. Hines, "Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer," 24th IEEE International Parallel and Distributed Processing Symposium, 2010.
[11] H. Markram, "The Blue Brain Project," Nature Reviews Neuroscience, vol. 7, pp. 153-160, February 2006.
[12] http://www.nanohub.org
[13] M. Miller, W. Pfeiffer, and T. Schwartz, "Creating the CIPRES Science Gateway for inference of large phylogenetic trees," Gateway Computing Environments Workshop (GCE), pp. 1-8, New Orleans, LA, Nov. 14, 2010.
[14] https://www.gridchem.org
[15] http://robetta.bakerlab.org/
[16] http://www.xsede.org
[17] http://www.opensciencegrid.org
[18] https://neugrid4you.eu/
[19] http://bluebrain.epfl.ch/
[20] M. Miller, W. Pfeiffer, and T. Schwartz, "CIPRES Science Gateway: a community resource for phylogenetic analyses," TeraGrid'11, Salt Lake City, July 18-21, 2011.
[21] poplar.nics.tennessee.edu/locing!input.action
[22] http://www.ngbw.org/wbframework/
[23] B. W. Kernighan and R. Pike, The Unix Programming Environment, Prentice Hall, 1984.
[24] www.tacc.utexas.edu
[25] http://senselab.med.yale.edu/modeldb
[26] https://www.xsede.org/allocations
[27] http://www.nsgportal.org/ed-index.html
[28] A. E. Bandrowski, S. Sivagnanam, K. Yoshimoto, V. Astakhov, and A. Majumdar, "Performance of parallel neuronal models on the Triton cluster," Society for Neuroscience Annual Meeting, Washington, D.C., Nov. 12-16, 2011.
[29] www.neuinfo.org