=Paper= {{Paper |id=None |storemode=property |title=Introducing The Neuroscience Gateway |pdfUrl=https://ceur-ws.org/Vol-993/paper10.pdf |volume=Vol-993 |dblpUrl=https://dblp.org/rec/conf/iwsg/SivagnanamMYABMC13 }}
Introducing The Neuroscience Gateway

Subhashini Sivagnanam, Amit Majumdar, Kenneth Yoshimoto,
Vadim Astakhov, Anita Bandrowski, MaryAnn Martone
University of California San Diego, La Jolla, CA, USA

Nicholas T. Carnevale
Yale School of Medicine, New Haven, CT, USA


Abstract— The last few decades have seen the emergence of computational neuroscience as a mature field in which researchers model complex and large neuronal systems and require access to high performance computing (HPC) machines, and to the associated cyberinfrastructure, to manage computational workflows and data. The neuronal simulation tools used in this field are implemented for parallel computers and are well suited to HPC machines. Yet using these tools on HPC machines remains a challenge because of the difficulty of acquiring computer time on machines located at national supercomputer centers, the complexity of their user interfaces, and the burden of data management and retrieval. The Neuroscience Gateway is being developed to remove these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, the administrative and technical barriers; makes parallel neuronal simulation tools easily available and accessible on complex HPC machines; and handles job execution, data management, and retrieval. This paper describes the architecture the gateway is based on, how it is implemented, and how users can employ it for computational neuroscience research with HPC at the back end.

Keywords—computational neuroscience, science gateway, high performance computing

I. INTRODUCTION

Computational neuroscience has seen tremendous growth in recent years. This is evident from the large number of publications in prestigious neuroscience journals that are increasingly based on modeling. Over the last two decades this growth has motivated the development of simulation tools such as NEURON [1], GENESIS3 [2], MOOSE [3], NEST [4], PyNN [5] and Brian [6], which are mature and written for parallel computers. Complex neuroscience problems, which involve network models, optimization, or exploration of high dimensional parameter spaces, require access to high performance computing (HPC), data storage, and complex workflows. Yet accessing and using HPC resources remains difficult: users must write yearly allocation proposals to acquire computer time at national supercomputer centers, understand complex HPC architectures, learn unfamiliar operating systems and software, install neuronal software optimally, learn site policies and batch environments, manage data transfer and retrieval, and deal with remote authentication. As a solution to this problem we have developed a community infrastructure layer, i.e. a science gateway [7], specifically for computational neuroscientists, that abstracts away the technical and administrative details of the underlying hardware. As a result, neuroscience researchers gain easy access to the best neuroscience simulation and analysis packages running on large scale HPC resources. The Neuroscience Gateway (NSG) [8] started its friendly-user phase in December 2012, during which users recruited earlier to help test the NSG were given access. Since late December 2012 we consider it to be in production, and we continue to add more neural simulation tools. This paper describes the NSG software architecture, how it is implemented, the impact it has had so far, and future plans.

II. MOTIVATION AND BACKGROUND

Most computational neuronal modeling projects start "small" and many stay "small," in the sense of being accommodated by individual desktop systems, but many eventually outstrip the speed and/or storage capabilities of local hardware. This is most often the case with projects that involve complex models (especially large scale networks) or complex protocols (often involving learning rules), or that require high-dimensional optimization or parameter space exploration. Such projects have tremendous potential to use cyberinfrastructure (CI), but only a very few neuroscientists have been able to perform simulations on extreme scale HPC machines [9, 10, 11]. There is a broad consensus that the wider computational neuroscience community needs access to complex CI and HPC resources. Those who lack such access are at a significant disadvantage relative to the very few who have it. This disparity is particularly telling in light of the fact that widely used simulators such as NEURON, GENESIS3, MOOSE, NEST, Brian and PyNN have been implemented on and released for parallel hardware, including HPC machines, for several years now. The typical neuroscientist, even one engaged in computational modeling, has limited compute resources, usually only a few desktop machines. The investigator whose project requires HPC machines must write a yearly proposal for an allocation of supercomputer time. In the US these proposals are peer reviewed for computer time on the National Science Foundation (NSF)'s HPC machines, and there is no guarantee that the requested resource will be made available. If the proposal succeeds, the next task is to install the simulation software (NEURON, GENESIS3, MOOSE, NEST, PyNN, Brian) optimally on the supercomputer, then use the batch system to configure and run the model on the cluster, and finally deal with output data retrieval and storage. These steps involve many vexing details that are not only time consuming
and require knowledge of HPC and IT, but also differ significantly from facility to facility. Investigators who want to use neuronal software on HPC resources directly have to deal with these issues themselves. The entire process constitutes a barrier that impedes effective utilization of HPC resources and distracts neuroscientists from their primary research goals. We believe that many of these issues are also encountered by neuroscientists in other parts of the world, such as in Europe and Asia. Our proposed solution enables the entire community of computational neuroscientists to access NSF (and other) funded national HPC resources transparently through a common, convenient interface that is already configured for optimal use and operates within a common spatial and semantic framework. The benefit of this infrastructure can be extended to many more end users, allowing investigators to focus on their research and fostering collaboration.

In recent years user-friendly science gateways have been developed successfully for many research fields and have led to tremendous adoption of CI and HPC by the broader user communities of those fields. A few examples of such gateways in the US include the nanoHUB gateway [12] for nanotechnology, the CIPRES gateway for phylogenetics research [13], the GRIDCHEM [14] gateway for computational chemistry, and the ROBETTA [15] gateway for protein structure prediction. Similarly, many gateways exist in Europe and Asia. In the US most of these gateways are funded by the NSF for specific scientific domains, and they primarily utilize NSF's Extreme Science and Engineering Discovery Environment (XSEDE) [16] or the Open Science Grid (OSG) [17] to access HPC resources. In addition to NSF funded science gateways, there are gateways funded by other organizations, such as the US Department of Energy (DOE) at various DOE laboratories. Similarly, outside of the US there are many e-infrastructures and gateways available for different domain sciences. Specific to neuroscience there is neuGRID [18], which offers a science gateway that helps neuroscientists develop image analysis pipelines using HPC resources, with the aim of accelerating research on Alzheimer's and other neurodegenerative disease markers. The Blue Brain project [19] is also developing its own portal environment.

III. NSG ARCHITECTURE

The NSG architecture design is based on the existing CIPRES Science Gateway framework [20], which has been very successful and popular for building the phylogenetics gateway as well as other gateways such as the PoPLAR (Portal for Petascale Lifescience Applications and Research) [21] gateway. CIPRES is a very mature framework; it was implemented at the San Diego Supercomputer Center (SDSC), by SDSC researchers and programmers, using the open source Workbench Framework (WF) [22]. Below we describe the software architecture components of the WF, their functional adaptation for the NSG, and the current usage monitoring and metrics of the NSG.

A. Underlying Architecture

The WF is a software development kit (SDK) designed to generically deploy analytical jobs and database searches to a generic set of computational resources and databases. The WF contains modules to manage the submission of jobs to analytical tools on various computational resources, and modules to manage queries on data resources. A high level schematic of the WF architecture, as described in a diagram by the WF project [22] developers, is shown in Fig. 1. It is the basic software architecture of the NSG. The modules in the WF are as follows:

Presentation Layer: The WF Presentation Layer accesses SDK capabilities through the J2EE front controller pattern, which involves only two Java classes. As a result, the WF is neutral with respect to interface access. The presentation layer provides lightweight access through a web browser, preserves flexibility for alternative access routes, and adopts an architecture based on Linux, Apache Tomcat, MySQL, and Java Struts2. The browser interface is based on the look and feel of popular email clients and supports data and task management in user-created folders. The complexity of the gateway layer is hidden from users by this interface, which also allows users to create a login-protected personal account. Registered users can store their data and records of their activities for a defined period of time. Uploaded user data is checked for format compatibility. Users can also manually specify data types and formats.

User Module: The User Module passes user-initiated queries and tasks from the interface to the executive portions of the infrastructure via the data and task management modules. It also stores all user data and task information in a MySQL database. It supports individual user roles, permitting the assignment of individual user accounts, the sharing of data between accounts, and selective access to tools and data sources that may be proprietary. Mapping of user information takes place at this layer, which helps track individual usage on computational resources.

Broker Module: The Broker Module provides access to all application-specific information in a Central Registry. This Registry contains information about all data types required as input and output for each application, along with the formats accepted by each tool. Concepts and concept relationships are formulated in XML files and read by the Central Registry API implementing classes. Defining tools and data types in a single location allows new tools and data types to be added with no impact on the functioning of the application outside the Registry.

Tool Module: The Tool Module manages the translation of tasks submitted by users into command lines, and the submission of the command line strings, along with user data, for execution by appropriate compute engines. The Tool Module handles data formatting for jobs, and job staging. It also keeps track of which tools can be run on which computational resources, and of the status of those resources. The design allows great flexibility in determining what assets the NSG can access for job execution. Computational resources can be added by editing the tool resource configuration file, and the application can send command line scripts and receive output via
essentially any well-defined protocol (e.g. Unix command line, web services, SSH, DRMAA, GRAM, gsissh, etc.).

External Resources: The generic design of the WF architecture supports access to a wide variety of computational resources and databases, whether local or remote. Access can be accomplished through a combination of individual mechanisms, including SSH, GRAM/Globus, SOAP, and REST services.

B. CIPRES Adaptation for the NSG

The CIPRES WF architecture was adapted to the NSG with the goal of hiding all the complexities associated with accessing and using an HPC resource, such as job submission, input data transfer, choice of machine specific HPC parameters, and output retrieval and storage. Fig. 2 shows the high level functional diagram of this adaptation. Though the NSG's initial software design was based on the CIPRES WF architecture, our implementation contained enhancements and modifications of the existing software based on the needs of the neuroscience community. The hardware needed for setting up the NSG utilizes SDSC's reference VM server, MySQL, Cloud storage, and web services. The latest software version of the WF architecture was obtained from the SVN repository maintained by the CIPRES developers. The key adaptations made to the existing CIPRES code base for the NSG are:

    1.  Addition of uuencode/uudecode functionality to support upload of input files in zip format
    2.  Modification of the job submission environment to accommodate compilation of NEURON code
    3.  Storage of output files per session in SDSC's Cloud storage
    4.  Automatic deletion of unused user files based on length of inactivity
    5.  Definition of computational neuronal tools in the PISE XML format and their interfacing with the portal

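The uuencode/uudecode adaptation mentioned above can be illustrated with a short sketch: the portal encodes the user's binary zip archive as text so it can pass through a text-only staging path, and the archive is decoded on the compute resource before the job runs. The following Python sketch is illustrative only; the NSG's actual implementation lives in its Java code base, and the file name here is a hypothetical placeholder.

```python
import binascii

def uuencode_bytes(data: bytes, name: str = "input.zip") -> str:
    """Encode binary data in classic uuencode format (45-byte chunks per line)."""
    lines = [f"begin 644 {name}"]
    for i in range(0, len(data), 45):
        # b2a_uu accepts at most 45 bytes and returns one newline-terminated line
        lines.append(binascii.b2a_uu(data[i:i + 45]).decode("ascii").rstrip("\n"))
    lines += ["`", "end", ""]  # zero-length terminator line, then "end"
    return "\n".join(lines)

def uudecode_text(text: str) -> bytes:
    """Decode uuencoded text (as produced above) back to the original bytes."""
    out = bytearray()
    for line in text.splitlines():
        if line.startswith("begin") or line in ("`", "end", ""):
            continue  # skip header, terminator, and trailer lines
        out += binascii.a2b_uu(line)
    return bytes(out)
```

On the compute resource, the staging step would apply the decode before job submission, recovering a byte-identical zip archive.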
                                        Fig. 1. Workbench Framework (from WF [22] project page).
                                            Fig. 2. Functional Diagram of NSG Architecture.
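To make the Tool Module's role (Section III-A) concrete, the translation of a user task into a batch submission script can be sketched as follows. This is a simplified, hypothetical Python illustration: the batch directives, tool command lines, and the in-code tool mapping stand in for the WF's actual Java implementation and its tool resource configuration file.

```python
# Hypothetical sketch: turn a user task (tool + parameters) into a batch script.

def make_batch_script(tool: str, input_file: str, cores: int, walltime: str) -> str:
    """Render a PBS-style batch script for one user task (illustrative only)."""
    # In the real WF this mapping lives in the tool resource configuration file.
    commands = {
        "NEURON": f"mpirun -np {cores} nrniv -mpi {input_file}",
        "PGENESIS": f"mpirun -np {cores} pgenesis {input_file}",
    }
    if tool not in commands:
        raise ValueError(f"tool {tool!r} not registered for this resource")
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -l nodes=1:ppn={cores}",   # cores/nodes requested by the user
        f"#PBS -l walltime={walltime}",   # expected wall clock time
        commands[tool],                   # tool-specific command line
        "",
    ])

script = make_batch_script("NEURON", "model.hoc", 16, "01:00:00")
```

Because the tool-to-resource mapping is data-driven in the real system, adding a resource or tool means editing configuration rather than code.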

Some of the core functional implementation changes are discussed below.

Access: A web browser serves as the entry point to the NSG portal. The browser offers a simple interface that allows users to upload input files or neuronal models, specify neuronal-code-specific input files, and specify job submission parameters such as the number of cores or nodes and the expected wall clock time for job completion. Users can monitor the status of submitted jobs and retrieve output files from the user-friendly portal.

Though a community gateway account is used for job submission, individual user accounts are necessary to keep track of usage and access. Some of the neuronal simulation tools, such as NEURON, require that users be able to use the hoc (high order calculator) [23] programming language as a scripting language for neuronal models. However, because malicious or incorrect use or handling of hoc code poses a security concern, direct user registration on the NSG is not allowed. Users are required to fill out a form with identity information, which allows NSG administrators to validate each user manually ("vetting") prior to creating the account. Once registered, the NSG can track each individual user's access and usage, as well as enforce NSG-specified usage policies. Account information and usage data are stored in the NSG MySQL database at SDSC.

Installation of computational neuroscience tools: Currently NEURON 7.2, NEURON 7.3, PGENESIS 2.3, NEST, Brian and PyNN have been installed on SDSC's Trestles HPC machine and are being installed on TACC's (Texas Advanced
Computing Center) [24] Lonestar HPC machine. These codes are available through the NSG to the neuroscience community. We are also in the process of installing the MOOSE tool on the SDSC Trestles machine. Based on input from users, additional tools will be installed in the future.

User input file and job distribution: Most computational neuroscience models have more than one input file, from sources such as ModelDB [25]. To accommodate this requirement, we added the capability for NSG users to upload input files in zip format. Many other science gateways use flat text files as input and precompiled executables to run their jobs, and the existing WF architecture can only handle input data that is not binary. For the NSG we therefore implemented functionality to uuencode the uploaded zip file and to uudecode it on the computational resource during input staging. The NSG allows compilation and running of the user's code according to the requirements of the neuroscience application (e.g. NEURON, GENESIS 3, MOOSE, NEST, Brian and PyNN). NEURON allows custom C++ code to be used for new equations within a particular model. To accommodate this, we created a mechanism to collect all such code (located in .mod files) and compile it as part of the job submission process. Job scripts are automatically created and submitted. Once a job completes, the working directory is compressed along with the input files, job submission scripts and output files, and the archive is transferred to SDSC's Cloud storage. The compressed file is also made available for immediate download through the NSG portal. File staging is handled via the Java CoG Kit GridFTP API, and Java runtime exec of Globus "gsissh" is used to run commands remotely. While a job is running on the HPC cluster, an intermediate results view option is available in the portal, which gives a snapshot of the working directory created on the backend HPC cluster. Advanced users can look at the intermediate results folder to see whether their job has started or whether any output file has been written. Another notable feature is the ability to clone a job on the portal. Users can clone their jobs, which is helpful when they want to submit a job with the same input file but vary parameters such as the number of cores or the wall clock time. This is particularly helpful in parallel scaling performance studies of neuronal tools on HPC machines.

Storage and data retrieval: The output data is saved as a zip file and made available on the portal. An email notification is sent to the user when a job completes. This is handled by a curl command in the job submission script, which notifies a servlet in the web application when the job finishes. In case of curl failure, two daemon processes named "checkJobsD" and "loadResultsD" check which jobs have finished and transfer the results to the NSG. The NSG also moves the data from the HPC resource's scratch disk space to SDSC's Cloud storage for archival, and employs a storage policy based on when data was last accessed.

Fig. 3. User's View of NSG Flow of Work.

C. Allocation and Policies

An initial allocation, called a Startup allocation within the XSEDE allocation process, was obtained for 50,000 core hours on SDSC's Trestles and 50,000 core hours on TACC's Lonestar machines. Utilizing the XSEDE allocation process (XRAC) [26], we obtained a community gateway account on NSF high performance computing resources, which include Trestles and Lonestar. Additional computational resources will be added based on user demand.

Users of the community gateway account abide by the policies set by the NSG administrators. Currently we allow 5000 core hours per user per year. Based on the total amount of computer time acquired every year for the NSG and the total number of NSG users, we will decide what percentage of the total time can be allocated freely to each user, and we will monitor their usage. The NSG also has the capability to let users run jobs under their own allocations. For assessment of impact on identifiable members of the community, each individual gateway user's tag will be propagated through the batch system, so that the final job accounting reconciliation process will report quantitative usage by those individual gateway users. A ticketing system is in place and is used to keep track of user questions and provide immediate assistance.

D. User Workflow

Fig. 3 shows, at a high level and from a neuroscience user's point of view, how the flow of work appears as a simple environment. It consists of the following steps: User logs into the NSG -> User uploads input data -> User requests simulation run on an HPC resource -> NSG frontend sends input data and job request to the remote HPC machine -> NSG retrieves output data from the remote machine -> NSG notifies
users of job (completion) status -> NSG provides the user with information about the output data location and the retrieval process.

Fig. 4. Distribution of Location of NSG Users.

E. Education and Collaboration

As a part of the NSG education and outreach activity, a high school student created a tutorial on multiple sclerosis using the NEURON code. The tutorial is now available on the nsgportal website [27]. An undergraduate student from the University of California San Diego performed a parallel scaling study of various models, available from ModelDB, on HPC resources located at SDSC. As a part of this study it was shown that the consistency of a model was not affected by running with different numbers of processors [28]. An effort is also underway to make the output of parallel models from ModelDB available for educational purposes.

F. Usage

Early users were added to the NSG, following the "vetting" process, beginning in December 2012. Within the first three months we have 83 users, of whom 25 are from outside the US; this distribution is shown in Fig. 4. Forty-five users attended the first NSG workshop, held at SDSC in mid-March 2013, which was simultaneously broadcast over the web for remote attendees. The initial allocation of 50,000 core hours on Trestles was fully used within the first two months after December 2012, and as a result we acquired an additional 200,000 core hours for the NSG on Trestles. This demonstrates the interest in, and initial success of, the project. Going forward we will continue to write allocation proposals for XSEDE HPC resources annually and provide the computer time to NSG users. This relieves NSG users of the need to deal with the allocation process directly.

G. CIPRES Adaptation Experience

From the very outset of developing the NSG we decided to use the CIPRES WF as our software base. The key reasons for choosing the CIPRES WF and adapting it to create the NSG are:

    1.  The CIPRES WF is well established gateway software that has been developed over the past 10 years by experienced software developers and researchers
    2.  The CIPRES WF has been successfully used to build gateways in other domain sciences
    3.  The CIPRES WF developers are researchers at SDSC, so we were able to get expert help when needed
    4.  Reuse of existing NSF funded software was considered good practice
    5.  Any additions or modifications made for the NSG will be contributed back to the CIPRES WF software and can be adopted by future gateway developers

Bringing the NSG to early production took about two months, using approximately 75% of a staff person's effort. All of the IT resources, such as the VM servers and SDSC Cloud storage, are located at SDSC, which was beneficial to the project. Reusability of software played a key role and saved a great deal of time and effort. As a result, we were able to make the portal available to computational neuroscientists within a relatively short period after the start of the project.

IV. FUTURE WORK

The NSG portal occasionally faces issues related to using GridFTP or the NFS server on the remote HPC cluster. During file staging from the portal to the remote cluster, or while copying results back, a task may fail either because there are too many open GridFTP connections on the remote cluster or because the NFS home directory on the HPC cluster is slow due to use by multiple users. Addressing this currently requires the system administrator of the HPC cluster to restart autofs. To avoid this, we are planning to use our own disk space and mount the HPC cluster's home directory on the new disk space.

We will also integrate a programmatic interface module, which will provide an interface layer for external neuroscience frameworks such as ModelDB, NIF [29], etc., and will allow mapping of user requests from these external frameworks to the NSG. The module will translate user requirements and authentication to the NSG interface. An external framework will format its requests in XML for job processing, status query, and output data retrieval. A REST API will also be incorporated to provide programmatic access to the NSG. An interface for model sharing with data provenance will be provided; users who are willing to share their models or output will be able to do so in a collaborative environment. Validated data models will be provided for educational purposes.

ACKNOWLEDGEMENT

This work was supported in part by National Science Foundation awards DBI (Division of Biological Infrastructure) 1146949 and DBI 1146830, and by NIH awards NS011613 and DC09977. The work was also supported by computer allocation time provided by the NSF funded XSEDE organization on SDSC's Trestles and TACC's Lonestar HPC resources, and by the XSEDE Extended Collaborative Support Services (ECSS) program, which allowed Terri Schwartz (SDSC) to collaborate with the NSG team. The authors would like to thank Mark Miller, PI of the CIPRES software, for providing advice and guidance as the NSG was implemented. The authors would also like to thank Nancy Wilkins-Diehr (SDSC) and Suresh Marru (Indiana University) for helpful discussions, and UCSD undergraduate student Prithvi Sundar and West View High School, San Diego student Debleena Sengupta for their summer internship project work.
                           REFERENCES
[1]  M.L. Hines, and N.T. Carnevale. "Translating network models to
     parallel hardware in NEURON." J. Neurosci. Methods, 169, pp. 425-
     455, 2008.
[2] http://genesis-sim.org/.
[3] http://moose.sourceforge.net/
[4] http://www.nest-initiative.uni-freiburg.de/index.php/Software:About_NEST
[5] http://neuralensemble.org/trac/PyNN/
[6] http://briansimulator.org
[7] N. Wilkins-Diehr, D. Gannon, G. Klimeck, S. Oster, S. Pamidighantam,
     “TeraGrid Science Gateways and Their Impact on Science,” IEEE
     Computer, Vol. 41. Number 11 (November, 2008), pages 32-41.
[8] http://www.nsgportal.org
[9]  R. Ananthanarayanan, S. Esser, H. D. Simon, and D. S. Modha, SC09
     Proceedings, Nov 14-20, 2009, Portland, Oregon, USA, ACM
     978-1-60558-744-8/09/11, 2009.
[10] S. Kumar, P. Heidelberger, D. Chen, and M. Hines, “Optimization of
     Applications with Non-blocking Neighborhood Collectives via Multi-
     sends on the Blue Gene/P Supercomputer,” 24th IEEE International
     Parallel and Distributed Processing Symposium, 2010.
[11] H. Markram, “The Blue Brain Project,” Nature Reviews Neuroscience 7,
     153-160 (1 February 2006).
[12] http://www.nanohub.org
[13] M. Miller, W. Pfeiffer, and T. Schwartz, “Creating the CIPRES Science
     Gateway for Inference of Large Phylogenetic Trees,” Gateway
     Computing Environments Workshop (GCE), 2010, pp. 1-8, New Orleans,
     LA, 14 Nov. 2010.
[14] https://www.gridchem.org
[15] http://robetta.bakerlab.org/
[16] http://www.xsede.org
[17] http://www.opensciencegrid.org
[18] https://neugrid4you.eu/
[19] http://bluebrain.epfl.ch/
[20] M. Miller, W. Pfeiffer, and T. Schwartz, “CIPRES Science Gateway: A
     Community Resource for Phylogenetic Analyses,” TeraGrid’11, July 18-
     21, Salt Lake City, 2011.
[21] poplar.nics.tennessee.edu/locing!input.action
[22] http://www.ngbw.org/wbframework/
[23] Kernighan, Brian W.; Pike, Rob (1984). The Unix Programming
     Environment. Prentice Hall.
http://www.tacc.utexas.edu
[25] http://senselab.med.yale.edu/modeldb
[26] https://www.xsede.org/allocations
[27] http://www.nsgportal.org/ed-index.html
[28] A. E. Bandrowski, S. Sivagnanam, K. Yoshimoto, V. Astakhov, A.
     Majumdar, "Performance of parallel neuronal models on the Triton
     cluster," Society for Neuroscience Annual Meeting, Washington D.C.,
     Nov 12-16, 2011.
http://www.neuinfo.org