=Paper= {{Paper |id=None |storemode=property |title=The HUBzero Platform: Extensions and Impressions |pdfUrl=https://ceur-ws.org/Vol-819/paper2.pdf |volume=Vol-819 |dblpUrl=https://dblp.org/rec/conf/iwsg/AlberNW11 }} ==The HUBzero Platform: Extensions and Impressions== https://ceur-ws.org/Vol-819/paper2.pdf
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011




The HUBzero Platform: Extensions and Impressions
Anna Alber,1,* Jarek Nabrzyski,1,2 and Timothy Wright1,2
1Center for Research Computing, University of Notre Dame
2Department of Computer Science & Engineering, University of Notre Dame

ABSTRACT
Motivation: We describe our efforts to work with HUBzero in a significant collaboration-oriented project, as well as our impressions of HUBzero after nearly a year of interaction with the platform. Our assessment is a mixed one: while the HUBzero platform does a good job of bringing users and modeling tools together, we found some noteworthy limitations of this recently open-sourced technology. For example: hubs are generally configured to deploy simulation tools of modest means, with input and output data not readily shared across user accounts; by default, the security configuration of a HUBzero server is very open; hubs are inflexible in terms of their deployment requirements; and there is a need for more detailed documentation. In answer to the first (and most serious) of these issues, we have extended HUBzero to enable the use of more sophisticated modeling tools, such as openModeller Desktop, and to provide seamless access to external Network File System drive space.

1       INTRODUCTION
The open source HUBzero platform offers an effective way for scientific and educational communities to share information, interact, and run simulations (McLennan, 2010). The last of these capabilities is considered to be HUBzero’s “signature service” and, indeed, is not typically available through similar Web platforms.
   We are currently employing HUBzero as part of an effort to build a so-called collaboratory: a virtual space in which users may share information and innovate. The goal of our work is to support a multidisciplinary community of scientists with a focus on adaptation strategies for biological systems. In addition to publishing and discussing content related to adaptation, users of the HUBzero portal (or “hub”) need to leverage relevant simulation tools, such as openModeller and openModeller Desktop (de Souza Muñoz et al., 2011).
   Such tools have requirements that may not be satisfied by a typical hub deployment. For example, there may be a need for software libraries that do not normally exist in HUBzero’s virtualized Linux environment (OpenVZ [Parallels Holdings Ltd., 2011]). Also, for functionality purposes, users may have to manage and archive a tool’s input and output files. As part of this management there will almost always be a need to make some files publicly accessible (e.g., simulation results) and others private.
   Tangentially related to these requirements is the need to restrict access to a hub’s OpenVZ-based Workspace tool. Through Workspace a user is given a private, virtual Linux workstation that may be employed for nearly any purpose. Generally, this is intended to offer an environment for users to create and work with simulation tools that are made available in a hub. However, there is potential for accidental or malicious abuse of such a powerful virtual environment, for example through gratuitous resource use or hacking.
   To provide hub access to openModeller and accommodate the aforementioned needs, we have extended HUBzero in three ways. First, we have significantly expanded the OpenVZ environment to support the execution of a tool as complex as openModeller. Next, we have integrated NFS (Network File System) (Smith, 2006) with the file system made available through the HUBzero Workspace tool. Related to this, we have enhanced the process of hub user account creation to include the automatic setup of public and private NFS directories for specific users. Finally, we have made it possible to restrict which hub users are granted access to the Workspace and openModeller tools.
   To permit long-running, computationally intense openModeller simulations, we have embarked on a separate project to kick off and manage such jobs outside of the HUBzero environment. The results of these simulations can be made available to our hub through NFS.
   The remainder of this paper is organized as follows. Section 2 presents the state of the art in collaborative, virtual organization tools. Section 3 discusses the Collaboratory Project and the simulation tools that we integrate with the HUBzero infrastructure. In Section 4 we present the HUBzero architecture. Section 5 reviews the OpenVZ configuration and what we have done to enhance this configuration. Section 6 addresses our integration of NFS with HUBzero, including our use of public and private directories for each Workspace/openModeller user. Section 7 describes our changes to HUBzero to restrict access to the Workspace and openModeller tools. Section 8 briefly looks at a separate project to handle more intense openModeller jobs outside of our hub. Section 9 outlines our overall impressions of working with HUBzero. Finally, in Section 10 we offer some concluding remarks about this project.
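
As an illustration of the account-creation extension mentioned in the introduction, the following minimal Python sketch shows one way public and private per-user directories could be provisioned under an NFS mount. It is not the authors' implementation: the function name `provision_user_dirs`, the directory names `public` and `private`, the layout `<nfs_root>/<username>/`, and the permission modes 0755 and 0700 are all assumptions chosen for illustration. Ownership changes (`chown`), which a real deployment would need, are omitted because they require root privileges.

```python
import os
from pathlib import Path

# Assumed permission modes (not taken from the paper).
PUBLIC_MODE = 0o755   # owner: rwx; group and others: read and traverse
PRIVATE_MODE = 0o700  # owner only


def provision_user_dirs(nfs_root, username):
    """Create <nfs_root>/<username>/{public,private} and return their paths.

    Hypothetical helper: the layout and names are illustrative assumptions.
    """
    # Refuse names that could escape the per-user directory.
    if not username or "/" in username or username in (".", ".."):
        raise ValueError(f"unsafe username: {username!r}")

    user_root = Path(nfs_root) / username
    user_root.mkdir(parents=True, exist_ok=True)
    # The parent must be traversable by others so that public/ is reachable.
    os.chmod(user_root, PUBLIC_MODE)

    created = {}
    for name, mode in (("public", PUBLIC_MODE), ("private", PRIVATE_MODE)):
        path = user_root / name
        path.mkdir(exist_ok=True)
        # chmod explicitly: the mode passed to mkdir is filtered by the umask.
        os.chmod(path, mode)
        created[name] = path
    return created
```

Calling `provision_user_dirs("/mnt/nfs/hub", "alice")` (a hypothetical mount point) would leave `alice/public` readable by everyone and `alice/private` accessible to its owner only, once ownership of both directories has been assigned to the user. The function is safe to call repeatedly for the same account.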

*   To whom correspondence should be addressed.


Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

2      RELATED WORK
HUBzero offers many features that researchers need for effective collaboration. Major components include: various interactive simulation tools that can be accessed through almost any Web browser, a repository for online course materials and other publications, mechanisms for uploading new resources, a tool development area, rating and citation capabilities, content tagging mechanisms, wikis and blogs, private and public collaboration areas, usage statistics, news and events, and more. While relatively few platforms are as feature-rich, there are, nevertheless, some worth mentioning.
   MyExperiment (De Roure et al., 2009) is a virtual research environment for collaboration and the sharing of experiments, which aims to provide a “workflow bazaar” for any workflow management system. An experiment is represented as an application workflow rather than as an infrastructure or services workflow. The design methodology of myExperiment was inspired by the Web 2.0 approach that was used in systems such as Facebook, MySpace, and Amazon. MyExperiment brings functionality to the user through familiar interfaces and can be combined with other services. Although its primary focus is on workflows, the designers of myExperiment realized that there was an immediate need to associate workflows with other information. Thus, the myExperiment concept is about sharing digital objects that include data, results, provenance information, tags, associated documentation, etc. These individual items can be collected together to form research objects, for example to record an experiment. Unlike Twine (www.twine.com), BioMedExperts (www.biomedexperts.com), or Nature Network (network.nature.com), myExperiment is not intended to be a general social networking environment for scientists. Instead, the focus is on social networking around shared artifacts. In this way it is more like social bookmarking systems such as CiteULike (www.citeulike.org) and Connotea (www.connotea.org), but with a much wider and richer remit than published articles, or like social content systems such as YouTube (www.youtube.com), SlideShare (www.slideshare.net), and Flickr (www.flickr.com). It effectively creates a social network of people and the items that they share. The main difference between myExperiment and HUBzero is that HUBzero is built using portal technologies and provides simulation tools that can be accessed from the browser and executed in the HUBzero space. Furthermore, users of HUBzero can add new resources and simulation tools within the HUBzero space. MyExperiment is built using Web 2.0 Ruby on Rails rather than a portal framework, includes an API, and offers remote execution capabilities.
   HUBzero is often compared to the highly successful Open Courseware Initiative from MIT (web.mit.edu/ocw). However, HUBzero is more than just a repository for course materials. It is a place where researchers and educators can meet and accomplish real work. The HUBzero platform offers groups for private collaboration, software development projects in its forge area, event calendars, and many other services designed to connect researchers and build a community. But most importantly, it connects users to the simulation tools they need for research and education. Simulation jobs can be dispatched on national grid resources, including the NSF TeraGrid, the Open Science Grid, and others. HUBzero’s middleware hides much of the complexity of distributed and grid computing, handling authentication, authorization, file transfer, and visualization, and letting the researcher focus on research.
   Aside from its core features, HUBzero was chosen for our collaboratory project due to its wide adoption by the US research community as well as our easy access to support services that are provided by Purdue University. We believe that myExperiment could be adapted to our needs, but its focus on workflows makes it a weaker fit than HUBzero. Our project heavily depends on climate and adaptation simulation tools that need to be run within the Collaboratory’s system.

Fig. 1. The Collaboratory Project’s hub home page.

3      OVERVIEW OF THE COLLABORATORY PROJECT AND SIMULATION TOOLS
3.1        The Collaboratory Project
Climate change has compelled researchers to look not only for causality, but also for adaptation strategies. Aimed at building a virtual environment for comprehensive and informed decision-making, the Collaboratory Project is composed of a team of researchers in ecology, computer science, law, and social science. We anticipate that our work will help foster an understanding of complexity in natural systems and chart a path from data to knowledge to insight and, finally, to action. Integrated with ongoing and proposed survey research on expert opinion, we also plan to carry out social network analysis (collaboration networks, in particular) and study the impact of a virtual organization on the decision-making process. We anticipate that our efforts will result in various cyber tools that can be adapted by any domain requiring collaborative decision-making.
   The role played by our hub (see Figure 1) is to operate as a shared space where researchers from numerous fields related to climate adaptation can run projection and simulation tools, devise new hub tools, and share results with colleagues and the public. Our initial work focused on making computational simulation and projection tools publicly available. However, we soon extended our hub architecture to better enable easy data sharing with added privacy control. As more tools become available, we plan on our
hub becoming a central point of contact for experts in climate change and adaptation strategies: a space where a suite of computational tools can be leveraged to simulate the effects of climate change on species and project future distribution patterns. As its community of users grows in both number and diversity, our hub will become more than a repository of publicly available tools and data: it will evolve into a virtual community for climate adaptation research where members contribute back by sharing new tools, publications, use cases, and data.

Fig. 2. The openModeller experiment designer with multiple algorithms, available environmental layers, and biological data running on the Collaboratory Project’s hub.

3.2        OpenModeller Desktop
OpenModeller Desktop is an open source, fundamental niche modeling library with an embedded Geographic Information System (GIS) that uses the Quantum GIS (QGIS) libraries. It provides a uniform method for modeling distribution patterns using a variety of widely used algorithms (de Souza Muñoz et al., 2011). The openModeller library is used to generate a projected occupancy map on the basis of the environmental parameters considered in the model (e.g., temperature, precipitation, altitude) and biological data with point locations (see Figure 2). The openModeller package can run simulations with multiple species and multiple algorithms, and includes the ability to add newly developed models as plug-ins. This flexibility and versatility are central to the collaborative nature of our project, as they permit users to run a wide variety of projections and to develop and share new algorithms.
   Projected occupancy maps (see Figure 3) as well as other output can be shared, but with an option to keep user-specified files private. Ecologists, biologists, policy makers, and experts in other fields can use scientific data and simulation results to make informed decisions about climate change and its effect on species populations.

Fig. 3. An openModeller simulation experiment with multiple models and an output map inside the user's browser.

4      HUBZERO ARCHITECTURE
4.1        HUBzero Infrastructure
The HUBzero platform was developed at Purdue University to support nanoHUB.org, an online community for the Network for Computational Nanotechnology (McLennan, 2010). HUBzero comprises a Joomla! content management system, a ticketing mechanism, version control, a wiki system based on the Trac Open Source Project, and middleware that enables the execution of hub tools inside a user’s Web browser. To provide an easy-to-use portal system for the scientific community, HUBzero also incorporates features for publishing articles and presentations, a Q&A area, a wish list feature, and user group management. All hub tools run in restricted virtual containers that control access to the underlying file system, networking, and other processes. Each user is granted access to their own home directory with quota limitations and a Debian Linux X Window environment.

4.2        Site Walk-through
Our hub users can access various resources on climate adaptation strategy, ranging from legal documents to simulation tools. Users can also become contributors by uploading articles, documents, and event information; creating new tools; and participating in forum discussions. As new tools are added to the hub, they contribute to the overall capacity for analysis and lead to more data; in turn, these data become input for steps in the adaptation decision-making process.
   Adding a new tool to the hub begins with a registration process on the site, which alerts the hub administrator to create a code repository for the tool. The owner of the tool then iterates through a software engineering process until the tool is ready for publication to the hub’s community. There is, of course, extensive documentation available on the hub to assist newcomers with all HUBzero features, including tool creation.
   User groups may also be created to facilitate discussions and the exchange of ideas through the hub. This mechanism offers different levels of privacy, ranging from public to “by invitation only,” and permits users to more finely manage their discussions and the sharing of data.

5       OPENVZ CONFIGURATION
5.1        Containers and VNC
HUBzero’s middleware relies on OpenVZ to operate multiple, isolated containers (or virtual environments [VEs]). Because
OpenVZ is Linux-based, any computational tools deployed within a hub must also be able to run under Linux. When a hub user launches a tool, the corresponding application is executed within a lightweight VE.
   In addition to OpenVZ, HUBzero employs VNC (Virtual Network Computing [Richardson, Stafford-Fraser, Wood, & Hopper, 1998]) as a means of interacting with the interfaces of tools through a Web browser. Hub tool interfaces are displayed by connecting a TightVNC Java applet to a RealVNC-enabled X Window server (Kisseberth, 2010). Both the applet and the server are modified to work with the HUBzero system; the applet runs in a user’s Web browser, while the server executes inside an OpenVZ VE.

[Figure: diagram titled “Hub Extended Architecture”. Labeled components: User's desktop, Web Server, Database, Middleware, three VE NFS Clients, NFS Client, NFS Server.]
Fig. 4. The Collaboratory Project’s HUBzero infrastructure.

5.2        Extending the OpenVZ Environment
The standard open source installation of HUBzero includes a basic OpenVZ configuration that we chose to enrich with additional software packages. For example, Subversion can be added to provide a more functional development environment for hub tools (although, in the most recent release of HUBzero, Subversion is already included). We found significant value in adding a graphical editor (gedit), a Web browser (kazehakase), a compression package, and other essential utilities.
   The installation of gedit was not limited to just package deployment. Additional changes to HUBzero’s File Alteration Monitor (FAM) scripts are required before gedit can be installed without errors. FAM notifies applications when specific files or directories are changed (Silicon Graphics International, 2010).
   In order to successfully execute a hub tool inside a VE, the libraries on which the tool relies must also be added to the OpenVZ environment. For our hub, we needed to provide various C/C++ libraries (starting with the installation of the Debian build-essential package) and an extensive list of additional packages such as qgis, gdal, sqlite, expat, grass, and qt. In addition, through a locally mounted directory accessible within our hub’s VEs, we provided access to precompiled libraries for openModeller (in general, the libraries found inside our VEs differ from those of the underlying HUBzero host system).
   We found it beneficial to test new tools with the hub Workspace tool. Through Workspace, error messages are displayed and provide hints about any missing libraries or problems with invocation scripts. For example, the binary package we manually created for openModeller contained several libraries that were not carried along internally during HUBzero’s process of promoting the openModeller tool from initial registration to installation. In this case, we had to manually copy the missing libraries to their destination. This issue may have been the result of the HUBzero project not foreseeing the use of tools as extensive and complex as openModeller.
   For other hub tools developed in Java, we configured a more complete Java environment for use by OpenVZ. In addition to the installation of the sun-java6-jdk package, we had to make changes to the x86_64-linux.gnu.conf file for Java libraries.

5.3        X Window Configuration
Beyond providing access to all the necessary utilities and libraries for our openModeller hub tool, we also added an X Window window manager, IceWM, to properly display openModeller Desktop within the HUBzero TightVNC Java applet. With IceWM in place, openModeller Desktop appeared and behaved as it normally would on a graphical workstation. We employed the same window manager configuration used by the HUBzero Workspace tool, although we changed themes and preference files to better handle the openModeller icons and window behavior.
   Note that a special script is needed to invoke a hub tool; for openModeller, this script had to satisfy certain prerequisites. In particular, it had to point to the various openModeller libraries and start the IceWM window manager.
   As a result of our efforts, users familiar with openModeller Desktop who visit our hub do not have to learn a new interface. We have also embarked on an effort to construct new openModeller plugins for our hub tool. These extensions will be made available for hub users as they become ready.

6      NFS CONFIGURATION
In a hub, files (as opposed to Web content) are typically accessible through the Workspace tool, which provides a Linux workstation within a VE. Our extensions to HUBzero employ NFS thin clients that operate within our hub’s VEs and on our hub server.

6.1        Extended Storage
NFS allows the resources of an external file system to be shared with many thin clients. In our case, these clients execute inside hub VEs and permit users to access an external file system as though it were local to the hub’s server. As a result we can 1) use less disk space on our HUBzero server and provide access to external, scalable data storage; and 2) leverage the external NFS storage to permit public and private directories for hub users.
   Initially, we made an effort to avoid running the NFS client on our server, but found that this made collecting and processing user data difficult and did not easily integrate with HUBzero’s existing configurations. Also, it became cumbersome to automate the
creation of public and private user account directories under NFS (HUBzero’s middleware already has access to the user information we need for the dynamic creation of public/private directories under the NFS file system).

With our present architecture, an NFS client executes both on our hub server and inside the VEs. Under this configuration, our NFS space is mounted when the OpenVZ service starts. We made our NFS mount persistent for all VEs such that each VE becomes an NFS client that is accessible by the user via the Workspace tool. This permits us to access user data on the host machine while simultaneously providing access to the NFS space from inside the hub VEs. With this, NFS can be used for storing and sharing openModeller input and results.

6.2        Security Considerations
While data sharing is an important task, the ability to control data access is also important. To accommodate our researchers’ need to manage how their data are shared, we automated our hub to create a public and a private directory for each user who is permitted to use the Workspace tool. When a given user launches the Workspace tool, HUBzero automatically checks whether this is a new user and, if so, creates their public and private directories in NFS space. If the user is later deleted, a maintenance routine is run to reclaim the defunct user’s storage. A user’s public NFS directory is accessible by all Workspace tool users as a read-only folder, while their private NFS directory is accessible only to them.

7      WORKSPACE AND TOOL RESTRICTIONS
By default, the HUBzero platform makes the Workspace tool available to any registered user. With such a configuration, our system’s resources (including NFS) would be open to use and, possibly, abuse by anyone who registered with our hub. Given the CPU cycles and large data sets involved in simulations, the misuse of hub resources is a significant risk.

To better manage resource usage, we modified HUBzero so that new users are granted access to the Workspace tool only upon request and after being vetted. When an unauthorized (but registered) user tries to launch Workspace, they are given an option to request access through a trouble ticket. This request is dispatched to the hub administrator, who may then determine whether or not to approve the desired access. If approved, the user will be able to employ Workspace with access to both their home directory and NFS public/private space.

Our extensions to restrict the Workspace tool required a number of changes to the tool’s configuration. First, a new group was created to manage the use of Workspace. Next, we created a new resource through the HUBzero administrative interface, associating the aforementioned group name with this resource. Finally, we published the new resource and manually updated the hub database so that Workspace access became limited to the newly created group.1 In a related enhancement, we also extended the HUBzero ticketing system logic to include a ticket status for Workspace access approval (as touched upon above).

1 This method was provided to us through communications with the HUBzero project team.

Access control for tools similar to openModeller can be handled during the tool publishing process and restricted to a previously created group.

It is worth noting that testing HUBzero extensions can become somewhat cumbersome due to a requirement that each registered user have a unique email address. For example, with our added restrictions to the Workspace tool, we needed to test multiple users who are members of various groups. Each user account, of course, requires a unique email address in order to be registered. However, this creates significant limitations for fabricating test accounts, since a legitimate email address must be devised for each account. Thus, we altered our hub so that this requirement, which we find useful in general, is relaxed for specific accounts that we set aside for testing purposes only.

8       HANDLING OPENMODELLER RESOURCE CHALLENGES
Our hub’s user community expects simulation results to be forthcoming with little delay. Even for small, simple jobs, however, openModeller’s memory and CPU overhead is not insignificant. Moreover, the openModeller Desktop user interface can contribute a fair amount to this overhead, too. In general, such resource consumption can have a deleterious effect on any system if enough users simultaneously execute large openModeller jobs.

To better handle this type of workload, we have begun implementing a stand-alone system to execute many simultaneous openModeller jobs. Although job processing is handled outside of HUBzero’s environment, job submission will take place through a webpage hosted on our hub, while simulation results will be accessible through the same NFS space that is made available to users of our hub’s Workspace tool.

9       OVERALL IMPRESSIONS OF HUBZERO
After having been under development for a number of years, HUBzero was released as open source in 2010. This happened to coincide with the start of our collaboratory project and, for the reasons already noted, we decided to try HUBzero and deploy our own hub rather than pay for a hosting plan through Purdue University.

Because we have only recently made our project’s hub available to the Internet, we have yet to accumulate enough traffic and benchmarking to offer a worthwhile opinion of HUBzero’s capabilities while in production use and under load. Nevertheless, we can comment on deploying and securing HUBzero, as well as the support offered by the HUBzero project.

9.1        Deployment
We have deployed the open source HUBzero platform onto three different servers (one test and two production systems) multiple times. Each server operates Debian Lenny as a virtual machine (VM) running under the Kernel-based Virtual Machine (KVM) hypervisor (Solomon, 2011) with Red Hat Enterprise Linux 6 as the host. We found it helpful to deploy HUBzero as a VM since we could make snapshots of successful installations and return to these if things went awry. Also, operating a hub as a VM is a good way to manage the risk of incompatible hardware; for example, at
one point we attempted to install Debian Lenny on a new physical machine and ran into an unresolvable problem with the RAID adapter. With a VM acting as a layer of abstraction from underlying hardware, such issues generally cannot happen. To counteract the potential for loss of speed, especially with respect to I/O operations, our production host server is outfitted with 32 cores, 128 gigabytes of RAM, and 1.6 terabytes of hard disk space; all cores, 120 gigabytes of RAM, and 1.5 terabytes of disk are made available to the HUBzero VM.

The HUBzero installation process at the time of this writing consists of 77 steps spread over 25 major areas. This complexity often results in errors that are challenging to debug. One option the HUBzero project might consider implementing is a checkpoint system for their install process. With this, at key intervals it would be possible to know that a given installation is correct. To this end, we have compiled our own internal list of checkpoints, especially with regard to Workspace configuration. In a related matter, although we have not yet tested it, the HUBzero project has recently devised a beta version of a simplified installation process.

Over time our familiarity and expertise regarding HUBzero installations improved. Our team now maintains an internal wiki to help preserve institutional knowledge on this front and keep solutions to common problems close at hand. It now takes us roughly eight hours to set up a new HUBzero server (without adding our extensions).

9.2        Speed and Resource Consumption
An informal benchmarking of simultaneous openModeller Desktop tool executions offered some encouraging results. In our test (see Figure 5), each openModeller Desktop tool processed a 149-megabyte data set, resulting in no perceptible delay for 18 simultaneous sessions. The maximum CPU utilization was 56% and the total running time was 76 seconds. As the number of sessions was decreased, we observed that the total maximum running time changed only slightly (e.g., 70 seconds for six simultaneous sessions); we believe this was due to the abundance of available resources for the tested number of sessions. For comparison, we found the maximum CPU utilization was 37% for 12 simultaneous sessions and 19% for six sessions. Memory usage was not significant given the amount of data processed (it remained below 6% of our hub’s total memory throughout the test). On a previous HUBzero VM (four CPU cores, four gigabytes of RAM, and 500 gigabytes of hard disk) we had carried out a similar test of the openModeller Desktop tool, but with fewer simultaneous sessions. There, for 15 simultaneous sessions we noted the memory usage peaked at 31%, while CPU utilization reached 94%; running time was considerably greater at about three minutes.

Fig. 5. CPU utilization for 6, 12, and 18 simultaneously executing openModeller jobs.

9.3        Security
The “out of the box” security posture of HUBzero is best described as open. To begin with, a newly installed hub will start life with several critical services exposed to its network. These include SSH, SMTP, MySQL, and LDAP. Each of these should be appropriately restricted; SMTP, MySQL, and LDAP can be safely restricted to the hub itself. Next, other than “root,” it is not possible to install a user account on a hub. This makes it very difficult to know who is using the root account (since anyone logging into a hub server must do so as root) and significantly reduces the utility of OS activity audit trails. Related to this, the root account is exposed to password guessing through SSH login attempts. Finally, efforts to install file space monitoring tools such as Tripwire appear to cause malfunctions in HUBzero. (We cannot offer much detail about Tripwire’s erroneous interactions with our hub deployments, as we elected not to spend time diagnosing this issue.)

9.4        Support
The support available through the HUBzero project has improved significantly over time. We typically see responses to our email queries within 24 hours and, in some cases, even on weekends. Often, the issues we raise are already known to the HUBzero project team and do not require much troubleshooting on their part. Both a knowledge base and a “Questions & Answers” resource are available through the HUBzero project website. Although anyone can post a question, only a handful of institutions appear to be deploying their own hubs (as opposed to purchasing a hosting plan through Purdue), so there is relatively little information to be found in existing support resources.

10 CONCLUSION
The abilities to execute simulation/modeling tools and share information are paramount to our collaboratory project. As such, we found the HUBzero platform to be a proper fit for sharing computational tools and data. While the HUBzero platform does a good job of bringing users and modeling tools together, we extended its functionality to enable the use of sophisticated, computationally intensive modeling tools. Also, we discovered that a default hub requires additional security measures to protect its network services (e.g., SMTP, MySQL, and LDAP) and restrict access to its OpenVZ environment. In terms of resource consumption, we tested relatively modest sets of data with the openModeller Desktop tool and will continue evaluating resource consumption for larger data sets as they become available. Finally, we provided seamless access to external Network File System drive space to store and share input and output data.
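
As an illustration of the per-user public and private NFS directories described in Section 6.2, the following Python fragment is a minimal sketch of how such directories might be created the first time a user launches the Workspace tool. It is not taken from HUBzero’s middleware: the function name provision_user_dirs, the public/<user> and private/<user> layout under the NFS mount point, and the permission modes (0755 for the public directory, 0700 for the private one) are all assumptions made for this example.

```python
import os
import stat
import tempfile


def provision_user_dirs(nfs_root, username):
    """Create a public and a private directory for a user, if missing.

    Hypothetical layout (an assumption, not HUBzero's actual one):
        <nfs_root>/public/<username>   readable by everyone, writable by owner
        <nfs_root>/private/<username>  accessible by the owner only
    Ownership changes (chown) are omitted because they require root.
    """
    public = os.path.join(nfs_root, "public", username)
    private = os.path.join(nfs_root, "private", username)
    for path, mode in ((public, 0o755), (private, 0o700)):
        # exist_ok makes the call safe to repeat on every tool launch.
        os.makedirs(path, exist_ok=True)
        # Set the mode explicitly: the mode passed to makedirs would be
        # filtered by the process umask.
        os.chmod(path, mode)
    return public, private


if __name__ == "__main__":
    # Demonstrate against a temporary directory instead of a real NFS mount.
    root = tempfile.mkdtemp()
    pub, priv = provision_user_dirs(root, "example_user")
    print(pub, oct(stat.S_IMODE(os.stat(pub).st_mode)))
    print(priv, oct(stat.S_IMODE(os.stat(priv).st_mode)))
```

Because the function is idempotent, it could be called on every launch rather than only after a separate new-user check; the maintenance routine that reclaims storage for deleted users is not shown.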

ACKNOWLEDGEMENTS

This material is based upon work supported by the National
Science Foundation under Grant No. 1029584.

REFERENCES

De Roure, D., Goble, C., & Stevens, R. (2009). The design and realisation of the
     virtual research environment for social sharing of workflows. Future Generation
     Computer Systems, 25(5), 561-567.
de Souza Muñoz, M., De Giovanni, R., de Siqueira, M., Sutton, T., Brewer, P.,
     Pereira, R., et al. (2011). openModeller: A generic approach to species' potential
     distribution modelling. GeoInformatica, 15(1), 111-135. doi:10.1007/s10707-009-0090-7
Kisseberth, N. J. (2010). Hub set-up - Setting up Middleware.
     http://hubzero.org/resources/186
McLennan, M., & Kennell, R. (2010). HUBzero: A platform for dissemination and
     collaboration in computational science and engineering. Computing in Science
     & Engineering, 12(2), 48-53.
Parallels Holdings Ltd. (2011). OpenVZ project website. Retrieved 3/19, 2011, from
     http://wiki.openvz.org/Main_Page
Richardson, T., Stafford-Fraser, Q., Wood, K. R., & Hopper, A. (1998). Virtual
     network computing. Internet Computing, IEEE, 2(1), 33-38.
Silicon Graphics International. (2010). File alteration monitor. Retrieved 4/5, 2011,
     from http://oss.sgi.com/projects/fam/
Smith, M. C. (2006). Linux NFS project website. Retrieved 3/19, 2011, from
     http://nfs.sourceforge.net/
Solomon, H. (2011). KVM - the Linux kernel-based virtual machine. Retrieved 3/22,
     2011, from http://www.linux-kvm.com/



