=Paper=
{{Paper
|id=None
|storemode=property
|title=The HUBzero Platform: Extensions and Impressions
|pdfUrl=https://ceur-ws.org/Vol-819/paper2.pdf
|volume=Vol-819
|dblpUrl=https://dblp.org/rec/conf/iwsg/AlberNW11
}}
==The HUBzero Platform: Extensions and Impressions==
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011
The HUBzero Platform: Extensions and Impressions
Anna
Alber,1,*
Jarek
Nabrzyski,1,2
and
Timothy
Wright1,2
1Center
for
Research
Computing,
University
of
Notre
Dame
2Department
of
Computer
Science
&
Engineering,
University
of
Notre
Dame
ABSTRACT intended to offer an environment for users to create and work with
Motivation: We our efforts to work with HUBzero in a significant simulation tools that are made available in a hub. However, there
collaboration-oriented project, as well as our impressions of is potential for accidental and malicious abuse of such a powerful
HUBzero after nearly a year of interaction with the platform. Our virtual environment; for example, by gratuitous resource use or
assessment is a mixed one: while the HUBzero platform does a hacking.
good job of bringing users and modeling tools together, we found To provide hub access to openModeller and accommodate the
some noteworthy limitations of this recently open-sourced aforementioned needs, we have extended HUBzero in three ways.
technology. For example: hubs are generally configured to deploy First, we have significantly expanded the OpenVZ environment to
simulation tools of modest means with input and output data not support the execution of a tool as complex as openModeller. Next,
readily shared across user accounts; by default, the security we have integrated NFS (Network File System) (Smith, 2006) with
configuration of a HUBzero server is very open; hubs are inflexible in the file system made available through the HUBzero Workspace
terms of their deployment requirements, and there is a need for tool. Related to this, we have enhanced the process of hub user
more detailed documentation. In answer to the first (and most account creation to include the automatic setup of public and
serious) of these issues, we have extended HUBzero to enable the private NFS directories for specific users. Finally, we have made
use of more sophisticated modeling tools, such as openModeller it possible to restrict which hub users are granted access to the
Desktop, and to provide seamless access to external Network File Workspace and openModeller tools.
System drive space. To permit long-running, computationally intense openModeller
simulations, we have embarked on a separate project to kick off
and manage such jobs outside of the HUBzero environment. The
1 INTRODUCTION results of these simulations can be made available to our hub
The open source HUBzero platform offers an effective way for through NFS.
scientific and educational communities to share information, The remainder of this paper is organized as follows. Section 2
interact, and run simulations (McLennan, 2010). The latter presents the state-of-the art in collaborative, virtual organization
capability is considered to be HUBzero’s “signature service” and, tools. Section 3 discusses the Collaboratory Project and
indeed, is not typically available through similar Web platforms. simulations tools that we integrate with the HUBzero
We are currently employing HUBzero as part of an effort to infrastructure. In section 4 we present the HUBzero architecture.
build a so-called collaboratory: a virtual space in which users may Section 5 reviews the OpenVZ configuration and what we have
share information and innovate. The goal of our work is to support done to enhance this configuration. Section 6 addresses our
a multidisciplinary community of scientists with a focus on integration of NFS with HUBzero, including our use of public and
adaptation strategies for biological systems. In addition to private directories for each Workspace/openModeller user.
publishing and discussing content related to adaptation, users of Section 7 describes our changes to HUBzero to restrict access to
the HUBzero portal (or “hub”) need to leverage relevant the Workspace and openModeller tools. Section 8 briefly looks at
simulation tools, such as openModeller and openModeller Desktop a separate project to handle more intense openModeller jobs
(de Souza Muñoz et al., 2011). outside of our hub. Section 9 outlines our overall impressions of
Such tools have requirements that may not be satisfied by a working with HUBzero. Finally, in Section 10 we offer some
typical hub deployment. For example, there may be a need for concluding remarks about this project.
software libraries that do not normally exist in HUBzero’s
virtualized Linux environment (OpenVZ [Parallels Holdings Ltd.,
2011]). Also, for functionality purposes, users may have to
manage and archive a tool’s input and output files. As part of this
management there will almost always be a need to make some files
publically accessible (e.g., simulation results) and others private.
Tangentially related to these requirements is the need to restrict
access to a hub’s OpenVZ-based Workspace tool. Through
Workspace a user is given a private, virtual Linux workstation that
may be employed for nearly any purpose. Generally, this is
* To whom correspondence should be addressed.
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011
resources, including the NSF TeraGrid, the Open Science Grid,
2 RELATED WORK and others. HUBzero’s middleware hides much of the complexity
of distributed and grid computing, handling authentication,
HUBzero offers many features that researchers need for effective
authorization, file transfer, and visualization, and letting the
collaboration. Major components include: various interactive
researcher focus on research.
simulation tools that can be accessed through almost any Web Aside from its core features, HUBzero was chosen for our
browser, a repository for online course materials and other collaboratory project due to its wide adoption by the US research
publications, mechanisms for uploading new resources, a tool community as well as our easy access to support services that are
development area, ratings and citations abilities, content tagging provided by Purdue University. We believe that MyExperiment
mechanisms, wikis and blogs, private and public collaboration could be adapted to our needs, but its focus on workflows wasn’t
areas, usage statistics, news and events, and more. While as good a fit as HUBzero. Our project heavily depends on climate
relatively few platforms are as feature-rich, there are, nevertheless, and adaptation simulation tools that need to be run within the
some worth mentioning. Collaboratory’s system.
MyExperiment (De Roure et al., 2009) is a virtual research
environment for collaboration and the sharing of experiments,
which aims to provide a “workflow bazaar” for any workflow
management system. An experiment is represented as an
application workflow rather than as an infrastructure or services
workflow. The design methodology of myExperiment was inspired
by the Web 2.0 approach that was used in systems such as
Facebook, MySpace, and Amazon. MyExperiment brings
functionality to the user through familiar interfaces and can be
combined with other services. Although its primary focus is on
workflows, the designers of myExperiment realized that there was
an immediate need to associate workflows with other information.
Thus, the myExperiment concept is about sharing digital objects
that include data, results, provenance information, tags, associated
documentation, etc. These individual items can be collected
together to form research objects, for example to record an
experiment. Unlike Twine (www.twine.com), BioMedExperts
(www.biomedexperts.com) or Nature Networking Fig. 1. The Collaboratory Project’s hub home page.
(network.nature.com), myExperiment is not intended to be a
general social networking environment for scientists. Instead, the
3 OVERVIEW OF THE COLLABORATORY
focus is on social networking around shared artifacts. In this way
PROJECT AND SIMULATION TOOLS
it is more like social bookmarking systems such as CiteULike
(www.citeulike.org) and Connotea (www.connotea.org)—but with 3.1 The Collaboratory Project
a much wider and richer remit than published articles, or social Climate change has compelled researchers to look not only for
content systems like YouTube (www.youtube.com), SlideShare causality, but also for adaptation strategies. Aimed at building a
(www.slideshare.net), and Flickr (www.flickr.com). It effectively virtual environment for comprehensive and informed decision-
creates a social network of people and the items that they share. making, the Collaboratory Project is comprised of a team of
The main difference between myExperiment and HUBzero is that researchers in ecology, computer science, law, and social science.
HUBzero is built using portal technologies and provides simulation We anticipate that our work will help foster an understanding of
tools that can be accessed from the browser and executed in the complexity in natural systems and charter a path from data to
HUBzero space. Furthermore, users of HUBzero can add new knowledge to insight and, finally, to action. Integrated with on-
resources and simulation tools within the HUBzero space. going and proposed survey research on expert opinion, we also
MyExperiment is built using Web 2.0 Ruby on Rails, rather than a plan to carry out social network analysis (collaboration networks,
portal framework, includes an API and offers remote execution in particular) and study the impact of a virtual organization on the
capabilities. decision-making process. We anticipate that our efforts will result
HUBzero is often compared to the highly successful Open in various cyber tools that can be adapted by any domain requiring
Courseware Initiative from MIT (web.mit.edu/ocw). However, collaborative decision-making.
HUBzero is more than just a repository for course materials. It is a The role played by our hub (see Figure 1) is to operate as a
place where researchers and educators can meet and accomplish shared space where researchers from numerous fields related to
real work. The HUBzero platform offers groups for private climate adaptation can run projection and simulation tools, devise
collaboration, software development projects in its forge area, new hub tools, and share results with colleagues and the public.
event calendars, and many other services designed to connect Our initial work focused on making computational simulation and
researchers and build a community. But most importantly, it projection tools publicly available. However, we soon extended
connects users to the simulation tools they need for research and our hub architecture to better enable easy data sharing with added
education. Simulation jobs can be dispatched on national grid privacy control. As more tools become available, we plan on our
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011
hub becoming a central point of contact for experts in climate Source Project, and middleware that enables the execution of hub
change and adaptation strategies—a space where a suite of tools inside a user’s Web browser. To provide an easy-to-use
computational tools can be leveraged to simulate the effects of portal system for the scientific community, HUBzero also
climate change on species and project future distribution patterns. incorporates features for publishing articles and presentations, a
As its community of users grows in both number and diversity, our Q&A area, a wish list feature, and user group management. All
hub will become more than a repository of publicly available tools hub tools run in restricted virtual containers that control access to
and data: it will evolve into a virtual community for climate the underlying file system, networking, and other processes. Each
adaptation research where members contribute back by sharing user is granted access to their own home directory with quota
new tools, publications, use cases, and data.
Fig. 2. The openModeller experiment designer with multiple
algorithms, available environmental layers, and biological data running
on the Collaboratory Project’s hub.
Fig. 3. An openModeller simulation experiment with multiple models
with output map inside user's browser
3.2 OpenModeller Desktop
OpenModeller Desktop is an open source, fundamental niche limitations and a Debian Linux X Window environment.
modeling library with embedded Geographic Information System
(GIS) using the Quantum GIS (QGIS) libraries. It provides a 4.2 Site Walk-through
uniform method for modeling distribution patterns using a variety Our hub users can access various resources about climate
of widely used algorithms (de Souza Muñoz et al., 2011). The adaptation strategy ranging from legal documents to simulation
openModeller library is used to generate a projected occupancy tools. Users can also become contributors by uploading articles,
map on the basis of environmental parameters considered in the documents, and event information; creating new tools; and
model (e.g., temperature, precipitation, altitude) and biological participating in forum discussions. As new tools are added to the
data with point locations (see Figure 2). The openModeller hub, they contribute to the overall capacity for analysis and lead to
package can run simulations with multiple species and multiple more data; in turn, these data become input for steps in the
algorithms, and includes the ability to add newly developed models adaptation decision-making process.
as plug-ins. This flexibility and versatility is central to the Adding a new tool to the hub begins with a registration process
collaborative nature of our project, as it permits users to run a wide on the site, which alerts the hub administrator to create a code
variety of projections, and develop/share new algorithms. repository for the tool. The owner of the tool then iterates though a
Projected occupancy maps (see Figure 3) as well as other output software engineering process until the tool is ready for publication
can be shared, but with an option to keep user-specified files to the hub’s community. There is, of course, extensive
private. Ecologists, biologists, policy makers, and experts in other documentation available on the hub to assist newcomers with all
fields can use scientific data and simulation results to make HUBzero features, including tool creation.
informed decisions about climate change and its effect on species User groups may also be created to facilitate discussions and the
population. exchange of ideas through the hub. This mechanism offers
different levels of privacy, ranging from public to “by invitation
4 HUBZERO ARCHITECTURE only,” and permits users to more finely manage their discussions
and the sharing of data.
4.1 HUBzero Infrastructure
The HUBzero platform was developed at Purdue University to
support nanoHUB.org, an online community for the Network for 5 OPENVZ CONFIGURATION
Computational Nanotechnology (McLennan, 2010). HUBzero is 5.1 Containers and VNC
comprised of a Joomla! content management system, a ticketing
mechanism, version control, a wiki system based on Trac Open HUBzero’s middleware relies on OpenVZ to operate multiple,
isolated containers (or virtual environments [VEs]). Because
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011
OpenVZ utilizes Linux, any computational tools deployed within a access to precompiled libraries for openModeller (in general, the
hub must also be able to run under Linux. When a hub user libraries found inside our VEs differ from those of the underlying
launches a tool, the corresponding application is executed within a HUBzero host system).
lightweight VE. We found it beneficial to test new tools with the hub Workspace
In addition to OpenVZ, HUBzero employs VNC (Virtual tool. Through Workspace, error messages are displayed and
Network Computing [Richardson, Stafford-Fraser, Wood, & provide hints about any missing libraries or problems with
Hopper, 1998]) as a means of interacting with the interfaces of invocation scripts. For example, the binary package we manually
tools through a Web browser. Hub tool interfaces are displayed by created for openModeller contained several libraries that were not
connecting a TightVNC Java Applet to a RealVNC-enabled X carried along internally during HUBzero’s process of promoting
Window server (Kisseberth, 2010). Both the applet and server are the openModeller tool from initial registration to installation. In
modified to work with the HUBzero system; the applet, of course, this case, we had to manually copy the missing libraries to their
runs on a user’s Web browser, while the server executes inside an destination. This issue may have been the result of the HUBzero
OpenVZ VE. project not foreseeing the use of tools as extensive and complex as
openModeller.
For other hub tools developed in Java, we configured a more
Hub Extended Architecture complete Java environment for use by OpenVZ. In addition to the
installation of the sun-java6-jdk package, we had to make changes
to x86_64-linux.gnu.conf file for Java libraries.
Web Server
Database 5.3 X Window Configuration
Beyond providing access to all the necessary utilities and libraries
User's desktop
for our openModeller hub tool, we also added an X Window
window manager, IceWM, to properly display openModeller
VE NFS VE NFS VE NFS
Desktop within the HUBzero TightVNC Java applet. With IceWM
Client Client Client
in place, openModeller Desktop appeared and behaved as it
Middleware normally would on a graphical workstation. We employed the
same window manager configuration used by the HUBzero
NFS Client
Workspace tool, although we changed themes and preference files
to better handle the openModeller icons and window behavior.
It is noteworthy that a special script is needed to invoke a hub
tool; for openModeller certain requisites had to be in place within
NFS Server this script. In particular, we had to ensure that various
openModeller libraries were pointed to and the IceWM window
manager was started.
As a result of our efforts, users familiar with openModeller
Fig. 4. The Collaboratory Project’s HUBzero infrastructure.
Desktop who visit our hub do not have to learn a new interface.
We have also embarked on an effort to construct new
openModeller plugins for our hub tool. These extensions will be
5.2 Extending the OpenVZ Environment
made available for hub users as they become ready.
The standard open source installation of HUBzero includes a basic
OpenVZ configuration that we chose to enrich with additional
software packages. For example, Subversion can be added to 6 NFS CONFIGURATION
provide a more functional development environment for hub tools In a hub, files (as opposed to Web content) are typically accessible
(although, in the most recent release of HUBzero, Subversion is through the Workspace tool, which provides a Linux workstation
already included). We found significant value in adding a within a VE. Our extensions to HUBzero employ NFS thin clients
graphical editor (gedit), Web browser (kazehakase), a compression that operate within our hub’s VEs and on our hub server.
package, and other essential utilities.
The installation of gedit was not limited to just package 6.1 Extended Storage
deployment. Additional changes to HUBzero’s File Alteration NFS allows sharing the resource of an external file system with
Monitor (FAM) scripts are required before gedit can be installed many thin clients. In our case, these clients execute inside hub
without errors. FAM notifies applications when specific files or VEs and permit users to access an external file system as though it
directories are changed (Silicon Graphics International, 2010). were local to the hub’s server. As a result we can 1) use less disk
In order to successfully execute a hub tool inside a VE, libraries space on our HUBzero server and provide access to external,
on which the tool relies must also be added to the OpenVZ scalable data storage; and 2) leverage the external NFS storage to
environment. For our hub, we needed to provide various C/C++ permit public and private directories for hub users.
libraries (starting with the installation of the Debian build-essential Initially, we made an effort to avoid running the NFS client on
package) and an extensive list of additional packages such as qgis, our server, but found that this made collecting and processing user
gdal, sqllite, expat, grass and qt. In addition, through a locally data difficult and did not easily integrate with HUBzero’s existing
mounted directory accessible within our hub’s VEs, we provided configurations. Also, it became cumbersome to automate the
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011
creation of public and private user account directories under NFS Access control for tools similar to openModeller can be handled
(HUBzero’s middleware already has access to the user information during the tool publishing process and restricted to a previously
we need for the dynamic creation of public/private directories created group.
under NFS file system). It is worth noting that testing HUBzero extensions can become
With our present architecture, an NFS client executes on both somewhat cumbersome due to a requirement that each registered
our hub server and inside the VEs. Under this configuration, our user have only one unique email address. For example, with our
NFS space is mounted when the OpenVZ service starts. We made added restrictions to the Workspace tool, we needed to test
our NFS mount persistent for all VEs such that each VE becomes multiple users who are members of various groups. Each user
an NFS client that is accessible by the user via the Workspace tool. account, of course, requires a unique email address in order to be
This permits us to access user data on the host machine while registered. However, this creates significant limitations for
simultaneously providing access to the NFS space from inside the fabricating test accounts, since a legitimate email address must be
hub VEs. With this, NFS can be used for storing and sharing devised for each account. Thus, we altered our hub so that this
openModeller input and results. requirement, which we find useful in general, is relaxed for
specific accounts that we set aside for testing purposes only.
6.2 Security Considerations
While data sharing is an important task, the ability to control data
access is also important. To accommodate our researchers’ need to 8 HANDLING OPENMODELLER RESOURCE
manage how their data are shared, we automated our hub to create CHALLENGES
a public and a private directory for each user who is permitted to The expectations of our hub’s user community are that simulation
use the Workspace tool. When a given user launches the results are forthcoming with little delay. Even for small, simple
Workspace tool, HUBzero automatically checks to see if this is a jobs, however, openModeller’s memory and CPU overhead is not
new user, and, if they are new, creates their public and private insignificant. Moreover, the openModeller Desktop user interface
directories in NFS space. If the user is later deleted, a maintenance can contribute a fair amount to this overhead, too. In general, such
routine is run to reclaim the defunct user’s storage. A user’s public resource consumption can have a deleterious affect on any system
NFS directory is accessible by all Workspace tool users as a read- if enough users simultaneously execute large openModeller jobs.
only folder, while their private NFS directory is accessible only to To better handle this type of workload, we have begun
them. implementing a stand-alone system to execute multiple numbers of
simultaneous openModeller jobs. Although job processing is
handled outside of HUBzero’s environment, job submission will
7 WORKSPACE AND TOOL RESTRICTIONS
take place through a webpage hosted on our hub, while simulation
By default, the HUBzero platform makes the Workspace tool results will be accessible through the same NFS space that is made
available to any registered user. With such a configuration, our available to users of our hub’s Workspace tool.
system’s resources (including NFS) would be open to use and,
possibly, abuse by anyone who registered with our hub. Given the
CPU cycles and large data sets involved in simulations, the misuse 9 OVERALL IMPRESSIONS OF HUBZERO
of hub resources is a significant risk. After having been under development for a number of years,
To better manage resource usage, we modified HUBzero so that HUBzero was released to open source in 2010. This happened to
new users are granted access to the Workspace tool only upon coincide with the start of our collaboratory project and, for the
request and after being vetted. When an unauthorized (but reasons already noted, we decided to try HUBzero and deploy our
registered) user tries to launch Workspace, they are given an own hub rather than pay for a hosting plan through Purdue
option to request access through a trouble ticket. This request is University.
dispatched to the hub administrator who may then determine Because we have only recently made our project’s hub available
whether or not to approve the desired access. If approved, the user to the Internet, we have yet to accumulate enough traffic and
will be able to employ Workspace with access to both their home benchmarking to offer a worthwhile opinion of HUBzero’s
directory and NFS public/private space. capabilities while in production use and under load. Nevertheless,
Our extensions to restrict the Workspace tool required a number we can comment on deploying and securing HUBzero, as well as
of changes to the tool’s configuration. First, a new group was the support offered by the HUBzero project.
created to manage the use of Workspace. Next, we created a new
resource through the HUBzero administrative interface, associating 9.1 Deployment
the aforementioned group name with this resource. Finally, we We have deployed the open source HUBzero platform onto three
published the new resource and manually updated the hub database different servers (one test and two production systems) multiple
so that Workspace access became limited to the newly created times. Each server operates Debian Lenny as a virtual machine
group.1 In a related enhancement, we also extended the HUBzero (VM) running under the Kernel-based Virtual Machine (KVM)
ticketing system logic to include a ticket status for Workspace hypervisor (Solomon, 2011) with Red Hat Enterprise Linux 6 as
access approval (as touched upon above). the host. We found it helpful to deploy HUBzero as a VM since
we could make snapshots of successful installations and return to
1 these if things went awry. Also, operating a hub as a VM is a good
This method was provided to us through communications with the
way to manage the risk of incompatible hardware—for example, at
HUBzero project team.
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011
one point we attempted to install Debian Lenny on a new physical (since anyone logging into a hub server must do so as root) and
machine and ran into an unresolvable problem with the RAID significantly reduces the utility of OS activity audit trails. Related
adapter. With a VM acting as a layer of abstraction from to this, the root account is exposed to password guessing through
underlying hardware, such issues generally cannot happen. To SSH login attempts. Finally, efforts to install file space monitoring
counteract the potential for loss of speed, especially with respect to tools such as Tripwire appear to cause malfunctions in HUBzero.
I/O operations, our production host server is outfitted with 32 (Although, we cannot offer much detail about Tripwire’s erroneous
cores, 128 gigabytes of RAM, and 1.6 terabytes of hard disk space; interactions with our hub deployments, as we elected not to spend
all cores, 120 gigabytes of RAM, and 1.5 terabytes of disk are time diagnosing this issue.)
made available to the HUBzero VM.
The HUBzero installation process at the time of this writing
consists of 77 steps spread over 25 major areas. This complexity
often results in errors that are challenging to debug. One option
the HUBzero project might consider implementing is a checkpoint
system for their install process. With this, at key intervals it would
be possible to know that a given installation is correct. To this
end, we have compiled our own internal list of checkpoints,
especially with regard to Workspace configuration. In a related
matter, although we have not yet tested it, the HUBzero project has
recently devised a beta version of a simplified installation process.
Over time our familiarity and expertise improved regarding
HUBzero installations. Our team now maintains an internal wiki
to help maintain institutional knowledge on this front and keep
solutions to common problems close at hand. It now takes us
roughly eight hours to setup a new HUBzero server (without
adding our extensions).
9.2 Speed and Resource Consumption Fig. 5. CPU utilization for 6, 12, and 18 simultaneously executing
An informal benchmarking of simultaneous openModeller Desktop openModeller jobs.
tool executions offered some encouraging results. In our test (see
Figure 5), each openModeller Desktop tool processed through a 9.4 Support
149-megabyte data set, resulting in no perceptible delay for 18 The support available through the HUBzero project has improved
simultaneous sessions. The maximum CPU utilization was 56% significantly over time. We typically see responses to our email
and the total running time was 76 seconds. As the number of queries within 24 hours and, in some cases, even on weekends.
sessions was decreased, we observed that the total maximum Often, the issues we raise are already known to the HUBzero
running time changed only slightly (e.g., 70 seconds for six project team and do not require much troubleshooting on their part.
simultaneous sessions); we believe this was due to the abundance Both a knowledge base and a “Questions & Answers” resource are
of available resources for the tested number of sessions. For available through the HUBzero project website. Although anyone
comparison, we found the maximum CPU utilization was 37% for can post a question, only a handful of institutions appear to be
12 simultaneous sessions and 19% for six sessions. Memory usage deploying their own hubs (as opposed to purchasing a hosting plan
was not significant given the amount of data processed (it through Purdue), so there is relatively little information to be found
remained below 6% of our hub’s total memory throughout the in existing support resources.
test). On a previous HUBzero VM (four CPU cores, four
gigabytes of RAM, and 500 gigabytes of hard disk) we had carried
out a similar test of the openModeller Desktop tool, but with fewer 10 CONCLUSION
simultaneous sessions. There, for 15 simultaneous sessions we The abilities to execute simulation/modeling tools and share
noted the memory usage peaked at 31%, while CPU utilization information are paramount to our collaboratory project. As such,
reached 94%; running time was considerably greater at about three we found the HUBzero platform to be a proper fit for sharing
minutes. computational tools and data. While the HUBzero platform does a
good job of bringing users and modeling tools together, we
extended its functionality to enable the use of sophisticated,
9.3 Security computationally intensive modeling tools. Also, we discovered
The “out of the box” security posture of HUBzero is best described that a default hub requires additional security measures to protect
as open. To begin with, a newly installed hub will start life with its network services (e.g., SMTP, MySQL, and LDAP) and restrict
several critical services exposed to its network. These include: access to its OpenVZ environment. In terms of resource
SSH, SMTP, MySQL, and LDAP. Each of these should be consumption, we tested relatively modest sets of data with the
appropriately restricted—in the case of SMTP, MySQL, and openModeller Desktop tool and will continue evaluating resource
LDAP, these can be safely restricted to the hub itself. Next, other consumption for larger data sets as they become available. Finally,
than “root,” it is not possible to install a user account on a hub. we provided seamless access to external Network File System
This makes it very difficult to know who is using the root account drive space to store and share input and output data.
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011
ACKNOWLEDGEMENTS
This material is based upon work supported by the National
Science Foundation under Grant No. 1029584.
REFERENCES
De Roure, D., Goble, C., & Stevens, R. (2009). The design and realisation of the
virtual research environment for social sharing of workflows. Future Generation
Computer Systems, 25(5), 561-567.
de Souza Muñoz, M., De Giovanni, R., de Siqueira, M., Sutton, T., Brewer, P.,
Pereira, R., et al. (2011). openModeller: A generic approach to species' potential
distribution modelling. GeoInformatica, 15(1), 111-135. doi:10.1007/s10707-009-
0090-7
Kisseberth, Nicholas J. (2010), "Hub set-up - Setting up Middleware,"
http://hubzero.org/resources/186
McLennan, M., & Kennell, R. (2010). HUBzero: A platform for dissemination and
collaboration in computational science and engineering. Computing in Science
Engineering, 12(2), 48-53.
Parallels Holdings Ltd. (2011). OpenVZ project website. Retrieved 3/19, 2011, from
http://wiki.openvz.org/Main_Page
Richardson, T., Stafford-Fraser, Q., Wood, K. R., & Hopper, A. (1998). Virtual
network computing. Internet Computing, IEEE, 2(1), 33-38.
Silicon Graphics International. (2010). File alteration monitor. Retrieved 4/5, 2011,
from http://oss.sgi.com/projects/fam/
Smith, M. C. (2006). Linux NFS project website. Retrieved 3/19, 2011, from
http://nfs.sourceforge.net/
Solomon, H. (2011). KVM - the linux kernel-based virtual machine. Retrieved 3/22,
2011, from http://www.linux-kvm.com/
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.