GWW2011

Grid Workflow Approach using the CELLmicrocosmos 2.2 MembraneEditor and UNICORE to commit and monitor GROMACS Jobs

Sebastian Rubert1, Christian Gamroth1, Jens Krüger2 and Björn Sommer1,*
1 Bio-/Medical Informatics Department, Bielefeld University, Universitätsstraße 25, D-33615 Bielefeld
2 Organic Chemistry, University of Paderborn, D-33098 Paderborn

ABSTRACT
Motivation: Molecular dynamics simulations of membrane systems are an important method for the prediction and analysis of physicochemical properties. The CELLmicrocosmos 2.2 MembraneEditor (CmME) provides a comfortable workflow to generate lipid membranes with different conformations. While CmME is intended to generate molecular structures on desktop and mobile computers in a very short time, the atomic simulation of exported membranes needs external high-performance computing resources. In this work, a first approach to a direct connection between CmME and a cluster running GROMACS using the Grid middleware UNICORE 6 is discussed.

1 INTRODUCTION
The generation and simulation of membranes is a very important topic of chemical and physical biology. Model membranes are used to analyze physicochemical properties at a level still beyond the reach of present experimental and microscopic techniques. The CELLmicrocosmos 2.2 MembraneEditor (CmME) (Sommer et al. 2011) provides a modular, interactive, shape-based software approach to model membranes based on the Protein Data Bank (PDB) format (Berman et al. 2000). Different PDB lipid types can be imported into CmME and used to generate membranes featuring different lipid distribution percentages. The computation of an average-size membrane system (~500×500 Å²) on a current desktop or mobile computer takes a few seconds up to several minutes, because the lipid placement algorithm does not change the atomic structure of single (rigid) molecules.
To simulate the atomic structure of the exported PDB membranes, molecular dynamics programs like GROMACS (Hess et al. 2008) are deployed. This computationally very space- and time-consuming task needs adequate computing resources. Many research groups have the opportunity to use internal or external cluster facilities for molecular simulations, but often this possibility does not exist, for multiple reasons.
This fact is one reason why public and scientific attention for Grid computing is rising rapidly. The advantage of Grid computing lies in the possibility to use a Grid middleware, which enables the user to harness the power of large computing resources without the need to know and configure the underlying hardware and software. Companies like Google and Amazon in particular raised attention for this field by giving external users the possibility to temporarily access idle computing resources. As in many other scientific communities, this technology has also come to the attention of the membrane and protein modeling community (Birkenheuer et al. 2010).
In this work we present a first approach to connect CmME to GROMACS-running cluster resources using the Grid middleware UNICORE (Schuller and Schumacher 2009), providing a fast and easy way to model and simulate membranes at the molecular and atomic level. Computational resources were provided by the Paderborn Center for Parallel Computing (PC²), which is involved in the MoSGrid initiative (Wewior et al. 2010). MoSGrid is part of the D-Grid initiative, which is funded by the German Federal Ministry of Education and Research.

2 METHODS
2.1 Technical Details
For the development of the GROMACS plug-in, CmME 2.2 and GROMACS 4.5 are used. Java 1.6 and Java3D 1.5 are the current programming APIs included in CmME. The computation takes place at the local grid resource BisGrid at the PC² in Paderborn. To gain access, a BisGrid account and a D-Grid certificate are needed. This grid resource can be accessed using UNICORE version 6.0 and presently has a local installation of GROMACS version 4.5. It consists of 64 cores, 512 gigabytes of main memory and 10 terabytes of hard disk space.
2.2 UNICORE
The UNICORE middleware is a comfortable option to gain direct access to grid resources (Gesing et al. 2010). To connect to a server resource, a keystore is used to provide the needed security. A job can be started by providing a job file containing information about the application, imports, exports and some server settings like the number of cores to be used for the calculation. This can be accomplished through a rich client, a command line client or the high-level API directly from a Java program. To connect to the grid server, a property file containing information about the keystore and the computation resource is needed. Every single command has to be included in a job file, which needs to be uploaded. The application output (stdout, stderr) and created files can be downloaded after completion of the job.
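For illustration, a job file of the kind described above might look as follows. The syntax is modeled on the UNICORE 6 command line client's job description format; the application name, file names and resource value are placeholders, and the exact key names may vary between UNICORE versions — this is a sketch, not the plug-in's actual configuration.

```json
{
  "ApplicationName": "GROMACS",
  "Arguments": [ "mdrun -deffnm em" ],
  "Imports": [
    { "From": "membrane.pdb", "To": "membrane.pdb" },
    { "From": "em.tpr",       "To": "em.tpr" }
  ],
  "Exports": [
    { "From": "em.gro", "To": "em.gro" }
  ],
  "Resources": { "CPUs": "8" }
}
```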
2.3 CELLmicrocosmos MembraneEditor
CmME currently implements four algorithms for the arrangement of lipids. After defining the percentage-based lipid distribution, an algorithm can be started to create the initial membrane according to the given percentages. In addition, CmME has an algorithm plug-in interface, which was originally integrated to support external user-specific algorithms. The GROMACS plug-in discussed later in this work is based on this plug-in interface. The advantage of this approach is that the plug-in can be excluded from the Standard Edition of CmME, because the regular user does not need GROMACS support. Another feature of CmME is that the exported PDB format is adjustable to the requirements of different programs. This feature is used for exporting the membrane to GROMACS.

2.4 GROMACS
GROMACS is an open-source molecular dynamics software package, which is very popular in the membrane and protein modeling community (Krüger and Fels 2010). A standard GROMACS workflow consists of eight steps. First, the editconf application is called to create a box around the virtual membrane. After that, genbox is used to fill this box with water (needed for the molecular movement of the atoms). Then grompp creates the input file for the energy minimization step, which is computed by mdrun. These two steps are repeated three times: for the minimization, the equilibration and finally the molecular simulation of the membrane.
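The eight-step workflow above can be sketched as the kind of ordered command list the plug-in assembles. The file names and command line flags below are illustrative GROMACS 4.x usage, not taken from the plug-in; in a real run the structure file of each step would be chained into the next grompp call.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: assemble the eight-step GROMACS 4.x command sequence as a job list. */
public class GromacsWorkflow {

    /** Builds the command list for a given membrane PDB file (illustrative flags). */
    public static List<String> buildJobList(String membranePdb) {
        List<String> jobs = new ArrayList<String>();
        // Step 1: create a simulation box around the membrane
        jobs.add("editconf -f " + membranePdb + " -o boxed.gro -d 1.0");
        // Step 2: fill the box with water
        jobs.add("genbox -cp boxed.gro -cs spc216.gro -p topol.top -o solvated.gro");
        // Steps 3-8: grompp/mdrun pairs for minimization, equilibration and production MD
        for (String phase : new String[] { "em", "equil", "md" }) {
            jobs.add("grompp -f " + phase + ".mdp -c input.gro -p topol.top -o " + phase + ".tpr");
            jobs.add("mdrun -deffnm " + phase);
        }
        return jobs;
    }

    public static void main(String[] args) {
        for (String job : buildJobList("membrane.pdb")) {
            System.out.println(job);
        }
    }
}
```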
3 RESULTS
3.1 Plug-in Workflow
To control the GROMACS workflow (see Fig. 1) discussed above with CmME, a plug-in has been developed. Before adjusting the parameters of the applications, the membrane has to be created. Afterwards the GROMACS plug-in is used to set up the parameters of the needed commands and put them into the job list (see Fig. 2). Each job from the job list can be launched as often as required; the sequence and the configuration can be changed again prior to that.
The CmME GROMACS plug-in uses the high-level API of UNICORE to send the computed membrane to the grid server. Compared to a local machine, the calculation of membranes using GROMACS can be completed in a short time using the strength of a grid system. UNICORE connections can be established when the job list is generated by providing a properties file. The files needed for the simulation can be uploaded via the graphical user interface of the GROMACS plug-in. Furthermore, the configuration of each GROMACS application can be saved in an XML file and loaded again if a similar simulation needs to be done. Once a job is configured and started, a job file is created and committed via UNICORE to the server. While the GROMACS application runs on the grid resource, it is possible to check the status of the job using the CmME GROMACS plug-in. When the job is finished, the standard output and the standard error channel of the invoked GROMACS tools are presented. If desired, the produced file is downloaded to the local machine. This procedure is repeated for every single job until the molecularly simulated membrane has been generated with the final MD run and loaded back into CmME.

3.2 Plug-in Design
The plug-in is developed around a GUI toolbar, which is inherited from the MembraneAlgorithm interface. The GUI toolbar offers the ability to open different windows to configure the chosen algorithm (in this case the GROMACS plug-in). Furthermore, the assembly of the settings window is adjusted by the GromacsData class, where all GROMACS application values are stored. By changing values in the settings window, the data is modified through a controller class (GromacsController, see Fig. 1). In addition, the job list is saved in the data class and can be transferred to the calculation window when the corresponding button is clicked. The calculation window is responsible for starting and stopping the configured jobs. Changes in the job list are monitored and executed by an additional controller class, the JLDController. On submitting a job, a ProcessRunner is created and a GROMACS process is started. To show the output of the started process, a StreamGobbler listens to the given input stream.
To gain access to a network resource like the BisGrid cluster, an additional UNICOREConnector class is implemented (see Fig. 2). This thread is started in the settings window and can be invoked by the calculation window. To simplify the usage of UNICORE connections, the GUI classes merely need to use this connector class to run jobs on the grid server. All necessary UNICORE operations are implemented.
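The StreamGobbler mentioned above is a common Java idiom for draining a process output stream on a dedicated thread so the process cannot block on a full pipe. A minimal stand-alone sketch follows; the class layout is assumed for illustration and is not taken from the plug-in source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

/** Minimal sketch of the StreamGobbler pattern: drains a stream on its own thread. */
public class StreamGobbler extends Thread {
    private final InputStream in;
    private final StringBuilder captured = new StringBuilder();

    public StreamGobbler(InputStream in) {
        this.in = in;
    }

    @Override
    public void run() {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                captured.append(line).append('\n'); // could also forward to a GUI text area
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public String getCaptured() {
        return captured.toString();
    }

    public static void main(String[] args) throws Exception {
        // Feed the gobbler from an in-memory stream instead of a real GROMACS process.
        byte[] data = "step 1\nstep 2\n".getBytes("UTF-8");
        StreamGobbler gobbler = new StreamGobbler(new java.io.ByteArrayInputStream(data));
        gobbler.start();
        gobbler.join();
        System.out.print(gobbler.getCaptured());
    }
}
```

In the plug-in's setting, one gobbler would be attached to the GROMACS process's stdout and one to its stderr.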
3.3 The GUI
The GUI consists of two main parts. In the settings window, the user can set up the parameters of the different applications, which are shown in different tabs (see Fig. 3), connect to a UNICORE resource, load and save application configurations to/from XML files and create a job list. The connection state is indicated by a text line, which is highlighted in green if a connection is established successfully. To give the user additional feedback about file operations, every upload is illustrated by a progress bar and a dialog window showing the error or success of the process. Each application can be configured by checkboxes, spinner fields or a file chooser dialog. A resulting command is generated automatically in a text field for each application. Furthermore, each application tab can be saved to and loaded from XML files. The job list (see Fig. 2) is sent to the calculation window, which offers the ability to reconfigure jobs as well as to start and stop them. To give the user some feedback, the standard output and error channels are displayed.
Invoking a GROMACS job this way is very comfortable and easy to use. No local GROMACS installation is needed, and a created workflow can be loaded into the GUI via XML.

Fig. 1. CmME GROMACS Plug-in: Workflow Graphic
Fig. 2. CmME GROMACS Plug-in: Communication via UNICORE
Fig. 3. CmME GROMACS Plug-in: The GUI of mdrun

3.4 Potential Problems
A problem in the GROMACS workflow is the potential requirement for changes in the configuration files by the user during the job execution. Topology files, which are required by the genbox command, can be created based on information given by CmME. But the balancing of charge in the membrane needs to be done manually or in the calculation window.

3.5 Performance
The performance of the whole calculation depends on the used grid resource and the size of the given membrane. Using a local GROMACS installation on a single machine, a simulation can take weeks to complete; even the simulation of only one lipid may take half an hour. Because of this, a grid resource like BisGrid is indispensable to get a result within a reasonable time. The calculation itself, the connection, as well as file uploads are done in different threads to keep the main GUI responsive.
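The threading strategy described in 3.5 — running uploads and calculations off the GUI thread and notifying the GUI on completion — can be sketched with plain Java threads. The helper names here are assumptions for illustration, not the plug-in's actual classes.

```java
/** Sketch: run a long operation (e.g. a file upload) off the main/GUI thread. */
public class BackgroundUpload {

    /** Starts the upload on a worker thread and runs a callback when it finishes. */
    public static Thread startUpload(final Runnable upload, final Runnable onDone) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                upload.run();   // long-running work, e.g. a transfer via UNICORE
                onDone.run();   // notify the GUI, e.g. update a progress bar
            }
        });
        worker.start();         // the calling (GUI) thread returns immediately
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        final boolean[] done = { false };
        Thread t = startUpload(new Runnable() {
            public void run() { /* simulated upload */ }
        }, new Runnable() {
            public void run() { done[0] = true; }
        });
        t.join();
        System.out.println("upload finished: " + done[0]);
    }
}
```

In a Swing GUI such as CmME's, the completion callback would additionally be dispatched back onto the event thread.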
4 CONCLUSION
Here we present a fast and comfortable way to generate and simulate membranes at the molecular and atomic level by combining CmME and GROMACS using UNICORE. To make the configuration even simpler, an additional GUI mode is planned: the creation of the job list should be done with one click, and the user will be guided directly to the calculation window to start the GROMACS workflow. This approach is a first part of the ongoing CmME MD Edition initiative, which develops a CmME plug-in specially adjusted to the needs of molecular dynamics simulations using GROMACS. In addition, a direct and an SSH connection to GROMACS are in preparation.
The great advantage of the GROMACS plug-in is the link to CmME. In contrast to other projects like GUIMACS (Kota 2007), the membrane to be simulated can be passed directly to the GROMACS tools. In addition, the connection to grid resources makes the plug-in a powerful tool for molecular membrane simulation.
The Webstart version of CmME is available free of charge from http://Cm2.CELLmicrocosmos.org
ACKNOWLEDGEMENTS
We thank the Paderborn Center for Parallel Computing (PC²) for providing access to BisGrid. Special thanks go to Georg Birkenheuer and Johannes Schuster for valuable discussion about grid middleware. Our thanks go to Kai Löwenthal, whose diploma thesis (Löwenthal 2005) provided a good basis for discussion, and to his thesis supervisor Dr. Dieter Lorenz.
Funding: This work was partially funded within the Graduate College Bioinformatics (GK635) of the DFG (German Research Foundation). We are grateful to Prof. Ralf Hofestädt for continuous support, to the Bio-/Medical Informatics Group of Bielefeld University, where this work has been realized, and to all other people supporting or participating in this project: http://team.CELLmicrocosmos.org

REFERENCES
Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28(1):235-242.
Birkenheuer, G.; Breuers, S.; Brinkmann, A.; Blunk, D.; Gesing, S.; Herres-Pawlis, S.; Krüger, J.; Packschies, L.; Fels, G. (2010) Grid-Workflows in Molecular Science. Software Engineering 2010, GI-Edition - Lecture Notes in Informatics (LNI), P-160:177-184.
Gesing, S.; Marton, I.; Birkenheuer, G.; Schuller, B.; Grunzke, R.; Krüger, J.; Breuers, S.; Blunk, D.; Fels, G.; Packschies, L.; Brinkmann, A.; Kohlbacher, O.; Kozlovszky, M. (in print 2010) Workflow Interoperability in a Grid Portal for Molecular Simulations. IWSG2010 (International Workshop on Science Gateways), Catania, Italy.
Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. (2008) GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput., 4(3):435-447.
Kota, P. (2007) GUIMACS - a Java based front end for GROMACS. In Silico Biol., 7(1):95-99.
Krüger, J.; Fels, G. (2010) Ion Permeation Simulations by Gromacs - An Example of High Performance Molecular Dynamics. Concurrency and Computation: Practice and Experience; Fox, G.C.; Moreau, L., Eds., ISSN: 1532-0634.
Löwenthal, K. (2005) Molekulardynamische Berechnungen auf Basis von GROMACS in einem hybriden Netzwerk. Diploma thesis, Bielefeld University.
Schuller, B.; Schumacher, M. (2009) Space-Based Approach to High-Throughput Computations in UNICORE 6 Grids. Lect. Notes Comput. Sc., 5415:75-83.
Sommer, B.; Dingersen, T.; Gamroth, C.; Schneider, S.E.; Rubert, S.; Krüger, J.; Dietz, K.-J. (forthcoming 2011) CELLmicrocosmos 2.2: A modular interactive shape-based software approach to model scalable PDB membranes with definable lipid compositions and semi-automatic protein placement. J. Chem. Inf. Model.
Wewior, M.; Packschies, L.; Blunk, D.; Wickeroth, D.; Warzecha, K.; Herres-Pawlis, S.; Gesing, S.; Breuers, S.; Krüger, J.; Birkenheuer, G.; Lang, U. (in print 2010) The MoSGrid Gaussian portlet - Technologies for Implementation of Portlets for Molecular Simulations. IWSG2010 (International Workshop on Science Gateways), Catania, Italy.