


MoSGrid: Progress of Workflow-driven Chemical Simulations
Georg Birkenheuer1∗, Dirk Blunk2 , Sebastian Breuers2 , André Brinkmann1 ,
Gregor Fels3 , Sandra Gesing4 , Richard Grunzke5 , Sonja Herres-Pawlis6 ,
Oliver Kohlbacher4 , Jens Krüger3 , Ulrich Lang7 , Lars Packschies7 ,
Ralph Müller-Pfefferkorn5 , Patrick Schäfer8 , Johannes Schuster1 ,
Thomas Steinke8 , Klaus-Dieter Warzecha7 , and Martin Wewior7
1 Paderborn Center for Parallel Computing, Universität Paderborn.
2 Department für Chemie, Universität zu Köln.
3 Department Chemie, Universität Paderborn.
4 Zentrum für Bioinformatik, Eberhard-Karls-Universität Tübingen.
5 Zentrum für Informationsdienste und Hochleistungsrechnen, Technische Universität Dresden.
6 Fakultät Chemie, Technische Universität Dortmund.
7 Regionales Rechenzentrum, Universität zu Köln.
8 Konrad-Zuse-Institut für Informationstechnik Berlin.




ABSTRACT
Motivation: Web-based access to computational chemistry grid resources has proven to be a viable approach to simplifying the use of simulation codes. The introduction of recipes allows already developed chemical workflows to be reused. By this means, workflows for recurring basic compute jobs can be provided as daily services. Nevertheless, the same platform has to remain open for active workflow development by experienced users. This paper provides an overview of recent developments in the MoSGrid project on providing tools and instruments for building workflow recipes.
Contact: birke@uni-paderborn.de

1 INTRODUCTION
The BMBF-funded MoSGrid project supports the computational chemistry community with easy access to powerful compute resources. The developed MoSGrid portal1 offers access to molecular simulation codes available on the German Grid resources. Chemical scientists are supported with instruments to handle and orchestrate complex molecular simulation methods.
   The various chemical recipes realized by the simulations are mapped to workflows. Commonly used simple workflows can be accessed and used directly. A workflow editor based on WS-PGRADE allows users to develop, improve, and publish complex workflow constructs. First results are presented in [1, 2].
   In this paper, we describe the recent developments in the creation of the MoSGrid infrastructure and the embedded workflow system. We start with a description of the integration of the gUSE workflow system from WS-PGRADE into the MoSGrid portal in Section 2. Parallel to the extension of WS-PGRADE and gUSE for generic workflows, MoSGrid has started to implement intuitive portlets for the orchestration of specific workflows. The workflow application for Molecular Dynamics is described in Section 3, followed by the workflow application for Quantum Mechanics in Section 4. Section 5 covers the distributed data management for the workflow systems.

∗ To whom correspondence should be addressed.
1 Access to the MoSGrid portal: http://mosgrid.de/portal

2 THE WORKFLOW-ENABLED GRID PORTAL IN MOSGRID
The MoSGrid portal is developed on top of WS-PGRADE [3, 4], a workflow-enabled grid portal. The chosen WS-PGRADE version is based on the open-source portal framework Liferay [5] and supports the standard JSR168 [6] and its successor JSR286 [7]. The use of these standards assures the sustainability of the developed portlets.
   Users can create, change, invoke, and monitor workflows via WS-PGRADE, which contains a graphical workflow editor. WS-PGRADE functions as a highly flexible graphical user interface for the grid User Support Environment (gUSE). This virtualization environment provides a set of services for distributed computing infrastructures, where developers and end users can share sophisticated workflows, workflow graphs, workflow templates, and workflow applications via a repository (cf. Figure 1).

Fig. 1. The gUSE architecture: the WS-PGRADE user interface on top of the gUSE high-level middleware service layer, which accesses grid resources through the UNICORE 6 middleware layer.

   gUSE contains a data-driven workflow engine; the dependencies between the single steps of a workflow are represented by the connections between their inputs and outputs. The workflow engine facilitates the management of workflows, which are based on directed acyclic graphs (DAGs). DAGs allow the following workflow constructs:

• Steps: A single step in the workflow describes a job with its parameters, the input, and the output.





• Conditions: The workflow engine uses conditions to select the next step to be processed.
• Splits: A split is unconditional and delivers data to the next parallel steps.
• Joins: A join is executed after the parallel steps have finished and delivered their output.

Loops can be represented by parameter sweeps, which allow varying parameters to be specified for single steps or workflows. The workflow engine then invokes the same job multiple times with different parameters. Furthermore, the user can integrate generator steps and collector steps. A generator step produces multiple datasets, which are presented in the workflow graph as one output file. Correspondingly, the input of the collector step is presented in the graph as one input file, while internally the multiple files contained in the input are processed by the collector step.
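As a purely conceptual illustration of this generator/collector pattern, the following plain Python sketch fans one dataset out into several parts, processes them as a parameter sweep would, and joins the results; it is not gUSE code:

    # Conceptual sketch only; not the gUSE API.

    def generator(dataset):
        """Generator step: produce multiple datasets from one input."""
        return [f"{dataset}-part{i}" for i in range(3)]

    def simulate(part):
        """Parallel step: invoked once per generated dataset."""
        return f"result({part})"

    def collector(results):
        """Collector step: consume all parallel outputs at once."""
        return " + ".join(results)

    parts = generator("input")               # one output file in the graph,
    results = [simulate(p) for p in parts]   # expanded to one job per dataset
    print(collector(results))                # joined after all steps finished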
The workflow engine encapsulates the single steps and invokes so-called submitters (Java-based applications) for each job. Via these submitters, gUSE offers the possibility to submit jobs to grid middleware environments like Globus Toolkit and gLite, to desktop grids, clouds, and clusters, and to unique web services. MoSGrid extends the features of gUSE by integrating UNICORE 6 [8]. A submitter has to provide the following methods (rendered schematically in the sketch below):

• actionJobSubmit: submission of a job, including authentication, authorization, and data staging
• actionJobAbort: cancel a job
• actionJobOutput: get the output of a job
• actionJobStatus: query the status of a job
• actionJobResource: return the resource the job was submitted to

The developed UNICORE submitter is based on the UCC (UNICORE command-line client) libraries. In contrast to programming interfaces like HiLA, the UCC libraries allow UNICORE workflows to be processed [9]. The adapted WS-PGRADE portal allows users to submit jobs to UNICORE [10] or to local resources. Hence, short pre-processing and/or post-processing steps of a workflow can be invoked on the gUSE server, while the resource-consuming steps can be invoked in grid infrastructures.
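The submitters themselves are Java applications; the following schematic Python rendering merely restates the required interface with the five method names listed above, with illustrative signatures:

    from abc import ABC, abstractmethod

    class Submitter(ABC):
        """Schematic gUSE submitter interface (the real ones are Java)."""

        @abstractmethod
        def actionJobSubmit(self, job) -> str:
            """Submit a job, incl. authentication, authorization, staging."""

        @abstractmethod
        def actionJobAbort(self, job_id: str) -> None:
            """Cancel a job."""

        @abstractmethod
        def actionJobOutput(self, job_id: str) -> bytes:
            """Get the output of a job."""

        @abstractmethod
        def actionJobStatus(self, job_id: str) -> str:
            """Query the status of a job."""

        @abstractmethod
        def actionJobResource(self, job_id: str) -> str:
            """Return the resource the job was submitted to."""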


3 WORKFLOW APPLICATION FOR MOLECULAR DYNAMICS
The MoSGrid project implemented the first portlet for Molecular Dynamics (MD) simulations: (i) a submission interface for Gromacs portable run input (tpr) files and (ii) a molecular equilibration workflow for protein simulations in explicit water are available. Figure 2 shows a screenshot of the portal user interface for the latter protein simulation. The design of the portlet allows an easy integration of further workflows for chemical recipes. The Gromacs simulation toolkit supports calculations with many instruments, which have to be orchestrated by a workflow system.

Fig. 2. A screenshot of the MD portlet.

   The UNICORE 6 infrastructure with an embedded workflow engine was already available for the implementation. Unfortunately, its use was limited to the UNICORE Rich Client and the UCC; APIs already available for accessing the UNICORE infrastructure, such as HiLA, did not support the full workflow functionality.
   In order to use the UNICORE workflow infrastructure for the MoSGrid MD portlet, we decided to implement a solution named UccAPI. The API has been implemented using the abilities of the UCC client; this strategy allows us to reuse well-tested code. We decided to submit even singular jobs as workflows through the MD portlet. This design decision eases the creation of the UccAPI, since only the workflow-dependent commands needed to be implemented. Moreover, UccAPI does not require a separate UCC client installation: all necessary functionality is either provided by the embedded UCC libraries or has been adapted, because some parameters are hidden deep in the source code.
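UccAPI itself is Java code built directly on the UCC libraries; purely as an illustration of the wrapping idea, the following Python sketch shells out to an installed ucc client. The configuration flag and the workflow-submission subcommand are placeholders, not verified UCC syntax:

    import subprocess

    class UccWrapper:
        """Illustrative sketch only: the actual UccAPI is implemented in
        Java on top of the UCC libraries and spawns no client process."""

        def __init__(self, ucc_binary="ucc", preferences="~/.ucc/preferences"):
            # "-c <preferences>" and the subcommand below are placeholders.
            self.base = [ucc_binary, "-c", preferences]

        def submit_workflow(self, workflow_file: str) -> str:
            """Submit a workflow description; even singular jobs are
            submitted as workflows (see above)."""
            proc = subprocess.run(
                self.base + ["workflow-submit", workflow_file],
                capture_output=True, text=True, check=True,
            )
            return proc.stdout.strip()  # e.g. the EPR of the new workflow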



Other extensions to the UCC were necessary because it was designed to be used as a stand-alone application and not as a supporting library. Some of these extensions implement error handling; most errors are difficult to trace, because they occur only in complex test scenarios. Static variables were another challenge, as the UCC is designed to be used once for one job and not for multiple executions in parallel threads. As a consequence, some static variables may still hold values from the previous submission or may not be properly synchronized, which raises the risk of errors during multiple and parallel executions.
   The user of the MD portlet most likely has no UNICORE server installation available. Therefore, file uploads need to be handled separately and are processed as follows. First, the user uploads the data for the workflow. Then, an appropriate number of compute nodes has to be chosen, the simulation length has to be set, and the number of nanoseconds the job should simulate has to be defined. Finally, the configuration files are adapted according to this input.
   The workflow description of UNICORE does not include a data stage-in from the client. The solution for the first prototype is to transparently define the input files in a separate job file and to transfer them by a predecessor job from the portlet to the UNICORE global storage. At a later stage of the project, an XtreemFS connection should avoid this kind of data stage-in, as described in Section 5.
   The UccAPI represents workflows by means of end point references (EPRs). This is a well-designed standard for service identification, but it does not improve readability. To avoid confusing users, it was decided to hide the EPRs and to show only conclusive workflow names instead; an exemplary name is the combination of user name, time stamp, and workflow recipe, as in the sketch below.
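As an illustration of such a naming scheme (field order and separator are assumptions, not the exact MoSGrid format):

    from datetime import datetime, timezone

    def workflow_name(user: str, recipe: str) -> str:
        """Build a human-readable workflow name shown instead of the EPR."""
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
        return f"{user}-{stamp}-{recipe}"

    print(workflow_name("jdoe", "protein-equilibration"))
    # e.g. jdoe-20110615-093042-protein-equilibration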
4 WORKFLOW APPLICATION FOR QUANTUM MECHANICS
Based on a previous user survey in the MoSGrid community, the first prototype of the Quantum Mechanics portlet [11] aimed at implementing workflows for the Gaussian [12] suite, while the support for Turbomole [13], GAMESS-US [14, 15], and other relevant quantum chemical suites was deferred to a later stage.

Fig. 3. A screenshot of the QM portlet.

   Gaussian compute jobs are described in a single input file, which includes the molecular geometry and a route card that defines the task. During a calculation, the progress is written to an unstructured output stream, the log file. In addition, calculated molecular properties are stored in a platform-dependent binary checkpoint file.
   In contrast to its major competitors, the Gaussian program package does not allow separate executables to be called independently for different tasks. Instead, a single executable parses the input file, determines the requested calculations, and devises a strategy for calling a series of subroutines, known as links. This intrinsic workflow concept, although seemingly comfortable for the end user, makes a direct override or more granular control at the calculation level difficult.
   While the multi-step job option of Gaussian allows the concatenation of compute jobs that reuse previous results, workflows beyond this stage can only be achieved via external control, as realized in the Quantum Mechanics portlet described here. This workflow, while at present implemented directly in the portlet, will eventually be managed through the facilities provided by WS-PGRADE.
   Typically, the workflow for a quantum chemical calculation with Gaussian, as implemented in the portlet, consists of three phases: a pre-processing phase, the execution of the job, and a post-processing phase.
   In the pre-processing phase, two basic workflows are currently available. A user may opt to upload a previously prepared, valid Gaussian input file; in this workflow, no limits regarding rare keywords or exotic options exist. Less experienced users will, however, prefer the assistance provided by a graphical user interface in the second workflow. Here, jobs may be configured by choosing among reasonable parameters (e.g. basis sets) and tasks. Once created, these jobs are open to further editing and adjustment of parameters prior to submission.
   The job execution is realized using the UNICORE command-line client (UCC). A simple wrapper class encapsulates the UCC tool and emulates basic user interaction while forwarding the client's messages. This approach was chosen until a more sophisticated solution became available; candidates are the library later developed along with the MD portlet or, now that it is available, the framework provided by WS-PGRADE.
   The results of a successful Gaussian run are retrieved and stored on the portal server, and the portlet initiates different post-processing scripts. Early attempts to process the Gaussian log file via shell scripting with tools from the Unix tool chain (e.g. grep, tr) turned out to be tedious and ineffective and were thus soon replaced by scripts written in Python [16].
   The Python scripts currently in operation were written in close cooperation with colleagues from the chemical community to match their specific requirements. The total energy of each step in the course of a geometry optimization is routinely parsed from the log files. In addition to this common task in quantum chemical calculations, thermochemical data, infrared and Raman absorption spectra, as well as the outcome of natural bond orbital (NBO) analyses are retrieved. These results are stored in platform-independent CSV files, displayed in the portlet, and made available for download.
   At the current stage, optimized molecular geometries can be retrieved through Python scripts from machine-independent Gaussian outputs (e.g. formatted checkpoint files) with the use of the free Pybel [17] module, which provides Python bindings to the OpenBabel [18, 19] library.
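As an illustration of the energy-parsing step described above, a minimal sketch that collects the "SCF Done" total energies commonly printed in Gaussian log files and stores them as CSV; the file names are placeholders, and the production scripts in MoSGrid are more elaborate:

    import csv
    import re

    # Gaussian writes one "SCF Done:  E(RHF) = -76.0107465 A.U. ..." line
    # per geometry-optimization step; collect these total energies.
    SCF_LINE = re.compile(r"SCF Done:\s+E\(\S+\)\s+=\s+(-?\d+\.\d+)")

    def parse_energies(logfile):
        energies = []
        with open(logfile) as fh:
            for line in fh:
                match = SCF_LINE.search(line)
                if match:
                    energies.append(float(match.group(1)))
        return energies

    # Store the result as a platform-independent CSV file.
    with open("energies.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["step", "total_energy_hartree"])
        writer.writerows(enumerate(parse_energies("job.log"), start=1))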
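The geometry retrieval via Pybel can be sketched as follows; the file name is a placeholder, and Open Babel's fchk reader is assumed to be available in the installed version:

    import pybel  # in newer Open Babel releases: from openbabel import pybel

    # Read the optimized geometry from a Gaussian formatted checkpoint file.
    mol = next(pybel.readfile("fchk", "job.fchk"))

    # Atomic numbers and Cartesian coordinates of the final geometry.
    for atom in mol.atoms:
        print(atom.atomicnum, atom.coords)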





5 DISTRIBUTED DATA MANAGEMENT IN WORKFLOWS WITH XTREEMFS AND UNICORE

5.1 Data Flow
Data management is an integral part of the MoSGrid portal. The data flow within a MoSGrid workflow originally passed four sites:

1. the WS-PGRADE portal in Tübingen,
2. the ZIH in Dresden as resource provider running the UNICORE 6 middleware,
3. the frontend nodes of the D-Grid clusters, and
4. the compute nodes within a D-Grid cluster.

The simulation results were propagated along the reverse path. When dealing with large-scale jobs or a large number of jobs, this introduces a lot of network traffic between the clusters and the portal, which will eventually become a bottleneck.
   In order to safeguard the scientific data and provide distributed access, all data is stored redundantly in the Grid file system XtreemFS [20]. Compared to the original approach, when data is uploaded for a simulation, XtreemFS handles the placement of replicas of the data at the corresponding cluster instead of relying on less efficient data transfers via UNICORE. Additionally, the portal server does not need to store any data. However, uploads and downloads of simulation data are still realized via the portal.
   In a first step, the data flow using XtreemFS is as follows (cf. the sketch after this list):

• Input data flow: A user uploads simulation input data to the Grid file system using a web interface on the portal server. The workflow engine propagates the location of these files within XtreemFS to the UNICORE 6 middleware, which then takes care of transferring the files from the frontend node to the compute nodes of a cluster.
• Output data flow: The compute nodes produce simulation results, which are passed to XtreemFS by the UNICORE 6 middleware at the end of a simulation. Finally, the data is available for access by the user via the web interface on the portal server.
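As a schematic illustration of the input side, assuming the portal server has an XtreemFS volume mounted at /xtreemfs (the path, helper name, and directory layout are hypothetical):

    import shutil
    from pathlib import Path

    # Hypothetical mount point of the XtreemFS volume on the portal server.
    XTREEMFS_MOUNT = Path("/xtreemfs")

    def stage_input(upload: Path, user: str, workflow_id: str) -> str:
        """Copy an uploaded input file into the grid file system and
        return the location that the workflow engine propagates to the
        UNICORE 6 middleware."""
        target = XTREEMFS_MOUNT / user / workflow_id / upload.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(upload, target)
        return str(target)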
5.2 XtreemFS
XtreemFS is a distributed Grid and Cloud file system. The advantages of using XtreemFS are

• the ability to minimize data transfers, especially between the portal server and the grid clusters,
• the ability to manage replicated data for redundancy reasons, and
• the Grid Security Infrastructure (GSI) based authorization and authentication.

The requirement for deploying XtreemFS in MoSGrid is its seamless integration with UNICORE, WS-PGRADE, and the software stack on the D-Grid clusters.
   XtreemFS is an object-based file system, i.e., file data and metadata are stored on different servers (Figure 4). The object storage devices (OSDs) store the contents of a file split into fixed-size chunks of data (the objects). The metadata and replica catalogs (MRCs) contain, for example, the filename, a unique file identifier, the owner, and the directory tree. XtreemFS provides packages for the most common Linux distributions and a Filesystem in Userspace-based (FUSE) client for seamless integration. The client acts like a local file system by translating POSIX file system calls into requests to the MRCs and OSDs. To its users, XtreemFS appears as a large, secure, and replicated local file system.

Fig. 4. The XtreemFS architecture.

   XtreemFS provides custom pluggable security policies, X.509 (proxy) certificates, Globus gridmap, and UNICORE UUDB files. Furthermore, XtreemFS implements POSIX access rights and ACLs based on the distinguished name (DN) entries of the user X.509 certificates for authorization. The support of GSI enables us to seamlessly integrate XtreemFS with the portal and UNICORE.

5.3 The Integration of XtreemFS in UNICORE and the Portal
Integration in WS-PGRADE: A JSR286-compliant portlet, deployed in WS-PGRADE, provides simple access to XtreemFS using a web browser. It manages the authentication and the upload of simulation input data to XtreemFS.
   Integration in the UNICORE 6 middleware: XtreemFS is mounted on the frontend node of each grid cluster using the node's host certificate and the UNICORE UUDB, which is a mapping of DN entries or full public keys of certificates to local system users. On this node the UNICORE Target System Interface (TSI) is installed, which communicates with the batch system and makes storage available. By integrating XtreemFS with the TSI, the data becomes available through UNICORE. Data transfers in the MoSGrid context are now mediated by the UNICORE 6 middleware.
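For illustration, the frontend-side mount could look as follows; the volume and directory-service names are placeholders, and the exact mount.xtreemfs options should be taken from the XtreemFS manual:

    import subprocess

    # Placeholder names: a MoSGrid volume registered at an XtreemFS
    # directory (DIR) service.
    DIR_SERVICE_VOLUME = "dir.example.org/mosgrid"
    MOUNT_POINT = "/xtreemfs"

    # mount.xtreemfs is the FUSE client shipped with XtreemFS; afterwards
    # the volume behaves like a local POSIX file system for the TSI.
    subprocess.run(["mount.xtreemfs", DIR_SERVICE_VOLUME, MOUNT_POINT],
                   check=True)

    # Ordinary POSIX calls are translated into MRC and OSD requests.
    with open(f"{MOUNT_POINT}/README", "w") as fh:
        fh.write("data staged for a MoSGrid simulation\n")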






6 CONCLUSIONS AND FUTURE WORK
This paper has given an overview of recent developments in the MoSGrid project. MoSGrid has embedded chemical simulation tools for Molecular Dynamics and Quantum Mechanics in workflow recipes and allows easy access to these instruments via the portal. With the ongoing integration of WS-PGRADE, the creation of workflows will be eased and the interoperability of MoSGrid's portal solution will be extended.

WS-PGRADE: At the moment, users can upload executables (e.g. shell scripts) for the configuration of single steps in WS-PGRADE. MoSGrid plans to integrate the Incarnation Database (IDB) of UNICORE to further simplify the users' interaction with the portal. The IDB contains information about the applications installed on the different available UNICORE target systems; the list of available applications will be offered for selection in a drop-down menu.

MD: A portlet for the submission of Molecular Dynamics simulations in MoSGrid is available. A connection to the UNICORE workflow system allows the submission of single computations or equilibration workflows. The next step of the MD development will be to replace the UNICORE workflow environment with the gUSE/WS-PGRADE workflow management system, which is interoperable with other grid middleware and cloud environments.

QM: In order to allow for data exchange between different computational chemistry suites, e.g. in the context of more complex workflows, a non-proprietary, highly structured data format is required. The Chemical Markup Language (CML) [21, 22, 23], an XML-based data format, seems a viable and promising approach.
   Currently, optimized molecular geometries from machine-independent Gaussian outputs (e.g. formatted checkpoint files) can be processed using the free Pybel [17] module. This module provides Python bindings to the OpenBabel [18, 19] library and makes it possible to store the fundamental geometrical data in syntactically valid CML files, as in the sketch below. The retrieval of further molecular properties and job-specific data, and their subsequent processing into CML files, is under investigation.
   The storage of both molecular data and computational recipes (i.e., workflows) in a common system-independent language and on a distributed file system, such as XtreemFS, will eventually allow for cross-domain workflows (e.g. combinations of quantum chemical calculations and Molecular Dynamics studies) and, moreover, render sustainable data management possible.
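A minimal sketch of this fchk-to-CML conversion with Pybel (the file names are placeholders):

    import pybel  # in newer Open Babel releases: from openbabel import pybel

    # Convert the machine-independent formatted checkpoint output of a
    # Gaussian run into a syntactically valid CML document.
    mol = next(pybel.readfile("fchk", "job.fchk"))
    mol.write("cml", "job.cml", overwrite=True)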
Distributed Data Management: SAML trust delegation support for XtreemFS is currently being developed. It is planned to integrate XtreemFS and UNICORE at all participating resource providers to further minimize data traffic overhead. Support for metadata is planned as well, making it possible to query for simulation results. Future work includes an applet for directly uploading simulation input data to XtreemFS and direct access from the compute nodes to XtreemFS, thus avoiding the portal and the frontend nodes for data transfers altogether.

ACKNOWLEDGEMENT
We would like to thank Bernd Schuller, István Márton, Miklos Kozlovszky, and Ákos Balaskó for the fruitful discussions and for the bug fixing for the UNICORE integration into WS-PGRADE.

Funding: This work is supported by the German Ministry of Education and Research under project grant #01IG09006 (MoSGrid) and by the European Commission FP7 Capacities Programme under grant agreement nr. RI-261556 (EDGI).

REFERENCES
[1] O. Niehörster, G. Birkenheuer, A. Brinkmann, B. Elsässer, D. Blunk, S. Herres-Pawlis, J. Krüger, J. Niehörster, L. Packschies, and G. Fels. Providing scientific software as a service in consideration of service level agreements. 2009.
[2] Georg Birkenheuer, Sebastian Breuers, André Brinkmann, Dirk Blunk, Gregor Fels, Sandra Gesing, Sonja Herres-Pawlis, Oliver Kohlbacher, Jens Krüger, and Lars Packschies. Grid-Workflows in Molecular Science. In Proceedings of the Grid Workflow Workshop (GWW), February 2010.
[3] Peter Kacsuk. P-GRADE portal family for grid infrastructures. Concurrency and Computation: Practice and Experience, 2011. In print.
[4] Zoltan Farkas and Peter Kacsuk. P-GRADE Portal: a generic workflow system to support user communities. Future Generation Computer Systems, 2011. In print.
[5] Liferay, Inc. Liferay. http://www.liferay.com.
[6] Alejandro Abdelnur and Stefan Hepper. JSR 168: Portlet specification. http://www.jcp.org/en/jsr/detail?id=168, October 2003.
[7] M. S. Nicklous and Stefan Hepper. JSR 286: Portlet specification 2.0. http://www.jcp.org/en/jsr/detail?id=286, June 2008.
[8] Sandra Gesing, Istvan Marton, Georg Birkenheuer, Bernd Schuller, Richard Grunzke, Jens Krüger, Sebastian Breuers, Dirk Blunk, Gregor Fels, Lars Packschies, André Brinkmann, Oliver Kohlbacher, and Miklos Kozlovszky. Workflow Interoperability in a Grid Portal for Molecular Simulations. In Roberto Barbera, Giuseppe Andronico, and Giuseppe La Rocca, editors, Proceedings of the International Workshop on Science Gateways (IWSG10), pages 44–48. Consorzio COMETA, 2010.
[9] UNICORE Team. High Level API for Grid Applications. http://www.unicore.eu/community/development/hila-reference.pdf, August 2010.
[10] A. Streit, P. Bala, A. Beck-Ratzka, K. Benedyczak, S. Bergmann, R. Breu, J. M. Daivandy, B. Demuth, A. Eifer, A. Giesler, B. Hagemeier, V. Huber, S. Holl, N. Lamla, D. Mallmann, A. S. Memon, M. S. Memon, M. Rambadt, M. Riedel, M. Romberg, B. Schuller, T. Schlauch, A. Schreiber, T. Soddemann, and W. Ziegler. UNICORE 6 - Recent and Future Advancements. JUEL-4319, February 2010.
[11] Martin Wewior, Lars Packschies, Dirk Blunk, Daniel Wickeroth, Klaus-Dieter Warzecha, Sonja Herres-Pawlis, Sandra Gesing, Sebastian Breuers, Jens Krüger, Georg Birkenheuer, and Ulrich Lang. The MoSGrid Gaussian Portlet – Technologies for the Implementation of Portlets for Molecular Simulations. In Roberto Barbera, Giuseppe Andronico, and Giuseppe La Rocca, editors, Proceedings of the International Workshop on Science Gateways (IWSG10), pages 39–43. Consorzio COMETA, 2010.
[12] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople. Gaussian 03, Revision C.02, 2004. Gaussian, Inc., Wallingford CT.
[13] TURBOMOLE V6.2 2010, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007. http://www.turbomole.com.
[14] M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery. General Atomic and Molecular Electronic Structure System. J. Comput. Chem., 14:1347–1363, 1993.
[15] Mark S. Gordon and Michael W. Schmidt. Advances in electronic structure theory: GAMESS a decade later. In C. E. Dykstra, G. Frenking, K. S. Kim, and G. E. Scuseria, editors, Theory and Applications of Computational Chemistry: the first forty years, pages 1167–1189. Elsevier, Amsterdam, 2005.
[16] The Python Language Reference. http://docs.python.org/reference/, 2011.
[17] Noel O'Boyle, Chris Morley, and Geoffrey Hutchison. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chemistry Central Journal, 2(1):5–12, 2008.





[18] Rajarshi Guha, Michael T. Howard, Geoffrey R. Hutchison, Peter Murray-Rust, Henry Rzepa, Christoph Steinbeck, Jörg Wegner, and Egon L. Willighagen. The Blue Obelisk – Interoperability in Chemical Informatics. Journal of Chemical Information and Modeling, 46(3):991–998, 2006.
[19] Open Babel: The Open Source Chemistry Toolbox. http://openbabel.org/, 2011.
[20] Felix Hupfeld, Toni Cortes, Björn Kolbeck, Jan Stender, Erich Focht, Matthias Hess, Jesus Malo, Jonathan Marti, and Eugenio Cesario. The XtreemFS architecture: a case for object-based file systems in grids. Concurrency and Computation: Practice and Experience, 20(17):2049–2060, 2008.
[21] Peter Murray-Rust and Henry S. Rzepa. Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. Journal of Chemical Information and Computer Sciences, 39(6):928–942, 1999.
[22] Peter Murray-Rust and Henry S. Rzepa. Chemical Markup, XML and the World-Wide Web. 2. Information Objects and the CMLDOM. Journal of Chemical Information and Computer Sciences, 41(5):1113–1123, 2001.
[23] Peter Murray-Rust and Henry S. Rzepa. Chemical Markup, XML, and the World Wide Web. 4. CML Schema. Journal of Chemical Information and Computer Sciences, 43(3):757–772, 2003.



