=Paper=
{{Paper
|id=None
|storemode=property
|title=MoSGrid: Progress of Workflow driven Chemical Simulations
|pdfUrl=https://ceur-ws.org/Vol-826/paper02.pdf
|volume=Vol-826
}}
==MoSGrid: Progress of Workflow driven Chemical Simulations==
Grid Workflow Workshop 2011 (GWW 2011)
Georg Birkenheuer1∗, Dirk Blunk2 , Sebastian Breuers2 , André Brinkmann1 ,
Gregor Fels3 , Sandra Gesing4 , Richard Grunzke5 , Sonja Herres-Pawlis6 ,
Oliver Kohlbacher4 , Jens Krüger3 , Ulrich Lang7 , Lars Packschies7 ,
Ralph Müller-Pfefferkorn5 , Patrick Schäfer8 , Johannes Schuster1 ,
Thomas Steinke8 , Klaus-Dieter Warzecha7 , and Martin Wewior7
1 Paderborn Center for Parallel Computing, Universität Paderborn.
2 Department für Chemie, Universität zu Köln.
3 Department Chemie, Universität Paderborn.
4 Zentrum für Bioinformatik, Eberhard-Karls-Universität Tübingen.
5 Zentrum für Informationsdienste und Hochleistungsrechnen, Technische Universität Dresden.
6 Fakultät Chemie, Technische Universität Dortmund.
7 Regionales Rechenzentrum, Universität zu Köln.
8 Konrad-Zuse-Institut für Informationstechnik Berlin.
ABSTRACT
Motivation: Web-based access to computational chemistry resources has proven to be a viable approach to simplifying the use of simulation codes. The introduction of recipes makes it possible to reuse already developed chemical workflows. By this means, workflows for recurring basic compute jobs can be provided as daily services. Nevertheless, the same platform has to be open for active workflow development by experienced users. This paper provides an overview of recent developments of the MoSGrid project on providing tools and instruments for building workflow recipes.
Contact: birke@uni-paderborn.de (∗ to whom correspondence should be addressed)
1 INTRODUCTION
The BMBF-funded MoSGrid project supports the computational chemistry community with easy access to powerful compute resources. The developed MoSGrid portal (http://mosgrid.de/portal) offers access to molecular simulation codes available on the German Grid resources. Chemical scientists are supported with instruments to handle and orchestrate complex molecular simulation methods.

The complexity of the various chemical recipes covered by the simulations is mapped to workflows. Commonly used simple workflows can be accessed and directly used by the users. A workflow editor based on WS-PGRADE allows users to develop, improve, and publish complex workflow constructs. First results are presented in [1, 2].

In this paper, we describe the recent developments for the creation of the MoSGrid infrastructure and the embedded workflow system. We start with a description of the integration of the gUSE workflow system from WS-PGRADE into the MoSGrid portal in Section 2. Parallel to the extension of WS-PGRADE and gUSE for generic grid workflows, MoSGrid started to implement intuitive portlets for the orchestration of specific workflows. The workflow application for Molecular Dynamics is described in Section 3, followed by the workflow application for Quantum Mechanics in Section 4. Section 5 covers the distributed data management for the workflow systems.

2 THE WORKFLOW-ENABLED GRID PORTAL IN MOSGRID
The MoSGrid portal is developed on top of WS-PGRADE [3, 4], a workflow-enabled grid portal. The chosen WS-PGRADE version is based on the open-source portal framework Liferay [5] and supports the JSR 168 standard [6] and its successor JSR 286 [7]. The use of these standards ensures the sustainability of the developed portlets.

Users are enabled to create, change, invoke, and monitor workflows via WS-PGRADE, which contains a graphical workflow editor. WS-PGRADE functions as a highly flexible graphical user interface for the grid User Support Environment (gUSE). This virtualization environment provides a set of services for distributed computing infrastructures, where developers and end users can share sophisticated workflows, workflow graphs, workflow templates, and workflow applications via a repository (cf. Figure 1).

Fig. 1. The gUSE architecture: the WS-PGRADE user interface on top of the high-level middleware service layer (gUSE), which submits to the grid resources middleware layer (UNICORE 6).

gUSE contains a data-driven workflow engine; the dependencies of single steps in a workflow are represented by the connections between their inputs and outputs. The workflow engine facilitates the management of workflows, which are based on directed acyclic graphs (DAGs). DAGs allow the following workflow constructs:
• Steps: A single step in the workflow describes a job with its parameters, its input, and its output.
• Conditions: The workflow engine uses conditions to select the next step to be processed.
• Splits: A split is unconditional and delivers data to the next parallel steps.
• Joins: A join is executed after parallel steps have finished and delivered their output.

Loops can be represented by parameter sweeps, which allow varying parameters to be specified for single steps or workflows. Hence, the workflow engine invokes the same job multiple times with different parameters. Furthermore, the user can integrate generator steps and collector steps. A generator step produces multiple datasets, which are presented in the workflow graph as one output file. Correspondingly, the input of the associated collector step is presented in the graph as one input file, and internally the multiple files included in the input are processed by the collector step.
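The generator/collector mechanism can be pictured with a short sketch. The following Python fragment is purely illustrative (gUSE itself is a Java-based service); the step names generate_conformers, score, and collect are hypothetical and stand for the generator step, the repeatedly invoked job, and the collector step.

```python
# Illustrative sketch only: a generator step fans out into parallel
# invocations of the same job, and a collector step merges the results.
from concurrent.futures import ThreadPoolExecutor

def generate_conformers(seed_structure):
    # Generator step: one logical output that actually contains N datasets.
    return [f"{seed_structure}_conf{i}.pdb" for i in range(4)]

def score(conformer_file):
    # The step the engine invokes once per generated dataset
    # (a parameter sweep over the generator output).
    return conformer_file, len(conformer_file)  # placeholder "score"

def collect(results):
    # Collector step: receives all parallel outputs as one logical input.
    return max(results, key=lambda r: r[1])

datasets = generate_conformers("ligand")
with ThreadPoolExecutor() as pool:  # the parallel branches of the DAG
    results = list(pool.map(score, datasets))

print("best conformer:", collect(results)[0])
```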
The workflow engine encapsulates the single steps and invokes so-called submitters (Java-based applications) for each job. Via these submitters, gUSE offers the possibility to submit jobs to grid middleware environments like Globus Toolkit and gLite, to desktop grids, clouds and clusters, and to unique web services. MoSGrid extends the features of gUSE by integrating UNICORE 6 [8]. The submitter has to provide the following methods:

• actionJobSubmit: Submission of a job, including authentication, authorization, and data staging
• actionJobAbort: Cancel a job
• actionJobOutput: Get the output of a job
• actionJobStatus: Query the status of a job
• actionJobResource: Return the resource the job was submitted to
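The listed methods form the contract a submitter has to fulfil. gUSE submitters are Java applications; the following is only a minimal Python sketch of that contract, with illustrative parameter names.

```python
# Minimal sketch of the submitter contract listed above.
# Real gUSE submitters are Java classes; names here are illustrative.
from abc import ABC, abstractmethod

class Submitter(ABC):
    @abstractmethod
    def actionJobSubmit(self, job_description, credentials):
        """Submit a job, including authentication, authorization, and data staging."""

    @abstractmethod
    def actionJobAbort(self, job_id):
        """Cancel a job."""

    @abstractmethod
    def actionJobOutput(self, job_id):
        """Get the output of a job."""

    @abstractmethod
    def actionJobStatus(self, job_id):
        """Query the status of a job."""

    @abstractmethod
    def actionJobResource(self, job_id):
        """Return the resource the job was submitted to."""
```

A UNICORE 6 submitter would implement these methods on top of the UCC libraries, as described next.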
The developed UNICORE submitter is based on the UCC (UNICORE command line client) libraries. In contrast to programming interfaces like HiLA, the UCC libraries make it possible to process UNICORE workflows [9]. The adapted WS-PGRADE portal allows users to submit jobs to UNICORE [10] or to local resources. Hence, short pre-processing and/or post-processing steps of a workflow can be invoked on the gUSE server, while the resource-consuming steps can be invoked on grid infrastructures.

3 WORKFLOW APPLICATION FOR MOLECULAR DYNAMICS
The MoSGrid project implemented the first portlet for Molecular Dynamics (MD) simulations: (i) a submission interface for Gromacs portable run input (tpr) files and (ii) a molecular equilibration workflow for protein simulations in explicit water are available. Figure 2 shows a screenshot of the portal user interface for the latter protein simulation. The design of the portlet allows an easy integration of further workflows for chemical recipes. The Gromacs simulation toolkit supports calculations with many instruments, which have to be orchestrated by a workflow system.

Fig. 2. A screenshot of the MD portlet.

The UNICORE 6 infrastructure with an embedded workflow engine was already available for the implementation. Unfortunately, the use of this infrastructure was limited to the UNICORE Rich Client or the UCC. Already available APIs for accessing the UNICORE infrastructure, like HiLA, did not support the full workflow functionality.

In order to use the UNICORE workflow infrastructure for the MoSGrid MD portlet, we decided to implement a solution named UccAPI. The API has been implemented using the abilities of the UCC client. This strategy allows us to reuse well-tested code. We decided to submit even singular jobs as workflows through the MD portlet. This design decision eases the creation of the UccAPI, since only the workflow-dependent commands needed to be implemented.

However, UccAPI does not need a separate UCC client installation; all necessary functionality is either provided by the embedded UCC libraries or has been adapted, because some parameters are hidden deep in the source code.

Other extensions to the UCC were necessary because it was designed to be used as a stand-alone application rather than as a supporting library. Some of the extensions implemented the error handling.
The reason is that most errors are difficult to trace, because they occur only in complex test scenarios. Static variables were another challenge, as UCC is designed to be used once for one job and not for multiple executions in parallel threads. This means that some static variables are wrongly set from the last submission or are not properly synchronized. This weakness raises the risk of errors during multiple and parallel execution.

The user of the MD portlet most likely has no UNICORE server installation available. Therefore, file uploads need to be handled separately. The upload of the files is processed as follows. First, the user uploads the data for the workflow. Then, an appropriate number of compute nodes has to be chosen and the simulation length, i.e. how many nanoseconds the job should simulate, has to be set. Finally, the configuration files are adapted according to this input.
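To illustrate the last step, the sketch below adapts a Gromacs run parameter (.mdp) file to the requested simulation length. It assumes the equilibration recipe is driven by such a text file containing the usual dt (time step in ps) and nsteps entries; the file names are placeholders and not taken from the MoSGrid implementation.

```python
# Illustrative sketch: adapt a Gromacs .mdp parameter file so that the run
# covers the requested number of nanoseconds. File names are hypothetical.
import re

def adapt_mdp(template_path, out_path, length_ns, dt_ps=0.002):
    nsteps = int(length_ns * 1000.0 / dt_ps)   # 1 ns = 1000 ps
    text = open(template_path).read()
    text = re.sub(r"(?m)^\s*nsteps\s*=.*$", f"nsteps = {nsteps}", text)
    text = re.sub(r"(?m)^\s*dt\s*=.*$", f"dt = {dt_ps}", text)
    with open(out_path, "w") as fh:
        fh.write(text)

# e.g. a 2 ns equilibration with a 2 fs time step -> nsteps = 1,000,000
adapt_mdp("equilibration_template.mdp", "equilibration.mdp", length_ns=2.0)
```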
The workflow description of UNICORE does not include a data stage-in from the client. The solution for the first prototype is to transparently define the input files in a separate job file and to transfer them by a predecessor job from the portlet to the UNICORE global storage. At a later stage of the project, an XtreemFS connection should avoid this kind of data stage-in, as described in Section 5.

The UccAPI represents workflows by the use of end point references (EPRs). This is a well-designed standard for service identification, but it does not improve readability. To avoid confusing users, it was decided to hide the EPRs and instead to show only conclusive workflow names. An exemplary name is the combination of user name, time stamp, and workflow recipe.
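Such a conclusive name can be generated in a single line. The sketch below only illustrates the naming scheme described above; the exact format used in the portlet is not specified here, and the user name and recipe are made up.

```python
# Illustrative naming scheme: user name + time stamp + workflow recipe.
from datetime import datetime

def workflow_name(user, recipe):
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"{user}_{stamp}_{recipe}"

print(workflow_name("gbirke", "protein-equilibration"))
# e.g. gbirke_20110715-142300_protein-equilibration
```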
4 WORKFLOW APPLICATION FOR QUANTUM MECHANICS
With respect to a previous user survey in the MoSGrid community, the first prototype of the Quantum Mechanics portlet [11] aimed at the implementation of workflows for the Gaussian [12] suite, while the support for Turbomole [13], GAMESS-US [14, 15] and other relevant quantum chemical suites was postponed to a later time.

Fig. 3. A screenshot of the QM portlet.

Gaussian compute jobs are described in a single input file, which includes the molecular geometry and a route card that defines the task.
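For illustration, such an input file could be assembled from the portlet's form data roughly as follows. The route card, title, charge/multiplicity line, and geometry block follow the general Gaussian input layout; the concrete keywords and the water molecule are only an example, not the portlet's actual template.

```python
# Sketch: assemble a minimal Gaussian input file from form data.
# Route card, title, charge/multiplicity and geometry follow the general
# Gaussian input layout; keywords and molecule are just an example.
def gaussian_input(route, title, charge, multiplicity, atoms):
    geometry = "\n".join(f"{el:2s} {x:12.6f} {y:12.6f} {z:12.6f}"
                         for el, x, y, z in atoms)
    return f"{route}\n\n{title}\n\n{charge} {multiplicity}\n{geometry}\n\n"

water = [("O", 0.000,  0.000,  0.117),
         ("H", 0.000,  0.757, -0.470),
         ("H", 0.000, -0.757, -0.470)]

with open("water_opt.com", "w") as fh:
    fh.write(gaussian_input("# B3LYP/6-31G(d) Opt", "water optimization", 0, 1, water))
```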
During a calculation, the progress is written to an unstructured output stream, the log file. In addition, calculated molecular properties are stored in a platform-dependent binary checkpoint file.

In contrast to its major competitors, the Gaussian program package does not allow separate executables to be called independently for different tasks. Instead, a single executable parses the input file, determines the requested calculations and devises a strategy to call a series of subroutines, known as links. This intrinsic workflow concept, although seemingly comfortable for the end user, renders a direct override or more granular control on the calculation level difficult.

While the multi-step job option of Gaussian allows the concatenation of compute jobs reusing previous results, workflows beyond this stage can only be achieved via external control, as realized in the Quantum Mechanics portlet described here. This workflow, while at present directly implemented in the portlet, will eventually be managed through the facilities provided by WS-PGRADE.

Typically, the workflow for a quantum chemical calculation with Gaussian, as implemented in the portlet, consists of three phases, namely a pre-processing phase, the execution of the job, and a post-processing phase.

In the pre-processing phase, two basic workflows are currently available. A user may opt to upload a previously prepared, valid Gaussian input file. In this workflow, no limits regarding rare keywords or exotic options exist. Less experienced users will, however, prefer the assistance provided by a graphical user interface in the second workflow. Here, jobs may be configured by choosing among reasonable parameters (e.g. basis sets) and tasks. Once created, these jobs are open to further editing and adjustment of parameters prior to submission.

The job execution is realized using the UNICORE command line client (UCC). A simple wrapper class encapsulates the UCC tool and emulates basic user interaction while forwarding the client's messages. This approach was chosen until a more sophisticated solution becomes available; this could have been the library which was developed later along with the MD portlet or, as is now the case, the framework provided by WS-PGRADE.
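A wrapper of this kind can be as simple as a subprocess call that relays the client's messages. The sketch below is an illustration of the idea only: it assumes a ucc executable on the PATH and the generic "ucc run <job file>" invocation; options and job file names of a concrete UCC installation may differ.

```python
# Minimal sketch of a UCC wrapper: run the command line client as a
# subprocess and relay its messages. Command syntax and job file name
# are assumptions, not taken from a concrete UCC release.
import subprocess

class UccWrapper:
    def __init__(self, ucc_executable="ucc"):
        self.ucc = ucc_executable

    def run_job(self, job_file):
        proc = subprocess.run(
            [self.ucc, "run", job_file],
            capture_output=True, text=True
        )
        # Forward the client's messages to the caller (e.g. the portlet log).
        print(proc.stdout)
        if proc.returncode != 0:
            raise RuntimeError(f"UCC failed: {proc.stderr.strip()}")
        return proc.stdout

# usage: UccWrapper().run_job("gaussian_job.u")
```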
The results of a successful Gaussian run are retrieved and stored on the portal server. The portlet then initiates different post-processing scripts.

Early attempts to process the Gaussian log file via shell scripting using tools from the Unix tool chain (e.g. grep, tr, etc.) turned out to be tedious and ineffective and were thus soon replaced by scripts written in Python [16].

The Python scripts currently operative were written in close cooperation with colleagues from the chemical community to match their specific requirements. The total energy of each step in the course of a geometry optimisation is routinely parsed from the log files. In addition to this common task in quantum chemical calculations, thermochemical data, infrared and Raman absorption spectra, as well as the outcome of natural bond order (NBO) analysis are retrieved. These results are stored in platform-independent CSV files, displayed in the portlet, and made available for download.
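As an example of such a post-processing script, the following sketch collects the total energy of every optimisation step from a Gaussian log file and stores it as CSV. It relies on the conventional "SCF Done:  E(...) = ..." lines of the log; the file names are placeholders.

```python
# Sketch of a log-file post-processing script: collect the SCF total energy
# of every optimisation step and store it as CSV. The "SCF Done" pattern is
# the conventional Gaussian log line; file names are placeholders.
import csv
import re

SCF_LINE = re.compile(r"SCF Done:\s+E\(\S+\)\s+=\s+(-?\d+\.\d+)")

def energies_from_log(log_path):
    energies = []
    with open(log_path) as log:
        for line in log:
            match = SCF_LINE.search(line)
            if match:
                energies.append(float(match.group(1)))
    return energies

def write_csv(energies, csv_path):
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["step", "total_energy_hartree"])
        writer.writerows(enumerate(energies, start=1))

write_csv(energies_from_log("optimization.log"), "total_energies.csv")
```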
At the current stage, optimized molecular geometries can be retrieved through Python scripts from machine-independent Gaussian outputs (e.g. formatted checkpoint files) with the use of the free Pybel [17] module, which provides Python bindings to the OpenBabel [18, 19] library.
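A typical use of Pybel in this context might look like the sketch below, which reads a formatted checkpoint file and writes the contained structure out as CML (the format discussed in Section 6). It assumes an OpenBabel installation with Python bindings and the "fchk" input format; the file names are placeholders.

```python
# Sketch: convert the geometry from a Gaussian formatted checkpoint file
# into CML using Pybel (Python bindings to OpenBabel). Assumes OpenBabel
# was built with Python support and provides the "fchk" input format.
import pybel  # in newer OpenBabel releases: from openbabel import pybel

for mol in pybel.readfile("fchk", "job.fchk"):
    print(mol.formula, "with", len(mol.atoms), "atoms")
    mol.write("cml", "job.cml", overwrite=True)
```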
5 DISTRIBUTED DATA MANAGEMENT IN WORKFLOWS WITH XTREEMFS AND UNICORE

5.1 Data Flow
Data management is an integral part of the MoSGrid portal. The data flow within a MoSGrid workflow originally passed four sites:

1. the WS-PGRADE portal in Tübingen,
2. the ZIH in Dresden as resource provider running the UNICORE 6 middleware,
3. the frontend nodes of the D-Grid clusters, and
4. the compute nodes within a D-Grid cluster.

The simulation results were propagated along the reverse path. When dealing with large-scale jobs or a large number of jobs, this introduces a lot of network traffic between the clusters and the portal, which will eventually become a bottleneck.

In order to safeguard the scientific data and provide distributed access, all data is stored redundantly in the Grid file system XtreemFS [20]. Compared to the original approach, when uploading data for a simulation, XtreemFS handles the placement of replicas of the data at the corresponding cluster instead of using less efficient data transfers via UNICORE. Additionally, the portal server does not need to store any data. However, uploads and downloads of simulation data are still realized via the portal.

In a first step, the data flow using XtreemFS is as follows:

• Input data flow: A user uploads simulation input data to the Grid file system using a web interface on the portal server. The workflow engine propagates the location of these files within XtreemFS to the UNICORE 6 middleware, which then takes care of transferring the files from the frontend node to the compute nodes of a cluster.
• Output data flow: The compute nodes produce simulation results, which are passed to XtreemFS by the UNICORE 6 middleware at the end of a simulation. Finally, the data is available for access by the user via the web interface on the portal server.

5.2 XtreemFS
XtreemFS is a distributed Grid and Cloud file system. The advantages of using XtreemFS are

• the ability to minimize data transfers, especially between the portal server and the grid clusters,
• the ability to manage replicated data for redundancy reasons, and
• the Grid Security Infrastructure (GSI) based authorization and authentication.

The requirement for deploying XtreemFS in MoSGrid is its seamless integration with UNICORE, WS-PGRADE and the software stack on the D-Grid clusters.

XtreemFS is an object-based file system, i.e., file data and metadata are stored on different servers (Figure 4). The object storage devices (OSDs) store the contents of a file split into fixed-size chunks of data (the objects). The metadata and replica catalogs (MRCs) contain, for example, the filename, a unique file identifier, the owner, and the directory tree. XtreemFS provides packages for the most common Linux distributions and a Filesystem in Userspace (FUSE) based client for seamless integration. The client acts like a local file system by translating POSIX file system calls into requests to the MRCs and OSDs. To its users, XtreemFS appears as a large, secure, and replicated local file system.

Fig. 4. The XtreemFS Architecture.
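Because the FUSE client exposes XtreemFS as an ordinary POSIX file system, simulation data can be staged and retrieved with plain file operations once a volume is mounted. The sketch below illustrates this; the mount point and paths are made-up examples, not MoSGrid's actual layout.

```python
# Once an XtreemFS volume is mounted via its FUSE client, ordinary POSIX
# file operations are all that is needed. Mount point and paths are hypothetical.
from pathlib import Path

volume = Path("/mnt/xtreemfs/mosgrid")          # example mount point
job_dir = volume / "jobs" / "equilibration-001"
job_dir.mkdir(parents=True, exist_ok=True)

# Stage input for a simulation ...
(job_dir / "input.tpr").write_bytes(Path("input.tpr").read_bytes())

# ... and later read back the results exactly like from a local disk.
for result in job_dir.glob("*.log"):
    print(result.name, result.stat().st_size, "bytes")
```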
XtreemFS supports custom pluggable security policies, X.509 (proxy) certificates, Globus gridmap files, and UNICORE UUDB files. Furthermore, XtreemFS implements POSIX access rights and ACLs based on the distinguished name (DN) entries of the user X.509 certificates for authorization. The support of GSI enables us to seamlessly integrate XtreemFS with the portal and UNICORE.

5.3 The Integration of XtreemFS in UNICORE and the Portal
Integration in WS-PGRADE: A JSR286-compliant portlet, deployed in WS-PGRADE, provides simple access to XtreemFS using a web browser. It manages the authentication and the upload of simulation input data to XtreemFS.

Integration in the UNICORE 6 middleware: XtreemFS is mounted on the frontend node of each grid cluster using the node's host certificate and the UNICORE UUDB, which is a mapping of DN entries or full public keys of certificates to local system users. On this node the UNICORE Target System Interface (TSI) is installed, which communicates with the batch system and makes storage available. By integrating XtreemFS with the TSI, the data becomes available through UNICORE. Data transfers in the MoSGrid context are now mediated by the UNICORE 6 middleware.

6 CONCLUSIONS AND FUTURE WORK
This paper has given an overview of recent developments in the MoSGrid project. MoSGrid has embedded chemical simulation tools for Molecular Dynamics and Quantum Mechanics in workflow recipes and allows easy access to these instruments via the portal. With the ongoing integration of WS-PGRADE, the creation of workflows will be eased and the interoperability of MoSGrid's portal solution will be extended.
WS-PGRADE: At the moment, users can upload executables (e.g. shell scripts) for the configuration of single steps in WS-PGRADE. MoSGrid plans to integrate the Incarnation Database (IDB) of UNICORE to further simplify the users' interaction with the portal. The IDB contains information about the applications which are installed on the different available UNICORE target systems. The list of available applications will be offered for selection in a drop-down menu.

MD: A portlet for the submission of Molecular Dynamics simulations in MoSGrid is available. A connection to the UNICORE workflow system allows the submission of single computations or equilibration workflows. The next step of the development in MD will be to replace the UNICORE workflow environment by the gUSE/WS-PGRADE workflow management system, which is interoperable with other Grid middleware and Cloud environments.

QM: In order to allow for data exchange between different computational chemistry suites, e.g. in the context of more complex workflows, a non-proprietary, highly structured data format is required. The Chemical Markup Language (CML) [21, 22, 23], an XML-based data format, seems a viable and promising approach. Currently, optimized molecular geometries from machine-independent Gaussian outputs (e.g. formatted checkpoint files) can be processed using the free Pybel [17] module. This module provides Python bindings to the OpenBabel [18, 19] library and makes it possible to store the fundamental geometrical data in syntactically valid CML files. The retrieval of further molecular properties and job-specific data, and their subsequent processing to CML files, is under investigation. The storage of both molecular data and computational recipes (i.e., workflows) in a common system-independent language and on a distributed file system, such as XtreemFS, will eventually allow for cross-domain workflows (e.g. combinations of quantum chemical calculations and Molecular Dynamics studies) and moreover render a sustainable data management possible.

Distributed Data Management: SAML trust delegation support for XtreemFS is currently being developed. It is planned to integrate XtreemFS and UNICORE at all participating resource providers to further minimize data traffic overhead. Support for metadata is planned, making it possible to query for simulation results. Future work includes an applet for directly uploading simulation input data to XtreemFS and the direct access from the compute nodes to XtreemFS, thus avoiding the portal and the frontend nodes for data transfer altogether.

ACKNOWLEDGEMENT
We would like to thank Bernd Schuller, István Márton, Miklos Kozlovszky, and Ákos Balaskó for the fruitful discussions and for the bug fixing for the UNICORE integration into WS-PGRADE.

Funding: This work is supported by the German Ministry of Education and Research under project grant #01IG09006 (MoSGrid) and by the European Commission FP7 Capacities Programme under grant agreement no. RI-261556 (EDGI).

REFERENCES
[1] O. Niehörster, G. Birkenheuer, A. Brinkmann, B. Elsässer, D. Blunk, S. Herres-Pawlis, J. Krüger, J. Niehörster, L. Packschies, and G. Fels. Providing scientific software as a service in consideration of service level agreements. 2009.
[2] Georg Birkenheuer, Sebastian Breuers, André Brinkmann, Dirk Blunk, Gregor Fels, Sandra Gesing, Sonja Herres-Pawlis, Oliver Kohlbacher, Jens Krüger, and Lars Packschies. Grid-Workflows in Molecular Science. In Proceedings of the Grid Workflow Workshop (GWW), February 2010.
[3] Peter Kacsuk. P-GRADE portal family for grid infrastructures. Concurrency and Computation: Practice and Experience, 2011. In print.
[4] Zoltan Farkas and Peter Kacsuk. P-GRADE Portal: a generic workflow system to support user communities. Future Generation Computer Systems, 2011. In print.
[5] Liferay, Inc. Liferay. http://www.liferay.com.
[6] Alejandro Abdelnur and Stefan Hepper. JSR 168: Portlet specification. http://www.jcp.org/en/jsr/detail?id=168, October 2003.
[7] M. S. Nicklous and Stefan Hepper. JSR 286: Portlet specification 2.0. http://www.jcp.org/en/jsr/detail?id=286, June 2008.
[8] Sandra Gesing, Istvan Marton, Georg Birkenheuer, Bernd Schuller, Richard Grunzke, Jens Krüger, Sebastian Breuers, Dirk Blunk, Georg Fels, Lars Packschies, Andre Brinkmann, Oliver Kohlbacher, and Miklos Kozlovszky. Workflow Interoperability in a Grid Portal for Molecular Simulations. In Roberto Barbera, Giuseppe Andronico, and Giuseppe La Rocca, editors, Proceedings of the International Workshop on Science Gateways (IWSG10), pages 44–48. Consorzio COMETA, 2010.
[9] UNICORE Team. High Level API for Grid Applications. http://www.unicore.eu/community/development/hila-reference.pdf, August 2010.
[10] A. Streit, P. Bala, A. Beck-Ratzka, K. Benedyczak, S. Bergmann, R. Breu, J. M. Daivandy, B. Demuth, A. Eifer, A. Giesler, B. Hagemeier, V. Huber, S. Holl, N. Lamla, D. Mallmann, A. S. Memon, M. S. Memon, M. Rambadt, M. Riedel, M. Romberg, B. Schuller, T. Schlauch, A. Schreiber, T. Soddemann, and W. Ziegler. UNICORE 6 - Recent and Future Advancements. JUEL-4319, February 2010.
[11] Martin Wewior, Lars Packschies, Dirk Blunk, Daniel Wickeroth, Klaus-Dieter Warzecha, Sonja Herres-Pawlis, Sandra Gesing, Sebastian Breuers, Jens Krüger, Georg Birkenheuer, and Ulrich Lang. The MoSGrid Gaussian Portlet – Technologies for the Implementation of Portlets for Molecular Simulations. In Roberto Barbera, Giuseppe Andronico, and Giuseppe La Rocca, editors, Proceedings of the International Workshop on Science Gateways (IWSG10), pages 39–43. Consorzio COMETA, 2010.
[12] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople. Gaussian 03, Revision C.02, 2004. Gaussian, Inc., Wallingford CT.
[13] TURBOMOLE V6.2 2010, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007. http://www.turbomole.com.
[14] M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery. General Atomic and Molecular Electronic Structure System. J. Comput. Chem., 14:1347–1363, 1993.
[15] Mark S. Gordon and Michael W. Schmidt. Advances in electronic structure theory: GAMESS a decade later. In C. E. Dykstra, G. Frenking, K. S. Kim, and G. E. Scuseria, editors, Theory and Applications of Computational Chemistry: the first forty years, pages 1167–1189. Elsevier, Amsterdam, 2005.
[16] The Python Language Reference. http://docs.python.org/reference/, 2011.
[17] Noel O'Boyle, Chris Morley, and Geoffrey Hutchison. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chemistry Central Journal, 2(1):5–12, 2008.
[18] Rajarshi Guha, Michael T. Howard, Geoffrey R. Hutchison, Peter Murray-Rust, Henry Rzepa, Christoph Steinbeck, Jörg Wegner, and Egon L. Willighagen. The Blue Obelisk – Interoperability in Chemical Informatics. Journal of Chemical Information and Modeling, 46(3):991–998, 2006.
[19] Open Babel: The Open Source Chemistry Toolbox. http://openbabel.org/, 2011.
[20] Felix Hupfeld, Toni Cortes, Björn Kolbeck, Jan Stender, Erich Focht, Matthias Hess, Jesus Malo, Jonathan Marti, and Eugenio Cesario. The XtreemFS architecture – a case for object-based file systems in grids. Concurrency and Computation: Practice and Experience, 20(17):2049–2060, 2008.
[21] Peter Murray-Rust and Henry S. Rzepa. Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. Journal of Chemical Information and Computer Sciences, 39(6):928–942, 1999.
[22] Peter Murray-Rust and Henry S. Rzepa. Chemical Markup, XML and the World-Wide Web. 2. Information Objects and the CMLDOM. Journal of Chemical Information and Computer Sciences, 41(5):1113–1123, 2001.
[23] Peter Murray-Rust and Henry S. Rzepa. Chemical Markup, XML, and the World Wide Web. 4. CML Schema. Journal of Chemical Information and Computer Sciences, 43(3):757–772, 2003.