3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011

Granular Security for a Science Gateway in Structural Bioinformatics

Sandra Gesing1∗, Richard Grunzke2∗, Ákos Balaskó3, Georg Birkenheuer4, Dirk Blunk5, Sebastian Breuers5, André Brinkmann4, Gregor Fels6, Sonja Herres-Pawlis7, Peter Kacsuk3, Miklos Kozlovszky3, Jens Krüger6, Lars Packschies8, Patrick Schäfer9, Bernd Schuller10, Johannes Schuster4, Thomas Steinke9, Anna Szikszay Fabri3, Martin Wewior8, Ralph Müller-Pfefferkorn2, and Oliver Kohlbacher1

1 Zentrum für Bioinformatik, Eberhard-Karls-Universität Tübingen, Germany.
2 Zentrum für Informationsdienste und Hochleistungsrechnen, Technische Universität Dresden, Germany.
3 MTA SZTAKI, Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary.
4 Paderborn Center for Parallel Computing, Universität Paderborn, Germany.
5 Department für Chemie, Universität zu Köln, Germany.
6 Department Chemie, Universität Paderborn, Germany.
7 Fakultät Chemie, TU Dortmund, Germany.
8 Regionales Rechenzentrum, Universität zu Köln, Germany.
9 Konrad-Zuse-Institut für Informationstechnik Berlin, Germany.
10 Forschungszentrum Jülich, Germany.

ABSTRACT
Structural bioinformatics is concerned with computational methods for the analysis and modeling of three-dimensional molecular structures. There is a plethora of computational tools available to work with structural data on a large scale. Using these tools on distributed computing infrastructures (DCIs), however, is often hampered by a lack of suitable interfaces. The MoSGrid (Molecular Simulation Grid) science gateway provides an intuitive user interface to several widely used tools in structural bioinformatics. It ensures the confidentiality, integrity and availability of data via a granular security concept which covers all layers of the infrastructure. The concept applies SAML (Security Assertion Markup Language) and allows trust delegation from the user interface layer across the high-level middleware layer and the grid middleware layer down to the HPC facilities. SAML assertions had to be integrated into the MoSGrid infrastructure in several places: the workflow-enabled grid portal WS-PGRADE, the gUSE (grid User Support Environment) DCI services, and the cloud file system XtreemFS. The security infrastructure presented here allows single sign-on and thus lowers the hurdle for users to utilize large HPC infrastructures for structural bioinformatics.
Contact: sandra.gesing@uni-tuebingen.de, richard.grunzke@tu-dresden.de

1 INTRODUCTION
Structural bioinformatics and computational chemistry have become indispensable tools in many fields of biomedical research. Molecular dynamics methods, quantum chemical methods, and protein-ligand docking provide deep insights into the structure of biomolecules and their interactions and are thus essential tools in such diverse areas as materials science and drug design.

While very powerful, most of the tools and applications used for computational chemistry calculations reflect the complexity of the underlying scientific theories. Using these tools thus requires a lot of experience. Their usability is seriously lacking, which frequently deters novice users.

The computational complexity of these theories makes the corresponding tools ideal candidates for high-performance computing infrastructures [1]. However, using such infrastructures has become one of the biggest challenges for quite a number of scientists, since powerful compute resources may not be easily usable for everyone. Here, DCIs come into play.

These issues, the complexity of theory and tools as well as limited access to high-performance infrastructures, were the focus when the MoSGrid (Molecular Simulation Grid) project was conceived. It is part of the German Grid Initiative (D-Grid) and is designed to address the requirements of both commercial and academic users.
MoSGrid offers a science gateway for the computational chemistry community, providing easy access to tools from the fields of quantum chemistry, molecular dynamics, and docking. Currently, the MoSGrid community consists of about 110 users or working groups. At this stage, the science gateway is open to about 15 users from academia and industry whose feedback and demands are invaluable for further development. It is planned to offer the science gateway to the whole community in the near future.

∗ To whom correspondence should be addressed.
∗ These authors contributed equally.

Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

Novice and advanced users are enabled to run their sequences of work on grid resources. They are assisted by graphical user interfaces with different levels of sophistication to accommodate both user groups. Additionally, standard methods for specific problem classes are provided. MoSGrid provides a framework for developing, storing and providing simple and complex workflows. Furthermore, users are enabled to collect and process the results of calculations and, more generally, are provided with molecular structures in databases.

Having left the first prototype stage, developments in MoSGrid continue to focus on the security requirements of the different communities. Distributed computing infrastructures are accessed by a number of users from different locations at the same time. This broad user community has to be provided with an infrastructure that protects their know-how and molecular data by efficiently securing it.

The MoSGrid science gateway lowers the barrier to utilizing HPC infrastructures and allows access to UNICORE [2] infrastructures via a single sign-on concept which applies SAML. This paper describes the recent developments in the MoSGrid security infrastructure. Considering the demands of both academic and commercial users, the paper focuses on the integration and interoperability of the employed components with respect to user authentication and authorization and data security.

The remainder of the paper is structured as follows. Section 2 introduces the background with the application domain and related work. The developments for the MoSGrid security infrastructure are presented in Section 3, and Section 4 demonstrates domain-specific workflows utilizing the security infrastructure.

2 BACKGROUND
Some of the application cases of structural bioinformatics and computational chemistry, in particular applications in the pharmaceutical industry, impose strict requirements on data security in order to protect potential intellectual property. We will discuss these issues briefly and then examine how a good level of security can be obtained while still providing convenient single sign-on access.

2.1 Application Domain
Structural bioinformatics deals with the prediction and analysis of the structure and the mechanisms of function of biological macromolecules [3], including proteins, nucleic acids [4], lipids and sugars. Major issues handled by this field include, e.g., the improvement of drug targeting [5], the derivation of enzymatic design principles, or the development of computational models that describe structure-function relations. Knowledge is gained from both experimentally derived structures and computational models. Regarding the computational methods, two fields have emerged among others: (i) quantum chemical calculations (QM) dealing with the electronic structure of molecules and (ii) molecular dynamics (MD) employing classical mechanics approaches. Since the targets of the investigations are macromolecules and the processes of interest can span considerable time scales, a large amount of data is generated in the course of these calculations [6].

Sensitive Data in Research and Science
Both in an academic and in an industrial context, the most valuable asset produced by structural bioinformatics is data. This data has to be stored reliably in order to avoid data loss, but also securely in order to avoid unauthorized access to sensitive and valuable information. Keeping that in mind, it is essential that the scientist has full control over the access policies to all of his simulation data. With respect to a collaborative work strategy, the option to share selected data with co-workers is also an essential feature. One has to differentiate what kind of data should be shared. The pure simulation data, such as intermediate molecular structures, raw trajectories, and unanalyzed energies, is usually only of interest to the closest collaborators. In contrast, access has to be granted to a broader community once the knowledge is published.

Within an academic environment, publication is a prerequisite before analyzed and approved data is shared with third parties. In collaboration with industry partners, the focus shifts to other priorities: publications are out of the question before a patent application is filed. In both areas, a highly secure exchange of data including robust encryption and authentication techniques is indispensable.

Another crucial requirement is a high degree of data persistence, i.e., protection from loss or inadvertent change of data. To reach this goal, several conditions have to be met.

The security demands in an industrial context comprise a multitude of details: (i) the data shall be transferred with robust encryption, (ii) the data shall not be visible or modifiable by third parties, and (iii) the jobs, and even their existence, shall not be visible to third parties.

In academic environments, the demands are different due to the more open and collaborative approach to work. This distributed approach generates different challenges. (i) A great degree of transparency in terms of versions and changes for all contributors is desired because the project data is handled like a "living" document. (ii) When a project is highly distributed, simultaneous access to data can cause problems with naming schemes and versions as well as concurrency issues. (iii) During a long-term project, a large amount of preliminary data is produced which cannot be stored forever. Hence, criteria for secure long-term archiving of data and also for reliable erasure of data have to be evaluated.

2.2 Related Work
Security is a key aspect for science gateways [7] on top of DCIs. X.509 certificates are currently the established basis for authentication in grid middlewares (e.g., UNICORE, Globus Toolkit, gLite). The basic security concept includes offering single sign-on to users, a principle for access control in connected systems: the user has to authenticate himself just once and gains access to all connected systems without the need for further authentication procedures. Another main advantage is that the user does not have to maintain several means of authentication, meaning that neither multiple passwords for multiple systems nor several certificates are required. Single sign-on relies on the principle of trust delegation, by which systems can be allowed to act on behalf of the user. It is used, for instance, in workflow systems, where a whole workflow consists of multiple jobs.
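The single sign-on principle described above can be illustrated with a deliberately simplified Python sketch: the user authenticates once and delegates trust to a workflow engine, which then submits every job of a multi-job workflow in the user's name. All class names are hypothetical, and the token is a plain data object standing in for a real delegated credential such as a signed SAML assertion; it carries no cryptography.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DelegationToken:
    """Hypothetical stand-in for a delegated credential (no cryptography)."""
    user: str     # the user on whose behalf actions are taken
    trustee: str  # the service the user has delegated trust to


@dataclass
class User:
    name: str
    logins: int = 0  # how often the user had to authenticate interactively

    def sign_on(self, trustee: str) -> DelegationToken:
        # The single interactive step: authenticate once, delegate trust.
        self.logins += 1
        return DelegationToken(user=self.name, trustee=trustee)


@dataclass
class WorkflowEngine:
    name: str
    submitted: list = field(default_factory=list)

    def run(self, jobs: list, token: DelegationToken) -> None:
        # Only accept a token that was delegated to this engine.
        if token.trustee != self.name:
            raise PermissionError(f"token is not delegated to {self.name}")
        for job in jobs:
            # Each job is submitted in the user's name, without prompting again.
            self.submitted.append((job, token.user))


alice = User("alice")
engine = WorkflowEngine("workflow-engine")
token = alice.sign_on(engine.name)
engine.run(["prepare", "simulate", "analyze"], token)
print(alice.logins, len(engine.submitted))  # 1 3
```

In this sketch the user is prompted once although three jobs are submitted, and an engine to which no trust was delegated is refused.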
Using trust delegation, a workflow engine acting in the name of the user submits the individual jobs to suitable resources without further user interaction. This approach decouples job submission and user interaction.

To support single sign-on and thus trust delegation, UNICORE 6 uses the approach of explicit trust delegation (ETD) [8] in its dynamic style [9]. ETD allows the dynamic creation of jobs in the name of the user, though in its basic form the trust relationships are still static. Extended to its dynamic style, ETD offers increased flexibility while maintaining robust security properties. The trust delegation assertions are encoded in SAML 2.0. They can contain several statements specifying the assertion in more detail. They can also be chained, meaning that an entity acting on the user's behalf can delegate trust to yet another entity, which is then also able to act on the user's behalf. SAML trust delegation assertions offer important security characteristics: they can be limited to one entity, to a specific validity time span, and to a trust chain of a maximum length. Furthermore, SAML is already supported by various single sign-on infrastructures (e.g., Shibboleth), which allow mapping of local accounts to federated identities.

Other grid middlewares like Globus Toolkit or gLite implement trust delegation via GSI (Grid Security Infrastructure) proxy certificates. GSI is a specification for secure communication in a grid environment and is based on public key cryptography using certification authorities (CAs) and X.509 certificates. These proxy certificates have several disadvantages compared to trust delegation based on SAML. The proxy certificate is always transferred along with its private key, which is extremely sensitive since anyone who possesses it can impersonate the user. To mitigate this problem, the validity span is often severely limited, which creates new problems. Furthermore, it is impossible to reconstruct each step of a trust chain built with proxy certificates.

To lessen the problem of short validity time spans, users can upload their certificate to MyProxy [10] servers and periodically generate proxy certificates valid for a limited period of time. A MyProxy server also lessens certain security risks, because the private keys do not have to be stored on every machine used. However, it also creates new risks, because the central servers have to be very well secured. Moreover, it does not by itself improve the security of GSI proxy certificates.

Both approaches for trust delegation introduced above are based on X.509 certificates, which require users to go through a multistage application process to receive their user certificates. Additionally, they have to create essential files from their certificates for the trust delegation. These procedures are time-consuming and may discourage users from utilizing DCIs. Therefore, several approaches are under way to simplify the application process or to automatically generate the essential credential files.

The Java library GridCertLib [11] supports users of web-based science gateways by automatically obtaining X.509 certificates and using proxy certificates. The prerequisite is that the science gateway has access to a SAML assertion of a previous successful Shibboleth authentication. This library could be adapted for the use of SAML assertions and employed in the MoSGrid science gateway if the D-Grid infrastructure is extended to offer federated identities based on Shibboleth.

A similar concept has been implemented by the UK project SARoNGS [12]. However, the generation of a MyProxy certificate in the portal still requires user interaction and a web service which demands Shibboleth authentication. This mechanism is used analogously in WS-PGRADE and therefore in the MoSGrid science gateway. In both solutions, the users are provided with an intuitive user interface to create their credentials without the need for any command-line invocations.

The GENIUS portal supports the concept of X.509-based robot certificates [13]. These are not associated with specific users but with communities, applications or science gateways. The certificates are handed over to the users on smart cards, which requires card readers connected to the users' computers. Users are authenticated via login and password in the GENIUS portal and are afterwards allowed to use DCIs via the smart card. This solution has two major drawbacks: first, the need for additional hardware on the users' side; second, the additional effort of duplicating processes already implemented in grid security infrastructures, like mapping user distinguished names (DNs) to local accounts on HPC facilities.

In October 2010, the EU project EGI (European Grid Infrastructure) [14] presented the results of a questionnaire about requirements for authentication and authorization infrastructures for DCIs, which was answered by a number of projects from different domains, e.g., biomedicine. One result was that the key technologies include SAML and X.509 certificates and that the goal is to bridge security domains by using, for example, Shibboleth. Since the MoSGrid science gateway already uses SAML, its security infrastructure can easily be adapted to rely on Shibboleth for user authentication instead of certificates.

3 THE MOSGRID SECURITY INFRASTRUCTURE
The MoSGrid security infrastructure consists of four layers: the science gateway as an intuitive user interface, the high-level middleware service layer including gUSE [15] (grid User Support Environment) and XtreemFS [16], the grid middleware layer with UNICORE, and suitable HPC facilities in the D-Grid infrastructure (see Fig. 1).

In general, a science gateway can be defined as a single point of entry to a set of tools for a specific application domain operating across organizational boundaries. We characterize a grid portal as a web-based science gateway utilizing grid infrastructures and requiring solely a web browser on the user's side. The workflow-enabled grid portal WS-PGRADE [7, 17] (Web Services Parallel Grid Runtime and Developer Environment) is the basis for the MoSGrid science gateway. The chosen WS-PGRADE version employs the open source portal framework Liferay [18], which supports the JSR168 [19] standard and its successor JSR286 [20]. Additionally, WS-PGRADE is the highly flexible graphical user interface for gUSE. The latter provides a large set of services for the management of workflows in DCIs.

XtreemFS is an object-based file system which supports distribution of data up to a world-wide scale and allows simple access on local machines. Furthermore, its replica management increases data availability and reduces latency and network consumption.

As a fully developed grid middleware, UNICORE is deployed and used in a variety of settings. It consists of a full software stack including clients, a gateway, system services, and components for access to the actual computing or data resources. The latest version is UNICORE 6, which is based on Web Services and particularly the Web Service Resource Framework (WSRF) [21].
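The SAML trust delegation properties listed in Section 2.2 (restriction to one entity, a validity time span, a maximum chain length, and a chain whose every step can be reconstructed) are what let trust flow down the layers just described. The Python sketch below checks these rules over a simplified, hypothetical data model. It does not parse SAML 2.0 XML or verify signatures, and the field names and distinguished names are illustrative assumptions rather than UNICORE's ETD implementation; the two-step chain in the example is chosen only to exercise the chaining rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Delegation:
    """Simplified, hypothetical model of one trust delegation assertion.

    Real SAML 2.0 assertions are signed XML documents; signature checks are
    omitted here and only the fields needed for the chain rules are kept.
    """
    custodian: str         # the user on whose behalf all entities act
    issuer: str            # entity granting this delegation
    trustee: str           # the single entity the delegation is limited to
    not_before: datetime   # start of the validity time span
    not_after: datetime    # end of the validity time span (inclusive here)
    max_chain_length: int  # only read from the first link in this sketch


def validate_chain(chain: list, presenter: str, now: datetime) -> bool:
    """Return True if `presenter` may act for the user via `chain`."""
    if not chain:
        return False
    first = chain[0]
    # The chain must start with a delegation issued by the user.
    if first.issuer != first.custodian:
        return False
    # The user bounds how many delegation steps are allowed.
    if len(chain) > first.max_chain_length:
        return False
    previous_trustee = None
    for link in chain:
        if link.custodian != first.custodian:
            return False
        # Each step must be issued by the trustee of the previous step,
        # so every hop of the chain can be reconstructed.
        if previous_trustee is not None and link.issuer != previous_trustee:
            return False
        if not (link.not_before <= now <= link.not_after):
            return False
        previous_trustee = link.trustee
    # Only the last trustee in the chain may use it.
    return previous_trustee == presenter


t0 = datetime(2011, 6, 8, 9, 0)
chain = [
    # User -> science gateway (hypothetical DNs).
    Delegation("CN=Alice", "CN=Alice", "CN=Gateway", t0, t0 + timedelta(days=30), 2),
    # Science gateway -> UNICORE service, with a shorter validity.
    Delegation("CN=Alice", "CN=Gateway", "CN=UNICORE", t0, t0 + timedelta(days=1), 2),
]
print(validate_chain(chain, "CN=UNICORE", t0 + timedelta(hours=1)))  # True
print(validate_chain(chain, "CN=Mallory", t0 + timedelta(hours=1)))  # False
print(validate_chain(chain, "CN=UNICORE", t0 + timedelta(days=2)))   # False (expired)
```

Because each link names both its issuer and its trustee, the chain can be audited step by step, which the text notes is impossible for chains built from GSI proxy certificates.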
Fig. 1. The MoSGrid Security Infrastructure. [Figure: layers User Interface, High-Level Middleware Service Layer, Grid Middleware Layer and HPC Facilities; components WS-PGRADE, gUSE Services, XtreemFS, UNICORE (Atomic Services, Target System Interface), D-Grid batch system and file system; credentials and protocols shown: user certificate, SAML assertions, proxy certificate, service certificate, TLS, BFT.]

MoSGrid employs security features of Liferay and of extended versions of WS-PGRADE and gUSE, and is extending XtreemFS for the use of UNICORE and SAML assertions. These extensions affect three major domains of security in DCIs: user and credential management, workflow and job management, and distributed data management. These are described in detail in the following sections.

3.1 User and Credential Management
Liferay consists of a portlet container with default applications and a portal interface, and is deployed inside an application server. MoSGrid has chosen Apache Tomcat 6 [22] as the underlying application server.

Apache Tomcat handles the access control of users and programs to resources and the integrity of data during transfers via HTTP or HTTPS. Furthermore, the application server offers role-based authorization modules and supports login with user name and password. Liferay uses these modules and extends the role-based authorization with more granular security mechanisms in the user management by providing organization, community, and group management. Organizations may represent various divisions or locations of a company and offer private and publicly accessible pages of the portal. In contrast, communities are designed to allow access across organizational boundaries or to pages which are relevant to all users of a portal.

To meet the needs of the computational chemistry community, both organization and community management are utilized in the MoSGrid science gateway. Hence, we implemented four main roles via Liferay: guests, novice users, advanced users, and administrators.

Guests are characterized by lacking an account for the science gateway. However, they can obtain information about the project and about the essential steps for getting access to the MoSGrid science gateway and its features. Liferay offers the option for unknown users to create an account themselves. Once users have created a login for the MoSGrid science gateway, they can apply for MoSGrid community membership via email, and their accounts will be assigned to the MoSGrid user or MoSGrid advanced user role.

Novice users, in the sense of being new to structural bioinformatics tools, are classified as MoSGrid users. This role allows them to choose predefined workflows in order to become acquainted with the tools and domain-specific workflows. The latter are offered via intuitive graphical user interfaces, which lower the barrier to utilizing the tools as well as to running them on high-performance computing facilities. Novice users are allowed to change inputs and parameters and to invoke and monitor workflows. The access rights are implemented as a community role for MoSGrid users.

Access to additional features for creating and changing workflows and for configuring settings for grid infrastructures is granted via the MoSGrid advanced user role.

Finally, administrators are additionally enabled to manage all credentials, users, organizations, and communities.

The user management presented here only governs access to the MoSGrid features in WS-PGRADE (e.g., creation of workflows), not access to the underlying grid infrastructures. For the latter, the essential file on the users' side is a security token generated via the certificate portlet of WS-PGRADE.

3.1.1 Managing Security Tokens with a Certificate Portlet
Since UNICORE allows access to underlying HPC facilities based on X.509 certificates, every user who wants to utilize UNICORE infrastructures has to obtain an X.509 user certificate from an appropriate certificate authority (CA).

To protect the user's certificate, it is fundamental to minimize both the number of transfers necessary during the authentication process and the number of locations where it has to be stored. WS-PGRADE achieves this goal by offering a certificate portlet for credential management without the need to upload personal certificates to the portal server. The original version of the certificate portlet only supported proxy-certificate-based authentication via MyProxy servers. A new certificate portlet was created which provides features for diverse credential formats including SAML. In the near future, when XtreemFS fully supports SAML assertions, the use of the less secure proxy certificates will be discontinued in the MoSGrid science gateway.

Applets ensure that processed data remains on the user's computer, and signed applets additionally use policy files to ensure the integrity of the processed data. Therefore, a signed applet has been integrated into the certificate portlet for generating SAML assertion files locally on the user's computer. For the generation of a SAML assertion, the user only has to fill in the location of his certificate on his computer, the corresponding password, and the location on his computer where the generated assertion file should be stored (see Fig. 2). The applet then automatically generates an assertion file with the same validity as the user's certificate.

Fig. 2. Trust delegation generation with integrated certificate portlet.

Furthermore, the extended certificate portlet is adapted to simplify its use in MoSGrid. Users do not have to distinguish between diverse options but are still able to use all relevant options regarding a SAML assertion file and its management, e.g., generating, uploading, and deleting the assertion file. The uploaded SAML assertion file is, or will become, the basis for the authentication processes in UNICORE, XtreemFS, and the domain-specific portlets.

3.2 Workflow and Job Management
The collaborative, community-oriented application development environment of WS-PGRADE offers a graphical workflow editor and enables the users to create, change, invoke, and monitor workflows. The latter may contain jobs on local resources and distributed resources in grid and cloud infrastructures. Existing workflows, workflow graphs, workflow templates, and sophisticated workflow applications can be shared via a local repository.

WS-PGRADE allows users to intuitively configure settings for various grid middlewares and corresponding resources. In the case of UNICORE 6, MoSGrid advanced users are enabled to add UNICORE registries which provide access to a number of infrastructures. MoSGrid users are enabled to choose the preferred UNICORE registry out of a list of configured registries. However, integrating UNICORE into WS-PGRADE additionally required the development of a so-called submitter plug-in in gUSE.

gUSE provides a set of services for the management of workflows in DCIs, including the data-driven workflow engine and submitters. Jobs within the same workflow may be configured for diverse DCIs, and the workflow engine invokes each with an appropriate submitter.

In general, gUSE submitters are Java-based applications developed to provide authentication mechanisms and the management of single jobs for a specific DCI. They implement the interface GridService of the workflow engine with methods for the management of jobs including authentication, authorization, and data staging.

gUSE offers various submitters for grid and cloud infrastructures, desktop grids, and web services. In MoSGrid, we have additionally developed the submitter for UNICORE 6 [23]. The submitter utilizes the UCC (UNICORE command-line client) libraries, implements authentication with SAML assertions, and manages data staging utilizing the secured BFT (Basic File Transfer) protocol of UNICORE. To authenticate a user with SAML assertions against a UNICORE infrastructure, the submitter requires access to three files: the SAML assertion file created via the certificate portlet, the X.509 certificate to which the trust delegation is issued by the user, and a truststore which includes the public keys of the CAs used in the UNICORE infrastructure. The first file is unique for each user; the second and the third are the same for all users of the MoSGrid science gateway. As soon as a user uploads his SAML assertion file via the certificate portlet to the portal server, the submitter is able to access the file. The public key of the MoSGrid science gateway is utilized by the certificate portlet to create the SAML assertion file. An administrator of the science gateway ensures that the X.509 certificate used for the trust delegation as well as the truststore are available to the submitter. Accordingly, the submitter uses these essential files to authenticate the user against the selected UNICORE middleware installation, which then checks whether the credentials are valid and either authorizes the user or returns an error.

Once a user is authenticated, the submitter creates a job on the targeted UNICORE resource. As a result, UNICORE automatically provides a job working directory on an HPC facility (USpace), which is accessible only to the user who invoked the job. Currently, the submitter utilizes the BFT protocol for uploading or downloading all files belonging to a job to or from the USpace. This mechanism will be extended in the near future to use the cloud file system XtreemFS for specified input and output files.

3.2.1 Application Specific Module
gUSE provides a sophisticated web-based way to create, configure, and execute grid applications on various types of DCIs. However, there is a demand to let portal developers use features and functionalities of gUSE from portlet code. The developers can then focus on creating domain-specific portlets that are tailored especially to the applications and to the users' needs, while authentication on grid and cloud infrastructures and the submission and monitoring of workflows are handled by services of gUSE. Therefore, a new component called ASM (Application Specific Module) has been developed that can be used as an API (Application Programming Interface).

Applications consist of workflows and corresponding parameters, input files and output files. Every application included in the local repository of gUSE can be reused via a portlet using the ASM libraries. ASM provides various interfaces for management and contains functions to manage the whole execution lifecycle. The Java functions can be called from portlets, which themselves can use any technology and visualization methods suitable to the applications' needs, independently of the underlying solution. The security mechanisms rely on the implemented submitters in gUSE.
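The division of labor described in this section, where the workflow engine hands each job to the submitter configured for its DCI and only the submitter deals with credentials, can be sketched in Python as follows. This is an analogue, not gUSE code: real gUSE submitters are Java applications implementing the workflow engine's GridService interface, whose methods are not reproduced here, and the class names and file paths below are hypothetical. The UNICORE submitter resolves the three files named in the text: a per-user SAML assertion, plus a gateway certificate and a truststore shared by all users.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Job:
    name: str
    dci: str   # key selecting the submitter, e.g. "unicore"
    user: str  # portal user on whose behalf the job runs


class Submitter(ABC):
    """Hypothetical Python analogue of a gUSE submitter plug-in."""

    @abstractmethod
    def submit(self, job: Job) -> str:
        ...


class UnicoreSubmitter(Submitter):
    """Resolves the three credential files the UNICORE submitter needs."""

    def __init__(self, assertion_root: str, gateway_cert: str, truststore: str):
        self.assertion_root = PurePosixPath(assertion_root)
        self.gateway_cert = gateway_cert  # shared by all users
        self.truststore = truststore      # shared by all users

    def credentials_for(self, user: str) -> tuple:
        # Only the SAML assertion file is specific to each user.
        assertion = self.assertion_root / user / "assertion.xml"
        return str(assertion), self.gateway_cert, self.truststore

    def submit(self, job: Job) -> str:
        assertion, cert, store = self.credentials_for(job.user)
        # Placeholder: a real submitter would authenticate against UNICORE
        # with these three files and stage data via BFT at this point.
        return f"{job.name}: {job.user} via {assertion} ({cert}, {store})"


class WorkflowEngine:
    def __init__(self) -> None:
        self.submitters = {}

    def register(self, dci: str, submitter: Submitter) -> None:
        self.submitters[dci] = submitter

    def run(self, jobs: list) -> list:
        # Each job goes to the submitter for its DCI; the caller (e.g. a
        # portlet) never touches credentials itself.
        return [self.submitters[job.dci].submit(job) for job in jobs]


engine = WorkflowEngine()
engine.register("unicore", UnicoreSubmitter(
    "/portal/assertions", "/portal/gateway-cert.pem", "/portal/truststore.jks"))
for line in engine.run([Job("md-run", "unicore", "alice")]):
    print(line)
```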
Hence, portlets developed for the MoSGrid science gateway can utilize the submitter for UNICORE via the configured applications and are unaffected if modules in the security infrastructure are changed.

3.3 Distributed Data Management
XtreemFS was chosen as the distributed file system for MoSGrid to safeguard data and provide each resource with secured access. It is an object-based file system which stores file data and metadata on different services. The object storage devices (OSDs) manage the physical files, while the metadata and replica catalogs (MRCs) contain the directory tree and metadata such as the filename, the DN of the owner, and the file size. Moreover, the MRC authenticates users based on GSI and authorizes access to files based on the DN entry of the X.509 user certificate. These certificate-based authentication and authorization features make it easy to integrate XtreemFS into existing services, namely UNICORE, WS-PGRADE, and the D-Grid infrastructure. Currently, XtreemFS and its components support GSI proxy certificates for authentication, while SAML support is being developed.

Users are able to access, upload, and download data to and from XtreemFS via a portlet deployed in WS-PGRADE. As soon as the portlet is initialized, XtreemFS is mounted using a proxy certificate issued by a MyProxy server.

3.3.1 Integration of XtreemFS in UNICORE
To make an XtreemFS volume available in UNICORE, the latter manages the transfers of data between an XtreemFS volume and HPC facilities. UNICORE uses the FUSE [24] client of XtreemFS for this purpose. The client translates file system calls into requests to the corresponding MRC and OSD. The client as well as the UNICORE Target System Interface (TSI) shall be installed on every login node of participating HPC facilities. The TSI is the UNICORE component which forms the interface between the UNICORE grid middleware and the HPC facility, e.g., it manages the communication with the batch system of the HPC facility and

The UNICORE middleware will only take care of transferring the data from the mounted XtreemFS volume on the HPC facility to the USpace of the simulation job on the same machine. This way, UNICORE is not used for extensive data transfers over long distances, for which it is less efficient. The simulation jobs are thus able to access the MoSGrid volume directly via UNICORE, which provides a technically mature and proven way to do so.

4 DOMAIN SPECIFIC WORKFLOWS
Besides providing the WS-PGRADE-based, workflow-oriented instruments for using grid resources, MoSGrid aims to provide novice users with intuitive means to run chemical simulations. To serve this purpose, the chemical simulation codes, workflows, and IT infrastructures are hidden from the user. The user accesses portlets that directly offer instruments to start and manage simulations for different subjects of structural bioinformatics.

Currently, MoSGrid offers specific portlets for molecular dynamics and quantum chemistry and is designing a portlet for docking. The portlets connect directly to the UNICORE grid middleware and use predefined certificates. In the near future, the portlets will be ported to utilize the newly introduced ASM library and, with it, gUSE services such as the UNICORE submitter. This enables the developers to focus on the domain-related features to further improve the user experience. The design and functionality of the domain-specific portlets are described in the following.

4.1 The Molecular Dynamics Portlet
The Molecular Dynamics (MD) portlet enables chemists to easily access molecular simulation codes in the area of molecular dynamics. Frequently used workflows are predefined and available for different recipes. On the one hand, the portal should ease the work of experienced users; on the other hand, it should lower the hurdle for novice users.
The scientists can submit molecular handles data transfers via TLS (Transport Layer Security) [25] simulations without knowledge of the underling DCI. The MD connections. portlet is organized in three main sections. The XtreemFS client will mount the MoSGrid volume using Connection In the connect widget users can connect to the the XtreemFS X.509 service certificate, which identifies a services underlying DCI and see how many HPC computing facilities can instead of a person, and a file based on extended UNICORE User be accessed with their certificates. Database (XUUDB) information. It contains the mapping between Submission The MD submission widget is designed to provide a the user DN and a login on a HPC facility. Using this information molecular dynamics service on multiple levels. It allows the user an the local logins of the users are mapped to their corresponding DNs. easy use of standard chemical recipes. In the current state the user is Afterwards, the DNs are passed to XtreemFS for authorization and enabled to submit a single simulation using a directly uploaded job access to the users’ files, which are identified by a DN. The DNs are description. Alternatively, the user can run a complex recipe that thus the basis for the access rights on the HPC facilities. includes an energy minimization and a following equilibration. This The MRC regards XtreemFS clients using a service certificate as recipe is an indispensable prerequisite for all kinds of production a trusted system component, meaning that the MRC will accept runs. any DN sent by the client. Using the TSI, the mounted MoSGrid The user has, for both cases, to upload a file, containing either the XtreemFS volume will be integrated into UNICORE and thus made job description (Gromacs TPR-Format) or the structural information available in the UNICORE middleware. (PDB-Format). In the background the portlet automatically checks This way of integration offers important advantages. 
First of all, the job description for correctness. Some unnecessary input the integration is transparent in regard to XtreemFS. Independent information is automatically filtered. Other erroneous information, of the available storage resources, XtreemFS provides one global like missing residues, is detected and shown to the user in the portal. namespace. Furthermore, XtreemFS as an efficient distributed data In the next development stage the MD portlet will detect topological management system handles the transfer of data between the science features of the input structure ,e.g., if the protein is a monomer or a gateway and the HPC facilities. multimer and adapt the simulation to the different input files. Copyright c 2011 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. 3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011 However, the portlet minimizes the necessary user input as far as specification (e. g. optimization, energy minimization), the selection possible but still needs some user input. First, it is hard to guess how of the simulation method, technical parameters of the resources to long a chemical process should be simulated [26]. Therefore, the be used and additional options. After specifying the geometry, the user has to define the simulation length in picoseconds. Secondly, job can be submitted for calculation. the user has to define the resources for the simulation. This includes Direct submission of an existing job file is provided as second the number of parallel nodes and the maximum duration of the option on the start screen. Users are enabled to directly upload simulation (wall time). When all information is given and checked, and submit pre-generated job descriptions in Gaussian job file the user can submit the job to the MoSGrid infrastructure. format [28]. 
They may parametrize the job with the specific Monitoring Finally, the user can monitor the job process. The requirements like maximum run time, number of processors to jobs are named after the user login on the portlet, combined with use, or memory requirements. This option is intended for advanced submit time, and name of the workflow recipe. A traffic light for users. These users are accustomed to certain tools which generate each simulation entry shows the status of the simulation. Further the output or want to modify the job descriptions directly to information, e.g., about the underlying HPC facility which the job achieve maximum control over the simulations and reuse existing utilizes, is hidden. job descriptions. For each simulation the user can query the output files, even in an ongoing simulation. Files can be downloaded or displayed in the portal. The MD portlet shows either plain text, picture, or figures, and in case a molecule file is selected, a 3D view in Jmol (see Fig. 3). Fig. 4. QC Portlet - Graphical job creation. Monitoring. No matter which method was used for creating the simulation job, the monitoring facilities can be used to acquire an overview of the currently active jobs. The status is represented by the well known items queued, running, successful, or failed. Furthermore, in case of a successful execution, the exit code of the tool is provided as well as the information that the data is available. Fig. 3. MD Portlet - Monitoring and view of a molecule file in Jmol. For successfully finished jobs, the workflow produces different results. Besides the native output format of the simulation tool, The next step of the development of the MD portlet is the specific values and the development of these values are plotted to incorporation of more simulation codes, additionally to the currently files, which can be viewed, downloaded, and processed in common supported Gromacs [27]. spreadsheet applications. 
4.2 The Quantum Chemistry Portlet The Quantum Chemistry (MD) portlet is a fully functional prototype 5 SUMMARY AND OUTLOOK which implements a complete quantum chemical workflow. The We presented the security infrastructure of the MoSGrid science platform enables both experienced and inexperienced researchers gateway offering single sign-on to HPC facilities via SAML to submit their molecular simulations, monitor the progress, and assertions. Users are enabled to intuitively create SAML assertions retrieve the results. Moreover, pre- and post-processing routines are and are provided with domain specific workflows and portlets. available. Among others, these can be used to extract the output of Furthermore, WS-PGRADE offers the ASM API which allows the simulation tools and format it in a standardized way. developers to focus on domain specific workflows and portlets On the start screen the user has three options to select from. without the need to become acquainted to the security infrastructure The first two represent the two implemented workflows, the third in detail. On the high-level middleware service layer, gUSE was provides access to a monitoring facility. and the cloud file system XtreemFS is being extended for the use of Graphical job creation is supported in the first workflow. The SAML assertions. extensible interface provides the most common options to create Our next steps regarding the security infrastructure will enhance molecular simulations. Using familiar user interface components, the usability of the authentication mechanism. Therefore, we will both less experienced and advanced users can configure and submit utilize the user certificate embedded in the browser. This embedded simulation jobs. user certificate will be employed for two purposes: first, for an The interface is divided in different tabs, which group different automatic login of the user based on membership in the MoSGrid functions and settings (see Fig. 4). 
This includes the job type virtual organization, second, for an automatic creation of SAML Copyright c 2011 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. 3rd International Workshop on Science Gateways for Life Sciences (IWSG 2011), 8-10 JUNE 2011 assertions. The latter step will eliminate the user interaction for [11]P. Kunszt, S. Maffioletti, R. Murri, and V. Tschopp. creating a SAML assertion and for choosing a certificate from GridCertLib: Use Shibboleth to Access the Grid from Web the local hard drive. Since personal certificates expire annually, Portals. http://arxiv.org/PS_cache/arxiv/pdf/ information about how to renew the certificate will be presented 1101/1101.4116v1.pdf, December 2010. when this case occurs. Together with the previous mentioned [12]X. D. Wang, M. Jones, J. Jensen, A. Richards, D. Wallom, measures, this will further aid the user in smoothly using the T. Ma, R. Frank, D. Spence, S. Young, C. Devereux, and MoSGrid science gateway. N.l Geddes. Shibboleth Access for Resources on the National Grid Service (SARoNGS). In Fifth International Conference on ACKNOWLEDGEMENT Information Assurance and Security, volume 2, pages 338–341, We would like to thank Valentina Huber for the basic version of the 2009. applet for generating SAML assertions. [13]R. Barbera, G. Andronico, G. Donvito, A. Falzone, J.J. Keijser, G. La Rocca, L. Milanesi, G. P. Maggi, and S. Vicario. A Grid Funding: This work is supported by the German Ministry Portal with Robot Certificates for Bioinformatics Phylogenetic of Education and Research under project grant #01IG09006 Analyses. Concurrency and Computation: Practice and (MoSGrid) and by the European Commission’s 7th Framework Experience, 23(3):246–255, March 2011. Programme under grant agreement #RI-261556 (EDGI), #RI- [14]EGI. European Grid Infrastructure. http://www.egi.eu/. 
261323 (EGI-InSPIRE), #261585 (SHIWA), and #RI-283481 (SCI- [15]MTA SZTAKI. gUSE. http://www.guse.hu/. BUS). [16]F. Hupfeld, T.i Cortes, B. Kolbeck, J. Stender, E. Focht, M. Hess, J. Malo, J. Marti, and E. Cesario. The XtreemFS REFERENCES Architecture - A Case for Object-based File Systems in Grids. [1]O. Niehörster, G. Birkenheuer, A. Brinkmann, B. Elsässer, Concurrency and Computation: Practice and Experience, D. Blunk, S. Herres-Pawlis, J. Krüger, J. Niehörster, 20(17):2049–2060, 2008. L. Packschies, and G. Fels. Providing Scientific Software as [17]Z. Farkas and P. Kacsuk. P-GRADE Portal: a generic workflow a Service in Consideration of Service Level Agreements. In system to support user communities. Future Generation Proceedings of the Cracow Grid Workshop (CGW). 2009. Computer Systems journal, 27(5):454–465, 2011. [2]A. Streit, P. Bala, A. Beck-Ratzka, K. Benedyczak, [18]Inc. Liferay. Liferay. http://www.liferay.com. S. Bergmann, R. Breu, J. M. Daivandy, B. Demuth, A. Eifer, [19]A. Abdelnur and S. Hepper. JSR 168: Portlet Specification. A. Giesler, B. Hagemeier, V. Huber S. Holl, N. Lamla, http://www.jcp.org/en/jsr/detail?id=168, Oct D. Mallmann, A. S. Memon, M. S. Memon, M. Rambadt, 2003. M. Riedel, M. Romberg, B. Schuller, T. Schlauch, A. Schreiber, [20]M.S. Nicklous and S. Hepper. JSR 286: Portlet Specification T. Soddemann, and W. Ziegler. Unicore 6 - Recent and Future 2.0. http://www.jcp.org/en/jsr/detail?id= Advancements. JUEL-4319, February 2010. 286, June 2008. [3]N. Chandra, P. Anand, and K. Yeturu. Structural [21]OASIS Web Services Resource Framework (WSRF). Bioinformatics: Deriving Biological Insights from Protein http://www.oasis-open.org/committees/tc_ Structures. Interdisciplinary Sciences: Computational Life home.php?wg_abbrev=wsrf, 2011. Sciences, 2(4):347–366, December 2010. [22]The Apache Software Foundation. Apache Tomcat. http: [4]M. A. Jonikas, A. Laederach, and R. B. Altman. RNA //tomcat.apache.org/tomcat-6.0-doc/. 
STRUCTURAL BIOINFORMATICS. Wiley-Liss Inc., 2003. [23]S. Gesing, I. Marton, G. Birkenheuer, B. Schuller, R. Grunzke, [5]E. B. Fauman, A. L. Hopkins, and C. R. Groom. Structural J. Krüger, S. Breuers, D. Blunk, G. Fels, L. Packschies, Bioinformatics in Drug Discovery. A. Brinkmann, O. Kohlbacher, and M. Kozlovszky. Workflow [6]O. Niehörster, A. Brinkmann, G. Fels, J. Krüger, and J. Simon. Interoperability in a Grid Portal for Molecular Simulations. Enforcing SLAs in Scientific Clouds. In IEEE International In Roberto Barbera, Giuseppe Andronico, and Giuseppe La Conference on Cluster Computing 2010 (Cluster), 2010. Rocca, editors, Proceedings of the International Workshop [7]P. Kacsuk. P-GRADE portal family for grid infrastructures. on Science Gateways (IWSG10), pages 44–48. Consorzio Concurrency and Computation: Practice and Experience, COMETA, 2010. 23(3):235–245, March 2011. [24]FUSE. http://fuse.sourceforge.net. [8]D. Snelling, S. van den Berghe, and V. Li. Explicit Trust [25]T. Dierks and E. Rescorla. TLS. https://tools.ietf. Delegation: Security for Dynamic Grids. In Fujitsu Scientific org/html/rfc5246, 2008. and Technical Journal, pages 282–294, 2004. [26]J. Krüger and G. Fels. Ion Permeation Simulations by Gromacs [9]K. Benedyczak, P. Bała, S. van den Berghe, R. Menday, and − an Example of High Performance Molecular Dynamics. B. Schuller. Key Aspects of the UNICORE 6 Security Model. In Concurrency and Computation: Practice and Experience, Future Generation Computer Systems, number 27, pages 195– 23(3):279–291, 2011. 201, 2011. [27]B. Hess, C. Kutzner, D. van der Spoel, and E. Lindahl. [10]S. Tuecke, V. Welch, and J. Novotny. An Online Credential GROMACS 4: Algorithms for Highly Efficient, Load- Repository for the Grid: MyProxy. In Proceedings of the Tenth Balanced, and Scalable Molecular Simulation. Journal of International Symposium on High Performance Distributed Chemical Theory and Computation, 4(3):435–447, 2008. Computing (HPDC-10), pages 104–111. 
IEEE Press, August [28]M. J. Frisch, G.W. Trucks, E. Frisch, et al. Gaussian 03, 2001. Revision E.01. Gaussian, Inc., Wallingford CT, 2004. Copyright c 2011 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
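In the gUSE design described in the paper, jobs of one workflow may target different DCIs, and the workflow engine invokes each job through the submitter registered for that DCI. The small dispatch table below illustrates this idea. It is a minimal Python sketch under stated assumptions and not gUSE's actual (Java) API. The `Job`, `Submitter`, and `WorkflowEngine` names, the DCI labels, and the fake job IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Job:
    """A hypothetical workflow job bound to one DCI type."""
    name: str
    dci: str  # illustrative DCI labels such as "unicore" or "glite"


class Submitter(Protocol):
    """Hypothetical submitter interface: hand a job to one DCI, return an ID."""
    def submit(self, job: Job) -> str: ...


class UnicoreSubmitter:
    """Stand-in for a UNICORE submitter plug-in; returns a fake job ID."""
    def submit(self, job: Job) -> str:
        return f"unicore:{job.name}"


class GliteSubmitter:
    """Stand-in for a submitter that targets a different DCI."""
    def submit(self, job: Job) -> str:
        return f"glite:{job.name}"


class WorkflowEngine:
    """Dispatches each job to the submitter registered for the job's DCI."""

    def __init__(self) -> None:
        self._submitters: Dict[str, Submitter] = {}

    def register(self, dci: str, submitter: Submitter) -> None:
        self._submitters[dci] = submitter

    def run(self, jobs: List[Job]) -> List[str]:
        ids = []
        for job in jobs:
            submitter = self._submitters.get(job.dci)
            if submitter is None:
                raise ValueError(f"no submitter registered for DCI {job.dci!r}")
            ids.append(submitter.submit(job))
        return ids
```

Portlets in this sketch talk only to `WorkflowEngine`. Replacing the security handling inside one submitter therefore leaves them untouched, which mirrors the decoupling the paper attributes to the gUSE submitters.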
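Section 3.3.1 describes a two-step chain. First, local logins are mapped to DNs using a file derived from XUUDB data. Second, the MRC authorizes by DN and trusts whatever DN a client holding the service certificate asserts. The sketch below models only that chain, under stated assumptions. The one-`login;DN`-pair-per-line file format, the `SERVICE_DN` value, and the owner-only access rule are hypothetical simplifications, and the real XtreemFS MRC policies are richer.

```python
from typing import Dict, Optional

# Hypothetical DN of the XtreemFS service certificate used by the FUSE client.
SERVICE_DN = "CN=xtreemfs-service,O=Example"


def load_login_map(text: str) -> Dict[str, str]:
    """Parse an assumed 'login;DN' file (derived from XUUDB data) into login -> DN."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        login, dn = line.split(";", 1)
        mapping[login.strip()] = dn.strip()
    return mapping


def effective_dn(client_dn: str, asserted_dn: Optional[str]) -> str:
    """Identity the MRC acts on.

    A client authenticated with the service certificate is a trusted system
    component, so the DN it asserts is accepted as-is. Any other client acts
    under the DN of its own certificate.
    """
    if client_dn == SERVICE_DN and asserted_dn is not None:
        return asserted_dn
    return client_dn


def may_access(file_owner_dn: str, client_dn: str, asserted_dn: Optional[str]) -> bool:
    """Assumed owner-only rule: grant access iff the effective DN owns the file."""
    return effective_dn(client_dn, asserted_dn) == file_owner_dn
```

In this model, a job running as local user `alice` is resolved to her DN. The service-certificate client then passes that DN to the MRC, so she reaches only files owned by her DN. An ordinary user certificate cannot claim someone else's DN.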
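The MD portlet (Section 4.1) reports erroneous input such as missing residues in an uploaded PDB file. One simple heuristic for such a check is to look for gaps in the residue sequence numbers of each chain's ATOM records. The sketch below illustrates that heuristic and is an assumption, not the portlet's actual implementation. It reads the fixed PDB columns for the chain identifier (column 22) and the residue number (columns 23-26). It ignores insertion codes and HETATM records, so a reported gap is only a possible missing segment.

```python
from typing import Dict, List, Tuple


def residue_gaps(pdb_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Per chain, list (previous, next) residue numbers that are not consecutive."""
    seen: Dict[str, List[int]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # HETATM, TER, REMARK, ... are ignored in this sketch
        chain = line[21]           # column 22: chain identifier
        resseq = int(line[22:26])  # columns 23-26: residue sequence number
        residues = seen.setdefault(chain, [])
        if not residues or residues[-1] != resseq:
            residues.append(resseq)  # record each residue once, in file order
    gaps: Dict[str, List[Tuple[int, int]]] = {}
    for chain, residues in seen.items():
        for prev, nxt in zip(residues, residues[1:]):
            if nxt - prev > 1:
                gaps.setdefault(chain, []).append((prev, nxt))
    return gaps
```

For a chain A containing residues 1, 2, and 5, `residue_gaps` returns `{"A": [(2, 5)]}`. The portal could turn this into a message such as "residues 3-4 of chain A appear to be missing".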
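In the MD portlet (Section 4.1), users specify the simulation length in picoseconds. Gromacs, however, expects a time step `dt` (in ps) and a number of steps `nsteps` in its .mdp parameter file, so that length = nsteps × dt. The worked sketch below shows the conversion. It assumes a default time step of 2 fs (0.002 ps), which is common for constrained all-atom runs but is our assumption here. With that step, a 1000 ps request needs 1000 / 0.002 = 500000 steps.

```python
def gromacs_nsteps(length_ps: float, dt_ps: float = 0.002) -> int:
    """Convert a requested simulation length (ps) into Gromacs' nsteps for time step dt (ps)."""
    if length_ps <= 0 or dt_ps <= 0:
        raise ValueError("length and time step must be positive")
    steps = length_ps / dt_ps
    nsteps = round(steps)
    # Reject lengths that are not (up to float error) a whole number of time steps.
    if abs(steps - nsteps) > 1e-6 * max(1.0, steps):
        raise ValueError(f"{length_ps} ps is not a multiple of dt = {dt_ps} ps")
    return nsteps


def mdp_fragment(length_ps: float, dt_ps: float = 0.002) -> str:
    """Render the two .mdp lines that fix the simulation length."""
    return f"dt     = {dt_ps}\nnsteps = {gromacs_nsteps(length_ps, dt_ps)}\n"
```

Rejecting lengths that are not a whole number of steps is one design choice: it avoids silently running a slightly different length than the user requested.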
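The monitoring views in Sections 4.1 and 4.2 follow two conventions. MD jobs are named after the portal login, the submission time, and the workflow recipe, and their status is shown as a traffic light. QC jobs report the states queued, running, successful, and failed. The sketch below combines both conventions. The underscore separator, the timestamp format, and the colour assignment (yellow for both queued and running) are assumptions, not details taken from the portlets.

```python
from datetime import datetime
from enum import Enum


class JobState(Enum):
    """Job states as listed for the QC portlet's monitoring facility."""
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESSFUL = "successful"
    FAILED = "failed"


# Assumed colour scheme for a per-job traffic light.
TRAFFIC_LIGHT = {
    JobState.QUEUED: "yellow",
    JobState.RUNNING: "yellow",
    JobState.SUCCESSFUL: "green",
    JobState.FAILED: "red",
}


def job_name(login: str, submitted: datetime, recipe: str) -> str:
    """Build a job name from the portal login, submission time, and recipe name."""
    return f"{login}_{submitted:%Y%m%d-%H%M%S}_{recipe}"
```

For example, `job_name("alice", datetime(2011, 6, 8, 14, 30, 5), "equilibration")` yields `alice_20110608-143005_equilibration`. Names in this form sort chronologically per user in a plain listing.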
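The QC portlet's direct-submission option (Section 4.2) accepts job descriptions in Gaussian input format. Such a file consists of the following parts, in order:

- optional Link 0 lines, such as `%NProcShared` (processors) and `%Mem` (memory);
- a route section starting with `#`, followed by a blank line;
- a title line, followed by a blank line;
- a charge and spin multiplicity line, then the molecule specification, terminated by a blank line.

The sketch below assembles such a file. The method, basis set, and water geometry in the usage are placeholders of ours. The maximum run time is deliberately kept out of the file, on the assumption that it is passed to the batch system rather than written as a Gaussian directive.

```python
from typing import List, Tuple

# Element symbol and Cartesian coordinates in Angstrom.
Atom = Tuple[str, float, float, float]


def gaussian_input(route: str, title: str, charge: int, multiplicity: int,
                   atoms: List[Atom], nproc: int, mem: str) -> str:
    """Assemble a Gaussian job file: Link 0 lines, route, title, charge/multiplicity, geometry."""
    if not route.startswith("#"):
        raise ValueError("the route section must start with '#'")
    lines = [
        f"%NProcShared={nproc}",  # processors to use
        f"%Mem={mem}",            # memory, e.g. "4GB"
        route, "",                # route section ends with a blank line
        title, "",                # title section ends with a blank line
        f"{charge} {multiplicity}",
    ]
    lines += [f"{el:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for el, x, y, z in atoms]
    lines.append("")  # a blank line terminates the molecule specification
    return "\n".join(lines) + "\n"
```

Usage: `gaussian_input("#P B3LYP/6-31G(d) Opt", "water optimization", 0, 1, water, nproc=8, mem="4GB")` produces a file whose first lines are `%NProcShared=8` and `%Mem=4GB`.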
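The QC workflow's post-processing (Section 4.2) writes selected values and their development to files that spreadsheet applications can open. As one illustration, the sketch below collects the SCF energies that Gaussian prints on lines beginning with `SCF Done:`, for example one per optimization step, and writes them as CSV. The regular expression and the column names are our assumptions, and the sketch covers only this single kind of value.

```python
import csv
import io
import re
from typing import List, Tuple

# Matches e.g. " SCF Done:  E(RB3LYP) =  -76.4089617485     A.U. after   10 cycles"
SCF_LINE = re.compile(r"SCF Done:\s+E\(([^)]+)\)\s+=\s+(-?\d+\.\d+)")


def scf_energies(log_text: str) -> List[Tuple[int, str, float]]:
    """Return (step, method label, energy in Hartree) for every 'SCF Done' line."""
    return [(i, m.group(1), float(m.group(2)))
            for i, m in enumerate(SCF_LINE.finditer(log_text), start=1)]


def to_csv(rows: List[Tuple[int, str, float]]) -> str:
    """Serialize the extracted energies as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["step", "method", "energy_hartree"])
    writer.writerows(rows)
    return buf.getvalue()
```

Plotting the `energy_hartree` column against `step` gives the convergence curve of an optimization, which matches the kind of development the paper says is written to spreadsheet-readable files.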