8th International Workshop on Science Gateways (IWSG 2016), 8-10 June 2016

Towards a Metadata-driven Multi-community Research Data Management Service

Richard Grunzke*, Wolfgang E. Nagel
Center for Information Services and High Performance Computing
Technische Universität Dresden
Dresden, Germany
richard.grunzke@tu-dresden.de

Volker Hartmann, Thomas Jejkal, Ajinkya Prabhune, Rainer Stotzka
Institute for Data Processing and Electronics
Karlsruhe Institute of Technology
Karlsruhe, Germany

Alexander Hoffmann, Sonja Herres-Pawlis
Institut für Anorganische Chemie
Rheinisch-Westfälische Technische Hochschule Aachen
Aachen, Germany

Aline Deicke, Torsten Schrade
Digitale Akademie
Akademie der Wissenschaften und Literatur Mainz
Mainz, Germany

Hendrik Herold, Gotthard Meinel
Monitoring of Settlement and Open Space Development
Institute of Ecological and Regional Development
Dresden, Germany

Abstract: Nowadays, the daily work of many research communities is characterized by an increasing amount and complexity of data. This makes it increasingly difficult to manage, access and utilize the data and, ultimately, to gain scientific insights from it. At the same time, domain scientists want to focus on their science instead of IT. The solution is research data management, which stores data in a structured way and enables easy discovery for future reference. An integral part is the use of metadata. With it, data becomes accessible by its content instead of only by its name and location. The use of metadata shall be as automatic and seamless as possible in order to foster high usability. Here we present the architecture and initial steps of the MASi project, which aims to build a comprehensive research data management service. First, it extends the existing KIT Data Manager framework by a generic programming interface and by a generic graphical web interface. Advanced additional features include the integration of provenance metadata and persistent identifiers. The MASi service aims at being adaptable to arbitrary communities with limited effort. The requirements of the initial use cases within geography, chemistry and digital humanities are elucidated. The MASi research data management service is currently being built up to satisfy these complex and varying requirements in an efficient way.

Keywords: Metadata, Communities, Research Data Management

I. INTRODUCTION

Today's research landscape is characterized by steadily increasing amounts of data, caused by improved data recording, increasingly complex simulations and the correlation of numerous, often heterogeneous data sources. This growth of the data basis promises more scientific insight. As the amount and complexity of data grow, so do the requirements on the data structure. A suitable and specific data description becomes paramount. Present data processing methods are often reaching their capacity limit. Novel management methods for newer and more complex data become essential. Especially important are improved data descriptions, sustainable storage, findability, pre-processing for further use and the exploitation of existing data.

An established method to describe complex data structures is the use of metadata, which encapsulates the substance of a data set in aggregated form. Metadata ("data about data") plays a central role in making data available for the long term. It is essential for the comprehension and storage of data, its preservation, curation and discovery for future re-use. Metadata makes complex and costly tasks, such as searching for data by its content, easier to carry out. Aside from better discovery, other data management aspects, such as managing and utilizing similarities between data sets, are fostered.

Different scientific communities use metadata standards that differ considerably, each incorporating community-specific data characteristics. This limits portability to new use cases, so that methods established in one scientific field are of limited use in other fields. Also, the number of standardized tools to extract metadata from heterogeneous data is limited.

In the MASi (Metadata Management for Applied Sciences) project [1] of the DFG (German Research Foundation) we are developing a generic data management service for scientific data. We are demonstrating its applicability along heterogeneous scientific use cases. The kind and extent of the data of the participating communities is largely domain specific. Likewise, the use of metadata is not uniform across them, so that a suitable overarching research data management service is a fundamental requirement.

II. BACKGROUND

The MASi research data management service is being built using the KIT Data Manager repository framework (see Section II-A). Utilizing and extending the KIT DM enables MASi to offer elaborate metadata functionality with a large degree of automation and flexibility. Such metadata management capabilities are a higher-level abstraction based on basic storage devices and data management systems (e.g. iRODS [2]) in the data life cycle hierarchy [3]. Other systems, and how they differ from the KIT DM, are described in Section II-B.

A. KIT Data Manager - A Repository Framework

The KIT Data Manager [4] is a generic, highly customizable open source software framework for building research data repository systems. Horizontally, it is organized into a number of well-defined high-level services providing functionalities for data and metadata management and sharing as well as administrative services for user and group management. Due to the focus on research data, KIT Data Manager also provides features beyond those of typical repository systems, namely a flexible data transfer service that can support virtually any data transfer protocol and a data workflow service that allows the automatic execution of data processing tasks to be triggered locally or remotely. These tasks are configured in the repository system and include data transfer to the processing environment, data ingest of the processing results and provenance tracking. High-level services can be accessed either via Java APIs, e.g. to implement Web-based user interfaces or to extend the basic framework by additional functionalities, or via RESTful service interfaces, e.g. to access KIT Data Manager based repository systems remotely using a programming language of choice.

Vertically, KIT Data Manager is organized into different layers, where the upper layer is formed by the high-level services described before. For repository systems based on KIT Data Manager this upper layer provides reliable and well-defined extension points on the one hand and a high degree of abstraction from underlying technologies on the other hand. The lowest layer of the architecture interfaces with these technologies by defining a basic set of functionalities that a corresponding technology has to provide. For example, the interface that has to be implemented for integrating a data storage technology requires storing and restoring a predefined hierarchical data structure. This offers a high degree of sustainability, as changing technologies only affects the lower layer whereas upper layers are unaffected by technology changes.

Currently, KIT Data Manager is used to implement repository systems for various scientific disciplines, namely biology, arts and humanities, and nano-science. Due to its extensibility, the base framework can be tailored to fulfil the specific needs of each of these disciplines with reasonable effort.

B. Other Systems and Delimitation

The ICAT system [5] aims at supporting data management for photon science facilities [6]. This includes supporting beamline proposals, access rights, experiments, studies and instruments that produce the actual data. This data is collected as datasets which can then be published. Attaching metadata such as experiment parameters, instrument parameters and sample descriptions is supported. This closely follows the physics requirements but at the same time makes it hard to adapt for other use cases. Technically, ICAT relies on a Java EE application server, with Glassfish being the standard, and supports the Oracle and MySQL databases. It offers a web service interface to support, for example, the ICAT download manager TopCat. Authentication and authorization mechanisms are supported via LDAP, a local database or anonymous access, with a plugin interface being available for extensions.

DSpace [7] is a mature and ready-to-use solution for institutional repositories and it is free and open source software. It is adaptable to fit the needs of individual institutions and fosters open access to all kinds of content. It supports submission workflows and various ingest and export methods. Various file types, persistent IDs and PostgreSQL and Oracle databases are supported. Search capabilities via metadata (descriptive, administrative, structural) are provided that foster the long-term preservation and accessibility of data.

Fedora (Flexible Extensible Digital Object Repository Architecture) [8] provides a framework with individual basic components to build repositories. It is open source and aims to be robust and modular. The main use case is to provide specialized services that may be integrated with existing environments and technologies. A main goal is to foster digital content preservation for complex and large datasets. Metadata for data organization is supported, as are descriptions of relationships between datasets and the linking of datasets.

EUDAT [9] is a European project aiming to create a generically applicable infrastructure to manage, access, and preserve research data. The EUDAT services B2SHARE and B2FIND involve metadata. B2SHARE is for storing and sharing research data via a web portal. It is also the central means to upload data. Uploads have to be done via the web portal and metadata has to be entered manually, with community-specific profiles being definable. B2FIND enables users to access data sets via their metadata and to annotate them with comments. A B2NOTE service is planned which aims at enabling automatic annotation of metadata [10].

iRODS, as a distributed data management system, is not focused on metadata management, although it offers some basic metadata functionality. Metadata can be attached to data as attribute-value-unit triples on a per-file basis, which can be used for searching. Integrated capabilities for metadata extraction, annotation and provenance support are missing.

In contrast to these systems, the KIT Data Manager is more flexible. It can be specifically adapted in depth to arbitrary target communities with a close integration into community workflows. It enables far-reaching automation for high usage efficiency, with ready-made capabilities that can be adapted to specific communities.

III. MASI RESEARCH DATA MANAGEMENT SERVICE

A. Overarching Goals

The aim of MASi is to build a generic and sustainable repository. It will be sustainably operated for the involved communities to fully handle their data management requirements by utilizing metadata. One part of the project is the development of a generic model as a concrete best practice implementation guide. It will make it easy to satisfy specific community data management requirements by using metadata. Along this guide, further communities will be supported in building up their own MASi instances. The service is based on the KIT Data Manager repository framework (see Section II-A) that we are currently extending. On the one hand, this includes creating a generic API to support further metadata models. On the other hand, we are implementing and will provide generic graphical interfaces to fundamentally lower the effort to adapt MASi to new use cases. Furthermore, we are closely collaborating within the Research Data Alliance (RDA) with other data researchers to develop recommendations and aim at implementing these within MASi. An example of such an RDA recommendation is the support for PIDs in conjunction with PID information types and a data type registry.

Figure 1. The MASi architecture for generic research data management.

B. Generic Metadata Interface

The metadata groups within RDA compiled a set of principles regarding metadata. The well-known definition of metadata as "data about data" is the basis. Metadata differs in the mode of use and should be easily machine-understandable. It may cover the whole lifecycle of the data, from the idea for a project through the acquisition to the publication which references the data. MASi will be a single point of access for all such kinds of metadata. The metadata is linked to the data via a unique identifier. The identifier may be a custom one as long as the data is only managed internally. As soon as the metadata is available to the public, a persistent identifier (PID) such as a DOI (Digital Object Identifier) is used to make the data referenceable. Each PID contains at least two attributes, holding a URL to the metadata and a URL to the data. The metadata is available as a METS (Metadata Encoding and Transmission Standard) [11] document, which is structured in XML. In MASi it is used as the standard format for all interfaces. In the final stage there will be a registered MASi profile of METS which is valid in all configurations. METS defines seven sections for different purposes. The metadata handled by MASi itself is split into several packages (see Figure 1). Some of the packages are very similar to the sections used in METS. Others are allocated in a way that is most suitable for MASi. Each package is responsible for a special purpose (administrative, structural, content, bit preservation, provenance and annotation metadata). Because not all packages are needed by every community, the structure of the METS document may differ slightly. In the future, new packages can also be added without conflict.

To store these different kinds of formats in an efficient way, MASi offers a generic storage API to various kinds of underlying data management systems. To keep the maintenance effort manageable, we will focus on widely used standards. MASi offers a REST interface (see Figure 1) which allows for high extensibility. This interface supports the CRUD (create, read, update, delete) operations for each package or for whole metadata sets. In the case of published open access data, anyone can perform read operations on metadata without authentication. For all other operations the user has to be authenticated and authorized.

The following serves as an example of the provenance package functionality: There are many workflow engines, each implementing its own format. But there are two standards which are supported by the majority, the Open Provenance Model (OPM) [12] and ProvONE [13]. It is possible to transform OPM metadata to the ProvONE format without loss. ProvONE describes the provenance as a graph represented as XML. To allow for sophisticated queries, the graph is stored in a graph database. It is therefore possible to query, e.g., for similar workflows, and even more complex queries are possible. METS has a pre-defined section for provenance metadata. It uses digiprovMD, which can be losslessly transformed to ProvONE and vice versa.

For each package there can be a specialized database storing the metadata. MASi will collect all metadata and compile it into a METS document using the MASi profile or, in case of a data ingest, split the metadata into its packages to store them accordingly. On the client side there will be MASi tools supporting communities in compiling a valid METS document matching the MASi profile. Subsequently, on the server side a basic quality control is triggered during metadata ingest. It is based on the respective XML schema, which is available for all packages except for content metadata, as each community has its own specific content metadata. Such a schema needs to be created for each community. Registered schemata can be used for the quality control. If this control is required to be more sophisticated, a Java plugin can be implemented and easily activated in order to support any kind of quality control capability.

Figure 2. Workflow and metadata for historical maps.

C. Generic Graphical Interface

The motivation to develop and provide a generic web interface for MASi is to significantly lower the time and effort required to adapt the MASi service to further use cases. Developers are then freed from the need to re-develop a user interface for each new use case. This saves time for developers familiar with the technology, and it is fundamentally enabling for developers unfamiliar with it.

The generic web interface is partly built on the basis of the Liferay portal framework [14], which provides ready-made capabilities such as plugins, menus, groups, roles, separable areas and user authentication management with systems such as LDAP and Shibboleth. This enables the integration of federations such as eduGAIN [15] for the easy re-use of existing institute logins. Liferay is open source, mature and widely used. For this to work seamlessly within MASi, we are currently developing a Liferay plugin to integrate Liferay with the KIT Data Manager repository framework. The plugin will ensure consistency between the user management systems of Liferay and KIT DM by automatically syncing new Liferay users to KIT DM. Adding users to KIT DM in any other way will be disabled in this operation mode to ensure that all users exist in both systems. This integration will enable the KIT DM to transparently support authentication systems that are already supported by Liferay, such as LDAP and Shibboleth. Also, the KIT DM admin interface will be integrated with Liferay. We will provide a detailed installation and configuration guide in order to further lower the barrier of adoption.

The second main part is the current development of a generic Liferay graphical interface portlet with common functionality. Initially, it will include basic upload, search and download capabilities. To fundamentally increase the impact of this development, we will create extensive documentation to enable developers to easily adapt the portlet to their specific use case requirements. The documentation will cover everything from code checkout, development project configuration and adaptation examples to compilation. A main goal of the documentation is to shorten the training period as much as possible. All binaries, source code and documentation will be open source and will become part of the KIT Data Manager framework. In the course of MASi, the generic GUI portlet will be continuously extended with increasingly advanced generic capabilities. Consequently, the documentation will be extended appropriately in order to enable quick community adaptations.

IV. INITIAL USE CASES

A. Historical Maps

Historical topographic and cadastral maps are a valuable and often the only source for reconstructing land use changes over long periods of time. To access this information for large-scale spatial analyses and change detection, advanced image analysis and pattern recognition algorithms have to be applied to the scanned map documents. The retrieved information can hence be used to "historize" existing land use and land cover databases [16].
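The server-side quality control that Section III-B describes for metadata ingest applies to every use case and can be illustrated with a short sketch. The following Python example is an illustration only, not MASi or KIT Data Manager code: MASi validates against registered XML schemata and optional Java plugins, whereas this sketch merely checks well-formedness and the presence of a few METS sections. The function name `basic_quality_control`, the constant `REQUIRED_SECTIONS` and the chosen set of sections are assumptions, since the MASi METS profile is not specified here; only the METS namespace and the section element names are taken from the METS standard.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace of the METS standard.
METS_NS = "http://www.loc.gov/METS/"

# Assumption: the MASi METS profile is not given in this paper, so this set
# of required sections is purely illustrative.
REQUIRED_SECTIONS = ("metsHdr", "amdSec", "structMap")


def basic_quality_control(mets_xml: str) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    try:
        root = ET.fromstring(mets_xml)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != f"{{{METS_NS}}}mets":
        problems.append("root element is not mets:mets")
    # Look for each required section among the direct children of the root.
    for section in REQUIRED_SECTIONS:
        if root.find(f"{{{METS_NS}}}{section}") is None:
            problems.append(f"missing section: {section}")
    return problems


# Hypothetical document that lacks the administrative metadata section.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr/>
  <mets:structMap/>
</mets:mets>"""

print(basic_quality_control(SAMPLE))  # ['missing section: amdSec']
```

Returning a list of problems instead of raising on the first one lets an ingest service report all findings to the submitting community at once. A schema-based check, as foreseen in MASi, would replace the loop over `REQUIRED_SECTIONS`.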
The automatic information acquisition from historical maps generates and necessitates a variety of metadata. The process comprises three major components: firstly, the scanning of the paper maps (which are only partially available as digital images); secondly, the georeferencing of the scanned maps (which is only provided for a minority of the digitally available maps); and thirdly, the information extraction from the georeferenced digital map images. Each of the three components generates at least four obligatory metadata entries. Figure 2 shows the workflow of the information acquisition process as well as the essential metadata that are generated during the process. The given metadata are essential both for the change detection process and for the correct interpretation of the retrieved information by third-party users.

B. Spectroscopy in Chemistry

In bioinorganic chemistry, a multitude of spectroscopic information can be obtained by experimental methods such as UV/Vis, IR, Raman, EPR and XAS spectroscopy. In most cases, these data are complemented by theoretical simulations which help to interpret the experimental data and obtain scientific insights. In a concrete case, we investigate metal complexes and their redox behavior with oxidants (electron-taking reagents) and reductants (electron-delivering reagents) by UV/Vis spectroscopic measurements. Here, for instance, a copper(I) complex is treated with a cobalt(III) complex, yielding copper(II) and cobalt(II) complexes under exchange of an electron. The copper(I) spectroscopic features decay and those of copper(II) form. The speed of this development is monitored every 1.5 ms for some seconds, producing a large amount of raw data. This raw data is reduced by the researcher, e.g. by choosing a suitable wavelength, and absorption time traces are generated, at the moment manually. From these time traces, the kinetic decay constants are determined. This analysis is performed for several ratios between oxidant and reductant to resolve the second-order kinetics of the electron transfer. This final data shall be stored together with the theoretical analyses of the electron transfer by density functional theory. Metadata annotation is important in all steps but is still an open issue: the original raw data need annotation of who measured which chemical system and with which ratio, temperature, setup etc. This information is traditionally documented manually in laboratory notebooks which are stored in the working group. The reduced data is then stored electronically. Here, the metadata can comprise all metadata of the raw data, but additional information on the data reduction steps must be added. The theoretical data imply different metadata: the version of the code, functional, basis set, dispersion and solvent modelling as well as grid size should be noted in the corresponding workflow [17]. Figure 3 summarizes these different levels of data production for this example.

Figure 3. Metadata for the electron transfer (subgroup of the spectroscopic use case).

C. Church Windows

The "Corpus Vitrearum Deutschland" [18] is part of the international "Corpus Vitrearum Medii Aevi" (CVMA). The main focus of this long-term research project, funded by the Academy of Sciences and Literature Mainz and the Berlin-Brandenburg Academy of Sciences and Humanities, lies in the analysis of medieval stained glass preserved in church windows, museums, galleries and other places all over Europe, the US and Canada (see Figure 4 for an example).

Due to its fragile nature, medieval stained glass is greatly affected by environmental impacts. In a first step during the research, all windowpanes are photographed and then documented in schematic drawings. With this documentation as a basis, the history of each window's glazing and any changes or restoration activities that might have been carried out throughout the centuries are studied. Finally, the iconography and the religious context of each window within its ecclesiastical space are interpreted.

The CVMA curates a digital image archive of the photographs taken. For each image, an extensive set of metadata is provided. The records are modeled according to the guidelines of the internationally acknowledged XMP metadata standard. All XMP information is directly embedded in the TIFF files. Due to this approach, an accidental separation between the file and its metadata becomes highly unlikely. In addition to XMP, the ICONCLASS vocabulary is used to describe and classify the contents of each image.

The MASi service will open up the CVMA image archive to further interested parties, e.g. providers of cultural heritage photography such as the Prometheus, Foto-Marburg or Europeana online platforms. During the implementation of the service, an OAI-PMH interface will be created. Also, a proof-of-concept for the automatic matching of metadata records with other cultural heritage data repositories will be drafted. Finally, a configurable web interface will be built that will allow the generic embedding of these automatically enriched metadata records into the CVMA image files.

Figure 4. Mauritius and his companions refuse the idolothyte. Act of the saints window in the Marktkirche Hannover.

V. CONCLUSION AND OUTLOOK

The MASi research data management service provides a solution for the highly relevant challenge of managing large amounts of complex data. It builds on substantial previous work that is further extended and broadened. The MASi service, which is currently being built up, is designed to integrate seamlessly with highly diverse use cases. In this capacity it plays an essential role in fulfilling the complex requirements, while further use cases are currently being planned.

As future work, we are evaluating the integration of MASi both with the UNICORE HPC middleware [19] and with science gateways such as MoSGrid [20]. We are also continuing to work within the RDA and to contribute our own expertise to discussions that create RDA recommendations on how best to handle various aspects of research data management. We aim at implementing the resulting joint recommendations within MASi. This will contribute to the creation of MASi as a service that is efficient, future-proof and has high user acceptance.

ACKNOWLEDGMENT

The authors would like to thank the DFG (German Research Foundation) for the opportunity to do research in the MASi project (NA711/9-1). Furthermore, financial support by the BMBF (German Federal Ministry of Education and Research) for the competence center for Big Data ScaDS Dresden/Leipzig is gratefully acknowledged. The research leading up to these results has been supported by the LSDMA project of the Helmholtz Association of German Research Centres.

REFERENCES

[1] MASi, "Metadata Management for Applied Sciences," 2016. [Online]. Available: http://www.scientific-metadata.de/
[2] A. Rajasekar, R. Moore, C.-y. Hou, C. A. Lee, R. Marciano, A. de Torcy, M. Wan, W. Schroeder, S.-Y. Chen, L. Gilbert et al., "iRODS primer: Integrated Rule-Oriented Data System," Synthesis Lectures on Information Concepts, Retrieval, and Services, vol. 2, no. 1, pp. 1–143, 2010.
[3] R. Grunzke, J. Krüger, S. Gesing, S. Herres-Pawlis, A. Hoffmann, A. Aguilera, and W. E. Nagel, "Managing complexity in distributed data life cycles enhancing scientific discovery," in IEEE 11th International Conference on e-Science, August 2015, pp. 371–380.
[4] T. Jejkal, A. Vondrous, A. Kopmann, R. Stotzka, and V. Hartmann, "KIT Data Manager: The Repository Architecture Enabling Cross-Disciplinary Research," in Large-Scale Data Management and Analysis - Big Data in Science - 1st Edition, 2014. [Online]. Available: http://digbib.ubka.uni-karlsruhe.de/volltexte/1000043270
[5] D. Flannery, B. Matthews, T. Griffin, J. Bicarregui, M. Gleaves, L. Lerusse, R. Downing, A. Ashton, S. Sufi, G. Drinkwater et al., "ICAT: Integrating Data Infrastructure for Facilities Based Science," in e-Science, 2009. e-Science'09. Fifth IEEE International Conference on. IEEE, 2009, pp. 201–207.
[6] R. Grunzke, J. Hesser, J. Starek, N. Kepper, S. Gesing, M. Hardt, V. Hartmann, S. Kindermann, J. Potthoff, M. Hausmann, R. Müller-Pfefferkorn, and R. Jäkel, "Device-driven Metadata Management Solutions for Scientific Big Data Use Cases," in 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2014), February 2014.
[7] M. Smith, M. Barton, M. Bass, M. Branschofsky, G. McClellan, D. Stuve, R. Tansley, and J. H. Walker, "DSpace: An Open Source Dynamic Digital Repository," 2003. [Online]. Available: hdl.handle.net/1721.1/29465
[8] FedoraCommons, "Fedora Commons Repository Software," 2016. [Online]. Available: http://fedora-commons.org/
[9] D. Lecarpentier, P. Wittenburg, W. Elbers, A. Michelini, R. Kanso, P. Coveney, and R. Baxter, "EUDAT: A New Cross-Disciplinary Data Infrastructure for Science," International Journal of Digital Curation, vol. 8, no. 1, pp. 279–287, 2013.
[10] EUDAT, "Eudat semantics working group," 2015. [Online]. Available: http://eudat.eu/semantics
[11] J. P. McDonough, "METS: Standardized Encoding for Digital Library Objects," International Journal on Digital Libraries, vol. 6, no. 2, pp. 148–158, 2006.
[12] L. Moreau, J. Freire, J. Futrelle, R. E. McGrath, J. Myers, and P. Paulson, "The Open Provenance Model: An Overview," in Provenance and Annotation of Data and Processes. Springer, 2008, pp. 323–326.
[13] V. Cuevas-Vicenttín, B. Ludäscher, P. Missier, K. Belhajjame, F. Chirigati, Y. Wei, S. Dey, P. Kianmajd, D. Koop, S. Bowers, and I. Altintas, "ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance," in DataONE Provenance Working Group, 2014.
[14] Liferay, "Enterprise Open Source Portal and Collaboration Software," 2016. [Online]. Available: http://www.liferay.com/
[15] Geant, "eduGAIN - Interconnecting Federations to Link Services and Users Worldwide," 2015. [Online]. Available: http://www.geant.net/service/eduGAIN/Pages/home.aspx
[16] H. Herold, G. Meinel, R. Hecht, and E. Csaplovics, "A GEOBIA Approach to Map Interpretation - Multitemporal Building Footprint Retrieval for High Resolution Monitoring of Spatial Urban Dynamics," in International Conference on Geographic Object-Based Image Analysis, 2012, pp. 252–256.
[17] S. Herres-Pawlis, A. Hoffmann, T. Rosener, J. Krüger, R. Grunzke, and S. Gesing, "Multi-layer Meta-metaworkflows for the Evaluation of Solvent and Dispersion Effects in Transition Metal Systems Using the MoSGrid Science Gateways," in Science Gateways (IWSG), 2015 7th International Workshop on, June 2015, pp. 47–52.
[18] "Corpus Vitrearum Deutschland," http://www.corpusvitrearum.de/, 2016.
[19] K. Benedyczak, B. Schuller, M. Petrova, J. Rybicki, and R. Grunzke, "UNICORE 7 - Middleware Services for Distributed and Federated Computing," in International Conference on High Performance Computing & Simulation (HPCS), 2016, accepted.
[20] J. Krüger, R. Grunzke, S. Gesing, S. Breuers, A. Brinkmann, L. de la Garza, O. Kohlbacher, M. Kruse, W. E. Nagel, L. Packschies, R. Müller-Pfefferkorn, P. Schäfer, C. Schärfe, T. Steinke, T. Schlemmer, K. D. Warzecha, A. Zink, and S. Herres-Pawlis, "The MoSGrid Science Gateway - A Complete Solution for Molecular Simulations," Journal of Chemical Theory and Computation, vol. 10(6), pp. 2232–2245, 2014.