=Paper=
{{Paper
|id=None
|storemode=property
|title=From the Desktop to the Grid: conversion of KNIME Workflows to gUSE
|pdfUrl=https://ceur-ws.org/Vol-993/paper9.pdf
|volume=Vol-993
|dblpUrl=https://dblp.org/rec/conf/iwsg/Garza0SRARK13
}}
==From the Desktop to the Grid: conversion of KNIME Workflows to gUSE==
Luis de la Garza, Jens Krüger, Charlotta Schärfe, Marc Röttig, Oliver Kohlbacher (Applied Bioinformatics Group, University of Tübingen, Germany; delagarza@informatik.uni-tuebingen.de, oliver.kohlbacher@uni-tuebingen.de)

Stephan Aiche (Department of Mathematics and Computer Science, Freie Universität Berlin, Germany; International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany)

Knut Reinert (Algorithms in Bioinformatics, Freie Universität Berlin, Germany)

Abstract—The Konstanz Information Miner is a user-friendly graphical workflow designer with a broad user base in industry and academia. Its broad range of embedded tools and its powerful data mining and visualization tools render it ideal for scientific workflows, and it is thus used more and more in a broad range of applications. However, the free version typically runs on a desktop computer, restricting users who want to tap into greater computing power. The grid and cloud User Support Environment is a free and open-source project created for parallelized and distributed systems, but the creation of workflows with the included components has a steeper learning curve.

In this work we suggest an easy-to-implement solution combining the ease of use of the Konstanz Information Miner with the computational power of distributed computing infrastructures. We present a solution permitting the conversion of workflows between the two platforms. This enables convenient development, debugging, and maintenance of scientific workflows on the desktop; these workflows can then be deployed on a cloud or grid, thus permitting large-scale computation. To achieve our goals, we relied on the Common Tool Description, an XML file format which describes the execution of arbitrary programs in a structured, easily readable and parseable way. In order to integrate external programs into KNIME, we employed the Generic KNIME Nodes extension.

I. Introduction

Workflow technology with platforms such as Pipeline Pilot [1], KNIME [2], Taverna [3], [4], [5] and Galaxy [6], [7], [8] has now become a crucial part in supporting scientists in their daily work. By helping to create and automate virtual processes such as molecular docking or molecular dynamics simulations, as well as simplifying data analysis and data mining, workflows allow scientists to focus on their primary goals [9]. Furthermore, the quality of simulation results is improved, as following established protocols increases reproducibility in the sense of good lab practice.

The most obvious and direct advantage of applying workflows in the scientific environment is the capability of saving the general sequence of events in order to conveniently optimize the settings for a simulation, such as a sweep through single parameter settings. Scientists also benefit from other, less obvious advantages of using workflows, including but not limited to: the ability to analyze results, including statistical analysis and data visualization; data mining on experimentally (wet or dry lab) obtained datasets; and report creation using previously obtained data without requiring further user input.

Those tasks can also be fulfilled using simple scripts or separate program suites for the individual steps. Workflow technology, however, allows combining all steps by providing interfaces to external tools while not requiring any knowledge of programming or scripting languages. Additionally, the workflows established within one project may easily be applied to other projects as well, which facilitates consistency in analysis and reporting throughout several projects, thus reducing the risk of human error and allowing previous results to be reproduced. Furthermore, through the ability to share workflows with collaborators or the scientific community, a team-based analysis of experimental results can take place.

Nowadays a plethora of different workflow systems exists, initially targeted at different use cases such as desktop-based data mining or the automation of computations on a grid. With the exponential increase of computational power [10] available to scientists, as well as the improvements in network technology, the boundaries between local applications and processes executed on distributed systems have become blurred. As a result, there does not yet exist a one-fits-all solution able to satisfy the scientific user's needs for a combination of local and distributed workflow execution. In addition, most users of workflow technology in the scientific environment have built up a library of their own workflows with their workflow suite of choice over the past years. These may now be outdated or not suited for the computational resources required for today's tasks, thus requiring a switch to another workflow environment and the re-implementation of the existing workflows in the workflow language used by the new environment. For example, the Konstanz Information Miner (KNIME) [2] was mainly created for applications on a local machine and its free version does not provide access to compute clusters out of the box, but KNIME has, due to its ease of use and extensibility, found wide acceptance in the scientific community, resulting in a huge library of existing KNIME workflows for various tasks. The grid and cloud User Support Environment (gUSE) [11], on the other hand, was specifically created to use distributed computing infrastructures (DCIs), but the creation of workflows requires more user input and is therefore not as straightforward as in local systems such as KNIME. A KNIME user may now want to use not only the KNIME desktop version for data analysis and pilot runs for evaluating simulation parameters and post-simulation analysis, but also the open-source gUSE environment for moving the actual simulations to a cluster. The workflows for the simulation pilot run and the actual simulation are identical, since the first is used to find the best settings and the latter then applies those settings. When using two different software suites such as KNIME and gUSE for the pilot run and the actual full-scale simulation, it is currently required to implement the workflow twice (once for each software). The same applies when switching workflow software. This re-implementation of existing workflows is a tedious task that would not be needed if it were possible to convert workflows written in one workflow language in a way that they could then be read by another workflow environment, thus enabling workflow interoperability.

II. Related Work

The question whether a certain computational task is executable on different platforms is as old as computers themselves. Regarding modern workflow languages, a couple of specific challenges come into focus, discussed in detail in the following chapters. Since there is a multitude of workflow languages, the focus shifts for different use cases and user communities. The most prominent approach to the general problem of workflow interoperability is SHIWA and its follow-up project ER-FLOW [12], [13]. A double strategy was followed, namely coarse- and fine-grained interoperability. The first considers a workflow language as a black box, enabling the execution of sub-workflows within WS-PGRADE, which acts as a host system. The second approach puts emphasis on the actual transformation of selected workflow languages such as ASKALON, Pegasus, P-GRADE, MOTEUR and Triana into each other [14]. ER-FLOW continues these ideas, adding the detailed evaluation of user community needs and the specific handling of scientific applications on remote DCIs called by the workflows.

III. Workflows

The concept of recipes or protocols is familiar to scientists from all academic fields. The expression "workflow" follows this concept of a collection of consecutive computational steps. This may involve preparation steps for importing data, converting it and carrying out whatever further preparations are required. After these steps the actual simulation or computational step is usually carried out. There is a multitude of possible application domains, like quantum calculations, molecular dynamics, docking or data mining, to name only a few. The last section of a typical workflow deals with data analysis and visualization, often summarized in the form of a report.

An important aspect for workflow interoperability is the representation as a graph. The individual tasks represent the nodes; the edges correspond to the data flow or execution dependencies between these nodes. Hence, when a workflow is to be converted from, e.g., KNIME to gUSE, care has to be taken that the graph representation is similar. Is the workflow represented as a strict directed graph or does it correspond to a multigraph? Are parameter sweeps executable via loops or through the enumeration of predefined lists? Does the workflow have multiple start or end points, corresponding to a quiver? This small selection of questions illustrates the logical constraints faced when converting workflows from one language into another. Furthermore, the data handling and its flow along the graph is of relevance. Is the data directly incorporated into the nodes, e.g. as tables, or does it reside elsewhere, independently of the execution status of the specific node? Are there specific formats or conventions depending on the workflow language? How is the data annotated? Great care has to be taken when facing the conversion of data between different workflow languages. In the following chapters, specific details of KNIME and gUSE are described.
IV. KNIME

KNIME is one of the most commonly used workflow management systems in the field of e-Science, especially in pharmaceutical research, but also in financial data analysis and business intelligence [15]. The KNIME pipelining platform is an open-source program implemented as a plug-in for Eclipse [2], written in Java, and offered to the scientific community as a desktop version free of charge. Although there are extensions allowing the execution of single nodes in the cloud or on a grid [16], these are restricted to a professional release and are thus not part of the free workflow management system. Furthermore, KNIME is highly popular due to its ease of use and extensibility.

Fig. 1. An illustration of the KNIME workflow concept. Nodes represent single processing units and connecting edges between these nodes transport data or models from one processing unit to the next. In the end a final data table is created that can be saved to a file. Figure modified from [19].

The KNIME platform implements a modular approach to workflow management and execution in which single nodes represent single processing units such as data manipulation, as depicted in Figure 1. These nodes are connected via edges that pipe either data or computational models from one node into the next. Data is internally stored in special Java classes called DataTable, which store the data and additional meta information about the different data columns [17], [18]. The nodes and edges together form a directed acyclic graph, which is called a "workflow" and converts initial input files into output data tables that can be further exported as new files [18].

The implementation as an Eclipse plug-in with its free API [20] facilitates easy extensibility of the workflow system and simple integration of novel nodes, resulting in a vast library of nodes created by the scientific community and also by commercial software providers.

V. gUSE

The grid and cloud User Support Environment (gUSE) [11] is a highly popular technology for scientific portals enabling access to distributed computing infrastructures (DCIs). It has been developed at the Laboratory of Parallel and Distributed Systems in Budapest over the past years. gUSE represents the middle tier of a multi-layer portal solution. Different tasks are handled by a set of high-level web services (see Figure 2). The Application Repository holds the executables for all programs that may be linked to a node within a workflow. The File Storage deals with data handling, while the Information System takes care of, e.g., user information and job status. The Workflow Interpreter is responsible for the workflows and their execution, which are stored in the Workflow Storage. The Submitter represents the connection between gUSE services and middlewares, enabling access to the computational resources of a grid or cloud. On the top layer resides WS-PGRADE, the graphical user interface. All functionality of the underlying services is exposed to the end user by portlets residing in a Liferay portlet container that is part of WS-PGRADE.

Fig. 2. The layered structure of gUSE/WS-PGRADE is shown. Figure modified from [21].

gUSE workflows may be created and maintained via standard web browsers accessing the corresponding portlets and underlying services. Initially, the workflow graph has to be created through a Java applet. The nodes have to be defined, and each node may have multiple input and output ports. These work as anchor points for the edges connecting them. The selection of applications is done through the Concrete portlet, which also enables the selection of different DCIs with different middlewares within the same workflow. Application-specific parameters can be set, as well as resource requirements such as memory or runtime settings. Besides submission and monitoring features, a multitude of import and export features are available to the user.

The whole set of services offers convenient access to the vast computational resources of modern grids and clouds. gUSE is available free of charge for academic purposes.

VI. Generic KNIME Nodes

As previously discussed, KNIME offers a wide array of prebuilt nodes for the execution of a multitude of different tasks. It is also possible to obtain external nodes provided by third-party developers, such as the ones developed by Schrödinger [22], ChemAxon [23], etc. Furthermore, it is possible to develop KNIME nodes, a simple task of implementing a few KNIME-specific classes in the Java programming language. However, we still felt that, although KNIME is powerful for most computations and enables users to easily extend its capabilities, it is sometimes necessary to integrate external binaries into KNIME in the form of a node in a simpler way.

We used a KNIME extension called Generic KNIME Nodes (GKN) [24], [25], which allows the integration of arbitrary programs into KNIME. This integration is fully compatible with KNIME and other KNIME nodes, and each integrated program behaves as a KNIME node. Since KNIME relies on the use of data tables rather than files, GKN also includes utility nodes such as File to Table, Table to File, Input File and Output File to ease the interaction of a GKN-generated node with other nodes.

In order for GKN to properly execute external binaries, we also relied on an XML-based file format that describes tools, i.e., nodes of the workflow graph, called the Common Tool Description (CTD) [26]. CTD files are XML documents that contain information about the parameters, flags, inputs and outputs of a given binary. This information is presented in a structured and human-readable way, thus facilitating manual generation for arbitrary binaries. Since CTDs are also well-formed XML documents, parsing them is a trivial matter. CTDs can be generated either manually or by CTD-capable programs. Software tool suites such as SeqAn [27], OpenMS [28] and CADDSuite [29] can not only generate CTDs for each of their tools, but can also parse input CTDs and execute their tools accordingly.
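Because a CTD is plain, well-formed XML, reading one takes only a few lines with a standard XML parser. The following sketch uses a schematic CTD-like document; the element and attribute names are illustrative and the tool described is hypothetical, so consult the CTD documentation [26] for the exact schema:

```python
import xml.etree.ElementTree as ET

# Schematic CTD-like tool description for a made-up docking binary.
ctd_xml = """<tool name="LigandDocker" version="1.0">
  <description>Docks ligands against a receptor.</description>
  <PARAMETERS>
    <ITEM name="receptor" type="input-file" value="receptor.pdb"/>
    <ITEM name="ligands"  type="input-file" value="ligands.sdf"/>
    <ITEM name="scores"   type="output-file" value="scores.csv"/>
    <ITEM name="top_n"    type="int" value="10"/>
  </PARAMETERS>
</tool>"""

def read_tool(xml_text):
    """Parse a CTD-style document into (tool name, {param: (type, value)})."""
    root = ET.fromstring(xml_text)
    params = {item.get("name"): (item.get("type"), item.get("value"))
              for item in root.iter("ITEM")}
    return root.get("name"), params
```

Here `read_tool(ctd_xml)` returns the tool name together with a parameter map in which the file-typed entries tell a wrapper which arguments are inputs and outputs, which is the information GKN needs to turn a binary into a node with ports.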
VII. Conversion from KNIME to gUSE

A. Overview

The motivation for this conversion lies in the fact that most scientific computations can be memory- and processor-intensive. The requirements to run such computations in an acceptable time frame can hardly be met by a simple desktop or laptop computer. Grids and clouds are packed with resources ready to be tapped but, as discussed earlier, creating workflows on such systems can be a tedious task that even the most enthusiastic scientists might not be ready to go through. Based on KNIME's ease of use and its wide acceptance in the scientific community, we felt that there was a gap to be filled by bridging a great workflow editor such as KNIME with a great grid and cloud manager such as gUSE.

Our vision is to have users create and execute workflows in KNIME on their desktop computers using a reduced or test dataset; once an acceptably stable version of a workflow is ready, it can be exported into a gUSE-managed grid or cloud. Following this, the user would only have to configure the exported workflow to include a larger or production-ready dataset on which to perform the computation.

One of the great features of the Eclipse Platform is its extensibility through the development of so-called plug-ins [30]. Given that KNIME has been built on top of the Eclipse Platform, it is fairly simple to develop KNIME extensions, which in turn are Eclipse Platform plug-ins. KNIME also exposes an API that gives full access to all of the elements involved in a workflow [20], both visually and logically. We have developed a simple conversion KNIME extension that can export a KNIME workflow to the gUSE format.

The critical challenge for workflow conversion arises when it comes to translating the configuration that assists in the execution of the workflow. It is clear that significant effort has to be invested to resolve any potential disparity between the architecture of the computer on which the KNIME workflow was created and that of the grid or cloud. In other words, a generic solution cannot simply rely on the desktop machine on which KNIME is executed and each node in the infrastructure administered by gUSE having the same architecture and, therefore, the same binaries. For this reason, a conversion table relating the binaries needed for each step on the desktop to the ones required on the grid or cloud is needed. Since gUSE supports several middlewares (e.g., UNICORE, LSF, BOINC), the use of a different format to represent the information required to execute a needed binary also has to be accounted for in the workflow conversion process.

B. Conversion of complete Workflows

The KNIME extension that we have developed to convert KNIME workflows to the gUSE format is fully integrated in KNIME. When a user is satisfied with a certain workflow, all that is needed is to request a conversion by simply clicking a button in a toolbar or a menu element (see Figure 3). What follows is a standard dialog window (see Figure 4) in which the user can select the desired destination to export the workflow. Once the user has selected an export destination, an archive that can be uploaded and imported into a gUSE portal will be generated.

Fig. 3. In order to start the conversion of a workflow, we have integrated visual elements in the KNIME platform.

Fig. 4. An archive in the gUSE format will be generated, which can be imported into gUSE.

The conversion process starts by using KNIME's API to access each node and its connections and convert them into a workflow in an intermediate, internal format. Afterwards, this internal-format workflow is converted into a gUSE workflow. This seemingly impractical design choice was taken in order to follow the Separation of Concerns principle [31]. Since the release schedule of KNIME is not under our control, it is a good idea to minimize the exposure of the components of our conversion process to changes in KNIME's API or workflow format by first using an intermediate format. This intermediate format is internal to the conversion process, and changes to it are mandated exclusively by us. Another advantage of this design is that, in the event of extending our KNIME extension by adding other export formats, it would only be necessary to perform the conversion from this internal format without explicitly converting the KNIME workflow, thus decreasing development time and the amount of code needed to perform the required task. This is broadly depicted in Figure 5.

Fig. 5. Our KNIME extension has been designed taking extensibility and modularity into account.
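To make the two-stage design concrete, here is a minimal sketch in the spirit of Figure 5. The class and function names are ours, purely for illustration, not the extension's actual API: source nodes are first reduced to an internal model (in which file nodes become ports on the jobs they feed, as described in the docking example below), and exporters then operate only on that model, so adding a new export format never touches the KNIME-facing code.

```python
from dataclasses import dataclass, field

# Hypothetical internal model; independent of both KNIME and gUSE.
@dataclass
class Job:
    name: str
    in_ports: list = field(default_factory=list)
    out_ports: list = field(default_factory=list)

def to_internal(src_nodes, src_edges):
    """Stage 1: reduce KNIME-like nodes to the internal model. 'Input File'
    and 'Output File' nodes vanish and become ports on the jobs they touch."""
    files = {n for n, kind in src_nodes.items()
             if kind in ("Input File", "Output File")}
    jobs = {n: Job(n) for n in src_nodes if n not in files}
    for src, dst in src_edges:
        if src in files:
            jobs[dst].in_ports.append(src)
        elif dst in files:
            jobs[src].out_ports.append(dst)
        else:
            jobs[dst].in_ports.append(src)  # job-to-job dependency as a port
    return jobs

def export_text(jobs):
    """Stage 2: one of possibly several exporters; sees only the internal model."""
    return "\n".join(f"{j.name}: in={j.in_ports} out={j.out_ports}"
                     for j in jobs.values())
```

Given `{"in.sdf": "Input File", "dock": "Docking", "out.csv": "Output File"}` and the two connecting edges, stage 1 yields a single `dock` job whose ports carry the former file nodes, and stage 2 renders it without ever inspecting the original KNIME structures.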
C. Example Application: Docking Workflows

As users and developers of the Molecular Simulation Grid portal (MoSGrid) [32], we have learned not only to benefit from the strengths of gUSE, but also to overcome its drawbacks. Part of our effort consisted of the creation of a docking workflow using the tools provided by gUSE. As mentioned in a previous section, getting a complex workflow right on gUSE can turn into a quite intimidating task for the inexperienced user. During this time, we felt that the creation of such a complex workflow could and should be simpler.

We perform docking using our own software, the Computer Aided Drug Design Suite (CADDSuite) [29], which we also integrated in KNIME using Generic KNIME Nodes, as depicted in Figure 6. Putting this workflow together in KNIME took us less than an hour; a similar version of this workflow on gUSE took us significantly longer. We were able to export the workflow to gUSE with minimal configuration, that is, we just needed to provide adequate input data files. Since docking is a processor-intensive task, we used different data sets on our desktop computers and on the MoSGrid portal.

Fig. 6. Using GKN it is possible to perform docking in KNIME with the CADDSuite.

In order to use input files in KNIME with GKN, it is required to use the Input File node. Similarly, for output files, the Output File node must be used. However, in gUSE input and output files are directly associated with a job's input and output ports, respectively. This is the reason why, during our conversion of a workflow from KNIME to gUSE, any input or output file nodes disappear and take the form of input or output ports in gUSE (see Figure 7).

VIII. Future Work

A major work in progress is how to properly export workflows that benefit from parameter sweeps. This is critical for the performance of exported workflows, since gUSE offers parallelization via parameter sweeps.

KNIME offers several data mining, statistics and reporting nodes that could easily be integrated with our docking workflow. For instance, it would be desirable to generate a concise PDF report containing the top-ranked ligands. Unfortunately, the conversion of such nodes is not yet possible. However, KNIME offers headless execution of workflows (i.e., from the command line), thus giving us a chance to work around this current limitation.

MoSGrid relies on UNICORE to access binaries and data. A workflow using UNICORE resources has a different representation in gUSE than a workflow using an LSF scheduler. Since we want to reach as many users as possible, it is desirable that our KNIME extension can properly handle as many constellations of gUSE components as possible.
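A parameter sweep, as gUSE offers it, essentially unfolds one job into many independent jobs, one per combination of parameter values, which can then run in parallel. A toy illustration of that unfolding, with names and structure chosen purely for this sketch:

```python
from itertools import product

def expand_sweep(job_name, sweep_params):
    """Enumerate one concrete job description per combination of sweep
    values, mirroring how a parameter sweep unfolds into independent jobs."""
    names = sorted(sweep_params)  # fixed order keeps the expansion deterministic
    return [{"job": job_name, **dict(zip(names, combo))}
            for combo in product(*(sweep_params[n] for n in names))]
```

Sweeping a hypothetical docking job over two `top_n` values and one receptor, `expand_sweep("dock", {"top_n": [5, 10], "receptor": ["a.pdb"]})`, produces two concrete jobs; exporting such an expansion faithfully is what the future work above is concerned with.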
IX. Conclusion

Any robust scientific experiment must be repeatable. Workflow technologies provide their users with repeatability of the tasks that comprise a workflow. Furthermore, these technologies offer adopters the possibility of saving temporary and final results for further analysis, as well as the chance of rerunning a subset of the tasks contained in a workflow. If a configuration error is detected in one of the tasks that make up a workflow, this very ability to store intermediate results allows users to change the configuration and later resume the execution of the workflow without having to execute tasks not influenced by these changes.

Making grids and clouds accessible to users has the benefit of speeding up experiments and the production of scientific texts, ensuring an optimal use of resources and minimizing idle computing time. As we have argued, one of the main obstacles in accessing grids and clouds is the steep learning curve to generate usable workflows. However, gUSE is accessible to users and excels at executing workflows in an efficient way.

It is far easier to train users to use KNIME in order to generate workflows and test experiments than to teach them how to generate scripts for a certain resource manager or middleware. Using KNIME, users can rapidly generate a workflow through an intuitive and robust user interface. The obvious limitation is that KNIME will have only as much computing power as the desktop computer on which it runs, and this might not be adequate for applications such as docking.

These are two complementary forces that we feel our KNIME extension smoothly combines. On one side, we have gUSE enabling users to harness the power supplied by a grid or a cloud. On the other side, we have KNIME allowing users to create workflows in a friendly manner. Joining these two is of critical importance for the advancement of scientific fields in which an experiment can be broken up into smaller tasks to form a workflow.

Fig. 7. The exported workflow in MoSGrid after the user has uploaded it to the portal.

Acknowledgment

The authors would like to thank the BMBF (German Federal Ministry of Education and Research) for the opportunity to do research in the MoSGrid project (reference 01IG09006). The research leading to these results has also been partially supported by the European Commission's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 283481 (SCI-BUS). Stephan Aiche gratefully acknowledges funding by the European Commission's Seventh Framework Programme (GA263215).

References

[1] Accelrys. Pipeline Pilot. [Online]. Available: http://accelrys.com/products/pipeline-pilot/
[2] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, C. Sieb, K. Thiel, and B. Wiswedel, KNIME: The Konstanz Information Miner. Springer, 2008.
[3] P. Missier, S. Soiland-Reyes, S. Owen, W. Tan, A. Nenadic, I. Dunlop, A. Williams, T. Oinn, and C. Goble, "Taverna, reloaded," in Scientific and Statistical Database Management. Springer, 2010, pp. 471–481.
[4] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock, P. Li, and T. Oinn, "Taverna: a tool for building and running workflows of services," Nucleic Acids Research, vol. 34, no. suppl 2, pp. W729–W732, 2006.
[5] T. Oinn, M. Greenwood, M. Addis, M. N. Alpdemir, J. Ferris, K. Glover, C. Goble, A. Goderis, D. Hull, D. Marvin et al., "Taverna: lessons in creating a workflow environment for the life sciences," Concurrency and Computation: Practice and Experience, vol. 18, no. 10, pp. 1067–1100, 2006.
[6] J. Goecks, A. Nekrutenko, J. Taylor, The Galaxy Team et al., "Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences," Genome Biology, vol. 11, no. 8, p. R86, 2010.
[7] D. Blankenberg, G. V. Kuster, N. Coraor, G. Ananda, R. Lazarus, M. Mangan, A. Nekrutenko, and J. Taylor, "Galaxy: A web-based genome analysis tool for experimentalists," Current Protocols in Molecular Biology, pp. 19–10, 2010.
[8] B. Giardine, C. Riemer, R. C. Hardison, R. Burhans, L. Elnitski, P. Shah, Y. Zhang, D. Blankenberg, I. Albert, J. Taylor et al., "Galaxy: a platform for interactive large-scale genome analysis," Genome Research, vol. 15, no. 10, pp. 1451–1455, 2005.
[9] K. Görlach, M. Sonntag, D. Karastoyanova, F. Leymann, and M. Reiter, "Conventional workflow technology for scientific simulation," in Guide to e-Science. Springer, 2011, pp. 323–352.
[10] G. E. Moore et al., "Cramming more components onto integrated circuits," 1965.
[11] MTA-SZTAKI LPDS. grid and cloud User Support Environment. [Online]. Available: http://guse.hu/
[12] Sharing Interoperable Workflows for large-scale scientific Simulations on available DCIs. [Online]. Available: http://www.shiwa-workflow.eu/
[13] Building a European Research Community through Interoperable Workflows and Data. [Online]. Available: http://www.erflow.eu/
[14] M. Kozlovszky, K. Karoczkai, I. Marton, A. Balasko, A. Marosi, and P. Kacsuk, "Enabling generic distributed computing infrastructure compatibility for workflow management systems," Computer Science, vol. 13, no. 3, pp. 61–78, 2012.
[15] K. Achilleos, C. Kannas, C. Nicolaou, C. Pattichis, and V. Promponas, "Open source workflow systems in life sciences informatics," in Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on. IEEE, 2012, pp. 552–558.
[16] CloudBroker: High Performance Computing Software as a Service - Integration in KNIME. [Online]. Available: http://www.knime.org/files/10 CloudBroker.pdf
[17] M. R. Berthold, N. Cebron, F. Dill, G. D. Fatta, T. R. Gabriel, F. Georg, T. Meinl, P. Ohl, C. Sieb, and B. Wiswedel, "KNIME: The Konstanz Information Miner," in Proceedings of the Workshop on Multi-Agent Systems and Simulation (MAS&S), 4th Annual Industrial Simulation Conference (ISC), 2006, pp. 58–61.
[18] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel, "KNIME - the Konstanz Information Miner: version 2.0 and beyond," SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 26–31, Nov. 2009. [Online]. Available: http://doi.acm.org/10.1145/1656274.1656280
[19] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, C. Sieb, K. Thiel, and B. Wiswedel, "KNIME: The Konstanz Information Miner," in Data Analysis, Machine Learning and Applications - Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Studies in Classification, Data Analysis, and Knowledge Organization. Berlin, Germany: Springer, 2007, pp. 319–326.
[20] KNIME Application Programming Interface. [Online]. Available: http://tech.knime.org/docs/api/
[21] MTA-SZTAKI LPDS. grid and cloud User Support Environment Architecture. [Online]. Available: http://www.guse.hu/?m=architecture&s=0
[22] Schrödinger KNIME Extensions. [Online]. Available: http://www.schrodinger.com/productpage/14/8/
[23] ChemAxon's JChem Nodes on the KNIME workbench. [Online]. Available: http://www.chemaxon.com/library/chemaxons-jchem-nodes-on-the-knime-workbench/
[24] M. Röttig, "Combining sequence and structural information into predictors of enzymatic activity," Ph.D. dissertation, University of Tübingen, Nov. 2012.
[25] M. Röttig, S. Aiche, L. de la Garza, and B. Kahlert. Generic KNIME Nodes. [Online]. Available: https://github.com/genericworkflownodes/GenericKnimeNodes
[26] OpenMS Team. Common Tool Description. [Online]. Available: http://open-ms.sourceforge.net/workflow-integration/topp-and-common-tool-description/
[27] SeqAn Team. SeqAn. [Online]. Available: http://www.seqan.de/
[28] OpenMS Team. OpenMS. [Online]. Available: http://open-ms.sourceforge.net/
[29] BALL Team. Computer Aided Drug Design Suite - CADDSuite. [Online]. Available: http://www.ball-project.org/caddsuite
[30] The Eclipse Foundation. Plug-In Development Environment. [Online]. Available: http://www.eclipse.org/pde/
[31] E. W. Dijkstra, "On the role of scientific thought," in Selected Writings on Computing: A Personal Perspective. Springer, 1982, pp. 60–66.
[32] S. Gesing, R. Grunzke, J. Krüger, G. Birkenheuer, M. Wewior, P. Schäfer, B. Schuller, J. Schuster, S. Herres-Pawlis, S. Breuers, A. Balaskó, M. Kozlovszky, A. S. Fabri, L. Packschies, P. Kacsuk, D. Blunk, T. Steinke, A. Brinkmann, G. Fels, R. Müller-Pfefferkorn, R. Jäkel, and O. Kohlbacher, "A Single Sign-On Infrastructure for Science Gateways on a Use Case for Structural Bioinformatics," Journal of Grid Computing, vol. 10, no. 4, pp. 769–790, Nov. 2012. [Online]. Available: http://www.springerlink.com/index/10.1007/s10723-012-9247-y