=Paper= {{Paper |id=None |storemode=property |title=From the Desktop to the Grid: conversion of KNIME Workflows to gUSE |pdfUrl=https://ceur-ws.org/Vol-993/paper9.pdf |volume=Vol-993 |dblpUrl=https://dblp.org/rec/conf/iwsg/Garza0SRARK13 }} ==From the Desktop to the Grid: conversion of KNIME Workflows to gUSE== https://ceur-ws.org/Vol-993/paper9.pdf
From the desktop to the grid: conversion of KNIME Workflows to gUSE

Luis de la Garza, Applied Bioinformatics Group, University of Tübingen, Germany (delagarza@informatik.uni-tuebingen.de)
Jens Krüger, Applied Bioinformatics Group, University of Tübingen, Germany
Charlotta Schärfe, Applied Bioinformatics Group, University of Tübingen, Germany
Marc Röttig, Applied Bioinformatics Group, University of Tübingen, Germany
Stephan Aiche, Department of Mathematics and Computer Science, Freie Universität Berlin, Germany; International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany
Knut Reinert, Algorithms in Bioinformatics, Freie Universität Berlin, Germany
Oliver Kohlbacher, Applied Bioinformatics Group, University of Tübingen, Germany (oliver.kohlbacher@uni-tuebingen.de)



Abstract: The Konstanz Information Miner is a user-friendly graphical workflow designer with a
broad user base in industry and academia. Its broad range of embedded tools and its powerful
data mining and visualization tools render it ideal for scientific workflows. It is thus used
more and more in a broad range of applications. However, the free version typically runs on a
desktop computer, restricting users who want to tap into more computing power. The grid and
cloud User Support Environment is a free and open source project created for parallelized and
distributed systems, but the creation of workflows with the included components has a steeper
learning curve.

In this work we suggest an easy-to-implement solution combining the ease of use of the Konstanz
Information Miner with the computational power of distributed computing infrastructures. We
present a solution permitting the conversion of workflows between the two platforms. This
enables convenient development, debugging, and maintenance of scientific workflows on the
desktop. These workflows can then be deployed on a cloud or grid, thus permitting large-scale
computation.

To achieve our goals, we relied on a Common Tool Description XML file format which describes
the execution of arbitrary programs in a structured and easily readable and parseable way. In
order to integrate external programs into KNIME, we employed the Generic KNIME Nodes extension.

I. INTRODUCTION

Workflow technology with platforms such as Pipeline Pilot [1], KNIME [2], Taverna [3], [4], [5]
and Galaxy [6], [7], [8] has now become a crucial part in supporting scientists in their daily
work. By helping to create and automate virtual processes such as molecular docking or
molecular dynamics simulations, as well as simplifying data analysis and data mining, it allows
scientists to focus on their primary goals [9]. Furthermore, the quality of simulation results
is improved, as following established protocols increases reproducibility in the sense of good
lab practice.

The most obvious and direct advantage of the application of workflows in the scientific
environment is the capability of saving the general sequence of events in order to conveniently
optimize the settings for a simulation, for example by sweeping through single parameter
settings. Scientists also benefit from other, less obvious advantages of using workflows
including, but not limited to: the ability to analyze the results, including statistical
analysis and data visualization, data mining on experimentally (wet or dry lab) obtained
datasets, and report creation using previously obtained data without requiring further user
input.

Those tasks can also be fulfilled using simple scripts or separate program suites for the
individual steps. Workflow technology, however, allows combining all steps by providing
interfaces to external tools while not requiring any knowledge of programming or scripting
languages. Additionally, the workflows established within one project may be easily applied to
other projects as well, which facilitates consistency in analysis and reporting throughout
several projects, thus reducing the risk of human error and allowing previous results to be
reproduced. Furthermore, through the ability to share workflows with collaborators or the
scientific community, a team-based analysis of experimental results can take place.

Nowadays a plethora of different workflow systems exists, each initially targeted at different
use cases such as desktop-based data mining or automation of computations on a grid.
With the exponential increase of computational power [10] available to scientists, as well as
the improvements in network technology, the boundaries between local applications and processes
executed on distributed systems have become blurred. As a result, there does not yet exist a
one-size-fits-all solution that satisfies the scientific user's needs for a combination of
local and distributed workflow execution. In addition, most users of workflow technology in the
scientific environment have created a library of their own workflows with their workflow suite
of choice over the past years. These may now be outdated or not suited for the computation
resources required for today's tasks, thus requiring the switch to another workflow environment
and the need to re-implement the existing workflows in the workflow language used by the new
environment. For example, the Konstanz Information Miner (KNIME) [2] was mainly created for
applications on a local machine and its free version does not provide access to compute
clusters out of the box, but KNIME has, due to its ease of use and extensibility, found wide
acceptance in the scientific community, resulting in a huge library of existing KNIME workflows
for various tasks. The grid and cloud User Support Environment (gUSE) [11], on the other hand,
was specifically created to use distributed computing infrastructures (DCI), but the creation
of workflows requires more user input and therefore is not as straightforward as in local
systems such as KNIME. A KNIME user may now want to use not only the KNIME desktop version for
data analysis and pilot runs for evaluating simulation parameters and post-simulation analysis,
but also the open source gUSE environment for moving the actual simulations to a cluster. The
workflows for the simulation pilot run and the actual simulation are identical, since the first
is used to find the best settings and the latter then applies those settings. When using two
different software suites such as KNIME and gUSE for the pilot run and the actual full-scale
simulation, it is currently required to implement the workflow twice (once for each software).
The same applies when switching the workflow software. This re-implementation of existing
workflows is a tedious task that would not be needed if it were possible to convert workflows
written with one workflow language in a way that they could then be read by another workflow
environment, thus enabling workflow interoperability.

II. RELATED WORK

The question whether a certain computational task is executable on different platforms is as
old as computers themselves. Regarding modern workflow languages, a couple of specific
challenges come into focus, discussed in detail in the following sections. Since there is a
multitude of workflow languages, the focus shifts for different use cases and other user
communities. The most prominent approach to deal with the general problem of workflow
interoperability is SHIWA and its follow-up project ER-FLOW [12], [13]. A double strategy was
followed, namely coarse- and fine-grained interoperability. The first one considers a workflow
language as a black box, enabling the execution of sub-workflows within WS-PGRADE, which acts
as a host system. The second approach puts emphasis on the actual transformation of selected
workflow languages such as ASKALON, Pegasus, P-GRADE, MOTEUR and Triana into each other [14].
ER-FLOW continues these ideas, adding the aspects of detailed evaluation of user community
needs and the specific handling of scientific applications on remote DCIs that are called by
the workflows.

III. WORKFLOWS

The concept of recipes or protocols is familiar to scientists from all academic fields. The
expression "workflow" follows this concept of a collection of consecutive computational steps.
This may involve preparation steps for importing data, converting it, and carrying out whatever
preparations are required. After these steps the actual simulation or computational step is
usually carried out. There is a multitude of possible application domains, like quantum
calculations, molecular dynamics, docking or data mining, to name only a few. The last section
of a typical workflow deals with the data analysis and visualization, often summarized in the
form of a report.

An important aspect for workflow interoperability is the representation as a graph. The
individual tasks represent the nodes; their edges correspond to the data flow or execution
dependencies between these nodes. Hence, when a workflow is to be converted from e.g. KNIME to
gUSE, care has to be taken that the graph representation is similar. Is the workflow
represented as a strict directed graph or does it correspond to a multigraph? Are parameter
sweeps executable via loops or through the enumeration of predefined lists? Does the workflow
have multiple start or end points corresponding to a quiver? This small selection of questions
illustrates the logical constraints faced when dealing with workflow conversion from one
language into another. Furthermore, the data handling and its flow along the graph is of
relevance. Is the data directly incorporated into the nodes, e.g. as tables, or does it reside
elsewhere independently of the execution status of the specific node? Are there specific
formats or conventions that depend on the workflow language? How is the data annotated? Great
care has to be taken when facing the conversion of data from different workflow languages.

In the following sections specific details of KNIME and gUSE are described.

IV. KNIME

KNIME is one of the most commonly used workflow management systems in the field of e-Science
systems, especially pharmaceutical research, but also financial data analysis and business
intelligence [15]. The KNIME pipelining platform is an open-source program implemented as a
plug-in for Eclipse [2], written in Java, and offered to the scientific community as a desktop
version free of charge. Although
there are extensions allowing the execution of single nodes in the cloud or on a grid [16],
these are restricted to a professional release and are thus not part of the free workflow
management system. Furthermore, KNIME is highly popular due to its ease of use and
extensibility.

The KNIME platform implements a modular approach to workflow management and execution in which
single nodes represent single processing units such as data manipulation, as depicted in
Figure 1. These nodes are connected via edges that pipe either data or computational models
from one node into the next node. Data is internally stored in special Java classes called
DataTable, which store the data and additional meta information about the different data
columns [17], [18]. The nodes and edges together form a directed acyclic graph, which is called
"workflow" and converts initial input files into output data tables that can be further
exported as new files [18].

Fig. 1. An illustration of the KNIME workflow concept. Nodes represent single processing units
and connecting edges between these nodes transport data or models from one processing unit to
the next. In the end a final data table is created that can be saved to a file. Figure modified
from [19]

The implementation as an Eclipse plug-in with its free API [20] facilitates easy extensibility
of the workflow system and simple integration of novel nodes, thus resulting in a vast library
of nodes created by the scientific community and also commercial software providers.

V. GUSE

The grid and cloud User Support Environment (gUSE) [11] is a highly popular technology for
scientific portals enabling access to distributed computing infrastructures (DCIs). It has been
developed at the Laboratory of Parallel and Distributed Systems in Budapest over the past
years. gUSE represents the middle tier of a multi-layer portal solution. Different tasks can be
handled by a set of high level web services (see Figure 2). The Application Repository holds
the executables for all programs that may be linked to a node within a workflow. The File
Storage deals with the data handling, while the Information System takes care of e.g. user
information and job status. The Workflow Interpreter is responsible for the workflows and their
execution, which are stored in the Workflow Storage. The Submitter represents the connection
between gUSE services and middlewares, enabling access to the computational resources of grid
or cloud. On the top layer resides WS-PGRADE, the graphical user interface. All functionality
of the underlying services is exposed to the end-user by portlets residing in a Liferay portlet
container that is part of WS-PGRADE.

Fig. 2. The layered structure of gUSE/WS-PGRADE is shown. Figure modified from [21]

gUSE workflows may be created and maintained via standard web browsers accessing corresponding
portlets and underlying services. Initially the workflow graph has to be created through a Java
applet. The nodes have to be defined, and each node may have multiple input and output ports.
These work as anchor points for the edges connecting them. The selection of applications is
done through the Concrete portlet, which also enables the selection of different DCIs with
different middlewares within the same workflow. Application specific parameters can be set, as
well as resource requirements such as memory or runtime settings. Besides submission and
monitoring features, a multitude of import and export features are available to the user.

The whole set of services offers convenient access to the vast computational resources of
modern grids and clouds. gUSE is available free of charge for academic purposes.

VI. GENERIC KNIME NODES

As previously discussed, KNIME offers a wide array of prebuilt nodes for the execution of a
multitude of different tasks. It is also possible to obtain external nodes provided by
community developers such as the ones developed by Schrödinger [22], ChemAxon [23], etc.
Furthermore, it is possible to develop KNIME nodes, which is a simple task of implementing a
few KNIME specific classes in the Java programming language. However, we still felt that,
although KNIME is powerful for most computations and enables users to easily extend its
capabilities, it is sometimes necessary to integrate external binaries into KNIME in the form
of a node in a simpler way.

We used a KNIME extension called Generic KNIME Nodes (GKN) [24], [25], which allows the
integration of arbitrary programs into KNIME. This integration is fully compatible with KNIME
and other KNIME nodes and each integrated program behaves as a KNIME node. Since KNIME relies on
the use of data tables rather than on files, GKN also includes
utility nodes such as File to Table, Table to File, Input File
and Output File to ease the interaction of a GKN-generated
node with other nodes.

In order for GKN to properly execute external binaries,
we also relied on an XML-based file format called Common Tool
Description (CTD) [26], which describes tools, i.e. the nodes of
the workflow graph. CTD files are XML documents
that contain information about the parameters, flags, inputs
and outputs of a given binary. This information is presented
in a structured and human readable way, thus facilitating
manual generation for arbitrary binaries. Since CTDs are also
well-formed XML documents, parsing of these is a trivial
matter.
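
To illustrate how little code such parsing requires, the following is a minimal sketch in Python. The XML layout, the tool name ExampleDocker and the helper functions parse_tool and build_command are hypothetical and heavily simplified for illustration; they are not the official CTD schema nor part of GKN.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified CTD-like document. Element and attribute names
# are illustrative assumptions, not the official CTD schema.
EXAMPLE_CTD = """\
<tool name="ExampleDocker" version="1.0">
  <executableName>example_docker</executableName>
  <PARAMETERS>
    <ITEM name="receptor" type="input-file" value="receptor.pdb"/>
    <ITEM name="ligands" type="input-file" value="ligands.sdf"/>
    <ITEM name="out" type="output-file" value="poses.sdf"/>
    <ITEM name="iterations" type="int" value="20"/>
  </PARAMETERS>
</tool>
"""


def parse_tool(ctd_text):
    """Extract the executable, inputs, outputs and parameters of a tool."""
    root = ET.fromstring(ctd_text)
    # Collect the attributes of every ITEM element in document order.
    items = [dict(item.attrib) for item in root.iter("ITEM")]
    return {
        "name": root.get("name"),
        "executable": root.findtext("executableName"),
        "inputs": [i["name"] for i in items if i["type"] == "input-file"],
        "outputs": [i["name"] for i in items if i["type"] == "output-file"],
        "parameters": {i["name"]: i["value"] for i in items},
    }


def build_command(tool):
    """Turn the parsed description into a command line (as a list)."""
    cmd = [tool["executable"]]
    for name, value in tool["parameters"].items():
        cmd += ["-" + name, value]
    return cmd


tool = parse_tool(EXAMPLE_CTD)
print(tool["inputs"], tool["outputs"])
print(" ".join(build_command(tool)))
```

The same parsed structure exposes which parameters are input or output files, which is the information a workflow system needs in order to wire a tool to its neighbours.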

The generation of CTDs can be either manual or by
CTD capable programs. Software tool suites such as SeqAn
[27], OpenMS [28] and CADDSuite [29] can not only generate CTDs for each of their tools, but
can also parse input CTDs and execute their tools accordingly.

VII. CONVERSION FROM KNIME TO GUSE

A. Overview

The motivation for this conversion lies in the fact that most scientific computations can be
memory and processor intensive. The requirements to run such computations in an acceptable time
frame can hardly be met by a simple desktop or laptop computer. Grids and clouds are packed
with resources ready to be tapped but, as discussed earlier, creating workflows on such systems
can be a tedious task that even the most enthusiastic scientists might not be ready to go
through. Given KNIME's ease of use and its wide acceptance in the scientific community, we felt
that there was a gap to be filled by bridging a great workflow editor such as KNIME with a
great grid and cloud manager such as gUSE.

Our vision is to have users creating and executing workflows on KNIME on their desktop
computers using a reduced or a test dataset; when an acceptably stable version of a workflow is
ready, it can be exported into a gUSE managed grid or cloud. Following this, the user would
have to configure the exported workflow to include a larger or a production-ready dataset on
which to perform a computation.

One of the great features of the Eclipse Platform is its extensibility through the development
of so-called plug-ins [30]. Given that KNIME has been built on top of the Eclipse Platform, it
is fairly simple to develop KNIME extensions, which in turn are Eclipse Platform plug-ins.
KNIME also exposes an API that gives full access to all of the elements involved in a workflow
[20], both visually and logically. We have developed a simple KNIME conversion extension that
can export a KNIME workflow to gUSE format.

The critical challenge for workflow conversion arises when it comes to the translation of the
configuration that assists in the execution of the workflow. It is clear that significant
effort has to be invested to resolve any potential disparity between the architecture of the
computer on which the KNIME workflow was created and that of the grid or cloud. In other words,
a generic solution cannot simply rely on both the desktop machine on which KNIME is being
executed and each node in the infrastructure administered by gUSE having the same architecture
and, therefore, the same binaries. For this reason, a conversion table relating the binaries
needed for each step on the desktop to the ones required on the grid or cloud is needed. Since
gUSE supports several middlewares (e.g. UNICORE, LSF, BOINC, etc.), the usage of a different
format to represent the information required to execute a needed binary has to be accounted for
in the workflow conversion process.

B. Conversion of complete Workflows

The KNIME extension that we have developed to convert KNIME workflows to gUSE format is fully
integrated in KNIME. When a user is satisfied with a certain workflow, all that is needed is to
request a conversion by simply clicking a button in a toolbar or a menu element (see Figure 3).
What follows is a standard dialog window (see Figure 4) in which the user can select the
desired destination to export the workflow. Once the user has selected an export destination,
an archive that can be uploaded and imported into a gUSE portal will be generated.

Fig. 3. In order to start the conversion of a workflow, we've integrated visual elements in the
KNIME platform

The conversion process starts by using KNIME's API to access each node and its connections to
convert them into a workflow in an intermediate, internal format. Afterwards, this internal
format workflow is converted into a gUSE workflow. This seemingly impractical design choice was
taken in order to follow the Separation of Concerns principle [31]. Since the
release schedule of KNIME is something not under our control, it is a good idea to minimize the
exposure of the components of our conversion process to changes in KNIME's API or workflow
format by first using an intermediate format. This intermediate format is something internal to
the conversion process whose changes are mandated exclusively by us. Another advantage of this
design is that, in the event of extending our KNIME extension by adding other export formats,
it would only be needed to perform the conversion from this internal format without explicitly
converting the KNIME workflow, thus decreasing development time and the amount of code needed
to perform the required task. This is broadly depicted in Figure 5.

Fig. 4. An archive in the gUSE format will be generated, which can be imported into gUSE

Fig. 5. Our KNIME extension has been designed taking into account extensibility and modularity

C. Example Application: Docking Workflows

As users and developers of the Molecular Simulation Grid portal (MoSGrid) [32] we have learned
not only to benefit from the strengths of gUSE, but also to overcome its drawbacks. Part of our
effort consisted of the creation of a docking workflow using the tools provided by gUSE. As
mentioned in a previous section, getting a complex workflow right on gUSE can turn into quite
an intimidating task for the inexperienced user. During this time, we felt that the creation of
such a complex workflow could and should be simpler.

We perform docking using our own software, the Computer Aided Drug Design Suite (CADDSuite)
[29], which we also integrated in KNIME using Generic KNIME Nodes, as depicted in Figure 6.

Fig. 6. Using GKN it is possible to perform docking in KNIME with the CADDSuite

Putting this workflow together on KNIME took us less than an hour. A similar version of this
workflow on gUSE took us significantly more than that. We were able to export the workflow to
gUSE with minimal configuration, that is, we just needed to provide adequate input data files.
Since docking is a processor intensive task, we used different data sets on our desktop
computers and on the MoSGrid portal.

In order to use input files in KNIME with GKN, it is required to use the Input File node.
Similarly, for output files, the Output File node must be used. However, in gUSE input and
output files are directly associated with a job's input and output ports, respectively. This is
the reason why during the conversion of a workflow from KNIME to gUSE any input or output file
nodes will disappear and take the form of input or output ports in gUSE (see Figure 7).

VIII. FUTURE WORK

A major work in progress is how to properly export workflows that benefit from parameter sweep.
This is critical for the performance of exported workflows, since gUSE offers parallelization
via parameter sweep.

KNIME offers several data mining, statistics and reporting nodes that could be easily
integrated with our docking workflow. For instance, it would be desirable to generate a concise
PDF report containing the top ranked ligands. Unfortunately, the conversion of such nodes is
still not possible. However, KNIME offers a headless execution of workflows (i.e., command
line), thus giving us the chance to
work around this current limitation.

MoSGrid relies on UNICORE to access binaries and data. A workflow using UNICORE resources has a
different representation in gUSE than a workflow using an LSF scheduler. Since we want to reach
as many users as possible, it is desired that our KNIME extension can properly handle as many
constellations of gUSE components as possible.

Fig. 7. The exported workflow in MoSGrid after the user has uploaded it to the portal

IX. CONCLUSION

Any robust scientific experiment must be repeatable. Workflow technologies provide their users
with repeatability on the tasks that comprise a workflow. Furthermore, these technologies offer
adopters the possibility of saving temporary and final results for further analysis as well as
the

We have these two complementary forces that we feel our KNIME extension smoothly combines. On
one side, we have gUSE enabling users to harness the power supplied by a grid or a cloud. On
the other side, we have KNIME allowing users to create workflows in a friendly manner. Joining
these two is of critical importance for the advancement of scientific fields in which an
experiment can be broken up into smaller tasks to form a workflow.

ACKNOWLEDGMENT

The authors would like to thank the BMBF (German Federal Ministry of Education and Research)
for the opportunity to do research in the MoSGrid project (reference 01IG09006). The research
leading to these results has also partially been supported by the European Commission's Seventh
Framework Programme (FP7/2007-2013) under grant agreement no 283481 (SCI-BUS).

Stephan Aiche gratefully acknowledges funding by the European Commission's Seventh Framework
Programme (GA263215).

REFERENCES

[1] Accelrys. Pipeline Pilot. [Online]. Available: http://accelrys.com/products/pipeline-pilot/
[2] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, C. Sieb,
    K. Thiel, and B. Wiswedel, KNIME: The Konstanz information miner. Springer, 2008.
[3] P. Missier, S. Soiland-Reyes, S. Owen, W. Tan, A. Nenadic, I. Dunlop, A. Williams, T. Oinn,
    and C. Goble, "Taverna, reloaded," in Scientific and Statistical Database Management.
    Springer, 2010, pp. 471–481.
                                                                           [4] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock, P. Li,
chance of rerunning a subset of tasks contained in a workflow.                 and T. Oinn, “Taverna: a tool for building and running workflows of
If a configuration error is detected in one of the tasks that                  services,” Nucleic acids research, vol. 34, no. suppl 2, pp. W729–W732,
make up a workflow, this very ability of storing intermediate                  2006.
                                                                           [5] T. Oinn, M. Greenwood, M. Addis, M. N. Alpdemir, J. Ferris, K. Glover,
results allow users to make changes to the configuration and                   C. Goble, A. Goderis, D. Hull, D. Marvin et al., “Taverna: lessons in
later on resume the execution of the workflow without having                   creating a workflow environment for the life sciences,” Concurrency and
to execute tasks not influenced by these changes.                              Computation: Practice and Experience, vol. 18, no. 10, pp. 1067–1100,
                                                                               2006.
                                                                           [6] J. Goecks, A. Nekrutenko, J. Taylor, T. G. Team et al., “Galaxy: a
Making grids and clouds accessible to users has the                            comprehensive approach for supporting accessible, reproducible, and
benefit of speeding up experiments, production of scientific                   transparent computational research in the life sciences,” Genome Biol,
                                                                               vol. 11, no. 8, p. R86, 2010.
texts, ensure an optimal use of resources and minimize idle                [7] D. Blankenberg, G. V. Kuster, N. Coraor, G. Ananda, R. Lazarus,
computing time. As we have argued, one of the main obstacles                   M. Mangan, A. Nekrutenko, and J. Taylor, “Galaxy: A web-based
in accessing grids and clouds is the steep learning curve                      genome analysis tool for experimentalists,” Current protocols in molec-
                                                                               ular biology, pp. 19–10, 2010.
to generate usable workflows. However, gUSE is accessible                  [8] B. Giardine, C. Riemer, R. C. Hardison, R. Burhans, L. Elnitski, P. Shah,
to users and excels in executing workflows in an efficient way.                Y. Zhang, D. Blankenberg, I. Albert, J. Taylor et al., “Galaxy: a platform
                                                                               for interactive large-scale genome analysis,” Genome research, vol. 15,
                                                                               no. 10, pp. 1451–1455, 2005.
It is far more easier to train users to use KNIME in                       [9] K. Görlach, M. Sonntag, D. Karastoyanova, F. Leymann, and M. Reiter,
order to generate workflows and test experiments than to                       “Conventional workflow technology for scientific simulation,” in Guide
teach them how to generate scripts for a certain resource                      to e-Science. Springer, 2011, pp. 323–352.
                                                                          [10] G. E. Moore et al., “Cramming more components onto integrated
manager or middleware. Using KNIME, users can rapidly                          circuits,” 1965.
generate a workflow by using an intuitive and robust user                 [11] MTA-SZTAKI LPDS. grid and cloud User Support Environment.
interface. The obvious limitation is that KNIME will have as                   [Online]. Available: http://guse.hu/
much computing power as the desktop computer on which it                  [12] Sharing Interoperable Workflows for large-scale scientific Simulations
                                                                               on available DCIs. [Online]. Available: http://www.shiwa-workflow.eu/
runs and this might not be adequate for applications such as              [13] Building an European Research Community through Interoperable
docking.                                                                       Workflows and Data. [Online]. Available: http://www.erflow.eu/
[14] M. Kozlovszky, K. Karoczkai, I. Marton, A. Balasko, A. Marosi,
     and P. Kacsuk, “Enabling generic distributed computing infrastructure
     compatibility for workflow management systems,” Computer Science,
     vol. 13, no. 3, pp. 61–78, 2012.
[15] K. Achilleos, C. Kannas, C. Nicolaou, C. Pattichis, and V. Promponas,
     “Open source workflow systems in life sciences informatics,” in
     Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International
     Conference on. IEEE, 2012, pp. 552–558.
[16] CloudBroker: High Performance Computing Software as a Service -
     Integration in KNIME. [Online]. Available:
     http://www.knime.org/files/10 CloudBroker.pdf
[17] M. R. Berthold, N. Cebron, F. Dill, G. D. Fatta, T. R. Gabriel, F. Georg,
     T. Meinl, P. Ohl, C. Sieb, and B. Wiswedel, “Knime: The konstanz
     information miner,” in Proceedings of the Workshop on Multi-Agent
     Systems and Simulation (MAS&S), 4th Annual Industrial Simulation
     Conference (ISC), 2006, pp. 58–61.
[18] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter,
     T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel, “Knime - the
     konstanz information miner: version 2.0 and beyond,” SIGKDD Explor.
     Newsl., vol. 11, no. 1, pp. 26–31, Nov. 2009. [Online]. Available:
     http://doi.acm.org/10.1145/1656274.1656280
[19] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl,
     P. Ohl, C. Sieb, K. Thiel, and B. Wiswedel, “Knime: The konstanz
     information miner,” in Data Analysis, Machine Learning and Applications -
     Proceedings of the 31st Annual Conference of the Gesellschaft für
     Klassifikation e.V., Studies in Classification, Data Analysis, and
     Knowledge Organization. Berlin, Germany: Springer, 2007, pp. 319–326.
[20] KNIME Application Programming Interface. [Online]. Available:
     http://tech.knime.org/docs/api/
[21] MTA-SZTAKI LPDS. grid and cloud User Support Environment Architecture.
     [Online]. Available: http://www.guse.hu/?m=architecture&s=0
[22] Schrödinger KNIME Extensions. [Online]. Available:
     http://www.schrodinger.com/productpage/14/8/
[23] ChemAxon’s JChem Nodes on the KNIME workbench. [Online]. Available:
     http://www.chemaxon.com/library/chemaxons-jchem-nodes-on-the-knime-workbench/
[24] M. Röttig, “Combining sequence and structural information into
     predictors of enzymatic activity,” Ph.D. dissertation, University of
     Tübingen, Nov. 2012.
[25] M. Röttig, S. Aiche, L. de la Garza, and B. Kahlert. Generic KNIME Nodes.
     [Online]. Available:
     https://github.com/genericworkflownodes/GenericKnimeNodes
[26] OpenMS Team. Common Tool Description. [Online]. Available:
     http://open-ms.sourceforge.net/workflow-integration/topp-and-common-tool-description/
[27] SeqAn Team. SeqAn. [Online]. Available: http://www.seqan.de/
[28] OpenMS Team. OpenMS. [Online]. Available:
     http://open-ms.sourceforge.net/
[29] BALL Team. Computer Aided Drug Design Suite - CADDSuite.
     [Online]. Available: http://www.ball-project.org/caddsuite
[30] The Eclipse Foundation. Plug-In Development Environment. [Online].
     Available: http://www.eclipse.org/pde/
[31] E. W. Dijkstra, “On the role of scientific thought,” in Selected Writings
     on Computing: A Personal Perspective. Springer, 1982, pp. 60–66.
[32] S. Gesing, R. Grunzke, J. Krüger, G. Birkenheuer, M. Wewior,
     P. Schäfer, B. Schuller, J. Schuster, S. Herres-Pawlis, S. Breuers,
     A. Balaskó, M. Kozlovszky, A. S. Fabri, L. Packschies,
     P. Kacsuk, D. Blunk, T. Steinke, A. Brinkmann, G. Fels,
     R. Müller-Pfefferkorn, R. Jäkel, and O. Kohlbacher, “A Single
     Sign-On Infrastructure for Science Gateways on a Use Case
     for Structural Bioinformatics,” Journal of Grid Computing,
     vol. 10, no. 4, pp. 769–790, Nov. 2012. [Online]. Available:
     http://www.springerlink.com/index/10.1007/s10723-012-9247-y