                       8th International Workshop on Science Gateways (IWSG 2016), 8-10 June 2016



    From the Desktop to the Grid and Cloud:
Conversion of KNIME Workflows to WS-PGRADE
Luis de la Garza
Center for Bioinformatics, Dept. of Computer Science
University of Tübingen, Germany
delagarza@informatik.uni-tuebingen.de

Fabian Aicheler
Center for Bioinformatics, Dept. of Computer Science
University of Tübingen, Germany

Oliver Kohlbacher
Center for Bioinformatics, Dept. of Computer Science, Faculty of Medicine
Quantitative Biology Center, University of Tübingen
Max Planck Institute for Developmental Biology
Germany

   Abstract—Computational analyses for research usually consist of a complicated orchestration of data flows, software libraries, visualization, selection of adequate parameters, etc. Structuring these complex activities into a collaboration of simple, reproducible and well-defined tasks brings down complexity and increases reproducibility. This is the basic notion of workflows.
   Workflow engines allow users to create and execute workflows, each engine having unique features. In some cases, certain features offered by platforms are royalty-based, hindering use in the scientific community.
   We present our efforts to convert whole workflows created in the Konstanz Information Miner Analytics Platform to the Web Services Parallel Grid Runtime and Developer Environment. We see the former as a great workflow editor due to its considerable user base and user-friendly graphical interface. We deem the latter a great back-end engine able to interact with most major distributed computing interfaces. We introduce work that provides a platform-independent tool representation, thus assisting in the conversion of whole workflows. We also present the challenges inherent to workflow conversion across systems, as well as those posed by the conversion between the chosen workflow engines, along with our proposed solution to overcome these challenges.
   The combined features of these two platforms (i.e., intuitive workflow design on a desktop computer and execution of workflows on distributed high-performance computing interfaces) greatly benefit researchers and minimize time spent on technical chores not directly related to their area of research.
   Keywords—WS-PGRADE, KNIME, conversion, fine-grained interoperability, workflow

I. INTRODUCTION

   Computers are essential in various scientific fields. Example domains requiring high-performance computing (HPC) include vaccine design, astrophysics, or the multidisciplinary field of bioinformatics. Here, the declining costs of both data generation and storage in the last few years [1] have pushed bioinformaticians into using HPC resources such as grids and clouds.
   Simultaneously, the scope of research is becoming ever more refined and complex. As such, upholding the scientific method increases in difficulty: being able to reproduce previously observed results while keeping all variables constant can often be an arduous task. Consequently, journals and news outlets have repeatedly reported cases of published but irreproducible results [2], [3], [4].
   Researchers often break down big, complicated analyses into smaller units of work that are easier to manage. These so-called tasks perform one specific function and take an input along with controlling parameters to produce a defined output. Input usually takes the form of files, whereas output could also be, for example, a set of visualizations. The combination of tasks is often referred to as a workflow. Task outputs can be passed on as inputs to other tasks, defining an order of execution for each step of the comprising workflow. Adoption of workflows not only increases reproducibility but also offers the following benefits:
   • Storage of intermediate results (e.g., for troubleshooting, additional analysis, bottleneck identification)
   • Simplified substitution of single tasks (e.g., for benchmarking or testing purposes)
   • Parallel execution of workflow branches (i.e., parameter sweep)
   • Reusability of components
   • Independent, parallel development of specialized tasks

A. Workflow Interoperability and Conversion

   Throughout this work we will use workflow terminology and representation consistent with our previous work [5], [6]. Figures 1 and 2 briefly summarize this.

Fig. 1. The abstract layer of a workflow. Vertices represent tasks, edges indicate the execution order. At this point, no implementation or technical details are represented.

   Since the abstract workflow layer contains solely application domain information, it is independent of the execution requirements. Thus, the abstract layer remains unchanged across workflow engines. In contrast, the concrete workflow layer, the workflow engine and the executing platform are tightly coupled. This divergence of concrete layers across engines makes workflow interoperability challenging. Furthermore,
workflow engines often contain distinct features, complicating conversion across platforms.

Fig. 2. The concrete layer of a workflow. The concrete layer contains implicit application domain information but, unlike the abstract layer, vertices are annotated with extra attributes: the resources required to execute the portrayed tasks.

   One way to alleviate these problems is the development of platform-independent workflow representations, e.g., the Interoperable Workflow Intermediate Representation (IWIR) [7] and Yet another Workflow Language (YAWL) [8], to enable fine-grained interoperability (FGI). However, platform-independent workflow representations do not address workflow implementations. The Sharing interoperable Workflows for large-scale scientific Simulation on available distributed computing interfaces project (SHIWA) [9], for instance, provides execution of workflows built on different workflow engines by uploading them to the SHIWA Simulation Platform. Users handling data subject to privacy restrictions (e.g., patient data) might find it an unsuitable solution.
   A proper workflow conversion across engines requires that the abstract layer remain unchanged (i.e., source and target workflow can be considered logically equivalent). The location of resources and the way different engines implement single nodes and logical constructs (e.g., parameter sweep) are some of the aspects to be considered. Features unique to one engine represent a complication. Figure 3 shows an example of a simplified workflow conversion.

Fig. 3. Workflow conversion challenges. Two different engines (i.e., e1, e2) running on two different platforms (i.e., p1, p2) contain different concrete layers of the same workflow. The abstract layer, however, remains unchanged. A successful workflow conversion must take into account not only the differences between the source and target engines, but also the source and target platforms or operating systems.

II. IMPLEMENTATION

   The Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) [10] is a web-based workflow engine that interacts with a wide array of resource managers (e.g., Moab, LSF) to access distributed computing interfaces (DCIs). This makes it a great back-end workflow execution engine. Tasks of the same workflow can be executed on different DCIs. However, workflow creation is a multi-step process, posing problems for users without adequate training.
   The Konstanz Information Miner Analytics Platform (KNIME Analytics Platform) [11] is hosted on a personal computer. It features an intuitive interface and contains more than 1,000 pre-loaded tools and hundreds of sample workflows. Addition of new tools requires knowledge of the Java programming language—an aspect that might keep some users away from this feature. A couple of royalty-based variants (i.e., the so-called KNIME Collaborative Extensions) are offered to remotely execute workflows; however, WS-PGRADE offers wider support for resource managers to access DCIs.
   We focus on providing fine-grained interoperability between a great workflow editor such as the KNIME Analytics Platform and a versatile, scalable workflow execution platform such as WS-PGRADE.
   The first step to provide interoperability is to represent tasks in a platform-independent manner. Certain attributes of tool execution remain unchanged across platforms (e.g., version and parameters), while others change (e.g., the location of executables, input and output files). Attributes in need of adjustment have to be identified. A platform-independent tool representation facilitates the conversion of tasks across platforms and thus the conversion of full workflows.
   One of the first challenges in the conversion between these engines is the maintenance of a database that relates tools on the user's computer with tools on each of the target DCI platforms. The next set of challenges concerns the implementation of nodes and logical workflow constructs. The KNIME Analytics Platform implements parameter sweep via node-delimited workflow sections (i.e., using ZipLoopStart and ZipLoopEnd nodes). WS-PGRADE delimits such sections with generator and collector ports. Furthermore, WS-PGRADE allows users to assign data files directly to input ports. The KNIME Analytics Platform, however, requires a dedicated node (e.g., Input File, Input Files) whose output port refers to a file; this reference can then be channeled to an input port.
   Some features present in the KNIME Analytics Platform are not found in WS-PGRADE. The former requires ports to declare which data types they are compatible with and supports file lists as inputs; the latter is more flexible and lacks native support for file lists as inputs (i.e., each input or output port is related to exactly one file). In contrast to WS-PGRADE, KNIME Nodes produce outputs not only via output ports: they can also set flow variables, which can be read further down the execution flow.

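The parameter sweep translation outlined above (ZipLoop-delimited sections versus generator and collector ports) can be pictured as a small rewrite of the workflow graph. The following is an illustrative sketch only; the data model and function name are ours, not KNIME2gUSE internals:

```python
def translate_sweep(chain):
    """Rewrite a linear KNIME section into its WS-PGRADE shape.

    chain: ordered list of (node_name, node_type) pairs. ZipLoopStart and
    ZipLoopEnd nodes are dropped; the node following ZipLoopStart is marked
    with a generator input port (one job per incoming item), and the node
    preceding ZipLoopEnd with a collector output port (gather all results).
    """
    converted, ports = [], {}
    for i, (name, ntype) in enumerate(chain):
        if ntype == "ZipLoopStart":
            ports.setdefault(chain[i + 1][0], set()).add("generator")
        elif ntype == "ZipLoopEnd":
            ports.setdefault(chain[i - 1][0], set()).add("collector")
        else:
            converted.append(name)
    return converted, ports
```

A single-node loop body thus ends up with both a generator input and a collector output, mirroring how a sweep section collapses when the loop-delimiting nodes disappear.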
   The KNIME Analytics Platform is a Java program with a graphical interface. KNIME Nodes are thus instances of Java classes that live inside the process which launched the KNIME Analytics Platform. In other words, they require a running instance of the KNIME Analytics Platform to be executed, making their execution on a DCI a challenge.
   The following sections describe our approach to addressing these challenges.

A. Conversion of Nodes: Addressing Disparities between Workflow Engines

   The KNIME Analytics Platform features a node repository from which users can select any of the available nodes (see Figure 4). Creation of workflows in the KNIME Analytics Platform requires a single step; thus, the abstract and concrete layers are merged into the user-friendly workflow editor. Each KNIME Node performs a specific task and defines a fixed number of input and output ports. Each port is associated with a port type, which is similar to a content type (e.g., csv, pdb). Only ports of compatible types can be interconnected. Furthermore, KNIME Nodes rely on the assumption that incoming and outgoing data are arranged in custom in-memory data tables. Each KNIME Node iterates over the rows of incoming data and is able to modify the contents of the input table, as well as its structure (e.g., by adding columns or rows). File handling is done using these same data tables, their cells containing uniform resource identifiers (URIs) pointing to the needed files.

Fig. 4. The KNIME Analytics Platform node repository. Available nodes can be selected from the node repository by dragging and dropping them into the current workflow editor.

   WS-PGRADE, on the other hand, requires the creation of an abstract and a concrete workflow in a multi-step process (see Figure 5). During the creation of the concrete workflow, users input the required attributes and command line to associate a node with a specific remote binary. In contrast to the KNIME Analytics Platform, WS-PGRADE allows files to be assigned directly to input ports and does not perform strict type checking: any output port can be connected to any input port. Additionally, the structure of the incoming and outgoing files is arbitrary.

Fig. 5. The multi-step creation of workflows in WS-PGRADE. An abstract workflow is first created (left pane), after which a concrete workflow can be created and configured (right pane).

   Adding nodes to the KNIME Analytics Platform requires knowledge of the Java programming language. Generic KNIME Nodes (GKN) [5], [6] was developed to add nodes without programming experience by allowing arbitrary command line tools to behave as KNIME Nodes and to seamlessly interact with other nodes inside the KNIME Analytics Platform. The only requirement is the representation of the tools by Common Tool Descriptors (CTDs), which are XML files describing the inputs, outputs and parameters of a tool [5], [6]. Currently, several software suites [12], [13], [14] are able to parse and generate CTDs (i.e., they are CTD-enabled). Figure 6 illustrates how CTDs interact with CTD-enabled tools.
   We introduce KNIME2gUSE, an extension to the KNIME Analytics Platform which converts workflows from the KNIME Analytics Platform to WS-PGRADE, combining the features of both engines and overcoming their disadvantages.
   Conversion of KNIME Nodes that were imported using GKN is somewhat trivial. Each of these nodes represents an external tool that is independent of the KNIME Analytics Platform. In this case, the matching binary for the represented tool is required on each of the target DCIs.
   We identify native nodes as those KNIME Nodes that were not imported using GKN (i.e., pre-packaged nodes, nodes added as third-party extensions or nodes added by the user via other means). Each native KNIME Node is an instance of a Java class managed by the KNIME Analytics Platform. Such nodes exist only in the context of the process that hosts the KNIME Analytics Platform. Execution of a single KNIME Node requires a running instance of the KNIME Analytics Platform, and converting these nodes is not trivial. Furthermore, a suitable distribution of the KNIME Analytics Platform must be present on each of the target DCIs.
   Data between KNIME Nodes can only be channeled between ports with compatible data types. Since channeled data are in-memory representations of table-formatted data (i.e., data tables), we have devised a solution that allows native KNIME Nodes to be executed as if they were command line tools: during the export process, native KNIME Nodes are individually packed into a small KNIME workflow. Each such



generated workflow contains a copy of the original node, along
with any user-established settings. Since inputs and outputs
for the exported node won’t be channeled inside an instance
of the KNIME Analytics Platform, extra reader and writer
nodes (i.e., Table Reader and Table Writer) are also included
in this small workflow. These nodes allow the serialization
and deserialization of the in-memory data format required
by native KNIME Nodes. The KNIME Analytics Platform
can execute workflows in a so-called batch mode, without
the need of a graphical user interface. A suitable command
line is automatically generated during our export process.
When the batch mode execution of this generated workflow
is started, input files will be read into the KNIME data table
format; upon completion, any output will be serialized from
the KNIME data table format into a file.
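The generated command line follows KNIME's documented batch mode; the sketch below assembles such an invocation. The function name, paths and the exact argument set are illustrative assumptions, and the actual arguments KNIME2gUSE emits may differ:

```python
def batch_command(knime_root, workflow_dir):
    """Assemble a headless (batch-mode) KNIME invocation for the
    mini-workflow that wraps a single native node. The Table Reader and
    Table Writer nodes inside that workflow handle the deserialization of
    input files and the serialization of outputs."""
    return [
        f"{knime_root}/knime",
        "-nosplash",  # no GUI is available on the DCI
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]

# Hypothetical target-DCI paths, for illustration only:
cmd = batch_command("/opt/knime", "/scratch/job42/row_splitter_wf")
```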
   The work previously presented in [5], [6] introduced our efforts in this field and showcased the conversion of KNIME workflows composed solely of nodes imported via GKN. We have extended KNIME2gUSE to convert workflows composed of any kind of nodes. Figures 7 and 8 depict how the conversion of nodes is performed.

Fig. 7. Conversion of native KNIME nodes. Since Row Splitter exists only in the context of the KNIME Analytics Platform, it requires an instance of a KNIME Analytics Platform for its execution. KNIME2gUSE generates a workflow containing the required input/output nodes and a copy of Row Splitter with the same configuration settings as its origin node. The generated command line invokes a KNIME Analytics Platform on the target DCI in batch mode, reads input from a file and writes outputs to files.
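The structure of the wrapper workflow generated for a native node (cf. Figure 7) can be pictured as follows. This is a hypothetical sketch of the idea, not the actual data structures produced by KNIME2gUSE:

```python
def wrap_native_node(node_name, settings, n_inputs=1, n_outputs=1):
    """Spec of the mini-workflow wrapping one native node: one Table Reader
    per input port to deserialize KNIME tables from files, a copy of the
    node carrying its user-established settings, and one Table Writer per
    output port to serialize results back to files."""
    readers = [f"Table Reader #{i}" for i in range(n_inputs)]
    writers = [f"Table Writer #{i}" for i in range(n_outputs)]
    edges = ([(r, node_name) for r in readers]
             + [(node_name, w) for w in writers])
    return {"nodes": readers + [node_name] + writers,
            "settings": {node_name: dict(settings)},
            "edges": edges}
```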




Fig. 6. A CTD in action. Top section: parameters needed for the tool Ligand3DGenerator [14] to be executed, namely input file, output file and the desired force field. Middle section: a sample CTD snippet representing an execution of the Ligand3DGenerator using the shown parameter values. Bottom section: a CTD-enabled tool with the given sample CTD.
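As a concrete illustration of the descriptor shown in Figure 6, the snippet below sketches a CTD-style XML fragment for the Ligand3DGenerator example, together with a tiny parser. Element and attribute names are simplified for illustration and do not follow the normative CTD schema:

```python
import xml.etree.ElementTree as ET

# Illustrative CTD-style descriptor: an input file, an output file and a
# force-field parameter, as in the Figure 6 example. File names and the
# element layout are assumptions, not the exact CTD format.
CTD_SNIPPET = """\
<tool name="Ligand3DGenerator" version="1.0">
  <PARAMETERS>
    <ITEM name="i" value="ligands.sdf" type="input-file"/>
    <ITEM name="o" value="ligands3d.sdf" type="output-file"/>
    <ITEM name="ff" value="MMFF94" type="string"/>
  </PARAMETERS>
</tool>
"""

def read_parameters(ctd_xml):
    """Parse a CTD-style snippet into a {parameter name: value} mapping."""
    root = ET.fromstring(ctd_xml)
    return {item.get("name"): item.get("value")
            for item in root.iter("ITEM")}
```

A CTD-enabled tool would read such a file to recover its invocation, which is what lets GKN treat arbitrary command line tools as KNIME Nodes.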
B. Conversion of Workflows: Exporting KNIME Workflows to WS-PGRADE

   The KNIME2gUSE plug-in produces files that can be imported into WS-PGRADE, ready to be executed on any configured DCI with minor modifications.
   We have chosen WS-PGRADE as the target engine for the export process because it interacts directly with a wide selection of resource and cloud managers (a feature not present in the royalty-based KNIME editions that allow remote execution). It also features workflow submission, control, monitoring and statistics. These are functionalities which resource managers or cloud engines often lack.
   The KNIME Analytics Platform natively supports the association of single input/output ports with a file list determined at runtime, a functionality not present in WS-PGRADE. To overcome this, KNIME2gUSE automatically generates a wrapper script that zips the corresponding files into a single archive. To translate parameter sweep sections, the conversion removes the KNIME Analytics Platform ZipLoopStart and ZipLoopEnd nodes and substitutes suitable WS-PGRADE generator and collector ports.

Fig. 8. Conversion of GKN-imported nodes. The nodes depicted on the left side directly interact with binaries located on the user's desktop computer. The right column shows the mapped binaries and an equivalent execution on a target DCI. Since Ligand3DGenerator is CTD-enabled, a suitable CTD file can be generated; this is not the case for the blastn tool.

C. Example Application: Biomarker Discovery in Metabolomics

   Metabolomics is a mass spectrometry-based approach aimed at evaluating the entirety of a metabolite sample. Applications include the tracking of chemicals and their transformation products in waste water [15], identification of cancer types via biomarkers [16], [17] and elucidation of disease-underlying mechanisms [18]. Compared to complementary omics technologies (e.g., transcriptomics, proteomics), metabolomics is closer to the actual biochemical processes that occur, making it attractive for biomarker development.
   A common analysis approach for studies interested in comparative metabolite concentrations is label-free quantification. The independence from chemical labels allows the direct comparison of small molecules across an arbitrary number of samples. As a consequence, the need to evaluate hundreds of gigabyte-sized samples in concert is already common. Numbers and sizes of concurrently evaluated samples are steadily increasing, emphasizing the necessity for distributed computing.

Fig. 9. Metabolomics biomarker discovery workflow in both the KNIME Analytics Platform and WS-PGRADE. Top: workflow implemented in the KNIME Analytics Platform; OpenMS [12] was imported using GKN. Bottom: workflow generated by KNIME2gUSE, imported into WS-PGRADE (workflow slightly edited for visual clarity). Even if some elements are missing after conversion (e.g., ZipLoopStart, ZipLoopEnd, Input File nodes), both versions have the same abstract layer. This is due to the difference in implementation of logically equivalent constructs, such as parameter sweep, in WS-PGRADE and the KNIME Analytics Platform.

   We provide an example workflow for metabolomics biomarker discovery using OpenMS [12] for the mass spectrometry algorithms as well as various native KNIME Nodes (including nodes for the R scripting language). The KNIME workflow and its converted WS-PGRADE version are shown in Figure 9. We assume some initial preparations were performed prior to the execution of the workflow, namely conversion from closed mass spectrometer vendor formats to the open mzML format and data reduction by means of peak picking, both of which could also be implemented in KNIME via OpenMS [12] tools.
   Using a detection method for so-called small molecules [19], we adapted a label-free quantification pipeline [20]. The quantification part of our biomarker discovery workflow consists of sample-specific feature detection (i.e., finding the convex hulls and respective centroids of analyte mass traces), followed by temporal alignment of samples and the quantification of corresponding features across samples.
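The wrapper script mentioned in Section II-B, which bridges KNIME's runtime-determined file lists and WS-PGRADE's one-file-per-port model by zipping the list into a single archive, might look roughly like the following sketch (the function name and the flat archive layout are our assumptions, not the script KNIME2gUSE actually emits):

```python
import os
import zipfile

def pack_file_list(paths, archive_path):
    """Zip a runtime-determined list of files into one archive so that it
    can travel through a WS-PGRADE port, which carries exactly one file."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))  # flat archive layout
    return archive_path
```

A matching unpack step on the receiving side would restore the individual files before the downstream job runs.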

   Downstream small molecule identification was done via mass-based search in the Human Metabolome Database. Included sample normalization allows for the comparison of analyte abundances across samples. Analytes whose abundances vary significantly after false discovery rate correction are annotated with the mass-based identifications and exported to a Microsoft Excel spreadsheet (XLS format).

III. FUTURE WORK

   The KNIME Analytics Platform features Metanodes, which encapsulate complete workflows. We would like to extend KNIME2gUSE to support their conversion. Furthermore, seeing that considerable effort has been put into creating platform-independent workflow representation formats, we would like to add IWIR and YAWL file generation to KNIME2gUSE. We would also like to extend our converter to support other workflow engines, such as Galaxy.

IV. CONCLUSION

   Workflows assist reproducibility and minimize time spent validating research by reducing analysis complexity. There are currently several workflow engines with user-friendly interfaces that support remote execution of workflows. However, we feel that their scalability and support of major resource managers is still lacking. In contrast, HPC infrastructures and their resource managers rarely support the execution and control of workflows. As a consequence, HPC users often require programming skills to handle the channeling of data as well as to submit, monitor and control the respective computing jobs.
   We present our efforts to support workflow export from

REFERENCES

[3] "Trouble at the lab," The Economist, Oct. 2013.
[4] M. Baker, "Over half of psychology studies fail reproducibility test," Nature, Aug. 2015.
[5] L. de la Garza, J. Veit, A. Szolek, M. Röttig, S. Aiche, S. Gesing, K. Reinert, and O. Kohlbacher, "From the desktop to the grid: scalable bioinformatics via workflow conversion," BMC Bioinformatics, vol. 17, no. 1, pp. 1–12, 2016.
[6] L. de la Garza, J. Krüger, C. Schärfe, M. Röttig, S. Aiche, K. Reinert, and O. Kohlbacher, "From the desktop to the grid: conversion of KNIME workflows to gUSE," in IWSG, 2013.
[7] K. Plankensteiner, J. Montagnat, and R. Prodan, "IWIR: a language enabling portability across grid workflow systems," in SIGMOD Rec., vol. 34, no. 3, 2011, pp. 97–106.
[8] W. van der Aalst and A. ter Hofstede, "YAWL: yet another workflow language," Information Systems, vol. 30, no. 4, pp. 245–275, Jun. 2005.
[9] G. Terstyanszky, T. Kukla, T. Kiss, P. Kacsuk, A. Balasko, and Z. Farkas, "Enabling scientific workflow sharing through coarse-grained interoperability," Future Generation Computer Systems, vol. 37, pp. 46–59, 2014.
[10] P. Kacsuk, Z. Farkas, M. Kozlovszky, G. Hermann, A. Balasko, K. Karoczkai, and I. Marton, "WS-PGRADE/gUSE generic DCI gateway framework for a large variety of user communities," Journal of Grid Computing, vol. 10, no. 4, pp. 601–630, 2012.
[11] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel, "KNIME: the Konstanz information miner," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, p. 26, Nov. 2009.
[12] M. Sturm, A. Bertsch, C. Gröpl, A. Hildebrandt, R. Hussong, E. Lange, N. Pfeifer, O. Schulz-Trieglaff, A. Zerck, K. Reinert, and O. Kohlbacher, "OpenMS: an open-source software framework for mass spectrometry," BMC Bioinformatics, vol. 9, p. 163, 2008.
[13] A. Döring, D. Weese, T. Rausch, and K. Reinert, "SeqAn: an efficient, generic C++ library for sequence analysis," BMC Bioinformatics, vol. 9, no. 1, p. 11, Jan. 2008.
[14] A. Hildebrandt, A. K. Dehof, A. Rurainski, A. Bertsch, M. Schumann, N. C. Toussaint, A. Moll, D. Stöckel, S. Nickels, S. C. Mueller, H.-P. Lenhof, and O. Kohlbacher, "BALL: biochemical algorithms library 1.3," BMC Bioinformatics, vol. 11, p. 531, 2010.
[15] E. L. Schymanski, H. P. Singer, P. Longrée, M. Loos, M. Ruff, M. A. Stravs, C. Ripollés Vidal, and J. Hollender, "Strategies to characterize polar organic contamination in wastewater: exploring the capability of high resolution mass spectrometry," Environmental science & technol-
the KNIME Analytics Platform to WS-PGRADE, identified                                  ogy, vol. 48, no. 3, pp. 1811–8, Jan. 2014.
challenges for both node and workflow conversion and detailed                     [16] M. Sugimoto, D. T. Wong, A. Hirayama, T. Soga, and M. Tomita, “Capil-
our solutions. KNIME offers remote workflow execution, but it                          lary electrophoresis mass spectrometry-based saliva metabolomics iden-
                                                                                       tified oral, breast and pancreatic cancer-specific profiles.” Metabolomics
is a royalty-based solution and support of DCIs is limited—an                          : Official journal of the Metabolomic Society, vol. 6, no. 1, pp. 78–95,
aspect in which WS-PGRADE excels. KNIME2gUSE brings                                    mar 2010.
together a user-friendly and intuitive workflow engine for                        [17] C. Denkert, J. Budczies, T. Kind, W. Weichert, P. Tablack, J. Sehouli,
                                                                                       S. Niesporek, D. Könsgen, M. Dietel, and O. Fiehn, “Mass spectrometry-
personal computers together with a scalable HPC workflow                               based metabolic profiling reveals different metabolite patterns in invasive
platform that interacts with several DCIs.                                             ovarian carcinomas and ovarian borderline tumors.” Cancer research,
   We thus provide the individual advantages of both engines                           vol. 66, no. 22, pp. 10 795–804, Nov. 2006.
                                                                                  [18] J. S. Hansen, X. Zhao, M. Irmler, X. Liu, M. Hoene, M. Scheler,
without any of their shortcomings. Overall, our methods                                Y. Li, J. Beckers, M. Hrab? de Angelis, H.-U. Häring, B. K. Pedersen,
decrease time spent designing workflows and troubleshooting                            R. Lehmann, G. Xu, P. Plomgaard, and C. Weigert, “Type 2 diabetes
conversion for different workflow engines.                                             alters metabolic and transcriptional signatures of glucose and amino acid
                                                                                       metabolism during exercise and recovery.” Diabetologia, vol. 58, no. 8,
                        ACKNOWLEDGMENT                                                 pp. 1845–54, Aug. 2015.
                                                                                  [19] E. Kenar, H. Franken, S. Forcisi, K. Wörmann, H.-U. Häring,
   The authors would like to thank Bernd Wiswedel, Thorsten                            R. Lehmann, P. Schmitt-Kopplin, A. Zell, and O. Kohlbacher,
Meinl, Patrick Winter and Michael Berthold for their support,                          “Automated label-free quantification of metabolites from liquid
                                                                                       chromatography-mass spectrometry data.” Molecular & cellular pro-
patience and help in developing the KNIME2gUSE extension.                              teomics : MCP, vol. 13, no. 1, pp. 348–59, jan 2014.
   This work was supported by the German Network                                  [20] H. Weisser, S. Nahnsen, J. Grossmann, L. Nilse, A. Quandt, H. Brauer,
for Bioinformatics Infrastructure (Deutsches Netzwerk für                             M. Sturm, E. Kenar, O. Kohlbacher, R. Aebersold, and L. Malmström,
                                                                                       “An automated pipeline for high-throughput label-free quantitative pro-
Bioinformatik-Infrastruktur, de.NBI).                                                  teomics.” Journal of proteome research, vol. 12, no. 4, pp. 1628–44,
                                                                                       Apr. 2013.