=Paper=
{{Paper
|id=Vol-1871/paper10
|storemode=property
|title=From the Desktop to the Grid and Cloud: Conversion of KNIME Workflows to WS-PGRADE
|pdfUrl=https://ceur-ws.org/Vol-1871/paper10.pdf
|volume=Vol-1871
|authors=Luis de la Garza,Fabian Aicheler,Oliver Kohlbacher
|dblpUrl=https://dblp.org/rec/conf/iwsg/GarzaAK16
}}
==From the Desktop to the Grid and Cloud: Conversion of KNIME Workflows to WS-PGRADE==
8th International Workshop on Science Gateways (IWSG 2016), 8-10 June 2016

Luis de la Garza (Center for Bioinformatics, Dept. of Computer Science, University of Tübingen, Germany; delagarza@informatik.uni-tuebingen.de), Fabian Aicheler (Center for Bioinformatics, Dept. of Computer Science, University of Tübingen, Germany), Oliver Kohlbacher (Center for Bioinformatics, Dept. of Computer Science, Faculty of Medicine, and Quantitative Biology Center, University of Tübingen, Germany; Max Planck Institute for Developmental Biology, Germany)

Abstract—Computational analyses for research usually consist of a complicated orchestration of data flows, software libraries, visualization, selection of adequate parameters, etc. Structuring these complex activities into a collaboration of simple, reproducible and well-defined tasks reduces complexity and increases reproducibility. This is the basic notion of workflows. Workflow engines allow users to create and execute workflows, each engine having unique features. In some cases, certain features offered by platforms are royalty-based, hindering use in the scientific community. We present our efforts to convert whole workflows created in the Konstanz Information Miner Analytics Platform to the Web Services Parallel Grid Runtime and Developer Environment. We see the former as a great workflow editor due to its considerable user base and user-friendly graphical interface. We deem the latter a great back-end engine able to interact with most major distributed computing interfaces. We introduce work that provides a platform-independent tool representation, thus assisting in the conversion of whole workflows. We also present the challenges inherent to workflow conversion across systems, as well as those posed by the conversion between the chosen workflow engines, along with our proposed solutions. The combined features of these two platforms (i.e., intuitive workflow design on a desktop computer and execution of workflows on distributed high-performance computing interfaces) greatly benefit researchers and minimize time spent on technical chores not directly related to their area of research.

Keywords—WS-PGRADE, KNIME, conversion, fine-grained interoperability, workflow

I. INTRODUCTION
Computers are essential in various scientific fields. Example domains requiring high-performance computing (HPC) include vaccine design, astrophysics, and the multidisciplinary field of bioinformatics. Here, the declining costs of both data generation and storage in the last few years [1] have pushed bioinformaticians into using HPC resources such as grids and clouds.

Simultaneously, the scope of research is becoming ever more refined and complex. As such, upholding the scientific method increases in difficulty: being able to reproduce previously observed results while keeping all variables constant can often be an arduous task. Consequently, journals and news outlets have repeatedly reported cases of published but irreproducible results [2], [3], [4].

Researchers often break down big, complicated analyses into smaller units of work that are easier to manage. These so-called tasks perform one specific function and take an input along with controlling parameters to produce a defined output. Input usually takes the form of files, whereas output could also be, for example, a set of visualizations. The combination of tasks is often referred to as a workflow. Task outputs can be passed on as inputs to other tasks, defining an order of execution for each step of the comprising workflow. Adoption of workflows not only increases reproducibility but also offers the following benefits:

• Storage of intermediate results (e.g., for troubleshooting, additional analysis, bottleneck identification)
• Simplified substitution of single tasks (e.g., for benchmarking or testing purposes)
• Parallel execution of workflow branches (e.g., parameter sweeps)
• Reusability of components
• Independent, parallel development of specialized tasks

A. Workflow Interoperability and Conversion

Throughout this work we use workflow terminology and representation consistent with our previous work [5], [6]; Figures 1 and 2 briefly summarize it.

Fig. 1. The abstract layer of a workflow. Vertices represent tasks, edges indicate the execution order. At this point, no implementation or technical details are represented.

Fig. 2. The concrete layer of a workflow. The concrete layer contains implicit application domain information, but unlike the abstract layer, vertices are annotated with extra attributes: the resources required to execute the portrayed tasks.

Since the abstract workflow layer contains solely application domain information, it is independent of the execution requirements. Thus, the abstract layer remains unchanged across workflow engines. In contrast, the concrete workflow layer, the workflow engine and the executing platform are tightly coupled. This divergence of concrete layers across engines makes workflow interoperability challenging. Furthermore, workflow engines often have distinct features, complicating the conversion across platforms.
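To make the layered terminology concrete, here is a minimal Python sketch, assuming nothing beyond the definitions above; all names are illustrative and not taken from any particular engine.

```python
# Minimal sketch of the two workflow layers described above.
# The abstract layer holds only tasks (vertices) and execution order (edges).
abstract_workflow = {
    "tasks": ["ConvertInput", "RunAnalysis", "PlotResults"],
    "edges": [("ConvertInput", "RunAnalysis"), ("RunAnalysis", "PlotResults")],
}

# The concrete layer annotates the same vertices with the resources needed
# on one specific engine/platform combination (here: a hypothetical e1 on p1).
concrete_workflow_e1_p1 = {
    task: {
        "binary": f"/opt/tools/bin/{task}",  # executable location differs per platform
        "queue": "long",                     # resource manager setting
        "memory_gb": 8,
    }
    for task in abstract_workflow["tasks"]
}

# A conversion to another engine/platform rewrites only these annotations;
# the abstract layer is reused unchanged.
```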
One way to alleviate these problems is the development of platform-independent workflow representations, e.g., the Interoperable Workflow Intermediate Representation (IWIR) [7] and Yet Another Workflow Language (YAWL) [8], to enable fine-grained interoperability (FGI). However, platform-independent workflow representations do not address workflow implementations. The Sharing interoperable Workflows for large-scale scientific Simulation on available distributed computing interfaces project (SHIWA) [9], for instance, provides execution of workflows built on different workflow engines by uploading them to the SHIWA Simulation Platform; users handling data subject to privacy restrictions (e.g., patient data) might find this an unsuitable solution.

A proper workflow conversion across engines requires that the abstract layer remain unchanged (i.e., source and target workflows can be considered logically equivalent). The location of resources and the way different engines implement single nodes and logical constructs (e.g., parameter sweep) are some of the aspects to be considered. Features unique to one engine represent a complication. Figure 3 shows an example of a simplified workflow conversion. A successful workflow conversion must take into account not only the differences between the source and target engines, but must also consider the source and target platforms or operating systems.

Fig. 3. Workflow conversion challenges. Two different engines (i.e., e1, e2) running on two different platforms (i.e., p1, p2) contain different concrete layers of the same workflow. The abstract layer, however, remains unchanged.

We focus on providing fine-grained interoperability between a great workflow editor, the KNIME Analytics Platform, and a versatile, scalable workflow execution platform, WS-PGRADE. The first step towards interoperability is to represent tasks in a platform-independent manner. Certain attributes of tool execution remain unchanged across platforms (e.g., version and parameters), while others change (e.g., the location of executables and of input and output files). Attributes in need of adjustment have to be identified. A platform-independent tool representation facilitates the conversion of tasks across platforms and thus the conversion of full workflows.

II. IMPLEMENTATION

The Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) [10] is a web-based workflow engine that interacts with a wide array of resource managers (e.g., Moab, LSF) to access distributed computing interfaces (DCIs). This makes it a great back-end workflow execution engine: tasks of the same workflow can be executed on different DCIs. However, workflow creation in WS-PGRADE is a multi-step process, posing problems for users without adequate training.

The Konstanz Information Miner Analytics Platform (KNIME Analytics Platform) [11] is hosted on a personal computer. It features an intuitive interface and contains more than 1,000 pre-loaded tools and hundreds of sample workflows. Addition of new tools requires knowledge of the Java programming language, an aspect that might keep some users away from this feature. A couple of royalty-based variants (i.e., the so-called KNIME Collaborative Extensions) are offered to remotely execute workflows; WS-PGRADE, however, offers wider support for resource managers to access DCIs.

One of the first challenges in the conversion between these engines is the maintenance of a database that relates tools on the user's computer to tools on each of the target DCI platforms. The next set of challenges concerns the implementation of nodes and logical workflow constructs. The KNIME Analytics Platform implements parameter sweep via node-delimited workflow sections (i.e., using ZipLoopStart and ZipLoopEnd nodes), whereas WS-PGRADE delimits such sections with generator and collector ports. Furthermore, WS-PGRADE allows users to assign data files directly to input ports; the KNIME Analytics Platform, in contrast, requires a dedicated node (e.g., Input File, Input Files) whose output port refers to a file, a reference that can then be channeled to an input port.

Some features present in the KNIME Analytics Platform are not found in WS-PGRADE. The former requires ports to declare which data types they are compatible with and supports file lists as inputs; the latter is more flexible regarding types but lacks native support for file lists (i.e., each input or output port is related to exactly one file). Unlike in WS-PGRADE, KNIME Nodes produce outputs not only via output ports: they can also set flow variables, which can be read further down the execution flow.

The KNIME Analytics Platform is a Java program with a graphical interface. KNIME Nodes are thus instances of Java classes that live inside the process which launched the KNIME Analytics Platform. In other words, they require a running instance of the KNIME Analytics Platform to be executed, making their execution on a DCI a challenge.
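The first challenge named above, keeping track of which remote binary corresponds to which local tool, amounts to a lookup table keyed by tool and target DCI. The paper does not publish KNIME2gUSE's internal bookkeeping, so the following Python sketch is purely hypothetical:

```python
# Hypothetical sketch of a tool database relating a locally known tool to
# the matching binary on each target DCI; KNIME2gUSE's actual internal
# representation is not described at this level of detail in the paper.
TOOL_DB = {
    # (tool name, version) -> {DCI identifier -> absolute path of the binary}
    ("Ligand3DGenerator", "1.3"): {
        "cluster-moab": "/shared/ball/1.3/bin/Ligand3DGenerator",
        "cloud-lsf": "/opt/ball-1.3/Ligand3DGenerator",
    },
}

def resolve_binary(tool: str, version: str, dci: str) -> str:
    """Return the path of the tool's binary on the chosen DCI, or fail loudly."""
    try:
        return TOOL_DB[(tool, version)][dci]
    except KeyError:
        raise LookupError(f"{tool} {version} is not registered for DCI '{dci}'")
```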
The following sections describe our approach to addressing these challenges.

A. Conversion of Nodes: Addressing Disparities between Workflow Engines

The KNIME Analytics Platform features a node repository from which users can select any of the available nodes (see Figure 4). Creation of workflows in the KNIME Analytics Platform requires a single step; the abstract and concrete layers are thus merged into the user-friendly workflow editor. Each KNIME Node performs a specific task and defines a fixed number of input and output ports. Each port is associated with a port type, which is similar to a content type (e.g., csv, pdb); only ports of compatible types can be interconnected. Furthermore, KNIME Nodes rely on the assumption that incoming and outgoing data are arranged in custom in-memory data tables. Each KNIME Node iterates over the rows of incoming data and is able to modify the contents of the input table, as well as its structure (e.g., by adding columns or rows). File handling is done using these same data tables, their cells containing uniform resource identifiers (URIs) pointing to the needed files.

Fig. 4. The KNIME Analytics Platform node repository. Available nodes can be selected from the node repository by dragging and dropping them into the current workflow editor.

WS-PGRADE, on the other hand, requires the creation of an abstract and a concrete workflow in a multi-step process (see Figure 5). During the creation of the concrete workflow, users input the required attributes and the command line needed to associate a node with a specific remote binary. In contrast to the KNIME Analytics Platform, WS-PGRADE allows files to be assigned directly to input ports and does not perform strict type checking: any output port can be connected to any input port. Additionally, the structure of the incoming and outgoing files is arbitrary.

Fig. 5. The multi-step creation of workflows in WS-PGRADE. An abstract workflow is first created (left pane), after which a concrete workflow can be created and configured (right pane).

Adding nodes to the KNIME Analytics Platform requires knowledge of the Java programming language. Generic KNIME Nodes (GKN) [5], [6] was developed to add nodes without programming experience by allowing arbitrary command-line tools to behave as KNIME Nodes and to seamlessly interact with other nodes inside the KNIME Analytics Platform. The only requirement is the representation of the tools by Common Tool Descriptors (CTDs), XML files describing the inputs, outputs and parameters of a tool [5], [6]. Currently, several software suites [12], [13], [14] are able to parse and generate CTDs (i.e., they are CTD-enabled). Figure 6 illustrates how CTDs interact with CTD-enabled tools.

Fig. 6. A CTD in action. Top section: parameters needed for the tool Ligand3DGenerator [14] to be executed, namely input file, output file and the desired force field. Middle section: a sample CTD snippet representing an execution of Ligand3DGenerator using the shown parameter values. Bottom section: a CTD-enabled tool with the given sample CTD.
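To give a feel for such a descriptor, the following snippet writes a CTD-like XML file for the Ligand3DGenerator example of Figure 6. The element and attribute names below only approximate the real CTD schema; see [5], [6] for the authoritative definition.

```python
# Writes a CTD-like XML descriptor; element names approximate the real
# CTD schema, which is defined in references [5], [6].
import xml.etree.ElementTree as ET

tool = ET.Element("tool", name="Ligand3DGenerator", version="1.3")
params = ET.SubElement(tool, "PARAMETERS")
for name, value in [("in", "ligands.sdf"),       # input file
                    ("out", "ligands_3d.sdf"),   # output file
                    ("ff", "MMFF94")]:           # desired force field
    ET.SubElement(params, "ITEM", name=name, value=value)

# A CTD-enabled tool can read all of its parameter values from this file
# instead of taking them on the command line.
ET.ElementTree(tool).write("run.ctd", xml_declaration=True, encoding="utf-8")
```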
We introduce KNIME2gUSE, an extension to the KNIME Analytics Platform which converts workflows from the KNIME Analytics Platform to WS-PGRADE, combining the features of both engines and overcoming their respective disadvantages.

Conversion of KNIME Nodes that were imported using GKN is fairly straightforward: each of these nodes represents an external tool that is independent of the KNIME Analytics Platform. In this case, a matching binary for the represented tool is required on each of the target DCIs.

We identify as native nodes those KNIME Nodes that were not imported using GKN (i.e., pre-packaged nodes, nodes added as third-party extensions, or nodes added by the user via other means). Each native KNIME Node is an instance of a Java class managed by the KNIME Analytics Platform. Such nodes exist only in the context of the process that hosts the KNIME Analytics Platform. Execution of a single native KNIME Node therefore requires a running instance of the KNIME Analytics Platform, and converting these nodes is not trivial. Furthermore, a suitable distribution of the KNIME Analytics Platform must be present on each of the target DCIs.

Data between KNIME Nodes can only be channeled between ports with compatible data types. Since the channeled data are in-memory representations of table-formatted data (i.e., data tables), we have devised a solution that allows native KNIME Nodes to be executed as if they were command-line tools: during the export process, each native KNIME Node is individually packed into a small KNIME workflow. Each such generated workflow contains a copy of the original node, along with any user-established settings. Since inputs and outputs for the exported node will not be channeled inside an instance of the KNIME Analytics Platform, extra reader and writer nodes (i.e., Table Reader and Table Writer) are also included in this small workflow. These nodes allow the serialization and deserialization of the in-memory data format required by native KNIME Nodes.

The KNIME Analytics Platform can execute workflows in a so-called batch mode, without the need for a graphical user interface. A suitable command line is automatically generated during our export process. When the batch-mode execution of such a generated workflow is started, input files are read into the KNIME data table format; upon completion, any output is serialized from the KNIME data table format into a file.
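As a sketch of what such a generated command line might look like, assuming a KNIME distribution installed on the target DCI (all paths are hypothetical, and the exact flags vary across KNIME versions):

```python
# Hypothetical batch-mode invocation of a wrapper workflow on a target DCI.
# Paths are illustrative; flag spellings vary across KNIME versions.
import subprocess

cmd = [
    "/opt/knime/knime",      # KNIME distribution installed on the target DCI
    "-nosplash",             # no GUI is available (or needed) in batch mode
    "-reset",                # start from a clean workflow state
    "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
    "-workflowDir=/scratch/job42/row_splitter_wrapper",  # the small generated workflow
]
# Table Reader/Writer nodes inside the wrapper workflow handle the
# (de)serialization between files and KNIME's in-memory data tables.
subprocess.run(cmd, check=True)
```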
The work previously presented in [5], [6] introduced our earlier efforts in this field and showcased the conversion of KNIME workflows composed solely of nodes imported via GKN. We have since extended KNIME2gUSE to convert workflows composed of any kind of nodes. Figures 7 and 8 depict how the conversion of nodes is performed.

Fig. 7. Conversion of native KNIME nodes. Since Row Splitter exists only in the context of the KNIME Analytics Platform, it requires an instance of a KNIME Analytics Platform for its execution. KNIME2gUSE generates a workflow containing the required input/output nodes and a copy of Row Splitter with the same configuration settings as its origin node. The generated command line invokes a KNIME Analytics Platform on the target DCI in batch mode, reads input from a file and writes outputs to files.

Fig. 8. Conversion of GKN-imported nodes. The nodes depicted on the left side directly interact with binaries located on the user's desktop computer. The right column shows the mapped binaries and an equivalent execution on a target DCI. Since Ligand3DGenerator is CTD-enabled, a suitable CTD file can be generated; this is not the case for the blastn tool.

B. Conversion of Workflows: Exporting KNIME Workflows to WS-PGRADE

The KNIME2gUSE plug-in produces files that can be imported into WS-PGRADE, ready to be executed on any configured DCI with minor modifications.

We have chosen WS-PGRADE as the target engine for the export process because it interacts directly with a wide selection of resource and cloud managers (a feature not present in the royalty-based KNIME editions that allow remote execution). It also features workflow submission, control, monitoring and statistics, functionalities which resource managers or cloud engines often lack.

The KNIME Analytics Platform natively supports the association of a single input/output port with a file list determined at runtime, a functionality not present in WS-PGRADE. To overcome this, KNIME2gUSE automatically generates a wrapper script that zips the corresponding files into a single archive. To translate parameter sweep sections, the conversion removes the KNIME Analytics Platform's ZipLoopStart and ZipLoopEnd nodes and substitutes suitable WS-PGRADE generator and collector ports.
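The wrapper idea can be sketched as follows; the generated script itself is not listed in the paper, so this Python version is only illustrative of the approach:

```python
# Illustrative sketch of the file-list workaround: WS-PGRADE ports carry
# exactly one file, so a runtime-determined list of files is zipped into a
# single archive that can pass through a single port.
import glob
import zipfile

def pack_file_list(pattern: str, archive: str) -> None:
    """Zip every file matching `pattern` into one archive for a single port."""
    with zipfile.ZipFile(archive, "w") as zf:
        for path in sorted(glob.glob(pattern)):
            zf.write(path)

pack_file_list("samples/*.mzML", "port0_input.zip")
# The consuming job unpacks the archive to recover the original file list.
```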
C. Example Application: Biomarker Discovery in Metabolomics

Metabolomics is a mass spectrometry-based approach aimed at evaluating the entirety of a metabolite sample. Applications include the tracking of chemicals and their transformation products in waste water [15], the identification of cancer types via biomarkers [16], [17] and the elucidation of disease-underlying mechanisms [18]. Compared to complementary omics technologies (e.g., transcriptomics, proteomics), metabolomics is closer to the actual biochemical processes that occur, making it attractive for biomarker development.

A common analysis approach for studies interested in comparative metabolite concentrations is label-free quantification. The independence from chemical labels allows the direct comparison of small molecules across an arbitrary number of samples. As a consequence, the need to evaluate hundreds of gigabyte-sized samples in concert is already common. Numbers and sizes of concurrently evaluated samples are steadily increasing, emphasizing the necessity for distributed computing.

We provide an example workflow for metabolomics biomarker discovery using OpenMS [12] for the mass spectrometry algorithms as well as various native KNIME Nodes (including nodes for the R scripting language). The KNIME workflow and its converted WS-PGRADE version are shown in Figure 9. We assume some initial preparations were performed prior to the execution of the workflow, namely conversion from closed mass spectrometer vendor formats to the open mzML format and data reduction by means of peak picking, both of which could also be implemented in KNIME via OpenMS [12] tools.

Using a detection method for so-called small molecules [19], we adapted a label-free quantification pipeline [20]. The quantification part of our biomarker discovery workflow consists of sample-specific feature detection (i.e., finding the convex hulls and respective centroids of analyte mass traces), followed by temporal alignment of samples and the quantification of corresponding features across samples.

Fig. 9. Metabolomics biomarker discovery workflow in both the KNIME Analytics Platform and WS-PGRADE. Top: workflow implemented in the KNIME Analytics Platform; OpenMS [12] was imported using GKN. Bottom: workflow generated by KNIME2gUSE imported into WS-PGRADE (workflow slightly edited for visual clarity). Even if some elements are missing after conversion (e.g., ZipLoopStart, ZipLoopEnd, Input File nodes), both versions have the same abstract layer. This is due to the difference in implementation of logically equivalent constructs, such as parameter sweep, in WS-PGRADE and the KNIME Analytics Platform.

Downstream small molecule identification was done via mass-based search in the Human Metabolome Database. The included sample normalization allows the comparison of analyte abundances across samples. Analytes whose abundances vary significantly after false discovery rate correction are annotated with the mass-based identifications and exported to a Microsoft Excel spreadsheet (XLS format).
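The paper does not name the exact false discovery rate procedure applied by the workflow's statistics nodes; the Benjamini-Hochberg method shown below is one common choice and is included only to illustrate the filtering step:

```python
# Generic Benjamini-Hochberg FDR control, shown only as an illustration of
# the significance filtering step; the paper does not specify the method used.
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level `alpha`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # largest rank satisfying the BH condition
    return sorted(order[:k_max])

# Example: only the two smallest p-values survive correction.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.6]))  # [0, 1]
```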
III. FUTURE WORK

The KNIME Analytics Platform features Metanodes, which encapsulate complete workflows; we would like to extend KNIME2gUSE to support their conversion. Furthermore, seeing that considerable effort has been put into creating platform-independent workflow representation formats, we would like to add IWIR and YAWL file generation to KNIME2gUSE. We would also like to extend our converter to support other workflow engines, such as Galaxy.

IV. CONCLUSION

Workflows assist reproducibility and minimize time spent validating research by reducing analysis complexity. There are currently several workflow engines with user-friendly interfaces that support remote execution of workflows; however, we feel that their scalability and support of major resource managers is still lacking. In contrast, HPC infrastructures and their resource managers rarely support the execution and control of workflows. As a consequence, HPC users often require programming skills to handle the channeling of data as well as to submit, monitor and control the respective computing jobs.

We present our efforts to support workflow export from the KNIME Analytics Platform to WS-PGRADE, identify challenges for both node and workflow conversion, and detail our solutions. KNIME offers remote workflow execution, but it is a royalty-based solution and its support of DCIs is limited, an aspect in which WS-PGRADE excels. KNIME2gUSE brings together a user-friendly, intuitive workflow engine for personal computers and a scalable HPC workflow platform that interacts with several DCIs.

We thus provide the individual advantages of both engines without their shortcomings. Overall, our methods decrease time spent designing workflows and troubleshooting conversion for different workflow engines.

ACKNOWLEDGMENT

The authors would like to thank Bernd Wiswedel, Thorsten Meinl, Patrick Winter and Michael Berthold for their support, patience and help in developing the KNIME2gUSE extension. This work was supported by the German Network for Bioinformatics Infrastructure (Deutsches Netzwerk für Bioinformatik-Infrastruktur, de.NBI).

REFERENCES

[1] C. S. Greene, J. Tan, M. Ung, J. H. Moore, and C. Cheng, "Big data bioinformatics," Journal of Cellular Physiology, vol. 229, no. 12, pp. 1896–1900, Dec. 2014.
[2] M. McNutt, "Reproducibility," Science, vol. 343, no. 6168, p. 229, 2014.
[3] "Trouble at the lab," The Economist, Oct. 2013.
[4] M. Baker, "Over half of psychology studies fail reproducibility test," Nature, Aug. 2015.
[5] L. de la Garza, J. Veit, A. Szolek, M. Röttig, S. Aiche, S. Gesing, K. Reinert, and O. Kohlbacher, "From the desktop to the grid: scalable bioinformatics via workflow conversion," BMC Bioinformatics, vol. 17, no. 1, pp. 1–12, 2016.
[6] L. de la Garza, J. Krüger, C. Schärfe, M. Röttig, S. Aiche, K. Reinert, and O. Kohlbacher, "From the desktop to the grid: conversion of KNIME workflows to gUSE," in IWSG, 2013.
[7] K. Plankensteiner, J. Montagnat, and R. Prodan, "IWIR: a language enabling portability across grid workflow systems," in SIGMOD Rec., vol. 34, no. 3, 2011, pp. 97–106.
[8] W. van der Aalst and A. ter Hofstede, "YAWL: yet another workflow language," Information Systems, vol. 30, no. 4, pp. 245–275, Jun. 2005.
[9] G. Terstyanszky, T. Kukla, T. Kiss, P. Kacsuk, A. Balasko, and Z. Farkas, "Enabling scientific workflow sharing through coarse-grained interoperability," Future Generation Computer Systems, vol. 37, pp. 46–59, 2014.
[10] P. Kacsuk, Z. Farkas, M. Kozlovszky, G. Hermann, A. Balasko, K. Karoczkai, and I. Marton, "WS-PGRADE/gUSE generic DCI gateway framework for a large variety of user communities," Journal of Grid Computing, vol. 10, no. 4, pp. 601–630, 2012.
[11] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel, "KNIME - the Konstanz Information Miner," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, p. 26, Nov. 2009.
[12] M. Sturm, A. Bertsch, C. Gröpl, A. Hildebrandt, R. Hussong, E. Lange, N. Pfeifer, O. Schulz-Trieglaff, A. Zerck, K. Reinert, and O. Kohlbacher, "OpenMS - an open-source software framework for mass spectrometry," BMC Bioinformatics, vol. 9, p. 163, 2008.
[13] A. Döring, D. Weese, T. Rausch, and K. Reinert, "SeqAn - an efficient, generic C++ library for sequence analysis," BMC Bioinformatics, vol. 9, no. 1, p. 11, Jan. 2008.
[14] A. Hildebrandt, A. K. Dehof, A. Rurainski, A. Bertsch, M. Schumann, N. C. Toussaint, A. Moll, D. Stöckel, S. Nickels, S. C. Mueller, H.-P. Lenhof, and O. Kohlbacher, "BALL - biochemical algorithms library 1.3," BMC Bioinformatics, vol. 11, p. 531, 2010.
[15] E. L. Schymanski, H. P. Singer, P. Longrée, M. Loos, M. Ruff, M. A. Stravs, C. Ripollés Vidal, and J. Hollender, "Strategies to characterize polar organic contamination in wastewater: exploring the capability of high resolution mass spectrometry," Environmental Science & Technology, vol. 48, no. 3, pp. 1811–1818, Jan. 2014.
[16] M. Sugimoto, D. T. Wong, A. Hirayama, T. Soga, and M. Tomita, "Capillary electrophoresis mass spectrometry-based saliva metabolomics identified oral, breast and pancreatic cancer-specific profiles," Metabolomics, vol. 6, no. 1, pp. 78–95, Mar. 2010.
[17] C. Denkert, J. Budczies, T. Kind, W. Weichert, P. Tablack, J. Sehouli, S. Niesporek, D. Könsgen, M. Dietel, and O. Fiehn, "Mass spectrometry-based metabolic profiling reveals different metabolite patterns in invasive ovarian carcinomas and ovarian borderline tumors," Cancer Research, vol. 66, no. 22, pp. 10795–10804, Nov. 2006.
[18] J. S. Hansen, X. Zhao, M. Irmler, X. Liu, M. Hoene, M. Scheler, Y. Li, J. Beckers, M. Hrabě de Angelis, H.-U. Häring, B. K. Pedersen, R. Lehmann, G. Xu, P. Plomgaard, and C. Weigert, "Type 2 diabetes alters metabolic and transcriptional signatures of glucose and amino acid metabolism during exercise and recovery," Diabetologia, vol. 58, no. 8, pp. 1845–1854, Aug. 2015.
[19] E. Kenar, H. Franken, S. Forcisi, K. Wörmann, H.-U. Häring, R. Lehmann, P. Schmitt-Kopplin, A. Zell, and O. Kohlbacher, "Automated label-free quantification of metabolites from liquid chromatography-mass spectrometry data," Molecular & Cellular Proteomics, vol. 13, no. 1, pp. 348–359, Jan. 2014.
[20] H. Weisser, S. Nahnsen, J. Grossmann, L. Nilse, A. Quandt, H. Brauer, M. Sturm, E. Kenar, O. Kohlbacher, R. Aebersold, and L. Malmström, "An automated pipeline for high-throughput label-free quantitative proteomics," Journal of Proteome Research, vol. 12, no. 4, pp. 1628–1644, Apr. 2013.