9th International Workshop on Science Gateways (IWSG 2017), 19-21 June 2017

Molecular docking with Raccoon2 on clouds: extending desktop applications with cloud computing

Damjan Temelkovski, Tamas Kiss, Gabor Terstyanszky
University of Westminster, London, UK
damjan.temelkovski@my.westminster.ac.uk, {t.kiss, g.z.terstyanszky}@westminster.ac.uk

Abstract—Molecular docking is a computer simulation that predicts the binding affinity between two molecules, a ligand and a receptor. Large-scale docking simulations, using one receptor and many ligands, are known as structure-based virtual screening. Often used in drug discovery, virtual screening can be very computationally demanding. This is why user-friendly domain-specific web or desktop applications that enable running simulations on powerful computing infrastructures have been created. Cloud computing provides on-demand availability, pay-per-use pricing, and great scalability, which can improve the performance and efficiency of scientific applications. This paper investigates how domain-specific desktop applications can be extended to run scientific simulations on various clouds. A generic approach based on scientific workflows is proposed, and a proof of concept is implemented using the Raccoon2 desktop application for virtual screening, WS-PGRADE workflows, and gUSE services with the CloudBroker platform. The presented analysis illustrates that this approach of extending a domain-specific desktop application can run workflows on different types of clouds, and indeed makes use of the on-demand scalability provided by cloud computing. It also facilitates the execution of virtual screening simulations by life scientists without requiring them to abandon their favourite desktop environment, providing them with resources without major capital investment.

Keywords—cloud computing; molecular docking; Raccoon2; virtual screening; WS-PGRADE/gUSE; bioinformatics

This work was funded by the CloudSME (Cloud-based Simulation platform for Manufacturing and Engineering, Project No. 608886) and COLA (Cloud Orchestration at the Level of Applications, Project No. 731574) projects.

I. INTRODUCTION
INTRODUCTION cases, science gateways are developed, providing a user- Biochemical interactions between two molecules can be friendly way to run workflows. There are several examples of estimated using a software simulation technique known as science gateways that use workflows to run VS simulations molecular docking. Particularly important in drug discovery, [5]-[7]. However, all of these solutions require life scientists to this technique can predict the conformation, pose, and binding become familiar with new, typically web-based user interfaces, affinity of a ligand and a receptor. In order to achieve this, the and significantly restrict the use of the docking software for the 3D structure of both molecules must be known. This structure sake of simplicity and ease of use. On the other hand, there are can be determined using X-ray crystallography or NMR popular desktop applications which offer greater flexibility, spectroscopy, or estimated using homology modelling. such as Raccoon2 [8]. Unfortunately, these desktop Molecular docking consists of an algorithm to search through applications are either restricted to local resources, or require the conformational space of the molecules, and a scoring expensive compute clusters and significant IT support to run function to estimate the energy between the ligand and the them on DCIs. Such tools typically cannot utilise cloud receptor’s binding site. Since molecular docking uses the computing resources. structure of the receptor, large-scale molecular docking of This paper describes a generic approach to extend domain- hundreds of thousands of ligands and one receptor is called specific desktop applications to execute workflows on clouds, structure-based virtual screening (virtual, as opposed to high while retaining the same familiar Graphical User Interface throughput screening, the automated laboratory experiment). In (GUI) presented to end-users. We demonstrate the utilisation of This work was funded by the CloudSME Cloud-Based Simulation platform for Manufacturing and Engineering Project No. 608886 and the COLA Cloud Orchestration at the level of Applications Project No. 731574 projects. 9th International Workshop on Science Gateways (IWSG 2017), 19-21 June 2017 distributed heterogeneous clouds to run VS, noting that various a wide range of clouds as well as other DCIs, not limiting life other DCIs can also be used with little or no changes to our scientists to using a specific infrastructure. approach. III. GENERIC CONCEPT II. RELATED WORK Our aim is to enable existing desktop applications to access Most VS experiments on DCIs have used CPU, GPU heterogeneous cloud computing resources. This should be clusters or grid computing resources (e.g. [5], [6], [9]-[11]). achieved without major reengineering of the desktop Applying cloud computing for such experiments is still application and without further burdening the end-user. Ideally, relatively new with much lower number of examples. end-users should be able to design and execute the experiments in the same way they have done earlier, but with the possibility One such example is the wFReDoW [12], a web-based to send the computations to cloud computing resources. environment for docking flexible receptors using the docking tool AutoDock 4.2. Their infrastructure is comprised of a In order to achieve this objective, a set of services (we virtual MPI master-slave environment deployed on the name them Cloud Access Services - CAS) can be called from commercial Amazon EC2 cloud [13]. 
Another example is AutoDockCloud [15], which uses Hadoop and MapReduce to run AutoDock4 on the private "Kandinsky" cloud at the Oak Ridge National Laboratory. With 57 16-core nodes reserved for MapReduce, 570 docking simulations can be performed simultaneously. To test their solution, the authors used the human estrogen receptor alpha obtained from the PDB (PDB ID: 1L2I) and 2,637 ligands from the DUD [16] database. AutoDockCloud completed the VS 450 times faster than a non-parallel execution without affecting the biochemical results. However, it only handles the docking stage and not the pre- or post-docking preparation and analysis, which have to be done separately.

In a third example [17], AutoDock and AutoDock Vina have been ported to the Windows Azure-based "VENUS-C" cloud computing service. Using a small desktop application, scientists were able to submit, monitor and retrieve the results of VS simulations. 10,000 ligands and a receptor, generated from a short MD run on an initial structure, were tested on 20 extra small Azure instances (1-core, 768MB RAM, 20GB storage), for a total of 110,000 CPU hours and more than 40,000 docking runs. While these experiments illustrated the applicability of cloud computing resources, the custom user interface was rather simplistic, restricting the depth of experiments.

As summarised above, there have been several efforts to run VS using cloud computing. However, all these attempts provided their own restricted GUI for running the simulations and focused on a specific cloud computing infrastructure. In comparison, the approach suggested in this paper enables scientists to use the GUI of a popular domain-specific desktop application they are familiar with. Furthermore, Raccoon2, the desktop application we use to prove our concept, provides pre-docking and post-docking facilities to prepare input files and analyse results. Finally, our approach utilises a set of services, in the form of WS-PGRADE/gUSE and the CloudBroker Platform, which support a wide range of clouds as well as other DCIs, not limiting life scientists to using a specific infrastructure.

III. GENERIC CONCEPT

Our aim is to enable existing desktop applications to access heterogeneous cloud computing resources. This should be achieved without major reengineering of the desktop application and without further burdening the end-user. Ideally, end-users should be able to design and execute their experiments in the same way they have done before, but with the possibility to send the computations to cloud computing resources.

In order to achieve this objective, a set of services (we name them Cloud Access Services - CAS) can be called from the desktop application. CAS should be available from an Application Programming Interface (API) in order to facilitate its integration into the GUI of the desktop application. Additionally, CAS should provide access to a wide range of cloud computing resources, and should enable the design and execution of complex application scenarios, such as parameter sweep workflow applications, typically required to design VS experiments.

The integration requires two major steps from the developers, as illustrated in Fig. 1. In the first step, CAS is configured to run the application in the cloud. This step typically requires preparing workflow applications describing the experiment, and configuring CAS to interface with the desired cloud resources. The second step requires a minor modification of the GUI of the desktop application: integrating the execution of the workflow (practically, creating a simple button that executes the complex workflow representing the experiment) and retrieving the results. Instead of implementing CAS, the core component of this conceptual architecture, from scratch, existing tools that support the creation of parameter sweep workflows and interface with cloud computing resources can be applied. This approach speeds up the development and has the potential to result in a mature and highly reliable solution. The rest of this paper describes this approach using a set of existing services and components as the selected CAS, and their integration into a VS desktop application.

Fig. 1. Generic concept for extending desktop applications to run on clouds
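To make the CAS concept more concrete, the sketch below outlines the kind of minimal interface such services could expose to a desktop application. This is purely illustrative: the class and method names are assumptions made for the purpose of this discussion, not part of any existing library.

```python
# Illustrative sketch of a minimal Cloud Access Services (CAS) interface
# callable from a desktop application. All names here are hypothetical.
from abc import ABC, abstractmethod


class CloudAccessServices(ABC):
    """Hypothetical API between a desktop GUI and cloud resources."""

    @abstractmethod
    def submit(self, workflow_archive: str, credentials: dict) -> str:
        """Submit a prepared (e.g. parameter sweep) workflow; return its ID."""

    @abstractmethod
    def status(self, workflow_id: str) -> str:
        """Return the workflow state, e.g. 'running', 'finished' or 'error'."""

    @abstractmethod
    def fetch_results(self, workflow_id: str, target_dir: str) -> None:
        """Download the output files of a finished workflow."""
```

A desktop application then only needs a thin GUI layer over these three calls: a submit button, a periodic status check, and a download of the results into a folder that its existing analysis features already understand.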
IV. BACKGROUND

The solution developed and presented in this paper is focused on the extension of the VS tool Raccoon2, by connecting it to a WS-PGRADE/gUSE science gateway and to various cloud computing resources via the CloudBroker Platform. This section briefly describes these technologies and components. When connected, they enable scientists to run VS simulations on various clouds using a familiar GUI.

A. Raccoon2 and AutoDock Vina

Raccoon2 is the latest version of an open-source desktop application for preparing and analysing VS with AutoDock Vina. It supports executing docking experiments on a Linux cluster with the PBS or SGE schedulers, and it incorporates analysis features such as filtering, visualising, and exporting the results. Users can select ligands and receptors, configure docking options, visualise a binding site, and connect to a cluster to submit jobs, directly from the Raccoon2 GUI. We have chosen Raccoon2 for our implementation since it is the VS tool of choice for bio-scientists at the University of Westminster (UoW).

Before jobs can be submitted, Raccoon2 guides users to deploy the docking tool AutoDock Vina on a cluster. AutoDock Vina [18] is an open-source docking tool with built-in support for multithreading. It uses a hybrid global-local conformation search with gradient-based local optimisation, and its scoring function is based on empirically weighted parameters, such as hydrophobic (van der Waals) interactions, hydrogen bonding, and torsional penalties, similarly to its sister tool AutoDock.
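For illustration, the sketch below shows how a single Vina docking run of the kind Raccoon2 prepares can be launched programmatically. The file names and search-box coordinates are placeholder values, not taken from our experiments.

```python
# Sketch of a single AutoDock Vina run; file names and box coordinates
# are placeholders. Requires the 'vina' executable on the PATH.
import subprocess

# A typical Vina configuration file names the receptor and defines the
# search box (centre and size, in Angstroms) around the binding site.
config = """receptor = receptor.pdbqt
center_x = 11.5
center_y = -2.0
center_z = 7.3
size_x = 20
size_y = 20
size_z = 20
exhaustiveness = 8
"""
with open("conf.txt", "w") as f:
    f.write(config)

# Dock one ligand; --out and --log control the result file names.
subprocess.run(["vina", "--config", "conf.txt",
                "--ligand", "ligand.pdbqt",
                "--out", "ligand_out.pdbqt",
                "--log", "ligand_log.txt"],
               check=True)
```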
B. WS-PGRADE/gUSE

WS-PGRADE/gUSE [19] is a workflow-centric open-source science gateway framework. WS-PGRADE workflows are dataflow directed acyclic graphs where nodes represent execution blocks, with input and output ports, which can be executed in parallel. WS-PGRADE workflows support parameter sweep applications, where a workflow node can be executed many times for multiple input data sets.

A WS-PGRADE portal is a Liferay-based [20] e-science web portal for the development of parallel applications executed on various DCIs using WS-PGRADE workflows. It has a graph editor which allows creating, configuring and executing workflows using gUSE (Grid and cloud User Support Environment) services. gUSE is an open-source service stack that can form the back-end of science gateways executing applications on DCIs. It provides well-defined services for workflow management. Originally supporting primarily service grids, desktop grids, and clusters, gUSE also supports parallel execution on clouds. A gUSE internal component called DCI Bridge [21] provides a well-defined communication interface enabling access to many different DCIs, including clouds [22].

The gUSE RemoteAPI is an API that allows the remote submission and management of WS-PGRADE workflows. Existing applications can call it over HTTP(S) to use gUSE services without a WS-PGRADE portal. The RemoteAPI requires a valid, well-parameterised WS-PGRADE workflow to be attached. It submits this workflow using a temporary user. Once the workflow output files have been downloaded, they and all information about this user are deleted from the gUSE server.

C. CloudBroker Platform

The CloudBroker Platform [23] is a cloud computing middleware and application store developed by CloudBroker GmbH. It provides a web interface which can be used to deploy and execute an application in a cloud, and to monitor its behaviour. The CloudBroker platform is connected to various kinds of clouds, including commercial (e.g. CloudSigma, Amazon Web Services) and open-source (e.g. OpenNebula, OpenStack) ones. The CloudBroker platform has been integrated into gUSE's DCI Bridge, providing various cloud computing resources to the WS-PGRADE/gUSE framework.

V. DESIGN AND IMPLEMENTATION

Based on the generic concept described in Section III and the tools introduced in Section IV, a reference implementation of the proposed architecture has been completed. When implementing the generic concept of Fig. 1, the domain-specific desktop application is Raccoon2; the CAS is composed of a gUSE server connected to the CloudBroker Platform, a WS-PGRADE portal for workflow development, and the CloudBroker web interface for deployment; while the cloud infrastructures are the UoW OpenStack cloud and the CloudSigma [25] cloud (Fig. 2).

As described in Section III, the development is divided into two major steps: configuration of the CAS (1) and modification of the desktop GUI (2). First, the CAS is prepared to execute the VS experiment, which includes creating the required WS-PGRADE workflow and configuring the CloudBroker platform. When accessing gUSE through the RemoteAPI, a valid, well-configured WS-PGRADE workflow needs to be attached. To simplify this step, a developer can create the workflow using a WS-PGRADE portal, test it with test input data, and then export it. The exported workflow can be configured from the code of the domain-specific desktop application and attached to a RemoteAPI call, rather than created from scratch. To conclude step (1), the executable files that are needed to run the workflow should be deployed to the cloud, using the CloudBroker platform. In step (2), the source code of the domain-specific desktop application is extended in order to make the appropriate RemoteAPI calls. The next sections elaborate on these steps.

Fig. 2. Architecture of our reference implementation using Raccoon2, WS-PGRADE/gUSE, CloudBroker, and the UoW or CloudSigma clouds
A. Creating the WS-PGRADE Workflow

The execution steps of the domain-specific desktop application are recreated using a WS-PGRADE workflow. In this particular case, a simple one-node workflow was created, with four input ports (ligand files, receptor file, Vina configuration file, and an additional file to overcome an output naming issue) and one output port (the zipped results of the multiple docking runs). Note, however, that depending on the desktop application, more complex workflows may be required.

After consultation with life scientists, an issue with previous implementations of AutoDock Vina as a WS-PGRADE workflow was discovered: it was difficult to understand which output file resulted from which input, due to automatically generated output names. AutoDock Vina, by default, uses the names of the input ligand and receptor to name the result file. However, WS-PGRADE changes the names of parametric input files internally. To avoid this, we do not use parametric input ports; instead, we apply an additional text file which contains the pre-generated name of the expected result file for each ligand-protein pair. This file is created within Raccoon2 before the workflow is submitted.

Once submitted, the workflow invokes CloudBroker's execution script, which runs AutoDock Vina for each ligand it receives as input. In order to run this workflow on many cloud instances, our extended code of Raccoon2 splits the set of ligands into as many zip archives as the number of instances, and submits a separate workflow to each instance.

B. Deployment on the CloudBroker Platform

The CloudBroker deployment process requires a deployment script and an execution script. They need to be uploaded and executed on the CloudBroker platform. If an image of the Operating System (OS) is not present in the image repository of the target cloud, it should also be installed. The deployment script is run only once, to prepare the OS and install any required dependencies. A snapshot of the prepared OS image is then used for future jobs, when the execution script is called. The execution script validates inputs, executes the application and stores the outputs in a particular folder.

In our example extension of Raccoon2, Ubuntu 14.04 is used to run the deployment script, which creates the appropriate folder structure and installs the required tools: AutoDock Vina, zip, and unzip. After validating the input files, the execution script runs AutoDock Vina with appropriate parameters for each ligand. This deployment process is simplified because AutoDock Vina is included in the Ubuntu package repository.
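The Python sketch below approximates the logic of such an execution script: unpack the inputs, dock each ligand, name the results using the pre-generated names file described in Section V.A, and archive the outputs for the workflow's single output port. The real scripts deployed on the CloudBroker platform are not reproduced here, and the file names and names-file format are assumptions.

```python
# Approximate logic of the per-instance execution script (a sketch; the
# actual CloudBroker scripts differ, and file names are assumptions).
import os
import subprocess
import zipfile

with zipfile.ZipFile("ligands.zip") as z:
    z.extractall("ligands")
os.makedirs("results", exist_ok=True)

# The names file works around WS-PGRADE's internal renaming of parametric
# inputs; assumed format: "<ligand file> <result base name>" per line.
with open("names.txt") as f:
    expected = dict(line.split() for line in f if line.strip())

for ligand, result in expected.items():
    subprocess.run(["vina", "--config", "conf.txt",
                    "--receptor", "receptor.pdbqt",
                    "--ligand", os.path.join("ligands", ligand),
                    "--out", os.path.join("results", result + ".pdbqt"),
                    "--log", os.path.join("results", result + ".pdbqt_log.txt")],
                   check=True)
```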
C. Extending the Raccoon2 Source Code with the RemoteAPI

In order to conduct VS on a cloud, we need to submit the WS-PGRADE workflow using the gUSE RemoteAPI. A WS-PGRADE workflow consists of an XML file (workflow.xml), which describes the workflow, and the input files. The XML file contains other valuable information, such as which kinds of cloud instances will be used.

To fill in the cloud configuration information correctly, we added a section to the Raccoon2 GUI which enables users to select the number of cloud instances, their size, the name of the cloud, and the region. We store all possible values, and their encoded versions, in an additional XML file (gUSECloudConfiguration.xml). At the moment, this file is manually synchronised to contain the correct values for all supported types of cloud instances. If, for example, the maximum number of instances a cloud can handle changes, a value in this file should be changed.

Within the original Raccoon2 GUI, the user can attach a set of ligands and a receptor. In our extension, the attached files are grouped into as many folders as the number of selected cloud instances. On submission, one workflow is run for each folder, where its results are stored after downloading. Before submission, the WS-PGRADE workflow file is configured to include the cloud selected in the GUI by the scientist.

Finally, the updated workflow.xml file is zipped along with the rest of the input files, following the WS-PGRADE naming convention, to compose a well-formed WS-PGRADE workflow. Thus, the Raccoon2 code can submit it by calling a gUSE RemoteAPI method using the curl command.

Apart from the attached workflow, this RemoteAPI method requires authentication. Namely, it needs a RemoteAPI password set by the gUSE server administrators, and CloudBroker user credentials (username and password). In the current implementation, for security reasons, the end-user is asked to provide these. In line with gUSE conventions, the credentials file should be named x509._credentialsID, where _credentialsID is the name of the middleware that requires the authentication (this naming convention remains, even though the X.509 standard is not used). This file is then zipped and sent as a POST parameter, along with the RemoteAPI password and the zipped WS-PGRADE workflow. The RemoteAPI method returns a workflowID, which is used to check the workflow's status.

Monitoring the VS simulation is done by polling for the status of the workflows using the RemoteAPI. In our current implementation, a status check is performed every 20 seconds. The status is displayed to the user, and if there were errors, they can re-submit the workflows. Once a workflow has finished, a final RemoteAPI call retrieves the output. When the workflows complete successfully, their output is downloaded and only the relevant AutoDock Vina result files (.pdbqt_log.txt and .pdbqt) are extracted into a result folder. This folder can be used directly by the original analysis tab of Raccoon2: the scientist simply needs to select it in order to view the ligands sorted by the docking results. The filtering and visualisation features can be used exactly as in the original Raccoon2. The source code of our extended version of Raccoon2 is available at https://github.com/damjanmk/Raccoon2.
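The submit-and-poll cycle described above can be pictured with the following outline. Our implementation issues these calls with curl from the Raccoon2 code; the Python below is an equivalent sketch only, and the endpoint paths and parameter names are assumptions, not the actual gUSE RemoteAPI interface.

```python
# Outline of the submit-and-poll cycle against the gUSE RemoteAPI.
# Endpoint paths and parameter names are illustrative assumptions;
# the real interface is documented with gUSE.
import time
import requests

GUSE = "https://guse.example.org/remoteapi"  # hypothetical server URL

def submit(workflow_zip, credentials_zip, password):
    with open(workflow_zip, "rb") as wf, open(credentials_zip, "rb") as cred:
        r = requests.post(GUSE + "/submit",
                          files={"workflow": wf, "credentials": cred},
                          data={"password": password})
    r.raise_for_status()
    return r.text.strip()  # the returned workflowID

def wait_for(workflow_id, password, interval=20):
    # Poll every 20 seconds, as in our implementation.
    while True:
        r = requests.post(GUSE + "/status",
                          data={"password": password, "id": workflow_id})
        state = r.text.strip()
        if state in ("finished", "error"):
            return state
        time.sleep(interval)
```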
VI. RESULTS AND EVALUATION

A. Proof of Concept on the UoW and CloudSigma Clouds

To show that our concept can be implemented to run a real-life VS on different clouds, we obtained biochemically relevant input data from life scientists. The receptor is an enzyme called ribokinase, which is part of the salvage pathway of nucleotides in the protozoan parasite Trichomonas vaginalis (TV). The 3D structure of this receptor was created by homology modelling. TV causes trichomoniasis, a very common sexually transmitted infection. A set of 130,216 ligands was obtained from the ZINC [24] database of drug-like small molecules; it is a diverse subset of ligands that may bind to and antagonise the receptor. We tested our extended Raccoon2 using these input files, conducting three runs of effectively 130,216 docking simulations each. The UoW OpenStack cloud (in London, UK) was used to prove that the approach works, and two runs on the commercial CloudSigma cloud (in Zürich, Switzerland) were conducted to show the use of different clouds.

There are several types of 64-bit (x86_64) instances that can be used in the UoW cloud: small (1-core, 2GB RAM), medium (2-core, 4GB RAM), large (4-core, 8GB RAM), and extra-large (8-core, 16GB RAM). At the time of the tests, a maximum of 29 instances and processor cores of the UoW cloud could be allocated for this experiment. Therefore, we tested our implementation on 29 small instances. In order to do this, the extended code of Raccoon2 split the 130,216 ligands into 29 groups. A total of 7 jobs (24.14%) had errors due to connection problems between CloudBroker and the UoW cloud, but all finished successfully after re-submission. The average execution time per instance was 26h 35min 52s.

To compare the results of both clouds, we decided to use 29 instances most similar in type to the UoW small instances. CloudSigma offers 32-bit and 64-bit small (1-core, 1GB RAM) instances; note that they have only 1GB RAM. There were noticeable differences in the execution time between the fastest and the slowest job per run (e.g. 64-bit fastest: 17h 5min 57s; slowest: 22h 53min 59s). The average time per job for the 64-bit instances was 19h 55min 59s, while for the 32-bit instances it was 17h 21min 23s. Fig. 3 shows the execution time for each of the 29 jobs (instances numbered 1, 2, 3, etc. in each run may be docking different ligands).

Fig. 3. Comparison of the execution times of each of the 29 instances (series: UoW small 64-bit, CloudSigma small 64-bit, CloudSigma small 32-bit)

The AutoDock Vina software has been developed for 32-bit machines and, as noted on its official website, it is compatible with 64-bit machines (http://vina.scripps.edu/manual.html). However, it seems that the overhead produced is significant, and we can generally recommend using 32-bit cloud instances for this kind of VS experiment, since the average execution time decreased by 12.92%. Furthermore, although the CloudSigma instances had half the memory, due to various performance optimisations in the CloudSigma cloud they finished the docking significantly faster (on average, the 32-bit CloudSigma run was 34.74% faster than the 64-bit UoW run).

B. Scalability Tests on the UoW Cloud

In order to show the scalability of our solution, we designed several more experiments using the same input files described in part A. First, we ran the VS using our cloud-enabled Raccoon2, selecting 7 small instances on the UoW cloud. The average time per instance was 123h 12min 1s. Then, we increased the instance type to medium while keeping the number of instances at 7. The average time per instance was 75h 35min 16s. Finally, we used 7 large instances, resulting in an average time per instance of 51h 47min 29s. These results demonstrate reasonable scalability of Raccoon2 when increasing the number of cores inside the instances. The left panel of Fig. 4 shows this scale-up compared to an ideal proportional scale-up (double the cores = half the time).

In a second set of experiments, we kept the instance type the same (UoW small) while increasing the number of instances. Namely, we ran 14 small instances with an average time per instance of 61h 31min 1s, followed by 28 small instances with an average time per instance of 31h 29min 14s. The right panel of Fig. 4 shows that these results very closely resemble the ideal proportional scale-up. It also shows that, although AutoDock Vina has multithreading capabilities, it is faster to run 28 small instances than 7 large ones. Therefore, to maximise efficiency, we recommend using more, less powerful instances rather than fewer, more powerful ones.

Fig. 4. Scalability of our experiments compared to an ideal proportional scale-up: increasing the configuration of the instances (left: 7 UoW small, medium, large), and increasing the number of instances (right: 7, 14, 28 UoW small)

C. Price Estimate

As of March 2017, CloudSigma cloud computing prices are $0.0195 per hour for a 1-core CPU, $0.007 per GB RAM, $0.1329 per GB SSD storage, and $0.04 per GB of outbound data transfer [25]. Therefore, running our VS on 29 small instances would cost $15.83.
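As a rough reconstruction of how such an estimate can be derived from the published rates, consider the sketch below. It uses the 32-bit CloudSigma average run time from part A; the storage and data-transfer volumes were not recorded here, so the printed figures cover only the CPU and RAM components of the $15.83 total, and the per-GB-hour interpretation of the RAM rate is an assumption.

```python
# Indicative cost arithmetic only; storage and outbound-transfer volumes
# are omitted, so this covers just the CPU and RAM terms of the total.
instances = 29
hours = 17 + 21 / 60                 # average 17h 21min per instance

cpu = instances * hours * 0.0195     # $ per core-hour, 1 core each
ram = instances * hours * 0.007      # assuming the RAM rate is per GB-hour
print(f"CPU: ${cpu:.2f}, RAM: ${ram:.2f}")  # ~$9.81 and ~$3.52
```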
D. Exploring the Potential of Using Other DCIs

As WS-PGRADE/gUSE is connected to other DCIs, such as desktop grids, clusters or service grids, via the DCI Bridge, the same generic solution and the same workflow, mapped to these different resources, can also be applied to further extend the applicable resources of the experiments. To demonstrate this possibility, the experiments were executed on the SZTAKI Desktop Grid (SZDG), a BOINC-based desktop grid integrated in gUSE's DCI Bridge [26]. Desktop grids use spare CPU cycles of desktop computers to create a powerful DCI. To prove our concept, we used a WS-PGRADE portal (https://autodock-portal.sztaki.hu/liferay-portal-6.1.0) to run AutoDock Vina workflows on the SZDG using the same input as above 5 times, with an average execution time of 30h 16min 9s.

VII. CONCLUSION AND FUTURE WORK

This paper presented a generic approach to extend domain-specific desktop applications, enabling the execution of simulations on different clouds. Several experiments were run to test and evaluate our approach on two different cloud infrastructures and to measure the scalability of our solution. We noticed better performance when using many smaller instances rather than a few larger ones, and 32-bit rather than 64-bit instances. Although our implementation is based on the VS software Raccoon2, WS-PGRADE/gUSE and CloudBroker, the concept of extending desktop applications to run on clouds is generic. With our extension, Raccoon2 users can use the same familiar GUI to run their VS experiments on clouds. They no longer require access to a Linux PBS CPU or GPU cluster, which brings down the cost of running large VS simulations, making them affordable for scientists around the world. As shown in our tests, the solution works for different kinds of clouds. Considering all gUSE-supported DCIs, it could also use clusters, grids, and desktop grids.

In the future, we plan to test our implementation on other DCIs and with different desktop applications. Instead of the classical method used here, container technology could help automate the software deployment. At the moment, due to the nature of the gUSE RemoteAPI, result files can only be downloaded to the user's desktop. We will focus future work on exploring ways to store and further analyse them, easing access to, and facilitating the sharing of, docking results. This can provide a training set for machine learning-based prediction of execution time and cost, and for optimising cloud resource utilisation.

REFERENCES

[1] P. Mell and T. Grance, "The NIST definition of cloud computing", National Institute of Standards and Technology, 2011.
[2] K. Wolstencroft et al., "The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud", Nucleic Acids Res., vol. 41, no. W1, pp. W557-W561, Jul. 2013.
[3] B. Ludäscher et al., "Scientific workflow management and the Kepler system", Concurr. Comp. Pract. E., vol. 18, no. 10, pp. 1039-1065, Aug. 2006.
[4] P. Kacsuk, K. Karoczkai, G. Hermann, G. Sipos, and J. Kovacs, "WS-PGRADE: supporting parameter sweep applications in workflows", in The 3rd Workshop on Workflows in Support of Large-Scale Science, WORKS 2008, Austin, TX, USA, 17 Nov 2008, IEEE, 2008, pp. 1-10.
[5] M. M. Jaghoori, A. J. van Altena, B. Bleijlevens, S. Ramezani, J. L. Font, and S. D. Olabarriaga, "A multi-infrastructure gateway for virtual drug screening", Concurr. Comp. Pract. E., vol. 27, no. 16, pp. 4478-4490, Nov. 2015.
[6] J. Krüger et al., "Performance studies on distributed virtual screening", BioMed Res. Int., vol. 2014, pp. 1-7, Jun. 2014.
[7] T. Kiss, P. Greenwell, H. Heindl, G. Terstyanszky, and N. Weingarten, "Parameter sweep workflows for modelling carbohydrate recognition", J. Grid Comput., vol. 8, no. 4, pp. 587-601, Dec. 2010.
[8] S. Forli et al., "Computational protein-ligand docking and virtual drug screening with the AutoDock suite", Nat. Protoc., vol. 11, no. 5, pp. 905-919, Apr. 2016.
[9] N. D. Prakhov, A. L. Chernorudskiy, and M. R. Gainullin, "VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters", Bioinformatics, vol. 26, no. 10, pp. 1374-1375, May 2010.
[10] X. Jiang, K. Kumar, X. Hu, A. Wallqvist, and J. Reifman, "DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0", Chem. Cent. J., vol. 2, no. 1, p. 18, Sep. 2008.
[11] I. Sánchez-Linares, H. Pérez-Sánchez, J. Cecilia, and J. García, "High-throughput parallel blind virtual screening using BINDSURF", BMC Bioinformatics, vol. 13, suppl. 14, S13, Sep. 2012.
[12] R. De Paris, F. A. Frantz, O. Norberto de Souza, and D. D. A. Ruiz, "wFReDoW: a cloud-based web environment to handle molecular docking simulations of a fully flexible receptor model", BioMed Res. Int., vol. 2013, pp. 1-12, Mar. 2013.
[13] Amazon Web Services, Inc., "Amazon EC2". [Online]. Available: https://aws.amazon.com/ec2/. [Accessed: 7 Mar 2017]
[14] H. Berman, K. Henrick, H. Nakamura, and J. L. Markley, "The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data", Nucleic Acids Res., vol. 35, Database Issue, pp. D301-D303, Jan. 2007.
[15] S. R. Ellingson and J. Baudry, "High-throughput virtual molecular docking with AutoDockCloud", Concurr. Comp. Pract. E., vol. 26, no. 4, pp. 907-916, Mar. 2014.
[16] N. Huang, B. K. Shoichet, and J. J. Irwin, "Benchmarking sets for molecular docking", J. Med. Chem., vol. 49, no. 23, pp. 6789-6801, Oct. 2006.
[17] T. Kiss et al., "Large-scale virtual screening experiments on Windows Azure-based cloud resources", Concurr. Comp. Pract. E., vol. 26, no. 10, pp. 1760-1770, Jul. 2014.
[18] O. Trott and A. J. Olson, "AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading", J. Comput. Chem., vol. 31, no. 2, pp. 455-461, 2010.
[19] P. Kacsuk et al., "WS-PGRADE/gUSE generic DCI gateway framework for a large variety of user communities", J. Grid Comput., vol. 10, no. 4, pp. 601-630, Dec. 2012.
[20] Liferay Inc., "Liferay". [Online]. Available: https://liferay.com. [Accessed: 7 Mar 2017]
[21] M. Kozlovszky, K. Karóczkai, I. Márton, P. Kacsuk, and T. Gottdank, "DCI Bridge: executing WS-PGRADE workflows in distributed computing infrastructures", in Science Gateways for Distributed Computing Infrastructures, Springer, 2014, pp. 51-67.
[22] S. J. Taylor, T. Kiss, G. Terstyanszky, P. Kacsuk, and N. Fantini, "Cloud computing for simulation in manufacturing and engineering: introducing the CloudSME simulation platform", in Proceedings of the 47th Annual Simulation Symposium, ANSS 14, Tampa, FL, USA, 13-16 Apr 2014, A. Tolk, Ed., SCS, 2014, pp. 12-19.
[23] CloudBroker GmbH, "CloudBroker Platform". [Online]. Available: http://cloudbroker.com/platform/. [Accessed: 7 Mar 2017]
[24] J. J. Irwin and B. K. Shoichet, "ZINC - a free database of commercially available compounds for virtual screening", J. Chem. Inf. Model., vol. 45, no. 1, pp. 177-182, Dec. 2004.
[25] CloudSigma Holding AG, "Cloud Servers & Hosting". [Online]. Available: https://www.cloudsigma.com/. [Accessed: 7 Mar 2017]
[26] P. Kacsuk, J. Kovacs, Z. Farkas, A. C. Marosi, G. Gombas, and Z. Balaton, "SZTAKI Desktop Grid (SZDG): a flexible and scalable desktop grid system", J. Grid Comput., vol. 7, no. 4, pp. 439-461, Dec. 2009.