8th International Workshop on Science Gateways (IWSG 2016), 8-10 June 2016

Milky Way Analysis through a Science Gateway: Workflows and Resource Monitoring

Eva Sciacca, Fabio Vitello, Ugo Becciani, Alessandro Costa*; Akos Hajnal, Peter Kacsuk†; Sergio Molinari, Anna Maria di Giorgio, Eugenio Schisano, Scige John Liu, Davide Elia‡; Stefano Cavuoti, Giuseppe Riccio, Massimo Brescia§

* INAF-Osservatorio Astrofisico di Catania, Italy
† Laboratory of Parallel and Distributed Systems, SZTAKI, Budapest, Hungary
‡ INAF-Istituto di Astrofisica e Planetologia Spaziali, Roma, Italy
§ INAF-Osservatorio Astronomico di Capodimonte, Napoli, Italy
Email: eva.sciacca@oact.inaf.it

Abstract—This paper presents the latest developments of the VIALACTEA Science Gateway in the context of the FP7 VIALACTEA project. The science gateway operates as a central workbench for the VIALACTEA community, allowing astronomers to process the new-generation (from infrared to radio) surveys of the Galactic Plane in order to build and deliver a quantitative 3D model of our Milky Way Galaxy. The final model will be used as a template for external galaxies to study star formation across cosmic time. The adopted AGILE software development process made it possible to fulfil the community needs in terms of required workflows and underlying resource monitoring. The scientific requirements that arose during this process highlighted the need for easy parameter setting, fully embarrassingly parallel computations and large-scale input dataset processing. The science gateway, based on the WS-PGRADE/gUSE framework, has been able to fulfil these requirements mainly by exploiting the parameter sweep paradigm and the parallel job execution of the workflow management system. Moving from the development to the production environment, an efficient resource monitoring system has been implemented to easily analyse and debug sources of failure in workflow computations. The results of the resource monitoring system are exploitable not only by IT administrators and workflow developers but also by the final users of the gateway. The affiliation to the STARnet Gateway Federation ensures the sustainability of the presented products after the end of the project, opening the usage of the VIALACTEA Science Gateway to all stakeholders and not only to the community members.

Keywords—Workflow Systems; Science Gateways; Collaborative Environments; Astrophysics; DCIs; Milky Way Analysis; Infrastructure Tests; Monitoring

I. INTRODUCTION

The Milky Way is a complex ecosystem where a cyclical transformation process brings diffuse baryonic matter into dense unstable condensations to form stars, which produce radiant energy for billions of years before releasing chemically enriched material back into the InterStellar Medium in their final stages of evolution. Although considerable progress has been made in the last two decades in the understanding of the evolution of isolated dense molecular clumps toward the onset of gravitational collapse and the formation of stars and planetary systems, a lot still remains hidden.

The aim of the European FP7 VIALACTEA project is to exploit the combination of all new-generation surveys of the Galactic Plane to build and deliver a galaxy-scale predictive model for star formation of the Milky Way. This model will be used as a template for external galaxies and for studies of star formation across cosmic time. Usually the essential steps necessary to unveil the inner workings of the galaxy as a star formation engine (such as the extraction of dust compact condensations or the robust reconstruction of the spectral energy distribution of objects in star-forming regions) are carried out manually by the astronomer, and necessarily over a limited number of galactic sources or very restricted regions.

Therefore scientists required new technological solutions able to deal with the growing data size and quantity coming from the new-generation surveys (from infrared to radio wavelengths). Moving to the Big Data era allows these challenges to be overcome, pushing the envelope of the current state of the art from both the technological and the scientific point of view. The extraction of the meaningful information contained in the available data required an entirely new approach (the new paradigm of "data driven scientific discovery"), which resulted in a novel framework based on advanced visual analytics techniques, data mining methodologies, machine learning paradigms and Virtual Observatory (VO) based data representation and retrieval standards. All the underlying pipelines required by this framework (e.g. knowledge base catalogue creation, map making for visual analytics) are available through the VIALACTEA Science Gateway.

The gateway (described in Section III) is based on the WS-PGRADE/gUSE [1] portal framework, which provides several ready-to-use functionalities off-the-shelf. It allows the development of scientific workflows composed of nodes corresponding to almost any kind of application in a convenient graphical user interface. Workflows can be executed in parallel on a wide set of Distributed Computing Infrastructures (DCIs) such as grids, clusters, supercomputers and clouds. It enables sharing, importing and exporting workflows, managing credentials (and robot certificates), and gathering workflow execution statistics. Beyond these features the portal is extensible: WS-PGRADE/gUSE offers a number of interfaces to add new applications and portlets to its base capabilities.

This paper presents the latest developments of the VIALACTEA Science Gateway, including the workflows designed for the community and the resource monitoring system. The workflows (see Section III-B) are mainly focused on performing intensive computations: map making, i.e. the formation of sky images from the instrument data; data mining, to obtain band-merged catalogues relating sources with associated counterparts at different wavelengths; and filamentary structure detection and extraction from images.

Due to the diverse variety of software and computing capabilities required by the workflows, a novel monitoring system has been developed within the gateway to test the status of the whole system. The monitoring covers different levels of tests (see Section IV) checking the gateway interoperability with the computing infrastructures and the workflow submission and execution processes. These tests are performed periodically and the resulting reports are published on the gateway, so that final users are also aware of any failure of the system, avoiding a waste of time in debugging their work. Furthermore, e-mail alerts are sent on any failure to the infrastructure administrators to promptly fix the problem.

Fig. 1. VIALACTEA integrated technological framework.

II. VIALACTEA REQUIREMENTS AND TECHNOLOGICAL ARCHITECTURE
In order to deliver a model of our galaxy with quantitative star formation laws, it is necessary to reveal and analyse throughout the galaxy the dense filamentary clouds where star-forming clumps are found. These clumps are found in very different environments and in different evolutionary stages, and their properties are characterized through detailed modelling of their Spectral Energy Distribution. Their exact location is determined using the most up-to-date distance estimators, and all these pieces need to be assembled to get a new view of our Galaxy.

The Galactic distribution of Star Formation Rate (stellar mass produced per unit time) and Efficiency (stellar mass produced per unit mass of available dense gas) can be quantitatively related to the variety of physical agents that drive star formation in the Galaxy. The timely exploitation of the huge amount of available data requires new technological solutions able to overcome the current challenges, pushing the envelope of the current state of the art from both the technological and the scientific point of view. Therefore a novel system has been implemented based on advanced visual analytics techniques, data mining pipelines, VO-based standards and science gateway technologies.

The implemented framework can be seen as an integrated workspace where the Visual Analytics Desktop Client, the Science Gateway embedding the Data Mining pipelines and the VIALACTEA Knowledge Base can be employed both as independent actors and as interacting components (see Figure 1).

The VIALACTEA Knowledge Base (VLKB) includes a combination of storage facilities, a Relational Data Base (RDB) server and web services on top of them. It allows easier searches and cross-correlations between data and currently contains: 2D surveys, catalogue sources and related band-merged information; structural information such as filamentary structures or bubbles; and radio datacubes with search and cutout services. Data-mining and machine-learning pipelines are embedded within the Science Gateway as workflows and employed to carry out the building of Spectral Energy Distributions, distance estimates and the evolutionary classification of hundreds of thousands of star-forming objects on the Galactic Plane. All the produced results are then ingested into the VLKB. The Visual Analytics tool allows interaction with the VIALACTEA data and the execution of complex tasks for multi-criteria data/metadata queries on the VLKB, subsample selection and further analysis processed on the science gateway, or real-time control of data fitting to theoretical models.

Due to the cross-domain scientists involved in the community (computer scientists, technologists and astronomers), an AGILE software development approach has been adopted. This approach promotes adaptive planning, evolutionary development, early delivery and continuous improvement, and it encourages rapid and flexible response to change. Cross-disciplinary face-to-face meetings have been organized to promote an iterative, incremental and evolutionary framework based on several cycles of requirements and feedback sessions.

The science gateway is exploited by the scientists to configure and run the VIALACTEA workflows implementing the pipelines developed by the community (see Section III-B). Furthermore, the science gateway allows the Visual Analytics tool to submit workflows through the Remote API [2]. This API also provides methods for checking a workflow's status and for downloading its outputs. The scientists required easy parameter setting, fully embarrassingly parallel computations and large-scale input dataset processing. Therefore the science gateway, based on the WS-PGRADE/gUSE framework (gUSE web page: http://guse.hu), has been able to fulfil these requirements (see Section III-A).
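The submit, poll and download cycle that the Remote API enables for an external client such as the Visual Analytics tool can be sketched as follows. This is an illustrative sketch only: the endpoint path, field names and status strings below are assumptions made for illustration, not the documented WS-PGRADE/gUSE Remote API, and no network I/O is performed.

```python
# Hypothetical sketch of a Remote API client cycle: assemble a workflow
# submission, then decide the client's next action from a reported
# status.  Endpoint path, field names and status values are ASSUMED for
# illustration; they are not the documented Remote API.

def build_submit_request(portal_url, workflow_archive, params, api_key):
    """Assemble the pieces of a submission call (no network I/O here)."""
    return {
        "url": portal_url.rstrip("/") + "/rest/submit",  # assumed path
        "files": {"workflow": workflow_archive},         # zipped workflow
        "data": {"apikey": api_key, **params},           # job parameters
    }

def next_action(status):
    """Map a reported workflow status to the client's next step."""
    if status in ("submitted", "running"):
        return "poll_again"
    if status == "finished":
        return "download_outputs"
    return "report_error"  # e.g. 'error'

req = build_submit_request("http://via-lactea-sg00.iaps.inaf.it:8080",
                           "mosaic.zip", {"tile": "l217_w250"}, "KEY")
print(req["url"])  # → http://via-lactea-sg00.iaps.inaf.it:8080/rest/submit
print(next_action("running"), "->", next_action("finished"))
```

In production the assembled request would be sent with an HTTP client and the status polled until the workflow reaches a final state, after which the outputs are downloaded.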
III. VIALACTEA SCIENCE GATEWAY

The science gateway provides user-friendliness (an intuitive user interface), efficiency (fast response times even for complex user requests), scalability (fast response times even for a large number of simultaneous user requests), robustness (it keeps working under any circumstances and recovers gracefully from exceptions) and extensibility (it is easy to extend with new interfaces and functionalities).

The VIALACTEA Science Gateway (http://via-lactea-sg00.iaps.inaf.it:8080) is based on a customized version of WS-PGRADE/gUSE version 3.7 and is affiliated with the STARnet Gateway Federation [3]. STARnet envisages sharing a set of services for authentication, a common and distributed computing infrastructure, data archives and workflow repositories. Each STARnet gateway provides access to specialized applications via customized workflows. The affiliation to the STARnet Gateway Federation also ensures the sustainability of all the products after the end of the project. This will allow the usage of the science gateway by all possible future stakeholders and not only by the VIALACTEA community.

A. gUSE Key Features

This section outlines some characteristics of gUSE that have been identified as key features for the VIALACTEA science gateway.

Parallelism: gUSE supports four levels of parallelism in workflow execution. The lowest level, node-level parallelism, is where the application itself is prepared to exploit multicore processors (multithreading) or cluster systems (e.g. parallel execution using MPI). Besides this option, gUSE supports the parallel execution of different jobs placed on different parallel branches of the workflow graph, the most intuitive and simple form of concurrent execution (branch-level parallelism). A third level of parallelism covers the situation in which one algorithm has to be executed on a large parameter field, generally called parameter study or parameter sweep (PS) execution. The highest level of parallelism is the parallel execution of the same workflow; such an execution can be initiated by the user submitting the same workflow with different configurations.

DCI and Storage access: gUSE can access various DCIs using the DCI Bridge [4] and different data storages via Data Avenue [5]. It provides flexible and versatile access to all the important DCIs in use within Europe, supporting a wide range of middleware types (clusters, grids, supercomputers, desktop grids, clouds). File transfers among the various storages and workflow nodes can be handled automatically and transparently using the Data Avenue service.

Workflow Management System: workflow creation and parameter setting can be performed from the web interface by importing workflows from the repository or by creating new ones using a web-based graph editor. The graph editor has recently been improved (see [6]), replacing the three-stage process of creating, configuring and submitting workflows with a single-stage process that allows workflow creation, instant configuration and submission within a single portlet.

B. The VIALACTEA Workflows

The available VIALACTEA workflows are mainly designed for: map making, i.e. the production of high quality images from the raw instrument data; data mining, to obtain band-merged catalogues whose entries consist of sources with associated counterparts at different wavelengths; and filamentary structure detection and extraction from images. Specifically, the following workflows have been identified.

MOSAIC: The MOSAIC workflow employs Unimap [7] as map maker software to produce high quality mosaic images from the raw instrument data of the infrared imaging photometers on board the ESA Herschel satellite. The employed applications are coded in IDL, Matlab and Bash scripting language. The workflow has been implemented as a parameter sweep workflow [8] embedding a parameter sweep map maker workflow, which allows a full parallelization of the processes to be executed (see Figure 2 for the schema of the workflow). The inputs specify the tiles to be processed (longitude and wavelength) and the parameters of the Unimap application. The workflow automatically imports the required data from the Herschel infrared Galactic Plane Survey (Hi-GAL) [9], [10]. The Instantiator job prepares the input tiles to be processed by the embedded map maker workflow, which computes each tile separately. The Generator job prepares the sub-tiles to be processed by the Map Maker job (Unimap). Finally, the output is given by the Collector job of the embedded map maker workflow and contains the maps in FITS (Flexible Image Transport System) file format.
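The nested parameter-sweep pattern just described (Instantiator, Generator, Map Maker, Collector) can be sketched, purely for illustration, as plain functions. In the real workflow each stage is a gUSE job, the per-sub-tile computations run in parallel on the DCI, and Unimap stands where the placeholder `map_maker` is; the data shapes below are assumptions.

```python
# Illustrative sketch (not the actual gUSE implementation) of the nested
# parameter-sweep pattern used by the MOSAIC workflow: the Instantiator
# fans out one work item per tile, the Generator splits each tile into
# independently processable sub-tiles, each sub-tile is processed (here
# sequentially, standing in for parallel Map Maker jobs), and the
# Collector merges the products back into one map per tile.

def instantiator(tiles):
    # One work item per (longitude, wavelength) tile to be processed.
    return [{"tile": t} for t in tiles]

def generator(item, n_subtiles=4):
    # Split a tile into independently processable sub-tiles.
    return [{"tile": item["tile"], "subtile": i} for i in range(n_subtiles)]

def map_maker(sub):
    # Placeholder for Unimap: turn a sub-tile into a (fake) map product.
    return {"tile": sub["tile"], "map": "FITS"}

def collector(products):
    # Merge sub-tile products into one output map per tile.
    merged = {}
    for p in products:
        merged.setdefault(p["tile"], []).append(p["map"])
    return merged

items = instantiator([(217.0, 250), (218.0, 350)])
products = [map_maker(s) for item in items for s in generator(item)]
maps = collector(products)
print(len(maps))  # → 2
```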
Fig. 2. MOSAIC Workflow.

PPMAP: The PPMAP workflow executes Point Process MAPping (PPMAP) [11], a Bayesian procedure that uses images of dust continuum emission at multiple wavelengths to produce resolution-enhanced image cubes of differential column density as a function of dust temperature and position. The employed applications are coded in Fortran90, IDL and Bash scripting language. As for the MOSAIC workflow, this workflow has been implemented using the parameter sweep submission schema shown in Figure 2. The inputs specify the tiles to be processed and the parameters (one set for each input tile) to be sent to the PPMAP application. The workflow automatically imports the required data from the Hi-GAL Survey. The output is given by the Collector job of the workflow and contains the maps in FITS file format.

Q-FULLTREE: The Q-FULLTREE workflow performs compact source identification through band-merging. The application is based on the positional cross-match among sources at different wavelengths. It is configured as a multi-threaded job splitting the single-band input catalogues into a user-chosen number of small sub-catalogues, with a user-selected percentage of overlapping entries in order to avoid the loss of merged sequences related to borderline entries.
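The splitting step described for Q-FULLTREE can be sketched as follows, a minimal illustration under the assumption that a catalogue is a flat list of rows (the real application operates on single-band CSV catalogues via STILTS):

```python
# Minimal sketch of the catalogue-splitting step: divide a single-band
# catalogue into n sub-catalogues with a chosen percentage of
# overlapping entries, so that merged sequences spanning a chunk border
# are not lost.  Illustrative only, not the Q-FULLTREE implementation.

def split_with_overlap(sources, n_chunks, overlap_pct):
    chunk = len(sources) // n_chunks
    extra = max(1, chunk * overlap_pct // 100)   # overlapping entries
    out = []
    for i in range(n_chunks):
        start = i * chunk
        stop = len(sources) if i == n_chunks - 1 else (i + 1) * chunk + extra
        out.append(sources[start:stop])
    return out

cat = list(range(100))                 # stand-in for catalogue rows
chunks = split_with_overlap(cat, n_chunks=4, overlap_pct=10)
print([len(c) for c in chunks])        # → [27, 27, 27, 25]
```

Each interior chunk carries a few entries of its successor, so a borderline source is seen by both sub-jobs and its merged sequence survives the split.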
FT-Recap (FullTree-Recap), a post-processing application associated with the band-merging workflow, is submitted to re-organize the output of Q-FULLTREE in order to fulfil the Spectral Energy Distribution visualization expectations of the Visual Analytics Desktop Client. The employed applications are coded in Python and make internal use of the public STILTS library [12]. The inputs of the workflow are a TAR archive containing the sources at the different wavelengths in CSV format and two text files specifying the setup and the configuration of the application.

Filamentary Structure Detection: This workflow is designed to perform filament extraction. The underlying application [13] identifies filamentary-like extended structures on astronomical images and determines their morphological and physical parameters. The workflow is developed as a three-step process: one step for feature detection, one for filament extraction, and a final one for the filtering of artifacts and the creation of the final catalogue. All these applications are implemented in IDL. The first step performs the detection of candidates through advanced image analysis techniques based on mapping the eigenvalues of the local Hessian matrix computed from the input map. The second step analyses the regions of interest with the support of morphological operators that decompose the initial binary mask into simpler units. Finally, the third step analyses the candidate list and filters out weakly elongated structures and possible artifacts, building up the final candidate filamentary catalogue.
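The Hessian-eigenvalue mapping of the first detection step can be illustrated with a toy example. This is a simplified stand-in for the IDL application: real pipelines smooth the map first, while this sketch uses plain central finite differences on interior pixels; strongly negative minimum eigenvalues flag ridge-like (filamentary) pixels.

```python
import math

# Per-pixel smaller eigenvalue of the local 2x2 Hessian matrix of an
# image, via central finite differences (interior pixels only).
# Strongly negative values mark ridge-like, filament-candidate pixels.

def hessian_min_eig(img, y, x):
    gxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    gyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    gxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    tr, det = gxx + gyy, gxx * gyy - gxy * gxy
    return tr / 2 - math.sqrt(max((tr / 2) ** 2 - det, 0.0))

# A horizontal ridge: a bright central row on a dark background.
img = [[1.0 if y == 3 else 0.0 for _ in range(7)] for y in range(7)]
on_ridge = hessian_min_eig(img, 3, 3)    # strongly negative
off_ridge = hessian_min_eig(img, 1, 3)   # flat background
print(on_ridge, off_ridge)               # → -2.0 0.0
```

Thresholding this eigenvalue map yields the initial binary mask that the subsequent morphological and filtering steps refine.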
IV. RESOURCE MONITORING

Continuous monitoring of the operational status ("health") of the underlying distributed computing infrastructures (DCIs) connected to the science gateway is of high importance, as they serve as the actual platform performing the scientific calculations for the VIALACTEA Science Gateway; this functionality was missing from the base WS-PGRADE/gUSE portal framework. DCIs, though much more powerful, are somewhat less reliable than standalone desktop PCs due to their inherent complexity (remote execution, data staging, environment changes, etc.); moreover, the running time of scientific applications varies widely, from minutes to days or even weeks. Any outage of the underlying DCIs might break the flow of calculations and, in spite of built-in failover mechanisms (re-submitting jobs on failure), an error can be very difficult to localize without information about the proper behavior of the computing infrastructure. Debugging complex workflows can be very tedious and time-consuming, requiring the application to be re-run several times with slight modifications and added logging. Furthermore, sometimes these errors are not even repeatable: temporary blackouts, or worker nodes running out of disk space, may prevent the job scheduling and submission system from even recording a notice about the actual cause of the failure.

Using a DCI monitoring system such as the one designed and implemented in the VIALACTEA Science Gateway, workflow developers can make sure that all the related DCIs operate normally before starting long-running calculations; on error, by revising the historical monitoring records, they can verify that the error was not caused by a failure of the underlying infrastructure. System administrators can also benefit from resource monitoring, as they can quickly overview all the systems under their supervision and, thanks to the e-mail alerting option, react to corrupted behavior as soon as possible. Thanks to the historical data, the trustworthiness of the computing resources can be assessed, and potential improvements or measures to prevent the same failures in the future can be initiated. At the moment, resource monitoring is restricted to DCIs of type Portable Batch System (PBS), the type used in the context of the VIALACTEA project. To help better identify the location of errors, different levels of monitoring activities have been designed, which run periodically for all connected DCIs.

Four levels of resource monitoring have been implemented: Level 1 (PBS cluster infrastructure head node monitoring); Level 2 (PBS cluster worker node environment monitoring); Level 3 (portal-PBS cluster interoperability monitoring); and Level 4 (VIALACTEA domain-specific workflow operational monitoring).

The lowest level, Level 1 (called "PBS head nodes"), checks that the DCI is indeed accessible from the portal (the head node responds to ping and a successful SSH connection can be established) and that all the essential middleware commands (qsub, qstat, pbsnodes, etc.) operate as expected. Level 2 tests (called "PBS worker nodes") scan through all the worker nodes available in the DCI (they are all candidates for potential job execution) and check, one by one, that the expected execution environment is available, such as enough disk space and the necessary runtime environments and libraries (Java, IDL, Matlab, Python, etc.). Level 3 tests ("Portal-PBS interoperability") test both the portal's and the DCI's functionality by executing a probe workflow composed of a single job. Level 4 tests ("Vialactea base workflows") submit pre-created, domain-specific workflows having characteristics and requirements similar to the other applications used in the customized portal, though with parameters resulting in less load compared to full-fledged computations.

Fig. 3. Monitoring levels and the covered components.
Fig. 4. Monitoring results of "PBS worker nodes" tests (level 2).
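The Level 1 decision logic can be sketched as follows. The probe outcomes are assumed to have been gathered beforehand (ping, SSH login, exit codes of the middleware commands run on the head node); the function itself is an illustrative assumption and merely reports the first failing probe, since later probes depend on the earlier ones.

```python
# Illustrative sketch of the level-1 "PBS head nodes" verdict: report
# the first failing probe (ping, SSH, essential middleware commands).
# Gathering the probe results themselves is assumed to happen elsewhere.

ESSENTIAL_COMMANDS = ["qsub", "qstat", "pbsnodes"]

def head_node_status(ping_ok, ssh_ok, command_exit_codes):
    if not ping_ok:
        return "FAILED: head node does not respond to ping"
    if not ssh_ok:
        return "FAILED: SSH connection could not be established"
    for cmd in ESSENTIAL_COMMANDS:
        if command_exit_codes.get(cmd, 1) != 0:
            return f"FAILED: middleware command '{cmd}' not operational"
    return "PASSED"

print(head_node_status(True, True, {"qsub": 0, "qstat": 0, "pbsnodes": 0}))
print(head_node_status(True, False, {}))
```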
Note that once tests at a certain level fail, tests at the higher levels will fail too; thus the lowest failing level helps to locate the source of the problem as precisely as possible.

Figure 3 illustrates which main components of the system are covered by the different levels of tests (the higher the level, the more components are covered by the monitoring test). For the different levels of monitoring activities, different frequencies can be specified, i.e. how often and at what time they are to be executed. For example, in the current VIALACTEA portal, Level 1 tests are set to be executed every 3 hours, while Level 4 tests run once a day, at midnight. This makes it possible to tune and schedule the load caused by the monitoring system itself, so as to avoid as much as possible any performance degradation during normal use of the portal. All results are recorded, so the operational status of each resource can be traced back for a specified period of time (30 days, by default).
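The failure-propagation rule above suggests a simple localization strategy, sketched here as illustrative logic only: run the levels in order and report the lowest failing level as the likely source of the problem, since failures at the higher levels would follow as a consequence.

```python
# Locate the source of a problem from per-level monitoring outcomes:
# the lowest failing level is the most precise localization available.
# (Illustrative sketch of the rule described in the text.)

LEVELS = ["PBS head nodes", "PBS worker nodes",
          "Portal-PBS interoperability", "Vialactea base workflows"]

def locate_failure(results):
    """results: list of booleans, one per level, lowest level first."""
    for level, (name, ok) in enumerate(zip(LEVELS, results), start=1):
        if not ok:
            return f"level {level} ({name})"
    return None  # all levels passed: infrastructure looks healthy

print(locate_failure([True, False, False, False]))  # → level 2 (PBS worker nodes)
print(locate_failure([True, True, True, True]))     # → None
```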
Also, for error events, e-mail alerting can be requested for any number of e-mail addresses (primarily, the system administrator is notified).

Monitoring data can be viewed by any user of the portal; changes to the settings are however allowed to the portal administrators only. Monitoring results are summarized and visualized in the form of tables and charts on a web interface. Figure 4 shows Level 2 resource monitoring results. The table (at the top of Figure 4) shows the latest results and the frequency of these tests (6 hours). PASSED 10/10 means that all tests (free disk space, Java, Matlab, Python, IDL) passed on all 10 worker nodes available in the cluster "muoni-server-02.oact.inaf.it". The chart (at the bottom of Figure 4) shows the test results of the last 30 days, indicating outages on 12–16, 17 and 19 February; the DCI worked properly at other times.

As a result of the introduced monitoring service, in practice, gateway users could now verify, prior to running their workflows, whether the infrastructure of their choice was indeed available, operational and responsive; if not, they still had the possibility to choose another DCI. Also, on error, they could check past records to clarify whether a failure was due to infrastructure problems, thus avoiding costly debugging. Administrators were always notified about DCI outages in time, so they could fix issues as soon as possible and inform the portal users about the incidents and the expected time of recovery. Unwanted side effects of software configuration changes in the clusters were also detected automatically by the monitoring tool (e.g., corrupted Matlab and IDL paths on worker nodes). Finally, thanks to the historical data, it turned out that the capacity of one PBS cluster was insufficient: response times were at an acceptable level only at weekends, under very low load. That infrastructure was then re-installed with more processors and more worker nodes to fulfil the users' needs.

V. RELATED WORK

To deal with the data deluge that the astrophysics community is facing, different science gateways and workflow technologies are being exploited. Apart from the WS-PGRADE/gUSE framework, which has been extensively employed by the authors (see e.g. [14], [15], [16]), different approaches have been followed to allow the end users to easily interact with the applications ported on the DCIs.

In [17], the authors present an approach based on the Taverna Workbench (http://www.taverna.org.uk) [18] and the AstroTaverna plugin (http://amiga.iaa.es/p/290-astrotaverna.htm) [19] to perform kinematical modelling of galaxies as an example of the analysis tasks required by the SKA project (which aims to build an instrument that will be the world's largest radio interferometer, able to reach data rates at the exa-scale). The Apache Airavata (http://airavata.apache.org) [20] environment on XSEDE (https://www.xsede.org) resources has been used in [21] to produce multiple synthetic sky surveys of galaxies and large-scale structure in support of Dark Energy Survey analysis. The underlying technologies described in those works are well suited to be ported into a science gateway such as the VIALACTEA one, but they require time and extra IT effort for coding web services (as wrappers) on top of each application of interest to the astronomers.

The Kepler scientific workflow system (https://kepler-project.org) [22] has been employed in [23] to implement automatic data reduction pipelines. This approach could have been very useful within the VIALACTEA project, but again it requires IT effort to build the required Kepler actors for each application.

Finally, to our knowledge, none of the above solutions includes a resource monitoring system able to check the status of all the interacting components of the gateway, including the required runtimes, as required by the VIALACTEA community.
There exist several resource monitoring tools, such as Ganglia [24], Nagios (http://www.nagios.org), Zabbix (http://www.zabbix.com) and Prometheus (https://prometheus.io) to mention a few, shipped with numerous out-of-the-box probes to monitor typical host and service metrics such as availability, CPU and network utilization, memory, disk space usage, service checks, etc. Beyond the fact that they require individual installation, administration and considerable expertise to manage, they did not seem easily adaptable to our special case: the worker nodes behind the head nodes of PBS clusters are inaccessible from the outside (they reside in a private network), so their monitoring was possible only by submitting dedicated PBS jobs. Also, verifying the results of workflow execution, which can only be done using the Remote API of the portal, seemed difficult to realize with such tools. Our implementation and its integration into the portal have other advantages as well: it uses the same monitoring source (the host of the gateway) and the same mechanisms (software libraries, SSH connections, PBS commands) as the portal, so it tests the resources from an identical environment. Nevertheless, we connected our tool to Zabbix to record the workflow execution time metric, and we used the Zabbix triggers, notifications and chart visualization.

VI. CONCLUSIONS AND OUTLOOK

In this paper we have introduced a new framework that allows astronomers to process the new-generation surveys of the Galactic Plane to build and deliver a quantitative model of the Milky Way Galaxy. The presented science gateway operates as a central workbench for the VIALACTEA community, allowing it to deal with the growing data size and quantity coming from the new-generation surveys. The extraction of the meaningful information contained in the available data required an entirely new approach (the new paradigm of data driven scientific discovery), which resulted in a novel framework based on advanced visual analytics techniques, data mining methodologies, machine learning paradigms and Virtual Observatory based data representation and retrieval standards.

The focus of the presented workflow applications is on map making, i.e. the formation of sky images from the instrument data; on data mining, to obtain band-merged catalogues relating galactic sources with associated counterparts at different wavelengths; and on filamentary structure detection and extraction from sky images. Furthermore, we have highlighted how the WS-PGRADE/gUSE framework has been able to fulfil the project requirements thanks to its key features: user-friendliness, efficiency, scalability, robustness and extensibility.

This paper also described a novel resource surveillance component integrated into the WS-PGRADE/gUSE portal, capable of checking the operational status of the employed computational infrastructures based on Portable Batch Systems (PBS). The monitoring covers different levels of tests checking the gateway interoperability with the computing infrastructures and the workflow submission and execution processes. These tests are performed periodically and the resulting reports are published on the gateway, so that final users are also aware of any failure of the system, avoiding a waste of time in debugging their work.

Among the topics deserving further study is the evaluation of the MetaBrokering service of WS-PGRADE/gUSE, which is capable of distributing and balancing the load among different distributed computing infrastructures. This will be exploited for parameter sweep jobs, such as the map making computations, avoiding an excessive load on one resource with respect to another having higher capacity.

ACKNOWLEDGMENT

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 607380 (VIALACTEA).

REFERENCES

[1] P. Kacsuk, Z. Farkas, M. Kozlovszky, G. Hermann, A. Balasko, K. Karoczkai, and I. Marton, "WS-PGRADE/gUSE generic DCI gateway framework for a large variety of user communities," Journal of Grid Computing, vol. 10, no. 4, pp. 601–630, 2012.
[2] A. Balasko, Z. Farkas, and P. Kacsuk, "Building science gateways by utilizing the generic WS-PGRADE/gUSE workflow system," Computer Science, vol. 14, no. 2, pp. 307–325, 2013.
[3] U. Becciani, E. Sciacca, A. Costa, P. Massimino, F. Vitello, S. Cassisi, A. Pietrinferni, G. Castelli, C. Knapic, R. Smareglia et al., "Creating gateway alliances using WS-PGRADE/gUSE," in Science Gateways for Distributed Computing Infrastructures. Springer, 2014, pp. 255–270.
[4] M. Kozlovszky, K. Karóczkai, I. Márton, P. Kacsuk, and T. Gottdank, "DCI Bridge: Executing WS-PGRADE workflows in distributed computing infrastructures," in Science Gateways for Distributed Computing Infrastructures. Springer, 2014, pp. 51–67.
[5] Á. Hajnal, Z. Farkas, P. Kacsuk, and T. Pintér, "Remote storage resource management in WS-PGRADE/gUSE," in Science Gateways for Distributed Computing Infrastructures. Springer, 2014, pp. 69–81.
data mining methodologies, machine learning paradigms and [6] G. A. McGilvary, M. Atkinson, S. Gesing, A. Aguilera, R. Grunzke, and Virtual Observatory based data representation and retrieval E. Sciacca, “Enhanced usability of managing workflows in an industrial data gateway,” in e-Science (e-Science), 2015 IEEE 11th International standards. Conference on. IEEE, 2015, pp. 495–502. The focus of the presented workflow applications is on map [7] L. Piazzo, L. Calzoletti, F. Faustini, M. Pestalozzi, S. Pezzuto, D. Elia, making, i.e. the formation of sky images from the instruments A. di Giorgio, and S. Molinari, “unimap: a generalized least-squares map data; data mining to obtain band-merged catalogues relating maker for herschel data,” Monthly Notices of the Royal Astronomical Society, vol. 447, no. 2, pp. 1471–1483, 2015. galactic sources with associated counterparts at different wave- [8] P. Kacsuk, K. Karoczkai, G. Hermann, G. Sipos, and J. Kovacs, “WS- lengths; and filamentary structure detection and extraction PGRADE: Supporting parameter sweep applications in workflows,” in from sky images. Furthermore we have highlighted how the Workflows in Support of Large-Scale Science, 2008. WORKS 2008. Third Workshop on. Ieee, 2008, pp. 1–10. usage of WS-PGRADE/gUSE framework have been able to [9] S. Molinari, B. Swinyard, J. Bally, M. Barlow, J.-P. Bernard, P. Martin, fulfil the project requirements thanks to its key features: user- T. Moore, A. Noriega-Crespo, R. Plume, L. Testi et al., “Hi-gal: The friendliness, efficiency, scalability, robustness and extensibil- herschel infrared galactic plane survey,” Publications of the Astronomical Society of the Pacific, vol. 122, no. 889, p. 314, 2010. ity. [10] D. Elia, S. Molinari, Y. Fukui, E. Schisano, L. Olmi, M. Veneziani, T. Hayakawa, M. Pestalozzi, N. Schneider, M. 
Benedettini et al., “The 8 Nagios: http://www.nagios.org first hi-gal observations of the outer galaxy: A look at star formation in 9 Zabbix:http://www.zabbix.com the third galactic quadrant in the longitude range 216. 5 225. 5,” The 10 Prometheus:https://prometheus.io Astrophysical Journal, vol. 772, no. 1, p. 45, 2013. 6 8th International Workshop on Science Gateways (IWSG 2016), 8-10 June 2016 [11] K. Marsh, A. Whitworth, and O. Lomax, “Temperature as a third dimension in column-density mapping of dusty astrophysical structures associated with star formation,” Monthly Notices of the Royal Astronom- ical Society, vol. 454, no. 4, pp. 4282–4292, 2015. [12] M. Taylor, “Stilts-a package for command-line processing of tabular data,” in Astronomical Data Analysis Software and Systems XV, vol. 351, 2006, p. 666. [13] E. Schisano, K. Rygl, S. Molinari, G. Busquet, D. Elia, M. Pestalozzi, D. Polychroni, N. Billot, S. Carey, R. Paladini et al., “The identification of filaments on far-infrared and submillimiter images: Morphology, physical conditions and relation with star formation of filamentary structure,” The Astrophysical Journal, vol. 791, no. 1, p. 27, 2014. [14] U. Becciani, E. Sciacca, A. Costa, P. Massimino, C. Pistagna, S. Riggi, F. Vitello, C. Petta, M. Bandieramonte, and M. Krokos, “Science gateway technologies for the astrophysics community,” Concurrency and Computation: Practice and Experience, vol. 27, no. 2, pp. 306–327, 2015. [15] E. Sciacca, M. Bandieramonte, U. Becciani, A. Costa, M. Krokos, P. Massimino, C. Petta, C. Pistagna, S. Riggi, and F. Vitello, “Visivo science gateway: a collaborative environment for the astrophysics com- munity,” in 5th International Workshop on Science Gateways, IWSG 2013. CEUR Workshop Proceedings, 2013. [16] A. Costa, P. Massimino, M. Bandieramonte, U. Becciani, M. Krokos, C. Pistagna, S. Riggi, E. Sciacca, and F. Vitello, “An innovative science gateway for the cherenkov telescope array,” Journal of Grid Computing, vol. 
13, no. 4, pp. 547–559, 2015. [17] S. Sanchez Exposito, P. Martin, J. E. Ruiz, L. Verdes-Montenegro, J. Garrido, R. S. Pardell, A. Ruiz Falco, and R. Badia, “Web services as building blocks for science gateways in astrophysics,” in Science Gateways (IWSG), 2015 7th International Workshop on. IEEE, 2015, pp. 80–84. [18] K. Wolstencroft, R. Haines, D. Fellows, A. Williams, D. Withers, S. Owen, S. Soiland-Reyes, I. Dunlop, A. Nenadic, P. Fisher et al., “The taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud,” Nucleic acids research, p. gkt328, 2013. [19] J. Ruiz, J. Garrido, J. Santander-Vela, S. Sánchez-Expósito, and L. Verdes-Montenegro, “Astrotavernabuilding workflows with virtual observatory services,” Astronomy and Computing, vol. 7, pp. 3–11, 2014. [20] M. E. Pierce, S. Marru, L. Gunathilake, D. K. Wijeratne, R. Singh, C. Wimalasena, S. Ratnayaka, and S. Pamidighantam, “Apache airavata: design and directions of a science gateway framework,” Concurrency and Computation: Practice and Experience, vol. 27, no. 16, pp. 4282– 4291, 2015. [21] B. Erickson, R. Singh, A. E. Evrard, M. R. Becker, M. T. Busha, A. V. Kravtsov, S. Marru, M. Pierce, and R. H. Wechsler, “Enabling dark energy survey science analysis with simulations on xsede resources,” in Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery. ACM, 2013, p. 16. [22] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao, “Scientific workflow management and the kepler system,” Concurrency and Computation: Practice and Experience, vol. 18, no. 10, pp. 1039–1065, 2006. [23] W. Freudling, M. Romaniello, D. Bramich, P. Ballester, V. Forchi, C. Garcı́a-Dabló, S. Moehler, and M. Neeser, “Automated data reduction workflows for astronomy-the eso reflex environment,” Astronomy & Astrophysics, vol. 559, p. A96, 2013. [24] M. L. Massie, B. N. Chun, and D. E. 
Culler, “The ganglia distributed monitoring system: design, implementation, and experience,” Parallel Computing, vol. 30, no. 7, pp. 817–840, 2004. 7