=Paper=
{{Paper
|id=Vol-2037/paper41
|storemode=property
|title=Effectively and Efficiently Supporting Grid and Cloud Integration via a DBMS-based Framework
|pdfUrl=https://ceur-ws.org/Vol-2037/paper_41.pdf
|volume=Vol-2037
|authors=Alfredo Cuzzocrea,Osvaldo Gervasi,Mirko Mariotti,Flavio Vella,Alessandro Costantini
|dblpUrl=https://dblp.org/rec/conf/sebd/CuzzocreaGMVC17
}}
==Effectively and Efficiently Supporting Grid and Cloud Integration via a DBMS-based Framework==
Effectively and Efficiently Supporting Grid and Cloud Integration via a DBMS-based Framework

Alfredo Cuzzocrea (1), Osvaldo Gervasi (2), Mirko Mariotti (3), Flavio Vella (4), and Alessandro Costantini (5)

(1) DIA Dept., University of Trieste and ICAR-CNR, Italy - alfredo.cuzzocrea@dia.units.it
(2) DMI Dept., University of Perugia, Italy - osvaldo.gervasi@unipg.it
(3) FISGEO Dept., University of Perugia, Italy - mirko.mariotti@unipg.it
(4) DI Dept., University of Rome La Sapienza, Italy - vella@di.uniroma1.it
(5) CNAF-INFN, Italy - alessandro.costantini@cnaf.infn.it

Abstract. This paper presents the anatomy, models and functionalities of a DBMS-based system for integrating Grids and Clouds. Our study starts from recognizing that Grid and Cloud computing share some founding axioms, although the two computational paradigms remain very different in both their computing and economic models. The proposed system is centered on a well-designed DBMS schema that allows a seamless integration between Grids and IaaS Cloud providers. The paper details how images from a Cloud environment are deployed in response to a specific task execution requested from the (integrated) Grid environment, as well as the other essential components of the proposed architecture (e.g., resource access and granting, user authorization, resource discovery and sharing, job and task management and distribution, integration with other computational platforms, and so forth).

1 Introduction

Nowadays two trends are evident in the way computational resources are organized and managed to provide users with computing infrastructures adequate for emerging computing needs. On the one hand, virtualization technologies are massively adopted, based on more and more powerful Cloud systems (OpenStack, OpenNebula, Eucalyptus, etc.), along with systems for deploying virtual machines and all the technologies related to the Cloud scenario. On the other hand, it is clear that the best way to increase the performance of computing systems is to adopt heterogeneous architectures, specializing them on the basis of the type of computation requested by users. Examples of this trend are the usage of GPUs for the fast solution of different problems in computer science, like graph analysis [9], cryptography [19, 29, 30] or computational logic [16], the development of innovative architectures, like the Parallella board [24], and the adoption of Field-Programmable Gate Array (FPGA) devices as computing resources [20, 12].

Cloud and Grid computing share some essential driving ideas, which led to the construction of large-scale federated infrastructures and which can be summarized as follows: (i) bring the promise of encapsulating the complexity of hardware resources and make them easily accessible by means of high-level user interfaces; (ii) address some form of the intrinsic scalability issues of large-scale computational challenges; (iii) cope with the need for resources that cannot be hosted on premises.

However, the key differences between Grids and Clouds concern the abstractions and compute models adopted by the two paradigms [18]. It can be said that Grids are built "bottom up" and are concerned more with the federation of static existing resources, which typically are legacy clusters built around a Local Resource Management System (LRMS) that exploits the Batch computing model.
The development of applications for Grid environments requires knowledge of the Grid infrastructure abstractions. This process, aimed at enabling the application to run in such environments, is in fact called "Grid-enabling" and can be rather complex [17]. On the other hand, Cloud users can choose their own compute model, leveraging more general interfaces (without the need for a fine tuning of the environment) that often lead to simpler interaction and application development [27].

As sketched above, several problems do not play nicely with the heterogeneous nature of aggregated resources in Grids. For example, many scientific applications need different environments (operating systems, libraries) and hardware (e.g., multicore processors, FPGAs, GPUs). From a Batch point of view, this represents a set of requirements influencing scheduling decisions for both top-level and local-level resource managers, so the heterogeneity of Grid sites plays a central role in distributing jobs (the workload) among the sites that match the aforementioned requirements.

Furthermore, the Grid workload can often be unpredictable and subject to bursts, which leads to an unbalanced distribution of resource usage and even to a deterioration of QoS. In this context the Grid workload represents a weak point for the Batch model, in which resources are often statically managed and partitioned and cannot be adapted in advance to meet possible requirements. Moreover, the use of Clouds could allow the extension of private resource pools in number and typology, with positive effects on the Quality of Service (QoS).

So why not dismiss Grids and adopt Cloud solutions? There are several reasons: it is not yet clear how some critical issues (data management, security, etc.) are to be dealt with in the Cloud era, while in Grids they are well established. Furthermore, the costs of a possible shift in technology must be thoroughly investigated. A more reasonable approach is an integration process that combines the features of both.

2 Integration Opportunities

Before the Cloud era, even though these issues were addressed in various works [10, 28, 21], the proposed solutions were often heavily customized and too tightly dependent on particular technological choices. With the success of Cloud computing, the spread of IaaS providers [7, 2, 3], the development of interfaces simplifying the management of virtual resources [4, 6] and the related libraries (e.g., [5, 1]) paved the way to several possibilities for Grid and Cloud integration. As a matter of fact, even if Cloud solutions have been, since their first definition [13], primarily driven by business motivations, the IaaS service model seems to respond to some of the issues of the Batch model and can overcome them with both on-demand and adaptive characteristics.

To the best of our knowledge there are three approaches to site-level integration between the Batch-oriented Grid compute model and the service-oriented nature of Clouds.

Grid over Clouds. According to this alternative, a whole Grid site is built on top of a public/private Cloud. Through this schema, the Grid infrastructure can be built by instantiating resources according to the real needs of the users. In [8], the authors provided a "Grid as a service" tool to create new Grid sites, or to add computational resources to existing Grid sites, by exploiting a Platform as a Service (PaaS) approach. A similar approach is also adopted in [23].
Hybrid with Batch-Dependent Cloud-Enabled LRMS. In this model a single local Batch system is used to schedule the jobs on a pool of dynamically provisioned resources, either on premises or on public/private Clouds. For example, in [26] the authors described a solution which enables building dynamic environments through Grid jobs or local Cloud jobs. The proposed solution is built around the LRMS, which handles each request. This approach presents manifold limitations. The main drawbacks are the following: (i) the solution is strongly dependent on the particular technology adopted (i.e., the LRMS requires customizations); (ii) the approach is not elastic [11], since it enables the spawning of a virtual environment on local resources only.

Hybrid, non Batch-Dependent. In this model the local Grid site spawns resources (even whole clusters) on public/private Clouds on the basis of the job requests. The integration is done at the Computing Element level. In this way, several computational resources (i.e., resources available at other computational centers) can be exploited through a fine-grained control over virtual instances. In [29], the first solution based on the Cloud-over-Grid approach was presented. The authors also validated their solution by providing Grid users with special virtual computational resources such as GPUs.

The last two approaches can also be identified as two types of "Cloud-over-Grid". In the present work, we describe our solution, which may be used to implement any of the described hybrid approaches, with special attention to the non Batch-dependent model.

3 System Anatomy

The proposed system is based on the adoption of Cloud systems to enable the Grid sites to provide users with a set of non-traditional resources, such as the ones mentioned above and dynamic environments. As an example, a Grid user may request to run on a server equipped with a given GPU, or with a particular software library or operating system installed.

Our system has been designed according to the Unix principles: each component is autonomous, independent of the others, and specialized in carrying out a specific task in a simple way. According to this approach we have chosen to use tags for cataloging the virtual machines that have a certain type of hardware features (such as a given architecture, hybrid systems, GPUs, etc.) or software (operating systems, special libraries installed). These tags are published using the standard techniques of the Grid environment, as features implemented and published by a particular site, and can be specified as requirements by the users when submitting a job (see the sketch after the terminology list below). In this way, the Grid information system enables users to submit jobs requiring special environments, provided only by some Grid sites.

In Figure 1 the logical schema of the project is sketched. Our solution is built around a DBMS that plays a central role, since it contains the configuration of the system in terms of the connected clusters, the Cloud systems, and their environments and status. The architectural workflow of our solution is implemented by different agents connected to the DBMS, each performing a specific action; they will be described in detail later.

Fig. 1. Description of the Proposed System

In the remaining part of this paper we will use the following terms, with the corresponding meaning:

Computing Element (CE): the set of resources made up of the Gatekeeper and the Cluster.
Gatekeeper: the system that provides the gateway through which the Grid jobs are submitted to the Batch system running on the local farm nodes.
Cluster: a Grid-enabled Cluster, i.e. a set of Worker Nodes (WNs), connected to a Computing Element (CE) and connected to the Grid system. When referring to Clusters we will mean the Cluster Resource Manager.
Cloud: a Cloud infrastructure with a Cloud controller such as OpenNebula, OpenStack or Eucalyptus.
Computational node: a single server used as target of the incoming Grid job, without the use of a Batch system or a Cloud system.
Cloudtag: a tag used to mark the images and to organize the infrastructure resources.
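The paper does not fix the exact mechanism by which cloudtags are published and matched. As an illustration only, the following minimal Python sketch builds a gLite/EGI-style JDL Requirements clause, under the assumption that a cloudtag is advertised through the GLUE runtime-environment attribute; the tag value and the attribute name are assumptions, not part of the proposed system.

```python
# Minimal sketch (not from the paper): one common way a site-published tag can be
# required by a user job in gLite/EGI-style Grids is a JDL Requirements expression
# matched against the GLUE runtime-environment attribute. The cloudtag value below
# is an illustrative assumption.

def jdl_requirement(cloudtag: str) -> str:
    """Build a JDL Requirements clause asking for a site that publishes `cloudtag`."""
    return ('Requirements = Member("%s", '
            'other.GlueHostApplicationSoftwareRunTimeEnvironment);' % cloudtag)

if __name__ == "__main__":
    # Hypothetical tag marking images of GPU-equipped virtual machines.
    print(jdl_requirement("CLOUDTAG-GPU-CUDA"))
```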
4 DBMS Structure

The information about Clusters and Clouds is collected in a DBMS; from the implementation point of view, we have chosen PostgreSQL for this purpose. The information has been divided into four logical blocks, each one mapped to a DBMS schema:

- The capabilities schema contains information about the Clusters, Cloud Controllers and Virtual Machine images known to the system. It also contains the tagging of the Virtual Machine images, used to publish this information through the Grid information system.
- The needs schema contains a live view of the cloudtags needed by the clusters.
- The fulfillments schema contains a live view of the cloudtags offered by the Cloud systems.
- The running schema contains the list of the running jobs, with related details.

4.1 The Capabilities Schema

The capabilities schema contains the information related to the composition of the various systems. Each software component reads the necessary information from the DB, since the capabilities schema contains all the information related to the structure of the system. The information contained in this schema concerns the Clouds, Clusters, Gatekeepers and Computational nodes connected to the system and, most importantly, the Virtual Machine images.

Information on the Active Cloud Systems. The table clouds of the capabilities schema traces the following information related to the active Cloud systems: (i) the type of Cloud system (i.e., OpenStack, OpenNebula or Eucalyptus); (ii) the description of the Cloud system; (iii) the information on how to interact with the system, which may or may not contain authentication information.

Information on the Active Clusters. The table clusters of the capabilities schema traces the following information related to the Batch system of the active Clusters: (i) the type of the Cluster's Resource Manager (Torque/MAUI, LSF, etc.); (ii) the description of the Cluster; (iii) the optional information on how to interact with the Batch system of the Cluster.

Information on Gatekeepers. The table gatekeepers of the capabilities schema traces the information related to the Gatekeepers of the active Grid nodes. In particular, the most important pieces of information are: (i) the Gatekeeper and the Information System of the Grid site; (ii) the description of the Grid site; (iii) the optional information on how to access the Gatekeeper and/or the Information System.

It is worth noticing that we implemented two separate tables, one for the Gatekeepers and one for the Batch Systems, even when they belong to the same Grid site: in that case the CE is listed in the gatekeepers table and the Batch System in the clusters table. We kept the two tables separate because we want to stress that the job path, and the related sequence of events and actions, differ depending on whether the job is under the control of a Batch System or not.

Information on Standalone Computational Nodes. The table compnodes of the capabilities schema traces the information related to the access to Computational nodes capable of executing jobs. They represent the real or virtual machines, not connected to a Batch System, that we want to include in our system. The most relevant information is: (i) the node type; (ii) the operating system; (iii) the information related to the access to the node.
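To make the capabilities schema more concrete, the following minimal sketch shows how its tables could be created in PostgreSQL from Python. The schema and table names (capabilities, clouds, clusters, gatekeepers, compnodes) come from the paper; all column names and the psycopg2 connection parameters are illustrative assumptions rather than the actual implementation.

```python
# Minimal sketch (assumptions, not the paper's actual DDL): a possible layout for
# part of the capabilities schema described above. Only schema and table names are
# taken from the paper; every column is an illustrative assumption.
import psycopg2  # PostgreSQL driver, as the paper adopts PostgreSQL

CAPABILITIES_DDL = """
CREATE SCHEMA IF NOT EXISTS capabilities;

CREATE TABLE IF NOT EXISTS capabilities.clouds (
    id          SERIAL PRIMARY KEY,
    cloud_type  TEXT NOT NULL,      -- e.g. 'OpenStack', 'OpenNebula', 'Eucalyptus'
    description TEXT,
    endpoint    TEXT NOT NULL,      -- how to interact with the Cloud controller
    auth_info   TEXT                -- optional authentication information
);

CREATE TABLE IF NOT EXISTS capabilities.clusters (
    id          SERIAL PRIMARY KEY,
    lrms_type   TEXT NOT NULL,      -- e.g. 'Torque/MAUI', 'LSF'
    description TEXT,
    access_info TEXT                -- optional: how to reach the Batch system
);

CREATE TABLE IF NOT EXISTS capabilities.gatekeepers (
    id          SERIAL PRIMARY KEY,
    gatekeeper  TEXT NOT NULL,      -- Gatekeeper endpoint of the Grid site
    info_system TEXT,               -- Information System of the Grid site
    description TEXT,
    access_info TEXT
);

CREATE TABLE IF NOT EXISTS capabilities.compnodes (
    id          SERIAL PRIMARY KEY,
    node_type   TEXT NOT NULL,
    os          TEXT,
    access_info TEXT                -- how to reach the standalone node
);
"""

if __name__ == "__main__":
    # Connection parameters are placeholders.
    with psycopg2.connect("dbname=gridcloud user=gridcloud") as conn:
        with conn.cursor() as cur:
            cur.execute(CAPABILITIES_DDL)
```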
Virtual Machine Images. A job can be received by a Gatekeeper or by a Batch System and sent to a virtual resource (Cloud) or to a standalone Computational node. The aforementioned resources can be of two types: those that require the fulfillment of a need (Gatekeepers and Clusters) and those that satisfy the need (Clouds and Computational nodes). The association between the requests to be met and who can satisfy them is performed inside the table images.

The possible job flows originated by this schema are four, and they are described in Table 1.

Table 1. Possible Job Flows

Component  | Target
-----------|--------------------
Gatekeeper | Cloud
Gatekeeper | Computational node
Cluster    | Cloud
Cluster    | Computational node

The single flows will be discussed in the next sections. The table images contains the pair of values Component (from which the job is coming) and Target, indicating the job flow in the system, the tag that will be published by the Grid Information System in order to notify the presence of the resource, and a series of pieces of information related to the possibility of creating multiple instances. In particular, the following information is recorded: (i) how many instances may be generated (for a single computational node this value is 1); (ii) the number of jobs per instance; (iii) the magnitude and boundaries of the instances; (iv) the waiting time before destroying the images.

4.2 The Needs Schema

In the needs schema the software agents running on the Clusters and/or Gatekeepers connected to the system maintain the status of the requests to be satisfied. Each agent has an associated table containing the list of jobs with the related cloudtags. The jobs listed in such tables are all waiting jobs; running jobs are listed in the running schema.

4.3 The Fulfillments Schema

In the fulfillments schema the resources available to satisfy the requests are instead listed, so that the system knows, for each cloudtag, where the Cloud or the Computational node is located.

4.4 The Running Schema

We also included in the system the running schema, which has the purpose of storing the state of the running resources.
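The following sketch illustrates how an agent could use these schemas together: it matches the cloudtags of waiting jobs (needs schema) against the resources offering them (fulfillments schema). Only the schema names are taken from the paper; the table names, column names and connection string are illustrative assumptions, and a real agent would additionally consult the images table of the capabilities schema for the allowed Component/Target flows and instance limits before provisioning.

```python
# Minimal matching sketch (assumptions, not the paper's implementation): pair
# waiting jobs in the needs schema with resources in the fulfillments schema
# through their common cloudtag. Schema names come from the paper; the table and
# column names (waiting_jobs, offers, cloudtag, ...) are assumptions.
import psycopg2

MATCH_QUERY = """
SELECT n.job_id, n.cloudtag, f.provider, f.endpoint
FROM needs.waiting_jobs AS n
JOIN fulfillments.offers AS f USING (cloudtag)
ORDER BY n.job_id;
"""

def match_needs_to_fulfillments(dsn: str = "dbname=gridcloud user=gridcloud"):
    """Return (job_id, cloudtag, provider, endpoint) tuples for jobs that can be served."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(MATCH_QUERY)
            return cur.fetchall()

if __name__ == "__main__":
    for job_id, cloudtag, provider, endpoint in match_needs_to_fulfillments():
        # A real agent would now ask the Cloud controller at `endpoint` to
        # instantiate the image tagged with `cloudtag` and record the result in
        # the running schema; here we only print the match.
        print(f"job {job_id}: cloudtag {cloudtag} -> {provider} ({endpoint})")
```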
5 Conclusions and Future Work

In the present work, we have figured out different integration strategies which allow a simple interoperability between Batch-oriented and Service-oriented computing models, namely Computational Grids and Cloud computing, and we have provided a straightforward implementation of one of the proposed strategies.

The work may be extended in several ways. The DBMS may be removed from the architecture and the system may be re-engineered to be fully distributed, adopting for example the 9P protocol described in [25]. Furthermore, we can explore innovative approaches for data and big data management. In this respect, some interesting directions to be taken into consideration are: (i) fragmentation issues (e.g., [14]); (ii) uncertain data management issues (e.g., [22]); (iii) general big data management issues (e.g., [15]).

References

1. Web site of the boto library, a Python interface to Amazon Web Services. https://github.com/boto/boto. Accessed on 2015-09-09.
2. Web site of Eucalyptus. http://www.eucalyptus.com. Accessed on 2015-09-09.
3. Web site of the Nimbus project. http://www.nimbusproject.org/. Accessed on 2015-09-09.
4. Web site of the Amazon Elastic Compute Cloud (EC2). http://aws.amazon.com/ec2/. Accessed on 2015-09-09.
5. Web site of the fog library SDK, an interface to the OpenStack cloud environment. https://github.com/fog/fog/blob/master/lib/fog/openstack/docs/getting_started.md. Accessed on 2015-09-09.
6. Web site of the Open Cloud Computing Interface (OCCI). http://occi-wg.org/. Accessed on 2015-09-09.
7. Web site of the OpenNebula project. http://opennebula.org/. Accessed on 2015-09-09.
8. G. B. Barone, R. Bifulco, V. Boccia, D. Bottalico, R. Canonico, and L. Carracciuolo. GaaS: Customized grids in the clouds. In Euro-Par 2012: Parallel Processing Workshops, pages 577-586. Springer, 2013.
9. M. Bernaschi, G. Carbone, E. Mastrostefano, and F. Vella. Solutions to the st-connectivity problem using a GPU-based distributed BFS. Journal of Parallel and Distributed Computing, 76:145-153, 2015. Special Issue on Architecture and Algorithms for Irregular Applications.
10. R. Buyya, D. Abramson, J. Giddy, and H. Stockinger. Economic models for resource management and scheduling in grid computing. Concurrency and Computation: Practice and Experience, 14(13-15):1507-1542, 2002.
11. R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599-616, 2009.
12. J. Castillo, J. L. Bosque, C. Pedraza, E. Castillo, P. Huerta, and J. I. Martinez. Low cost high performance reconfigurable computing. In W. Vanderbauwhede and K. Benkrid, editors, High-Performance Computing Using FPGAs, pages 453-479. Springer New York, 2013.
13. R. K. Chellappa. Intermediaries in cloud-computing: A new computing paradigm. INFORMS Meeting, 1997.
14. A. Cuzzocrea, J. Darmont, and H. Mahboubi. Fragmenting very large XML data warehouses via k-means clustering algorithm. IJBIDM, 4(3/4):301-328, 2009.
15. A. Cuzzocrea, D. Saccà, and J. D. Ullman. Big data: a research agenda. In 17th International Database Engineering & Applications Symposium, IDEAS '13, Barcelona, Spain, October 09-11, 2013, pages 198-203, 2013.
16. A. Dovier, A. Formisano, E. Pontelli, and F. Vella. Parallel execution of the ASP computation - an investigation on GPUs. In Proceedings of the Technical Communications of the 31st International Conference on Logic Programming (ICLP 2015), Cork, Ireland, August 31 - September 4, 2015.
17. I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid. In Berman et al. [2], pages 171-197, 2003.
18. I. Foster, Y. Zhao, I. Raicu, and S. Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008 (GCE'08), pages 1-10. IEEE, 2008.
19. O. Gervasi, D. Russo, and F. Vella. The AES implantation based on OpenCL for multi-/many core architecture. In Computational Science and Its Applications (ICCSA), 2010 International Conference on, pages 129-134, 2010.
20. B. Klauer. The Convey hybrid-core architecture. In W. Vanderbauwhede and K. Benkrid, editors, High-Performance Computing Using FPGAs, pages 431-451. Springer New York, 2013.
21. M. M. L. Servoli and R. M. Cefalà. A proposal to dynamically manage virtual environments in heterogeneous batch systems. In IEEE Nuclear Science Symposium Conference Record, 2008 (NSS'08), pages 823-826, 2008.
22. C. K. Leung, A. Cuzzocrea, and F. Jiang. Discovering frequent patterns from uncertain data streams with time-fading and landmark models. Transactions on Large-Scale Data- and Knowledge-Centered Systems, 8:174-196, 2013.
23. C. Loomis, M. Airaj, M. E. Bégin, E. Floros, S. Kenny, and D. O'Callaghan. StratusLab cloud distribution. In European Research Activities in Cloud Computing, page 271, 2012.
24. Parallella. Parallella-1.x reference manual, 2014. Accessed on July 11, 2015.
25. R. Pike, D. Presotto, K. Thompson, H. Trickey, et al. Plan 9 from Bell Labs. In Proceedings of the Summer 1990 UKUUG Conference, pages 1-9, London, UK, 1990.
26. E. Ronchieri, D. Cesini, D. D'Agostino, V. Ciaschini, G. D. Torre, P. Cozzi, D. Salomoni, A. Clematis, L. Milanesi, and I. Merelli. The WNoDeS cloud virtualization framework: a macromolecular surface analysis application case study. In Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on, pages 218-222. IEEE, 2014.
27. S. Jha, A. Merzky, and G. Fox. Using clouds to provide grids with higher levels of abstraction and explicit support for usage modes. Concurrency and Computation: Practice and Experience, 21(8):1087-1108, June 2009.
28. B. Sotomayor, K. Keahey, and I. Foster. Combining batch execution and leasing using virtual machines. In Proceedings of the 17th International Symposium on High Performance Distributed Computing, pages 87-96. ACM, 2008.
29. F. Vella, R. M. Cefalà, A. Costantini, O. Gervasi, and C. Tanci. GPU computing in EGI environment using a cloud approach. In Computational Science and Its Applications (ICCSA), 2011 International Conference on, pages 150-155. IEEE, 2011.
30. F. Vella, I. Neri, O. Gervasi, and S. Tasso. A simulation framework for scheduling performance evaluation on CPU-GPU heterogeneous system. In Proceedings of the 12th International Conference on Computational Science and Its Applications - Volume Part IV, ICCSA 2012, pages 457-469, Berlin, Heidelberg, 2012. Springer-Verlag.