=Paper=
{{Paper
|id=Vol-2357/paper9
|storemode=property
|title=Flexible Deployment of Social Media Analysis Tools
|pdfUrl=https://ceur-ws.org/Vol-2357/paper9.pdf
|volume=Vol-2357
|authors=Gabriele Pierantoni,Tamas Kiss,Gabor Terstyanszky,José Manuel Martín Rapún,Gregoire Gesmier,James DesLauriers
|dblpUrl=https://dblp.org/rec/conf/iwsg/PierantoniKTRGD18
}}
==Flexible Deployment of Social Media Analysis Tools==
10th International Workshop on Science Gateways (IWSG 2018), 13-15 June 2018

Flexible, Policy-Oriented and Multi-Cloud deployment of Social Media Analysis Tools in the COLA Project

Gabriele Pierantoni, Tamas Kiss, Gregoire Gesmier, James DesLauriers, Gabor Terstyanszky: Department of Computer Science, University of Westminster, London, United Kingdom, pierang@westminster.ac.uk

José Manuel Martín Rapún: R&D Department, INYCOM, Zaragoza, Spain, josemanuel.martin@inycom.es

Abstract: The relationship between companies and customers and among public authorities and citizens has changed dramatically with the widespread utilisation of the Internet and Social Networks. To help governments to keep abreast of these changes, Inycom has developed Eccobuzz and Magician, a set of web applications for Social Media data mining. The unpredictable load of these applications requires flexible user-defined policies and automated scalability during deployment and execution time. Even more importantly, privacy norms require that data is restricted to certain physical locations. This paper explains how such applications are described with Application Description Templates (ADTs). ADTs define complex topology descriptions and various deployment, scalability and security policies. The paper also explains how these templates are used by a submitter that translates this generic information into executable format for submission to the reference framework of the COLA European project.

Keywords: COLA, TOSCA, Cloud Orchestration, Swarm

===I. INTRODUCTION===

The relationship of companies with their customers and of public authorities with their citizens has changed dramatically with the widespread utilisation of the Internet and Social Networks. Monitoring this information has become a critical aspect for private companies, but it is still scarcely used in the public sector. To overcome this disparity, the Aragon Regional Government in Spain has set the goal to develop communication channels with citizens to become aware of their opinion about the local government’s services, and how these can be improved. The authorities also want to offer information to entrepreneurs and companies in the region that can be used to improve businesses or develop new ones.

The local government already collects large amounts of data resulting from interactions with its citizens (e.g. applying for public services such as grants, aids, subsidies, and licenses). This data can be combined and extended with information publicly available in social networks. The aim is to set up a business gateway that is supported by the intelligent analysis and utilisation of all available information.

To fulfil this target, Inycom [1], an ICT company headquartered in Aragon, developed Eccobuzz, a web application for Social Media data mining, competitive intelligence and brand and media management. It is offered as a Software as a Service (SaaS) product hosted in Inycom’s Data Centre. Some private customers, particularly larger organizations, use a different distribution of Eccobuzz with extended functionalities, called Magician. This distribution can be personalized and deployed in customers’ premises. An interesting feature provided by Magician is that it not only collects and structures raw data from the Internet, but it also gathers information about the user posting this data, and his/her sentiment about it. Therefore, Magician is an ideal candidate for the Aragon Regional Government to collect and analyse data regarding its citizens, and their opinion and attitude towards local services.

When Social Media Analysis Tools are used correctly, they can offer precious insight that can be used to improve services. On the other hand, the very nature of these tools and the data they rely upon poses difficult challenges in the protection of the privacy of the data of the users. These challenges are further complicated when databases and software processes may run on different Clouds requiring access to a heterogeneous set of resources in a dynamic way. Two further aspects of these Social Media Analysis Tools are the ever-increasing amount of data available and, at the same time, the unpredictable fluctuation of the computing load required, due to the high level of uncertainty on how much information will be collected by the crawlers to be processed later.

Eccobuzz and Magician search the Internet for information and process it semantically, producing valuable structured information made available through a web interface, Excel export and PDF reports. The main components and information flow in this system are displayed in Figure 1. This paper does not intend to describe in great detail the implementation of Eccobuzz and Magician, but rather to describe how, in order to meet the above challenges, these tools are currently being prototyped for submission to the reference framework developed by the COLA (Cloud Orchestration at the Level of Application) European project [2].

As part of this work, the applications’ architectures are described in a TOSCA-based (Topology and Orchestration Specification for Cloud Applications) [3] description format, together with Policies that describe desired Quality of Service (QoS) parameters related to performance, scalability, economic viability and other security and privacy-related aspects.

Figure 1, main component of the Magician Social Tool

These policies are the means through which COLA describes the requirements and constraints that are to be met during the application lifecycle in the cloud in order to meet the unique challenges of the Social Media Analysis Tools. This standards-based description is then translated to be used by components of the COLA reference framework in order to deploy the application topology and enforce various scalability and security policies. As these applications are the very first large-scale case studies to be deployed, the experiences learned during this process are constantly fed back and influence the development of the COLA reference framework.

This paper presents our experiences when developing application description templates including both topology and policy descriptions for Eccobuzz and Magician and demonstrates how such templates can be translated and utilized by container-based cloud technologies for their deployment. The rest of this paper is structured as follows: Section 2 introduces the COLA project and its objectives, Section 3 describes the concept of Application Description Templates (ADT) used to describe applications and their policies in COLA, Section 4 presents the architecture of the Application Submitter, and Section 5 and Section 6 provide details of the current prototype. Section 7 describes related work, comparing ADTs to similar solutions offered by other cloud service providers including Microsoft ARM templates and Amazon CloudFormation templates. Finally, Section 8 concludes the paper with a summary and future work.

===II. THE COLA PROJECT===

SMEs and public-sector organizations increasingly investigate the possibilities of using cloud computing services in their everyday business. Accessing services and resources in the cloud on-demand and in a flexible and elastic way could result in significant cost savings due to more efficient and convenient resource utilization that also replaces large investment costs with long term operational costs. On the other hand, the take up of cloud computing by SMEs and the public sector is still relatively low due to limited application-level flexibility and security concerns, as suggested in the Work Programme for Information and Communication Technology<sup>1</sup>.

<sup>1</sup> http://ec.europa.eu/research/participants/data/ref/h2020/wp/2016_2017/main/h2020-wp1617-leit-ict_en.pdf

A European funded research project, Cloud Orchestration at the Level of Application (COLA), aims at addressing these difficulties to foster the adoption of cloud computing services. COLA is based on a conceptual architecture that describes several generic components which offer fundamental functionalities needed to support the optimal and secure deployment and run-time orchestration of cloud applications. Such applications can then be embedded into workflows or science gateway frameworks to support complex application scenarios from user-friendly interfaces.

COLA does not dictate implementation details of its components and defines a reference implementation whereby the various components can be substituted in a pluggable fashion. However, COLA defines a few fundamental functionalities that have to be offered by its pluggable components: the management of virtual machines, the management of containers within these virtual machines and the enforcement of policies which describe various Quality of Service (QoS) parameters related to deployment, geographical location, performance, economic viability and security.

To describe the application architecture and the policies that govern their lifecycle, the COLA project developed the concept of Application Description Templates (ADT) which are based on the TOSCA Language Specification [4][5]. ADTs offer a description of applications that is as technology-agnostic as possible. As TOSCA is a generic specification of a meta-language, the COLA project has proposed a specific TOSCA compliant description format for the Application Description Templates.

To support the development and testing of such a description language, COLA prototypes three industry case-studies and develops near production level demonstrators to showcase its results. These three large-scale application examples, besides the already described Eccobuzz and Magician, include the scalable deployment of complex evacuation simulation scenarios, and the analysis of data arising from ticket sales of various cultural institutions. While these three case-studies directly influence the early development of the COLA reference framework and the related ADTs, in the second phase of the project, twenty further use-cases will be prototyped as proof of concepts on the developed solution.

===III. THE APPLICATION DESCRIPTION TEMPLATE (ADT)===

ADTs act as information conduits that connect Application Developers to the various components of the COLA reference framework. Although the COLA reference framework defines various components, the most relevant for the design of ADTs and for the scope of this paper are those represented in Figure 2.

The information stored in the ADT is parsed and dispatched to the relevant components. This information includes policies that define behavior of the applications (including security features), and information that is required for the deployment of virtual machines and containers.

Figure 2, the application submitter and the other components of the COLA reference framework

The Application Submitter is responsible for the division of this information as follows. Information that describes the deployment of containers goes to a container orchestrator that is connected to container managers. Information specific to the selection and execution of virtual machines is passed on to a cloud orchestrator component which is connected to one or more cloud providers. Finally, information specific to policies is dispatched to one or more policy engines that in turn enforce the various policies by connecting to the other components of the COLA reference framework.

To combine the flexibility offered by the technology-oriented agnosticism of COLA with the expressiveness required to describe all the various facets of a large variety of applications, we have designed the ADT to describe two main aspects of applications: their topology and their policies. The topology describes the main components of the application whilst the policies describe the modalities that govern the various parts of the application lifecycle.

====A. The ADT General Structure====

Each ADT consists of one topology template that comprises the components described in Figure 3. The Input Section groups together fields whose default values Application Developers are likely to override. The Policies Section defines the policies that are applied to the application. Each policy can be applied to one or more nodes at different stages of their lifecycle. The Container Section defines a set of nodes that specify the Container Images that contain the various components of the applications. Finally, the Virtual Images Section defines the characteristics of the Virtual Machines that will host the Container Images defined in the section above.

Figure 3, the COLA ADT Structure

====B. Application Topology====

The Topology of the Application is contained in the Container and Virtual Images layers. The first deals with the deployment of the application components into containerized services, and the second describes how such containers are to be deployed in virtual machines. The various containers that compose the image are connected with edges that implement the TOSCA ConnectsTo relations. The relationship that connects a container to the virtual machine it should be deployed in is described with the HostedOn relationship specified by TOSCA.

====C. Application Policies====

The description and enforcement of policies in TOSCA is an additional dimension to the description of the application topologies. Policies define and enforce the modalities that regulate the application lifecycle.

TOSCA allows the definition of type hierarchies of arbitrary complexity. COLA defines a three-layered hierarchy of policies that all derive from the TOSCA Root Policy. To facilitate the application of policies at the various levels, each policy defines the nodes it has to be applied to. For each node, the overall policy is composed of sub-policies that describe the various features such as security, scalability, etc. Each policy is in turn divided into two main sections. The Description Section comprises meta-data which define the name, type and description of the policy, as well as a target (defined elsewhere in the topology) to which the policy should apply. The Properties Section contains data that falls under two kinds of parameters: those common to all COLA policy types, and those specific to each policy type.

Common Properties are Stage (which defines at which stage of the lifecycle of the element the policy is applied), and Priority (which is an arbitrary integer ranging from 0 to 100 used to define the priority with which the policy will be implemented). Specific Properties are specific to each Policy. These parameters vary depending on the nature of the policy itself. As an example: a scalability policy based on CPU consumption will define various parameters that specify scalability thresholds, while a deployment policy will define the minimum number of CPUs, the minimum memory size for deployment, etc.

===IV. THE APPLICATION SUBMITTER===

The Application Submitter, a prototype of which is currently under development and testing at the University of Westminster, implements the submission of the ADT. The Application Submitter performs two main actions upon an ADT: it separates its various components (Container Orchestration, Virtual Machine, and Policies) and then invokes adaptors which translate the information into a format understood by their respective components. To support the technology-oriented agnosticism of the ADT, the Application Submitter adopts the design described in Figure 4.

The Application Submitter relies on the OpenStack TOSCA parser [7] which is used to read a TOSCA file, check its syntactical correctness and create an in-memory dictionary which is then passed to the Mapper component. The Mapper uses a key list to isolate the information subsets that are then passed along to the various adaptors, which format them into the structure expected by the corresponding components of the COLA reference framework.

Figure 4, Design of the Application Submitter

===V. THE DESCRIPTION OF MAGICIAN WITH THE ADT===

We have used an ADT to describe the deployment of the Motor Engine of Magician and its configuration database (based on MongoDB [8]) to the COLA reference framework. For this first prototype, it was decided to deploy the main components of Magician within a single container and to express four main aspects of the policies that govern its behaviour: scalability, security connections, resources and location. The two policies that deal with scalability and Location Deployment are particularly relevant to the deployment and execution of Social Media Analysis Tools, as they address the main concern of unexpected load fluctuations and they allow the data sets to be kept within a geographical area ruled by certain legal constraints and requirements (e.g. the European General Data Protection Regulation<sup>2</sup>). The policies are recapitulated in Table 1 and described following the structure introduced in Section III. C.

<sup>2</sup> https://www.eugdpr.org/

{| class="wikitable"
! Policy Number !! Policy Type !! Policy Description
|-
| P1.1 || Consumption Based Scalability || Defines minimum and maximum CPU consumption thresholds above which a new container instance is deployed and below which a container instance is un-deployed.
|-
| P1.2 || Connection Deployment Policy || Defines the set of inbound connections that must be allowed.
|-
| P1.3 || Resource Deployment Policy || Defines the minimum requirements of the Virtual Machine in terms of CPU, Memory Size and Disk size.
|-
| P1.4 || Location Deployment Policy || Defines the physical location of the computation and storage resources.
|}

Table 1, Policies Describing the deployment and execution of Magician

As an example of the ADT policy description, we describe in Table 2 the parameters that define the Application Scalability Policy which governs how containers are scaled up and down depending on the CPU consumption of the Magician Container.

{| class="wikitable"
! Name !! Value !! Description !! Type
|-
| Stage || Execution || The part of the application lifecycle to which the policy applies || TOSCA Node State
|-
| Priority || 100 || The priority with which the policy is enforced || Integer
|-
| Namespace || Prometheus || Defines the namespace of the service used for monitoring the CPU consumption. || String
|-
| Max || 80 || Defines the maximum CPU consumption (in percentage) above which a new instance is deployed. || Integer
|-
| Min || 20 || Defines the minimum CPU consumption (in percentage) under which the instance is un-deployed. || Integer
|-
| Time || 600 || Defines (in seconds) the amount of time above or below the Max/Min threshold before the action is triggered || Integer
|}

Table 2, Example of Policy Implementation Parameters

===VI. PROTOTYPE IMPLEMENTATION===

To demonstrate the feasibility of the above-described concepts, a first version of the Application Submitter has been implemented that translates the ADT into a format understood by a Container Orchestrator. The current prototype of the COLA reference framework uses Docker Swarm [9] as Container Orchestrator and we have configured the Application Submitter to connect with it through an adaptor that transforms the container-level application topology into a compose file [10]. The Policy Keeper is currently under development, so Policies are parsed by the Application Submitter but no adaptor is available as of now. For testing purposes, the same policies are hard-coded on the different components.

The ADT is first parsed and validated by the OpenStack parser. The Mapper then separates out the container-relevant sections of TOSCA and passes them to a translator. This container-level portion of TOSCA (Figure 5) is translated into the format of a Docker-Compose file (Figure 6) so that it may be understood by the currently implemented container orchestrator, Docker Swarm. As TOSCA and Docker Compose both use YAML as a language, translation from the former to the latter is straightforward. An adaptor for this specific container orchestrator considers three basic sets of information within the subset of data that is provided by the Mapper.

The first are the TOSCA properties defined within the description of containers. These properties should align with the runtime arguments that can be passed to Docker via the docker run command or via a Docker Compose file, and should follow the naming conventions of the Docker Compose format. One example of this is seen in Figure 5, where the property for ports is defined, but other options such as command, entrypoint, and environment should also be defined at this level. Translation of these properties simply involves copying them across under a new key in the Compose file.

The second set of information is contained within TOSCA artefacts, which define external data that must be retrieved during orchestration. The image from which to build the container is described as an artefact with properties that define the image name, as well as the repository where it can be found.

The final set of information to translate comes in the form of TOSCA relationships. Relationships match TOSCA requirements with matching capabilities to describe how the various components within the ADT should interact with each other. TOSCA standards define the following: a HostedOn relationship between a container and a virtual machine, a ConnectsTo relationship between two containers, and an AttachesTo relationship for connecting a container to a block storage volume.

Translating an AttachesTo relationship requires defining a new volume and providing an appropriate reference to that volume, both inside the Compose file. Translating a ConnectsTo relationship involves defining a new network, and referencing that network under both of the connecting components, again inside the Compose file. Translation of the HostedOn relationship needs to define a constraint inside the Compose file, and then requires further cooperation from the cloud orchestrator to ensure an appropriate reference is made for that constraint.

Following the TOSCA standards, as well as the specific outline of the ADT as implemented by the COLA reference framework, is important in ensuring that an adaptor can understand the information and translate it into a format usable by the target component.

In order to finally launch the application, the resultant Docker-Compose file is passed to Swarm and executed by the docker deploy command on an instance of MiCADO V3 [2], which is the current implementation choice for the COLA reference framework. In this implementation of MiCADO, cloud orchestration and default policies are hard-coded, making it ideal to test container-level orchestration in isolation.

Figure 5, ADT Container-Level Topology Description in TOSCA

Figure 6, Translation of the ADT Container-Level Topology Description into a Swarm COMPOSE file

===VII. RELATED WORK===

Research described in this paper is closely linked to similar attempts at application description both in industry and academia. The OASIS [11] efforts to define standards for the application description have created TOSCA [12][13][14], a widely adopted standard with both several implementations [15] and related tools [16][17][18].

Amazon uses Amazon Machine Images (AMI) [19] to describe all information required to launch Amazon EC2 instances. AMIs are templates which include a description of the root volume (i.e. an operating system, an application server, and applications), launch permissions that control which AWS accounts can use the AMI, and block device mapping that specifies the volumes to attach to the instance when it is launched. The AMI template must contain at least the base operating system and it may be customised to include additional configuration and software code. AMIs are not written templates, but rather read-only, re-usable snapshots of EC2 instances. The COLA project currently implements Docker container orchestration to provide similar functionality to that which is offered by an AMI. Docker serves as a lightweight, more technology agnostic solution which sees wider use. Through the modularity of the COLA framework, any other container orchestrator, or even a virtual machine image solution, could be substituted in place of Docker.

Amazon CloudFormation [20] supports the development, deployment and running of applications on the Amazon cloud. AWS CloudFormation templates describe the different instances (using AMIs) and resources that make up an application stack, as well as security rules and other customisations that should apply to the deployment. The templates are stored as text files that comply with JavaScript Object Notation (JSON) or YAML [21]. They can be created and edited in any text editor and can be managed in the source code IDE. CloudFormation templates are analogous to the ADTs used by COLA and can even describe container based deployments, though they follow an internal language specification instead of a global solution such as that offered by the TOSCA standard.

Microsoft Azure [22] describes resources through Azure Resource Manager (ARM) which combines compute, storage and network resources and shows them as a single unit that can be created, managed and deleted together. ARM templates contain four entities: parameters that can be entered during run-time with a set of values or default values pre-defined, variables that are static in the code and used for deploying resources, resources to be deployed, and outputs to be produced. ARM templates are written and stored as text files in the JSON format. The functionality offered by ARM templates is no different from that offered by AWS CloudFormation templates, or COLA’s own ADTs. Again, these templates follow their own structure and wording instead of using the specification laid out by OASIS in TOSCA.

COLA approaches the description of an application stack in the cloud with more modularity and technology agnosticism than do Amazon CloudFormation templates or Microsoft ARM templates. By offering a modular approach to assembling and deploying images which make up the applications, COLA allows for and encourages reuse of those images across platforms.

Similarly, as the TOSCA standard becomes more widely accepted and appropriated, COLA ADT templates will become more familiar and could even be reused across platforms. Cloudify [23] and Alien4Cloud [24] are two open-source cloud orchestrators currently under development. While they do not support the same modularity of components offered by COLA, they do conform to the TOSCA standard. Comparing other aspects of these open-source solutions with COLA is outside the scope of this paper, but their adoption of TOSCA is encouraging and hopefully suggests an upward trend in adherence to TOSCA and the growth of its community.

===VIII. CONCLUSION AND FUTURE WORK===

This first proof of concept prototype has demonstrated the viability of the technology-agnostic approach and how a TOSCA-based ADT can be used to describe abstract application topologies and delegate to a component (Application Submitter) the translation to formats that are technology-specific. This first prototype will now be extended to implement further functionalities such as flexible submission lifecycle management and support for more detailed policies through a Policy Keeper currently under development in COLA. The Application Submitter will be embedded into the MiCADO architecture and will connect the ADTs with the application-level orchestration features of MiCADO.

===ACKNOWLEDGMENTS===

This work was funded by the COLA (Cloud Orchestration at the Level of Application) project, Project No. 731574.

===REFERENCES===

[1] “Inycom | Tecnología e Innovación para tu Negocio.” [Online]. Available: https://www.inycom.es/. [Accessed: 18-Mar-2018].

[2] “About – COLA Project – Cloud Orchestration at the Level of Application.” [Online]. Available: http://www.project-cola.eu/cola-project/. [Accessed: 27-Mar-2017].

[3] “TOSCA_overview.”

[4] “TOSCA-spec-v1.0.”

[5] T. Binz, U. Breitenbücher, O. Kopp, and F. Leymann, “TOSCA: Portable Automated Deployment and Management of Cloud Applications,” in Advanced Web Services, 2014, pp. 527–549.

[6] W. Draft, “TOSCA Simple Profile in YAML Version 1.0,” 2014. [Online]. Available: http://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.0/csprd01/TOSCA-Simple-Profile-YAML-v1.0-csprd01.html. [Accessed: 14-Feb-2017].

[7] “TOSCA-Parser - OpenStack.” [Online]. Available: https://wiki.openstack.org/wiki/TOSCA-Parser. [Accessed: 29-Oct-2017].

[8] “MongoDB for GIANT Ideas | MongoDB.” [Online]. Available: https://www.mongodb.com/. [Accessed: 30-Oct-2017].

[9] “Docker Swarm overview - Docker Documentation.” [Online]. Available: https://docs.docker.com/swarm/overview/. [Accessed: 30-Mar-2017].

[10] “Compose file version 3 reference | Docker Documentation.” [Online]. Available: https://docs.docker.com/compose/compose-file/. [Accessed: 21-Mar-2018].

[11] “OASIS | Advancing open standards for the information society.” [Online]. Available: https://www.oasis-open.org/. [Accessed: 18-Mar-2018].

[12] “Topology and Orchestration Specification for Cloud Applications,” 2013.

[13] G. Katsaros, M. Menzel, A. Lenk, R. Skipp, and J. Eberhardt, “Cloud Service Orchestration with TOSCA, Chef and Openstack Jannis Rake-Revelant.”

[14] P. Hirmer, U. Breitenbücher, T. Binz, and F. Leymann, “Automatic Topology Completion of TOSCA-based Cloud Applications.”

[15] T. Binz et al., “OpenTOSCA – A Runtime for TOSCA-based Cloud Applications.”

[16] O. Kopp, T. Binz, U. Breitenbücher, F. Leymann, and U. Breitenb, “Winery – A Modeling Tool for TOSCA-based Cloud Applications.”

[17] U. Breitenbücher, T. Binz, O. Kopp, and F. Leymann, “Vinothek – A Self-Service Portal for TOSCA.”

[18] J. Soldani, T. Binz, U. Breitenbücher, F. Leymann, and A. Brogi, “ToscaMart: A method for adapting and reusing cloud applications,” J. Syst. Softw., vol. 113, pp. 395–406, 2016.

[19] S. Pearce and S. Bryen, “Managing Your AWS Infrastructure at Scale,” 2015.

[20] “Learn Template Basics - AWS CloudFormation.” [Online]. Available: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/gettingstarted.templatebasics.html. [Accessed: 20-Feb-2017].

[21] “The Official YAML Web Site.” [Online]. Available: http://www.yaml.org/. [Accessed: 20-Feb-2017].

[22] Rick Rainey, Microsoft Azure Essentials Azure Web Apps for Developers | Microsoft Press Store.

[23] “Cloudify - Cloud Orchestration” [Online]. Available: https://cloudify.co/product/. [Accessed: 22-Apr-2017].

[24] “Alien4Cloud” [Online]. Available: https://alien4cloud.github.io/index.html. [Accessed: 22-Apr-2017].
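The translation rules described in Section VI (properties copied across, the image taken from an artefact, and the ConnectsTo, AttachesTo and HostedOn relationships mapped to networks, volumes and placement constraints) can be illustrated with a minimal Python sketch. This is not the COLA Application Submitter code. The layout of the input dictionary, the node names, the image names, the `docker_hub` repository marker and the `node.labels.host` label are all assumptions made for illustration; only the Compose keys (`services`, `networks`, `volumes`, `deploy.placement.constraints`) follow the Compose version 3 file format.

```python
from copy import deepcopy

# Hypothetical in-memory form of a container-level ADT fragment, as a parser
# and Mapper might hand it to an adaptor. The layout is an assumption.
ADT_NODES = {
    "magician": {
        "type": "container",
        "properties": {"ports": ["8080:8080"]},
        "artifacts": {"image": {"file": "magician", "repository": "docker_hub"}},
        "requirements": [
            {"connect": {"node": "mongodb", "relationship": "ConnectsTo"}},
            {"volume": {"node": "config-data", "relationship": "AttachesTo",
                        "location": "/data"}},
            {"host": {"node": "worker-vm", "relationship": "HostedOn"}},
        ],
    },
    "mongodb": {
        "type": "container",
        "properties": {},
        "artifacts": {"image": {"file": "mongo", "repository": "docker_hub"}},
    },
}


def image_reference(artifact):
    """Build an image name; a non-default repository is used as a prefix (assumed rule)."""
    if artifact["repository"] == "docker_hub":
        return artifact["file"]
    return f"{artifact['repository']}/{artifact['file']}"


def translate(nodes):
    """Translate container nodes into a dictionary shaped like a Compose v3 file."""
    compose = {"version": "3", "services": {}, "networks": {}, "volumes": {}}
    containers = {n: d for n, d in nodes.items() if d.get("type") == "container"}

    # Pass 1: properties are copied across; the artefact supplies the image.
    for name, node in containers.items():
        service = deepcopy(node.get("properties", {}))
        service["image"] = image_reference(node["artifacts"]["image"])
        compose["services"][name] = service

    # Pass 2: relationships become networks, volumes and placement constraints.
    for name, node in containers.items():
        service = compose["services"][name]
        for requirement in node.get("requirements", []):
            for spec in requirement.values():
                relationship, target = spec["relationship"], spec["node"]
                if relationship == "ConnectsTo":
                    # New network, referenced under both connected services.
                    network = "-".join(sorted((name, target)))
                    compose["networks"][network] = {}
                    for end in (name, target):
                        attached = compose["services"][end].setdefault("networks", [])
                        if network not in attached:
                            attached.append(network)
                elif relationship == "AttachesTo":
                    # New volume plus a reference to it inside the service.
                    compose["volumes"][target] = {}
                    service.setdefault("volumes", []).append(
                        f"{target}:{spec['location']}")
                elif relationship == "HostedOn":
                    # Placement constraint; the label name is hypothetical and
                    # would have to be set by the cloud orchestrator.
                    placement = service.setdefault("deploy", {}).setdefault("placement", {})
                    placement.setdefault("constraints", []).append(
                        f"node.labels.host == {target}")
    return compose


if __name__ == "__main__":
    import json
    print(json.dumps(translate(ADT_NODES), indent=2))
```

The two passes keep the sketch simple: all services exist before any ConnectsTo relationship is resolved, so a network can be attached to both ends regardless of the order in which the nodes appear. The resulting dictionary would still need to be serialised to YAML before being handed to Swarm.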