Model-driven Automated Deployment of Large-scale CPS Co-simulations in the Cloud

Yogesh D. Barve, Himanshu Neema, Aniruddha Gokhale and Janos Sztipanovits
Institute for Software-Integrated Systems, Dept. of EECS, Vanderbilt University, Nashville, TN 37212, USA
Email: {yogesh.d.barve, himanshu.neema, a.gokhale, janos.sztipanovits}@vanderbilt.edu

Abstract: With increasing advances in Internet-enabled devices, large cyber-physical systems (CPS) are being realized by integrating several sub-systems together. Analyzing and reasoning about different properties of such CPS requires co-simulations that compose individual and heterogeneous simulators, each of which addresses only certain aspects of the CPS. Often these co-simulations are realized as point solutions or composed in an ad hoc manner, which makes it hard to reuse, maintain and evolve them. Although our prior work on a model-based framework called Command and Control Wind Tunnel (C2WT) supports distributed co-simulations, many challenges remain unresolved. For instance, evaluating these complex CPSs requires large amounts of computational and I/O resources, for which the cloud is an attractive option, yet there is a general lack of scientific approaches to deploy co-simulations in the cloud. In this context, the key challenges include (i) rapid provisioning and de-provisioning of experimental resources in the cloud for different co-simulation workloads, (ii) simulation incompatibility and resource violations, (iii) reliable execution of co-simulation experiments, and (iv) reproducible experiments. Our solution builds upon the C2WT heterogeneous simulation integration technology and leverages the Docker container technology to provide a model-driven integrated tool-suite for specifying experiment and resource requirements, and deploying repeatable cloud-scale experiments. In this work, we present the core concepts and architecture of our framework, and provide a summary of our current work in addressing these challenges.

Index Terms: co-simulations, verification, model driven, cloud

I. INTRODUCTION AND PROBLEM STATEMENT

Large-scale cyber-physical systems (CPS) experiments are being increasingly deployed for real-world scenarios in domains such as building automation and control, smart power grid, health-care, and industrial processes. For example, power grid CPSs are composed of many multi-domain subsystems with different assets and technologies, such as electric grid, sensors, networking and physical control systems. Thus, designing and analyzing such complex systems requires extensive simulation and prototyping tools that span multiple domains.

While recent advances in simulation tools have enabled modeling and simulation of system characteristics, a single simulator tool is not sufficient to model and experiment with CPS. This is because no single simulator can simulate all aspects of CPS, and moreover, CPS require heterogeneous resources and execution environments. Thus, co-simulation environments have emerged as an approach for modeling and simulating CPS. Co-simulation or coupled simulation is a methodology that focuses on evaluating the behavior of a system by integrating simulations of its components. Each specialized simulation tool can process and communicate various events among participating simulation engines to model large-scale CPS. To realize such a co-simulation platform, proper time synchronization and coordination of message flows among participating simulation engines is needed.

C2WT [1] is a heterogeneous simulation integration framework that we have previously developed at Vanderbilt University. It enables model-based rapid synthesis of heterogeneous and distributed CPS co-simulations. C2WT relies on the IEEE High-Level Architecture (HLA) standard. Domain-specific tools have been built on top of C2WT, such as C2WT-TE [2], which targets the transactive smart grid domain, and the SURE testbed [3], which targets security and resilience in CPS.

Despite these advances, many challenges still remain unresolved. For instance, large-scale simulations exhibit compute and/or I/O intensive workloads and may need large amounts of such resources. Cloud computing can provide access to such a large pool of resources elastically and on-demand. However, existing cloud platforms lack tools for effective deployment of large-scale CPS simulations. Migrating existing simulation tools to the cloud is also a challenging task, which hinders the widespread adoption of cloud computing for CPS co-simulation. This problem is further exacerbated since CPS domain experts conducting the simulations often lack a proper understanding of cloud resource provisioning and utilization, thereby resulting in ad hoc and sub-optimal deployment of CPS simulations in the cloud.

In this research, we focus primarily on cloud-based provisioning of large-scale CPS experiments, and outline the key challenges associated with deploying and experimenting with CPS co-simulations in the cloud.

II. CHALLENGES IN REALIZING CLOUD-HOSTED CPS CO-SIMULATIONS

The following challenges must be resolved to support reusable and extensible cloud-based CPS co-simulations.

1. Integrated tool to rapidly deploy experiments on cloud resources: To run experiments in the cloud, the framework should be able to acquire required resources, instantiate the deployment and execution of the co-simulation, and tear down the acquired resources when the experiment is completed. The run-time infrastructure should require minimal startup and shutdown time to ensure a quick experiment start and prompt release of resources, without incurring additional resource utilization cost. The simulations also impose different resource requirements such as CPU cores, GPU, RAM, and disk space. Moreover, the simulations could be CPU and/or I/O intensive. These resource requirements must be configured in the tool, and the cloud resources should be allocated accordingly. A dynamic cloud resource management strategy can be highly effective for better cloud resource utilization.

2. Handling simulation incompatibility and resource violations: For faithful experimental outcomes, different simulators impose co-simulation specific data-exchange requirements and QoS constraints such as communication latencies, computation execution deadlines, and hardware resource availability (CPUs, memory, etc.). For instance, if one of the simulators in the co-simulation requires high I/O bandwidth to stream large videos, the receiving simulator needs to consume the streamed data within a given time-period. Thus, if these simulators are not co-located in the cloud, a violation or incompatibility warning should be raised so that the user can make the necessary modifications to satisfy the QoS constraints.

3. Proactive fault tolerance for simulation execution: The cloud-based co-simulation framework must be resilient to system faults and failures that can occur within the cloud platforms. Our solution, called Co-simulation checkpointing, leverages Linux containers' save and restore functions, and enables, in the event of a failure, an effective recovery of systems to their previously checkpointed states. Implementing checkpointing for distributed co-simulations that have intertwined dependencies is even more challenging. Here, checkpointing also needs to be coordinated and synchronized across all simulators. This ensures reliable recovery and correct execution from snapshot images during system restoration.

4. Reproducible Experiments: Deterministic execution and reproducible experiments are needed for many CPS co-simulations. The co-simulation integration methods and run-time execution tools must be designed for these requirements from the start. In addition, for repeatable experiments, the cloud experimentation platform should provide the same run-time execution environment and configuration for the same experiments.

III. PROPOSED SOLUTION AND CURRENT STATUS

We are developing a framework that enables effective CPS co-simulation in the cloud. Figure 1 provides the functional architecture of our framework. Our framework uses Docker containers for deploying simulations in an OpenStack cloud. The simulations are built using corresponding pre-packaged simulators inside a Docker container. These containers provide a repeatable runtime environment. We are also developing a domain-specific modeling language to capture experiment resource requirements of individual simulators. In the future, we plan on integrating an SMT solver for an optimal placement of simulators in the cloud environment while still satisfying individual resource requirements. We are also building a cloud resource monitoring framework utilizing collectd and other tools to enable real-time monitoring of cloud resources, which can then be fed to the SMT solver to make effective decisions. To enable fault-tolerant co-simulations, we are developing a Co-simulation checkpointing technique using the save and restore functions of the CRIU library for Docker containers. It is also critical that checkpointing be synchronized and coordinated, and that it support distributed simulator deployments.

Fig. 1: Architecture Overview of CPS Co-simulation Deployment in the Cloud

ACKNOWLEDGMENTS

This work is supported in part by NIST contract number 70NANB15H312, NSF CPS VO contract number CNS-1521617 and NSF US Ignite CNS 1531079. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.

REFERENCES

[1] G. Hemingway, H. Neema, H. Nine, J. Sztipanovits, and G. Karsai, “Rapid synthesis of high-level architecture-based heterogeneous simulation: a model-based integration approach,” Simulation, vol. 88, no. 2, pp. 217–232, 2012.
[2] H. Neema, J. Sztipanovits, M. Burns, and E. Griffor, “C2WT-TE: A model-based open platform for integrated simulations of transactive smart grids,” in Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES), 2016 Workshop on. IEEE, 2016, pp. 1–6.
[3] H. Neema, P. Volgyesi, B. Potteiger, W. Emfinger, X. Koutsoukos, G. Karsai, Y. Vorobeychik, and J. Sztipanovits, “SURE: An Experimentation and Evaluation Testbed for CPS Security and Resilience: Demo Abstract,” in Proceedings of the 7th International Conference on Cyber-Physical Systems, ser. ICCPS ’16. Piscataway, NJ, USA: IEEE Press, 2016, pp. 27:1–27:1. [Online]. Available: http://dl.acm.org/citation.cfm?id=2984464.2984491
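
As an illustration of the placement problem raised in Sections II and III, the following Python sketch assigns simulators to cloud hosts according to their declared CPU and RAM requirements and then reports co-location violations of the kind described in challenge 2. It is not the framework's implementation: the paper plans to use an SMT solver for optimal placement, whereas this sketch substitutes a simple greedy first-fit-decreasing heuristic. The names Simulator, Host, place_simulators and colocation_warnings, and all of their fields, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Simulator:
    """Declared resource needs of one simulator (hypothetical fields)."""
    name: str
    cpu_cores: int
    ram_gb: int


@dataclass
class Host:
    """A cloud host and its remaining capacity (hypothetical fields)."""
    name: str
    cpu_cores: int
    ram_gb: int
    placed: list = field(default_factory=list)

    def fits(self, sim):
        # True if the remaining capacity covers the simulator's needs.
        return sim.cpu_cores <= self.cpu_cores and sim.ram_gb <= self.ram_gb

    def place(self, sim):
        # Reserve capacity on this host for the simulator.
        self.cpu_cores -= sim.cpu_cores
        self.ram_gb -= sim.ram_gb
        self.placed.append(sim.name)


def place_simulators(sims, hosts):
    """Greedy first-fit-decreasing placement; returns {simulator: host}.

    This heuristic is an assumption of the sketch, standing in for the
    SMT-based placement that the paper lists as future work.
    """
    placement = {}
    # Consider the most demanding simulators first.
    ordered = sorted(sims, key=lambda s: (s.cpu_cores, s.ram_gb), reverse=True)
    for sim in ordered:
        host = next((h for h in hosts if h.fits(sim)), None)
        if host is None:
            raise ValueError(f"no host can satisfy the requirements of {sim.name}")
        host.place(sim)
        placement[sim.name] = host.name
    return placement


def colocation_warnings(placement, colocate_pairs):
    """Return one warning per pair that should share a host but does not."""
    return [
        f"{a} and {b} should be co-located but run on "
        f"{placement[a]} and {placement[b]}"
        for a, b in colocate_pairs
        if placement[a] != placement[b]
    ]
```

With two hosts offering 6 and 8 cores (16 GB each) and two streaming simulators that each need 4 cores and 8 GB, the heuristic has to split the pair across hosts, and colocation_warnings then reports the pair as a violation. A solver-based placement could instead treat co-location as a hard constraint and search for an assignment that satisfies it.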