Continual Verification of Non-Functional
                          Properties in Cloud-Based Systems∗
                                                  Invited Paper

                      Radu Calinescu, Kenneth Johnson, Yasmin Rafiq, Simos Gerasimou,
                               Gabriel Costa Silva and Stanimir N. Pehlivanov
                              Department of Computer Science, University of York, UK

                        Abstract. Cloud-based systems are used to deliver business-critical and
                        safety-critical services in domains ranging from e-commerce and e-govern-
                        ment to finance and healthcare. Many of these systems must comply with
                        strict non-functional requirements while evolving in order to adapt to
                        changing workloads and environments. To achieve this compliance, for-
                        mal techniques traditionally employed to verify the non-functional prop-
                        erties of critical systems at design time must also be used during their
                        operation. We describe how a formal technique called runtime quantita-
                        tive verification can be used to verify cloud-based systems continually.


               1      Introduction
               Few new technologies have been embraced with as much enthusiasm as cloud
               computing. Users, providers and policy makers have set out to exploit the ad-
               vantages of the new technology and to encourage its adoption. Their success
               stories abound, and the range of services delivered by cloud-based systems is
               growing at a fast pace. Business-critical e-commerce and e-government services,
               and safety-critical healthcare services are increasingly part of this range.
                   Using cloud-based systems to deliver critical services is a double-edged sword.
               On the one hand, the ability of cloud to dynamically scale resource allocations in
               line with changing workloads makes it particularly suited for running such sys-
               tems. On the other hand, its reliance on third-party hardware, software and net-
               working can lead to violations of the strict non-functional requirements (NFRs)
               associated with critical services. Taking advantage of the former characteristic of
               cloud without being adversely affected by the latter represents a great challenge,
               as the traditional approaches to verifying NFRs are often ineffective in a cloud
               setting. NFR verification approaches such as model checking, design by contract
               and quality assurance were devised for off-line use during the design or verifica-
               tion and validation stages of system development. As a result, they have high
               overheads and operate with models and non-functional properties (NFPs) that
               in the case of cloud-based systems evolve continually as the workloads, allocated
               resources and requirements of these systems change over time.
                   In this survey paper, we describe how a formal technique called runtime
               quantitative verification [1] can be used (a) to verify the NFPs of evolving cloud-
               based systems continually; and (b) to guide this evolution towards configurations
               that are guaranteed to satisfy the system NFRs.
                ∗
                    This work was partly supported by the UK EPSRC grant EP/H042644/1.


Proceedings of NiM-ALP 2013                                1
               2      Runtime Quantitative Verification
               Quantitative verification is a mathematically based technique for analysing the
               correctness, performance and reliability NFPs of systems that exhibit stochastic
               behaviour [14]. To analyse the NFPs of a system, the technique uses finite-state
               Markov models comprising states that correspond to different system config-
               urations, and edges associated with the transitions that are possible between
               these states. Depending on the type of the analysed NFPs, the edges are an-
               notated with transition probabilities or transition rates; and the model states
               and transitions may additionally be labelled with costs/rewards. Given a model
               and an NFP specified formally in temporal logic extended with probabilities
               and costs/rewards, probabilistic model checkers that implement the technique
               analyse the model exhaustively in order to evaluate the property.
                   Like most formal verification techniques, quantitative verification is tradi-
               tionally used in off-line settings, to evaluate the performance-cost tradeoffs of
               alternative system designs, or to analyse NFR compliance for existing systems. In
               the latter case, systems in violation of their NFRs undergo off-line maintenance.
               This approach does not meet the demands of emerging application scenarios
               in which systems need to be continually verified as they adapt autonomously,
               whenever a need for change is detected while they operate [6]. To address this
               need for continual verification, the runtime variant of the approach depicted in
               Fig. 1 was introduced in [10, 11] and further refined in [1, 4, 9, 12].
                 family of system                     system & environ.
                                          model                              monitor system              cloud-based
                     models for                      parameters defining
                                         selection                          & its environment               system
                different scenarios                  the current scenario


                              selected                                                 reconfiguration
                               model                                                        plan


                                                                               quantitative         verification result
                   non-functional                       quantitative
                                                                               verification          analysis & new
                    requirements                        verification
                                                                                 results             config. selection

               Fig. 1. Continual verification of a cloud-based system—Monitoring is used to identify
               the scenario the system operates in, and to select a model whose quantitative verifica-
               tion enables the detection or, sometimes, prediction of NFR violations. The subsequent
               synthesis and execution of a provably correct reconfiguration plan help the system re-
               instate or maintain compliance with NFRs such as response time, availability and cost.


Proceedings of NiM-ALP 2013                                      2
               3    Continual Verification of Cloud-Deployed Services
               Continual verification can be used to manage the reliability of multi-tier software
               services deployed on cloud infrastructure owned by the service provider. As we
               show in [8, 13], quantitative verification at runtime enables service administrators
               to quantify the impact of planned and unexpected changes on the reliability of
               their services, as the approach provides precise answers to questions such as:
               Q1 What is the maximum probability of a service becoming unavailable within
                  a one-month time period?
               Q2 How will the probability of failure for a service be affected if one of its
                  database instances is switched off to reflect a decrease in service workload?
               Q3 How many additional VMs should be used to run a service component when
                  moving it to VMs placed on servers with fewer memory blocks?
               Service administrators (or their automated resource management scripts) can
               then respond with remedial action if the service fails to comply with its NFRs,
               e.g., by discarding the planned removal of a component instance.
               Compositional NFR Verification Standard verification uses a monolithic
               model constructed from the parallel composition of models of all system com-
               ponents, in order to capture interleaving and synchronised behaviour between
               components. Despite significant improvements in the efficiency of quantitative
               verification techniques and tools, this often results in high overheads that make
               the runtime use of the technique unfeasible because the system may change again
               before its latest change has been fully analysed. This problem is addressed by
               compositional NFR verification, a formal technique that replaces the analysis
               of monolithic system models with a sequence of verification steps carried out
               component-wise on local NFPs. The ordering of the steps is determined accord-
               ing to component dependencies within the analysed system, and ultimately helps
               infer global system NFPs. Compositional verification reduces the NFP analysis
               overheads significantly, and in [8] we show that it can be used for continual NFP
               verification in certain scenarios associated with cloud-based systems.
               Incremental NFR Verification Cloud-deployed services operate in a highly
               dynamic environment where service components are frequently added, removed
               and scaled according to demand, or might fail unexpectedly. While composi-
               tional verification broadens the range of systems to which NFR verification can
               be applied effectively, it cannot cope with such rapid changes. Our INVEST
               incremental verification framework from [13] addresses this limitation by identi-
               fying and executing a minimal sequence of reverification steps after each change
               in a component-based system.
               Practical Use Using our continual NFR verification of cloud-deployed ser-
               vices requires the construction of accurate models of the analysed system and
               the specification of its NFRs in probabilistic temporal logic. As most practi-
               tioners lack the formal methods expertise required to carry out this task, we
               developed domain-specific languages that administrators of cloud resources can
               use to specify the initial structure of their multi-tier cloud-deployed services and
               the changes under which their NFPs should be analysed [16].


Proceedings of NiM-ALP 2013                             3
               4    Continual Verification of Service-Based Systems

               Continual NFP verification can also be exploited by cloud-service users, to drive
               the dynamic selection of service-based system (SBS) components. Built through
               the integration of third-party services deployed on cloud datacentres and ac-
               cessed over the Internet, SBSs are increasingly used in business-critical and
               safety-critical applications in which they must comply with strict NFRs. Self-
               adaptive SBSs achieve NFR compliance by selecting the concrete services for
               their operations dynamically, from sets of functionally equivalent third-party
               services with different levels of performance, reliability and cost.
               Continually verified SBSs The self-adaptive SBS architecture and devel-
               opment methodology we introduced in [2] and extended in [1, 7] carry out this
               concrete service selection using continual NFR verification. Given an SBS work-
               flow comprising n ≥ 1 operations to be performed by third-party services, our
               SBS architecture uses intelligent proxies to interface the workflow with sets of
               remote service such that the i -th SBS operation can be carried out by mi > 1
               functionally equivalent services. The main role of the intelligent proxies is to
               ensure that each execution of an SBS operation is carried out through the in-
               vocation of a concrete service selected as described below. The proxy overseeing
               the i -th SBS operation is initialised with a sequence of “promised” service-level
               agreements slaij = (pij0 , cij ), 1 6 j 6 mi , where pij0 ∈ [0, 1] and ci,j > 0 represent
               the provider-supplied probability of success and the cost for an invocation of
               service si,j , respectively. The intelligent proxies are also responsible for notify-
               ing a model updater about each service invocation and its outcome. The model
               updater starts from a developer-supplied initial Markov model of the SBS work-
               flow, and uses the on-line learning techniques from [3, 5] to adjust the transition
               probabilities of the model in line with these proxy notifications. Finally, the up-
               to-date Markov model is used by an autonomic manager that performs runtime
               quantitative verification to select the service combination used by the intelligent
               proxies so that the SBS satisfies its NFRs with minimal cost at all times.
               Efficient NFR Verification The runtime quantitative verification within
               our framework [3, 5] is carried out by an embedded version of the PRISM prob-
               abilistic model checker [15]. This version of PRISM can handle the “on the fly”
               analysis necessary for a range of small-sized SBS workflows. The compositional
               and incremental verification approaches described in the previous section can be
               used to reduce the overheads of continual NFP verification in order to extend
               its applicability to larger SBS workflows.
               Practical Use For a continually verified SBS architecture to be practical, de-
               velopers should be able to build instances of it without requiring special training
               in formal methods. To address this need, our work from [3, 5] introduced a tool-
               supported SBS development framework. In this framework, many of the com-
               ponents and artefacts described above are either generated automatically (e.g.,
               starting from an annotated UML activity diagram of the SBS workflow and from
               the WSDL specifications of its concrete services) or workflow-independent and
               therefore provided as reusable middleware.


Proceedings of NiM-ALP 2013                               4
               5    Conclusion
               A key advantage of cloud-based systems is their ability to evolve, e.g., by scal-
               ing their resources up and down in response to changes in workload. When
               such evolving systems deliver business-critical or safety-critical services, their
               non-functional properties (NFPs) must be verified continually. We summarised
               recent research into using quantitative model checking to support continual NFP
               verification in cloud-based systems. This novel approach to NFP verification can
               be exploited by both providers and users of cloud-deployed services, and has been
               employed successfully in several case studies, e.g. [1, 5, 7, 8, 13]. Nevertheless, sig-
               nificant research is still needed to extend its applicability to new scenarios, to
               improve its scalability, and to identify and address its limitations.

               References
                1. Calinescu, R., Ghezzi, C., Kwiatkowska, M., Mirandola, R.: Self-adaptive software
                   needs quantitative verification at runtime. Comm. ACM 55(9), 69–77 (Sept 2012)
                2. Calinescu, R., Grunske, L., Kwiatkowska, M., Mirandola, R., Tamburrelli, G.: Dy-
                   namic QoS management and optimization in service-based systems. IEEE Trans.
                   Softw. Eng. 37, 387–409 (2011)
                3. Calinescu, R., Johnson, K., Rafiq, Y.: Using observation ageing to improve Marko-
                   vian model learning in QoS engineering. In: ICPE 2011. pp. 505–510
                4. Calinescu, R., Kikuchi, S., Kwiatkowska, M.: Formal methods for the development
                   and verification of autonomic IT systems. In: Formal and Practical Aspects of
                   Autonomic Computing and Networking. pp. 1–37. IGI Global (2010)
                5. Calinescu, R., Rafiq, Y.: Using intelligent proxies to develop self-adaptive service-
                   based systems. In: TASE 2013. pp. 131–134
                6. Calinescu, R.: Emerging techniques for the engineering of self-adaptive high-
                   integrity software. In: Assurances for Self-Adaptive Systems. Springer (2013)
                7. Calinescu, R., Johnson, K., Rafiq, Y.: Developing self-verifying service-based sys-
                   tems. In: ASE 2013. To appear
                8. Calinescu, R., Kikuchi, S., Johnson, K.: Compositional reverification of probabilis-
                   tic safety properties for large-scale complex IT systems. In: Large-Scale Complex
                   IT Systems. LNCS, vol. 7539, pp. 303–329. Springer (2012)
                9. Calinescu, R., Kwiatkowska, M.: CADS*: Computer-aided development of self-*
                   systems. In: FASE 2009. pp. 421–424. Springer
               10. Calinescu, R., Kwiatkowska, M.Z.: Using quantitative analysis to implement auto-
                   nomic IT systems. In: ICSE 2009. pp. 100–110
               11. Epifani, I., Ghezzi, C., Mirandola, R., Tamburrelli, G.: Model evolution by run-
                   time adaptation. In: ICSE 2009. pp. 111–121
               12. Filieri, A., Ghezzi, C., Tamburrelli, G.: Run-time efficient probabilistic model
                   checking. In: ICSE 2011. pp. 341–350
               13. Johnson, K., Calinescu, R., Kikuchi, S.: An incremental verification framework for
                   component-based software systems. In: CBSE 2013. pp. 33–42
               14. Kwiatkowska, M.: Quantitative verification: Models, techniques and tools. In:
                   ESEC/FSE 2007. pp. 449–458. ACM Press
               15. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic
                   real-time systems. In: CAV’11. LNCS, vol. 6806, pp. 585–591. Springer (2011)
               16. Pehlivanov, S.: Modelling of Multi-Tier Applications Deployed on Cloud Comput-
                   ing Infrastructure. Master’s thesis, University of York (September 2013)


Proceedings of NiM-ALP 2013                               5