Ensuring Cost-Optimal SLA Conformance for Composite Service Providers

Philipp Leitner
Supervised by: Schahram Dustdar
Distributed Systems Group, Vienna University of Technology
Argentinierstrasse 8/184-1, A-1040, Vienna, Austria
lastname@infosys.tuwien.ac.at

Abstract. For providers of composite services, service level agreements (SLAs) provide a means to guarantee a certain service quality to prospective customers. Usually, violating SLAs is associated with costs. However, the means necessary to ensure SLA conformance also generate costs. It is therefore often optimal from a business perspective to violate certain SLAs occasionally, instead of taking the high (and expensive) road of always satisfying each one. In this paper we sketch a framework for predicting SLA violations and for determining whether an adaptation of the process makes sense economically. If this is the case, adaptation actions are triggered, which adapt the composition on instance, structural or environmental level. The ultimate goal is to implement a closed-loop system which self-optimizes the costs resulting from SLA violations.

1 Introduction

Service-oriented architectures are at their core about the integration of systems. This new paradigm is used by Software-as-a-Service providers, which deliver basic IT functionality such as customer relations management or business intelligence as composite services. One important notion for the seamless integration of such outsourced IT services is agreement about the quality that these services need to provide (QoS), typically defined within legally binding Service Level Agreements (SLAs). SLAs contain Service Level Objectives (SLOs), concrete numerical QoS objectives which the service needs to fulfill. If SLOs are violated, agreed-upon consequences (usually taking the form of penalty payments) go into effect.
However, fulfilling SLAs can also lead to costs for the service provider (e.g., because the composite service provider needs to use more expensive services itself, or because of the costs inherent to optimizing its service composition). It is therefore not trivial for the provider to decide to what extent the service’s SLAs should be fulfilled, or which SLAs should (temporarily) be violated for economic reasons. Moreover, these decisions should ideally be automated, to allow for fast reactions to changes in the business environment.

In this overview paper we present a high-level framework for optimizing adaptations of service compositions with regard to SLA violations. We use techniques from the area of machine learning [1] to construct models allowing the system to predict SLA violations at runtime and decide which adaptation actions may be used to improve overall performance. Adaptation can happen on instance level (for one instance only), on structural level (for all future instances), or on environmental level (e.g., migrating the composition engine to a machine with better hardware). An optimizer component decides if applying these changes makes sense economically (i.e., whether the costs of violating the SLAs are bigger than the adaptation costs). If this is the case, the respective actions are applied in an automated way. At its core, this system is a closed-loop self-optimizing system [2], with the target of minimizing the total costs of adaptations and SLA violations for the service provider.

The work described in this paper is currently ongoing. However, some important fundamental work has already been published. In [3] earlier work regarding the monitoring of QoS of Web services is presented. Our work on VRESCo [4] forms the basis for the proposal presented here, providing core services such as support for dynamic rebinding.
Finally, in [5] we have presented first results regarding the identification of factors that influence business process performance, which is related to the generation of prediction models.

The remaining PhD research will be led by two key research questions: (1) How can the factors that influence the performance of a composition be identified, modeled and analyzed, in order to enable prediction of SLA violations at runtime, and trigger adaptations to prevent these violations? (2) How can the tradeoff between preventing violations and the costs of doing so be best formalized, especially considering that many adaptation actions may be interleaved and combined? Even though first steps exist (see Section 3), to the best of our knowledge these questions have not been answered sufficiently in the literature so far.

We plan to validate the outcomes of the thesis using a case study, by showcasing how self-optimization can prevent SLA violations both short- and long-term, and by comparing the total costs for the service provider with and without the proposed system. We will consider our work successful if using our system leads to significant financial benefits for the provider, while at the same time reducing SLA violations (even if not all violations are prevented).

The remainder of the paper is structured as follows. Section 2 contains the main contribution of the paper, a description of a system for cost-optimal adaptation of composite services. In Section 3 we give a brief overview of relevant related work. Finally, Section 4 concludes the paper.

2 Approach Overview

A high-level overview of our approach is depicted in Figure 1.
The system implements an optimization cycle in the Autonomic Computing [2] sense, i.e., it follows the basic steps Monitor (monitoring the service composition, i.e., measuring QoS values and process instance data), Analyze (generating prediction models), Plan (evaluating, based on the generated models, the available adaptation actions and the current SLAs of the provider, whether there are possible optimizations for the composition), and Execute (applying these optimizations). The managed element is the Service Composition, while four other components (Composition Monitor, Composition Analyzer, Cost-Based Optimizer and Adaptation Executor) implement the autonomic manager.

[Figure 1 (diagram not reproduced). Labels: Service Composition; 1. Monitor (Composition Monitor); 2. Analyze (Composition Analyzer); 3. Plan (Cost-Based Optimizer); 4. Execute (Adaptation Executor); Metrics Database; SLA Database; Adaptation Actions Database.]

Fig. 1: A Self-Optimizing System for Cost-Optimal SLA Conformance

Our approach is based on the following assumptions: (1) service providers have explicit SLA(s) with their customers, including concrete numerical target values for SLOs and penalty payments for SLA violations; these payments can be staged, i.e., more severe SLA violations can lead to more severe penalties, and (2) there is a database of possible adaptations available, including the costs of these adaptations; costs can be either one-time (such as a downtime) or continual (such as increased costs for using more expensive services). The implementation of this database of possible adaptation actions needs to be supported by strong tools, which allow for the generation of the most important actions in a semi-automated way. Costs of adaptation actions can be derived using simulation or analysis of historical data. In the following we assume an autonomous system; however, in some situations the role of the optimizer or executor can also be taken by a human.
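To make the two assumptions above more concrete, the following minimal Python sketch models a staged penalty schedule for a single SLO and an adaptation action with one-time and continual costs. It is an illustration only, not the cost model of the proposed system: the names StagedPenalty, AdaptationAction and pays_off, all numbers, the "lower is better" SLO (e.g., response time in ms), the per-instance treatment of continual costs, and the simplification that all affected instances behave alike are assumptions made for this example.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StagedPenalty:
    """Hypothetical staged penalty schedule for one SLO (lower values are better)."""
    thresholds: Tuple[float, ...]  # ascending limits, e.g. response time in ms
    penalties: Tuple[float, ...]   # penalty owed once the matching limit is exceeded

    def penalty_for(self, measured: float) -> float:
        # Count the limits that the measured value strictly exceeds.
        stage = bisect_left(self.thresholds, measured)
        return 0.0 if stage == 0 else self.penalties[stage - 1]


@dataclass(frozen=True)
class AdaptationAction:
    """Hypothetical entry of the adaptation actions database."""
    name: str
    one_time_cost: float = 0.0      # e.g. cost of a downtime
    cost_per_instance: float = 0.0  # continual cost, e.g. a more expensive service

    def cost(self, instances: int) -> float:
        return self.one_time_cost + self.cost_per_instance * instances


def pays_off(sla: StagedPenalty, predicted: float, after_adaptation: float,
             action: AdaptationAction, instances: int = 1) -> bool:
    """True if the avoided penalties exceed the cost of applying the action.

    Simplification: every affected instance is assumed to show the same
    predicted value and the same improvement.
    """
    saved_per_instance = sla.penalty_for(predicted) - sla.penalty_for(after_adaptation)
    return saved_per_instance * instances > action.cost(instances)


# Illustrative values only.
sla = StagedPenalty(thresholds=(1000.0, 2000.0), penalties=(50.0, 200.0))
rebind = AdaptationAction("rebind to faster service", cost_per_instance=30.0)
migrate = AdaptationAction("migrate composition engine", one_time_cost=500.0)

for action, n in ((rebind, 1), (migrate, 1), (migrate, 20)):
    print(action.name, n, pays_off(sla, 1500.0, 900.0, action, instances=n))
```

With these made-up numbers, the rebinding pays off for a single instance, whereas the migration with its one-time cost only pays off once enough instances benefit from it. This mirrors the distinction between instance-level adaptations and those affecting all future instances.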
Composition Monitor: The foundation of all work presented here is the ability to gather accurate runtime data. This includes: (1) QoS metrics such as response time or availability of the services used, (2) runtime payload data, such as customer identifiers or ordered items, and (3) technical parameters of the execution environment, such as the availability of the composition engine, or the CPU load of the machine running the composition. The Composition Monitor component is used to collect this monitoring data from various sources (e.g., an external QoS monitor [3] or technologies such as Windows Performance Counters (footnote 1)), consolidate it and store it in a metrics database.

Composition Analyzer: The data collected by the Composition Monitor is then used to generate prediction models for SLA violations. Prediction models are used to estimate at runtime whether a given running instance is going to violate one or more of its SLAs. They are associated with checkpoints in the composition model, at which the prediction is done. Simply put, a prediction model is a function which takes as input all execution data already available at the checkpoint and, if possible, estimates for all missing data, and produces a numerical estimate for every target value in the provider’s SLAs as output. In earlier work we have used simple decision trees to implement such models [5]; for future work, however, we are investigating the use of multi-layer perceptrons instead, to improve the accuracy of predictions.

Cost-Based Optimizer: The Cost-Based Optimizer can be seen as the core of the system. This component needs access to all SLAs, as well as to a database of possible adaptation actions. The optimizer has to fulfill two important tasks in the system. Firstly, it uses the previously generated prediction models to predict concrete QoS values for every running instance and compares the predicted values with the respective SLOs.
If SLA violations are predicted, it checks the Adaptation Actions database for possible actions to prevent them, and applies these actions if it is cost-efficient to do so. This involves solving an optimization problem to decide which combination of actions both prevents most SLA violations and is cheapest to implement. Secondly, if more than a certain threshold of SLA violations (in a given time frame) has been monitored, the component tries to improve the composition itself, i.e., it tries to optimize the composition for every future instance. The main difference is that on composition level more possible adaptation actions exist (mainly because adaptation on this level is less time-critical, so that adaptations which involve, e.g., a system downtime are also feasible).

Adaptation Executor: The Adaptation Executor is responsible for applying the adaptation actions as planned by the Cost-Based Optimizer. Generally, we consider the classes of adaptation actions (action classes) depicted in Figure 2. As discussed before, adaptation can happen either on instance level (level 1, i.e., adaptations which affect only a single instance) or on structural level (level 2, i.e., adaptations which affect all future instances), and can consist of rebinding base services (R*, i.e., switching from one used service to another), restructuring the composition (S*, e.g., parallelizing some parts of the composition) or adapting the execution environment (E, e.g., upgrading the virtual machine running the service composition). Generally, actions of type E always affect all future instances, and are therefore only applicable on level 2.

1 http://msdn.microsoft.com/en-us/library/aa373083(VS.85).aspx

[Figure 2 (diagram not reproduced). Columns: Adapt Service Binding, Adapt Composition Structure, Adapt Execution Environment. Instance Level: R1 (instance-level rebinding), S1 (instance-level structural change). Composition Level: R2 (composition-level rebinding), S2 (composition-level structural change), E (execution environment change).]

Fig. 2: Classes of Adaptation Actions

Obviously, the concrete execution of adaptation actions depends greatly on the action class. For applying adaptations of type R* we can use the means provided by the execution environment (such as dynamic rebinding as discussed in earlier work [4]). For S* more complex means are necessary, for instance adaptation techniques such as AO4BPEL [6], or the more recent BPEL’n’Aspects [7] approach. Currently, changes of class E are mostly executed manually. However, for some changes of this class automation has been made possible by the recent rise of Cloud Computing. If, for instance, the execution environment is hosted in the Amazon Elastic Compute Cloud (footnote 2), it is possible to automatically adapt (some) parameters of the hosting environment via the Amazon EC2 API.

3 Related Work

We will now briefly discuss some key related work. QoS monitoring of atomic services is discussed in [3, 8]. Our Composition Monitor will partially be based on these results. Monitoring of composition instance data has been discussed in [9]. However, these works do not explicitly cover SLA monitoring, which is the scope of [10]. The authors use an event-based approach to monitor QoS, which is in line with the ideas we have used in [5]. This work also pioneers the idea of SLA impact analysis, which is related to the tasks that our Composition Analyzer has to fulfill. Another basic building block of this component is the work presented in [11], which discusses the prediction of QoS (again using an event-based model). Self-adaptation of compositions, another core topic in our work, is discussed in [12]. The MASC system presented there adapts itself in order to recover from failures and improve reliability; however, this system does not try to predict problems and prevent them proactively. Optimizing service compositions with regard to overall QoS is an often-discussed topic, with some seminal work dating back to 2004 [13].
In contrast to these approaches, in our system optimization is done with regard to specific SLOs which are currently violated, taking into account the tradeoff between the costs of SLA violations and the costs of adaptation.

2 http://aws.amazon.com/ec2/

4 Conclusions

In this paper we have sketched the architecture of a closed-loop system which autonomously optimizes service compositions with regard to SLA violations, taking into account the costs caused by the adaptation. Our next steps will be the implementation of a first end-to-end system, which includes prototypes for all four main components of the system. Furthermore, we will define a preliminary model for capturing the costs of adaptation actions.

References

1. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. 2nd edn. Morgan Kaufmann (2005)
2. Kephart, J.O., Chess, D.M.: The Vision of Autonomic Computing. IEEE Computer 36(1) (2003) 41–50
3. Rosenberg, F., Platzer, C., Dustdar, S.: Bootstrapping Performance and Dependability Attributes of Web Services. In: ICWS ’06: Proceedings of the IEEE International Conference on Web Services. (2006) 205–212
4. Michlmayr, A., Rosenberg, F., Leitner, P., Dustdar, S.: End-to-End Support for QoS-Aware Service Selection, Invocation and Mediation in VRESCo. Technical report, TUV-1841-2009-03, Vienna University of Technology (2009)
5. Wetzstein, B., Leitner, P., Rosenberg, F., Brandic, I., Leymann, F., Dustdar, S.: Monitoring and Analyzing Influential Factors of Business Process Performance. In: EDOC’09: Proceedings of the 13th IEEE International Enterprise Distributed Object Computing Conference. (2009)
6. Charfi, A., Mezini, M.: AO4BPEL: An Aspect-Oriented Extension to BPEL. World Wide Web 10(3) (2007) 309–344
7. Karastoyanova, D., Leymann, F.: BPEL’n’Aspects: Adapting Service Orchestration Logic. In: ICWS 2009: Proceedings of 7th International Conference on Web Services. (2009)
8. Moser, O., Rosenberg, F., Dustdar, S.: Non-Intrusive Monitoring and Service Adaptation for WS-BPEL. In: Proceedings of the 17th International Conference on World Wide Web (WWW’08). (2008) 815–824
9. Wetzstein, B., Strauch, S., Leymann, F.: Measuring Performance Metrics of WS-BPEL Service Compositions. In: ICNS’09: Proceedings of the Fifth International Conference on Networking and Services, IEEE Computer Society (April 2009)
10. Bodenstaff, L., Wombacher, A., Reichert, M., Jaeger, M.C.: Monitoring Dependencies for SLAs: The MoDe4SLA Approach. In: SCC ’08: Proceedings of the 2008 IEEE International Conference on Services Computing. (2008) 21–29
11. Zeng, L., Lingenfelder, C., Lei, H., Chang, H.: Event-Driven Quality of Service Prediction. In: ICSOC ’08: Proceedings of the 6th International Conference on Service-Oriented Computing. (2008) 147–161
12. Erradi, A., Maheshwari, P., Tosic, V.: Policy-Driven Middleware for Self-Adaptation of Web Services Compositions. In: Middleware’06: Proceedings of the ACM/IFIP/USENIX 2006 International Conference on Middleware. (2006) 62–80
13. Zeng, L., Benatallah, B., Ngu, A.H.H., Dumas, M., Kalagnanam, J., Chang, H.: QoS-Aware Middleware for Web Services Composition. IEEE Transactions on Software Engineering 30(5) (2004) 311–327