Maintainability and control of long-lived enterprise solutions Oliver Daute Stefan Conrad SAP Deutschland AG & Co. KG Heinrich-Heine Universität Düsseldorf Business Solution Technology Institut für Informatik Homberger Straße 25 Universitätsstraße 1 40882 Ratingen 40225 Düsseldorf oliver.daute@sap.com conrad@cs.uni-duesseldorf.de Abstract: Maintainability of long-lived enterprise solutions is today’s top challenge. The constantly increasing functionalities of information technology provided these days, including hardware and software products, leads to giant networked application infrastructures. Business processes glue applications and data sources to enterprise solutions to fulfill business needs. Large firms are more and more dependent on the permanent availability of their enterprise solution. A failure in one part of the landscape can rapidly impair the whole enterprise landscape and harm the business. Maintainability is a critical issue and new approaches are required to keep complex landscapes under control at all times. We introduced the Real-Time Business Case Database, RT-BCDB as a basic concept. This paper presents the next reasonable step. Process Activity Control describes a steering mechanism for complex enterprise application landscapes, to gain more stability, transparency and visibility of processes activities in order to improve maintainability and to support the system administration in their daily operations. 1 Introduction More transparency and control inside complex application landscapes is required [KMP08] since concepts like cloud computing [Vo08], IT service management [ITIL], architecture frameworks [TOG09] or service-oriented architecture [SOA06] make it possible to build up giant enterprise application solutions. But mechanisms how to manage the underlying technology have been neglected. Long-lived enterprise solutions are subject to evolution [RB09], obsolescence and replacement. We have to find answers to questions regarding system maintenance, troubleshooting, availability, change management and the integration of new components. Maintainability and control of complex software solutions is today’s top challenge. This paper presents mechanisms for increasing the maintainability and control of complex application landscapes. Process Activity Control (PAC) is the next step after the introduction of the Real-Time Business Case Database [Da09]. PAC concentrates on the control of business processes which are currently active within an application landscape. The goal is to avoid indeterminate processing states which can cause further incidents within the application environments. Most enterprise or service frameworks are focused on business requirements which have improved the design of enterprise solutions significantly but often with too little consideration for the underlying information technology [Ro03]. Operation interests are neglected and little information about how to run a designed enterprise solution can be found. A single business process is able to trigger process activities across the whole landscape, uses different applications, servers or exchanges data. If a server fails or a database system stops its processing, then several business processes can be impaired (Fig. 2). In such situations it is quite difficult to determine the cause or impact on other process activities of the enterprise solution [KMP08]. The challenge for the system administration is to manage these complex application environments and to react as swiftly as possible to incidents [SW08]. Several tools are available to monitor large enterprise landscapes. Some of them just monitor the operating or the application level. Only few tools collect details about business processes and try to control them. But each tools has its own proprietary view to applications. There is no overall concept for heterogeneous application environments available. Numerous business cases within an enterprise solution make it impossible to be aware of what each single business process does. The system administration must be able to analyze business processing like a technical issue. Thus, business processes are seen as objects which have a run-state and are requesting to run. This eliminates the need to collect information about what a specific business process does. The focus on essential information also standardizes business processes and eliminates the need to identify its run-state in the case of an incident. System administration can react more purposefully and the designers of business processes get detailed information about the behavior of their business cases. Complexity leads to longevity of software solutions for several reasons; here we need only to think of done investments, cross application dependencies and efforts to exchange software products. This paper discusses the challenges and circumstances in complex enterprise solutions and why it is so difficult to control business processes. We will show how to set up a steering & control mechanism within the application environments and will discuss the benefits for the system management administration. For the communication with PAC, we will introduce a Code of Business Processes. It is a valuable step forward in defining the frame to identify business processes. Finally, the use of PAC in collaboration with the RT-BCDB will be presented, preceded by a short introduction to it. 2 Background & Terminology RT-BCDB stands for Real-Time Business Case Database and it is an approach to collecting and providing information about business process activities in heterogeneous application landscapes. RT-BCDB aims to improve the transparency of activities. The knowledge acquired supports the administration during maintenance activities and is an important source of information for the business designers as well. RT-BCDB supports tasks, like updates, recovery, planning or optimization of the time schedules. Figure 1: Real-Time Business Case Database A system failure requires detailed investigations about the impact on business process activities before a system recovery can take place. This means processes which were impaired must be identified and must be included in the recovery process. The data of RT-BCDB is important during the analysis and restoration phase in order to bring back business processing to the latest consistent state. It is also indispensable for monitoring, reporting and changing the landscape. RT-BCDB stores data about business cases, processes, owners, history of previous processing, execution frequencies, run-time, dependencies, as well as availabilities of processing units and applications. Knowledge about run-states of business processes is important information for maintaining and controlling business processes. The term enterprise solution encapsulates a collaboration environment of hardware and software technologies with the common purpose of providing an infrastructure for business processes. A solution can consist of ERP software, various legacy systems, data warehouses, middleware for exchanging data and connecting software applications. Other expressions are application landscape or application environment. Business cases are designed by the business requirements and needs. A business case consists of several processes which can be performed on different systems. Business cases determine the tasks of the customer’s enterprise solution. A business process consumes data or provides them and can be triggered by other processes or services. Business processes make use of different applications and data sources across an application landscape with regard to the enterprise needs. System maintenance activity relates to ongoing tasks on the system administration level. Typical maintenance activities are updates, incident analysis, recovery, replacement and the like. (RT-BCDB supports active or reactive maintenance activities). 3 The Idea Process Activity Control is required because of the continuously increasing complexity of long-lived enterprise solutions, driven by evolution, availability, growth, business requirements, modern tools and enterprise application framework methods [TOG09]. IT administration has to manage these solutions in any situation. New mechanisms are required to assist them. The constantly increasing complexity of enterprise solution is the number one cause of cost-intensive system failures [EIU07]. Incidents interrupt business processes while they are performing a task. The malfunction of a processing unit or of an application can cause process activities to fail. Processes need to be restarted in order reach data consistency on business process level. The idea of PAC is to minimize uncontrolled failure of business processes and reduce the amount of incidents. If problems within the application landscape are already known, e.g. a database stopped processing, then there is no reason for a process to start with the risk of halting in a failure situation. PAC acts proactively and thus avoids disruptions. PAC also addresses another currently unsolved problem: the start and stop process of a complex application landscape or parts of it. It is still a complex matter to shutdown a single application without the knowledge of dependent business cases and without impairing the business. At the moment, there is no outer control for business cases available. Whenever a shutdown is required, PAC identifies active business processes to stop and will avoid the start of further business cases. Business processes are triggered or started by different activators. A process can cause different activities across the whole application landscape and exchanges data. Various automated control instances start and stop processes, too. But there is no overall concept for heterogeneous application environments. The figure (Fig. 2) depicts a well-known situation in application environments without control of processes. Uncontrolled failure of business processes may result in an unknown run-state or data inconsistencies on business level. From the perspective of a business case, a consistent state requires more than data integrity on database level. Also dependent interfaces or single process steps must be taken into consideration. Those can halt in an inconsistent state anywhere in an application environment. Figure 2: Failure within the application environment PAC works as a steering mechanism for business processes and is especially valuable in the control of core business processes. To interact with business processes, PAC makes use of RunControl commands. We will discuss these later. PAC is able to collect run- state information of business processes and send them to RT-BCDB. 4 Code of Business Processing Various situations arise in complex application landscapes because a form of identification for business processes is missing. These are not easy to handle or to overcome in case of incidents. Therefore, identification is required for business processes in the communication with PAC. The Code of Business Processing, CoBP covers some minimum requirements. Traffic laws are simple and effective. They are necessary to control and steer the traffic within a defined infrastructure, the traffic network. Traffic laws and networks together describe a kind of code of conduct which participants have to accept in order to participate in the traffic. It is an appropriate steering mechanism for a complex environment. This example of a regulating mechanism gives us an idea of what is needed for application landscapes. We will try to translate some elements of traffic laws and network into a code for business processes. We propose a Code of Business Processing which contains general rules and requirements for using an application environment. The code should only be applied to processes which are of significance for the enterprise solution itself. That means only processes with a high impact in case of a failure have to meet the code. The CoBP consists of rules and requirements: First CoBP: Each process must have a unique form of identification. This is required to identify and steer a process while it is active. Identification is needed to follow the tracks (footprints) during the processing lifetime and to refer to the process owner. Process IDs are unique across all applications to avoid false identification. A process ID can also be used to classify. The most important business processes have a distinctive ID. Second CoBP: Each business process must be documented. It must belong to a business case and visualization must exist. A diagram contains the run-states a business process can run in. Procedures must be given for recovery purposes in case of a process failure. Third CoBP: Each process must have a given priority. On the one hand it is important to decide which process should be given priority and also to determine a sequence of processing. On the other hand it is useful to mark processes regarding the impact on the application environment. The business processes with higher priority must process first, unless PAC decides differently. Fourth CoBP: The higher a priority is the higher is the charge for a business process. Certainly, accounting and charging for processes is not very often done at customer site. But a process with a high priority does have a significant impact on all other processes that run within that environment. For our purposes the first two rules are sufficient to control activities. Additional rules can be added. Ideally, communication between processes should always take place in traceable ways. As a result of this our last CoBP follows. Fifth CoBP: Business Processes should use defined and traceable ways for interaction. This forces the use of known and open interfaces. The CoBP improves the traceability of business processing significantly and supports the maintainability. For existing application landscapes, the 5th CoBP requires reviewing the process interaction. The rules of CoBP enable better steering of business processes, provide more transparency and thus improve maintainability. 5 Process Activity Control Process Activity Control is an approach to controlling process activities in complex application environments. PAC will stop further business processing in case of problems occurring. This will prevent business processes running into undefined processing states and avoid further incidents. PAC has to consider several issues in order to control process activities. A major task is, for instance, determining the function state of business processes, applications and processing units or the functioning of PAC itself. This can be done by RT-BCDB. RT-BCDB uses agents to collect information about an application landscape. On the hardware and application level, agents search for a specific pattern to determine the function state and availability. Figure 3: Architecture of PAC Applications fork operating system processes, which can be monitored as well to identify throughput. A premature termination of an application process on operating systems can point to the failure of an application. The agents inspect the given sources of information and try to identify run-state and availability information. This information is used by PAC to react to current circumstances. PAC will try to avoid any starting or triggering of business processes which will make use of a malfunctioning processing unit or impaired applications. For smaller software environments information about the availability of hardware and software application is sufficient enough to control and to avoid further incidents. But only few details about business process activities are visible. For large application landscapes, such as long-lived enterprise solutions, PAC must also be informed of run-states of business processes. This information is provided by the knowledge base of RT-BCDB. PAC consists of several basic elements: a decision-control mechanism, a Custom Rule Set, the CoBP, an interface to RunControl, and a communication interface to RT-BCDB. The decision-control mechanism is subdivided into four main activities: receive request, evaluate, decision and control. Each activity has one or more tasks. Activity ‘receive request’, just receives the Request for Run (RfR) in sequence of income. Whenever a business process starts or stops or changes its run-state, then RunControl will send an RfR. The RfR contains the process ID and the state of running. The activity ‘evaluate’, evaluates the RunControl request against the information stored in RT-BCDB. The run-state table of RT-BCDB always reflects the status of process activities within the application environment. Any known problems with the availability of applications or processing units are taken into consideration. The ‘decision’ process uses CoBP, Custom Rule Set, RT-BCDB and the evaluation of the previous activity. A final decision will be prepared to return a ‘Confirmation to Run (CtR)’ or to stop a business process. The ‘control’ activity is the steering process of PAC. It has two functions. The first is to answer the RfR and to send a CtR. In the case of a business process having to be paused, the control process waits to send the CtR until problems are solved. The second function is to stop business processes in the case of the application landscape having to be shut down. Vice versa ‘control’ enables the start-up of business cases in a predefined sequence, for instance after maintenance activities or after the elimination of incidents. The Custom Rule Set contains rules given for a customer’s application landscape. The rule set can contain an alteration of priorities or a list of business cases which have to run with a higher priority. Also preferred processing units can be part of the rule set. Another element is CoBP, which was described previously. Finally, an application interface, which is used to communicate with RT-BCDB, is also part of PAC. In operation, PAC as a control instance must monitor its own availability. At least two instances of PAC must run within the environment. This is necessary to prevent PAC is becoming single-point of failure for the enterprise solution. In normal operation the second instance should be used to answer RfR. 6 Run Control PAC introduces an extension to RunControl [Da09] commands. RunControl commands are used to receive information about processes and required for controlling the progress of their activities. Whenever a business process starts, stops or waits, the RunControl command will send a message with the process ID and the run-state. The information will be stored immediately in the run-state table of RT-BCDB. Due to this, active business cases and their status can be made visible (Fig.4). Figure 4: Run-States of active business cases We adapted the ‘RunControl’ statement for use with PAC. Instead of sending the run- states information using the agents, this information is send to PAC immediately. PAC forwards the information to RT-BCDB. On the business process layer, PAC takes over the tasks of the agents. The second change is an extension of functionality. The RunControl function waits until it receives a ‘Confirmation to Run’ from PAC. To distinguish between the two versions of RunControl statements, we will use RunControlAC when using PAC. Figure 5: Example for RunControl in source code or transparent layers Several options are given to implement RunControl statements (Fig. 5). An option is to insert RunControl statements into the source code [UML] [Sv07]. For existing applications adaptations are possible during migration projects [St06]. Reverse engineering might be an appropriate discipline too. Consequently for the future design of business solutions, applications should be developed with regard to run-state information or the RunControl statements. Instead of modifying the source code, processes can also be called up from a transparent layer in which functions are embedded in RunControl statements (Fig.6). This option is much easier to implement and to introduce for heterogeneous landscapes. Certainly, some effort is needed for the implementation of the RunControl. But with the constantly increasing complexity of application landscapes, a mechanism as described is indispensable for keeping a long-lived enterprise solution under control. 7 Improving Maintainability and Control The aim of our concepts is to gain more transparency of and control over process activities. To increase the overall availability, e.g. by the prevention of cost-intensive incidents, is one aim more to be listed. We will discuss next by means of a simple example, how PAC improves the maintainability of complex application landscapes. An application stops its processing (Fig. 6). PAC recognizes this problem and stops further processing of business cases which will make use of the application. Here, a single business case is impaired and has to be stopped in order to prevent data inconsistencies. If a standby application is available, PAC can move the processing to it. In the case of performance bottlenecks, PAC is able to stop a business process in order to prevent a problem from getting worse or shift an RfR to another application unit if possible. Although users have to wait until the problem is solved, indeterminate processing states will be avoided. These can cause further incidents and additional efforts for solving. PAC makes use of RT-BCDB knowledge to decide a ‘Request for Run’. If incidents to databases, applications, processing units or business cases are known, then PAC will determine if a ‘Request for Run’ will make use of them. The run-state and availability information, stored in RT-BCDB, provides a virtual image of activities within the application infrastructure. Figure 6: Avoiding indeterminate run-states and inconsistencies Maintenance tasks like updates or upgrades of the applications also require detailed information about the business cases possibly involved. The constant availability of long-lived enterprise solution requests to maintain the landscape while numerous business processes are running. PAC can prevent process activities while parts of the application landscape are under construction. RT-BCDB provides the dependencies. How to measure improvements in terms of Return on Investments? We will try to answer this question with regard to time, money and quality. Time: Each incident which was prevented saves time. Identifying the cause of and solution to an incident cost time. Additional time is needed for reporting and documenting the solution, and several persons of different departments are involved. Users are hindered in their work and will lose time. PAC prevents cost-intensive incidents and improves the overall-availability of the enterprise solution. Money: Costs are incurred by incident handling, software for incident tracking and support staff. Downtimes can cause less productivity and can result in fewer sales. An overall estimation is difficult to provide. In the worst case, especially in the area of institutional banks, an incident can cause bankruptcy within a few days [EIU07]. Quality is often not easy to measure. In enterprise application landscapes quality means availability, reliability, throughput, maintainability and competitiveness. Quality can save money but can increase costs. The introduction of PAC and RT-BCDB requires investments. But more visibility and transparency lead to more purposeful reactions to incidents within the landscape. This saves time. And fewer incidents, however, increase the quality and the availability of an enterprise solution. We assume that for large environments the investment in regard to the increase in quality will save money in the end. In smaller environments our concept will at least improve quality. We expect that PAC & RT-BCDB reduce the TCO [Ga87], but further investigations are required to determine the quantity and quality of improvements. 8 Evolution and Obsolescence Long-lived enterprise solutions are subject to evolution, obsolescence and replacement. Typical, for long-lived solutions is that the interaction between software components last longer than the components themselves. Therefore knowledge about activities is essential to maintain the landscape and to support the change management process. Evolution is driven by new requirements, growth and replacements of hardware or software, including general maintenance. Evolving a long-lived application solution can be difficult. Lack of information about dependencies of process interactions, the way the solution works and the integration of new components are some examples. Moreover the enterprise solution requires close to 100% availability. RT-BCDB is oriented to the information about the processing-unit, business cases and process activities. PAC supports the maintenance process while the enterprise solution is in use. Innovation can lead step by step to obsolescence of applications of the enterprise solution. Often, so called legacy systems are difficult to exchange for some reasons. Problems are missing documentations, designers left the organization and too little information about the purpose and frequency of usage. Our concepts can explain their usage and improve the replacement. They are able to detect monolithic applications. A constantly decreasing usage can indicate obsolescence. As we pointed out, most frameworks are focused on business requirements and neglect the maintenance interest. Thus, insufficient transparency in operation and missing control mechanism are the results to name only some of the gaps. Maintainability and availability (incl. avoidance of cost-intensive incidents) are the main really challenging tasks. But also recovery is an interesting issue to ensure data consistency across large application landscapes. Our concepts are steps forward to improving maintainability and control of long-lived software-solutions. 9 Conclusion The concepts presented support maintenance tasks and the evolution of long-lived enterprise solutions. System administration can react more purposefully. PAC improves the control over business process activities, avoids incidents and makes better utilization of resources. This leads to a higher availability and a better quality of the complex application landscape. RT-BCDB is the knowledge basis and provides transparency; an image of process activities and their dependencies. Certainly, some work needs to be done to implement such concepts, especially if software applications need to be adapted. But, with the constantly increasing complexity of enterprise solutions, mechanisms as described are indispensable for keeping long- lived enterprise application landscapes maintainable in the future. References [Da09] Daute, O.: Introducing Real-Time Business CASE Database, Approach to improving system maintenance of complex application landscapes, ICEIS 11th Conference on Enterprise Information Systems, 2009 [Da04] Daute, O.: Representation of Business Information Flow with an Extension for UML, ICEIS 6th Conference on Enterprise Information Systems, 2004 [EIU07] Economist Intelligence Unit: Coming to grips with IT risk; A report from the Economist Intelligence Unit, White Paper 2007 [Ga87] Gartner Research Group: TCO, Total Cost of Ownership, 1987, Information Technology Research, www.gartner.com [ITIL] ITIL, IT Infrastructure Library, ITSMF, Information Technology Service Management Forum, www.itsmf.net [KMP08] Kobbacy, Khairy A. H.; Murthy, D. N. Prabhakar, 2008, Complex System Maintenance Handbook, Springer Series in Reliability Engineering [PH07] Papazoglou, M.; Heuvel, J.: Service oriented architectures: approaches, technologies and research issues, Paper, International Journal on Very Large Data Bases (VLDB), 16:389– 415, 2007 [Ro03] Rosemann, M: Process-oriented Administration of Enterprise Systems, ARC SPIRT project, Queensland University of Technology, 2003 [SW08] Schelp, J.: Winter, Robert: Business Application Design and Enterprise Service Design: A Comparison. In: Int. J. Service Sciences 3/4 (2008) [SOA06] SOA: Reference Model for Service Oriented Architecture Committee Specification, www.oasis-open.org, 2006 [St06] Stamati, T: Investigating The Life Cycle Of Legacy Systems Migration, European and Mediterranean Conference on Information Systems (EMCIS), Alicante Spain 2006 [Sv07] Svatoš, O.: Conceptual Process Modeling Language: Regulative Approach, Department of Information Technologies, University of Economics, Czech Republic, 2007 [RB09] Riebisch, M; Bode, S.: Software-Evolvability, GI Informatik Spektrum, Bd 32 4, Technische Universität Ilmenau, 2009 [TOG09] TOGAF, 9.0: The Open Group Architecture Framework, Vendor- and technology- neutral consortium, The Open GROUP, www.togaf.org, 2009 [UML] UML: Unified Modeling Language, Not-for-profit computer industry consortium, Object Management Group, www.omg.org