Towards Automation of Enterprise Architecture Model Maintenance

Matthias Farwick*
(Supervisor: Ruth Breu)

University of Innsbruck, Institute of Computer Science
matthias.farwick@uibk.ac.at
http://www.q-e.at

Abstract. Enterprise Architecture Management (EAM) is a common practice in organizations that need a model of how their business relates to the supporting IT landscape in order to make informed decisions to enhance the enterprise architecture (EA). Creating and maintaining such an EA model is an expensive and time-consuming but crucial task in today’s organizations. This has been recognised by both researchers and practitioners. However, little research literature and few practical approaches can be identified that target the automation of EA model maintenance and the reduction of manual work. In this thesis we elaborate means to increase the data quality attributes of actuality and consistency of EA models via semi-automated data collection processes from external data sources and events. It is our main goal to better synchronize EA models with what they represent in the real world and thus reduce the manual labor of model maintenance.

Keywords: enterprise architecture management, maintenance, automation, meta-model, EAM, modelling

1 Introduction

Enterprise Architecture Management (EAM) is a practice used in mid-sized to large organizations that aims at modelling the relationships between the business, its supporting information systems and the underlying IT infrastructure. This effort is made in order to make informed decisions to better align business and IT, enable strategic planning of IT changes, assess risks, and check compliance with legal regulations.

Several EAM frameworks are applied in practice today, such as TOGAF [1] and the Zachman framework [2]. These frameworks prescribe enterprise architecture (EA) meta-models, processes, best practices and EA principles.
It is common that EA-specific applications are used to collect the EA data and build a model of the current state of the EA, as well as to elaborate the planned future state of the EA. This data collection is often a very time-consuming task, since the relevant EA data is distributed among different departments and is mostly collected via interviews or surveys with stakeholders [3]. Due to this time-consuming nature of EA data collection, frequent changes in the EA, and the immense size as well as complexity of EA models, it is a difficult but crucial task to keep these models up to date with reality.

This problem has been recognized by both researchers [4–7] and practitioners [8]. However, little tool support and research literature can be identified that actually provides approaches for solving the problem of EA model maintenance in practice.

To tackle this problem, in the course of this thesis we are elaborating means for better keeping EA models in sync with what they represent in the real world. The aim is to decrease manual work in the EA practice and increase the EA data quality attributes actuality and consistency. We aim to achieve this by better supporting the manual data collection processes with tooling. This includes automating the process of integrating EA data from external data sources and making use of change events. These events are generated either by data quality checks or by the environment in order to trigger manual and automated updates of the EA repository.

The thesis is conducted in collaboration with the company iteratec GmbH, Munich, which produces the open-source EA tool iteraplan. The work on this thesis is at the beginning of its third year, with the expected completion in summer 2013.

* This thesis is partially supported by the Austrian Federal Ministry of Economy as part of the Laura-Bassi – Living Models for Open Systems – project FFG 822740/QE LaB and iteratec GmbH, Munich.
This paper is structured as follows. In the next section we detail the research questions we want to tackle in this thesis. After that we point out the preliminary work, such as empirical studies and implementation work, that has already been conducted. We then introduce our approach to automated EA model maintenance in Section 3. From the preliminary and future work we then derive the expected contributions in Section 4. After that, in Section 5, we introduce the means with which we plan to evaluate the artifacts produced in the course of this thesis. We then show in Section 6 that, although the problem of automated EA model maintenance is highly relevant in practice, related work on this topic is scarce. Finally, we summarize and point to the direction of future work in Section 7.

1.1 Research Questions

The general research question of this thesis can be stated as follows:

How can EA models be better synchronized with the real world?

In order to holistically address this research problem of automated EA model maintenance, we approach the problem space not only from the technical side, but also from the point of view of how EA model maintenance is embedded in organizations. Hence, in our research we first analyzed the context of enterprise architecture initiatives in typical organizations, i.e. the roles of EA stakeholders, (neighboring) processes and common data sources for automated EA data collection. This constitutes the first specific research question of this thesis:

Q1 What are the typical environmental factors for the automated maintenance of EA models, such as people, processes and available data sources for automation?

These environmental factors also determine in which way events from the environment can be utilized to trigger changes to the EA model. Examples of such events are
a new release of an application that was developed in-house, the end of an architecture change project, or simply the scheduled execution of a manual architecture review process. This leads to the second research question:

Q2 What are typical architecture change events and how can an EA tool collect these in order to update the EA model at the right time?

In our preliminary work we noticed that, in the context of automated EA data collection, it is unrealistic to assume that every step of the collection processes can be automated. One of the main reasons is the high abstraction level of EA models. When data is collected from external sources of EA-relevant information, it is, for example, very likely that duplicates are introduced or that the level of abstraction of the incoming data does not fit the level of abstraction on the EA repository’s side. Thus, we believe that computer-aided processes with human task lists should be used in order to include humans in the data collection processes where necessary. This leads to the third research question:

Q3 How can the EA data collection be supported by semi-automated data collection processes from external data sources?

In order to support the data collection processes and eventing mechanisms, an EA tool needs to be provided with context information. This context information can be responsibilities of stakeholders, available data sources, the data origin of elements in the model (manually entered or from a data source), or the date of the last change of a model element. This data has to be stored alongside the actual EA model and has to be connected to the elements of the model. Hence, this information needs to be incorporated into the EA meta-model. This constitutes the fourth research question:

Q4 How can an EA meta-model be devised that facilitates automated EA model updates with context information?

As pointed out in the introduction of Q3, the EA data collection from external data sources introduces several problems.
The lack of human judgement, for example, can introduce inconsistencies such as duplicate model element entries or simply the representation of the EA at an inappropriate (too detailed) level of abstraction. This leads to research question five:

Q5 How can the data collection processes be supported by an EAM tool, e.g. by identity reconciliation mechanisms to remove duplicates?

After having introduced the research questions, we proceed with presenting the preliminary work that has already been conducted in the course of this thesis and the research methodology that is applied.

2 Preliminary Work & Research Methodology

In the context of this thesis we have already conducted preliminary work that establishes the problem relevance and produced an initial prototype. The basic chronological steps of the research activities can be seen in Table 1. Currently we are approximately at the end of phase 4.

#  Step                                                          Activity
1  Evaluation of problem relevance                               Practitioner interviews & survey & literature review
2  Elicitation of requirements for implementation and processes  Practitioner interviews & survey & literature review
3  Definition of integration processes                           Practitioner interviews & survey & literature review
4  EA tool prototype and meta-model implementation               Iterative software development
5  Refinement of prototype with eventing and rules               Iterative software development & survey & interviews
6  Evaluation                                                    Case study & expert interviews & inclusion of automation concepts in open-source EA tool iteraplan
7  Dissemination                                                 Journal Publication

Table 1. Chronological steps of research including methodology.

As can be seen in Table 1, we started out with an evaluation of the problem domain. The first steps were three interviews with practitioners in the field of enterprise architecture management from the electric utility service and insurance fields. These unstructured interviews gave us a basic picture of the data collection problems in practice.
A thorough literature review revealed that related work in the area of automated data collection for EA models is scarce (see Section 6). In order to verify our assumption that EA data collection is a major problem in EA practice, we conducted an online survey among EA practitioners. The results of the survey supported our assumption, with 90% of the survey participants agreeing or strongly agreeing with the statement “The manual collection and maintenance of EA data in a sufficient quality, is one of the major challenges of EA practice.” [8]. As part of the design science process according to Hevner et al. [9], this establishes the relevance of the problem.

In the second step we elicited a set of requirements for an EA tool that supports automated maintenance of EA models from external data sources. These requirements resulted from the literature review, the expert interviews, the survey and the experience of our industry partners at the company iteratec in Munich. The results of the survey are published in [8].

In the third step we defined data collection processes (independently of an implementation) that take human interaction into account. The goal of these processes is to reduce the manual work of EA data collection and to include humans in the process via tasks only when necessary. The results of these process definitions were published in [10]. These processes are the first design science artifact produced in the course of this thesis.

Currently, we are implementing the first prototype of an EA tool that supports the defined requirements and the proposed processes as the second design science artifact. We are also conducting another survey among EA practitioners in order to elicit typical data sources in organizations that apply EAM.
We expect that the results will allow us to make statements about what type of data can typically be gathered automatically and about the expected data quality. The results can also help us to further refine our prototype in step 5 to better support realistic integration scenarios from data sources that are common in practice. The details of the prototype and our general approach are explained in Section 3.

After the implementation phase, we plan to evaluate our work with further expert interviews, an extensive case study, and the transfer of our concepts to the open-source EA tool iteraplan, which is used in practice. We further detail our evaluation strategy in Section 5. Finally, we aim at disseminating our findings in a journal publication.

3 Approach to Automating EA Model Maintenance

As outlined before, we have strong indications that full automation of data collection for EA models is not feasible. The reasons are that (i) EA data, especially in the business layer, is often modelled at a high level of abstraction, and these abstractions can only be made by human judgement, (ii) not all data can be collected automatically from existing data sources in an organization, and (iii) in case data inconsistencies occur, such as duplicate entries, the resolution can often only be decided by a human. Thus, the inclusion of humans in the data collection and quality assurance processes plays a major role in our approach. The goal of this thesis is to shift the work of humans in the EA process from data collection to quality assurance tasks.

In addition to the integration of data sources that can automatically provide data on the EA, we consider events from the environment that can trigger manual or automated actions in the EA tool.
In the following, we detail the prototypical architecture we are currently implementing, and explain the specifics of the meta-model we created to support the implementation with the required context information.

3.1 Tool Architecture

Figure 1 gives an overview of our implementation approach.

Fig. 1. Overview of our implementation approach for an EA tool that supports the recurring semi-automated collection of EA data from external sources.

The central part of the architecture is the process engine, based on Apache Activiti¹, which can be used to configure the semi-automated data collection with BPMN processes. The processes are driven by change events. The change events can come in the form of events from external information systems via the event interface, as data from external data sources, and as human input from task lists created by the task provisioning component in the web-based user interface. In addition, the processes can be driven by data analysis events, such as the expiry of a model element². The data analysis component periodically checks the EA repository for inconsistencies, duplicate entries, or expired model elements. Via the deployed processes, changes can be applied to the EA repository through the data access layer.

Currently, the EA repository is implemented as an OWL2 knowledge base using the Jena³ RDF repository. The reasons for this technological choice are the ability to easily apply rules, the ability to adapt the meta-model at run-time, and the ability to model on several meta-layers at run-time (refer to Section 3.2 for details).

External data sources provide EA-relevant data via data source adaptors that have to be implemented for each specific data source type. The adaptors take care of the mapping between external data sources and the internal EA repository data representation.
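To make the role of the data analysis component more concrete, the following is a minimal sketch of the two periodic checks named above: expiry of model elements and detection of duplicate candidates. It is an illustration under stated assumptions, not the prototype's implementation. The prototype stores the model in an OWL2 knowledge base and drives processes with a BPMN engine, whereas this sketch uses plain in-memory Python objects. The names ModelElement, find_expired, find_duplicate_candidates and analyse, the event dictionaries, and the normalisation of identifying values (trimming and lower-casing) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ModelElement:
    """Hypothetical in-memory stand-in for an element in the EA repository."""
    element_id: str
    element_type: str
    attributes: dict
    last_checked: datetime  # last time the element was changed or checked
    expiry: timedelta       # allowed period without a change or check


def find_expired(elements, now):
    """Return elements that were not changed or checked within their expiry period."""
    return [e for e in elements if now - e.last_checked > e.expiry]


def find_duplicate_candidates(elements, identifying_properties):
    """Group elements of one type that agree on all identifying properties.

    identifying_properties maps an element type to the attribute names that
    identify elements of that type (an assumed configuration format).
    """
    groups = defaultdict(list)
    for e in elements:
        props = identifying_properties.get(e.element_type)
        if not props:
            continue  # no identifying properties defined for this type
        values = [e.attributes.get(p) for p in props]
        if any(v is None for v in values):
            continue  # incomplete elements cannot be compared reliably
        # Assumed normalisation: ignore case and surrounding whitespace.
        key = (e.element_type, tuple(str(v).strip().lower() for v in values))
        groups[key].append(e)
    return [group for group in groups.values() if len(group) > 1]


def analyse(elements, identifying_properties, now):
    """Turn check results into change events that a process could route to task lists."""
    events = []
    for e in find_expired(elements, now):
        events.append({"type": "expired", "elements": [e.element_id]})
    for group in find_duplicate_candidates(elements, identifying_properties):
        events.append({
            "type": "duplicate_candidate",
            "elements": sorted(x.element_id for x in group),
        })
    return events
```

The sketch only reports duplicate candidates and does not merge them, which mirrors the argument that the resolution of a duplication can often only be decided by a human.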
For example, an adaptor could map XML coming from a SOAP interface to the internal OWL representation. Note that the adaptors very likely have to be implemented individually for each company’s data sources. For this prototype we plan to implement only a few representative adaptors, such as an adaptor to a Configuration Management Database (CMDB) [11] and an adaptor to a network monitor. We acknowledge that in practice implementing the adaptors can be a time-consuming task as well. Hence, the trade-off between the cost of implementing adaptors and the long-term savings should be calculated as we describe in [10].

Information systems that cannot provide structured EA data, but are able to indicate EA-relevant changes, can fire events to the EA tool via the event interface. These events can alert responsible stakeholders to changes such as finished projects or new information system releases. Event providers in these contexts are, for example, project portfolio management tools or software configuration management tools. This way, manual data collection processes can be initiated right after the changes are applied in reality, even though no full automation is possible.

¹ http://www.activiti.org
² In our approach a model element expires when it has not been changed or checked within a specified period of time.
³ http://incubator.apache.org/jena

3.2 Meta-model

Another important part of our approach is the meta-model. Different EA frameworks, such as TOGAF [1] and the framework described in [12], prescribe the information model, including the concepts as well as their attributes and relations that can be used to model the EA. Those EA information models often share some concepts, like the concept InformationSystem, but greatly differ in detail.
In line with other researchers in the EA field [13–15] we believe that the EA information model should be organization-specific to better reflect the specific EA of an organization. Hence, we provide a meta-model that is a generic foundation and can be used to create organization-specific information models that precisely cover the desired concepts of an organization.

The important characteristic of the meta-model is that it contains automation-related concepts that are independent of the organization-specific realization of the information model. However, we argue that the concepts provided in our meta-model could also be applied to existing information models such as TOGAF if organization specificity is not considered important by an organization. The meta-model contains, among others, concepts that provide means for:

– assigning responsible persons or roles to model element types, to specific model elements and to events,
– modeling data sources and the model element types and attributes they provide,
– tracing the origin of model elements and their attributes (i.e. whether they were entered manually or came from a data source),
– updating single model element attributes from a data source,
– defining identifying properties of model elements that are used to eliminate duplicates, and
– generating change events via the definition of expiry durations.

The result is that the data collection processes and event mechanisms have the necessary context information to operate independently of the organization-specific meta-model. Details of this meta-model have been submitted for publication.

4 Contributions/Artifacts

The main contributions and produced artifacts of this thesis will be the following:

1. A thorough literature analysis of the current state of the art in enterprise architecture model maintenance.
2. The collection and analysis of potential events and data sources that can be used to automate the EA data collection.
3. Specified data collection and quality assurance processes that include data collection via events that trigger manual input and collection from external data sources.
4. A meta-model that supports the data collection process with context information and the ability to create organization-specific information models.
5. A prototypical EA tool implementation that provides eventing, process execution, and modeling functionality and supports the recurring import of EA data from external data sources.

These artifacts in combination will provide a holistic view of automation possibilities in organizations that has, so far, not been covered by research literature. They will be of practical relevance for organizations that want to optimize their EA data collection, and for EA tool vendors and users who want to optimize their data collection tooling.

5 Evaluation Strategy

We are aware that applying our prototype in practice will be very difficult to achieve. Thus, we apply alternative means to evaluate the artifacts produced in the course of this thesis.

First, we will create case studies that show the applicability of our approach, taking into account the findings of our survey on the EA-relevant environment in organizations, such as their processes, events, and available data sources. This way, we can give an estimate of the amount of automation that can be achieved by our approach. As part of these case studies, we will simulate semi-automated data collection from typical data sources such as CMDBs or network monitors.

Second, we will present these case studies to EA experts as part of further interviews, in order to gather input on how relevant the produced concepts are in practice.

Third, we are already in the process of transferring some automation concepts of our prototypical implementation to the open-source EA tool iteraplan of our industry partner iteratec GmbH. This way, parts of our approach can be evaluated in practice.

6 Related Work

As stated in the introduction, the related work in the EA literature is very limited. Several publications acknowledge the problem of EA model maintenance; however, only one publication could be identified that presents an implemented solution.

An example of a paper that acknowledges the problem is the recent vision paper by Brückmann et al. [5] on real-time EA monitoring. The authors have the goal of providing real-time EA models. Since it is a vision paper, no concrete solution approaches are presented.

Another publication that mentions EA automation is the work by Moser et al. [16]. The authors describe a process that includes automated data collection. However, they do not describe how the process is supplied with relevant context information about data sources and do not provide implementation details. Our meta-model may provide a basis for realizing these processes.

The work of Fischer et al. [6] discusses the federated nature of data collection in EAM, which is also relevant for the automation concepts. However, the publication likewise does not present any details about the meta-model requirements for automation or other implementation details.

The only publication that actually presents a concrete implementation approach for automated EA model maintenance is the work by Buschle et al. [7]. They present a tool for the instantiation of an EA model via the use of a security network scanner. The tool can be used for the initial import of model element instances describing the IT infrastructure. We argue that this approach can only be used for the initial import of EA data and may introduce inconsistencies because no explicit (manual) checks are included in the approach. Our foundational meta-model could help to develop and enhance the presented tool prototype.
In particular, recurring updates from arbitrary data sources, including network scanners, could be enabled based on our meta-model.

On the technical side, several approaches originating from different IT fields address the topic of federating information from different data sources in the business context and are hence relevant as foundations for our work. In particular, Extract-Transform-Load (ETL), data warehouses [17], Master Data Management (MDM) [18] as well as Configuration Management Databases (CMDBs) [11] target these topics. These disciplines provide a large body of foundational research that relates to the issues of automated EA model maintenance, but differ in detail. We will build on the concepts of these related disciplines where applicable.

On the practical side, the current generation of commercial EA tools mostly supports the batch import of EA data from external sources, i.e. the import of data from files in common formats such as XML, CSV or Excel. In addition, some tools support data import from relational databases and selected information systems such as CMDBs. To the best of our knowledge, no current EA tool supports the traceability of sources of model element instances and recurring updates that map elements to their original sources, or the process and eventing features sketched in this paper.

7 Conclusion & Next Steps

In this paper we have shown that the maintenance of EA models is a relevant problem in practice. This has been acknowledged by both researchers and practitioners. Nevertheless, related approaches are scarce, as we outlined. We presented the key research questions that we aim to answer in this thesis in order to provide a holistic approach to solving the socio-technical problem of enterprise architecture model maintenance. We then discussed the implementation approach, which is based on processes for data collection from external data sources and for quality assurance.
The underlying meta-model provides means to create organization-specific information models and holds relevant data to drive the data collection processes. For example, the meta-model enables maintaining the connection of a model element to its data source, and thus enables recurring updates from that source. Finally, we highlighted the main expected contributions of this thesis and pointed out related EA research literature and neighboring research fields from which foundations can be drawn.

The next steps in the course of this thesis will be the evaluation of our survey on EA data sources and the further implementation of our prototype with the input from the survey. We will specifically focus on the concepts of model element identity reconciliation.

References

1. The Open Group: TOGAF “Enterprise Edition” Version 9. http://www.togaf.org (cited 2011-06-08) (2009)
2. Zachman, J.A.: A framework for information systems architecture. IBM Systems Journal 26(3) (1987) 276–292
3. Winter, K., Buckl, S., Matthes, F., Schweda, C.: Investigating the state-of-the-art in enterprise architecture management methods in literature and practice. In: MCIS 2010 Proceedings. (2010)
4. Buckl, S., Matthes, F., Schweda, C.M.: Future research topics in enterprise architecture management–a knowledge management perspective. In: Service-Oriented Computing. ICSOC/ServiceWave 2009 Workshops, Springer (2010) 1–11
5. Brückmann, T., Gruhn, V., Pfeiffer, M.: Towards real-time monitoring and controlling of enterprise architectures using business software control centers. In Crnkovic, I., Gruhn, V., Book, M., eds.: Software Architecture. Volume 6903 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg (2011) 287–294
6. Fischer, R., Aier, S., Winter, R.: A federated approach to enterprise architecture model maintenance. In Reichert, M., Strecker, S., Turowski, K., eds.: EMISA. Volume P-119 of LNI., GI (2007) 9–22
7. Buschle, M., Holm, H., Sommestad, T., Ekstedt, M.
In: A Tool for Automatic Enterprise Architecture Modeling. (2011) 25–32
8. Farwick, M., Agreiter, B., Ryll, S., Voges, K., Hanschke, I., Breu, R.: Requirements for automated enterprise architecture model maintenance. In: 13th International Conference on Enterprise Information Systems (ICEIS), Beijing. (2011)
9. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design science in information systems research. MIS Q. 28(1) (March 2004) 75–105
10. Farwick, M., Agreiter, B., Breu, R., Ryll, S., Voges, K., Hanschke, I.: Automation processes for enterprise architecture management. In: 2011 IEEE 15th International Enterprise Distributed Object Computing Conference Workshops, IEEE (Aug 2011) 340–349
11. OGC: ITIL Lifecycle Publication Suite Books, 2nd impression. TSO (2007)
12. Lankhorst, M.: Enterprise Architecture at Work: Modelling, Communication and Analysis. Springer, Berlin (2005)
13. Buckl, S., Ernst, A.M., Lankes, J., Schneider, K., Schweda, C.M.: A pattern based approach for constructing enterprise architecture management information models. In Oberweis, A., Weinhardt, C., Gimpel, H., Koschmider, A., Pankratius, V., Schnizler, eds.: Wirtschaftsinformatik 2007, Karlsruhe, Germany, Universitätsverlag Karlsruhe (2007) 145–162
14. Aier, S., Kurpjuweit, S., Riege, C., Saat, J.: Stakeholderorientierte dokumentation und analyse der unternehmensarchitektur. In: GI Jahrestagung (2). (2008) 559–565
15. Lagerström, R., Saat, J., Franke, U., Aier, S., Ekstedt, M.: Enterprise meta modeling methods – combining a stakeholder-oriented and a causality-based approach. In Halpin, T.A., Krogstie, J., Nurcan, S., Proper, E., Schmidt, R., Soffer, P., Ukor, R., eds.: Enterprise, Business-Process and Information Systems Modeling, EMMSAD 2009, Springer (2009) 381–393
16. Moser, C., Junginger, S., Brückmann, M., Schöne, K.: Some process patterns for enterprise architecture management.
In: Proceedings, Workshop on Patterns in Enterprise Architecture Management (PEAM2009), Bonn. (2009) 19–30
17. Vassiliadis, P.: Survey of Extract-Transform-Load Technology. International Journal of Data Warehousing and Mining 5(3) (2009) 1–27
18. White, A., Newman, D., Logan, D., Radcliffe, J.: Mastering master data management (2006) Gartner, Stamford.