Co-evolving Digital Architecture Twins

Sven Jordan
Group IT Solution & Enterprise Architecture, Volkswagen AG, 38440, Germany
sven.jordan@volkswagen.de

ECSA’21: 15th European Conference on Software Architecture, September 13–17, 2021, virtual
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract
Software development in industry is becoming increasingly complex as systems grow more sophisticated and are often interconnected, together constituting the system landscape. Architecture description is therefore becoming increasingly important. The necessary maintenance of this description is often neglected because time and budget constraints shift priorities elsewhere. This leads, among other things, to outdated architecture description. For more efficient planning of the architectural landscape and prevention of redundancy, it is vital that architects and other stakeholders have the most current information about the systems. This paper presents doctoral research in its early stages on continuous architecture recovery, which makes it possible to reflect the current architecture and the evolution of a system as a digital architecture twin. The proposed approach aims to automatically extract architecture information of complex systems by recovering it from heterogeneous architectural data sources. The idea is to consolidate and integrate this recovered architecture information at different points in time to enable the representation of the system and its evolution. This permits the use of an architecture information query language that facilitates different use cases (e.g., support of architectural design decisions or tailored architecture description). Planned contributions are the assessment and consolidation of heterogeneous information sources and the application of architecture recovery methods, with the noteworthy addition of versions of those information sources over time, and the creation of a co-evolving digital twin.

Keywords
Architecture recovery, Digital twin, Architectural design, System landscape recovery

1. Introduction and problem statement

One of the main problems in software architecture evolution and maintenance is the low quality, or even non-existence, of architecture description (e.g., architecture models) as systems evolve, increase in complexity, and are adapted to environments, technology or customer requirements. Evolution of a system should entail evolution and maintenance of its description; otherwise the description no longer reflects the actual system, and its quality and usefulness decrease. Yet the creation and maintenance of architecture description involves high effort (time and costs), as it is a primarily manual task. At the same time, an up-to-date description is a key driver for an architect to understand a system, comprehend dependencies and decide on future enhancements. Furthermore, stakeholders require different views [1] on a system at different levels of abstraction or granularity. Keeping the description consistent and of high quality across these views and abstraction levels adds to the effort and results in even higher costs.

To counter the problem of orphaned documentation and decreasing description quality, software architecture recovery [2] methods are used. Architecture recovery refers to methods and processes for retrieving architecture information from an implemented system and associated data sources. This recovered architecture information reflects the current state of a system. However, as the system evolves, so does the architecture (information).
This leads to a need for a continuous process of architecture recovery that considers heterogeneous data sources and versions over time to reflect the system as accurately as possible. We intend to automatically recover, consolidate and integrate architecture information from heterogeneous data sources. This architecture information will be mapped into a unified architecture information model, which we consider a digital architecture twin representing a system. As the system evolves, this digital twin needs to continuously co-evolve with the system. We further intend to develop an architecture query language able to retrieve architecture information from the digital architecture twin to support architecture design decision making, the identification of prevailing architectural patterns or the creation of tailored architecture description.

2. State of the art and open challenges

Automated architecture recovery approaches range from static to dynamic methods. They use different techniques, such as structural clustering, concern-based clustering, or interactive exploration, to extract and recover architecture information at different layers, and they take inputs such as the implemented system (e.g., as source code). Approaches like ACDC [3], WCA [4] or LIMBO [5] are clustering methods that retrieve clusters representing subsystems based on structural information. Concern-based approaches like ARC [6] or RELAX [7] add concerns to the clustering approach, yielding precise and comprehensible clusters with context. Evolution-based approaches consider the evolution of a software system (e.g., using source code or issue management tools), taking a system’s legacy into account [8] to recover architecture design decisions. Interactive methods like the Grounded Theory approach [9] focus on a more general approach to recover architecture information.
It is described as a human-intensive and relatively costly process whose benefit is being as general as possible and therefore applicable to almost every system. Proposed workbench approaches are Rigi [10] and ARCADE [11]. These approaches enable interactive exploration, as they extract data and reconstruct architecture information and architecture views of a system.

Even though these methods produce valuable architecture information, they tend to be laborious, require manual effort, or operate on single viewpoints of a system. The following open challenges remain for software architecture recovery: (1) the identification of possible data sources in a complex and heterogeneous system landscape, (2) the combination of heterogeneous data sources for the purpose of architecture information recovery, (3) the consolidation of available and recovered information in a unified and integrated data model, the digital architecture twin, and (4) the co-evolution of architectural information with the actual system over time, resulting in architecture description suited to the needs of the architect and different stakeholders [12, 2].

We intend to combine existing architecture recovery methods and to automatically integrate the results in a digital twin, thus providing an extensive overview of the system using different views. Moreover, we perform these methods continuously, leading to evolving, version-aware architecture information about the system and preventing architectural information decay.

3. Proposed solution

The idea of the approach is to automatically create and co-evolve a Digital Architecture Twin (DArT) of heterogeneous and evolving systems. In general, a Digital Twin is a virtual representation of a physical or non-physical object (Physical Twin) or process. It is often used in the digitalization of cars or engines and enables the exchange of data and information between the Digital Twin and the Physical Twin.
This makes it possible to simulate situations and adaptations effectively without tinkering with the real-world counterpart, which could result in high costs or unfavorable failures [13]. DArT automatically recovers, consolidates and maintains comprehensive architectural information from heterogeneous data sources as a unified architecture information model. Whenever a data source (e.g., source code) changes, the recovered information is updated in the DArT. The co-evolving DArT is extended by integrating versions of a system over time, incorporating the evolution and current status of the system in the DArT.

[Figure 1: Digital Architecture Twin Generation Process. Elements shown: System and its architecture information sources; Data Collection Agents (coordination of heterogeneous data sources); Architecture Information Recovery Services; Integration & Consolidation module (recovered and consolidated information); Update-mechanism (co-evolution, incremental updates); Digital Architecture Twin (virtual representation, versioning and persistence); Architecture information query language; use cases: recommendation system for architectural design decisions, pattern and style discovery, tailored documentation, automated continuous compliance checking and quality assessment.]

The Digital Architecture Twin generation process, shown in Figure 1, begins with the collection of architectural data from architecture information sources using Data Collection Agents (DCA). Next, this data is provided to the architecture recovery methods, which are implemented as Architecture Information Recovery Services (AIRS). For this, we combine proven architecture information recovery approaches leveraging different data sources.
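The collection, recovery and versioned update steps described above can be illustrated with a minimal sketch. It is not the implementation proposed in this paper: the names `DigitalArchitectureTwin`, `collect_imports`, `recover_dependencies` and `refresh` are hypothetical, the "data source" is assumed to be a simple module-to-imports map, and the only recovered fact type is a `depends_on` relation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# A recovered fact: (source component, relation, target component).
Fact = Tuple[str, str, str]


@dataclass
class DigitalArchitectureTwin:
    """Hypothetical twin: an append-only list of versioned fact snapshots."""

    versions: List[FrozenSet[Fact]] = field(default_factory=list)

    def update(self, facts: FrozenSet[Fact]) -> bool:
        """Store a new version only if the recovered facts changed."""
        if self.versions and self.versions[-1] == facts:
            return False  # nothing changed, keep the current version
        self.versions.append(facts)  # old versions are retained
        return True

    def at(self, version: int) -> FrozenSet[Fact]:
        """Return the facts recorded for a given version index."""
        return self.versions[version]


def collect_imports(source: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Toy Data Collection Agent (assumption): copy a module -> imports map."""
    return {module: list(imports) for module, imports in source.items()}


def recover_dependencies(raw: Dict[str, List[str]]) -> FrozenSet[Fact]:
    """Toy recovery service (assumption): derive 'depends_on' facts."""
    return frozenset(
        (module, "depends_on", dep) for module, deps in raw.items() for dep in deps
    )


def refresh(twin: DigitalArchitectureTwin, source: Dict[str, List[str]]) -> bool:
    """Run collection and recovery, then update the twin incrementally."""
    return twin.update(recover_dependencies(collect_imports(source)))
```

Calling `refresh` twice with the same source leaves the twin unchanged, while a changed source appends a new version and keeps the previous one available through `at`. A real DCA would read from repositories, deployment descriptors or similar sources rather than from an in-memory map.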
We integrate and consolidate the results provided by the architecture recovery methods into a unified architecture information model based on meta models representing the Digital Architecture Twin to obtain an overarching representation of the system architecture. When the system evolves (e.g., source code changes), the DCA and AIRS update the existing architecture information while retaining the information of old versions. For this, the update-mechanism triggers the AIRS either automatically or periodically, depending on the source, to retrieve the current architecture information. This updates the DArT incrementally and keeps it up to date with the evolving system. An open challenge is to ensure that the DArT conforms to the system.

To use the collected architecture information, an architecture query language will be developed to dynamically retrieve architecture information for different versions, views or abstraction levels of a system. The DArT in combination with the architecture query language makes it possible to dynamically query information that can be tailored to the specific requirements of developers, architects and other stakeholders at the desired abstraction level and system version. An optional visualization of the dynamic queries and views shall result in human-readable architecture description.

Potential use cases of the DArT are guided architecture design via architecture recommendation and continuous compliance checking to prevent architecture drift. Guided architecture design yields tailored architectural proposals, based on specific questions and similarity matching against existing architecture information, and thereby supports a consolidated IT landscape. Automated and continuous compliance checking monitors whether the actual system has diverged from the planned design (software architecture drift/erosion).
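As a rough illustration of the compliance checking use case, the following sketch compares two sets of component dependencies. It assumes that both the planned design and the recovered architecture can be reduced to (from, to) dependency pairs; the function names and the labels `unplanned` and `missing` are hypothetical and not part of the proposed approach.

```python
from typing import Dict, Set, Tuple

# A dependency between two components: (from component, to component).
Dependency = Tuple[str, str]


def check_compliance(
    as_planned: Set[Dependency], as_is: Set[Dependency]
) -> Dict[str, Set[Dependency]]:
    """Compare planned and recovered dependencies using set differences."""
    return {
        # Present in the recovered system but not in the planned design.
        "unplanned": as_is - as_planned,
        # Present in the planned design but not found in the system.
        "missing": as_planned - as_is,
    }


def is_compliant(report: Dict[str, Set[Dependency]]) -> bool:
    """The system is treated as compliant if both difference sets are empty."""
    return not report["unplanned"] and not report["missing"]
```

Run on every new recovered version, such a check would flag divergence as soon as it appears. A realistic check would also have to map recovered elements to planned ones and handle different abstraction levels, which this sketch leaves out.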
Recovered architecture information (as-is) can be compared to the explicitly documented system architecture (as-planned) in order to detect and counteract increased erosion at an early stage.

4. Research method

The process of the doctoral research is shown in Figure 2. The first step is a systematic literature review (SLR) of architecture recovery methods to obtain a thorough overview of existing approaches, their potential use cases and benefits, and their limitations. The second step is the identification of potential data sources for the extraction of architecture data, which can be used to recover architecture information employing architecture recovery methods. The third step is the implementation of suitable architecture recovery methods. The fourth step is the creation of the DArT by consolidating the recovered architecture information into an architecture information model built specifically for the integration of static, dynamic and deployment information. The fifth step is the development of an architecture query language built for the retrieval of tailored, stakeholder-dependent architecture information employing the DArT. The sixth step is to conduct case studies and expert interviews, which are performed iteratively, to evaluate the benefits and understand possible customization of the approach. This evaluation is done using a prototype, which will be developed to extract the architecture and evolution information that enables the generation of the DArT.

[Figure 2: Research method for the dissertation proposal. Research question: "How to automatically generate the architecture description of an evolving system?" Steps shown: SLR; Identification of data sources; Implementation of DCA and AIRS; Creation of Digital Architecture Twin; Architecture query language; Case studies/Evaluation.]

5. Expected contributions and future work

This work contributes the following: (1) development of the conceptual idea of the DArT for the description of a system, (2) consolidation of available and recovered architecture information in a DArT, (3) co-evolution of the DArT with the system using incremental updates featuring heterogeneous architecture artifacts from different points in time and (4) development of a query language capable of retrieving architecture information for tailored stakeholder perspectives using the DArT.

We plan to evaluate our approach by developing a prototype and applying it to open source applications and real-world applications in industry. Furthermore, we intend to perform expert interviews to gather feedback on the idea and approach.

One possible limitation of the approach is the resource- and time-intensive architecture recovery process, which leads to a time-shifted description of the analyzed system and thereby devalues the digital twin. Another limitation is the difficult integration of heterogeneous sources and the consolidation of potentially contradicting information extracted from different sources. Future work comprises the development of the exchange from the DArT to the system.

References

[1] P. Kruchten, The 4+1 view model of architecture, IEEE Softw. 12 (1995) 42–50.
[2] T. Lutellier, D. Chollak, J. Garcia, L. Tan, D. Rayside, N. Medvidovic, R. Kroeger, Comparing software architecture recovery techniques using accurate dependencies, in: ICSE (2), IEEE Computer Society, 2015, pp. 69–78.
[3] V. Tzerpos, R. C. Holt, ACDC: an algorithm for comprehension-driven clustering, in: WCRE, IEEE Computer Society, 2000, pp. 258–267.
[4] O. Maqbool, H. A. Babri, The weighted combined algorithm: A linkage algorithm for software clustering, in: CSMR, IEEE Computer Society, 2004, pp. 15–24.
[5] P. Andritsos, V. Tzerpos, Information-theoretic software clustering, IEEE Trans. Software Eng. 31 (2005) 150–165.
[6] J. Garcia, D. Popescu, C. Mattmann, N. Medvidovic, Y. Cai, Enhancing architectural recovery using concerns, in: ASE, IEEE Computer Society, 2011, pp. 552–555.
[7] D. Link, P. Behnamghader, R. Moazeni, B. W. Boehm, Recover and RELAX: concern-oriented software architecture recovery for systems development and maintenance, in: ICSSP, IEEE / ACM, 2019, pp. 64–73.
[8] A. Shahbazian, Y. K. Lee, D. M. Le, Y. Brun, N. Medvidovic, Recovering architectural design decisions, in: ICSA, IEEE Computer Society, 2018, pp. 95–104.
[9] D. A. Tamburri, R. Kazman, General methods for software architecture recovery: a potential approach and its evaluation, Empir. Softw. Eng. 23 (2018) 1457–1489.
[10] H. A. Müller, S. R. Tilley, K. Wong, Understanding software systems using reverse engineering technology perspectives from the Rigi project, in: CASCON, IBM, 1993, pp. 217–226.
[11] M. S. Laser, N. Medvidovic, D. M. Le, J. Garcia, ARCADE: an extensible workbench for architecture recovery, change, and decay evaluation, in: ESEC/SIGSOFT FSE, ACM, 2020, pp. 1546–1550.
[12] G. Canfora, M. D. Penta, L. Cerulo, Achievements and challenges in software reverse engineering, Commun. ACM 54 (2011) 142–151.
[13] E. Negri, L. Fumagalli, M. Macchi, A review of the roles of digital twin in CPS-based production systems, Procedia Manufacturing 11 (2017) 939–948.