OpEx Driven Software Architecture: a case study

Sebastien Andreo (1), Ambra Calà (1) and Jan Bosch (2)

(1) Siemens AG Technology, Erlangen, Germany
(2) Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden

Abstract

In the last thirty years, the software industry has changed how systems are architected and how they are distributed. Software is moving from monolithic architectures and locally installed applications to microservice architectures and applications accessible through the internet. This accessibility has been enabled by the emergence of cloud providers like Amazon AWS, Microsoft Azure, or Google GCP. The transformation also affects the financial structure of software projects, which is shifting from capital expenditure (CapEx) to operational expenditure (OpEx). Based on two industrial case studies, this paper highlights the implications of architecture decisions for a cloud application’s operating cost.

Keywords

Operational Expenditure (OpEx), Software Architecture, Cloud-based Application, Financial Model

1. Introduction

In the last thirty years, the software industry has moved through several architecture paradigms: from monolithic systems to distributed monoliths, then to internet-connected systems and finally, in the last ten years, to internet-native systems. This last transformation became possible when cloud computing established itself as a de facto state-of-the-art technology. As presented in [1], we observed at the beginning of 2010 that organizations started to migrate their internal applications or products to different cloud providers to benefit from the scalability and availability of the infrastructure. Another important motivation was the promised cost-effectiveness of cloud computing. On the cloud provider side, we observed a transition from an infrastructure as a service (IaaS) business to a platform as a service (PaaS) business.
With PaaS, the cloud providers deliver more building blocks to speed up the development of new functionality and to enable product vendors to operate their products as software as a service (SaaS). This transformation is not insignificant for the financial structure of software development projects. Previously, the software producer’s balance between CapEx and OpEx tilted toward capital expenditure: all development and testing costs had to be borne by the software producer, whereas almost all operational expenses (servers, installation, and updates) were carried by the software consumer. Now, with SaaS, the software products run in the cloud provider’s data centers and are billed according to more complex models (e.g., pay per use). Thus, OpEx increases for product vendors, and the choice of the building blocks assembled to build the product can have a high impact on the product’s financial health.

ECSA’21: 15th European Conference on Software Architecture, September 13–17, 2021, Växjö, Sweden
sebastien.andreo@siemens.com (S. Andreo); ambra.cala@siemens.com (A. Calà); jan.bosch@chalmers.se (J. Bosch)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

If an organization or its practitioners do not systematically consider OpEx, several risks may materialize. As OpEx is a financial indicator, the software system developed by the organization can put pressure on the business model of the organization itself. Indeed, at constant sales, an increase in OpEx increases the OpEx-to-Sales ratio. The OpEx-to-Sales ratio gives the organization an essential insight into its profitability. It represents the operating cost incurred for each unit of sales.
For example, if an organization sells a functionality for $20 per month per user and the OpEx is around $15 per month per user, the OpEx-to-Sales ratio is 15/20 = 0.75. The higher this ratio, the lower the organization’s profitability.

In other engineering disciplines, operational cost aspects are taken into account early in the design phase. For instance, in the design of a power plant, the operating costs are calculated before the construction team builds it, and the design is optimized to minimize them. Whereas CapEx represents the initial investment in a project life cycle, OpEx is a recurring cost: every year, the owner of the system or the service provider has to bear it. In the software industry, however, technology is the main driver for software system architects, while the financial aspect is usually secondary or even completely ignored. With the evolution of cloud-based software applications, OpEx can no longer be neglected. It is also essential to note that OpEx can vary significantly depending on product usage. Considering operational costs from the early stages of software architecture design reduces the software provider’s financial risk.

This paper aims to illustrate how software architecture and design decisions impact OpEx and, consequently, play a crucial role in maintaining a cloud application’s financial health. We want to raise software practitioners’ awareness of the OpEx challenge based on two industrial case studies.

The paper is organized as follows. Section 2 briefly presents the background, and Section 3 presents the methodology used to carry out this study. Section 4 highlights the problem with two case studies. Section 5 discusses the implications for practitioners and researchers. Finally, Section 6 concludes the paper.

2. Background

Although the relevance of system life cycle cost has been well known in engineering for many years, many complex systems are still planned, designed, produced, and operated without regard to their total cost over the entire life cycle. Usually, the technical aspects are considered first, with the economic aspects deferred until later [2]. However, the increasing complexity of systems calls for an earlier estimation and analysis of life cycle costs. The systems engineering approach, which typically focuses on complex systems, addresses cost estimation and analysis at the beginning of the product development process to identify and quantify risks and to evaluate competing system initiatives, proposals, and trade-offs. Models for estimating the economic attributes of a system architecture, such as Total Cost of Ownership (TCO) and Return on Investment (ROI), have been widely investigated in the systems engineering community [3]. The life cycle cost perspective has proven most meaningful during the design phase, where the potential for cutting operation and maintenance costs is large [4]. Indeed, even though detailed design information is not yet available, life cycle cost estimates prepared early at the concept design stage can support the project team in analyzing the cost impact of alternatives and making trade-off decisions throughout the system life cycle.

In recent years, OpEx models and tools have been developed in different domains, such as the oil and gas industry [5], [6] or the building industry [7]. The parallel between the software industry and the oil and gas industry may be surprising since, for instance, system lifetimes differ greatly. A refinery is not built for a couple of months or years, whereas mobile apps tend to be born, evolve, and die rapidly.
However, there are also kinds of software that must be supported over the long term, such as energy management and factory automation software. For software with such long support periods, the same OpEx considerations should apply.

Managing the cost of the Cloud is not new. [8] proposes an analysis method and tool for calculating Cloud TCO and utilization cost. [9] proposes a System-of-Systems (SoS) approach to model cloud infrastructure and services and to analyze them from a techno-economic perspective. Another approach is provided in [10], a TCO model for cloud computing services that addresses the indirect and hidden costs of cloud computing. However, as we move to intelligent and connected systems, software architecture trends will change, and so will their cost estimation models. These new trends are described in [1], which highlights the shift of financial models and budgets from CapEx-biased to OpEx-biased. The trend towards an economy based on OpEx cost models is also addressed by [11], which proposes a methodology for modeling a migration to the Cloud at minimal cost or time and claims that the method applies to any similar IT project transitioning from CapEx to OpEx. [12] presents case studies that show the link between architecture decisions and cost, as well as the growing difficulty of choosing a cost-effective architecture as billing models become more complex. [13] proposes an integrated framework to estimate the cost of hosting a SaaS application and can be considered a first contribution to linking architecture and OpEx. [14] proposes a cost estimation method that uses a DAG-based representation and matrix operations to obtain a rapid but approximate cost estimate.

Despite the many studies on cloud cost estimation, factors related to software architecture are too often neglected. Most studies concentrate on operational IT costs or on the cost of migrating to the Cloud.
The cost of the system architecture and of the technological decisions is mostly ignored. Ten years after cloud computing became a de facto state-of-the-art technology, there is still a lack of methods and tools to help software practitioners design their applications with cost in mind.

3. Research Method

This study sits at the border of two domains: the technical domain of system architecture and the accounting domain of operational cost. In business financial accounting, capital expenditure (CapEx) and operational expenditure (OpEx) are cost categories used for financial optimization. Our research uses the terms not at the level of a business entity (firm, organization) but at the level of a software product. We therefore propose an interpretation of CapEx and OpEx that follows the cost categorization described in [15]. As a definition basis for CapEx and OpEx, we use those provided by [16]. CapEx refers to the expenses a business incurs to create benefits in the future. For instance, the development of a new feature (R&D costs) is CapEx. OpEx refers to the expenses a business incurs in its day-to-day operations. For example, the run-time costs of the cloud application fall into the OpEx category.

As the research topic, the impact of software architecture and design decisions on OpEx, is under-researched, we conducted this study using an inductive approach. We performed our observations on a real industrial system and designed two case studies. The selection of the two case studies was based on typical software architects’ activities, which fall into two categories: engineering and re-engineering. The design decisions were not influenced by the researchers but were entirely under the control of the development team.

Data Collection. We collected primary quantitative data from the billing information offered by the cloud provider.
The filtering of resource costs was guaranteed by the resource tagging mechanism of the cloud platform. Before analysis, the gathered data was prepared: the data set was scanned for missing data, outliers, and incorrect tagging. We iterated several times to refine the data set.

Data Analysis. This research focuses on the cost aspect of architectural decisions. Still, when architecting a system, other non-functional requirements have to be considered. The financial results of the different designs were always related to the expected non-functional requirements. Each solution was proposed to and accepted by the product owner with respect to functional and non-functional requirements.

4. Case study

This section presents two case studies. In both cases, operational cost is a criterion for the architecture design decisions. The engineering case is developed in Section 4.1, which considers two deployment alternatives for a data analyst workplace, whereas Section 4.2 presents a re-engineering scenario for the storage mechanism.

The system under consideration provides, for the energy domain, a data analysis environment to develop algorithms that extract insight from measured data (such as electrical component aging or electrical component losses) and to visualize the results. The data analyst has access to several data sources, such as plant, factory, or building topology, function parametrization, and signals stored as time series. Moreover, several Python libraries are used to plot, transform, and save the results of the user’s algorithms. The application is a cloud application; therefore, all alternatives compare cloud architectures. As the application is a real product, functional and non-functional requirements were also addressed, but they are out of the scope of this paper.

4.1. Case 1: Self-managed service vs. managed service

A typical user of the application is a data analyst who wants to develop an algorithm in a convenient IDE.
Therefore, Jupyter [17] was chosen, as it is a well-established environment among data scientists. It is essential to understand how users typically interact with the application. We defined typical usage based on the following assumptions: a user works several hours a day (around 8 h); all users are located in the same time zone; and the algorithms do not require a specific machine setup in terms of computational power. In this case study, we did not specify a scalability requirement for the system in terms of maximum user count. We kept the focus on our current load of 5 to 20 parallel users.

Two deployment alternatives were considered to implement the desired function: on the one hand, a self-managed Jupyter deployment using Kubernetes [18], and on the other hand, AWS Sagemaker, the Jupyter service managed by AWS [19].

Figure 1: Case 1, Alternative 1 - Architecture overview

4.1.1. Alternative 1.

This alternative proposes a self-managed Jupyter solution. Self-managed solutions give the owner full control over how the application is managed. However, availability and scalability requirements, for example, have to be implemented by the owner. Jupyter can run on a laptop or be deployed on servers, and naturally, nothing prevents a self-managed deployment in the Cloud. [18] describes how to deploy Jupyter with the help of Kubernetes, and Figure 1 depicts the IT architecture implemented in AWS by the development team to host and manage the Jupyter environment. The AWS building blocks required to implement this architecture follow several billing models: per million queries/month for Route 53, free for the internet gateway, per GB/month for the NAT Gateway, per hour and per GB transferred for the Elastic Load Balancer, and finally per hour for the EC2 instances.

4.1.2. Alternative 2.

Unlike the previous alternative, Alternative 2 proposes a managed solution to deliver the Jupyter environment.
Managed services, or software as a service (SaaS), have the advantages and disadvantages described in [20]. For instance, SaaS tends to reduce installation and maintenance effort, as such activities are part of the service agreement, and allows us to focus on our core business instead of technical aspects. However, it also increases the risk of lock-in and reduces customization capabilities. In our case, AWS Sagemaker provided the main component, the Jupyter environment, and the level of customization was sufficient for our needs. Figure 2 depicts the architecture based on the AWS Sagemaker service. The AWS building blocks required to implement this architecture both follow a per-usage billing model: per hour/month for AWS Sagemaker, and per hour/month and per GB transferred for the NAT Gateway.

Figure 2: Case 1, Alternative 2 - Architecture overview

4.1.3. Analysis.

Both solutions meet the expected functional criteria. If we take a closer look at each alternative’s cost structure, we can separate the total cost into fixed and variable costs. Some building blocks generate costs even if nobody uses the application; such components contribute fixed costs to the OpEx. Other components, like the users’ EC2 instances, depend on the application’s usage and contribute variable costs. The cost distribution is as follows: the first alternative has a fixed cost of $400/month and a variable cost of $7.4/user/month, and the second alternative has a fixed cost of $40/month and a variable cost of $8.5/user/month. To compare the variable costs of the alternatives, we defined a typical user profile: a typical user works 8 h per day, 20 workdays per month. The virtual machine types used to run the application are comparable in terms of cost and capabilities. As a result, we observed a factor of ten (10) between the alternatives for the fixed cost and nearly equivalent values for the variable cost.
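The split into fixed and variable costs amounts to a linear monthly cost model per alternative. The following Python sketch illustrates the comparison using only the figures reported in this section. The class and function names are illustrative and not part of the studied system, and the break-even point is derived from the linear model rather than reported by the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Linear monthly OpEx model: fixed cost plus a per-user variable cost."""
    fixed: float     # $/month, paid even when nobody uses the application
    per_user: float  # $/user/month for the typical user profile (8 h/day, 20 days)

    def monthly_cost(self, users: int) -> float:
        return self.fixed + self.per_user * users

    def max_users(self, budget: float) -> int:
        """Largest user count that fits the budget (0 if the fixed cost alone exceeds it)."""
        if budget < self.fixed:
            return 0
        return math.floor((budget - self.fixed) / self.per_user)


# Figures reported in the analysis.
ALT1 = CostModel(fixed=400.0, per_user=7.4)  # self-managed Jupyter on Kubernetes
ALT2 = CostModel(fixed=40.0, per_user=8.5)   # AWS Sagemaker

# Monthly budget needed to serve a single user with Alternative 1.
budget = ALT1.monthly_cost(1)

# Number of users Alternative 2 can serve within that same budget.
users_alt2 = ALT2.max_users(budget)

# User count at which both linear models cost the same (derived, not reported).
break_even = (ALT1.fixed - ALT2.fixed) / (ALT2.per_user - ALT1.per_user)

if __name__ == "__main__":
    print(f"Alternative 1 with one user: ${budget:.2f}/month")
    print(f"Alternative 2 users within the same budget: {users_alt2}")
    print(f"Break-even user count: {break_even:.1f}")
    for n in (5, 20):
        print(n, round(ALT1.monthly_cost(n), 2), round(ALT2.monthly_cost(n), 2))
```

Under this linear assumption, Alternative 2 stays cheaper for the 5 to 20 parallel users considered here. The model is only an extrapolation beyond that load, which the study did not measure.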
For the same OpEx at which Alternative 1 can deliver the functionality to one user, Alternative 2 can deliver the same functionality to more than 40 users.

4.2. Case 2: Re-engineering

The application has a connectivity layer that facilitates sensor data collection. The data are collected from different systems, such as IoT or cloud systems. Ultimately, the role of this layer is to store those data in a structured way so that a data scientist can extract insight by developing algorithms. This second case focuses on the re-engineering of the storage system. The product was using the storage architecture described in the next section, and no functional or non-functional issues had been discovered. However, as the team onboarded more and more plants, we observed a drastic increase in our storage costs. To meet the product owner’s target of minimum cost, we investigated new technological and architectural alternatives to push the costs down while keeping the functional and non-functional requirements at the same level. In the next sections, we describe the legacy architecture and then present the re-engineering result to highlight how a technical decision led to a substantial reduction of the OpEx.

4.2.1. Legacy storage architecture.

Figure 3 shows the architecture of the legacy storage system. The data received from the plant are stored in an S3 bucket as HDF5 files, a format designed to store and organize large amounts of data. This format suits our data structure, which may contain thousands of signals. The pandas library used to read the data provides an HDF5 reader, but those files cannot be read directly from an S3 bucket. The EFS component is therefore used to solve the problem by replicating the data from S3 and providing direct access from the Jupyter notebook.

Figure 3: Case 2, Legacy architecture - System context diagram

The AWS components used in this architecture, S3 and EFS, follow a GB/month billing model.
We excluded Sagemaker, as it is not part of the re-engineering. Both storage components are billed according to the size of the data stored.

4.2.2. New architecture.

Figure 4 presents the revised architecture of the storage system. The data received from the plant are stored in an S3 bucket. At that point, the storage file format changed from HDF5 to Parquet. The motivation for this change was the ability to read directly from the S3 bucket using the pandas library and the high compression level of the Parquet file format [21]. The only remaining storage component, S3, is billed in GB/month, that is, according to the size of the data stored. Again, we exclude Sagemaker, as it is not part of the re-engineering.

4.2.3. Analysis.

Both solutions meet the expected functional criteria. Despite the I/O performance difference between S3 and EFS, the users did not perceive tangible differences when accessing data from their Jupyter notebooks. The variable costs, expressed in $/GB/month, went from 0.3845 for the legacy system to 0.0245 for the re-engineered one. The prices are taken from the AWS price list for the eu-central-1 data center, and they differ by a factor of about 15.7.

Figure 4: Case 2, Re-engineered architecture - System context diagram.

Table 1
Second case alternatives storage cost evolution between June 2019 and April 2020

Month        Jun    Jul    Aug    Sep    Oct    Nov    Dec    Jan    Feb    Mar    Apr
EFS ($)    482.2  496.6  519.6  559.6  584.2  641.3  697.2  724.2  169.1    8.8    8.8
S3 ($)      37.6   41.4   27.2   34.7   41.0   50.0   67.3   70.6  101.5   92.8   99.0
Total ($)  519.8  538.0  546.8  594.3  625.2  691.3  764.5  794.8  270.6  101.6  107.8

As the first alternative was already implemented and we performed a re-engineering, Table 1 presents the evolution of the storage costs over 11 months. The migration to the second alternative took place between January and March. We can observe a significant cost reduction, by a factor of 7.8.
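The figures in this analysis can be cross-checked with a few lines of Python. The sketch below transcribes Table 1 and the two list prices, recomputes the monthly totals, and derives both ratios. It assumes that the reported bill reduction compares the January total with the March total, which the text does not state explicitly; the variable names are illustrative.

```python
# Storage prices in $/GB/month quoted in the analysis (AWS eu-central-1).
LEGACY_PRICE = 0.3845        # legacy architecture
REENGINEERED_PRICE = 0.0245  # re-engineered architecture

# Monthly bills in $ transcribed from Table 1 (June 2019 to April 2020).
MONTHS = ["Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr"]
EFS = [482.2, 496.6, 519.6, 559.6, 584.2, 641.3, 697.2, 724.2, 169.1, 8.8, 8.8]
S3 = [37.6, 41.4, 27.2, 34.7, 41.0, 50.0, 67.3, 70.6, 101.5, 92.8, 99.0]
REPORTED_TOTAL = [519.8, 538.0, 546.8, 594.3, 625.2, 691.3, 764.5, 794.8, 270.6, 101.6, 107.8]


def monthly_totals(efs, s3):
    """Sum the two storage bills month by month, rounded to one decimal."""
    return [round(e + s, 1) for e, s in zip(efs, s3)]


totals = dict(zip(MONTHS, monthly_totals(EFS, S3)))

# Ratio of the two list prices per GB and month.
price_factor = LEGACY_PRICE / REENGINEERED_PRICE

# Assumption: the reported reduction compares January (before the migration
# completed) with March (after it completed).
bill_factor = totals["Jan"] / totals["Mar"]

if __name__ == "__main__":
    for month, reported in zip(MONTHS, REPORTED_TOTAL):
        print(month, totals[month], reported)
    print(f"List price ratio: {price_factor:.2f}")
    print(f"January to March bill ratio: {bill_factor:.2f}")
```

The two ratios measure different things: the first compares list prices per GB, the second compares actual monthly bills.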
The remaining EFS costs for March and April are due to the use of this AWS component for purposes other than storage.

5. Discussion

In this study, we show that OpEx has a large impact on a software project’s financial structure and that architecture decisions play a prominent role in controlling how OpEx evolves. Therefore, the cost of architectural or technological decisions should receive much more attention when architecting for the Cloud. The presented cases, although simple, are real industrial applications. In both cases, the costs differed by almost an order of magnitude, which cannot be neglected. As the two cases covered different aspects of a system (the first an application deployment strategy, the second a storage strategy), we are confident that the observed effect generalizes to other dimensions of the system, such as messaging or horizontal scalability. Furthermore, cost emerges as a new quality attribute for the architecture since, in both cases, all architectures delivered the expected features and the expected performance.

In our experience, the presented cases reveal several gaps in the architecture methodology toolbox. The first is the capacity to map the language or domain model of an application to a cloud provider’s language. This mapping is essential to understand and express the link between the application (for example, storing signals with a specific sampling rate) and the cloud resources (expressed as a number of transactions and a storage size). This mapping is a prerequisite for the second gap: cost function definition. In software engineering for a cloud application, the architecture abstracts the code itself and the cloud resources or IT infrastructure.
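As a purely illustrative sketch of what such a mapping and cost function could look like, the Python code below translates a domain description (number of signals, sampling rate) into a cloud resource quantity (GB stored) and then into a monthly storage cost. All parameters of the example plant, the 30-day month, the absence of compression, and the function names are assumptions made for illustration; only the two prices per GB come from the storage case study. Transaction costs are ignored.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month
BYTES_PER_GB = 10**9                # assumption: decimal gigabytes


def monthly_volume_gb(n_signals: int, sampling_rate_hz: float, bytes_per_sample: int) -> float:
    """Domain model to cloud resource: raw data volume produced per month, in GB."""
    samples = n_signals * sampling_rate_hz * SECONDS_PER_MONTH
    return samples * bytes_per_sample / BYTES_PER_GB


def storage_cost_per_month(volume_gb_per_month: float, months_retained: int,
                           price_per_gb_month: float) -> float:
    """Cloud resource to cost: monthly bill once `months_retained` months are stored."""
    return volume_gb_per_month * months_retained * price_per_gb_month


# Hypothetical plant: 1000 signals sampled at 1 Hz, 8 bytes per sample, uncompressed.
volume = monthly_volume_gb(n_signals=1000, sampling_rate_hz=1.0, bytes_per_sample=8)

# Prices in $/GB/month taken from the storage case study (Section 4.2.3).
cost_legacy = storage_cost_per_month(volume, months_retained=12, price_per_gb_month=0.3845)
cost_new = storage_cost_per_month(volume, months_retained=12, price_per_gb_month=0.0245)

if __name__ == "__main__":
    print(f"Volume: {volume:.3f} GB/month")
    print(f"Legacy storage after 12 months: ${cost_legacy:.2f}/month")
    print(f"Re-engineered storage after 12 months: ${cost_new:.2f}/month")
```

Expressing the cost in terms of signals and sampling rate, rather than GB, lets an architect see how a domain-level decision moves the bill before any infrastructure is deployed.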
Defining the cost function of an architecture component, based on the application language and the cloud resources used to realize the component, would help the architect consider cost during design.

There are also organizational issues to address. For almost twenty years, the Agile Manifesto has advocated closer collaboration with customers to improve the customer experience. As we illustrate in this paper, the software distribution model has changed from selling a product to selling services. Throughout this paper, we argue for the importance of considering OpEx at each phase of development. Consequently, software architects should also improve collaboration and exchange with the sales group to understand the selling strategy, the rate plans, and the financial indicators, so that they can make the right decisions and avoid a disconnect between business needs, financial constraints, and technology.

To summarize, software architects lack systematic methods to analyze the cost impact of their choices. We think that the interdependency between business needs, technological decisions, and cost must be studied in depth to guarantee the financial sustainability of software systems.

6. Conclusion

In this paper, we highlighted, based on two case studies, the impact of architecture decisions on the OpEx of a software system. This impact should not be neglected, as we observed cost differences of around a factor of eight (8) in the two case studies. Other engineering domains have already developed methods and tools to better address operational costs, but in the software industry there is a lack of awareness of this issue and a lack of methods and tools to ensure revenue by design. Since the transformation of applications into cloud-native applications will continue, it is crucial to close the gaps in methodology and tools.
The integration of the necessary modeling activity into the constantly evolving transition to a DevOps and lean culture should be taken into account to guarantee broad adoption of such a methodology in the practitioner community.

References

[1] E. Woods, Software architecture in a changing world, IEEE Software 33 (2016) 94–97. doi:10.1109/ms.2016.149.
[2] B. S. Blanchard, W. J. Fabrycky, Systems engineering and analysis, 5th ed., Prentice Hall International Series in Industrial and Systems Engineering, 2011.
[3] M. Nikolaidou, C. Michalakelis, Techno-economic analysis of sysml models, in: 2017 IEEE International Systems Engineering Symposium (ISSE), 2017, pp. 1–6.
[4] E. Sterner, Life-cycle costing and its use in the swedish building sector, in: Building Research and Information, volume 28, 2000.
[5] F. Verre, A. Giubileo, C. Cadegiani, Asset lifecycle opex modelling with montecarlo simulation to reduce uncertainties and to improve field exploitation, in: 4th EAGE North African/Mediterranean Petroleum and Geosciences Conference and Exhibition Tunis 2009, European Association of Geoscientists and Engineers, 2009.
[6] A. Boccardi, A. Giubileo, C. Cadegiani, New software for opex estimation: Cost driver estimation (code) tool with montecarlo risk analysis simulation, in: SPE Production and Operations Conference and Exhibition, 2010. doi:10.2118/133555-ms.
[7] D. Geekiyanage, T. Ramachandra, N. Thurairajah, A model for early stage estimation of operational expenses (opex) in commercial buildings, in: Proceeding of the 34th Annual ARCOM Conference, 2018, pp. 617–626.
[8] X. Li, Y. Li, T. Liu, J. Qiu, F. Wang, The method and tool of cost analysis for cloud computing, in: 2009 IEEE International Conference on Cloud Computing, 2009, pp. 93–100. doi:10.1109/CLOUD.2009.84.
[9] E. Filiopoulou, P. Mitropoulou, A. Tsadimas, C. Michalakelis, M. Nikolaidou, D. Anagnostopoulos, Integrating cost analysis in the cloud: A sos approach, in: 2015 11th International Conference on Innovations in Information Technology (IIT), 2015, pp. 278–283.
[10] B. Martens, M. Walterbusch, F. Teuteberg, Costing of cloud computing services: A total cost of ownership approach, in: 2012 45th Hawaii International Conference on System Sciences, 2012, pp. 1563–1572. doi:10.1109/HICSS.2012.186.
[11] T. Antohi, Model for cloud migration cost, 2019 6th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/ 2019 5th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom) (2019). doi:10.1109/cscloud/edgecom.2019.00014.
[12] U. Hohenstein, R. Krummenacher, L. Mittermeier, S. Dippl, Towards cost aspects in cloud architectures, in: I. I. Ivanov, M. van Sinderen, F. Leymann, T. Shan (Eds.), Cloud Computing and Services Science, Springer International Publishing, Cham, 2013, pp. 117–134.
[13] P. Rosati, F. Fowley, C. Pahl, D. Taibi, T. Lynn, Making the cloud work for software producers: Linking architecture, operating cost and revenue, in: CLOSER, 2018.
[14] T. Aoshima, K. Yoshida, Pre-design stage cost estimation for cloud services, 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC) (2020) 61–66.
[15] Planview, Inc., What is capex vs opex? - agile costing, 2020. URL: https://www.planview.com/de/topics/CapEx-vs-OpEx/.
[16] Software Advisory, A beginners guide to capex vs opex, 2020. URL: https://www.softwareadvisoryservice.com/en/whitepapers/a-beginners-guide-to-capex-vs-opex/.
[17] J. M. Perkel, Why jupyter is data scientists’ computational notebook of choice, Nature 563 (2018) 145–146. doi:10.1038/d41586-018-07196-1.
[18] L. Resende, On-demand notebooks with jupyterhub, 2018. URL: https://blog.jupyter.org/on-demand-notebooks-with-jupyterhub-jupyter-enterprise-gateway-and-kubernetes-e8e423695cbf.
[19] D. Hudgeon, R. Nichol, Machine learning for business: using amazon sagemaker and jupyter, 2020. URL: https://aws.amazon.com/sagemaker/.
[20] M. Janssen, A. Joha, Challenges for adopting cloud-based software as a service (saas) in the public sector, in: ECIS, 2011.
[21] I. Zaitsev, The best format to save pandas data, 2019. URL: https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d.