A modeling approach to cyber threat mitigation

Andrei Chiș1,†, Oliviu Ionuț Stoica2,† and Ana-Maria Ghiran1,∗,†

1 Babeș-Bolyai University, Faculty of Economics and Business Administration, Cluj-Napoca, Romania
2 MassMutual, Cluj-Napoca, Romania

Abstract
Over the past decade, the security issues threatening IT systems worldwide have gained increased attention. This was due to several factors and affected both enterprises and individuals. Among enterprises, there is a popular trend to give up on-premises solutions in favor of cloud services. For both enterprises and individuals, another influential and decisive factor is the imposed legislation (ADPPA in the U.S. or GDPR in the EU) with respect to data privacy. Given these circumstances, more stakeholders should be involved in devising the security of IT systems, and they should be acquainted with “secure by design” principles. Since not many of them are specialists in cyber security, a solution that helps them in this matter is needed. This paper presents an approach to mitigate cyber security threats at the design phase of a system. Moreover, it can also be used in auditing an existing system. The main idea is to leverage knowledge that is expressed as diagrammatic models (e.g., dataflow diagrams or threat models created with a domain specific modeling language), which can be understood by all stakeholders of a system, both technical and non-technical.

Keywords
security, privacy, dataflow, threat, modeling

1. Introduction
Nowadays, organizations must recognize the inevitability of cyber security incidents and prepare themselves to respond to them effectively. In addition to the increased number of incidents, organizations must also deal with security regulations and new reporting requirements regarding data privacy and their ability to protect customers’ data.
Moreover, companies strive to satisfy increasingly higher customer expectations, which involve not only delivering the right service or product but also ensuring an adequate infrastructure that enables a prompt and trustworthy response.

The need for speed and availability of information has forced companies to move their information systems from on-premises solutions to external service providers, known as cloud services. This creates various benefits for companies, for their clients and for their employees, as the system is available at any time and in any place. But this trend of using cloud services brings up the need to enforce data loss prevention techniques and to impose policy controls for cloud services. Some of the policy and security measures are implemented by cloud service providers, but many of them remain to be handled by the companies contracting such services.

BIR-WS 2024: BIR 2024 Workshops and Doctoral Consortium, 23rd International Conference on Perspectives in Business Informatics Research (BIR 2024), September 11-13, 2024, Prague, Czech Rep.
∗ Corresponding author.
† These authors contributed equally.
andrei.chis@econ.ubbcluj.ro (A. Chiș); oliviu.stoica@stud.ubbcluj.ro (O. I. Stoica); anamaria.ghiran@econ.ubbcluj.ro (A. M. Ghiran)
0000-0003-0173-7250 (A. Chiș); 0000-0001-7890-9386 (A. M. Ghiran)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

The recommended approach is to develop information systems considering “secure by design” principles. Detecting security issues early in an IT system’s design enables cost-efficient fixes.
But whether the case is migrating an on-premises system to a cloud environment or auditing an existing system in a cloud environment, more people than the technical team should be involved: employees from business departments, managers, suppliers and even clients or end users of a system could provide valuable knowledge of how their data should be secured and help identify the risks they might be exposed to. Therefore, a simple domain specific language must be available to any stakeholder of the system, ensuring the knowledge transfer among them.

The remainder of the paper is structured as follows: Section 2 provides an introduction to relevant concepts like Data Flow Diagrams (DFDs) and Threat Modeling, followed by a short presentation of related works and an overview of our solution; Section 3 describes a proof-of-concept of the presented solution; lastly, we summarize our contributions in the conclusions.

2. Problem statement and background
Today a wide range of security tools is available that can be used to scan a system for vulnerabilities and perform an analysis to detect possible mitigation strategies. These tools are indeed extremely efficient in identifying vulnerabilities; however, they can only scan a system for known vulnerabilities. These vulnerabilities are publicly described in vulnerability databases, for example NVD (National Vulnerability Database) [1], which represent valuable knowledge sources for everybody, regardless of their intent. While security specialists need to identify and address every possible vulnerability of a system, a malicious actor only needs to identify one vulnerability to compromise that system. Considering this, the addressed question is whether it is enough to scan for known vulnerabilities (which is done mainly after the system is implemented).
Having this in mind, a better approach is to put prevention first, more specifically, to develop systems driven by secure by design principles.

This paper presents a solution based on conceptual models for identifying, communicating and understanding threats and mitigations of a system at design time. Our approach uses data flow diagrams to represent the flow of data through business processes and threat modelling to describe possible risks associated with the components of a system. The created models can be integrated with other architectural descriptions of the system, enabling a better understanding of their interconnections.

In [2], the authors studied the importance of cyber security and established some parameters of cyber security: threat identification, vulnerability identification, access risk exploration, creating a contingency plan, and responding to cyber security incidents. The key points in cyber threat mitigation are “vulnerability identification” and “threat identification”, that is, what the system exposes versus what the system is exposed to.

To identify vulnerabilities, one needs to know how the information flows through a system or a cluster of systems. For this, a good representation of how data travels is required. A widely accepted solution is a modeling representation based on a data flow diagram (DFD) [3]. There are established threat methodologies, like STRIDE [4], that use DFDs when designing a system to identify those threats that violate security requirements like confidentiality, integrity, availability, authentication and non-repudiation [5]. However, DFDs are mainly used at the design time of a system (as the next sections show) and lack an explicit connection with other data (i.e. representations of data to be used or processed).
Our approach differs from other approaches that use modeling techniques to represent the security threats and mitigations for a system by enabling the created models to be linked with other data elements that are only available at run time.

2.1. Data Flow Diagrams
A data flow diagram (DFD) [6] is a graphical representation of an IT system in relation to business processes. It shows the flow of data through different components of the system as well as their interactions. Although DFDs have not been standardized, adopters of DFDs have consistently employed similar concepts in their implementations. There are four basic concepts in a data flow diagram, and their most used graphical representation is shown in Table 1.

Table 1
Data flow diagram concepts and graphical representations

Symbol      Concept Name
[symbol]    External Entity, User or system
[symbol]    Data Flow
[symbol]    Process
[symbol]    Data store

The adopted definitions of the data flow diagram components are:
• A process is an activity or a function which transforms data, and it is performed for a specific business-related reason.
• A data flow is a link or connector that carries data between processes, data stores, systems, users or other kinds of external entities.
• A data store is a collection of data or information that is stored in a physical device.
• An external entity can be a user, a person, a system, an organization, or any other kind of entity that is external to the system and interacts with it.

Data flow diagrams are widely used in secure by design analysis, especially in threat modeling. By collecting and storing information in conceptual models, a manager, who is normally not a cyber security expert, can conduct an audit of the system together with the cyber security department or external security specialists with much greater ease and speed. Security specialists can provide technical information about the system (e.g.
what security measures are needed) while other business executives can provide information about business strategies, business knowledge or the enterprise IT architecture.

Instead of a DFD for the system’s representation, one can use BPMN (Business Process Model and Notation) [7], which also provides symbols for specifying the flow of activities in a process and includes support for so-called data objects and data store references. However, some authors [8] differentiate between DFD and BPMN: the former is more concerned with capturing the data movement (hence it is more suitable during the analysis phase of the systems development life cycle, SDLC), while the latter is more appropriate for describing the activities that need to be executed in a process (hence it is more suitable during the design phase of the SDLC).

2.2. Threat modeling
While DFDs provide insights into how data flows through a system, they might not be sufficient on their own to comprehensively address security concerns. Threat modeling can be seen as an engineering technique that helps to identify, communicate and understand threats and mitigations within the context of protecting a system and its information [9].

A threat model is a structured process with four main objectives: identify security requirements, pinpoint security threats and potential vulnerabilities, quantify threats and vulnerabilities, and prioritize remediation methods. It can be seen as a structured representation of all the information that affects the security of a system. In essence, it is a view of the system and its environment through the lens of security. There are various methodologies for conducting threat modeling [10], each with its own strengths and weaknesses.
STRIDE [4] is a threat modeling methodology developed by Microsoft that identifies six types of threats, whose initials give the methodology its name:
• Spoofing identity: when an agent impersonates somebody else, for example by using the authentication credentials of someone else.
• Tampering with data: when an attacker changes the data during its transit over the network or when the data is at rest on disk storage or in memory.
• Repudiation: when an actor denies having performed actions in a system.
• Information disclosure: when an attacker violates confidentiality, getting access to information without authorization or stealing information.
• Denial of service: when an attacker exhausts a system’s resources to interrupt its availability.
• Elevation of privilege: when actions that are not authorized are performed in a system.

An analyst can assign a set of susceptible threats to each element of the DFD. For example:
• Spoofing threats are expected to be added for Process and External Entity components in a DFD.
• Tampering threats affect Processes, Data Stores and Data Flows.
• Repudiation threats should be defined for Processes, External Entities and Data Stores.
• Information Disclosure threats need to be added for Processes, Data Stores and Data Flows.
• Denial of Service threats need to be added for Processes, Data Stores and Data Flows.
• Elevation of Privilege threats must be identified for Process components.

In order to easily identify concrete threats for each category, the analysts use the predefined catalogue of security threat trees that is provided by the STRIDE methodology. Then, for each threat an appropriate security risk level must be determined in order to sort the threats and come up with proper countermeasures.

STRIDE categorizes the threats from the attacker’s point of view, as opposed to identification and categorization from a defensive perspective, which is the focus of the Application Security Framework (ASF) [11].
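The assignment of STRIDE categories to DFD element types listed above can be written down as a small lookup table. The following Python sketch is only an illustration of that listing; the names `Element`, `STRIDE_PER_ELEMENT` and `applicable_threats` are hypothetical and are not part of the paper's modeling tool.

```python
from enum import Enum


class Element(Enum):
    """The four DFD concepts from Table 1 (hypothetical helper type)."""
    PROCESS = "Process"
    EXTERNAL_ENTITY = "External Entity"
    DATA_STORE = "Data Store"
    DATA_FLOW = "Data Flow"


# STRIDE category -> DFD element types it should be considered for,
# transcribed from the bullet list in the text.
STRIDE_PER_ELEMENT = {
    "Spoofing": {Element.PROCESS, Element.EXTERNAL_ENTITY},
    "Tampering": {Element.PROCESS, Element.DATA_STORE, Element.DATA_FLOW},
    "Repudiation": {Element.PROCESS, Element.EXTERNAL_ENTITY, Element.DATA_STORE},
    "Information disclosure": {Element.PROCESS, Element.DATA_STORE, Element.DATA_FLOW},
    "Denial of service": {Element.PROCESS, Element.DATA_STORE, Element.DATA_FLOW},
    "Elevation of privilege": {Element.PROCESS},
}


def applicable_threats(element: Element) -> list[str]:
    """Return the STRIDE categories an analyst would consider for one element type."""
    return [
        category
        for category, elements in STRIDE_PER_ELEMENT.items()
        if element in elements
    ]


if __name__ == "__main__":
    for element in Element:
        print(element.value, "->", applicable_threats(element))
```

Under this table a Process is subject to all six categories, while an External Entity is only checked for Spoofing and Repudiation.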
The threats in ASF are Authentication, Authorization, Configuration Management, Data Protection in Storage and Transit, Data/Input Validation, Error Handling & Management, Session Management, and Auditing & Logging. From a defensive perspective, these might be considered threats due to the weaknesses that they can introduce in a system. As in STRIDE, each of the threats in the ASF framework might have several mitigation techniques.

The modelers can choose to apply any threat modeling methodology, without being restricted to those previously enumerated. For instance, they might select a more specialized one like LINDDUN (Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness and Non-compliance) [12], which is focused on a particular field of cyber security, namely data privacy and confidentiality.

2.3. Related works
Aleryani [13] compared data flow diagrams and use case diagrams and concluded that data flow diagrams are more powerful and can be easily included in the object-oriented approach. Use case diagrams can be employed as a first draft between system analysts and customers; the system analyst can then switch to data flow diagrams as a formal modeling of the system.

In another publication [14], the authors investigated whether data flow diagrams are enough for conducting threat modeling. They concluded that although DFDs could be more easily adopted in practice, as they employ very few concepts, they lack specialized notions about security concepts, data elements, abstraction level and deployment information. They advocate for a dedicated integrated language for threat modeling. The threat modeling tool should provide a sufficiently complex language and level of support but, at the same time, should have the “ease of adoption” capability.

Sion et al. [15] presented solution-aware data flow diagrams for security threat modeling.
They stated that many current techniques enumerate numerous non-applicable threats, while they should focus on selecting those that are strictly related to the technological context or the domain (i.e. threat modeling in software development should take into consideration the security solutions already implemented in the system). On the other hand, having too much information about the domain can mislead the analyst and can make the engineer biased. To overcome such bias, they proposed a constant re-assessment of the threats.

Analyses on DFDs could support information flow control or access control, and Seifermann et al. [16] proposed an extended DFD syntax that could be used to model both of them. However, their work focused on capturing the logic and less on the visual representation.

An approach related to ours, in the sense that it considers security by design principles, is that of [17]. The authors provide a set of models annotated with security flaws and propose an automated approach to perform inspection using model query patterns.

Our previous work [18] also considered the possibilities of analysis and detection of vulnerability patterns using knowledge graphs derived from DFD descriptions. While our prior research paid particular attention to identifying semantic relationships and patterns on the generated knowledge graph, making it amenable to machine processing and automated reasoning, in this paper we aim to address the support provided to human security analysts, which requires enhanced visual representations.

2.4. Solution overview
The proposed solution is to create a domain specific modeling method. Besides the modeling language for describing the concepts needed to capture the security issues, we defined a functionality to calculate the security score of the modelled system. Our proof of concept considers a business-oriented use case given by an online shop, which is very popular among enterprise systems.
Our modeling language groups the new concepts into two types of models. The first model type includes the concepts of the data flow diagram (Data Flow Diagram Model). The second model type provides a structural view, inspired by mind maps, and describes a threat methodology or a security framework methodology (Threat Mitigation Model).

In this paper we demonstrate how model driven development can be used together with data flow diagrams and threat modeling methodologies to mitigate cyber threats and perform a security analysis of a system. There are several threat methodologies that are widely accepted and commonly used in cyber security. In our solution we did not want to be limited to a specific one, but rather to allow the modeling of any methodology for the various systems and business cases that exist. We chose STRIDE for our use case.

To implement our proposal, a metamodeling approach was chosen, building upon the foundation laid by the ADOxx metamodeling platform [19]. This allows us to create the concepts and constraints of our two model types by defining a domain specific modeling language. The metamodel is displayed in Figure 1, presenting the basic concepts and their relationships grouped by model types.

Figure 1: Metamodel of the proposed domain specific modeling language.

The functionality for calculating the security scores for the business process described in the Data Flow Diagram, given the threat methodology described in the Threat Mitigation Model, is implemented in the modeling tool via AdoScript [20]. Figure 2 shows an excerpt from this script. The algorithm reads the model’s content and, for all objects of type “Process”, retrieves the values of the attribute “Mitigations”. Each value is a hyperlink to an object described in the Threat Mitigation Model (STRIDE in our case). Also, all objects of type “Threat” are retrieved and a map with key/value pairs is created.
This allows us to quickly assess whether, for a specific process, the recorded mitigations are enough by taking into account the mitigations recommended by the methodology for a specific threat. In our example, described in the next section, the DoLogin process has 2 mitigations (Appropriate authentication and Don’t store secrets) which are among the mitigations endorsed in the Threat Mitigation Model for the Spoofing Identity threat; therefore, its score is 2 out of 3 (visible in Fig. 5 and 6).

3. Proof-of-concept
In this section we present a running example for the modeling solution that we described above. The first step is to choose an existing methodology and, based on it, to create a Threat Mitigation Model. After that, the modeler describes in a Data Flow Diagram the most important entities in a system and how data travels among them. For this example, the chosen threat methodology, STRIDE, is modeled in Fig. 3.

Figure 2: AdoScript functionality to compute security scores for business processes.

Figure 3: Threat Mitigation Model inspired from STRIDE methodology.

In the next step we create a Data Flow Diagram inspired by the following business use case: a shop’s employee inserts a list of products into the shop’s product database; afterwards, a customer logs into the shop’s online platform and browses for products to place an order, which will be saved into the shop’s order database; finally, a manager reads the placed orders from the database. The Data Flow Diagram describing this use case can be observed in Figure 4.

Figure 4: Data Flow Diagram Model describing our example of business use case.

The following DFD elements can be identified:

External entities:
• Customer: Represents the customer who interacts with the online platform to place orders.
• Employee: Represents the shop’s employee who provides the list of products to be inserted.
• OrderManager: Represents the manager who reads the placed orders from the database.
Processes:
• IntroduceProducts: Represents the process where the shop's employee inserts a list of products into the product database. This process takes data input (list of products) and stores it in the product database (Data Store 1: Product DataStore).
• DoLogin: Represents the process where a customer logs in; it takes the user’s credentials as input and allows access to the platform.
• ChooseProducts: Represents the process where a customer browses for products.
• Place Order: Represents the process where a customer places an order through the online platform. This process takes input (order details) and stores the order information in the order database (Data Store 2: Order DataStore).
• Read Orders: Represents the process where a manager reads the placed orders from the order database. This process retrieves data (placed orders) from the order database (Data Store 2: Order DataStore).

Data Stores:
• Product DataStore (Data Store 1): Stores the list of products inserted by the employee.
• Order DataStore (Data Store 2): Stores the placed orders made by customers.

Data Flows:
• From Employee to IntroduceProducts: Represents the flow of the list of products from the employee to the process of inserting products.
• From Customer to DoLogin: Represents the flow of the user’s credentials from the customer to the Login process.
• From DoLogin to Choose Products: Represents the flow of data after the customer’s login, the verification of credentials and the obtained authorization.
• From Choose Products to Place Order: Represents the order details flow from the choice of the customer to the process of placing orders.
• From Place Order to Order Database: Represents the flow of placed orders data from the "Place Order" process to the order database.
• From Order Database to Manager: Represents the flow of placed orders data from the order database to the manager for reading purposes.

On each process, the modeler can select some mitigations, similarly to an audit (Figure 5).
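The score computation outlined in Section 2.4 (collect the mitigations recorded on each process, build a map from each threat to its endorsed mitigations, then compare the two) can be sketched in Python as follows. This is not the AdoScript from Figure 2, and the exact scoring formula is an assumption: the sketch simply counts how many endorsed mitigations of a threat are recorded on a process, which is consistent with the "2 out of 3" example for DoLogin. The name of the third Spoofing mitigation is not given in the text, so a placeholder is used.

```python
# Threat Mitigation Model: threat -> mitigations endorsed by the methodology.
# Only the Spoofing entry is filled in; the third mitigation is a placeholder
# because the text does not name it.
THREAT_MITIGATIONS = {
    "Spoofing identity": {
        "Appropriate authentication",
        "Don't store secrets",
        "Third spoofing mitigation (placeholder)",
    },
}

# Data Flow Diagram Model: process -> mitigations the modeler recorded on it
# (the "Mitigations" attribute of a Process object).
PROCESS_MITIGATIONS = {
    "DoLogin": {"Appropriate authentication", "Don't store secrets"},
}


def security_scores(process_mitigations, threat_mitigations):
    """Return {(process, threat): (covered, endorsed)} for every pair.

    covered  = endorsed mitigations of the threat that are recorded on the process
    endorsed = all mitigations the methodology lists for that threat
    """
    scores = {}
    for process, recorded in process_mitigations.items():
        for threat, endorsed in threat_mitigations.items():
            covered = recorded & endorsed  # set intersection
            scores[(process, threat)] = (len(covered), len(endorsed))
    return scores


if __name__ == "__main__":
    for (process, threat), (covered, total) in security_scores(
        PROCESS_MITIGATIONS, THREAT_MITIGATIONS
    ).items():
        print(f"{process} / {threat}: {covered} out of {total}")
```

Keeping both models as plain mappings mirrors the key/value map the text describes; a fuller version would restrict each process to the threats that apply to its element type.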
Figure 5: Mitigations for the “DoLogin” process.

Having these models created and the mitigation analysis performed, the security score of the modeled system can now be calculated through the implemented script. A sample of the result is presented in Figure 6.

Figure 6: Security scores calculated for the modeled system.

4. Conclusions
In this paper we presented a conceptual modeling approach for threat identification and for choosing suitable mitigation techniques by calculating security scores according to domain specific methodologies. We used conceptual model representations to obtain an overview of the system and enable a security analysis: data flow diagrams have been leveraged in conjunction with existing threat modeling methodologies to analyze the cyber system and identify weak points by generating a security score. The generated security score, combined with the data flow diagram, can be presented as an audit report understandable by all involved stakeholders, technical and non-technical.

As future work, the presented solution can be extended by adding new functionalities for each process regarding the audit of external libraries or third-party applications. In a similar vein, our solution can be integrated with external tools specialized in scanning the vulnerabilities of the third-party libraries used in the modeled system. Furthermore, the metamodel can be supplemented with new domain specific concepts to allow modeling a security risk score methodology and, hence, an improved security analysis.

Acknowledgements
This research used infrastructure acquired as part of the project POC/398/1/1/124155, co-financed by the European Regional Development Fund (ERDF) through the Competitiveness Operational Programme for Romania 2014-2020.

References
[1] H. Booth, D. Rike, G. A. Witte, The national vulnerability database (nvd): Overview, 2013. URL: https://www.nist.gov/publications/national-vulnerability-database-nvd-overview.
[2] S. Ghuandare, A. Patil, R.
Lad, Importance of Cyber Security, International Journal of Engineering Research & Technology, vol. 8, no. 05, 2020.
[3] L. Sion, D. Van Landuyt, K. Wuyts, W. Joosen, Privacy risk assessment for data subject-aware threat modeling, in: IEEE Security and Privacy Workshops, pp. 64-71, 2019.
[4] L. Kohnfelder, P. Garg, The threats to our products. Microsoft Interface, Microsoft Corporation, 33, 1999.
[5] A. Shostack, Threat modeling: Designing for security. John Wiley & Sons, 2014.
[6] P. G. Larsen, N. Plat, H. Toetenel, A formal semantics of data flow diagrams. Formal Aspects of Computing, 6, pp. 586-606, 1994. doi:10.1007/BF03259387.
[7] OMG - Object Management Group: Business Process Model and Notation, URL: https://www.bpmn.org/ Accessed 2023/12/27.
[8] G. M. Giaglis, A taxonomy of business process modeling and information systems modeling techniques. International Journal of Flexible Manufacturing Systems, 13(2), pp. 209-228, 2001.
[9] T. UcedaVelez, M. M. Morana, Risk Centric Threat Modeling: process for attack simulation and threat analysis. John Wiley & Sons, 2015.
[10] S. Hussain, A. Kamal, S. Ahmad, G. Rasool, S. Iqbal, Threat modelling methodologies: a survey. Sci. Int. (Lahore), 26(4), pp. 1607-1609, 2014.
[11] L. Conklin, V. Drake, S. Strittmatter, Z. Braiterman, Threat Modeling Process, URL: https://owasp.org/www-community/Threat_Modeling_Process Accessed 2024/01/25.
[12] K. Wuyts, L. Sion, W. Joosen, Linddun go: A lightweight approach to privacy threat modeling, in: IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), pp. 302-309, IEEE, 2020.
[13] A. Y. Aleryani, Comparative Study between Data Flow Diagram and Use Case Diagram, International Journal of Scientific and Research Publications, 6(3), pp. 124-126, 2016.
[14] L. Sion, K. Yskout, D. Van Landuyt, A. van Den Berghe, W.
Joosen, Security threat modeling: are data flow diagrams enough?, in: Proceedings of the IEEE/ACM 42nd international conference on software engineering workshops, pp. 254-257, 2020.
[15] L. Sion, K. Yskout, D. Van Landuyt, W. Joosen, Solution-aware data flow diagrams for security threat modeling, in: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 1425-1432, 2018.
[16] S. Seifermann, R. Heinrich, D. Werle, R. Reussner, Detecting violations of access control and information flow policies in data flow diagrams. Journal of Systems and Software, 184, p. 111138, 2022.
[17] K. Tuma, L. Sion, R. Scandariato, K. Yskout, Automating the early detection of security design flaws, in: Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 332-342, 2020.
[18] A. Chis, I. Stoica, A. M. Ghiran, R. A. Buchmann, A Knowledge Graph Approach to Cyber Threat Mitigation Derived from Data Flow Diagrams, in: IEEE International Conference on Automation, Quality and Testing, Robotics, AQTR 2024, Cluj-Napoca, Romania, 2024.
[19] BOC GmbH, The ADOxx metamodeling platform, 2024. URL: https://www.adoxx.org. Accessed 2024/08/01.
[20] BOC GmbH, The AdoScript Programming Language, 2024. URL: https://www.adoxx.org/live/adoscript-language-constructs Accessed 2024/08/01.