
International Conference on Software Engineering- requirements, Requirements Engineering

10.1016/j

Ensuring threat-model assumptions by using static code analyses

Johannes Geismann

Bastian Haverkamp

Eric Bodden

Department of Computer Science, Heinz Nixdorf Institute, Paderborn University, Fürstenallee 11, 33102 Paderborn, Germany
Fraunhofer IEM, Zukunftsmeile 1, 33102 Paderborn, Germany

2021


In the past years, the security of information systems has become more and more important. Threat modeling techniques are applied during the design phase of the development, helping to find potential threats as early as possible. However, assumptions made at this development step are often not considered in later steps or are not validated correctly, particularly not during the concrete implementation of the system. To overcome this problem, we present cards, a security modeling approach on the architectural level which utilizes code analyses to validate assumptions made during the threat modeling phase. cards helps ensure a correct implementation but also allows one to determine which effect code vulnerabilities can have on the overall architecture, as described through models. We implemented cards based on the Eclipse Modeling Framework, for Java-based system implementations. We evaluated cards based on the CoCoME case study to show its efficacy. The evaluation showed that cards can ease the validation of assumptions made during threat modeling and reduce the overall analysis effort.

Keywords: Threat-modeling, Security, Component-based, Static Code Analyses, Security-by-design

1. Introduction

Security is an essential property when developing modern software-intensive systems. To ensure high security, it is important to consider security not only during the implementation but already when designing the system. Dataflows are of particular interest because confidential data represent important assets for every information system, and also because attacker-controlled inputs need to be properly filtered before they are used. For this reason, one uses threat modeling approaches to reason about potential threats and corresponding countermeasures in early development steps [1].

Current approaches, however, are limited because of the lack of full traceability from threat model to the system artifacts. In particular, due to a missing connection between threat model artifacts and the implementation, the implementation often differs from the specifications made during threat modeling [2]. Hence, assumptions made during the design or in the threat model are not correctly implemented or not even implemented at all, which leaves the security state of the actual system unclear. Static code analyses can help to validate these assumptions and are used to ensure that specific dataflows are prevented. For example, when paying in the supermarket, such an assumption on the implementation of the cash desk could be that customer credit card information is only sent to system parts that have permission to process it. Especially for large-scale systems this becomes a challenge because large code bases have to be analyzed. Additionally, such systems consist of several subsystems that are possibly developed by different parties. Distributed systems, micro-services and "serverless" architectures are just some prominent examples.

Particularly in these areas, model-based approaches are promising for threat modeling and security by design [3]. However, most approaches are either fully model-driven approaches that are quite heavy-weight and usually hardly adaptable, e.g., UMLsec [4] or SEED [5], or light-weight approaches such as STRIDE [1] that only take threat modeling into account but do not consider the connection to the implemented system. To make threat modeling more effective for distributed systems, the following challenges need to be met.

Security requirements are usually defined by several disciplines and, therefore, should be specified on the architectural or system level such that they can be discussed independently from, and in the best case already before, the implementation phase. Countermeasures defined during such a threat modeling phase are usually assumptions made about the implementation. Hence, all assumptions made on the architectural level have to be made explicit in the model and have to be correctly refined into source code [2]. Because this is a tedious and error-prone task, one must validate these assumptions on the source code level. An additional challenge is that the implemented system is usually not completely under the control of one development team. Static code analyses are a suitable solution to this end because they validate such assumptions on the source code and can be defined for a specific subsystem regardless of who is responsible for the implementation. However, if static code analyses are used, the results are most useful if fed back to the architecture and threat model. Unfortunately, current solutions fall short in this regard.

We see two main concepts as essential here: connection to the source code and making the requirements and assumptions made during threat modeling explicit. To address these challenges, we have developed cards (Component-based Assumptions and Restrictions for Dataflow Specifications), a security modeling approach for dataflows in distributed systems. It provides a new DSL which operates on a generic component model and, therefore, can be adapted for existing component-based approaches. cards can be used to specify security requirements for dataflows, as well as assumptions made to fulfill these restrictions on the architectural level. cards further illustrates how static code analyses can be used to validate the assumptions on the code level.

In particular, this paper makes the following original contributions:

• cards: a concept and a domain-specific language for the specification of dataflow restrictions and assumptions on the architectural level,
• an analyzer checking the system for dataflow violations,
• a concept for generating the corresponding static code analyses, and
• an implementation of these concepts based on the Eclipse Modeling Framework and Sirius, providing a textual as well as graphical syntax.

The source code for our implementation can be found on https://github.com/secure-software-engineering/cards

This paper is structured as follows: In Section 2, we provide an overview of cards, describe our concept for security restrictions and assumptions, and explain our model analyses and the generation of code analyses. In Section 3, we describe the implementation of the prototype, and we present the evaluation of cards in Section 4. Section 5 compares cards with related approaches and Section 6 concludes this paper.

[Figure 1: Overview of the main steps of cards: Create System Architecture; Specify Security-relevant Information; Specify Restrictions and Assumptions; Analyze Architecture for Violations; Implementation; Check System for Compliance.]

2. CARDS: Security Modeling and Validation

Effective threat modeling requires four basic steps: (1) finding security-relevant system parts and functions, (2) finding potential threats with regard to these parts, (3) risk assessment, i.e., prioritizing the threats, and (4) implementing appropriate countermeasures. While threat modeling in general targets all kinds of threats, cards focuses on dataflow-specific threats. We designed cards in such a way that its concepts can be applied to existing development processes. Figure 1 shows an overview of the main steps.

At first, the system designer creates a component model describing the basic architecture of the system (1). This step is not necessarily part of cards since an existing architectural model could also be adapted for the application of cards. Based on the component model, security and domain experts specify security-relevant information, e.g., confidential data. Also, security restrictions and security assumptions are specified explicitly. Security restrictions describe security requirements for specific data types of the system, e.g., data from the credit card reader are always sanitized before being sent to other components of the system. Dataflow-specific security requirements for the system can be refined to security restrictions. A security assumption makes an assumption about the implementation explicit, e.g., that confidential data will never be sent to an external entity. Thus, a restriction describes global requirements the system should satisfy, whereas an assumption describes what the designer assumes to be implemented for each component. The concepts of both (1) and (2) are described in more detail in Section 2.1.

Next, the system can be analyzed as to whether all security restrictions are satisfied, assuming that all assumptions will be implemented correctly (3). If a violated restriction is found, the security experts may add additional assumptions to mitigate this security issue and re-apply the analysis until all restrictions are satisfied. The assumptions can be useful for the actual implementation of the system, giving the developers guidelines for the implementation. Concepts for the analyses and potential use cases in the development are explained in Section 2.2.

Finally, cards uses generated static code analyses to validate if all assumptions are implemented correctly (4). For this, we provide in Section 2.3 a concept for how the assumptions can be mapped to static code analyses automatically. If all generated analyses pass and no violation is found on source code, the restrictions made to the system can be seen as satisfied on code level, too.

2.1. Specifying Restrictions and Assumptions

In this section, we explain our concepts of restrictions, assumptions, and all concepts required. We developed a DSL for specifying security-relevant information of the system, security restrictions, and security assumptions. Since it is essential to refer to the actual system model, this DSL refers to a component model. For demonstration purposes, we are using a generic component model which is described in Section 2.1.1. However, since we use a generic component model, we see our concepts not restricted to one component model but adaptable to other component models. After that, we describe in Section 2.1.2 how security-relevant information can be formalized. Finally, we explain our concept of restrictions and assumptions in more detail and describe our DSL for this step.

2.1.1. Component Model

For demonstration purposes, we are using a generic component model. We therefore expect that our concepts can be applied to most other component-based system specifications as well. Figure 2 depicts the main parts of the underlying meta-model. A component model consists of a set of components which can be either CompositeComponents or AtomicComponents. Composite components can contain further components by defining ComponentParts, which allows for a hierarchical component model. Atomic components cannot contain further components.

Components use Ports for communication with other components. In our component model, we assume communication to be asynchronous. Ports are connected via PortConnectors which are embedded into the parent composite component. For a better overview, we have omitted several parts of the meta-model that are mainly needed for technical reasons. The full meta-model can be found in our provided implementation artifacts.

[Figure 2: Main parts of the component meta-model: Component (with 0..* ports), CompositeComponent, AtomicComponent.]

2.1.2. Security-relevant Information

Based on the component model, cards utilizes several security-relevant pieces of information that can be specified within our DSL. In the following, we give a short overview of the supported language features and their purpose.

DataTypes represent the security-relevant data. They are the data assets of the system because they represent the data that should be protected. We only consider data that are relevant for the analyses. DataTypes can have attributes for labels, e.g., to mark a datatype as external user input, a security level, and a type which can be interesting when mapping to the actual source code base. Listing 1 shows an excerpt of the example where three data types are defined (lines 1-5).

Data Groups are used to combine several DataTypes, e.g., all data describing parts of credit card information. DataGroups are mainly used when defining Restrictions and Assumptions. In Listing 1, the data types CreditCardNumber and CreditCardPin are grouped (cf. line 11).

Component Groups are used similarly to combine components that have something in common, e.g., (un)trusted components.

Component Kinds can be used to categorize components, e.g., to mark components as external entities, datastores, or processes (similar to DFD threat modeling) [1].

Data Sources describe which components are the sources for a specific DataType. In Listing 1, the component CardReader is marked as source for the types CreditCardPin and CreditCardNumber.

Sanitizers are used to modify data, making them secure for further use, e.g., by escaping bad characters. At this stage, a sanitizer exists only on the conceptual level and can be used in the security assumptions (cf. Section 2.1.3). In the example, a sanitizer is defined that should sanitize all confidential credit card information, e.g., by replacing it with asterisks.

Security Level can be used to assign a specific level of security or trust to components.

Listing 1: Example code of a cards specification.

    1  dataTypes {
    2    DataType BarCode { },
    3    DataType CreditCardNumber {securityLevel 3 },
    4    DataType CreditCardPin {securityLevel 4 }
    5  }
    6  components {
    7    AtomicComponent CardReader {
    8      ports { INOUTPort cardReaderPort ( )}
    9      sourceOf { CreditCardPin,CreditCardNumber }}
    10 }
    11 Groups {DataGroup CreditCardInfo {CreditCardPin, CreditCardNumber}}
    12 Sanitizer {CCSanitizer}

Listing 2: Example of a restriction using a cards specification.

    1  DataFlowRestrictions {
    2    GloballyPREVENT CreditCardInfo {
    3      Comp CreditCardPin , CreditCardNumber allow CardReader , Bank , CashDeskPC}}

Listing 3: Example code of security assumptions using cards.

    1  DataFlowAssumptions {
    2    componentAssumptions {
    3      Component CashDesk neverOut CreditCardInfo }
    4    portAssumptions {
    5      Port pcLightDisplay neverOut CreditCardInfo
    6      Port pcCashBoxPort neverOut CreditCardInfo}
    7    sanitizerAssumptions {
    8      Component CashDeskPC sanitizes DataFlow
             pcCardReaderPort->pcPrinterPort of CreditCardInfo using CCSanitizer}}
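To make the meaning of Listings 1 and 2 concrete, the following Python sketch mirrors the data group and the prevent restriction as plain data and evaluates who may access which data type. It is only an illustration under stated assumptions: the names `Restriction`, `expand`, and `may_access` are hypothetical and are not part of cards or its DSL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restriction:
    default: str            # "prevent" (allow-listing) or "allow" (deny-listing)
    data_types: frozenset   # data types covered by the restriction
    exceptions: frozenset   # components exempt from the default


def expand(names, groups):
    """Replace data group names by their member data types."""
    result = set()
    for name in names:
        result |= groups.get(name, {name})
    return frozenset(result)


def may_access(restriction, component, data_type):
    """Return True if `component` may access `data_type` under `restriction`."""
    if data_type not in restriction.data_types:
        return True  # the restriction says nothing about this data type
    is_exception = component in restriction.exceptions
    return is_exception if restriction.default == "prevent" else not is_exception


# Data group from Listing 1 and the prevent restriction from Listing 2.
groups = {"CreditCardInfo": {"CreditCardPin", "CreditCardNumber"}}
listing2 = Restriction(
    default="prevent",
    data_types=expand({"CreditCardInfo"}, groups),
    exceptions=frozenset({"CardReader", "Bank", "CashDeskPC"}),
)

print(may_access(listing2, "Bank", "CreditCardPin"))     # True
print(may_access(listing2, "Printer", "CreditCardPin"))  # False
print(may_access(listing2, "Printer", "BarCode"))        # True
```

Encoding the default ("prevent" or "allow") separately from the list of exceptions keeps both restriction kinds in one structure: only the interpretation of the exception list flips.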

2.1.3. Dataflow Restrictions and Assumptions

In the following, we describe our concepts for security restrictions and corresponding assumptions and how cards supports the security engineer in specifying these. Essentially, restrictions formally describe security requirements regarding the dataflow within the system. Assumptions are used to describe countermeasures that are assumed to be in place in the source code.

Specifying Restrictions Restrictions are used to formally describe security requirements for the data types specified as assets. In essence, the security engineer has to describe a security policy for each data type describing which component is allowed to access the data. Basically, there are two options: 1. globally allow all components to access a data type and define exceptions that are not allowed to access the data type (deny-listing approach) and 2. globally prevent components from accessing the data type and define exceptions describing components that are allowed to access the data type (allow-listing approach).

Corresponding to this, we distinguish between two kinds of restrictions, so-called Allow-Restrictions and Prevent-Restrictions. For each datatype, the security engineer has to specify such a restriction. One restriction may cover more than one data type. Listing 2 shows an example of a specified restriction. In particular, we define a prevent restriction describing that the data types CreditCardPin and CreditCardNumber should only be accessed by the components CardReader, Bank, and CashDeskPC by combining the prevent restriction and a component refinement. Besides component refinements, cards also supports refinements of restrictions for component parts and component groups. Without any knowledge of the concrete behavior of the components, these restrictions could not be validated. The security engineer can therefore specify assumptions of the implemented behavior which must be met to achieve the restriction. We next explain how to specify such assumptions in cards.

Specifying Assumptions An assumption describes a required behavior of a component. cards provides different kinds of assumptions. At first, we distinguish between two major kinds of assumptions: neverOut-assumptions and sanitizer-assumptions. A neverOut-assumption specifies that a context element will never leak the given data type, e.g., that a component will never send private data to another component. A sanitizer-assumption specifies that a context element will always sanitize the data before leaking it using a specific sanitizer, e.g., replacing some digits with asterisks when sending credit card information to the printer.

We support three different context elements: components, ports, and flows within a component. Assumptions for component parts are not useful because all parts of a specific component type will have the same implementation. In the example in Listing 3, we show four different assumptions: 1. an assumption that the (composite) component CashDesk will never leak the credit card info (line 3). 2. an assumption that the pcLightDisplay port will never leak the credit card info (line 5). 3. an assumption that the pcCashBoxPort port will never leak the credit card info (line 6). 4. an assumption that the component CashDeskPC will always sanitize dataflows of credit card info from pcCardReaderPort to pcPrinterPort, using the sanitizer CCSanitizer (line 8).

cards provides an analysis to check whether the specified restriction is satisfied on model level if all assumptions are implemented correctly. This analysis is explained in Section 2.2.

cards provides model-based analyses checking whether all specified restrictions are satisfied and if all security assumptions have been implemented correctly. This analysis should be part of the threat modeling activity during system design and is also useful to find effects in the system's architecture when a problem in the actual implementation is found. The analysis can help security experts to find unintended dataflows and to specify requirements for the implementation of a component by creating security assumptions. Besides the analysis, cards also provides several reporting features to assist the security experts by exporting the analysis results in useful formats. In this section, we first describe how our analysis works and then how the results can be reported.

In cards, we apply a two-step analysis. First, for each component, all possible paths through the model are determined. Second, for each component and component part, respectively, all data types are determined that might reach this component. For the first analysis, we treat the component model as a directed graph where components are the nodes and port connectors are the edges. Conceptually, the analysis is a basic depth-first search. The output of the analysis is a mapping from components to all (longest) paths through the model, i.e., for each component, we store which components it could directly or indirectly communicate with. In the second analysis, for every component, a set of available data types is determined, i.e., data types that could possibly be accessed by this component. In the beginning, the set of available data types of each component that is a source for a data type is set to these data types. Next, the analysis recursively propagates data types through the system. The analysis iterates through the paths and, for each step in the path, adds all currently available data types to an output set which is again propagated to the next component in the path. In this step, we evaluate given assumptions of the component to alter the set of available data. If a sanitizer-assumption is specified for this component and datatype, we add a flag to the data type that it becomes sanitized by this component. If a neverOut-assumption is specified, the data type is removed from the output set. The output of this analysis is a mapping of components to pairs of lists of paths and data types, which are received on these paths. The advantage of this two-step analysis is that the result does not only show available data for each component but also which path is the source for a given datatype.

2.3. Using Code Analyses for Validation

When all violations of dataflow restrictions are eliminated by specifying assumptions, these assumptions must also be met through correctly implemented source code. To validate this, we propose to use static code analysis (cf. Step 4 in Figure 1). We provide a general concept for creating static code analyses for the given model assumptions. Since these analyses are based on a common structure, it is reasonable to generate them and, thus, to automate this step. However, to generate the analysis, some manual prerequisites must be met, i.e., a connection between the model and the code base has to be created. In the following, we first explain how we propose to create such a connection and then how the analyses can be generated automatically.

2.3.1. Connection to Source Code

For connecting the (secured) component model to a given code base, we propose to use a so-called mapping model. This mapping model is used to describe the connections between model artifacts and parts of the source code. All required mappings are shown in Table 1. All mappings have to specify the model element, a class, and a method. Since creating all mappings by hand is a tedious task, we provide a source code generator that generates source code skeletons for a given composite component and also creates an appropriate mapping model containing all required mappings. As proof of concept, we implemented a generator for Java which is explained in Section 3 in more detail. Supporting the engineers in creating a mapping model for an existing code base is not in the scope of this paper, but we see potential in applying semi-automatic approaches as done by Peldszus et al. [7]. However, both the mapping model itself and the generator are conceptually not restricted to one programming language but can be easily adapted for other programming languages.

2.3.2. Generating Static Analyses

After creating the mapping model, we use this information to create a suitable static code analysis. Since we intend to analyze the flow of information, we use a taint analysis to validate the flow of data through the program.

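To find violations of restrictions in the model analysis (Section 2.2), we check for each restriction whether data types covered by that restriction are illegally accessible at a component.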
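The two-step model analysis of Section 2.2 and the violation check can be sketched in Python as follows. This is a simplified illustration, not the cards implementation: assumptions are attached to edges (sender, receiver) as a stand-in for port-level assumptions, sanitized data is assumed not to violate a prevent restriction, and all function names are hypothetical.

```python
from collections import defaultdict


def all_paths(graph, start):
    """Step 1: depth-first search collecting every maximal simple path from `start`."""
    paths = []

    def dfs(node, path):
        successors = [n for n in graph.get(node, ()) if n not in path]
        if not successors:
            paths.append(path)
        for nxt in successors:
            dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths


def propagate(graph, sources, never_out, sanitized):
    """Step 2: push data types along the paths, applying edge-level assumptions.

    never_out / sanitized map an edge (sender, receiver) to a set of data types.
    Returns component -> set of (data_type, is_sanitized, path_taken).
    """
    available = defaultdict(set)
    for source, data_types in sources.items():
        for path in all_paths(graph, source):
            flowing = {(t, False) for t in data_types}
            for i, component in enumerate(path):
                taken = tuple(path[: i + 1])
                for data_type, is_sanitized in flowing:
                    available[component].add((data_type, is_sanitized, taken))
                if i + 1 == len(path):
                    break
                edge = (component, path[i + 1])
                flowing = {
                    (t, s or t in sanitized.get(edge, ()))
                    for t, s in flowing
                    if t not in never_out.get(edge, ())
                }
    return available


def violations(available, protected, allowed):
    """Prevent-restriction check: unsanitized protected data outside the allow list."""
    return sorted(
        {
            (component, data_type)
            for component, entries in available.items()
            if component not in allowed
            for data_type, is_sanitized, _ in entries
            if data_type in protected and not is_sanitized
        }
    )


# Simplified cash desk model (component names follow the listings and case study).
graph = {
    "CardReader": ["CashDeskPC"],
    "CashDeskPC": ["Bank", "Printer", "LightDisplay", "CashBox"],
}
card_info = {"CreditCardPin", "CreditCardNumber"}
sources = {"CardReader": card_info}
allowed = {"CardReader", "Bank", "CashDeskPC"}

before = violations(propagate(graph, sources, {}, {}), card_info, allowed)
never_out = {
    ("CashDeskPC", "LightDisplay"): card_info,
    ("CashDeskPC", "CashBox"): card_info,
}
sanitized = {("CashDeskPC", "Printer"): card_info}
after = violations(propagate(graph, sources, never_out, sanitized), card_info, allowed)
print(len(before), after)  # prints: 6 []
```

Because each entry keeps the path on which a data type arrived, a reported violation can be traced back to the source component, which mirrors the stated advantage of the two-step analysis.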

Table 1: Required mappings of the mapping model.

Mapping | Description
Component | In general, a component is mapped to a class. However, this mapping is also used to specify a method that describes the main entry point of the component, e.g., a method that executes the behavior of the component.
Port | This mapping is used to specify a method for writing to or reading from a component port. We therefore distinguish between IN-port mappings and OUT-port mappings. If an INOUT-port is used, both mappings have to be specified.
Data source | This mapping is used to specify a method that returns a specific data type if a component is specified as a source for a data type.
Sanitizer | This mapping is used to specify a method that executes the sanitization of a data type.
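As an illustration of how the mappings of Table 1 and the source and sink rules of Section 2.3.2 fit together, the following Python sketch stores one dictionary per mapping kind and derives a taint configuration for a single assumption. It is a sketch under stated assumptions: `MethodRef`, `MappingModel`, `taint_config`, and all Java class and method names in the example are hypothetical, and the resulting dictionary is not the configuration format of Boomerang or any other framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MethodRef:
    class_name: str
    method: str


@dataclass
class MappingModel:
    # One dictionary per kind of mapping (keys are model elements).
    components: dict = field(default_factory=dict)    # component -> entry-point method
    in_ports: dict = field(default_factory=dict)      # (component, port) -> read method
    out_ports: dict = field(default_factory=dict)     # (component, port) -> write method
    data_sources: dict = field(default_factory=dict)  # (component, data type) -> source method
    sanitizers: dict = field(default_factory=dict)    # sanitizer -> sanitizing method


def taint_config(mapping, component, data_type, port_types, flow=None, sanitizer=None):
    """Collect entry point, sources, sinks and sanitizer for one assumption.

    port_types: port name -> data types the port can carry (from the component model).
    flow: optional (in_port, out_port) pair for a flow assumption.
    """
    def selected(ports, wanted):
        return [
            ref for (comp, port), ref in ports.items()
            if comp == component
            and data_type in port_types.get(port, ())
            and (wanted is None or port == wanted)
        ]

    sources = selected(mapping.in_ports, flow[0] if flow else None)
    if flow is None and (component, data_type) in mapping.data_sources:
        sources.append(mapping.data_sources[(component, data_type)])
    return {
        "entry_point": mapping.components.get(component),
        "sources": sources,
        "sinks": selected(mapping.out_ports, flow[1] if flow else None),
        "sanitizers": [mapping.sanitizers[sanitizer]] if sanitizer else [],
    }


# Hypothetical mapping for the CashDeskPC component (port names as in Listing 3).
mapping = MappingModel(
    components={"CashDeskPC": MethodRef("CashDeskPC", "run")},
    in_ports={("CashDeskPC", "pcCardReaderPort"): MethodRef("CashDeskPC", "readCardReaderPort")},
    out_ports={
        ("CashDeskPC", "pcPrinterPort"): MethodRef("CashDeskPC", "writePrinterPort"),
        ("CashDeskPC", "pcLightDisplay"): MethodRef("CashDeskPC", "writeLightDisplay"),
    },
    sanitizers={"CCSanitizer": MethodRef("CCSanitizer", "sanitize")},
)
port_types = {
    "pcCardReaderPort": {"CreditCardNumber"},
    "pcPrinterPort": {"CreditCardNumber"},
    "pcLightDisplay": {"CreditCardNumber"},
}

config = taint_config(
    mapping, "CashDeskPC", "CreditCardNumber", port_types,
    flow=("pcCardReaderPort", "pcPrinterPort"), sanitizer="CCSanitizer",
)
print([ref.method for ref in config["sinks"]])  # prints: ['writePrinterPort']
```

For a flow assumption only the two named ports are selected; without `flow`, every IN-port and OUT-port of the component that can carry the data type is used, plus the source method if the component is a data source.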

Instead of generating full analyses, we use the information stored in assumptions and the mapping model to configure taint analyses provided by mature frameworks such as Boomerang [8, 9].

Since assumptions are always specified for a specific component, the analyses are restricted to the corresponding implementation for this component as well. In general, both the read-messages for all IN-ports of the component that receive a specific datatype and (if the component is a source) the source-method for the data type are potential sources for the taint analysis. Similarly, all OUT-ports are potential sinks for the taint analysis. In the following, we describe how a taint analysis can be specified for each assumption based on our models.

We assume that the mapping model is fully specified and, therefore, provides methods for reading a data type from an IN-port, writing a data type to an OUT-port, sanitizing data types for each sanitizer, and for executing the component's behavior. The last method can be used as an entry point for the code analyses. If not specified, all publicly accessible methods have to be considered as potential entry points, e.g., public methods in Java. Methods for ports and sanitizers are used to configure the taint analyses. Both the methods for reading IN-ports of all ports that are capable of handling the data type to be analyzed, and a source method if the component is a source for the data type, are considered as sources in the taint analysis. Correspondingly, methods for OUT-ports are considered as sinks in the taint analysis. In the case of a flow assumption that explicitly defines a flow from one port to another, only methods for these two ports are considered.

When generating the analyses, we can reduce the search space by considering the information of the component model. In particular, we only take methods for ports into account that are capable of handling the data types under investigation. For example, let us assume that the component of the card reader (cf. Listing 1) is connected to the cash desk. When analyzing the implementation of the cash desk on the flow of credit card information, it is sufficient to take the port of this connection as a source for the credit card information.

After executing the analyses, the result shows if the assumptions are correctly implemented in the given implementation. An advantage is that not all analyses have to be re-evaluated if the source code for a component changes but only the analyses that are relevant for this component. Also, the security engineer can use this information either to consider this fact in the security model, e.g., by adding additional assumptions to other components, or to contact the developer of the components that do not comply with the assumptions.

3. Implementation

We implemented a prototype of our DSL and analyses using the Eclipse Modeling Framework (EMF). We chose to add a textual representation of the DSL using Xtext [10] and implemented a graphical editor using Sirius [11]. The source code for our implementation can be found on https://github.com/secure-software-engineering/cards

In the following, we briefly describe all parts of our implementation.

Textual and Graphical Editor The graphical editor for cards was implemented using Sirius. Figure 3 shows an example of the graphical editor. In addition, we provide a textual editor implemented using the Xtext framework. All changes made to the model in the graphical editor are also reflected on the underlying Xtext model. Hence, developers can switch at any time to the representation they prefer. Using the graphical editor, we can easily model systems or create representations for existing models. The diagram representation can be analyzed using Sirius' own tool to verify diagrams, which invokes our analyses using EMF validation; the results are shown in the model and the Eclipse problems view.

Analyses The analysis explained in Section 2.2 is implemented as a basic depth-first search. We treat the model as a directed graph and recursively propagate data types, which a component is source for, over outgoing edges. Output of this analysis is a mapping from components to all paths through the model. The assumption analysis explained in Section 2.2 iterates through the paths and determines the processed data per component. The output of this analysis is a mapping of components to pairs of lists of paths and data types, which are received on these paths. To resolve restrictions, we check for each restriction whether data types of the defined restriction are illegally accessible at a component.

Mapping Model As explained in Section 2.3, we created a mapping model, which maps model parts to Java code to ease the generation of static code analyses. This mapping is implemented as an EMF model. Empty mappings for new model parts are automatically added to this model when using our graphical editor suite. Instead of providing an additional DSL for the mapping model, we provide a properties view for relevant parts of the model in our graphical editor, where mappings can be edited.

Generation of Glue Code Using the Xtend framework, we implemented a code generation whose output can serve as glue code for Java implementations of a given model. Components are implemented as Java threads and all connections and mappings between component parts are implemented using the observer pattern. Communication is restricted to strings, but can be extended to arbitrary objects. Similar to our DSL, composite components handle the inter-component communication by instantiating connections. Additionally, all assumptions are added as documentation for the developer using Java annotations. Upon code generation, the mapping model is also created automatically.

Static Code Analyses Based on the concepts described in Section 2.3.2, we generate the configuration code for the static code analysis automatically using the Xtend framework. The generator takes the component model and the mapping model as input. All assumptions can be validated using taint analysis. Since we are focusing on Java code in our implementation, we decided to use the established analysis framework Boomerang [9, 8] for the specification and execution of the taint analyses. We generate the required taint analyses for each assumption. The generator can be adapted to any other framework that enables the specification and execution of taint analyses. This also allows one to use different languages for the implementation of the system's components.

4. Case Study

We evaluated cards using a case study based on CoCoME [12]. CoCoME is an established example for component modeling commonly used in research. The example system is a model of a store which is part of an enterprise. An enterprise consists of a server, client and several stores; each store consists of a server, client and several cash desks. A cash desk consists of a bar code scanner, a card reader, a cash box, a printer and a light display, all of which are connected to a cash desk pc, which also connects to a bank. Figure 3 shows the component model using our graphical editor. For our evaluation, we chose to base our model on CoCoME's first proposed use case, the sale. A sale is an interaction between a customer and a cashier. We model the complete cash desk, a bank and the store infrastructure. We adapted the data types provided in the reference implementation of CoCoME [13], as they are not part of the original definition. We used the case study as a proof of concept of cards itself. For our example, we defined a restriction that the credit card number and pin may only be accessed by the card reader, bank and cash desk pc. In the real world, the credit card number may be printed if partly replaced with asterisks, so a sanitization is a sensible approach.

Using the provided models of CoCoME, this restriction is not directly clear, as dataflows are not part of their modeling. With cards, we can already provide a formal restriction for this use case. Listing 2 shows the textual representation of this restriction. Upon validating the model, our analyses provide the developer with feedback that the current model violates the restriction because the credit card information may be accessed at every component, including the printer. To address this violation, we chose to define several dataflow assumptions for our model. Listing 3 shows a representation of the assumptions we made to resolve the violations. In particular, we assume that the credit card information will never be leaked to the light display, cash box and anything outside the cash desk component. Additionally, dataflows between pcCardReaderPort and pcPrinterPort of the cash desk pc component will be sanitized using the CCSanitizer. With these assumptions in place, the analysis does not show any violations for the restriction. Developers might find major security flaws in their architecture based on restriction violations, which may lead to architectural refactorings that resolve the violation.

We used cards to generate a Java project for the cash desk application and implemented the behavior code for the relevant components based on the documentation of CoCoME. Also, the corresponding mapping model and the static code analyses were created automatically. For the evaluation of the analyses, we created two versions of the implementation: one version violating the assumptions, which should therefore lead to a report by the analysis, and one version that respects the dataflow assumptions, e.g., by preventing dataflows or using the desired sanitizer. The analyses were able to find the incorrect dataflows. However, the evaluation showed that in the current implementation false positives might get reported if one specifies different policies for data of the same port. To solve this problem, either the developer needs to adjust the implementation, making sure that the data are filtered and correctly sanitized, or the result is fed back into the component model where the security engineer can split the dataflows such that the flows are analyzed separately.

The evaluation showed one major advantage of the approach. When the source of one component changes, only the analyses for this component have to be re-evaluated instead of analyzing the whole source code again. For example, assuming that the implementation of the component CashDeskPC changes, only the analyses for this component have to be executed. If the implementation of other components changes, no re-evaluation of these analyses is required. Especially for large-scale systems, this compositional approach can help to reduce the overall time for threat modeling and risk analysis.

5. Related Work

There are two major areas to which cards is related: threat modeling and model-based security testing. Threat modeling because cards enables threat modeling and analyses based on the created threat model. Security testing since cards aims to automate validating the implemented security assumptions.

Extended dataflow diagrams Berger et al. present an approach using extended DFDs [16] which are a more formal version of classical dataflow diagrams. Since these DFDs allow for formal analyses and hierarchical system specification, the approach allows for more precise threat modeling. In contrast, we base our threat modeling approach on established modeling artifacts, enabling the integration of our concepts into existing approaches. Peldszus et al. [17] provide an approach that aims at the connection from dataflow diagrams to source code and is therefore also highly related to our approach. This approach enables more precise threat modeling because the actual implementation is respected in the threat model. In contrast, cards focuses on a top-down approach enabling early analyses without a code base.

Also, model-driven and model-based security grew to a large research area in the last years [18]. An overview of approaches in general can be found in the mapping study by Nguyen et al. [19]. Several approaches integrate security modeling into existing modeling approaches, e.g., SEED [5] or UMLsec [4]. SEED [5] is an approach that aims at building a bridge between embedded system
experts and security experts. In SEED, security experts can define security solutions that can be used during the system design and to validate the system based on the integrated security solutions. In contrast, cards focuses on the definition of assumptions at design time and the validation on source code level instead of defining concrete security solutions that are integrated into the system design. UMLsec [4] provides a UML profile providing modeling concepts and analyses for security-relevant system properties. In contrast to UMLsec, cards focuses on the connection of design-time assumptions and the source code implementation, leaving model-driven concepts like concrete behavior modeling out.

Threat Modeling For threat modeling, often dataflow

diagram based approaches are applied because of the simplicity and technology-agnostic modeling [2]. Most prominent examples are the STRIDE approach [14] or LINDDUNN[15] for privacy-focused threat modeling. cards is related to these approaches since it also utilizes an architectural description of the system. However, in contrast, cards focuses on seamless threat modeling by combining threat modeling and analyses on the actual implementation. Currently, cards does support finding known threats automatically but we plan to implement this in future work.

Several approaches enhanced the use of data-flow diagrams to improve threat modeling and risk analysis.
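As an illustration only, the following Java sketch shows how a dataflow restriction on the cash desk model could be checked against design-time assumptions at the model level. It is not the cards DSL, its model validation, or the generated taint-analysis configuration. The class RestrictionCheckSketch, all of its methods, and the port names printerPort, pcLightDisplayPort, and pcCashBoxPort are assumptions introduced for this example; only pcCardReaderPort and pcPrinterPort are taken from the case study. The sketch simplifies an assumption to "this connection does not propagate the unsanitized asset" and reports every forbidden port that remains reachable from the source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model-level check; hypothetical names, not the cards implementation. */
final class RestrictionCheckSketch {

    /** Directed port-to-port connections of the (hypothetical) component model. */
    private final Map<String, List<String>> connections = new HashMap<>();

    /** Connections that an assumption declares as blocked or sanitized. */
    private final Set<String> assumedSafeEdges = new HashSet<>();

    void connect(String from, String to) {
        connections.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Records the assumption that no unsanitized data flows from one port to another. */
    void assumeNoUnsanitizedFlow(String from, String to) {
        assumedSafeEdges.add(from + "->" + to);
    }

    /** Breadth-first search over all connections that are not covered by an assumption. */
    Set<String> reachableFrom(String source) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        seen.add(source);
        work.add(source);
        while (!work.isEmpty()) {
            String port = work.remove();
            for (String next : connections.getOrDefault(port, Collections.emptyList())) {
                boolean coveredByAssumption = assumedSafeEdges.contains(port + "->" + next);
                if (!coveredByAssumption && seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return seen;
    }

    /** Returns the forbidden ports that the asset may still reach from the source. */
    List<String> violations(String source, List<String> forbidden) {
        Set<String> reachable = reachableFrom(source);
        List<String> result = new ArrayList<>();
        for (String port : forbidden) {
            if (reachable.contains(port)) {
                result.add(port);
            }
        }
        return result;
    }

    /** Builds a small, made-up excerpt of a cash desk model. */
    static RestrictionCheckSketch cashDeskModel() {
        RestrictionCheckSketch model = new RestrictionCheckSketch();
        model.connect("pcCardReaderPort", "pcPrinterPort");
        model.connect("pcPrinterPort", "printerPort");
        model.connect("pcCardReaderPort", "pcLightDisplayPort");
        model.connect("pcCardReaderPort", "pcCashBoxPort");
        return model;
    }

    public static void main(String[] args) {
        RestrictionCheckSketch model = cashDeskModel();
        List<String> forbidden =
                Arrays.asList("printerPort", "pcLightDisplayPort", "pcCashBoxPort");
        System.out.println("Without assumptions: "
                + model.violations("pcCardReaderPort", forbidden));

        // Assumptions in the spirit of the case study (sanitized or prevented flows).
        model.assumeNoUnsanitizedFlow("pcCardReaderPort", "pcPrinterPort");
        model.assumeNoUnsanitizedFlow("pcCardReaderPort", "pcLightDisplayPort");
        model.assumeNoUnsanitizedFlow("pcCardReaderPort", "pcCashBoxPort");
        System.out.println("With assumptions: "
                + model.violations("pcCardReaderPort", forbidden));
    }
}
```

In this simplified view, each recorded assumption is exactly what a static code analysis would later have to confirm on the implementation of the component that owns the connection, which is why a change to one component only affects the checks attached to that component.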

Model-based Security Testing. Following the classifications discussed in a survey by Felderer et al. [20], two principal approaches to security testing are distinguished in general: testing to find vulnerabilities and unknown threats in the system, and testing whether the security mechanisms are implemented correctly [21]. The first category does not fit cards, since we use threat modeling techniques to define security requirements and threats in the initial steps, but cards does not contribute to finding new threats or vulnerabilities by itself.

Following Schieferdecker et al. [22], models that are used for model-based security testing can be categorized into three major categories: First, architectural and functional models, which “are concerned with system requirements regarding the general behavior and setup of a software-based system” [22]. Second, threat, fault, and risk models that “focus on what can go wrong” [22] and are used to determine potential threats, corresponding risk factors, and their relationships, e.g., STRIDE [1]. Third, weakness and vulnerability models describing “the weakness and vulnerabilities itself” [22], e.g., models referring to CVE or CWE but also catalogs for generating threat lists like in the Microsoft Threat Modeling Tool [1]. cards provides a combination of the approaches of the first and second category because it utilizes architectural models for describing a secure system architecture but also concepts and analyses for reasoning about dataflow threats in the system. In contrast to existing approaches, cards combines a lightweight threat modeling approach on abstract design models with concrete analyses on the implemented system and, therefore, enables seamless threat modeling of a system. Providing vulnerability and attack catalogs or the integration of CVEs is currently not supported and left for future work.

6. Conclusion

Modern information systems require development techniques that ensure security-by-design. Dataflows within a system are of particularly high interest since data is often a sensitive asset of the system. The early creation of a threat model, but also the seamless integration of the threat model into all development steps of the system, are essential to this end. In this paper, we have presented cards, a model-based threat modeling approach for dataflows in distributed systems. We discussed our concepts based on a generic component model. cards allows one to formally specify security requirements for sensitive data of the system and to validate these requirements on the architectural level by defining assumptions for the system's components that need to be fulfilled in the implementation. For this, we provide a DSL that allows defining both requirements and assumptions for a component-based system specification. Using this systematic approach helps designers identify required dataflow rules for the implementation at early development steps. These rules (assumptions) can be useful in different ways: On the one hand, when implementing a new system, they can be used as requirements for the later implementation. On the other hand, they can be used to validate whether an already implemented system complies with the security assumptions.

Furthermore, we provide a concept of how these assumptions can be expressed by static code analyses, allowing one to automatically validate the assumptions on a given implementation. The advantage of this modular approach compared to approaches that validate security requirements is that assumptions are defined component-wise and, therefore, only the code for affected components has to be analyzed. This is especially important if the source code for only one component changes and the requirements have to be re-evaluated. Also, connecting a threat model on the architectural level with concrete analyses on the source code level helps feed back analysis results into the threat model. This simplifies reasoning about the effects of the analysis results.

We provide a prototypical implementation of cards containing a graphical and textual editor for the component model and our DSL for describing assumptions and restrictions, and we evaluated our concepts based on a use case of the CoCoME case study. To ease the process of connecting threat model and code, we provide a generator to Java code that automatically creates a mapping model describing the connections from model elements to dedicated Java methods. For existing system implementations, the approach is currently limited in efficacy because the mapping model that connects the component model used for threat modeling and the source code has to be created manually. However, we see potential to automate this step in future work. We also plan to extend the approach by taking the kind and security level of data types and components into account when analyzing the model. This would enable security engineers to apply concepts of DFD-based threat modeling (like in STRIDE) to the component model and to search for required restrictions and corresponding assumptions automatically.

We see cards as a promising combination of lightweight threat modeling and concrete security analyses on source code, which can help system developers to create more secure large-scale distributed systems.

References

[1] A. Shostack, Threat Modeling: Designing for Security, John Wiley and Sons, Indianapolis, USA, 2014.
[2] L. Sion, K. Yskout, D. Van Landuyt, A. van den Berghe, W. Joosen, Security threat modeling: Are data flow diagrams enough?, in: IEEE/ACM 42nd