=Paper=
{{Paper
|id=Vol-2581/emls2020paper1
|storemode=property
|title=Model-driven Development of Evolving Secure Software Systems
|pdfUrl=https://ceur-ws.org/Vol-2581/emls2020paper1.pdf
|volume=Vol-2581
|authors=Sven Peldszus
|dblpUrl=https://dblp.org/rec/conf/se/Peldszus20
}}
==Model-driven Development of Evolving Secure Software Systems==
<pdf width="1500px">https://ceur-ws.org/Vol-2581/emls2020paper1.pdf</pdf>
<pre>
       Model-driven Development of Evolving Secure
                    Software Systems
                                                                  Sven Peldszus
                                                         University of Koblenz-Landau
                                                              Koblenz, Germany
                                                          speldszus@uni-koblenz.de


   Abstract: Software systems are continuously entering more and more parts of our lives and
have to deal with a higher amount of sensitive data than ever before. At the same time, these
software systems get more complex and have to be maintained over long periods. One approach to
deal with the issues arising from these trends is model-driven software development (MDD).
Much research has been done on automating MDD approaches and integrating the different
artifacts used. However, there are still plenty of issues that have to be solved. In this
work, we discuss the three main challenges we encountered during the development of our MDD
approach GRaViTY. GRaViTY supports developers in the model-driven development, maintenance,
and evolution of long-living secure software systems. Thereby, GRaViTY itself leverages
multiple MDD approaches.
   Index Terms: model-driven development, software engineering, evolution, maintenance,
security

                         I. INTRODUCTION

   Modern software systems tend to be used on a long-term basis in environments prone to
changes, are highly interconnected, are continuously extended with new features, and often
process security-critical data [1], [2], [3]. These trends make it harder to keep up with the
ever-changing security precautions, attacks, and mitigations, which is vital for preserving a
system’s security. Model-driven development (MDD) enables us to address security issues
already in the early phases of the software design, such as in UML models defined at design
time [1]. Unfortunately, the specification of a system’s security assumptions and
documentation is often inconsistent with its implementation [4]. The continuous changes in the
security assumptions and the design of software systems (for instance, due to structural
decay [5]) have to be reflected in both the system models (e.g., UML models) and the system’s
implementation (including program models used, e.g., for static analysis or verification).
   The tracing between the different artifacts, needed for deciding which change is necessary
at which location in the system and on which of the many artifacts, currently has to be
performed manually by developers. The effort for the creation of such mappings after the fact
is still high even if this process is guided by tool support, e.g., for the creation of
mappings between models and code [4]. For this reason, we have to maintain mappings between
the different artifacts used in the different phases of development from the very beginning
and automate them as much as possible.
   To tackle these challenges, we started to develop the GRaViTY framework [6]. This framework
allows us to automatically create and maintain trace links between different artifacts, such
as UML models, Java source code, and program models for analyses, created during the
development of a system. Starting from early design-time models until the creation and
maintenance of the code, this framework is intended
   1) to maintain trace links between these artifacts,
   2) to keep all artifacts up to date if one artifact is changed,
   3) to specify security requirements on the most suitable representation of a system, and
   4) to continuously check all system representations for security violations.
   To solve these challenges, we mainly utilize bidirectional graph transformation approaches.
In this work, we give an overview of our solutions and discuss the three main challenges we
faced:
   1) Transformation between models with different granularity
   2) Incremental updates of abstract syntax trees (AST)
   3) Maintaining networks of transformations
   In what follows, we introduce the relevant background on our assumptions about MDD as well
as on security checks in Sec. II. Afterward, in Sec. III, we give a brief overview of the
GRaViTY framework. In Sec. IV, we discuss the challenges we faced and present our solutions as
well as the challenges that still have to be overcome. How others dealt with the same
challenges is discussed in Sec. V. Finally, we conclude in Sec. VI.

                         II. BACKGROUND

   In this work, we present an approach for supporting developers in the model-driven software
development (MDD) of secure software systems. To explain our approach and the challenges that
we identified during its implementation, in this section we introduce our underlying
understanding of MDD and the artifacts we use, as well as the different security checks that
our approach combines for enforcing the development of a secure system.

A. Model-driven Software Development
   In this work, we are building upon the concept of model-driven software development
(MDD) [7]. MDD allows developers to specify the system and its properties on a higher level of
abstraction than the source code level [8]. Thereby,


Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
developers should specify all additional properties, like security assumptions, only once on
the most suitable level of abstraction. While MDD can cover many kinds of models, we focus on
UML models [9].
   In GRaViTY, the single models are iteratively refined until we reach a concrete
implementation of the system. Fig. 1 shows the refinement hierarchy of the model kinds
currently considered by us, from the most abstract model at the left to the final
implementation at the right. All in all, we consider UML models with three different levels of
granularity.

[Fig. 1: Artifacts used in Model-driven Software Development. Shown from left to right: UML
Domain Model, UML Design Model, UML Implementation Model, Code.]

     a) Domain Model: The most abstract model is a domain model, specifying general properties
of the domain in which the software to be developed is placed [10]. Domain models are used in
the earliest phases of software development to capture general properties about a system’s
domain. Often, domain models are specified using UML class diagrams.
   Fig. 2 shows an excerpt of a domain model for hospitals. In hospitals, two kinds of people
play a central role: patients and the doctors who treat them. Both have a name and a
homeAddress. For patients, usually, a list of allergies is stored and for doctors a list of
their specialties. A doctor can examine a patient in an Examination and create a Diagnosis in
such examinations.

[Fig. 2: Excerpt of a Hospital Domain Model based on [11]. Classes shown: Person (annotated
<<critical>> {secrecy={homeAddress : Address}}; +/ name: FullName [1], + homeAddress:
Address [1]), Patient (+ allergies: String [*]), Doctor (+ specialities: String [*]),
Examination, and Diagnosis; association "treats" between patients and doctors.]

   When we implement a software system for a hospital, e.g., iTrust [12] for online management
of patient data, we have to support the concepts captured in the domain model.
     b) Design Model: After the specification of the domain model, the domain elements
realized in the software are concretized in design models. Those design models specify the
design of the system and how the functionality is distributed across the system. Thereby, the
foundation of an easily maintainable system is set by the appropriate use of well-known design
patterns [13]. This is also the first point at which we have to start to continuously use
design and security analyses to ensure the system’s maintainability and security.
   Fig. 3 shows an excerpt of a design model for iTrust, based on a UML model
reverse-engineered by Bürger et al. [3]. In this model, different controls are specified for
using the iTrust platform, e.g., a login control, a control for documenting an office visit or
for entering a diagnosis, as well as a more detailed data structure than in the domain model.

[Fig. 3: Excerpt of a Design Model for iTrust based on [3]. Classes shown: Control,
OfficeVisitControl (+ data: OfficeVisit [1]; dialog operations such as + openNotesDialog(),
+ openPrescriptionDialog(), + openDiagnoseDialog(), + billing()), DiagnosisControl (+ data:
Diagnosis [1]; + finishModification()), LogInControl (+ data: User [1]; + login(),
+ resetPasswort(), + chooseUser()), and, in package data, User (annotated «critical»;
+ password, + firstName, + lastName, +/ name: FullName [1], + streetAddress1, + city, + state,
+ zip, +/ homeAddress: Address [1]) and Patient.]

   The different controls specify essential actions that can be performed, e.g., the option to
reset the password in the login window. For a login of a user, the system needs the user
information to identify and legitimize the user. For this purpose, the LogInControl accesses
the data available in the User object given to it.
   The data used by the system is detailed in this model. For example, the classes User and
Patient can be seen as more concrete instances of the classes Person and Patient from the
domain model in Fig. 2. On the Person class, for example, it is explicitly specified that the
homeAddress attribute, already known from the domain model, is derived from other attributes.
   While models with different abstractions are often created separately, we encourage
developers to model the refinement information explicitly by the use of inheritance relations
between elements. For example, there should be an explicit inheritance relationship between
the User in the design model and the Person in the domain model.
     c) Implementation Model: Precise functionality is specified in an implementation model.
The implementation model is usually the first platform-dependent model and contains
information about the deployment or the languages used to implement the system. The
implementation model can be executed directly, used for code generation, implemented manually,
or handled by a combination of these. In our approach, we support a combination of code
generation and manual implementation.
   Fig. 4 shows an excerpt of an implementation model showing how the iTrust platform could be
deployed in a hospital. The model is based on our experience in modeling the hospital system
of a partner in the VisiOn EU project together with them but does not show a real system [14].
   Inside the hospital, two servers are operated, one running iTrust and one running a
database as well as an authentication service. Doctors access the iTrust system from the
hospital’s local network. Patients can get access to their data from the outside but have to
authenticate themselves at the authentication service provided by the hospital.
[Fig. 4: Excerpt of an Implementation Model for iTrust based on [14]. Elements shown:
Hospital, iTrustServer, WebServer, MobileDevice; connections labeled «LAN» and «Internet»
«encrypted»; «deploy» relations to the artifacts iTrust, Database, Doctor,
AuthentificationService, and Patient; dependencies annotated «call, secrecy, integrity».]

[Fig. 5: Concept of the Framework. Elements shown: UML (Domain Model, Design Model,
Implementation Model) with UMLsec and CARiSMA Security Checks; Java with UMLsec as Java
Annotations; Program Model with Security Checks.]

B. Security Checks
   For the enforcement of security requirements, we can make use of various kinds of security
checks supported by GRaViTY. In what follows, we give a brief overview of the different kinds
of security checks and the stages of model-driven software development in which they can be
applied.
   1) Model-based Checks: According to the principle of security by design, the system to be
developed should be checked for security issues early during its development.
   The UMLsec [1] approach, integrated into GRaViTY, allows the specification and checking of
essential security requirements already at design time. In UMLsec, UML models are annotated
with security requirements like security levels of class members. These security annotations
are checked for their compliance with different security policies.
   In the given example, the class Person from the domain model in Fig. 2 is annotated with
the UMLsec stereotype <<critical>>, which specifies that the attribute homeAddress is on the
security level secrecy, meaning that only legitimate entities are allowed to read its value.
As part of the <<secure dependency>> security policy, UMLsec allows checking whether this
domain model or any model refining it contains insecure uses of attributes or operations that
are annotated with a security requirement.
   Here, we can use the refinement relations between the different model kinds for detecting
security violations at no additional cost for considering multiple models. Also, if a security
requirement is changed in one representation, we can immediately see the impact on the other
UML representations.
   In the implementation model, we also annotated the calls and communication paths with
UMLsec stereotypes. For example, all data transferred from and to the doctors is sent over an
internal LAN connection and all data sent from and to the patients is sent over an encrypted
internet connection.
   2) Static Code Analysis: Static code analysis is usually used to detect security issues
during software implementation. Thereby, the analysis tools are often integrated within the
development environments or build processes.
     a) Analysis of API calls: Many approaches locally analyze calls to critical APIs and
whether the chosen parameters have been selected securely. This covers, for example, calls to
crypto APIs [15] or SQL queries [16]. While those approaches are important for the development
of secure systems, in this work we focus more on the question of whether, e.g., the use of a
crypto API has been implemented at the point specified in the models.
     b) Secure data flow analysis: A common approach to detect leaks of secret data is a
secure data flow analysis. One of the main problems for a precise data flow analysis is the
classification of critical sources and sinks. Many tools are based on shared libraries of
well-known critical sources and sinks, created manually or by machine learning [17]. However,
more precise information, especially about critical sources, is available in design-time
models, e.g., annotated with UMLsec. For example, in Fig. 2 we declared the property
homeAddress to contain secret values, which has to be considered during a secure data flow
analysis.
   While all these different security checks on the different artifacts can help in the
development of a secure system, they are often limited to their area of focus. However, such
security checks are more powerful when they are combined. For example, information required by
a security check on a lower level has often already been defined at design time. This
information should be reused to avoid misunderstandings and divergence in the security
assumptions but also to improve the effectiveness of the checks. Unfortunately, doing so is
challenging and should be assisted by tool support.

                         III. GRAVITY FRAMEWORK

   Our proposed framework, called GRaViTY [6], supports developers in applying the
model-driven development approach, as described in Sec. II-A, to the development and
maintenance of secure long-living systems. As shown in Fig. 5, design models (e.g., specified
in UML), source code (e.g., written in Java), and a program model for performing sophisticated
analyses, e.g., the security checks from Sec. II-B, are continuously synchronized for covering
the different phases of software development.
   The program model provides a high-level abstraction from the pure Java source code [18],
e.g., by reducing details from the statement level to access edges between the single
members. In addition, easy-to-query structures are created,
                                                                             (a) MoDisco Java Model  Program Model


       Fig. 6: Excerpt from the iTrust Program Model


                                                                          (b) MoDisco Java Model  UML Class Diagram
such as structuring methods and fields into a tree with
names, signatures, and definitions. For example, Fig. 6 shows            Fig. 7: TGG Transformation Rules for Methods
a program model with two different method signatures for
the method name updateInformation. For the signa-
ture with the parameter types EditOfficeVisitForm and             integrated into GRaViTY. Unfortunately, unlike the mappings
Boolean, a definition from the class EditOfficeVisit              using TGGs, there is currently no automation in updating the
is shown, which calls another method definition. This allows      different UML models.
the easy specification of, e.g., refactorings [19], [18], anti-      Let us assume a change in the security knowledge and
pattern detection [2] and elimination [20], or compliance         look at how the developed hospital system can be adapted
checks with models [4].                                           to this change using the GRaViTY framework. Due to the
   Security-related specifications are introduced into the dif-   introduction of the European General Data Protection Regu-
ferent artifacts as annotations. On UML models, we use            lation (GDPR) [24], we got a stronger restriction in the ways
the UMLsec profile for security annotations proposed by           how we have to deal with personal data. Before the GDPR
Jürjens [1]. Similar annotations are specified as Java anno-     became valid, it was legal to identify patients based on their
tations on the source code level and are also contained in        names. This information has to be treated with more sensitivity
the program model, like the TSecrecy annotation in Fig. 6         now. This change in the security knowledge can, for exam-
which relates to the secrecy value of the <<critical>>            ple, be reflected in annotating the Patient in the domain
annotation in Fig. 2. Here, GRaViTY mainly allows the reuse       model in Fig. 2 with the UMLsec stereotype <<critical>>
of security requirements across the different artifacts. For      {secrecy={name:FullName}} expressing that the access
example, as discussed in Sec. II-B, the UMLsec security           to this information is only allowed for legitimate cases. As this
annotations can be used to determine the sources and sinks        security annotation is inherited by the more concrete subtypes,
of a secure data flow analysis.                                   the secure dependency check will fail after this change as
                                                                  there are no corresponding changes on the other elements.
   To keep the different artifacts consistent, we employ triple
                                                                  Accordingly, this gives a list of accesses to the developers,
graph grammars (TGG) [21] for a bidirectional synchroniza-
                                                                  which have to be checked for this purpose. To do so, the
tion between the source code and the program model repre-
                                                                  developers have to look into the documentation and can follow
sentation of Java programs [18] as well as UML models. Our
                                                                  the trace links generated by GRaViTY. Furthermore, they can
implementation is based on the eMoflon graph transformation
                                                                  use the TGGs to transfer the new security annotations into the
engine [22]. Among others, eMoflon allows the specification
                                                                  code and re-execute the security analyses to get more detailed
and execution of TGGs between models specified using the
                                                                  feedback about the compliance of the implementation.
Eclipse Modeling Framework (EMF). While the UML models
                                                                     To conclude, our TGGs provide an automated mechanism to
and the program model are specified using EMF, we have to
                                                                  preserve consistency between the three different program rep-
parse the Java source code to create an EMF model from it.
                                                                  resentations for managing evolving Java programs. As a result,
For this purpose, we are currently using MoDisco [23].
                                                                  we obtain a model-based framework for arbitrarily interleaving
   Fig. 7 shows two transformation rules from these TGGs,         program evolution and maintenance steps. Furthermore, we
which translate a method declared by a type to a method           can use this approach to also translate and synchronize security
definition in the program model or an operation in a UML class    requirements of model elements between different system
diagram, respectively. Between the models, a correspondence       representations to execute sophisticated security checks on
model is built that allows the synchronization of changes         them as discussed in Sec. II-B.
made on one of the two sides of the rule.
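   As a minimal sketch of the role of such a correspondence model, the
following Java class stores one trace link per rule application and allows
lookups in both directions. The class CorrespondenceModel and its methods are
hypothetical names chosen for illustration. This is not the eMoflon API, and
the sketch assumes one-to-one links between elements.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified correspondence model with one-to-one trace links. */
final class CorrespondenceModel<S, T> {
    private final Map<S, T> forward = new LinkedHashMap<>();
    private final Map<T, S> backward = new LinkedHashMap<>();

    /** Records that a rule application translated source into target. */
    void link(S source, T target) {
        if (forward.containsKey(source) || backward.containsKey(target)) {
            throw new IllegalStateException("element is already linked");
        }
        forward.put(source, target);
        backward.put(target, source);
    }

    /** Follows a trace link from the source side to the target side. */
    Optional<T> targetOf(S source) {
        return Optional.ofNullable(forward.get(source));
    }

    /** Follows a trace link from the target side back to the source side. */
    Optional<S> sourceOf(T target) {
        return Optional.ofNullable(backward.get(target));
    }

    /** Removes the link of a deleted source element and returns its counterpart, if any. */
    Optional<T> unlinkSource(S source) {
        T target = forward.remove(source);
        if (target != null) {
            backward.remove(target);
        }
        return Optional.ofNullable(target);
    }
}
```

For example, linking a MethodDeclaration to its TMethodDefinition lets a
deletion on the source side be resolved to the counterpart that has to be
removed on the other side as well.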
   The single UML models are directly connected by inheritance                  IV. CHALLENGES TO OVERCOME
relations, e.g., the User in the design model (Fig. 3) is            During our work, we faced various challenges of which
a subtype of the Patient in the domain model (Fig. 2). This       some have been solved by us, some have been circumvented
allows easy detection of changes that lead to inconsistencies,    by us, for some we have ideas on how to deal with them,
as the inheritance relations can be used as trace links, as       and some are still open challenges. In this section, we are
demonstrated in Sec. II-B1. For this reason, the UMLsec tool is   discussing the three most important challenges we faced.
Fig. 8: Challenge A: Transformations between Models with            (Fig. 9 labels: Source Code, MoDisco Model, Correspondences,
Different Granularity                                               Program Model; steps: 1. discover, 2. change, 3. discover)


A. Transformation between Models with Different Granularity
   Changes between synchronized UML models and code can                   Fig. 9: Challenge B: Incremental Updates of ASTs
easily be propagated, when they are on the same level of
granularity, e.g., using the TGG-based approach presented
in Sec. III. Models that are modeled by developers in early         code, we generate a UML class diagram on the granularity of
phases, e.g., the iTrust design model in Fig. 3, have a different   the implementation, that is kept in sync with the implementa-
granularity. Nevertheless, we have to be able to apply our syn-     tion, into the implementation model. This generated model is
chronization approach also to those manually defined models.        initialized based on elements available in the implementation,
Accordingly, the first challenge is how to deal with such a         architectural, and design model. Afterward, code stubs, as well
different granularity, as shown in Fig. 8. During the development   as trace links, can be generated from this generated model.
of GRaViTY, we faced this issue in three different variations.      If one of the inheritances is lost, e.g., due to a deletion, it
      a) Program Model: Our TGGs have proven to                     has to be manually recreated by a developer. Additions in
be good at handling different granularity by not translating        the implementation are automatically synchronized into the
elements, e.g., all details from the method bodies available        generated part of the implementation model.
in the MoDisco model but not in the program model. Un-                    c) Security Requirements: The same as discussed before
fortunately, they cannot be used for creating structures that       holds also for the different security requirements specified
differ completely on the two sides. For this reason, we had to      on each of these three system representations. While those
implement multiple preprocessing steps extending the different      security requirements should express the same assumption on
models with such structural information.                            every representation, this assumption should be expressed in an
   One example is the method representation as name, sig-           appropriate granularity on each representation: a more detailed
nature, and definition, shown in Fig. 6. While it is pos-           specification is required on the source code level than in the
sible to create this structure using TGG rules by creat-            domain or architectural model.
ing the whole structure when a method name is translated
the first time and inserting afterward, this produces issues        B. Incremental Updates of Abstract Syntax Trees (ASTs)
in the synchronization of changes. Let us assume that the              One of the biggest issues we faced is the loss of information
TMethodName node in Fig. 6 has been created when the                that was added manually or automatically to the created model
method in the class EditOfficeVisit has been translated             when the source code has to be parsed again. The same issue
and the other signature has been added afterward reusing this       has been faced by representatives from industry we talked to.
TMethodName node. In a refactoring, e.g., a pull-up method          This problem mainly covers the loss of generated or manually
refactoring [19], we now delete the TMethodDefinition               added annotations, such as the TSecrecy annotation in
defined by EditOfficeVisit and are going to synchronize             Fig. 6, and trace links to other artifacts, e.g., as part of the
this change into the source code. To do this, we have to            correspondence model built by the TGGs, as shown at the top
undo all rule applications that led to the creation of deleted      of Fig. 9. Usually, the discovery of a model after changes on
nodes or edges. As the creation of the TMethodName node             the code is executed from scratch, resulting in an entirely new
took place in the same rule application as the creation of          model. As the added annotations and trace links reside in the
the deleted TMethodDefinition node, it will be as if this           old model (as shown in Fig. 9), this results in the loss of all
TMethodName node had never been created. This also makes            added information that has not been written into the source
the creation of the other TMethodSignature node invalid             code. This means we would not be aware of the TSecrecy
as its context does not exist anymore, leading to a situation       annotation on the method definition in Fig. 6 anymore and
in which no recovery is possible. To deal with this issue we        have no correspondences to the new model, as shown on the
defined a preprocessing which already creates the required          bottom of Fig. 9.
structure on the side of the MoDisco model.                            As there can be much information annotated to the models,
   Unfortunately, the handling of such issues by preprocessing      if all this information were written into the source code,
brings an additional level of complexity to the implementation      it could become unreadable. Also, this information is already
and makes the synchronization more difficult as most prepro-        stored at a different location and should not be duplicated.
cessings also need a postprocessing undoing them.                      In our case, the MoDisco framework builds an entirely new
      b) UML Models: To overcome the different granularity          model each time, leading to the problem that the trace links
between the manually maintained UML models and the source           of the TGG still point to the old model. The representatives
(Fig. 10 labels: Tester, Transformation, change)                   Model transformations. Whenever we need a transformation
                                                                   between the UML Class Diagram and the Program Model we
                                                                   have to execute these two transformations in a row.
                                                                      To speed up this process, we tried to generate a UML
                                                                   Class Diagram  Program Model transformation from the
                                                                   two specifications we already had. As the two transformations
                                                                   are translating the same elements from the MoDisco model, it
Fig. 10: Challenge C: Maintaining Networks of Transformations      should be straightforward to specify such a transformation.
                                                                   For example, as the two rules in Fig. 7 both translate a
                                                                   MethodDeclaration, we can derive that we have to
                                                                   translate an Operation to a TMethodDefinition.
from industry had the same problem with embedded C code.           Unfortunately, while doing this, we had to learn that we have to
To deal with this issue, we generate and apply model patches to    resolve many inconsistencies first. Due to a very detailed test
follow the source code changes. For this purpose we calculate      suite for the two transformations, this was surprising for us.
the differences between the old and changed model using EMF           In this test suite, we created minimal examples for most Java
Compare, facing the following issues:                              language features to test the translation of these features. These
     a) Scalability: At first EMF Compare behaved very well        features range from a simple class definition to exotic features
when we applied it to small artificial changes but did not scale   such as the definition of an inner class inside of an anonymous
on real changes. In our previous works, we built a test set        class. All in all, our test suite contains 77 input models that can
of open-source projects with different sizes (between 5,800        be given into the two transformations as well as the expected
and 200,000 lines of code) [2]. The creation of the program        outcomes. In total, we have 231 test cases in this test suite,
model takes between a few seconds and a few minutes, depending     77 for the preprocessing common to the two transformations as
on the size of the program. For each program, we tried to          well as 77 test cases for each of the transformations. Besides,
calculate the differences between different versions of each       we regularly execute the transformation on our test set of open-
program. While it took e.g. 7s to build a program model for        source projects, introduced in Sec. IV-B.
JUnit version 3.8.1 (32,300 nodes in the MoDisco model) it            While testing the transformations we also had to deal with
took EMF Compare already 37s to calculate the differences          the model comparison problem discussed in Sec. IV-B. Due to
between JUnit version 3.8.1 and 3.8.2. For our next bigger         the high complexity of the models, it was also not possible to
program GanttProject (146,000 nodes) the TGG is executed           compare the generated model directly with an expected model
in 6s but EMF Compare did not even finish the comparison           without getting pseudo differences. Our solution to this was to
after 30 minutes.                                                  specify essential expected patterns in Henshin rules [26] and
     b) Pseudo Differences: An additional issue we discov-         to check whether the rules match as expected.
ered is regarding the quality of the differences calculated by        We had to learn that already a network with only two
EMF Compare. We got many differences containing multiple           interacting transformations is hard to maintain. Thereby, we
changes that reverted themselves.                                  are in line with the observations of Stevens for bidirectional
   Helpful tool support going in the same direction, that          transformations [27].
could be utilized for this, might be a TGG-based consistency          As the test suite, which contains common input models
check [25]. In this case, we have to specify a TGG transfor-       to both transformations and expected outputs to the two
mation from the MoDisco meta-model to MoDisco that can be          transformations, was not enough to avoid an unnoticed diver-
used to detect the differences between two MoDisco models.         gence, we are currently thinking about other ways to test the
   However, the best would be an incremental parser for Java,      transformation. One of the ideas we are currently thinking of is
that updates an initially created model each time it is executed   to specify the third missing transformation and to test by using
on the same code again. Unfortunately, there are only a few        a round-trip execution. To sum up, additional tool support for
works on this and none supports EMF.                               the maintenance of transformations is strongly required.
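   As a minimal sketch of how the pseudo differences described in
Sec. IV-B could be filtered, the following Java code collapses all changes on
the same element and feature into one net change and drops net changes that
end at their starting value. The types PseudoDifferenceFilter and Change are
hypothetical and are not part of EMF Compare. The sketch assumes that the
changes are given in the order in which they apply and that only attribute
values are affected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class PseudoDifferenceFilter {

    /** Hypothetical attribute change; not an EMF Compare type. */
    static final class Change {
        final String element;
        final String feature;
        final String oldValue;
        final String newValue;

        Change(String element, String feature, String oldValue, String newValue) {
            this.element = element;
            this.feature = feature;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    /** Returns the net changes, omitting those that revert themselves. */
    static List<Change> collapse(List<Change> changes) {
        Map<List<String>, Change> net = new LinkedHashMap<>();
        for (Change c : changes) {
            List<String> key = Arrays.asList(c.element, c.feature);
            Change merged = net.get(key);
            // Keep the first old value and the latest new value per element and feature.
            net.put(key, merged == null
                    ? c
                    : new Change(c.element, c.feature, merged.oldValue, c.newValue));
        }
        List<Change> result = new ArrayList<>();
        for (Change c : net.values()) {
            // A net change from a value to the same value is a pseudo difference.
            if (!Objects.equals(c.oldValue, c.newValue)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

Such a filter only addresses the quality of the reported differences and
does not help with the scalability issue of the comparison itself.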

C. Maintaining Networks of Transformations                                               V. RELATED WORK
   The last challenge is regarding the maintenance of networks        In this section, we discuss how others dealt with the same
of transformations. According to Fig. 10, we have a network        challenges we faced in comparable approaches.
of transformations and are going to change one of the trans-          In the single underlying model approach (SUM), Atkinson
formations. The open question is how to systematically derive      et al. define a single model, that is able to express all
the required changes on the other transformations and what a       information about the system [28]. Suitable views according
tester for detecting divergences has to look like.                 to the current task are extracted from this model. The SUM
   While Fig. 5 shows three transformations between the            is comparable to the different connected UML models of our
different artifacts, by now we only implemented two of them.       approach, in which we integrate all design-time information.
To be more precise, these are the MoDisco Java Model              SUM supports an automated extraction of views that could be
UML Class Diagram and the MoDisco Java Model  Program             helpful in GRaViTY for manual edits of the generated parts of
the implementation model. While we support well-known plain                                       REFERENCES
UML, SUM made many modifications to the UML to support              [1] J. Jürjens, Secure Systems Development with UML. Springer, 2005,
all those kinds of different abstractions. Also, SUM does not           Chinese translation: Tsinghua University Press, Beijing 2009.
provide an integration with the concrete implementation.            [2] S. Peldszus, G. Kulcsár, M. Lochau, and S. Schulze, “Continuous
                                                                        Detection of Design Flaws in Evolving Object-Oriented Programs using
   With VITRUVIUS Kramer et al. developed a SUM ap-                     Incremental Multi-pattern Matching,” in ASE, 2016.
proach that also integrates Java source code [29]. Unlike           [3] J. Bürger, D. Strüber, S. Gärtner, T. Ruhroth, J. Jürjens, and K. Schnei-
                                                                        der, “A framework for semi-automated co-evolution of security knowl-
our approach, the trace links to the model are written into             edge and system models,” JSS, vol. 139, 2018.
the source code as annotations and might lead to unreadable         [4] S. Peldszus, K. Tuma, D. Strüber, J. Jürjens, and R. Scandariato, “Secure
source code, as discussed in Sec. IV-B.                                 Data-Flow Compliance Checks between Models and Code based on
                                                                        Automated Mappings,” in MODELS, 2019.
   The Codeling tool of Konersmann [30] builds on a technical       [5] D. L. Parnas, “Software Aging,” in ICSE, 1994.
basis very similar to ours. The idea of Codeling is the             [6] “GRaViTY.” [Online]. Available: http://gravity-tool.org
integration of architecture model information into the program      [7] T. Stahl, M. Voelter, and K. Czarnecki, Model-driven Software Devel-
                                                                        opment: Technology, Engineering, Management. Wiley, 2006.
code. Like our approach, Konersmann uses TGGs for model             [8] B. Hailpern and P. Tarr, “Model-driven Development: The Good, the
to model transformations at architecture extraction. In contrast        Bad, and the Ugly,” IBM Syst. J., vol. 45, no. 3, 2006.
to us, he is not continuously keeping the extracted models up       [9] OMG, “UML Superstructure Specification,” 2011.
                                                                   [10] G. Wagner, Information Management - An Introduction to Information
to date but always writes all changes, made on an extracted             Modeling and Databases, 2019.
model, back to the code. Every time an architectural view          [11] UML-Diagrams, “Hospital management.” [Online]. Available: https:
on the system is needed Codeling extracts it again. By doing            //www.uml-diagrams.org/examples/hospital-domain-diagram.html
                                                                   [12] A. Meneely, B. Smith, and L. Williams, “iTrust Electronic Health Care
so Konersmann is circumventing the challenge of incremental             System Case Study.” [Online]. Available: http://agile.csc.ncsu.edu/iTrust
updates discussed in Sec. IV-B at the cost of massively            [13] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns:
increasing the code base with additional information.                   Elements of Reusable Object-oriented Software. Pearson, 1994.
                                                                   [14] I. Christantoni, C. Biffi, D. Bonutto, and A. C. Sanz, “Vision pilots
   Commercial tools like Enterprise Architect (EA) also pro-            report,” VisiOn Privacy Platform, Tech. Rep., 2017.
vide round-trip engineering for UML models and Code [31].          [15] S. Krüger, S. Nadi, M. Reif, K. Ali, M. Mezini, E. Bodden, F. Göpfert,
The main limitation of these tools is the restriction to UML            F. Günther, C. Weinert, D. Demmler, and R. Kamath, “CogniCrypt:
                                                                        Supporting Developers in using Cryptography,” in ASE, 2017.
models very close to the code which eases the synchronization.     [16] X. Fu, X. Lu, B. Peltsverger, S. Chen, K. Qian, and L. Tao, “A Static
While EA allows a translation from UML stereotypes to Java              Analysis Framework For Detecting SQL Injection Vulnerabilities,” in
annotations, which could be used for transferring UMLsec                COMPSAC, 2007.
                                                                   [17] S. Rasthofer, S. Arzt, and E. Bodden, “A Machine-learning Approach
annotations into the code, they do not support more complex             for Classifying and Categorizing Android Sources and Sinks,” in NDSS,
information transfers.                                                  2014.
                                                                   [18] S. Peldszus, G. Kulcsár, M. Lochau, and S. Schulze, “Incremental Co-
   While all approaches are dealing with the same challenges            evolution of Java Programs Based on Bidirectional Graph Transforma-
as us in similar ways, none of them provides the support                tion,” in PPPJ, 2015.
to maintain security requirements on different artifacts in a      [19] S. Peldszus, G. Kulcsár, and M. Lochau, “A Solution to the Java
                                                                        Refactoring Case Study using eMoflon,” in TTC, 2015.
sophisticated way and to check those security requirements in      [20] S. Ruland, G. Kulcsár, E. Leblebici, S. Peldszus, and M. Lochau,
between the different artifacts.                                        “Controlling the Attack Surface of Object-Oriented Refactorings,” in
                                                                        FASE, 2018.
                                                                   [21] A. Schürr, “Specification of Graph Translators with Triple Graph
                      VI. CONCLUSION                                    Grammars,” in WG, 1995.
                                                                   [22] E. Leblebici, A. Anjorin, and A. Schürr, “Developing eMoflon with
                                                                        eMoflon,” in ICMT, 2014.
   In this work, we presented the GRaViTY approach for             [23] H. Bruneliere, J. Cabot, F. Jouault, and F. Madiot, “MoDisco: A Generic
model-driven development and maintenance of secure long-                and Extensible Framework for Model Driven Reverse Engineering,” in
living systems. Based on GRaViTY, we elaborated on chal-                ASE, 2010.
                                                                   [24] European Parliament and Council of the European Union, “Regulation
lenges we had to overcome for further automation in the de-             (EU) 2016/679 – General Data Protection Regulation (GDPR),” in
velopment and verification of long-living systems using MDD.            Official Journal of the European Union, 2016.
                                                                   [25] E. Leblebici, “Inter-Model Consistency Checking and Restoration with
   The main challenges are in dealing with all the different            Triple Graph Grammars,” Ph.D. dissertation, 2018.
levels of abstraction appearing in the development of systems      [26] T. Arendt, E. Biermann, S. Jurack, C. Krause, and G. Taentzer, “Henshin:
and the synchronization of the single artifacts appearing. While        Advanced Concepts and Tools for In-place EMF Model Transformations,”
                                                                        in MODELS, 2010.
we have been able to utilize recent developments from the          [27] P. Stevens, “Bidirectional Transformations in the Large,” in MODELS,
model transformation domain for improving the synchronization           2017.
between the different artifacts, we also faced challenges          [28] C. Atkinson, D. Stoll, and P. Bostan, “Orthographic Software Mod-
                                                                        eling: A Practical Approach to View-based Development,” in ENASE.
in maintaining those transformations ourselves.                         Springer, 2009, pp. 206–219.
   To sum up, GRaViTY supports the model-driven develop-           [29] M. E. Kramer, E. Burger, and M. Langhammer, “View-centric Engineer-
ment and maintenance of secure software systems by providing            ing with Synchronized Heterogeneous Models,” in VAO, 2013.
                                                                   [30] M. Konersmann, “Explicitly integrated architecture-an approach for
support to synchronize the different artifacts appearing during         integrating software architecture model information with program code,”
MDD as well as in the specification and reuse of security               Ph.D. dissertation, 2018.
requirements in the execution of security checks for ensuring      [31] “Enterprise Architect.” [Online]. Available: www.sparxsystems.de
the security of the developed system.

</pre>