=Paper=
{{Paper
|id=Vol-2019/mdetools_6
|storemode=property
|title=
|pdfUrl=https://ceur-ws.org/Vol-2019/mdetools_6.pdf
|volume=Vol-2019
|authors=Tim Bolender,Bernhard Rumpe,Andreas Wortmann
|dblpUrl=https://dblp.org/rec/conf/models/BolenderRW17
}}
====
<pdf width="1500px">https://ceur-ws.org/Vol-2019/mdetools_6.pdf</pdf>
<pre>
 Investigating the Effects of Integrating Handcrafted
         Code in Model-Driven Engineering
                                 Tim Bolender, Bernhard Rumpe, Andreas Wortmann

                                                      Software Engineering
                                                     RWTH Aachen University
                                                        Aachen, Germany
                                                       www.se-rwth.de


   Abstract—Where model-driven engineering (MDE) requires models to
interact with general-purpose programming language (GPL) artifacts,
sophisticated patterns for the integration of generated and handcrafted
code are required. This raises a gap between modeling and programming
which requires developers to switch between both activities and their
contexts. Action languages that enable interacting with GPL artifacts on
model level can alleviate the need for switching, but cannot rely on the
mature development tools of GPLs. We propose to investigate the
acceptance and effects of MDE with action languages that enable GPL
interaction. To this effect, we present a study design for comparing MDE
with integration of handcrafted artifacts to pervasive MDE with action
languages. In this study, participants develop the behavior of selected
components of the software architecture of a small autonomous robot
using either the delegation pattern or a Java-like action language. With
this, we aim to uncover which method yields better acceptance and
performance results.

                           I. INTRODUCTION
   Programming raises a conceptual gap between the problem domains and
the solution domains of discourse [1]. Model-driven engineering (MDE)
aims at reducing this gap by lifting more abstract models to primary
development artifacts [2]. Such models are better suited to analysis,
communication, documentation, and transformation. However, where models
require interaction with general-purpose programming language (GPL)
artifacts, such as legacy code, libraries, or frameworks, modeling
infrastructures usually require bridging this gap by integrating
handcrafted with generated artifacts. For this, various patterns have
been developed [3], all of which require the modeler to switch from more
abstract modeling activities to very technology-specific programming
activities, which require different tooling and different mindsets.
Since developers already switch between various activities 47 times per
hour on average [4], the additional activity switching required by such
MDE might ultimately hinder development.
   We aim to investigate the effect of pervasive MDE using action
languages in a comparative modeling study. In this study, the
participants develop the behavior of MontiArcAutomaton [5] component
models of a small autonomous robot. To this end, they either integrate
handcrafted code through a variant of the delegation pattern [3] or
leverage a Java-like action language. We provide the software
architecture without component behavior to the participants as well as a
code generator that translates the software architectures to Java code
executable with the Simbad¹ 3D robot simulator. The treatment group
develops the components’ behavior with embedded, restricted Java/P,
which is the action language of the UML/P [6] language family. The
control group develops the components’ behavior by integrating
handcrafted Java code. We record the 100-minute development sessions and
instrument the development environment with analyses on the quality of
the component behavior under development. Together with pre-study and
post-study questionnaires, we aim to uncover whether pervasive MDE with
action languages or programming of component behavior yields better
acceptance and performance results.

   ¹ Simbad website: http://simbad.sourceforge.net/

   In the following, Sec. II introduces the MontiArcAutomaton
architecture modeling infrastructure and the Java/P action language,
before Sec. III details the study design and ?? discusses threats to its
validity. Then, Sec. V describes preliminary results from a pilot study
and Sec. VI discusses related studies. Sec. VII concludes.

                          II. PRELIMINARIES
   To prevent architecture modelers from switching between modeling and
programming, we embed the Java/P [6] action language into components of
the MontiArcAutomaton [5] architecture description language. This
section introduces both.

A. The Java/P Action Language
   Java/P [6] is a MontiCore [7] language resembling Java 1.7. It is used
as action language in the UML/P [6] language family and supports the
full concrete syntax and abstract syntax of Java 1.7 as well as its
context condition rules. MontiCore’s code generation framework
translates Java/P models into Java artifacts. Reifying Java as a
modeling language makes it easy to extend it with new modeling elements
(for instance, the notions of components as in ArchJava [8]), context
condition rules (such as preventing assignment of null values), and
transformations (such as automatically creating getters and setters).
Moreover, MontiCore produces a parser for Java/P programs that supports
processing Java 1.7 classes, i.e., lifting legacy code to model level.
This enables modeling Java programs that use existing libraries and
frameworks.

B. The MontiArcAutomaton Modeling Infrastructure
   MontiArcAutomaton [5] is an extensible architecture modeling
infrastructure. It comprises a component & connector (C&C) architecture
description language (ADL) and a code generation framework. The ADL is
realized as a MontiCore [7] language and uses MontiCore’s language
integration mechanisms [9] to enable plug-and-play embedding of action
languages. It has been configured with behavior modeling languages
including Java/P and successfully deployed to teaching [10] as well as
to service robotics applications [11]. The combination of
MontiArcAutomaton with Java/P is denoted AJava (for “Architectural
Java”).
   Core concepts of the MontiArcAutomaton ADL are illustrated with the
experiment’s software architecture depicted in Fig. 1: the
MontiArcAutomaton ADL distinguishes component types (denoted
components) and their instances (denoted subcomponents). Components
feature parameters (similar to constructors in object-oriented GPLs) and
interfaces of typed, directed ports. They are either composed
(SearchControl) or atomic (Collision). Composed components yield
configurations of subcomponents that exchange messages via
unidirectional connectors between their ports. Atomic components yield
a behavior description in the form of an embedded action language or via
integration of handcrafted GPL artifacts.
   The BallFinder software architecture comprises ten subcomponents of
nine different component types. The two components depicted on its left
wrap sensing functions to localize a ball in the environment and to
detect possible collisions. Their messages are fed into an instance of
the composed component type SearchControl, which uses these to
determine the next navigation actions and logs both incoming and
outgoing messages. It sends motor commands to another composed
component taking care of navigation as well as to an instance of
StatusLamp that serves to convey messages to observers.

[Figure 1: BallFinder with subcomponents cam (Camera), colli (Collision),
searchCtrl (SearchControl, containing ctrl (InternalControl) and
log("search.log") (Logger)), movement (MovementControl, containing ctrl
(MotorControl), left("left") and right("right") (WheelMotor)), and lamp
(StatusLamp). Legend: component type name, instance name, outgoing port,
incoming port, typed connector, atomic component.]
Figure 1. MontiArcAutomaton architecture of a small autonomous robot
capable of finding and retrieving balls in a simulated environment.

                          III. STUDY DESIGN
   There are various means to achieve pervasive MDE using different
language combinations as well as various patterns to integrate
handcrafted code (cf. [3]), and measuring their differences requires
concretizing the research question to comparable implementations. We use
AJava as representative of pervasive MDE, since both ADLs and action
languages are common modeling techniques. For the integration of
handcrafted code, we selected the delegation pattern [12] as
representative, which has been employed in various modeling
infrastructures. This section describes the design of our study based on
these representatives.

A. Research Goals
   When planning an experiment, it is crucial to measure only the
properties from which the researchers can correctly draw conclusions. To
answer our research question systematically, we applied the
Goal-Question-Metric (GQM) [13] method. Using this top-down approach,
the conceptual level (goals) is considered first. This is refined to the
operational level (questions), from which the quantitative level
(metrics) is derived. Each refinement step is based on the preceding
level. This ensures that only metrics are employed which help to achieve
the original goal, and not metrics chosen because they are easy to
measure or convenient for the researcher. To this effect, our research
goal is to analyze behavior modeling with AJava and the delegation
pattern for the purpose of evaluation with respect to their
effectiveness and efficiency from the point of view of the researcher in
the context of graduate and undergraduate students at RWTH Aachen
University.
   We concretize this in three research questions as follows:
RQ1 Does pervasive MDE with AJava help to reduce the development time
    compared to describing component behavior with Java using the
    delegation pattern?
RQ2 Is pervasive MDE with AJava less error-prone than employing the
    delegation pattern for integration of handcrafted code?
RQ3 How convenient is pervasive MDE with AJava for the developers?
   Based on these research questions, we determine the variables to
measure during the experiment (Section III-B) and formulate our
hypotheses to be tested against the collected data (Section III-C).
Thereby, we strictly follow the GQM method, which we use as guideline
for the study design.

B. Variables
   Before devising the corresponding metrics to answer our research
questions, we address the variables of the experiment.
   1) Independent Variables: As stated before, we intend to gain
insights into the assets and drawbacks of pervasive MDE compared with
the integration of handcrafted artifacts. Therefore, we have only one
independent variable, which is:
   • Development method The method used for behavior integration. This
     variable features two alternatives, namely
     the delegation pattern and AJava, which we select as
     representatives for the methods in question.
   Thus, we are conducting a single-factor experiment with two
alternatives. To this effect, we employ two test groups, one for each
treatment method. To determine the composition of these groups, we
employ randomization to ensure a balanced distribution between the
groups with respect to, e.g., knowledge and experience.
   2) Controlled Variables: To be able to make reliable assertions after
the experiment, we try to control the following variables:
   • Experience MontiArcAutomaton is developed by the chair of Software
     Engineering at RWTH Aachen University and is part of its teaching
     activities. Hence, knowledge about its basic functionalities and
     its methods can be assumed. However, we provide an introduction to
     ensure that all subjects participate with the same knowledge.
   • Programming skills These are important for development in general.
     We expect these to be homogeneously distributed; thus, we consider
     this variable as controlled. By employing questionnaires, we aim to
     ensure that our assumption on Java programming skills holds.
   • Project The environment in which the techniques are applied has a
     major influence on the results. We prepare a project consisting of
     multiple tasks and provide a project skeleton to the participants.
     In Section III-E, we give an overview of this project and introduce
     the tasks.
   • Tooling Tooling, such as IDEs and build systems, can reduce the
     work for certain tasks enormously. We ensure that the tooling
     support for applying both methods is identical.
   3) Response Variables: The last step of the GQM method is to consider
each of the research questions individually and devise a list of
metrics to answer it.
RQ1 Does pervasive MDE with AJava help to reduce the development time
    compared to describing component behavior with Java using the
    delegation pattern?
    Time of solution finding The time needed to resolve the whole task,
    including testing and error-solving.
    Time for programming The time required to develop the actual
    solution. Includes actual programming time only, i.e., excludes
    testing and code generation times.
    Number of code generations The number of times the code generator
    was started.
    Time for testing The time needed to test the solution.
    Time needed for error-solving The time used to solve errors reported
    during compiling and testing.
RQ2 Is pervasive MDE with AJava less error-prone than employing the
    delegation pattern for integration of handcrafted code?
    Absolute number of errors The number of errors occurring during
    compilation and testing.
    Time needed for error-solving The time used to solve any errors that
    occurred.
    Errors in time The number of errors in relation to the time used to
    come up with a solution.
RQ3 How convenient is pervasive MDE with AJava for the developers?
    Time saving The opinion whether the technique is considered as time
    saving.
    Convenience The evaluation whether the technique is found to be easy
    to use.
    Overhead creation The assessment whether the technique requires
    efforts which are unnecessary for the actual solution of the problem.
    Demand for more tooling The rating whether the technique needs more
    tooling to be used properly.
   Each of these variables is measured for each of the tasks individually
and once for the whole project. This way, we enable a result analysis
with more insights into the specific strong and weak aspects of the
methods.

C. Hypotheses
   To enable the statistical analysis of the experiment’s results, we
state our hypothesis pairs in the following. For the sake of simplicity,
we give a combined expectation for each research question.
RQ1 We expect that using AJava requires less development time and fewer
    code generation iterations, i.e.,
    H1.1₀ Solution finding time is larger or equal.
    H1.1₁ Solution finding time is less.
    H1.2₀ Solution creation time is larger or equal.
    H1.2₁ Solution creation time is less.
    H1.3₀ Code generation is run more or equally often.
    H1.3₁ Code generation is run less often.
    H1.4₀ Testing time is larger or equal.
    H1.4₁ Testing time is less.
    H1.5₀ Error-solving time is larger or equal.
    H1.5₁ Error-solving time is less.
RQ2 We expect that using AJava produces fewer errors and consequently
    requires less time for error solving, i.e.,
    H2.1₀ The absolute number of errors is larger or equal.
    H2.1₁ The absolute number of errors is less.
    H2.2₀ The time for error-solving is larger or equal.
    H2.2₁ The time for error-solving is less.
    H2.3₀ The number of errors per time is larger or equal.
    H2.3₁ The number of errors per time is less.
RQ3 We expect AJava to be better accepted and perceived as more
    comfortable, i.e.,
    H3.1₀ The valuation of time saving is less or equal.
    H3.1₁ The valuation of time saving is larger.
    H3.2₀ The valuation of convenience is less or equal.
    H3.2₁ The valuation of convenience is larger.
    H3.3₀ The valuation of overhead is larger or equal.
    H3.3₁ The valuation of overhead is less.
    H3.4₀ The demand for tooling is larger or equal.
    H3.4₁ The demand for tooling is less.
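   The hypothesis pairs are decided with one-sided tests for each task.
As an illustration only, the following Java sketch computes the
Mann-Whitney U statistic for two groups of ordinal ratings, e.g., the
convenience valuations behind H3.2, and applies the normal approximation
with the usual tie correction. It assumes a significance level of
α = 0.05, which gives a one-sided critical value of z ≈ 1.645. The class
and method names and the sample ratings are hypothetical and are not
study data. With groups of about ten participants each, exact critical
values of U or an established statistics library may be preferable to
the normal approximation.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Sketch of a one-sided Mann-Whitney U test (normal approximation with
 * tie correction). Names and the alpha level are illustrative assumptions.
 * H0: ratings in sample a are not larger than those in sample b.
 */
final class MannWhitneyU {

    /** One-sided critical value of the standard normal for alpha = 0.05. */
    static final double Z_CRIT_ALPHA_05 = 1.6449;

    /** U of sample a: each pair where a wins counts 1, each tie counts 0.5. */
    static double u(double[] a, double[] b) {
        double sum = 0.0;
        for (double x : a) {
            for (double y : b) {
                if (x > y) {
                    sum += 1.0;
                } else if (x == y) {
                    sum += 0.5;
                }
            }
        }
        return sum;
    }

    /** z score of U under H0; the variance is reduced for tied ratings. */
    static double z(double[] a, double[] b) {
        int n1 = a.length;
        int n2 = b.length;
        int n = n1 + n2;
        double[] all = new double[n];
        System.arraycopy(a, 0, all, 0, n1);
        System.arraycopy(b, 0, all, n1, n2);
        Arrays.sort(all);
        double tieSum = 0.0; // sum of (t^3 - t) over groups of t equal values
        for (int i = 0; i < n; ) {
            int j = i;
            while (j < n && all[j] == all[i]) {
                j++;
            }
            double t = j - i;
            tieSum += t * t * t - t;
            i = j;
        }
        double mean = n1 * (double) n2 / 2.0;
        double variance = n1 * (double) n2 / 12.0
                * ((n + 1) - tieSum / ((double) n * (n - 1)));
        return (u(a, b) - mean) / Math.sqrt(variance);
    }

    /** True if H0 is rejected in favor of "a tends to be rated higher". */
    static boolean rejectH0(double[] a, double[] b) {
        return z(a, b) > Z_CRIT_ALPHA_05;
    }

    public static void main(String[] args) {
        // Hypothetical 5-point convenience ratings, not collected data.
        double[] ajava = {4, 5, 4, 3, 5};
        double[] delegation = {2, 3, 3, 1, 4};
        System.out.printf(Locale.ROOT, "U = %.1f, z = %.3f, reject H0: %b%n",
                u(ajava, delegation), z(ajava, delegation),
                rejectH0(ajava, delegation));
    }
}
```

With the made-up ratings in main, the sketch prints
"U = 22.0, z = 2.041, reject H0: true". For H3.3 and H3.4, where AJava
is expected to receive lower ratings, the arguments would be swapped,
i.e., rejectH0(delegation, ajava).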
   As the variables are collected for each task individually, the
hypotheses are tested for each task accordingly. We employ a one-sided
Student’s t-test for those metrics that are of ratio scale, e.g.,
development time or error count. For the remaining metrics, especially
those of RQ3, we use a one-sided Mann-Whitney U test, since we deal with
measured data of ordinal scale.

D. Participants
   Our experiment represents an internal validation, i.e., the
participants are computer science students at the chair of Software
Engineering. This includes Bachelor, Master, and Ph.D. students, all of
whom are expected to have a solid knowledge of Java due to their
curricula and background as well as their interest in software
architecture. Furthermore, we anticipate that a majority of the
participants is already aware of MontiArcAutomaton and has a rudimentary
idea of its capabilities due to its inclusion in the department’s
teaching and research activities. Nevertheless, an introduction to
MontiArcAutomaton is given to ensure a consistent foundation for all
participants. Based on the questionnaires included in the experiment, a
more detailed description of the demography of the participants,
including their prior knowledge, can be given later during the result
analysis. We expect 20 subjects to participate in the experiment.

E. Experiment Materials
   During the experiment, the participants have to complete a series of
separate tasks in a predefined order. Each of these tasks has the goal
to implement a particular component behavior in an overall project, but
is itself independent. For each of these components, an exact
description of the behavior is provided in the form of an
implementation-independent text description. Since we are only
interested in the behavior implementation during the experiment, an
architecture skeleton of this project will be provided.

[Figure 2 labels: collision sensor representation, 3D environment
visualization, ball, robot with collision sensors and lamp, simulation
details, robot ego perspective, simulator and camera controls.]
Figure 2. The graphical interface of the Simbad robot simulator.

                              Table I
        TASK COMPONENTS WITH THEIR IMPLEMENTATION CHALLENGES
   Component Name      Logic   Integration   Refactoring
   Camera                ✓          ✓
   Collision             ✓          ✓
   Logger                ✓          ✓
   MovementControl                                ✓
   MotorControl          ✓                        ✓
   CompositeMotor                   ✓
   WheelMotor                       ✓
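   In the control group, the provided skeleton fixes each component’s
ports and leaves only the behavior to handcrafted Java that is
integrated via a variant of the delegation pattern. The paper does not
show the generated code, so the following Java sketch is only an
assumption of how such a split might look for the atomic component
Collision. A generated class owns the outgoing Boolean port and forwards
each computation step to a handcrafted implementation of a behavior
interface. All type and method names (CollisionBehavior,
CollisionComponent, CollisionBehaviorImpl, step, DelegationDemo) are
hypothetical. The bumper readings are a plain boolean array rather than
Simbad’s sensor API.

```java
import java.util.Objects;

/** Hypothetical behavior interface a generator could emit for Collision. */
interface CollisionBehavior {
    /** Computes the outgoing Boolean port value from the bumper readings. */
    boolean compute(boolean[] bumperHits);
}

/** Hypothetical generated component class: owns the port, delegates behavior. */
final class CollisionComponent {
    private final CollisionBehavior behavior;
    private boolean collision; // current value of the outgoing Boolean port

    CollisionComponent(CollisionBehavior behavior) {
        this.behavior = Objects.requireNonNull(behavior, "behavior");
    }

    /** Invoked once per simulation cycle; the actual logic is delegated. */
    void step(boolean[] bumperHits) {
        collision = behavior.compute(bumperHits);
    }

    boolean getCollision() {
        return collision;
    }
}

/** Handcrafted Java that a control-group participant would write. */
final class CollisionBehaviorImpl implements CollisionBehavior {
    @Override
    public boolean compute(boolean[] bumperHits) {
        for (boolean hit : bumperHits) {
            if (hit) {
                return true; // any triggered bumper reports a collision
            }
        }
        return false;
    }
}

/** Minimal usage example wiring the generated and handcrafted parts. */
final class DelegationDemo {
    public static void main(String[] args) {
        CollisionComponent colli = new CollisionComponent(new CollisionBehaviorImpl());
        colli.step(new boolean[] {false, true, false});
        System.out.println("collision: " + colli.getCollision());
    }
}
```

Running DelegationDemo prints "collision: true". In the AJava treatment,
the logic of compute would instead be embedded as Java/P in the
Collision component model. The participant would then not need to
switch to a separate handcrafted class.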
   Each participant implements the software for a robot on the basis of
the open-source Simbad robot simulator framework [14]. Simbad features a
3D virtual environment in which a virtual robot can be executed. The
robot can be equipped with a variety of devices that enable it to sense
objects and obstacles placed in this environment. The movement is
realized via two different kinematic models to choose from: (1) the
default composite control with a velocity and a rotation value; and (2)
the differential control, in which the two wheels are separately
controllable. The individual behavior is implemented in a dedicated
method that is repeatedly called by the simulator framework. To enable
the supervision of this behavior, Simbad features a simple interface
that renders the current state of the environment as well as the
different sensor values. The user interface with the experiment’s
scenario is depicted in Fig. 2. It enables control over the simulation
and provides a 3D environment visualization as well as details about
the simulation and the robot state (left).
   The robot software architecture implemented by the participants
during the experiment is called BallFinder. As the name suggests, its
task is to find and collect red balls. Therefore, it is equipped with a
software camera for locating balls and a range of bumper sensors for
collision detection. To provide easy visual feedback to the observer, it
features a simple lamp as actuator. The search takes place in two
separate phases: locating and collecting. During locating, the robot
rotates around its axis until a red ball is recognized through a
rudimentary image analysis via its camera. When successful, the robot
switches into the collecting phase: it stops rotating and starts moving
forward towards the ball until a collision is detected. During the first
phase, the lamp is supposed to blink, while during the second it should
be turned on continuously. After a successful retrieval, a new ball
appears at a different spot in the environment and the whole procedure
starts all over again.
   The architecture used for this robot was already introduced in
Fig. 1. Almost every one of its components is an experimental item to
which the two treatments are applied. Therefore, the subjects develop
the components’ behavior implementations. Additionally, one task
requires refactoring the architecture to change the robot’s kinematic
model. In Table I, an overview of the components to be implemented is
given
together with their respective implementation challenges. We              input values from the containing component as well as
distinguish between three main challenge kinds:                          the output values of the InternalSearchControl.
                                                                         Monitoring their changes, it writes these events including
 1) Logic Tasks of this kind demand a behavior implementa-
                                                                         a time stamp to a log file.
    tion with a minimal degree of logic. This should involve
                                                                       • MovementControl This composed component controls
    the usage of control structures like branches and loops.
                                                                         the robot’s movement. To this end, it translates the
    Thus, we expect the subject to implement more than
                                                                         movement commands received from searchControl
    simple variable reading and writing.
                                                                         to movement of the actual “physical” devices of
 2) Integration This kind includes the interaction with a
                                                                         locomotion. At the beginning of the experiment,
    native API. Instead of developing self-contained code,
                                                                         it contains instances of MotorControl and
    external code in the form of provided functions or classes is
                                                                         CompositeMotor, the latter matching Simbad’s
    employed. Therefore, the subject has to take imports
                                                                         default kinematic model. During the experiment, the
    and aspects like function signatures into consideration.
                                                                         participants have to restructure MovementControl
 3) Refactoring Tasks of this kind require more than the
                                                                         and replace the CompositeMotor instance with two
    creation of new solutions: they might involve the restruc-
                                                                         instances of type WheelMotor.
    turing of existing components. Hence, the subject has to
                                                                       • MotorControl The task of this component is the transla-
    consider the existing (old) behavior and has to perform a
                                                                         tion of movement commands for the actual drive control.
    refactoring in an at least limited sense.
                                                                         Initially, when the default kinematic model is deployed,
   We plan to use this classification later for a more differen-         the component possesses two outgoing ports to control
tiated result analysis. In the following, we give more detailed          an instance of CompositeMotor. After restructuring,
description of the components for a better understanding of              the differential model is used, it features one output value
the project and experiment.                                              for each of the two WheelMotor instances.
   • Camera This component uses the camera sensor of the               • CompositeMotor This component corresponds to one of
      Simbad framework to detect whether a red ball is located           the kinematic models available in Simbad and represents
      in front of the robot. It does so by analyzing the color           a single “physical” propulsion. It receives a velocity
      picture delivered by the hardware. The result of the               and a rotation value as input to control the movement.
      analysis is reported through its output port ball. The             The behavior implementation of this component passes
      image perceived by the camera is provided in form of a             the two incoming values to the Simbad framework.
      100 × 100 array of RGB values. For image recognition,            • WheelMotor A component of this type matches with
      a simple algorithm is supposed to be implemented by                one wheel of the differential kinematic model. It re-
      the subjects. Instead of using an actual object recognition        ceives one value as input. A positive one results in a
      algorithm, only the red channel of the image is used. Fur-         forward rotation of the wheel, while a negative value
      thermore, only the center column of pixels is considered           causes a backwards one, respectively. Consequently,
      and segmented into sections with size of 10 pixels. Since          CompositeMotor poses implementation challenge re-
      the virtual environment is mainly colored in green and             garding integration of native API.
      blue, an average red value in one of these sections larger
      than the threshold 100 is sufficient to confirm a ball in      F. Conduction Plan
      from of the robot. This way, a satisfying trade-off between        The complete experiment takes 100 minutes per participant.
      task complexity and detection accuracy is achieved.            For the execution, experiment equipment is provided, this in-
   • Collision This sensor is used to detect physical contact        cludes notebook computers running Ubuntu Linux, the Atom2
      of the robot to other objects in the environment. For this     editor with disabled syntax highlighting and disabled language
      purpose, it is equipped with a belt of 16 evenly distributed   completion support, and a makefile responsible for triggering
      bumper sensors. To signalize whether a collision was de-       the maven-driven build processes. We provide the restricted
      tected the collision port is used. Each sensor reports         editor and the makefiles to mitigate the effects of advanced
      a collision through Boolean value and the component            IDEs. Using basic editing functionalities, such as “find” or
      iterates over all sensors to determine whether any sensors     “replace” is not restricted. The provided makefiles take care
      reports a collision.                                           of parsing models, generating code, compiling, and packaging
   • SearchControl This is a composite component con-                it.
      sisting of the subcomponents InternalControl and                   To enable a correct attribution of the subjects actions to
      Logger. It is the central entity of the project since and      the correct metric, we use screen recording and employ a
      is in charge of controlling the robot. To facilitate the       camera to capture the non-digital behavior. Furthermore, we
      removing errors, its behavior as well as the received input    are storing the project state at each compilation and run for
      values are logged through Logger. This component does          an easier analysis in the aftermath.
      receive any treatment from the subjects.                           During execution, each subject performs the following:
   • Logger This component handles the logging requirement
      of the SearchControl type. Therefore, it receives all           2 Atom website: https://atom.io
 1) Pre-questionnaire (5min) Before the actual start of the
    experiment, the participant is asked to fill out a survey
    on his or her modeling and programming skills. This
    includes inquiring about the experience with modeling and
    object-oriented programming in general as well as with
    specific modeling and programming techniques and languages.
 2) Introduction (25min) Next, we give an introduction to the
    study’s development environment, MontiArcAutomaton, and
    the behavior modeling technique of the participant’s
    group. Moreover, we introduce the BallFinder and the
    modeling task. The participant may explore prepared
    example code to familiarize himself or herself with the
    assigned modeling technique and may inspect the provided
    architecture of the experimental project.
 3) Tasks (65min) We hand out the task description to the
    participant and let him or her perform the task using the
    assigned modeling technique. The participants are
    requested to solve the task steps in the order of their
    presentation. This order is identical to that in Table I;
    the subjects will therefore implement the robot following
    the flow of information from input to output. During the
    whole time, the participants are allowed to ask questions
    related to the understanding of the exercises, but no
    information regarding the solution is given.
 4) Post-questionnaire (5min) Finally, we conduct a second
    questionnaire to learn more about the acceptance and ask
    for general feedback about the experiment.

                       IV. DISCUSSION
   Our study design enables comparing MDE with integration of
handcrafted artifacts to pervasive MDE with action languages.
As representatives, we chose AJava and the delegation pattern
for conducting the experiment. This study should give us a good
first indication of their respective benefits in practice. To
this effect, we embedded the experiment’s tasks into a small
project to simulate a realistic scenario. By providing a minimal
set of tools, we ensured that we compare the techniques and not
the tooling. Through our choice of tasks, we covered many
different aspects of the development process: in addition to the
implementation of logic, we expect insights about refactoring
and the integration of native APIs as well. For the comparison,
we consider not only performance metrics but also acceptance and
usability. Together, these aspects give a fuller picture and
compensate for shortcomings of the other metrics; neglecting them
could endanger a later adoption of the technique in practice.
   The greatest risk to an experiment is the invalidation of the
results found in the collected data. Therefore, we discuss the
possible threats to internal and external validity in the
following and examine how we ensured that our conclusions are
legitimate and in which context the results are valid.

A. Threats to Internal Validity (Causality)
   Threats to internal validity arise from factors influencing
the causality between independent and dependent variables [21].
We recognize the following potential threats:
   • Knowledge of Java This mainly affects the performance
      metrics used to answer RQ1 and RQ2. Since we expect a
      mostly homogeneous group of participants in this regard
      and we randomize the assignment of participants to the
      groups, we do not consider this problematic.
      Questionnaires are conducted to support this assumption.
   • Experience with delegation pattern or AJava Similar to
      Java knowledge, we expect no risk from this aspect. We
      do not expect large differences in experience; to support
      this claim, we use the questionnaires as well, and we
      provide a proper introduction to the techniques before
      the actual experiment.
   • Tooling support Tools, such as IDEs, facilitate
      development and, hence, entail the risk of actually
      measuring the quality of the tools instead of the
      techniques investigated in the experiment. By minimizing
      the tooling and providing identical tooling to both
      groups, we ensure that this does not affect our
      experiments.
   • Unfamiliar environment As a side effect, we force the
      subjects to use an unfamiliar development environment,
      which may be perceived as a hindrance. To address this,
      we included an introduction phase in which each subject
      can become comfortable with the environment. As this
      applies to both groups, however, it should not impact
      the results.
   • Absolute acceptance The performed acceptance
      measurements are absolute and therefore not usable for a
      comparison unless the same subject experiences both
      techniques. We do not dispute this. However, we would
      like to note that we are not interested in determining an
      absolute acceptance value: if one technique performs very
      poorly or is dissatisfying in certain tasks, we should
      still be able to notice a difference.
   • Small sample size The number of participants makes the
      results unreliable in terms of significance. There is no
      doubt about this risk, but we would like to note that the
      exact indicators are only available after the execution
      of the experiment. Furthermore, for this reason we employ
      multiple tasks and perform the comparison for each task
      individually as well as in sum. Increasing the sample by
      including more external participants, however, bears the
      risk of creating more heterogeneous groups, which would
      endanger our controlled variables.

B. Threats to External Validity (Generalizability)
   Threats to external validity address the generalizability of
empirical research. For our study, the most important threats
to its generalizability are:
   • Project size The software engineering challenges of our
      project are several orders of magnitude smaller than those
      of industrial projects. Hence, the results cannot be
      generalized easily. Unfortunately, the size is naturally
      limited by the available time of the participants in a
      laboratory experiment. We tried to overcome this problem
      by embedding different
      types of implementation challenges (see Table I) to cover
      multiple application scenarios.
   • Academic context This experiment features an artificially
      created project that differs from real-world projects.
      Again, we refer to the different implementation
      challenges we employed. Beyond that, only replications in
      industry can address this threat.
   • Non-professionals We incorporated students in our
      experiment, who might not be a good substitute for actual
      professionals. To address this issue, we refer to [22]
      for reasons why we think that this choice restricts the
      generalization only minimally.
   • Choice of techniques We used Java, AJava, and the
      delegation pattern to compare pervasive MDE to integrating
      handcrafted code. One might argue that the delegation
      pattern is especially complicated and that using other
      patterns would yield better results for the handcrafted
      code approach, or that Java might be especially unsuited
      for this topic in general. As Java is one of the most
      popular languages for professional software engineering
      and the delegation pattern is very popular for integrating
      generated and handcrafted code, we assume that both
      reflect the state of the art appropriately.

                  V. PRELIMINARY RESULTS
   We conducted a pilot study with 13 participants, of which 7
employed the delegation pattern and 6 employed AJava to develop
the BallFinder robot software. Interesting findings are
illustrated in Figure 3 and discussed below.
   Regarding solution finding time, the AJava group solved their
tasks in a median of 39 minutes compared to 64 minutes required
by the delegation pattern group. With respect to programming
time, the AJava group required a median of 30 minutes compared
to 55 minutes for the delegation pattern group. With p-values of
less than 1%, both H1.1₀ and H1.2₀ can hence be rejected for the
pilot study. For the variables error resolving time and testing
time, we obtained inconclusive results without relevant
significance; hence, we cannot refute the related hypotheses.
The same holds for the number of code generation iterations: on
average, the AJava group generated code 15 times, whereas the
delegation pattern group generated code 20 times.
   Developing with AJava led to a consistent number of between 11
and 14 test errors, whereas participants of the delegation
pattern group finished with a wide range of between 1 and 14
errors. Measuring the number of test errors per minute produced
similar diversity; hence, H2.2₀ and H2.3₀ cannot be rejected
either.
   The participants’ assessment of the different techniques’
convenience generally seems to favor AJava: 83% of the
participants of this group consider AJava practical, whereas
only 43% of the delegation pattern group participants consider
the delegation pattern practical. As these results only approach
significance, we also cannot refute H3.1₀. Considering the
overhead imposed by the respective techniques, 50% of the AJava
group’s participants consider AJava not to introduce any
overhead at all, whereas 57% of the delegation pattern group
consider it to introduce a large overhead. The significance of
these results implies that H3.2₁ can be accepted. Regarding
perceived time savings with either technique and the demand for
tooling, we obtained neither significant results nor notable
tendencies. Hence, we cannot refute the related hypotheses.

   Figure 3. Findings from performing our study with a pilot
   group of 13 students. (Panels: Solution finding time;
   Programming finding time; Code generation iterations; Test
   errors; Practicality (low to high); Time saving (low to high).)

                    VI. RELATED STUDIES
   Our study design essentially compares two “programming”
languages. From this vantage point, several related studies have
been conducted. The study presented in [15], for instance,
compares “tangible” and graphical programming languages, where
the resulting programs were executed on a real robot platform.
The authors do not investigate the differences in abstraction
between different languages.
   The authors of [16] compare seven GPLs regarding run-time
efficiency, loading efficiency, program length, and other
factors. They also do not compare their levels of abstraction.
   Other studies investigate the introduction of MDE in
industrial contexts to identify obstacles [17] or challenges
[18] for its adoption [19]. However, they do not directly
investigate the acceptance of pervasive MDE over the programming
techniques established at the respective companies.
   There are also various studies on different modes of
(virtual) robot programming, such as the experiment reported in
[20], which are related to our experiment but not to the
investigated objects.
   Overall, to the best of our knowledge, there is little related
work on comparing pervasive MDE to providing system behavior via
integration of handcrafted GPL code.

                      VII. CONCLUSION
   We have presented the design of a study on the acceptance and
effects of pervasive model-driven engineering using action
languages to interface GPL code artifacts. In the proposed
study, two groups of participants will develop the behavior of a
robot fetching balls in a simulation environment using
MontiArcAutomaton with either embedded Java/P or integrated
handcrafted artifacts. We record the study sessions and
instrument them with quality assurance infrastructure and
surveys. Based on this data, we aim to gain a better
understanding of the impact of using action languages in
modeling activities. Preliminary results tend to favor pervasive
MDE with respect to efficiency and convenience. However, these
results are based on a small sample size and must be
investigated with larger groups of participants.

                         REFERENCES
 [1] R. France and B. Rumpe, “Model-driven Development of Complex
     Software: A Research Roadmap,” Future of Software Engineering
     (FOSE ’07), no. 2, pp. 37–54, May 2007.
 [2] M. Völter, T. Stahl, J. Bettin, A. Haase, S. Helsen,
     K. Czarnecki, and B. von Stockfleth, Model-Driven Software
     Development: Technology, Engineering, Management, ser. Wiley
     Software Patterns Series. Wiley, 2013.
 [3] T. Greifenberg, K. Hölldobler, C. Kolassa, M. Look, P. Mir
     Seyed Nazari, K. Müller, A. Navarro Perez, D. Plotnikov,
     D. Reiß, A. Roth, B. Rumpe, M. Schindler, and A. Wortmann,
     “Integration of Handwritten and Generated Object-Oriented
     Code,” in Model-Driven Engineering and Software Development
     Conference (MODELSWARD’15), ser. CCIS, vol. 580. Springer,
     2015, pp. 112–132.
 [4] A. N. Meyer, T. Fritz, G. C. Murphy, and T. Zimmermann,
     “Software developers’ perceptions of productivity,” in
     Proceedings of the 22nd ACM SIGSOFT International Symposium
     on Foundations of Software Engineering, 2014.
 [5] J. O. Ringert, A. Roth, B. Rumpe, and A. Wortmann, “Language
     and Code Generator Composition for Model-Driven Engineering
     of Robotics Component & Connector Systems,” Journal of
     Software Engineering for Robotics, 2015.
 [6] B. Rumpe, Modeling with UML: Language, Concepts, Methods.
     Springer International, July 2016.
 [7] H. Krahn, B. Rumpe, and S. Völkel, “MontiCore: a Framework
     for Compositional Development of Domain Specific Languages,”
     International Journal on Software Tools for Technology
     Transfer (STTT), vol. 12, no. 5, pp. 353–372, September 2010.
 [8] J. Aldrich, C. Chambers, and D. Notkin, “ArchJava: connecting
     software architecture to implementation,” in International
     Conference on Software Engineering (ICSE) 2002. ACM Press,
     2002.
 [9] A. Haber, M. Look, A. Navarro Perez, P. Mir Seyed Nazari,
     B. Rumpe, S. Völkel, and A. Wortmann, “Integration of
     Heterogeneous Modeling Languages via Extensible and
     Composable Language Components,” in Proceedings of the 3rd
     International Conference on Model-Driven Engineering and
     Software Development, 2015.
[10] J. O. Ringert, B. Rumpe, C. Schulze, and A. Wortmann,
     “Teaching agile model-driven engineering for cyber-physical
     systems,” in Proceedings of the 39th International
     Conference on Software Engineering: Software Engineering and
     Education Track, ser. ICSE-SEET ’17. Piscataway, NJ, USA:
     IEEE Press, 2017, pp. 127–136.
[11] R. Heim, P. M. S. Nazari, J. O. Ringert, B. Rumpe, and
     A. Wortmann, “Modeling Robot and World Interfaces for
     Reusable Tasks,” in 2015 IEEE/RSJ International Conference
     on Intelligent Robots and Systems (IROS), 2015,
     pp. 1793–1798.
[12] M. Fowler, Domain-Specific Languages. Addison-Wesley
     Professional, 2010.
[13] V. R. Basili and D. M. Weiss, “A methodology for collecting
     valid software engineering data,” IEEE Transactions on
     Software Engineering, vol. 10, no. 6, pp. 728–738,
     November 1984.
[14] L. Hugues and N. Bredeche, “Simbad: An Autonomous Robot
     Simulation Package for Education and Research,” in
     Proceedings of the 9th International Conference on
     Simulation of Adaptive Behavior (SAB), ser. Lecture Notes in
     Computer Science, vol. 4095. Berlin, Heidelberg: Springer
     Berlin Heidelberg, 2006, pp. 831–842.
[15] M. S. Horn, E. T. Solovey, R. J. Crouser, and R. J. Jacob,
     “Comparing the use of tangible and graphical programming
     languages for informal science education,” in Proceedings of
     the SIGCHI Conference on Human Factors in Computing Systems.
     ACM, 2009, pp. 975–984.
[16] L. Prechelt, “An empirical comparison of seven programming
     languages,” Computer, vol. 33, no. 10, pp. 23–29,
     October 2000.
[17] M. Staron, “Adopting model driven software development in
     industry – a case study at two companies,” in MoDELS, vol. 6.
     Springer, 2006, pp. 57–72.
[18] P. Baker, S. Loh, and F. Weil, “Model-driven engineering in a
     large industrial context – Motorola case study,” Model Driven
     Engineering Languages and Systems, pp. 476–491, 2005.
[19] J. Hutchinson, M. Rouncefield, and J. Whittle, “Model-driven
     engineering practices in industry,” in Proceedings of the
     33rd International Conference on Software Engineering. ACM,
     2011, pp. 633–642.
[20] C. S. Tzafestas, N. Palaiologou, and M. Alifragis, “Virtual
     and remote robotic laboratory: Comparative experimental
     evaluation,” IEEE Transactions on Education, vol. 49, no. 3,
     pp. 360–369, 2006.
[21] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and
     A. Wesslén, Experimentation in Software Engineering. Springer
     Publishing Company, Incorporated, 2012.
[22] D. Falessi, N. Juristo, C. Wohlin, B. Turhan, J. Münch,
     A. Jedlitschka, and M. Oivo, “Empirical software engineering
     experts on the use of students and professionals in
     experiments,” Empirical Software Engineering, 2017.

</pre>