<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Comprehensibility of Modelling Languages for Specifying Behavioural Requirements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Grischa Liebel</string-name>
          <email>grischa@chalmers.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Tichy</string-name>
          <email>matthias.tichy@uni-ulm.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Software Engineering Division, Chalmers | University of Gothenburg</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ulm University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>17</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>The selection of a suitable modelling language influences the success of software modelling. Several experiments comparing the comprehensibility of graphical modelling languages have been published. However, no published study compares the comprehensibility of functional requirements modelled in different graphical modelling languages. This paper evaluates how requirements modelled in a sequence-based notation, Modal Sequence Diagrams, and in a state-based notation, Timed Automata, compare with respect to comprehensibility. We performed a controlled experiment with 22 students from an undergraduate course on software modelling. Our results show no significant differences with respect to the comprehensibility of the two languages, but subjects who answered the questionnaire for the sequence-based notation completed significantly more answers in the given time limit. These initial results indicate that choosing a modelling language based on convenience does not significantly affect the understanding of the resulting requirements.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Choosing a visual modelling language in practice typically
depends on previous experience with the modelling language
or on the availability of the respective modelling tools. While both
aspects are certainly reasonable, other criteria are equally
important.</p>
      <p>
        Comprehensibility is a commonly evaluated criterion of
visual languages, e.g., for UML diagrams in [<xref ref-type="bibr" rid="ref1">1</xref>], [<xref ref-type="bibr" rid="ref2">2</xref>], [<xref ref-type="bibr" rid="ref3">3</xref>], [<xref ref-type="bibr" rid="ref4">4</xref>].
      </p>
      <p>
        However, evaluation results can be contradictory, as in [<xref ref-type="bibr" rid="ref1">1</xref>] and
[<xref ref-type="bibr" rid="ref2">2</xref>]. As with visual languages, the comprehensibility or
understandability of software requirements specifications is
the aspect most commonly evaluated in empirical requirements
engineering studies [<xref ref-type="bibr" rid="ref5">5</xref>]. At the same time, the practical
value of such studies is often questioned [<xref ref-type="bibr" rid="ref6">6</xref>]. A recent family of experiments
by Abrahão et al. reports that providing sequence diagrams
together with a natural language specification increases
comprehensibility [<xref ref-type="bibr" rid="ref7">7</xref>]. Additionally, the authors state that possible
future work in this area could be “experiments to analyse the
effect of different behavioural diagrams in the comprehension
of software models”.
      </p>
      <p>As a first step in this direction, we conducted a controlled
experiment to understand which behavioural diagrams
perform better than others with respect to comprehensibility,
for the specific case of modelling functional requirements.</p>
      <p>
        Specifically, we compared two modelling languages, Modal
Sequence Diagrams (MSDs) [<xref ref-type="bibr" rid="ref8">8</xref>] and Timed Automata (TA) [<xref ref-type="bibr" rid="ref9">9</xref>].
      </p>
      <p>We chose these two languages because they are representatives of
sequence-based notations (MSDs) and state-based notations (TA),
because they are similar in terms of expressiveness and hence
possible alternatives for expressing the same requirements, and
because both have been applied in industrial case studies, e.g. [10],
[11], [12]. Additionally, both languages had already been used in a
joint project with industrial partners. The experiment is based
on an extensive and detailed requirements specification by a
vehicle manufacturer that defines the behaviour of a software
function to be realised by a supplier. Hence, the requirements
specification is quite detailed and, as such, a reasonable candidate
for modelling. We used 22 undergraduate students in a course
on software modelling as subjects. Our results show no
significant difference with respect to the comprehensibility of
requirements modelled in the two languages, but requirements
modelled in MSDs are significantly quicker to understand
(subjects answered significantly more questions within the time
limit). This indicates that the current practice, selecting visual
languages based on convenience, is in fact feasible with respect
to the comprehension of the resulting requirements specifications.
However, it might take longer to understand the requirements
depending on the chosen language.</p>
      <p>The remainder of this paper is structured as follows. Section II
discusses related literature. Section III covers the basics of the
two visual modelling languages used in the experiment.
Section IV describes the experiment design, followed by a
discussion of validity threats in Section V. Section VI presents
the results and discusses them in depth. Section VII concludes
the paper.</p>
    </sec>
    <sec id="sec-1-rw">
      <title>II. RELATED WORK</title>
      <p>
        In the context of the UML [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], a number of experimental
studies have been published that compare different modelling
languages with respect to comprehensibility. The
comprehensibility of UML behavioural diagrams, namely sequence,
collaboration, and state machine diagrams, in both real-time
and management information systems is compared using a
controlled experiment by Otero and Dolado [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The results
show that sequence diagrams are more comprehensible for
real-time systems than for management information systems. With
respect to answering speed, their data show that sequence
diagrams perform better than collaboration and state machine
diagrams in both domains. The study used 31 undergraduate
students as subjects.
      </p>
      <p>
        In contrast, Glezer et al. report that sequence diagrams are more comprehensible for management information systems than for real-time systems [<xref ref-type="bibr" rid="ref2">2</xref>]. The authors mainly attribute this difference to the previous knowledge of the subjects, who were not experienced in real-time systems. In this study, the 76 student subjects performed the experiment as part of a mandatory mid-term exam.
      </p>
      <p>
        Nugroho investigates the impact of the level of detail on the comprehension of UML Class, Sequence, Package, and Use Case diagrams in a controlled experiment with 53 graduate students [<xref ref-type="bibr" rid="ref3">3</xref>]. The author reports that a low level of detail can lead to misinterpretations and that the subjects’ knowledge did not have an impact on comprehension.
      </p>
      <p>
        Staron et al. report the results of four controlled experiments, conducted with 68 students and 4 professionals in total, that study the impact of using UML stereotypes on comprehensibility [<xref ref-type="bibr" rid="ref4">4</xref>]. The studies show that stereotypes indeed improve comprehensibility and reduce both the total and the relative time needed to answer the questionnaires.
      </p>
      <p>
        As with the UML, comprehensibility is a commonly studied characteristic of requirements specifications. Condori-Fernández et al. present an evaluation of empirical studies on requirements comprehensibility published until 2008 [<xref ref-type="bibr" rid="ref6">6</xref>]. The authors conclude that while comprehensibility studies are common, many of them have practical limitations, such as using made-up examples instead of real specifications.
      </p>
      <p>
        Kamsties et al. study how different specification techniques affect the comprehensibility of a software requirements specification, using a re-engineered specification of a bicycle computer [<xref ref-type="bibr" rid="ref14">14</xref>]. The authors report that black-box specification techniques, which describe a system by its externally visible behaviour, lead to faster and more correct answers to the instrument than white-box specification techniques, which describe the system by the behaviour between its entities.
      </p>
      <p>
        Finally, a number of studies investigate the comprehensibility of requirements modelled in, or enhanced with, visual modelling languages. Scanniello et al. study the effect on requirements comprehensibility of using SysML diagrams in addition to natural language, compared to natural language requirements only [<xref ref-type="bibr" rid="ref15">15</xref>]. The authors use students as subjects in two controlled experiments and report that comprehensibility increases when SysML diagrams are provided, whereas the completion time for the comprehension task is unaffected.
      </p>
      <p>
        A recent paper by Abrahão et al. reports a family of five experiments on the comprehensibility of functional requirements modelled with sequence diagrams in addition to the natural language specification [<xref ref-type="bibr" rid="ref7">7</xref>]. One experiment uses undergraduate students, two use master students, one uses doctoral students, and one uses professionals as subjects. Four out of five experiments show statistically significant support for improved comprehensibility when sequence diagrams are used.
      </p>
      <p>
        In summary, a number of experiments investigate the comprehensibility of visual modelling languages, of requirements specifications, and of requirements represented in visual modelling languages. However, the outcomes vary and are sometimes even contradictory, e.g. in [<xref ref-type="bibr" rid="ref1">1</xref>] and [<xref ref-type="bibr" rid="ref2">2</xref>]. Additionally, we are not aware of any experiment comparing requirements represented by behavioural models only. This is a gap in knowledge, as requirements are typically on a more abstract level than, for example, software design and are, additionally, often intended to be read and understood by non-experts. In the automotive domain in particular, the usual process is that the vehicle manufacturer defines detailed requirements specifications covering the behaviour of software components and subsequently sends them to the supplier, who needs to correctly understand and realise the specified behaviour. We fill this gap with our contribution in this paper.
      </p>
      <sec id="sec-1-bg">
        <title>III. BACKGROUND</title>
        <p>In the following, we introduce the two compared modelling languages and illustrate them using sample models from our experiment. The models specify the behaviour for the case that a user wants to increase the speed of a wiper by one unit.</p>
        <p>Since both languages employ essentially the same modelling notations to specify real-time aspects, we did not use any real-time aspects but instead focused on non-real-time behaviour. Furthermore, a pilot of the experiment showed that the real-time aspects were too difficult to understand for the planned experiment. We plan a future experiment specifically targeting the real-time aspects.</p>
      </sec>
    </sec>
    <sec id="sec-1-msd">
      <title>A. Modal Sequence Diagrams</title>
      <p>
        Modal Sequence Diagrams (MSDs) [<xref ref-type="bibr" rid="ref8">8</xref>] are a recent variant of Live Sequence Charts (LSCs) [<xref ref-type="bibr" rid="ref16">16</xref>] for modelling the behaviour of a set of objects. MSDs/LSCs are sequence diagrams that, through different modalities assigned to messages and conditions, allow engineers to precisely describe scenarios with liveness (something good must happen) and safety (something bad must not happen) properties. Notably, LSCs and MSDs define how multiple scenarios can be active concurrently and synchronise on common events, as well as how MSDs are activated and de-activated. This allows engineers to flexibly specify systems that fulfill different tasks at the same time.
      </p>
      <p>
        One key advantage of MSDs/LSCs is that they can be executed with the play-out algorithm, which allows engineers and other stakeholders to understand the behaviour emerging from the interplay of the scenarios [<xref ref-type="bibr" rid="ref17">17</xref>]. Furthermore, it is possible to analyse whether a set of scenarios can be realised, i.e., whether it is free of contradictions and deadlocks.
      </p>
      <p>
        Figure 1 shows a sample MSD used in our experiment. It specifies the communication between a user, a wiper controller, and the actual wiper actuator. The sequence in the figure describes that (1) a request is sent to the wiper controller to increase the speed, (2) it is checked whether the wiper is in the state active, and (3) the controller sends a message to the actuator to increase the speed by one. If the check in step 2 fails, the MSD is de-activated and not executed further. Once the first message in an MSD (wiperRequest(WiperRequest::WIPER_INCREASE) in Figure 1) is executed, the MSD is called active. After the last message, the MSD is de-activated again. The numbers on the right side describe the so-called cut, i.e., the positions in which an MSD can be.
      </p>
      <p>[Fig. 1. MSD Start_Increase with the lifelines usr: User, wiper: WiperController, and act: WiperActuator; messages wiperRequest(WiperRequest::WIPER_INCREASE) and addToCurrentSpeed(1); condition wiper.wiperState == WiperState::WIPER_ACTIVE; cut numbers 0 and 1.]</p>
      <p>The complete MSD model consists of a set of five scenarios
covering the communication and conditions for the three
mentioned objects.</p>
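      <p>To make the notion of a cut more concrete, the following Python sketch steps through the Start_Increase scenario from Figure 1. It is a deliberately simplified model under our own assumptions: the class ScenarioSketch and its methods are hypothetical, a false condition simply de-activates the scenario as described above, unexpected messages are ignored, and message modalities, concurrently active scenarios, and the play-out algorithm are not modelled. The cut here counts processed steps and need not match the numbering shown in Figure 1.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical encoding of Figure 1: a message is (sender, receiver, name),
# a condition is a predicate over the current object state.
Message = Tuple[str, str, str]
Condition = Callable[[Dict[str, str]], bool]
Step = Union[Message, Condition]


@dataclass
class ScenarioSketch:
    """Simplified MSD: ordered steps plus a cut marking the current position."""
    steps: List[Step]
    cut: int = 0
    active: bool = False

    def _reset(self) -> None:
        self.active, self.cut = False, 0

    def _advance_conditions(self, state: Dict[str, str]) -> None:
        # Conditions are evaluated as soon as the cut reaches them; a false
        # condition de-activates the scenario (step 2 in Figure 1).
        while self.active and len(self.steps) > self.cut and callable(self.steps[self.cut]):
            if self.steps[self.cut](state):
                self.cut += 1
            else:
                self._reset()
        # After the last step the scenario is de-activated again.
        if self.active and self.cut == len(self.steps):
            self._reset()

    def on_message(self, msg: Message, state: Dict[str, str]) -> None:
        if not self.active:
            # The first message activates the scenario.
            if msg == self.steps[0]:
                self.active, self.cut = True, 1
                self._advance_conditions(state)
            return
        # While active, only the next expected message moves the cut;
        # other messages are ignored in this sketch.
        if msg == self.steps[self.cut]:
            self.cut += 1
            self._advance_conditions(state)


REQUEST = ("usr", "wiper", "wiperRequest(WiperRequest::WIPER_INCREASE)")
INCREASE = ("wiper", "act", "addToCurrentSpeed(1)")

start_increase = ScenarioSketch([
    REQUEST,
    lambda s: s["wiper.wiperState"] == "WiperState::WIPER_ACTIVE",
    INCREASE,
])
state = {"wiper.wiperState": "WiperState::WIPER_ACTIVE"}
start_increase.on_message(REQUEST, state)
print(start_increase.active, start_increase.cut)  # True 2
start_increase.on_message(INCREASE, state)
print(start_increase.active, start_increase.cut)  # False 0
```
      </preformat>
      <p>After the request, the condition is evaluated immediately and the cut moves past it; after addToCurrentSpeed(1) the scenario is complete and is de-activated. With an inactive wiper, the same request would activate the scenario and immediately de-activate it again.</p>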
    </sec>
    <sec id="sec-2">
      <title>B. Timed Automata</title>
      <p>
        Timed automata [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] are a state-based formalism that
extends finite automata with a set of real-valued variables, called
clocks, as well as real-time constraints. Several timed
automata can be combined into a network of timed automata,
in which the automata synchronise their behaviour via so-called
synchronisation channels. Synchronisation channels can
be used to specify synchronous message passing.
Timed automata can be both simulated and verified for
correctness using model checking.
      </p>
      <p>Figure 2 shows the timed automaton for the wiper
controller covering the increase-wiper-speed scenario
described previously for the MSD. It defines that if a wiper
request for increasing the speed (condition:
wiperRequestSignal==WiperRequest_INCREASE) is received via the
synchronisation channel WiperRequest? and the wiper is active
(condition: Actuator_WiperState==WiperState_ACTIVE), then
the wiper speed is increased by 1 and a helper variable is set to
1 (addToCurrentSpeedSignal:=1). This helper variable is used
in another automaton for a long-press functionality.</p>
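      <p>The transition described above combines a guard, a receiving synchronisation (WiperRequest?), and an update. The following Python sketch encodes this single edge under stated assumptions: clocks are omitted (as in the experiment), the enumeration literals are mapped to hypothetical integer constants, the location name Idle is invented, and the speed increase is modelled as incrementing a hypothetical integer variable Actuator_WiperSpeed. The Edge class and receive function are illustrative only and not part of any timed-automata tool.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Valuation = Dict[str, int]

# Hypothetical integer encodings of the enumeration literals in Figure 2.
WIPER_REQUEST_INCREASE = 1
WIPER_STATE_ACTIVE = 1


@dataclass
class Edge:
    """A clock-free timed-automaton edge: guard, receiving channel, update."""
    source: str
    target: str
    channel: str                          # "WiperRequest" stands for WiperRequest?
    guard: Callable[[Valuation], bool]
    update: Callable[[Valuation], None]


def receive(location: str, channel: str, edges: List[Edge], env: Valuation) -> str:
    """Fire the first enabled edge for a received channel; otherwise stay."""
    for edge in edges:
        if edge.source == location and edge.channel == channel and edge.guard(env):
            edge.update(env)
            return edge.target
    return location


def increase_speed(env: Valuation) -> None:
    env["Actuator_WiperSpeed"] += 1        # wiper speed increased by 1
    env["addToCurrentSpeedSignal"] = 1     # helper variable := 1


controller_edges = [
    Edge(
        source="Idle",                     # hypothetical location name
        target="Idle",
        channel="WiperRequest",
        guard=lambda v: (v["wiperRequestSignal"] == WIPER_REQUEST_INCREASE
                         and v["Actuator_WiperState"] == WIPER_STATE_ACTIVE),
        update=increase_speed,
    )
]

env = {"wiperRequestSignal": WIPER_REQUEST_INCREASE,
       "Actuator_WiperState": WIPER_STATE_ACTIVE,
       "Actuator_WiperSpeed": 1,
       "addToCurrentSpeedSignal": 0}
location = receive("Idle", "WiperRequest", controller_edges, env)
print(location, env["Actuator_WiperSpeed"], env["addToCurrentSpeedSignal"])  # Idle 2 1
```
      </preformat>
      <p>If the guard evaluates to false, for example because the wiper is not active, no edge fires and the automaton stays in its location without changing any variable.</p>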
      <p>The complete TA model consists of a network of five timed
automata covering the communication and conditions for the
three mentioned objects.</p>
      <sec id="sec-2-1">
        <title>IV. EXPERIMENT DESIGN</title>
        <p>
          We evaluated the comprehensibility of the two modelling
languages for requirements engineering in a controlled
experiment. Using the Goal/Question/Metric paradigm
[<xref ref-type="bibr" rid="ref19">19</xref>], we formulate the goal of this experiment as follows:
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiment Goal and Design</title>
      <p>Analyse requirements modelled in two different modelling
languages for the purpose of comparison with respect to
comprehensibility from the point of view of software
developers in the following context: application (verification
and validation), subjects (students).</p>
      <p>
        We used a between-subject randomised design with two
treatments [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The between-subject design was chosen to
avoid learning effects. The treatments are the two modelling
languages, MSD and TA. MSDs, a variant of Live
Sequence Charts [<xref ref-type="bibr" rid="ref21">21</xref>], are sequence diagrams with assigned
modalities that allow the expression of liveness and safety
properties as well as real-time constraints. Timed Automata are
an extension of finite automata for the specification and
verification of real-time systems. Hence, both languages are
very similar in terms of expressiveness. However, MSDs use
a scenario-based description covering multiple objects in one
MSD, whereas TA use a state-based description covering a
single object in one TA. MSDs were chosen in order to have a
sequence-based language with executable semantics, in contrast
to UML Sequence Diagrams, and with the possibility to model
required and forbidden behaviour. In order not to introduce
any bias, we chose TA as the second modelling language, as
neither language had been introduced during the course.
Both languages are used without their timing functionality.
Each subject was randomly assigned to one of the two treatments. In
the following subsections, the details of the experiment design
are presented.
      </p>
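      <p>The design only states that subjects were assigned randomly to a treatment. One possible procedure, sketched below in Python under the assumption that roughly balanced groups were desired, shuffles the subject list with a seeded generator and deals the subjects alternately to MSD and TA. The function name assign_treatments, the subject identifiers, and the seed are hypothetical; this is not necessarily how the assignment in the experiment was carried out.</p>
      <preformat preformat-type="code">
```python
import random
from typing import Dict, List, Sequence


def assign_treatments(subjects: Sequence[str],
                      treatments: Sequence[str] = ("MSD", "TA"),
                      seed: int = 2014) -> Dict[str, str]:
    """Randomly assign each subject to one treatment.

    The subjects are shuffled with a seeded generator (so the assignment can
    be reproduced) and then dealt round-robin, so group sizes differ by at
    most one. Name, seed and balancing are assumptions for illustration.
    """
    rng = random.Random(seed)
    order: List[str] = list(subjects)
    rng.shuffle(order)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(order)}


subjects = [f"S{i:02d}" for i in range(1, 23)]   # 22 hypothetical subject IDs
groups = assign_treatments(subjects)
print(sum(1 for t in groups.values() if t == "MSD"),
      sum(1 for t in groups.values() if t == "TA"))  # 11 11
```
      </preformat>
      <p>Simple randomisation of this kind does not balance covariates such as the subjects' study level; stratifying the shuffle by such attributes would be one way to do so.</p>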
    </sec>
    <sec id="sec-4">
      <title>A. Subjects</title>
      <p>We performed the experiment with 22 students from an
undergraduate course on software modelling. We chose these subjects
for availability reasons, as a suitable university course was scheduled
at the end of 2014 in which we could perform the experiment.
As the experiment was performed towards the end of the course,
the students had basic knowledge of UML. Both modelling
languages were introduced only immediately prior to the experiment,
in a single 45-minute lecture. However, the students had been introduced
to similar languages earlier in the course, namely to UML
sequence diagrams and UML state machine diagrams.</p>
    </sec>
    <sec id="sec-5">
      <title>B. Instrumentation</title>
      <p>As the basis for the experimental objects, we selected
requirements from a real-life automotive project of an industrial
partner. The selected requirements describe joint behaviour, i.e.,
they are not entirely independent. As these requirements are
confidential, we abstracted them and changed their content so
that it resembles a car wiper specification, while ensuring that
the complexity and the logic remained comparable. The main author
of this paper then modelled these requirements using
MSD and TA. The resulting experimental objects consist of
two requirements models, S<sub>MSD</sub> and S<sub>TA</sub>, each consisting
of five diagrams. The diagrams specify the activation of a car
wiper in slow mode and in fast mode, the increase of the
wiper’s speed in two different ways, and the deactivation of
the wiper. Additionally, the experimental objects contained a
single page describing the context of each treatment. For the
MSD specification, this consisted of a UML class diagram and
a UML object diagram; for the TA specification, this
page contained the system declarations.</p>
      <p>Finally, in addition to the introduction lecture, the instrument
contained a one-page explanation of syntax and semantics
and a questionnaire. The questionnaire, in turn, consisted of a
pre-experiment part, collecting demographic data about the subjects
(including their knowledge of modelling languages),
a post-experiment part, collecting subjective judgements, and the
actual measurement questionnaire, consisting of 12 questions
targeting the subjects’ understanding. The pre- and
post-experiment questionnaires were used to judge whether previous
experience, understanding of the introduction lecture, or other
factors might have affected the dependent variables. Due to
space limitations, we discuss the data obtained from these
questionnaires only briefly in Section VI. Each of the 12 questions
consisted of an initial state of the system and a number of
executed messages or commands, followed by two sub-questions.
First, the subjects had to answer whether the execution
violated the requirements. Second, we asked in
which state the system was after the execution (or right before
the requirements violation), either by asking for the system’s
variable values or by asking for the active cuts/states of each
diagram. Each sub-question was awarded one point. The second
sub-question was only counted if the first sub-question was
answered correctly, as otherwise it was already clear
that the subject had executed the requirements incorrectly. An
example question with solutions for both the MSD and the TA
model is depicted in Figure 3.</p>
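      <p>The scoring rule for a single question can be summarised in a few lines of Python. In this sketch, the Answer type and the score_question function are hypothetical, and we assume that naming the correct violating step is part of answering the first sub-question correctly. The example solution is the MSD question from Figure 3.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Answer:
    """Hypothetical encoding of one answered question."""
    violation_step: Optional[int]            # None means "no violation"
    final_state: Dict[str, str] = field(default_factory=dict)


def score_question(given: Optional[Answer], solution: Answer) -> int:
    """Return 0, 1 or 2 points for one question.

    One point for sub-question 1 (violation and, if any, the violating step)
    and one point for sub-question 2 (the state), which only counts if
    sub-question 1 is correct. Unanswered questions (None) score 0.
    """
    if given is None or given.violation_step != solution.violation_step:
        return 0
    return 1 + int(given.final_state == solution.final_state)


# Solution of the MSD question in Figure 3.
solution = Answer(None, {
    "act.wiperSpeed": "Constants.SPEED_OFF",
    "act.wiperState": "WiperState::WIPER_ACTIVE",
    "wiper.vehicleStatus": "VehicleStatus::RUNNING",
    "wiper.configuration": "WiperConfig::WIPER_INSTALLED",
})
wrong_speed = Answer(None, {**solution.final_state,
                            "act.wiperSpeed": "Constants.SPEED_SLOW"})
print(score_question(solution, solution),      # 2
      score_question(wrong_speed, solution),   # 1
      score_question(Answer(2), solution))     # 0
```
      </preformat>
      <p>A subject who wrongly reports a violation in step 2 receives no points at all, even without filling in the variable values, which mirrors the rule that the second sub-question is only counted after a correct first answer.</p>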
      <p>
        This questionnaire approach has been successfully applied
in many similar studies, e.g. in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
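        ]. The instrument,
together with the resulting raw data, is published at <ext-link ext-link-type="uri" xlink:href="http://www.grischaliebel.de/data/research/instrument_exp_msd_ta.zip">http://www.grischaliebel.de/data/research/instrument_exp_msd_ta.zip</ext-link>.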
      </p>
    </sec>
    <sec id="sec-6">
      <title>C. Variables</title>
      <p>The experiment has a single independent variable: the
visual modelling language, with the values MSD and TA. We measured
the comprehensibility of the requirements specification using three
dependent variables:</p>
      <p>Answered: the number of answered questions.</p>
      <p>AScore: the average score achieved per answered question.</p>
      <p>Score: the total score achieved over all 12 questions.</p>
      <p>Instead of measuring time, we designed the
instrument so that it would be difficult to answer all
questions in the given time frame. Therefore, we use the number
of answered questions, Answered, instead of the time needed.
Since we are foremost interested in using modelling languages for
verification and validation purposes later on, we
think that an accurate understanding of a specification is more
important than speed. This is why we chose AScore as a metric
for measuring how correctly questions are answered on average.
For completeness, we also added Score, which is related
to the other two metrics by Score = Answered × AScore.</p>
      <fig>
        <label>Fig. 3.</label>
        <caption>
          <p>Example Question for TA Model (above) and MSD Model (below)</p>
        </caption>
        <preformat>Precondition:
  Actuator_WiperSpeed = Constants_SLOW
  Actuator_WiperState = WiperState_ACTIVE
  Wiper_VehicleStatus = VehicleStatus_RUNNING
  Wiper_WiperConfiguration = WiperConfig_INSTALLED
Input scenario:
  1. WiperRequest is triggered, with wiperRequestSignal set to WiperRequest_OFF
  2. SetWiperSpeed is triggered, with setWiperSpeedSignal set to Constants_OFF
Question 1: Does the input scenario violate the specified behaviour?
            (No / Yes, in step: 1, 2)
Answer 1:   No
Question 2: Which values do the following variables have
            - after the execution of the input scenario (if A1 is ‘No’)
            - before the violating step (if A1 is ‘Yes’)?
Answer 2:   Actuator_WiperSpeed = Constants_OFF
            Actuator_WiperState = WiperState_ACTIVE
            Wiper_VehicleStatus = VehicleStatus_RUNNING
            Wiper_WiperConfiguration = WiperConfig_INSTALLED</preformat>
        <preformat>Precondition:
  act.wiperSpeed = Constants.SPEED_SLOW
  act.wiperState = WiperState::WIPER_ACTIVE
  wiper.vehicleStatus = VehicleStatus::RUNNING
  wiper.configuration = WiperConfig::WIPER_INSTALLED
Input scenario:
  1. usr sends Message ‘wiperRequest(WiperRequest::WIPER_OFF)’ to wiper
  2. wiper sends Message ‘setWiperSpeed(Constants.SPEED_OFF)’ to act
Question 1: Does the input scenario violate the specified behaviour?
            (No / Yes, in step: 1, 2)
Answer 1:   No
Question 2: Which values do the following variables have
            - after the execution of the input scenario (if A1 is ‘No’)
            - before the violating step (if A1 is ‘Yes’)?
Answer 2:   act.wiperSpeed = Constants.SPEED_OFF
            act.wiperState = WiperState::WIPER_ACTIVE
            wiper.vehicleStatus = VehicleStatus::RUNNING
            wiper.configuration = WiperConfig::WIPER_INSTALLED</preformat>
      </fig>
      <p>We opted for a comprehension task instead of letting subjects
create diagrams themselves, as this is easier and requires
less training. Furthermore, the experiment targets models of
functional requirements, not simply behavioural models in
general. We therefore argue that comprehensibility is of
particular importance, as the aim of requirements is to document
what a system shall fulfill. Hence, correctly understanding these
requirements is crucial.</p>
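      <p>The relation between the three dependent variables follows from their definitions: unanswered questions contribute no points, so the total Score equals the number of answered questions times their average score. The Python sketch below computes all three for one subject from per-question points, where None marks an unanswered question. The function name is hypothetical and the point values are invented for illustration; they are not data from the experiment.</p>
      <preformat preformat-type="code">
```python
from typing import List, Optional, Tuple


def dependent_variables(points: List[Optional[int]]) -> Tuple[int, float, int]:
    """Return (Answered, AScore, Score) for one subject.

    points[i] holds the 0-2 points for question i, or None if the subject
    did not answer that question within the time limit.
    """
    answered = [p for p in points if p is not None]
    n_answered = len(answered)
    score = sum(answered)                      # unanswered questions add 0
    a_score = score / n_answered if n_answered else 0.0
    return n_answered, a_score, score


# Invented example: 8 of 12 questions answered.
points = [2, 2, 1, 0, 2, 1, 2, 2, None, None, None, None]
answered, a_score, score = dependent_variables(points)
print(answered, a_score, score)                # 8 1.5 12
assert score == answered * a_score             # Score = Answered x AScore
```
      </preformat>
      <p>Because AScore is normalised by the number of answered questions, a subject who answers few questions very accurately can reach a high AScore but a low Score, which is why the two metrics are reported separately.</p>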
      <p>An additional variable which can influence the outcome of
the experiment is the subjects’ knowledge regarding modelling
languages and their domain knowledge in the automotive
domain. While all students are from the same course, they
might have different previous knowledge and experience. To
address this issue, we used a pre-experiment survey that
asked for background information, such as previous courses
on modelling taken by the subject.</p>
    </sec>
    <sec id="sec-7">
      <title>D. Hypotheses</title>
      <p>We formulated the following null and alternative hypotheses,
H<sub>0</sub> and H<sub>1</sub>:</p>
      <p>H<sub>0</sub>: There is no significant difference between Modal
Sequence Diagrams and Timed Automata with respect to
comprehensibility of requirements specifications.</p>
      <p>H<sub>1</sub>: There are significant differences between Modal
Sequence Diagrams and Timed Automata with respect to
comprehensibility of requirements specifications.</p>
      <p>We evaluated the hypotheses separately for each of the
dependent variables. Each variable was tested for
significance using a non-parametric Mann-Whitney U test.
Additionally, we tested each variable for equality of variances
using a Levene test, in order to check the
assumptions of the Mann-Whitney U test. For both tests, we
used a significance level of α = 0.05.</p>
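      <p>For readers unfamiliar with the test, the following self-contained Python sketch computes the Mann-Whitney U statistic, using average ranks for ties, and a two-sided p-value from the normal approximation with tie correction. It is an illustration only: the text does not state which variant (exact or asymptotic, with or without continuity correction) was used, and with only 22 subjects in total an exact test may be more appropriate. For actual analyses, scipy.stats.mannwhitneyu and scipy.stats.levene provide tested implementations; the Levene test is not reimplemented here.</p>
      <preformat preformat-type="code">
```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while len(order) > i:
        j = i
        while len(order) > j + 1 and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U, p)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U statistic of sample x
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    if sigma == 0:                              # all values identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


ALPHA = 0.05
# Synthetic toy data, not results from the experiment.
u, p = mann_whitney_u([5, 6, 7, 8], [9, 10, 11, 12])
print("U =", u, "reject H0:", ALPHA > p)         # U = 0.0 reject H0: True
```
      </preformat>
      <p>The test compares rank sums rather than means, which is why it suits ordinal, bounded measures such as Answered and Score that are unlikely to be normally distributed in a small sample.</p>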
    </sec>
    <sec id="sec-8">
      <title>E. Operation</title>
      <p>The experiment was piloted with two PhD students prior to
execution. The instrument turned out to be too complicated
and was therefore simplified further to its current form.</p>
      <p>The experiment was conducted in a 90-minute lecture.
Participation was voluntary, and the students received no
benefits in the modelling course, such as bonus points or higher
grades. In the first 45 minutes, both visual modelling languages
were introduced. While this is a rather short time for introducing
two new languages, we were limited to this time frame by
the course schedule. Additionally, the subjects had previous
knowledge of similar languages from the course, so that it
was possible to relate the newly introduced languages to that
knowledge. We handed out the experimental objects already before
the introduction lecture, so that the subjects knew which
treatment they would receive and could concentrate on that
language during the lecture. Additionally, they could familiarise
themselves with the model. The subjects were encouraged
not to share or exchange the objects with each other. After
the introduction lecture, we handed out the remaining parts
of the instrument, namely the questionnaires and the syntax
help. Subjects then received 3 minutes to fill out the
pre-experiment questionnaire, 40 minutes to fill out the experiment
questionnaire, and finally 2 minutes for the post-experiment
questionnaire.</p>
      <sec id="sec-8-1">
        <title>V. VALIDITY</title>
        <p>In the following, we discuss the measures we took to ensure validity. We use the four aspects of validity presented by Wohlin et al. [<xref ref-type="bibr" rid="ref20">20</xref>].</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>A. Construct Validity</title>
      <p>In order to avoid inadequate preoperational explication of
constructs, we explicitly defined what ‘comprehensibility’
means in our study. Also, it is clearly defined that
a higher value in any of the three dependent variables means a
better result for that variable. Our dependent variables do not
require any human judgement and are therefore objective.
Mono-operation bias cannot currently be entirely ruled out, as we
only used one experimental object. We plan to replicate
the experiment with another requirements specification in the
future in order to address this. Mono-method bias is addressed
by asking two sub-questions for each of the 12 experiment
questions. Although the first sub-question is a simple
yes/no question, it provides an additional check of whether the subject
has correctly understood the model. If it was answered
incorrectly, we automatically awarded 0 points to the second
sub-question as well. Additionally, the second sub-question
was much harder to answer correctly by chance.</p>
    </sec>
    <sec id="sec-10">
      <title>B. Internal Validity</title>
      <p>In order to avoid maturation or learning effects, subjects were
only allowed to participate in the experiment once and only
in one group, and were not allowed to exchange information
with other subjects during the experiment. Additionally, we
used a pre-experiment questionnaire in order to assess the
subjects’ domain and modelling knowledge, which might affect
the outcome. While all students came from the same course,
they had different previous experience with respect to software
modelling and requirements engineering. We also ensured that
the subjects participated voluntarily in the experiment, by not
giving rewards in the form of improved course grades or similar,
in order to avoid compensatory rivalry or demoralisation.
However, we cannot entirely rule out that some subjects
participated to win our approval later in the course. The fact
that we used volunteers might bias the results, as they could
have been more motivated than the average student.</p>
    </sec>
    <sec id="sec-11">
      <title>C. External Validity</title>
      <p>We used parts of a real-life specification instead of a
toy example for the experiment instrument. However, the
requirements had to be abstracted as the original specification
is confidential. Additionally, while modelling the requirements,
we had to ensure that both treatments were modelled in the
same way and exhibited the same behaviour. This could have
led to one of the treatments being modelled in a way that
would not occur in practice, and thus limit generalisability.
We tried to reduce this threat by iteratively discussing and
improving the instrument among the authors of this paper.
Additionally, the fact that we used student subjects possibly
limits the generalisability to an industrial context. Finally,
the specification is based on an automotive requirements
specification, which can limit the generalisability to other
domains.</p>
    </sec>
    <sec id="sec-12">
      <title>D. Conclusion Validity</title>
      <p>
        We tried to avoid ambiguous wording in
the questionnaire by iteratively reviewing and improving it.
Additionally, we performed a pilot experiment with two PhD
students prior to the actual experiment in order to improve
both the introduction material and the questionnaire. Reliability
of treatment implementation is ensured, as the introduction lecture
was given only once for the actual experiment. We only
performed statistical tests on the three dependent variables, which
were defined up-front, and did not fish for results [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <sec id="sec-12-1">
        <title>VI. RESULTS AND DISCUSSION</title>
        <p>In the following, we first discuss the demographics of
the subjects participating in the experiment. Afterwards, we
present and discuss the results of the hypothesis tests for
the experiment. Finally, we discuss the
post-experiment questionnaire.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>A. Demographic Data</title>
      <p>Of the 22 subjects, 19 are Bachelor students and 3 are
Master students. This is because the
course in which we performed the experiment is at Bachelor
level, but can be taken as an elective course by first-year
Master students. All 3 Master students were randomly assigned
to the MSD treatment. Of the 22 subjects, 13 have a secondary
school degree, 7 a Bachelor degree, 1 a Master degree, and
1 another degree as their highest degree. This means
that 5 Bachelor students already hold a
Bachelor degree, and one Master student already holds a Master
degree. While this is certainly possible, it might also be caused
by subjects misunderstanding the question. Most subjects had already taken
courses on related topics, such as Object-oriented
programming or Software Architecture; only 6 subjects stated
that they had not taken any related courses. Additionally,
we asked the subjects about their professional experience in
developing software, in modelling software, and in requirements
engineering. For both software modelling and requirements
engineering, only 3 subjects reported previous
professional experience, ranging from half a year to three years.
In addition, 9 subjects stated that they
have professional experience in software development: one
subject each reported 0.3 years, 1 year, and 8 years of experience,
and 3 subjects each reported 2 and 3 years of experience.</p>
    </sec>
    <sec id="sec-14">
      <title>B. Experiment Results</title>
      <p>The experiment was conducted on 4th December 2014 at
Chalmers University in Gothenburg, Sweden. The answers
from the paper questionnaires were subsequently digitised
to allow computerised data processing. An overview of
both the descriptive statistics and the significance tests for
all three variables is given in Tables I and II.</p>
      <p>The results for the first dependent variable, Answered, are
depicted in Figure 4 for both treatments. Subjects in
the TA group clearly took longer to answer the questionnaire, and
only one of them finished all questions. In the MSD
group, half of the subjects finished all questions. Additionally,
four subjects in the TA group answered three or fewer questions,
whereas this is the case for only one subject in the MSD group.
The large difference between the two means for this variable already
suggests that the null hypothesis can be rejected, which is
confirmed by the significance test (p ≈ 0.021). Hence,
there is a significant difference in the number of
answered questions between the two treatments. A possible
explanation lies in the nature of MSDs compared
to TAs. While a single MSD has to be taken into account
only once it is activated, each automaton in a TA is ‘active’
by definition. This means that for each message in a given
scenario, all automata need to be studied, while only a subset
of the MSDs needs to be considered.</p>
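      <p>The reading-effort argument above can be made concrete with a toy count. The following Python sketch is only an illustration under simplifying assumptions that are not part of the experiment: the diagram, automaton, and message names are hypothetical, an MSD is treated as a plain message sequence that becomes active on its first message and terminates after its last one, and modalities, forbidden messages, and clock constraints are ignored. It compares how many MSDs are active per message with how many automata of a TA network would have to be checked.</p>
      <preformat>
```python
# Toy illustration with hypothetical models; not the experiment instrument.
# Assumed MSD rule: an MSD becomes active when its first message occurs and
# terminates after its last message. Modalities and timing are ignored, and
# messages that an active MSD does not expect are simply skipped.


def msd_effort(scenario, msds):
    """Sum over all messages of the number of MSDs active at that message."""
    progress = {}  # active MSD name -> index of the next expected message
    effort = 0
    for msg in scenario:
        # Activate every inactive MSD whose first message is the current one.
        for name, msgs in msds.items():
            if name not in progress and msgs[0] == msg:
                progress[name] = 0
        # The reader only needs to study the currently active MSDs.
        effort += len(progress)
        # Advance active MSDs that expect this message; drop completed ones.
        for name in list(progress):
            if msds[name][progress[name]] == msg:
                progress[name] += 1
                if progress[name] == len(msds[name]):
                    del progress[name]
    return effort


def ta_effort(scenario, automata):
    """All automata run in parallel, so every message requires checking each one."""
    return len(scenario) * len(automata)


if __name__ == "__main__":
    scenario = ["lockRequest", "lockAck", "speedUpdate", "unlock"]
    msds = {
        "Locking": ["lockRequest", "lockAck"],
        "AutoUnlock": ["speedUpdate", "unlock"],
    }
    automata = ["LockController", "SpeedMonitor"]
    print(msd_effort(scenario, msds), ta_effort(scenario, automata))  # prints: 4 8
```
      </preformat>
      <p>In this hypothetical scenario, a reader studies four active MSDs in total but performs eight automaton checks, because each automaton must be considered for every message. Whether this difference actually explains the measured effect on Answered remains a hypothesis.</p>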
      <p>Figure 4. Answered per subject for the TA and MSD treatments (panels: Answered (TA), Answered (MSD); x-axis: Subject; y-axis: number of answered questions, 0 to 12).</p>
      <p>The second dependent variable, AScore, is depicted in Figure
5 for both the TA and the MSD treatment. In the TA treatment,
there is a much larger variance in the data set, with both
very high and very low values. For the MSD treatment, there
are few values at the extremes. The statistical test results in
p ≈ 0.947, so the null hypothesis cannot be rejected
for this variable. We do not have an explanation for the large
differences between subjects in the TA treatment, but they
might be attributed to misunderstandings of the
modelling language. Several subjects achieved average scores
under 1 point, even though they stated in the post-experiment
questionnaire that they were confident in their answers. We
plan to replicate the experiment in the future with
some simple upfront questions in order to measure whether
the subjects have really understood the languages well enough,
and to analyse whether this correlates with their self-assessment.</p>
      <p>Figure 5. AScore per subject for the TA and MSD treatments (x-axis: Subject; y-axis: AScore, 0 to 2).</p>
      <p>As the third variable, Score, is computed directly from AScore
and Answered, it exhibits a similar pattern (see Figure 6). In the
TA group, two subjects achieved 20 or more points, close to the
maximum of 24. However, many subjects in this group have low
total scores. As subjects in the MSD group have significantly
higher values for Answered, their average Score
values are higher, even though their average AScore is
lower.</p>
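      <p>The relation between the three variables can be sketched in Python. The sketch assumes definitions that this section does not spell out: each question is worth 0 to 2 points (matching the 0 to 2 range of the AScore plots and a maximum Score of 24 for 12 questions), AScore is the mean number of points per answered question, and Score is the total number of points, that is, Answered × AScore. The function name metrics and the example data are hypothetical.</p>
      <preformat>
```python
# Sketch under assumed definitions (not taken verbatim from the study):
#   Answered = number of answered questions
#   AScore   = mean points per answered question (0 to 2 points each)
#   Score    = total points = Answered * AScore (at most 12 * 2 = 24)

NUM_QUESTIONS = 12  # assumption, consistent with a maximum Score of 24


def metrics(points):
    """Return (Answered, AScore, Score) for one subject's per-question points."""
    answered = len(points)
    if answered > NUM_QUESTIONS:
        raise ValueError("more answers than questions")
    score = sum(points)
    ascore = score / answered if answered else 0.0
    return answered, ascore, score


if __name__ == "__main__":
    # Hypothetical subjects: fast but less accurate vs. slow but accurate.
    print(metrics([1] * 12))      # (12, 1.0, 12)
    print(metrics([2, 2, 1, 2]))  # (4, 1.75, 7)
```
      </preformat>
      <p>The first hypothetical subject has the lower AScore but the higher Score, which mirrors the pattern described above for the MSD group.</p>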
      <p>In summary, subjects using MSDs answered significantly more
questions in the given time, which suggests that MSDs are quicker
to comprehend. Therefore, if speed is a relevant factor, MSDs
should be chosen instead of TAs. One could argue that speed
itself is not relevant as long as AScore is low. Therefore, we
plan to replicate the experiment with subjects who are more
familiar with the modelling languages, in order to see whether
the difference in speed persists.</p>
    </sec>
    <sec id="sec-15">
      <title>C. Correlation between Demographic Data and Dependent Variables</title>
    </sec>
    <sec id="sec-16">
      <p>We used the Pearson product-moment correlation coefficient
to assess the correlations between the three dependent variables
and the number of related courses previously taken by the
students, their education level (Bachelor/Master), and the
subjects’ confidence in their answers. The resulting values
for Pearson’s r and the p-values are listed in Table III.
Treating an effect size of r &lt; 0.3 as small, 0.3 ≤ r &lt; 0.4 as medium,
and r ≥ 0.4 as large, we see a large correlation between
all three dependent variables and the subjects’ confidence
in their answers. This result indicates that subjects had, on
average, a clear grasp of whether they understood the instrument
or not. Interestingly, both the number of previous courses
and the education level show only a small correlation with
the dependent variables. Similarly, previous experience in
Software Development, Software Modelling, and Requirements
Engineering shows only a small correlation with the dependent variables,
as listed in Table IV. These results could indicate that
the dependent variables were in fact influenced by other
factors, such as confusion regarding the newly introduced
modelling languages. However, they could also indicate that
the understanding of the requirements does not depend on
previous education and experience. Further replications will be
necessary to answer these questions satisfactorily.</p>
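      <p>The correlation analysis and the effect-size thresholds above can be sketched as follows. This is a standard-library Python sketch, not the authors' analysis script: it computes Pearson's r from its definition and classifies the magnitude |r| with the thresholds 0.3 and 0.4 stated above (applying them to the absolute value is an assumption). It does not compute p-values; a library routine such as scipy.stats.pearsonr returns both r and a p-value. The function names and example data are hypothetical.</p>
      <preformat>
```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two paired samples."""
    n = len(xs)
    if n != len(ys) or 2 > n:
        raise ValueError("need two paired samples with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant sample")
    return cov / sqrt(ss_x * ss_y)


def effect_size(r):
    """Classify |r| with the thresholds 0.3 (medium) and 0.4 (large)."""
    magnitude = abs(r)
    if magnitude >= 0.4:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    return "small"


if __name__ == "__main__":
    # Hypothetical data: self-reported confidence vs. Score for five subjects.
    confidence = [1, 2, 3, 4, 5]
    score = [4, 9, 10, 15, 20]
    r = pearson_r(confidence, score)
    print(round(r, 3), effect_size(r))
```
      </preformat>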
      <sec id="sec-16-1">
        <title>VII. CONCLUSIONS AND FUTURE WORK</title>
        <p>In this paper, we have presented the results of a controlled
experiment with 22 students in an undergraduate course on
software modelling. We studied the comprehensibility of
functional requirements modelled in two graphical languages:
Modal Sequence Diagrams, a sequence-based notation, and
Timed Automata, a state-based notation. Subjects received a
model in one of the two languages and a questionnaire with
questions testing their understanding of the model. While we
cannot reject the null hypothesis of no difference between
the two treatments for either the average or the total
questionnaire score, subjects receiving the Modal
Sequence Diagram specification answered significantly more
questions. This indicates that if speed or efficiency plays
an important role, scenario-based models should be considered
instead of state-based models. However, further studies
need to be conducted in order to understand whether this
effect persists with more experienced users who achieve higher
overall scores.</p>
        <p>While our sample of students without previous knowledge
of the treatments can be seen as a possible threat to
validity, this lack of experience is in fact realistic
for industrial use in the automotive domain. As requirements
specifications are used across organisations and across roles
within an organisation, it cannot be assumed that the receiver
of a specification is always familiar with every detail of the
language used. Additionally, receivers are often not experts in
modelling, but in other areas such as requirements engineering or
system design. Therefore, in contrast to, for example, software
development, the receivers of a requirements specification cannot
be expected to be experts in the language used. Finally,
our results indicate that the current practice of choosing the
modelling language based on convenience does not in itself
threaten the comprehension of the specifications.</p>
        <p>In the future, we will replicate the experiment both with
different groups of students and with professionals from our
industrial partners, in order to reduce possible bias and to
assess whether experience and deeper knowledge of the
languages have a significant impact on understanding.
Additionally, we aim to develop a theory of which
languages are suitable for which kind of task or system when
modelling requirements.</p>
      </sec>
      <sec id="sec-16-2">
        <title>ACKNOWLEDGEMENT</title>
        <p>We would like to express our gratitude to Nadja Marko
and Christian Webel, who helped in reviewing and discussing
an early experiment design. Additionally, we would like to
thank Pariya Kashfi and Vard Antinyan for participating in
the pilot experiment. The research leading to these results has
received partial funding from the European Union’s Seventh
Framework Program (FP7/2007-2013) for CRYSTAL-Critical
System Engineering Acceleration Joint Undertaking under grant
agreement No 332830 and from Vinnova under DIARIENR
2012-04304.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Otero</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Dolado</surname>
          </string-name>
          , “
          <article-title>Evaluation of the comprehension of the dynamic modeling in UML</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>46</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Glezer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Last</surname>
          </string-name>
          , E. Nachmany, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Shoval</surname>
          </string-name>
          , “
          <article-title>Quality and comprehension of UML interaction diagrams-an experimental comparison</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>47</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>675</fpage>
          -
          <lpage>692</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nugroho</surname>
          </string-name>
          , “
          <article-title>Level of detail in UML models and its impact on model comprehension: A controlled experiment</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>51</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>1670</fpage>
          -
          <lpage>1685</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Staron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kuzniarz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wohlin</surname>
          </string-name>
          , “
          <article-title>Empirical assessment of using stereotypes to improve comprehension of UML models: A set of experiments</article-title>
          ,”
          <source>Journal of Systems and Software</source>
          , vol.
          <volume>79</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>727</fpage>
          -
          <lpage>742</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Condori-Fernández</surname>
          </string-name>
          , M. Daneva,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sikkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wieringa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dieste</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Pastor</surname>
          </string-name>
          , “
          <article-title>A systematic mapping study on empirical evaluation of software requirements specifications techniques</article-title>
          ,”
          <source>in Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement. IEEE Computer Society</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>502</fpage>
          -
          <lpage>505</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Condori-Fernández</surname>
          </string-name>
          , M. Daneva,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sikkel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Herrmann</surname>
          </string-name>
          , “
          <article-title>Practical relevance of experiments in comprehensibility of requirements specifications</article-title>
          ,” in
          <source>2011 First International Workshop on Empirical Requirements Engineering (EmpiRE)</source>
          , Aug.
          <year>2011</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abrahão</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gravino</surname>
          </string-name>
          , E. Insfran, G. Scanniello, and G. Tortora, “
          <article-title>Assessing the effectiveness of sequence diagrams in the comprehension of functional requirements: Results from a family of five experiments</article-title>
          ,”
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>39</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>342</lpage>
          ,
          Mar.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Harel</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Maoz</surname>
          </string-name>
          , “
          <article-title>Assert and negate revisited: Modal semantics for UML sequence diagrams</article-title>
          ,”
          <source>Software and Systems Modeling (SoSyM)</source>
          , vol.
          <volume>7</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>252</lpage>
          , May
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alur</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Dill</surname>
          </string-name>
          , “
          <article-title>A Theory of Timed Automata</article-title>
          ,”
          <source>Theoretical Computer Science</source>
          , vol.
          <volume>126</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>235</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Larsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mikucionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Skou</surname>
          </string-name>
          , “
          <article-title>Testing real-time embedded software using UPPAAL-TRON: An industrial case study</article-title>
          ,” in
          <source>Proceedings of the 5th ACM International Conference on Embedded Software</source>
          . ACM
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fehnker</surname>
          </string-name>
          , “
          <article-title>Scheduling a steel plant with timed automata</article-title>
          ,” in
          <source>RTCSA</source>
          . IEEE,
          <year>1999</year>
          , p.
          <fpage>280</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Greenyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marhenke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bellmer</surname>
          </string-name>
          , “
          <article-title>Evaluating a formal scenario-based method for the requirements analysis in automotive software engineering</article-title>
          ,”
          <source>in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <collab>Object Management Group</collab>
          , “
          <article-title>Unified modeling language</article-title>
          ,” http://www.uml.org/, Jun.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamsties</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>von Knethen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Reussner</surname>
          </string-name>
          , “
          <article-title>A controlled experiment to evaluate how styles affect the understandability of requirements specifications</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>45</volume>
          , no.
          <issue>14</issue>
          , pp.
          <fpage>955</fpage>
          -
          <lpage>965</lpage>
          ,
          <year>2003</year>
          , Eighth International Workshop on Requirements Engineering: Foundation for Software Quality
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Scanniello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Staron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Burden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Heldal</surname>
          </string-name>
          , “
          <article-title>On the effect of using SysML requirement diagrams to comprehend requirements: Results from two controlled experiments</article-title>
          ,” in
          <source>18th International Conference on Evaluation and Assessment in Software Engineering (EASE)</source>
          , May
          <year>2014</year>
          , pp.
          <fpage>433</fpage>
          -
          <lpage>442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W.</given-names>
            <surname>Damm</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Harel</surname>
          </string-name>
          , “
          <article-title>LSCs: Breathing life into message sequence charts</article-title>
          ,” in
          <source>Formal Methods in System Design</source>
          , vol.
          <volume>19</volume>
          . Kluwer Academic,
          <year>2001</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Harel</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Marelly</surname>
          </string-name>
          ,
          <source>Come, Let's Play: Scenario-Based Programming Using LSCs and the Play-Engine</source>
          . Springer, Aug.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bengtsson</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Yi</surname>
          </string-name>
          , “
          <article-title>Timed automata: Semantics, algorithms and tools</article-title>
          ,” in
          <source>Lectures on Concurrency and Petri Nets</source>
          , vol.
          <volume>3098</volume>
          . Springer,
          <year>2003</year>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Basili</surname>
          </string-name>
          , “
          <article-title>Software modeling and measurement: The goal/question/metric paradigm</article-title>
          ,”
          <source>Tech. Rep.</source>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wohlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          , M. Höst,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Ohlsson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Regnell</surname>
          </string-name>
          , <source>Experimentation in Software Engineering</source>. Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W.</given-names>
            <surname>Damm</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Harel</surname>
          </string-name>
          , “
          <article-title>LSCs: Breathing life into message sequence charts</article-title>
          ,” in
          <source>Formal Methods in System Design</source>
          , vol.
          <volume>19</volume>
          . Kluwer Academic,
          <year>2001</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>