<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Empirical Study: Comparing Hasselt with C# to describe multimodal dialogs</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Fredy Cuenca, Jan Van den Bergh, Kris Luyten, Karin Coninx Hasselt University - tUL - iMinds Expertise Centre for Digital Media Wetenschapspark 2</institution>
          ,
          <addr-line>3590 Diepenbeek</addr-line>
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <fpage>25</fpage>
      <lpage>32</lpage>
      <abstract>
<p>Previous research has proposed guidelines for creating domain-specific languages for modeling human-machine multimodal dialogs. One of these guidelines suggests the use of multiple levels of abstraction so that the descriptions of multimodal events can be separated from the human-machine dialog model. In line with this guideline, we implemented Hasselt, a domain-specific language that combines textual and visual models, each of them aimed at describing different aspects of the intended dialog system. We conducted a user study to measure whether the proposed language provides benefits over equivalent event-callback code. During the user study participants had to modify the Hasselt models and the equivalent C# code. The completion times obtained for C# were on average shorter, although the difference was not statistically significant. Subjective responses were collected using standardized questionnaires and an interview, both of which indicated that participants saw value in the proposed models. We provide possible explanations for the results and discuss some lessons learned regarding the design of the empirical study. Index Terms: Multimodal systems, Human-machine dialog, Finite state machines, Dialog model, Domain-specific language.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Multimodal systems allow users to communicate through
the coordinated use of multiple input modes, e.g. speech, gaze,
and gestures. These systems have the potential to support a
human-machine communication that is robust (e.g. multiple
inputs can be combined to perform disambiguation), flexible
(e.g. users can choose their preferred modality), and more
natural than ever before.</p>
      <p>
        However, implementing multimodal systems is still a
difficult task. This is partly because of the complexity of
multimodal interaction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the absence of standardized
methodology [
        <xref ref-type="bibr" rid="ref2">2</xref>
], and the different state-of-the-art
technologies that must be mastered for their construction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Several domain-specific languages have been proposed with
the intention of simplifying the implementation of multimodal
interfaces [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]–[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. From an analysis of these languages,
Dumas et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed many guidelines for developing future
languages. One of these guidelines states that the specialized
language must be such that the declaration of multimodal
events can be separated from the description of the
human-machine dialog (Figure 1). The present research has
implemented this idea and measured how potential users can benefit
from such an implementation.
      </p>
<p>Concretely, we created a language, called Hasselt, that
provides notations for declaring multimodal events and
human-machine dialogs separately. The multimodal events are
textually declared as combinations of predefined user events (e.g.
mouse clicks, speech inputs, etc.). The multimodal dialog
is depicted as a finite state machine (FSM) whose arcs are
labelled with multimodal event names.</p>
      <p>In order to evaluate the benefits of such separation of
concerns, a user study was conducted. Participants had to
sequentially modify two equivalent implementations of a
multimodal dialog system. In one case, both the code for handling
the events and the code for handling the dialog were included
in the same source file written in C#. In the other case, these
were specified separately with the textual and visual notations
provided by Hasselt.</p>
    </sec>
    <sec id="sec-2">
      <title>II. RELATED WORK</title>
      <sec id="sec-2-1">
        <title>A. Modeling multimodal dialogs as FSMs</title>
        <p>
          When modeling human-machine dialogs as finite state
machines (FSMs), the nodes of the FSM represent the possible
states of the dialog system, and its arcs represent the transitions
in the dialog system’s state. Many researchers have proposed
FSM-based solutions for modeling unimodal human-machine
dialogs, e.g. IOG [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], SwingStates [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], Schwarz’s framework
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and InterState[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], among others.
        </p>
<p>However, there are only a few languages that allow modeling
multimodal human-machine dialogs as FSMs. Some
representative examples are listed below.</p>
        <p>
          We can consider MEngine [
          <xref ref-type="bibr" rid="ref4">4</xref>
] as an FSM-based language that
allows modeling trivial multimodal dialogs, e.g. dialogs where the system
responses are always the same for a given multimodal input.
        </p>
        <p>
          In NiMMiT [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the dialog model is a state machine
where each state represents a set of tasks that are available
to the end user. NiMMiT is restricted to interactive virtual
environments (IVE) since its presentation model has to be
encoded in VRIXML [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. In contrast, with our proposal, the
presentation model can be implemented in any .NET language,
which opens a wide assortment of possibilities beyond IVE.
        </p>
        <p>
          SMUIML [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] provides different notations for declaring the
human-machine dialog and for combining user events. Unlike
Hasselt, SMUIML does not include a symbol for defining
iterative events. This reduces the space of multimodal events
that can be specified with SMUIML in comparison with
Hasselt. For instance, a drag-and-drop, which involves an
arbitrary number of mouse-move events, cannot be specified
with SMUIML at the level of events. Another difference is
that, unlike Hasselt, SMUIML does not support state variables
or conditional transitions at the dialog level.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>B. User studies. Interaction models vs. event-callback code</title>
        <p>To the best of our knowledge, none of the abovementioned
multimodal dialog modeling languages have been evaluated in
user studies. Nonetheless, outside the multimodal domain, we
found two user studies that guided us in the design of our
experiments.</p>
        <p>
Oney et al. recruited 20 developers to evaluate the
understandability of InterState’s visual notation. Each participant
had to modify two systems (a drag-and-drop and a thumbnail
viewer) implemented in both RaphaelJS and InterState. The study
verified that InterState models are faster to modify than
equivalent event-callback code written in RaphaelJS [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          The creators of Proton++ carried out two experiments with
12 programmers. Each participant was shown a multitouch
gesture specification and a set of videos of a user performing
gestures. Gestures could be specified as a regular expression,
a tablature, or event-callback code, and the participant
had to match the specification with the video showing the
described gesture. The results showed that the tablatures of
Proton++ are easier to comprehend than equivalent regular
expressions and event-callback code [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
Since real-world scenarios require programmers not only to
comprehend but also to write code, we followed the
schema of Oney et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
]. We asked participants to perform
modifications with our language and with equivalent
event-callback code.
        </p>
        <p>
          Hasselt provides notations for creating executable
specifications of multimodal human-machine dialogs. It comes with
a complete User Interface Management System (UIMS) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
that offers the editors, runtime environment, and debugging
tools required to code, run, and test Hasselt specifications.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>A. Running Example</title>
        <p>In the remainder of the paper, we will show how to
implement a simple multimodal dialog system with Hasselt.
The front-end of our running example system is shown in
Figure 3. It allows end users to issue multimodal commands
to create, move, and remove objects from a canvas that is
initially empty. These commands may be enabled or disabled
depending on the current context-of-use.</p>
        <p>
          Users can create new objects by issuing voice commands
like ‘create green box here’ while clicking on the canvas to
indicate the position of the new object. Boxes are reshuffled by
issuing ‘put that there’ while clicking on both the target object
and its new position [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. And the canvas can be cleared up
in reaction to the voice command ‘remove objects’. To make
the system responses depend on the context-of-use, we added
two rules: The boxes can only be moved if there are more
than three of them on the canvas; and the canvas can only be
cleared up after the displacement of at least one object.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>B. How to use Hasselt UIMS?</title>
        <p>The steps required to create a multimodal dialog system
with Hasselt UIMS are as follows.</p>
        <p>1) Implementing a back-end application: One must create
an executable program implementing the front-end and the
handling methods of the intended system. For the purpose
of this work, such a program will be referred to as the back-end
application. The back-end application can be implemented
with any .NET programming language to be subsequently
imported into Hasselt UIMS.</p>
        <p>For the aforementioned running example, the back-end
application implements the front-end shown in Figure 3,
and the methods for creating, moving, and removing
virtual objects, i.e. CREATEOBJECT(COLOR,X,Y),
PUTTHATTHERE(X1,Y1,X2,Y2), and REMOVEALLOBJECTS().</p>
      </sec>
      <sec id="sec-2-5">
        <title>2) Declaring multimodal events</title>
        <p>
          Hasselt allows combining multiple user events into one single abstraction [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
Programmatically, user events can be combined through a set
of event operators that can be used in a recursive manner.
The operator FOLLOWED BY (;) indicates sequentiality
of events, the operator OR (|) serves to specify
alternative events, AND (+) represents simultaneity of events, and
ITERATION (*) is meant to specify repetitive events [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
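To illustrate how such operators can be composed recursively, the following sketch models event composition in the style described above; the `Event` class and its method names are hypothetical and do not reflect the actual Hasselt syntax.

```python
# Hypothetical sketch of recursive event composition in the style of the
# Hasselt operators; the Event class and its methods are illustrative only.
class Event:
    def __init__(self, name):
        self.name = name

    def followed_by(self, other):   # FOLLOWED BY (;): sequential events
        return Event(f"({self.name} ; {other.name})")

    def __or__(self, other):        # OR (|): alternative events
        return Event(f"({self.name} | {other.name})")

    def __add__(self, other):       # AND (+): simultaneous events
        return Event(f"({self.name} + {other.name})")

    def iteration(self):            # ITERATION: repetitive events
        return Event(f"({self.name})*")

# Combining predefined user events into one multimodal abstraction:
click = Event("mouseClick")
speech = Event("speech('put that there')")
put_that_there = speech + click.followed_by(click)
print(put_that_there.name)  # (speech('put that there') + (mouseClick ; mouseClick))
```

Because each operator returns another `Event`, arbitrarily nested combinations, including repetitions such as the mouse moves of a drag-and-drop, can be expressed.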
        <p>To describe the interactions to be supported by our
running example system, we used these operators to declare the
following multimodal events (Figure 2, a).</p>
      </sec>
      <sec id="sec-2-6">
        <title>3) Binding multimodal events with event-handling callbacks</title>
        <p>Each multimodal event must be bound to a method
of the back-end application. At runtime, Hasselt UIMS will
automatically launch these methods whenever their associated
multimodal events are detected.</p>
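A minimal sketch of this event-to-callback dispatch follows; the function names are hypothetical stand-ins for the Hasselt UIMS mechanism, whose actual binding notation is textual and not shown here.

```python
# Illustrative event-to-callback binding table; the names are hypothetical
# stand-ins for the Hasselt UIMS mechanism described above.
bindings = {}

def bind(event_name, callback):
    """Associate a back-end method with a multimodal event name."""
    bindings.setdefault(event_name, []).append(callback)

def on_event_detected(event_name, *args):
    """Invoked by the runtime whenever a multimodal event is detected."""
    for callback in bindings.get(event_name, []):
        callback(*args)

moves = []
def put_that_there(x1, y1, x2, y2):
    moves.append((x1, y1, x2, y2))   # stand-in for the real back-end method

bind("putThatThere", put_that_there)
on_event_detected("putThatThere", 10, 20, 30, 40)
print(moves)  # [(10, 20, 30, 40)]
```

Since each event name maps to a list of callbacks, multiple back-end methods can be bound to one multimodal event, as the paper notes.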
<p>For our running example, one has to bind the method
PUTTHATTHERE(X1,Y1,X2,Y2) to the event putThatThere
shown in Equation 1. Similarly, the methods
CREATEOBJECT(COLOR,X,Y), and REMOVEALLOBJECTS() can be
bound to the events declared in Equation 2 and Equation 3
respectively. The multimodal events do not have to have the
same name as their associated callbacks.</p>
        <p>
          We must highlight that the binding between multimodal
events and callback functions is specified through a textual
notation. With this notation, one can bind not just one
but multiple callbacks to a single multimodal event,
and specify temporal and spatial constraints among the
constituents of a multimodal event. The notations for binding
multimodal events will not be presented herein; interested
readers can refer to [
          <xref ref-type="bibr" rid="ref19">19</xref>
]. The focus of this paper is on
the evaluation of the visual language that is used after the
definition and binding of multimodal events.
        </p>
      </sec>
      <sec id="sec-2-7">
        <title>4) Describing the human-machine dialog</title>
        <p>
          The visual editor provided by Hasselt UIMS (Figure 2, b) enables programmers
to describe human-machine dialogs as extended finite state
machines [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], i.e. state machines augmented with state variables
and guard conditions.
        </p>
        <p>In a Hasselt visual model, the circles represent the potential
states of the dialog system, and the arcs represent the system’s
state transitions. Each arc is annotated with a multimodal
event whose occurrence causes the transition represented by
the arc. Additionally, one can use state variables to encode
quantitative aspects of the dialog, e.g. the number of times a
state (transition) is visited (traversed). The statements required
to maintain the state variables can be annotated in the arcs of
the extended state machine. Finally, guard conditions can also
be annotated in the arcs of a FSM to restrict their associated
state transitions.</p>
        <p>The visual model shown in Figure 2, b describes the dialog
supported by our running example system. The circle labelled
as 1 represents the state where the canvas is empty; the circle 2
represents the state where there is at least one object on the
canvas; and the circle 3, the state where at least one object
has been moved. The system moves from the initial state 1
to state 2 upon the creation of the first object. It also moves
from state 2 to state 3 after the first displacement of an object.
The variable N is used (a) to count the number of objects
in the canvas –when this is relevant–, and (b) to condition
the displacement of objects, which should only be possible if
there are more than 3 objects on the canvas –notice the label
[N &gt; 3]. Finally, the removal of objects sets the system to its
initial state: the circle labelled as 1.</p>
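The dialog just described can be approximated in plain code. The following sketch mirrors the three states, the state variable N, and the guard [N &gt; 3] of the visual model, but the Python encoding itself is ours and is not Hasselt output.

```python
# Sketch of the running example's extended FSM (states 1-3, variable N, and
# the guard [N > 3]); this encoding is illustrative, not actual Hasselt code.
class DialogModel:
    def __init__(self):
        self.state = 1   # 1: empty canvas, 2: objects present, 3: object moved
        self.n = 0       # state variable N: number of objects on the canvas

    def create_object(self):
        self.n += 1
        if self.state == 1:                       # first object: go to state 2
            self.state = 2

    def put_that_there(self):
        if self.state in (2, 3) and self.n > 3:   # guard [N > 3]
            self.state = 3

    def remove_all_objects(self):
        if self.state == 3:                       # clearing only after a move
            self.n = 0
            self.state = 1

dm = DialogModel()
for _ in range(4):
    dm.create_object()
dm.put_that_there()       # guard satisfied: N = 4 > 3
dm.remove_all_objects()
print(dm.state, dm.n)     # 1 0
```

Keeping the guards and state updates in one place is exactly what the visual model centralizes, in contrast to the scattered if-else statements of event-callback code.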
<p>If event-callback code were used to
implement the running example system, identifying
the system’s state would require a series of nested
if-else statements spread throughout a large portion of the
program. In contrast, Hasselt models have fewer and simpler
conditional clauses, which can be centralized in an FSM that
provides a comprehensive overview of the human-machine dialog.
The only way to know whether these theoretical advantages
translate into practical benefits for programmers is
through a user study.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>IV. USER STUDY</title>
      <p>The experiment aims at determining whether separating the declaration of events from the dialog model benefits programmers.</p>
      <sec id="sec-3-1">
        <title>A. Hypothesis</title>
<p>We hypothesize that the maintenance of a multimodal
dialog can be performed faster and/or more easily with
Hasselt, where the events can be described separately from the
dialog model, than with C#, where the code for combining
multimodal events is intermixed with the code for dialog
management.</p>
      </sec>
      <sec id="sec-3-2">
        <title>B. Method</title>
        <p>1) Study Design: The participants were evaluated one by
one after receiving a training session.</p>
        <p>During the experiment, each participant was shown a
multimodal system with which he had to interact according to
the indications of the researcher. Once the participant was
familiar with the functionality of the system, he was shown
the source code/visual model of the system and asked to
perform modifications in it. Each participant had to sequentially
perform the changes in both Hasselt and C#. The changes to
be performed were explained orally and also written on a sheet
that the participant could check during the experiment.</p>
<p>While the participant modified the code/visual model, the
researcher observed the changes made by the participant on
a secondary monitor that replicated the screen in front of the
participant. In this way, for each language, the researcher could
measure the completion time of the task, count how many
times the partial changes were tested in the runtime environment,
and watch how the participant navigated through the C# code
or Hasselt visual model.</p>
<p>After the participant performed the requested changes with
a language, he was asked to fill in a post-task questionnaire. At
the end of the whole experiment, i.e. after using both Hasselt
and C#, the participant was asked to evaluate the usability of
Hasselt UIMS and was interviewed by the researcher.</p>
<p>2) Participants: We recruited 12 participants, all of whom
were male. The programming experience of the participants
ranges from 4 to 13 years; their C# experience, between 1
and 8 years (Figure 4).</p>
<p>3) Procedure: Before the beginning of the experiment, each
participant was given a 10-minute tutorial about Hasselt.
Participants had to describe a simple, Hello-world-like
multimodal interaction by following step-by-step instructions. The
tutorial helped participants get acquainted with the visual editor,
debugging tools, and runtime environment of Hasselt UIMS.</p>
        <p>Since all participants had experience with C# and MS Visual
Studio, there was no need for training in this respect.</p>
        <p>For the experiment, the participant was presented with a
system similar to the one herein used as a running example.
It allowed users to create and remove virtual objects from a
canvas in response to multimodal input. In the version given
to participants, the objects could be created or removed at
any time, after which the end user was acknowledged with
voice feedback. Participants were asked to change the system
so that it could handle two contexts-of-use: the command to
remove objects must only be processed if there are objects on
the screen; otherwise, it should be ignored.</p>
        <p>The aforementioned system was described with both C# and
Hasselt. Each participant had to modify both sources within a
time limit of 30 minutes per language.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4) Solution of the modeling task</title>
        <p>With Hasselt, the required changes can be made by modifying the human-machine dialog
model only. Participants had to define different contexts-of-use
to distinguish whether the form is empty or has objects on it.
Figure 5 shows two potential solutions.</p>
        <p>As to the C# code, participants had to declare one variable
for counting the number of objects on the form. This variable
has to be updated every time a new object is created and
whenever all the objects are removed from the form. It also
has to be checked before proceeding to clear the form.
Although these four additions are easy to implement, they have
to be included in the right places of a source code of 114 lines.
(Actually, the full code contained 273 lines, but we hid the
code for loading the speech recognizer, for hooking the mouse,
and the back-end functions. This was to make the comparison
as fair as possible. With Hasselt, the configuration code and
the back-end code cannot be seen either: the former is within
Hasselt UIMS; the latter, in a canned application imported into
Hasselt UIMS.)</p>
        <p>Figure 5: (a) Model given to participants; (b) Most common solution; (c) Outlier’s solution.</p>
        <p>C. Measures</p>
<p>1) Observations: As the participant performed the required
modifications with a given language, the researcher monitored
the working time and counted the number of times the code was
tested.</p>
        <p>
2) Single Ease Question (SEQ) questionnaire: Right after
completing the changes with each language, participants were
asked to fill in the Single Ease Question (SEQ) questionnaire, a
7-point rating scale (Figure 6) aimed at assessing the perceived
difficulty (or perceived ease, depending on one’s perspective)
of a task. The questionnaire has been proven to be reliable,
sensitive, and valid, while also being easy to answer and easy
to score [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3) System Usability Scale (SUS) questionnaire</title>
        <p>
          At the end of the experiment, participants had to fill in the System Usability
Scale (SUS) questionnaire [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] (Figure 7, a), which has
become a well-known questionnaire for end-of-test subjective
assessments of usability [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
        <p>The SUS questionnaire consists of 10 items with 5-point
scales numbered from 1 (anchored with “Strongly disagree”)
to 5 (anchored with “Strongly agree”).</p>
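The standard SUS scoring rule, which is not specific to this study, maps the 10 responses to a 0–100 score: odd-numbered (positively worded) items contribute (response - 1), even-numbered items contribute (5 - response), and the sum of contributions is multiplied by 2.5. A small sketch:

```python
# Standard SUS scoring: odd items contribute (response - 1), even items
# contribute (5 - response); the total is scaled by 2.5 to the range 0..100.
def sus_score(responses):
    """responses: list of 10 ratings, each 1..5, in questionnaire order."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0 (best possible)
print(sus_score([3, 3, 3, 3, 3, 3, 3, 3, 3, 3]))  # 50.0  (all neutral)
```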
        <p>
SUS test scores are normalized to values between 0 and
100. To provide a benchmark against which SUS scores can
be compared, Lewis et al. shared historical data showing
that the average and third quartile of 324 usability evaluations
performed with SUS are 62.1 and 75.0, respectively [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
        <p>
Finally, according to a factor analysis performed by
Lewis et al., the SUS questionnaire does not only measure
usability. It also measures learnability, with Q4 and Q10 being the
questions that allow estimating the perceived learnability of the
system under evaluation [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. In the taxonomy proposed by
Grossman et al. [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], this learnability falls within the category
of initial learnability given that participants have been exposed
to Hasselt for the first time during this experiment.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>D. Interview Highlights</title>
        <p>Based on the SEQ scores, a majority (7 out of 12
participants) considered that the modification of Hasselt visual
models was easier than changing C# code. When asked for a
reason, many of these participants referred to the overall view
provided by the visual models: “You can see all the system
in one screen” and “You do not have to browse code through
multiple screens” were common answers.</p>
        <p>One of the few participants who scored Hasselt as more
difficult than C# was the outlier seen in Figure 8, a. He pointed
to his total lack of knowledge of state machines as the cause
of his poor performance. All other participants had at least
pen-and-paper experience with state machines and thus
could benefit more from the training session.</p>
      </sec>
      <sec id="sec-3-6">
        <title>E. Results</title>
<p>All 12 participants could complete the changes with both
languages, Hasselt and C#. The data from observations and
post-task questionnaires are synthesized in Figure 8. After
inspecting the data, we decided to drop the only participant
who had no previous experience with FSMs. He was an outlier
in the plots (a) and (b) shown in Figure 8. Therefore, the
following results are based on the remaining 11 participants.</p>
<p>1) Completion time: On average, changes made with
Hasselt took 2.4 minutes, compared with 2.1 minutes
when using C#. However, these results were not statistically
significant. We could not reject the null hypothesis in favor
of the alternative hypothesis that Hasselt completion times are
higher than C# completion times: a Wilcoxon signed-rank test
resulted in p-value = 0.1562 &gt; 0.05 (W = 12.5, Z = 1.3828).</p>
        <p>2) Code testing effort: On average, programmers tested
their code 1.2 times when using Hasselt and 1.4 times when
using C#. But this result is not statistically significant either.
We could not reject the null hypothesis in favor of the
alternative hypothesis that the code testing effort is lower with
Hasselt than with C#: a Wilcoxon signed-rank test resulted in
p-value = 0.25 (W = 0, Z = -1.4142).</p>
      </sec>
      <sec id="sec-3-7">
        <title>3) Perceived ease of the task</title>
        <p>The average SEQ scores for Hasselt and C# were 6.6 and 5.9, respectively. In this
case, we found that this difference in favor of Hasselt was
statistically significant. A Wilcoxon signed-rank test indicated
that the alternative hypothesis that the SEQ scores are higher
for Hasselt than for C# can be accepted (p-value = 0.0078,
W = 28, Z = 2.6153).</p>
<p>Note: The use of Wilcoxon signed-rank tests instead of
paired t-tests was due to the fact that we could not guarantee
the normality assumption required by the latter. The
non-normality of the pair differences was observed in both normal
Q-Q plots and Shapiro-Wilk normality tests. The data analysis
was performed with the open source software R.</p>
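For illustration, the W statistic of the Wilcoxon signed-rank test (here taken as the sum of the ranks of the positive differences; zero differences are dropped and tied absolute differences receive averaged ranks) can be computed as follows. The sample data are made up and are not the study's measurements.

```python
# Illustrative Wilcoxon signed-rank statistic for paired samples; W is the
# sum of ranks of positive differences (one common convention among several).
def wilcoxon_w(xs, ys):
    diffs = [x - y for x, y in zip(xs, ys) if x != y]   # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend the tie group of equal absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1            # average 1-based rank for the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return sum(r for d, r in zip(diffs, ranks) if d > 0)

# Hypothetical SEQ-like paired scores (not the study's data):
print(wilcoxon_w([7, 6, 7, 5], [5, 6, 4, 4]))  # 6.0
```

The p-value then follows from the exact distribution of W (or a normal approximation for larger samples), which statistical packages such as R compute automatically.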
      </sec>
      <sec id="sec-3-8">
        <title>4) Results of the SUS questionnaire</title>
        <p>
          The SUS questionnaire was only used to evaluate Hasselt UIMS. Compared
with the data repository provided by Lewis et al., the average
SUS score of 73.96 that the participants gave to Hasselt UIMS
indicates that its perceived usability is well above average but
not higher than 75% of the 324 systems reported in [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
<p>The average scores obtained by Hasselt UIMS for each
of the 10 items of the SUS questionnaire are shown in
Figure 7, b.</p>
        <p>F. Threats to validity</p>
        <p>
          1) Construct validity: The general concept of validity was
traditionally defined as the degree to which a test measures
what it claims, or purports, to be measuring [
          <xref ref-type="bibr" rid="ref25">25</xref>
]. The construct
validity of our empirical study could have been affected as
follows.
        </p>
<p>First, the code testing effort was quantified as the number of
times the participant entered the runtime environment. This
means we assumed that participants have to run the program in
order to test the correctness of the source code. This definition
may not be complete since it ignores the effort made when the
participant ‘runs and tests the code inside his head’.</p>
<p>Second, the SUS questionnaire may have measured
only certain aspects of the usability of Hasselt UIMS. An
expert in empirical studies made us notice that usability also
includes the long-term experience of using a software system,
which is not considered in our study: all participants used
Hasselt for the first and only time during the study. However,
the initial learnability, which is another dimension of the
SUS questionnaire, was correctly measured by Q4 and Q10,
according to the same expert.</p>
        <p>
          Construct validity is not the only type of validity that must
be considered when designing empirical research. An
empirical study is said to have internal validity when the impact
of almost all influencing factors is excluded, so the study is
performed in a highly controlled setting [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. In contrast,
external validity consists of allowing some influencing factors so
that the experiment can emulate a real-world situation instead
of an ideal one [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. Whereas external validity increases the
chances that results can be generalized to more realistic,
everyday situations, internal validity allows researchers to pinpoint
the reasons for improvement or degradation, but at the cost of
generalizability.
        </p>
<p>2) Internal validity: We pursued internal validity in the
following way.</p>
        <p>First, the order of the language to be used first (i.e. Hasselt
or C#) was balanced across the participants so that the
aggregated experience bias could be neutralized.</p>
        <p>Besides, since the goal of the experiment was to measure
the effort for describing multimodal dialogs, participants were
restricted to this portion of the code/model only. With Hasselt,
programmers were restricted to use the visual editor only.
With C#, the code for configuring the speech recognizer and
the application code (e.g. for creating and deleting objects) was
hidden from programmers: we put this portion of the code in
regions that were collapsed during the experiment.</p>
        <p>On the other hand, offering participants a tutorial on Hasselt
but no tutorial on creating multimodal dialogs using C# might
affect the experiment’s internal validity.</p>
<p>3) External validity: In order to confer high external validity
on our results, we allowed some ‘freedom’ in the experiments.</p>
<p>First, the pool of participants was quite varied. It included
master and PhD students, post-docs, and industry
programmers, from different universities and countries, with and
without background in finite state machines (FSMs).</p>
        <p>
          Most importantly, participants were left free in the wild.
This contrasts with other approaches commonly used in
empirical studies, such as the think-aloud protocol and the
question-suggestion protocol [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The former would require
participants to speak out while programming in order to provide the
researcher with insights about their programming logic. The
latter would allow the researcher to give advice proactively
to the participant. In our experiments, the researcher only
intervened when participants asked questions. In our opinion,
this is a more realistic scenario, reflecting the typical case
of a programmer working on his own and occasionally asking
more expert programmers for advice when stuck on
a problem.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>V. DISCUSSION AND CONCLUSION</title>
      <sec id="sec-4-1">
        <title>A. Modeling with Hasselt and C#</title>
        <p>We presented Hasselt, a language that provides notations for defining multimodal human-machine dialogs. A dialog model in Hasselt is an extended finite state machine, specified with a visual editor, whose arcs are annotated with multimodal events defined in a separate textual notation.</p>
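        <p>The contrast between the two styles can be sketched as follows. This is a minimal, hypothetical example, written in Java rather than C# for illustration; the class and event names (e.g. FsmDialog, speech:create) are ours and do not come from Hasselt or the study materials. In the callback style, dialog logic is woven into the event handlers; in the Hasselt-like style, the dialog model is a separate transition table whose arcs are annotated with named multimodal events.</p>

```java
import java.util.List;

// Event-callback style: event handling and dialog management intermixed
// inside one handler (hypothetical sketch, not the study's C# code).
class CallbackDialog {
    String state = "Idle";

    void onSpeech(String utterance) {
        if (state.equals("Idle") && utterance.equals("create")) {
            state = "Creating";                    // dialog management...
            System.out.println("creating object"); // ...mixed with app code
        }
    }
}

// Hasselt-like style: the dialog model is a plain transition table whose
// arcs carry the names of multimodal events defined elsewhere.
class FsmDialog {
    record Arc(String from, String event, String to) {}

    static final List<Arc> ARCS = List.of(
            new Arc("Idle", "speech:create", "Creating"),
            new Arc("Creating", "mouse:click", "Idle"));

    String state = "Idle";

    // Generic dispatcher: no event-specific logic is woven into it.
    void fire(String event) {
        for (Arc a : ARCS)
            if (a.from().equals(state) && a.event().equals(event)) {
                state = a.to();
                return;
            }
    }
}
```

In the second style, changing the dialog means editing only the transition table; the event definitions and the dispatcher stay untouched.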
        <p>We expected the separation of the event-definition code from the dialog-management model to yield some benefits. This is what we found instead:</p>
        <p>First, we found that the better-separated Hasselt models are no faster to modify than equivalent event-callback code in which the instructions for event handling and for dialog management are intermixed. Our participants completed the task of implementing a multimodal dialog faster, on average, with C# than with Hasselt, but the difference was not statistically significant.</p>
        <p>Second, our participants tested Hasselt models fewer times than the equivalent C# code. Despite this, completing the changes with Hasselt took longer. Based on our observations, the reason may be that modifying visual models is more time-consuming than writing textual code.</p>
        <p>Finally, the SEQ questionnaires revealed that participants perceived performing the required changes with Hasselt as easier than with C#. Although these measurements were statistically significant, we cannot rule out some response bias: participants gave higher scores to the language that led to the longer completion times.</p>
      </sec>
      <sec id="sec-4-2">
        <title>B. Perceived usability and initial learnability of Hasselt UIMS</title>
        <p>Considering that the odd-numbered questions are positively worded, scores higher than 3 on these items indicate that participants agree (to a certain degree) that the evaluated system has some good aspect or feature. In our study, all odd-numbered questions were scored higher than 3 points on average. Within this group, Q3, i.e. “I thought the system was easy to use”, and Q7, i.e. “I would imagine that most people would learn to use this system very quickly”, received the highest scores.</p>
        <p>Similarly, since the even-numbered items are negatively worded, scores lower than 3 indicate that participants disagree (to a certain degree) with some negative statement about the system. In our study, all even-numbered questions were scored lower than 3 points on average. Within this group, Q10, i.e. “I needed to learn a lot of things before I could get going with this system”, Q4, i.e. “I think I would need the support of a technical person to use this system”, and Q8, i.e. “I found the system very cumbersome to use”, received the lowest scores (which, in this case, is positive).</p>
        <p>
          The salient scores obtained for Q4 and Q10, the two
questions that define perceived initial learnability [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], indicate that, to a certain degree, participants consider the Hasselt UIMS easy to learn.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>C. Future work</title>
        <p>We think that the main reason why no clear winner emerged
from this study is that the task was too simple given the
programming experience of the participants. Thus, we plan
to repeat the experiment with more complex tasks.</p>
        <p>Other, more minor changes concern the functionality of the visual editor. We want to minimize the effort involved in wiring up the FSMs. We plan to add keyboard shortcuts for creating nodes and links, to disallow resizing of nodes, and to allow jumping between the elements of an FSM with the TAB key.</p>
        <p>
          Finally, we would like to gather objective cognitive load
measurements [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] such as heart rate or pupil dilation. We expect to see positive correlations between the perceived difficulty that participants report in the questionnaires and their physiological reactions during the task.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>D. Lessons learned</title>
        <p>Based on this experience, we suggest some guidelines for others designing comparative studies between a domain-specific language and a mainstream language.</p>
        <p>It is important that the training session be supervised by the researcher and carried out immediately before the test. This ensures that all participants start the experiment with a similar level of knowledge, provided they have similar backgrounds. Otherwise, some participants may benefit more from the training than others, which can produce outliers.</p>
        <p>It may not be a good idea to recruit programmers working in the researcher’s own lab: some may feel that their programming skills are being evaluated. From a research lab of more than 50 people, we could recruit only 5 participants; the remaining 7 were recruited from external institutions. Alternatively, one can ask a person from an external institution to play the role of researcher, so that participants do not feel observed by an acquaintance or colleague.</p>
        <p>The complexity of the programming task must be appropriately calibrated: high enough to reveal differences in the measurements, but not so high as to hurt completion rates. In this regard, one must also weigh whether it is better to ask programmers to modify an existing program or to implement a new one from scratch.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Ameur</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Kamel</surname>
          </string-name>
          , “
          <article-title>A generic formal specification of fusion of modalities in a multimodal hci,” in Building the Information Society</article-title>
          . Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Dargie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Strunk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Winkler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mrohs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thakar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Enkelmann</surname>
          </string-name>
          , “
          <article-title>A model based approach for developing adaptive multimodal interactive systems.” in ICSOFT (PL</article-title>
          /DPS/KE/MUSE),
          <year>2007</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lalanne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ingold</surname>
          </string-name>
          , “
          <article-title>Description Languages for Multimodal Interaction: A Set of Guidelines and its Illustration with SMUIML,” Journal of multimodal user interfaces</article-title>
          , vol.
          <volume>3</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>247</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bourguet</surname>
          </string-name>
          , “
          <article-title>Designing and prototyping multimodal commands</article-title>
          ,”
          <source>in Proceedings of INTERACT'03</source>
          ,
          <year>2003</year>
          , pp.
          <fpage>717</fpage>
          -
          <lpage>720</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dragicevic</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-D.</given-names>
            <surname>Fekete</surname>
          </string-name>
          , “
          <article-title>Support for input adaptability in the icon toolkit</article-title>
          ,”
          <source>in Proceedings of the 6th ICMI'04</source>
          . New York, NY, USA: ACM,
          <year>2004</year>
          , pp.
          <fpage>212</fpage>
          -
          <lpage>219</lpage>
          . [Online]. Available: http://doi.acm.org/10.1145/1027933.1027969
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>De Boeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vanacken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raymaekers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Coninx</surname>
          </string-name>
          , “
          <article-title>High level modeling of multimodal interaction techniques using NiMMiT,”</article-title>
          <source>Journal of Virtual Reality and Broadcasting</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>2</issue>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W. A.</given-names>
            <surname>König</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rädle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Reiterer</surname>
          </string-name>
          , “
          <article-title>Interactive design of multimodal user interfaces</article-title>
          ,
          <source>” Journal on Multimodal User Interfaces</source>
          , vol.
          <volume>3</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>213</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.-Y. L.</given-names>
            <surname>Lawson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-A.</given-names>
            <surname>Al-Akkad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderdonckt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Macq</surname>
          </string-name>
          , “
          <article-title>An open source workbench for prototyping multimodal interactions based on off-the-shelf heterogeneous components</article-title>
          ,”
          <source>in Proceedings of the EICS'09. ACM</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>245</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cuenca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Van den Bergh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Luyten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Coninx</surname>
          </string-name>
          , “
          <article-title>A domain-specific textual language for rapid prototyping of multimodal interactive systems,” in Proceedings of the 6th ACM SIGCHI symposium on Engineering interactive computing systems (EICS'14)</article-title>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Carr</surname>
          </string-name>
          , “
          <article-title>Specification of interface interaction objects</article-title>
          ,”
          <source>in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM</source>
          ,
          <year>1994</year>
          , pp.
          <fpage>372</fpage>
          -
          <lpage>378</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Appert</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Beaudouin-Lafon</surname>
          </string-name>
          , “
          <article-title>SwingStates: Adding state machines to Java and the Swing toolkit</article-title>
          ,
          <source>” Software: Practice and Experience</source>
          , vol.
          <volume>38</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>1149</fpage>
          -
          <lpage>1182</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schwarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mankoff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hudson</surname>
          </string-name>
          , “
          <article-title>Monte carlo methods for managing interactive state, action and feedback under uncertainty,”</article-title>
          <source>in Proceedings of the 24th annual ACM symposium on UIST. ACM</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>235</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Oney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Myers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Brandt</surname>
          </string-name>
          , “
          <article-title>InterState: Interaction-oriented language primitives for expressing GUI behavior,”</article-title>
          <source>in Proc. of UIST'14. ACM</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>De Boeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raymaekers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Coninx</surname>
          </string-name>
          , “
          <article-title>A tool supporting model based user interface design in 3d virtual environments</article-title>
          ,” in
          <source>Grapp 2008: proceedings of the third international conference on computer graphics theory and applications</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>367</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cuppens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raymaekers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Coninx</surname>
          </string-name>
          , “
          <article-title>VRIXML: A user interface description language for virtual environments</article-title>
          ,”
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>DeRose</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Agrawala</surname>
          </string-name>
          , “
          <article-title>Proton++: a customizable declarative multitouch framework,” in Proceedings of the 25th annual ACM symposium on User interface software and technology (UIST'12)</article-title>
          ,
          <year>2012</year>
          , pp.
          <fpage>477</fpage>
          -
          <lpage>486</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Beaudouin-Lafon</surname>
          </string-name>
          ,
          <article-title>“User interface management systems: Present and future,” in From object modelling to advanced visual communication</article-title>
          . Springer,
          <year>1994</year>
          , pp.
          <fpage>197</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bolt</surname>
          </string-name>
          , “
          <article-title>Put-that-there: Voice and gesture at the graphics interface,” in Proceedings of the 7th annual conference on computer graphics and interactive techniques (SIGGRAPH' 80)</article-title>
          . ACM,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cuenca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Van den Bergh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Luyten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Coninx</surname>
          </string-name>
          , “
          <article-title>Hasselt uims: a tool for describing multimodal interactions with composite events,”</article-title>
          <source>in Proceedings of EICS'15</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Alagar</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Periyasamy</surname>
          </string-name>
          ,
          <source>Specification of software systems. Springer Science &amp; Business Media</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sauro</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Dumas</surname>
          </string-name>
          , “
          <article-title>Comparison of three one-question, post-task usability questionnaires</article-title>
          ,”
          <source>in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>1599</fpage>
          -
          <lpage>1608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brooke</surname>
          </string-name>
          , “
          <article-title>SUS: A quick and dirty usability scale,” Usability evaluation in industry</article-title>
          , vol.
          <volume>189</volume>
          , no.
          <issue>194</issue>
          , pp.
          <fpage>4</fpage>
          -
          <lpage>7</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Lewis</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sauro</surname>
          </string-name>
          , “
          <article-title>The factor structure of the system usability scale,” in Human Centered Design</article-title>
          . Springer,
          <year>2009</year>
          , pp.
          <fpage>94</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Grossman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fitzmaurice</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Attar</surname>
          </string-name>
          , “
          <article-title>A survey of software learnability: metrics, methodologies and guidelines</article-title>
          ,”
          <source>in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>649</fpage>
          -
          <lpage>658</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>The elements of language curriculum: A systematic approach to program development</article-title>
          .
          <source>ERIC</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siegmund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Siegmund</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          , “
          <article-title>Views on internal and external validity in empirical software engineering</article-title>
          ,”
          <source>in Proceedings of the 37th International Conference on Software Engineering (ICSE 2015)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
            <surname>Brunken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Plass</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Leutner</surname>
          </string-name>
          , “
          <article-title>Direct measurement of cognitive load in multimedia learning</article-title>
          ,
          <source>” Educational Psychologist</source>
          , vol.
          <volume>38</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>61</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>