=Paper=
{{Paper
|id=Vol-1522/Cuenca2015HuFaMo
|storemode=property
|title=Empirical Study: Comparing Hasselt with C# to Describe Multimodal Dialogs
|pdfUrl=https://ceur-ws.org/Vol-1522/Cuenca2015HuFaMo.pdf
|volume=Vol-1522
|dblpUrl=https://dblp.org/rec/conf/models/CuencaBLC15
}}
==Empirical Study: Comparing Hasselt with C# to Describe Multimodal Dialogs==
Empirical Study: Comparing Hasselt with C# to Describe Multimodal Dialogs

Fredy Cuenca, Jan Van den Bergh, Kris Luyten, Karin Coninx
Hasselt University - tUL - iMinds, Expertise Centre for Digital Media
Wetenschapspark 2, 3590 Diepenbeek, Belgium
Email: {fredy.cuencalucero,jan.vandenbergh,kris.luyten,karin.coninx}@uhasselt.be

Abstract—Previous research has proposed guidelines for creating domain-specific languages for modeling human-machine multimodal dialogs. One of these guidelines suggests the use of multiple levels of abstraction so that the descriptions of multimodal events can be separated from the human-machine dialog model. In line with this guideline, we implemented Hasselt, a domain-specific language that combines textual and visual models, each of them aiming at describing different aspects of the intended dialog system. We conducted a user study to measure whether the proposed language provides benefits over equivalent event-callback code. During the user study participants had to modify the Hasselt models and the equivalent C# code. The completion times obtained for C# were on average shorter, although the difference was not statistically significant. Subjective responses were collected using standardized questionnaires and an interview, which both indicated that participants saw value in the proposed models. We provide possible explanations for the results and discuss some lessons learned regarding the design of the empirical study.

Index Terms—Multimodal systems, Human-machine dialog, Finite state machines, Dialog model, Domain-specific language.

I. INTRODUCTION

Multimodal systems allow users to communicate through the coordinated use of multiple input modes, e.g. speech, gaze, and gestures. These systems have the potential to support human-machine communication that is robust (e.g. multiple inputs can be combined to perform disambiguation), flexible (e.g. users can choose their preferred modality), and more natural than ever before.

However, implementing multimodal systems is still a difficult task. This is partly because of the complexity of multimodal interaction [1], [2], the absence of a standardized methodology [2], and the mastery of different state-of-the-art technologies required for their construction [3].

Several domain-specific languages have been proposed with the intention of simplifying the implementation of multimodal interfaces [3]–[9]. From an analysis of these languages, Dumas et al. [3] proposed several guidelines for developing future languages. One of these guidelines states that the specialized language must be such that the declaration of multimodal events can be separated from the description of the human-machine dialog (Figure 1). The present research has implemented this idea and measured how potential users can benefit from such an implementation.

Concretely, we created a language, called Hasselt, that provides notations for declaring multimodal events and human-machine dialogs separately. The multimodal events are textually declared as combinations of predefined user events (e.g. mouse clicks, speech inputs, etc.). The multimodal dialog is depicted as a finite state machine (FSM) whose arcs are labelled with multimodal event names.

In order to evaluate the benefits of such separation of concerns, a user study was conducted. Participants had to sequentially modify two equivalent implementations of a multimodal dialog system. In one case, both the code for handling the events and the code for handling the dialog were included in the same source file written in C#. In the other case, these were specified separately with the textual and visual notations provided by Hasselt.

Fig. 1. On the left side of the diagram, one can see the different levels of abstraction proposed by Dumas et al. [3]. On the right side, one can see how our language follows the same framework: our visual language is at the dialog level whereas our textual notations are at the events level.

II. RELATED WORK

A. Modeling multimodal dialogs as FSMs

When modeling human-machine dialogs as finite state machines (FSMs), the nodes of the FSM represent the possible states of the dialog system, and its arcs represent the transitions in the dialog system's state. Many researchers have proposed FSM-based solutions for modeling unimodal human-machine dialogs, e.g. IOG [10], SwingStates [11], Schwarz's framework [12], and InterState [13], among others.

However, there are only a few languages that allow modeling multimodal human-machine dialogs as FSMs. Some representative examples are listed in what follows.

We can consider MEngine [4] as an FSM-based language that allows modeling trivial multimodal dialogs, e.g. the system responses are always the same for a given multimodal input.

In NiMMiT [14], the dialog model is a state machine where each state represents a set of tasks that are available to the end user. NiMMiT is restricted to interactive virtual environments (IVE) since its presentation model has to be encoded in VRIXML [15]. In contrast, with our proposal, the presentation model can be implemented in any .NET language, which opens a wide assortment of possibilities beyond IVE.

SMUIML [3] provides different notations for declaring the human-machine dialog and for combining user events. Unlike Hasselt, SMUIML does not include a symbol for defining iterative events. This reduces the space of multimodal events that can be specified with SMUIML in comparison with Hasselt. For instance, drag-and-drop, which involves an arbitrary number of mouse-move events, cannot be specified with SMUIML at the level of events. Another difference is that, unlike Hasselt, SMUIML does not support state variables or conditional transitions at the dialog level.

B. User studies: Interaction models vs. event-callback code

To the best of our knowledge, none of the abovementioned multimodal dialog modeling languages have been evaluated in user studies. Nonetheless, outside the multimodal domain, we found two user studies that guided us in the design of our experiments.

Oney et al. recruited 20 developers to evaluate the understandability of InterState's visual notation. Each participant had to modify two systems (drag-and-drop and a thumbnail viewer) implemented in both RaphaelJS1 and InterState. It was verified that InterState models are faster to modify than equivalent event-callback code written in RaphaelJS [13].

The creators of Proton++ carried out two experiments with 12 programmers. Each participant was shown a multitouch gesture specification and a set of videos of a user performing gestures. Gestures may be specified as a regular expression, a tablature, or with event-callback code, and the participant had to match the specification with the video showing the described gesture. The results showed that the tablatures of Proton++ are easier to comprehend than equivalent regular expressions and event-callback code [16].

Since real-world scenarios require programmers not only to comprehend but to write programming code, we followed the schema of Oney et al. [13]. We asked participants to perform modifications with our language and with equivalent event-callback code.

1 http://raphaeljs.com/

III. HASSELT

Hasselt provides notations for creating executable specifications of multimodal human-machine dialogs. It comes with a complete User Interface Management System (UIMS) [17] that offers the editors, runtime environment, and debugging tools required to code, run, and test Hasselt specifications.

A. Running Example

In the remainder of the paper, we will show how to implement a simple multimodal dialog system with Hasselt. The front-end of our running example system is shown in Figure 3, BE. It allows end users to issue multimodal commands to create, move, and remove objects from a canvas that is initially empty. These commands may be enabled or disabled depending on the current context-of-use.

Users can create new objects by issuing voice commands like 'create green box here' while clicking on the canvas to indicate the position of the new object. Boxes are reshuffled by issuing 'put that there' while clicking on both the target object and its new position [18]. And the canvas can be cleared up in reaction to the voice command 'remove objects'. To make the system responses depend on the context-of-use, we added two rules: the boxes can only be moved if there are more than three of them on the canvas; and the canvas can only be cleared up after the displacement of at least one object.

B. How to use Hasselt UIMS

The steps required to create a multimodal dialog system with Hasselt UIMS are as follows.

1) Implementing a back-end application: One must create an executable program implementing the front-end and the handling methods of the intended system. For the purpose of this work, such a program will be referred to as the back-end application. The back-end application can be implemented with any .NET programming language to be subsequently imported into Hasselt UIMS.

For the aforementioned running example, the back-end application implements the front-end shown in Figure 3, BE, and the methods for creating, moving, and removing virtual objects, i.e. CreateObject(color, x, y), PutThatThere(x1, y1, x2, y2), and RemoveAllObjects().

2) Declaring multimodal events: Hasselt allows combining multiple user events into one single abstraction [9], [19]. Programmatically, user events can be combined through a set of event operators that can be used in a recursive manner. The operator FOLLOWED_BY (;) indicates sequentiality of events, the operator OR (|) serves to specify alternative events, AND (+) represents simultaneity of events, and ITERATION (*) is meant to specify repetitive events [9].

To describe the interactions to be supported by our running example system, we used these operators to declare the following multimodal events (Figure 2, a):

event putThatThere = speech.put ; speech.that + mouse.down⟨x1, y1⟩ ; speech.there + mouse.down⟨x2, y2⟩   (1)

event createObject = speech.create ; speech.any⟨color⟩ ; speech.here + mouse.down⟨x, y⟩   (2)

event removeObjects = speech.remove ; speech.objects   (3)

Fig. 2. Hasselt UIMS during design time. (a) Textual editor for declaring multimodal events. It offers syntax highlighting, auto-completion popups, tooltip messages, and other features that facilitate the editing of code. (b) Visual editor for depicting human-machine dialogs. The arrows of the FSM are annotated with the multimodal events declared in (a). The arrows can include guard conditions.

3) Binding multimodal events with event-handling callbacks: Each multimodal event must be bound to a method of the back-end application. At runtime, Hasselt UIMS will automatically launch these methods whenever their associated multimodal events are detected.

Fig. 3. Hasselt UIMS during runtime: The event viewer (EV) displays the user events as detected by the recognizers. The variable browser (VB) shows the values of the event parameters. The back-end application (BE) the end user has to interact with was imported into Hasselt UIMS.
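To make the semantics of the event operators concrete, the sketch below mimics them with a small, purely illustrative interpreter in Python (the class names and the stream representation are ours, not part of Hasselt): an event expression is matched against a timestamped stream of user events, with AND read as co-occurrence at the same timestamp.

```python
from dataclasses import dataclass

# Event-expression AST mirroring Hasselt's operators:
# Seq = FOLLOWED_BY (;), Or = OR (|), And = AND (+), Iter = ITERATION (*)
@dataclass
class Ev:   name: str
@dataclass
class Seq:  left: object; right: object
@dataclass
class Or:   left: object; right: object
@dataclass
class And:  left: object; right: object
@dataclass
class Iter: body: object

def ends(expr, events, i):
    """Return the set of positions where `expr` can stop matching,
    starting at index i of `events` (a list of (name, time) pairs)."""
    if isinstance(expr, Ev):
        ok = i < len(events) and events[i][0] == expr.name
        return {i + 1} if ok else set()
    if isinstance(expr, Seq):
        return {k for j in ends(expr.left, events, i)
                  for k in ends(expr.right, events, j)}
    if isinstance(expr, Or):
        return ends(expr.left, events, i) | ends(expr.right, events, i)
    if isinstance(expr, And):
        # Simultaneity: both constituents occur, in either order,
        # and the matched span shares a single timestamp.
        out = set()
        for a, b in ((expr.left, expr.right), (expr.right, expr.left)):
            for j in ends(a, events, i):
                for k in ends(b, events, j):
                    if k > i and events[i][1] == events[k - 1][1]:
                        out.add(k)
        return out
    if isinstance(expr, Iter):
        seen, frontier = {i}, {i}
        while frontier:  # zero or more repetitions of the body
            frontier = {k for j in frontier
                          for k in ends(expr.body, events, j)} - seen
            seen |= frontier
        return seen

def matches(expr, events):
    return len(events) in ends(expr, events, 0)

# Equation 1, 'put that there':
# speech.put ; speech.that + mouse.down ; speech.there + mouse.down
put_that_there = Seq(Ev("speech.put"),
                     Seq(And(Ev("speech.that"), Ev("mouse.down")),
                         And(Ev("speech.there"), Ev("mouse.down"))))

stream = [("speech.put", 0), ("speech.that", 1), ("mouse.down", 1),
          ("speech.there", 2), ("mouse.down", 2)]
```

Under this reading, `matches(put_that_there, stream)` holds, and ITERATION lets an expression such as a drag (mouse.down ; mouse.move* ; mouse.up) absorb an arbitrary number of mouse-move events, which is the capability the SMUIML comparison above refers to.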
For our running example, one has to bind the method PutThatThere(x1, y1, x2, y2) to the event putThatThere shown in Equation 1. Similarly, the methods CreateObject(color, x, y) and RemoveAllObjects() can be bound to the events declared in Equation 2 and Equation 3, respectively. The multimodal events do not have to have the same name as their associated callbacks.

We must highlight that the binding between multimodal events and callback functions is specified through a textual notation. With this notation, one can bind not only one but multiple callbacks to one single multimodal event, and specify temporal and spatial constraints among the constituents of a multimodal event. The notations for binding multimodal events will not be presented herein; interested readers can refer to [19]. The focus of this paper is on the evaluation of the visual language that is used after the definition and binding of multimodal events.

4) Describing the human-machine dialog: The visual editor provided by Hasselt UIMS (Figure 2, b) enables programmers to describe human-machine dialogs as extended finite state machines [20], i.e. state machines augmented with state variables and guard conditions.

In a Hasselt visual model, the circles represent the potential states of the dialog system, and the arcs represent the system's state transitions. Each arc is annotated with a multimodal event whose occurrence causes the transition represented by the arc. Additionally, one can use state variables to encode quantitative aspects of the dialog, e.g. the number of times a state (transition) is visited (traversed). The statements required to maintain the state variables can be annotated in the arcs of the extended state machine. Finally, guard conditions can also be annotated in the arcs of an FSM to restrict their associated state transitions.

The visual model shown in Figure 2, b describes the dialog supported by our running example system. The circle labelled as 1 represents the state where the canvas is empty; the circle 2 represents the state where there is at least one object on the canvas; and the circle 3, the state where at least one object has been moved.

The system moves from the initial state 1 to state 2 upon the creation of the first object. It also moves from state 2 to state 3 after the first displacement of an object. The variable N is used (a) to count the number of objects on the canvas –when this is relevant– and (b) to condition the displacement of objects, which should only be possible if there are more than 3 objects on the canvas –notice the label [N > 3]. Finally, the removal of objects sets the system to its initial state: the circle labelled as 1.

It can be argued that if event-callback code were used to implement the running example system, the identification of the system's state would have required a series of nested if-else statements spread throughout a big portion of the whole program. Rather, Hasselt models have fewer and simpler conditional clauses that can be centralized in an FSM that provides a comprehensive overview of the human-machine dialog. There is one way to know whether these theoretical advantages are reflected in practical benefits for programmers, which is through a user study.

IV. USER STUDY

The experiment aims at determining whether separating the declaration of events from the dialog model brings about benefits for programmers.

A. Hypothesis

We hypothesize that the maintenance of a multimodal dialog can be performed faster and/or more easily with Hasselt, where the events can be described separately from the dialog model, than with C#, where the code for combining multimodal events is intermixed with the code for dialog management.

B. Method

1) Study Design: The participants were evaluated one by one after receiving a training session.

During the experiment, each participant was shown a multimodal system with which he had to interact according to the indications of the researcher. Once the participant was familiar with the functionality of the system, he was shown the source code/visual model of the system and asked to perform modifications to it. Each participant had to sequentially perform the changes in both Hasselt and C#. The changes to be performed were explained orally, but also written on a sheet that the participant could check during the experiment.

While the participant modifies the code/visual model, the researcher observes the changes made by the participant on a secondary monitor that replicates the screen in front of the participant. In this way, for each language, the researcher can measure the completion time of the task, count how many times the partial changes are tested in the runtime environment, and watch how the participant navigates through the C# code or Hasselt visual model.

After the participant performs the requested changes with a language, he is asked to fill in a post-task questionnaire. At the end of the whole experiment, i.e. after using Hasselt and C#, the participant is asked to evaluate the usability of Hasselt UIMS and is interviewed by the researcher.

2) Participants: We recruited 12 participants, all of whom are male. The programming experience of the participants ranges from 4 to 13 years; their C# experience, between 1 and 8 years (Figure 4).

Fig. 4. Programming experience of the 12 participants.

3) Procedure: Before the beginning of the experiment, each participant was given a 10-minute tutorial about Hasselt. Participants had to describe a simple, Hello-world-like multimodal interaction by following step-by-step instructions. The tutorial helped participants get acquainted with the visual editor, debugging tools, and runtime environment of Hasselt UIMS. Since all participants had experience with C# and MS Visual Studio, there was no need for training in this respect.

For the experiment, the participant was presented with a system similar to the one herein used as a running example. It allowed users to create and remove virtual objects from a canvas in response to multimodal input. In the version given to participants, the objects could be created or removed at any time, after which the end user was acknowledged with voice feedback. Participants were asked to change the system so that it can handle two contexts-of-use: the command to remove objects must only be processed if there are objects on the screen; otherwise, it should be ignored.

The aforementioned system was described with both C# and Hasselt. Each participant had to modify both sources within a time limit of 30 minutes per language.

4) Solution of the modeling task: With Hasselt, the required changes can be made by modifying the human-machine dialog model only. Participants had to define different contexts-of-use to distinguish whether the form is empty or has objects on it. Figure 5 shows two potential solutions.

As to the C# code, participants had to declare one variable for counting the number of objects on the form. This variable has to be updated every time a new object is created and whenever all the objects are removed from the form. It also has to be interrogated before proceeding to clear up the form. Although these four additions are easy to implement, they have to be included in the right place of a source code of 114 lines.
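For illustration, the kind of extended state machine involved here can be approximated in plain code. The sketch below is our own illustrative Python reading of the running example's dialog model (Hasselt itself generates no such code): three states, the counter N, a guard [N > 3] on the move transition, and removal allowed only after a displacement.

```python
# Illustrative extended FSM for the running example (our reading of the
# model described in Section III). States: 1 = empty canvas, 2 = canvas
# has objects, 3 = at least one object has been moved. The state
# variable n plays the role of N; guards restrict the transitions.
class DialogModel:
    def __init__(self):
        self.state, self.n = 1, 0

    def fire(self, event):
        """Try to take a transition for `event`; return True if taken."""
        if event == "createObject":
            self.n += 1                 # maintain the state variable N
            if self.state == 1:
                self.state = 2
            return True
        if event == "putThatThere" and self.state in (2, 3) and self.n > 3:
            self.state = 3              # guard [N > 3] satisfied
            return True
        if event == "removeObjects" and self.state == 3:
            self.state, self.n = 1, 0   # back to the initial state
            return True
        return False                    # event ignored in this context-of-use
```

The equivalent event-callback version would spread the updates and interrogations of `n` across separate handlers, which is the contrast the modification task was designed to expose.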
Fig. 5. (a) Model given to participants; (b) most common solution; (c) outlier's solution. The model shown in (a) was given to participants. Here the system is always in the same context-of-use and any interaction is available at any time. The model (b) was presented as a final solution by 11 participants. The model (c) was the final solution found by the outlier, who had no previous experience with FSMs. Both types of solutions were correct.

(Actually, the full code contained 273 lines, but we hid the code for loading the speech recognizer, for hooking the mouse, and the back-end functions. This was to make the comparison as fair as possible. With Hasselt, the configuration code and the back-end code cannot be seen either. The former is within Hasselt UIMS; the latter, in a canned application imported into Hasselt UIMS.)

C. Measures

1) Observations: As the participant performs the required modifications with a certain language, the researcher monitors his working time and counts the number of times the code is tested.

2) Single Ease Question (SEQ) questionnaire: Right after completing the changes with each language, participants were asked to fill in the Single Ease Question (SEQ) questionnaire, a 7-point rating scale (Figure 6) aimed at assessing the perceived difficulty (or perceived ease, depending on one's perspective) of a task. The questionnaire has been proven to be reliable, sensitive, and valid while also being easy to respond to and easy to score [21].

Fig. 6. Single Ease Question (SEQ) questionnaire.

3) System Usability Scale (SUS) questionnaire: At the end of the experiment, participants had to fill in the System Usability Scale (SUS) questionnaire [22] (Figure 7, a), which has become a well-known questionnaire for end-of-test subjective assessments of usability [23]. The SUS questionnaire consists of 10 items with 5-point scales numbered from 1 (anchored with "Strongly disagree") to 5 (anchored with "Strongly agree").

SUS test scores are normalized to values between 0 and 100. To have a benchmark against which SUS scores can be compared, Lewis et al. shared historical information showing that the average and third quartile of 324 usability evaluations performed with SUS are 62.1 and 75.0 respectively [23].

Finally, according to a factor analysis performed by Lewis et al., the SUS questionnaire does not only measure usability; it also measures learnability, with Q4 and Q10 being the questions that allow estimating the perceived learnability of the system under evaluation [23]. In the taxonomy proposed by Grossman et al. [24], this learnability falls within the category of initial learnability, given that participants were exposed to Hasselt for the first time during this experiment.

D. Interview Highlights

Based on the SEQ scores, a majority (7 out of 12 participants) considered that the modification of Hasselt visual models was easier than changing C# code. When asked for a reason, many of these participants referred to the overall view provided by the visual models: "You can see all the system in one screen" and "You do not have to browse code through multiple screens" were common answers.

One of the few participants who scored Hasselt as more difficult than C# was the outlier seen in Figure 8, a. He pointed out his total lack of knowledge of state machines as the cause of his poor performance. All other participants had, at least, pen-and-paper experience with state machines and thus could get more benefit from the training session.

E. Results

All 12 participants could complete the changes with both languages, Hasselt and C#. The data from observations and post-task questionnaires are synthesized in Figure 8. After inspecting the data, we decided to drop the only participant who had no previous experience with FSMs. He was an outlier in the plots (a) and (b) shown in Figure 8. Therefore, the following results are based on the remaining 11 participants.

1) Completion time: On average, changes made with Hasselt took 2.4 minutes in comparison with the 2.1 minutes when using C#. However, these results were not statistically significant. We could not reject the null hypothesis in favor of the alternative hypothesis that Hasselt completion times are higher than C# completion times: a Wilcoxon signed-rank test resulted in p-value = 0.1562 > 0.05 (W = 12.5, Z = 1.3828).

2) Code testing effort: On average, programmers tested their code 1.2 times when using Hasselt and 1.4 times when using C#. But this result is not statistically significant either. We could not reject the null hypothesis in favor of the alternative hypothesis that the code testing effort is lower with Hasselt than with C#: a Wilcoxon signed-rank test resulted in p-value = 0.25 (W = 0, Z = -1.4142).

3) Perceived ease of the task: The average SEQ scores for Hasselt and C# were 6.6 and 5.9 respectively. In this case, we found that this difference in favor of Hasselt was statistically significant. A Wilcoxon signed-rank test indicated that the alternative hypothesis that the SEQ scores are higher for Hasselt than for C# can be accepted (p-value = 0.0078, W = 28, Z = 2.6153).

Note: The use of Wilcoxon signed-rank tests instead of paired t-tests responded to the fact that we could not guarantee the normality assumption required by the latter. The non-normality of the pair differences was observed in both normal Q-Q plots and Shapiro-Wilk normality tests. The data analysis was performed with the open source software R2.

2 https://www.r-project.org/

4) Results of the SUS questionnaire: The SUS questionnaire was only used to evaluate Hasselt UIMS. Compared with the data repository provided by Lewis et al., the average SUS score of 73.96 that the participants gave to Hasselt UIMS indicates that its perceived usability is well above average but not higher than 75% of the 324 systems reported in [23]. The average scores obtained for Hasselt UIMS for each of the 10 items of the SUS questionnaire are shown in Figure 7, b.

Fig. 7. (a) SUS questionnaire that was filled in by the 12 participants to evaluate Hasselt UIMS. (b) Participants' responses to the SUS questionnaire. Stacked barplots show the frequency of answers per question. The numbers at the right of the barplots indicate the average score per question.

F. Threats to validity

1) Construct validity: The general concept of validity was traditionally defined as the degree to which a test measures what it claims, or purports, to be measuring [25]. The construct validity of our empirical study could have been affected as follows.

First, the code testing effort was quantified as the number of times the participant enters the runtime environment. This means we assumed that participants have to run the program in order to test the correctness of the source code. This definition may not be complete since it ignores the effort made when the participant 'runs and tests the code inside his head'.

Second, the SUS questionnaire may have measured only certain aspects of the usability of Hasselt UIMS. An expert in empirical studies made us notice that usability also includes the long-term experience of using a software system, which is not considered in our study: all participants used Hasselt for the first and only time during the study. However, the initial learnability, which is another dimension of the SUS questionnaire, was correctly measured by Q4 and Q10, according to the same expert.

Construct validity is not the only type of validity that must be considered when designing empirical research. An empirical study is said to have internal validity when the impact of almost all influencing factors is excluded, so the study is performed in a highly controlled setting [26]. In contrast, external validity consists of allowing some influencing factors so that the experiment can emulate a real-world situation instead of an ideal one [26]. Whereas external validity increases the chances that results can be generalized to more realistic, everyday situations, internal validity allows researchers to pinpoint the reasons for improvement or degradation, but at the cost of generalizability.
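For samples as small as the 11 pairs analyzed here, the Wilcoxon signed-rank test can also be computed exactly, without R, by enumerating all sign assignments. The sketch below is our own illustration of the procedure (not the authors' R analysis); it returns the one-sided exact p-value for the alternative that the first sample tends to be larger.

```python
from itertools import product

def midranks(vals):
    """Ranks of the values, averaging ranks over ties (midranks)."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(vals):
        j = i
        while j + 1 < len(vals) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_exact(xs, ys):
    """One-sided exact Wilcoxon signed-rank test for paired samples.
    Returns (W_plus, p) with p = P(W >= W_plus) under the null hypothesis
    that the pair differences are symmetric about zero.
    Zero differences are dropped, as in the usual procedure."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    # Enumerate all 2^n equally likely sign assignments of the ranks.
    count = sum(1 for signs in product((0, 1), repeat=len(diffs))
                if sum(r for s, r in zip(signs, ranks) if s) >= w_plus)
    return w_plus, count / 2 ** len(diffs)
```

With n = 11 non-zero differences this enumerates only 2048 cases, so the exact distribution is cheap; for larger samples the normal approximation behind the reported Z values is the usual choice.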
of the 10 items of the SUS questionnaires are observed in 2) Internal validity: We pursued for internal validity in the Figure 7, b. following way. F. Threats to validity First, the order of the language to be used first (i.e. Hasselt 1) Construct validity: The general concept of validity was or C#) was balanced over the participants so that the aggre- traditionally defined as the degree to which a test measures gated experience bias can be neutralized. Besides, since the goal of the experiment was to measure 2 https://www.r-project.org/ the effort for describing multimodal dialogs, participants were 30 (a) (b) (c) Fig. 8. Data collected from the 12 participants. (a) Completion times, (b) Number of times the code was tested. The plot whiskers are at the lowest datum still within 1.5 times the interquartile range (IQR) of the lower quartile, and the highest datum still within 1.5× IQR of the upper quartile. (c) Barplots showing the frequency of each answer for the SEQ questionnaire. restricted to this portion of the code/model only. With Hasselt, We expected to experience some benefits from separating programmers were restricted to use the visual editor only. the event definition code from the dialog management model. With C#, the code for configuring the speech recognizer and But this is what we found: the application code (e.g. for creating, deleting objects) was First, we found that the better-separated Hasselt models are hidden to programmers –we put this portion of the code in not faster-to-modify than equivalent event-callback code where regions that were collapsed during the experiment. the instructions for event handling and for dialog management On the other hand, offering participants a tutorial on Hasselt are intermixed. Although for our participants, the task of but no tutorial on creating multimodal dialogs using C# might implementing a multimodal dialog was, on average, performed affect the experiment’s internal validity. 
faster with C# than with Hasselt, these results were not 3) External validity: In order to confer our results with high statistically significant. external validity, we allow some ‘freedom’ to the experiments. Second, our participants tested Hasselt models fewer times First, the pool of participants was quite varied. It includes than equivalent C# code. Despite of this, completing changes master and PhD students, post-docs, and industry program- with Hasselt took longer. Based on our observations, the mers, from different universities and countries, with and reason for this may be that modifying visual models is more without background in finite state machines (FSMs). time-consuming than writing textual code. Most importantly, participants were left free in the wild. Finally, the SEQ questionnaires revealed that participants This contrasts with other approaches commonly used in empir- perceived that performing the required changes with Hasselt ical studies, such as the think-aloud protocol and the question- was easier than with C#. Although these measurements turned suggestion protocol [24]. The former would require partici- out to be statistically significant, we cannot discard that some pants to speak out while programming in order to provide the response bias played a role here. Participants gave higher researcher with insights about their programming logic. The scores to the language that led to longer completion times. latter would allow the researcher to give advice proactively B. Perceived usability and initial learnability of Hasselt UIMS to the participant. In our experiments, the researcher only Considering that odd-numbered questions are positively- interferes when participants ask for questions. 
In our opinion, worded, scores higher than 3 in these items reflect that this is a more realistic scenario that reflects the typical case participants agree (to a certain degree) that the evaluated of a programmer working by his own and eventually asking system presents some good aspect/feature. In our study, all for advice to more expert programmers when he got stuck on odd-numbered questions were scored with more than 3 points a problem. on average. From this group, Q3, i.e. “I thought the system was easy to use” and Q7, i.e. “I would imagine that most V. D ISCUSSION AND C ONCLUSION people would learn to use this system very quickly”, received the highest scores. A. Modeling with Hasselt and C# Similarly, since even-numbered items are negatively- We presented Hasselt, a language that provides notations worded, scores lower than 3 would indicate that participants for defining multimodal human-machine interaction dialogs. are disagreeing (to a certain degree) with some negative A dialog model in Hasselt is an extended finite state machine comment about the system. In our studies, all even-numbered specified with a visual editor and whose arcs are annotated questions were scored with less than 3 points on average. From with multimodal events that are defined with a separate textual this group, Q10, “I needed to learn a lot of things before I notation. could get going with this system”, Q4, i.e. “I think I would 31 need support of technical person to use this system”, and Q8, [4] M. Bourguet, “Designing and prototyping multimodal commands,” in i.e. “I found the system very cumbersome to use” received the Proceedings of INTERACT’03, 2003, pp. 717–720. [5] P. Dragicevic and J.-D. Fekete, “Support for input adaptability lowest scores (which in this case it is something positive). in the icon toolkit,” in Proceedings of the 6th ICMI’04. New The salient scores obtained for Q4 and Q10, the two York, NY, USA: ACM, 2004, pp. 212–219. [Online]. 
First, although participants completed the required changes faster with C# than with Hasselt on average, the difference was not statistically significant. Second, our participants tested their Hasselt models fewer times than the equivalent C# code; despite this, completing the changes with Hasselt took longer. Based on our observations, the reason may be that modifying visual models is more time-consuming than writing textual code. Finally, the SEQ questionnaires revealed that participants perceived performing the required changes with Hasselt to be easier than with C#. Although these measurements turned out to be statistically significant, we cannot rule out that some response bias played a role here: participants gave higher scores to the language that led to the longer completion times.

B. Perceived usability and initial learnability of Hasselt UIMS

Considering that odd-numbered questions are positively worded, scores higher than 3 on these items indicate that participants agree (to a certain degree) that the evaluated system has some good aspect or feature. In our study, all odd-numbered questions were scored above 3 points on average. From this group, Q3, i.e. “I thought the system was easy to use”, and Q7, i.e. “I would imagine that most people would learn to use this system very quickly”, received the highest scores.

Similarly, since even-numbered items are negatively worded, scores lower than 3 indicate that participants disagree (to a certain degree) with some negative comment about the system. In our study, all even-numbered questions were scored below 3 points on average. From this group, Q10, i.e. “I needed to learn a lot of things before I could get going with this system”, Q4, i.e. “I think I would need the support of a technical person to use this system”, and Q8, i.e. “I found the system very cumbersome to use”, received the lowest scores (which in this case is something positive).

The salient scores obtained for Q4 and Q10, the two questions that define perceived initial learnability [23], indicate that, to a certain degree, participants consider Hasselt UIMS easy to learn.
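For reference, the odd/even interpretation above follows the standard SUS scoring convention [22]: each odd (positively worded) item contributes its score minus 1, each even (negatively worded) item contributes 5 minus its score, and the total is scaled by 2.5 to a 0–100 range. A small sketch; the sample responses are invented, not data from our study.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 responses.

    Odd-numbered items are positively worded and contribute (score - 1);
    even-numbered items are negatively worded and contribute (5 - score).
    The sum (0-40) is scaled by 2.5 to the 0-100 range.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Invented example: high scores on odd (positive) items and low scores
# on even (negative) items yield a good SUS score.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```

Note that a uniformly neutral respondent (all 3s) lands exactly at 50, which is why per-item averages above or below the midpoint of 3 are informative.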
C. Future work

We think that the main reason why no clear winner emerged from this study is that the task was too simple given the programming experience of the participants. We therefore plan to repeat the experiment with more complex tasks.

Other, minor changes concern the functionality of the visual editors. We want to minimize the effort involved in wiring the FSMs: we plan to add key combinations for creating nodes and links, to disallow resizing of the nodes, and to allow jumping between the elements of an FSM with the TAB key.

Finally, we would like to gather objective cognitive load measurements [27], such as heart rate or pupil dilation. We expect to see positive correlations between the difficulty that participants declared in the questionnaires and their physiological reactions during the task.

D. Lessons learned

Based on this experience, we suggest some guidelines for others who want to design comparative studies between a domain-specific language and a mainstream language.

It is important that the training session is supervised by the researcher and carried out right before the test. This ensures that all participants start the experiment with a similar level of knowledge, provided they have similar backgrounds. Otherwise, some participants may benefit more from the training than others, which may produce outliers.

It may not be a good idea to recruit programmers who work in the same research lab: some may feel that their programming skills are being evaluated. From a research lab with more than 50 people, we could only recruit 5 participants; the remaining 7 participants were recruited from external institutions. Alternatively, one can ask a person from an external institution to play the role of the researcher, so that participants do not feel observed by an acquaintance or colleague.

The complexity of the programming task must be appropriately calibrated: it has to be high enough to reveal differences in the measurements, but not so high as to affect completion rates. In this matter, one must evaluate whether it is better to ask programmers to modify an existing program or to implement a new one from scratch.

REFERENCES

[1] Y. A. Ameur and N. Kamel, “A generic formal specification of fusion of modalities in a multimodal HCI,” in Building the Information Society. Springer, 2004.
[2] W. Dargie, A. Strunk, M. Winkler, B. Mrohs, S. Thakar, and W. Enkelmann, “A model based approach for developing adaptive multimodal interactive systems,” in ICSOFT (PL/DPS/KE/MUSE), 2007, pp. 73–79.
[3] B. Dumas, D. Lalanne, and R. Ingold, “Description languages for multimodal interaction: A set of guidelines and its illustration with SMUIML,” Journal on Multimodal User Interfaces, vol. 3, no. 3, pp. 237–247, 2010.
[4] M. Bourguet, “Designing and prototyping multimodal commands,” in Proceedings of INTERACT’03, 2003, pp. 717–720.
[5] P. Dragicevic and J.-D. Fekete, “Support for input adaptability in the ICON toolkit,” in Proceedings of ICMI’04. New York, NY, USA: ACM, 2004, pp. 212–219. [Online]. Available: http://doi.acm.org/10.1145/1027933.1027969
[6] J. De Boeck, D. Vanacken, C. Raymaekers, and K. Coninx, “High level modeling of multimodal interaction techniques using NiMMiT,” Journal of Virtual Reality and Broadcasting, vol. 4, no. 2, 2007.
[7] W. A. König, R. Rädle, and H. Reiterer, “Interactive design of multimodal user interfaces,” Journal on Multimodal User Interfaces, vol. 3, no. 3, pp. 197–213, 2010.
[8] J.-Y. L. Lawson, A.-A. Al-Akkad, J. Vanderdonckt, and B. Macq, “An open source workbench for prototyping multimodal interactions based on off-the-shelf heterogeneous components,” in Proceedings of EICS’09. ACM, 2009, pp. 245–254.
[9] F. Cuenca, J. Van den Bergh, K. Luyten, and K. Coninx, “A domain-specific textual language for rapid prototyping of multimodal interactive systems,” in Proceedings of EICS’14. ACM, 2014.
[10] D. A. Carr, “Specification of interface interaction objects,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1994, pp. 372–378.
[11] C. Appert and M. Beaudouin-Lafon, “SwingStates: Adding state machines to Java and the Swing toolkit,” Software: Practice and Experience, vol. 38, no. 11, pp. 1149–1182, 2008.
[12] J. Schwarz, J. Mankoff, and S. Hudson, “Monte Carlo methods for managing interactive state, action and feedback under uncertainty,” in Proceedings of UIST’11. ACM, 2011, pp. 235–244.
[13] S. Oney, B. Myers, and J. Brandt, “InterState: Interaction-oriented language primitives for expressing GUI behavior,” in Proceedings of UIST’14. ACM, 2014.
[14] J. De Boeck, C. Raymaekers, and K. Coninx, “A tool supporting model based user interface design in 3D virtual environments,” in GRAPP 2008: Proceedings of the Third International Conference on Computer Graphics Theory and Applications, 2008, pp. 367–375.
[15] E. Cuppens, C. Raymaekers, and K. Coninx, “VRIXML: A user interface description language for virtual environments,” 2004.
[16] K. Kin, B. Hartmann, T. DeRose, and M. Agrawala, “Proton++: A customizable declarative multitouch framework,” in Proceedings of UIST’12, 2012, pp. 477–486.
[17] M. Beaudouin-Lafon, “User interface management systems: Present and future,” in From Object Modelling to Advanced Visual Communication. Springer, 1994, pp. 197–223.
[18] R. Bolt, “‘Put-that-there’: Voice and gesture at the graphics interface,” in Proceedings of SIGGRAPH’80. ACM, 1980.
[19] F. Cuenca, J. Van den Bergh, K. Luyten, and K. Coninx, “Hasselt UIMS: A tool for describing multimodal interactions with composite events,” in Proceedings of EICS’15, 2015.
[20] V. S. Alagar and K. Periyasamy, Specification of Software Systems. Springer Science & Business Media, 2011.
[21] J. Sauro and J. S. Dumas, “Comparison of three one-question, post-task usability questionnaires,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009, pp. 1599–1608.
[22] J. Brooke, “SUS: A quick and dirty usability scale,” Usability Evaluation in Industry, vol. 189, no. 194, pp. 4–7, 1996.
[23] J. R. Lewis and J. Sauro, “The factor structure of the System Usability Scale,” in Human Centered Design. Springer, 2009, pp. 94–103.
[24] T. Grossman, G. Fitzmaurice, and R. Attar, “A survey of software learnability: Metrics, methodologies and guidelines,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009, pp. 649–658.
[25] J. D. Brown, The Elements of Language Curriculum: A Systematic Approach to Program Development. ERIC, 1995.
[26] J. Siegmund, N. Siegmund, and S. Apel, “Views on internal and external validity in empirical software engineering,” in Proceedings of the 37th International Conference on Software Engineering (ICSE 2015), 2015.
[27] R. Brunken, J. L. Plass, and D. Leutner, “Direct measurement of cognitive load in multimedia learning,” Educational Psychologist, vol. 38, no. 1, pp. 53–61, 2003.