=Paper= {{Paper |id=Vol-439/paper-1 |storemode=property |title=WOEB: Rapid Setting of Wizard of Oz Experiments and Reuse for Deployed Applications |pdfUrl=https://ceur-ws.org/Vol-439/paper1.pdf |volume=Vol-439 }} ==WOEB: Rapid Setting of Wizard of Oz Experiments and Reuse for Deployed Applications== https://ceur-ws.org/Vol-439/paper1.pdf
                         WOEB: Rapid Setting of Wizard of Oz Experiments and
                                 Reuse for Deployed Applications

                                           Andrea Bellucci, Paolo Bottoni, and Stefano Levialdi
                                                              Dipartimento di Informatica
                                                           Università Sapienza di Roma, Italy
                                                     {bellucci,bottoni,levialdi}@uniroma1.it


CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073

ABSTRACT¹
We describe an approach and an environment for setting up Wizard of Oz experiments to test multimodal interaction (through mobile devices) for multimedia content delivery. This is based on a metamodel and a reference client-server architecture, whose implementation is discussed. The approach allows progressive refinement of the Wizard of Oz environment into a deployed application, through the use of a single metamodel, to be progressively specialized, and a service-oriented approach which allows the substitution of implemented services for simulated ones. A visual interface is provided for the definition of the wizard interface, of the metadata relating content delivery and possible interactions on it, and of some aspects of the client interaction.

¹ Partially funded by MIUR: Projects CHAT and PRIN 2006.

1. INTRODUCTION
The increasing capabilities of processing power and memory and the high resolution displays of current mobile devices, together with the expansion of available bandwidth and powerful compression techniques, enable their users to enjoy advanced multimedia experiences. On the other hand, PDAs, Tablet PCs, and even smart-phones are embodying different types of sensors and interaction strategies, including pen-based interaction, VoIP support, GPS, etc. These also allow new methods of interaction, combined in a multimodal way, to steer the usage of multimedia material.

Interaction with these devices differs from the traditional desktop experience with multimedia content, as it occurs in less protected environments affected by noise, unusual postures, and disruptions in the availability of certain channels, e.g. WiFi or GPS coverage [6]. Multimodality is becoming a common interaction paradigm due to its naturalness. However, the peculiarities of multimodal interactive systems make it difficult to gather information from the use of modalities that can be employed for improving the user interfaces [1]. It is therefore important to be able to test the ways in which people can interact with new mobile applications before their final deployment. Moreover, users’ needs can be detected using evaluation techniques [7] dealing with real data, gathered from the observation of users accomplishing real tasks, while operating on a physical mobile device [14]. System prototyping contributes to demonstrating functionality and appearance of a software application, including usability testing [8, 10, 13].

Wizard of Oz (WOz) techniques [11, 14] are a popular way to perform this testing. Similarly to the character in F. Baum’s story, in a WOz experiment a human wizard simulates the behaviour of a partially implemented multimodal system. The subjects of the experiment must believe they are interacting with a real, fully implemented one, thus maintaining a natural interaction behaviour. The wizard, unknown to the subject, monitors the user through a dedicated computer program, connected to the observed system over a network. When the subject invokes a function that is not implemented by the observed system, the wizard simply simulates its effect. In this way, designers can capture specific interaction phenomena, without the mediation and distortions of human-human communication. However, setting up a WOz experiment may be hard, in that the working environment for the wizard has to provide complete control over the user input and to present a general view of the available actions and of their relevance to the current interaction state, in order to activate them coherently. This requires an organization of content and interface which takes into account the device characteristics, so that the relevant content is adapted to the device physical context as well as to the user task context. On the other hand, this same information will have to be used in the final application, so it would be desirable to employ the same logic for the construction of the WOz interface and of the final application.

While many WOz experiments are performed using ad hoc machinery, we propose WOEB (Wizard of Oz Experiment Builder) as a general framework for the construction of the WOz logic and interface in terms of services. The deployment of the final application can proceed through the progressive refinement of the available services, and the substitution of the WOz services with those running the actual application. WOEB is based on a fragment of a metamodel for multimodal adaptive multimedia information systems and relies on services for the identification of the relevant content on the server-side, to be transmitted to the client-side on the mobile device, in the format which is most suitable to the user preferences and the device capabilities. At this stage of design, our approach focuses only on a subset of the possible interaction modalities typically related to interaction with hand-held devices; for instance we do not take into account bimanual interaction. We have tested WOEB in a scenario where a technician must be remotely instructed on troubleshooting hardware. We have run two experiments: in the first one, the system provides the technician with some predefined graphic or textual content about the hardware, and interaction is performed by selecting special areas (hotspots or hyperlinks) in the content, along with oral commands. In the second, in conjunction with speech inputs, 2D physical labels are placed directly on hardware components [9]. Using the built-in camera of the mobile device, the user can detect the label of a desired component and request further information from the system.

2. RELATED WORK
WOz experiments can be run on both high- and low-fidelity prototypes. In [10], the authors present the WOZ PRO system, a pen-based software environment enabling both the creation of low-fidelity prototypes and the conduct of WOz experiments. Since they do not aim at the iterative refinement of the prototype into a fully functional multimodal application, they do not exploit an architectural metamodel.

The development of a complete multimodal system starting from WOz experiments is described in [3], where the experiment is conducted in a different environment, in order to study integration of speech and gesture pointing in the construction of collaborative story-telling.

Momento [6] supports on-site evaluation of ubiquitous computing applications, addressing issues identified in interviews with developers of mobile technology and observations of diary studies [5, 4]. Like WOEB, Momento consists of a set of configurable tools that can be used without writing source code. Momento employs the text messaging and media messaging services of mobile devices to share content between the mobile application and the wizard’s platform. In contrast, WOEB exploits a wider range of communication channels (such as Wi-Fi or GPRS connections) to share multimedia content, including audio or video streams.

The Ozlab [13] tool investigates different forms of user participation in the early phases of systems development and supports the WOz protocol to perform experimental studies. Ozlab was conceived to handle both natural language interfaces and GUI interactivity testing. It allows text output, typed by the wizard or previously defined, or voice output, wizard-made or pre-recorded.

Bunt et al. [2] distinguish between content adaptation and presentation, where the first involves deciding what content is most relevant and how to structure this content in a coherent way, while the second involves deciding how to adapt the presentation of the selected content to the user. WOEB supports both processes, leaving different degrees of freedom to designers and wizards to force specific forms of adaptation, while leaving others to the implemented components.

The applications of the suggested environment are those of adaptive hypermedia, for which several general models are being proposed. In [12], a set of models for the adaptation process is identified, namely: domain, navigation, user, integration, presentation, adaptation and context, together with three types of adaptable elements: content, navigation and presentation. Our metamodel has several aspects in common with this, but offers reuse, allowing binding of model components to specific implementation elements.

3. A METAMODEL FOR MULTIMEDIA INFORMATION SYSTEMS
In WOEB, the construction of the logic and interface of the WOz modules relies on a metamodel and a reference architecture. A typical WOz environment provides a set of tools organized in a Client-Server architecture. The Server side is the Wizard module, hosting a human operator who simulates the not yet implemented functionalities of the system. The Client side is the multimodal human-computer interface deployed on the mobile device.

Figure 1: The basic metamodel for the definition of models.

All packages of the WOEB metamodel import a basic Model package, a fragment of the UML 2.1.1 metamodel presented in Figure 1, which allows the definition of (simple or derived) typed properties, paired with expressions for their evaluation. A model can contain different relationships between properties, and the presence of some properties can be constrained by others. Finally, the model must provide ways to update its composition upon receiving some triggering event.

Figure 2: A fragment of the complete metamodel for the definition of interactive applications.

Figure 2 illustrates the relationship between information in the client model and in other packages, in particular the multimedia content model and the adaptation model, by defining suitable classes as containers of properties. Hence, the client will host information sources derived from some content on the server, organized into a composite structure. The content is served by a content handler, according to an adaptation process performed by a request manager which interprets event descriptions received by specific handlers for client-generated events. The content delivery to the client creates an information source, which supports further interactions in the form of a trigger base, triggering interaction events which are composed and interpreted as requests for new content.
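The request flow described in this section (a client-generated event is interpreted by a request manager, a content handler serves the content, and delivery creates a new information source with its own trigger base) can be sketched in a few lines of Python. This is only an illustrative sketch and not the WOEB implementation: the class names loosely follow the metamodel vocabulary, while every field, method, identifier and sample value (`serve`, `handle`, `"area-1"`, the `manifestation` attribute, and so on) is an assumption made for the example. The two handlers show the service substitution idea: a wizard-backed handler and an implemented one share the same interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Content:
    """Server-side content with descriptive attribute-value pairs (hypothetical shape)."""
    content_id: str
    body: str
    attributes: dict[str, str] = field(default_factory=dict)
    triggers: list[str] = field(default_factory=list)  # trigger elements it exposes


@dataclass
class EventDescription:
    """What the client sends when an element of the trigger base fires."""
    trigger_id: str
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class InformationSource:
    """Client-side result of a content delivery."""
    content: Content
    trigger_base: list[str]


class ContentHandler(Protocol):
    def serve(self, event: EventDescription, candidates: list[Content]) -> Content: ...


class WizardContentHandler:
    """Simulated service: the human wizard (modelled as a callback) picks the content."""

    def __init__(self, choose: Callable[[EventDescription, list[Content]], Content]):
        self._choose = choose

    def serve(self, event: EventDescription, candidates: list[Content]) -> Content:
        return self._choose(event, candidates)


class AttributeMatchHandler:
    """Implemented service: first candidate whose attributes match the event properties."""

    def serve(self, event: EventDescription, candidates: list[Content]) -> Content:
        for content in candidates:
            if all(content.attributes.get(k) == v for k, v in event.properties.items()):
                return content
        raise LookupError(f"no content matches {event.properties}")


class RequestManager:
    """Interprets an event description as a request for new content."""

    def __init__(self, handler: ContentHandler, store: dict[str, list[Content]]):
        self.handler = handler
        self.store = store  # trigger id -> candidate content

    def handle(self, event: EventDescription) -> InformationSource:
        candidates = self.store.get(event.trigger_id, [])
        content = self.handler.serve(event, candidates)
        return InformationSource(content=content, trigger_base=list(content.triggers))


# Made-up sample data: two manifestations of the content behind one trigger.
store = {
    "area-1": [
        Content("c1", "Spoken description", {"manifestation": "audio"}),
        Content("c2", "Written description", {"manifestation": "text"}, ["area-2"]),
    ]
}
event = EventDescription("area-1", {"manifestation": "text"})

simulated = RequestManager(WizardContentHandler(lambda ev, cs: cs[0]), store)
implemented = RequestManager(AttributeMatchHandler(), store)

print(simulated.handle(event).content.content_id)    # the wizard's pick
print(implemented.handle(event).content.content_id)  # the attribute match
```

Because both handlers satisfy the same `ContentHandler` protocol, swapping the wizard for an implemented component changes only the argument passed to `RequestManager`, which mirrors the progressive substitution of simulated services with implemented ones.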
4. SYSTEM COMPONENTS OUTLINE
The WOEB environment is a set of tools for the rapid setup of WOz experiments. We identify three main actors: the WOz experiment designer, the wizard and the subject of the experiments.

The WOz Builder Module offers a GUI for the specification of XML-based descriptions of the WOz Server and Client components. An XML Interpreter processes the XML description to produce the Server and Client widgets and functionalities, according to the metamodel. The WOz Server Module represents the system used by the human wizard to lead the experiments. It allows the wizard to serve user requests with a simple and intuitive interface and integrates the data collection facilities. The server module also implements the server-side communication mechanisms for sending data to and receiving data from the WOz Client Module, which is the multimodal application running on a mobile device used by the subject of the experiment.

5. DESIGNING THE PROTOTYPE OF A MULTIMODAL SYSTEM
In the current implementation, the WOz Builder enables the definition of the information sources, which represent the way the Client displays content. As the interaction model accommodates the combination of speech input with triggering events, the WOz Builder has to present an interface for the definition of such events.

In the first experiment, WOEB is used for developing a multimodal mobile application where interaction is based on the selection (through a pointing device) of special areas in the information source, an image in our case. We do not describe the second experiment, due to lack of space. The designer loads the source (the image itself) in a dedicated panel of the WOz Builder and defines the TriggerBase as a set of sensitive areas. In the current implementation, an area can be a hyperlink, if the information source is a hypertext, or a polygon for a graphical information source.

By selecting an interactive area, the designer associates with it a description of the related content, which will be sent to the client following a user request. This involves the specification of attribute-value pairs describing the content’s behavior relative to the user’s delivery context. Such attributes can be used to guide the adaptation of content to be delivered to the mobile device (via a DISelect engine², which represents the realization of the ComplexRequestManager in our implementation of the metamodel). For example, content manifestation admits as values audio, text or graphics, while request context defines the type of user request a particular content is designed to serve, e.g. how or what. Different request context types can be defined at the time of WOz modules definition. The builder can automatically organize the layout of the WOz Server GUI, so that the buttons are distributed over columns labeled by the request context.

² http://www.w3.org/TR/cselection/.

In Figure 3, the information source is an image of a computer motherboard on which a technician has to intervene, supported by the mobile system. The CPU socket area has been defined as a hot spot. On the client-side, a user can interact with the system by selecting it and asking "what is this?" or "how can I use this one?". The designer has thus defined two request context types, called "what" and "how" (Figure 3), and assigned the "what" or "how" label to the related content. This information can then be used by the DISelect engine to filter a subset of content matching this attribute. The designer can also define general purpose content, not associated with any interactive area, in order to manage oral user interactions or unpredicted events. Once all the elements are specified within the WOz Builder, definitions (in terms of interface artifacts and services) are stored in an XML archive. This platform-independent description of the logic and interface of the WOz modules can be processed by customized engines to generate the actual WOz Server and Client components. In the current implementation the XMLInterpreter co-generates both the WOz Server and WOz Client interfaces and services.

Figure 3: WOz Builder: adding a content.

5.1 WOz Server Module
This module is automatically generated by the XMLInterpreter, starting from the definition contained in the XML archive previously mentioned. Information sources are placed in a tabbed pane including an appropriate content viewer. For each information source a dedicated panel is created for content selection. Each panel contains a button for each trigger base element in the information source (the motherboard image in Figure 4), which the human wizard can use to call the associated content handler (e.g. the DISelect Engine).

Figure 4: A portion of the WOz server interface.

In order to simplify content selection by the human wizard, buttons are organized in columns labeled with the different request contexts. Contexts are used by the DISelect engine to preliminarily filter the content to be retrieved, depending on the mobile device. This feature can be disabled during WOz module definition, for example if some components are fully implemented. The WOz Server is designed to be a starting point to build a real adaptive multimodal system. A data communication interface, handling data exchange between client and server, is implemented, as well as a Timestamp Agent tracing the temporal order of the events. To deploy the final system, only services that are currently embodied by the human wizard have to be implemented: for example an Adaptive Dialog Manager, or a module to manage the multimodal fusion of pointing and speech events through the Timestamp Agent. One can also replace existing modules with customized ones: as an example, the DISelect Engine can be substituted by another implementation of the Content Handler interface, simply by redefining the process of attribute definition and the content description within the WOz Builder.

5.2 WOz Client Module
In a way similar to the Server Module, the WOz Client Module is automatically generated by the XMLInterpreter. This module is essentially a data viewer, which can display content sent by the server and manage user pointing and oral interaction. Pointing interaction is captured through the trigger base, while audio inputs are processed by a Speech Capture engine. A description of a triggered event is sent, while the captured speech is streamed directly to the Dialog Manager on the server (e.g. the human wizard).

6. CONCLUSIONS
We presented WOEB (Wizard of Oz Experiments Builder), a framework and environment for setting up Wizard of Oz experiments by defining module interfaces and services. WOEB encourages the development of multimodal systems for managing multimedia information through the progressive refinement and implementation of components and services. The adoption of a common metamodel for the definition of the Wizard of Oz experiments and the deployment of the final application favors the rapid identification of problem sources and the reuse of components and services for different application scenarios. The first tests with WOEB show its efficacy in the rapid setting up of experiments for exploring different interaction modes. Further initial experiments have been performed, only partially reported here due to lack of space.

7. REFERENCES
[1] R. Bernhaupt, D. Navarre, P. Palanque, and M.A. Winckler. Model-based evaluation: A new way to support usability evaluation of multimodal interactive applications. In E. Law, E. Hvannberg, G. Cockton, and J. Vanderdonckt, editors, Maturing Usability: Quality in Software, Interaction and Quality, volume HCIS, pages 96–122. Springer, 2008.
[2] A. Bunt, G. Carenini, and C. Conati. Adaptive content presentation for the web. In The Adaptive Web, volume 4321 of LNCS, pages 409–432. Springer, 2007.
[3] S. Carbini, L. Delphin-Poulat, L. Perron, and J.E. Viallet. From a Wizard of Oz experiment to a real time speech and gesture multimodal interface. Signal Processing, 86(12):3559–3577, 2006.
[4] S. Carter, S. R. Klemmer, and J. Mankoff. Exiting the cleanroom: On ecological validity and ubiquitous computing. HCI, 23(1):47–99, 2008.
[5] S. Carter and J. Mankoff. When participants do the capturing: the role of media in diary studies. In CHI 2005, pages 899–908. ACM Press, 2005.
[6] S. Carter, J. Mankoff, and J. Heer. Momento: support for situated ubicomp experimentation. In CHI 2007, pages 125–134. ACM, 2007.
[7] S. Carter and J. Mankoff. Prototypes in the wild: Lessons from three ubicomp systems. IEEE Pervasive Computing, 4(4):51–57, 2005.
[8] B. Hartmann and S. R. Klemmer. Reflective physical prototyping through integrated design, test, and analysis. In UIST06, pages 299–308. ACM, 2006.
[9] L. E. Holmquist. Tagging the world. interactions, 13(4):51–ff, 2006.
[10] C. Hundhausen, S. Trent, A. Balkar, and M. Nuur. The design and experimental evaluation of a tool to support the construction and wizard-of-oz testing of low fidelity prototypes. In IEEE Symposium on VL/HCC 2008, pages 86–90, 2008.
[11] J. F. Kelley. An iterative design methodology for user-friendly natural language office information applications. ACM Trans. Inf. Syst., 2(1):26–41, 1984.
[12] D. Schwabe and P. Seefelder de Assis. A semantic meta-model for adaptive hypermedia systems. In AH 2004, volume 3137 of LNCS, pages 360–365. Springer, 2004.
[13] J.S. Pettersson. Ozlab: a system overview with an account of two years of experiences. In HumanIT 2003, pages 159–185, 2003.
[14] D. Salber and J. Coutaz. Applying the Wizard of Oz technique to the study of multimodal systems. In EWHCI, volume 753 of LNCS, pages 219–230. Springer, 1993.