=Paper=
{{Paper
|id=Vol-439/paper-1
|storemode=property
|title=WOEB: Rapid Setting of Wizard of Oz Experiments and Reuse for Deployed Applications
|pdfUrl=https://ceur-ws.org/Vol-439/paper1.pdf
|volume=Vol-439
}}
==WOEB: Rapid Setting of Wizard of Oz Experiments and Reuse for Deployed Applications==
Andrea Bellucci, Paolo Bottoni, and Stefano Levialdi
Dipartimento di Informatica
Università Sapienza di Roma, Italy
{bellucci,bottoni,levialdi}@uniroma1.it
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Partially funded by MIUR: Projects CHAT and PRIN 2006.

ABSTRACT

We describe an approach and an environment for setting up Wizard of Oz experiments to test multimodal interaction (through mobile devices) for multimedia content delivery. The approach is based on a metamodel and a reference client-server architecture, whose implementation is discussed. It allows progressive refinement of the Wizard of Oz environment into a deployed application, through the use of a single metamodel, to be progressively specialized, and a service-oriented approach which allows the substitution of implemented services for simulated ones. A visual interface is provided for the definition of the wizard interface, of the metadata relating content delivery and possible interactions on it, and of some aspects of the client interaction.

1. INTRODUCTION

The increasing processing power, memory and display resolution of current mobile devices, together with the expansion of available bandwidth and powerful compression techniques, enable their users to enjoy advanced multimedia experiences. At the same time, PDAs, Tablet PCs, and even smartphones embed different types of sensors and interaction strategies, including pen-based interaction, VoIP support, GPS, etc. These also allow new methods of interaction, combined in a multimodal way, to steer the usage of multimedia material.

Interaction with these devices differs from the traditional desktop experience with multimedia content, as it occurs in less protected environments affected by noise, unusual postures, and disruptions in the availability of certain channels, e.g. WiFi or GPS coverage [6]. Multimodality is becoming a common interaction paradigm due to its naturalness. However, the peculiarities of multimodal interactive systems make it difficult to gather information from the use of modalities that can be employed for improving the user interfaces [1].

It is therefore important to be able to test the ways in which people can interact with new mobile applications before their final deployment. Moreover, users' needs can be detected using evaluation techniques [7] dealing with real data, gathered from the observation of users accomplishing real tasks, while operating on a physical mobile device [14]. System prototyping contributes to demonstrating the functionality and appearance of a software application, including usability testing [8, 10, 13].

Wizard of Oz (WOz) techniques [11, 14] are a popular way to perform this testing. Similarly to the character in L. Frank Baum's story, in a WOz experiment a human wizard simulates the behaviour of a partially implemented multimodal system. The subjects of the experiment must believe they are interacting with a real, fully implemented one, thus maintaining a natural interaction behaviour. The wizard, unknown to the subject, monitors the user through a dedicated computer program, connected to the observed system over a network. When the subject invokes a function that is not implemented by the observed system, the wizard simply simulates its effect. In this way, designers can capture specific interaction phenomena, without the mediation and distortions of human-human communication. However, setting up a WOz experiment may be hard, in that the working environment for the wizard has to provide complete control over the user input and to present a general view of the available actions and of their relevance to the current interaction state, in order to activate them coherently. This requires an organization of content and interface which takes into account the device characteristics, so that the relevant content is adapted to the device's physical context as well as to the user's task context. On the other hand, this same information will have to be used in the final application, so it would be desirable to employ the same logic for the construction of the WOz interface and of the final application.

While many WOz experiments are performed using ad hoc machinery, we propose WOEB (Wizard of Oz Experiment Builder) as a general framework for the construction of the WOz logic and interface in terms of services. The deployment of the final application can proceed through the progressive refinement of the available services, and the substitution of the WOz services with those running the actual application. WOEB is based on a fragment of a metamodel for multimodal adaptive multimedia information systems and relies on services for the identification of the relevant content on the server-side, to be transmitted to the client-side on the mobile device, in the format which is most suitable to the user preferences and the device capabilities. At this stage of design, our approach focuses only on a subset of the possible interaction modalities typically related to interaction with hand-held devices; for instance we do not take into account bimanual interaction. We have tested WOEB in a scenario where a technician must be remotely instructed on troubleshooting hardware. We have run two experiments:
in the first one, the system provides the technician with some predefined graphic or textual content about the hardware, and interaction is performed by selecting special areas (hotspots or hyperlinks) in the content, along with oral commands. In the second, in conjunction with speech inputs, 2D physical labels are placed directly on hardware components [9]. Using the built-in camera of the mobile device, the user can detect the label of a desired component and request further information from the system.

2. RELATED WORK

WOz experiments can be run on both high- and low-fidelity prototypes. In [10], the authors present the WOZ PRO system, a pen-based software environment enabling both the creation of low-fidelity prototypes and the conduct of WOz experiments. Since they do not aim at the iterative refinement of the prototype into a fully functional multimodal application, they do not exploit an architectural metamodel.

The development of a complete multimodal system starting from WOz experiments is described in [3], where the experiment is conducted in a different environment, in order to study the integration of speech and gesture pointing in the construction of collaborative story-telling.

Momento [6] supports on-site evaluation of ubiquitous computing applications, addressing issues identified in interviews with developers of mobile technology and observations of daily studies [4, 5]. Like WOEB, Momento consists of a set of configurable tools that can be used without writing source code. Momento employs the text messaging and media messaging services of mobile devices to share content between the mobile application and the wizard's platform. Conversely, WOEB exploits a wider range of communication channels (such as Wi-Fi or GPRS connections) to share multimedia content, including audio or video streams.

The Ozlab [13] tool investigates different forms of user participation in the early phases of systems development and supports the WOz protocol to perform experimental studies. Ozlab was conceived to handle the testing of both natural language interfaces and GUI interactivity. It allows text output, either typed by the wizard or previously defined, and voice output, either wizard-made or pre-recorded.

Bunt et al. [2] distinguish between content adaptation and presentation, where the first involves deciding what content is most relevant and how to structure this content in a coherent way, while the second involves deciding how to adapt the presentation of the selected content to the user. WOEB supports both processes, leaving different degrees of freedom to designers and wizards to force specific forms of adaptation, while leaving others to the implemented components.

The suggested environment targets adaptive hypermedia applications, for which several general models are being proposed. In [12], a set of models for the adaptation process are identified, namely domain, navigation, user, integration, presentation, adaptation and context, together with three types of adaptable elements: content, navigation and presentation. Our metamodel has several aspects in common with this, but offers reuse, allowing binding of model components to specific implementation elements.

3. A METAMODEL FOR MULTIMEDIA INFORMATION SYSTEMS

In WOEB, the construction of the logic and interface of the WOz modules relies on a metamodel and a reference architecture. A typical WOz environment provides a set of tools organized in a Client-Server architecture. The Server side is the Wizard module, hosting a human operator who simulates the functionalities of the system that are not yet implemented. The Client side is the multimodal human-computer interface deployed on the mobile device.

Figure 1: The basic metamodel for the definition of models.

All packages of the WOEB metamodel import a basic Model package, a fragment of the UML 2.1.1 metamodel presented in Figure 1, which allows the definition of (simple or derived) typed properties, paired with expressions for their evaluation. A model can contain different relationships between properties, and the presence of some properties can be constrained by others. Finally, the model must provide ways to update its composition upon receiving some triggering event.

Figure 2: A fragment of the complete metamodel for the definition of interactive applications.

Figure 2 illustrates the relationship between information in the client model and in other packages, in particular the multimedia content model and the adaptation model, by defining suitable classes as containers of properties. Hence, the client will host information sources derived from some content on the server, organized into a composite structure. The content is served by a content handler, according to an adaptation process performed by a request manager which interprets event descriptions received by specific handlers for client-generated events. The content delivery to the client creates an information source, which provides support to further interactions in the form of some base that triggers interaction events, which are composed and interpreted as requests for new content.
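
The flow described for Figure 2 can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions, not the WOEB implementation: all class, method and key names (`EventDescription`, `InformationSource`, `RequestManager`, `StoredContentHandler`, `WizardContentHandler`, the `source/trigger` key format) and the sample content strings are hypothetical and introduced here only for illustration. It shows a request manager turning an event description into a content request, and a content handler that may be either a simulated (wizard-driven) or an implemented service behind the same interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass(frozen=True)
class EventDescription:
    """What a client-side handler reports when a trigger fires (hypothetical shape)."""
    source_id: str   # information source currently shown on the client
    trigger_id: str  # element of the trigger base that was activated


@dataclass(frozen=True)
class InformationSource:
    """Content delivered to the client, plus the triggers it exposes."""
    source_id: str
    content: str
    trigger_base: List[str]


class ContentHandler(Protocol):
    """Common interface for simulated and implemented content services."""
    def serve(self, content_key: str) -> InformationSource: ...


class StoredContentHandler:
    """Implemented service: looks the content up in a table."""
    def __init__(self, store: Dict[str, InformationSource]) -> None:
        self._store = store

    def serve(self, content_key: str) -> InformationSource:
        return self._store[content_key]


class WizardContentHandler:
    """Simulated service: the human wizard's choice is modelled as a callback."""
    def __init__(self, choose: Callable[[str], InformationSource]) -> None:
        self._choose = choose

    def serve(self, content_key: str) -> InformationSource:
        return self._choose(content_key)


class RequestManager:
    """Interprets event descriptions as requests for new content."""
    def __init__(self, handler: ContentHandler) -> None:
        self._handler = handler

    def handle(self, event: EventDescription) -> InformationSource:
        # Assumed key format: "<information source>/<trigger>".
        content_key = f"{event.source_id}/{event.trigger_id}"
        return self._handler.serve(content_key)


# Usage: the same request manager works with either kind of handler.
store = {
    "motherboard/cpu_socket": InformationSource(
        "cpu_socket_info", "Placeholder text about the CPU socket.", []
    )
}
implemented = RequestManager(StoredContentHandler(store))
simulated = RequestManager(
    WizardContentHandler(lambda key: InformationSource(key, "Chosen by the wizard.", []))
)
event = EventDescription("motherboard", "cpu_socket")
print(implemented.handle(event).content)
print(simulated.handle(event).content)
```

Because `RequestManager` depends only on the `ContentHandler` interface, a wizard-driven handler can later be swapped for an implemented one without touching the rest of the pipeline, which mirrors the substitution of simulated services by implemented ones that the approach aims at.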
4. SYSTEM COMPONENTS OUTLINE

The WOEB environment is a set of tools for the rapid setup of WOz experiments. We identify three main actors: the WOz experiment designer, the wizard and the subject of the experiments.

The WOz Builder Module offers a GUI for the specification of XML-based descriptions of the WOz Server and Client components. An XML Interpreter processes the XML description to produce the Server and Client widgets and functionalities, according to the metamodel. The WOz Server Module represents the system used by the human wizard to lead the experiments. It allows the wizard to serve users' requests with a simple and intuitive interface and integrates the data collection facilities. The server module also implements the server-side communication mechanisms for sending data to and receiving data from the WOz Client Module, which is the multimodal application running on a mobile device used by the subject of the experiment.

5. DESIGNING THE PROTOTYPE OF A MULTIMODAL SYSTEM

In the current implementation, the WOz Builder enables the definition of the information sources, which represent the way the Client displays content. As the interaction model accommodates the combination of speech input with triggering events, the WOz Builder has to present an interface for the definition of such events.

In the first experiment, WOEB is used for developing a multimodal mobile application where interaction is based on the selection (through a pointing device) of special areas in the information source, an image in our case. We do not describe the second experiment, due to lack of space. The designer loads the source (the image itself) in a dedicated panel of the WOz Builder and defines the TriggerBase as sensitive areas. In the current implementation, an area can be a hyperlink, if the information source is a hypertext, or a polygon for a graphical information source.

By selecting an interactive area, the designer associates with it a description of the related content, which will be sent to the client following a user request. This involves the specification of attribute-value pairs describing the content's behavior relative to the user's delivery context. Such attributes can be used to guide the adaptation of content to be delivered to the mobile device, via a DISelect engine (http://www.w3.org/TR/cselection/), which represents the realization of the ComplexRequestManager in our implementation of the metamodel. For example, content manifestation admits as values audio, text or graphics, while request context defines the type of user request a particular content is designed to serve, e.g. how or what. Different request context types can be defined at the time of WOz modules definition. The builder can automatically organize the layout of the WOz Server GUI, so that the buttons are distributed over columns labeled by the request context.

Figure 3: WOz Builder: adding a content

In Figure 3, the information source is an image of a computer motherboard on which a technician has to intervene, supported by the mobile system. The CPU socket area has been defined as a hot spot. On the client-side, a user can interact with the system by selecting it and asking "what is this?" or "how can I use this one?". The designer has thus defined two request context types, called "what" and "how" (Figure 3), and assigned a "what" or "how" label to the related content. This information can then be used by the DISelect engine to filter a subset of content matching this attribute. The designer can also define general purpose content, not associated with any interactive area, in order to manage oral user interactions or unpredicted events. Once all the elements are specified within the WOz Builder, definitions (in terms of interface artifacts and services) are stored in an XML archive. This platform-independent description of the logic and interface of the WOz modules can be processed by customized engines to generate the actual WOz Server and Client components. In the current implementation the XMLInterpreter co-generates both the WOz Server and WOz Client interfaces and services.

5.1 WOz Server Module

This module is automatically generated by the XMLInterpreter, starting from the definition contained in the XML archive mentioned above. Information sources are placed in a tabbed pane including an appropriate content viewer. For each information source a dedicated panel is created for content selection. Each panel contains buttons for each trigger base element in the information source (the motherboard image in Figure 4) that can be used by the human wizard to call the associated content handler (e.g. the DISelect Engine).

Figure 4: A portion of the WOz server interface.

In order to simplify content selection by the human wizard, buttons are organized in columns labeled with the different request contexts. Contexts are used by the DISelect engine to preliminarily filter the content to be retrieved, depending on the mobile device. This feature can be disabled during WOz module definition, for example if some components are fully implemented. The WOz Server is designed to be a starting point to build a real adaptive multimodal system. A data communication interface, handling data exchange between client and server, is implemented, as well as a Timestamp Agent tracing the temporal order of the events. To deploy the final system, only services that are currently embodied by the human wizard have to be implemented: for example an Adaptive Dialog Manager, or a module to
manage the multimodal fusion of pointing and speech events through the Timestamp Agent. One can also replace existing modules with customized ones: as an example, the DISelect Engine can be substituted by another implementation of the Content Handler interface, simply by redefining the process of attribute definition and the content description within the WOz Builder.

5.2 WOz Client Module

In a way similar to the Server Module, the WOz Client Module is automatically generated by the XMLInterpreter. This module is essentially a data viewer, which can display content sent by the server and manage user pointing and oral interaction. Pointing interaction is captured through the trigger base, while audio inputs are processed by a Speech Capture engine. A description of each triggered event is sent to the server, while the captured speech is streamed directly to the Dialog Manager on the server (e.g. the human wizard).

6. CONCLUSIONS

We presented WOEB (Wizard of Oz Experiment Builder), a framework and environment for setting up Wizard of Oz experiments by defining module interfaces and services. WOEB encourages the development of multimodal systems for managing multimedia information through the progressive refinement and implementation of components and services. The adoption of a common metamodel for the definition of the Wizard of Oz experiments and the deployment of the final application favors the rapid identification of problem sources and the reuse of components and services for different application scenarios. The first tests with WOEB show its efficacy in the rapid setup of experiments for exploring different interaction modes. Further initial experiments have been performed, only partially reported here due to lack of space.

7. REFERENCES

[1] R. Bernhaupt, D. Navarre, P. Palanque, and M.A. Winckler. Model-based evaluation: A new way to support usability evaluation of multimodal interactive applications. In E. Law, E. Hvannberg, G. Cockton, and J. Vanderdonckt, editors, Maturing Usability: Quality in Software, Interaction and Quality, volume HCIS, pages 96–122. Springer, 2008.
[2] A. Bunt, G. Carenini, and C. Conati. Adaptive content presentation for the web. In The Adaptive Web, volume 4321 of LNCS, pages 409–432. Springer, 2007.
[3] S. Carbini, L. Delphin-Poulat, L. Perron, and J.E. Viallet. From a Wizard of Oz experiment to a real time speech and gesture multimodal interface. Signal Processing, 86(12):3559–3577, 2006.
[4] S. Carter, S. R. Klemmer, and J. Mankoff. Exiting the cleanroom: On ecological validity and ubiquitous computing. HCI, 23(1):47–99, 2008.
[5] S. Carter and J. Mankoff. When participants do the capturing: the role of media in diary studies. In CHI 2005, pages 899–908. ACM Press, 2005.
[6] S. Carter, J. Mankoff, and J. Heer. Momento: support for situated ubicomp experimentation. In CHI 2007, pages 125–134. ACM, 2007.
[7] S. Carter and J. Mankoff. Prototypes in the wild: Lessons from three ubicomp systems. IEEE Pervasive Computing, 4(4):51–57, 2005.
[8] B. Hartmann and S. R. Klemmer. Reflective physical prototyping through integrated design, test, and analysis. In UIST06, pages 299–308. ACM, 2006.
[9] L. E. Holmquist. Tagging the world. interactions, 13(4):51–ff, 2006.
[10] C. Hundhausen, S. Trent, A. Balkar, and M. Nuur. The design and experimental evaluation of a tool to support the construction and wizard-of-oz testing of low fidelity prototypes. In IEEE Symposium on VL/HCC 2008, pages 86–90, 2008.
[11] J. F. Kelley. An iterative design methodology for user-friendly natural language office information applications. ACM Trans. Inf. Syst., 2(1):26–41, 1984.
[12] D. Schwabe and P. Seefelder de Assis. A semantic meta-model for adaptive hypermedia systems. In AH 2004, volume 3137 of LNCS, pages 360–365. Springer, 2004.
[13] J.S. Pettersson. Ozlab: a system overview with an account of two years of experiences. In HumanIT 2003, pages 159–185, 2003.
[14] D. Salber and J. Coutaz. Applying the Wizard of Oz technique to the study of multimodal systems. In EWHCI, volume 753 of LNCS, pages 219–230. Springer, 1993.