=Paper=
{{Paper
|id=None
|storemode=property
|title=A Supportive User Interface for Customization of Graphical-to-Vocal Adaptation
|pdfUrl=https://ceur-ws.org/Vol-828/SUI_2011_paper1.pdf
|volume=Vol-828
}}
==A Supportive User Interface for Customization of Graphical-to-Vocal Adaptation==
Fabio Paternò, Christian Sisti
CNR-ISTI, HIIS Laboratory
Via Moruzzi 1, 56124 Pisa, Italy
Fabio.Paterno@isti.cnr.it, Christian.Sisti@isti.cnr.it
ABSTRACT
In this paper we describe an approach to adapting graphical Web pages into vocal ones, and show how the approach is supported by a tool that allows the user to drive the adaptation results by customizing the adaptation parameters. The adaptation process exploits model-based user interface descriptions.

Keywords
Vocal Interfaces, Model-Based, Adaptation, Supportive User Interfaces, Accessibility.

INTRODUCTION
Vocal interfaces are important in a number of different contexts, such as for vision-impaired users or when the visual channel is busy (e.g., while driving a car) [7]. Design techniques for developing vocal interfaces have been widely studied [1], but little attention has been paid to how to adapt Web pages for vocal browsing. Moreover, the recognition of natural language is improving [2], and in the future it will be possible to develop vocal interfaces able to recognize any user input.
We found that the adaptation of graphical Web pages into vocal ones needs to be supplemented by Supportive User Interfaces (SUIs), which enable users to customize the adaptation. Indeed, a completely automatic transformation cannot provide good results in many cases.

The adaptation process is based on the exploitation of MARIA [5], a recent model-based language which allows designers to specify abstract and concrete user interface descriptions according to the CAMELEON Reference Framework [3]. The customization tool has a Web interface allowing the user to drive the generation of the vocal interfaces.

In this workshop paper we first present the overall model-based language architecture, then introduce the adaptation approach, and finally show an example application of the supportive interface for graphical-to-vocal adaptation, also showing how a parameter change can lead to different results in the final user interface.
MODEL-BASED INTERFACES IN MULTI-DEVICE ENVIRONMENTS
MARIA is a model-based language which allows designers to specify abstract and concrete user interface descriptions. Abstract User Interfaces (AUIs) are independent of the interaction modalities, while Concrete User Interfaces (CUIs) depend on the interaction resources of the target platforms but are independent of the implementation languages.

An AUI is composed of a number of presentations, a data model, and a set of external functions. Each presentation contains a number of user interface elements, called interactors, and a number of so-called interactor compositions. Examples of interactor compositions are groupings and relations, which group/relate different interactors. The interactors can be classified in terms of editing, selection, output, and control, and may have a number of associated event handlers.
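For illustration, the fragment below sketches what a concrete description with a grouping of interactors could look like. The element and attribute names are invented for this sketch and do not reproduce the actual MARIA schema; they only mirror the concepts (presentation, grouping, output/description and control/navigator interactors) named in the text.

```xml
<!-- Illustrative only: invented element names, not the real MARIA schema. -->
<presentation name="home_page">
  <grouping name="main_content">
    <!-- An output interactor carrying text to be rendered or synthesized. -->
    <interactor category="output" type="description">
      Welcome to the site.
    </interactor>
    <!-- A control interactor allowing navigation to another presentation. -->
    <interactor category="control" type="navigator" target="contacts_page">
      Contacts
    </interactor>
  </grouping>
</presentation>
```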
As already mentioned, CUIs depend on the interaction resources of the target platform: while in the desktop modality a presentation can be defined as a set of user interface elements perceivable at a given time, in the vocal modality a presentation is defined as a set of dialogues between the user and the platform that can be identified as a logical unit (e.g., the communication necessary for filling a vocal form).

Figure 1. Some Possible Abstraction Levels.

Figure 1 shows the relationship between the AUI and the CUIs, limited to the Desktop and Vocal target platforms (other available target platforms are Mobile, Multi-Touch, and Multi-Modal). Figure 1 also represents some possible transformations that can be performed, such as the generation of HTML from Desktop Logical Descriptions (instances of the Desktop CUI) and the generation of VoiceXML from Vocal Logical Descriptions.

The final implementation language is VoiceXML 2.0 [8], a standard language, supported by the W3C, for the specification of vocal interfaces. The VoiceXML code generated by the transformation has been tested with the Voxeo Voice Browser [9] (suggested by the W3C), and has passed the validation test integrated in it. More detail on the VoiceXML generation is provided in [4].
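To give a flavor of the target language, the following is a minimal, self-contained VoiceXML 2.0 document implementing a single vocal dialogue (a small form-filling exchange). It is a hand-written illustration, not actual output of the generator.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- One vocal dialogue: ask for a value, then confirm it. -->
  <form id="search">
    <field name="articleNumber" type="digits">
      <prompt>Please say the number of the article you are looking for.</prompt>
      <filled>
        <prompt>You asked for article <value expr="articleNumber"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```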
The aim of our work is to develop an adaptation process that takes HTML pages as input and generates corresponding VoiceXML documents, suitably adapted for the voice modality. This is not a simple task, and it raises a large number of adaptation issues (such as retrieving the menu items for vocal interaction and adapting images). In this context, Supportive User Interfaces can provide useful support, in particular in the customization of the adaptation rules.
APPROACH
Our solution is based on an adaptation server that consists of three modules (see Figure 2):
- Reverser: parses the Web pages and builds an equivalent Desktop Concrete Logical Description.
- Adapter: transforms the Desktop Concrete Logical Description into an adapted Vocal Concrete Logical Description.
- Generator: generates the VoiceXML, taking the Vocal Concrete Logical Description as input.

Figure 2. The Adaptation Server Architecture.
The reverser, taking into account the associated page style-sheet, transforms the HTML tag patterns into appropriate Desktop CUI elements. This process makes it possible to obtain a more semantic description.
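To make the reverser's role concrete, here is a hypothetical input/output pair: an HTML list used as a navigation bar becomes a grouping of navigator interactors. The CUI notation is the same invented sketch used above, not the actual MARIA syntax.

```xml
<!-- Input HTML pattern: a list used purely as a navigation bar. -->
<ul class="menu">
  <li><a href="news.html">News</a></li>
  <li><a href="contacts.html">Contacts</a></li>
</ul>

<!-- Hypothetical Desktop CUI fragment produced by the reverser. -->
<grouping name="navigation_bar">
  <interactor category="control" type="navigator" target="news_page">News</interactor>
  <interactor category="control" type="navigator" target="contacts_page">Contacts</interactor>
</grouping>
```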
The adapter is subdivided into three sub-modules that are executed in a pipeline:
1. Pre-Converter: removes the elements that cannot be rendered vocally (e.g., images without an ALT attribute), and also corrects possible inconsistencies due to the reverse process (e.g., groupings containing only one interactor for formatting purposes).
2. Menu-Generator: vocal interfaces are generally navigated through lists of menus. This step converts the Desktop Logical Description into a new one structured as a set of hierarchically organized menus/sub-menus (a VoiceXML sketch of such a structure is shown after this list).
3. Graphical-to-Vocal Mapper: in this step each element of the Desktop CUI is mapped into a (semantically equivalent) element of a new Vocal CUI.
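The following sketch shows the kind of menu-based VoiceXML structure the Menu-Generator step leads to. The menu items and their contents are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Top-level menu derived from the groupings of the page. -->
  <menu id="main">
    <prompt>Main menu. Say one of: <enumerate/></prompt>
    <choice next="#news">News</choice>
    <choice next="#contacts">Contacts</choice>
  </menu>
  <!-- Each menu item leads to a dialogue rendering one content part. -->
  <form id="news">
    <block>
      <prompt>Here the news content would be synthesized.</prompt>
      <goto next="#main"/>
    </block>
  </form>
  <form id="contacts">
    <block>
      <prompt>Here the contact information would be synthesized.</prompt>
      <goto next="#main"/>
    </block>
  </form>
</vxml>
```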
THE CUSTOMIZATION SUPPORT
The adaptation process is complex and the results depend on a number of factors, such as the structure of the input Web pages and their conformance to accessibility guidelines. In order to obtain better results we have designed a Supportive User Interface, which allows the user to customize the adaptation results. The adaptation process can be driven by setting a number of parameters, which can influence different stages of the transformation process.

To adjust the pre-conversion step, the following parameters are available (a hypothetical serialization is sketched after the list):
- Remove Whitespaces: if enabled, groupings that contain only whitespace are removed from the computation. Such groupings can occur for graphical formatting purposes (e.g., lists of “&nbsp;”).
- Min Image Width/Height: images below these size limits (and without an ALT attribute) are removed.
- Min Grouping Threshold: grouping operators in the specification produced by the reverse engineering are removed when they contain too little text to synthesize (below this threshold).
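For concreteness, this is one way the pre-conversion parameters could be serialized. The element names, units, and values are invented for illustration; the actual tool exposes these parameters through its Web interface.

```xml
<!-- Hypothetical parameter file: names and values are illustrative only. -->
<pre-conversion>
  <remove-whitespaces>true</remove-whitespaces>
  <min-image-width unit="px">15</min-image-width>
  <min-image-height unit="px">15</min-image-height>
  <min-grouping-threshold unit="chars">30</min-grouping-threshold>
</pre-conversion>
```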
To customize the menu-generation step it is possible to set the following parameters:
- Max Grouping Threshold: if the length of the textual content of a grouping is above this maximum threshold, new menu items are created by splitting the original grouping.
- Descr/Nav Ratio: sets the ratio between description and navigator interactors used to identify the groupings that constitute a navigation bar.

Finally, to customize the mapper step, the parameters are:
- Multiple Choice: sets how the final vocal interface performs multiple choices. There are two solutions: with Yes/No Questions, the platform asks the user a Yes/No confirmation for every possible choice; with Grammar Based, the user can select more than one choice with a single sentence, listing the choices in sequence (see the sketch after this list).
- End Form Sound: decides whether each vocal dialogue should terminate with a short sound.
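The two Multiple Choice solutions (and the End Form Sound option) could surface in VoiceXML roughly as follows. The field names, options, and sound file are invented for illustration; these are fragments to be placed inside a VoiceXML document.

```xml
<!-- Solution 1: one Yes/No question per possible choice. -->
<form id="topics_yes_no">
  <field name="wantNews" type="boolean">
    <prompt>Are you interested in news?</prompt>
  </field>
  <field name="wantSports" type="boolean">
    <prompt>Are you interested in sports?</prompt>
  </field>
  <block>
    <prompt>Thank you.</prompt>
    <!-- End Form Sound: close the dialogue with a short sound. -->
    <audio src="end_form.wav"/>
  </block>
</form>

<!-- Solution 2: grammar-based, several choices in a single sentence. -->
<form id="topics_grammar">
  <field name="topics">
    <prompt>Which topics do you want? You can list several.</prompt>
    <grammar type="application/srgs+xml" root="topicList" version="1.0">
      <rule id="topicList">
        <item repeat="1-">
          <one-of>
            <item>news</item>
            <item>sports</item>
            <item>weather</item>
          </one-of>
        </item>
      </rule>
    </grammar>
    <filled>
      <prompt>You selected <value expr="topics"/>.</prompt>
    </filled>
  </field>
</form>
```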
Figures 3 and 4 show our Supportive User Interface, which allows such parameterization. The left panel (shown in Figure 3) contains the modifiable parameters and their default values.

Figure 3. Customization of the adapter.
The right panel (shown in Figure 4 below) shows the structure and the menu items of the generated vocal page. In this way the designer can decide whether to download the final vocal interface (as a zip file containing the VoiceXML documents) or change the transformation parameters in order to obtain a different structure.

Figure 4. Application right panel: vocal menu structure.

EXAMPLE CONFIGURATION PARAMETER CHANGE
In this section we show an example of a configuration parameter change which affects the structure of the resulting user interface. In particular, we consider the Max_Threshold parameter, which defines the threshold, in terms of text length, for content to be rendered vocally. If the length exceeds this limit, the adaptation system splits the presentation content. If we set max_threshold = 2500, we obtain the structure in Figure 4. Thus, the Returning home part (see Figure 5) will be rendered as a single piece of information.

Figure 5. The considered content part.
If we change the parameter to max_threshold = 700, we obtain the structure in Figure 6.
Figure 6. The resulting modified structure.

We can note that the resulting structure has more sub-levels: the Returning home section is subdivided into multiple parts, highlighted by dashed lines in Figure 7, which can be further subdivided.

Figure 7. How the content is further divided.
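As a rough illustration of this effect, with the lower threshold the Returning home content no longer fits in a single dialogue, so the generated VoiceXML would expose it as a sub-menu along these lines (the item names and targets are invented, not taken from the actual output):

```xml
<!-- Hypothetical sub-menu created by splitting the "Returning home" grouping. -->
<menu id="returning_home">
  <prompt>Returning home. Say one of: <enumerate/></prompt>
  <choice next="#returning_home_part1">First part</choice>
  <choice next="#returning_home_part2">Second part</choice>
</menu>
```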
CONCLUSION
A model-based approach to supporting graphical-to-vocal adaptation has been introduced. A Supportive User Interface has then been proposed (as a Web application) in order to help the user manage the overall adaptation process. We consider this tool a useful support to provide users with full control over the final results. Given the complexity of existing Web content, we plan to add new features to both the adaptation rules and the customization interface, in order to provide more flexible control over the adaptation results.

ACKNOWLEDGMENTS
We gratefully acknowledge support from the Artemis EU SMARCOS and the ICT EU SERENOA projects.

REFERENCES
1. Edwards, A., Pitt, I.: Design of Speech-Based Devices. Springer (2007).
2. Franz, A., Milch, B.: Searching the Web by Voice. In: Proceedings of the 19th International Conference on Computational Linguistics - Volume 2, pp. 1-5, Stroudsburg, PA, USA (2002).
3. Calvary, G., Coutaz, J., Bouillon, L., Florins, M., Limbourg, Q., Marucci, L., Paternò, F.: The CAMELEON Reference Framework. CAMELEON Project, Deliverable 1.1 (2002).
4. Paternò, F., Sisti, C.: Deriving Vocal Interfaces in Multi-device Authoring Environments. In: Proceedings of the 10th International Conference on Web Engineering, pp. 204-217 (2010).
5. Paternò, F., Santoro, C., Spano, L.D.: MARIA: A universal, declarative, multiple abstraction-level language for service-oriented applications in ubiquitous environments. ACM Trans. Comput.-Hum. Interact., 16(4) (2009).
6. UNICEF. http://www.unicef.org/.
7. Voice Browser Activity. http://www.w3.org/Voice/.
8. Voice Extensible Markup Language (VoiceXML) Version 2.0. http://www.w3.org/TR/2009/REC-voicexml20-20090303/.
9. Voxeo Voice Browser. http://www.voxeo.com/.