Proceedings of CHI 2006 Workshop “The Many Faces of Consistency in Cross-Platform Design” 14

Design methods and software tools for consistency in multi-channel web applications

Charlie Wiecha, IBM T.J. Watson Research Center, Yorktown Heights, N.Y., 10598, USA, wiecha@us.ibm.com
Rahul Akolkar, IBM T.J. Watson Research Center, Yorktown Heights, N.Y., 10598, USA, akolkar@us.ibm.com
Rafah Hosn, IBM T.J. Watson Research Center, Yorktown Heights, N.Y., 10598, USA, rhosn@us.ibm.com
Thomas Ling, IBM T.J. Watson Research Center, Yorktown Heights, N.Y., 10598, USA, ling@us.ibm.com
TV Raman, Google Research, raman@google.com

INTRODUCTION

The desire for ubiquitous access to web applications like online shopping, combined with the coming of age of speech technology, creates unique challenges as we design for multi-channel interaction. Usable and cost-effective addition of spoken access to these applications requires:

• Identifying user scenarios that benefit from voice interaction, implying the ability to treat spoken interaction as a first-class user interface modality; experience shows that simply speaking the same content as in the visual interface does not deliver usable interfaces.
• High quality speech components that need not be developed on a per application basis and that deliver a consistent “sound and feel” (often called the speech “persona”) within the voice channel.
• An application design that reuses many of the programming artifacts used by mainstream Web developers in delivering visual access, and hence also offers appropriate consistency across the voice and visual channels.

This paper presents a design and engineering approach to multi-channel applications that supports both intra-channel consistency and cross-channel consistency. The main innovation is the introduction of a UI component model allowing for sharing or overriding design decisions as appropriate to interaction using multiple channels.
In this paper we illustrate the iterative enablement of a transactional web application with voice interaction, using Amazon.com as a test case.

CONSISTENCY IN MULTI-CHANNEL APPLICATIONS

We do not assume that the entirety of the business process is supported on each of the interaction channels. Indeed, an important part of the design process is to determine the desired allocation of function to each user category and to each interaction channel.

Our goal is that interaction should be consistent both within and across channels. Within a channel, for example, all voice interaction in an application should collect dates using a common prompt style, recognition vocabulary, and confirmation dialog. Across channels as different as voice and GUI, even though prompts, grammars, and confirmations may be specific to voice interaction, the navigation rules for collecting such data may follow the same design as the visual interface. Consider a linear wizard that collects individual data items and then confirms the aggregate data in one final step. Since Web application frameworks factor the implementation of such user tasks into controllers and views, they afford us the opportunity to reuse navigation controllers across channels while customizing the view layer on a per-modality basis. This leads to cross-channel consistency in the overall user experience.

© 2006 for the individual papers by the papers' authors. Copying permitted for private and scientific purposes. Re-publication of material on this page requires permission by the copyright owners.

SPEECH ENABLING AMAZON.COM

Our goal was to analyze scenarios of widely used online applications to determine specific user tasks that are best performed using spoken interaction, and to support those in a manner consistent with the complementary visual interactions.
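The controller/view factoring described in the previous section can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not the implementation used in this work: the names WizardController, CONFIRM, render_visual and render_voice are hypothetical, and the real system expresses navigation in SCXML and views in JSP or VoiceXML. The sketch shows one linear-wizard controller driving two different views.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CONFIRM = "confirm"  # hypothetical name for the final confirmation step


@dataclass
class WizardController:
    """Channel-independent navigation: collect each item in order, then confirm."""
    steps: List[str]
    data: Dict[str, str] = field(default_factory=dict)

    def current_step(self) -> str:
        # The first step that has no value yet is the current one;
        # once every step is filled, the wizard moves to confirmation.
        for step in self.steps:
            if step not in self.data:
                return step
        return CONFIRM

    def submit(self, step: str, value: str) -> str:
        # Accept input only for the current step, then report the next step.
        expected = self.current_step()
        if expected == CONFIRM or step != expected:
            raise ValueError(
                f"unexpected input for {step!r}; current step is {expected!r}"
            )
        self.data[step] = value
        return self.current_step()


def render_visual(step: str, data: Dict[str, str]) -> str:
    """Visual view (assumed markup): one form field per step, a summary at the end."""
    if step == CONFIRM:
        summary = ", ".join(f"{k}={v}" for k, v in data.items())
        return f"<form>Confirm: {summary} <button>OK</button></form>"
    return f'<form><label>{step}</label><input name="{step}"></form>'


def render_voice(step: str, data: Dict[str, str]) -> str:
    """Voice view (assumed prompts): a spoken prompt per step, a spoken confirmation."""
    if step == CONFIRM:
        summary = ", ".join(f"{k} {v}" for k, v in data.items())
        return f"You said {summary}. Is that correct?"
    return f"Please say the {step}."
```

Both views call the same controller, so the order of steps and the final confirmation stay identical across channels while the prompts and markup remain free to differ.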
Amazon.com (http://www.amazon.com) was chosen as the target application; it offers a set of user tasks accessible through web services interfaces, and thus readily supports multi-channel interaction.

To support multi-channel design, we have defined a common programming model for user navigation in both voice and visual displays. This notation for flow is based on the well-known State Chart semantics from UML, and is being extended with an XML notation, State Chart XML (SCXML), in the W3C Voice Browser Working Group [2]. State charts avoid some of the common problems of state-transition networks, for example an exploding number of states for global actions, by supporting both sequential and parallel (concurrent) flows. Our team is participating in the definition of the SCXML language, and in addition is contributing an open source implementation of an SCXML interpreter to Apache [3].

WITHIN-CHANNEL CONSISTENCY

Interactions within the speech-enabled sections of the application are under the control of components implemented using our library of Reusable Dialog Components (RDCs) in Apache [1]. RDCs encapsulate a recursive MVC design pattern that captures unique aspects of voice interaction and helps to better align the data interfaces amongst the different interaction modalities. Each data-type-specific RDC encapsulates the best practices of prompting and error correction, typically across multiple dialog turns, and corresponds roughly to the unit of content that a single HTML element would gather in a visual page. Associated with the implementation of an RDC is a more detailed interaction flow diagram, represented again in a State Chart XML notation, which drives the dialog management within that component.

An example interaction with the voice-based catalog is:

S: Welcome to the RDC Music Store. Choose a music genre or select one of the V-Store categories.
U: [silence]
S: Say Today's Deals, Top Sellers, New Releases or choose a genre.
U: New Releases
S: The following titles are available. Please select one of: Thelonious Monk Quartet with John Coltrane at Carnegie Hall, Amarantine, Confessions on a Dance Floor, At Folsom Prison, Some Hearts, Born to Run: 30th Anniversary 3-Disc Set, X and Y, Breakaway, Brokeback Mountain, The Breakthrough.
U: Brokeback Mountain
S: Brokeback Mountain is available for $13.49. You can buy the album, obtain more information or look up similar titles.
U: Describe album
S: Brokeback Mountain has been released by Verve on November 01 2005. You can buy the album or look up similar titles.
U: Buy album
S: Brokeback Mountain has been added to your cart.
U: View Cart
S: The following titles are in your cart: Brokeback Mountain, Confessions on a Dance Floor.
S: You can proceed to checkout or continue shopping.

ACROSS-CHANNEL CONSISTENCY

SCXML is the programming model both for voice dialogs in RDCs and for dialog flows across multiple JSP pages in a visual application. The goal is to capture essential navigation patterns and then allow for their refinement as either visual pages in HTML or voice interaction using the RDC component library generating VoiceXML. The detailed voice interaction will certainly differ from the HTML pages: it will typically involve multiple turns of dialog, prompting for inputs and correcting errors, in order to complete what is a single screen of a visual interface. However, the higher-level, task-oriented navigation can often remain the same.

An interesting special case is mobile GUI devices, where the information content at each step of visual interaction is similar to that observed in voice interaction. The smaller screen size of mobile devices more closely approximates the information a voice user is able to hear and generate per turn of dialog.
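To make concrete why a voice refinement needs several dialog turns to complete what a single visual screen does, the following Python sketch collects one choice with reprompting on silence or on an unrecognized reply. It is a hypothetical illustration only: collect_choice and its parameters are assumed names, a scripted callable stands in for the speech recognizer, and it does not reproduce the RDC tag library, which generates VoiceXML.

```python
from typing import Callable, List, Optional, Tuple


def collect_choice(
    prompt: str,
    options: List[str],
    hear: Callable[[], Optional[str]],
    max_turns: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Ask for one of `options`, reprompting on silence or no-match.

    `hear` stands in for a recognizer: it returns the recognized text,
    or None when the user stays silent. Returns the chosen option
    (or None if every turn failed) together with the dialog transcript.
    """
    lookup = {option.lower(): option for option in options}
    if len(options) > 1:
        listing = ", ".join(options[:-1]) + " or " + options[-1]
    else:
        listing = options[0]

    transcript: List[str] = []
    say = prompt
    for _ in range(max_turns):
        transcript.append("S: " + say)
        heard = hear()
        transcript.append("U: " + (heard if heard is not None else "[silence]"))
        if heard is not None and heard.strip().lower() in lookup:
            return lookup[heard.strip().lower()], transcript
        # Silence or an unrecognized reply: spell out the valid choices.
        say = "Say " + listing + "."
    return None, transcript


# Scripted user: silent on the first turn, then answers.
replies = iter([None, "new releases"])
choice, transcript = collect_choice(
    "Choose a category.",
    ["Today's Deals", "Top Sellers", "New Releases"],
    lambda: next(replies),
)
```

In a visual page the same data item would be a single selection list on one screen; here the component owns the extra turns, so the surrounding navigation flow sees only the final value.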
In the mobile case, we have been able to draw SCXML dialog flows once and reuse them across both visual and voice interaction without further refinement on the voice side. The following screen shots are from the PDA version of the Amazon voice catalog search described above and were generated from the same navigation controller in both versions of the application.

Figure 1: PDA version of Amazon.com catalog shopping

We believe the greatest utility of the above technology lies less in literal reuse of the same web artifacts (navigation controllers, data models, etc.) across multiple channels than in allowing much more rapid exploration of refinements to any one channel without disrupting the overall design of the application. When done with our initial catalog search, for example, we felt that the order fulfillment and tracking tasks (“Where’s My Stuff?” in Amazon) might be even more valuable over a speech channel than initial order entry. The effort required to experiment with order entry and observe its voice characteristics was comparable to that of implementing it over a visual channel, unlike traditional speech development, which may be far more costly. We believe, therefore, that techniques such as ours can now bring iterative voice development, and hence a practical approach to consistency, within the range of affordable use.

REFERENCES

[1] Reusable Dialog Components at Apache: http://jakarta.apache.org/taglibs/doc/rdc-doc/intro.html
[2] Working Draft of State Chart XML (SCXML): State Machine Notation for Control Abstraction 1.0: http://www.w3.org/TR/scxml/
[3] Commons SCXML implementation at Apache: http://jakarta.apache.org/commons/sandbox/scxml/