=Paper=
{{Paper
|id=Vol-2960/paper9
|storemode=property
|title=Voicing Concerns: User-Specific Pitfalls of Favoring Voice over Text in Conversational Recommender Systems (Short paper)
|pdfUrl=https://ceur-ws.org/Vol-2960/paper9.pdf
|volume=Vol-2960
|authors=Alain D. Starke,Minha Lee
|dblpUrl=https://dblp.org/rec/conf/recsys/StarkeL21
}}
==Voicing Concerns: User-Specific Pitfalls of Favoring Voice over Text in Conversational Recommender Systems (Short paper)==
Alain D. Starke¹ ², Minha Lee³

¹ Wageningen University & Research, Droevendaalsesteeg 4, 6708 PB Wageningen, The Netherlands<br />
² University of Bergen, P.O. Box 7800, 5020 Bergen, Norway<br />
³ Eindhoven University of Technology, Groene Loper 3, 5612 AE Eindhoven, The Netherlands

===Abstract===
In the context of Conversational Recommender Systems (CRSs) and Conversational User Interfaces (CUIs; e.g., digital assistants such as Siri), an increasing number of voice-based applications are emerging, often at the expense of text-based applications. In this position paper, we argue that the possible first-mover advantage of adopting voice-based technologies may put specific groups of users at a profound disadvantage, as they are likely to run into accessibility issues. For example, users who stammer or who are not fluent in English have a hard time using voice-based conversational recommender systems. Along this line, we describe a number of challenges and issues for current and future systems.

Keywords: Conversational User Interfaces, Recommender Systems, Accessibility, Inclusion, Voice-based Systems

3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, 27 September–1 October 2021, Amsterdam, Netherlands. Contact: alain.starke@wur.nl (A. D. Starke); m.lee@tue.nl (M. Lee). © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===1. Introduction===
Voice-based technologies are diffusing through society at a fast pace. Reportedly, 4.2 billion digital voice assistants were in use in 2020 [1], including well-known technologies such as Amazon Alexa and Siri. Their role in the ‘Internet of Things’ is becoming increasingly important [2], in the sense that incumbent technologies, such as recommender systems, are often made compatible with voice-based applications [3].

One aspect of voice-based or conversational user interfaces is the retrieval of personalized content. To date, however, most conversational recommender systems (CRSs) are text-based [4]. They focus on mining textual user input, such as fixed messages in clickable menus or open-ended text queries [5, 6]. In comparison, the number of voice-based conversational recommender systems is still limited, but it is likely to grow in the coming years [3].

The user-system dynamic differs greatly between text-based and voice-based interactions. Whereas text-based CRSs can rely on either open-ended queries or fixed input (e.g., the user selects an answer option), voice-based queries tend to be impromptu and are more complex to process. Nonetheless, given the current share and expected growth of digital assistant use [1], the emergence of digital assistants such as Amazon Alexa and Google Home suggests that voice-based interactions will be a bigger design target for many commercial applications than text-based systems. For one, a specific application of voice-based interactions in recommender systems research is the users’ ability to retrieve personalized suggestions by voice, through hands-free interaction [3, 7].

Despite the possibilities of voice-based interactions, we see some challenges. Specifically, the commercial trend of prioritizing voice first comes with disadvantages for specific users who are either not equipped to work with the technology (e.g., Siri) or who are not the targeted, ‘mainstream’ user.

We briefly give an overview of text-based and voice-based systems before turning to our critique of voice-based recommender systems. We discuss why text-based solutions may be more beneficial in certain contexts and for specific users, which makes it important that text-based CRSs are not discontinued. However, given the growing trend of voice-based interactions, e.g., the rise of Alexa, we believe that we cannot avoid designing for different types of voice-based interactions in the future. For the latter, we formulate a few suggestions.

===2. Conversational Systems===

====2.1. Text-based systems====
Text-based conversational systems, also known as chatbots, have been around for decades, such as Weizenbaum’s ELIZA in the 1960s [8]. Chatbots now exist on many business-to-consumer websites, for example as automated customer service agents [9]. In terms of technical implementation, two approaches are taken to build chatbots, and these typically also apply to text-based conversational recommenders [6]. Chatbots are either built as command-based systems that respond to user queries as if they are commands, or as bots that use natural language understanding. For example, on Slack, one can issue commands (e.g., unsubscribe) that chatbots can easily understand, rather than using natural language (e.g., “please get me out of this channel”) that can be vague for systems to understand [10]. Also, people may easily misspell or give incomplete input that chatbots cannot accurately interpret.

An important challenge of conversational systems is to mitigate misunderstandings or conversational breakdowns [11]. These include inadequate responses to user requests and false positives [11], caused either by an unclear query or by a missing database category [12, 13]. In such cases, conversational repair strategies become important, that is, how the system should correct for misunderstood phrases or unclear user intentions [12]. An example of a repair strategy is a system offering potential options that people can choose from, such as “I did not understand that. Did you mean X or Y?”, to keep the conversation going.

In sum, the two strategies are to design 1) command-based systems that allow minimal flexibility in user input for the sake of efficiency, or 2) natural language-based systems that allow greater input flexibility, but that in turn require a larger number of repair strategies, which are not always successful.

====2.2. Voice-based systems====
Voice-based conversational user interfaces (VUIs) are becoming more popular in everyday use. In particular, smart home assistants such as Google Home and Amazon Alexa are used more frequently to help their users find the content they are looking for [14], regardless of whether the query is mundane and factual (e.g., ‘What is the weather today?’) or more exploratory (e.g., ‘Play me a song for my dinner party’). The latter is more commonly explored in recommender systems research, as it seeks to retrieve an item that a user does not explicitly know about.

Current voice-based systems face a number of technical issues that are often situational. For example, when a voice-based assistant is used in a car [7], there may be environmental noise, and people may not formulate clearly because they are multi-tasking between driving and voice interaction, among other causes. The technical issues that come with VUIs are many, and they will also impact the design of voice-based conversational recommender systems. To list three, VUIs at times lack noise robustness, multimodal understanding, and addressee detection [7]. In most contexts, users’ environmental conditions will feature a certain degree of background noise, such as when people are away from home. Moreover, voice-based systems cannot understand multimodal cues such as gestures or gaze that accompany users’ speech [15], which is likely to affect the interpretability of the voice-based query. Finally, when more than one person is talking, the system has to distinguish whose voice to zone in on [7], which might lead to conflicts of agency [16]. Each of these problems is challenging. Yet, even if these technical issues are resolved, that alone will not make headway toward a seamless experience for every individual; inclusion is not about one-size-fits-all, but about ensuring that these technical issues do not disproportionately affect specific users.

===3. Critique of Voice-based systems===
Voice-based queries tend to be ‘messier’ than text-based inputs [6]. The current state of the art in Natural Language Processing (NLP) opens up possibilities for more open-ended conversational strategies in recommendation. However, even NLP-based systems are often still limited to familiar input, i.e., they require an explicit understanding of users’ messages, which are often misunderstood. Problems such as the distortion of user input by environmental noise are common [7]. Yet when it comes to user adoption, voice-based technologies have diffused rapidly [17, 2]. This has both positive and negative side effects.

On the one hand, voice-based applications can seemingly be integrated in a modular way, as they can work with recommendation libraries without the need to design a dedicated user interface. On the other hand, innovation in technology may only benefit those who can work with it. Even for people who are considered to be “regular users”, there is a lot of trial and error in learning how to interact with voice-based agents [18] and recommender systems. Moreover, there are people who face additional difficulties due to differences in abilities; technical solutions are often built around the normative assumption that users are fully able-bodied, i.e., with sight, hearing, and other abilities intact [19].

In recommender domains that use more traditional interfaces, marginalized people are often ‘served’ by a simple fix. For example, a tourism recommender system for people with physical disabilities would apply post-filtering to arrive at an appropriate set of recommendations [20]. For conversational recommenders, however, the problem not only concerns the appropriateness of the suggested content, but also the usability of the technology in the first place. For example, people who stammer, a small subset of the population, face difficulties at the very start of their interaction: a voice-based system often cannot understand what they say, due to a lack of training data and to design choices [21]. Their speech does not fit the normative template of how people should “normally” talk. Furthermore, many smart home assistants are only compatible with a few languages (e.g., they are ‘biased’ towards English [14]), and speech recognition may be distorted by fluctuations in human emotions [4, 22].

We realize that these issues of inclusion in the use of voice-based assistants are nuanced [16]. Who can easily use voice-based agents depends on multiple factors, such as accents, speech patterns, and access to commercial agents like Alexa. This means that efforts to include can also exclude in many ways, e.g., by prioritizing one accent over others.

===4. Suggestions for Conversational Recommender Systems===
We have highlighted different challenges for both text-based and voice-based interactions. What stands out is that some challenges are easier to resolve through user training or adaptation (e.g., lacking sufficient technical knowledge to use a text-based interface) than others (e.g., speech that systems fail to recognize, as for non-native speakers or people who stammer) [21]. What these challenges have in common is that people’s assumptions about conversational agents, be they chatbots or Alexas, shape their interactions. People may have expectations that conversational agents cannot meet, as these systems cannot yet handle complex tasks such as email management by voice [23]. Even text-based chatbots often do not meet people’s needs, as users expect a higher level of understanding than the bots were designed for [10]. Hence, going beyond simple interactions towards more complex exchanges is a problem that most of us share, due to the current state of the technology.

Some studies describe conversational recommender systems as distinct from more traditional chatbots and dialogue-based systems [6]. However, we argue that conversational elements and the retrieval of ‘task-related items’ are two sides of the same coin. A task-based conversation can also be dialogue-based, in that the dialogue supports the task at hand. Instead of focusing on a false dichotomy between task-based and dialogue-based systems, a better way forward is to be attentive to how different users’ capacities get highlighted or ignored by systems. The problem to focus on is the inclusion vs. exclusion of user groups based on systems’ assumptions about abilities that people may or may not have.

How should we move forward with conversational recommender systems? Recommender systems are traditionally applied in domains where one-shot recommendations are effective [24], such as movies, e-commerce, and books. The use of conversations, however, makes for more complex interactions, which introduce greater technical challenges. Above, we differentiated between text-based chatbots and voice-based agents. In terms of users being “better understood”, the decades-old text-based interactions may be better suited. Perhaps counter-intuitively, because queries in text-based conversational recommendation are more constrained, the odds are smaller that the system ‘gets it wrong’, or, more technically, that it generates a negative adversarial response [25] or has a conversational breakdown [12]. Although usability is arguably lower in the sense that one needs to “touch” an interface, this is a major advantage for those who would otherwise have to concentrate hard to interact with a voice-based application. However, the consumer trend is shaping up to favor voice-based applications; IBM, Google, and Amazon have product lines that promote voice-first interactions.

We offer two suggestions moving forward. First, to optimize accessibility for all users, a move towards ‘voice-enabled’ rather than ‘voice-first’ or voice-based recommender systems would be desirable, akin to technology in which ‘voice’ is a feature rather than a key characteristic (e.g., Siri on an Apple iPhone). Although this requires the deployment of two different retrieval and recommendation pipelines, it maximizes accessibility by combining ‘the best of both worlds’. Note that we did not consider multimodality, e.g., the combination of voice, gaze, body movements, and more, which will become more important in the coming years [26].

Second, we suggest that diversity of data for retrieval and recommendation is essential to design inclusive conversational recommender systems, or systems that cater to specific users. Efforts are ongoing to make voice-based interactions more accessible; Google’s Project Euphonia¹ aims to collect more data on atypical speech, e.g., from people with cerebral palsy. Similarly, voice research should spend more time collecting difficult data, so that systems can respond to “unconventional voices”.

¹ https://sites.research.google/euphonia/about/

===5. Conclusion===
This paper has reflected on current practices in conversational recommender systems. In particular, we have pitted text-based systems against voice-based systems, observing that while voice-based recommender systems are becoming more common because of their integration with digital assistants [3], this may put specific users at a disadvantage. We have identified a number of challenges to make CRSs more inclusive, particularly for the emerging domain of voice-based user interfaces. We emphasize lastly that inclusion for some may mean exclusion for others. In order to recommend to all users, we need to understand all users. Specifically, understanding users not only in terms of their preferences, but also in terms of their fundamental conversational elements, such as speech, should be a priority.

===References===
[1] L. S. Vailshery, Number of digital voice assistants in use worldwide from 2019 to 2024 (in billions), 2021. URL: https://www.statista.com/statistics/973815/worldwide-digital-voice-assistant-in-use/.

[2] D. Pal, C. Arpnikanondt, S. Funilkul, W. Chutimaskul, The adoption analysis of voice-based smart IoT products, IEEE Internet of Things Journal 7 (2020) 10852–10867.

[3] A. Iovine, F. Narducci, G. Semeraro, Conversational recommender systems and natural language: A study through the converse framework, Decision Support Systems 131 (2020) 113250.

[4] C. Gao, W. Lei, X. He, M. de Rijke, T.-S. Chua, Advances and challenges in conversational recommender systems: A survey, arXiv preprint arXiv:2101.09459 (2021).

[5] D. C. Hernandez-Bocanegra, J. Ziegler, Conversational review-based explanations for recommender systems: Exploring users’ query behavior, in: CUI 2021 - 3rd Conference on Conversational User Interfaces, 2021, pp. 1–11.

[6] D. Jannach, A. Manzoor, W. Cai, L. Chen, A survey on conversational recommender systems, ACM Computing Surveys (CSUR) 54 (2021) 1–36.

[7] F. Weng, P. Angkititrakul, E. E. Shriberg, L. Heck, S. Peters, J. H. Hansen, Conversational in-vehicle dialog systems: The past, present, and future, IEEE Signal Processing Magazine 33 (2016) 49–60.

[8] J. Weizenbaum, ELIZA – a computer program for the study of natural language communication between man and machine, Communications of the ACM 9 (1966) 36–45.

[9] R. Dale, The return of the chatbots, Natural Language Engineering 22 (2016) 811–817.

[10] M. Lee, L. Frank, W. IJsselsteijn, Brokerbot: A cryptocurrency chatbot in the social-technical gap of trust, Computer Supported Cooperative Work (CSCW) 30 (2021) 79–117.

[11] A. Følstad, C. Taylor, Conversational repair in chatbots for customer service: the effect of expressing uncertainty and suggesting alternatives, in: International Workshop on Chatbot Research and Design, Springer, 2019, pp. 201–214.

[12] Z. Ashktorab, M. Jain, Q. V. Liao, J. D. Weisz, Resilient chatbots: Repair strategy preferences for conversational breakdowns, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–12.

[13] K. Corti, A. Gillespie, Co-constructing intersubjectivity with artificial conversational agents: people are more likely to initiate repairs of misunderstandings with agents represented as human, Computers in Human Behavior 58 (2016) 431–442.

[14] B. R. Cowan, P. Doyle, J. Edwards, D. Garaialde, A. Hayes-Brady, H. P. Branigan, J. Cabral, L. Clark, What’s in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue, in: Proceedings of the 1st International Conference on Conversational User Interfaces, 2019, pp. 1–8.

[15] D. Heylen, Head gestures, gaze and the principles of conversational structure, International Journal of Humanoid Robotics 3 (2006) 241–267.

[16] M. Lee, R. Noortman, C. Zaga, A. Starke, G. Huisman, K. Andersen, Conversational futures: Emancipating conversational interactions for futures worth wanting, in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–13.

[17] G. McLean, K. Osei-Frimpong, Hey Alexa… examine the variables influencing the use of artificial intelligent in-home voice assistants, Computers in Human Behavior 99 (2019) 28–37.

[18] C. M. Myers, L. F. Laris Pardo, A. Acosta-Ruiz, A. Canossa, J. Zhu, “Try, try, try again:” sequence analysis of user interaction data with a voice user interface, in: CUI 2021 - 3rd Conference on Conversational User Interfaces, 2021, pp. 1–8.

[19] S. Costanza-Chock, Design justice: Towards an intersectional feminist framework for design theory and practice, Proceedings of the Design Research Society (2018).

[20] R. Mahmoud, N. El-Bendary, H. M. Mokhtar, A. E. Hassanien, Similarity measures based recommender system for rehabilitation of people with disabilities, in: The 1st International Conference on Advanced Intelligent System and Informatics (AISI2015), November 28-30, 2015, Beni Suef, Egypt, Springer, 2016, pp. 523–533.

[21] L. Clark, B. R. Cowan, A. Roper, S. Lindsay, O. Sheers, Speech diversity and speech interfaces: Considering an inclusive future through stammering, in: Proceedings of the 2nd Conference on Conversational User Interfaces, 2020, pp. 1–3.

[22] J. Pittermann, A. Pittermann, W. Minker, Emotion recognition and adaptation in spoken dialogue systems, International Journal of Speech Technology 13 (2010) 49–60.

[23] E. Luger, A. Sellen, “Like having a really bad PA”: the gulf between user expectation and experience of conversational agents, in: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 5286–5297.

[24] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Transactions on Knowledge and Data Engineering 17 (2005) 734–749.

[25] G. Penha, C. Hauff, What does BERT know about books, movies and music? Probing BERT for conversational recommendation, in: Fourteenth ACM Conference on Recommender Systems, 2020, pp. 388–397.

[26] Y. Deldjoo, J. R. Trippas, H. Zamani, Towards multi-modal conversational information seeking, in: Proceedings of the ACM Conference on Research and Development in Information Retrieval, SIGIR, volume 21, 2021.