Personalizing IoT Ecosystems via Voice

Luigi De Russis, Alberto Monge Roffarello
Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Abstract
Intelligent Personal Assistants (IPAs), embedded in smart speakers, allow users to set up trigger-action rules, via their mobile apps, to personalize the IoT ecosystem in which they are located. Vocal capabilities might be involved in such rules as triggers or actions, but the actual rule composition and execution flow is entirely confined to the mobile app. This position paper reflects on the challenges and opportunities that would arise if IPAs played a more prominent and integrated role in this personalization scenario.

Keywords
Internet of Things, Speech, Intelligent Personal Assistant, Voice Interaction, End-User Development

1. Introduction and Background

Smart speakers, such as Google Home or Amazon Echo, are entering our homes and enriching the Internet of Things (IoT) ecosystems already present in them. The Intelligent Personal Assistants (IPAs) they include allow users to ask for different kinds of information (e.g., the weather or a recipe), set up reminders and lists, and directly control other IoT devices (e.g., lamps), among other options. Through a companion app installed on their owner's smartphone, such assistants provide advanced features like the possibility to set up routines in the form of trigger-action rules (see Figure 1 for an example). IPAs can be part of these rules either as triggers (i.e., when the user says a specific sentence) or as actions (i.e., reproducing a specific sentence), which reduces them from intelligent agents to simple sensors and actuators. End-user personalization capabilities, in other terms, are present in the system but segregated in a mobile app, and take no advantage of the NLP and vocal capabilities of such devices.

Can we give IPAs a more prominent role in such a personalization scenario, directly on the smart speakers? In which steps of the personalization process (creation, explanation and debugging, etc.) could those devices be particularly beneficial? The remainder of this paper reflects upon those questions by highlighting challenges and opportunities towards a better integration of IPAs and smart speakers into end-user development of IoT ecosystems. To do so, we consider the possible impact of IPAs in smart speakers during the four main steps that might involve personalization rules: a) creation (i.e., composition or selection), b) management, c) execution, and d) debugging and explanation (in case of issues).

EMPATHY: Empowering People in Dealing with Internet of Things Ecosystems. Workshop co-located with AVI 2020, Island of Ischia, Italy
luigi.derussis@polito.it (L. De Russis); alberto.monge@polito.it (A. Monge Roffarello)
ORCID: 0000-0001-7647-6652 (L. De Russis); 0000-0002-9746-2476 (A. Monge Roffarello)

Figure 1: An example of a routine for the Amazon Echo: if the user says "Alexa, cinema mode", the IPA will turn on a lamp with a defined color and luminosity level, and will deactivate the microphone on the smart speaker.

2. Personalization through Conversation

For creating a personalization rule, IPAs might allow either the selection or the direct composition of rules.
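As a concrete reference point, the routine of Figure 1 can be seen as a simple trigger-action structure. The sketch below illustrates one possible representation; the class and field names are purely illustrative assumptions and do not correspond to any vendor's actual routine format.

```python
# Illustrative sketch: the "cinema mode" routine of Figure 1 expressed as a
# trigger-action rule. Names and fields are hypothetical, not a real vendor API.
from dataclasses import dataclass, field


@dataclass
class Trigger:
    kind: str                # e.g., "voice_command"
    params: dict = field(default_factory=dict)


@dataclass
class Action:
    kind: str                # e.g., "set_light", "mute_microphone"
    params: dict = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    trigger: Trigger
    actions: list[Action]


cinema_mode = Rule(
    name="Cinema mode",
    trigger=Trigger(kind="voice_command",
                    params={"utterance": "Alexa, cinema mode"}),
    actions=[
        Action(kind="set_light",
               params={"device": "living room lamp",
                       "color": "warm white", "brightness": 30}),
        Action(kind="mute_microphone",
               params={"device": "smart speaker"}),
    ],
)
```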
For what concerns the selection, to our knowledge, InstructableCrowd [1] and HeyTAP [2] are the only two examples of related conversational systems in the literature. InstructableCrowd is a crowdsourcing system that enables users to create IF-THEN rules based on their needs, via conversation with crowd workers, by describing the problems they are encountering, e.g., being late for a meeting. Crowd workers can then exploit a tailored interface to combine triggers and actions into appropriate IF-THEN rules, which are then sent back to the users. HeyTAP, instead, tries to automatically map abstract users' needs to actual IF-THEN rules, i.e., without the help of other users. It adopts a semantic-based approach to analyze users' inputs and contextual information, and provides a set of appropriate IF-THEN rules from which the user can choose. Both systems heavily rely on screen-based interaction (either in a web browser or through a mobile app) and allow an indirect definition of trigger-action rules, by providing the user with a set of rules to select from. However, both can easily be imagined as working on a smart speaker equipped with a screen. From these works, a few challenges emerge: for instance, when users were asked to elicit their needs, some of them did not know what to say and, especially in HeyTAP, the need could not always be mapped to specific trigger-action rules. Discoverability, i.e., the ability for users to find and execute features through a user interface, is a recurrent problem with IPAs [3], and it is something to consider in this scenario as well.

Direct composition of IF-THEN rules can be imagined as well, but the process could be long and tedious for the user, and the issues that already affect the trigger-action paradigm (e.g., the difficulty in understanding the difference between events and conditions) would have to be tackled in this case, too. In a voice-driven scenario, however, the IPA could provide suggestions and help the user understand those differences. Alternatively, a totally different paradigm could be envisioned, allowing end users to compose a rule vocally without the need to explicitly indicate each trigger and action.

Management of existing rules might be an interesting effort, especially for those smart speakers that are not equipped with a screen. Here, the difficulty does not lie in the commands that the end user can issue, as those commands are quite simple (e.g., "delete a rule") or are strongly linked to the creation phase (e.g., "edit a rule"). The challenge is how to present the list of available rules. Reading all the rules with all their details is probably inappropriate. Similarly, listing only a few details of each rule (e.g., the title) could provide a very limited overview and increase errors. The challenge is to find a balance between these two extremes. In addition, should the user be able to select the rules that she wants to manage, for instance through a search mechanism such as "list the rules that start this morning" or "list the rules that involve the kitchen lamp"?

During the execution of rules, IPAs might seem to play no significant role. However, it is possible to envision a proactive role for them: since smart speakers are always-on devices, they could be aware of what is happening and which rules are currently active. They might allow the user to ask for that information and to stop some rules from being executed. It remains to be understood whether, and how much, this would be useful and appreciated.
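To make the management challenge more tangible, the sketch below (which builds on the hypothetical Rule structure introduced earlier) shows how an IPA might filter the installed rules by a mentioned device and produce a short spoken summary instead of reading every detail aloud. The function names and the summarization strategy are assumptions for illustration, not a description of any existing IPA.

```python
# Illustrative sketch: answering a voice query such as
# "list the rules that involve the kitchen lamp" by filtering the installed
# rules and summarizing them briefly. Continues the hypothetical
# Rule/Trigger/Action classes sketched above.

def rules_involving_device(rules: list[Rule], device: str) -> list[Rule]:
    """Return the rules whose trigger or actions mention the given device."""
    def mentions(params: dict) -> bool:
        return device.lower() in str(params.get("device", "")).lower()

    return [
        r for r in rules
        if mentions(r.trigger.params) or any(mentions(a.params) for a in r.actions)
    ]


def spoken_summary(rules: list[Rule]) -> str:
    """Read out only rule names, as a middle ground between all details and titles only."""
    if not rules:
        return "I could not find any rule involving that device."
    names = ", ".join(r.name for r in rules)
    return f"I found {len(rules)} rule(s): {names}. Say a rule name to hear its details."


# Example: handling "list the rules that involve the kitchen lamp"
matching = rules_involving_device([cinema_mode], "kitchen lamp")
print(spoken_summary(matching))
```

Summarizing by name and offering a drill-down on request is, of course, only one of many possible middle grounds between reading everything and reading almost nothing.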
Asking for information about rules in execution can be, instead, particularly useful in case of conflicts among rules. IPAs could help to debug problematic rules during their execution or, more importantly, to explain why a conflict arose. This could be done automatically by the IPA, as soon as it identifies an issue, or manually by the user, if she notices that something strange is happening, like a lamp that starts to blink and never stops. The challenge here is, again, at the presentation level: when is it legitimate to warn the user about a problem? Which user should be warned? How can the conflict be explained? Options range from describing why a certain rule (or set of rules) is misbehaving to allowing the user to deactivate one of them, with various levels of detail.

All these steps present other common challenges and opportunities as we move away from the "common" screen-based interaction. Authorization issues might arise, for example: who is allowed to create rules, manage them, and get explanations? How can this authorization be performed and set up? By recognizing the speaker through her voice? Procedural issues might be present as well: how long should the conversation last? What happens if the IPA misunderstands a portion of the conversation and activates a problematic rule? Finally, who are the end users in this context? In a home, different people can interact with the system, and they can have different ages, diverse needs, various levels of expertise, and different ways to understand and communicate things.

These and similar questions could spark the discussion during the workshop and, possibly, foster new research ideas around vocal interaction for end-user personalization in the IoT.

References

[1] T.-H. K. Huang, A. Azaria, J. P. Bigham, InstructableCrowd: Creating IF-THEN Rules via Conversations with the Crowd, in: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '16, ACM, New York, NY, USA, 2016, pp. 1555–1562. doi:10.1145/2851581.2892502.

[2] F. Corno, L. De Russis, A. Monge Roffarello, HeyTAP: Bridging the Gaps Between Users' Needs and Technology in IF-THEN Rules via Conversation, in: Proceedings of the 2020 AVI Conference (AVI '20), ACM, 2020. doi:10.1145/3399715.3399905.

[3] P. Kirschthaler, M. Porcheron, J. E. Fischer, What Can I Say? Effects of Discoverability in VUIs on Task Performance and User Experience, in: Proceedings of the 2nd Conference on Conversational User Interfaces, CUI '20, Association for Computing Machinery, New York, NY, USA, 2020. doi:10.1145/3405755.3406119.