<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Bari, Italy
luigi.derussis@polito.it (L. De Russis); alberto.monge@polito.it (A. Monge Roffarello);
carlo.borsarelli@studenti.polito.it (C. Borsarelli)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Vocally-Composed Personalization Rules in the IoT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luigi De Russis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Monge Roffarello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Borsarelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino, Dipartimento di Automatica e Informatica</institution>
          ,
          <addr-line>Corso Duca degli Abruzzi 24, 10129 Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper presents a study aimed at understanding whether and how end users would converse with a conversational assistant to personalize their domestic Internet-of-Things ecosystem. The underlying hypothesis is that users are willing to create personalization rules vocally and that conversational assistants could facilitate the composition process, given their knowledge of the IoT ecosystem. The preliminary study was conducted as a semi-structured interview with 7 non-programmers and provided some evidence in support of this hypothesis.</p>
      </abstract>
      <kwd-group>
        <kwd>Internet of Things</kwd>
        <kwd>End-User Development</kwd>
        <kwd>Conversational Agents</kwd>
        <kwd>Vocal Interaction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Goal</title>
      <p>vocally, b) by using which rule format, and c) which kind of support they expect from the
IPAs during the creation process. To take a step toward this goal, we conducted 7 interviews
with non-programmers with different occupations and backgrounds, described in the remainder
of the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. User Study</title>
      <p>We recruited participants through convenience and snowball sampling, by sending private
messages to our social circles. To minimize self-selection bias, we selected participants so as to
enroll end users with a medium-high interest in home automation, include a mix of participants
who do and do not use a smart speaker with an IPA, and balance the sample in terms of occupation
and educational background. Our final sample included 3 participants who self-identify as male
and 4 who self-identify as female, with ages ranging from 18 to 52 years. Table 1 summarizes
the relevant information. All participants currently live in Italy, and the study was conducted in
Italian.</p>
      <p>All participants completed a two-part study session composed of a background interview
and an imagination exercise. Due to the Covid-19 pandemic, the one-to-one study sessions
were conducted partly online (via Zoom) and partly in person during April 2021.
Study sessions lasted from 25 to 40 minutes.</p>
      <p>Background interview. We first conducted a semi-structured background interview to
understand the relationship that users have with smart speakers (if they own one)
or with IPAs in general. We also asked about participants’ experience with home
automation and IoT devices, providing examples when possible. Questions included: “Which
are the main issues you experienced with a smart speaker?” and “In an IoT-powered home, which
activities would you like to automate?”</p>
      <p>Imagination exercise. After the background interview, we conducted an imagination
exercise to elicit, directly from the participants, how they would create personalization
rules in different scenarios by using a smart speaker. Since not all participants might have
knowledge of end-user personalization in the IoT, we briefly introduced them to the topic.
The collected information allows us to explore the possibilities and approaches that a
non-programmer would use to freely create custom rules via conversation. Participants received a
description of a home (e.g., a fully IoT-powered home with a smart speaker in each room) and
two personalization goals: 1) “you would like to turn on the central kitchen light every time you
enter that room” and 2) “you want that, when you go to bed, the shutters of the house are closed
and all the lights go out”. Participants had to express, freely but vocally, an instruction (i.e., a
rule) for realizing each goal and then answer some questions about it.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>All participants had some knowledge of and experience with smart speakers. As expected,
smart speaker owners had a broader knowledge of the possibilities and limitations
of such devices, while the others had used them only sporadically, e.g., while at a friend’s
home. All of them, however, knew at least the basic features of smart speakers,
especially of the Amazon Echo. As for home automation and personalization rules,
only 2 participants (P2 and P3) had smart devices at home, and neither was in charge
of configuring devices or creating personalization rules. However, P1, P3, and P7 knew about
the rule composition capability included in the mobile app of smart speakers, the Amazon Echo
in this case. P1 knew about this capability but had never tried to create a rule. P3 came up with
interesting rules, but he was not able to actually set all of them. P7 was the only one who
had actually created a rule through the Amazon Alexa app, namely a “goodnight” scenario
activated by a vocal command.</p>
      <p>In the imagination exercise, all participants created rules with a structure similar to the
trigger-action formalism, even though they were not instructed or primed to do so. As reported in
previous work on trigger-action programming, participants in this study also used triggers
one level of abstraction higher than direct sensors [5].</p>
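      <p>As an illustration only, the trigger-action structure that participants converged on can be sketched as a small data type. The class name, field names, and the wording of triggers and actions below are hypothetical and are not part of the study; the two rules simply restate the two personalization goals at the event level participants used, rather than as raw sensor readings.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A trigger-action rule: when `trigger` happens, perform each action."""
    trigger: str
    actions: tuple


# Hypothetical encodings of the two personalization goals from the study.
kitchen_light = Rule(
    trigger="user enters the kitchen",
    actions=("turn on the central kitchen light",),
)
goodnight = Rule(
    trigger="user goes to bed",
    actions=("close all shutters", "turn off all lights"),
)


def describe(rule):
    """Render a rule as an IF/THEN sentence."""
    return "IF " + rule.trigger + " THEN " + " AND ".join(rule.actions)


print(describe(kitchen_light))
print(describe(goodnight))
```
      </preformat>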
      <p>We noticed, however, a clear difference between the rules composed by participants who owned a
smart speaker and those composed by participants who did not. For instance, while speaking with the
kitchen’s smart speaker to realize the “you would like to turn on the central kitchen light every time you enter
that room” personalization goal, the former group composed the following rule, with very
minor differences among participants: “Alexa, every time I walk into the kitchen, turn on the
central light.” These participants specified the room where the rule should apply, even though they
knew that the rule was being set in the kitchen and that the smart speaker was in the kitchen. Conversely,
with the same instructions, the latter group composed different rules, such as “Alexa, turn on
the light every time I pass by” (P3) or “Alexa, whenever you see someone walk through the door,
turn on the main light” (P6). None of those participants mentioned the kitchen, given that they
were speaking with the smart speaker located in that same room. When asked to fulfill the
same personalization goal while speaking with a smart speaker in a different room, all
participants generated rules very similar to the one elicited from smart speaker owners.
This is possibly due to the participants’ experience with smart speakers: owners could have been
primed by the current capabilities of the devices, which require users to be very precise in
their requests. Participants without extended experience with smart speakers,
instead, seem to assume that the smart speaker may possess some implicit knowledge,
e.g., where it is located.</p>
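      <p>The implicit knowledge that less experienced participants assumed can be sketched as a simple fallback: if the utterance names no room, the rule applies to the room where the listening speaker is installed. This is a minimal sketch under that assumption; the function and parameter names are hypothetical and do not describe any existing assistant.</p>
      <preformat>
```python
def resolve_room(spoken_room, speaker_room):
    """Return the room a rule applies to.

    spoken_room: room named in the utterance, or None if it was omitted.
    speaker_room: room where the listening smart speaker is installed.
    """
    return spoken_room if spoken_room is not None else speaker_room


# A smart speaker owner names the room explicitly.
print(resolve_room("kitchen", "kitchen"))
# A less experienced user omits it ("turn on the light every time I pass by").
print(resolve_room(None, "kitchen"))
# Speaking from another room, the target room has to be named.
print(resolve_room("kitchen", "living room"))
```
      </preformat>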
      <p>When asked how the speaker should respond after a rule is created, participants
preferred an explicit acknowledgment that the rule was correctly understood, i.e.,
having the smart speaker repeat the entire rule in its own terms, with a confirmation at the end.</p>
      <p>Finally, participants commented on what should happen if the IPA did not fully understand
the rule. They recognized two main cases: i) the composed rule has missing or unclear information
(e.g., which lamp to turn on), or ii) the rule contains one or more mistakes. In the former case,
participants would accept either an auto-complete feature, if possible (e.g., if there is only one
lamp that can be turned on), or an explicit request from the speaker (e.g., “which lamp do you
want to turn on among these?”). In the latter case, participants would use a trial-and-error
approach, rephrasing the rule until the IPA understands it correctly.</p>
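      <p>The recovery options described by participants can be sketched as a small decision function: auto-complete when exactly one device fits, ask an explicit question when several fit, and fall back to rephrasing otherwise. This is a hedged sketch, not an implementation from the study; the function name, the device names, and the exact wording of the prompts are assumptions made for illustration.</p>
      <preformat>
```python
def resolve_device(candidates):
    """Pick the target device or build a clarification question.

    candidates: names of devices that could satisfy the spoken rule.
    Returns a (device, question) pair; exactly one of the two is None.
    """
    if len(candidates) == 1:
        # Only one possible target: auto-complete the missing detail.
        return candidates[0], None
    if not candidates:
        # Nothing matches: the user is expected to rephrase the rule.
        return None, "I could not find a matching device. Could you rephrase the rule?"
    # Several possible targets: ask explicitly, listing the options.
    options = ", ".join(candidates)
    return None, "Which lamp do you want to turn on among these: " + options + "?"


print(resolve_device(["central light"]))
print(resolve_device(["central light", "counter lamp"]))
```
      </preformat>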
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>This paper presents a pilot study, in the form of semi-structured interviews, aimed at
understanding whether and how end users would converse with a smart speaker to personalize
their IoT ecosystem at home. Preliminary results are promising and seem to indicate that end
users would be inclined to compose personalization rules by speaking with a smart speaker.
Also, they naturally used the trigger-action format to compose such rules, thus confirming
the versatility of that formalism. Interestingly, non-programmers would create rules with more
or less detail according to their own experience with smart speakers. The most experienced
users would specify several details, even if the IPA could possess that knowledge implicitly (e.g.,
the room where the rule is composed), while less experienced users would place more “trust” in the
smart speaker’s knowledge. In any case, they appreciated the possibility of performing such a
relatively complex task with a conversational approach, and they would all like an explicit
acknowledgement that the IPA correctly understood the rule they composed, with different
strategies for recovering from errors and from missing or imprecise information. They would accept a certain
degree of automation in deriving which devices are involved in the rule, while
in case of more significant errors they would adopt a trial-and-error approach. Future
work will expand the pilot study to confirm these results and will consider other approaches
to rule composition, complementary to vocal interaction, such as the possibility of
physically acting on the home devices to show the IPA what to do. In addition, a prototype of a
conversational agent in the form of a smart speaker will be designed, developed, and evaluated
with users in a real IoT ecosystem at home.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ammari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Tsai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bentley</surname>
          </string-name>
          ,
          <article-title>Music, search, and iot: How people (really) use voice assistants</article-title>
          ,
          <source>ACM Transactions on Computer-Human Interaction</source>
          <volume>26</volume>
          (
          <year>2019</year>
          ). doi:10.1145/3311956.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>De Russis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monge Roffarello</surname>
          </string-name>
          ,
          <article-title>Personalizing iot ecosystems via voice</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Empowering People in Dealing with Internet of Things Ecosystems</source>
          , volume
          <volume>2702</volume>
          of EMPATHY 2020, CEUR-WS,
          <year>2020</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Barricelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Casiraghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Valtolina</surname>
          </string-name>
          ,
          <article-title>Virtual assistants for end-user development in the internet of things</article-title>
          ,
          <source>in: End-User Development</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>216</lpage>
          . doi:10.1007/978-3-030-24781-2_17.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] M. Manca, P. Parvin, F. Paternò, C. Santoro,
          <article-title>Integrating alexa in a rule-based personalization platform</article-title>
          ,
          <source>in: Proceedings of the 6th EAI International Conference on Smart Objects and Technologies for Social Good</source>
          , GoodTechs ’20, Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>113</lpage>
          . doi:10.1145/3411170.3411228.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] B. Ur, E. McManus, M. Pak Yong Ho, M. L. Littman,
          <article-title>Practical trigger-action programming in the smart home</article-title>
          ,
          <source>in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          , CHI ’14, Association for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          , pp.
          <fpage>803</fpage>
          -
          <lpage>812</lpage>
          . doi:10.1145/2556288.2557420.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>