On Privacy Disclosure from User-Generated Content
of Automation Rules⋆
Bernardo Breve1,∗,† , Gaetano Cimino1,∗,† , Vincenzo Deufemia1,† and
Annunziata Elefante1,∗,†
1
    University of Salerno, via Giovanni Paolo II, Fisciano (SA), 84084, Italy


                                         Abstract
                                         Trigger-Action Platforms (TAPs) are systems that enable users to automate routine tasks, such as turning
                                         off lights at a specific time, without requiring technical skills. In the process of creating automation rules,
                                         users are prompted to provide descriptions in natural language, which are referred to as User-Generated
                                         Content (UGC), such as the title that explains the intended behavior of the rule. However, UGC may
                                         contain sensitive information that could expose users to unwanted situations or be exploited by cyber
                                         attackers. This position paper provides an initial assessment of the risks associated with UGC in TAPs
                                         and discusses the use of NLP techniques to mitigate these risks. Additionally, the paper highlights the
                                         need for further research to better understand the impact of UGC on privacy and to develop effective
                                         privacy-preserving mechanisms for TAPs.

                                         Keywords
                                         Trigger-action platforms, Privacy leakage, User-generated content, Automation rules, Smart homes


1. Introduction
The blossoming of smart technology in contemporary society has significantly impacted every
aspect of everyday life. Terms such as “smart cities”, “smart houses”, “smart mobility”, and
“smart health” are now well-known within our vocabulary. The Internet of Things (IoT) [1],
has revolutionized the way end-users interact with technology-injected variants of everyday
objects, allowing for unprecedented control and management over the Internet.
   In order to simplify the way end-users can interact and customize smart devices, the End-User
Development (EUD) [2, 3] paradigm has become in our days increasingly popular, enabling
individuals to access and utilize IoT technology throughout various domains, from business
to healthcare (eHealth) [4]. Smart Houses, in particular, represent a rapidly growing area of
interest and application for IoT technology, enabling users to control all aspects of their home,
such as lighting, television, air conditioning, and garage. To simplify the automation of all these
household tasks, users can utilize Trigger-Action Platforms (TAPs) to create automation rules

IS-EUD 2023: 9th International Symposium on End-User Development, 6-8 June 2023, Cagliari, Italy
∗
    Corresponding author.
†
     These authors contributed equally.
Envelope-Open bbreve@unisa.it (B. Breve); gcimino@unisa.it (G. Cimino); deufemia@unisa.it (V. Deufemia);
anelefante@unisa.it (A. Elefante)
Orcid 0000-0002-3898-7512 (B. Breve); 0000-0001-8061-7104 (G. Cimino); 0000-0002-6711-3590 (V. Deufemia);
0009-0001-7141-6105 (A. Elefante)
                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
that incorporate triggers, conditions, and actions [5]. These rules connect online services that
represent both digital and physical resources and are executed when the conditions associated
with triggers are satisfied, leading to the completion of the action. For example, a user may
program a rule that automatically turns on lights at sunset or a rule for activating the air-
conditioning at a specific time of the day.
   The most popular TAP to date is If-This-Then-That (IFTTT)1 , a web-based platform that
boasted more than 18 million users as of 2020 [6]. The popularity of IFTTT can be attributed
to its user-friendly interface, which allows even novice users to easily create new rules from
scratch or use pre-existing ones from its catalog. Searching for rules in the catalog is also a
straightforward process, as each rule is defined by a specific natural language textual title and
description. These two components, formally addressed as User-Generated Content (UGC) [7],
offer significant benefits, as they assist the rule creator in remembering the behavior of their
automatism and provide a means of support for new users to comprehend how the rule operates.
For example, a user can readily activate a rule such as:

       IF I PUBLISH A PHOTO ON FACEBOOK THEN SHARE IT ON INSTAGRAM

      which by means of a UGC could be described as so:

       Keep your Instagram followers updated! This rule allows you to automatically synchronize
       any new photo you upload on Facebook with your Instagram profile.

   At first glance, this rule appears to provide users with significant benefits as it saves individuals
from having to upload their photos manually on both social networks. However, automation
rules can intrinsically raise privacy and security concerns either for the smart environment or
the users, especially when such rules are defined and used by inexperienced users [6, 8, 9, 10, 11].
With regard to the previous example, there might be scenarios where a user would not want
to share his or her photos with followers of one social network over another one, leading to
unwilling uploads of photos that could cause embarrassment.
   In addition, UGC employed by users to describe their rules may provide further damage to
their privacy. In fact, users might mistakenly disclose sensitive information when explaining
the intended behavior of their rules. Alternatively, a user may choose to allow the platform to
automatically complete the fields with relevant information. However, in either case, a user
may inadvertently publish a rule with private and personal data (e.g., the user’s real email), as
shown by the following description:

       When Lautaro Martinez publishes a photo on Instagram, then send an email to EMAIL
       ADDRESS

  Therefore, end-users might thus publicly share their sensitive information, particularly since
the typical user of these platforms lacks technical background and may be unaware of the
potential privacy risks implied by the degree of freedom when typing UGC.

1
    https://ifttt.com
   This position paper outlines a viable solution to mitigate the sensitive information leakage
issue within the context of TAPs.


2. Identifying Privacy Leakage from UGC in the TAP domain
In recent years, several studies have highlighted the sensitive information that is inadvertently
disclosed by users of automation platforms, such as TAPs [12]. In particular, researchers have
investigated the possibility of inferring and constructing a complete profile of the user from the
release of personal data on the Internet, without the user being aware of the harm involved
[9, 13, 14, 15, 16].
   Identifying vulnerabilities in the domestic environment, particularly in the smart devices
that are utilized by millions of users on a daily basis in their houses, is a related area of concern
regarding privacy and data leakage. In fact, if an attacker gains knowledge of all the rules
published by a user on a TAP, s/he could potentially descend to the level of individual devices
[17] and deduce private information about the user. In such cases, it is imperative to conduct
an analysis of the IoT infrastructure to identify and mitigate these security risks.
   The information pertaining to personal data and IoT devices is derived from the unregulated
usage of TAPs by users who may not possess a comprehensive understanding of the internal
mechanisms of these systems. As a result, when users divulge information through UGC, they
may not fully contemplate the ramifications that even a solitary piece of sensitive information
could have on their privacy. UGC in the TAP domain has the potential to cause privacy breaches
in various ways. For example, UGC may inadvertently contain personal information, such as
location data or personal identifiers, which can be easily accessed by third parties, including
attackers and data brokers. Additionally, UGC may be utilized to uncover personal information
by identifying patterns of behavior or preferences. For instance, a user who frequently posts
about their workout routine may be inferred to be health-conscious, potentially making them a
target for health-related advertisements or offers. It is crucial for users to be cognizant of the
potential risks associated with UGC in the TAP domain and to take necessary steps to safeguard
their privacy.
   One promising strategy for addressing the problem of privacy leakage in UGC is the appli-
cation of Natural Language Processing (NLP) techniques to analyze and comprehend human
language used by online users [10]. These techniques can be employed in multiple ways to
help identify any sensitive information being shared. For instance, NLP can detect personal
identifiers like names, addresses, and phone numbers, as well as sensitive data such as financial
information, health data, or passwords. Additionally, it can recognize patterns of behavior
or preferences that may reveal sensitive information about a person. Finally, NLP can also
scrutinize metadata linked to UGC, including timestamps, locations, and devices used to post
the content. While these elements may seem meaningless when considered alone, they could
potentially provide malicious individuals with useful information to plan attacks. For instance,
if a thief is aware that a user has activated the rule “Turn off living room lights when I leave
home”, they could examine the rule-targeted device (the lights) and its location (the living room)
to determine the right moment to carry out a theft.
   An NLP-based methodology for achieving such goals is the employment of Named Entity
Recognition (NER) techniques, which focus on extracting and classifying from texts different
types of entities according to the domain of interest [18]. In the TAP domain, the entities
should refer to the users’ information and the smart devices and online services they use within
automation rules. Specifically, it is necessary to define specific labels, such as PERSON to
indicate a person’s first and/or last name, ORG to denote an online service, and SENS to
highlight sensitive data. Below is an example demonstrating the application within the rule
description shown in Section 1:

    When Lautaro Martinez PERSON publishes a photo on Instagram ORG , then
   send an email to l.martinez@unisa.com SENS

  In conclusion, through the application of Natural Language Processing (NLP), we can gain a
deeper understanding of the potential privacy risks associated with UGC in the TAP domain
and take measures to mitigate them.
  At the workshop, we will discuss how the involvement of NLP techniques can benefit the
achievement of the discussed goals.


Acknowledgments
This work has been supported by the Italian Ministry of University and Research (MUR) un-
der grant PRIN 2017 “EMPATHY: Empowering People in deAling with internet of THings
ecosYstems” (Progetti di Rilevante Interesse Nazionale − Bando 2017, Grant 2017MX9T7H).


References
 [1] L. Atzori, A. Iera, G. Morabito, The internet of things: A survey, Computer networks 54
     (2010) 2787–2805.
 [2] P. Markopoulos, J. Nichols, F. Paternò, V. Pipek, End-user development for the internet of
     things, ACM Transactions on Computer-Human Interaction (TOCHI) 24 (2017) 1–3.
 [3] B. R. Barricelli, F. Cassano, D. Fogli, A. Piccinno, End-user development, end-user pro-
     gramming and end-user software engineering: A systematic mapping study, Journal of
     Systems and Software 149 (2019) 101–137.
 [4] S. S. Mishra, A. Rasool, IoT health care monitoring and tracking: A survey, in: Proceedings
     of 3rd International Conference on Trends in Electronics and Informatics (ICOEI), IEEE,
     2019, pp. 1052–1057.
 [5] G. Ghiani, M. Manca, F. Paternò, C. Santoro, Personalization of context-dependent appli-
     cations through trigger-action rules, ACM Transactions on Computer-Human Interaction
     (TOCHI) 24 (2017) 1–33.
 [6] C. Cobb, M. Surbatovich, A. Kawakami, M. Sharif, L. Bauer, A. Das, L. Jia, How risky are
     real users’ IFTTT applets?, in: Proceedings of the 16th USENIX Conference on Usable
     Privacy and Security, USENIX Association, 2020, pp. 505–529.
 [7] X. Chen, X. Song, R. Ren, L. Zhu, Z. Cheng, L. Nie, Fine-grained privacy detection with
     graph-regularized hierarchical attentive representation learning, ACM Transactions on
     Information Systems (TOIS) 38 (2020) 1–26.
 [8] B. Breve, G. Cimino, V. Deufemia, Towards explainable security for ECA rules, in:
     Proceedings of the 3rd International Workshop on Empowering People in Dealing with
     Internet of Things Ecosystems (EMPATHY ’22), volume 3172 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2022, pp. 26–30.
 [9] Y.-H. Chiang, H.-C. Hsiao, C.-M. Yu, T. H.-J. Kim, On the privacy risks of compromised
     trigger-action platforms, in: Proceedings of 25th European Symposium on Research in
     Computer Security (ESORICS 2020), Springer, 2020, pp. 251–271.
[10] B. Breve, G. Cimino, V. Deufemia, Identifying security and privacy violation rules in
     trigger-action IoT platforms with NLP models, IEEE IoT J 10 (2023) 5607–5622.
[11] M. Surbatovich, J. Aljuraidan, L. Bauer, A. Das, L. Jia, Some recipes can do more than spoil
     your appetite: Analyzing the security and privacy risks of IFTTT recipes, in: Proceedings
     of the 26th International Conference on World Wide Web, ACM, 2017, p. 1501–1510.
[12] R. Xu, Q. Zeng, L. Zhu, H. Chi, X. Du, M. Guizani, Privacy leakage in smart homes and its
     mitigation: IFTTT as a case study, IEEE Access 7 (2019) 63457–63471.
[13] X. Chen, X. Song, G. Peng, S. Feng, L. Nie, Adversarial-enhanced hybrid graph network
     for user identity linkage, in: Proceedings of the 44th International ACM SIGIR Conference
     on Research and Development in Information Retrieval, 2021, pp. 1084–1093.
[14] A. Abbas, J. Holmberg, Information extraction from short text messages, LU-CS-EX
     2019-18 (2019).
[15] F. Erlandsson, M. Boldt, H. Johnson, Privacy threats related to user profiling in online
     social networks, in: Proceedings of International Conference on Privacy, Security, Risk
     and Trust and International Conference on Social Computing, IEEE, 2012, pp. 838–842.
[16] X. Song, X. Wang, L. Nie, X. He, Z. Chen, W. Liu, A personal privacy preserving framework:
     I let you know who can see what, in: Proceedings of the 41st International ACM SIGIR
     Conference on Research & Development in Information Retrieval, 2018, pp. 295–304.
[17] S. Rizvi, R. Pipetti, N. McIntyre, J. Todd, I. Williams, Threat model for securing internet of
     things (IoT) network at device-level, Internet of Things 11 (2020) 100240.
[18] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures
     for named entity recognition, arXiv preprint arXiv:1603.01360 (2016).