Automatic Generation of Explanations to Prevent Privacy Violations

Ramon Ruiz-Dolz, Jose Alemany, Stella Heras and Ana Garcia-Fornes
Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Spain
E-mail: raruidol@dsic.upv.es
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. With the massive use of online social environments and technologies, users' concern regarding the privacy of their own data has significantly increased. In this paper, we present a method to automatically generate explanations in a social network domain. This method uses the available data from the network to anticipate any potential privacy violation, automatically generates explanations, and shows them to the user with the purpose of avoiding the detected violation.

Keywords. Argumentation, Explainable AI, Privacy, Social Networks

1. Introduction

With the continuous increase of the population of online social networks [9], users' concern about the privacy of their data has significantly increased. Online social networks provide users with a series of tools for interacting, sharing and publishing information with other users. With these tools, spreading information and data is easier than ever, so it is imperative to make responsible use of these technologies. When this does not happen, several threats may appear. An important threat to users' privacy may be one's own publications: although this may seem counter-intuitive, as described in [21], it is very common to find users regretting their own online publications. On the other hand, when other users are involved in a publication, since their privacy preferences may be unknown to the author, multi-party privacy conflicts [20] are also a very common privacy threat in social networks. Therefore, it is interesting to be able not only to warn users but also to give them explanations of the main reasons for the detected potential privacy violations, in order to minimise their occurrence. Additionally, with the recent appearance of stricter laws in the field of data privacy protection, these privacy violations may also have legal consequences for the privacy violator, and social media users' legal consciousness about privacy is usually very low [16].

This increasing concern about privacy threats in social networks has attracted the interest of researchers in finding effective mechanisms to deal with privacy violations. Several privacy management assistance tools have been identified in the literature. In [14,18,19,22], we can observe different approaches to reduce the harm caused by privacy violations. In [13], an underlying negotiation protocol to choose the optimal privacy policy is presented. However, two important flaws can be identified in all these approaches. First, all of them are focused on minimising conflicts when multiple parties are involved, ignoring potential self-inflicted privacy violations. Second, none of them provide the author with proper explanations of why a specific decision should be made. As pointed out in [10,11], argumentation seems to be the most coherent approach to tackle this kind of problem. In [12], a negotiation protocol based on computational argumentation techniques to decide the optimal privacy policy is proposed. However, the decision is automatically taken by the system, and no explanation is given to the user.
Computational argumentation has an extensive and successful history of applications in law [5,6], and it is a suitable method for enforcing privacy and fairness. However, it has not been until recently that the community has started to investigate the potential applications of argumentation as a tool to provide AI systems with greater explanatory power [17], also in the legal domain [4].

Therefore, with this work we intend to pave the way for the automatic generation of explanations in the privacy management domain. Thus, we propose an argumentation-based approach to automatically generate such explanations and prevent privacy violations in social networks. With our system, users can receive warnings and explanations from the social network site that will improve their awareness of the potential consequences of their publications.

The paper is structured as follows. Section 2 contextualises the framework of the research presented in this work. Section 3 presents the method proposed to automatically generate explanations to prevent privacy violations in social networks. In Section 4, a case study is presented to illustrate how the proposed method works. Finally, Section 5 summarises the main concepts presented in this work and proposes the main lines of future work.

2. Background

The research carried out in this work is based on two main pillars: PESEDIA, an educational social network which is the application domain of the method proposed in this work, and an argumentation framework for online social networks.

PESEDIA is an online social network for educational and research purposes that includes: (i) the design and development of new metrics to analyse and quantify privacy risks [3], (ii) the application of methods to change users' behaviour regarding their privacy concerns, (iii) the implementation of new features to improve the management of users' content, and (iv) the evaluation and testing of new proposals with real users. The underlying implementation of PESEDIA uses Elgg [8], an open-source engine for building social environments. The environment provided by this engine is similar to other social networks (e.g., Facebook).

In [15], we formally defined an argumentation framework for online social networks and proposed an architecture for an argumentation system capable of handling the whole argumentation process. The proposed architecture is structured in four different modules: (i) feature extraction, (ii) argument generation, (iii) the solver and (iv) the dialogue module. The feature extraction module is in charge of retrieving all the data needed both to prevent any potential privacy violation and to determine which computational arguments (i.e., 3-element tuples) will subsequently be generated by the argument generation module. This module must be able to generate four different types of arguments: Privacy, Trust, Risk and Content arguments. Once all the computational arguments are generated, the solver module must compute the set of acceptable arguments in favour of or against making a publication. When the set of acceptable arguments is against making a publication (meaning that a potential privacy violation has been detected), the dialogue module is in charge of trying to persuade the author to modify the publication in order to satisfy the privacy preferences of all affected users. This whole process is the automatic generation of explanations to prevent privacy violations that we present in this work.
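To make the architecture more concrete, the following sketch shows one possible representation of the 3-element argument tuples and of how the four modules could be wired together. It is a minimal illustration under our own naming conventions (Argument, handle_publication and the injected module functions are assumptions for this sketch, not the PESEDIA implementation or the system of [15]); it only mirrors the pipeline described above.

```python
# Minimal sketch (our naming, not the PESEDIA implementation) of the 3-element
# argument tuples and of how the four modules described above fit together.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class ArgumentType(Enum):
    PRIVACY = "Privacy"
    TRUST = "Trust"
    RISK = "Risk"
    CONTENT = "Content"


@dataclass
class Argument:
    claim: int              # +1 in favour of sharing the publication, -1 against
    type: ArgumentType      # Privacy, Trust, Risk or Content
    support: float          # individual strength computed from the extracted features


def handle_publication(publication, author,
                       extract_features: Callable,
                       generate_arguments: Callable[..., List[Argument]],
                       solve: Callable[[List[Argument]], List[Argument]],
                       start_dialogue: Callable) -> List[Argument]:
    """Wire the four modules: (i) feature extraction, (ii) argument generation,
    (iii) solver, (iv) dialogue. The module functions are injected as parameters."""
    features = extract_features(publication, author)
    arguments = generate_arguments(features)
    acceptable = solve(arguments)
    # If the acceptable extension argues against publishing, a potential privacy
    # violation has been detected and the dialogue module tries to persuade the author.
    if acceptable and all(a.claim == -1 for a in acceptable):
        start_dialogue(author, acceptable)
    return acceptable
```

Passing the module functions as parameters is only a convenience for the sketch; it keeps the example self-contained while preserving the four-stage structure of the architecture.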
3. Method

To properly generate explanations in our argumentation system, it is of the utmost importance to process the extracted features in a way that is coherent with the argumentation framework. Computational arguments must be generated taking those features into account and, finally, the explanations must be built in such a way that they can be interpreted by humans. It is important to emphasise that, since the implementation has been carried out in PESEDIA [1], the proposed method is compliant not only with the requirements of the argumentation framework [15], but also with the functions and limitations of the social network.

3.1. Features

Four different features are considered by our proposed method to generate explanations: Privacy Preferences, Trust, Privacy Risk Score (PRS) and Sensitive Content. All these features are retrieved by the feature extraction module, which works as an intermediary between the social network and the argumentation system. Two main sources can be identified for the acquisition of the features: user preferences data and publication data.

3.1.1. Privacy Preferences

When sharing content in an online social network, the author must choose the target audience for each publication. For that purpose, privacy selectors such as the one depicted in Figure 1 are available in our educational social network. The options provided to the author as target audiences are the following: public, friends, collections and private. Public is the adequate option to share content with the whole network. With the friends option it is possible to share the content only with the friends of the author. Collections are subsets of friends created by each user. Finally, the private option allows authors to keep a publication visible only to themselves.

Figure 1. Privacy selector in our social network.

On the other hand, when creating a profile in the social network, each user must define the default target audience for their publications. Thus, the feature extraction module retrieves both the user's privacy preferences and the privacy configuration of the publication to be shared.

3.1.2. Trust

When users other than the author are involved in a publication, it is important to respect every user's data privacy to prevent multi-party privacy conflicts. In PESEDIA, users can evaluate their friendships with a star-based rating as depicted in Figure 2. This evaluation is retrieved by the feature extraction module, allowing us to quantify the existing trust between both users.

Figure 2. Relationship rating in our social network.

3.1.3. Privacy Risk Score

When sharing content in online social networks, it is hard to estimate the scope of a publication. The Privacy Risk Score [3] is a metric that provides a value representing the risk of a publication being read by unexpected users. Therefore, the feature extraction module retrieves the computed PRS value for every publication to be shared.

3.1.4. Sensitive Content

The content of the publication itself is also an important feature to be considered when preventing privacy violations. Based on the work in [7], we consider the following six categories of sensitive content: location, medical, drug, personal, relatives and offensive. Before sharing any publication, a text content analyser developed in the PESEDIA framework [2] is used to detect any of these categories of sensitive content. A six-dimensional vector is generated by the analyser indicating the categories detected in the text.
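Putting the four features of this section together, the output of the feature extraction module could be bundled as in the following sketch. The field names are our own assumptions for illustration, not the actual PESEDIA data model.

```python
# Illustrative sketch (our field names, not the PESEDIA data model) of the feature
# bundle handed by the feature extraction module to the argument generation module.
from dataclasses import dataclass, field
from typing import Dict

SENSITIVE_CATEGORIES = ("location", "medical", "drug", "personal", "relatives", "offensive")


@dataclass
class PublicationFeatures:
    author_default_audience: str          # "public" | "friends" | "collections" | "private"
    publication_audience: str             # audience selected for this publication
    trust_from_tagged: Dict[str, float]   # tagged user -> trust in [0, 1] from the star rating
    privacy_risk_score: float             # PRS computed for the publication [3]
    sensitive_content: Dict[str, bool] = field(
        default_factory=lambda: {c: False for c in SENSITIVE_CATEGORIES}
    )                                     # six-dimensional indicator vector from the analyser [2]
```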
The feature extraction module retrieves the generated vector to feed the argumentation system with the data related to the sensitive content detected.

3.2. Argument Generation

Once all the features have been retrieved, the computational arguments are generated. These arguments are defined by three parameters: the claim, the type and the support. The claim of an argument in this domain can have two polarities: positive arguments are in favour of sharing the publication, and negative arguments are against doing so. Secondly, each argument can belong to four different classes depending on the source of the features used to generate it: Privacy, Trust, Risk and Content arguments. Finally, the support is a value computed from the extracted features that allows us both to quantify the individual strength of an argument and to determine its claim.

Privacy arguments are generated from the privacy features. For this purpose, the privacy options are represented with the following values: public (0), friends (0.5), collections (0.75) and private (1). Then, the support of the argument is computed as the difference between the author's default target audience and the privacy configuration of the publication. If the resulting value is negative, a privacy argument against making the publication will be generated, since the privacy configuration of the publication is less restrictive than the author's privacy preferences. Conversely, if the resulting value is not negative, a privacy argument in favour of making the publication will be generated.

Trust arguments are generated only when more than one user is tagged in a publication. The evaluation provided by users when adding friends (i.e., a value in the 0-1 range computed from the star-based rating) is taken into account as the support for these arguments. We fixed a threshold of 4/5 stars (i.e., 0.8 on the numerical scale) to start considering that trust may be enough to share some content without consulting beforehand. Therefore, a positive trust argument will be generated for each tagged user that has rated the author with 4/5 or 5/5 stars when evaluating the friendship. Conversely, if fewer stars have been given by the tagged users to the author, trust arguments against making the publication will be generated.

Risk arguments are generated with the Privacy Risk Score as their support. A threshold has been defined to discriminate between safe publications and publications that may cause a privacy violation. Due to the nature of this metric, we set that threshold at the value 0.2, considering all publications with a higher PRS dangerous for the author. Therefore, an argument against making the publication will be generated if the threshold is surpassed. Otherwise, an argument in favour of making the publication will be generated.

Content arguments are always generated against making the publication if any type of sensitive content is detected. The preference value of the user towards the specific type of content detected is used as the support of the generated argument. That value is modelled as follows:

$$ v(t) = \max\left(1 - \frac{n_t}{N},\ \varepsilon\right) \qquad (1) $$

where $n_t$ is the number of publications containing some specific type of content $t$ and $N$ is the total number of publications made by the user. This way, if any type of content predominates in a user's publications, we can give priority to other types of content (e.g., a user whose account is based on medical content will not be concerned about sharing medical content, but may be about sharing other types of content). Initially, every user has the same value (1) towards each type of content. To smooth the initial decrements, we use $\varepsilon$ to make sure the preference value is always greater than zero.
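The generation rules described in this subsection can be summarised in a short sketch. This is our own illustrative reading of the rules, not the PESEDIA code: the function and parameter names are ours, the epsilon value is assumed, and the exact support assigned to the privacy argument is a modelling choice; the audience values and the 0.8 and 0.2 thresholds are those stated above. For simplicity, the features are passed as plain parameters rather than the bundled structure sketched in Section 3.1.

```python
# Illustrative sketch (our reading of the rules above, not the PESEDIA code) of how
# the four argument types could be generated from the extracted features.
from typing import Dict, List, Tuple

AUDIENCE_VALUE = {"public": 0.0, "friends": 0.5, "collections": 0.75, "private": 1.0}
TRUST_THRESHOLD = 0.8   # 4/5 stars on the 0-1 scale
PRS_THRESHOLD = 0.2     # publications with a higher PRS are considered dangerous
EPSILON = 0.05          # assumed value for the epsilon of Eq. (1)

Argument = Tuple[int, str, float]   # (claim, type, support)


def generate_arguments(author_default: str, publication_audience: str,
                       trust_from_tagged: Dict[str, float], prs: float,
                       detected_content: Dict[str, bool],
                       content_counts: Dict[str, int],
                       total_publications: int) -> List[Argument]:
    args: List[Argument] = []

    # Privacy: against publishing when the chosen audience is less restrictive than the
    # author's default audience (negative difference); using the magnitude of the
    # difference as support is our simplification.
    diff = AUDIENCE_VALUE[publication_audience] - AUDIENCE_VALUE[author_default]
    args.append((-1 if diff < 0 else +1, "Privacy", abs(diff)))

    # Trust: one argument per tagged user; positive only at or above the 0.8 threshold.
    for user, trust in trust_from_tagged.items():
        args.append((+1 if trust >= TRUST_THRESHOLD else -1, "Trust", trust))

    # Risk: against publishing whenever the PRS exceeds the 0.2 threshold.
    args.append((-1 if prs > PRS_THRESHOLD else +1, "Risk", prs))

    # Content: always against publishing when a sensitive category is detected,
    # with the user's preference value v(t) of Eq. (1) as support.
    for category, detected in detected_content.items():
        if detected:
            v = max(1 - content_counts.get(category, 0) / max(total_publications, 1), EPSILON)
            args.append((-1, f"Content({category.capitalize()})", v))

    return args
```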
Once all the arguments are generated, an aggregation of the scores of every argument is carried out to compute the acceptable set, as explained in [15]. Using complete semantics, only one of the extensions of arguments, either in favour of or against sharing the publication, can be accepted. In the case of accepting the extension of arguments positioned against making the publication, automatically generated explanations will be shown to the author.

3.3. Explanations

In this method, we propose the use of templates to generate the explanations from the set of acceptable arguments. Table 1 shows how each type of computational argument is translated into natural language so that it can be correctly interpreted by human users. Therefore, as many explanations as there are computational arguments in the acceptable set will be generated and displayed to the author. In the next section, we provide an example of the automatic generation of explanations in the PESEDIA network.

Type of Argument       Explanation Generated
Privacy                The publication is going to be read by... (no one. / your friends. / a collection of friends. / all the users.)
Trust                  Some of the people you mention might get upset.
Risk                   Your publication may be read by unknown people.
Content(Location)      You can be revealing information about where you are or where you're going.
Content(Medical)       You may be publishing private medical information.
Content(Drugs)         People might think you're on drugs/alcohol.
Content(Personal)      You could be publishing sensitive personal data.
Content(Relatives)     You could be making public information related to family or friends.
Content(Offensive)     Your publication might offend the people who read it.
Table 1. Explanations automatically generated by our system.

4. Case Study

A brief example of the proposed method for automatically generating explanations is depicted in Figures 3 and 5. For this example, we have considered a user profile publicly sharing a message that contains several privacy violations (Figure 3).

Figure 3. Example of a publication triggering the argumentation system.

The author has his/her default target audience configured with the public option. Additionally, his/her tagged friend, Juan, has rated the relationship with the author with only three stars. From all the features retrieved in this scenario (Table 2), one computational argument in favour of making the publication (i.e., Privacy) and four computational arguments against sharing the content (i.e., Risk, Trust, Content(Location) and Content(Personal)) will be generated by the argumentation system.

Feature                User Preferences Data    Publication Data
Privacy                Public                   Public
Trust                  -                        0.6 (Juan to Author)
Risk                   -                        0.75
Content(Location)      0.4                      X
Content(Medical)       1                        -
Content(Drugs)         0.9                      -
Content(Personal)      0.8                      X
Content(Relatives)     0.9                      -
Content(Offensive)     1                        -
Table 2. Features retrieved by the feature extraction module in the case study scenario.

Figure 4. Argumentation graph generated by the argument generation module in the case study scenario. The arguments in the graph are (+1, Privacy, 1), (-1, Trust, 0.6), (-1, Risk, 0.75), (-1, Location, 0.4) and (-1, Personal, 0.8).
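For illustration, the argument tuples shown in Figure 4 (written here with their full Content(...) type names) can be mapped to the natural-language templates of Table 1 with a simple lookup. The sketch below is our own code: it simply reproduces, for the acceptable arguments against publishing, the kind of explanations shown later in Figure 5.

```python
# The five argument tuples of Figure 4, written with the (claim, type, support)
# convention sketched in Section 3.2 (our own code, for illustration only).
case_study_arguments = [
    (+1, "Privacy", 1.0),
    (-1, "Trust", 0.6),
    (-1, "Risk", 0.75),
    (-1, "Content(Location)", 0.4),
    (-1, "Content(Personal)", 0.8),
]

# Template mapping of Table 1 for the argument types appearing in the acceptable set.
TEMPLATES = {
    "Trust": "Some of the people you mention might get upset.",
    "Risk": "Your publication may be read by unknown people.",
    "Content(Location)": "You can be revealing information about where you are or where you're going.",
    "Content(Personal)": "You could be publishing sensitive personal data.",
}

# In this scenario the acceptable extension is the one against publishing, so every
# negative argument produces one explanation for the author (cf. Figure 5).
for claim, arg_type, support in case_study_arguments:
    if claim == -1:
        print(TEMPLATES[arg_type])
```

Running the sketch prints the four explanations corresponding to the Trust, Risk, Content(Location) and Content(Personal) arguments.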
The corresponding argumentation graph generated by the argumentation system in this scenario is depicted in Figure 4. With this example it is possible to observe the important difference between Privacy and Risk arguments: although the author has configured the publication with a privacy policy coherent with his/her preferences, a Risk argument against making the publication is generated, since it is hard to predict which audience the publication will actually reach.

Once the inner procedure of the argumentation system has determined the set of acceptable arguments (against making the publication), in order to prevent the potential privacy violation, explanations are automatically generated (Figure 5). The author can respond to the generated explanations by accepting them and modifying the original content, by asking for more reasons, or by ignoring them. Therefore, the final decision still relies on the user, but at least it will be made with a clear view of its potential consequences.

Figure 5. Example of automatically generated explanations. An explanation generated from a personal content argument (left) and an explanation generated from a trust argument (right) are depicted.

5. Discussion

In this work, we have proposed a method to automatically generate explanations to prevent privacy violations. For this purpose, an educational social network and an underlying argumentation framework for social networks have been used. The proposed method uses all the available information from the social network and the mechanisms provided by the argumentation framework to properly generate explanations for preventing a potential privacy violation. A case study has also been presented to illustrate the proposed method in a realistic situation.

We foresee several improvements as future work. In the current method, except for privacy arguments (where the audience is considered to generate the explanation), all the explanations depend on nothing other than the type of the argument. It would be interesting to explore the possibility of generating different explanations for the same type of argument depending on its strength. Another interesting line of future work would be to consider not only text, but also image data shared in the network, since data protection rules are even stricter for videos and photos, and many privacy violations are made by these means. Finally, to complement the improvements presented above, we are also considering adding a new type of argument based on the potential legal consequences of sharing some specific content. This new type of argument and its subsequent explanation would be useful in trying to persuade the author of a publication to modify it, warning of the possible legal consequences of sharing it.

Acknowledgements

This work is partially supported by the Spanish Government project TIN2017-89156-R, the FPI grant BES-2015-074498, and the Valencian Government project PROMETEO/2018/002.

References

[1] J. Alemany. PESEDIA: Red social para concienciar en privacidad. 2016.
[2] J. Alemany, E. del Val, and A. García-Fornes. Empowering users regarding the sensitivity of their data in social networks through nudge mechanisms. In Proceedings of the 53rd Hawaii International Conference on System Sciences (in press), 2020.
[3] J. Alemany, E. del Val, J. Alberola, and A. García-Fornes. Estimation of privacy risk through centrality metrics. Future Generation Computer Systems, 82:63–76, 2018.
[4] Michał Araszkiewicz and Grzegorz J. Nalepa. Explainability of formal models of argumentation applied to legal domain.
[5] Trevor Bench-Capon. Before and after Dung: Argumentation in AI and law. Argument & Computation, pages 1–18, 2019.
[6] Trevor Bench-Capon, Henry Prakken, and Giovanni Sartor. Argumentation in legal reasoning. In Argumentation in Artificial Intelligence, pages 363–382. Springer, 2009.
[7] Aylin Caliskan Islam, Jonathan Walsh, and Rachel Greenstadt. Privacy Detective: Detecting private information and collective privacy behavior in a large social network. In Proceedings of the 13th Workshop on Privacy in the Electronic Society, pages 35–46. ACM, 2014.
[8] Cash Costello. Elgg 1.8 Social Networking. Packt Publishing Ltd, 2012.
[9] eMarketer. Number of social network users worldwide from 2010 to 2021 (in billions). Statista. https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/, 2019.
[10] Ricard L. Fogues, Pradeep K. Murukannaiah, Jose M. Such, and Munindar P. Singh. Sharing policies in multiuser privacy scenarios: Incorporating context, preferences, and arguments in decision making. ACM Transactions on Computer-Human Interaction (TOCHI), 24(1):5, 2017.
[11] Ricard L. Fogues, Pradeep Murukannaiah, Jose M. Such, Agustin Espinosa, Ana Garcia-Fornes, and Munindar Singh. Argumentation for multi-party privacy management. 2015.
[12] Nadin Kökciyan, Nefise Yaglikci, and Pinar Yolum. An argumentation approach for resolving privacy disputes in online social networks. ACM Transactions on Internet Technology (TOIT), 17(3):27, 2017.
[13] Yavuz Mester, Nadin Kökciyan, and Pınar Yolum. Negotiating privacy constraints in online social networks. In International Workshop on Multiagent Foundations of Social Computing, pages 112–129. Springer, 2015.
[14] Primal Pappachan, Roberto Yus, Prajit Kumar Das, Tim Finin, Eduardo Mena, Anupam Joshi, et al. A semantic context-aware privacy model for FaceBlock. In Second International Workshop on Society, Privacy and the Semantic Web: Policy and Technology (PrivOn 2014), Riva del Garda, Italy, 2014.
[15] Ramon Ruiz-Dolz, Stella Heras, J. Alemany, and Ana García-Fornes. Towards an argumentation system for assisting users with privacy management in online social networks. In Proceedings of the 19th Workshop on Computational Models of Natural Argument, pages 17–28, 2019.
[16] Katharine Sarikakis and Lisa Winter. Social media users' legal consciousness about privacy. Social Media + Society, 3(1):2056305117695325, 2017.
[17] Elizabeth I. Sklar and Mohammad Q. Azhar. Explanation through argumentation. In Proceedings of the 6th International Conference on Human-Agent Interaction, pages 277–285. ACM, 2018.
[18] Anna C. Squicciarini, Federica Paci, and Smitha Sundareswaran. PriMa: A comprehensive approach to privacy protection in social network sites. Annals of Telecommunications, 69(1-2):21–36, 2014.
[19] Anna C. Squicciarini, Heng Xu, and Xiaolong Zhang. CoPE: Enabling collaborative privacy management in online social networks. Journal of the American Society for Information Science and Technology, 62(3):521–534, 2011.
[20] Kurt Thomas, Chris Grier, and David M. Nicol. unFriendly: Multi-party privacy risks in social networks. In International Symposium on Privacy Enhancing Technologies, pages 236–252. Springer, 2010.
[21] Yang Wang, Gregory Norcie, Saranga Komanduri, Alessandro Acquisti, Pedro Giovanni Leon, and Lorrie Faith Cranor. "I regretted the minute I pressed share": A qualitative study of regrets on Facebook. In Proceedings of the Seventh Symposium on Usable Privacy and Security, page 10. ACM, 2011.
[22] Ryan Wishart, Domenico Corapi, Srdjan Marinovic, and Morris Sloman. Collaborative privacy policy authoring in a social networking context. In 2010 IEEE International Symposium on Policies for Distributed Systems and Networks, pages 1–8. IEEE, 2010.