A visual privacy tool to help users in preserving social network data
Stefano Cirillo1, Domenico Desiato2,*, Michele Scalera2 and Giandomenico Solimando1
1 Department of Computer Science, University of Salerno, via Giovanni Paolo II n.132, 84084 Fisciano (SA), Italy
2 Department of Computer Science, University of Bari Aldo Moro, via Edoardo Orabona n.4, 70125 Bari (BA), Italy
IS-EUD 2023: 9th International Symposium on End-User Development, 6-8 June 2023, Cagliari, Italy
* Corresponding author.
Email: scirillo@unisa.it (S. Cirillo); domenico.desiato@uniba.it (D. Desiato); michele.scalera@uniba.it (M. Scalera); gsolimando@unisa.it (G. Solimando)
ORCID: 0000-0003-0201-2753 (S. Cirillo); 0000-0002-2455-2032 (M. Scalera); 0009-0000-6627-8820 (G. Solimando)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Abstract
In the current era, social network platforms are increasingly important, especially for disseminating data about virtual lives that, in most cases, are tightly coupled with real ones. For example, social networks let us share emotions and ways of thinking, connect with people worldwide, find a job, and so on. However, to access the virtual world, users need to register data that, in most cases, uniquely identify them. Hence the need to make users aware of the privacy issues that may occur when the large amount of data spread over social network platforms is mismanaged. In this work, we propose a visual privacy framework that improves users’ awareness of how their data are disseminated over social network platforms. Moreover, we define interactive visual metaphors that let users understand which kind of information they share and how to manage information disseminated over different social network platforms.
Keywords
Data wrapping, Data reconstruction, Privacy, Social Networks, Data Analysis
1. Introduction
Social networks play a crucial role in human interactions because they enable people to share many kinds of content, such as emotions, ways of thinking, points of view, and so on. Moreover, many people have profiles on several social network platforms, where they share a vast amount of information. From this perspective, preserving users’ privacy is a challenge for social network platforms, since they cannot afford to put the privacy of their users at risk [1].
Users share information massively on social networks; often they do not privatize their data and are unaware of the privacy threats to which they are exposed. Furthermore, the increasing number of users with social network profiles makes it necessary to monitor how they manage their privacy, especially when they have multiple profiles. Multiple studies have analyzed data privacy in the social network domain [2, 3], but few of them have provided tools to improve users’ awareness when they share data over social network platforms [4, 5, 6]. In our work, we perform a cross-social network analysis over several platforms to understand which information is most frequently shared over social networks and can jeopardize users’ privacy [7, 8, 9]. To this end, we define interactive visual metaphors that let users understand which kind of information they share and how to manage information disseminated over different social network platforms.
In our proposal, we define a visual tool on top of the SOcial Data Analyzer (SODA) proposed in [10]. The latter can find and extract the available information about users on different platforms starting only from their photos. In particular, SODA allowed us to perform an accurate analysis to reveal privacy threats linked to improper data sharing in social networks.
Furthermore, SODA also allowed us to evaluate the sensitivity of the information shared by users and to perform an exhaustive analysis to understand how users’ data can be reconstructed from social networks even if some of them are privatized on other platforms. The proposed visual tool is independent of the privacy settings offered by social networks, since it simulates the search of a real user and retrieves the data publicly available in social network profiles. In other words, if a user has privatized specific information on a specific social network, our visual tool is not able to retrieve that information from it. However, if the user has left some information public on other social networks, the proposed tool retrieves it. Thus, our visual tool can help users manage the privacy settings offered by social network platforms.
In summary, the main contributions of our study are (i) a new visual tool capable of managing users’ data from different social network platforms, and (ii) visual metaphors that support a detailed analysis of users’ data extracted from different social networks, with the aim of evaluating their privacy and improving their awareness of privacy threats in social network platforms.
The paper is organized as follows. Section 2 describes related work, whereas Section 3 presents the architecture of the proposed visual tool. Section 4 presents data reconstruction through multiple social networks, and Section 5 describes the experimental evaluation. Finally, conclusions and future research directions are discussed in Section 6.
2. Related work
This section discusses relevant articles that address privacy preservation in social networks in order to evaluate the risks connected to personal user data. In the context of privacy preservation for data shared on social network platforms, several approaches define strategies to make users aware of the privacy issues linked to the data they post.
In [11], the authors define a new approach for helping social media users evaluate their privacy disclosure score (PDS). They assess the PDS by taking into account user data shared across multiple social networking sites. In addition, they identify sensitivity and visibility as the main factors that significantly impact user privacy and use them to derive the PDS of each user. The proposed approach exploits statistical and fuzzy systems to specify the potential information loss derived from the PDS. To perform their analysis, the authors examined data concerning 15 users registered on different social networks (Facebook, ResearchGate, LinkedIn, and Google+). The main differences with respect to our work are the methodology used for collecting data and the analysis performed on them, i.e., the number of examined users and the social networks considered.
Social network data represent a rich source of information, mainly when they characterize users, and malicious users can jeopardize a user’s privacy by performing targeted attacks to recover sensitive information. In [12], the authors define two modes of users’ private information disclosure behavior: voluntary sharing and mandatory provision. They exploit the Communication Privacy Management theory to build a framework that explains the impact of individual characteristics, context, and benefit-risk ratio on the user’s willingness to disclose voluntarily or mandatorily. The authors show that voluntary sharing is more likely to be driven by positive factors, such as perceived benefits, social network size, and customization. At the same time, mandatory provision is affected by individual characteristics such as age, privacy policy, and perceived risks. They highlight that perceived risk has less impact on voluntary sharing than previous studies suggested.
Concerning machine learning applications to preserve privacy in social network contexts, [13] reports a comprehensive survey of multiple applications of social network analysis using robust machine learning algorithms. In [14], the authors defined a privacy preservation algorithm that combines supervised and unsupervised machine learning anomaly detection techniques with access control models. They evaluated the algorithm over real datasets, achieving over 95% accuracy using a Bayesian classifier and 95.53% using deep neural networks. In [15], the authors perform a depression analysis using machine learning approaches over Facebook data collected from an online public source. They evaluated the efficiency of their method using a set of various psycholinguistic features. The authors show that their method can significantly improve accuracy and the classification error rate, and that the Decision Tree obtains higher accuracy than the other machine learning approaches in detecting users’ depression. Finally, a recent study used social network data about people to find Multi-SIM subscribers within the same operator or between operators, in order to improve campaigns and churn prediction models for Telecom customers [16].
3. Visual Social Network Privacy Tool
As previously introduced, we have designed a visual interactive tool on top of SODA [10]. In particular, the tool combines the effectiveness of SODA with a tool named Profil3r, which is capable of finding the URLs of people’s profiles on different social network sites, websites, and web applications¹. Profil3r is an OSINT tool that can be executed through a command-line interface. However, tools of this type can be challenging to use, especially for non-expert users, since they cannot provide direct and clear feedback due to the lack of a graphical interface.
For example, a command-line program can be complex because it requires learning the correct syntax of the command, which often needs several parameters. In this paper, we have chosen to integrate a lite version of SODA, limited to reconstructing information only from Facebook and Instagram, with Profil3r, which, on the other hand, is limited to finding the URLs related to a user on different social network sites and websites starting from a few basic pieces of information. It is important to note that the original version of Profil3r cannot extract information from the URLs linked to a user. This functionality has been added to Profil3r through SODA. However, SODA requires as mandatory input an image of a user and/or general information such as his/her name or surname. Without these inputs, SODA is not able to operate.
¹ Official Repository: https://github.com/Greyjedix/Profil3r
Figure 1: Overview of the architecture underlying the proposed visual social network privacy tool. (Labels in the figure: User, User Data, Data Reconstructed, Selenium Browser at http://localhost:3000/index.html, Username Generator, Selenium Driver, Profil3r, Social URL Generator, SODA Lite, Facebook Crawler with www.facebook.com/[user]/, Instagram Crawler with www.instagram.com/[user]/, Instaloader.)
Figure 1 shows an overview of the architecture of the proposed tool. As we can see, the tool starts from a set of data specified according to the input parameters defined by Profil3r. Then, it performs a first-level search on the web to find URLs of the social profiles of the user who requested the analysis. Once the search is complete, the URLs extracted by Profil3r are filtered to keep only those related to Instagram and Facebook. Starting from these, it is possible to execute the lite version of SODA. It is important to note that we have re-designed all the input modules of SODA to work using only a link to a user’s profile.
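The filtering step described above, which keeps only the Facebook and Instagram URLs among those found by Profil3r, can be illustrated with a minimal Python sketch. It is not the tool's actual code: the function name `filter_profile_urls`, the `SUPPORTED_DOMAINS` mapping, and the sample URLs are assumptions introduced here for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: the paper only states that Facebook and
# Instagram URLs are kept; the exact matching rule is an assumption.
SUPPORTED_DOMAINS = {"facebook.com": "facebook", "instagram.com": "instagram"}


def filter_profile_urls(urls):
    """Group URLs by supported platform and drop all the others."""
    kept = {name: [] for name in SUPPORTED_DOMAINS.values()}
    for url in urls:
        # hostname is None for strings without a scheme; treat them as unknown.
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        platform = SUPPORTED_DOMAINS.get(host)
        if platform is not None:
            kept[platform].append(url)
    return kept


# Made-up URLs standing in for the output of the URL search.
found = [
    "https://www.facebook.com/jane.doe/",
    "https://www.instagram.com/jane.doe/",
    "https://twitter.com/jane_doe",
    "https://github.com/janedoe",
]
profiles = filter_profile_urls(found)
```

Matching on the parsed host name rather than on a substring of the whole URL avoids keeping unrelated pages whose path merely contains the text "facebook.com".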
More specifically, SODA receives both Instagram and Facebook URLs, visits these web pages, and extracts the publicly available user information from them using two focused crawlers, i.e., the Facebook Crawler and the Instagram Crawler, respectively. Furthermore, the proposed tool exploits the Instaloader² framework to extend the set of information that can be reconstructed. In fact, by exploiting Instaloader, the proposed tool can retrieve additional information from a public profile, such as hashtags, user stories, geotags, and post captions. Finally, the extracted information is displayed through an interactive interface to help users properly manage their social network data.
4. Data reconstruction through multiple social networks
This section presents a cross-social evaluation to show the tool’s effectiveness in analyzing sensitive data shared on various social networks. The collected data, the experimental evaluation of the analyzed user data, and the performance of the proposed tool in terms of extrapolated attributes are presented below. The experimental evaluation involved a set of real users who were unaware of the privacy threats linked to sharing information over social networks. All users involved in our experimentation used the tool only for personal purposes and with full awareness of its potential functions. Through the proposed tool, a user is able to understand which information can be reconstructed from social networks, despite any privacy settings.
Figure 2 shows the interface defined for the proposed visual privacy tool, which is provided to users for evaluating their privacy. In the upper part of Figure 2, the user can decide which social network s/he wants to analyze by selecting Facebook, Instagram, or both. Based on this choice, s/he provides his/her data, such as first name, last name and username, in order to access the web platform.
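To make the step from the data entered in the interface to candidate profile URLs more concrete, the following Python sketch derives possible usernames from a first and last name and expands them with the URL patterns shown in Figure 1. This is a simplified illustration and not Profil3r's actual algorithm: the separators, the function names, and the `https://` scheme are assumptions made here.

```python
from itertools import permutations

# URL patterns as labelled in Figure 1; the https scheme is an assumption.
URL_PATTERNS = {
    "facebook": "https://www.facebook.com/{user}/",
    "instagram": "https://www.instagram.com/{user}/",
}
# Hypothetical separators used to join the name parts.
SEPARATORS = ("", ".", "_")


def candidate_usernames(first_name, last_name, username=None):
    """Return de-duplicated candidate usernames, preserving insertion order."""
    parts = [first_name.strip().lower(), last_name.strip().lower()]
    candidates = []
    if username:
        # A username supplied by the user is tried first.
        candidates.append(username.strip().lower())
    for a, b in permutations(parts, 2):
        for sep in SEPARATORS:
            candidates.append(f"{a}{sep}{b}")
    return list(dict.fromkeys(candidates))


def candidate_urls(usernames, platforms):
    """Expand each candidate username into a profile URL per selected platform."""
    return {
        platform: [URL_PATTERNS[platform].format(user=u) for u in usernames]
        for platform in platforms
    }


names = candidate_usernames("Jane", "Doe", "jdoe")
urls = candidate_urls(names, ["instagram"])
```

In a sketch like this, each candidate URL would still have to be checked for existence and confirmed by the user, which matches the step in which the user selects the link to his/her own account.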
Submitting the data triggers the execution of Profil3r-based modules, which are capable of finding the URLs of the user’s profiles on Facebook and Instagram, respectively. After their execution, the user selects the link to his/her account. User information is then identified and collected by executing the lite version of SODA. The latter visits the web pages and extracts the publicly available user information from the web platforms using two crawlers. The extracted information belongs to the various information sections of the platforms. For example, Facebook could provide data concerning work, education, places lived, family, relationships and personal contacts, whereas Instagram could provide data concerning biography, personal site and publicly visible posts. The data are shown in tables to help users easily view the publicly available information that was extracted. Moreover, to help users identify sensitive information, visual labels indicate whether the extracted data could violate users’ privacy, either at an individual or at an aggregated level. At the bottom of Figure 2, two additional sections show the posts and publicly available comments extracted via Instaloader, together with the locations attached to the posts. The latter section uses an interactive geographical map to show the user a history of the places s/he has visited.
² Official Repository: https://github.com/instaloader
Figure 2: Overview of the interface of the proposed visual social network privacy tool.
5. Experimental Evaluation
This section reports an experimental evaluation to verify the effectiveness of the proposed data reconstruction tool. In particular, we conducted a user study involving several participants. The user evaluation was performed in a research laboratory, where users accessed a pre-configured computer with the proposed tool installed.
The study consisted of three phases: an initial survey, a task to be addressed, and a final survey. The initial survey aimed to assess the following aspects: (i) how concerned users are about security and privacy, (ii) what behaviour users adopt when sharing information on social networks, and (iii) which of the shared data users consider to be sensitive. The objective of the final survey, instead, was to evaluate the participants’ experiences and opinions about the proposed tool. In particular, we involved 10 participants in the user study, comprising individuals of different ages, educational backgrounds, and levels of social media usage. Moreover, the study involved students and employees of the University of Salerno. These participants were informed about the study’s objectives and methods, and they gave their informed consent before participating. Participants were given access to the tool and guided to provide limited personal information, such as their name and surname. The tool then explored publicly available data from different social networks to reconstruct the public information of the participants. The task submitted to users lasted 5 minutes.
The experiment started with an explanation of the task and the tool to the participants. Then, they were introduced to the purpose of the experiment, i.e., assessing users’ awareness through the tool. In addition, they were given a release form attesting that they were aware of the purpose of the experiment and of the possible reconstruction of sensitive information. Once the preliminary phase was completed, users were asked to log in using their social login information to extract the data shared on the platforms. After using the tool, participants were asked to complete a final survey consisting of Likert scale questions and open-ended prompts. The Likert scale questions assessed participants’ perceptions of the tool’s usefulness.
The open-ended prompts allowed participants to provide qualitative feedback on their experience and suggest potential improvements. The purpose of this survey was to collect feedback from participants about the tool’s effectiveness, their satisfaction with the reconstructed data, and their willingness to use the tool in the future. Moreover, participants were encouraged to provide additional feedback or suggestions for enhancing the tool’s performance. It is important to note that the initial and the post-task questionnaires share several questions, in order to monitor whether the users’ privacy perception changed after using the proposed tool.
The collected survey data were analyzed using both quantitative and qualitative methods. The Likert scale responses were subjected to statistical analysis to determine the average ratings for each aspect of the tool. Open-ended responses from the initial and final surveys were analyzed using thematic analysis to identify common patterns within the participants’ feedback. The experiment revealed that users expressed concerns after using our tool. In particular, they noted that certain information they had deemed non-sensitive in the first survey could jeopardize their privacy. Moreover, users highlighted that the tool could be used to support users in understanding how personal information is spread and how it can be reconstructed from different social platforms. Finally, based on the final survey results, many users expressed concern about the tool’s capability to track the locations they visited through Instagram posts.
6. Conclusion
In our work, we defined a visual social network privacy tool that helps users manage their data over social network platforms. In particular, we performed a cross-social evaluation of users’ data to help them figure out the sensitivity of their data. In the future, we would like to collect more data concerning users by integrating information from other social networks.
Acknowledgments
This publication was produced with the co-funding of the European Union - Next Generation EU: NRRP Initiative, Mission 4, Component 2, Investment 1.3 – Partnerships extended to universities, research centers, companies and research D.D. MUR n. 341 of 5.03.2022 – Next Generation EU (PE0000014 - "Security and Rights In the CyberSpace - SERICS" - CUP: H93C22000620001).
References
[1] M. T. Baldassarre, V. S. Barletta, D. Caivano, A. Piccinno, M. Scalera, Privacy knowledge base for supporting decision-making in software development, in: Sense, Feel, Design: INTERACT 2021 IFIP TC 13 Workshops, Bari, Italy, August 30–September 3, 2021, Springer, 2022, pp. 147–157.
[2] M. T. Baldassarre, V. S. Barletta, D. Caivano, A. Piccinno, Integrating security and privacy in HCD-Scrum, in: CHItaly 2021: 14th Biannual Conference of the Italian SIGCHI Chapter, 2021, pp. 1–5.
[3] L. Caruccio, D. Desiato, G. Polese, Fake account identification in social networks, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 5078–5085.
[4] S. Cirillo, D. Desiato, B. Breve, Chravat-chronology awareness visual analytic tool, in: 2019 23rd International Conference Information Visualisation (IV), IEEE, 2019, pp. 255–260.
[5] B. Breve, L. Caruccio, S. Cirillo, D. Desiato, V. Deufemia, G. Polese, Enhancing user awareness during internet browsing, in: ITASEC, 2020, pp. 71–81.
[6] V. S. Barletta, G. Desolda, D. Gigante, R. Lanzilotti, M. Saltarella, From GDPR to privacy design patterns: The MATERIALIST framework, in: S. D. C. di Vimercati, P. Samarati (Eds.), Proceedings of the 19th International Conference on Security and Cryptography, SECRYPT 2022, Lisbon, Portugal, July 11-13, 2022, SCITEPRESS, 2022, pp. 642–648.
[7] D. Desiato, G. Tortora, A methodology for GDPR compliant data processing, in: SEBD, volume 2161, 2018, pp. 1–4.
[8] L. Caruccio, D. Desiato, G. Polese, G. Tortora, GDPR compliant information confidentiality preservation in big data processing, IEEE Access 8 (2020) 205034–205050.
[9] L. Caruccio, D. Desiato, G. Polese, G. Tortora, N. Zannone, A decision-support framework for data anonymization with application to machine learning processes, Information Sciences 613 (2022) 1–32.
[10] F. Cerruto, S. Cirillo, D. Desiato, S. M. Gambardella, G. Polese, Social network data analysis to highlight privacy threats in sharing data, Journal of Big Data 9 (2022) 1–26.
[11] E. Aghasian, S. Garg, L. Gao, S. Yu, J. Montgomery, Scoring users’ privacy disclosure across multiple online social networks, IEEE Access 5 (2017) 13118–13130.
[12] K. Li, L. Cheng, C.-I. Teng, Voluntary sharing and mandatory provision: Private information disclosure on social networking sites, Information Processing & Management 57 (2020) 102128.
[13] T. Balaji, C. S. R. Annavarapu, A. Bablani, Machine learning algorithms for social media analysis: A survey, Computer Science Review 40 (2021) 100395.
[14] R. Aljably, Y. Tian, M. Al-Rodhaan, Preserving privacy in multimedia social networks using machine learning anomaly detection, Security and Communication Networks 2020 (2020).
[15] M. R. Islam, M. A. Kabir, A. Ahmed, A. R. M. Kamal, H. Wang, A. Ulhaq, Depression detection from social network data using machine learning techniques, Health Information Science and Systems 6 (2018) 1–12.
[16] N. R. Al-Molhem, Y. Rahal, M. Dakkak, Social network analysis in telecom data, Journal of Big Data 6 (2019) 1–17.