=Paper=
{{Paper
|id=Vol-3618/dc_paper_3
|storemode=property
|title=Privacy-compliant software reuse: A framework for considering privacy compliance in software reuse scenarios
|pdfUrl=https://ceur-ws.org/Vol-3618/dc_paper_3.pdf
|volume=Vol-3618
|authors=Jenny Guber
|dblpUrl=https://dblp.org/rec/conf/er/Guber23
}}
==Privacy-compliant software reuse: A framework for considering privacy compliance in software reuse scenarios==
Jenny Guber, Department of Information Systems, University of Haifa, Haifa, Israel

Abstract: In recent years, privacy-compliant software development has become an important topic, especially with the emergence of the EU General Data Protection Regulation (GDPR). Existing practices of software development challenge privacy compliance by increasingly promoting the reuse, adaptation and integration of existing software artifacts from organizational or open-source repositories. Methods and approaches have been introduced to accelerate and improve development through reuse on the one hand, and to mitigate threats related to data privacy on the other. However, the operationalization of this body of knowledge for developing systems that intensively reuse software artifacts is understudied. Moreover, the ontologies, taxonomies and frameworks developed to conceptualize, organize and model privacy requirements focus on forward engineering activities (software design and development) and are less oriented toward application in existing software projects and artifacts that are considered for reuse and integration. The aim of this research is to create a framework to investigate, explore and guide privacy-compliant software reuse, especially in open-source environments. To this end, we will follow a design science approach whose main artifact will be a privacy compliance assessment method. The method will be developed in three steps: (1) systematically reviewing and analyzing the state of the art in privacy-compliant software reuse; (2) empirically studying open-source repositories (in particular, GitHub) for privacy discussions, including an ontology-based machine learning method for identifying privacy discussions; and (3) developing and evaluating a privacy assessment method for supporting reuse decisions, utilizing existing models and frameworks.
Keywords: Privacy; Software Reuse; Compliance; Software Development; Open-Source; GDPR

ER2023: Companion Proceedings of the 42nd International Conference on Conceptual Modeling: ER Forum, 7th SCME, Project Exhibitions, Posters and Demos, and Doctoral Consortium, November 06-09, 2023, Lisbon, Portugal. jguber@campus.haifa.ac.il (J. Guber), ORCID 0000-0002-2585-6601 (J. Guber). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction and motivation

1.1. Privacy regulations, strategies and technologies

Privacy is a fundamental human right. In the digital world, we depend on the trustworthy functioning of information and communication technologies on the one hand, and experience a growing imbalance of power between data-processing entities and the individuals whose data is at stake on the other [1]. To protect individuals' right to privacy, several regulations and standards have been established, with the General Data Protection Regulation (GDPR) [2] being the most studied. Published in 2016 and enforced in Europe since May 2018, the GDPR has introduced several meaningful changes [3], expanding the scope of data protection and the definition of personal data. The evolution of information technologies and the growing concern for privacy and data protection have yielded the development of Privacy Enhancing Technologies (PETs) for assessing and mitigating privacy risks. PETs protect the individual's privacy using technical means, such as encryption, anonymous communication protocols, attribute-based access and private database querying [4].
To realize their full benefit for privacy and data protection, PETs need to be introduced in the initial stages of system development, i.e., design, rather than added in late development stages [1], [5]. The Privacy-by-Design (PbD) approach, sometimes referred to as "Data Protection by Design", was developed [6] with the aim of ensuring the data protection of individuals by integrating privacy considerations from the outset of the development of products, services, business practices and infrastructures. PbD can be supported through eight privacy design strategies [7]: minimize, hide, separate, aggregate, inform, control, enforce and demonstrate. While the PbD approach is already incorporated into industry standards and practices and the importance of PETs is recognized, these techniques and approaches focus mainly on forward engineering activities, and the privacy compliance of existing artifacts is less studied.

1.2. Privacy compliance in software engineering

Even before the GDPR came into force, it was imperative for organizations to consider privacy compliance at the initial stages of software development [8]. However, software reuse raises additional challenges to privacy compliance. While the existing privacy compliance approaches mainly reside in forward engineering activities, i.e., designing and building privacy-compliant software components [9], assessing the compliance of existing artifacts and integrating them into an existing system in a privacy-compliant manner are still challenging. These tasks require reverse engineering of privacy requirements and analysis of the applied privacy strategies and PETs. The effort required to adapt privacy-compliant software reuse methods to different (and evolving) privacy regulations should be further investigated. In addition, objectively measuring the level of privacy compliance may assist in improving reuse decisions in general, and in selecting the most appropriate artifacts for reuse in particular.
1.3. Privacy models, ontologies and taxonomies

Over the years, several ontologies and taxonomies have been proposed to conceptualize, organize and model privacy requirements. Two recent examples are the ontology by Gharib et al. [10] and the taxonomy by Sangaroonsilp et al. [11]. Conducting a systematic literature review, Gharib et al. identified 55 concepts and relations, grouped into four main categories: organizational concepts (including agentive, intentional and informational entities, as well as entities' interactions), risk, treatment, and privacy [requirement] concepts. Sangaroonsilp et al. developed a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks: GDPR, ISO/IEC 29100, the Thailand Personal Data Protection Act (PDPA) and the Asia-Pacific Economic Cooperation (APEC) privacy framework. The taxonomy includes seven categories (User Participation, Notice, User Desirability, Data Processing, Breach, Complaint/Request and Security), their sub-categories and the specific privacy requirements, which may belong to multiple categories and sub-categories. In addition, to facilitate and realize the Data Protection by Design approach, a few projects have been initiated by individuals and by the EU to create frameworks for privacy compliance. One example is the privacy threat analysis framework LINDDUN [12], which stands for Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness and Non-compliance: the privacy threat types that negate widely accepted privacy properties.
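The correspondence between LINDDUN threat types and the privacy properties they negate, as defined by Deng et al. [12], can be encoded as a simple lookup table. The sketch below reproduces the framework's standard mapping for illustration; it is not part of the cited work's tooling.

```python
# LINDDUN privacy threat categories mapped to the privacy property each
# threat negates, following the framework of Deng et al. [12].
LINDDUN_THREATS = {
    "Linkability": "Unlinkability",
    "Identifiability": "Anonymity / Pseudonymity",
    "Non-repudiation": "Plausible deniability",
    "Detectability": "Undetectability",
    "Disclosure of information": "Confidentiality",
    "Unawareness": "Content awareness",
    "Non-compliance": "Policy and consent compliance",
}

def negated_property(threat):
    """Return the privacy property negated by a given LINDDUN threat."""
    return LINDDUN_THREATS[threat]
```

Such a table makes the threat-elicitation step of a LINDDUN-style analysis explicit: each identified threat points directly at the property a mitigation must restore.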
Additionally, the PDP4E project suggested methods and tools for GDPR compliance through privacy and data protection engineering, implementing those methods in ongoing systematic engineering practices, and the DEFeND project (Data Governance for Supporting GDPR) delivered a platform for assisting organizations in achieving compliance with legal and privacy requirements, focusing on GDPR implementation [13]. However, those frameworks concentrate on incorporating compliance from the beginning of the software development lifecycle; facilitating privacy compliance while performing software reuse remains insufficiently supported.

1.4. Paper Structure

Privacy has gained increasing interest in the last two decades, and a variety of regulations, strategies and technologies have been proposed to address its different aspects. While performing forward engineering activities in software development in a manner compliant with privacy regulations has already been explored by different studies, achieving privacy-compliant software reuse is still understudied. This calls for developing a method that aims to assess and evaluate the privacy level of components and to integrate them into a system in a compliant way. In addition, while modeling through taxonomies, ontologies and frameworks has been underway, these mostly target privacy requirements and concentrate on forward engineering activities, and do not handle assessing the privacy levels of existing software artifacts to support reuse. The rest of this paper is organized as follows. Section 2 presents the research objectives and the research questions. Section 3 presents the related work. Section 4 describes the research methodology and the expected contributions, while Section 5 details the progress achieved so far.

2. Research objectives and questions

2.1. Research objectives

Our main working hypothesis is that software reuse challenges privacy compliance in general and PbD in particular.
Two major gaps identified in the literature serve as the motivation for our research. The first gap regards analyzing, mining, monitoring and tracing privacy requirements. Despite the large corpus of privacy regulations and compliance methods, developers face challenges in operationalizing them [14]. This is particularly noticeable when development includes the reuse of open-source artifacts. Our first objective is to analyze and mine existing artifacts for privacy characteristics by analyzing their meta-data and discussions. The second gap regards the adaptation and integration of artifacts in complex projects that comprise dependent software artifacts [15]. Complex projects may comprise artifacts originating from different sources (e.g., proprietary software, third-party software, open-source repositories) and having different levels of privacy. Our second objective is to analyze, simulate and devise the "aggregate" level of privacy of the entire system and check whether it satisfies the overall privacy requirements, thus performing privacy compliance assessment of both the artifacts and the integrated software.

2.2. Research Questions

To fulfil the above objectives, we consider the following research questions (RQ):

1. RQ1. What is the current state of the art in privacy-compliant software reuse?
• RQ1.1 What are the regulations considered in privacy-compliant software reuse?
• RQ1.2 What are the business and technological domains in which privacy-compliant software reuse is researched?
• RQ1.3 What are the utilized reuse approaches and how do they relate to the reuse landscape?
• RQ1.4 What are the privacy strategies implemented in the context of privacy-compliant software reuse?
• RQ1.5 How is privacy-compliant software reuse evaluated?
• RQ1.6 What are the main open challenges for performing privacy-compliant software reuse?

2. RQ2. How are privacy issues discussed and dealt with in open-source environments?
• RQ2.1 How can privacy issues be identified in open-source environments?
• RQ2.2 To what extent can the identification be improved based on privacy ontologies?
• RQ2.3 To what extent do automatically analyzed sentiments of privacy-related issues correlate with the privacy compliance of those projects?
• RQ2.4 What are the unique privacy-related characteristics of software projects with a high reuse potential?

3. RQ3. How can the privacy compliance of a software artifact be assessed to facilitate and support reuse?
• RQ3.1 How can the privacy compliance of existing software artifacts be assessed?
• RQ3.2 What is the aggregate privacy compliance level when integrating software artifacts?

3. Related work

This section briefly reviews the literature relevant for answering our research questions.

State of the art in privacy-compliant software development (RQ1): A few SLR works on security aspects in software development were performed in the last decade. Among them, the works in [16]–[18] discuss security in cyber-physical systems, electronic health records and the software development lifecycle in general. Differently from these works, our study concentrates on software reuse, which requires assessing and integrating already existing artifacts, evaluating their level of privacy compliance and preserving that level when artifacts are adapted and integrated. Several systematic reviews on software requirements reuse [19] and on non-functional requirements [20], [21], which inherently include privacy requirements, were also performed. However, these works do not concentrate on privacy compliance aspects. The systematic mapping study in [22] deals with privacy-by-design approaches in software engineering and maps the goals of privacy-by-design to software engineering activities. We further aim to analyze the challenges of privacy compliance in software reuse scenarios.
Discussions on privacy aspects in open-source environments (RQ2): Discussions in open-source environments have been researched for different purposes, such as understanding the social potential of those environments, reasoning about the popularity of different projects, and analyzing their maintenance level and required contributions [23]. In the context of privacy and security, issues usually provide an important source of information on a project. The work in [24] identified several end-user human-centric issues discussed on GitHub: Inclusiveness, Privacy & Security, Compatibility, Location & Language, Preference, Satisfaction, Emotional Aspects, and Accessibility. Some works further analyzed the sentiments of comments and discussions in social networks. The work in [25], for example, found that applying emotion mining to developer issue reports can be useful for identifying and monitoring the mood of the development team, and thus for predicting and resolving potential threats to the team's well-being, as well as discovering factors that enhance team productivity. The work in [26] created a domain-specific tool for sentiment analysis of the issues documented in software development, improving the accuracy of the analysis by enriching the classifier with a domain-specific lexicon. The work in [27] reports a positive correlation between favorable sentiments and improved practices in the context of software engineering and development. We intend to advance this body of research by analyzing the relations of the analyzed sentiments to privacy compliance, and to explore whether the meta-data of OSS projects (in particular, issues) can predict privacy compliance. One of the challenges in performing empirical research on privacy issues in OSS is the lack of previously annotated datasets on the subject, impeding the precision and correctness of privacy issue identification and categorization. One approach to overcoming this challenge is utilizing an ontology.
The researchers in [28] conducted text classification experiments with various classifiers, both before and after utilizing a disease ontology, and this integration notably enhanced the outcomes. Moreover, the adoption of a domain-specific ontology has been demonstrated to improve the accuracy of text classification in situations where a sufficiently large and well-labeled training corpus is not at hand [29]. Our research seeks to extrapolate these approaches to the realm of privacy, employing Gharib et al.'s ontology [10].

Privacy compliance assessment (RQ3): Privacy is a multi-faceted concept [30]–[32] that may refer to the social, physical, informational and psychological domains. The methods for assessing privacy levels described in the literature focus mostly on security aspects [33], [34], such as the Common Vulnerability Scoring System (CVSS, https://www.first.org/cvss/), an open scoring system for security vulnerabilities and threats. Previous works analyze privacy and security breaches for a specific domain (smart homes) [35] or a specific type of application (contact tracing) [36]. However, checking the overall privacy compliance of software is challenging. A few works focus on analyzing the software code itself for the implementation of privacy requirements [37], [38]. According to [39], traditional static code analysis-based vulnerability discovery is insufficient for compliance checking of regulatory requirements. A model for automated privacy compliance checking of applications in the cloud is introduced in [40]. Despite the above studies, to the best of our knowledge, there is no end-to-end approach for assessing and measuring the privacy compliance level of software artifacts, and we aim to advance this line of research by creating a privacy compliance assessment method.

4. The research methodology and expected research contributions

Figure 1 depicts the main research activities and their outcomes.
Figure 1: Main research activities and their outcomes

4.1. Systematic literature review of privacy-compliant software reuse

This activity aims to address RQ1 by systematically reviewing and analyzing the existing state of the art in privacy-compliant software development, focusing on reuse scenarios. It follows the guidelines for conducting SLRs by Kitchenham [41], complemented by the guidelines on snowball sampling by Wohlin [42] and the guidelines for study selection in the PRISMA 2020 statement [43]. The SLR search query includes the following concepts, combined with a logical AND between them:
• Concept 1 (the examined aspect): privacy OR "data protection"
• Concept 2 (the examined process): reuse OR reusing OR reusab* OR cloning OR clone OR config*
• Concept 3 (the examined object): software OR product OR system
• Concept 4 (the examined phase): requirement* OR analys* OR analis* OR analyz* OR analiz* OR domain* OR model* OR design*

After retrieving the potential papers to be included in the SLR, inclusion and exclusion criteria are applied. The inclusion criteria are: (1) the paper introduces a technique/method/tool for privacy-compliant software reuse, and (2) it is published in a peer-reviewed journal, conference or workshop. The exclusion criteria are: (1) the paper is too short (3 pages or less), (2) the paper is a variant of another paper in the corpus, (3) the paper is not written in English, (4) the paper is not a primary study, (5) the paper is not directly related to software reuse or data privacy, and (6) the full text of the paper is not accessible. This resulted in a corpus of 61 papers, which was analyzed to address questions RQ1.x (see Section 5.1 for more details).
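The four-concept search string above has to be adapted to the query syntax of each digital library; assembling it programmatically makes such adaptations repeatable. A minimal sketch, with the term lists copied from the concepts above (the helper function itself is an illustration, not part of the SLR protocol):

```python
# Sketch: building the SLR search string by OR-ing the terms of each
# concept group and AND-ing the four groups together.
CONCEPTS = {
    "aspect":  ["privacy", '"data protection"'],
    "process": ["reuse", "reusing", "reusab*", "cloning", "clone", "config*"],
    "object":  ["software", "product", "system"],
    "phase":   ["requirement*", "analys*", "analis*", "analyz*", "analiz*",
                "domain*", "model*", "design*"],
}

def build_query(concepts):
    """OR within a concept group, AND between groups."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in concepts.values()]
    return " AND ".join(groups)

query = build_query(CONCEPTS)
```

The same term lists can then be re-serialized for libraries whose syntax differs, e.g., those requiring field prefixes or different wildcard characters.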
4.2. Empirical exploration of privacy issues in open-source environments

To address RQ2, we apply an empirical approach to a sample of open-source projects, with the purpose of identifying privacy aspects in highly reusable projects as compared to those with a lesser reuse potential. Those aspects may appear in different fields (e.g., issues, comments, commits and pull requests). We aim to explore whether machine learning methods can automatically identify privacy-related issues, and whether such methods can be enhanced by utilizing existing knowledge representations. We assume that, if the automatically computed outcomes can be mapped to a human-generated knowledge base, then large datasets of issues can be analyzed automatically, and decisions, e.g., regarding using or reusing the related projects, can be made automatically or semi-automatically, based on privacy requirements as reflected in issue discussions. As for devising the reuse potential of projects, some meta-data available in open-source repositories refers to the popularity of projects or users, e.g., the numbers of active forks, stars and followers [44], [45]. The authors of [46] have already demonstrated a relation between the popularity of a project and its quality and reuse potential. We plan to analyze the popularity characteristics of a sample of OSS projects, to differentiate between potentially highly reusable and "regular" software projects, and to explore the relations between the privacy characteristics of those two kinds of projects. While initial results have already been achieved and submitted to a conference (see Section 5.2), we intend to extend the datasets and the explored elements beyond issues. In addition, we intend to determine whether the sentiment of the privacy discussions (positive/neutral/negative) and the subjective privacy compliance level of the project are correlated.
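This correlation check (RQ2.3) between discussion sentiment and a project's annotated compliance could, for example, use a simple correlation coefficient. A stdlib sketch with Pearson's r on invented per-project values (the actual statistical techniques will be chosen later in the study):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented per-project values: mean sentiment of privacy discussions
# (-1 negative .. +1 positive) vs. expert-annotated compliance (1/0).
sentiment = [0.6, -0.4, 0.1, 0.8, -0.7, 0.3]
compliant = [1, 0, 1, 1, 0, 1]
r = pearson(sentiment, compliant)  # a positive r suggests a correlation
```

A rank-based coefficient (e.g., Spearman's rho) may be preferable if the sentiment scores are ordinal rather than interval-scaled.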
We plan to conduct sentiment analysis of the identified privacy issues and, independently, to conduct a survey with OSS project owners or privacy experts to annotate the projects in our dataset as [privacy-]compliant or non-compliant. To this end, we plan to base the survey on the privacy design strategies presented in [7]. The results of those two steps will be compared for correlations using statistical techniques.

4.3. Development of the privacy compliance assessment method

This activity will follow the design science approach [47] and will be based on the results of the SLR (addressing RQ1) and the empirical exploration (addressing RQ2). We will develop techniques for privacy compliance assessment of software artifacts and for presenting those assessments to support reuse decisions. An initial input for the privacy compliance assessment method will be the comparison of the privacy requirements for the regulatory compliance of the software artifact with the privacy strategies [7] and/or PETs [48] already implemented in the artifact. Since discovering the already implemented privacy strategies and PETs requires detailed technical documentation of the software project, which is not always available, our method will combine the existing documentation with the community discussions on, and sentiment analysis of, the software artifact. This part of the research has not yet begun, and a few challenges remain to be overcome. The first challenge refers to the definition of the privacy compliance level of software artifacts. While this level may be either qualitative or quantitative, in one or more dimensions, we plan to create a multi-dimensional qualitative scoring method.
The method will include, as a first step, a scoring scale for applying PETs; as a second step, mining of relevant privacy strategies for the specific software artifact, based on the known privacy requirements; next, extraction of already applied PETs, based on the project documentation and meta-data; and finally, derivation of the score for the software artifact's privacy compliance level, based on the previous steps and additional data such as sentiment analysis. An additional challenge is the level at which the privacy compliance level will be calculated, e.g., a project, a component or a function. Currently, our plans refer to whole projects, since most of the data on which we base our analysis is managed at the project level. However, as the privacy compliance levels of the different parts of a project may differ, we will have to check the applicability of such a score at lower levels of granularity (e.g., components and functions). In addition, many reuse scenarios deal with reuse at lower levels of granularity, integrating third-party components, libraries and/or functions into existing projects to resolve existing issues and create additional functionality. The challenge is to assess the impact of those integrations on the overall privacy compliance level of the software project. Finally, the suggested method has to be evaluated for usability and generality, and access to evaluation objects may be challenging [49]. One of the main ways of evaluating a method's usability and usefulness is with experts. To overcome this challenge, we plan to conduct expert evaluation of the method itself, or of its outcomes, with professionals and/or advanced students.

4.4. Expected contributions

The contributions of this work are valuable both for research and for practice. From a research point of view, the research artifacts will enrich the knowledge base with:
A1. A systematic review (RQ1) that will examine the current state of the art and present the results of a comparative analysis of privacy-compliant software reuse methods. Special attention will be given to the reuse approaches and privacy strategies of these methods. The SLR will also identify the contemporary challenges for performing privacy-compliant software reuse and future directions.

A2. An ontology-guided machine-learning method (RQ2) for identifying and classifying privacy-related discussions in the form of issues and additional meta-data items in open-source repositories. The method will also devise the potential privacy compliance of projects, based on the sentiment of the privacy discussions in the OSS and the opinions of experts involved in the projects, for projects with different reuse potential.

A3. A method for assessing the privacy compliance level of software projects (RQ3) that will support the reuse lifecycle and improve the privacy compliance of those projects by making sure that additional software artifacts integrated into the projects do not decrease the current compliance level.

From a practical point of view, we plan to support privacy-compliant software reuse by applying the method in A3 to third-party and open-source artifacts and assessing their privacy compliance level, in order to select the most appropriate artifacts for reuse. This will increase the level of privacy compliance in complex software systems that make extensive reuse of software artifacts.

5. Progress and preliminary results

So far, we have concentrated on the first two research questions, forming the basis for RQ3. Below is a summary of the achievements.

5.1. The current state of the systematic literature review

This part has been completed, and its outcomes were submitted as a paper to the Information and Software Technology (IST) journal. It is currently under minor revision.
We found that the 61 reviewed studies vary in terms of business domains (e.g., healthcare, smart objects and finance) and technological domains (e.g., IoT, mobile, cloud and microservices). Most of the studies do not refer to a specific regulation; those that do refer to the GDPR. Their common purpose is to support benign reuse, most notably through patterns, components & libraries and model-driven engineering, but malicious reuse is also researched, to a lesser extent. A strong emphasis is put on integrating privacy strategies whose goal is building trust and transparency (in particular, inform and demonstrate), while other strategies are studied to a limited extent in the software reuse context. Evaluation is commonly performed through analytical, observational and experimental approaches. We further found that the assessment and operationalization of privacy compliance practices for existing software artifacts is still challenging. The challenges encompass improving the trustworthiness of reused artifacts, ensuring privacy compliance in distributed architectures, bridging the gap between legal regulations and software requirements, enhancing privacy analysis and vulnerability detection, supporting late application of privacy strategies, and developing objective assessments for privacy-compliant software reuse.

5.2. The current state of the empirical exploration of privacy issues

So far, we have studied to what extent machine learning outcomes can be mapped to privacy-related knowledge representations for identifying and categorizing privacy-related issues in open-source environments. We explored a dataset of 2,556 issues from open-source projects. 1,374 of them (about 54%) are issue reports from Jira associated with two large-scale, popular and well-maintained software projects, Chrome and Moodle, used in [11] and annotated as privacy-related. The other 1,182 are issues from six diverse projects on GitHub, annotated by [24] as non-privacy-related.
First, we preprocessed the dataset by means of text cleaning, removing irrelevant parts (e.g., HTML tags, numbers, punctuation marks and stop words) and performing tokenization and lemmatization [50]. We extracted words from URLs that appear in the issues and kept them for the analyses, assuming that some may be meaningful and relevant for classification. Then, we used the YAKE! keyword extractor [51] to extract potential key terms from the preprocessed text. We applied Reduced Error Pruning (REP) tree [52] and Support Vector Machine (SVM) [53] classifiers, chosen because they differ in their underlying principles, learning approaches and decision-making processes. The metrics of the two classifiers were compared to validate that the results are similar in terms of correctly classified items, precision, recall and F-measure, indicating that similar patterns or relationships in the data have been learned. As indicated by the relatively high F-measure values for both classifiers (86.6% for the REP tree and 87.1% for the SVM), we can conclude that the classifiers have the potential to distinguish between privacy and non-privacy issues. Due to the simple, intuitive and visual format of REP tree outcomes, we used them to identify key terms for classifying the issues into one of two classes, privacy-related and non-privacy-related. Lastly, we succeeded in manually mapping the identified key terms to the main concepts of the ontology in [10]. The root term, privacy, appeared in only 20% of the privacy-related issues, so the relatively high accuracy of the classifier cannot be attributed to this term alone. The terms user, datum and setting can be considered organizational; tool policy and tool data privacy relate to policies or guidelines for data privacy handling; and incognito window, incognito mode and [third] party cookie relate to privacy-related risks.
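To illustrate the evaluation side of this pipeline, the sketch below applies a toy keyword-based classifier to invented issue texts and computes the precision, recall and F-measure used above to compare classifiers. The key terms echo those reported in this section; the actual study used YAKE!-extracted terms with REP tree and SVM classifiers, not this heuristic.

```python
# Toy keyword-based issue classifier plus the precision/recall/F-measure
# computation used to compare classifiers. Issue texts are invented examples.
PRIVACY_TERMS = {"privacy", "incognito", "cookie", "data", "setting", "policy"}

def predict(issue_text):
    """Label an issue 'privacy' if any key term occurs among its tokens."""
    tokens = set(issue_text.lower().split())
    return "privacy" if tokens & PRIVACY_TERMS else "non-privacy"

def f_measure(pairs, positive="privacy"):
    """pairs: (gold, predicted) label pairs; returns F1 for the positive class."""
    tp = sum(1 for g, p in pairs if g == p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

issues = [
    ("incognito mode leaks browsing history", "privacy"),
    ("third party cookie not cleared on exit", "privacy"),
    ("button misaligned after window resize", "non-privacy"),
    ("data export fails for large files", "non-privacy"),
]
pairs = [(gold, predict(text)) for text, gold in issues]
f1 = f_measure(pairs)
```

The last invented issue shows why keyword matching alone is insufficient: the term "data" triggers a false positive, which is exactly the kind of error the trained classifiers and the ontology-guided mapping aim to reduce.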
Acknowledgements

I am grateful to my advisor, Professor Iris Reinhartz-Berger, for supervising my research, for countless inspirational discussions and for being available to me almost around the clock. I really appreciate this enormous dedication.

References

[1] G. Danezis et al., "Privacy and data protection by design - from policy to engineering," 2015. doi: 10.2824/38623.
[2] "EU Regulation 2016/679 of the European Parliament and of the Council," Official Journal of the European Union, 2016. https://gdpr.eu/ (accessed Feb. 17, 2022).
[3] Y. S. Martin and A. Kung, "Methods and tools for GDPR compliance through privacy and data protection engineering," in Proc. 3rd IEEE Eur. Symp. Secur. Priv. Workshops (EuroS&PW 2018), 2018, pp. 108–111. doi: 10.1109/EuroSPW.2018.00021.
[4] J. J. Borking and C. D. Raab, "Laws, PETs and other technologies for privacy protection," J. Information, Law & Technology, 2001.
[5] S. D. Ringmann, H. Langweg, and M. Waldvogel, "Requirements for legally compliant software based on the GDPR," LNCS vol. 11230, 2018. doi: 10.1007/978-3-030-02671-4_15.
[6] A. Cavoukian, "Privacy by design: The 7 foundational principles," Priv. by Des. Canada, vol. 3, no. 2, pp. 247–251, 2010.
[7] J.-H. Hoepman, "Privacy design strategies," in SEC 2014, IFIP AICT 428, 2014, pp. 446–459.
[8] A. J. Aberkane, G. Poels, and S. Vanden Broucke, "Exploring automated GDPR-compliance in requirements engineering: A systematic mapping study," IEEE Access, vol. 9, pp. 66542–66559, 2021. doi: 10.1109/ACCESS.2021.3076921.
[9] H. van Rossum et al., "Privacy-enhancing technologies: The path to anonymity," vol. I, pp. 1–60, 1995.
[10] M. Gharib, P. Giorgini, and J. Mylopoulos, "Towards an ontology for privacy requirements via a systematic literature review," ER, 2017. doi: 10.1007/s13740-020-00116-5.
[11] P. Sangaroonsilp, H. K. Dam, M. Choetkiertikul, C. Ragkhitwetsagul, and A. Ghose, "A taxonomy for mining and classifying privacy requirements in issue reports," Inf. Softw. Technol., vol. 157, 2023. doi: 10.1016/j.infsof.2023.107162.
[12] M. Deng, K. Wuyts, R. Scandariato, B. Preneel, and W. Joosen, "A privacy threat analysis framework: Supporting the elicitation and fulfillment of privacy requirements," Requir. Eng., vol. 16, no. 1, pp. 3–32, 2011. doi: 10.1007/s00766-010-0115-7.
[13] R. M. de Carvalho et al., "Protecting citizens' personal data and privacy: Joint effort from GDPR EU cluster research projects," SN Comput. Sci., vol. 1, no. 4, pp. 1–16, 2020. doi: 10.1007/s42979-020-00218-8.
[14] B. Kostova, S. Gürses, and C. Troncoso, "Privacy engineering meets software engineering. On the challenges of engineering privacy by design," 2020. [Online]. Available: http://arxiv.org/abs/2007.08613
[15] M. Lungu, R. Robbes, and M. Lanza, "Recovering inter-project dependencies in software ecosystems," in ASE '10: Proc. IEEE/ACM Int. Conf. Automated Software Engineering, 2010, pp. 309–312. doi: 10.1145/1858996.1859058.
[16] N. M. Mohammed, M. Niazi, M. Alshayeb, and S. Mahmood, "Exploring software security approaches in software development lifecycle: A systematic mapping study," Comput. Stand. Interfaces, vol. 50, pp. 107–115, 2017. doi: 10.1016/j.csi.2016.10.001.
[17] P. H. Nguyen, S. Ali, and T. Yue, "Model-based security engineering for cyber-physical systems: A systematic mapping study," Inf. Softw. Technol., vol. 83, pp. 116–135, 2017. doi: 10.1016/j.infsof.2016.11.004.
[18] J. L. Fernandez-Aleman, I. Carrion Senor, P. A. Oliver Lozoya, and A. Toval, "Security and privacy in electronic health records: A systematic literature review," J. Biomed. Inform., vol. 46, no. 3, pp. 541–562, 2013. doi: 10.1016/j.jbi.2012.12.003.
[19] M. Irshad, K. Petersen, and S. Poulding, "A systematic literature review of software requirements reuse approaches," Inf. Softw. Technol., vol. 93, pp. 223–245, 2018. doi: 10.1016/j.infsof.2017.09.009.
[20] M. Glinz, "On non-functional requirements," in Proc. 15th IEEE Int. Requirements Eng. Conf. (RE 2007), 2007. doi: 10.1109/RE.2007.45.
[21] N. Afreen, A. Khatoon, and M. Sadiq, "A taxonomy of software's non-functional requirements," in Proc. Second International Conference on Computer and Communication Technologies, 2016. doi: 10.1007/978-81-322-2517-1.
[22] M. E. Morales-Trujillo, E. O. Matla-Cruz, G. A. García-Mireles, and M. Piattini, "Privacy by design in software engineering: A systematic mapping study," Av. en Ing. Softw. a Niv. Iberoam. (CIbSE 2018), vol. 22, no. 1, pp. 107–120, 2018.
[23] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, "An in-depth study of the promises and perils of mining GitHub," Empir. Softw. Eng., vol. 21, no. 5, pp. 2035–2071, 2016. doi: 10.1007/s10664-015-9393-5.
[24] H. Khalajzadeh, M. Shahin, H. O. Obie, and J. Grundy, "How are diverse end-user human-centric issues discussed on GitHub?" Association for Computing Machinery, 2022.
[25] A. Murgia and B. Adams, "Do developers feel emotions? An exploratory analysis of emotions in software artifacts," in MSR 2014, 2014, pp. 262–271. doi: 10.1145/2597073.2597086.
[26] J. Ding, H. Sun, X. Wang, and X. Liu, "Entity-level sentiment analysis of issue comments," in SEmotion '18: IEEE/ACM 3rd International Workshop on Emotion Awareness in Software Engineering, 2018, pp. 7–13. doi: 10.1145/3194932.3194935.
[27] R. S. C. Junior and G. D. F. Carneiro, "Impact of developers' sentiments on practices and artifacts in open source software projects: A systematic literature review," in Proc. 22nd International Conference on Enterprise Information Systems (ICEIS 2020), 2020, vol. 2, pp. 978–989. doi: 10.5220/0009313200310042.
[28] S. Malik and S. Jain, "Semantic ontology-based approach to enhance text classification," in ISIC 2021, 2021.
[29] N. Sanchez-pi, L. Martí, A. Cristina, and B.
Garcia, “Improving ontology-based text classification : An occupational health and security application,” J. Appl. Log., vol. 17, pp. 48– 58, 2016, doi: 10.1016/j.jal.2015.09.008. [30] H. Leino-kilpi et al., “Privacy : a review of the literature,” Int. J. Nurs. Stud., vol. 38, 2001. [31] V. Demertzi, S. Demertzis, and K. Demertzis, “An overview of privacy dimensions on Industrial Internet of Things ( IIoT ),” arXiv Prepr. arXiv2301.06172., pp. 1–17, 2023. [32] A. Martínez-ballesté, P. A. Pérez-martínez, and A. Solanas, “The pursuit of citizens’ privacy : A privacy-aware smart city is possible,” IEEE Commun. Mag., no. June, pp. 136–141, 2013, doi: 10.1109/MCOM.2013.6525606. [33] P. Mell, K. Scarfone, and S. Romanosky, “Common vulnerability scoring system,” IEEE Secur. Priv., vol. 4, no. 6, pp. 85–89, 2006, doi: 10.1109/MSP.2006.145. [34] P. Mell, “The generation of software security scoring systems leveraging human expert opinion,” 2022 IEEE 29th Annu. Softw. Technol. Conf., pp. 116–124, 2022, doi: 10.1109/STC55697.2022.00023. [35] J. S. Edu, J. M. Such, and G. Suarez-tangil, “Smart home personal assistants : A security and privacy review,” ACM Comput. Surv., vol. 53, no. 6, 2020, doi: 10.1145/3412383. [36] L. Krehling and A. Essex, “A security and privacy scoring system for contact tracing apps,” J. Cybersecurity Priv., vol. 1, pp. 597–614, 2021. [37] S. Zimmeck et al., “Automated analysis of privacy requirements for mobile apps,” in The 2016 AAAI Fall Symposium Series: Privacy and Language Technologies Technical Report FS-16-04, 2016, vol. 3066, no. 132, pp. 286–296. [38] M. Tahaei, A. Frik, and K. Vaniea, “Privacy champions in sofware teams : Understanding their motivations , strategies , and challenges,” in CHI Conference on Human Factors in Computing Systems (CHI ’21), 2021. doi: 10.1145/3411764.3445768. [39] M. Farhadi, H. Haddad, and H. 
Shahriar, “Compliance checking of open source EHR applications for HIPAA and ONC security and privacy requirements,” in 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 2019, vol. 1, pp. 704– 713. doi: 10.1109/COMPSAC.2019.00106. [40] M. Farhadi, G. Pierre, and D. Miorandi, “Towards automated privacy compliance checking of applications in Cloud and Fog environments,” 2021 8th Int. Conf. Futur. Internet Things Cloud, pp. 11–18, 2021, doi: 10.1109/FiCloud49777.2021.00010. [41] B. Kitchenham, O. Pearl Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, “Systematic literature reviews in software engineering - A systematic literature review,” Inf. Softw. Technol., vol. 51, no. 1, pp. 7–15, 2009, doi: 10.1016/j.infsof.2008.09.009. [42] C. Wohlin, “Guidelines for snowballing in systematic literature studies and a replication in software engineering,” ACM Int. Conf. Proceeding Ser., 2014, doi: 10.1145/2601248.2601268. [43] M. J. Page et al., “The PRISMA 2020 statement: An updated guideline for reporting systematic reviews,” BMJ, vol. 372, 2021, doi: 10.1136/bmj.n71. [44] M. E. Paschali, A. Ampatzoglou, S. Bibi, A. Chatzigeorgiou, and I. Stamelos, “Reusability of open source software across domains: A case study,” J. Syst. Softw., vol. 134, pp. 211–227, 2017, doi: 10.1016/j.jss.2017.09.009. [45] M. D. Papamichail, T. Diamantopoulos, and A. L. Symeonidis, “Measuring the reusability of software components using static analysis metrics and reuse rate information,” J. Syst. Softw., vol. 158, p. 110423, 2019, doi: 10.1016/j.jss.2019.110423. [46] F. Kunz and Z. A. Mann, “Finding risk patterns in cloud system models,” IEEE Int. Conf. Cloud Comput. CLOUD, vol. 2019-July, no. Vm, pp. 251–255, 2019, doi: 10.1109/CLOUD.2019.00051. [47] A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS Q., vol. 28, no. 1, pp. 75–105, 2004, doi: https://doi.org/10.2307/25148625. [48] J. Heurix, P. Zimmermann, T. 
Neubauer, and S. Fenz, “A taxonomy for privacy enhancing technologies,” Comput. Secur., vol. 53, pp. 1–17, 2015, doi: 10.1016/j.cose.2015.05.002. [49] F. Dervin and C. Dyer, Constructing methodology for qualitative research. 2016. doi: 10.1057/978-1-137-59943-8. [50] S. Bird, E. Klein, and E. Loper, Natural language processing with Python. O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472, 2009. [51] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, and A. Jatowt, “YAKE! Keyword extraction from single documents using multiple local features,” Inf. Sci. (Ny)., vol. 509, pp. 257–289, 2020, doi: 10.1016/j.ins.2019.09.013. [52] J. R. Quinlan, “Simplifying decision trees,” Int. J. Hum. Comput. Stud., vol. 27, pp. 221–234, 1987, doi: 10.1006/ijhc.1987.0321. [53] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, pp. 273–297, 1995.