<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Privacy Requirements Detector From User Stories</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Casillo</string-name>
          <email>fcasillo@unisa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Deufemia</string-name>
          <email>deufemia@unisa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carmine Gravino</string-name>
          <email>gravino@unisa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Salerno</institution>
          ,
          <addr-line>Via Giovanni Paolo II, 132, Fisciano (SA), 84084</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>User Stories</institution>
          ,
          <addr-line>Natural Language Processing, Deep Learning, Transfer Learning</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>21</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>In the context of requirements engineering, stakeholders are often unaware of how to identify and manage privacy and security requirements. The purpose of this paper is to present a tool, namely PReDUS, for the detection of privacy content in user stories. The core of the tool is the use of deep learning algorithms that exploit Natural Language Processing techniques and linguistic resources.</p>
      </abstract>
      <kwd-group>
        <kwd>User Stories</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Transfer Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Identifying non-functional requirements (NFRs) from stakeholders during the requirements
engineering (RE) phase can be problematic due to several factors [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Failure to take care in documenting and defining these requirements can lead to defects
in software development [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Privacy is one of the NFRs that has become more important in recent years, in part
because businesses increasingly need to protect and safeguard their data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Although privacy requirements are inherent in the software development process,
stakeholders are often unable to recognize them from customer requirements [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For requirements defined as User Stories, we introduce PReDUS, a tool exploiting recent
Natural Language Processing (NLP) technologies to extract features which are then used by a
model based on convolutional neural networks (CNNs), obtained by employing the Transfer
Learning technique [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. PReDUS’s approach</title>
      <p>
        PReDUS is a web application that provides insights into the privacy requirements
contained in a User Story (US) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Figure 1 shows the main components PReDUS uses to identify
privacy content. The text of the input US is first processed by the NLP Toolkit in order to
capture both the grammatical structure of the text and the meaning being conveyed. Then,
the output of this phase is fed to a Transfer Learning model, which employs
two CNNs to perform semantic and syntactic analyses, respectively. In this way, both
syntax and meaning are analyzed to detect privacy content. Despite its simple user interface
(see Figure 2), the tool is well suited to stakeholders with little experience in the
identification of privacy content: detecting this NFR in the requirements definition phase
could anticipate significant issues that can lead to software malfunctions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        I/O Data. PReDUS takes a US as input, written in the format established in the literature [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]:
      </p>
      <p>“As a [role], I want to [feature], so that [reason]”.</p>
      <p>As an example: “As a project manager I want to access to data about my colleagues
progress so I can better report our success and failures.”</p>
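      <p>The template above can be checked mechanically before a story is analyzed. The following is a minimal sketch and not part of PReDUS: the pattern US_PATTERN and the function parse_user_story are hypothetical names, and the regular expression is an assumption that tolerates the variations seen in the example (no commas, "so" without "that").</p>
      <preformat><![CDATA[
```python
import re
from typing import Dict, Optional

# Hypothetical pattern for "As a [role], I want to [feature], so that [reason]".
# Commas, "to" and "that" are optional because real stories vary; the
# "so that ..." clause may be absent altogether.
US_PATTERN = re.compile(
    r"^\s*As an? (?P<role>.+?),?\s+I want (?:to )?(?P<feature>.+?)"
    r"(?:,?\s+so (?:that )?(?P<reason>.+?))?\s*\.?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_user_story(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a user story into role, feature and reason.

    Returns None when the text does not follow the template; "reason" is
    None when the story has no "so that" clause.
    """
    match = US_PATTERN.match(text)
    if match is None:
        return None
    return match.groupdict()
```
]]></preformat>
      <p>Applied to the example story, this sketch yields the role "project manager", the feature "access to data about my colleagues progress" and the reason "I can better report our success and failures".</p>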
      <p>The US to be analyzed is typed into the form on the first page of the web
application (see Figure 2a). The output of PReDUS is the prediction of the deep learning
algorithm developed to detect privacy content, supported by further information: the
privacy words identified, the categories they belong to, and a description of each category that
explains why the word is related to privacy matters (see Figure 2b).</p>
      <p>Preprocessing. spaCy<sup>1</sup> is used to extract the features exploited by the algorithm to make the
prediction. In particular, the algorithm uses two types of analysis, summarized as follows:</p>
      <p>1. Syntax analysis. The input US is first tokenized using the nlp object<sup>2</sup> of spaCy, which is
essentially a pipeline of several text pre-processing operations applied to extract:</p>
      <p>• Entities<sup>3</sup>. spaCy provides an efficient statistical system that can assign labels to
individual tokens or to groups of contiguous tokens. It can recognize a wide
range of named or numeric entities, including people, organizations, languages,
events and so on.</p>
      <p>• Parts of speech<sup>4</sup>. Part-of-speech (POS) tagging is the process of assigning a
word in the text to a particular part of speech based on both its context and its definition.
In brief, it is the process of identifying a word as a noun, pronoun, verb, and so on.</p>
      <p>• Dependencies<sup>4</sup>. Dependency parsing is the process of analyzing the grammatical
structure of a sentence based on the dependencies between its words. Words are
replaced by tags, called dependency tags, that represent the relationship between
two or more words.</p>
      <p>
        2. Semantic analysis. It is carried out on the individual terms of the US to search for
terms strongly related to privacy issues, with the aim of reinforcing the
previous step and expanding the number of features used for privacy content
prediction. The dictionary proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], specifically developed for privacy content
analysis, is used to facilitate the search for privacy-related terms and to obtain the
privacy category each term belongs to.
      </p>
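      <p>The two analyses can be sketched as follows. This is an illustration under stated assumptions, not the PReDUS implementation: syntactic_views expects any spaCy-style pipeline object (for instance one obtained with spacy.load("en_core_web_sm")) and reads the token attributes pos_, dep_ and ent_type_; the PRIVACY_TERMS table and its category names are invented placeholders standing in for the dictionary of [9], which is far larger.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Iterable, List, Tuple


def syntactic_views(nlp, text: str) -> Dict[str, List[str]]:
    """Replace each token with its POS tag, dependency tag and entity label.

    `nlp` is assumed to be a spaCy pipeline, e.g. spacy.load("en_core_web_sm").
    """
    doc = nlp(text)
    return {
        "pos": [token.pos_ for token in doc],
        "dep": [token.dep_ for token in doc],
        # ent_type_ is empty for tokens outside any entity; mark those "O".
        "ent": [token.ent_type_ or "O" for token in doc],
    }


# Hypothetical excerpt: both the terms and the category names are
# placeholders, not entries copied from the privacy dictionary of [9].
PRIVACY_TERMS: Dict[str, str] = {
    "access": "restriction",
    "confidential": "secrecy",
    "data": "information",
}


def privacy_terms(words: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (term, category) pairs for the words found in the dictionary."""
    found = []
    for word in words:
        key = word.lower().strip(".,;:!?\"'")
        if key in PRIVACY_TERMS:
            found.append((key, PRIVACY_TERMS[key]))
    return found
```
]]></preformat>
      <p>The tag sequences returned by syntactic_views correspond to the "words are replaced by tags" step described above, while privacy_terms returns the matched words together with their categories, which is the information the tool reports alongside the prediction.</p>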
      <p>
        Privacy Detection. The CNN-based model that predicts whether or not the US contains privacy
content is built using Transfer Learning, an advanced deep learning technique that consists in
reusing the knowledge developed by an algorithm to solve one task and applying it to a different
but related problem (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for details). In particular, a neural network developed for the
detection of privacy disclosures in unstructured text is used to increase the number of
features exploited by our prediction model. Further details about the implemented CNNs can
be found in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The classification of USs is based on the assumption that if a US contains
privacy disclosures and also contains privacy-related words, then it is highly related to privacy
issues. The result tells us whether or not the considered US contains privacy content.
      </p>
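      <p>Read literally, the classification assumption is a conjunction of two signals. The sketch below encodes only that reading; in the actual tool the decision is learned by the CNN-based model described in [4]. The names Prediction and classify, the disclosure_score input and the 0.5 threshold are all hypothetical.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Prediction:
    has_privacy_content: bool
    # (term, category) pairs found by the semantic analysis; reported even
    # when the overall prediction is negative.
    terms: List[Tuple[str, str]] = field(default_factory=list)


def classify(
    disclosure_score: float,
    terms: Sequence[Tuple[str, str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Prediction:
    """Flag a US only if it shows a disclosure AND contains privacy words."""
    has_disclosure = disclosure_score >= threshold
    return Prediction(has_disclosure and len(terms) > 0, list(terms))
```
]]></preformat>
      <p>Under this reading, a story that contains privacy-related words but no privacy disclosure is classified as negative, while the detected words remain available to show to the user.</p>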
    </sec>
    <sec id="sec-3">
      <title>3. Demo plan</title>
      <p>Environment Configuration. To use PReDUS, you must start the server by running a
Python<sup>5</sup> script. The script requires nine libraries, which support
both the user interface and the US analysis. In particular, Flask and IPython were used to create
the UI, while the remaining libraries allowed us to handle data (Pandas), to process them (NLTK,
spaCy, NumPy), and to run the model (TensorFlow, Pickleshare, Keras). PReDUS and the
libraries are available on GitHub<sup>6</sup>.</p>
      <p><sup>1</sup> https://spacy.io, <sup>2</sup> https://spacy.io/usage/models, <sup>3</sup> https://spacy.io/usage/spacy-101#annotations-ner, <sup>4</sup> https://spacy.io/usage/linguistic-features, <sup>5</sup> https://www.python.org/</p>
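      <p>Before starting the server, it can be useful to confirm that the nine libraries are importable. The following sketch uses only the Python standard library; the function name missing_modules is hypothetical, and the list assumes the usual import names of the libraries mentioned above.</p>
      <preformat><![CDATA[
```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names of the nine libraries listed in the text (assumed to be the
# standard top-level module names of each package).
REQUIRED_MODULES = (
    "flask", "IPython",                # user interface
    "pandas",                          # data handling
    "nltk", "spacy", "numpy",          # text and data processing
    "tensorflow", "pickleshare", "keras",  # model
)


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]
```
]]></preformat>
      <p>An empty list from missing_modules() indicates that all nine libraries are installed; any names it returns still need to be installed before the server script is run.</p>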
      <p>
        Privacy detection execution. As shown in Figure 2a, the user first enters a US in
the text box and then starts the privacy content detection process by pressing the
“Analyze” button. The application server processes the input US and performs semantic
and syntactic analyses, whose results are used as input to the CNN-based Transfer Learning
model; the output of the model is the prediction regarding privacy content. In particular, as shown in
Figure 2b, the user is informed about the presence of privacy content, the identified privacy
terms, the categories those terms belong to, and the category descriptions, which explain why those terms are
related to privacy matters. The privacy categories are explained further in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Usage examples. To demonstrate the usefulness and effectiveness of PReDUS, we show three
use case scenarios from the Web application domain. The first US given as input to PReDUS
contains privacy aspects, and PReDUS highlights the privacy words and the categories they belong to,
similar to Figure 2b. The second US is a borderline example: it does
not contain privacy aspects, but it does contain privacy-related words. PReDUS reports that no
privacy content is present, and the user is made aware of the privacy words detected during the
semantic analysis. Finally, the third US is obtained by making two changes to the previous
US, which alter the sense of the sentence, including from the privacy point of view. PReDUS
detects the meaning of the modified sentence and highlights the privacy content.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Méndez Fernández</surname>
          </string-name>
          , et al.,
          <article-title>Naming the pain in requirements engineering - contemporary problems, causes, and effects in practice</article-title>
          ,
          <source>Empirical Software Engineering</source>
          (
          <year>2017</year>
          )
          <fpage>2298</fpage>
          -
          <lpage>2338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Houmb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Knauss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jürjens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <article-title>Eliciting security requirements and tracing them to design: an integration of common criteria, heuristics, and UMLsec</article-title>
          ,
          <source>Requirements Engineering</source>
          (
          <year>2010</year>
          )
          <fpage>63</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Anthonysamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chitchyan</surname>
          </string-name>
          ,
          <article-title>Privacy requirements: Present &amp; future</article-title>
          ,
          <source>in: Int. Conference on Software Engineering: SEIS track</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Casillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Deufemia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gravino</surname>
          </string-name>
          ,
          <article-title>Detecting privacy requirements from user stories with NLP transfer learning models</article-title>
          ,
          <source>Information and Software Technology</source>
          <volume>146</volume>
          (
          <year>2022</year>
          )
          <fpage>106853</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Torrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shavlik</surname>
          </string-name>
          ,
          <article-title>Transfer learning</article-title>
          ,
          <source>in: Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques</source>
          , IGI Global
          ,
          <year>2010</year>
          , pp.
          <fpage>242</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <source>User Stories Applied: For Agile Software Development</source>
          , Addison Wesley,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sommerville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sawyer</surname>
          </string-name>
          ,
          <source>Requirements Engineering: A Good Practice Guide</source>
          , Wiley,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lucassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dalpiaz</surname>
          </string-name>
          ,
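          <string-name>
            <given-names>J. M.</given-names>
            <surname>van der Werf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brinkkemper</surname>
          </string-name>
          ,
          <article-title>The use and effectiveness of user stories in practice</article-title>
          ,
          <source>in: Req. Eng.: Found. for Soft. Quality</source>
          , Springer,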
          <year>2016</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Gill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vasalou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Papoutsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Joinson</surname>
          </string-name>
          ,
          <article-title>Privacy dictionary: A linguistic taxonomy of privacy for content analysis</article-title>
          ,
          <source>in: Proceedings of Inter. Conference on Human Factors in Computing Systems (CHI)</source>
          , ACM,
          <year>2011</year>
          , pp.
          <fpage>3227</fpage>
          -
          <lpage>3236</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>