<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Renault, S.; Méndez, O.; Franch, X.; Quer, C. A Pattern-based Method for Building Requirements Documents in Call-for-tender Processes. Int J Comput Sci Appl</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Research on NLP for RE at UPC: a Report</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carme Quer</string-name>
          <email>cquer@essi.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dolors Costal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricard Borrull</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Franch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politecnica de Catalunya (UPC), Barcelona</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>[Zhu05] Zhu, X.; Jin, Z. Inconsistency measurement of software requirements specifications: an ontology-based approach. 10th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS)</institution>
          ,
          <addr-line>402-410, 2005</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>6</volume>
      <issue>5</issue>
      <abstract>
        <p>[Team Overview] The Software and Service Engineering Group (GESSI) of UPC has traditionally conducted research in many fields of software engineering. [Research Plan on NLP] As a result of our participation in the OpenReq project, natural language processing (NLP) has become one of our highest-priority research fields. We are using NLP for interdependency detection and requirements reuse, the centerpiece of both tasks being the identification of similar requirements.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Overview</title>
      <p>The project focuses on using artificial intelligence-based techniques that proactively support stakeholders, both as individuals and as groups, within the scope of RE.</p>
      <p>Inside OpenReq, UPC is working, among other topics, on interdependency detection and requirements reuse. We envisage using NLP for both tasks. This is of special importance for two of the trials of the project. One of them deals with bid projects of thousands of requirements, where the goal is to identify similar and dependent requirements in the same bid and from previous bids, and to reuse requirements knowledge from previous bids in the bid at hand; it is expected that this will reduce the cost of the bid phase and the requirements analysis phase by 10%. The other trial has a database with almost a hundred thousand requests (containing requirements and bugs). This trial is interested not so much in the reuse functionality as in the identification of similar and dependent requests. It is estimated that at least 1000 requests (annually) could be better managed or avoided.</p>
      <sec id="sec-1-1">
        <title>Interdependency Detection</title>
        <p>Interdependency detection will combine the identification of explicit and non-explicit interdependencies in the requirements. By explicit interdependencies, we mean explicit references in a requirement to other requirements; by non-explicit interdependencies, we mean those that are not explicitly stated in the requirements but that can be identified by analyzing the requirements from both a syntactic and a semantic point of view.</p>
        <p>For the detection of explicit interdependencies, we aim to follow the approaches used in the well-known area of cross-reference detection and resolution. In a first approach, we will identify natural language patterns that are used in requirements text to refer to other requirements and use them to propose new interdependencies. This approach will be based on works such as those of [Bre08] and [Pal03]. In the future, though, we will consider a second approach that consists of extracting these patterns automatically, as done in [San17]. Although [San17] extracts patterns used for cross-referencing in legal texts, we believe that a similar approach for the automatic extraction of such patterns can be extrapolated to requirements specifications.</p>
        <p>Regarding the detection of non-explicit interdependencies, the first step will be the identification of pairs of similar requirements (from both a syntactic and a semantic point of view). Similarity detection (i.e., syntactic similarity) and paraphrase detection (i.e., semantic similarity) are approaches closely related to the detection of interdependencies between requirements.</p>
        <p>The relationship between similarity and paraphrase detection and interdependency detection is evident in the case where two requirements have almost exactly the same formulation. Imagine the requirements "The user interface should use the Arial letter type." and "The user interface should use the Calibri letter type.". It is clear that these two requirements cannot be used in the same system (since it is not possible to use two letter types for the whole user interface), so these requirements are related by an OR interdependency (using the terminology proposed by Carlshamre [Car01]). However, even in other cases there are commonalities. As an example, if requirement R1 states that "It shall be possible to filter personal data by name and address" and requirement R2 states that "The system shall allow to filter personal data by age", it would probably be convenient to treat these two requirements at the same time to save development resources. This example can be considered an ICOST interdependency, again according to the terminology described in [Car01]. Other examples of interdependencies that can trigger a similarity analysis at a lexical level are, e.g., the conflicting requirements "The button shall be blue." and "The button shall be red.".</p>
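        <p>The lexical-level similarity behind these examples can be sketched in a few lines of Python; the bag-of-words cosine representation and the helper name are our own illustrative choices, not the API of any particular component:</p>

```python
import math
from collections import Counter

def lexical_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two requirement texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

r1 = "The user interface should use the Arial letter type."
r2 = "The user interface should use the Calibri letter type."
r3 = "The button shall be red."

# Near-identical formulations score high; unrelated formulations score low.
print(lexical_similarity(r1, r2))
print(lexical_similarity(r1, r3))
```

        <p>A real component would add tokenization, stemming and semantic measures on top of this purely lexical baseline.</p>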
        <p>Therefore, identifying syntactically and semantically similar requirements could be used as a basis to identify related requirements. As there are several well-known components already developed to detect similar texts in English, our aim is to select one of these components to be integrated and expanded in OpenReq. At the time being, we envision adding some pre-processing, such as clustering or classification techniques, before running the component, in order to reduce the set of requirements against which each requirement is compared: if we reduce this set to just the group of requirements a requirement pertains to according to a clustering algorithm, the chances of achieving better matches will improve. The components we are currently evaluating are:</p>
        <p>Cortical (http://cortical.io/). Among its functionalities, it provides a method to measure the similarity between two given texts using the cosine metric. In this case, the method is not parameterizable.</p>
        <p>DKPro (https://dkpro.github.io/). DKPro provides, among others, a comprehensive repository of text similarity measures, ranging from ones based on simple n-grams and common subsequences to more complex ones based on high-dimensional vector comparisons and structural, stylistic, and phonetic measures.</p>
        <p>Gensim (https://radimrehurek.com/gensim/tutorial.html). It includes implementations of popular algorithms such as LSA, LDA and RP. Gensim allows loading a corpus of texts against which a sentence can be compared. The calls to the algorithms are parameterizable.</p>
        <p>Semilar (http://www.semanticsimilarity.org/). The methods offered by Semilar, which are completely parameterizable, range from simple lexical-overlap methods, through methods that rely on word-to-word similarity metrics, to more sophisticated fully unsupervised methods that derive the meaning of words and sentences, such as LSA and LDA, and kernel-based methods for assessing similarity. In addition, one can select the tokenizer, tagger, stemmer and parser to be used as pre-processing (having as options, for instance, the libraries OpenNLP, Stanford parser and WordNet).</p>
        <p>Scikit-learn (http://scikit-learn.org/stable/documentation.html). It allows the transformation of texts into vectors, using TF-IDF among other algorithms, and the measurement of the similarity between them using the cosine metric.</p>
        <p>However, we do not discard adding new components to this list if the results of our evaluations are not good enough, or if other researchers know of components that give better results.</p>
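        <p>The pre-processing idea mentioned above can be illustrated with a short sketch; here the grouping function is a stand-in for whichever clustering or classification technique is finally chosen, and the "UI:"/"DB:" prefixes are hypothetical cluster labels:</p>

```python
from collections import defaultdict
from itertools import combinations

# Sketch of the pre-processing idea: compare a requirement only against the
# members of its own group, instead of against the whole requirement set.
# group_key stands in for a real clustering or classification step.
def candidate_pairs(requirements, group_key):
    groups = defaultdict(list)
    for r in requirements:
        groups[group_key(r)].append(r)
    pairs = []
    for group in groups.values():
        pairs.extend(combinations(group, 2))
    return pairs

reqs = [
    "UI: the user interface should use the Arial letter type.",
    "UI: the user interface should use the Calibri letter type.",
    "DB: personal data shall be stored encrypted.",
    "DB: personal data shall be filterable by age.",
]
# Grouping by the hypothetical "UI:"/"DB:" prefix yields 2 candidate pairs
# instead of the 6 pairs an exhaustive comparison would require.
print(len(candidate_pairs(reqs, lambda r: r.split(":")[0])))
```

        <p>For thousands of requirements, as in the bid trial, this quadratic-to-per-cluster reduction is what makes running a similarity component feasible.</p>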
        <p>The second step for non-explicit interdependencies is to improve and expand the requirements similarity detection with further features, such as:</p>
        <p>Creating a specific list of domain-dependent synonyms, so that the similarity algorithms can know when two words that in principle are not synonyms are actually synonyms in a specific domain.</p>
        <p>Constructing models that can help to detect interdependencies by relating concepts in the model. This is similar to the ontology used in [Zhu05], but in our model the relationships will broaden the scope of that previous work, which is focused on inconsistencies. For instance, if we know that technologies A and B are incompatible, A and B will be related in this model as conflicting. Therefore, when these two technologies are used at the same time in a single project, we can extrapolate that the requirements stating technologies A and B are actually conflicting, and a new interdependency of this type will be created between them.</p>
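        <p>The technology-conflict propagation just described can be sketched as follows; the incompatible pair and the substring-based technology spotting are illustrative assumptions of ours, not part of the actual model:</p>

```python
# Illustrative sketch of the conflict model: a technology-level conflict is
# propagated to the requirements that state those technologies. The pair
# below and the substring-based spotting are assumptions for illustration.
CONFLICTS = [("MySQL", "MongoDB")]  # hypothetical incompatible technologies

def mentions(requirement, technology):
    return technology.lower() in requirement.lower()

def conflicting(r1, r2):
    """True if r1 and r2 state technologies known to be incompatible."""
    for a, b in CONFLICTS:
        if (mentions(r1, a) and mentions(r2, b)) or (mentions(r1, b) and mentions(r2, a)):
            return True
    return False

print(conflicting("Data shall be stored in MySQL.",
                  "The backend shall persist documents in MongoDB."))
```

        <p>In the envisaged model, a detected conflict of this kind would trigger the creation of a new conflicting-type interdependency between the two requirements.</p>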
      </sec>
      <sec id="sec-1-2">
        <title>Requirements Reuse</title>
        <p>To achieve requirements reuse, we plan to adapt the PABRE framework [Fra13] [Ren09], which stands for PAtterns-Based Requirements Elicitation, to OpenReq. The core assets in the PABRE framework are Software Requirement Patterns (SRPs). In a nutshell, an SRP is a set of Templates that pursue the same goal in a system to be developed. Each template corresponds to the text to be used as a requirement and, if necessary, some optional Parameters (with a set of possible values: a range of numbers, possible strings, etc.) that need to be instantiated when applying the pattern. As a goal can be achieved in different ways, an SRP consists of several Forms, each one representing a different solution for achieving the goal. In addition, Forms are organized into Parts, each of them being a template: a Fixed Part, which is always applied if the form is chosen, and some Extended Parts, which may be applied or not. Finally, SRPs are classified using Classification Schemas, which are hierarchies of classifiers that facilitate the organization of SRPs.</p>
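        <p>One possible reading of this structure as a data model is sketched below; the class and field names are ours for illustration, not the PABRE implementation's actual API:</p>

```python
from dataclasses import dataclass, field

# Illustrative data model of the SRP structure described above.
@dataclass
class Parameter:
    name: str
    possible_values: list          # e.g. a range of numbers or a set of strings

@dataclass
class Template:
    text: str                      # requirement text, possibly with parameter slots
    parameters: list = field(default_factory=list)

@dataclass
class Form:                        # one solution for achieving the pattern's goal
    fixed_part: Template           # always applied if the form is chosen
    extended_parts: list = field(default_factory=list)  # optionally applied

@dataclass
class SRP:
    goal: str
    forms: list = field(default_factory=list)
    classifiers: list = field(default_factory=list)  # classification-schema entries

# A toy pattern: instantiating it would replace the %role% slot with a value.
srp = SRP(goal="Restrict access to personal data",
          forms=[Form(fixed_part=Template("Access to personal data shall require %role%",
                                          [Parameter("role", ["admin", "auditor"])]))])
```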
        <p>In OpenReq, we will adapt the structure of SRPs defined in the PABRE framework to the needs of the project. Additionally, we will need to define a catalogue of SRPs for OpenReq (possibly one per trial). Therefore, it is important that the population of this catalogue combines automatic extraction with expert assessment. The automatic extraction will need some type of NLP processing, and probably machine learning. For instance, we believe topic modelling could be used to extract the common topics found in requirements specifications, clustering techniques to group requirements, and syntactic and semantic similarity identification to find requirements that are repeated across the different specifications. These techniques might help to populate the SRP catalogue.</p>
        <p>Finally, we envisage different ways in which this catalogue of patterns could be used in OpenReq. Here, we present the ones that will need the support of NLP:</p>
        <p>1. Propose SRPs that are dependent. Given a requirement, a similarity algorithm makes it possible to recover similar SRPs (by comparing the templates in the SRPs with the given requirement). Then, once we know the SRPs similar to a given requirement, if they are dependent on other SRPs, we can propose these dependent SRPs so they can be reused. For instance, suppose R1 is similar to a template in the requirement pattern SRP2, which is dependent on the requirement pattern SRP3. In that case, SRP3 could be proposed for reuse for R1.</p>
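        <p>The SRP2/SRP3 scenario can be sketched as follows; the similarity function, the 0.5 threshold, and the data shapes (template texts per SRP, dependency sets per SRP) are assumptions made for illustration:</p>

```python
# Sketch of the proposal step. similarity(), the 0.5 threshold, and the data
# shapes are illustrative assumptions, not OpenReq's actual interfaces.
def propose_dependent_srps(requirement, templates, dependencies, similarity, threshold=0.5):
    # SRPs with at least one template similar to the given requirement.
    similar = {srp for srp, texts in templates.items()
               if any(similarity(requirement, t) >= threshold for t in texts)}
    proposed = set()
    for srp in similar:
        proposed.update(dependencies.get(srp, set()))
    return proposed.difference(similar)  # only SRPs reached through a dependency

def word_overlap(a, b):  # toy similarity stand-in (Jaccard over words)
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa.intersection(wb)) / len(wa.union(wb))

templates = {"SRP2": ["The system shall log every access to personal data."],
             "SRP3": ["Log entries shall be retained for %n% days."]}
dependencies = {"SRP2": {"SRP3"}}  # SRP2 depends on SRP3

r1 = "The system shall log every access to personal data."
# R1 matches a template of SRP2; SRP2 depends on SRP3, so SRP3 is proposed.
print(propose_dependent_srps(r1, templates, dependencies, word_overlap))
```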
      </sec>
      <sec id="sec-1-3">
        <title>Further Issues</title>
        <p>It is important to highlight here that OpenReq will deal not only with the English language, but also with Italian and German, since one of the trials deals with requirements written in Italian, and another trial deals with requirements (partially) written in German. This diversity of languages poses a challenge for the project: the majority of the existing NLP approaches target the English language, as they are trained and validated using English text corpora. Although NLP approaches and software libraries exist for both languages [Bas15] [Reh12], we have to check that their performance (e.g., precision) is not inferior to that of the well-established, English-based ones (especially when used for parsing and similarity detection).</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Acknowledgements and References</title>
      <p>The work presented in this paper has been conducted within the scope of the Horizon 2020 project OpenReq, which is supported by the European Union under Grant Nr. 732463.</p>
      <p>[Bas15] Basili, R.; Bosco, C.; Delmonte, R.; Moschitti, A.; Simi, M. Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Springer, 2015.</p>
      <p>[Bre08] Breaux, T.; Anton, A. Analyzing regulatory rules for privacy and security requirements. IEEE Transactions on Software Engineering, 34(1), 5-20, 2008.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Car01]
          <string-name>
            <surname>Carlshamre</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sandahl</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lindvall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Regnell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Natt och Dag</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>An industrial survey of requirements interdependencies in software product release planning</article-title>
          <source>5th IEEE International Symposium on Requirements Engineering</source>
          ,
          <fpage>84</fpage>
          -
          <lpage>91</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Fra13]
          <string-name>
            <surname>Franch</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Quer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Renault</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guerlain</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Palomares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Constructing and using software requirement patterns</article-title>
          <source>Managing Requirements Knowledge</source>
          , Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Pal03]
          <string-name>
            <surname>Palmirani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Brighi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Massini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <source>Automated extraction of normative references in legal texts. 9th International Conference on Artificial Intelligence and Law (ICAIL03)</source>
          ,
          <fpage>105</fpage>
          -
          <lpage>106</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>