<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Application of Natural Language Processing with GQM and AHP approaches for requirements quality assessment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evgenii Timoshchuk</string-name>
          <email>e.timoshchuk@innopolis.university</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amartiwi Utih</string-name>
          <email>u.amartiwi@innopolis.university</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey Kuznetsov</string-name>
          <email>ser.kuznetsov@innopolis.university</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Atif Farah</string-name>
          <email>f.atif@innopolis.university</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ZiyoMukhammad Usmonov</string-name>
          <email>z.usmonov@innopolis.university</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harrif Saliu</string-name>
          <email>h.saliu@innopolis.university</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Innopolis University</institution>
          ,
          <addr-line>Innopolis</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The quality of requirements is difficult to measure automatically because assessment relies on reviews and on the subjective opinions of stakeholders. Many attributes can be used to evaluate requirements quality, but most of them have a vague meaning and no concrete metrics. We propose a model based on the goal-question-metric approach to identify the most important quality attributes and their metrics, which can be calculated automatically. The text of requirements can be analyzed with natural language processing techniques to reveal weak words and phrases that make a sentence subjective and ambiguous. We propose metrics for quality attributes such as unambiguity, subjectivity, singularity, and completeness, and we calculate indexes based on the number of words and sentences for the readability attribute. The analytic hierarchy process for complex decisions is applied to convert the calculated metrics of every requirement into an overall quality evaluation of the requirement document according to the customer's priorities. The model was implemented in a prototype that focuses on adapting NLP techniques to the Russian language and on supporting an external API.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This work aims to combine the efforts of NLP [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], GQM [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and AHP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] approaches to assess the overall quality of
requirement documents in an automated way. We applied techniques for processing the words of the document, which enable
the system to carry out further analysis of the syntactic and semantic structure of the text. After processing, each
requirement statement and the requirement document as a whole are assigned numeric values, calculated by
the system, to determine which areas of the requirement document need modification. The ultimate goal of this work was
to encapsulate the best of these techniques and methods for measuring requirement quality in a single model and
to provide a prototype of a tool for automated validation of real-world requirements against it.
      </p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>Quality assessment model</title>
      <sec id="sec-2-1">
        <p>
          The Goal-Question-Metric (GQM) method is based on a system of questions and straightforward answers about the evaluation of properties [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. This approach consists of three main steps: specifying goals, identifying relevant attributes, and providing
measurements. The GQM framework helped us define appropriate metrics and estimate the quality of requirements in our
case. A goal should be defined for an object, with a purpose, from a perspective, in an environment. The overall goal of
the current project is to measure the quality of requirements, and it can be formulated with the following template:
        </p>
      </sec>
      <sec id="sec-2-2">
        <p>Analyze requirement quality
for the purpose of improving
with respect to quality attributes
from the viewpoint of project managers
in the context of product development.</p>
      </sec>
      <sec id="sec-2-3">
        <p>In addition, we identified several sub-goals that should be fulfilled to achieve the primary goal. For instance:</p>
      </sec>
      <sec id="sec-2-4">
        <p>Sub-goal: Analyze requirement unambiguity for the purpose of improving with respect to quality attributes from the viewpoint of project managers in the context of product development.</p>
      </sec>
      <sec id="sec-2-5">
        <p>Question: How many vague words and weak phrases make the requirement ambiguous?</p>
      </sec>
      <sec id="sec-2-6">
        <p>Metric: Number of ambiguous words in one requirement divided by the average number of words in one requirement.</p>
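        <p>The goal, question, and metric levels described above form a small hierarchy that can be written down as a data structure. The sketch below is only an illustration: the class names Goal, Question, and Metric and their field names are hypothetical and are not taken from the prototype. It renders the unambiguity sub-goal with the goal template used in the text.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    """A measurable quantity that answers a GQM question."""
    description: str


@dataclass
class Question:
    """A question that refines a goal and is answered by one or more metrics."""
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    """A GQM goal: an object, a purpose, a focus, a viewpoint, and a context."""
    obj: str
    purpose: str
    focus: str
    viewpoint: str
    context: str
    questions: List[Question] = field(default_factory=list)

    def statement(self) -> str:
        # Render the goal with the template quoted in the text.
        return (
            f"Analyze {self.obj} for the purpose of {self.purpose} "
            f"with respect to {self.focus} from the viewpoint of "
            f"{self.viewpoint} in the context of {self.context}."
        )


# The sub-goal, question, and metric below are the ones given in the text.
unambiguity_goal = Goal(
    obj="requirement unambiguity",
    purpose="improving",
    focus="quality attributes",
    viewpoint="project managers",
    context="product development",
    questions=[
        Question(
            text="How many vague words and weak phrases make the requirement ambiguous?",
            metrics=[
                Metric(
                    "Number of ambiguous words in one requirement divided by "
                    "the average number of words in one requirement"
                )
            ],
        )
    ],
)

print(unambiguity_goal.statement())
```
]]></preformat>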
        <sec id="sec-2-6-1">
          <title>Quality attributes and their metrics</title>
        </sec>
      </sec>
      <sec id="sec-2-7">
        <p>Our model adopts five core quality attributes, evaluated by syntactic and semantic analysis, to give a final quality measurement for the whole requirement set.</p>
        <p>
          Unambiguity. This attribute requires that only one semantic interpretation of the requirement exists. To evaluate the ambiguity of
each requirement, we propose to use dictionaries containing words that indicate ambiguity in the requirement
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ][
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. As the metric for assessing ambiguity, we used the following formula:
        </p>
      </sec>
      <sec id="sec-2-8">
        <p>where N<sub>ambg</sub> is the number of ambiguous words in the requirement and N<sub>total</sub> is the number of words in the requirement.</p>
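        <p>A dictionary-based ambiguity check of this kind can be sketched in a few lines of Python. The sketch rests on two assumptions that are not taken from the paper: the metric is assumed to have the form (1 - N_ambg / N_total) x 100, by analogy with the other attributes, and the word list AMBIGUOUS_WORDS is a tiny hypothetical English dictionary used only for illustration.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical dictionary of vague words. A real dictionary would be much
# larger and, for the prototype described here, would contain Russian entries.
AMBIGUOUS_WORDS = {"some", "several", "appropriate", "fast", "easy", "flexible"}


def tokenize(text):
    """Split text into lowercase word tokens (letters of any alphabet)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def unambiguity_percent(requirement, dictionary=AMBIGUOUS_WORDS):
    """Assumed form of the metric: (1 - N_ambg / N_total) * 100."""
    words = tokenize(requirement)
    if not words:
        raise ValueError("requirement contains no words")
    n_ambg = sum(1 for word in words if word in dictionary)
    n_total = len(words)
    return (1 - n_ambg / n_total) * 100


# Two of the eight words ("fast", "some") are in the dictionary.
print(unambiguity_percent("The system shall respond fast to some requests"))
```
]]></preformat>
        <p>A plain set lookup ignores inflection, so a lemmatization step would be needed before the lookup for a language such as Russian.</p>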
      </sec>
      <sec id="sec-2-9">
        <p>Singularity. The statement of a requirement must relate to only one unique requirement that does not overlap with others.
The presence of several modal words indicates that the requirement carries several meanings and that the statement does
not have the characteristic of singularity. These words include could, may, might, can, should, will, shall, must,
and would. The number of connective words may also indicate the presence of several requirements within one
(mentioned above). As the metric for assessing singularity, we used the following formula:

% = (1 −
where N<sub>total</sub> is the number of words in the requirement, N<sub>modal</sub> is the number of modal verbs (which is not zero), and N<sub>connective</sub> is
the number of connective words in the requirement.</p>
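        <p>Since the singularity formula is not fully legible in the text, the following Python sketch only illustrates the idea of counting modal and connective words. The scoring rule is an assumption: one modal verb is treated as expected, and every additional modal verb or connective word lowers the score in proportion to the length of the requirement. The list CONNECTIVE_WORDS is hypothetical as well, because the text does not enumerate the connectives.</p>
        <preformat><![CDATA[
```python
import re

# Modal words listed in the text.
MODAL_WORDS = {"could", "may", "might", "can", "should", "will", "shall", "must", "would"}
# Hypothetical list of connectives; the text does not enumerate them.
CONNECTIVE_WORDS = {"and", "or", "but", "also"}


def singularity_percent(requirement):
    """Assumed rule: (1 - (extra modals + connectives) / N_total) * 100."""
    words = re.findall(r"[a-z]+", requirement.lower())
    if not words:
        raise ValueError("requirement contains no words")
    n_total = len(words)
    n_modal = sum(1 for word in words if word in MODAL_WORDS)
    n_connective = sum(1 for word in words if word in CONNECTIVE_WORDS)
    # Assumption: the first modal verb is normal for a requirement,
    # so only the additional ones are counted against singularity.
    extra_modals = max(n_modal - 1, 0)
    return (1 - (extra_modals + n_connective) / n_total) * 100


print(singularity_percent("The system shall store the order"))
print(singularity_percent("The system shall store the order and must notify the user"))
```
]]></preformat>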
      </sec>
      <sec id="sec-2-10">
        <p>Readability. This attribute indicates how easily the requirement text can be read and understood. It can be based on the number of syllables per word and the number of words per sentence, and it can be calculated with the Flesch-Kincaid Grade Level [8], the Coleman-Liau Grade Level [9], or the SMOG Grade [10]. We chose the second one, the Coleman-Liau index (CLI):
          <disp-formula><tex-math><![CDATA[CLI = 0.0588\,L - 0.296\,S - 15.8]]></tex-math></disp-formula>
where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. If the CLI is around 10,
the text is easy to read, but if CLI &gt; 15, the text is too difficult to understand. We mapped the index to a percentage
scale on which a CLI above 17.5 corresponds to 0% readability.</p>
        <p>Completeness. This attribute requires that the requirement contain all necessary elements, including constraints and conditions, to
enable the requirement to be implemented [18]. We calculated the completeness quality attribute with this formula:
          <disp-formula><tex-math><![CDATA[\mathrm{Completeness}\,\% = \frac{N_{filled}}{N_{total}} \times 100]]></tex-math></disp-formula>
where N<sub>total</sub> is the number of elements in the structural template and N<sub>filled</sub> is the number of elements from the template that were
identified in the requirement sentence.</p>
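        <p>The readability and completeness metrics can be sketched as follows. The Coleman-Liau coefficients are the standard published ones. The percentage mapping in readability_percent is a hypothetical linear mapping that is only known to agree with the text at one point (0% at a CLI of 17.5 or more), and the regular-expression tokenization is a simplification chosen for this sketch.</p>
        <preformat><![CDATA[
```python
import re


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S taken per 100 words."""
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    letters = sum(len(word) for word in words)
    letters_per_100 = letters / len(words) * 100           # L
    sentences_per_100 = len(sentences) / len(words) * 100  # S
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def readability_percent(cli, upper=17.5):
    """Hypothetical linear mapping: 0% at CLI >= upper, 100% at CLI <= 0."""
    return max(0.0, min(100.0, (1 - cli / upper) * 100))


def completeness_percent(n_filled, n_total):
    """Completeness % = N_filled / N_total * 100."""
    if n_total <= 0:
        raise ValueError("the template must contain at least one element")
    return n_filled / n_total * 100


sample = "The system shall store every order. The operator can export the order list."
cli = coleman_liau_index(sample)
print(round(cli, 2), round(readability_percent(cli), 1))
print(completeness_percent(3, 4))
```
]]></preformat>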
        <sec id="sec-2-10-1">
          <title>Natural Language Processing</title>
        </sec>
      </sec>
      <sec id="sec-2-11">
        <p>NLP is considered a branch of Artificial Intelligence that is concerned with the analysis and interpretation of natural (human) language via techniques such as parsing, part-of-speech tagging, named entity recognition, tokenization, and sentiment analysis.</p>
      </sec>
      <sec id="sec-2-12">
        <p>
          An NLP system is asked to make unambiguous decisions about word meaning, category, syntactic structure, and semantic scope [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In software engineering, requirements can be seen as a set of
sentences written in a specific language, and, like any text data, requirements may suffer from ambiguity. This is why NLP
is useful for extracting meaning and insight from requirements and, in our case, for determining how well requirements satisfy a set
of quality attributes.
        </p>
      </sec>
      <sec id="sec-2-13">
        <p>One approach that can help analyze the priority of quality attributes is the Analytic Hierarchy Process (AHP).</p>
      </sec>
      <sec id="sec-2-14">
        <p>In this case, five attributes are used to analyze the requirement. We asked our customer to fill in a questionnaire about their priorities.</p>
        <p>Table 1: Customer priority</p>
      </sec>
      <sec id="sec-2-15">
        <p>From this table, for example, the third row shows that unambiguity is 3 levels more important than unsubjectivity, and that unambiguity and completeness are at the same level of importance.</p>
      </sec>
      <sec id="sec-2-16">
        <p>After that, we built the pairwise comparison matrix from the questionnaire scores, where a<sub>ji</sub> = 1/a<sub>ij</sub> and a<sub>ii</sub> = 1.
Then we normalized the matrix by dividing each entry by the sum of its column:
          <disp-formula><tex-math><![CDATA[\bar{a}_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}}]]></tex-math></disp-formula>
        </p>
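        <p>The construction of the pairwise matrix and its normalization can be sketched as below. Only the two comparisons quoted in the text (unambiguity versus unsubjectivity scored 3, unambiguity versus completeness scored 1) come from the paper. All other pairs default to 1 purely for illustration, and taking the row means of the normalized matrix as priority weights is a common AHP approximation that is assumed here rather than confirmed by the text.</p>
        <preformat><![CDATA[
```python
ATTRIBUTES = ["unambiguity", "singularity", "readability", "unsubjectivity", "completeness"]


def pairwise_matrix(attributes, scores):
    """Build the reciprocal matrix: a_ij = score, a_ji = 1 / a_ij, a_ii = 1."""
    n = len(attributes)
    index = {name: position for position, name in enumerate(attributes)}
    # Pairs without a questionnaire answer default to equal importance.
    matrix = [[1.0] * n for _ in range(n)]
    for (more_important, less_important), score in scores.items():
        i, j = index[more_important], index[less_important]
        matrix[i][j] = float(score)
        matrix[j][i] = 1.0 / score
    return matrix


def normalize_columns(matrix):
    """Divide every entry by the sum of its column."""
    n = len(matrix)
    column_sums = [sum(matrix[k][j] for k in range(n)) for j in range(n)]
    return [[matrix[i][j] / column_sums[j] for j in range(n)] for i in range(n)]


def priority_weights(matrix):
    """Assumed step: the weight of an attribute is the mean of its normalized row."""
    normalized = normalize_columns(matrix)
    return [sum(row) / len(row) for row in normalized]


# The two answers quoted in the text; every other pair is left at 1.
scores = {
    ("unambiguity", "unsubjectivity"): 3,
    ("unambiguity", "completeness"): 1,
}
matrix = pairwise_matrix(ATTRIBUTES, scores)
weights = priority_weights(matrix)
for name, weight in zip(ATTRIBUTES, weights):
    print(f"{name}: {weight:.3f}")
```
]]></preformat>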
      </sec>
      <sec id="sec-2-17">
        <p>[Pairwise comparison matrix with rows and columns Unambiguity, Singularity, Readability, Unsubjectivity, and Completeness]</p>
      </sec>
      <sec id="sec-2-20">
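        <p>
          <disp-formula><tex-math><![CDATA[\sum (\mathit{Metric} \times \mathit{Weight})]]></tex-math></disp-formula>
        </p>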
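        <p>Combining the per-attribute metrics with the customer's priority weights can be sketched as a weighted sum. The function names, the sample metric values, and the sample weights below are hypothetical. Averaging the requirement scores to obtain a document score is also an assumption, based on the paper's mention of the average quality of a set of requirements.</p>
        <preformat><![CDATA[
```python
def requirement_quality(metrics, weights):
    """Weighted sum of metric percentages; the weights are expected to sum to 1."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must cover the same attributes")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(metrics[name] * weights[name] for name in metrics)


def document_quality(requirements, weights):
    """Assumed aggregation: the mean quality over all requirements."""
    if not requirements:
        raise ValueError("at least one requirement is needed")
    scores = [requirement_quality(metrics, weights) for metrics in requirements]
    return sum(scores) / len(scores)


# Hypothetical priority weights and metric values (percentages).
weights = {
    "unambiguity": 0.25,
    "singularity": 0.25,
    "readability": 0.125,
    "unsubjectivity": 0.125,
    "completeness": 0.25,
}
first = {"unambiguity": 80, "singularity": 100, "readability": 60,
         "unsubjectivity": 90, "completeness": 50}
second = {"unambiguity": 100, "singularity": 100, "readability": 100,
          "unsubjectivity": 100, "completeness": 100}

print(requirement_quality(first, weights))
print(document_quality([first, second], weights))
```
]]></preformat>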
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Prototype</title>
      <sec id="sec-3-1">
        <p>To fully support the extraction of metrics for all the aforementioned quality attributes, the prototype should have several features. The prototype is a software tool whose main goal is to perform requirements quality measurement.</p>
      </sec>
      <sec id="sec-3-2">
        <p>Requirements can be of any type expressed in text form: functional, non-functional, or use cases. The prototype is able
to perform several functions:
• Integrate with a project management system to gather textual requirements from it (via API)
• Perform syntactic and semantic analysis of these requirements (supporting the Russian language [11][12])</p>
      </sec>
      <sec id="sec-3-3">
        <p>The core of the prototype is the Requirement Quality Model, which contains a consistent set of requirements quality
metrics and is expressed as algorithms that define how to measure these metrics and how to draw conclusions (the average quality of
a requirement or a set of requirements). The prototype provides a requirements engineer with a graphical user interface or a
command-line interface to obtain the results of requirements measurement.</p>
      </sec>
      <sec id="sec-3-4">
        <p>For NLP, we used custom alternative Python libraries, Wordnet [13] and Spacy [14], with Russian language support.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <sec id="sec-4-1">
        <p>We proposed a model for requirements quality assessment based on NLP tools. Different quality attributes were
analyzed and adopted. We developed a prototype capable of reducing the challenges that development teams face when
interpreting requirements because of ambiguity, subjectivity, poor readability, or incompleteness. The suggested approach was
tested on a sample of requirements text. Quality metrics for the different attributes were calculated according to the customer's
priorities for every requirement and for the overall document.</p>
      </sec>
      <sec id="sec-4-2">
        <p>The prototype can be further improved by exploring other NLP techniques to furnish users with a detailed explanation of why requirements lack quality attributes.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <sec id="sec-5-1">
        <p>We thank the organizers of the CASE in Tools International Hackathon, Andrey Sadovykh and Alexandr Naumchev, and every
single person who contributed to the success of this event. We are immensely grateful to Giancarlo Succi for comments
on an earlier version of the proposed model.</p>
      </sec>
      <sec id="sec-5-2">
        <p>This research would not have been conducted without the efforts of Konstantin Valeev, the challenge owner, who shed more light on the grey areas of this project and provided us with enough resources.</p>
      </sec>
      <sec id="sec-5-3">
        <p>We are also indebted to the Rostelecom IT company for the opportunity to work on an industry-related challenge.</p>
      </sec>
      <sec id="sec-5-4">
        <p>8. Kincaid, J.P., Fishburne, R.P., Rogers, R.L., &amp; Chissom, B.S. (1975). Derivation of new readability formulas (automated readability index, fog count, and flesch reading ease formula) for Navy enlisted personnel. Research Branch Report 8–75. Chief of Naval Technical Training: Naval Air Station Memphis.</p>
        <p>9. Coleman, M., &amp; Liau, T.L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, Vol. 60, pp. 283–284.</p>
        <p>10. McLaughlin, G.H. (1969). SMOG Grading — a New Readability Formula. Journal of Reading, 12(8), pp. 639–646.</p>
        <p>11. Gaydamaka, K.I. (2017). Characteristics and quality indicators of requirements for the Russian-speaking engineering environment. In Conference “Technologies for the Development of Information Systems”. Federal State Autonomous Educational Establishment of Higher Education "Southern Federal University".</p>
        <p>12. Batovrin, V.K., &amp; Gaydamaka, K.I. (2017). Some features of the assessment of the characteristics of the requirements for systems. Informatization and communication, no. 4, pp. 191–196.</p>
        <p>13. (2017). ru-wordnet. GitHub repository. Retrieved from https://github.com/jamsic/ru-wor</p>
        <p>14. Baburov, Y. (2018). spacy-ru. GitHub repository. Retrieved from https://github.com/buriy/spacy-ru</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Khurana</surname>
            , Diksha &amp; Koli, Aditya &amp; Khatter, Kiran &amp; Singh,
            <given-names>Sukhdev.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <source>Natural Language Processing: State of The Art, Current Trends and Challenges.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Solingen</surname>
            , Rini &amp; Berghout,
            <given-names>Egon.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ayalew</surname>
            , Yirsaw &amp; Masizana,
            <given-names>Audrey.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Requirements Elicitation Techniques Selection Using AHP</article-title>
          .
          <source>I. J. Comput. Appl.</source>
          .
          <volume>16</volume>
          .
          <fpage>180</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Basili</surname>
            ,
            <given-names>Victor</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Caldiera</surname>
            ,
            <given-names>Gianluigi</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rombach</surname>
            ,
            <given-names>H. Dieter</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>The Goal Question Metric Approach</article-title>
          .
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          &amp;
          <string-name>
            <surname>James H. Martin.</surname>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chantree</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Nuseibeh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>De Roeck</surname>
            , Anne &amp; Willis,
            <given-names>Alistair.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Identifying Nocuous Ambiguities in Natural Language Requirements</article-title>
          .
          <source>Proceedings of 14th IEEE International Requirements Engineering Conference (RE'06)</source>
          .
          <fpage>59</fpage>
          -
          <lpage>68</lpage>
          .
          <pub-id pub-id-type="doi">10.1109/RE.2006.31</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Massey</surname>
            , Aaron &amp; Rutledge, Richard &amp; Antón, Annie &amp; Swire,
            <given-names>Peter.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Identifying and classifying ambiguity for regulatory requirements</article-title>
          .
          <source>2014 IEEE 22nd International Requirements Engineering Conference, RE 2014 - Proceedings</source>
          .
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          .
          <pub-id pub-id-type="doi">10.1109/RE.2014.6912250</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>