<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How well can IBM's ”Requirements Quality Assistant” review automotive requirements?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amalinda Post</string-name>
          <email>Amalinda.Post@de.bosch.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Fuhr</string-name>
          <email>Thomas.Fuhr@de.bosch.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>In: F.B. Aydemir</institution>
          ,
          <addr-line>C. Gralha, S. Abualhaija, T. Breaux, M. Daneva, N. Ernst, A. Ferrari, X. Franch, S. Ghanavati, E. Groen</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Proceedings of REFSQ-2021 Workshops, OpenRE</institution>
          ,
          <addr-line>Posters and Tools Track, and Doctoral Symposium, Essen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>R. Guizzardi</institution>
          ,
          <addr-line>J. Guo, A. Herrmann, J. Horkoff, P. Mennig, E. Paja, A. Perini, N. Seyff, A. Susi, A. Vogelsang (eds.): Joint</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Robert Bosch GmbH</institution>
          ,
          <addr-line>Postfach 300240, 70442 Stuttgart</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>In recent years, natural language processing has evolved immensely. Tools based on this technology, such as the Requirements Quality Assistant (RQA) by IBM or QVscribe by QRA, claim to improve the requirements process for natural language requirements. In this paper we evaluate whether RQA supports a requirements engineer at Bosch in evaluating the quality of automotive requirements. In our case study, we conclude that purely syntactical checks, based on the INCOSE rules for writing requirements, are not sufficient to address the needs of Bosch. We think that tool vendors should add semantic checks to their syntactical ones to further increase the practical use of their tools.</p>
      </abstract>
      <kwd-group>
        <kwd>case study</kwd>
        <kwd>requirements analysis</kwd>
        <kwd>natural language processing</kwd>
        <kwd>Requirements Quality Assistant</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Requirements are the foundation of projects in the automotive domain. Misinterpretation
of or defects in requirements lead to a substantial amount of rework or even safety-critical
failures if not detected in time. To minimize this risk, the development process asks for a
review before implementation starts. For the review, the requirements engineer invites all
affected engineers who will implement or test the requirements in the future, in order to reach a common
understanding and find defects. The review process takes about one week, as everyone has a
tight schedule and needs some time to review. For an agile team that seems like an eternity. It
would be a great improvement to have a tool that gives feedback in real time.</p>
      <p>
        There are natural language tools available that claim to assess the quality of requirements in
real time. One such tool is the Requirements Quality Assistant (RQA) by IBM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this case
study, we asked whether RQA could support the review process for automotive
requirements at Bosch. We chose to evaluate RQA because this tool can be easily integrated into the
requirements management tools ”Doors” and ”Doors Next Gen”, which are the ones most commonly used in the
automotive domain at Bosch.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Case Study Design</title>
      <sec id="sec-2-1">
        <title>2.1. Problem description and study goals</title>
        <p>
          The Requirements Quality Assistant (RQA) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] by IBM is intended to support a requirements engineer in
writing good requirements. IBM claims that using RQA can decrease review costs by
up to 25%. To assess whether RQA can support a requirements engineer at Bosch in the
automotive domain, we raised three questions:
        </p>
        <p>Question 1: Does RQA support a requirements engineer in reviewing against Bosch’s
requirements quality criteria?</p>
        <p>Question 2: Does RQA give further valuable feedback (in addition to Bosch’s quality criteria)?</p>
        <p>Question 3: How fast does RQA compute its analysis results?</p>
        <p>
          To address Question 1, Table 1 gives an overview of Bosch’s requirements quality criteria.
These criteria are the minimum set of quality criteria that all requirements written at Bosch shall
adhere to. The different parts of the company often add further quality criteria, e.g., feasibility
or unambiguity, to their review checklists. Bosch’s quality criteria are derived from
standards such as IEEE830 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and ISO26262 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Note that some of the criteria, e.g.,
clearness, can be checked per requirement, while others, e.g., completeness or consistency,
can only be checked for the whole set of requirements. This distinction is important, as
RQA currently checks every requirement on its own.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Selection Criteria for Requirements</title>
        <p>In the first step we selected requirements documents from different Bosch projects in the
automotive domain. To get a representative sample, we applied stratified sampling over the
automotive application domains power train, driving assistance and car multimedia. We then
used convenience sampling to select a project from every stratum.</p>
        <p>Each project had several requirements documents, some consisting of more than 100 pages.
To get a representative sample, we asked the corresponding requirements engineers
to give us a subset of their requirements such that, first, the subset contained roughly 50
requirements, second, the subset was representative of their application domain and, third, the
subset contained all requirements in one or more chapters. We asked the requirements engineers
to give us chapters of requirements instead of randomly chosen, unrelated requirements because,
in our experience, a requirement should not be interpreted out of its context. This way we
obtained the following four requirements sets: a set specifying throttle functionality (set 1, from a
powertrain project), a set specifying radio functionality (set 2, from a car multimedia project), a set
specifying cruise control functionality (set 3, from the driving assistance domain), and a set regarding
variant coding (set 4, needed in every automotive domain). The union of these sets consisted of
173 artefacts: 15 headings and 158 requirements.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Bosch’s requirements quality criteria.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Criterion</th>
                <th>Description</th>
                <th>Criteria per req.</th>
                <th>Criteria per req. set</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Complete</td>
                <td>The requirements set fully implements all requirements at the previous hierarchical level and the requirements of all stakeholders have been considered. I.e., it contains all relevant functional and non-functional requirements, boundary conditions, and implicitly expected requirements.</td>
                <td></td>
                <td>x</td>
              </tr>
              <tr>
                <td>Correct</td>
                <td>The requirement is accepted as correct and valid for the stakeholder of the respective level. It is correctly derived from its superior specification.</td>
                <td>x</td>
                <td>x</td>
              </tr>
              <tr>
                <td>Consistent and Free of overlap</td>
                <td>Each requirement is consistent with the other requirements. There is no overlap between requirements.</td>
                <td></td>
                <td>x</td>
              </tr>
              <tr>
                <td>Clear</td>
                <td>The requirements specification is comprehensible for the intended stakeholders. All requirements are quantified, with defined boundary conditions.</td>
                <td>x</td>
                <td>x</td>
              </tr>
              <tr>
                <td>Verifiable</td>
                <td>The implementation of the requirement is possible and can be verified (technically assessable).</td>
                <td>x</td>
                <td>x</td>
              </tr>
              <tr>
                <td>Traceable</td>
                <td>The origin of the requirement, its implementation all the way through testing, and the change history is traceable.</td>
                <td>x</td>
                <td>x</td>
              </tr>
              <tr>
                <td>Assessable</td>
                <td>The requirement can be evaluated and prioritized, for example, in terms of promised performance/delighting factors, and risk/stability.</td>
                <td>x</td>
                <td>x</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Case Study Approach</title>
        <p>The setting of the case study is depicted in Figure 1. As input we used the four requirements sets 1 to 4. In the first step,
these requirements sets were analysed in a manual review. The findings were documented per
requirement in the requirements management system Doors Next Gen in the attribute ”Manual
Review”. We then ran RQA. The tool documented its findings per requirement in the
attribute ”Issues Found by RQA” and a quality score in the attribute ”Score”. Finally, we
compared the findings of the manual review with the findings of the tool and documented the
result in the attributes ”Evaluation Result” and ”Eval Comment”.</p>
        <p>For ”Evaluation Result” we distinguished between no/bad fit, fits&gt;=60%, and fits 100%. We chose fits
100% if the findings of the manual review and the RQA findings were (semantically)
identical. We chose fits&gt;=60% if the majority of the findings of the manual review and RQA
matched; otherwise we chose no/bad fit. Note that this implies that we also set the
”Evaluation Result” to no/bad fit if the reviewers had no findings in the manual review but the
tool added findings in ”Issues Found by RQA”.</p>
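        <p>The rating scheme above can be sketched in code. The following Python snippet is only an illustration under stated assumptions: in the study a human evaluator judged whether findings matched semantically, whereas the hypothetical function rate_fit compares two sets of already normalised finding labels, measures the match as the share of common findings among all findings, and takes its 0.6 threshold from the label fits&gt;=60%.</p>
        <preformat><![CDATA[
```python
def rate_fit(manual_findings, tool_findings, threshold=0.6):
    """Rate how well the tool findings fit the manual review findings.

    Both arguments are collections of finding labels. This normalisation is
    an assumption; in the study a human judged semantic equivalence.
    """
    manual = set(manual_findings)
    tool = set(tool_findings)
    if manual == tool:
        # Identical findings, including the case that neither side found anything.
        return "fits 100%"
    # Share of common findings among all findings (assumed measure of "majority").
    overlap = len(manual & tool) / len(manual | tool)
    if overlap >= threshold:
        return "fits>=60%"
    # Also reached when the reviewers had no findings but the tool reported some.
    return "no/bad fit"


print(rate_fit({"compound requirement"}, {"compound requirement"}))
print(rate_fit(set(), {"vague term"}))
```
]]></preformat>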
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Case Study Evaluation</title>
        <p>To address the first study question, we evaluated how well the findings of the manual review
and the RQA evaluation matched. Figure 2 depicts the result for the attribute ”Evaluation Result”.
Note that the percentage of fits&gt;=60% findings was highest in set 2. This requirements set
was taken from a new project, where the requirements were a first draft that hadn’t been
reviewed before. Most fitting findings addressed ”compound requirements”, i.e., findings
noting that the requirement was not atomic. Sets 1, 3 and 4 were taken from projects in a much
later development phase. In later development phases, such findings are less frequent in manual
reviews, as the requirements have by then already been split into atomic parts.</p>
        <p>
          The findings for 132 of 158 requirements (about 84%) were rated ”no/bad fit”. Thus, the findings of the
manual review and the tool differed vastly. RQA implements checks derived from the INCOSE Guide
for Writing Requirements [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The guide is about how to express textual requirements in the
context of systems engineering. It contains rules like ”Avoid vague terms”, ”Avoid combinators”,
”Avoid the use of ’not’”. The rules are mainly meant to help junior requirements engineers write
better requirements.
        </p>
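        <p>To illustrate what a purely syntactical, per-requirement check of this kind looks like, the following Python sketch applies three such rules as keyword patterns. It is not RQA’s implementation: the word lists and the function check_requirement are hypothetical and far simpler than the INCOSE rules themselves.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical word lists; the INCOSE guide and RQA are far more extensive.
RULES = {
    "Avoid vague terms": re.compile(r"\b(appropriate|sufficient|several|quickly)\b", re.IGNORECASE),
    "Avoid combinators": re.compile(r"\b(and|or)\b", re.IGNORECASE),
    "Avoid the use of 'not'": re.compile(r"\bnot\b", re.IGNORECASE),
}


def check_requirement(text):
    """Return the names of all rules whose pattern occurs in the requirement text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


# Each requirement is checked in isolation, without knowledge of the rest of the set.
print(check_requirement("If the signal is not 'active', then the function shall be disabled."))
```
]]></preformat>
        <p>Such a check only looks at the wording of a single requirement; it cannot tell whether information is missing or whether two requirements contradict each other.</p>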
        <p>However, it seems the checks do not address many of Bosch’s quality criteria. Bosch’s
quality criteria are mainly content related, like completeness, consistency, correctness, and verifiability.
Many of the manual review findings were of that sort, i.e., the reviewer stated that information
was missing. The tool makes syntactical checks on natural language grammar, but these won’t
be sufficient to address content-related properties. Semantic checks on the whole requirements
set would be needed to address this need.</p>
        <p>To address the second study question, we checked whether RQA detected findings that
were not detected in the manual review but were considered valuable by the requirements engineer.
We saw that many of the rules resulted in false positives. For example, the rule ”Avoid the
use of ’not’” is difficult to apply in the automotive domain. In the automotive domain, the
requirements are very detailed. Already on system level, there are &gt; 1000 requirements for one
single control unit. Often the signals on system level are already defined in a signal table.
So, for a signal there is a definition of the enumeration values, e.g., ”off, init,
active, standby”. A formulation of a requirement like ”If the signal is not ’active’, then...”
is preferred to a formulation like ”If the signal is in ’off’ OR ’init’ OR ’standby’, then...”.
The reason is both readability and maintainability. Suppose that in a later system release a
further signal value ”shutDown” is added. With the first formulation only a small subset of
requirements will have to change; with the second formulation many more requirements have to
be adapted, resulting in a huge effort also in the later development steps.</p>
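        <p>The maintainability argument can be made concrete with a small Python sketch. It is an illustration only: the class name SignalState and the two condition functions are hypothetical, and it assumes that the new value ”shutDown” should be treated like the other non-active states.</p>
        <preformat><![CDATA[
```python
from enum import Enum


class SignalState(Enum):
    """Enumeration values of the example signal (hypothetical class name)."""
    OFF = "off"
    INIT = "init"
    ACTIVE = "active"
    STANDBY = "standby"
    SHUT_DOWN = "shutDown"  # value added in a later system release


def condition_negated(state):
    # "If the signal is not 'active', then ..."
    return state is not SignalState.ACTIVE


def condition_enumerated(state):
    # "If the signal is in 'off' OR 'init' OR 'standby', then ..."
    # Written before "shutDown" existed, so the new value is not listed.
    return state in (SignalState.OFF, SignalState.INIT, SignalState.STANDBY)


# The negated form covers the new value; the enumerated form misses it
# until every affected requirement has been adapted.
print(condition_negated(SignalState.SHUT_DOWN))     # True
print(condition_enumerated(SignalState.SHUT_DOWN))  # False
```
]]></preformat>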
        <p>A check that was considered helpful, with very few false positives, was the check for
”compound requirements”, which detected non-atomic requirements. Another check that would
be helpful is the check for missing limits and missing units. However, currently this check
is done for every single requirement. In the automotive domain, this is not a good strategy.
With 1000 requirements, writing the limits and units of a signal in every requirement
will eventually lead to inconsistent requirements. Instead, they are often defined centrally, once
per signal. So instead of doing the check per requirement, it should be done for the whole
requirements set.</p>
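        <p>A set-level variant of this check could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the signal table contents, the convention of referencing signals as sig:name, and the function undefined_signals are hypothetical and not part of RQA or of Bosch’s tooling. Instead of demanding limits and units in every requirement, the check verifies that every referenced signal has one central definition.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical central signal table: limits and units are defined once per signal.
SIGNAL_TABLE = {
    "throttle_position": {"unit": "%", "min": 0, "max": 100},
    "vehicle_speed": {"unit": "km/h", "min": 0, "max": 250},
}

# Hypothetical convention: requirements reference signals as sig:<name>.
SIGNAL_REF = re.compile(r"sig:(\w+)")


def undefined_signals(requirements):
    """Map requirement id -> referenced signals that have no central definition."""
    report = {}
    for req_id, text in requirements.items():
        missing = [name for name in SIGNAL_REF.findall(text) if name not in SIGNAL_TABLE]
        if missing:
            report[req_id] = missing
    return report


requirements = {
    "REQ-1": "If sig:vehicle_speed exceeds the set speed, the controller shall reduce sig:throttle_position.",
    "REQ-2": "The controller shall limit sig:engine_torque.",
}
print(undefined_signals(requirements))  # {'REQ-2': ['engine_torque']}
```
]]></preformat>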
        <p>In the case study, the requirements engineers came to the conclusion that, in its current state,
RQA didn’t support them in the review. It didn’t provide semantic checks, and many rules led
to false positives.</p>
        <p>Regarding the third study question, RQA is fast. It took a few seconds to analyse the
requirements set and write the analysis result into the requirements management system. Compared to
a manual review (which in the case of an inspection takes about 1 week), this is a huge improvement.
Thus, if RQA addressed the quality criteria of Table 1, the tool could help to vastly shorten
the review process.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Threats to validity</title>
        <p>
          In this section, we analyze threats to validity defined in Neuendorf [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], Krippendorff [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and
Wohlin [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <sec id="sec-2-5-1">
          <title>2.5.1. External Validity</title>
          <p>
            Sampling Validity [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] This threat arises if the sample is not representative of the
requirements. To minimize this threat, we used the selection procedure described in Section 2.2,
i.e., requirements from four different projects and three automotive application domains. A
limitation of the case study is that the samples are small and that we only used requirements from
Bosch projects. Thus, we cannot generalize our results to the whole automotive domain but only
to Bosch’s automotive domain.
          </p>
          <p>
            Interaction of Selection and Treatment [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] This threat arises if the requirements engineers
in this study (see Section 2.3) are not representative of Bosch requirements engineers. The
requirements engineers in this case study were experienced, with more
than 10 years of experience in requirements engineering in the automotive domain. Thus, the
results may differ for less experienced requirements engineers.
          </p>
        </sec>
        <sec id="sec-2-5-2">
          <title>2.5.2. Internal Validity</title>
          <p>
            Selection [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] This threat arises due to natural variation in human performance. The
requirements engineers in this study (see Section 2.3) could have been especially good or bad
at reviewing requirements. For two sets, the same evaluator did the manual review of the
requirements. For the two other sets, the manual review findings were taken from the reviews
by the project team. To minimize the threat, the requirements engineers double-checked the
evaluation results of the other samples. If they found differing review findings in the manual
review, they discussed the findings.
          </p>
        </sec>
        <sec id="sec-2-5-3">
          <title>2.5.3. Construct Validity</title>
          <p>
            Experimenter expectancies [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] Expectations of an outcome may inadvertently cause the
evaluators to view data in a different way. The evaluators have no benefit or disadvantage from
a good or bad outcome for the applicability of the RQA tool. Thus, such psychological effects
probably did not affect the evaluators.
          </p>
          <p>
            Semantic Validity [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] This threat arises if the analytical categories of texts do not correspond
to the meaning these texts have for particular readers. In the case study the same evaluator
mapped the categories no/bad fit, fits&gt;=60%, fits 100% to the requirements for all four sets of
requirements.
          </p>
          <p>Note that in the study we didn’t address whether the tool might help when formulating
requirements. Another setting would be needed to investigate that aspect.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
      <p>
        This case study investigates whether, in practice, RQA supports a requirements
engineer in the automotive domain at Bosch in reviewing requirements. Based on the results of
Section 2.4, we conclude that in its current state RQA does not sufficiently address
this need. The main reason is that the tool mainly checks syntactical rules derived from the
INCOSE rules. In contrast, the Bosch requirements engineers are interested in semantic checks.
If RQA also integrated semantic checks, the tool could lead to big improvements, as
direct feedback on, e.g., consistency or completeness when writing the requirements would be
very valuable. We think this information is very important for tool vendors. Other NLP-based
tools like QVscribe [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] by QRA are also based solely on the INCOSE rules. Based on
our results, we claim that a tool addressing only these rules will not meet the needs of the
automotive domain at Bosch.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>IBM</surname>
          </string-name>
          , Engineering requirements quality assistant, 8.2.
          <year>2021</year>
          . URL: https://www.ibm.com/products/requirements-quality-assistant.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          IEEE830,
          <source>Recommended Practice for Software Requirements Specifications</source>
          ,
          <year>1998</year>
          . URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=720574.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          ISO26262,
          <article-title>Road vehicles - Functional safety, Part 8</article-title>
          , Baseline 17,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>INCOSE</surname>
          </string-name>
          ,
          <article-title>Guide for writing requirements</article-title>
          ,
          <year>2019</year>
          . URL: https://connect.incose.org/Pages/Product-Details.aspx?ProductCode=TechGuideWR2019Soft.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Neuendorf</surname>
          </string-name>
          ,
          <source>Content Analysis Guidebook</source>
          , Sage Publications,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          ,
          <article-title>Content Analysis: An Introduction to Its Methodology</article-title>
          , 2nd ed., Sage Publications, Inc,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wohlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Höst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Ohlsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Regnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wesslén</surname>
          </string-name>
          ,
          <article-title>Experimentation in software engineering: an introduction</article-title>
          , Kluwer Academic Publishers, Norwell, USA,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>QRA</surname>
          </string-name>
          ,
          <article-title>QVscribe by QRA Corp</article-title>
          , 8.2.
          <year>2021</year>
          . URL: https://qracorp.com/qvscribe/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>