Predicting Requirements Volatility: An Industry Case Study

Anıl Holat(1,2), Ayse Tosun(2)
1 Aselsan Inc., Ankara, Turkey
2 Faculty of Computer and Informatics Engineering, Istanbul Technical University, Turkey

Abstract
Software requirements are exposed to many changes during the software development life cycle. These changes, namely additions, modifications, or deletions, are defined as requirements volatility. Prior requirements volatility prediction studies utilize different volatility measures. In this study we predict the number of changes per software requirement as requirements volatility for a large-scale, safety-critical avionics project at ASELSAN. We employ a comprehensive metric set to explain requirements volatility: requirement quality measures, project-specific factors, and requirement interdependencies. Predictive models are created by combining input metric sets with machine learners. We evaluate the success of the models in predicting requirement changes, the best-performing input metric combinations, the best-performing machine learners, and the success of the models in predicting highly volatile requirements. The best prediction results are obtained with the model employing quality metrics, project-specific metrics, and network metrics altogether with the k-nearest neighbour machine learner (MMRE = 0.366). The best model also correctly identifies 63.2% of the highly volatile requirements, which are exposed to 80% of the total requirement changes. Our results are encouraging in terms of creating requirement change prediction tools that mitigate requirements volatility risks prior to the requirement review process.

Keywords
Predicting Requirements Volatility, Quality Metrics, Network Metrics, Requirements Quality, Requirements Change

QuASoQ'21: 9th International Workshop on Quantitative Approaches to Software Quality, December 06, 2021, Taipei, Taiwan
aholat@aselsan.com.tr (A. Holat); tosunay@itu.edu.tr (A. Tosun)
ORCID: 0000-0002-8727-6768 (A. Holat); 0000-0003-1859-7872 (A. Tosun)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Although software engineering has experienced significant advancements in the last decades, the majority of large-scale software projects still try to cope with requirement changes during their software development life cycle due to the dynamic nature of software development activities [1]. Changes to requirements, namely additions, deletions, or modifications, are defined as requirements volatility [2]. Continual requirement changes during software development have a tremendous impact on the cost, the schedule, and the quality of the final product. Unfortunately, a significant number of software projects cannot be completed successfully, or are completed only partially, because of requirements' high volatility [2].

According to a survey conducted by Thakurta [3], project managers use various requirements volatility measures: the number of changes to the identified use cases, the number of changing requirements identified within the issued change requests, realized requirements out of total requirements, and the amount of budget the project had to spend on the changing requirements. Alsalemi et al. [4] also report a literature review on requirements volatility prediction. Accordingly, ten studies had employed machine learning methods to predict requirements volatility until 2017. These studies utilize different requirement volatility measures, such as the number of requirement changes [5], the requirement stability index of the project [6], requirements that will be changed in the next iteration [7], requirement change impact [8], the impact of requirements changes on project distribution and cost factor [9], and software schedule [10]. Related studies also propose requirements complexity metrics [6], requirement dependency metrics [8], requirement size metrics [5], and requirements evolution metrics [7] to predict their own definition of a requirements volatility measure.

In our study we aim to predict the number of changes per software requirement by using requirement quality measures, project-specific factors, and requirement interdependencies. We define requirements volatility as the number of change requests reported for a software requirement. A change request could be either for adding a new requirement or for modifying an existing requirement. We chose a safety-critical avionics software project at ASELSAN with more than 20,000 requirements for our study. Loconsole et al. [5] conducted a similar study to predict the number of requirement changes using size measures on projects with fewer than 50 requirements. Our study complements the prior work by mining a larger dataset with thousands of requirements and a more comprehensive metric set considering quality and interdependency aspects of requirements as well as project-specific factors. It should be noted that the change requests we study in this work occurred in any phase of software development after the Software Requirements Specification (SRS) document had been reviewed and confirmed. Thus we investigate post-SRS requirements volatility for the avionics project under study.

The rest of the paper is organized as follows. Section 2 presents related empirical studies carried out on requirements volatility prediction. Section 3 explains the study design in detail. Results and threats to validity of our work are discussed in Section 4. Section 5 presents the conclusion and points out possible directions for future work.

2. Related Work

In this section we present previous studies that aim to predict volatility for requirements, and we focus on the input metrics they employed. We report details of five relevant studies [11, 6, 7, 8, 5] from the literature review conducted by Alsalemi et al. [4]. We also discuss the approaches of other recently published, related studies [12, 13] in this section.

Nakatani et al. [11] propose a method to predict requirements volatility using social relations between executives, competitors, cooperative organizations, and the natural environment. Those measures can be applied to customer requirements easily, but it would take some effort to associate them with software requirements. Christopher et al. [6] present requirements complexity metrics to define volatility. Functional requirement complexity, non-functional requirement complexity, input-output complexity, and interface and file complexity measures are used to calculate the whole project's stability, whereas we seek to predict volatility for each software requirement. Shi et al. [7] present a model to predict future requirement changes by using previous requirement change metrics. They generated six history metrics for requirements that contain information about volatility of topic, frequency of changes, and time duration between changes. History metrics can be used to predict requirements that will be changed in the next iteration, but they have little use in predicting requirements volatility for new projects. Pedrycz et al. [13] also employ the following change logs as input metrics: created version of a requirement, last developer, number of modifications, and requirement lifetime duration. Change logs are created in later phases of software development; thus they are again not very useful for predicting requirements volatility for projects in earlier development phases. Goknil et al. [8] and Hein et al. [12] use requirements interrelations for volatility prediction. Goknil et al. [8] utilize formal semantics of requirement relations as input features, whereas Hein et al. [12] create network metrics by using syntactical natural language data. We have combined both measures and created network metrics by using links between system and software requirements instead of lingual relations between requirement texts. Regarding network metrics, we employ degree centrality, eigenvector centrality, closeness centrality, and betweenness centrality, whereas Hein et al. [12] used 40 network metrics.

According to the literature review, only one study, conducted by Loconsole et al. [5], presents an empirical study to predict the number of changes per requirement, so that study is the most relevant to our work. The following size measures are used there to predict the number of requirement changes: number of actors interacting with use cases, number of words in each file, number of revisions of files, and number of lines per file. In our study, size measures are also used to represent requirement quality, but we additionally enriched our metric set with project-specific metrics and network metrics. It should be noted that we do not take deletion requests and deleted requirements into consideration while defining requirements volatility, because in our industrial context we rarely encounter such requests for safety-critical software. Finally, we applied our model on more than 20,000 requirements, which helps us assess the generalizability of our findings on predicting volatility for every software requirement using different metric sets.

3. Study Design

In this section we explain our empirical study design in detail. In Section 3.1 the research questions are explained. The analyzed project for which a model would be proposed is described in Section 3.2. In Section 3.3 the selected input metrics for requirements volatility prediction are described. Section 3.4 describes the output measure of the prediction model. The tools used are explained in Section 3.5. Machine learning techniques employed in this study are presented in Section 3.6. Finally, in Section 3.7, performance evaluation measures are defined for our model.

3.1. Research Questions

Our main goal is to predict requirements volatility at earlier stages of the development life cycle, and accordingly two research questions are defined.

Research Question (RQ) 1: To what extent do requirement quality metrics, project-specific metrics, and network metrics predict the volatility of a software requirement?

Previous studies used different metric sets to predict requirements volatility. In this study we aim to use a comprehensive set of input metrics and observe their individual effects on requirements volatility prediction. While predicting the volatility, we use the number of change (addition and modification) requests on a software requirement. Inspired by the metric sets used in the literature, we form a group of requirement quality metrics and network metrics. Additionally, project-specific metrics for this particular safety-critical avionics project are defined and utilized throughout this study. During model assessment, the performance of each input metric set and the combinations of those sets are reported. Detailed sub-questions related to RQ 1 are listed below:

RQ 1.1: Which metric group is a better indicator of the number of requirement changes?
RQ 1.2: Which machine learning algorithm is better at predicting the number of requirement changes?

RQ 2: How successful are the proposed models in predicting highly volatile software requirements?

Software requirements have a history of a varying number of changes during the software development life cycle. Some requirements do not change at all; however, some requirements are exposed to multiple changes and pose risks to a software project. Practically, our model should predict highly volatile requirements, so that those requirements can be reviewed by experienced reviewers in detail. For this research question (RQ 2) we measure the success of our models on highly volatile requirements based on a technique in [14].

3.2. Analyzed Project

We chose a safety-critical avionics software project to perform our analysis. We will refer to this project as AVPRJ in the rest of this paper. AVPRJ has many releases, from which three releases are selected. Software requirements for those releases are related since they all belong to the same project; however, they are partially distinct since each release consists of the implementation of different software components developed by many software developers. AVPRJ has a total of 22,771 software requirements. Some release-based descriptive statistics for AVPRJ are given in Table 1, where CR is used as an abbreviation for change request and REQ for a single requirement. Most of the employed requirements belong to the second release, and this release has the highest mean change requests per software requirement. The third release has relatively fewer software requirements, and fewer additions or modifications are performed on the requirements belonging to this release. More than half of the requirements are modified at least once for this project; 9,848 out of 22,771 requirements are not changed, which complies with the Standish Group's survey results over more than 8,000 software projects [15].

Table 1: Release-based AVPRJ statistics
Release     Number of REQs  Mean CR per REQ  Median CR per REQ
Release 1   8,640           0.7457           1
Release 2   11,401          1.1165           1
Release 3   2,730           0.5267           0
Total       22,771          0.9051           1

3.3. Input Metrics

We have employed several metrics to predict the volatility of each software requirement in AVPRJ. The metrics represent three dimensions: requirement quality metrics, project-specific metrics, and requirement network metrics. Requirement quality metrics extracted by the NASA Automated Requirements Measurement (ARM) tool have previously been used to predict faulty modules [16]. Initially, we believed the way requirements are documented would affect requirements volatility, besides the fault proneness of modules. Some requirement quality size metrics are already utilized in predicting requirements volatility [5]; therefore we decided to include a requirement quality metric set in our study. Network metrics are employed to predict requirements change volatility in a recent study [12]. This sparked the idea of utilizing network metrics for requirements volatility prediction. Initial observation of various change request notes confirmed that software requirements that are changed within a particular change request have a tendency to be linked to similar system requirements. Accordingly, we employed network metrics created from traceability information. In order to enrich the input metric set with a new metric group, we focused on safety-critical avionics project characteristics in this study. Features are evaluated separately, and the ones that would provide information on requirements volatility are selected as project-specific metrics. Rationales for the project-specific metric selection are given in detail in Section 3.3.2. Detailed explanations for each group are given in the following subsections.

3.3.1. Quality Metrics

While selecting requirement quality metrics to predict volatility, we were inspired by two studies. The first study proposes requirement metrics in the context of the NASA Metrics Data Program (MDP) to predict software faults [16]. These metrics are calculated by automatically going through requirements documents to highlight vague, ambiguous, long, and complex requirements. The second study also reports requirement quality metrics [17], to find out which requirement quality analysis tool is more successful regarding the measurement of those metrics. Combining both studies' lists and customizing them to the requirements document templates in our industrial context, we present 20 quality metrics in Table 2. All of these metrics take numeric values, e.g. the number of flow sentences in a requirement, or the number of directives in a requirement.

Table 2: Requirement Quality Metrics
Acronyms: The number of abbreviations in a software requirement. For AVPRJ, the permitted acronym list is used to extract this metric.
Actions: The number of actions to be performed if the conditions of a software requirement are satisfied.
Ambiguity: The number of ambiguous expressions in a software requirement, e.g. adequate, sufficiently, optimal, slow.
Chars between punctuation: Average character count between punctuation marks. Long sentences without punctuation marks decrease readability.
Conditions: The number of conditions that need to be satisfied to perform a software requirement.
Conditional: The number of phrases that give developers freedom whether or not to implement a software requirement, e.g. maybe, can't, would.
Connectors: The number of connectors that link multiple sentences or groups of words, e.g. and, or, as well as.
Directives: The number of directive expressions that refer to a table, a note, a figure, or an example.
Flow sentences: The number of expressions that semantically bond a sentence to another one, e.g. although, but, else.
Imperatives: The number of phrases that command to perform particular actions in a software requirement, e.g. shall, must, will.
Implicitness: The number of pronouns that make a software requirement difficult to understand, e.g. this, that, it. A software requirement should be defined explicitly.
Incompleteness: The number of expressions that indicate a software requirement is yet incomplete, e.g. and so on, tbd, etc.
In links: The number of incoming links to a software requirement from other documents. For AVPRJ, test cases are linked to software requirements, so the number of in links refers to the number of linked test cases.
Negative sentences: The number of phrases that give negative meaning, e.g. doesn't, none, can't.
Nested levels: For AVPRJ, the nested level metric value is the greatest level in the hierarchical nesting structure of a software requirement.
Out links: The number of outgoing links of a software requirement. In AVPRJ, software requirements are linked to system requirements; therefore the number of out links is the total number of system requirements linked by a software requirement.
Rationale: The number of expressions that give justification in a software requirement, e.g. thus, in order to.
Speculative sentences: The number of speculative phrases which lead to questioning the necessity of a software requirement, e.g. normally, eventually, almost.
Subjectivity: The number of subjective expressions presenting personal opinion rather than objectivity, e.g. I think, in my opinion.
Text length: The total number of characters in a software requirement.

During the preprocessing stage, we had to remove three metrics from our analysis since they gave little to no information for AVPRJ: Conditional, Rationale, and Subjectivity. Only one requirement contains conditional expressions, three requirements contain rationale expressions, and none of the requirements have subjective expressions. Thus we ended up with 17 metrics representing the quality aspect of requirements for predicting their volatility.

3.3.2. Project Specific Metrics

Project-specific metrics may differ regarding the scope of a software project, but the metrics we chose are not very specific to the development environment, programming language, or domain in which the software is developed. We believe project-specific metrics provide information about development characteristics in an organization, and hence about the factors affecting the change proneness of requirements. Table 3 lists the project-specific metrics employed in this study. If the project follows an inspection activity on requirements, it is more likely that the team would find the ambiguities and inconsistencies in the requirements. Since derived requirements are not part of customer needs, they cannot be validated through user acceptance tests. If a requirement has a safety aspect, more comprehensive software tests will be performed; thus a potential change is highly probable. The number of related components is a measure of the impact of a software requirement on the overall product; thus more feedback will be given by the development team on requirements affecting many components. Each software release has different dynamics that affect requirements maturity, e.g. release schedule, experience of developers, and complexity of the system. For example, if the schedule is too tight to complete the SRS document, requirements could be immature and more requirement changes could be performed in the future for this release.

Table 3: Project Specific Metrics
Inspection: Indicates if a software requirement is evaluated through an inspection activity. This procedure might be preferred to complement functional tests.
Derived: Software requirements that are not explicitly stated in system requirements but derived based on design decisions [18].
Safety: Shows if a software requirement is safety-critical.
No. of Related Components: The number of isolated software components that a requirement is related to.
Release Number: The release number that the software requirement belongs to.

3.3.3. Network Metrics

Hein et al. [12] earlier utilized 40 network metrics to predict requirements change volatility. On the other hand, Valente et al. [19] present correlations between degree, betweenness, closeness, and eigenvector centrality measures, and indicate that those measures are distinct but notionally related. Thus, in this work, instead of employing 40 metrics, we chose the metrics suggested in [19] to predict requirements volatility for AVPRJ. These centrality metrics give each software requirement a value regarding its position in the network. Brief explanations of the employed network metrics are given in Table 4.

Table 4: Network Metrics
Degree centrality: Scores requirements based on the number of links.
Betweenness centrality: Measures how many times a requirement is on the shortest path in the graph.
Closeness centrality: Indicates how close a requirement is to other requirements considering the whole graph.
Eigenvector centrality: Measures how a node influences other nodes in the network through connections.

Hein et al. [12] used language processing to create a network for requirements. In this study we instead used traceability information to create the network graph for software requirements. Traceability links from software requirements to system requirements are used for this purpose. We assigned weights between software requirements based on system requirement traceability links. Software requirements which are derived from similar system requirements tend to be closer in our model. The weight assignment formula is given in Equation 1, where W is the weight between two software requirements, NCLINK is the number of common system requirement links between the two software requirements, and NTOTLINK is the total number of system requirements linked from those two software requirements. After weight assignment, a symmetrical n x n matrix is created, where n denotes the number of software requirements. Then the network metrics are computed over this matrix.

    W_ij = NCLINK_ij / NTOTLINK_ij    (1)

3.4. Model Output

The proposed model's output is the number of change requests per software requirement. After the SRS document is reviewed and completed for AVPRJ, change requests linked to each software requirement are reported in the issue management system, and the document is modified accordingly by the analysis team. Thus we define requirements volatility in our industrial context with respect to the number of change requests that have been applied to add a new requirement or to modify an existing requirement in the associated SRS document. Please note that our model outputs decimal values, but the number of change requests per requirement can in practice only take integer values. Therefore we round fractional parts to the nearest integer.

3.5. Tools

We wrote scripts to extract requirement quality and project-specific metrics from SRS documents. The UCINET tool [20] is then used to create network metrics from the matrix that we extracted based on software and system requirements. Regression models with different machine learners are trained using the WEKA tool [21]. Prediction results are further post-processed in MATLAB to obtain the performance measures regarding all RQs.

3.6. Machine Learning Techniques

We train models using linear regression, random forest regression, support vector regression, and k-nearest neighbor regression. Linear regression was utilized in [5], whereas classifier versions of the other three techniques were used in [12].

For k-nearest neighbor regression, the inversely proportional weighting option is selected. Higher weights are assigned to closer training samples, which resulted in better prediction results for our model. For support vector regression, the commonly used radial basis function kernel is selected. Increasing the gamma parameter too much may result in over-fitting [22], and we also experienced a great computational cost with little to no prediction success gain for large gamma. Thus the C and gamma parameters are set to 1.

In this study the 10-fold cross-validation technique is used to split training and test sets. First, the dataset containing all software requirements is shuffled randomly and split into 10 groups of approximately equal size. One group is labeled as the test set and the other groups are used to train the machine learning models. This procedure is repeated 10 times until each unique group has been used as the test set once.

3.7. Performance Evaluation

For RQ 1, the following measures are used for performance evaluation: Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), Pred(0.5), and Pred(0.25) [23]. The relative error is calculated according to Equation 2, where Err_relative is the relative error, Val_act is the actual value, and Val_pred is the predicted value.

    Err_relative = |Val_act - Val_pred| / |Val_act|    (2)

There are requirements with zero change requests; thus a division-by-zero problem arises while calculating the relative error. We made an assumption for unchanged requirements, as presented in Equation 3.

    If Val_act = 0:  Err_relative = |Val_act - Val_pred| / 1    (3)

Pred(k) is a measure of the variance of the error distribution. This measure is based on the relative error, and it shows the percentage of predictions whose errors are less than or equal to k.

For RQ 2, we aim to predict highly volatile requirements, and thus we first employ a method to identify those among the set of requirements:

- Step 1: Rank requirements by their actual number of change requests in descending order and record their rank as R_actual.
- Step 2: Obtain regression prediction results for each software requirement.
- Step 3: Rank requirements by their predicted number of change requests in descending order and record their rank as R_predicted.
- Step 4: Evaluate results according to the listing in Table 5. P denotes the percentage of requirements which are perceived as highly volatile, and N_req denotes the total number of requirements in the validation set.
- Step 5: Calculate recall, accuracy, and false alarm rate.

Table 5: Requirements volatility rank evaluation
Condition                                               Evaluation
R_actual <= N_req x P and R_predicted <= N_req x P      True Positive
R_actual >  N_req x P and R_predicted >  N_req x P      True Negative
R_actual <= N_req x P and R_predicted >  N_req x P      False Negative
R_actual >  N_req x P and R_predicted <= N_req x P      False Positive

Table 5 can be interpreted as follows: True Positive instances are requirements that are actually highly volatile and that the model also categorizes as highly volatile. In the case of True Negatives, a requirement is actually less volatile, and so is its prediction. False Negatives occur when highly volatile requirements are regarded as less volatile by the predictor. Finally, False Positives indicate less volatile requirements predicted as highly volatile.

To answer RQ 2, recall, accuracy, and false alarm rate are computed. The recall result shows how successful the model is in predicting highly volatile requirements; in our view this measure is the most important one regarding RQ 2. The accuracy measure shows the prediction success for both highly volatile and less volatile requirements. The false alarm rate indicates how much effort is wasted by mis-evaluating less volatile requirements.

4. Results and Discussion

We present and discuss the performance of the models with respect to the two RQs in this section. We also compare the performance of the prediction models proposed in this study with the prior work [5].

4.1. RQ 1

After obtaining the processed data, machine learning regression methods are applied to answer whether requirement quality metrics, network metrics, and project-specific metrics can be used to predict the number of changes on each software requirement. Model performance results are gathered for all input metric and machine learning method combinations separately.

Results for RQ 1 are given in Table 6. The following abbreviations are used: ML for machine learning, Q for requirement quality metrics, P for project-specific metrics, N for network metrics, KNN for k-nearest neighbor regression, LR for linear regression, RF for random forest regression, and SVR for support vector regression.

Table 6: Performance evaluation results for RQ 1
Metrics+ML method  MMRE   MdMRE  Pred(0.5)  Pred(0.25)
Q&P&N+KNN          0.366  0      0.681      0.57
Q&P&N+LR           0.53   0.5    0.524      0.411
Q&P&N+RF           0.392  0      0.663      0.545
Q&P&N+SVR          0.392  0      0.662      0.554
Q&P+KNN            0.402  0      0.641      0.529
Q&P+LR             0.513  0.5    0.541      0.428
Q&P+RF             0.45   0.333  0.6        0.486
Q&P+SVR            0.459  0.5    0.584      0.479
P&N+KNN            0.422  0      0.632      0.52
P&N+LR             0.55   0.667  0.484      0.372
P&N+RF             0.454  0.5    0.595      0.48
P&N+SVR            0.469  0.5    0.561      0.475
Q&N+KNN            0.381  0      0.665      0.553
Q&N+LR             0.534  0.5    0.52       0.407
Q&N+RF             0.394  0      0.662      0.545
Q&N+SVR            0.426  0      0.621      0.515
Q+KNN              0.443  0.333  0.598      0.488
Q+LR               0.512  0.5    0.542      0.43
Q+RF               0.455  0.5    0.594      0.483
Q+SVR              0.483  0.5    0.556      0.446
P+KNN              0.555  0.667  0.483      0.37
P+LR               0.548  0.667  0.485      0.373
P+RF               0.556  0.667  0.483      0.371
P+SVR              0.516  0.5    0.512      0.417
N+KNN              0.448  0.5    0.596      0.482
N+LR               0.549  0.667  0.484      0.372
N+RF               0.485  0.5    0.561      0.446
N+SVR              0.53   0.5    0.5        0.392

In terms of input metric combinations, the best MMRE results are achieved with Q&P&N (0.366), Q&N (0.381), and Q&P (0.402). We may interpret that requirement quality metrics (Q) are successful at predicting the number of change requests per software requirement, and their combinations with the other metrics also give good results. With respect to the machine learning algorithm, the three best-performing metric combinations give the highest prediction performance when the k-nearest neighbor algorithm is utilized.

MdMRE is zero for the following metric and machine learner combinations: Q&P&N+KNN, Q&P&N+RF, Q&P&N+SVR, Q&P+KNN, P&N+KNN, Q&N+KNN, Q&N+RF, and Q&N+SVR. The number of change requests for more than half of the software requirements is predicted correctly with these models. Since many prediction models give the same best result, we do not rank the best-performing models with regard to MdMRE.

The best performance with respect to Pred(0.25) and Pred(0.5) is obtained with Q&P&N+KNN; Q&N+KNN and Q&P+KNN report the second and third best results. Those results indicate that requirement quality metrics and the k-nearest neighbor algorithm are also successful with respect to the Pred measures.

To sum up, the best performance results are achieved by using requirement quality metrics, project-specific metrics, and network metrics altogether. Accordingly, the k-nearest neighbor algorithm gives the best performance results for all measures. In all best-performing models, quality metrics are utilized either paired with project or network metrics or in the combination of all three. It seems the way requirements are documented has a high effect on volatility rates.

We compared our findings against the study conducted by Loconsole et al. [5]. Table 7 reports the performance of the linear regression model with the best metric set in our study, our best-performing model, and the best-performing model of [5]. If we compare the findings only on LR, we observe that using the number of lines predicts volatility better in their commercial setting, while in our context using quality metrics only does not give the best result. Other algorithms like KNN, in combination with all metrics, significantly improve the prediction performance by reducing MMRE down to 0.36 and MdMRE down to 0, and increasing Pred(0.25) up to 57%.

Table 7: Comparison of our performance (RQ 1) against [5]
                  MMRE   MdMRE  Pred(0.25)  Pred(0.5)
Q+LR              0.51   0.5    0.43        0.54
Best model        0.36   0      0.57        0.68
NLines+LR [5]     0.58   0.27   0.5         0.63

4.2. RQ 2

RQ 2 aims to measure the success of our model in predicting highly volatile requirements. We presented our technique to identify highly volatile requirements in Section 3.7. We first need to determine the change request coverage used to categorize highly volatile requirements, and then calculate recall, accuracy, and false alarm rates. Requirement coverage rates for various change request coverages by the most volatile requirements are given in Table 8. As change request coverage grows, more requirements are labeled as highly volatile. We chose 80% change request coverage since approximately 40% of the reviewers are considered well-experienced in AVPRJ. Therefore, by applying this model we could assign the review task of the 38.6% of total requirements that are possibly highly volatile to experienced developers in the early phase of development. We did not present other coverage results in this study due to page limitations.

Table 8: Change request and requirement coverage relation for AVPRJ
CR Coverage Percent  REQ Coverage
60%                  20.5%
70%                  29.6%
80%                  38.6%
90%                  47.7%

In Table 9 the best recall results are achieved with Q&P&N+KNN (0.632), Q&N+RF (0.616), and Q&P+KNN (0.604). Again, all best-performing models have requirement quality metrics in common, whereas the best combination consists of all metrics. The best accuracy results are obtained with Q&P&N+KNN (0.716).

Table 9: Performance results of RQ 2 model for 80% CR coverage
Metrics+ML method  Recall  Accuracy  False Alarm Rate
Q&P&N+KNN          0.632   0.716     0.232
Q&P&N+LR           0.531   0.638     0.295
Q&P&N+RF           0.624   0.71      0.237
Q&P&N+SVR          0.591   0.684     0.258
Q&P+KNN            0.604   0.694     0.249
Q&P+LR             0.508   0.62      0.31
Q&P+RF             0.602   0.692     0.251
Q&P+SVR            0.558   0.658     0.278
P&N+KNN            0.574   0.671     0.268
P&N+LR             0.418   0.55      0.366
P&N+RF             0.557   0.658     0.279
P&N+SVR            0.496   0.61      0.318
Q&N+KNN            0.614   0.702     0.243
Q&N+LR             0.53    0.637     0.296
Q&N+RF             0.616   0.703     0.242
Q&N+SVR            0.569   0.667     0.271
Q+KNN              0.586   0.68      0.261
Q+LR               0.508   0.62      0.31
Q+RF               0.582   0.677     0.263
Q+SVR              0.529   0.636     0.296
P+KNN              0.503   0.616     0.313
P+LR               0.423   0.554     0.363
P+RF               0.496   0.611     0.317
P+SVR              0.462   0.584     0.339
N+KNN              0.557   0.657     0.279

4.3. Threats to Validity

Internal validity: In this study we show that requirements volatility in a software project can be predicted to some extent utilizing requirement quality metrics, project-specific factors, and network metrics altogether. However, this does not imply a causal relationship between the input and output metrics, since we did not conduct a controlled experiment.

External validity: We have conducted the case study on one project, so the results have local validity. However, the dataset is quite large, with more than 20,000 requirements from three distinct releases developed by many software developers. Nonetheless, applying the predictive models on different projects in the future would be better in terms of generalization of results.

Construct validity: Developers did not use their native language in software requirements. Thus there could be some typos which may affect the textual requirement quality metrics. There could also be some expressions used by developers in software requirements, e.g. subjective expressions, that should have been taken into consideration while creating the requirement quality metrics but that we missed. Due to the size of the dataset we could not manually check these kinds of typos and grammatical errors, but we know that reviewers are responsible for correcting those. We create network graphs based on traceability links between software and system requirements as indicated in the SRS. We could have used linguistic data to connect software requirements, as in a previous study [12], and it may reflect the relationship between requirements in a better way. We plan to do this as future work.

Conclusion validity: For RQ 2 only the results of 80% change request coverage are presented due to page limitations. Regarding the results of other CR coverages, we observe higher recall and accuracy, whereas the false alarm rate grows undesirably as the coverage grows.
Therefore N+LR 0.404 0.539 0.376 RQ 2 results would differ in that way if we had chosen N+RF 0.565 0.664 0.274 other CR coverage rate. N+SVR 0.498 0.612 0.316 5. Conclusion and Future Work Q&N+RF(0.703) and Q&P+KNN(0.694). The lowest false In this paper, we have carried out an empirical study alarm rate results are achieved by Q&P&N+KNN(0.232), to predict number of changes per software requirement Q&N+RF(0.242) and Q&P+KNN(0.249). K-nearest by using requirement quality measures, project specific neighbor and random forest regression methods are factors and requirement interdependencies. 22,771 soft- successful in predicting highly volatile requirements for ware requirements from a safety-critical software project 80% change request coverage. in ASELSAN are utilized to build 28 prediction models The most important measure for RQ 2 is recall since and assess the best performing metric suite and algo- the purpose of this question is to measure success on rithm. We conclude that we can predict volatility of predicting highly volatile requirements. We correctly requirements with an average MMRE of 36% by observ- identify 63.2% of highly volatile requirements which are ing metrics of similar requirements through KNN. We exposed to 80% of the total requirement changes. also observe that measuring requirements from different aspects like quality, project and network dependencies gives a much better performance. We plan to integrate 58 such a predictor model into requirement management [10] X. Wang, C. Wu, L. Ma, Software project schedule tools like DOORS to be used prior to the SRS review variance prediction using bayesian network, in: activity so that highly-volatile requirements could be au- 2010 IEEE International Conference on Advanced tomatically and accurately identified. This way, software Management Science, volume 2 of ICAMS 2010, development leads could take precautions beforehand to IEEE, Chengdu, China, 2010, pp. 26–30. 
reduce requirements volatility related risks. Since there [11] T. Nakatani, T. Tsumaki, Predicting requirements is not enough empirical studies conducted in related area, changes by focusing on the social relations, in: more empirical research should be carried out to validate Proceedings of the Tenth Asia-Pacific Conference the best performing models. on Conceptual Modelling, volume 154 of APCCM 2014, Auckland, New Zealand, 2014, pp. 65–70. [12] P. H. Hein, E. Kames, C. Chen, B. Morkos, Employ- References ing machine learning techniques to assess require- ment change volatility, Research in Engineering [1] N. Nurmuliani, D. Zowghi, S. Powell, Analysis of re- Design 32 (2021) 245–269. quirements volatility during software development [13] W. Pedrycz, J. Iljazi, A. Sillitti, G. Succi, Prediction life cycle, in: 2004 Australian Software Engineering of the successful completion of requirements in Conference, ASWEC ’04, IEEE, 2004, pp. 28–37. software development—an initial study, in: Agent [2] G. Swathi, A. Jagan, C. Prasad, Writing software and Multi-Agent Systems: Technology and Appli- requirements specification quality requirements: cations, Springer, 2016, pp. 261–269. An approach to manage requirements volatility, Int. [14] T. J. Ostrand, E. J. Weyuker, How to measure suc- J. Comp. Tech. Appl. 2 (2011) 631–638. cess of fault prediction models, in: Fourth interna- [3] R. Thakurta, A mixed mode analysis of the im- tional workshop on Software quality assurance: in pact of requirement volatility on software project conjunction with the 6th ESEC/FSE joint meeting, success, Journal of International Technology and SOQUA’07, Dubrovnik, Croatia, 2007, pp. 25–30. Information Management 20 (2011). [15] T. Clancy, The standish group report, Chaos report [4] A. M. Alsalemi, E.-T. Yeoh, A systematic literature (1995). review of requirements volatility prediction, in: [16] Y. Jiang, B. Cukic, T. 
Menzies, Fault prediction using 2017 International Conference on Current Trends early lifecycle data, in: The 18th IEEE International in Computer, Electrical, Electronics and Communi- Symposium on Software Reliability, ISSRE’07, IEEE, cation, ICCTCEEC-2017, IEEE, 2017, pp. 55–64. 2007, pp. 237–246. [5] A. Loconsole, J. Börstler, Construction and Valida- [17] G. Génova, J. M. Fuentes, J. Llorens, O. Hurtado, tion of Prediction Models for Number of Changes V. Moreno, A framework to measure and improve to Requirements, Umeå University Technical Re- the quality of textual requirements, Requirements port UMINF-07.03, Umeå University Department of engineering 18 (2013) 25–41. Computing Science, UMEÅ, SWEDEN, 2007. [18] A. Faisandier, Systems architecture and design, Sin- [6] D. F. X. Christopher, E. Chandra, Prediction of ergy’Com Belberaud, France, 2013. software requirements stability based on complex- [19] T. W. Valente, K. Coronges, C. Lakon, E. Costen- ity point measurement using multi-criteria fuzzy bader, How correlated are network centrality mea- approach, International Journal of Software Engi- sures?, Connections 28 (2008) 16–26. neering & Applications 3 (2012) 101–115. [20] S. P. Borgatti, M. G. Everett, L. C. Freeman, Ucinet [7] L. Shi, Q. Wang, M. Li, Learning from evolution his- for windows: Software for social network analysis, tory to predict future requirement changes, in: 2013 Harvard, MA: analytic technologies 6 (2002). 21st IEEE International Requirements Engineering [21] F. Eibe, M. A. Hall, I. H. Witten, The weka work- Conference, RE-2013, IEEE, 2013, pp. 135–144. bench. online appendix for data mining: practical [8] A. Goknil, R. van Domburg, I. Kurtev, K. van den machine learning tools and techniques, in: Morgan Berg, F. Wijnhoven, Experimental evaluation of a Kaufmann, 2016. tool for change impact prediction in requirements [22] A. Ben-Hur, J. 
Weston, A user’s guide to support models: Design, results, and lessons learned, in: vector machines, in: Data mining techniques for 2014 IEEE 4th International Model-Driven Require- the life sciences, Springer, 2010, pp. 223–239. ments Engineering Workshop (MoDRE), MoDRE [23] D. Zhang, J. J. Tsai, Machine learning applications in 2014, IEEE, Karlskrona, Sweden, 2014, pp. 57–66. software engineering, volume 16, World Scientific, [9] B. Morkos, J. Mathieson, J. D. Summers, Compar- 2005. ative analysis of requirements change prediction models: manual, linguistic, and neural network, Research in Engineering Design 25 (2014) 139–156. 59
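As an illustration, the evaluation measures used for RQ 1 (MMRE, MdMRE and Pred(l)) can be computed from the standard definitions of magnitude of relative error; the sketch below is not code from the study and assumes the actual change count of each requirement in the validation set is positive.

```python
import statistics

def mre(actual, predicted):
    """Magnitude of relative error for one requirement.

    Assumes actual > 0 (the observed number of change requests)."""
    return abs(actual - predicted) / actual

def evaluate(actuals, predictions):
    """Return (MMRE, MdMRE, Pred(0.25), Pred(0.5)) over a validation set.

    Pred(l) is the fraction of requirements whose relative error is at most l."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    mmre = statistics.mean(errors)
    mdmre = statistics.median(errors)
    def pred(l):
        return sum(e <= l for e in errors) / len(errors)
    return mmre, mdmre, pred(0.25), pred(0.5)
```

Under these definitions, an MdMRE of 0 (as reported for several models in Table 6) means that at least half of the requirements are predicted with zero relative error, which matches the observation that more than half of the change-request counts are predicted exactly.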
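The RQ 2 labeling and its rates can be sketched similarly. The exact procedure of Section 3.7 is not reproduced here; the helper below assumes that the highly volatile requirements are the most-changed requirements that jointly account for a given share (e.g. 80%) of all change requests, and that recall, accuracy and false alarm rate follow their usual binary-classification definitions.

```python
def highly_volatile(change_counts, coverage=0.8):
    """Label the most-changed requirements that jointly cover `coverage`
    of all change requests (a sketch of the Section 3.7 idea)."""
    order = sorted(range(len(change_counts)),
                   key=lambda i: change_counts[i], reverse=True)
    total = sum(change_counts)
    labels = [False] * len(change_counts)
    covered = 0
    for i in order:
        if covered >= coverage * total:
            break
        covered += change_counts[i]
        labels[i] = True
    return labels

def classification_rates(actual_labels, predicted_labels):
    """Recall, accuracy and false alarm rate for binary volatility labels.

    Assumes both classes occur in actual_labels (non-zero denominators)."""
    pairs = list(zip(actual_labels, predicted_labels))
    tp = sum(a and p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fp = sum(not a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(pairs)
    false_alarm = fp / (fp + tn)
    return recall, accuracy, false_alarm
```

Applying the same thresholding to predicted change counts and comparing the two label vectors yields the rates reported in Table 9.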
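Finally, the best performing learner, k-nearest neighbour regression, predicts the change count of a requirement as the mean change count of its k closest requirements in metric space, which is why the conclusion describes it as "observing metrics of similar requirements". The study built its models in WEKA [21]; the function below is an independent, simplified sketch using Euclidean distance over (already scaled) metric vectors.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """k-nearest-neighbour regression: the predicted change count of a new
    requirement is the mean change count of its k Euclidean-closest
    requirements in the training set."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y))
    neighbours = [y for _, y in dists[:k]]
    return sum(neighbours) / k
```

In practice the metric vectors would combine the quality (Q), project (P) and network (N) metrics of each requirement, normalized to comparable scales before computing distances.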