=Paper=
{{Paper
|id=Vol-3062/paper08
|storemode=property
|title=Predicting Requirements Volatility: An Industry Case Study
|pdfUrl=https://ceur-ws.org/Vol-3062/Paper08_QuASoQ.pdf
|volume=Vol-3062
|authors=Anil Holat,Ayse Tosun
|dblpUrl=https://dblp.org/rec/conf/apsec/HolatT21
}}
==Predicting Requirements Volatility: An Industry Case Study==
Anıl Holat 1,2, Ayse Tosun 2

1 Aselsan Inc., Ankara, Turkey
2 Faculty of Computer and Informatics Engineering, Istanbul Technical University, Turkey
Abstract

Software requirements are exposed to many changes during the software development life cycle. These changes, namely additions, modifications or deletions, are defined as requirements volatility. Prior requirements volatility prediction studies utilize different requirement volatility measures. In this study we predict the number of changes per software requirement as requirement volatility for a large-scale safety-critical avionics project in ASELSAN. We employ a comprehensive metric set to explain requirements volatility: requirement quality measures, project specific factors and requirement interdependencies. Predictive models are created by combining input metric sets with machine learners. We evaluate the success of the models in predicting requirement changes, the best performing input metric combinations, the best performing machine learners, and the success of the models in predicting highly volatile requirements. The best prediction results are obtained with the model employing quality metrics, project specific metrics and network metrics altogether with the k-nearest neighbour machine learner (MMRE=0.366). The best model also correctly identifies 63.2% of the highly volatile requirements, which are exposed to 80% of the total requirement changes. Our results are encouraging in terms of creating requirement change prediction tools to prevent requirement volatility risks prior to the requirement review process.

Keywords
Predicting Requirements Volatility, Quality Metrics, Network Metrics, Requirements Quality, Requirements Change
QuASoQ'21: 9th International Workshop on Quantitative Approaches to Software Quality, December 06, 2021, Taipei, Taiwan
aholat@aselsan.com.tr (A. Holat); tosunay@itu.edu.tr (A. Tosun)
ORCID: 0000-0002-8727-6768 (A. Holat); 0000-0003-1859-7872 (A. Tosun)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

Although software engineering has experienced significant advancements in the last decades, the majority of large-scale software projects still try to cope with requirement changes during their software development life cycle due to the dynamic nature of software development activities [1]. Changes to requirements, namely additions, deletions or modifications, are defined as requirements volatility [2]. Continual requirement changes during software development have a tremendous impact on the cost, the schedule and the quality of the final product. Unfortunately, a significant number of software projects cannot be completed successfully, or are completed only partially, because of requirements' high volatility [2].

According to a survey conducted by Thakurta [3], project managers use various requirement volatility measures: number of changes to the identified use cases, number of changing requirements identified within the issued change requests, realized requirements out of total requirements, and the amount of budget the project had to spend on the changing requirements. Alsalemi et al. [4] also report a literature review on requirements volatility prediction. Accordingly, ten studies had employed machine learning methods to predict requirements volatility up to 2017. These studies utilize different requirement volatility measures such as the number of requirement changes [5], requirement stability index of the project [6], requirements that will be changed in the next iteration [7], requirement change impact [8], the impact of requirements changes on project distribution and cost factor [9], and software schedule [10]. Related studies also propose requirements complexity metrics [6], requirement dependency metrics [8], requirement size metrics [5] and requirements evolution metrics [7] to predict their own definition of the requirement volatility measure.

In our study we aim to predict the number of changes per software requirement by using requirement quality measures, project specific factors and requirement interdependencies. We define requirement volatility as the number of change requests reported for a software requirement. A change request could be either for adding a new requirement or for modifying an existing requirement. We chose a safety-critical avionics software project in ASELSAN with more than 20,000 requirements for our study. Loconsole et al. [5] conducted a similar study to predict the number of requirement changes using size measures on projects with less than 50 requirements. Our study complements the prior work by mining a larger dataset with thousands of requirements and a more comprehensive metric set considering quality and interdependency aspects of requirements as well as project specific factors. It should be noted that the change requests that we study in this work occurred in any phase of software development after the Software Requirements Specification (SRS) document had been reviewed and confirmed. Thus we investigate post-SRS requirements volatility for the avionics project under study.

The rest of the paper is organized as follows.
Section 2 presents related empirical studies carried out for requirement volatility prediction. Section 3 explains the study design in detail. Results and threats to validity of our work are discussed in Section 4. Section 5 presents the conclusion and points out possible directions for future work.

2. Related Work

In this section we present previous studies that aim to predict volatility for requirements, and we focus on the input metrics they employed. We report details of five relevant studies [11, 6, 7, 8, 5] from the literature review conducted by Alsalemi et al. [4]. We also discuss the approaches of other recently published related studies [12, 13].

Nakatani et al. [11] propose a method to predict requirement volatility using social relations between executives, competitors, cooperative organizations, and the natural environment. Those measures can be applied to customer requirements easily, but it would take some effort to associate them with software requirements. Christopher et al. [6] present requirements complexity metrics to define volatility. Functional requirement complexity, non-functional requirement complexity, input-output complexity, and interface and file complexity measures are used to calculate the whole project's stability, whereas we seek to predict requirement volatility for each software requirement. Shi et al. [7] present a model to predict future requirement changes by using previous requirement change metrics. They generated six history metrics for requirements that contain information about the volatility of a topic, the frequency of changes and the time duration between changes. History metrics can be used to predict requirements that will be changed in the next iteration, but have little use in predicting requirements volatility for new projects. Pedrycz et al. [13] also employ the following change logs as input metrics: created version of a requirement, last developer, number of modifications, and requirement lifetime duration. Change logs are created in later phases of software development, thus they are again not very useful for predicting requirements volatility for projects in earlier development phases. Goknil et al. [8] and Hein et al. [12] use requirements interrelations for volatility prediction. Goknil et al. [8] utilize formal semantics of requirement relations as input features, whereas Hein et al. [12] create network metrics by using syntactical natural language data. We have combined both measures and created network metrics by using links between system and software requirements instead of lingual relations between requirement texts. Regarding network metrics we employ degree centrality, eigenvector centrality, closeness centrality and betweenness centrality, whereas Hein et al. [12] used 40 network metrics.

According to the literature review, only one study, conducted by Loconsole et al. [5], presents an empirical study to predict the number of changes per requirement, so this study is the most relevant to our work. The following size measures are used there to predict the number of requirement changes: number of actors interacting with use cases, number of words in each file, number of revisions of files, and number of lines per file. In our study, size measures are also used to represent requirement quality, but we also enriched our metric set with project specific metrics and network metrics. It should be noted that we do not take deletion requests and deleted requirements into consideration while defining requirements volatility, because in our industrial context we rarely encounter such requests for safety-critical software. Finally, we applied our model on more than 20,000 requirements, which helps us assess the generalizability of our findings on predicting the volatility of every software requirement using different metric sets.

3. Study Design

In this section we explain our empirical study design in detail. In Section 3.1 the research questions are explained. The analyzed project for which a model would be proposed is described in Section 3.2. In Section 3.3 the selected input metrics for requirement volatility prediction are described. Section 3.4 describes the output measure of the prediction model. The used tools are explained in Section 3.5. The machine learning techniques employed in this study are presented in Section 3.6. Finally, in Section 3.7, performance evaluation measures are defined for our model.

3.1. Research Questions

Our main goal is to predict requirements volatility at earlier stages of the development life cycle, and accordingly two research questions are defined.

Research Question (RQ) 1: To what extent do requirement quality metrics, project specific metrics and network metrics predict the volatility of a software requirement?

Previous studies used different metric sets to predict requirements volatility. In this study we aim to use a comprehensive set of input metrics and observe their individual effects on requirement volatility prediction. While predicting the volatility, we use the number of change (addition and modification) requests on a software requirement. Inspired by the metric sets used in the literature, we form a group of requirement quality metrics and network metrics. Additionally, project specific metrics for this particular safety-critical avionics project are defined and utilized throughout this study. During model assessment, the performance of each input metric set and of their combinations is reported. Detailed sub-questions related to RQ 1 are listed below:

RQ 1.1: Which metric group is a better indicator of the number of requirement changes?
RQ 1.2: Which machine learning algorithm is better at predicting the number of requirement changes?

RQ 2: How successful are the proposed models in predicting highly volatile software requirements?

Software requirements have a history of varying numbers of changes during the software development life cycle. Some requirements do not change at all; however, some requirements are exposed to multiple changes and pose risks to a software project. Practically, our model should predict highly volatile requirements, so that those requirements can be reviewed by experienced reviewers in detail. For this research question (RQ 2) we measure the success of our models on highly volatile requirements based on a technique in [14].

3.2. Analyzed Project

We chose a safety-critical avionics software project to perform our analysis. We will refer to this project as AVPRJ in the rest of this paper. AVPRJ has many releases, from which three releases are selected. Software requirements for those releases are related since they all belong to the same project; however, they are partially distinct since each release consists of the implementation of different software components developed by many software developers. AVPRJ has a total of 22,771 software requirements. Some release-based descriptive statistics for AVPRJ are given in Table 1. CR is used as an abbreviation for change request; REQ is used as an abbreviation for a single requirement. Most of the employed requirements belong to the second release, and this release has the highest mean change requests per software requirement. The third release has relatively fewer software requirements, and fewer additions or modifications are performed on requirements belonging to this release. More than half of the requirements are modified at least once for this project; 9,848 out of 22,771 requirements are not changed, which complies with the Standish Group's survey results over more than 8,000 software projects [15].

Table 1
Release based AVPRJ statistics

Release     Number of REQs   Mean CR per REQ   Median CR per REQ
Release 1    8,640           0.7457            1
Release 2   11,401           1.1165            1
Release 3    2,730           0.5267            0
Total       22,771           0.9051            1

3.3. Input Metrics

We have employed several metrics to predict the volatility of each software requirement in AVPRJ. The metrics represent three dimensions: requirement quality metrics, project specific metrics and requirement network metrics. Requirement quality metrics extracted by the NASA Automated Requirements Measurement (ARM) tool have previously been used to predict faulty modules [16]. Initially, we believed that the way requirements are documented would affect requirements volatility besides the fault proneness of modules. Some requirement quality size metrics are already utilized in predicting requirements volatility [5]; therefore we decided to include a requirement quality metric set in our study. Network metrics are employed to predict requirement change volatility in a recent study [12]. This sparked the idea of utilizing network metrics for requirements volatility prediction. Initial observation of various change request notes confirmed that software requirements that are changed within a particular change request have a tendency to be linked to similar system requirements. Accordingly, we employed network metrics created from traceability information. In order to enrich the input metric set with a new metric group, we focused on the characteristics of the safety-critical avionics project under study. Features are evaluated separately, and the ones that would provide information on requirements volatility are selected as project specific metrics. The rationale for the project specific metric selection is given in detail in Section 3.3.2. Detailed explanations for each group are given in the following subsections.

3.3.1. Quality Metrics

While selecting requirement quality metrics to predict volatility, we were inspired by two studies. The first study proposes requirement metrics in the context of the NASA Metrics Data Program (MDP) to predict software faults [16]. These metrics are calculated by automatically going through requirements documents to highlight vague, ambiguous, long and complex requirements. The second study also reports requirement quality metrics [17] to find out which requirement quality analysis tool is more successful regarding the measurement of those metrics.

Combining both studies' lists and customizing them to the requirements document templates in our industrial context, we present 20 quality metrics in Table 2. All of these metrics take numeric values, e.g. the number of flow sentences in a requirement, or the number of directives in a requirement.

During the preprocessing stage, we had to remove three metrics from our analysis since they gave little to no information for AVPRJ: Conditional, Rationale and Subjective.
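To illustrate how such counting metrics can be extracted from requirement text, here is a minimal Python sketch. It is not the study's actual extraction script; the keyword lists are hypothetical stand-ins for the project's lists (e.g. AVPRJ's permitted acronym list), and only a few of the metrics are shown.

```python
import re

# Illustrative keyword lists; the real lists are project specific
# and are not reproduced here.
AMBIGUOUS = {"adequate", "sufficiently", "optimal", "slow"}
IMPERATIVES = {"shall", "must", "will"}
INCOMPLETE = ("tbd", "and so on", "etc")

def quality_metrics(requirement: str) -> dict:
    """Count a few ARM-style quality indicators for one requirement."""
    lowered = requirement.lower()
    words = re.findall(r"[a-z']+", lowered)
    return {
        "ambiguity": sum(w in AMBIGUOUS for w in words),
        "imperatives": sum(w in IMPERATIVES for w in words),
        # crude substring matching; a real extractor would match whole phrases
        "incompleteness": sum(lowered.count(phrase) for phrase in INCOMPLETE),
        "text_length": len(requirement),  # total character count
    }

m = quality_metrics("The system shall respond sufficiently fast; details are TBD.")
```

The example requirement yields one imperative ("shall"), one ambiguous term ("sufficiently") and one incompleteness marker ("TBD"), the kind of per-requirement counts the quality metric set is built from.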
Table 2
Requirement Quality Metrics

Acronyms: The number of abbreviations in a software requirement. For AVPRJ the permitted acronym list is used to extract this metric.
Actions: The number of actions to be performed if the conditions of a software requirement are satisfied.
Ambiguity: The number of ambiguous expressions in a software requirement, e.g. adequate, sufficiently, optimal, slow.
Chars between punctuation: Average character count between punctuation marks. Long sentences without punctuation marks decrease readability.
Conditions: The number of conditions that need to be satisfied to perform a software requirement.
Conditional: The number of phrases that give the developers freedom whether or not to implement a software requirement, e.g. maybe, can't, would.
Connectors: The number of connectors that are employed to link multiple sentences or groups of words, e.g. and, or, as well as.
Directives: The number of directive expressions that refer to a table, a note, a figure or an example.
Flow sentences: The number of expressions that semantically bond a sentence to another one, e.g. although, but, else.
Imperatives: The number of phrases that command to perform particular actions in a software requirement, e.g. shall, must, will.
Implicitness: The number of pronouns that make the software requirement difficult to understand, e.g. this, that, it. A software requirement should be defined explicitly.
Incompleteness: The number of expressions that indicate a software requirement is yet incomplete, e.g. and so on, tbd, etc.
In links: The number of incoming links to a software requirement from other documents. For AVPRJ test cases are linked to software requirements, so the number of in links refers to the number of linked test cases.
Negative sentences: The number of phrases that give negative meaning, e.g. doesn't, none, can't.
Nested levels: For AVPRJ the nested level metric value is the greatest level in the hierarchical nesting structure of a software requirement.
Out links: The number of out links of a software requirement. In AVPRJ software requirements are linked to system requirements; therefore the number of out links is the total number of system requirements linked by a software requirement.
Rationale: The number of expressions that give justification in a software requirement, e.g. thus, in order to.
Speculative sentences: The number of speculative phrases which lead to questioning the necessity of a software requirement, e.g. normally, eventually, almost.
Subjectivity: The number of subjective expressions presenting personal opinion rather than objectivity, e.g. I think, in my opinion.
Text length: The total number of characters in a software requirement.

Only one requirement contains conditional expressions and none of the requirements have subjective expressions; rationale expressions are similarly rare. Thus we ended up with 17 metrics representing the quality aspect of requirements for predicting their volatility.

3.3.2. Project Specific Metrics

Project specific metrics may differ regarding the scope of a software project, but the metrics we chose to use are not so specific to the development environment, programming language, or domain in which the software is developed. We believe project specific metrics provide information about development characteristics in an organization, and hence about the factors affecting the change proneness of requirements. Table 3 lists the project specific metrics employed in this study. If the project follows an inspection activity on requirements, it is more likely that the team would find the ambiguities and inconsistencies in the requirements. Since derived requirements are not part of customer needs, they cannot be validated through user acceptance tests. If a requirement has a safety aspect, more comprehensive software tests will be performed, thus exposure of a potential change is highly probable. The number of related components is a measure of the impact of a software requirement on the overall product, thus more feedback will be given by the development team for requirements affecting many components. Each software release has different dynamics that affect requirements maturity, e.g. release schedule, experience of developers, complexity of the system. For example, if the schedule is too tight to complete the SRS document, requirements could be immature and more requirement changes could be performed in the future for this release.

Table 3
Project Specific Metrics

Inspection: Indicates if a software requirement is evaluated through an inspection activity. This procedure might be preferred to complement functional tests.
Derived: Software requirements that are not explicitly stated in system requirements but derived based on design decisions [18].
Safety: Shows if a software requirement is safety critical.
No. of Related Components: Number of isolated software components that a requirement is related to.
Release Number: Release number that the software requirement belongs to.

3.3.3. Network Metrics

Hein et al. [12] earlier utilized 40 network metrics to predict requirements change volatility. On the other hand, Valente et al. [19] present correlations between degree, betweenness, closeness and eigenvector centrality measures, and indicate that those measures are distinct but notionally related. Thus in this work, instead of employing 40 metrics, we chose the metrics suggested in [19] to predict requirements volatility for AVPRJ. These centrality metrics give each software requirement a value regarding its position in the network. Brief explanations of the employed network metrics are given in Table 4.

Table 4
Network Metrics

Degree centrality: Gives a score to requirements based on the number of links.
Betweenness centrality: Measures how many times a requirement is on the shortest path in the graph.
Closeness centrality: Indicates how close a requirement is to other requirements considering the whole graph.
Eigenvector centrality: Measures how a node influences other nodes in the network through connections.

Hein et al. [12] used language processing to create a network for requirements. In this study we instead used traceability information to create a network graph for software requirements. Traceability links from software requirements to system requirements are used for this purpose. We assigned weights between software requirements based on system requirement traceability links. Software requirements which are derived from similar system requirements tend to be closer in our model. The weight assignment formula is given below (Equation 1). W is the weight between software requirements, NCLINK is the number of common system requirement links between two software requirements, and NTOTLINK is the total number of system requirements linked from those two software requirements. After weight assignment, a symmetrical n × n matrix is created, where n denotes the number of software requirements. Then the network metrics are computed over this matrix.

W_{i,j} = NCLINK_{i,j} / NTOTLINK_{i,j}    (1)

3.4. Model Output

Our proposed model's output is the number of change requests per software requirement. After the SRS document is reviewed and completed for AVPRJ, change requests linked to each software requirement are reported in the issue management system, and the document is modified accordingly by the analysis team. Thus we define requirements volatility in our industrial context as the number of change requests that have been applied to add a new requirement or to modify an existing requirement in the associated SRS document. Please note that our model outputs decimal values, but the number of change requests per requirement in practice can only take integer values. Therefore we round fractional parts to the nearest integer.

3.5. Tools

We wrote scripts to extract requirement quality and project specific metrics from SRS documents. Later, the UCINET tool [20] is used to create network metrics from the matrix that we extracted based on software and system requirements. Regression models with different machine learners are trained using the WEKA tool [21]. Prediction results are further post-processed in MATLAB to obtain the performance measures regarding all RQs.

3.6. Machine Learning Techniques

We train models using linear regression, random forest regression, support vector regression and k-nearest neighbor regression methods. Linear regression was utilized in [5], whereas classifier versions of the other three techniques were used in [12].

For k-nearest neighbor regression, the inversely proportional weighting option is selected. Higher weights are assigned to closer training samples, which resulted in better prediction results for our model.
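The traceability-based weighting of Equation 1 and the centrality metrics of Table 4 can be sketched in a few lines. This is only an illustration, not the study's pipeline: it uses Python's networkx in place of UCINET, the requirement identifiers are hypothetical, and NTOTLINK is read here as the number of distinct system requirements linked by the pair.

```python
import networkx as nx

# Hypothetical traceability data: each software requirement (SR-*) with
# the set of system requirements (SYS-*) it links to (its out links).
out_links = {
    "SR-1": {"SYS-1", "SYS-2"},
    "SR-2": {"SYS-2", "SYS-3"},
    "SR-3": {"SYS-3", "SYS-4"},
}

# Equation 1: W_ij = NCLINK_ij / NTOTLINK_ij, where NCLINK is the number
# of common system requirement links between two software requirements.
G = nx.Graph()
G.add_nodes_from(out_links)
reqs = sorted(out_links)
for i, a in enumerate(reqs):
    for b in reqs[i + 1:]:
        nclink = len(out_links[a] & out_links[b])
        ntotlink = len(out_links[a] | out_links[b])
        if nclink:  # only connect requirements sharing a system requirement
            G.add_edge(a, b, weight=nclink / ntotlink)

# Table 4 centralities, computed over the weighted requirement network.
centrality = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}
```

In this toy network SR-2 shares a system requirement with both other requirements, so it sits in the middle of the graph and receives the highest degree and betweenness scores; per-requirement values like these serve as the network metric inputs.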
For support vector regression the commonly used radial basis function kernel is selected. Increasing the gamma parameter too much may result in over-fitting [22], and we also experienced great computational cost with little to no gain in prediction success for large gamma. Thus the C and gamma parameters are assigned as 1.

In this study the 10-fold cross validation technique is used to split training and test sets. First, the dataset containing all software requirements is shuffled randomly and split into 10 groups of approximately equal size. One group is labeled as the test set and the other groups are used to train the machine learning models. This procedure is repeated 10 times until each unique group has been used as the test set once.

3.7. Performance Evaluation

For RQ 1, the following measures are used for performance evaluation: Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), Pred(0.5) and Pred(0.25) [23]. The relative error is calculated according to Equation 2. Err_relative is the relative error, Val_act is the actual value, whereas Val_pred is the predicted value.

Err_relative = |Val_act - Val_pred| / |Val_act|    (2)

There are requirements with zero change requests, so a division by zero problem arises while calculating the relative error. We made an assumption for unchanged requirements as presented in Equation 3.

If Val_act = 0:  Err_relative = |Val_act - Val_pred| / 1    (3)

Pred(k) is a measure of the variance of the error distribution. This measure is based on the relative error, and it shows the percentage of predictions whose errors are less than or equal to k.

For RQ 2, we aim to predict highly volatile requirements, and thus we first employ a method to identify those among the set of requirements:

• Step 1: Rank requirements by their actual number of change requests in descending order and record their rank as R_actual.
• Step 2: Obtain regression prediction results for each software requirement.
• Step 3: Rank requirements by their predicted number of change requests in descending order and record their rank as R_predicted.
• Step 4: Evaluate the results according to the listing in Table 5. P denotes the percentage of requirements which are perceived as highly volatile, and N_req denotes the total number of requirements in the validation set.
• Step 5: Calculate recall, accuracy and false alarm rate.

Table 5
Requirements volatility rank results evaluation

Condition                                              Evaluation
R_actual ≤ N_req × P and R_predicted ≤ N_req × P       True Positive
R_actual > N_req × P and R_predicted > N_req × P       True Negative
R_actual ≤ N_req × P and R_predicted > N_req × P       False Negative
R_actual > N_req × P and R_predicted ≤ N_req × P       False Positive

Table 5 can be interpreted as follows: True Positive instances are requirements that are actually highly volatile and that the model also categorizes as highly volatile. In the case of True Negatives, a requirement is actually less volatile, and so is its prediction. False Negatives occur when highly volatile requirements are regarded as less volatile by the predictor. Finally, False Positives indicate less volatile requirements predicted as highly volatile.

To answer RQ 2, recall, accuracy and false alarm rate measures are computed. The recall result shows how successful the model is in predicting highly volatile requirements. In our view this measure is the most important one regarding RQ 2. The accuracy measure shows the prediction success for both highly volatile and less volatile requirements. The false alarm rate presents how much effort is put in vain by mis-evaluating less volatile requirements.

4. Results and Discussion

We present and discuss the performance of the models with respect to the two RQs in this section. We also compare the performance of the prediction models proposed in this study with the prior work [5].

4.1. RQ 1

After obtaining the processed data, machine learning regression methods are applied to answer the question of whether requirement quality metrics, network metrics and project specific metrics can be used to predict the number of changes on each software requirement. Model performance results are gathered for all input metric and machine learning method combinations separately.

Results for RQ 1 are given in Table 6. The following abbreviations are used: ML for machine learning, Q for requirement quality metrics, P for project specific metrics, N for network metrics, KNN for k-nearest neighbor regression, LR for linear regression, RF for random forest regression and SVR for support vector regression.

In terms of input metric combinations, the best MMRE results are achieved with Q&P&N (0.366), Q&N (0.381) and Q&P (0.402).
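The measures behind these numbers (Equations 2 and 3, Pred(k), and the rank-based labelling of Table 5 with its recall, accuracy and false alarm rates) can be sketched as follows. The sample inputs are made-up illustrations, not study data.

```python
import statistics

def relative_error(actual: float, predicted: float) -> float:
    """Equation 2; for actual == 0 the denominator is taken as 1 (Equation 3)."""
    return abs(actual - predicted) / (abs(actual) if actual != 0 else 1)

def evaluate(actuals, predicted, ks=(0.25, 0.5)):
    """MMRE, MdMRE and Pred(k): share of predictions with error <= k."""
    errors = [relative_error(a, p) for a, p in zip(actuals, predicted)]
    result = {"MMRE": statistics.mean(errors), "MdMRE": statistics.median(errors)}
    for k in ks:
        result[f"Pred({k})"] = sum(e <= k for e in errors) / len(errors)
    return result

def rank_eval(actuals, predicted, p):
    """Table 5: requirements ranked within the top fraction p (by descending
    change count) are treated as highly volatile; actual vs predicted ranks."""
    n = len(actuals)
    cutoff = n * p

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    ra, rp = ranks(actuals), ranks(predicted)
    tp = sum(ra[i] <= cutoff and rp[i] <= cutoff for i in range(n))
    fn = sum(ra[i] <= cutoff and rp[i] > cutoff for i in range(n))
    fp = sum(ra[i] > cutoff and rp[i] <= cutoff for i in range(n))
    tn = n - tp - fn - fp
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / n,
        "false_alarm": fp / (fp + tn) if fp + tn else 0.0,
    }

metrics = evaluate([0, 1, 2, 4], [0, 1, 1, 4])
rates = rank_eval([5, 3, 1, 0], [4, 1, 3, 0], p=0.5)
```

Note the Equation 3 convention: an unchanged requirement contributes |Val_act - Val_pred| rather than an undefined ratio, which is why exact predictions for zero-change requirements drive MdMRE down to 0.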
Table 6
Performance evaluation results for RQ 1

Metrics+ML method   MMRE    MdMRE   Pred(0.5)   Pred(0.25)
Q&P&N+KNN           0.366   0       0.681       0.57
Q&P&N+LR            0.53    0.5     0.524       0.411
Q&P&N+RF            0.392   0       0.663       0.545
Q&P&N+SVR           0.392   0       0.662       0.554
Q&P+KNN             0.402   0       0.641       0.529
Q&P+LR              0.513   0.5     0.541       0.428
Q&P+RF              0.45    0.333   0.6         0.486
Q&P+SVR             0.459   0.5     0.584       0.479
P&N+KNN             0.422   0       0.632       0.52
P&N+LR              0.55    0.667   0.484       0.372
P&N+RF              0.454   0.5     0.595       0.48
P&N+SVR             0.469   0.5     0.561       0.475
Q&N+KNN             0.381   0       0.665       0.553
Q&N+LR              0.534   0.5     0.52        0.407
Q&N+RF              0.394   0       0.662       0.545
Q&N+SVR             0.426   0       0.621       0.515
Q+KNN               0.443   0.333   0.598       0.488
Q+LR                0.512   0.5     0.542       0.43
Q+RF                0.455   0.5     0.594       0.483
Q+SVR               0.483   0.5     0.556       0.446
P+KNN               0.555   0.667   0.483       0.37
P+LR                0.548   0.667   0.485       0.373
P+RF                0.556   0.667   0.483       0.371
P+SVR               0.516   0.5     0.512       0.417
N+KNN               0.448   0.5     0.596       0.482
N+LR                0.549   0.667   0.484       0.372
N+RF                0.485   0.5     0.561       0.446
N+SVR               0.53    0.5     0.5         0.392

We may interpret that requirement quality metrics (Q) are successful at predicting the number of change requests per software requirement, and their combinations with the other metrics also give good results. With respect to the machine learning algorithm, the three best performing metric combinations give the highest prediction performance when the k-nearest neighbor algorithm is utilized.

MdMRE is zero for the following metric and machine learner combinations: Q&P&N+KNN, Q&P&N+RF, Q&P&N+SVR, Q&P+KNN, P&N+KNN, Q&N+KNN, Q&N+RF and Q&N+SVR. The number of change requests for more than half of the software requirements is predicted correctly with these models. Since many prediction models give the same best result, we do not rank the best performing models with regard to MdMRE.

The best performance with respect to Pred(0.25) and Pred(0.5) is obtained with Q&P&N+KNN. Q&N+KNN and Q&P+KNN report the second and third best results. Those results indicate that requirement quality metrics and the k-nearest neighbor algorithm are also successful with respect to the Pred measures.

To sum up, the best performance results are achieved by using requirement quality metrics, project specific metrics and network metrics altogether. Accordingly, the k-nearest neighbor algorithm gives the best performance results for all measures. In all best performing models, quality metrics are utilized either paired with project or network metrics or in the combination of all three. It seems the way requirements are documented has a high effect on the volatility rates.

We compared our findings against the study conducted by Loconsole et al. [5]. Table 7 reports the performance of the linear regression model with the best metric set in our study, our best performing model, and the best performing model of [5]. If we compare the findings only on LR, we observe that using the number of lines predicts volatility better in their commercial setting, while in our context using quality metrics only does not give the best result. Other algorithms like KNN in combination with all metrics significantly improve the prediction performance by reducing MMRE down to 0.36 and MdMRE down to 0, and increasing Pred(0.25) up to 57%.

Table 7
Comparison of our performance (RQ 1) against [5]

                MMRE   MdMRE   Pred(0.25)   Pred(0.5)
Q+LR            0.51   0.5     0.43         0.54
Best model      0.36   0       0.57         0.68
NLines+LR [5]   0.58   0.27    0.5          0.63

4.2. RQ 2

RQ 2 aims to measure the success of our model in predicting highly volatile requirements. We present our technique to identify highly volatile requirements in Section 3.7. We first need to determine the change request coverage to categorize highly-volatile requirements, and later calculate recall, accuracy and false alarm rates. Rates for various change request coverages by the most volatile requirements are given in Table 8. As the change request coverage grows, more requirements are labeled as highly volatile. We chose 80% change request coverage since approximately 40% of the reviewers are considered as well-experienced in AVPRJ. Therefore, by applying this model we could assign the review task of the 38.6% of total requirements which are possibly highly volatile to experienced developers in an early phase of development. We did not present other coverage results in this study due to the page limitation.

In Table 9 the best recall results are achieved with Q&P&N+KNN (0.632), Q&N+RF (0.616) and Q&P+KNN (0.604). Again, all best performing models have requirement quality metrics in common, whereas the best combination consists of all metrics. The best accuracy results are obtained from Q&P&N+KNN (0.716),
Q&N+RF(0.703) and Q&P+KNN(0.694). The lowest false alarm rate results are achieved by Q&P&N+KNN(0.232), Q&N+RF(0.242) and Q&P+KNN(0.249). K-nearest neighbor and random forest regression methods are successful in predicting highly volatile requirements for 80% change request coverage.

The most important measure for RQ 2 is recall, since the purpose of this question is to measure success in predicting highly volatile requirements. We correctly identify 63.2% of the highly volatile requirements, which are exposed to 80% of the total requirement changes.

Table 8
Change request and requirement coverage relation for AVPRJ

CR Coverage Percent REQ Coverage
60% 20.5%
70% 29.6%
80% 38.6%
90% 47.7%

Table 9
Performance results of RQ 2 model for 80% CR coverage

Metrics+ML method Recall Accuracy False Alarm Rate
Q&P&N+KNN 0.632 0.716 0.232
Q&P&N+LR 0.531 0.638 0.295
Q&P&N+RF 0.624 0.71 0.237
Q&P&N+SVR 0.591 0.684 0.258
Q&P+KNN 0.604 0.694 0.249
Q&P+LR 0.508 0.62 0.31
Q&P+RF 0.602 0.692 0.251
Q&P+SVR 0.558 0.658 0.278
P&N+KNN 0.574 0.671 0.268
P&N+LR 0.418 0.55 0.366
P&N+RF 0.557 0.658 0.279
P&N+SVR 0.496 0.61 0.318
Q&N+KNN 0.614 0.702 0.243
Q&N+LR 0.53 0.637 0.296
Q&N+RF 0.616 0.703 0.242
Q&N+SVR 0.569 0.667 0.271
Q+KNN 0.586 0.68 0.261
Q+LR 0.508 0.62 0.31
Q+RF 0.582 0.677 0.263
Q+SVR 0.529 0.636 0.296
P+KNN 0.503 0.616 0.313
P+LR 0.423 0.554 0.363
P+RF 0.496 0.611 0.317
P+SVR 0.462 0.584 0.339
N+KNN 0.557 0.657 0.279
N+LR 0.404 0.539 0.376
N+RF 0.565 0.664 0.274
N+SVR 0.498 0.612 0.316

4.3. Threats to Validity

Internal validity: In this study we show that requirements volatility in a software project can be predicted to some extent by utilizing requirement quality metrics, project specific factors and network metrics altogether. However, this does not imply a causal relationship between input and output metrics, since we did not conduct a controlled experiment.

External validity: We have conducted the case study on one project, so the results have local validity. However, the dataset is quite large, with more than 20,000 requirements from three distinct releases developed by many software developers. Nonetheless, applying the predictive models on different projects in the future would be better in terms of generalization of the results.

Construct validity: Developers did not use their native language in software requirements. Thus there could be some typos which may affect the textual requirement quality metrics. There could also be expressions used by developers in software requirements, e.g. subjective expressions, that should have been taken into consideration while creating the requirement quality metrics but that we missed. Due to the size of the dataset we could not manually check these kinds of typos and grammatical errors, but we know that reviewers are responsible for correcting them. We create network graphs based on traceability links between software and system requirements as indicated in the SRS. We could have used linguistic data to connect software requirements, as in a previous study [12], and it might reflect the relationships between requirements in a better way. We plan to do this as future work.

Conclusion validity: For RQ 2, only the results of 80% change request coverage are presented due to the page limitation. Regarding the results of other CR coverages, we observe higher recall and accuracy, whereas the false alarm rate grows undesirably as the coverage grows. Therefore, RQ 2 results would differ in that way if we had chosen another CR coverage rate.

5. Conclusion and Future Work

In this paper, we have carried out an empirical study to predict the number of changes per software requirement by using requirement quality measures, project specific factors and requirement interdependencies. 22,771 software requirements from a safety-critical software project in ASELSAN are utilized to build 28 prediction models and assess the best performing metric suite and algorithm. We conclude that we can predict the volatility of requirements with an average MMRE of 36% by observing metrics of similar requirements through KNN. We also observe that measuring requirements from different aspects like quality, project and network dependencies gives a much better performance. We plan to integrate
such a predictor model into requirement management tools like DOORS, to be used prior to the SRS review activity so that highly volatile requirements can be automatically and accurately identified. This way, software development leads could take precautions beforehand to reduce requirements volatility related risks. Since there are not enough empirical studies conducted in the related area, more empirical research should be carried out to validate the best performing models.

References

[1] N. Nurmuliani, D. Zowghi, S. Powell, Analysis of requirements volatility during software development life cycle, in: 2004 Australian Software Engineering Conference, ASWEC '04, IEEE, 2004, pp. 28–37.
[2] G. Swathi, A. Jagan, C. Prasad, Writing software requirements specification quality requirements: An approach to manage requirements volatility, Int. J. Comp. Tech. Appl. 2 (2011) 631–638.
[3] R. Thakurta, A mixed mode analysis of the impact of requirement volatility on software project success, Journal of International Technology and Information Management 20 (2011).
[4] A. M. Alsalemi, E.-T. Yeoh, A systematic literature review of requirements volatility prediction, in: 2017 International Conference on Current Trends in Computer, Electrical, Electronics and Communication, ICCTCEEC-2017, IEEE, 2017, pp. 55–64.
[5] A. Loconsole, J. Börstler, Construction and Validation of Prediction Models for Number of Changes to Requirements, Technical Report UMINF-07.03, Umeå University Department of Computing Science, Umeå, Sweden, 2007.
[6] D. F. X. Christopher, E. Chandra, Prediction of software requirements stability based on complexity point measurement using multi-criteria fuzzy approach, International Journal of Software Engineering & Applications 3 (2012) 101–115.
[7] L. Shi, Q. Wang, M. Li, Learning from evolution history to predict future requirement changes, in: 2013 21st IEEE International Requirements Engineering Conference, RE-2013, IEEE, 2013, pp. 135–144.
[8] A. Goknil, R. van Domburg, I. Kurtev, K. van den Berg, F. Wijnhoven, Experimental evaluation of a tool for change impact prediction in requirements models: Design, results, and lessons learned, in: 2014 IEEE 4th International Model-Driven Requirements Engineering Workshop (MoDRE), MoDRE 2014, IEEE, Karlskrona, Sweden, 2014, pp. 57–66.
[9] B. Morkos, J. Mathieson, J. D. Summers, Comparative analysis of requirements change prediction models: manual, linguistic, and neural network, Research in Engineering Design 25 (2014) 139–156.
[10] X. Wang, C. Wu, L. Ma, Software project schedule variance prediction using Bayesian network, in: 2010 IEEE International Conference on Advanced Management Science, volume 2 of ICAMS 2010, IEEE, Chengdu, China, 2010, pp. 26–30.
[11] T. Nakatani, T. Tsumaki, Predicting requirements changes by focusing on the social relations, in: Proceedings of the Tenth Asia-Pacific Conference on Conceptual Modelling, volume 154 of APCCM 2014, Auckland, New Zealand, 2014, pp. 65–70.
[12] P. H. Hein, E. Kames, C. Chen, B. Morkos, Employing machine learning techniques to assess requirement change volatility, Research in Engineering Design 32 (2021) 245–269.
[13] W. Pedrycz, J. Iljazi, A. Sillitti, G. Succi, Prediction of the successful completion of requirements in software development—an initial study, in: Agent and Multi-Agent Systems: Technology and Applications, Springer, 2016, pp. 261–269.
[14] T. J. Ostrand, E. J. Weyuker, How to measure success of fault prediction models, in: Fourth International Workshop on Software Quality Assurance: in conjunction with the 6th ESEC/FSE joint meeting, SOQUA '07, Dubrovnik, Croatia, 2007, pp. 25–30.
[15] T. Clancy, The Standish Group report, Chaos report (1995).
[16] Y. Jiang, B. Cukic, T. Menzies, Fault prediction using early lifecycle data, in: The 18th IEEE International Symposium on Software Reliability, ISSRE '07, IEEE, 2007, pp. 237–246.
[17] G. Génova, J. M. Fuentes, J. Llorens, O. Hurtado, V. Moreno, A framework to measure and improve the quality of textual requirements, Requirements Engineering 18 (2013) 25–41.
[18] A. Faisandier, Systems Architecture and Design, Sinergy'Com, Belberaud, France, 2013.
[19] T. W. Valente, K. Coronges, C. Lakon, E. Costenbader, How correlated are network centrality measures?, Connections 28 (2008) 16–26.
[20] S. P. Borgatti, M. G. Everett, L. C. Freeman, Ucinet for Windows: Software for social network analysis, Harvard, MA: Analytic Technologies 6 (2002).
[21] E. Frank, M. A. Hall, I. H. Witten, The WEKA Workbench. Online appendix for Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016.
[22] A. Ben-Hur, J. Weston, A user's guide to support vector machines, in: Data Mining Techniques for the Life Sciences, Springer, 2010, pp. 223–239.
[23] D. Zhang, J. J. Tsai, Machine Learning Applications in Software Engineering, volume 16, World Scientific, 2005.
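Appendix: the error measures reported in the evaluation (MMRE, MdMRE and Pred(k)) follow their standard definitions, MRE_i = |actual_i − predicted_i| / actual_i, with MMRE the mean, MdMRE the median, and Pred(k) the fraction of predictions with MRE ≤ k. The following is a minimal illustrative sketch of these computations (our own code, not the study's tooling; the hypothetical input values are not from the paper):

```python
# Illustrative computation of MMRE, MdMRE and Pred(k) for
# per-requirement change-count predictions. MRE is undefined for
# actual = 0, so such requirements are skipped here.
from statistics import mean, median

def mre(actual, predicted):
    return [abs(a - p) / a for a, p in zip(actual, predicted) if a > 0]

def mmre(actual, predicted):
    # Mean Magnitude of Relative Error
    return mean(mre(actual, predicted))

def mdmre(actual, predicted):
    # Median Magnitude of Relative Error
    return median(mre(actual, predicted))

def pred(actual, predicted, k):
    # Fraction of predictions whose relative error is at most k
    errors = mre(actual, predicted)
    return sum(1 for e in errors if e <= k) / len(errors)

# Hypothetical change-request counts per requirement
actual    = [4, 2, 1, 5, 3]
predicted = [3, 2, 2, 4, 3]

print(mmre(actual, predicted))        # average relative error
print(mdmre(actual, predicted))       # median relative error
print(pred(actual, predicted, 0.25))  # share of predictions within 25%
```

A lower MMRE/MdMRE and a higher Pred(k) indicate better performance, which is why the best model in Table 7 (MMRE 0.36, MdMRE 0, Pred(0.25) 0.57) dominates the baselines on all four columns.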