Predicting Requirements Volatility: An Industry Case Study

Anıl Holat(1,2), Ayse Tosun(2)
1 Aselsan Inc., Ankara, Turkey
2 Faculty of Computer and Informatics Engineering, Istanbul Technical University, Turkey

Abstract
Software requirements are exposed to many changes during the software development life cycle. These changes, namely additions, modifications, or deletions, are defined as requirements volatility. Prior requirements volatility prediction studies utilize different volatility measures. In this study we predict the number of changes per software requirement as requirements volatility for a large-scale, safety-critical avionics project at ASELSAN. We employ a comprehensive metric set to explain requirements volatility: requirement quality measures, project-specific factors, and requirement interdependencies. Predictive models are created by combining input metric sets with machine learners. We evaluate the success of the models in predicting requirement changes, the best-performing input metric combinations, the best-performing machine learners, and the success of the models in predicting highly volatile requirements. The best prediction results are obtained with the model employing quality metrics, project-specific metrics, and network metrics altogether with the k-nearest neighbour machine learner (MMRE = 0.366). The best model also correctly identifies 63.2% of the highly volatile requirements, which are exposed to 80% of the total requirement changes. Our results are encouraging in terms of creating requirement change prediction tools that mitigate requirements volatility risks prior to the requirement review process.

Keywords
Predicting Requirements Volatility, Quality Metrics, Network Metrics, Requirements Quality, Requirements Change

QuASoQ'21: 9th International Workshop on Quantitative Approaches to Software Quality, December 06, 2021, Taipei, Taiwan
aholat@aselsan.com.tr (A. Holat); tosunay@itu.edu.tr (A. Tosun)
ORCID: 0000-0002-8727-6768 (A. Holat); 0000-0003-1859-7872 (A. Tosun)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Although software engineering has experienced significant advancements in the last decades, the majority of large-scale software projects still try to cope with requirement changes during their software development life cycle due to the dynamic nature of software development activities [1]. Changes to requirements, namely additions, deletions, or modifications, are defined as requirements volatility [2]. Continual requirement changes during software development have a tremendous impact on the cost, the schedule, and the quality of the final product. Unfortunately, a significant number of software projects cannot be completed successfully, or are completed only partially, because of requirements' high volatility [2].

According to a survey conducted by Thakurta [3], project managers use various requirements volatility measures: the number of changes to the identified use cases, the number of changing requirements identified within the issued change requests, realized requirements out of total requirements, and the amount of budget the project had to spend on the changing requirements. Alsalemi et al. [4] also report a literature review on requirements volatility prediction. Accordingly, ten studies had employed machine learning methods to predict requirements volatility until 2017. These studies utilize different requirement volatility measures, such as the number of requirement changes [5], the requirement stability index of the project [6], requirements that will be changed in the next iteration [7], requirement change impact [8], the impact of requirements changes on project distribution and cost factor [9], and software schedule [10]. Related studies also propose requirements complexity metrics [6], requirement dependency metrics [8], requirement size metrics [5], and requirements evolution metrics [7] to predict their own definition of a requirements volatility measure.

In our study we aim to predict the number of changes per software requirement by using requirement quality measures, project-specific factors, and requirement interdependencies. We define requirements volatility as the number of change requests reported for a software requirement. A change request could be either for adding a new requirement or for modifying an existing requirement. We chose a safety-critical avionics software project at ASELSAN with more than 20,000 requirements for our study. Loconsole et al. [5] conducted a similar study to predict the number of requirement changes using size measures on projects with fewer than 50 requirements. Our study complements the prior work by mining a larger dataset with thousands of requirements and a more comprehensive metric set considering quality and interdependency aspects of requirements as well as project-specific factors. It should be noted that the change requests we study in this work occurred in any phase of software development after the Software Requirements Specification (SRS) document had been reviewed and confirmed. Thus we investigate post-SRS requirements volatility for the avionics project under study.

The rest of the paper is organized as follows. Section 2 presents related empirical studies carried out on requirements volatility prediction. Section 3 explains the study design in detail. Results and threats to validity of our work are discussed in Section 4. Section 5 presents the conclusion and points out possible directions for future work.

2. Related Work

In this section we present previous studies that aim to predict volatility for requirements, and we focus on the input metrics they employed. We report details of five relevant studies [11, 6, 7, 8, 5] from the literature review conducted by Alsalemi et al. [4]. We also discuss the approaches of other recently published, related studies [12, 13] in this section.

Nakatani et al. [11] propose a method to predict requirements volatility using social relations between executives, competitors, cooperative organizations, and the natural environment. Those measures can be applied to customer requirements easily, but it would take some effort to associate them with software requirements. Christopher et al. [6] present requirements complexity metrics to define volatility. Functional requirement complexity, non-functional requirement complexity, input-output complexity, and interface and file complexity measures are used to calculate the whole project's stability, whereas we seek to predict volatility for each software requirement. Shi et al. [7] present a model to predict future requirement changes by using previous requirement change metrics. They generated six history metrics for requirements that contain information about volatility of topic, frequency of changes, and time duration between changes. History metrics can be used to predict requirements that will be changed in the next iteration, but they have little use in predicting requirements volatility for new projects. Pedrycz et al. [13] also employ the following change logs as input metrics: created version of a requirement, last developer, number of modifications, and requirement lifetime duration. Change logs are created in later phases of software development; thus they are again not very useful for predicting requirements volatility for projects in earlier development phases. Goknil et al. [8] and Hein et al. [12] use requirements interrelations for volatility prediction. Goknil et al. [8] utilize formal semantics of requirement relations as input features, whereas Hein et al. [12] create network metrics by using syntactical natural language data. We have combined both measures and created network metrics by using links between system and software requirements instead of lingual relations between requirement texts. Regarding network metrics, we employ degree centrality, eigenvector centrality, closeness centrality, and betweenness centrality, whereas Hein et al. [12] used 40 network metrics.

According to the literature review, only one study, conducted by Loconsole et al. [5], presents an empirical study to predict the number of changes per requirement, so that study is the most relevant to our work. The following size measures are used there to predict the number of requirement changes: number of actors interacting with use cases, number of words in each file, number of revisions of files, and number of lines per file. In our study, size measures are also used to represent requirement quality, but we additionally enriched our metric set with project-specific metrics and network metrics. It should be noted that we do not take deletion requests and deleted requirements into consideration while defining requirements volatility, because in our industrial context we rarely encounter such requests for safety-critical software. Finally, we applied our model on more than 20,000 requirements, which helps us assess the generalizability of our findings on predicting volatility for every software requirement using different metric sets.

3. Study Design

In this section we explain our empirical study design in detail. In Section 3.1 the research questions are explained. The analyzed project for which a model would be proposed is described in Section 3.2. In Section 3.3 the selected input metrics for requirements volatility prediction are described. Section 3.4 describes the output measure of the prediction model. The tools used are explained in Section 3.5. Machine learning techniques employed in this study are presented in Section 3.6. Finally, in Section 3.7, performance evaluation measures are defined for our model.

3.1. Research Questions

Our main goal is to predict requirements volatility at earlier stages of the development life cycle, and accordingly two research questions are defined.

Research Question (RQ) 1: To what extent do requirement quality metrics, project-specific metrics, and network metrics predict the volatility of a software requirement?

Previous studies used different metric sets to predict requirements volatility. In this study we aim to use a comprehensive set of input metrics and observe their individual effects on requirements volatility prediction. While predicting the volatility, we use the number of change (addition and modification) requests on a software requirement. Inspired by the metric sets used in the literature, we form a group of requirement quality metrics and network metrics. Additionally, project-specific metrics for this particular safety-critical avionics project are defined and utilized throughout this study. During model assessment, the performance of each input metric set and the combinations of those sets are reported. Detailed sub-questions related to RQ 1 are listed below:

RQ 1.1: Which metric group is a better indicator of the number of requirement changes?
RQ 1.2: Which machine learning algorithm is better at predicting the number of requirement changes?

RQ 2: How successful are the proposed models in predicting highly volatile software requirements?

Software requirements have a history of a varying number of changes during the software development life cycle. Some requirements do not change at all; however, some requirements are exposed to multiple changes and pose risks to a software project. Practically, our model should predict highly volatile requirements, so that those requirements can be reviewed by experienced reviewers in detail. For this research question (RQ 2) we measure the success of our models on highly volatile requirements based on a technique in [14].

3.2. Analyzed Project

We chose a safety-critical avionics software project to perform our analysis. We will refer to this project as AVPRJ in the rest of this paper. AVPRJ has many releases, from which three releases are selected. Software requirements for those releases are related since they all belong to the same project; however, they are partially distinct since each release consists of the implementation of different software components developed by many software developers. AVPRJ has a total of 22,771 software requirements. Some release-based descriptive statistics for AVPRJ are given in Table 1, where CR is used as an abbreviation for change request and REQ for a single requirement. Most of the employed requirements belong to the second release, and this release has the highest mean change requests per software requirement. The third release has relatively fewer software requirements, and fewer additions or modifications are performed on the requirements belonging to this release. More than half of the requirements are modified at least once for this project; 9,848 out of 22,771 requirements are not changed, which complies with the Standish Group's survey results over more than 8,000 software projects [15].

Table 1: Release-based AVPRJ statistics
Release     Number of REQs  Mean CR per REQ  Median CR per REQ
Release 1   8,640           0.7457           1
Release 2   11,401          1.1165           1
Release 3   2,730           0.5267           0
Total       22,771          0.9051           1

3.3. Input Metrics

We have employed several metrics to predict the volatility of each software requirement in AVPRJ. The metrics represent three dimensions: requirement quality metrics, project-specific metrics, and requirement network metrics. Requirement quality metrics extracted by the NASA Automated Requirements Measurement (ARM) tool have previously been used to predict faulty modules [16]. Initially, we believed the way requirements are documented would affect requirements volatility, besides the fault proneness of modules. Some requirement quality size metrics are already utilized in predicting requirements volatility [5]; therefore we decided to include a requirement quality metric set in our study. Network metrics are employed to predict requirements change volatility in a recent study [12]. This sparked the idea of utilizing network metrics for requirements volatility prediction. Initial observation of various change request notes confirmed that software requirements that are changed within a particular change request have a tendency to be linked to similar system requirements. Accordingly, we employed network metrics created from traceability information. In order to enrich the input metric set with a new metric group, we focused on safety-critical avionics project characteristics in this study. Features are evaluated separately, and the ones that would provide information on requirements volatility are selected as project-specific metrics. Rationales for the project-specific metric selection are given in detail in Section 3.3.2. Detailed explanations for each group are given in the following subsections.

3.3.1. Quality Metrics

While selecting requirement quality metrics to predict volatility, we were inspired by two studies. The first study proposes requirement metrics in the context of the NASA Metrics Data Program (MDP) to predict software faults [16]. These metrics are calculated by automatically going through requirements documents to highlight vague, ambiguous, long, and complex requirements. The second study also reports requirement quality metrics [17], to find out which requirement quality analysis tool is more successful regarding the measurement of those metrics. Combining both studies' lists and customizing them to the requirements document templates in our industrial context, we present 20 quality metrics in Table 2. All of these metrics take numeric values, e.g. the number of flow sentences in a requirement, or the number of directives in a requirement.

Table 2: Requirement Quality Metrics
Acronyms: The number of abbreviations in a software requirement. For AVPRJ, the permitted acronym list is used to extract this metric.
Actions: The number of actions to be performed if the conditions of a software requirement are satisfied.
Ambiguity: The number of ambiguous expressions in a software requirement, e.g. adequate, sufficiently, optimal, slow.
Chars between punctuation: Average character count between punctuation marks. Long sentences without punctuation marks decrease readability.
Conditions: The number of conditions that need to be satisfied to perform a software requirement.
Conditional: The number of phrases that give developers freedom whether or not to implement a software requirement, e.g. maybe, can't, would.
Connectors: The number of connectors that link multiple sentences or groups of words, e.g. and, or, as well as.
Directives: The number of directive expressions that refer to a table, a note, a figure, or an example.
Flow sentences: The number of expressions that semantically bond a sentence to another one, e.g. although, but, else.
Imperatives: The number of phrases that command to perform particular actions in a software requirement, e.g. shall, must, will.
Implicitness: The number of pronouns that make a software requirement difficult to understand, e.g. this, that, it. A software requirement should be defined explicitly.
Incompleteness: The number of expressions that indicate a software requirement is yet incomplete, e.g. and so on, tbd, etc.
In links: The number of incoming links to a software requirement from other documents. For AVPRJ, test cases are linked to software requirements, so the number of in links refers to the number of linked test cases.
Negative sentences: The number of phrases that give negative meaning, e.g. doesn't, none, can't.
Nested levels: For AVPRJ, the nested level metric value is the greatest level in the hierarchical nesting structure of a software requirement.
Out links: The number of outgoing links of a software requirement. In AVPRJ, software requirements are linked to system requirements; therefore the number of out links is the total number of system requirements linked by a software requirement.
Rationale: The number of expressions that give justification in a software requirement, e.g. thus, in order to.
Speculative sentences: The number of speculative phrases which lead to questioning the necessity of a software requirement, e.g. normally, eventually, almost.
Subjectivity: The number of subjective expressions presenting personal opinion rather than objectivity, e.g. I think, in my opinion.
Text length: The total number of characters in a software requirement.

During the preprocessing stage, we had to remove three metrics from our analysis since they gave little to no information for AVPRJ: Conditional, Rationale, and Subjectivity. Only one requirement contains conditional expressions, three requirements contain rationale expressions, and none of the requirements have subjective expressions. Thus we ended up with 17 metrics representing the quality aspect of requirements for predicting their volatility.

3.3.2. Project Specific Metrics

Project-specific metrics may differ regarding the scope of a software project, but the metrics we chose are not very specific to the development environment, programming language, or domain in which the software is developed. We believe project-specific metrics provide information about development characteristics in an organization, and hence about the factors affecting the change proneness of requirements. Table 3 lists the project-specific metrics employed in this study. If the project follows an inspection activity on requirements, it is more likely that the team would find the ambiguities and inconsistencies in the requirements. Since derived requirements are not part of customer needs, they cannot be validated through user acceptance tests. If a requirement has a safety aspect, more comprehensive software tests will be performed; thus a potential change is highly probable. The number of related components is a measure of the impact of a software requirement on the overall product; thus more feedback will be given by the development team on requirements affecting many components. Each software release has different dynamics that affect requirements maturity, e.g. release schedule, experience of developers, and complexity of the system. For example, if the schedule is too tight to complete the SRS document, requirements could be immature and more requirement changes could be performed in the future for this release.

Table 3: Project Specific Metrics
Inspection: Indicates if a software requirement is evaluated through an inspection activity. This procedure might be preferred to complement functional tests.
Derived: Software requirements that are not explicitly stated in system requirements but derived based on design decisions [18].
Safety: Shows if a software requirement is safety-critical.
No. of Related Components: The number of isolated software components that a requirement is related to.
Release Number: The release number that the software requirement belongs to.

3.3.3. Network Metrics

Hein et al. [12] earlier utilized 40 network metrics to predict requirements change volatility. On the other hand, Valente et al. [19] present correlations between degree, betweenness, closeness, and eigenvector centrality measures, and indicate that those measures are distinct but notionally related. Thus, in this work, instead of employing 40 metrics, we chose the metrics suggested in [19] to predict requirements volatility for AVPRJ. These centrality metrics give each software requirement a value regarding its position in the network. Brief explanations of the employed network metrics are given in Table 4.

Table 4: Network Metrics
Degree centrality: Scores requirements based on the number of links.
Betweenness centrality: Measures how many times a requirement is on the shortest path in the graph.
Closeness centrality: Indicates how close a requirement is to other requirements considering the whole graph.
Eigenvector centrality: Measures how a node influences other nodes in the network through connections.

Hein et al. [12] used language processing to create a network for requirements. In this study we instead used traceability information to create the network graph for software requirements. Traceability links from software requirements to system requirements are used for this purpose. We assigned weights between software requirements based on system requirement traceability links. Software requirements which are derived from similar system requirements tend to be closer in our model. The weight assignment formula is given in Equation 1, where W is the weight between two software requirements, NCLINK is the number of common system requirement links between the two software requirements, and NTOTLINK is the total number of system requirements linked from those two software requirements. After weight assignment, a symmetrical n x n matrix is created, where n denotes the number of software requirements. Then the network metrics are computed over this matrix.

    W_ij = NCLINK_ij / NTOTLINK_ij    (1)

3.4. Model Output

The proposed model's output is the number of change requests per software requirement. After the SRS document is reviewed and completed for AVPRJ, change requests linked to each software requirement are reported in the issue management system, and the document is modified accordingly by the analysis team. Thus we define requirements volatility in our industrial context with respect to the number of change requests that have been applied to add a new requirement or to modify an existing requirement in the associated SRS document. Please note that our model outputs decimal values, but the number of change requests per requirement can in practice only take integer values. Therefore we round fractional parts to the nearest integer.

3.5. Tools

We wrote scripts to extract requirement quality and project-specific metrics from SRS documents. The UCINET tool [20] is then used to create network metrics from the matrix that we extracted based on software and system requirements. Regression models with different machine learners are trained using the WEKA tool [21]. Prediction results are further post-processed in MATLAB to obtain the performance measures regarding all RQs.

3.6. Machine Learning Techniques

We train models using linear regression, random forest regression, support vector regression, and k-nearest neighbor regression. Linear regression was utilized in [5], whereas classifier versions of the other three techniques were used in [12].

For k-nearest neighbor regression, the inversely proportional weighting option is selected. Higher weights are assigned to closer training samples, which resulted in better prediction results for our model. For support vector regression, the commonly used radial basis function kernel is selected. Increasing the gamma parameter too much may result in over-fitting [22], and we also experienced a great computational cost with little to no prediction success gain for large gamma. Thus the C and gamma parameters are set to 1.

In this study the 10-fold cross-validation technique is used to split training and test sets. First, the dataset containing all software requirements is shuffled randomly and split into 10 groups of approximately equal size. One group is labeled as the test set and the other groups are used to train the machine learning models. This procedure is repeated 10 times until each unique group has been used as the test set once.

3.7. Performance Evaluation

For RQ 1, the following measures are used for performance evaluation: Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), Pred(0.5), and Pred(0.25) [23]. The relative error is calculated according to Equation 2, where Err_relative is the relative error, Val_act is the actual value, and Val_pred is the predicted value.

    Err_relative = |Val_act - Val_pred| / |Val_act|    (2)

There are requirements with zero change requests; thus a division-by-zero problem arises while calculating the relative error. We made an assumption for unchanged requirements, as presented in Equation 3.

    If Val_act = 0:  Err_relative = |Val_act - Val_pred| / 1    (3)

Pred(k) is a measure of the variance of the error distribution. This measure is based on the relative error, and it shows the percentage of predictions whose errors are less than or equal to k.

For RQ 2, we aim to predict highly volatile requirements, and thus we first employ a method to identify those among the set of requirements:

- Step 1: Rank requirements by their actual number of change requests in descending order and record their rank as R_actual.
- Step 2: Obtain regression prediction results for each software requirement.
- Step 3: Rank requirements by their predicted number of change requests in descending order and record their rank as R_predicted.
- Step 4: Evaluate results according to the listing in Table 5. P denotes the percentage of requirements which are perceived as highly volatile, and N_req denotes the total number of requirements in the validation set.
- Step 5: Calculate recall, accuracy, and false alarm rate.

Table 5: Requirements volatility rank evaluation
Condition                                               Evaluation
R_actual <= N_req x P and R_predicted <= N_req x P      True Positive
R_actual >  N_req x P and R_predicted >  N_req x P      True Negative
R_actual <= N_req x P and R_predicted >  N_req x P      False Negative
R_actual >  N_req x P and R_predicted <= N_req x P      False Positive

Table 5 can be interpreted as follows: True Positive instances are requirements that are actually highly volatile and that the model also categorizes as highly volatile. In the case of True Negatives, a requirement is actually less volatile, and so is its prediction. False Negatives occur when highly volatile requirements are regarded as less volatile by the predictor. Finally, False Positives indicate less volatile requirements predicted as highly volatile.

To answer RQ 2, recall, accuracy, and false alarm rate are computed. The recall result shows how successful the model is in predicting highly volatile requirements; in our view this measure is the most important one regarding RQ 2. The accuracy measure shows the prediction success for both highly volatile and less volatile requirements. The false alarm rate indicates how much effort is wasted by mis-evaluating less volatile requirements.

4. Results and Discussion

We present and discuss the performance of the models with respect to the two RQs in this section. We also compare the performance of the prediction models proposed in this study with the prior work [5].

4.1. RQ 1

After obtaining the processed data, machine learning regression methods are applied to answer whether requirement quality metrics, network metrics, and project-specific metrics can be used to predict the number of changes on each software requirement. Model performance results are gathered for all input metric and machine learning method combinations separately.

Results for RQ 1 are given in Table 6. The following abbreviations are used: ML for machine learning, Q for requirement quality metrics, P for project-specific metrics, N for network metrics, KNN for k-nearest neighbor regression, LR for linear regression, RF for random forest regression, and SVR for support vector regression.

Table 6: Performance evaluation results for RQ 1
Metrics+ML method  MMRE   MdMRE  Pred(0.5)  Pred(0.25)
Q&P&N+KNN          0.366  0      0.681      0.57
Q&P&N+LR           0.53   0.5    0.524      0.411
Q&P&N+RF           0.392  0      0.663      0.545
Q&P&N+SVR          0.392  0      0.662      0.554
Q&P+KNN            0.402  0      0.641      0.529
Q&P+LR             0.513  0.5    0.541      0.428
Q&P+RF             0.45   0.333  0.6        0.486
Q&P+SVR            0.459  0.5    0.584      0.479
P&N+KNN            0.422  0      0.632      0.52
P&N+LR             0.55   0.667  0.484      0.372
P&N+RF             0.454  0.5    0.595      0.48
P&N+SVR            0.469  0.5    0.561      0.475
Q&N+KNN            0.381  0      0.665      0.553
Q&N+LR             0.534  0.5    0.52       0.407
Q&N+RF             0.394  0      0.662      0.545
Q&N+SVR            0.426  0      0.621      0.515
Q+KNN              0.443  0.333  0.598      0.488
Q+LR               0.512  0.5    0.542      0.43
Q+RF               0.455  0.5    0.594      0.483
Q+SVR              0.483  0.5    0.556      0.446
P+KNN              0.555  0.667  0.483      0.37
P+LR               0.548  0.667  0.485      0.373
P+RF               0.556  0.667  0.483      0.371
P+SVR              0.516  0.5    0.512      0.417
N+KNN              0.448  0.5    0.596      0.482
N+LR               0.549  0.667  0.484      0.372
N+RF               0.485  0.5    0.561      0.446
N+SVR              0.53   0.5    0.5        0.392

In terms of input metric combinations, the best MMRE results are achieved with Q&P&N (0.366), Q&N (0.381), and Q&P (0.402). We may interpret that requirement quality metrics (Q) are successful at predicting the number of change requests per software requirement, and their combinations with the other metrics also give good results. With respect to the machine learning algorithm, the three best-performing metric combinations give the highest prediction performance when the k-nearest neighbor algorithm is utilized.

MdMRE is zero for the following metric and machine learner combinations: Q&P&N+KNN, Q&P&N+RF, Q&P&N+SVR, Q&P+KNN, P&N+KNN, Q&N+KNN, Q&N+RF, and Q&N+SVR. The number of change requests for more than half of the software requirements is predicted correctly with these models. Since many prediction models give the same best result, we do not rank the best-performing models with regard to MdMRE.

The best performance with respect to Pred(0.25) and Pred(0.5) is obtained with Q&P&N+KNN; Q&N+KNN and Q&P+KNN report the second and third best results. Those results indicate that requirement quality metrics and the k-nearest neighbor algorithm are also successful with respect to the Pred measures.

To sum up, the best performance results are achieved by using requirement quality metrics, project-specific metrics, and network metrics altogether. Accordingly, the k-nearest neighbor algorithm gives the best performance results for all measures. In all best-performing models, quality metrics are utilized either paired with project or network metrics or in the combination of all three. It seems the way requirements are documented has a high effect on volatility rates.

We compared our findings against the study conducted by Loconsole et al. [5]. Table 7 reports the performance of the linear regression model with the best metric set in our study, our best-performing model, and the best-performing model of [5]. If we compare the findings only on LR, we observe that using the number of lines predicts volatility better in their commercial setting, while in our context using quality metrics only does not give the best result. Other algorithms like KNN, in combination with all metrics, significantly improve the prediction performance by reducing MMRE down to 0.36 and MdMRE down to 0, and increasing Pred(0.25) up to 57%.

Table 7: Comparison of our performance (RQ 1) against [5]
                  MMRE   MdMRE  Pred(0.25)  Pred(0.5)
Q+LR              0.51   0.5    0.43        0.54
Best model        0.36   0      0.57        0.68
NLines+LR [5]     0.58   0.27   0.5         0.63

4.2. RQ 2

RQ 2 aims to measure the success of our model in predicting highly volatile requirements. We presented our technique to identify highly volatile requirements in Section 3.7. We first need to determine the change request coverage used to categorize highly volatile requirements, and then calculate recall, accuracy, and false alarm rates. Requirement coverage rates for various change request coverages by the most volatile requirements are given in Table 8. As change request coverage grows, more requirements are labeled as highly volatile. We chose 80% change request coverage since approximately 40% of the reviewers are considered well-experienced in AVPRJ. Therefore, by applying this model we could assign the review task of the 38.6% of total requirements that are possibly highly volatile to experienced developers in the early phase of development. We did not present other coverage results in this study due to page limitations.

Table 8: Change request and requirement coverage relation for AVPRJ
CR Coverage Percent  REQ Coverage
60%                  20.5%
70%                  29.6%
80%                  38.6%
90%                  47.7%

In Table 9 the best recall results are achieved with Q&P&N+KNN (0.632), Q&N+RF (0.616), and Q&P+KNN (0.604). Again, all best-performing models have requirement quality metrics in common, whereas the best combination consists of all metrics. The best accuracy results are obtained with Q&P&N+KNN (0.716).

Table 9: Performance results of RQ 2 model for 80% CR coverage
Metrics+ML method  Recall  Accuracy  False Alarm Rate
Q&P&N+KNN          0.632   0.716     0.232
Q&P&N+LR           0.531   0.638     0.295
Q&P&N+RF           0.624   0.71      0.237
Q&P&N+SVR          0.591   0.684     0.258
Q&P+KNN            0.604   0.694     0.249
Q&P+LR             0.508   0.62      0.31
Q&P+RF             0.602   0.692     0.251
Q&P+SVR            0.558   0.658     0.278
P&N+KNN            0.574   0.671     0.268
P&N+LR             0.418   0.55      0.366
P&N+RF             0.557   0.658     0.279
P&N+SVR            0.496   0.61      0.318
Q&N+KNN            0.614   0.702     0.243
Q&N+LR             0.53    0.637     0.296
Q&N+RF             0.616   0.703     0.242
Q&N+SVR            0.569   0.667     0.271
Q+KNN              0.586   0.68      0.261
Q+LR               0.508   0.62      0.31
Q+RF               0.582   0.677     0.263
Q+SVR              0.529   0.636     0.296
P+KNN              0.503   0.616     0.313
P+LR               0.423   0.554     0.363
P+RF               0.496   0.611     0.317
P+SVR              0.462   0.584     0.339
N+KNN              0.557   0.657     0.279

4.3. Threats to Validity

Internal validity: In this study we show that requirements volatility in a software project can be predicted to some extent utilizing requirement quality metrics, project-specific factors, and network metrics altogether. However, this does not imply a causal relationship between the input and output metrics, since we did not conduct a controlled experiment.

External validity: We have conducted the case study on one project, so the results have local validity. However, the dataset is quite large, with more than 20,000 requirements from three distinct releases developed by many software developers. Nonetheless, applying the predictive models on different projects in the future would be better in terms of generalization of results.

Construct validity: Developers did not use their native language in software requirements. Thus there could be some typos which may affect the textual requirement quality metrics. There could also be some expressions used by developers in software requirements, e.g. subjective expressions, that should have been taken into consideration while creating the requirement quality metrics but that we missed. Due to the size of the dataset we could not manually check these kinds of typos and grammatical errors, but we know that reviewers are responsible for correcting those. We create network graphs based on traceability links between software and system requirements as indicated in the SRS. We could have used linguistic data to connect software requirements, as in a previous study [12], and it may reflect the relationship between requirements in a better way. We plan to do this as future work.

Conclusion validity: For RQ 2 only the results of 80% change request coverage are presented due to page limitations. Regarding the results of other CR coverages, we observe higher recall and accuracy, whereas the false alarm rate grows undesirably as the coverage grows.
Therefore N+LR 0.404 0.539 0.376 RQ 2 results would differ in that way if we had chosen N+RF 0.565 0.664 0.274 other CR coverage rate. N+SVR 0.498 0.612 0.316 5. Conclusion and Future Work Q&N+RF(0.703) and Q&P+KNN(0.694). The lowest false In this paper, we have carried out an empirical study alarm rate results are achieved by Q&P&N+KNN(0.232), to predict number of changes per software requirement Q&N+RF(0.242) and Q&P+KNN(0.249). K-nearest by using requirement quality measures, project specific neighbor and random forest regression methods are factors and requirement interdependencies. 22,771 soft- successful in predicting highly volatile requirements for ware requirements from a safety-critical software project 80% change request coverage. in ASELSAN are utilized to build 28 prediction models The most important measure for RQ 2 is recall since and assess the best performing metric suite and algo- the purpose of this question is to measure success on rithm. We conclude that we can predict volatility of predicting highly volatile requirements. We correctly requirements with an average MMRE of 36% by observ- identify 63.2% of highly volatile requirements which are ing metrics of similar requirements through KNN. We exposed to 80% of the total requirement changes. also observe that measuring requirements from different aspects like quality, project and network dependencies gives a much better performance. We plan to integrate 58 such a predictor model into requirement management [10] X. Wang, C. Wu, L. Ma, Software project schedule tools like DOORS to be used prior to the SRS review variance prediction using bayesian network, in: activity so that highly-volatile requirements could be au- 2010 IEEE International Conference on Advanced tomatically and accurately identified. This way, software Management Science, volume 2 of ICAMS 2010, development leads could take precautions beforehand to IEEE, Chengdu, China, 2010, pp. 26–30. 
reduce requirements volatility related risks. Since there [11] T. Nakatani, T. Tsumaki, Predicting requirements is not enough empirical studies conducted in related area, changes by focusing on the social relations, in: more empirical research should be carried out to validate Proceedings of the Tenth Asia-Pacific Conference the best performing models. on Conceptual Modelling, volume 154 of APCCM 2014, Auckland, New Zealand, 2014, pp. 65–70. [12] P. H. Hein, E. Kames, C. Chen, B. Morkos, Employ- References ing machine learning techniques to assess require- ment change volatility, Research in Engineering [1] N. Nurmuliani, D. Zowghi, S. Powell, Analysis of re- Design 32 (2021) 245–269. quirements volatility during software development [13] W. Pedrycz, J. Iljazi, A. Sillitti, G. Succi, Prediction life cycle, in: 2004 Australian Software Engineering of the successful completion of requirements in Conference, ASWEC ’04, IEEE, 2004, pp. 28–37. software development—an initial study, in: Agent [2] G. Swathi, A. Jagan, C. Prasad, Writing software and Multi-Agent Systems: Technology and Appli- requirements specification quality requirements: cations, Springer, 2016, pp. 261–269. An approach to manage requirements volatility, Int. [14] T. J. Ostrand, E. J. Weyuker, How to measure suc- J. Comp. Tech. Appl. 2 (2011) 631–638. cess of fault prediction models, in: Fourth interna- [3] R. Thakurta, A mixed mode analysis of the im- tional workshop on Software quality assurance: in pact of requirement volatility on software project conjunction with the 6th ESEC/FSE joint meeting, success, Journal of International Technology and SOQUA’07, Dubrovnik, Croatia, 2007, pp. 25–30. Information Management 20 (2011). [15] T. Clancy, The standish group report, Chaos report [4] A. M. Alsalemi, E.-T. Yeoh, A systematic literature (1995). review of requirements volatility prediction, in: [16] Y. Jiang, B. Cukic, T. 
Menzies, Fault prediction using 2017 International Conference on Current Trends early lifecycle data, in: The 18th IEEE International in Computer, Electrical, Electronics and Communi- Symposium on Software Reliability, ISSRE’07, IEEE, cation, ICCTCEEC-2017, IEEE, 2017, pp. 55–64. 2007, pp. 237–246. [5] A. Loconsole, J. Börstler, Construction and Valida- [17] G. Génova, J. M. Fuentes, J. Llorens, O. Hurtado, tion of Prediction Models for Number of Changes V. Moreno, A framework to measure and improve to Requirements, Umeå University Technical Re- the quality of textual requirements, Requirements port UMINF-07.03, Umeå University Department of engineering 18 (2013) 25–41. Computing Science, UMEÅ, SWEDEN, 2007. [18] A. Faisandier, Systems architecture and design, Sin- [6] D. F. X. Christopher, E. Chandra, Prediction of ergy’Com Belberaud, France, 2013. software requirements stability based on complex- [19] T. W. Valente, K. Coronges, C. Lakon, E. Costen- ity point measurement using multi-criteria fuzzy bader, How correlated are network centrality mea- approach, International Journal of Software Engi- sures?, Connections 28 (2008) 16–26. neering & Applications 3 (2012) 101–115. [20] S. P. Borgatti, M. G. Everett, L. C. Freeman, Ucinet [7] L. Shi, Q. Wang, M. Li, Learning from evolution his- for windows: Software for social network analysis, tory to predict future requirement changes, in: 2013 Harvard, MA: analytic technologies 6 (2002). 21st IEEE International Requirements Engineering [21] F. Eibe, M. A. Hall, I. H. Witten, The weka work- Conference, RE-2013, IEEE, 2013, pp. 135–144. bench. online appendix for data mining: practical [8] A. Goknil, R. van Domburg, I. Kurtev, K. van den machine learning tools and techniques, in: Morgan Berg, F. Wijnhoven, Experimental evaluation of a Kaufmann, 2016. tool for change impact prediction in requirements [22] A. Ben-Hur, J. 
Weston, A user’s guide to support models: Design, results, and lessons learned, in: vector machines, in: Data mining techniques for 2014 IEEE 4th International Model-Driven Require- the life sciences, Springer, 2010, pp. 223–239. ments Engineering Workshop (MoDRE), MoDRE [23] D. Zhang, J. J. Tsai, Machine learning applications in 2014, IEEE, Karlskrona, Sweden, 2014, pp. 57–66. software engineering, volume 16, World Scientific, [9] B. Morkos, J. Mathieson, J. D. Summers, Compar- 2005. ative analysis of requirements change prediction models: manual, linguistic, and neural network, Research in Engineering Design 25 (2014) 139–156. 59
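As an illustration, the evaluation measures used for RQ 1 (MMRE, MdMRE and Pred(l)) can be computed from the standard definitions of magnitude of relative error; the sketch below is not code from the study and assumes the actual change count of each requirement in the validation set is positive.

```python
import statistics

def mre(actual, predicted):
    """Magnitude of relative error for one requirement.

    Assumes actual > 0 (the observed number of change requests)."""
    return abs(actual - predicted) / actual

def evaluate(actuals, predictions):
    """Return (MMRE, MdMRE, Pred(0.25), Pred(0.5)) over a validation set.

    Pred(l) is the fraction of requirements whose relative error is at most l."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    mmre = statistics.mean(errors)
    mdmre = statistics.median(errors)
    def pred(l):
        return sum(e <= l for e in errors) / len(errors)
    return mmre, mdmre, pred(0.25), pred(0.5)
```

Under these definitions, an MdMRE of 0 (as reported for several models in Table 6) means that at least half of the requirements are predicted with zero relative error, which matches the observation that more than half of the change-request counts are predicted exactly.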
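The RQ 2 labeling and its rates can be sketched similarly. The exact procedure of Section 3.7 is not reproduced here; the helper below assumes that the highly volatile requirements are the most-changed requirements that jointly account for a given share (e.g. 80%) of all change requests, and that recall, accuracy and false alarm rate follow their usual binary-classification definitions.

```python
def highly_volatile(change_counts, coverage=0.8):
    """Label the most-changed requirements that jointly cover `coverage`
    of all change requests (a sketch of the Section 3.7 idea)."""
    order = sorted(range(len(change_counts)),
                   key=lambda i: change_counts[i], reverse=True)
    total = sum(change_counts)
    labels = [False] * len(change_counts)
    covered = 0
    for i in order:
        if covered >= coverage * total:
            break
        covered += change_counts[i]
        labels[i] = True
    return labels

def classification_rates(actual_labels, predicted_labels):
    """Recall, accuracy and false alarm rate for binary volatility labels.

    Assumes both classes occur in actual_labels (non-zero denominators)."""
    pairs = list(zip(actual_labels, predicted_labels))
    tp = sum(a and p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fp = sum(not a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / len(pairs)
    false_alarm = fp / (fp + tn)
    return recall, accuracy, false_alarm
```

Applying the same thresholding to predicted change counts and comparing the two label vectors yields the rates reported in Table 9.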
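Finally, the best performing learner, k-nearest neighbour regression, predicts the change count of a requirement as the mean change count of its k closest requirements in metric space, which is why the conclusion describes it as "observing metrics of similar requirements". The study built its models in WEKA [21]; the function below is an independent, simplified sketch using Euclidean distance over (already scaled) metric vectors.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """k-nearest-neighbour regression: the predicted change count of a new
    requirement is the mean change count of its k Euclidean-closest
    requirements in the training set."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y))
    neighbours = [y for _, y in dists[:k]]
    return sum(neighbours) / k
```

In practice the metric vectors would combine the quality (Q), project (P) and network (N) metrics of each requirement, normalized to comparable scales before computing distances.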