Towards Automatically Identifying Potential Sustainability Effects of Requirements

Iris Groher
Johannes Kepler University
Linz, Austria
iris.groher@jku.at

Norbert Seyff
FHNW, Windisch, Switzerland
University of Zurich, Zurich, Switzerland
norbert.seyff@fhnw.ch

Tahira Iqbal
fortiss GmbH
Munich, Germany
iqbal@fortiss.org

Abstract: Software developers are gradually becoming aware that their systems have effects on sustainability. The identification of potential effects that software-intensive systems can have on different sustainability dimensions over time is still in its infancy. Researchers are currently exploring approaches that rely strongly on expert knowledge to identify potential effects. In this work-in-progress paper, we look at the problem from a different angle: we report on the exploration of a machine learning-based approach to identify potential effects. Such an approach can save time and costs but increases the risk that potential effects are overlooked. First results of applying the machine learning-based approach in the domain of home automation systems are promising, but also indicate that further research is needed before our approach can be applied in practice. Furthermore, we have learned that even providing the ground truth for training the algorithms is a challenging task.

Index Terms: Sustainability, Analysis, Requirements, Machine Learning.

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

I. INTRODUCTION

Software-intensive systems do not operate in isolation but in complex socio-technical contexts. Therefore, they have an impact on this context, which manifests itself in different dimensions such as the environmental, economic, social, individual, and technical dimension [1]. As effects can occur over time, we can also identify three different orders of effects [2] [3]. The cumulative positive and negative effects a software-intensive system has on its context define its sustainability.

In previous research [4], we have learned that practitioners are not aware of the fact that software-intensive systems have an impact on sustainability and that raising awareness is an essential step towards the development of sustainable software-intensive systems. Furthermore, the complexity of this matter and the lack of adequate methods and tools supporting the identification of potential effects are hurdles for practitioners who are already aware of their responsibility to build sustainable systems.

Requirements are the key to sustainability [1], which indicates that the identification of potential sustainability effects needs to start before systems are actually built or when certain aspects of a system are modified in the context of system evolution. This, in part, also makes the identification of potential effects a hypothetical endeavour, which often needs to be based on expert opinions rather than facts. Only after development, when the system is used in its application context, can one eventually validate its effects on the different sustainability dimensions over time.

Researchers have started to build methods and tools to support the identification of sustainability effects [5] [6] [7]. Although such approaches can be successfully applied to identify potential sustainability effects, it appears that they require a significant time investment from companies, which might prevent their adoption.

In this work-in-progress paper, we follow a current trend in software and requirements engineering and propose the use of Machine Learning (ML) for the early identification of potential sustainability effects. We present this idea in more detail and also report on a first application experiment in the home automation domain. Based on requirements for home automation systems, we have identified potential sustainability effects and have used these results as ground truth for training our algorithms. Early results indicate that using ML for the identification of sustainability effects is promising, which motivates us to continue with this research.

In Section II we discuss existing approaches for identifying sustainability effects in requirements. In Section III we present our ML-based approach, and in Section IV we report on a first experiment we conducted. In Section V we conclude the paper and present an outlook on the next steps we plan.

II. EXISTING SUSTAINABILITY EFFECT IDENTIFICATION APPROACHES AND TOOLS

The work presented in this paper is motivated by research in the field of requirements engineering, where researchers aim at identifying potential sustainability effects.

In previous work on tailoring requirements negotiation to sustainability [5], an extension of the WinWin negotiation model was proposed. This approach incorporates sustainability so that the negotiation is used to identify potential effects of requirements on sustainability. For requirements which might have negative effects, alternative requirements options are discussed to minimize these negative effects. This method was applied in an exploratory industrial case study, where it allowed practitioners to reflect on requirements and their effects on sustainability.

Recent work presents a question-based framework for raising awareness of the potential effects of software systems on sustainability [6]. The Sustainability Awareness Framework (SusAF) was used by students to carry out interviews for a software system of their choice in order to identify potential effects of these systems on sustainability and, in particular, potential chains of effects. Results from this feasibility study indicate that SusAF stimulates the discussion about potential effects of software systems on sustainability.

Alharthi et al. [7] present the SuSoftPro tool, which supports the analysis of the impact of requirements on different sustainability dimensions via a Fuzzy Rating Scale method. This tool-supported approach offers different visualization options for the results (e.g., a bar graph that illustrates the sustainability level).

We conclude that researchers have identified the need for identifying sustainability effects and that first promising methods and even tool-supported approaches are appearing. However, all presented methods strongly depend on human involvement and might require time-intensive discussions, the involvement of experts, or a large number of people to derive results. The quality of the produced results might further vary considerably depending on factors such as the complexity of the domain and the level of expertise of the people involved. Nevertheless, these methods can raise awareness and can help, at least in part, to improve the sustainability of software-intensive systems.

III. MACHINE LEARNING-BASED EFFECT IDENTIFICATION

The goal of our ongoing research is to automate the identification of potential sustainability effects by analysing the requirements of a software-intensive system. In contrast to existing methods, we expect that such an approach will significantly reduce the effort needed for the analysis. However, we also see the risk that such an approach might result in overlooking potential effects.

A. ML in Software and Requirements Engineering

To achieve this goal, we follow a recent trend in software and requirements engineering and explore the application of ML [8] [9]. ML has already been successfully used to classify software requirements into functional and non-functional requirements [10] [11]. The analysis of large amounts of user feedback from multiple sources such as the app store and Twitter has been automated by applying ML [12] [13]. This analysis helps to identify useful information such as bug reports and feature requests to support software evolution. For validating requirements, automated analysis of requirement traceability with the help of natural language processing and ML has been studied [14] [15]. ML has also been applied in requirements management, for example for visualizing requirements on different levels of granularity and for prioritizing requirements [16] [17].

B. ML Application Overview

To automatically identify the impact of requirements on sustainability, we follow the ML workflow [18], as shown in Fig. 1.

Fig. 1. Basic ML workflow steps

The first key step is data preprocessing, which helps to avoid data incompleteness and inconsistency issues. The preprocessed data is used as input for the ML algorithm, which means that the ML algorithm learns from existing data. On the basis of this learning, a learning model is produced as output, which can be used to make predictions on a different dataset than the one used for training. The learning model can be evaluated based on its performance. For measuring the performance, we use well-known ML metrics such as precision, recall, accuracy, and F-score.

In the next section, we describe how we performed the steps discussed above in our experiment.

IV. A FIRST EXPERIMENT

We performed a first experiment in the domain of home automation systems to evaluate our ML-based approach. In the following subsections, we describe the setup of our experiment, the application of our approach, and our preliminary results.

A. Setup

Data: The dataset used for training and evaluating our ML-based approach consists of publicly available smart home requirements¹. The requirements were collected as part of research on crowd-RE [19].

¹ Smarthome Crowd Requirements Dataset, https://crowdre.github.io/murukannaiah-smarthome-requirements-dataset/

In a first step, three annotators manually classified 200 randomly selected requirements from the total set of available requirements (around 2900). All three annotators had proficient knowledge of and expertise in sustainable systems in software engineering. For each requirement, each annotator independently marked which sustainability dimension(s) it had an effect on. To support this classification, we performed a literature review on sustainability dimensions and created a classification guideline based on the results of this analysis. The guideline contained for each dimension a set of influence factors and for each factor a description, rationale, example requirements, and literature references for further reading. Table I shows an example influence factor in the environmental dimension.

TABLE I
EXAMPLE INFLUENCE FACTOR

Dimension            Environment
Factor               Recycling
Description          Process of converting waste materials into new materials and objects
Influence/Rationale  Strong positive influence on the environment as not only the amount of waste is reduced but also the farming of natural resources to create new products is decreased.
Example              If a device needs to be removed from the system, the system should provide information on how to dispose of it properly.

The annotators used the guideline as a reference during the manual classification of the 200 smart home requirements. Each requirement was independently classified according to its influence on the five dimensions of software sustainability as positive, negative, or neutral. A plus sign (+) was assigned for positive influence, a minus sign (-) for negative influence, and no sign for no influence, as shown in Table II. The ratings were merged, and cases in which the researchers did not agree were discussed until consensus was reached.

TABLE II
MANUALLY CLASSIFIED SMART HOME REQUIREMENTS

ID  Req                                                                                  T I S Ec En
1   Music should be available throughout the house                                       - + - -
2   Temperature in the house should be adjusted based on the weather outside             + + +
3   The lights shut be shut off in the rooms with nobody in them                         + + +
4   The garage door should be opened when it senses my vehicle arriving outside of it    +

B. Application

Pre-processing: We applied natural language processing techniques to our data before applying the different ML algorithms. First, we applied text tokenization to each requirement. Then we eliminated all stop words and converted the text to lowercase. We applied stemming as our next step. As the last step, we converted the pre-processed text into a vector space model using Term Frequency-Inverse Document Frequency (TF-IDF) as a weighting scheme:

\[ \mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t) \tag{1} \]

Here t is a term in a vector and d is a requirement in a collection of requirements [20].

Classification: For the automated classification of sustainability requirements and their dimensions, we trained our model using the annotated requirements dataset. We implemented Naive Bayes (NB), k-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), and Neural Network (NN) algorithms and also trained our classifier with and without stemming the data, as discussed in [2]. The results differed only marginally, and we used the stemmed data for the final analysis. We used tenfold cross-validation for evaluating our results.

C. Results

The results for the six different classifiers from our experiments are shown in Table III. For choosing the best classifier, we evaluated the performance of all these classifiers on the basis of commonly used ML metrics, i.e., accuracy, precision, recall, and F1-score. These metrics can be calculated using the following formulae:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{2} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{3} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{4} \]

\[ \mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5} \]

Here, TP (True Positive) is the number of requirements correctly classified as belonging to a category. TN (True Negative) is the number of requirements that are correctly classified as not belonging to a category. FP (False Positive) is the number of requirements incorrectly classified as belonging to a category, and FN (False Negative) is the number of requirements that are incorrectly classified as not belonging to a category.

In simple words, accuracy is the ratio of correctly classified data to total data. It gives a first indication of model performance: the higher the accuracy, the better the model performs. However, data can be asymmetric, and thus metrics other than accuracy need to be evaluated. Precision is the ratio of correctly predicted positive values to the total number of predicted positive values. Recall, on the other hand, is the ratio of correctly predicted positive values to the actual number of positive values. Recall and precision typically cannot both be maximized at the same time, as improving one tends to come at the cost of the other. To consider both, the F1-score is used, which is the harmonic mean of precision and recall.

TABLE III
SCORES FOR THE DIFFERENT CLASSIFIERS

           NB    KNN   DT          LR    SVM   SVM (b)  NN
Accuracy   0.58  0.65  0.7 (0.03)  0.62  0.67  0.75     0.69
Precision  0.46  0.51  0.46        0.57  0.5   0.57     0.44
Recall     0.43  0.56  0.56        0.54  0.48  0.63     0.47
F-score    0.44  0.53  0.51        0.56  0.48  0.60     0.45

We use the highest accuracy and F1-score to decide which algorithm performs best. We achieved the highest accuracy with the DT classifier (70%), followed by SVM (69%). Our dataset was not balanced, meaning that the five different sustainability dimensions were not equally represented in the dataset (see Fig. 2). The economic, environmental, and social dimensions were almost equally represented. The individual dimension occurred frequently, and the technical dimension was almost nonexistent. To overcome this problem, we applied a weighting technique that assigns more weight to under-represented data. After applying this setting, we achieved the highest accuracy for SVM (75%), as shown in column SVM (b) of Table III. Recall and precision are 63% and 57% respectively, which we consider acceptable given the accuracy achieved. We also calculated the F1-scores, and the highest value was achieved with SVM (60%).

Fig. 2. Distribution of sustainability dimensions

The results from this initial experiment can be improved further, as we observed some issues that affect our results. The structure of our dataset varied with respect to the length of the textual requirements. For example, one requirement consisted of 20 words, and another one consisted of 200 words. Due to the significant difference in length, our ML classifier features were sparse, which might have led to an underfitted model.

Moreover, the negative and positive influence values on the sustainability dimensions were also not equally distributed. Our data contained only 12 requirements with negative influence; the rest were positive influences.

V. CONCLUSION AND NEXT STEPS

The goal of our ongoing research is to automate the identification of requirements that potentially have effects on the sustainability of a software-intensive system. In this paper, we present the application of a state-of-the-art ML approach to support effect identification. Our first results indicate that ML can be successfully used for the identification of potential sustainability effects. However, we have learned that the results from our first experiment can be improved further. This starts with the dataset. The current dataset can be improved to generate a more suitable learning model for the classifiers.

As a next step, we plan to increase the size of our dataset. We have already designed a web-based solution where experts can update the categorized requirements and add additional requirements. This web-based tool will help us to improve our labeling and support us in getting more data. It will also allow us to provide a more balanced dataset.

Our current results indicate that there is a risk of overlooking requirements which have potential effects. As a next step, we plan to focus on optimizing recall to minimize this risk. We envision that our approach could be used to complement existing methods. Instead of discussing each requirement manually, our approach could provide a list of relevant requirements, which should be discussed further by human stakeholders. High recall might result in lower precision, which means that human stakeholders are confronted with a larger number of false positives. However, we expect that providing a reduced set of requirements for discussion will enable practitioners to save time compared to a full manual analysis.

Similar to other work in this field, we have also experienced that manually identifying potential effects can lead to different opinions amongst the annotators. Although the three annotators were able to agree on a ground truth used for training our classifier, we would like to highlight that our results might reflect a rather subjective viewpoint of the annotators.

Overall, we would like to explore how we can integrate our work into existing studies on requirements classification. In particular, we envision using our approach within planned work on crowd-focused semi-automated requirements engineering for evolution towards sustainability [21]. This would also allow us to use our approach for domains other than smart homes, which might also result in datasets with different characteristics, allowing us to further improve the classification results.

ACKNOWLEDGEMENT

The authors would like to thank Robert Ördög for implementing the tool support for our ML-based approach and for conducting the experiment presented in this paper. This research was partially funded by the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 674875.

REFERENCES

[1] Christoph Becker, Stefanie Betz, Ruzanna Chitchyan, Leticia Duboc, Steve Easterbrook, Birgit Penzenstadler, Norbert Seyff, and Colin C. Venters. Requirements: The key to sustainability. IEEE Software, 33(1):56–65, Jan 2016.
[2] Christoph Becker, Ruzanna Chitchyan, Leticia Duboc, Steve Easterbrook, Birgit Penzenstadler, Norbert Seyff, and Colin C. Venters. Sustainability design and software: The Karlskrona manifesto. In Proceedings of the 37th International Conference on Software Engineering - Volume 2, ICSE '15, pages 467–476, Piscataway, NJ, USA, 2015. IEEE Press.
[3] Jeremy L Caradonna. Sustainability: A history. Oxford University Press, 2014.
[4] Ruzanna Chitchyan, Christoph Becker, Stefanie Betz, Leticia Duboc, Birgit Penzenstadler, Norbert Seyff, and Colin C. Venters. Sustainability design in requirements engineering: State of practice. In Proceedings of the 38th International Conference on Software Engineering Companion, ICSE '16, pages 533–542, New York, NY, USA, 2016. ACM.
[5] Norbert Seyff, Stefanie Betz, Leticia Duboc, Colin Venters, Christoph Becker, Ruzanna Chitchyan, Birgit Penzenstadler, and Markus Nöbauer. Tailoring requirements negotiation to sustainability. In 2018 IEEE 26th International Requirements Engineering Conference (RE), pages 304–314. IEEE, 2018.
[6] Leticia Duboc, Stefanie Betz, Birgit Penzenstadler, Sedef Akinli Kocak, Ruzanna Chitchyan, Ola Leifler, Jari Porras, Norbert Seyff, and Colin C Venters. Do we really know what we are building? Raising awareness of potential sustainability effects of software systems in requirements engineering. In 27th IEEE International Requirements Engineering Conference, 2019.
[7] Ahmed D Alharthi, Maria Spichkova, and Margaret Hamilton. SuSoftPro: Sustainability profiling for software. In 2018 IEEE 26th International Requirements Engineering Conference (RE), pages 500–501. IEEE, 2018.
[8] T. Iqbal, P. Elahidoost, and L. Lúcio. A bird's eye view on requirements engineering and machine learning. In 2018 25th Asia-Pacific Software Engineering Conference (APSEC), pages 11–20, Dec 2018.
[9] William Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman. A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering, 43(9):817–847, 2016.
[10] Zijad Kurtanović and Walid Maalej. Automatically classifying functional and non-functional requirements using supervised machine learning. In 2017 IEEE 25th International Requirements Engineering Conference (RE), pages 490–495. IEEE, 2017.
[11] Douglas S Lange. Text classification and machine learning support for requirements analysis using blogs. In Monterey Workshop, pages 182–195. Springer, 2007.
[12] Grant Williams and Anas Mahmoud. Mining Twitter feeds for software user requirements. In 2017 IEEE 25th International Requirements Engineering Conference (RE), pages 1–10. IEEE, 2017.
[13] Walid Maalej and Hadeer Nabil. Bug report, feature request, or simply praise? On automatically classifying app reviews. In 2015 IEEE 23rd International Requirements Engineering Conference (RE), pages 116–125. IEEE, 2015.
[14] Sandeep Reddivari, Zhangji Chen, and Nan Niu. Recvisu: A tool for clustering-based visual exploration of requirements. In 2012 20th IEEE International Requirements Engineering Conference (RE), pages 327–328. IEEE, 2012.
[15] Hakim Sultanov and Jane Huffman Hayes. Application of reinforcement learning to requirements engineering: requirements tracing. In 2013 21st IEEE International Requirements Engineering Conference (RE), pages 52–61. IEEE, 2013.
[16] Vincenzo Gervasi and Didar Zowghi. Mining requirements links. In International Working Conference on Requirements Engineering: Foundation for Software Quality, pages 196–201. Springer, 2011.
[17] Paolo Avesani, Anna Perini, Alberto Siena, and Angelo Susi. Goals at risk? Machine learning at support of early assessment. In 2015 IEEE 23rd International Requirements Engineering Conference (RE), pages 252–255. IEEE, 2015.
[18] Moussa Amrani, Levi Lúcio, and Adrien Bibal. ML + FV = ? A survey on the application of machine learning to formal verification. arXiv:1806.03600, 2018.
[19] Pradeep K Murukannaiah, Nirav Ajmeri, and Munindar P Singh. Acquiring creative requirements from the crowd: Understanding the influences of personality and creative potential in crowd RE. In 2016 IEEE 24th International Requirements Engineering Conference (RE), pages 176–185. IEEE, 2016.
[20] Juan Ramos et al. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, volume 242, pages 133–142. Piscataway, NJ, 2003.
[21] Norbert Seyff, Stefanie Betz, Iris Groher, Melanie Stade, Ruzanna Chitchyan, Letícia Duboc, Birgit Penzenstadler, Colin Venters, and Christoph Becker. Crowd-focused semi-automated requirements engineering for evolution towards sustainability. In 2018 IEEE 26th International Requirements Engineering Conference (RE), pages 370–375. IEEE, 2018.
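
As a supplementary illustration of the evaluation metrics described in the Results subsection, the following Python sketch computes accuracy, precision, recall, and the F1-score from per-category counts of true and false positives and negatives. It takes the F1-score to be the harmonic mean of precision and recall, as the text describes. The `Confusion` container, the function names, and the example counts are hypothetical and chosen only for illustration; this is not the tool support used in the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Per-category counts (hypothetical container, not from the paper)."""
    tp: int  # correctly classified as belonging to the category
    fp: int  # incorrectly classified as belonging to the category
    fn: int  # incorrectly classified as not belonging to the category
    tn: int  # correctly classified as not belonging to the category


def accuracy(c: Confusion) -> float:
    """Correctly classified requirements over all requirements."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    """Correct positive predictions over all positive predictions."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: Confusion) -> float:
    """Correct positive predictions over all actual positives."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def f1(c: Confusion) -> float:
    """F1-score computed directly from the counts."""
    return f1_from(precision(c), recall(c))


# Invented counts for one category over 200 requirements (illustration only).
example = Confusion(tp=30, fp=10, fn=20, tn=140)

if __name__ == "__main__":
    print(f"accuracy={accuracy(example):.2f}")
    print(f"precision={precision(example):.2f}")
    print(f"recall={recall(example):.2f}")
    print(f"f1={f1(example):.2f}")
```

Applied to the precision (0.57) and recall (0.63) reported for the weighted SVM, `f1_from(0.57, 0.63)` gives roughly 0.60, which matches the F-score listed for that classifier. The guards against empty denominators are a design choice of this sketch: they return 0.0 for a category with no predictions or no instances, which can occur for sparsely represented dimensions.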