Towards Automatically Identifying Potential Sustainability Effects of Requirements

Iris Groher, Johannes Kepler University, Linz, Austria, iris.groher@jku.at
Norbert Seyff, FHNW, Windisch, Switzerland, and University of Zurich, Zurich, Switzerland, norbert.seyff@fhnw.ch
Tahira Iqbal, fortiss GmbH, Munich, Germany, iqbal@fortiss.org


Abstract—Software developers are gradually becoming aware that their systems have effects on sustainability. The identification of potential effects that software-intensive systems can have on different sustainability dimensions over time is still in its infancy. Researchers are currently exploring approaches that rely heavily on expert knowledge to identify potential effects. In this work-in-progress paper, we look at the problem from a different angle: we report on the exploration of a machine learning-based approach to identify potential effects. Such an approach can save time and costs, but it increases the risk that potential effects are overlooked. First results of applying the machine learning-based approach in the domain of home automation systems are promising, but they also indicate that further research is needed before our approach can be applied in practice. Furthermore, we have learned that even providing the ground truth for training the algorithms is a challenging task.

Index Terms—Sustainability, Analysis, Requirements, Machine Learning.

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

I. INTRODUCTION

Software-intensive systems do not operate in isolation but in complex socio-technical contexts. Therefore, they have an impact on this context, which manifests itself in different dimensions such as the environmental, economic, social, individual, and technical dimensions [1]. As effects can occur over time, we can also distinguish three different orders of effects [2] [3]. The cumulative positive and negative effects a software-intensive system has on its context define its sustainability.

In previous research [4], we have learned that practitioners are not aware of the fact that software-intensive systems have an impact on sustainability, and that raising awareness is an essential step towards the development of sustainable software-intensive systems. Furthermore, the complexity of this matter and the lack of adequate methods and tools supporting the identification of potential effects are hurdles for practitioners who are already aware of their responsibility to build sustainable systems.

Requirements are the key to sustainability [1], which indicates that the identification of potential sustainability effects needs to start before systems are actually built, or when certain aspects of a system are modified in the context of system evolution. This, in part, also makes the identification of potential effects a hypothetical endeavour, which often needs to be based on expert opinions rather than facts. Only after development, when the system is used in its application context, can one eventually validate its effects on the different sustainability dimensions over time.

Researchers have started to build methods and tools to support the identification of sustainability effects [5] [6] [7]. Although such approaches can be successfully applied to identify potential sustainability effects, they appear to require a significant time investment from companies, which might prevent their adoption.

In this work-in-progress paper, we follow a current trend in software and requirements engineering and propose the use of Machine Learning (ML) for the early identification of potential sustainability effects. We present this idea in more detail and report on a first application experiment in the home automation domain. Based on requirements for home automation systems, we identified potential sustainability effects and used these results as ground truth for training our algorithms. Early results indicate that using ML for the identification of sustainability effects is promising, which motivates us to continue this research.

In Section II we discuss existing approaches for identifying sustainability effects in requirements. In Section III we present our ML-based approach, and in Section IV we report on a first experiment we conducted. In Section V we conclude the paper and present an outlook on the next steps we plan.

II. EXISTING SUSTAINABILITY EFFECT IDENTIFICATION APPROACHES AND TOOLS

The work presented in this paper is motivated by research in the field of requirements engineering, where researchers aim at identifying potential sustainability effects.

In previous work on tailoring requirements negotiation to sustainability [5], an extension of the WinWin negotiation model was proposed. This approach incorporates sustainability so that the negotiation is used to identify potential effects of requirements on sustainability. For requirements which might have negative effects, alternative requirements options are
discussed to minimize these negative effects. This method was applied in an exploratory industrial case study, where it allowed practitioners to reflect on requirements and their effects on sustainability.

Recent work presents a question-based framework for raising awareness of the potential effects of software systems on sustainability [6]. The Sustainability Awareness Framework (SusAF) was used by students to carry out interviews about a software system of their choice, to identify potential effects of these systems on sustainability and, in particular, even to identify potential chains of effects. Results from this feasibility study indicate that SusAF stimulates the discussion about potential effects of software systems on sustainability.

Alharthi et al. [7] present the SuSoftPro tool, which supports the analysis of the impact of requirements on different sustainability dimensions via a Fuzzy Rating Scale method. This tool-supported approach offers different visualization options for the results (e.g., a bar graph that illustrates the sustainability level).

We conclude that researchers have identified the need for identifying sustainability effects and that first promising methods and even tool-supported approaches are appearing. However, all presented methods strongly depend on human involvement and might require time-intensive discussions, the involvement of experts, or a large number of people to derive results. The quality of the produced results might further vary considerably depending on factors such as the complexity of the domain and the level of expertise of the people involved. Nevertheless, bespoke methods can raise awareness and can help, at least in part, to improve the sustainability of software-intensive systems.

III. MACHINE LEARNING-BASED EFFECT IDENTIFICATION

The goal of our ongoing research is to automate the identification of potential sustainability effects by analysing the requirements of a software-intensive system. Compared to existing methods, we expect such an approach to significantly reduce the effort needed for the analysis. However, we also see the risk that such an approach might overlook potential effects.

A. ML in Software and Requirements Engineering

To achieve this goal, we follow a recent trend in software and requirements engineering and explore the application of ML [8] [9]. ML has already been successfully used to classify software requirements into functional and non-functional requirements [10] [11]. The analysis of large volumes of user feedback from multiple sources, such as app stores and Twitter, has been automated by applying ML [12] [13]. This analysis helps to identify useful information such as bug reports and feature requests to support software evolution. For validating requirements, the automated analysis of requirements traceability with the help of natural language processing and ML has been studied [14] [15]. ML has also been applied in requirements management, for example for visualizing requirements on different levels of granularity and for prioritizing requirements [16] [17].

B. ML Application Overview

To automatically identify the impact of requirements on sustainability, we follow the ML workflow [18] shown in Fig. 1.

The first key step is data preprocessing, which helps to avoid data incompleteness and inconsistency issues. The preprocessed data is used as input for the ML algorithm, which learns from this existing data. Based on this learning, a model is produced as output, which can be used to make predictions on a different dataset than the one used for training. The model is then evaluated based on its performance, for which we use well-known ML metrics such as precision, recall, accuracy, and F-score.

In the next section, we describe how we performed these steps in our experiment.

Fig. 1. Basic ML workflow steps

IV. A FIRST EXPERIMENT

We performed a first experiment in the domain of home automation systems to evaluate our ML-based approach. In the next subsection, we describe the setup of our experiment; in the subsequent subsections, we describe how we applied our approach and present our preliminary results.

A. Setup

Data: The dataset used for training and evaluating our ML-based approach consists of publicly available smart home requirements¹. The requirements were collected as part of research on crowd-RE [19].

¹ Smarthome Crowd Requirements Dataset, https://crowdre.github.io/murukannaiah-smarthome-requirements-dataset/

In a first step, three annotators manually classified 200 randomly selected requirements from the total set of available requirements (around 2900). All three annotators had proficient knowledge of and expertise in sustainable systems in software engineering. For each requirement, each annotator independently marked which sustainability dimension(s) it had an effect on. To support this classification, we performed a literature review on sustainability dimensions and created a classification guideline based on its results. For each dimension, the guideline contained a set of influence factors and, for each factor, a description, a rationale, example requirements, and literature references for further reading. Table I shows an example influence factor in the environmental dimension.

The annotators used the guideline as a reference during the manual classification of the 200 smart home requirements. Each requirement was independently classified according to its influence on each of the five dimensions of software sustainability as positive, negative, or neutral: a plus sign (+) was assigned for a positive influence, a minus sign (-) for a negative influence, and no sign for no influence, as shown in Table II. The ratings were then merged, and cases in which the annotators did not agree were discussed until consensus was reached.
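The merging step of the annotation procedure can be illustrated with a minimal Python sketch. It assumes that each annotator's rating of a requirement is stored as a mapping from the dimension abbreviations used in Table II (read here as technical, individual, social, economic, environmental) to "+", "-", or no sign; the function name `merge_ratings` and this data layout are hypothetical. The sketch only separates agreed labels from disputed dimensions, because in the study disagreements were resolved by discussion rather than automatically.

```python
from typing import Dict, List, Tuple

# Columns of Table II: technical, individual, social, economic, environmental.
DIMENSIONS = ("T", "I", "S", "Ec", "En")

# One annotator's rating of one requirement: dimension -> "+", "-" or "".
# An empty string or a missing key means "no influence".
Rating = Dict[str, str]


def merge_ratings(ratings: List[Rating]) -> Tuple[Rating, List[str]]:
    """Merge independent annotator ratings of a single requirement.

    Returns the labels on which all annotators agree and the list of
    dimensions on which they disagree (to be discussed by the annotators).
    """
    agreed: Rating = {}
    disputed: List[str] = []
    for dim in DIMENSIONS:
        # Collect the distinct votes for this dimension.
        votes = {rating.get(dim, "") for rating in ratings}
        if len(votes) == 1:
            agreed[dim] = votes.pop()
        else:
            disputed.append(dim)
    return agreed, disputed


# Hypothetical ratings by three annotators for one requirement.
ratings = [
    {"I": "+", "Ec": "+", "En": "+"},
    {"I": "+", "Ec": "+", "En": "+"},
    {"I": "+", "En": "+"},  # this annotator saw no economic effect
]
agreed, disputed = merge_ratings(ratings)
print(agreed)    # {'T': '', 'I': '+', 'S': '', 'En': '+'}
print(disputed)  # ['Ec']
```

Keeping disputed dimensions out of the merged labels, rather than taking a majority vote, mirrors the consensus-by-discussion step described above.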


TABLE I
EXAMPLE INFLUENCE FACTOR

Dimension            Environment
Factor               Recycling
Description          Process of converting waste materials into new materials and objects
Influence/Rationale  Strong positive influence on the environment, as not only the amount of waste is reduced but also the extraction of natural resources to create new products is decreased.
Example              If a device needs to be removed from the system, the system should provide information on how to dispose of it properly.

B. Application

Pre-processing: We applied natural language processing techniques to our data before applying the different ML algorithms. First, we applied text tokenization to each requirement. Then we eliminated all stop words and converted the text to lowercase. We applied stemming as our next step. As the last step, we converted the preprocessed text into a vector space model using Term Frequency-Inverse Document Frequency (TF-IDF) as a weighting scheme:

$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t) \tag{1}$$

Here, t is a term, d is a requirement in a collection of requirements, tf(t, d) is the frequency of t in d, and idf(t) is the inverse document frequency of t in the collection [20].

Classification: For the automated classification of requirements with respect to the sustainability dimensions they affect, we trained our models using the annotated requirements dataset. We implemented Naïve Bayes (NB), k-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), and Neural Network (NN) algorithms, and trained our classifiers with and without stemming the data, as discussed in [2]. The results differed only slightly, and we used the stemmed data for the final analysis. We used tenfold cross-validation to evaluate our results.

C. Results

The results of the six different classifiers from our experiments are shown in Table III. To choose the best classifier, we evaluated the performance of all these classifiers on the basis of commonly used ML metrics, i.e., accuracy, precision, recall, and F1-score. These metrics can be calculated using the following formulae:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{2}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$

$$F_1\text{-score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$

Here, TP (True Positive) is the number of requirements correctly classified as belonging to a category. TN (True Negative) is the number of requirements correctly classified as not belonging to a category. FP (False Positive) is the number of requirements incorrectly classified as belonging to a category, and FN (False Negative) is the number of requirements incorrectly classified as not belonging to a category.

In simple terms, accuracy is the ratio of correctly classified data to all data. It gives an overall indication of model performance, with higher accuracy indicating better performance. However, data can be imbalanced, and thus metrics other than accuracy need to be evaluated. Precision is the ratio of correctly predicted positive values to the total number of predicted positive values. Recall, on the other hand, is the ratio of correctly predicted positive values to the actual number of positive values. Precision and recall usually cannot both be maximized at the same time, as improving one typically comes at the cost of the other. To consider both, the F1-score is used, which is the harmonic mean of precision and recall.

We select the best algorithm based on the highest accuracy and F1-score. Without class weighting, we achieved the highest accuracy with the DT classifier (70%), followed by NN (69%). Our dataset was not balanced, meaning that the five sustainability dimensions were not equally represented in the dataset (see Fig. 2). The economic, environmental, and social dimensions were almost equally represented, the individual dimension occurred frequently, and the technical dimension was almost absent. To overcome this problem, we applied class weighting, assigning higher weights to under-represented classes. With this setting, we achieved the highest accuracy with SVM (75%), as shown in column SVM (b) of Table III. Recall and precision for this configuration are 63% and 57%, respectively, which we consider acceptable given this accuracy. We also calculated the F1-scores, and the highest value was achieved with SVM (b) (60%).

The results of this initial experiment can be improved further, as we observed some issues that affect our results. The length of the textual requirements in our dataset varied considerably: for example, one requirement consisted of 20 words and another of 200 words. Due to this significant difference in length, our ML classifier features were sparse, which might have led to an underfitted model.
TABLE II
MANUALLY CLASSIFIED SMART HOME REQUIREMENTS

ID  T  I  S  Ec  En  Req
1   -  +     -   -   Music should be available throughout the house
2      +     +   +   Temperature in the house should be adjusted based on the weather outside
3      +     +   +   The lights shut be shut off in the rooms with nobody in them
4      +             The garage door should be opened when it senses my vehicle arriving outside of it

T = technical, I = individual, S = social, Ec = economic, En = environmental; + = positive influence, - = negative influence, blank = no influence.


Fig. 2. Distribution of sustainability dimensions
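The text states only that more weight was assigned to under-represented dimensions; it does not say which weighting scheme was used. One common heuristic, sketched below as an assumption, is the "balanced" rule w_c = n / (k · n_c), where n is the number of labels, k the number of classes, and n_c the count of class c; it is the rule scikit-learn documents for `class_weight="balanced"`. The counts below are hypothetical and only mimic the shape described for Fig. 2, and each (requirement, dimension) label is simply counted once, which is a simplification of the multi-label setting.

```python
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class by n / (k * n_c): the rarer the class, the larger its weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * n_c) for cls, n_c in counts.items()}


# Hypothetical label counts shaped like Fig. 2 (not the real data):
# individual frequent, technical rare, the other three roughly equal.
labels = ["I"] * 60 + ["Ec"] * 30 + ["En"] * 30 + ["S"] * 30 + ["T"] * 2
weights = balanced_class_weights(labels)
for dim, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{dim}: {w:.2f}")
```

A useful property of this rule is that the weighted class counts sum back to n, so the overall loss scale stays comparable while each class contributes equally in total.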

TABLE III
SCORES FOR THE DIFFERENT CLASSIFIERS

           NB    KNN   DT          LR    SVM   SVM (b)  NN
Accuracy   0.58  0.65  0.7 (0.03)  0.62  0.67  0.75     0.69
Precision  0.46  0.51  0.46        0.57  0.5   0.57     0.44
Recall     0.43  0.56  0.56        0.54  0.48  0.63     0.47
F1-score   0.44  0.53  0.51        0.56  0.48  0.60     0.45

SVM (b): SVM trained with class weighting to compensate for the imbalanced dataset.
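Equations (2) to (5) can be made concrete with a short sketch that computes all four metrics from confusion counts, using the harmonic mean of precision and recall for the F1-score as described in Section IV.C. The `Confusion` class and the counts are hypothetical. The last line checks that the F1-score reported for SVM (b) in Table III is consistent with its precision and recall; since the paper does not state how per-dimension scores were averaged, such checks are only approximate in general.

```python
from dataclasses import dataclass


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1-score)."""
    return 2 * precision * recall / (precision + recall)


@dataclass
class Confusion:
    tp: int  # correctly assigned to the category
    tn: int  # correctly not assigned to the category
    fp: int  # wrongly assigned to the category
    fn: int  # wrongly not assigned to the category

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        return f1_from(self.precision(), self.recall())


# Hypothetical counts for one category (not taken from the experiment).
c = Confusion(tp=30, tn=50, fp=15, fn=5)
print(round(c.accuracy(), 2), round(c.precision(), 2), round(c.recall(), 2), round(c.f1(), 2))
# 0.8 0.67 0.86 0.75

# Consistency check against column SVM (b) of Table III (P = 0.57, R = 0.63).
print(round(f1_from(0.57, 0.63), 2))  # 0.6
```

The hypothetical example also shows why accuracy alone can mislead: 80% accuracy coexists with a precision of only 0.67.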

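Tenfold cross-validation, mentioned in Section IV.B, splits the labelled data into ten folds and uses each fold once for testing while training on the other nine; the scores are then typically averaged over the folds. The sketch below assumes all 200 annotated requirements are used and that folds are plain random splits; the paper does not say whether the folds were stratified, and the function name `k_fold_indices` is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Every index lands in exactly one test fold; fold sizes differ by at most one.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    start = 0
    for fold in range(k):
        # The first n % k folds get one extra item.
        size = n // k + (1 if fold < n % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 200 annotated requirements, ten folds: each fold tests on 20 and trains on 180.
folds = list(k_fold_indices(200, k=10))
print([len(test) for _, test in folds])  # [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
```

With label distributions as skewed as the one in Fig. 2, a stratified variant that preserves class proportions in every fold is often preferred, since a purely random fold may contain no technical-dimension examples at all.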

Moreover, the positive and negative influence values on the sustainability dimensions were not equally distributed either: our data contained only 12 requirements with a negative influence, while the rest had positive influences.

V. CONCLUSION AND NEXT STEPS

The goal of our ongoing research is to automate the identification of requirements that potentially have effects on the sustainability of a software-intensive system.

In this paper, we presented the application of a state-of-the-art ML approach to support effect identification. Our first results indicate that ML can be successfully used for the identification of potential sustainability effects. However, we have learned that the results of our first experiment can be improved further. This starts with the dataset, which can be improved to generate a more suitable learning model for the classifiers.

As a next step, we plan to increase the size of our dataset. We have already designed a web-based solution where experts can update the categorized requirements and add further requirements. This web-based tool will help us to improve our labeling and to obtain more data. It will also allow us to provide a more balanced dataset.

Our current results indicate that there is a risk of overlooking requirements which have potential effects. To minimize this risk, we plan to focus on optimizing recall. We envision that our approach could complement existing methods: instead of discussing each requirement manually, our approach could provide a list of relevant requirements, which should then be discussed further by human stakeholders. High recall might result in lower precision, which means that human stakeholders are confronted with a larger number of false positives. However, we expect that providing a reduced set of requirements for discussion will enable practitioners to save time compared to a full manual analysis.
Similar to other work in this field, we have also experienced that manually identifying potential effects can lead to different opinions amongst the annotators. Although the three annotators were able to agree on a ground truth for training our classifiers, we would like to highlight that our results might reflect a rather subjective viewpoint of the annotators.

Overall, we would like to explore how we can integrate our work into existing studies on requirements classification. In particular, we envision using our approach within planned work on crowd-focused semi-automated requirements engineering for evolution towards sustainability [21]. This would also allow us to apply our approach to domains other than smart homes, which might result in datasets with different characteristics, allowing us to further improve the classification results.

ACKNOWLEDGEMENT

The authors would like to thank Robert Ördög for implementing the tool support for our ML-based approach and for conducting the experiment presented in this paper. This research was partially funded by the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 674875.

REFERENCES

[1] Christoph Becker, Stefanie Betz, Ruzanna Chitchyan, Leticia Duboc, Steve Easterbrook, Birgit Penzenstadler, Norbert Seyff, and Colin C. Venters. Requirements: The key to sustainability. IEEE Software, 33(1):56–65, Jan 2016.
[2] Christoph Becker, Ruzanna Chitchyan, Leticia Duboc, Steve Easterbrook, Birgit Penzenstadler, Norbert Seyff, and Colin C. Venters. Sustainability design and software: The Karlskrona manifesto. In Proceedings of the 37th International Conference on Software Engineering - Volume 2, ICSE ’15, pages 467–476, Piscataway, NJ, USA, 2015. IEEE Press.
[3] Jeremy L. Caradonna. Sustainability: A history. Oxford University Press, 2014.
[4] Ruzanna Chitchyan, Christoph Becker, Stefanie Betz, Leticia Duboc, Birgit Penzenstadler, Norbert Seyff, and Colin C. Venters. Sustainability design in requirements engineering: State of practice. In Proceedings of the 38th International Conference on Software Engineering Companion, ICSE ’16, pages 533–542, New York, NY, USA, 2016. ACM.
[5] Norbert Seyff, Stefanie Betz, Leticia Duboc, Colin Venters, Christoph Becker, Ruzanna Chitchyan, Birgit Penzenstadler, and Markus Nöbauer. Tailoring requirements negotiation to sustainability. In 2018 IEEE 26th International Requirements Engineering Conference (RE), pages 304–314. IEEE, 2018.
[6] Leticia Duboc, Stefanie Betz, Birgit Penzenstadler, Sedef Akinli Kocak, Ruzanna Chitchyan, Ola Leifler, Jari Porras, Norbert Seyff, and Colin C. Venters. Do we really know what we are building? Raising awareness of potential sustainability effects of software systems in requirements engineering. In 27th IEEE International Requirements Engineering Conference, 2019.
[7] Ahmed D. Alharthi, Maria Spichkova, and Margaret Hamilton. SuSoftPro: Sustainability profiling for software. In 2018 IEEE 26th International Requirements Engineering Conference (RE), pages 500–501. IEEE, 2018.
[8] T. Iqbal, P. Elahidoost, and L. Lúcio. A bird’s eye view on requirements engineering and machine learning. In 2018 25th Asia-Pacific Software Engineering Conference (APSEC), pages 11–20, Dec 2018.
[9] William Martin, Federica Sarro, Yue Jia, Yuanyuan Zhang, and Mark Harman. A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering, 43(9):817–847, 2016.
[10] Zijad Kurtanović and Walid Maalej. Automatically classifying functional and non-functional requirements using supervised machine learning. In 2017 IEEE 25th International Requirements Engineering Conference (RE), pages 490–495. IEEE, 2017.
[11] Douglas S. Lange. Text classification and machine learning support for requirements analysis using blogs. In Monterey Workshop, pages 182–195. Springer, 2007.
[12] Grant Williams and Anas Mahmoud. Mining Twitter feeds for software user requirements. In 2017 IEEE 25th International Requirements Engineering Conference (RE), pages 1–10. IEEE, 2017.
[13] Walid Maalej and Hadeer Nabil. Bug report, feature request, or simply praise? On automatically classifying app reviews. In 2015 IEEE 23rd International Requirements Engineering Conference (RE), pages 116–125. IEEE, 2015.
[14] Sandeep Reddivari, Zhangji Chen, and Nan Niu. RecVisu: A tool for clustering-based visual exploration of requirements. In 2012 20th IEEE International Requirements Engineering Conference (RE), pages 327–328. IEEE, 2012.
[15] Hakim Sultanov and Jane Huffman Hayes. Application of reinforcement learning to requirements engineering: Requirements tracing. In 2013 21st IEEE International Requirements Engineering Conference (RE), pages 52–61. IEEE, 2013.
[16] Vincenzo Gervasi and Didar Zowghi. Mining requirements links. In International Working Conference on Requirements Engineering: Foundation for Software Quality, pages 196–201. Springer, 2011.
[17] Paolo Avesani, Anna Perini, Alberto Siena, and Angelo Susi. Goals at risk? Machine learning at support of early assessment. In 2015 IEEE 23rd International Requirements Engineering Conference (RE), pages 252–255. IEEE, 2015.
[18] Moussa Amrani, Levi Lúcio, and Adrien Bibal. ML+FV=? A survey on the application of machine learning to formal verification. arXiv:1806.03600, 2018.
[19] Pradeep K. Murukannaiah, Nirav Ajmeri, and Munindar P. Singh. Acquiring creative requirements from the crowd: Understanding the influences of personality and creative potential in crowd RE. In 2016 IEEE 24th International Requirements Engineering Conference (RE), pages 176–185. IEEE, 2016.
[20] Juan Ramos et al. Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, volume 242, pages 133–142. Piscataway, NJ, 2003.
[21] Norbert Seyff, Stefanie Betz, Iris Groher, Melanie Stade, Ruzanna Chitchyan, Letícia Duboc, Birgit Penzenstadler, Colin Venters, and Christoph Becker. Crowd-focused semi-automated requirements engineering for evolution towards sustainability. In 2018 IEEE 26th International Requirements Engineering Conference (RE), pages 370–375. IEEE, 2018.