            Towards Automatically Identifying Potential
              Sustainability Effects of Requirements
                           Iris Groher                                    Norbert Seyff                         Tahira Iqbal
                  Johannes Kepler University                                 FHNW                               fortiss GmbH
                         Linz, Austria                               Windisch, Switzerland                    Munich, Germany
                      iris.groher@jku.at                              University of Zurich                    iqbal@fortiss.org
                                                                      Zurich, Switzerland

   Abstract—Software developers are gradually becoming                         certain aspects of a system are modified in the context of
aware that their systems have effects on sustainability. The                   system evolution. This, in part, also makes the identification
identification of potential effects software-intensive systems can             of potential effects a hypothetical endeavour, which often
have on different sustainability dimensions over time is still in its
infancy. Researchers are currently exploring approaches which                  needs to be based on expert opinions rather than facts. Only
strongly make use of expert knowledge to identify potential                    after development, when the system is used in its application
effects. In this work-in-progress paper, we look at the                        context, can one eventually validate its effects on the different
problem from a different angle: we report on the exploration of                sustainability dimensions over time.
a machine learning-based approach to identify potential effects.                  Researchers have started to build methods and tools to
Such an approach can save time and costs but increases
the risk that potential effects are overlooked. First results of               support the identification of sustainability effects [5] [6] [7].
applying the machine learning-based approach in the domain of                  Although such approaches can be successfully applied to
home automation systems are promising, but also indicate that                  identify potential sustainability effects, it appears that they
further research is needed before our approach can be applied                  require a significant time investment from companies, which
in practice. Furthermore, we have learned that even providing                  might prevent their adoption.
the ground truth for training the algorithms is a challenging task.
                                                                                  In this work-in-progress paper, we follow a current trend in
  Index Terms—Sustainability, Analysis, Requirements, Machine                  software and requirements engineering and propose the use of
Learning.                                                                      Machine Learning (ML) for the early identification of potential
                                                                               sustainability effects. In this paper, we present this idea in
                          I. INTRODUCTION                                      more detail and also report on a first application experiment in
                                                                               the home automation domain. Based on requirements for home
   Software-intensive systems do not operate in isolation but in
                                                                               automation systems, we have identified potential sustainability
complex socio-technical contexts. Therefore, they have an im-
                                                                               effects and have used these results as ground truth for training
pact on this context, manifesting itself in different dimensions
                                                                               our algorithms. Early results indicate that using ML for the
such as the environmental, economic, social, individual, and
                                                                               identification of sustainability effects is promising, which
technical dimension [1]. As effects can occur over time, we
                                                                               motivates us to continue with this research.
can also identify three different orders of effects [2] [3]. The
                                                                                  In Section II, we discuss existing approaches for identifying
cumulative positive and negative effects a software-intensive
                                                                               sustainability effects in requirements. In Section III, we present
system has on its context define its sustainability.
                                                                               our ML-based approach, and in Section IV, we report on a first
   In previous research [4], we have learned that practitioners
                                                                               experiment. In Section V, we conclude the paper
are not aware of the fact that software-intensive systems have
                                                                               and present an outlook on the next steps we plan.
an impact on sustainability and that raising awareness is an
essential step towards the development of sustainable software-                 II. EXISTING SUSTAINABILITY EFFECT IDENTIFICATION
intensive systems. Furthermore, the complexity of this matter                                  APPROACHES AND TOOLS
and the lack of adequate methods and tools supporting the
                                                                                  The work presented in this paper is motivated by research in
identification of potential effects are hurdles for practitioners
                                                                               the field of requirements engineering, where researchers aim
who are already aware of their responsibility to build sustain-
                                                                               at identifying potential sustainability effects.
able systems.
                                                                                  In previous work on tailoring requirements negotiation to
   Requirements are the key to sustainability [1], which indi-
                                                                               sustainability [5], an extension of the WinWin negotiation
cates that the identification of potential sustainability effects
                                                                               model was proposed. This approach incorporates sustainability
needs to start before systems are actually built or when
                                                                               so that the negotiation is used to identify potential effects of
  Copyright 2019 for this paper by its authors. Use permitted under Creative   requirements on sustainability. For requirements which might
Commons License Attribution 4.0 International (CC BY 4.0).                     have negative effects, alternative requirements options are
discussed to minimize these negative effects. This method            on different levels of granularity and prioritizing requirements
was applied in an exploratory industrial case study, where           [16] [17].
it allowed practitioners to reflect on requirements and their        B. ML Application Overview
effects on sustainability.
                                                                        To automatically identify the impact of requirements on
   Recent work presents a question-based framework for rais-
                                                                     sustainability we follow the ML workflow [18], as shown in
ing awareness of the potential effects of software systems on
                                                                     Fig. 1.
sustainability [6]. The Sustainability Awareness Framework
                                                                        The first key step is data preprocessing that helps to avoid
(SusAF) was used by students to carry out interviews for a
                                                                     data incompleteness and inconsistency issues. This data is
software system of their choice to identify potential effects of
                                                                     used as an input for the ML algorithm, which means that the
these systems on sustainability and in particular even identify
                                                                     ML algorithm learns from existing data. On the basis of this
potential chains of effects. Results from this feasibility study
                                                                     learning, a learning model is produced as output, which can
indicate that SusAF stimulates the discussion about potential
                                                                     be used to make predictions on a different dataset than the one
effects of software systems on sustainability.
                                                                     used for training. The learning model can be evaluated based
   Alharthi et al. [7] present the SuSoftPro tool, which supports
                                                                     on its performance. For measuring the performance, we use
the analysis of the impact of requirements on different sustain-
                                                                     well-known ML metrics such as precision, recall, accuracy,
ability dimensions via a Fuzzy Rating Scale method. This tool-
                                                                     and F-score.
supported approach allows for different visualization options of
                                                                        In the next section, we will describe how we have performed
the results (e.g., a bar graph that illustrates the sustainability
                                                                     the above discussed steps in our experiment.
   We conclude that researchers have identified the need                              IV. A FIRST EXPERIMENT
for identifying sustainability effects and that first promising         We performed a first experiment in the domain of home
methods and even tool-supported approaches are appearing.            automation systems to evaluate our ML-based approach. In
However, all presented methods strongly depend on human              the next subsection, we describe the setup of our experiment
involvement and might require time-intensive discussions, the        and in the subsequent subsection we present our preliminary
involvement of experts or a large number of people to derive         results.
results. The quality of the produced results might further vary
                                                                     A. Setup
a lot depending on different factors such as the complexity of
the domain and the level of expertise of the people involved.           Data: The dataset used for training and evaluating our ML-
Nevertheless, bespoke methods can raise awareness and can            based approach is comprised of publicly available smart home
help, at least in part, to improve the sustainability of software-   requirements (footnote 1). The requirements were collected as part of
intensive systems.                                                   research on crowd-RE [19].
                                                                        In a first step, three annotators manually classified 200
III. MACHINE LEARNING-BASED EFFECT IDENTIFICATION                    randomly selected requirements from the total set of avail-
                                                                     able requirements (around 2900). All three annotators had
    The goal of our ongoing research is to automate the iden-        proficient knowledge of and expertise in sustainable systems in
tification of potential sustainability effects by analysing the      software engineering. For each requirement, each annotator
requirements of a software-intensive system. In contrast to          independently marked which sustainability dimension(s) it had
existing methods in place, we expect that such an approach           an effect on. To support this classification, we performed a
will result in a significant reduction of the effort needed          literature review on sustainability dimensions and created
for the analysis. However, we also see the risk that such an         a classification guideline based on the results of this review.
approach might result in overlooking potential effects.              The guideline contained, for each dimension, a set of influence
                                                                     factors and, for each factor, a description, rationale, example
A. ML in Software and Requirements Engineering                       requirements, and literature references for further reading.
   To achieve this goal, we follow a recent trend in software        Table I shows an example influence factor in the environmental
and requirements engineering and explore the application of          dimension.
ML [8] [9]. ML has already been successfully used to clas-              The annotators used the guideline as a reference during the
sify software requirements into functional and non-functional        manual classification of the 200 smart home requirements.
requirements [10] [11]. The analysis of large amounts of             Each requirement was independently classified according to
user feedback from multiple sources such as the app store            its influence on the five dimensions of software sustainability
and Twitter has been automated by applying ML [12] [13].             as positive, negative, or neutral. The plus sign (+) was assigned
This analysis helps to identify useful information such as bug       for positive influence, a minus sign (-) for negative influence,
reports and feature requests to support software evolution. For      and no sign for indicating no influence as shown in Table II.
validating requirements, automated analysis of requirement           The ratings were merged and cases in which the researchers
traceability with the help of natural language processing and        did not agree were discussed until consensus was reached.
ML has been studied [14] [15]. ML has also been applied                 1 Smarthome Crowd Requirements Dataset,
in requirements management such as visualizing requirements          https://crowdre.github.io/murukannaiah-smarthome-requirements-dataset/
                                                             Fig. 1. Basic ML workflow steps
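
As an illustration of the first workflow step in Fig. 1, the following minimal sketch turns requirement texts into TF-IDF weighted vectors. It is not the authors' implementation: the stop-word list is a tiny hypothetical one, stemming is omitted, idf(t) = log(N / df(t)) is an assumed variant of the inverse document frequency, and all function names are made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "be", "should", "is", "it"}


def preprocess(text):
    """Tokenize, lowercase, and remove stop words (stemming is omitted here)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(requirements):
    """Return one {term: weight} dict per requirement, weight = tf(t, d) * idf(t)."""
    docs = [preprocess(r) for r in requirements]
    n_docs = len(docs)
    # Document frequency: number of requirements that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Two requirements taken from Table II.
requirements = [
    "Music should be available throughout the house",
    "Temperature in the house should be adjusted based on the weather outside",
]
vectors = tfidf_vectors(requirements)
```

Under this idf variant, a term that occurs in every requirement (here "house") receives a weight of zero, which reflects that it does not help to distinguish requirements from each other.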

                               TABLE I
                      E XAMPLE INFLUENCE FACTOR
                                                                                                 Recall = TP / (TP + FN)                     (4)
 Dimension                             Environment
 Factor                                Recycling
 Description                           Process of converting waste mate-          F1-score = 2 * Precision * Recall / (Precision + Recall)   (5)
                                       rials into new materials and objects
 Influence/Rationale                   Strong positive influence on the en-
                                       vironment as not only the amount           Here, TP (True Positive) is the number of requirements
                                       of waste is reduced but also the        correctly classified as belonging to a category. TN (True
                                       farming of natural resources to cre-    Negative) is the number of requirements that are correctly
                                       ate new products is decreased.
 Example                               If a device needs to be removed         classified as not belonging to a category. FP (False Positive) is
                                       from the system, the system should      the number of requirements incorrectly classified as belonging
                                       provide information on how to dis-      to a category, and FN (False Negative) is the number of
                                       pose it properly.
                                                                               requirements that are incorrectly classified as not belonging
                                                                               to a category.
B. Application                                                                    In simple words, accuracy is the ratio of correctly classified
                                                                               data over total data. It gives an overall indication of model
   Pre-processing: We applied natural language processing
                                                                               performance: the higher the accuracy, the better the model
techniques on our data before applying the different ML algo-
                                                                               performs overall. However, data can be imbalanced, and thus
rithms. First, we applied text tokenization on each requirement.
                                                                               metrics other than accuracy need to be evaluated. The
Then we eliminated all stop words and converted the text to
                                                                               precision metric refers to the ratio of correctly predicted
lowercase. We applied stemming as our next step. As
                                                                               positive values to the total number of predicted positive values.
the last step, we converted the pre-processed text into a vector space
                                                                               Recall, on the other hand, is the ratio of correctly predicted positive
model using Term Frequency-Inverse Document Frequency
                                                                               values to the actual number of positive values. It is usually not possible
(TF-IDF) as a weighting scheme:
                                                                               to maximize both recall and precision at the same time,
                      tfidf(t, d) = tf(t, d) * idf(t)                    (1)   as improving one typically comes at the cost of the other. To account for both, the F1-score
                                                                               is used, which is the harmonic mean of precision and recall.
   Here t is a term in a vector and d is a requirement in a                       We selected the best algorithm based on the highest accuracy
collection of requirements [20].                                               and F1-score. We achieved the highest accuracy
   Classification: For the automated classification of sustain-                with the DT classifier (70% accuracy), followed by SVM (69%
ability requirements and their dimensions, we trained our                      accuracy). Our dataset was not balanced, meaning that the
model using the annotated requirements dataset. We imple-                      five different sustainability dimensions were not equally rep-
mented Naive Bayes (NB), k-Nearest Neighbor (KNN), De-                         resented in the dataset (see Fig. 2). The economic, environ-
cision trees (DT), Support Vector Machine (SVM), Logistic                      mental, and social dimensions were almost equally repre-
regression (LR), and Neural Network (NN) algorithms and                        sented. The individual dimension occurred frequently and
also trained our classifier with and without stemming the data,                the technical dimension was almost nonexistent. To overcome
as discussed in [2]. The results were quite similar with a minor               this problem, we applied a weighting technique that assigns
difference and we used stemmed data for final analysis. We                     more weight to under-represented data. After applying this setting, we
used tenfold cross-validation for evaluating our results.                      achieved the highest accuracy for SVM (75%), as shown in
C. Results                                                                     column SVM (b) of Table III. Recall and precision are 63%
                                                                               and 57%, respectively, which we consider acceptable given the
   The results for six different classifiers from our experiments
                                                                               accuracy achieved. We also calculated the F1-scores, and the highest
are shown in Table III. For choosing the best classifier, we
                                                                               value was achieved with SVM (60%).
evaluated the performance of all these classifiers on the
                                                                                  The results from this initial experiment can be improved
basis of commonly used ML metrics i.e., accuracy, precision,
                                                                               further as we observed some issues that are impacting our
recall, and F1-score. These metrics can be calculated using
                                                                               results. The structure of our dataset varied with respect to
the following formulae:
                                                                               the length of the textual requirements. For example, one
                                                                               requirement consisted of 20 words, and another one consisted
      Accuracy = (TP + TN) / (TP + FP + FN + TN)                         (2)
                                                                               of 200 words. Due to the significant difference regarding their
                                                                               length, our ML classifier features were sparse, which might
                Precision = TP / (TP + FP)                               (3)   have led to an underfitted model.
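
The four metrics can be computed directly from the confusion counts TP, FP, FN, and TN. The following minimal sketch assumes a single category, guards against division by zero, and computes the F1-score as the harmonic mean of precision and recall, as described in the text. The function name and the example counts are hypothetical and are not taken from the experiment.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for one sustainability dimension (illustration only).
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=140)
```

With these example counts, accuracy is high because of the many true negatives, while recall is noticeably lower, which illustrates why accuracy alone is not sufficient on imbalanced data.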
                                                                    TABLE II
                                                 MANUALLY CLASSIFIED SMART HOME REQUIREMENTS

 ID          Req                                                                                          T   I        S        Ec        En
 1           Music should be available throughout the house                                               -   +                 -         -
 2           Temperature in the house should be adjusted based on the weather outside                         +                 +         +
 3           The lights shut be shut off in the rooms with nobody in them                                     +                 +         +
 4           The garage door should be opened when it senses my vehicle arriving outside of it                +

                                                     Fig. 2. Distribution of sustainability dimensions

                                                                     TABLE III
                                                       SCORES FOR THE DIFFERENT CLASSIFIERS

                     NB                 KNN                DT                  LR                  SVM            SVM (b)         NN
 Accuracy            0.58               0.65               0.7 (0.03)          0.62                0.67           0.75            0.69
 Precision           0.46               0.51               0.46                0.57                0.5            0.57            0.44
 Recall              0.43               0.56               0.56                0.54                0.48           0.63            0.47
 F-score             0.44               0.53               0.51                0.56                0.48           0.60            0.45
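
The SVM (b) column was obtained after weighting the data to counter the imbalance between dimensions. The paper does not state the exact weighting formula, so the following minimal sketch assumes a common inverse-frequency heuristic, weight = n_samples / (n_classes * class_count), which is also what scikit-learn uses for class_weight="balanced". The function name and the label distribution are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assign each class a weight inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical label distribution (assumption, not the distribution in Fig. 2).
labels = ["individual"] * 6 + ["economic"] * 3 + ["environmental"] * 3
weights = balanced_class_weights(labels)
```

In this example the frequent "individual" class receives a weight below 1, while the rarer classes receive weights above 1, so errors on rare classes count more during training.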

   Moreover, our negative and positive influence values on the                    As a next step, we plan to increase the size of our dataset.
sustainability dimensions were also not equally distributed.                   We have already designed a web-based solution where experts
Our data only contained 12 requirements with negative in-                      can update the categorized requirements and add additional
fluence; the rest had a positive influence.                                    requirements. This web-based tool will help us to improve
                                                                               our labeling and support us in getting more data. It will also
                                                                               allow us to provide a more balanced dataset.
               V. CONCLUSION AND NEXT STEPS
                                                                                  Our current results indicate that there is the risk of overlook-
   The goal of our ongoing research is the automation of the                   ing requirements which have potential effects. As a next step,
identification of requirements that potentially have effects                   we plan to focus on optimizing recall to minimize this risk. We
on the sustainability of a software-intensive system.                          envision that our approach could be used to complement exist-
   In this paper, we present the application of a state-of-the-art             ing methods. Instead of discussing each requirement manually,
ML approach to support effect identification. Our first results                our approach could provide a list of relevant requirements,
indicate that ML can be successfully used for the identification               which should be discussed further by human stakeholders.
of potential sustainability effects. However, we have learned                  High recall might result in lower precision, which means that
that the results from our first experiment can be improved                     human stakeholders are confronted with a larger number of
further. This starts with the dataset. The current dataset can                 false positives. However, we expect that providing a reduced
be improved to generate a more suitable learning model for                     set of requirements for discussion will enable practitioners to
the classifiers.                                                               save time compared to a full manual analysis.
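
The trade-off between recall and precision when producing a shortlist for discussion can be illustrated with a small sketch. It assumes that a classifier outputs a relevance score per requirement and that a threshold decides which requirements are shortlisted. This is one possible way to favour recall and is not a technique reported in the paper. The requirement IDs, scores, and function names are hypothetical.

```python
def shortlist(scored_requirements, threshold):
    """Return the IDs of requirements whose score reaches the threshold."""
    return [req for req, score in scored_requirements if score >= threshold]


def precision_recall(selected, relevant):
    """Compute precision and recall of a shortlist against the relevant set."""
    selected, relevant = set(selected), set(relevant)
    tp = len(selected & relevant)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical classifier scores and ground truth (illustration only).
scored = [("R1", 0.9), ("R2", 0.7), ("R3", 0.4), ("R4", 0.3), ("R5", 0.1)]
relevant = {"R1", "R2", "R4"}

strict = precision_recall(shortlist(scored, 0.6), relevant)
lenient = precision_recall(shortlist(scored, 0.25), relevant)
```

Lowering the threshold adds R3 and R4 to the shortlist: recall rises because R4 is now found, while precision drops because R3 is a false positive that stakeholders would have to review.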
