=Paper= {{Paper |id=Vol-2584/NLP4RE-paper1 |storemode=property |title=A comparison of NLP Tools for RE to extract Variation Points |pdfUrl=https://ceur-ws.org/Vol-2584/NLP4RE-paper1.pdf |volume=Vol-2584 |authors=Monica Arrabito,Alessandro Fantechi,Stefania Gnesi,Laura Semini |dblpUrl=https://dblp.org/rec/conf/refsq/ArrabitoFGS20 }} ==A comparison of NLP Tools for RE to extract Variation Points== https://ceur-ws.org/Vol-2584/NLP4RE-paper1.pdf
 A comparison of NLP Tools for RE to extract Variation
                       Points

                                         Monica Arrabito
            Dipartimento di Informatica, Università di Pisa, monica.arrabito@hotmail.it

                                       Alessandro Fantechi
     Dip. di Ingegneria dell’Informazione, Università di Firenze, alessandro.fantechi@unifi.it

                                           Stefania Gnesi
                   Istituto di Scienza e Tecnologie dell’Informazione “A.Faedo”
           Consiglio Nazionale delle Ricerche, ISTI-CNR, Pisa, stefania.gnesi@isti.cnr.it

                                           Laura Semini
                Dipartimento di Informatica, Università di Pisa, laura.semini@unipi.it




                                                       Abstract

                       In the requirements engineering of software product lines, several
                       studies have focused on exploiting NLP techniques and tools to
                       extract information related to features and variability from requirement
                       documents. In a previous work we proposed the use of the tool
                       QuARS, an NLP tool for requirements analysis, showing that some of
                       the indicators used to detect NL ambiguity can also be exploited to
                       detect variability. In this paper we compare QuARS, in this respect,
                       with other requirements analysis tools presented at the last edition
                       of NLP4RE, in particular with respect to their ability to extract
                       potential variation points, in search of better performance and of
                       novel indicators.

NLP, natural language requirements, variability, ambiguity.

1    Introduction
Natural language (NL) is the most common way to express software requirements even though it is intrinsically
ambiguous, and ambiguity is seen as a possible source of problems in the later interpretation of requirements.
Natural Language Processing (NLP) techniques have been used to analyse requirement documents to single out,
among other issues, ambiguity defects.
   Although ambiguity or under-specification at requirement level are usually seen as defects, they can be some-
times deliberate, as a form to leave some issues to further analysis or to later decision. In this case, ambiguity

   Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International
(CC BY 4.0).
   In: M. Sabetzadeh, A. Vogelsang, S. Abualhaija, M. Borg, F. Dalpiaz, M. Daneva, N. Fernández, X. Franch, D. Fucci, V.
Gervasi, E. Groen, R. Guizzardi, A. Herrmann, J. Horkoff, L. Mich, A. Perini, A. Susi (eds.): Joint Proceedings of REFSQ-2020
Workshops, Doctoral Symposium, Live Studies Track, and Poster Track, Pisa, Italy, 24-03-2020, published at http://ceur-ws.org
 Defect                           Indicators
 vagueness                        clear, easy, strong, good, bad, adequate...
 subjectivity                     similar, have in mind, take into account,...
 optionality                      or, and/or, possibly, eventually, case, if possible, if appropriate...
 implicity                        demonstrative adjectives or pronouns
 weakness                         weak verbs (e.g.: can, could, may)
 under-specification              wordings to be instantiated (e.g.: information, interface)
 multiplicity                     multiple syntactic constructs such as multiple verbs or multiple subjects

 Variability                      Indicators
 variability                      if, where, whether, when, choose, choice, configurable...
 cross-tree constraints           need, request, require, excludes, expect

                              Table 1: Defects and variability detected by QuARS
defects could give an indication of possible variability, either in design choice, in implementation choices or by
later configurability.
   In this view, ambiguous or under-specified requirements can be considered as requirements for a product
line [PBvdL05], where different products can be obtained by different choices in design, implementation and
configuration. So, analysing a requirement document, the ambiguity that is detected can be used to identify
possible variation points, where ambiguity is not a defect but points to different choices that can give space for
a range of different products.
   In [FGS17] we have presented this idea, with its consequence, specifically that NLP tools may help in revealing
information on variability that can be later used to define feature models from which different systems can be
instantiated. In subsequent works [FFGS18a, FFGS18b] we presented the results of an empirical evaluation of
the position, performed on large requirement documents from industry using QuARS [GLT05]. QuARS (Quality
Analyzer for Requirement Specifications) is a tool for analyzing NL requirements in a systematic and automatic
way by means of NLP techniques with a focus on ambiguity detection. The evaluation has shown that some
ambiguous terms, such as “and/or” and “or”, and weak terms such as “may” or “could” are more likely to indicate
variability rather than ambiguity. Instead, typically vague terms, such as “useful”, “significant”, etc. are more
likely to indicate ambiguity. QuARS works by implementing a quality model composed of high-level quality
properties for NL requirements, which are evaluated by means of indicators directly detectable and measurable on
NL requirement documents.
   Here, we report on the experience made with industrial NLP analysis tools, aimed to compare their per-
formance in detecting variability with that of QuARS, and to see whether they include any fresh indicator
relevant to detect variability [Arr19]. We have considered the tools that were presented in a showcase at
NLP4RE19 [DFF+ 19], namely Requirements Scout, Semantic Processing Platform, QVscribe, and ReqSuite.
After a preliminary analysis, we restricted our attention to Requirements Scout and QVscribe and we have run
experiments to compare them with QuARS. Our aim is twofold: to validate the approach we proposed in [FGS17]
and to compare the effectiveness of the different indicators to extract variability from linguistic defects.
   In Section 2 we introduce the tools used for the comparison; in Section 3 we describe the application of the
tools to a case study and analyse the results. A section on related work and conclusions follows.

2     The tools
2.1   QuARS
QuARS performs an initial parsing of the requirements to automatically detect potential linguistic
defects due to ambiguity, which can cause interpretation problems at the subsequent development stages
of a software product. The analysis starts from a quality model composed of high-level quality
properties for NL requirements, to be evaluated by means of indicators. These indicators are listed in the first
part of Table 1. The second part of the table includes the specific variability-revealing indicators introduced in
[FFGS18b, AFGS20].
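
The indicator-based analysis can be illustrated with a minimal Python sketch. It is not the QuARS implementation: it only assumes that an indicator is a list of terms searched for in each requirement, and it uses just the terms spelled out in Table 1 (the lists in the table are open-ended, and multiplicity, which is syntactic, is not covered). The names INDICATORS and find_indicators are hypothetical.

```python
import re

# Terms copied from Table 1; the real dictionaries are larger (assumption:
# a defect class is detected by plain term matching).
INDICATORS = {
    "vagueness": ["clear", "easy", "strong", "good", "bad", "adequate"],
    "optionality": ["and/or", "or", "possibly", "eventually", "case",
                    "if possible", "if appropriate"],
    "weakness": ["can", "could", "may"],
    "variability": ["if", "where", "whether", "when", "choose", "choice",
                    "configurable"],
    "cross-tree constraints": ["need", "request", "require", "excludes",
                               "expect"],
}


def find_indicators(requirement):
    """Return (defect class, term) pairs whose term occurs as a whole word."""
    hits = []
    for defect, terms in INDICATORS.items():
        for term in terms:
            # The lookarounds keep "or" from matching inside "order" or "and/or".
            pattern = r"(?<![\w/])" + re.escape(term) + r"(?![\w/])"
            if re.search(pattern, requirement, flags=re.IGNORECASE):
                hits.append((defect, term))
    return hits


r13 = "The system may display the current tracking information about the order."
print(find_indicators(r13))
```

On R13 of the case study this sketch reports only the weak verb may, which agrees with the QuARS row for R13 in Table 5.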

2.2   QVscribe
QVscribe is a tool for requirements analysis for quality and consistency, developed by QRA
(https://qracorp.com/). QVscribe analyzes the quality of the requirements, highlighting ambiguity, inconsis-
 Defect                             Indicators
 imperatives                        absence, negation, or multiple occurrence of imperatives
 optional escape clauses            optional terms like possibly, may, . . .
 vague words                        vague nouns and verbs as various, completed, . . .
 cross-referencing pronouns         both, everybody, anyone, it,. . . e.g. an office has a door connecting it to a hallway
 non-specific temporal words        early, years ago, before, . . .
 continuances                       otherwise, in particular, below, following, . . .
 superfluous infinitives            as in shall permit since they can hide the subject
 passive voice                      based, found, shipped since it can hide the subject
 immeasurable quantification        abundant, far, always, all, . . .
 incomplete sentences               missing critical details of who must do something or what must be done

                                     Table 2: Defects detected by QVscribe

 Defect                             Indicators, if any, or motivation
 long&complicated sentences         which are difficult to read and prone to ambiguities
 passive voice                      done, found, sent, . . . since they can hide the subject
 multiple negations                 requirements must be expressed in positive terms
 universal quantifiers              all, always, every, any, nothing, . . .
 imprecise phrases (vagueness)      possibly, various, current, small, general, if possible, . . .
 vague pronouns                     that, which, their, it, nobody, . . .
 comparatives & superlatives        faster than, fastest, bigger than, . . . they make a requirement not understandable
                                    in isolation
 exactly one shall or should        more than one occurrence of shall or should
 occurrence of will or may          weak verbs such as will, may, . . .
 wrong abstraction level            to exclude implementation details
 dangerous slash                    “/”, that can be interpreted both as an and and an or, like in “The system sends
                                    an email alert to all administrators/managers”
 UI details                         requirements should not contain details of the user interface.
 cloning                            since duplicates burden successive maintenance

                               Table 3: Defects detected by Requirements Scout
tencies and possible similarities. In addition, it allows the generation of detailed reports that can be used to
increase clarity and consistency of requirements, reducing the review and rewriting work and preventing critical
errors from surfacing in the later stages of development. The defects detected by the tool and the
related indicators are classified in Table 2.

2.3   Requirements Scout
Requirements Scout is a tool developed by Qualicen GmbH, to analyze requirements specifications
(https://www.qualicen.de/en/). Requirements Scout, besides analyzing NL requirements with the aim of iden-
tifying the defects, also allows the analyst to keep track of different versions of the requirements, creating a
complete history of the detected defects: as soon as the requirements are updated, the tool re-analyzes the
modified parts and shows whether the update has eliminated existing defects or has introduced new ones.
   The defects and indicators of Requirements Scout are shown in Table 3.
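
The version comparison described above amounts to a difference between two sets of findings. The following Python sketch illustrates the idea only; it is not the Requirements Scout implementation. The "before" findings are those reported for R13 in Table 5, while the "after" findings are a hypothetical outcome of rewriting may as shall, and the function name compare_versions is an assumption.

```python
def compare_versions(old_findings, new_findings):
    """Report which findings disappeared and which appeared between versions."""
    old, new = set(old_findings), set(new_findings)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


# Findings are (requirement, defect, indicator term) triples.
before = [
    ("R13", "vagueness", "current"),
    ("R13", "occurrence of will or may", "may"),
    ("R13", "exactly one shall or should", ""),
]
# Hypothetical result after the analyst replaces "may" with "shall".
after = [
    ("R13", "vagueness", "current"),
]

print(compare_versions(before, after))
```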

2.4   Semantic Processing Platform
Semantic Processing Platform (Semantha) addresses the comparison of documents at the semantic level. It has
been developed by thingsThinking (https://www.thingsthinking.net/). Semantha is able to search for common
concepts in different documents, comparing them and highlighting the differences.

2.5   ReqSuite
The ReqSuite tool, produced by OSSENO Software GmbH (https://www.osseno.com/en/), supports rigorous re-
quirement definition by assisting the writer who is asked to follow some patterns. For instance, requirements have
to be formulated as “The system shall”, and some fields have to be filled. These fields are: ID, benefit, damage,
costs, priority, type (functional/quality) and source (to specify the stakeholder requiring that requirement).
 R1 The system shall enable user to enter the search text on the screen.
 R2 The system shall display all the matching products based on the search.
 R3 The system possibly notifies with a pop-up the user when no matching product is found on the search.
 R4 The system shall allow a user to create his profile and set his credentials.
 R5 The system shall authenticate user credentials to enter the profile.
 R6 The system shall display the list of active and/or the list of completed orders in the customer profile.
 R7 The system shall maintain customer email information as a required part of customer profile.
 R8 The system shall send an order confirmation to the user through email.
 R9 The system shall allow an user to add and remove products in the shopping cart.
 R10 The system shall display various shipping methods.
 R11 The order shall be shipped to the client address or, if the “collect in-store” service is available, to an associated
     store.
 R12 The system shall enable the user to select the shipping method.
 R13 The system may display the current tracking information about the order.
 R14 The system shall display the available payment methods.
 R15 The system shall allow the user to select the payment method for order.
 R16 After delivery, the system may enable the users to enter their reviews or ratings.
 R17 In order to publish the feedback on the purchases, the system needs to collect both reviews and ratings.
 R18 The “collect in-store” service excludes the tracking information service.

                                  Table 4: Requirements of the E-shop case study
2.6   A first result: tools suited to detect variation points
A preliminary analysis has shown that only two of the considered industrial tools, namely QVscribe and Re-
quirements Scout, are suited to detect ambiguity defects, returning an output similar to QuARS.
   ReqSuite performs a structural analysis, but it does not address the detection of lexical or syntactical
ambiguities that can help to single out variation points. For instance, if a weak verb or the passive voice is used,
the tool generically asks to rephrase the sentence into a “shall” or “should” statement; a vague term such as various
and an optional term such as possibly are not detected.
   Semantic Processing Platform could actually be used to search for variabilities, but with a different technique,
namely exploiting a contrastive analysis. This is a known approach to apply NLP techniques to extract informa-
tion related to features from existing NL documents [FSD13, FSGD15, NBA+ 17]. The idea is that of extracting
candidate terms from different documents describing a product, and identifying commonalities and variabilities
to be collected in a product family, as, e.g., in [BH11]. Our technique, on the contrary, exploits ambiguity
detection to extract variability from a single requirement document. Semantic Processing Platform is
hence not suited to this kind of analysis since it is not designed to detect ambiguity.
   As a consequence, in the following we apply only QVscribe and Requirements Scout to work out a fine-grained
comparison with QuARS.

3     Application of the tools to an E-shop case study: Results and Observations
QuARS, QVscribe and Requirements Scout are compared first for their general qualities, then in their ability to
support a variability extraction process. The running example is a simple E-shop [AFGS20], whose requirements
are presented in Table 4.
   In [FFGS18b, AFGS20] we have shown that the QuARS ambiguity indicators that proved most useful to
indicate variation points are multiplicity and weakness; optionality is another natural candidate to
detect variability. The experiment presented here aims to understand whether the other tools perform better in
this regard, and whether they are able to reveal any fresh indicator relevant to detect variability.

3.1   General qualities comparison
We first address documentation, learnability, and usability. QuARS was simple to learn and use without referring
to any manual, even for the first author, who was a newcomer to our project. QVscribe comes equipped with good
documentation and video tutorials and it was easy to become acquainted with. Requirements Scout is the tool where
most difficulties were encountered, because of lack of documentation, a non-intuitive interface, and a complex
setup of the user profile.
   To give a rough measure of learnability, the first author, who had no previous experience with any of the three
tools, needed the following number of hours of training in order to proficiently use them: 30 for QuARS, 36 for
 Requirement                                         Tool         Indicator   Defect

 R1 The system shall enable the user to enter the    QuARS                    multiplicity
 search text on the screen.                          QVscribe     Enable      vague words
                                                     Req. Scout   Screen      UI details

 R2 The system shall display all the matching        QuARS
 products based on the search.                       QVscribe     all         universal quantifiers
                                                                  based       passive voice
                                                     Req. Scout   all         universal quantifiers

 R3 The system possibly notifies with a pop-up the   QuARS        possibly    optionality
 user when no matching product is found on the                    when        variability
 search.                                                                      multiplicity
                                                     QVscribe     possibly    optional escape clauses
                                                                  when        immeasurable quantification
                                                                  no          universal quantifiers
                                                                  found       passive voice
                                                                              no imperatives
                                                     Req. Scout   possibly    vagueness
                                                                  found       passive voice
                                                                              exactly one shall or should

 R4 The system shall allow a user to create his      QuARS                    multiplicity
 profile and set his credentials                     QVscribe     allow       superfluous infinitives
                                                                  his         cross-referencing pronouns
                                                     Req. Scout   his         vague pronouns

 R6 The system shall display the list of active      QuARS        and/or      optionality
 and/or the list of completed orders in the          QVscribe
 customer profile                                    Req. Scout   completed   vagueness
                                                                  and/or      dangerous slash

 R7 The system shall maintain customer email         QuARS                    multiplicity
 information as a required part of customer          QVscribe     maintain    superfluous infinitives
 profile                                                          as          immeasurable quantification
                                                     Req. Scout

 R9 The system shall allow an user to add and        QuARS                    multiplicity
 remove products in the shopping cart                QVscribe     allow       superfluous infinitives
                                                     Req. Scout

 R10 The system shall display various shipping       QuARS        various     vagueness
 methods                                             QVscribe
                                                     Req. Scout   various     vagueness

 R11 The order shall be shipped to the client        QuARS                    multiplicity
 address or, if the “collect in-store” service is                 or          optionality
 available, to an associated store                                available   variability
                                                     QVscribe     shipped     passive voice
                                                     Req. Scout   shipped     passive voice

 R12 The system shall enable the user to select      QuARS                    multiplicity
 the shipping method                                              select      variability
                                                     QVscribe     enable      vague words
                                                     Req. Scout

 R13 The system may display the current tracking     QuARS        may         weakness
 information about the order                         QVscribe     may         optional escape clauses
                                                                  about       vague words
                                                                              no imperatives
                                                     Req. Scout   current     vagueness
                                                                  may         occurrence of will or may
                                                                              exactly one shall or should

 R14 The system shall display the available          QuARS        available   variability
 payment methods                                     QVscribe
                                                     Req. Scout

 R15 The system shall allow the user to select the   QuARS        select      variability
 payment method for order                            QVscribe     allow       superfluous infinitives
                                                     Req. Scout

 R16 After delivery, the system may enable the       QuARS                    multiplicity
 users to enter their reviews or ratings                          or          optionality
                                                                  may         weakness
                                                     QVscribe     after       non-specific temporal words
                                                                  may         optional escape clauses
                                                                  enable      vague words
                                                                  their       cross-referencing pronouns
                                                                              no imperatives
                                                     Req. Scout   their       vague pronouns
                                                                  may         occurrence of will or may
                                                                              exactly one shall or should

 R17 In order to publish the feedback on the         QuARS        needs       cross-tree constraints
 purchases, the system needs to collect both         QVscribe     both        cross-referencing pronouns
 reviews and ratings                                                          no imperatives
                                                     Req. Scout               exactly one shall or should

 R18 The “collect in-store” service excludes the     QuARS        excludes    cross-tree constraints
 tracking information service                        QVscribe                 no imperatives
                                                     Req. Scout               exactly one shall or should

Table 5: Results of the application of QuARS, QVscribe, and Requirements Scout to the e-shop case study.
Requirements R5 and R8 contain no defect according to all tools.
                                        QuARS                QVscribe         Requirements Scout
 E-shop                          F.Pos.  Amb.  Var.    F.Pos.  Amb.  Var.    F.Pos.  Amb.  Var.
 Vagueness                         -      -     1        4      -     -        2      -     2
 Optionality                       -      -     4        -      -     1              n.a.
 Weakness                          -      -     2        -      -     2        -      -     2
 Multiplicity                      5      -     3              n.a.                  n.a.
 Under-Specification               -      -     -              n.a.                  n.a.
 Variability                       1      -     4              n.a.                  n.a.
 Cross-tree Constraints            -      -     2              n.a.                  n.a.
 Imperatives                             n.a.            -      -     5        -      -     5
 Vague Pronouns                          n.a.            1      2     -        -      2     -
 Passive voices                          n.a.            1      2     -        -      2     -
 Immeasurable quantification             n.a.            2      2     -        1      -     -
 Superfluous infinitives                 n.a.            4      -     -              n.a.
 Incomplete sentences                    n.a.            -      -     -              n.a.
 Long / complicated sentences            n.a.                  n.a.            -      -     -
 Multiple negations                      n.a.                  n.a.            -      -     -
 Comparatives, superlatives              n.a.                  n.a.            -      -     -
 Wrong abstraction level                 n.a.                  n.a.            -      -     -
 Dangerous slash                         n.a.                  n.a.            -      1     -
 UI details                              n.a.                  n.a.            1      -     -

                     Table 6: Summary of variability detection (n.a. means not applicable)
QVscribe, 48 for Requirements Scout.
   Efficiency was found to be an issue for Requirements Scout (to analyze documents of 20 and 50 requirements
the tool takes 1 minute and 2 minutes respectively), while it was not for QVscribe and QuARS: with both
tools the analysis time for the considered documents was a few seconds. The problem was probably due to the
larger number of checks performed by Requirements Scout, so it has to be considered as an issue related to the
particular usage of the tool for variability detection, rather than a generic low performance of the tool.
   Extensibility is represented by the ability to add new quality indicators. This is permitted in QuARS, which
also enables the user to select the indicators she wants to use for the analysis. In QVscribe, only the modification
of the indicators already present in the tool is permitted, by adding or removing terms to be identified during
the analysis. Requirements Scout implements indicator selection too, but its indicators are fixed.
   Report generation is possible in QuARS and QVscribe. In QuARS the report contains, for each quality
indicator, the list of requirements that present an ambiguity, together with the terms deemed incorrect. The
report generated by QVscribe assigns to each requirement a score ranging from 1 to 5, depending on the gravity
of the defects. Moreover, results can be filtered to focus on specific defects.
   Other qualities that are worth mentioning are interoperability and version control. A particularly important
positive aspect of QVscribe is the possibility of integrating the tool as a Word feature, so that the analysis of
a document can be started by selecting QVscribe from the Word ribbon, and selecting the requirements to be
analyzed. This feature also makes it possible to correct the requirements on the fly and run the analysis again.
   A version control system is offered by Requirements Scout: the tool records the history containing the various
versions of a document so that the comparison of two versions of the same document returns the list of defects
added or removed. However, the tool does not permit any editing of the document under analysis: the user has
to edit the document externally and load the new version.

3.2   Comparison of the ability to detect variability
We now focus on comparing the three tools from the point of view of the ability of their indicators to detect
variation points. We are interested in the quality indicators for which, using QVscribe or Requirements Scout, we
have detected a different number of variabilities than using QuARS. We report in Table 5 the raw outcomes of the
analysis of the E-shop example with the three tools, requirement by requirement: the “Indicator” column shows
the words that have been considered by each tool to indicate a certain defect (reported in the last column). These
results have then been manually analysed to see whether the defects could actually be seen as variation points.
The detailed results of this analysis are discussed in the following indicator by indicator. Table 6 cumulatively
shows the number of false positives, ambiguities, and variabilities, with respect to the notion of variation point,
as the result of the manual analysis of the tools’ outcome of Table 5.
   For vagueness, QuARS detects one defect, while QVscribe and Requirements Scout detect four defects each. The
vagueness related to requirement R10, detected both by QuARS and Requirements Scout, can be indeed classified
as a variability (various). The same happens for the term possibly detected by Requirements Scout in R3. All
the other defects are false positives.
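
The vagueness row of Table 6 can be recomputed from this manual classification. The Python sketch below is only an illustration of the counting: the indicator terms are those reported in Table 5 and the verdicts are those stated in the paragraph above, while the tuple layout and the function name tally are assumptions.

```python
from collections import Counter

# (tool, requirement, indicator term, manual verdict) for vagueness findings.
FINDINGS = [
    ("QuARS", "R10", "various", "variability"),
    ("QVscribe", "R1", "enable", "false positive"),
    ("QVscribe", "R12", "enable", "false positive"),
    ("QVscribe", "R13", "about", "false positive"),
    ("QVscribe", "R16", "enable", "false positive"),
    ("Requirements Scout", "R3", "possibly", "variability"),
    ("Requirements Scout", "R6", "completed", "false positive"),
    ("Requirements Scout", "R10", "various", "variability"),
    ("Requirements Scout", "R13", "current", "false positive"),
]


def tally(findings):
    """Count findings per (tool, verdict) pair."""
    counts = Counter()
    for tool, _requirement, _term, verdict in findings:
        counts[(tool, verdict)] += 1
    return counts


for (tool, verdict), n in sorted(tally(FINDINGS).items()):
    print(f"{tool}: {n} {verdict}")
```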
   We note that the term possibly in QuARS and QVscribe is an indicator of Optionality and is hence detected
according to another indicator.
   With respect to optionality, we refer to its meaning as in QuARS, and include the term possibly, classified
as Optional Escape Clause by QVscribe. According to this indicator, there are four variabilities detected by
QuARS (R3, R6, R11, and R16) and one by QVscribe (R3). Optionality is not an indicator of Requirements
Scout. The good number of variabilities detected by QuARS is due to the fact that it is the only tool looking for
occurrences of or and of and/or.
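
   As an illustration of how a rule-based optionality indicator of this kind might work, the following minimal
Python sketch flags the terms named above (or, and/or, possibly) in a requirement. The term list, the function
name find_optionality and the sample sentences are assumptions made for this illustration only: they are not
taken from QuARS, QVscribe, or the E-shop document, and the actual dictionaries of the tools are not reproduced.

```python
import re
from typing import List

# Hypothetical term list, limited to the terms named in the text.
OPTIONALITY_TERMS = ["or", "and/or", "possibly"]

# Longest terms first, so that "and/or" is not reported as a bare "or".
# The lookarounds reject matches inside longer words (e.g. "order", "for").
_ALTERNATIVES = "|".join(
    re.escape(term) for term in sorted(OPTIONALITY_TERMS, key=len, reverse=True)
)
_PATTERN = re.compile(r"(?<![\w/])(" + _ALTERNATIVES + r")(?![\w/])", re.IGNORECASE)


def find_optionality(requirement: str) -> List[str]:
    """Return the optionality terms found in a requirement, in textual order."""
    return [match.group(1).lower() for match in _PATTERN.finditer(requirement)]


if __name__ == "__main__":
    # Invented sample requirements, not taken from the E-shop example.
    samples = [
        "The user can pay by card and/or voucher.",
        "The system shall store the order history.",
    ]
    for text in samples:
        print(find_optionality(text), "-", text)
```

   A match produced by such a rule is only a candidate: as in the manual analysis described above, each hit still
has to be inspected to decide whether it is a variation point, an ambiguity, or a false positive.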
   For weakness, all the tools perform the same on E-shop. Weakness is referred to as optional escape clause in
QVscribe and occurrence of will or may in Requirements Scout. When applying the tools to other documents, we
have also observed that QuARS and QVscribe detect the weak verb can, which is not detected by Requirements
Scout.
   The three variabilities related to multiplicity in QuARS are in fact conjunctions, as in R4, or disjunctions, as
in R11 and R16. Conjunctions are not really an indication of variability: they simply show that all alternatives are
mandatory. In software product lines terminology, customer profile and credentials in R4 are two mandatory
features of E-shop. The disjunctions in R11 and R16 are also detected by the optionality indicators.
   Variability and cross-tree constraints, the new indicators added to QuARS precisely to detect variation points,
are found in a number of requirements. There is a false positive, in R3, and four variabilities, in R11, R12,
R14, R15. Requirements R17 and R18 contain constraints among features.
   Looking at Table 6, we notice that the absence of imperatives is a main variability indicator. This
indicator is considered by QVscribe (no imperatives) and by Requirements Scout (exactly one shall or should), but
not by QuARS; hence this finding is an answer to our quest for new indicators. However, we can notice that the
requirements lacking an imperative (R3, R13, R16, R17, R18) are those containing terms such as if possible,
weak verbs such as may or can, or cross-tree constraints. QuARS captures them with other indicators, namely the
optionality, weakness and cross-tree constraints indicators.
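
   The two imperative checks mentioned above can be sketched as follows. This is a minimal Python illustration,
not the implementation of either tool: the keyword list contains only shall and should, as named in the text for
Requirements Scout (the list used by QVscribe is not given here and is assumed to be the same), and the function
names and sample sentences are hypothetical.

```python
import re

# Imperative keywords named in the text; assumed here to apply to both checks.
IMPERATIVES = ("shall", "should")

_IMPERATIVE_RE = re.compile(
    r"\b(?:" + "|".join(IMPERATIVES) + r")\b", re.IGNORECASE
)


def count_imperatives(requirement: str) -> int:
    """Count whole-word occurrences of the imperative keywords."""
    return len(_IMPERATIVE_RE.findall(requirement))


def lacks_imperative(requirement: str) -> bool:
    """'No imperatives' check: True when no keyword occurs."""
    return count_imperatives(requirement) == 0


def has_exactly_one_imperative(requirement: str) -> bool:
    """'Exactly one shall or should' check."""
    return count_imperatives(requirement) == 1


if __name__ == "__main__":
    # Invented sample requirements, not taken from the E-shop example.
    for text in (
        "The system shall send a receipt.",
        "The customer may pay by card.",
    ):
        print(lacks_imperative(text), "-", text)
```

   In this sketch, a requirement built around a weak verb such as may is reported as lacking an imperative, which
mirrors the overlap with the weakness indicator observed above.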
   To conclude, we observe that none of the indicators listed in Table 6 from vague pronouns to UI details, which
are present in QVscribe or in Requirements Scout but not in QuARS, proves useful in detecting variability.

4   Related Work and Conclusions
Ambiguity detection in requirements is a lively research field, with several contributions published already in
the nineties (e.g., the ARM tool [WRH97]). Most of the works stem from the typically defective terms and
constructions classified in the ambiguity paper of Berry et al. [BK04]. Based on these studies, rule-based NLP
tools such as QuARS [GLT05], SREE [TB13] and the tool of Gleich et al. [GCK10] were developed. More
recently, industrial applications of these approaches were studied by Femmer et al. [FFWE17] and by Ferrari et
al. [FGR+ 18]. Furthermore, Arora et al. [ASBZ15] presented RETA (REquirements Template Analyzer), a tool
that employs rule-based approaches to detect typical ambiguous terms and constructions, as the other mentioned
works, and, in addition, checks the conformance of the requirements to a given template.
   As these studies also show, rule-based approaches tend to produce a high number of false positive cases,
i.e., linguistic ambiguities that have one single reading in practice. Hence, statistical approaches were proposed
by Chantree et al. [CNDRW06] and by Yang et al. [YDRG+ 10] to reduce the number of false positive cases,
referred to as innocuous ambiguities. Statistical NLP approaches are also used in [FDG17] to identify
domain-dependent ambiguities, i.e., pragmatic ambiguities that depend on the domain background of the reader of the
requirements.
   Our work differs from the contributions in this field in that it integrates the research in ambiguity detection
with the research in feature identification. More specifically, we use the ambiguity detection capabilities of the
considered tools to identify variation points in requirements documents. The closest works in feature identification
are those that focus on variant feature identification from NL documents, as, e.g., [FSD13, NBA+ 17]. However,
these works leverage the automated extraction of domain-specific terms, while in our work we focus on ambiguity
detection.
   An alternative to the experimentation with off-the-shelf tools that we have proposed in this paper is the adoption
and customization of more general and flexible NLP tools, which allow tuning the kind of ambiguities and other
defects that can be used as variability indicators. GATE [Cun02] is an example of such tools: it collects several
NLP modules and provides a means to define ad hoc rules (JAPE rules), so as to create advanced and customized
NLP solutions. As an example related to requirement analysis, in [FGR+ 18] GATE was used to tune the proposed
requirement analysis according to the requirement writing style adopted by the involved company, achieving a
significantly better quality of the analysis. This alternative, which requires substantial work to build a new
framework on top of GATE, has not been considered in this paper, which was limited to a comparison of existing
tools. Nevertheless, the information gathered by the presented comparison can provide useful input to a future effort
dedicated to building (e.g. on top of GATE) a well-performing customized NLP tool to extract variability information
from requirement documents.
   Summing up, our focus on the ambiguity indicators provided by the considered tools soon led us to
exclude Semantha and ReqSuite as not useful to detect variability. The experimentation with the three remaining
tools has shown that QuARS performs slightly better than QVscribe, followed by Requirements Scout, in terms
of the number of variabilities that their indicators were able to identify. On the other hand, the latter tools
have pointed at the absence of imperatives as a main variability indicator, not provided by QuARS, thus providing
an answer to our quest for new indicators. These preliminary results need to be confirmed by extending the
experiments to other requirements documents; we are currently considering two publicly available larger case
studies, namely Blit taken from [FFGS18b] and Digital Home taken from [FFGS18a]. Blit is a draft of the
functional specification of the 55 requirements of a business project management tool; Digital Home specifies the
requirements for the development of a Smart House, where a resident can manage devices that control the
environment of the home. Although we have not completed the analysis of these case studies, they seem to
confirm the results achieved on the E-shop example. Our study was limited to the tools presented in
the last edition of NLP4RE, but other tools should be considered as well. One candidate is RAT (Requirements
Authoring Tool) from REUSE (https://www.reusecompany.com/), which is able to detect ambiguities but, owing to
its commercial nature, was not available for our study.

Acknowledgements
We gratefully thank the developers of the tools (QVscribe, Requirements Scout, Semantic Processing Framework,
and ReqSuite), who provided us with the licence for academic purposes.
  The research has been partially supported by the MIUR project PRIN 2017FTXR7S “IT-MaTTerS” (Methods
and Tools for Trustworthy Smart Systems).

References
[AFGS20]     E. Arganese, A. Fantechi, S. Gnesi, and L. Semini. Nuts and bolts of extracting variability models
             from natural language requirements documents. In Integrating Research and Practice in Software
             Engineering, volume 851 of Studies in Computational Intelligence, pages 125–143. Springer, 2020.

[Arr19]      M. Arrabito. Strumenti di analisi dei requisiti per specificare linee di prodotti, December 2019.
             Bachelor thesis, Dipartimento di Informatica, Univ. di Pisa (in Italian).

[ASBZ15]     C. Arora, M. Sabetzadeh, L. Briand, and F. Zimmer. Automated checking of conformance to
             requirements templates using natural language processing. IEEE Trans. on Software Engineering,
             41(10):944–968, 2015.

[BH11]       E. Boutkova and F. Houdek. Semi-automatic identification of features in requirement specifications.
             In RE 2011, 19th IEEE Int. Requirements Engineering Conf., pages 313–318. IEEE, 2011.

[BK04]       D.M. Berry and E. Kamsties. Ambiguity in requirements specification. In Perspectives on software
             requirements, volume 753 of The International Series in Engineering and Computer Science, pages
             7–44. Springer US, 2004.

[CNDRW06] F. Chantree, B. Nuseibeh, A. De Roeck, and A. Willis. Identifying nocuous ambiguities in natural
          language requirements. In Proceedings of the 14th IEEE International Conference on Requirements
          Engineering, pages 59–68, Minneapolis/St.Paul, MN, USA, September 2006. IEEE.

[Cun02]      H. Cunningham. GATE, a general architecture for text engineering. Computers and the Humanities,
             36(2):223–254, 2002.

[DFF+ 19]    F. Dalpiaz, A. Ferrari, X. Franch, S. Gregory, F. Houdek, and C. Palomares. NLP4RE. In Joint
             Proceedings of REFSQ-2019 Workshops, Essen, Germany, CEUR Workshop Proceedings.
             CEUR-WS.org, 2019.

[FDG17]      A. Ferrari, B. Donati, and S. Gnesi. Detecting domain-specific ambiguities: an NLP approach based
             on Wikipedia crawling and word embeddings. In Proc. 4th Int. Workshop on Artificial Intelligence
             for Requirements (AIRE), 25th Int. Requirements Engineering Conf. Workshops, pages 393–399.
             IEEE, 2017.

[FFGS18a]    A. Fantechi, A. Ferrari, S. Gnesi, and L. Semini. Hacking an ambiguity detection tool to extract
             variation points: an experience report. In Proc. 12th Int. Workshop on Variability Modelling of
             Software-Intensive Systems (VAMOS), Madrid, pages 43–50. ACM, 2018.

[FFGS18b]    A. Fantechi, A. Ferrari, S. Gnesi, and L. Semini. Requirement engineering of software product lines:
             Extracting variability using NLP. In Proc. 26th IEEE International Requirements Engineering
             Conference, Banff, AB, Canada, pages 418–423. IEEE, 2018.

[FFWE17]     H. Femmer, D.M. Fernández, S. Wagner, and S. Eder. Rapid quality assurance with requirements
             smells. Journal of Systems and Software, 123:190–213, 2017.

[FGR+ 18]    A. Ferrari, G. Gori, B. Rosadini, I. Trotta, S. Bacherini, A. Fantechi, and S. Gnesi. Detecting
             requirements defects with NLP patterns: an industrial experience in the railway domain. Empirical
             Software Engineering, 23(6):3684–3733, 2018.

[FGS17]      A. Fantechi, S. Gnesi, and L. Semini. Ambiguity defects as variation points in requirements. In
             Proc. 11th Int. Workshop on Variability Modelling of Software-intensive Systems (VAMOS), pages
             13–19, Eindhoven, 2017. ACM.

[FSD13]      A. Ferrari, G.O. Spagnolo, and F. Dell’Orletta. Mining commonalities and variabilities from natural
             language documents. In Proceedings of the 17th International Software Product Line Conference,
             SPLC 2013, Tokyo, Japan - August 26 - 30, 2013, pages 116–120, 2013.

[FSGD15]     A. Ferrari, G.O. Spagnolo, S. Gnesi, and F. Dell’Orletta. CMT and FDE: tools to bridge the gap
             between natural language documents and feature diagrams. In Proc. 19th International Conference
             on Software Product Line, SPLC 2015, Nashville, USA, pages 402–410. ACM, 2015.

[GCK10]      B. Gleich, O. Creighton, and L. Kof. Ambiguity detection: Towards a tool explaining ambiguity
             sources. In Requirements Engineering: Foundation for Software Quality, 16th Int. Working
             Conference, REFSQ 2010, Essen, Germany, volume 6182 of LNCS, pages 218–232. Springer, 2010.

[GLT05]      S. Gnesi, G. Lami, and G. Trentanni. An automatic tool for the analysis of natural language
             requirements. Computer Systems: Science & Engineering, 20(1), 2005.

[NBA+ 17]    S. Ben Nasr, G. Bécan, M. Acher, J.B. Ferreira Filho, N. Sannier, B. Baudry, and J.-M. Davril.
             Automated extraction of product comparison matrices from informal product descriptions. Journal
             of Systems and Software, 124:82–103, 2017.

[PBvdL05]    K. Pohl, G. Böckle, and F. van der Linden. Software Product Line Engineering - Foundations,
             Principles, and Techniques. Springer, 2005.

[TB13]       S.F. Tjong and D.M. Berry. The design of SREE - a prototype potential ambiguity finder for
             requirements specifications and lessons learned. In International Working Conference on
             Requirements Engineering: Foundation for Software Quality, volume 7830 of LNCS, pages 80–95, Essen,
             Germany, 2013. Springer.

[WRH97]      W. Wilson, L. Rosenberg, and L. Hyatt. Automated analysis of requirement specifications. In
             Proc. of the 19th Int. Conf. on Software Engineering (ICSE), pages 161–171, Boston, 1997. ACM.

[YDRG+ 10] H. Yang, A. De Roeck, V. Gervasi, A. Willis, and B. Nuseibeh. Extending nocuous ambiguity
           analysis for anaphora in natural language requirements. In Proceedings of RE 2010, 18th International
           Requirements Engineering Conference, pages 25–34, Sydney, Australia, 2010. IEEE.