
A comparison of NLP Tools for RE to extract Variation Points

Monica Arrabito

monica.arrabito@hotmail.it

ISTI-CNR

stefania.gnesi@isti.cnr.it

Dipartimento di Informatica, Università di Pisa

Laura Semini

Dipartimento di Informatica, Università di Pisa

2020

In the requirements engineering of software product lines, several studies have focused on exploiting NLP techniques and tools to extract information related to features and variability from requirement documents. In previous work we proposed the use of QuARS, an NLP tool for requirements analysis, showing that some of the indicators used to detect NL ambiguity can also be exploited to detect variability. In this paper we compare QuARS with other requirements analysis tools presented at the last edition of NLP4RE, in particular with respect to their ability to extract potential variation points, in search of better performance and of novel indicators.

1 Introduction

Natural language (NL) is the most common way to express software requirements even though it is intrinsically ambiguous, and ambiguity is seen as a possible source of problems in the later interpretation of requirements. Natural Language Processing (NLP) techniques have been used to analyse requirement documents to single out, among other issues, ambiguity defects.

Although ambiguity or under-specification at requirement level are usually seen as defects, they can be sometimes deliberate, as a form to leave some issues to further analysis or to later decision. In this case, ambiguity defects could give an indication of possible variability, either in design choice, in implementation choices or by later configurability.

Table 1: QuARS defects and indicators

| Defect | Indicators |
|---|---|
| vagueness | clear, easy, strong, good, bad, adequate... |
| subjectivity | similar, have in mind, take into account,... |
| optionality | or, and/or, possibly, eventually, case, if possible, if appropriate... |
| implicity | demonstrative adjectives or pronouns |
| weakness | weak verbs (e.g.: can, could, may) |
| under-specification | wordings to be instantiated (e.g.: information, interface) |
| multiplicity | multiple syntactic constructs such as multiple verbs or multiple subjects |

| Variability | Indicators |
|---|---|
| variability | if, where, whether, when, choose, choice, configurable... |
| cross-tree constraints | need, request, require, excludes, expect |

In this view, ambiguous or under-specified requirements can be considered as requirements for a product line [PBvdL05], where different products can be obtained by different choices in design, implementation and configuration. So, analysing a requirement document, the ambiguity that is detected can be used to identify possible variation points, where ambiguity is not a defect but points to different choices that can give space for a range of different products.

In [FGS17] we presented this idea and its consequence, namely that NLP tools may help in revealing information on variability that can later be used to define feature models from which different systems can be instantiated. In subsequent works [FFGS18a, FFGS18b] we presented the results of an empirical evaluation of this position, performed on large requirement documents from industry using QuARS [GLT05]. QuARS (Quality Analyzer for Requirement Specifications) is a tool for analyzing NL requirements in a systematic and automatic way by means of NLP techniques, with a focus on ambiguity detection. The evaluation has shown that some ambiguous terms, such as "and/or" and "or", and weak terms such as "may" or "could", are more likely to indicate variability than ambiguity. Typically vague terms, such as "useful", "significant", etc., are instead more likely to indicate ambiguity. QuARS realizes this analysis by implementing a quality model composed of high-level quality properties for NL requirements, to be evaluated by means of indicators directly detectable and measurable on NL requirement documents.
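To make the idea of lexical indicators concrete, the following minimal Python sketch looks up indicator terms in a single requirement and groups them by their more likely reading. The two term sets contain only the examples quoted above and are assumptions of this sketch, not the dictionaries used by QuARS; the function name `classify` and the second example sentence are likewise hypothetical.

```python
import re

# Hypothetical, abbreviated lexicons taken from the examples in the text;
# they are not the dictionaries shipped with QuARS.
VARIABILITY_PRONE = {"and/or", "or", "may", "could"}
AMBIGUITY_PRONE = {"useful", "significant"}

# A word, optionally joined to a second word by a slash (so "and/or" is one token).
TOKEN = re.compile(r"[a-z]+(?:/[a-z]+)?")


def classify(requirement: str) -> dict:
    """Return the indicator terms found in one requirement, grouped by reading."""
    tokens = set(TOKEN.findall(requirement.lower()))
    return {
        "variability": sorted(tokens & VARIABILITY_PRONE),
        "ambiguity": sorted(tokens & AMBIGUITY_PRONE),
    }


if __name__ == "__main__":
    # First sentence: requirement R6 of the E-shop example; second: made up for illustration.
    print(classify("The system shall display the list of active and/or the list of completed orders."))
    print(classify("The system may provide useful hints."))
```

A match only flags a candidate: as in the evaluation described above, each hit still has to be inspected manually to decide whether it is a variation point, an ambiguity, or a false positive.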

Here, we report on our experience with industrial NLP analysis tools, aimed at comparing their performance in detecting variability with that of QuARS, and at seeing whether they include any fresh indicator relevant to detecting variability [Arr19]. We considered the tools that were presented in a showcase at NLP4RE19 [DFF+19], namely Requirements Scout, Semantic Processing Platform, QVscribe, and ReqSuite. After a preliminary analysis, we restricted our attention to Requirements Scout and QVscribe, and ran experiments to compare them with QuARS. Our aim is twofold: to validate the approach we proposed in [FGS17] and to compare the effectiveness of the different indicators in extracting variability from linguistic defects.

In Section 2 we introduce the tools used for the comparison; in Section 3 we describe the application of the tools to a case study and analyse the results. A section on related work and conclusions follows.

2 The tools

2.1 QuARS

QuARS performs an initial parsing of the requirements to automatically detect potential linguistic defects due to ambiguity, which can cause interpretation problems at later development stages of a software product. The analysis starts from a quality model composed of high-level quality properties for NL requirements, to be evaluated by means of indicators. These indicators are listed in the first part of Table 1. The second part of the table includes the specific variability-revealing indicators introduced in [FFGS18b, AFGS20].

2.2 QVscribe

QVscribe is a tool for requirements analysis for quality and consistency, developed by QRA (https://qracorp.com/). QVscribe analyzes the quality of the requirements, highlighting ambiguity, inconsistencies and possible similarities. In addition, it allows the generation of detailed reports that can be used to increase clarity and consistency of requirements, reducing the review and rewriting work and avoiding that critical errors can manifest themselves in the later stages of development. The defects detected by the tool and the related indicators can be classified according to Table 2.

Table 2: QVscribe defects and indicators

| Defect | Indicators |
|---|---|
| imperatives | absence, negation, or multiple occurrence of imperatives |
| optional escape clauses | optional terms like possibly, may, ... |
| vague words | vague nouns and verbs as various, completed, ... |
| cross-referencing pronouns | both, everybody, anyone, it, ... e.g. an office has a door connecting it to a hallway |
| non-specific temporal words | early, years ago, before, ... |
| continuances | otherwise, in particular, below, following, ... |
| superfluous infinitives | as in shall permit since they can hide the subject |
| passive voice | based, found, shipped since it can hide the subject |
| immeasurable quantification | abundant, far, always, all, ... |
| incomplete sentences | missing critical details of who must do something or what must be done |

2.3 Requirements Scout

Table 3: Requirements Scout defects and indicators

| Defect | Indicators, if any, or motivation |
|---|---|
| long & complicated sentences | which are difficult to read and prone to ambiguities |
| passive voice | done, found, sent, ... since they can hide the subject |
| multiple negations | requirements must be expressed in positive terms |
| universal quantifiers | all, always, every, any, nothing, ... |
| imprecise phrases (vagueness) | possibly, various, current, small, general, if possible, ... |
| vague pronouns | that, which, their, it, nobody, ... |
| comparatives & superlatives | faster than, fastest, bigger than, ...; they make a requirement not understandable in isolation |
| exactly one shall or should | more than one occurrence of shall or should |
| occurrence of will or may | weak verbs such as will, may, ... |
| wrong abstraction level | to exclude implementation details |
| dangerous slash | "/", that can be interpreted both as an and and an or, like in "The system sends an email alert to all administrators/managers" |
| UI details | requirements should not contain details of the user interface |
| cloning | since duplicates burden successive maintenance |

Requirements Scout is a tool developed by Qualicen GmbH to analyze requirements specifications (https://www.qualicen.de/en/). Requirements Scout, besides analyzing NL requirements with the aim of identifying the defects, also allows the analyst to keep track of different versions of the requirements, creating a complete history of the detected defects: as soon as the requirements are updated, the tool re-analyzes the modified parts and shows whether the update has eliminated existing defects or has introduced new ones.
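The incremental behaviour described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the data layout (requirements as an id-to-text dictionary, findings as pairs), the function names `reanalyze` and `weak_verb_detector`, and the single toy detector are hypothetical and do not reflect the actual implementation or API of Requirements Scout.

```python
from typing import Callable, Dict, List, Set, Tuple

Finding = Tuple[str, str]  # (requirement id, defect name)


def weak_verb_detector(text: str) -> Set[str]:
    """Toy detector: flags the 'occurrence of will or may' defect of Table 3."""
    words = set(text.lower().split())
    return {"occurrence of will or may"} if words & {"will", "may"} else set()


def reanalyze(
    old: Dict[str, str],
    new: Dict[str, str],
    old_findings: Set[Finding],
    detect: Callable[[str], Set[str]],
) -> Dict[str, List[Finding]]:
    """Re-run the detector only on modified requirements and diff the findings."""
    changed = {rid for rid, text in new.items() if old.get(rid) != text}
    deleted = set(old) - set(new)
    # Findings on untouched requirements are carried over unchanged.
    kept = {f for f in old_findings if f[0] not in changed and f[0] not in deleted}
    fresh = {(rid, defect) for rid in changed for defect in detect(new[rid])}
    current = kept | fresh
    return {
        "eliminated": sorted(old_findings - current),
        "introduced": sorted(current - old_findings),
        "current": sorted(current),
    }


if __name__ == "__main__":
    v1 = {"R13": "The system may display the current tracking information about the order.",
          "R14": "The system shall display the available payment methods."}
    # Hypothetical second version of the two requirements.
    v2 = {"R13": "The system shall display the current tracking information about the order.",
          "R14": "The system will display the available payment methods."}
    findings_v1 = {(rid, d) for rid, text in v1.items() for d in weak_verb_detector(text)}
    print(reanalyze(v1, v2, findings_v1, weak_verb_detector))
```

In this toy run the update removes the weak verb from R13 but introduces one in R14, so one finding is reported as eliminated and one as introduced.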

The defects and indicators of Requirements Scout are shown in Table 3.

2.4 Semantic Processing Platform

Semantic Processing Platform (Semantha) addresses the comparison of documents at the semantic level. It has been developed by thingsThinking (https://www.thingsthinking.net/). Semantha is able to search for common concepts in different documents, comparing them and highlighting the differences.

2.5 ReqSuite

The ReqSuite tool, produced by OSSENO Software GmbH (https://www.osseno.com/en/), supports rigorous requirement definition by assisting the writer, who is asked to follow some patterns. For instance, requirements have to be formulated as "The system shall", and some fields have to be filled. These fields are: ID, benefit, damage, costs, priority, type (functional/quality) and source (to specify the stakeholder requiring that requirement).
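As an illustration of this kind of structural check, the sketch below models a requirement record with the fields listed above and verifies that the text follows the "The system shall" pattern and that no field is left empty. The class, the field types (plain strings) and the function `template_problems` are assumptions made for this example; they are not ReqSuite's data model or API.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Requirement:
    # Field names follow the list in the text; plain-string types are an assumption.
    id: str
    text: str
    benefit: str
    damage: str
    costs: str
    priority: str
    type: str    # "functional" or "quality"
    source: str  # stakeholder requiring the requirement


def template_problems(req: Requirement) -> List[str]:
    """Return structural problems only; no lexical ambiguity analysis is performed."""
    problems = []
    if not req.text.startswith("The system shall"):
        problems.append("text does not follow the pattern 'The system shall ...'")
    if req.type not in {"functional", "quality"}:
        problems.append("type must be 'functional' or 'quality'")
    for f in fields(req):
        if not getattr(req, f.name).strip():
            problems.append(f"field '{f.name}' is empty")
    return problems


if __name__ == "__main__":
    # R13 of the E-shop example; all field values other than the text are made up.
    r13 = Requirement(id="R13",
                      text="The system may display the current tracking information about the order.",
                      benefit="", damage="low", costs="low", priority="medium",
                      type="functional", source="customer")
    for problem in template_problems(r13):
        print(problem)
```

A check of this kind asks for the sentence to be rephrased but says nothing about which word made it weak or vague, which is why such structural analysis gives little support for singling out variation points.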

Table 4: Requirements of the E-shop

| ID | Requirement |
|---|---|
| R1 | The system shall enable user to enter the search text on the screen. |
| R2 | The system shall display all the matching products based on the search. |
| R3 | The system possibly notifies with a pop-up the user when no matching product is found on the search. |
| R4 | The system shall allow a user to create his profile and set his credentials. |
| R5 | The system shall authenticate user credentials to enter the profile. |
| R6 | The system shall display the list of active and/or the list of completed orders in the customer profile. |
| R7 | The system shall maintain customer email information as a required part of customer profile. |
| R8 | The system shall send an order confirmation to the user through email. |
| R9 | The system shall allow an user to add and remove products in the shopping cart. |
| R10 | The system shall display various shipping methods. |
| R11 | The order shall be shipped to the client address or, if the "collect in-store" service is available, to an associated store. |
| R12 | The system shall enable the user to select the shipping method. |
| R13 | The system may display the current tracking information about the order. |
| R14 | The system shall display the available payment methods. |
| R15 | The system shall allow the user to select the payment method for order. |
| R16 | After delivery, the system may enable the users to enter their reviews or ratings. |
| R17 | In order to publish the feedback on the purchases, the system needs to collect both reviews and ratings. |
| R18 | The "collect in-store" service excludes the tracking information service. |

A preliminary analysis has shown that only two of the considered industrial tools, namely QVscribe and Requirements Scout, are suited to detect ambiguity defects, returning an output similar to QuARS.

ReqSuite performs a structural analysis, but it does not address the detection of lexical or syntactical ambiguities that can help to single out variation points. For instance, if a weak verb or a passive voice is used, the tool generically asks to rephrase the sentence into a "shall" or "should" statement; a vague term such as various and an optional term such as possibly are not detected.

Semantic Processing Platform could actually be used to search for variabilities, but with a different technique, namely contrastive analysis. This is a known approach for applying NLP techniques to extract information related to features from existing NL documents [FSD13, FSGD15, NBA+17]. The idea is to extract candidate terms from different documents describing a product, and to identify commonalities and variabilities to be collected in a product family, as, e.g., in [BH11]. Our technique, on the contrary, exploits ambiguity detection and makes it possible to extract variability from a single requirement document. Semantic Processing Platform is hence not suited to this kind of analysis, since it is not designed to detect ambiguity.
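The final step of a contrastive analysis can be sketched in a few lines of Python, assuming that candidate terms have already been extracted from each document (the cited works use NLP techniques for that step, which is not shown here). The function name `contrast` and the three term sets are hypothetical and serve only to show why the technique needs several documents.

```python
from typing import Dict, Set, Tuple


def contrast(term_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split candidate terms into commonalities (present in every document)
    and variabilities (present in only some of the documents)."""
    if not term_sets:
        return set(), set()
    all_sets = list(term_sets.values())
    common = set.intersection(*all_sets)
    variable = set.union(*all_sets) - common
    return common, variable


# Hypothetical, already-extracted candidate terms for three product descriptions.
PRODUCTS = {
    "shop_a": {"search", "shopping cart", "tracking"},
    "shop_b": {"search", "shopping cart", "collect in-store"},
    "shop_c": {"search", "shopping cart", "tracking", "reviews"},
}

if __name__ == "__main__":
    common, variable = contrast(PRODUCTS)
    print("commonalities:", sorted(common))
    print("variabilities:", sorted(variable))
```

With a single input document every term ends up among the commonalities, which illustrates why this technique cannot replace ambiguity-based extraction from one requirement document.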

As a consequence, in the following we apply only QVscribe and Requirements Scout to work out a fine-grained comparison with QuARS.

3 Application of the tools to an E-shop case study: Results and Observations

QuARS, QVscribe and Requirements Scout are compared first for their general qualities, then in their ability to support a variability extraction process. The running example is a simple E-shop [AFGS20], whose requirements are presented in Table 4.

In [FFGS18b, AFGS20] we have shown that the QuARS ambiguity indicators that proved most useful to indicate variation points are multiplicity and weakness, and that optionality is another natural candidate to detect variability. The experiment presented here aims to understand whether the other tools perform better in this regard, and whether they are able to reveal any fresh indicator relevant to detecting variability.

3.1 General qualities comparison

We first address documentation, learnability, and usability. QuARS was simple to learn and use without referring to any manual, also by the first author, who was a newcomer to our project. QVscribe comes equipped with good documentation and video tutorials, and it was easy to become acquainted with. Requirements Scout is the tool where most difficulties were encountered, because of the lack of documentation, a non-intuitive interface, and a complex setup of the user profile.

To give a rough measure of learnability, the first author, who had no previous experience with any of the three tools, needed the following number of hours of training in order to use them proficiently: 30 for QuARS, 36 for

[Table 5: Raw outcomes of the analysis of the E-shop example with QuARS, QVscribe and Requirements Scout, listing for each requirement (R1-R4, R6, R7, R9-R18) the indicator terms and the defects reported by each tool.]

[Table 6: Number of false positives (F.Pos.), ambiguities (Amb.) and variabilities found per indicator by QuARS, QVscribe and Req. Scout. Indicator rows: Vagueness, Optionality, Weakness, Multiplicity, Under-Specification, Variability, Cross-tree Constraints, Imperatives, Vague Pronouns, Passive voices, Immeasurable quantification, Superfluous infinitives, Incomplete sentences, Long / complicated sentences, Multiple negations, Comparatives, superlatives, Wrong abstraction level, Dangerous slash, UI details.]

3.2 Comparison of the ability to detect variability

We now focus on comparing the three tools from the point of view of the ability of their indicators to detect variation points. We are interested in the quality indicators for which, using QVscribe or Requirements Scout, we have detected a different number of variabilities than using QuARS. We report in Table 5 the raw outcomes of the analysis of the E-shop example with the three tools, requirement by requirement: the "Indicator" column shows the words that have been considered by each tool to indicate a certain defect (reported in the last column). These results have then been manually analysed to see whether the defects could actually be seen as variation points. The detailed results of this analysis are discussed in the following, indicator by indicator. Table 6 cumulatively shows the number of false positives, ambiguities, and variabilities, with respect to the notion of variation point, as the result of the manual analysis of the tools' outcome of Table 5.

For vagueness, QuARS detects one defect, while QVscribe and Requirements Scout detect four defects each. The vagueness related to requirement R10, detected both by QuARS and Requirements Scout, can indeed be classified as a variability (various). The same happens for the term possibly detected by Requirements Scout in R3. All the other defects are false positives.

We note that the term possibly in QuARS and QVscribe is an indicator of Optionality and is hence detected according to another indicator.

With respect to optionality, we refer to its meaning as in QuARS, and include the term possibly, classified as Optional Escape Clause by QVscribe. According to this indicator, there are four variabilities detected by QuARS (R3, R6, R11, and R16) and one by QVscribe (R3). Optionality is not an indicator of Requirements Scout. The good number of variabilities detected by QuARS is due to the fact that it is the only tool looking for occurrences of or and of and/or.
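The effect of looking for or and and/or can be reproduced with a small Python sketch that matches the optionality terms of Table 1 as whole words on a subset of the E-shop requirements. This is an illustration written for this comparison, not the QuARS matcher: the regular expression, the function name `optionality_hits` and the choice of the subset are assumptions.

```python
import re

# Optionality terms as listed for QuARS in Table 1 (the list there is open-ended).
OPTIONALITY_TERMS = ["and/or", "or", "possibly", "eventually", "case",
                     "if possible", "if appropriate"]

# Longest terms first, so that "and/or" is preferred to "or"; the lookarounds
# prevent "or" from matching inside words such as "for" or "order".
PATTERN = re.compile(
    r"(?<![\w/])(?:"
    + "|".join(re.escape(t) for t in sorted(OPTIONALITY_TERMS, key=len, reverse=True))
    + r")(?![\w/])",
    re.IGNORECASE,
)

# Subset of the E-shop requirements (Table 4).
REQUIREMENTS = {
    "R3": "The system possibly notifies with a pop-up the user when no matching product is found on the search.",
    "R6": "The system shall display the list of active and/or the list of completed orders in the customer profile.",
    "R9": "The system shall allow an user to add and remove products in the shopping cart.",
    "R11": 'The order shall be shipped to the client address or, if the "collect in-store" service is available, to an associated store.',
    "R15": "The system shall allow the user to select the payment method for order.",
    "R16": "After delivery, the system may enable the users to enter their reviews or ratings.",
    "R17": "In order to publish the feedback on the purchases, the system needs to collect both reviews and ratings.",
}


def optionality_hits(requirements):
    """Map each flagged requirement id to the optionality terms found in it."""
    hits = {}
    for rid, text in requirements.items():
        found = [m.group(0).lower() for m in PATTERN.finditer(text)]
        if found:
            hits[rid] = found
    return hits


if __name__ == "__main__":
    for rid, terms in optionality_hits(REQUIREMENTS).items():
        print(rid, terms)
```

On this subset the sketch flags R3, R6, R11 and R16, the four requirements reported above for QuARS, while R15 ("for order") and R17 ("In order to") are not flagged thanks to whole-word matching.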

For weakness, all the tools perform the same on E-shop. Weakness is referred to as optional escape clause in QVscribe and as occurrence of will or may in Requirements Scout. When applying the tools to other documents, we have also observed that QuARS and QVscribe detect the weak verb can, which is not detected by Requirements Scout.

The three variabilities related to multiplicity in QuARS are indeed conjunctions, as in R4, or disjunctions, as in R11 and R16. Conjunctions are not really a variability indication; they simply show that all alternatives are mandatory: in software product lines terminology, customer profile and credentials in R4 are two mandatory features of E-shop. Disjunctions in R11 and R16 are also detected by optionality indicators.

Variability and cross-tree constraints, the new indicators added to QuARS precisely to detect variation points, are found in a number of requirements. There is a false positive, in R3, and four variabilities, in R11, R12, R14, R15. Requirements R17 and R18 contain constraints among features.

Looking at Table 6, we notice that the absence of imperatives is a main variability indicator. This is an indicator considered by QVscribe (no imperatives) and by Requirements Scout (exactly one shall or should), but not by QuARS; hence this finding is an answer to our quest for new indicators. However, we can notice that the requirements lacking an imperative (R3, R13, R16, R17, R18) are those containing terms such as if possible, weak verbs such as may or can, or cross-tree constraints. QuARS captures them with other indicators, namely the optionality, weakness and cross-tree constraints indicators.
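This overlap can be checked mechanically on the E-shop requirements of Table 4. The Python sketch below first lists the requirements that contain neither shall nor should, and then reports which of the other indicator families of Table 1 fire on each of them. The term sets are simplifications made for this sketch (single-word terms only, with inflected forms such as "needs" added by assumption); they are not the dictionaries of any of the three tools.

```python
import re

E_SHOP = {
    "R1": "The system shall enable user to enter the search text on the screen.",
    "R2": "The system shall display all the matching products based on the search.",
    "R3": "The system possibly notifies with a pop-up the user when no matching product is found on the search.",
    "R4": "The system shall allow a user to create his profile and set his credentials.",
    "R5": "The system shall authenticate user credentials to enter the profile.",
    "R6": "The system shall display the list of active and/or the list of completed orders in the customer profile.",
    "R7": "The system shall maintain customer email information as a required part of customer profile.",
    "R8": "The system shall send an order confirmation to the user through email.",
    "R9": "The system shall allow an user to add and remove products in the shopping cart.",
    "R10": "The system shall display various shipping methods.",
    "R11": 'The order shall be shipped to the client address or, if the "collect in-store" service is available, to an associated store.',
    "R12": "The system shall enable the user to select the shipping method.",
    "R13": "The system may display the current tracking information about the order.",
    "R14": "The system shall display the available payment methods.",
    "R15": "The system shall allow the user to select the payment method for order.",
    "R16": "After delivery, the system may enable the users to enter their reviews or ratings.",
    "R17": "In order to publish the feedback on the purchases, the system needs to collect both reviews and ratings.",
    "R18": 'The "collect in-store" service excludes the tracking information service.',
}

IMPERATIVES = {"shall", "should"}

# Single-word terms from Table 1; inflected forms are an assumption of this sketch,
# and multi-word terms such as "if possible" are left out for brevity.
OTHER_INDICATORS = {
    "optionality": {"or", "and/or", "possibly", "eventually", "case"},
    "weakness": {"can", "could", "may"},
    "cross-tree constraints": {"need", "needs", "request", "requests", "require",
                               "requires", "excludes", "expect", "expects"},
}

TOKEN = re.compile(r"[a-z]+(?:/[a-z]+)?")


def tokens(text):
    """Lower-cased word tokens; 'and/or' is kept as a single token."""
    return set(TOKEN.findall(text.lower()))


def lacking_imperative(requirements):
    """Ids of the requirements that contain neither 'shall' nor 'should'."""
    return [rid for rid, text in requirements.items() if not (tokens(text) & IMPERATIVES)]


def covering_indicators(text):
    """Names of the other indicator families that fire on the given text."""
    found = tokens(text)
    return [name for name, terms in OTHER_INDICATORS.items() if found & terms]


if __name__ == "__main__":
    for rid in lacking_imperative(E_SHOP):
        print(rid, covering_indicators(E_SHOP[rid]))
```

On Table 4 the sketch lists exactly R3, R13, R16, R17 and R18 as lacking an imperative, and each of them is also covered by at least one of the other three indicator families, in line with the observation above.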

To conclude, we observe that none of the indicators going, in Table 6, from vague pronouns to UI details, which are present in QVscribe or in Requirements Scout and not in QuARS, proves useful in detecting variability.

4 Related Work and Conclusions

Ambiguity detection in requirements is a lively research field, with several contributions published already in the nineties (e.g., the ARM tool [WRH97]). Most of the works stem from the typically defective terms and constructions classified in the ambiguity paper of Berry et al. [BK04]. Based on these studies, rule-based NLP tools such as QuARS [GLT05], SREE [TB13] and the tool of Gleich et al. [GCK10] were developed. More recently, industrial applications of these approaches were studied by Femmer et al. [FFWE17] and by Ferrari et al. [FGR+18]. Furthermore, Arora et al. [ASBZ15] presented RETA (REquirements Template Analyzer), a tool that employs rule-based approaches to detect typical ambiguous terms and constructions, as the other mentioned works, and, in addition, checks the conformance of the requirements to a given template.

As shown also in these studies, rule-based approaches tend to produce a high number of false positive cases, i.e., linguistic ambiguities that have one single reading in practice. Hence, statistical approaches were proposed by Chantree et al. [CNDRW06] and by Yang et al. [YDRG+10] to reduce the number of false positive cases, referred to as innocuous ambiguities. Statistical NLP approaches are also used in [FDG17] to identify domain-dependent ambiguities, i.e., pragmatic ambiguities that depend on the domain background of the reader of the requirements.

Our work differs from the contributions in this field in that it integrates the research in ambiguity detection with the research in feature identification. More specifically, we use the ambiguity detection capabilities of the considered tools to identify variation points in requirements documents. The closest works in feature identification are those that focus on variant feature identification from NL documents, as, e.g., [FSD13, NBA+17]. However, these works leverage the automated extraction of domain-specific terms, while in our work we focus on ambiguity detection.

An alternative to the experimentation with off-the-shelf tools that we have proposed in this paper is the adoption and customization of more general and flexible NLP tools, which allow tuning the kind of ambiguities and other defects that can be used as variability indicators. GATE [Cun02] is an example of such tools: it collects several NLP modules and provides a means to define ad hoc rules (JAPE rules), so as to create advanced and customized NLP solutions. As an example related to requirement analysis, in [FGR+18] GATE was used to tune the proposed requirement analysis according to the requirement writing style adopted by the involved company, achieving a significantly better quality of the analysis. This alternative, which requires substantial work to build a new framework on top of GATE, has not been considered in this paper, which was limited to a comparison of existing tools. Nevertheless, the information gathered by the presented comparison can provide useful input to a future effort dedicated to building (e.g. on top of GATE) a well-performing customized NLP tool to extract variability information from requirement documents.

Summing up, our focus on the ambiguity indicators provided by the considered tools soon led us to exclude Semantha and ReqSuite as not useful to detect variability. The experimentation with the three remaining tools has shown that QuARS performs slightly better than QVscribe, followed by Requirements Scout, in terms of the number of variabilities that their indicators were able to identify. On the other hand, the latter tools have pointed at the absence of imperatives as a main variability indicator, not provided by QuARS, so providing an answer to our quest for new indicators. These preliminary results need to be confirmed by extending the experiments to other requirements documents; we are currently considering two publicly available larger case studies, namely Blit, taken from [FFGS18b], and Digital Home, taken from [FFGS18a]. Blit is a draft of the functional specification, comprising 55 requirements, of a business project management tool; Digital Home specifies the requirements for the development of a Smart House, where a resident can manage devices that control the environment of the home. Although we have not completed the analysis of these case studies, they seem to confirm the results achieved on the E-shop example. Our study was limited to the tools presented in the last edition of NLP4RE, but other tools should be considered as well. One candidate is RAT (Requirements Authoring Tool) from REUSE (https://www.reusecompany.com/), which is able to detect ambiguities but, because of its commercial nature, was not available for our study.

Acknowledgements

We gratefully thank the developers of the tools (QVscribe, Requirements Scout, Semantic Processing Platform, and ReqSuite), who provided us with licences for academic purposes.

The research has been partially supported by the MIUR project PRIN 2017FTXR7S "IT-MaTTerS" (Methods and Tools for Trustworthy Smart Systems).

References

[AFGS20] E. Arganese, A. Fantechi, S. Gnesi, and L. Semini. Nuts and bolts of extracting variability models from natural language requirements documents. In Integrating Research and Practice in Software Engineering, volume 851 of Studies in Computational Intelligence, pages 125-143. Springer, 2020.

[Arr19] M. Arrabito. Strumenti di analisi dei requisiti per specificare linee di prodotti, 12 2019. Bachelor thesis, Dipartimento di Informatica, Univ. di Pisa (in Italian).

[ASBZ15] C. Arora, M. Sabetzadeh, L. Briand, and F. Zimmer. Automated checking of conformance to requirements templates using natural language processing. IEEE Trans. on Software Engineering, 41(10):944-968, 2015.

[BH11] E. Boutkova and F. Houdek. Semi-automatic identification of features in requirement specifications. In RE 2011, 19th IEEE Int. Requirements Engineering Conf., pages 313-318. IEEE, 2011.

[BK04] D.M. Berry and E. Kamsties. Ambiguity in requirements specification. In Perspectives on software requirements, volume 753 of The International Series in Engineering and Computer Science, pages 7-44. Springer US, 2004.

[CNDRW06] F. Chantree, B. Nuseibeh, A. De Roeck, and A. Willis. Identifying nocuous ambiguities in natural language requirements. In Proceedings of the 14th IEEE International Conference on Requirements Engineering, pages 59-68, Minneapolis/St.Paul, MN, USA, September 2006. IEEE.

[Cun02] H. Cunningham. GATE, a general architecture for text engineering. Computers and the Humanities, 36(2):223-254, 2002.

[FDG17] A. Ferrari, B. Donati, and S. Gnesi. Detecting domain-specific ambiguities: an NLP approach based on Wikipedia crawling and word embeddings. In Proc. 4th Int. Workshop on Artificial Intelligence for Requirements (AIRE), 25th Int. Requirements Engineering Conf. Workshops, pages 393-399. IEEE, 2017.

[FFGS18a] A. Fantechi, A. Ferrari, S. Gnesi, and L. Semini. Hacking an ambiguity detection tool to extract variation points: an experience report. In Proc. 12th Int. Workshop on Variability Modelling of Software-Intensive Systems (VAMOS), Madrid, pages 43-50. ACM, 2018.

[FFGS18b] A. Fantechi, A. Ferrari, S. Gnesi, and L. Semini. Requirement engineering of software product lines: Extracting variability using NLP. In Proc. 26th IEEE International Requirements Engineering Conference, Banff, AB, Canada, pages 418-423. IEEE, 2018.

[FFWE17] H. Femmer, D.M. Fernandez, S. Wagner, and S. Eder. Rapid quality assurance with requirements smells. Journal of Systems and Software, 123:190-213, 2017.

[FGR+18] A. Ferrari, G. Gori, B. Rosadini, I. Trotta, S. Bacherini, A. Fantechi, and S. Gnesi. Detecting requirements defects with NLP patterns: an industrial experience in the railway domain. Empirical Software Engineering, 23(6):3684-3733, 2018.

[FGS17] A. Fantechi, S. Gnesi, and L. Semini. Ambiguity defects as variation points in requirements. In Proc. 11th Int. Workshop on Variability Modelling of Software-intensive Systems (VAMOS), pages 13-19, Eindhoven, 2017. ACM.

[FSD13] A. Ferrari, G.O. Spagnolo, and F. Dell'Orletta. Mining commonalities and variabilities from natural language documents. In Proceedings of the 17th International Software Product Line Conference, SPLC 2013, Tokyo, Japan, August 26-30, 2013, pages 116-120, 2013.

[FSGD15] A. Ferrari, G.O. Spagnolo, S. Gnesi, and F. Dell'Orletta. CMT and FDE: tools to bridge the gap between natural language documents and feature diagrams. In Proc. 19th International Conference on Software Product Line, SPLC 2015, Nashville, USA, pages 402-410. ACM, 2015.

[GCK10] B. Gleich, O. Creighton, and L. Kof. Ambiguity detection: Towards a tool explaining ambiguity sources. In Requirements Engineering: Foundation for Software Quality, 16th Int. Working Conference, REFSQ 2010, Essen, Germany, volume 6182 of LNCS, pages 218-232. Springer, 2010.

[GLT05] S. Gnesi, G. Lami, and G. Trentanni. An automatic tool for the analysis of natural language requirements. Computer Systems: Science & Engineering, 20(1), 2005.

[NBA+17] S. Ben Nasr, G. Becan, M. Acher, J.B. Ferreira Filho, N. Sannier, B. Baudry, and J.-M. Davril. Automated extraction of product comparison matrices from informal product descriptions. Journal of Systems and Software, 124:82-103, 2017.

[PBvdL05] K. Pohl, G. Böckle, and F. van der Linden. Software Product Line Engineering - Foundations, Principles, and Techniques. Springer, 2005.

[TB13] S.F. Tjong and D.M. Berry. The design of SREE - a prototype potential ambiguity finder for requirements specifications and lessons learned. In International Working Conference on Requirements Engineering: Foundation for Software Quality, volume 7830 of LNCS, pages 80-95, Essen, Germany, 2013. Springer.

[WRH97] W. Wilson, L. Rosenberg, and L. Hyatt. Automated analysis of requirement specifications. In Proc. of the 19th Int. Conf. on Software Engineering (ICSE), pages 161-171, Boston, 1997. ACM.

[YDRG+10] H. Yang, A. De Roeck, V. Gervasi, A. Willis, and B. Nuseibeh. Extending nocuous ambiguity analysis for anaphora in natural language requirements. In Proceedings of RE 2010, 18th International Requirements Engineering Conference, pages 25-34, Sydney, Australia, 2010. IEEE.