NLP-Based Requirements Formalization for
Automatic Test Case Generation
Robin Gröpler1 , Viju Sudhi1 , Emilio José Calleja García2 and Andre Bergmann2
1 ifak Institut für Automation und Kommunikation e.V., 39106 Magdeburg, Germany
2 AKKA Germany GmbH, 80807 München, Germany


Abstract
Due to the growing complexity and rapid changes of software systems, the assurance of their quality becomes increasingly difficult. Model-based testing in agile development is a way to overcome these difficulties. However, major effort is still required to create specification models from a large set of functional requirements provided in natural language. This paper presents a machine-aided requirements formalization technique based on Natural Language Processing (NLP) to be used for automatic test case generation. The goal of the presented method is to automate the process of model creation from requirements in natural language by utilizing appropriate algorithms, thus reducing cost and effort. The application of our procedure is demonstrated using an industry example from the e-mobility domain. In this example, requirement models are generated for a charging approval system within a larger vehicle battery charging application. Additionally, existing tools for automated model synthesis and test case generation are applied to our models to evaluate whether valid test cases can be generated.

Keywords
Requirements analysis, natural language processing, test generation




1. Introduction
In the life cycle of a device, component or system in industrial use, a rapidly changing and
growing number of requirements and the associated increase in features and feature changes
inevitably lead to an increasing effort for verifying requirements and testing of the implementation. To manage test complexity and reduce necessary test effort and cost, agile methods
for model-based testing have been developed [1]. The effectiveness of model-based testing
highly depends on the quality of the used specification model. The creation and maintenance
of well-defined specification models is therefore crucial and usually comes with high effort and
cost. This is especially true in agile development, where requirements are subject to frequent
changes.
   In this context, an approach for requirements-based testing was developed that enables
efficient test processes, see Fig. 1. Model synthesis and model-based test generation methods are
used to systematically and efficiently create a test suite that contains suitable test cases. This
approach is based on behavioral requirements that serve as input for model synthesis. The only

29th International Workshop on Concurrency, Specification and Programming (CS&P’21)
robin.groepler@ifak.eu (R. Gröpler); viju.sudhi@ifak.eu (V. Sudhi); emiliojose.calleja@gmail.com (E. J. Calleja García); andre.bergmann@akka.eu (A. Bergmann)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
[Figure: toolchain overview. Functional requirements (text documents) are formalized into sequence models (Req. 1 … Req. n), synthesized into a specification model (UML state machine), and used to generate abstract test cases (TC1–TC6). The three steps, Requirements Formalization (semi-automated), Model Synthesis (automated) and Test Case Generation (automated), are supported by the tools ReForm, ModGen and TCG, respectively.]
Figure 1: Toolchain for requirements-based test case generation.


time-consuming manual step is the creation of requirement models from textual requirements
documents.
   Recent advances in natural language processing show promising results in organizing and
identifying desired information from raw text. As a result, there is growing interest in applying
NLP techniques to automate various software development activities such as test case generation.
Several NLP approaches and tools have been investigated in recent years aiming to generate test
cases from preliminary requirements documents [2, 3, 4]. A major drawback of existing methods
is the use of controlled natural language or templates that force the requirements engineer or
designer to concentrate not only on the content but also on the syntax of the requirement.
Furthermore, those algorithms are in general not applicable to existing requirements.
   In this work, we propose a new, semi-automated technique for requirements-based model gen-
eration that reduces human effort and supports frequent requirements changes and extensions.
The aim of our approach is to develop a method that
   1) can handle an extended range of domains and formats of requirements, i.e. it is not limited
      to a specific template or controlled natural language, and
   2) provides enhanced but easily interpretable intermediate results in the form of a textual
      and graphical representation of UML sequence diagrams.
   Our approach utilizes an existing NLP parser to obtain basic syntactic information about the
words and their relationship to each other. Based upon this information, several rule-based
steps are performed in order to identify relevant syntactic entities which are then mapped to
semantic entities. Finally, these entities are used to form requirement models as UML sequence
diagrams. The main contributions of this work are
   1) the development of a rule-based approach based on NLP information that automates the
      various steps involved in deriving requirement models, and
   2) the evaluation on an industrial use case using meaningful metrics that demonstrates the
      good quality of our approach.
  The paper is structured as follows. In Section 2, we briefly outline related work on NLP-
based requirements formalization methods. In Section 3, we present the individual steps of our
methodology for deriving requirement models from textual descriptions. The method is applied
to the battery charging approval system presented in Section 4. In Section 5, we define several
evaluation metrics and demonstrate the results of the application. Finally, a conclusion and
outlook is given in Section 6.


2. Related work
In order to circumvent the challenges of analyzing highly complex requirements, many authors
restrict their NLP approaches to a specific domain or a prescribed format. In [5], the authors
propose an algorithm that creates activity diagrams from requirements following a predefined
structure. They apply the SOPHIST method, which refines and formalizes structured texts by
introducing text templates with a defined syntactical structure [6]. In [7], a small set of
structural rules was developed to address common requirement problems, including ambiguity,
complexity and vagueness. In [8], requirements are expected to be written in a controlled
natural language and to stem from Data-Flow Reactive Systems (DFRS).
The approach in [9] generates test cases from use cases or user stories, both of which
have to comply with a specified format. In [10], requirements engineers are supported in
formalizing use case descriptions by writing pre-conditions and post-conditions in a
predefined format from which test cases can be generated automatically. Likewise, [11] aims to
find and complete missing test cases by parsing exceptional behaviors from Javadoc comments
written in natural language, provided the documentation follows a specified template. [12] relies
on artifacts that programmers create while developing the product, which form a smaller
subset of specifications.
   Even for simple syntactical structures of requirements, it is still necessary to enable the
requirements engineer to review the intermediate results, i.e. the generated model artifacts, and
to adjust them where necessary. The toolchain of [13] involves eliciting requirements according
to Restricted Use Case Modeling (RUCM) specifications. The same applies to the work of [14],
where the authors generate executable test cases from a Restricted Test Case Modeling (RTCM)
language that restricts the style of writing test cases. This becomes an additional overhead
for requirements engineers who draft formal requirements. Additionally, the users are expected
to inspect the generated OCL constraints before proceeding to test case generation.
Similarly, in [15] the authors explore test case generation using Petri net simulation;
however, the interpretability of the Colored Petri Nets proposed in that approach may vary
with the user's level of expertise. Such intermediate results may not be easily understood
by the user, and it may be cumbersome to fine-tune or modify the predictions before
generating reliable test cases.
   A notable work by the authors of [16] uses recursive dependency matching to formulate
test cases. Though our approach aligns with theirs in this step, we attempt to generate test
cases from a broader set of functional requirements, while they restrict themselves to user
stories from which a cause-effect relationship can be learned.
3. Methodology
We utilize an existing NLP parser and a rule-based algorithm to transform requirements
written in natural language into requirement models. Our rule set aims to cover all patterns
needed to satisfactorily parse an input behavioral requirement and extract its semantic content.

3.1. Linguistic pre-processing
The behavioral requirements are, in general, complex by nature. In order to reliably extract the
syntactic and semantic content of these requirements, a thorough linguistic pre-processing is
indispensable. For this stage, we rely on spaCy (v2.1.8) [17], a free, open-source library for
advanced Natural Language Processing. We follow the basic NLP pipeline, including tokenization,
lemmatization, part-of-speech (POS) tagging and dependency parsing, in various stages of the
algorithm.

3.1.1. Pronoun resolution
Though formal requirements tend to avoid first person (I, me, etc.) or second person (you,
your, etc.) pronouns, they may contain third person neutral pronouns (it, they, etc.) [18].
These pronouns are identified and resolved with the farthest subject, in line with the algorithms
proposed in [19] and [20]. Owing to the simplicity of the task, we assume there is no particular
need for more sophisticated algorithms that check grammatical gender and person while
resolving pronouns. However, we attempt to resolve pronouns only if the grammatical number
of the pronoun agrees with that of the antecedent. Since pleonastic pronouns (pronouns without
a direct antecedent) do not affect the algorithm, they are identified but not replaced.

Example: Consider the requirement ”If the temperature of the battery is below Tmin or it
exceeds Tmax, charging approval has to be withdrawn”. Here, the pronoun it is resolved with
its antecedent the temperature of the battery.
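The resolution step above can be sketched as follows. The (text, dependency, number) token format and the hand-annotated values are illustrative simplifications of a dependency parser's output, not the tool's actual implementation:

```python
# Minimal sketch of pronoun resolution: a third person pronoun is resolved
# with the farthest (earliest) preceding subject whose grammatical number
# agrees with it. Token annotations are hand-written for illustration.

SINGULAR = {"it"}
PLURAL = {"they"}

def resolve_pronouns(tokens):
    """tokens: list of (text, dependency, number) tuples in sentence order."""
    resolved = []
    subjects = []  # antecedent candidates, earliest first
    for text, dep, number in tokens:
        if dep.startswith("nsubj") and text.lower() not in SINGULAR | PLURAL:
            subjects.append((text, number))
        if text.lower() in SINGULAR | PLURAL:
            wanted = "sing" if text.lower() in SINGULAR else "plur"
            # farthest antecedent = earliest subject with agreeing number
            antecedent = next((s for s, n in subjects if n == wanted), None)
            resolved.append(antecedent if antecedent else text)
        else:
            resolved.append(text)
    return resolved

tokens = [("the temperature of the battery", "nsubj", "sing"),
          ("is", "ROOT", "-"), ("below Tmin", "obl", "-"), ("or", "cc", "-"),
          ("it", "nsubj", "sing"), ("exceeds", "conj", "-"),
          ("Tmax", "obj", "-")]
print(resolve_pronouns(tokens)[4])  # the temperature of the battery
```

A real implementation would operate on spaCy tokens and their morphological features rather than on pre-annotated tuples.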

3.1.2. Decomposition
Textual requirements with multiple conditions and conjunctions are hard to transform
and map to individual relations. This demands the decomposition of complex requirements
into simple clauses [21]. Multiple conditions (sentences with multiple ifs, whiles, etc.), root
conjunctions (sentences with multiple roots connected by a conjunction) and noun phrase
conjunctions (sentences with multiple subjects and/or objects connected by a conjunction)
are decomposed into simple primitive clauses.
   We resort to the syntactic dependencies obtained from the parser to decompose requirements.
The algorithm considers the token(s) with the dependency mark to decompose multiple conditions
and the dependency conj to decompose root and noun phrase conjunctions. The span of a
sub-requirement can then be defined by identifying the edges of the token of interest (e.g. the
left-most edge refers to the token towards the left of the dependency graph with which the
parent token holds a syntactic dependency).
Example: In the requirement ”If the temperature of the battery is below Tmin or the tempera-
ture of the battery exceeds Tmax, charging approval has to be withdrawn”, the root conjunction
(arising from the two roots is and exceeds) and the subsequent multiple conditions (arising
from if ) are decomposed to three sub-requirements as ”[if the temperature of the battery is
below Tmin] or [if the temperature of the battery exceeds Tmax], [charging approval has to be
withdrawn]”.

3.2. Syntactic entity identification
Almost all behavioral requirements describe a particular action (linguistically, a verb) performed
by an agent (linguistically, a subject) on the system of interest (linguistically, an object). This
motivates the idea of identifying syntactic entities in the requirements. The algorithm identifies
these syntactic entities by checking the dependencies of tokens with respect to the root.

   1) Action: The main action verb in the requirement (mostly, with the dependency ROOT )
      is identified and called an action. The algorithm particularly distinguishes the type of
      actions as: Nominal action which has a noun and a verb together (e.g. send a message),
      Boolean action which can take a Boolean constraint (e.g. is withdrawn) and Simple action
      which has only an action verb (e.g. send).
      In addition, the algorithm also tries to identify the verb type(s) (transitive, dative, preposi-
      tional, etc.) as suggested in [21] to supplement the syntactic significance of action types.
      This is essential particularly when we rely on action types for relation formulation.
   2) Subjects and Objects: The tokens with dependencies subj and obj (and their variants
      like nsubj, pobj, dobj, etc.) are identified mostly as Subjects and Objects, respectively.
      They can be noun chunks (e.g. temperature of the battery), compound nouns (e.g. battery
      temperature) or single tokens (e.g. battery) in the requirement.

Also, we noted that there are several requirements involving a logical comparison (identified as
an adjective or an adverb) between the expressed quantities. In order to identify comparisons
(e.g. greater than, exceeds, etc.) in the requirement, we utilize the exhaustive synonym hyperlinks
from Roget’s Thesaurus [22] and map them to the corresponding equality (=), inequality (!=),
inferiority (<, <=) and superiority (>, >=) symbols.

Example: From the sub-requirements ”[if the temperature of the battery is below Tmin] or
[if the temperature of the battery exceeds Tmax], [charging approval has to be withdrawn]”,
the system identifies Battery_Temperature and Charging_Approval as Subjects, Tmin and Tmax
as Objects and withdrawn as a Boolean Action. Also, the comparison term below is mapped as <
and exceeds is mapped as >.
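The identification of subjects, objects, the action and the comparison operator on one decomposed clause can be sketched as below. Tokens are (text, dependency) pairs as a parser would label them, and the comparison lexicon is a tiny stand-in for the thesaurus-based mapping:

```python
# Sketch of syntactic entity identification. The dependency labels are
# hand-annotated; COMPARISON stands in for the Roget's-thesaurus mapping.

COMPARISON = {"below": "<", "exceeds": ">", "above": ">", "equals": "="}

def extract_entities(tokens):
    entities = {"subjects": [], "objects": [], "action": None,
                "comparison": None}
    for text, dep in tokens:
        if "subj" in dep:                     # nsubj, nsubjpass, ...
            entities["subjects"].append(text)
        elif "obj" in dep:                    # dobj, pobj, ...
            entities["objects"].append(text)
        elif dep == "ROOT":                   # main action verb
            entities["action"] = text
        if text in COMPARISON:
            entities["comparison"] = COMPARISON[text]
    return entities

clause = [("the temperature of the battery", "nsubj"), ("is", "ROOT"),
          ("below", "prep"), ("Tmin", "pobj")]
entities = extract_entities(clause)
print(entities["subjects"], entities["objects"], entities["comparison"])
```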

3.3. Semantic entity identification
Semantic entities are tightly coupled with the end application which translates the parsed
syntactic information to sequence diagrams and then to abstract test cases. The semantic
Table 1
Mapping of syntactic to semantic entities
                                Syntactic entities        Semantic entities
                                Action                    Signal
                                Action constraints        Attributes
                                Subject / Object          Actor / Component


entities are defined from the perspective of interactions in a sequence diagram and are outlined
below. The algorithm derives these entities from their syntactic counterparts1 .

   1) Actor or Component: The participants involved in an interaction are defined as actors
      and components. To differentiate other participants from the system under test (SUT),
      component is always considered as the SUT.
   2) Signal: The interaction between different participants is defined as a signal.
   3) Attributes: The variables holding the status at different points of interaction are defined
      as attributes.
   4) State: This refers to the initial, intermediate and final states of an interaction.

   Semantic entities demand additional details for completeness. For example, if the value of an
attribute is not given, it cannot be initialized in the corresponding test case. Likewise, for each
signal the corresponding actor needs to be identified. For each requirement, the direction of
communication (incoming: towards the system under test, or outgoing: from the system under
test) should be identified. In cases where the algorithm lacks the desired ontology information,
user input is requested to update these values.
   It is worth noting that the separation of entities into syntactic (application independent but
grammar dependent) and semantic (application dependent but grammar independent) makes the
algorithm more flexible, allowing parts of it to be reused in environments other than the
description language considered here. The mapping from syntactic entities to their semantic
counterparts can either be completely automated with stricter rules or be accomplished with
user intervention and validation.

Example: From the sub-requirements ”[if the temperature of the battery is below Tmin] or
[if the temperature of the battery exceeds Tmax], [charging approval has to be withdrawn]”,
the identified Subjects (Battery_Temperature and Charging_Approval) are mapped as Signals and
the identified Objects (Tmin and Tmax) are mapped as Attributes. Here, the identified Action
withdrawn is also considered as an Attribute owing to the semantics of its corresponding Boolean
Signal. Additionally, we can arrive at the Actor for Battery_Temperature as battery. However,
the Actor of Charging_Approval is ambiguous (or rather unknown). Likewise, Attribute values
should either be passed by the user or they remain uninitialized in the resulting test case.
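A minimal sketch of this mapping, covering only the primitive rules of Table 1 plus the Boolean-action case from the example, could look as follows; the dictionary-based clause representation and the default actor name are illustrative assumptions:

```python
# Sketch of the syntactic-to-semantic mapping of Table 1. The real algorithm
# also uses action and verb types; only the most primitive rules are shown.

def to_semantic(clause):
    semantic = {
        "signal": clause.get("subject"),           # Subject -> Signal
        "attribute": clause.get("object"),         # Object  -> Attribute
        "actor": clause.get("actor", "unknown_actor"),
    }
    # A Boolean action (e.g. 'is withdrawn') becomes the attribute of its
    # corresponding Boolean signal.
    if clause.get("action_type") == "boolean":
        semantic["attribute"] = clause["action"]
    return semantic

clause = {"subject": "Charging_Approval", "action": "withdrawn",
          "action_type": "boolean"}
semantic = to_semantic(clause)
print(semantic)
```

As in the example above, the actor of Charging_Approval remains unknown, so the placeholder default is used until the user supplies it.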


    1
      Note that the algorithm maps syntactic to semantic entities with more complex rules (including action types
and verb types). In Table 1, we have presented only the most primitive ones for brevity. This difference is also
detailed in the example where an Action is considered as an Attribute and a Subject is mapped to a Signal.
3.4. Transformation to requirement model
For the description of the formal requirements a simple text-based domain-specific language
(DSL) is used, the ifak requirements description language (IRDL) [23]. This notation for require-
ment models was developed on the basis of UML sequence diagrams and is specially adapted to
the needs of describing requirements as sequences. The IRDL defines a series of model elements
(e.g. components, messages) with associated attributes (name, description, recipient, sender,
etc.) and special model structures (behavior based on logical operators or time). Functional,
behavior-based requirements are described textually using IRDL and can then be visualized
graphically as sequence diagrams (Fig. 2).
   Once the entities are mapped and validated, the algorithm forms IRDL relations for each
clause and then combines them together to form relations for the whole requirement. IRDL
defines mainly two types of relations:

   1) Incoming messages: the SUT receives these messages provided the guard expression evaluates
      to true and then continues to the next sequence in an interaction. IRDL defines this
      class of messages as ’Check’.
   2) Outgoing messages: the SUT sends these messages to other interaction participants with the
      content defined in the signal. In IRDL, these messages are denoted as ’Message’.
            Check(Actor->Component): Signal[guard expression];
            Message(Component->Actor): Signal(signal content);



   As an intermediate result, the user is shown the formulated IRDL relations along with the
sequence diagram corresponding to the requirement and is asked whether the IRDL and the
corresponding sequence diagram are correct. If the user wants to further modify the relation
formulation, the algorithm repeats from the mapping of syntactic entities to semantic entities.
This continues until the user confirms that the model is satisfactory.

Example: The IRDL relations for the example requirement ”If the temperature of the battery
is below Tmin or it exceeds Tmax, charging approval has to be withdrawn”, after the above-
mentioned steps, are shown in Fig. 2.
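The relation-formulation step itself amounts to assembling Check and Message lines from the mapped semantic entities. A sketch, with the IRDL syntax following the patterns shown above and illustrative entity values:

```python
# Sketch of IRDL relation formulation from mapped semantic entities.
# The two helper functions mirror the Check/Message patterns in the text.

def check_relation(actor, component, signal, guard):
    return f"Check({actor}->{component}): {signal}[{guard}];"

def message_relation(component, actor, signal, content):
    return f"Message({component}->{actor}): {signal}({content});"

irdl = [
    "State iState_001 at system;",
    check_relation("battery", "system", "Battery_Temperature",
                   "msg.value < Tmin || msg.value > Tmax"),
    message_relation("system", "unknown_actor", "Charging_Approval", "false"),
    "State fState_001 at system;",
]
print("\n".join(irdl))
```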

Textual representation (IRDL):

State iState_001 at system;
Check(battery->system): Battery_Temperature
  [msg.value < Tmin || msg.value > Tmax];
Message(system->unknown_actor):
  Charging_Approval(false);
State fState_001 at system;

[Graphical representation: sequence diagram with lifelines system, battery and unknown_actor, running from iState_001 to fState_001.]
Figure 2: Visualization of a requirement model in IRDL and as sequence diagram.
3.5. Model synthesis and test generation
The formalized requirements of the SUT can be combined into a specification model using existing
methods for model synthesis [23]. The sequence elements described before are transformed by a
rule-based algorithm into equivalent elements of a UML state machine.
  After model synthesis, test cases can be automatically generated from the state machine
using an existing method for model-based test generation [24]. After selecting a specific
graph-based coverage criterion such as ”all paths”, ”all decisions”, ”all places” or ”all
transitions”, the state machine is transformed into a special form of Petri net from which
abstract test cases in the form of sequence diagrams can be generated. In this way, the approach
allows modeling even complex system behavior and applying graph-based coverage criteria to the
entire system model.
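The intuition behind a path-based criterion can be illustrated with a toy graph: every acyclic path from the initial to the final state yields one abstract test case. The actual tool works on a Petri-net transformation of the UML state machine; the graph and state names here are invented:

```python
# Toy illustration of an 'all paths' style coverage criterion: enumerate
# every acyclic path through a tiny state machine; each path becomes one
# abstract test case. Not the TCG tool's actual algorithm.

def all_paths(transitions, state, final, path=()):
    path = path + (state,)
    if state == final:
        return [path]
    found = []
    for successor in transitions.get(state, []):
        if successor not in path:  # ignore cycles for this sketch
            found.extend(all_paths(transitions, successor, final, path))
    return found

machine = {"idle": ["checking"], "checking": ["approved", "denied"],
           "approved": ["done"], "denied": ["done"]}
test_cases = all_paths(machine, "idle", "done")
print(len(test_cases))  # 2
```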


4. Application
The toolchain for requirements-based model and test case generation presented in the previous
section is applied to an industrial use case from the e-mobility domain. The use case describes
a system for charging approval of an electric vehicle in interaction with a charging station.
The industrial use case was defined by AKKA and aims to provide a typical basic scenario and
development workflow in software development for an automotive electronic control unit (ECU).
It does so by defining requirements, using model-based software development and deploying
the functionality on an ECU.
   The use case has to be seen in the context of an electric car battery that is supposed to be
charged. The “charging approval” function decides, based on specific input signals, whether
the charging process of the battery is allowed. For example, charging approval is given or
withdrawn depending on the battery temperature, voltage or state of charge; the requested
current is adjusted according to the battery temperature; and error behavior is handled for
certain conditions. This is a continuous process, i.e. the signal values
may change over time. A more detailed technical description of the industrial use case can be
found in [25]. To fulfill the requirement of model-based software development, the module is
implemented in Matlab Simulink. Matlab Simulink Coder is used to generate C/C++ code that
can be compiled and deployed to the target. A Raspberry Pi is used to simulate some but not all
aspects of an ECU. A basic overview of the charging approval system and its interfaces to the
environment is given in Fig. 3.


[Figure: the charging approval component receives Velocity, Parking brake and Ignition as well as State of Charge and Temperature of Battery from the environment and outputs the Charging Approval signal.]
Figure 3: Process overview of charging approval system.
5. Results
The battery charging approval system described in the previous section is used to evaluate the
proposed method. We first define the evaluation metrics used and then present the results.
To our knowledge, there are no available tools with similar input and output properties that
would enable a direct comparison with our tool.

5.1. Evaluation metrics
Let 𝑅 be the set of textual requirements. For a requirement 𝑟 ∈ 𝑅, let 𝑋𝑟 be the set of expected
artifacts and 𝑌𝑟 be the set of generated artifacts. Here, artifacts refer to all the semantic entities
including the relation indicators. Let 𝑋 = ⋃𝑟∈𝑅 𝑋𝑟 denote the set of expected artifacts in all
requirements and 𝑌 = ⋃𝑟∈𝑅 𝑌𝑟 the set of generated artifacts in all requirements. Then we define
the following metrics to measure the performance of the method.
   1) Completeness: For an individual requirement, this metric denotes the number of ex-
      pected artifacts 𝑥 ∈ 𝑋𝑟 for which a corresponding (not necessarily identical) generated
      artifact 𝑦 ∈ 𝑌𝑟 exists, in relation to the total number of expected artifacts |𝑋𝑟 |.
   2) Correctness: For an individual requirement, this metric denotes the number of gener-
      ated artifacts 𝑦 ∈ 𝑌𝑟 for which a corresponding, semantically identical (up to naming
      conventions) expected artifact 𝑥 ∈ 𝑋𝑟 exists, in relation to the total number of generated
      artifacts |𝑌𝑟 |.
   3) Consistency: This metric denotes the number of generated artifacts 𝑦 ∈ 𝑌 for which a
      corresponding expected artifact 𝑥 ∈ 𝑋 exists and is used identically in all requirements
      𝑟 ∈ 𝑅, in relation to the total number of generated artifacts |𝑌 |.
The macro average for completeness and correctness, respectively, is then given by the mean
value of all individual percentage values for all 𝑟 ∈ 𝑅. The micro average is given by the sum of
all values in the numerator divided by the sum of all values in the denominator for all 𝑟 ∈ 𝑅.
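The two averages can be sketched directly from these definitions. The per-requirement (matched, total) counts below are illustrative numbers, not the paper's data:

```python
# Sketch of macro vs. micro averaging for completeness and correctness.
# Each pair holds (matched artifacts, total artifacts) for one requirement.

def macro_avg(counts):
    # mean of the per-requirement percentages
    return sum(matched / total for matched, total in counts) / len(counts)

def micro_avg(counts):
    # sum of all numerators divided by sum of all denominators
    return sum(m for m, _ in counts) / sum(t for _, t in counts)

counts = [(4, 5), (3, 3), (1, 2)]
print(round(macro_avg(counts), 3))  # 0.767
print(round(micro_avg(counts), 3))  # 0.8
```

Note how the micro average weights requirements by their number of artifacts, while the macro average treats every requirement equally.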

Example: To illustrate the evaluation metrics in detail, consider the requirement clause
’if the SoC of the battery is below SoC_max’.

      Expected:  Check(charging_management->system): Battery_SoC[msg.value < SoC_max];
      Generated: Check(battery->system): battery_SoC[msg.value < SoC_max];

   For the metric completeness, we check if all the expected artifacts (i.e. Check, charging_man-
agement, Battery_SoC, etc.) are generated by the algorithm. In this case, we can see that all
of them were generated. For obtaining the correctness, we check if those generated artifacts
are semantically correct. In this case, though we expect the actor charging_management, the
algorithm generates battery. This reduces the value of correctness. If the algorithm generates
battery_SoC for every occurrence of ’SoC of battery’ across all the requirements, it is considered
consistent for this artifact.
Table 2
Evaluation of the algorithm on the charging approval system
                              without domain knowledge         with domain knowledge
                             macro avg.     micro avg.        macro avg. micro avg.
           Completeness        78.2%          79.8%             81.4%        84.1%
           Correctness         74.9%          78.8%             78.3%        82.1%
           Consistency                  94.1%                          96.4%


5.2. Requirements formalization
As part of the demonstrated use case, AKKA has provided functional requirement documents
describing the expected behavior for the relevant SUT. To apply the NLP-based requirements
formalization method, each statement is treated as a separate entity for which a well-defined
requirement model is created. Overall, the charging approval SUT is described by 14 separate
requirement statements.
   The results of our evaluation are shown in Table 2. We determined the individual
completeness and correctness values and calculated the macro and micro averages for them.
We avoided double counting of identical entity detections so as not to skew the results. As
mentioned above, if an actor or the value of an attribute is not explicitly mentioned in the
textual requirement, it cannot be detected by the algorithm. Therefore, we also show the results
using domain knowledge, which could take the form of a predefined list of signals, attributes,
etc., or be integrated through direct interaction with an expert who knows the system.
   As one can observe, the method shows good results: most of the signals and other artifacts
were detected correctly and completely. Having a list of artifact declarations in advance
produces even more accurate predictions. Thus, our NLP-based approach shows good quality
and supports the generation of the formal requirement model to a significant extent.
   The time needed for model creation, with and without the provided tool, was not measured
directly. However, our experience from former use cases and the presented one shows that it
takes a requirements engineer considerable time to become familiar with the description language
for sequence diagrams by reading documentation and having discussions, to create the logical
structure, and to add all details to the model manually. The new semi-automated approach
greatly supports the user: it gives a first proposal of the requirement model in a textual and
graphical view and provides options for handling unclear points. It should therefore save a
lot of time, even though a manual review of the created model is still required.

5.3. Model synthesis and test generation
For the next step, the requirement models of the charging approval system were used as the
input for model synthesis using ifak’s prototypical tool ModGen. Since the NLP-based approach
treats every requirement as a separate entity, it is also necessary to connect the requirements by
modelling their boundaries explicitly. As a result, a graph-based representation of the functionality
as described by the requirements was generated in the form of a UML state machine (Fig. 4).
The generated model contains 6 states and 20 transitions with appropriate signals, guards and
actions. The semantic as well as syntactic validity of the generated UML state machine could
be confirmed by a thorough evaluation based on the initial requirement documents and by
checking for deadlocks and livelocks. It could be shown that no further manual editing of the
model is required for a full description of the behavior of the system.

Figure 4: UML state machine of charging approval (left) and a test case visualized as a sequence
diagram (right).
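A state machine of this kind can be represented as a small labelled transition graph, and the deadlock check then amounts to verifying that every state has an outgoing transition. The sketch below is only an illustration with hypothetical state, signal, guard and action names; the actual model is produced by ModGen from the requirement models.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    signal: str       # triggering signal
    guard: str = ""   # boolean condition, empty if none
    action: str = ""  # effect executed when the transition fires

@dataclass
class StateMachine:
    states: set
    initial: str
    transitions: list = field(default_factory=list)

    def has_deadlock(self) -> bool:
        """A state without any outgoing transition cannot be left
        again and is therefore a deadlock candidate."""
        sources = {t.source for t in self.transitions}
        return any(s not in sources for s in self.states)

# Hypothetical fragment of a charging approval state machine.
sm = StateMachine(states={"Idle", "CheckParams", "Approved"}, initial="Idle")
sm.transitions += [
    Transition("Idle", "CheckParams", "requestCharging"),
    Transition("CheckParams", "Approved", "paramsOk", guard="soc < 100"),
    Transition("Approved", "Idle", "stopCharging", action="approval := false"),
]
print(sm.has_deadlock())  # every state has an outgoing transition -> False
```

A livelock check would additionally require analyzing cycles that can never be exited, which is omitted here for brevity.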
   In this evaluation, ifak’s prototypical tool TCG was used with the coverage criterion “all-paths”,
which ensures that each possible path in the utilized model is covered by at least one
test case. By utilizing the existing algorithm for test generation, a total of 73 test cases were
generated. In Fig. 4, one of the generated test cases is visualized in the form of a sequence
diagram. Here, a test system (TS) interacts with the SUT (charging approval) and provides a
number of parameters, upon which the system decides if charging approval is given.
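The core of an “all-paths” criterion can be sketched as a bounded depth-first enumeration of transition paths, where each complete path yields one abstract test case. The mini-model below is hypothetical and much smaller than the actual charging approval state machine; loops are bounded by limiting how often a state may be revisited.

```python
def all_paths(transitions, start, final_states, max_visits=2):
    """Enumerate all transition-label paths from start to a final state,
    visiting each state at most max_visits times to bound loops.
    Each returned path corresponds to one abstract test case."""
    paths = []

    def dfs(state, path, visits):
        if state in final_states and path:
            paths.append(list(path))
        for src, tgt, label in transitions:
            if src == state and visits.get(tgt, 0) < max_visits:
                visits[tgt] = visits.get(tgt, 0) + 1
                path.append(label)
                dfs(tgt, path, visits)
                path.pop()
                visits[tgt] -= 1

    dfs(start, [], {start: 1})
    return paths

# Hypothetical mini-model: an approval request is either granted or denied.
transitions = [
    ("Idle", "Check", "requestCharging"),
    ("Check", "Approved", "paramsOk"),
    ("Check", "Denied", "paramsInvalid"),
]
for p in all_paths(transitions, "Idle", {"Approved", "Denied"}):
    print(" -> ".join(p))
# requestCharging -> paramsOk
# requestCharging -> paramsInvalid
```

On a model with cycles, such as the generated one with 6 states and 20 transitions, the visit bound determines how often loops are unrolled and thus how many test cases result.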
   Overall, it can be shown that valid abstract test cases are generated based on the specification
model. Using an appropriate framework for test case execution and a suitable test adapter, the
generated test cases could be used for validation of the functional behavior of the SUT.


6. Conclusion
In this work, an NLP-based method for machine-aided model generation from textual require-
ments is presented. The method is designed to cover a wide range of requirements formulations
without being restricted to a specific domain or format. Further, the generated requirement
models are given in a user-friendly, comprehensible textual and graphical representation in the
form of sequence diagrams.
   We evaluated our approach on the industrial use case of a battery charging approval system
and showed that the algorithm can produce complete, correct and consistent artifacts to a high
degree. We have also shown how these artifacts are then used to create sequence diagrams
for each requirement and transformed into a state machine for the entire specification model
to finally generate abstract test cases. With the proposed semi-automated approach, we aim
to reduce the human effort from manually creating test cases from textual requirements to
merely validating the generated requirement models. In future versions of this prototypical
implementation, we intend to refine the rule-based approach further, thus reducing the need for
manual modifications. One possible solution in this regard could be training a Named Entity
Recognition (NER) algorithm to identify the semantic entities, however at the cost of intensive
labelling work. Another solution could be to rely on (pre-trained) Semantic Role Labeling (SRL).
   This study is still research in progress, since even more complex textual requirements have
to be considered for future applications. The use of the methodology is also conceivable in
other domains, such as rail, industrial communication and automotive. In future work, we
therefore intend to analyze how the method can be improved to cover more application domains.


Acknowledgments
This research was funded by the German Federal Ministry of Education and Research (BMBF)
within the ITEA 3 projects TESTOMAT under grant no. 01IS17026G and XIVT under grant
no. 01IS18059E. We thank our former colleague Martin Reider and our research assistant Libin
Kutty from ifak for the valuable contributions to this paper. We also thank AKKA Germany
GmbH for providing an industrial use case for the evaluation of the presented method.

