Back to Basics: Extracting Software Requirements with a Syntactic Approach

         Matthew Caron                         Frederik S. Bäumer                     Michaela Geierhos
      University of Paderborn                University of Paderborn                University of Paderborn
      mcaron@campus.upb.de                  frederik.baeumer@upb.de                 michaela.geierhos@upb.de


                                                       Abstract
                       As our world grows in complexity, companies and employees alike need,
                       more than ever before, solutions tailored to their needs. Such tools do
                       not always exist and need to be designed from scratch. In this paper, we
                       present a syntactic rule-based extraction tool for software requirements
                       specification documents. Notably, our tool allows non-expert users to
                       express their software needs in unfiltered natural language.


1    Motivation
Since software solutions are not always available out of the box and often need to be developed anew, modern
techniques such as On-The-Fly Computing – i.e. “the automatic ad-hoc configuration of individual service
compositions that fulfil customer-specific requirements” [GSB15] – provide a faster solution to this problem and
could become an integral part of the software acquisition process. Nevertheless, projects involving custom-made
solutions all share the same fundamental challenge: extracting and understanding software requirements. We regard
this challenge as twofold. First, not all users are technology experts, which makes describing complex software
requirements extremely difficult for them. Second, because non-expert users do not always use appropriate
technical language, developers can find it difficult to grasp the intended meaning of their software requirements
specification documents.
   Requirements engineering, more precisely the extraction and understanding of users’ needs, plays an essential
role in software development [Che12]. Requirements must be extracted, processed, analysed, and, most importantly,
understood by either a machine or a human agent. This long-standing challenge is one that researchers have been
working hard to solve [VK08]. However, some current solutions are complicated and, in some cases, even require
users to describe the desired requirements in highly specific technical jargon. As noted above, non-expert users
may lack the technical expertise required to fill out complex service templates. We therefore present our vision
of a natural language software requirements extraction approach that allows the analysis of lengthy software
requirements specification documents written in unfiltered natural language, giving non-expert users the
opportunity to express themselves freely while providing developers with comprehensive, structured, and complete
information.

2    Concept
With data sparsity being a major obstacle, modern approaches based on supervised or unsupervised machine learning
are not feasible in our setting. As a result, our architecture is based on a syntactic rule-based algorithm
powered by a state-of-the-art dependency parser. In short, our approach aims at extracting every subject-verb-object
(SVO) triple from an input document. We consider this elementary approach to be an efficient and reliable way to
extract actions and requirements from text without the need for labelled data.
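To make the SVO idea concrete, here is a minimal sketch of a rule-based extractor, not the authors’ implementation.
It assumes the dependency parser has already run and that its output is reduced to a hypothetical `Token` record
(text, lemma, dependency label, head index). The labels follow the style of spaCy’s English models (`nsubj`,
`dobj`, `neg`); the Universal Dependencies object label `obj` is accepted as well. The single rule pairs every
subject dependent of a head with every object dependent of the same head and records whether a negation
dependent is attached, since a dropped “not” would invert a requirement. The parse in the example is written by
hand for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """One token of a dependency parse (hypothetical, parser-agnostic record)."""
    i: int       # position in the sentence
    text: str    # surface form
    lemma: str   # base form
    dep: str     # dependency label
    head: int    # position of the governing token; the root points to itself


@dataclass
class Triple:
    subject: str
    verb: str
    obj: str
    negated: bool = False


SUBJECT_LABELS = {"nsubj", "nsubjpass"}
OBJECT_LABELS = {"dobj", "obj"}  # spaCy-style and Universal Dependencies labels


def children(tokens: List[Token], head: Token) -> List[Token]:
    """Direct dependents of `head`, skipping the root's self-reference."""
    return [t for t in tokens if t.head == head.i and t.i != head.i]


def extract_svo(tokens: List[Token]) -> List[Triple]:
    """Emit one triple for every subject/object pair attached to the same head."""
    triples = []
    for verb in tokens:
        deps = children(tokens, verb)
        subjects = [t for t in deps if t.dep in SUBJECT_LABELS]
        objects = [t for t in deps if t.dep in OBJECT_LABELS]
        negated = any(t.dep == "neg" for t in deps)  # e.g. "should not delete"
        for s in subjects:
            for o in objects:
                triples.append(Triple(s.lemma, verb.lemma, o.lemma, negated))
    return triples


# Hand-built (assumed) parse of "The system should not delete user files."
sentence = [
    Token(0, "The", "the", "det", 1),
    Token(1, "system", "system", "nsubj", 4),
    Token(2, "should", "should", "aux", 4),
    Token(3, "not", "not", "neg", 4),
    Token(4, "delete", "delete", "ROOT", 4),
    Token(5, "user", "user", "compound", 6),
    Token(6, "files", "file", "dobj", 4),
    Token(7, ".", ".", "punct", 4),
]

print(extract_svo(sentence))
# [Triple(subject='system', verb='delete', obj='file', negated=True)]
```

Conjoined subjects or objects, passive agents, clausal complements, and compound modifiers (the “user” in “user
files”) are ignored here; covering every SVO triple in a document would require further rules of the same kind.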
Copyright © 2018 by the paper’s authors. Copying permitted for private and academic purposes.
   Given the scope of this research project, the work is divided into three phases. In the ongoing first phase,
the focus is on retrieving SVOs, relevant adjectives, complements, and negation words, as shown in Figure 1.
Moreover, since the order in which requirements appear in a document may not reflect their true logical order,
we experimented with different lexicon-based techniques for sorting all extracted software requirements into a
proper sequential order. Lastly, to address the problem of ambiguity, we decided to also generate supplementary
semantic information about the individual requirements using a word sense API.
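The specific lexicon-based ordering techniques are not described here, so the following is only one possible
illustration. It assumes a small hypothetical lexicon of sequence cue words (`CUE_RANKS`) that maps each cue to a
coarse position in a workflow; each requirement is ranked by the first cue word it contains, and requirements
without a cue receive a middle rank (`DEFAULT_RANK`). Both names and the lexicon entries are assumptions for this
sketch.

```python
import re
from typing import Dict, List

# Hypothetical lexicon: sequence cue word -> coarse position in a workflow.
CUE_RANKS: Dict[str, int] = {
    "first": 0, "initially": 0,
    "then": 1, "next": 1, "afterwards": 1, "subsequently": 1,
    "finally": 2, "lastly": 2,
}
DEFAULT_RANK = 1  # requirements without a cue word are placed in the middle


def rank(requirement: str) -> int:
    """Rank of the first cue word in the requirement, or DEFAULT_RANK if none occurs."""
    for word in re.findall(r"[a-z]+", requirement.lower()):
        if word in CUE_RANKS:
            return CUE_RANKS[word]
    return DEFAULT_RANK


def order_requirements(requirements: List[str]) -> List[str]:
    """Sort by cue rank; sorted() is stable, so ties keep their document order."""
    return sorted(requirements, key=rank)


document_order = [
    "Finally, the app sends a confirmation email.",
    "The user uploads a photo.",
    "First, the user creates an account.",
]

for requirement in order_requirements(document_order):
    print(requirement)
```

Because Python’s `sorted` is stable, requirements that share a rank stay in the order in which they were written.
A cue-word heuristic of this kind is easily misled, for example by “then” inside an if/then condition.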


                                         Figure 1: Architecture | Phase 1

   In the second phase, the focus will be on further refining the syntactic rule-based extraction algorithm with
the help of crowdsourced data collected during the first phase. To this end, our tool will be made publicly
available as a beta version, allowing us to collect data and feedback. The acquired data will also be used to
build a classification model for validating extracted software requirements. Since the SVO approach is purely
syntactic, we believe that such a validation step should be added to the processing pipeline. Additionally, we
will investigate replacing the lexicon-based technique used to sort extracted requirements with a more
contemporary solution.
   Lastly, the third and final phase will focus on solving common problems in software requirements extraction,
namely incompleteness, inconsistency, and vagueness [VK08]. Requirements found in a specification document may
conflict with one another or lack valuable information, so these issues must be studied as well.
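Inconsistency can take many forms. As a narrow, hypothetical illustration of just one of them, the sketch below
flags pairs of extracted requirements that share the same subject, verb, and object but differ in polarity (one
affirmative, one negated). The `Requirement` record and the `find_conflicts` helper are assumptions made for this
example, not components of the described tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record for one extracted requirement."""
    subject: str
    verb: str
    obj: str
    negated: bool
    source: str  # original sentence, kept for reporting


def find_conflicts(reqs: List[Requirement]) -> List[Tuple[Requirement, Requirement]]:
    """Return (affirmative, negated) pairs that share subject, verb, and object."""
    # Group by (subject, verb, object); slot 0 holds affirmative, slot 1 negated requirements.
    groups: Dict[Tuple[str, str, str], Tuple[list, list]] = defaultdict(lambda: ([], []))
    for r in reqs:
        groups[(r.subject, r.verb, r.obj)][int(r.negated)].append(r)
    conflicts = []
    for affirmative, negated in groups.values():
        for a in affirmative:
            for n in negated:
                conflicts.append((a, n))
    return conflicts


extracted = [
    Requirement("user", "delete", "file", False, "Users can delete their files."),
    Requirement("user", "delete", "file", True, "Users must not delete files."),
    Requirement("admin", "export", "report", False, "Admins export monthly reports."),
]

for a, n in find_conflicts(extracted):
    print(f"Conflict: {a.source!r} vs. {n.source!r}")
```

Only exact lemma matches are compared, so “delete” versus “remove” would go unnoticed; word sense information of
the kind generated in the first phase could, in principle, widen such matching.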

3   Conclusion
Our approach aims to facilitate software development processes by giving (non-expert) users the opportunity to
voice their needs and wishes in unfiltered natural language. Because the tool’s architecture is based on a
syntactic rule-based approach, it has the advantage of not being domain-specific: the origin of an input document
should not affect the overall performance of the extraction algorithm. As outlined above, our research project
will focus on overcoming common yet arduous challenges in software requirements engineering, namely ambiguity,
incompleteness, inconsistency, and vagueness. Finally, we believe that sequential ordering is essential to any
software requirements extraction task; it will therefore be examined thoroughly.

Acknowledgements
This work was partially supported by the German Research Foundation (DFG) within the Collaborative Research
Centre “On-The-Fly Computing” (SFB 901).

References
[Che12] Murali Chemuturi. Requirements Engineering and Management for Software Development Projects.
        Springer Science & Business Media, New York, USA, 2012.

[GSB15] Michaela Geierhos, Sabine Schulze, and Frederik Simon Bäumer. What did you mean? Facing the
        Challenges of User-generated Software Requirements. In Proceedings of the 7th International Conference
        on Agents and Artificial Intelligence, pages 277–283, Lisbon, Portugal, 10–12 January 2015.

[VK08]  Kunal Verma and Alex Kass. Requirements Analysis Tool: A Tool for Automatically Analyzing Software
        Requirements Documents. In The Semantic Web – ISWC 2008, pages 751–763, 2008.