<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">OpenReq-DD: A Requirements Dependency Detection Tool</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Quim</forename><surname>Motger</surname></persName>
							<email>jmotger@essi.upc.edu</email>
							<affiliation key="aff0">
								<orgName type="department">ESSI Dept</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Cristina</forename><surname>Palomares</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">ESSI Dept</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ricard</forename><surname>Borrull</surname></persName>
							<email>rborrull@essi.upc.edu</email>
							<affiliation key="aff1">
								<orgName type="department">CS Dept</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jordi</forename><surname>Marco</surname></persName>
							<email>jmarco@cs.upc.edu</email>
							<affiliation key="aff2">
								<orgName type="institution">Universitat Politecnica de Catalunya</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">OpenReq-DD: A Requirements Dependency Detection Tool</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">AAAD5C97D15C4BB284A3E0C5BC256667</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Requirements Engineering (RE) is one of the most critical phases in software development. Analyzing requirements data is a laborious task performed by expert stakeholders using manual processes, as there are no standard automatic tools to handle this issue in a more efficient way. The purpose of this paper is to summarize the approach of the OpenReq-DD dependency detection tool, developed within the OpenReq project, which enables automatic requirement dependency detection. The core of this proposal is an ontology that defines dependency relations between specific terminologies related to the domain of the requirements. Using this information, it is possible to apply Natural Language Processing techniques to extract meaning from these requirements and relations, and Machine Learning techniques to apply conceptual clustering, with the major purpose of classifying these requirements into the defined ontology.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>One of the most critical branches of software engineering is Requirements Engineering (RE) <ref type="bibr" target="#b0">[1]</ref>. Many different problems relate to this area, one of them being requirements traceability (RT), known as the "ability to describe and follow the life of a requirement, in both forwards and backwards direction" <ref type="bibr" target="#b1">[2]</ref>. Traceability makes it possible to identify dependencies between requirements and to assess how changes in one requirement may affect others related to it. Identifying dependencies between requirements, usually specified in Natural Language (NL), is considered a difficult, expensive and long process <ref type="bibr" target="#b2">[3]</ref> <ref type="bibr" target="#b3">[4]</ref>.</p><p>In order to provide an automated solution for the dependency detection step, we introduce OpenReq-DD, a requirements dependency detection tool that applies Natural Language Processing (NLP) and Machine Learning (ML) techniques to analyze requirements and extract dependencies between them. This tool has been developed inside OpenReq <ref type="bibr" target="#b4">[5]</ref>, an EU Horizon 2020 project that aims to provide advanced innovative tools for community-driven RE.</p><p>In this context we present a use case based on data provided by Siemens (Austria), one of the industrial partners of OpenReq. This data is a set of documents called Requests for Proposals (RFP) in the railway systems domain, which comprise NL requirements and can be several hundred pages long. Furthermore, stakeholders do not keep track of current dependencies. Therefore, extracting these dependencies manually would be a very laborious task. 
For a better comprehension of how the tool operates, one of these requirements is used as a running example throughout the description of the steps carried out by the tool.</p><p>The OpenReq-DD architecture is composed of a RESTful service as the main component, which exposes an API to provide the required data and perform the dependency detection algorithm. This design follows a microservice architecture, defining an isolated, decoupled component that can be used in different contexts. This component integrates, as internal dependencies, all required toolkits and frameworks for the different algorithm steps (see Fig. <ref type="figure" target="#fig_0">1</ref>). For demonstrations, a simple GUI is provided (see Sec. 3). The OpenReq-DD project is available at GitHub (https://github.com/OpenReqEU/dependency-detection), including a README file with all required information to deploy and run the tool.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the sequence of steps that OpenReq-DD performs to extract requirement dependencies. In the following, we describe the needed input data (Sec. 2.1) and give a brief description of each stage (Sec. 2.2). The dependency detection algorithm designed and developed in OpenReq-DD requires two types of data to perform the dependency extraction. The format of this data has been defined and discussed with the stakeholders of the OpenReq project.</p><p>• Requirements list. A list of requirements from which to extract dependencies is provided. The JSON Schema used for the input and the output response is available at https://goo.gl/Gx8vpJ</p><p>• Ontology. The ontology provides the tool with the required knowledge about the general patterns and dependency types that may constitute dependencies between requirements. It is the result of a study performed by stakeholders of the project. The ontology knowledge is structured in a dependency relations tree, where each node is a topic and each edge a dependency relation type (see Fig. 
<ref type="figure" target="#fig_1">2</ref>). An example of this ontology is available at https://goo.gl/Hx6GS2</p><p>The output of the RESTful service is a JSON response using the same format as the input data, but with the set of detected dependencies included. </p></div>
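For illustration, a minimal requirements list in the spirit of that schema might look as follows. The field names here are assumptions made for the sake of the example; the authoritative definition is the JSON Schema linked above.

```json
{
  "requirements": [
    { "id": "R1", "text": "The parameters for OBU must be given by RBC." },
    { "id": "R2", "text": "The RBC shall store the configuration parameters." }
  ],
  "dependencies": []
}
```

The response reuses this structure, with the `dependencies` array filled in by the tool.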
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Preprocessing</head><p>In order to reduce deficiencies in the data set, two preprocessing methods are applied to the requirements.</p><p>• Sentence Boundary Disambiguation (SBD). Sentence detection is applied to extract isolated sentences from each requirement, by deciding where each sentence begins and ends. The Apache toolkit OpenNLP <ref type="bibr" target="#b5">[6]</ref> is used for this purpose.</p><p>• Noisy Text Cleaning. After SBD, a total of 14 rules are applied to clean the text of each sentence. The cleaning includes, among others: removal of alphabetic, numeric and Roman-numeral list markers; removal of acronyms that may appear at the beginning of a requirement; removal of escape sequence characters; and addition of white spaces to prevent PoS tagger faults (e.g., around parentheses or question marks).</p></div>
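The two preprocessing methods can be sketched as follows. This is a simplified, self-contained approximation: OpenReq-DD delegates sentence detection to OpenNLP's trained models, and its actual 14 cleaning rules are not reproduced here, so both the splitting regex and the three rules below are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class Preprocess {

    // Naive sentence boundary disambiguation: split after '.', '!' or '?'.
    // The real tool uses OpenNLP's trained sentence detector instead.
    public static List<String> splitSentences(String requirement) {
        List<String> sentences = new ArrayList<>();
        for (String s : requirement.split("(?<=[.!?])\\s+")) {
            if (!s.trim().isEmpty()) sentences.add(s.trim());
        }
        return sentences;
    }

    // Three cleaning rules in the spirit of the 14 described above
    // (the exact OpenReq-DD rule set is not reproduced here).
    public static String clean(String sentence) {
        String s = sentence;
        // remove alphabetic, numeric or Roman-numeral list markers: "1.", "a)", "iv."
        s = s.replaceAll("^\\s*(\\d+|[a-z]|[ivx]+)[.)]\\s*", "");
        // normalize escape sequence characters to plain spaces
        s = s.replaceAll("[\\t\\n\\r]+", " ");
        // pad parentheses with white space to prevent PoS tagger faults
        s = s.replace("(", " ( ").replace(")", " ) ");
        return s.replaceAll("\\s+", " ").trim();
    }
}
```

For instance, `clean("a) The OBU shall respond.")` strips the list marker, and `splitSentences` turns a multi-sentence requirement into isolated sentences before cleaning.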
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">Syntax Analysis</head><p>Given the preprocessed requirements, a syntax analysis is executed in order to extract words that are potential candidates for a match with concepts of the ontology.</p><p>• Tokenization. The input sentence is split into single words using the OpenNLP toolkit.</p><p>Sentence: "The parameters for OBU must be given by RBC." Tokens: "The", "parameters", "for", "OBU", "must", "be", "given", "by", "RBC", "."</p><p>• PoS tagging. Each token of the sentence is marked with a part-of-speech tag using the NLP4J toolkit <ref type="bibr" target="#b6">[7]</ref>.</p><p>The parameters for OBU must be given by RBC .</p><p>DT NNS IN NP MD VB VBN IN NNP .</p><p>• Dependency Parser. Using the NLP4J toolkit, a dependency tree is generated where each node is a token of the input sentence and edges are the relations between parent words and child words.</p><p>• Information Extraction. In this step, the words used to categorize each requirement into the ontology are detected. To do so, patterns that take into account the position of the word in the sentence and its PoS tag are applied. To extract these patterns, a study was performed with existing datasets: the most representative words of each requirement were detected, and patterns were then derived from the positions of these representative words in the sentences.</p><p>• N-Grams Generation. The matched patterns inside the dependency tree are analyzed in order to generate n-grams of directly connected nodes within the tree, composed of a set of keywords that encapsulate a broader concept, a general idea beyond the individual meaning of each keyword. See Fig. <ref type="figure" target="#fig_5">4</ref> for an example.</p></div>
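A minimal sketch of the candidate-term extraction steps above: whereas OpenReq-DD walks the dependency tree produced by NLP4J, this flat approximation groups adjacent noun-like tokens into n-grams, which is enough to recover "parameters", "OBU" and "RBC" from the running example. Both the whitespace tokenizer and the tag-based grouping are stand-ins, not the tool's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class SyntaxAnalysis {

    // Whitespace tokenization stand-in for the OpenNLP tokenizer:
    // detach trailing punctuation, then split on white space.
    public static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.replaceAll("([.,;?])", " $1").split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Candidate-term extraction: group maximal runs of noun-like tags into
    // n-grams. OpenReq-DD builds n-grams from directly connected nodes of the
    // dependency tree; this flat, tag-based pass only sketches the idea.
    public static List<String> nounNGrams(List<String> tokens, List<String> tags) {
        List<String> ngrams = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (tags.get(i).startsWith("NN") || tags.get(i).equals("NP")) {
                if (current.length() > 0) current.append(' ');
                current.append(tokens.get(i));
            } else if (current.length() > 0) {
                ngrams.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) ngrams.add(current.toString());
        return ngrams;
    }
}
```

Applied to the running example with the tag sequence DT NNS IN NP MD VB VBN IN NNP ., this yields the candidate terms "parameters", "OBU" and "RBC".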
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3">Semantic Analysis</head><p>Semantic analysis is the process that interprets the language meaning (i.e., the topic concept) of the whole text. The main goal is to obtain the meaning of each word of the n-gram, and join them to get a unique meaning that can be matched with the concepts of the ontology.</p><p>• Lemmatization. The morphological analyzer included in the NLP4J framework applies several rules (based on a large dictionary and several advanced heuristics) to extract the lemma of each token. These lemmas allow us to compare different words with the same lexeme.</p><p>• Semantic similarity. The DKPro-Similarity framework <ref type="bibr" target="#b7">[8]</ref> is used for word-pair similarity detection in order to improve the lemmatization process, by identifying tokens with a high similarity score that are not recognized as part of the same lexeme. This step is especially useful for synonyms and similar meanings, which are analyzed using the lexical database WordNet <ref type="bibr" target="#b8">[9]</ref>.</p></div>
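The lemma comparison and similarity scoring can be sketched with simple stand-ins. The toy lemmatizer and the character-bigram Jaccard measure below only approximate what NLP4J and DKPro-Similarity provide; in particular, a purely string-based score cannot capture synonyms the way the WordNet-backed measure does.

```java
import java.util.HashSet;
import java.util.Set;

public class SemanticAnalysis {

    // Toy lemmatizer (stand-in for NLP4J's dictionary-based morphological
    // analyzer): handles only two English plural patterns.
    public static String lemma(String token) {
        String t = token.toLowerCase();
        if (t.endsWith("ies")) return t.substring(0, t.length() - 3) + "y";
        if (t.endsWith("s") && !t.endsWith("ss")) return t.substring(0, t.length() - 1);
        return t;
    }

    // Toy similarity: Jaccard overlap of character bigrams, in [0, 1].
    public static double similarity(String a, String b) {
        Set<String> sa = bigrams(a.toLowerCase()), sb = bigrams(b.toLowerCase());
        Set<String> inter = new HashSet<>(sa); inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa); union.addAll(sb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // Two tokens are considered equivalent when their lemmas coincide, or when
    // a high similarity score rescues a pair the lemmatizer missed.
    public static boolean sameMeaning(String a, String b, double threshold) {
        return lemma(a).equals(lemma(b)) || similarity(a, b) >= threshold;
    }

    private static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) out.add(s.substring(i, i + 2));
        return out;
    }
}
```

The `sameMeaning` combination mirrors how the similarity step backs up lemmatization, though here the second check only detects surface-level closeness.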
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.4">Ontology Categorization</head><p>OpenReq-DD uses conceptual clustering to classify requirements into the different concepts of the input ontology according to their features. For each n-gram obtained in the semantic analysis, the following rules are applied.</p><p>• First, find exact matches between word combinations of the n-gram and the n-grams of the ontology.</p><p>• If there is no match, find exact matches between combinations of the lemmas extracted from the n-gram and the lemmas extracted from the input ontology.</p><p>• If there is still no match, calculate the semantic relatedness between the lemmas of the n-gram and those of the input ontology. The requirement is matched if the value is greater than a provided threshold.</p><p>• If none of the previous conditions is satisfied, the requirement is discarded as an individual in the ontology.</p><p>The result of this step is the ontology populated with requirement individuals.</p></div>
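The four-rule cascade can be sketched as follows, assuming ontology concepts are given as a flat list of terms. The `lemma` and `similarity` helpers are the same simplified stand-ins as in the earlier sketches, and the threshold value is an assumption, since the tool takes it as a provided parameter.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class OntologyMatcher {

    // Assumed value; in the tool the threshold is a provided parameter.
    static final double THRESHOLD = 0.75;

    // The four-rule cascade described above: exact match, then lemma match,
    // then semantic relatedness above a threshold, else discard.
    public static Optional<String> classify(String ngram, List<String> ontologyTerms) {
        for (String term : ontologyTerms)                                  // rule 1: exact match
            if (term.equalsIgnoreCase(ngram)) return Optional.of(term);
        for (String term : ontologyTerms)                                  // rule 2: lemma match
            if (lemma(term).equals(lemma(ngram))) return Optional.of(term);
        for (String term : ontologyTerms)                                  // rule 3: relatedness
            if (similarity(lemma(term), lemma(ngram)) >= THRESHOLD) return Optional.of(term);
        return Optional.empty();                                           // rule 4: discard
    }

    // Toy lemmatizer: strips a trailing plural "s" (stand-in for NLP4J).
    static String lemma(String t) {
        t = t.toLowerCase();
        return (t.endsWith("s") && !t.endsWith("ss")) ? t.substring(0, t.length() - 1) : t;
    }

    // Toy relatedness: Jaccard overlap of character bigrams, in [0, 1]
    // (stand-in for the WordNet-backed DKPro-Similarity measure).
    static double similarity(String a, String b) {
        Set<String> sa = bigrams(a), sb = bigrams(b);
        Set<String> inter = new HashSet<>(sa); inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa); union.addAll(sb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) out.add(s.substring(i, i + 2));
        return out;
    }
}
```

An empty `Optional` corresponds to the last rule: the requirement n-gram is discarded as an individual in the ontology.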
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.5">Dependency Extraction</head><p>Finally, each pair of classes in the ontology linked by a dependency relation is analyzed, extracting their instances (i.e., the requirement individuals) to find the existing dependencies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Demo plan</head><p>In this section we provide details about a demo plan of the dependency detection analysis using OpenReq-DD.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Environment configuration</head><p>For a demo execution, it is necessary to run both the OpenReq-DD web service and the Java GUI application. The GUI component is a presentation layer that simplifies the communication with the Dependency-Detection RESTful service. It allows the user to execute a dependency detection analysis in a simplified way, abstracting away the HTTP communication and output data interpretation on the client side.</p><p>As input data we need a list of requirements and an ontology. Examples of both files are referenced in Sec. 2.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Dependency detection execution</head><p>Figures <ref type="figure" target="#fig_7">5a and 5b</ref> show the main views of the OpenReq-DD GUI application. The user is asked to upload two files: the ontology structure file (*.owl file) and the requirements list file (*.json file). Once these files have been uploaded, the user can initiate the dependency analysis by clicking on the "Extract dependencies" button. After this interaction, the whole process of dependency detection starts: the back-end of the tool applies the steps explained in Sec. 2.2. The RESTful service returns a JSON-formatted response including the requirements and the list of detected dependencies. Through the GUI component, the list of dependencies is shown in a table including, for each dependency, three items: the source of the dependency; the dependency type; and the target of the dependency.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Siemens use case results</head><p>We use the 6 RFP documents of the Siemens use case to validate OpenReq-DD and the achievement of its objectives in terms of efficiency and efficacy of requirements dependency detection.</p><p>Tables <ref type="table" target="#tab_1">1 and 2</ref> present a summary of both the automatically generated results of the OpenReq-DD tool and the stakeholders' manual validation. On the left side, we introduce the RFPs used for the Siemens use case, the total number of requirements of each RFP, and the number of dependencies extracted by our tool. These results were manually analyzed by the stakeholders' experts using three statistical measures: the precision of true positive detected dependencies, the precision with a refinement possibility (Precision-R) of true positive detected dependencies, and the imprecision of the detected dependencies, which is related to false positive outcomes. The results are good, but there is still room for improvement. As future work, we consider exploring dependencies beyond semantic similarity, using other natural language criteria, and extracting relevant words from requirements using ML techniques (e.g., topic modelling). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Document Requirements Dependencies</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Dependency detection overview</figDesc><graphic coords="2,55.06,185.82,500.58,111.14" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Ontology structure example 2.2 Dependency detection process Given the input data, OpenReq-DD initiates the dependency extraction by following the next steps (which are transparent to the user).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>( a )</head><label>a</label><figDesc>Dependency parser result (b) Deepest path for term extraction</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Dependency tree</figDesc><graphic coords="3,126.60,340.89,154.71,127.56" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>(a) A parent with two direct term children to merge in a unique set of words.(b) A parent with one direct term child that has its own set of words to be merged in a unique set of words.(c) A parent with one direct term child to merge in a set of words, and another separated set of words from the other not relevant child.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: N-gram generation</figDesc><graphic coords="4,213.64,54.07,115.43,126.97" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: OpenReq-DD GUI</figDesc><graphic coords="5,126.27,54.07,152.05,85.04" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Detected dependencies within provided RFPs</figDesc><table><row><cell>RFP1</cell><cell>1209</cell><cell>4453</cell><cell></cell></row><row><cell>RFP2 RFP3 RFP4 RFP5</cell><cell>6880 1209 1209 1209</cell><cell>12242 182 167 39</cell><cell>Precision Precision-R 7.7% 89.2% Imprecision 3.1%</cell></row><row><cell>RFP6</cell><cell>1209</cell><cell>1261</cell><cell></cell></row><row><cell>Total</cell><cell>14714</cell><cell>18344</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Dataset results</figDesc><table /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The work presented in this paper has been conducted within the scope of the Horizon 2020 project OpenReq, which is supported by the European Union under the Grant Nr. 732463.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The three dimensions of requirements engineering: A framework and its applications</title>
		<author>
			<persName><forename type="first">K</forename><surname>Pohl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="243" to="258" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An analysis of the requirements traceability problem</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">C Z</forename><surname>Gotel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">W</forename><surname>Finkelstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of IEEE International Conference on Requirements Engineering</title>
				<meeting>of IEEE International Conference on Requirements Engineering</meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="94" to="101" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Automated identification of ltl patterns in natural language requirements</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Nikora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Balcom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">20th International Symposium on Software Re-liability Engineering</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="185" to="194" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Natural Language Processing for Requirements Engineering: The Best Is Yet to Come</title>
		<author>
			<persName><forename type="first">Fabiano</forename><surname>Dalpiaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alessio</forename><surname>Ferrari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xavier</forename><surname>Franch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cristina</forename><surname>Palomares</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Software</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="115" to="119" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<ptr target="https://openreq.eu/" />
		<title level="m">OpenReq, Project</title>
				<imprint>
			<biblScope unit="page" from="2019" to="2020" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="http://opennlp.apache.org" />
		<title level="m">Apache OpenNLP Toolkit</title>
				<imprint>
			<biblScope unit="page" from="2019" to="2020" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="https://emorynlp.github.io/nlp4j/" />
		<title level="m">NLP Toolkit for JVM Languages (NLP4J), Part-of-speech Tagging</title>
				<imprint>
			<biblScope unit="page" from="2019" to="2020" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="https://dkpro.github.io/dkpro-similarity" />
		<title level="m">DKPro Similarity</title>
				<imprint>
			<biblScope unit="page" from="2019" to="2020" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">WordNet -A Lexical Database for English</title>
		<ptr target="https://wordnet.princeton.edu/" />
		<imprint>
			<biblScope unit="page" from="2019" to="2020" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
