=Paper=
{{Paper
|id=Vol-1554/PD_MoDELS_2015_paper_11
|storemode=property
|title=REAssistant: a Tool for Identifying Crosscutting Concerns in Textual Requirements
|pdfUrl=https://ceur-ws.org/Vol-1554/PD_MoDELS_2015_paper_11.pdf
|volume=Vol-1554
|authors=Alejandro Rago,Claudia Marcos,J. Andres Diaz-Pace
|dblpUrl=https://dblp.org/rec/conf/models/RagoMD15a
}}
==REAssistant: a Tool for Identifying Crosscutting Concerns in Textual Requirements==
Alejandro Rago∗†, Claudia Marcos∗‡, J. Andrés Diaz-Pace∗†
∗ Instituto Superior de Ingeniería de Software (ISISTAN-UNICEN), Tandil, Buenos Aires, Argentina
† CONICET, Argentina
‡ CIC, Buenos Aires, Argentina
{arago,cmarcos,adiaz}@exa.unicen.edu.ar
Abstract: Use case modeling is very useful to capture requirements and communicate with the stakeholders. Use cases normally have textual specifications that describe the interactions between the system and external actors. However, since use cases are specified from a functional perspective, concerns that do not fit this decomposition criterion well are kept away from the analysts’ eye and might end up intermingled in multiple use cases. These crosscutting concerns (CCCs) are generally relevant for analysis, design and implementation activities and should be dealt with from early stages. Unfortunately, identifying such concerns by hand is a cumbersome and error-prone task, mainly because it requires a semantic interpretation of textual requirements. To ease the analysis of CCCs, we have developed an automated tool called REAssistant that is able to extract semantic information from textual use cases and reveal candidate CCCs, helping analysts to reason about them before making important commitments in the development. Our tool performs a series of advanced NLP analyses based on the UIMA framework. Analysts can define concern-specific queries in the tool to search for CCCs in the requirements via a flexible SQL-like language. In this article, we briefly discuss the technologies behind the tool and explain how an end user can interact with REAssistant to analyze CCCs in use case specifications. A short video explaining the main features of the tool can be found at https://youtu.be/i3kSJil_2eg. The REAssistant tool can be downloaded from https://code.google.com/p/reassistant.

I. INTRODUCTION

Most software systems have certain concerns that are key for the success of a project [1]. These concerns are often related to business goals of the system (e.g., profit, market opportunities or brand positioning) and quality attributes (e.g., performance, fault tolerance or security) [2]. Since many requirements modeling techniques (e.g., use cases) are based on a functional decomposition criterion, some concerns are likely to be hidden in textual specifications, tangled with functionality and scattered across documents. These concerns are referred to as crosscutting concerns (CCCs) [1]. For example, an access control policy (part of a security quality attribute) can subtly appear in multiple use cases, and might be overlooked by analysts and architects during the system design. Since requirements are commonly documented in natural language, analysts and developers must peruse textual specifications to reveal CCCs of interest for further analysis. Still, searching for latent concerns in requirements is a difficult and time-consuming task, mainly due to the semantics and ambiguities of natural language.

In this context, it is useful for analysts to rely on tool support for processing requirements and identifying CCCs. Such a tool should be able to quickly gather a list of candidate CCCs from the text (e.g., integrity, synchronization, access control) and present it to the analysts. Then, it is up to the analysts to inspect the list to determine which CCCs are actually relevant. There are several concern mining tools available that use Natural Language Processing (NLP) techniques and domain-specific dictionaries (e.g., taxonomies of quality attributes) [3]–[5]. Unfortunately, these tools have trouble identifying portions of functionality implicitly (or indirectly) affected by CCCs because they have poor semantic capabilities when processing textual requirements.

To overcome these limitations, we have developed the REAssistant (REquirements Analysis Assistant) tool [6]. Our tool supports the search of latent CCCs by relying on advanced NLP modules and domain knowledge about use cases. REAssistant uses an annotation-based representation of use cases that holds lexical, syntactic, semantic and domain information of the text. Furthermore, our tool is equipped with an NLP pipeline assembled with the UIMA framework that decorates the use cases with annotations [7]. The pipeline performs several linguistic analyses on the text, such as dependency parsing, semantic role labeling and domain action classification [6]. To find candidate concerns, the tool provides customizable concern-specific rules that query the annotations generated earlier to extract not only CCCs but also their crosscutting relations (i.e., requirements affected by CCCs). The rules take advantage of the so-called "domain actions", which are a taxonomy of domain-neutral classes applicable to use cases. Finally, our tool is implemented as a set of Eclipse plugins that provide mechanisms for the analysis of concerns, including special views for visualizing CCCs at different levels of granularity.

The rest of this article is organized in three sections. Section II explains the concern discovery problem with a motivating example. Section III briefly discusses the architectural design of the tool and its components. Finally, Section IV provides a quick tour of the working of REAssistant from the viewpoint of an analyst.
Figure 1. Concern impacts in use cases
Figure 2. Overview of the Architecture of REAssistant

II. REVEALING LATENT CONCERNS IN USE CASES

Identifying CCCs in use case specifications demands a careful manual inspection by analysts, as well as a dose of domain experience. There are three main activities that analysts should perform: i) finding candidate concern(s), ii) determining the “real” concerns and specifying them, and iii) identifying the points of the specification (e.g., use case steps) affected by each concern, i.e., finding its crosscutting relations or impacts. Let us consider the excerpts from three use cases shown in Figure 1. The sentences of UCS1 and UCS3 qualify functionality with phrases such as “in less than 10 seconds” and “as fast as possible”, which are hints of a PERFORMANCE concern. Thus, an analyst can interpret that there is a PERFORMANCE CCC at play, and search for performance-related words to quickly expose the concern. We call these explicit references in the text direct impacts of the concern. Several tools currently support keyword-based searches to mine them [3]–[5].

However, there might be other use cases implicitly affected by the same concern as well. For instance, an experienced analyst could determine that one of the steps of UCS2, referring to a “computation”, is also constrained by the PERFORMANCE CCC. This relation can be discovered after making a semantic analysis of the text, rather than a lexical or syntactical analysis. We call these implicit relations in the text indirect impacts of the concern. Indirect impacts are usually harder to detect because they require an interpretation of the semantics in textual requirements. From an automation viewpoint, indirect impacts can be (approximately) detected by uncovering associations between specific concerns and “abstract” actions (e.g., compute, calculate, perform, execute) expressed in use cases, because such associations often hold the key for recognizing concern impacts. However, it is the analysts’ responsibility to determine if a sentence is truly affected by a CCC. The role of tool support is then to recommend potential CCCs and let the analyst make the final decisions. Unfortunately, existing tools for mining concerns have problems detecting indirect impacts, because their semantic-level features are limited.

III. ARCHITECTURE OF REAssistant

REAssistant is built on the Eclipse IDE as a set of plugins that support both the linguistic analysis of textual use cases and the execution of search rules for identifying concerns [6]. Figure 2 shows the main components of the architecture, namely: UIMAProcessingEngine, QueryingEngine, and REAssistantGUI. The communication among these components takes place through files that contain serialized EMF models (see http://www.eclipse.org/modeling/emf/). Use cases are first imported into the UIMAProcessingEngine via the UseCaseReader component, which gathers text from varied sources, such as PDF, DOC, HTML files, or directly from the use case editor bundled in REAssistant. Then, the text is stored in the AnnotationSchema, which is a shared data structure that allows the communication of text analytic modules. Once imported, a pipeline of Annotators takes the text from use cases, breaks it into individual sentences (e.g., behavior steps), and automatically generates different annotations for these sentences. There are two kinds of annotators. The first set of annotators runs a series of NLP tasks that include standard linguistic analyses (e.g., stemming, POS tagging). The second set of annotators runs more complex analyses (e.g., semantic role labeling) for extracting the predicate structure of the sentences and for mapping these predicates to domain actions. The computation of domain actions is performed with a special classifier reported in [6]. Furthermore, an NLP specialist can configure annotators via UIMA before running the pipeline. The resulting annotations are later exported to an EMF model. The QueryingEngine is equipped with the concern-specific search rules that were defined beforehand by experts. By analyzing the use cases and their annotations, the QueryingEngine executes the search rules on the text of use cases. Finally, the REAssistantGUI features different views for browsing candidate CCCs and their impacts. This component provides editing and visualization support for the analyst to explore and refine the concerns found.
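To make the data flow of Section III more concrete, the following minimal Python sketch mimics a pipeline of annotators that communicate through a shared annotation structure. It is only an illustration under stated assumptions: it does not use the UIMA API, the class and function names (Annotation, AnnotationSchema, run_pipeline, and the three annotators) are ours, and the small ACTION_LEXICON lookup table is a hypothetical stand-in for the domain action classifier reported in [6].

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str      # "sentence", "token" or "domain_action"
    begin: int     # start offset of the labeled region in the text
    end: int       # end offset (exclusive)
    features: dict = field(default_factory=dict)


@dataclass
class AnnotationSchema:
    """Shared structure: the text plus all annotations produced so far."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]

    def covered(self, ann):
        return self.text[ann.begin:ann.end]


def sentence_annotator(schema):
    # Naive splitter: a sentence runs up to and including '.', '!' or '?'.
    for m in re.finditer(r"\s*([^.!?]+[.!?])", schema.text):
        schema.annotations.append(Annotation("sentence", m.start(1), m.end(1)))


def token_annotator(schema):
    # First kind of annotator: basic lexical analysis inside each sentence.
    for s in schema.select("sentence"):
        for m in re.finditer(r"\w+", schema.covered(s)):
            schema.annotations.append(Annotation(
                "token", s.begin + m.start(), s.begin + m.end(),
                {"lower": m.group().lower()}))


# Hypothetical lookup table standing in for the classifier of [6].
ACTION_LEXICON = {"computes": "calculation", "processes": "process"}


def domain_action_annotator(schema):
    # Second kind of annotator: builds on the tokens produced earlier.
    for t in schema.select("token"):
        action = ACTION_LEXICON.get(t.features["lower"])
        if action:
            schema.annotations.append(Annotation(
                "domain_action", t.begin, t.end, {"action": action}))


def run_pipeline(text, annotators):
    schema = AnnotationSchema(text)
    for annotate in annotators:  # order matters: later stages read earlier output
        annotate(schema)
    return schema


schema = run_pipeline(
    "The user requests a report. The system computes the totals.",
    [sentence_annotator, token_annotator, domain_action_annotator])
for a in schema.select("domain_action"):
    print(schema.covered(a), "->", a.features["action"])
```

The point of the sketch is the design choice rather than the toy analyses: each annotator only reads from and writes to the shared structure, so annotators can be reordered, replaced or configured independently of one another.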
Figure 3. Annotations in a use case sentence
Figure 4. Rule syntax for searching CCCs
Figure 5. Concern editor provided in REAssistant

REAssistant leverages the UIMA framework (http://uima.apache.org/) [6], [8]. UIMA is an extensible architecture for building analytic applications that process unstructured information. The architecture of our tool makes extensive use of the annotation mechanisms provided by UIMA. An annotation identifies and labels a specific region of a text document. Figure 3 shows a linguistic analysis of a use case step from a requirements specification. The annotations of level 1 correspond to tokens. Direct impacts would typically be discovered by analyzing information at this level. The annotations of levels 2 and 3 provide richer information, such as the predicate structure and domain actions, respectively. Indirect impacts can be discovered by querying information at level 3.

The QueryingEngine is implemented on top of the EMF Query2 project (http://www.eclipse.org/modeling/emf/downloads/?project=query2), which provides an SQL-like language for searching through EMF models. The rule syntax is simple to understand and powerful enough to express concern-related queries. In addition, we have developed an abstraction layer that allows analysts to seamlessly incorporate UIMA-generated annotations in the queries. There are two types of rules: i) direct rules, responsible for detecting a CCC; and ii) indirect rules, for detecting domain actions that are potentially related to that concern. Direct rules focus on finding explicit references to a particular CCC, for example, the word “server” or “database”. Complementarily, indirect rules focus on finding more subtle associations that come from a semantic interpretation of the use cases. Figure 4 illustrates a PERFORMANCE rule composed of three queries. Query #1 would find parts of the text related to PERFORMANCE through the analysis of token lemmas such as “response” and “second”, similarly to keyword-based approaches. Queries #2 and #3 make use of domain actions to reveal indirect impacts, looking for actions such as “calculation” and “process”. For more information about the architecture of the tool, the NLP pipeline and the concern ruleset, the reader is referred to [6]. That publication also reports the results of an empirical evaluation of REAssistant with three case studies.

IV. REAssistant IN ACTION

The REAssistant tool offers analysts functionality for editing use cases, performing a linguistic analysis of the use cases, and applying search rules for identifying CCCs. In this section, we discuss the operation of the tool from the perspective of an analyst who is using it and explain how she/he interacts with the tool in the concern identification and analysis process (see a video at https://youtu.be/i3kSJil_2eg). Initially, the analyst needs to provide the text of use case specifications. Our tool has a form-based editor that handles the documentation of use cases and stores them in a persistent file with extension “ucs”. The internal structure of “ucs” files is based on a standard use case template that contains sections to describe actors, main flow, alternative flows, supplementary requirements, etc. Once the “ucs” file is complete, analysts can automatically run a series of NLP analyses on the use cases. From the user’s viewpoint, the linguistic analyses produce meta-information for the use cases in the form of layers of annotations, which are later stored in a persistent file with extension “uima”. This file holds the results of the semantic analysis of the text.
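The layers of annotations produced by the linguistic analysis (tokens, predicate structure and domain actions, as in Figure 3) can be pictured with a small hand-written example. The Python sketch below is an assumption-laden illustration, not the content of a real “uima” file: the example step, the role names ("agent", "theme") and the helper names are hypothetical, while the lemmas “second” and “response” and the action “calculation” are those mentioned for the PERFORMANCE rule.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    level: int   # 1 = token, 2 = predicate structure, 3 = domain action
    begin: int   # start offset of the labeled region
    end: int     # end offset (exclusive)
    label: str   # annotation type
    value: str   # lemma, semantic role or domain action name


# Hypothetical use case step, written for this illustration.
STEP = "The system calculates the route in less than 10 seconds."


def span(phrase, level, label, value):
    """Build an annotation over the first occurrence of `phrase` in STEP."""
    begin = STEP.index(phrase)
    return Annotation(level, begin, begin + len(phrase), label, value)


# Hand-written annotations; a real pipeline would compute them.
ANNOTATIONS = [
    span("calculates", 1, "token", "calculate"),
    span("seconds", 1, "token", "second"),
    span("The system", 2, "argument", "agent"),
    span("calculates", 2, "predicate", "calculate"),
    span("the route", 2, "argument", "theme"),
    span("calculates", 3, "domain_action", "calculation"),
]


def at_level(level):
    return [a for a in ANNOTATIONS if a.level == level]


def covered(a):
    return STEP[a.begin:a.end]


# Direct impacts: inspect token lemmas (level 1).
direct = [covered(a) for a in at_level(1) if a.value in {"second", "response"}]
# Indirect impacts: inspect domain actions (level 3).
indirect = [covered(a) for a in at_level(3) if a.value == "calculation"]
print("direct:", direct)
print("indirect:", indirect)
```

Because every annotation only stores offsets into the original text, several layers can label the same region (here, “calculates”) without interfering with each other.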
Figure 6. Views provided in REAssistant for analyzing CCCs: (a) global view, (b) detailed view, (c) traceability view

After the text is processed, users can open a new editor to conduct analyses for the CCCs and their relations with the use cases. The editor will create a persistent file with extension “rea”. Figure 5 shows a snapshot of this editor, where the analysts are free to accept, modify or delete any of the concerns detected, based on their understanding of the requirements. In order to identify CCCs, users just have to press a button labeled “Rule Mine CCC” to execute the predefined queries loaded in REAssistant with the rule-based engine. The queries codify knowledge about concerns and how they relate semantically to natural language expressions, and were defined by experienced analysts to cover a wide range of software domains. Nevertheless, our tool has an editor in which analysts can customize the rules at any time. Let us assume that the analyst selects the rules associated with the PERFORMANCE concern. The execution of the rules will mark the sentences that are potentially crosscut by the concern. The tool can display the crosscutting relations using different colors on the text and at two levels of granularity: at the level of use cases (global view, Figure 6a), or at the level of behavior steps for a given use case (detailed view, Figure 6b). There is also another view within the concern editor that computes a traceability matrix among the use cases and the concerns (Figure 6c). In this way, the analyst can easily get insights into how a given concern impacts the use cases, whether a concern is well-modularized (in terms of a narrow set of use cases), or how a given use case gets affected by several concerns.
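The mining step and the traceability matrix can be approximated with a few lines of Python. This is a rough sketch under our own assumptions and not the EMF Query2 rules of REAssistant: the steps are hypothetical and hand-annotated, the lemma “fast” and the whole SECURITY rule are invented for illustration, and only the lemmas “response” and “second” and the actions “calculation” and “process” come from the PERFORMANCE rule described in Section III.

```python
from collections import defaultdict

# Hypothetical, hand-annotated steps: (use case, token lemmas, domain action).
STEPS = [
    ("UCS1", {"system", "show", "result", "less", "second"}, "output"),
    ("UCS1", {"user", "enter", "password"}, "input"),
    ("UCS2", {"system", "compute", "total"}, "calculation"),
    ("UCS3", {"system", "reply", "fast", "possible"}, "output"),
]

# Each rule combines direct queries (lemmas) and indirect queries (actions).
RULES = {
    "PERFORMANCE": {"lemmas": {"response", "second", "fast"},
                    "actions": {"calculation", "process"}},
    "SECURITY": {"lemmas": {"password", "authorize"}, "actions": set()},
}


def mine(steps, rules):
    """Return (use case, step index, concern, kind) for every rule match."""
    impacts = []
    for index, (use_case, lemmas, action) in enumerate(steps):
        for concern, rule in rules.items():
            if lemmas & rule["lemmas"]:        # explicit reference in the text
                impacts.append((use_case, index, concern, "direct"))
            elif action in rule["actions"]:    # association via domain action
                impacts.append((use_case, index, concern, "indirect"))
    return impacts


def traceability(impacts):
    """Count impacted steps per use case (row) and concern (column)."""
    matrix = defaultdict(lambda: defaultdict(int))
    for use_case, _, concern, _ in impacts:
        matrix[use_case][concern] += 1
    return matrix


impacts = mine(STEPS, RULES)
matrix = traceability(impacts)
use_cases = sorted({step[0] for step in STEPS})

print("".ljust(6) + "".join(c.ljust(13) for c in RULES))
for uc in use_cases:
    print(uc.ljust(6) + "".join(str(matrix[uc][c]).ljust(13) for c in RULES))

# Number of use cases touched by each concern: a large value suggests
# that the concern is scattered rather than well-modularized.
spread = {c: sum(1 for uc in use_cases if matrix[uc][c]) for c in RULES}
print(spread)
```

Reading the matrix by column shows how widely a concern is scattered across use cases, while reading it by row shows how many concerns affect a single use case. As in the tool, every match is only a candidate that the analyst still has to confirm.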
REFERENCES
[1] A. Moreira, R. Chitchyan, J. Araujo, and A. Rashid, Eds., Aspect-Oriented Requirements Engineering. Springer Berlin Heidelberg, 2013, vol. XIX.
[2] L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, 3rd ed., ser. SEI Series in Software Engineering. Addison-Wesley Professional, October 2012.
[3] E. Baniassad, P. Clements et al., “Discovering early aspects,” IEEE Software, vol. 23, no. 1, pp. 61–70, 2006.
[4] A. Sampaio, A. Rashid, R. Chitchyan, and P. Rayson, “EA-Miner: towards automation in aspect-oriented requirements engineering,” Transactions on Aspect-Oriented Software Development III, pp. 4–39, 2007.
[5] A. Rago, C. Marcos, and A. Diaz-Pace, “Uncovering quality-attribute concerns in use case specifications via early aspect mining,” Requirements Engineering, vol. 18, no. 1, pp. 67–84, March 2012. [Online]. Available: http://dx.doi.org/10.1007/s00766-011-0142-z
[6] A. Rago, C. Marcos, and A. Diaz-Pace, “Assisting requirements analysts to find latent concerns with REAssistant,” Automated Software Engineering, June 2014.
[7] D. Ferrucci and A. Lally, “UIMA: an architectural approach to unstructured information processing in the corporate research environment,” Natural Language Engineering, vol. 10, no. 3-4, pp. 327–348, 2004.
[8] A. Rago, C. Marcos, and A. Diaz-Pace, “Identifying duplicate functionality in textual use cases by aligning semantic actions,” Software and Systems Modeling, August 2014.