=Paper=
{{Paper
|id=Vol-1554/PD_MoDELS_2015_paper_11
|storemode=property
|title=REAssistant: a Tool for Identifying Crosscutting Concerns in Textual Requirements
|pdfUrl=https://ceur-ws.org/Vol-1554/PD_MoDELS_2015_paper_11.pdf
|volume=Vol-1554
|authors=Alejandro Rago,Claudia Marcos,J. Andres Diaz-Pace
|dblpUrl=https://dblp.org/rec/conf/models/RagoMD15a
}}
==REAssistant: a Tool for Identifying Crosscutting Concerns in Textual Requirements==
<pdf width="1500px">https://ceur-ws.org/Vol-1554/PD_MoDELS_2015_paper_11.pdf</pdf>
<pre>
REAssistant: a Tool for Identifying Crosscutting Concerns in Textual Requirements

Alejandro Rago∗†, Claudia Marcos∗‡, J. Andrés Diaz-Pace∗†
∗ Instituto Superior de Ingeniería de Software (ISISTAN-UNICEN), Tandil, Buenos Aires, Argentina
† CONICET, Argentina
‡ CIC, Buenos Aires, Argentina
{arago,cmarcos,adiaz}@exa.unicen.edu.ar


Abstract: Use case modeling is very useful to capture requirements and communicate with
the stakeholders. Use cases normally have textual specifications that describe the
interactions between the system and external actors. However, since use cases are
specified from a functional perspective, concerns that do not fit this decomposition
criterion well are kept away from the analysts’ eye and might end up intermingled in
multiple use cases. These crosscutting concerns (CCCs) are generally relevant for
analysis, design and implementation activities and should be dealt with from early
stages. Unfortunately, identifying such concerns by hand is a cumbersome and error-prone
task, mainly because it requires a semantic interpretation of textual requirements. To
ease the analysis of CCCs, we have developed an automated tool called REAssistant that
is able to extract semantic information from textual use cases and reveal candidate
CCCs, helping analysts to reason about them before making important commitments in the
development. Our tool performs a series of advanced NLP analyses based on the UIMA
framework. Analysts can define concern-specific queries in the tool to search for CCCs
in the requirements via a flexible SQL-like language. In this article, we briefly
discuss the technologies behind the tool and explain how an end user can interact with
REAssistant to analyze CCCs in use case specifications. A short video explaining the
main features of the tool can be found at https://youtu.be/i3kSJil_2eg. The REAssistant
tool can be downloaded from https://code.google.com/p/reassistant.

I. INTRODUCTION

Most software systems have certain concerns that are key for the success of a project
[1]. These concerns are often related to business goals of the system (e.g., profit,
market opportunities or brand positioning) and quality attributes (e.g., performance,
fault tolerance or security) [2]. Since many requirements modeling techniques (e.g., use
cases) are based on a functional decomposition criterion, some concerns are likely to be
hidden in textual specifications, tangled with functionality and scattered across
documents. These concerns are referred to as crosscutting concerns (CCCs) [1]. For
example, an access control policy (part of a security quality attribute) can subtly
appear in multiple use cases, and might be overlooked by analysts and architects during
the system design. Since requirements are commonly documented in natural language,
analysts and developers must peruse textual specifications to reveal CCCs of interest
for further analysis. Still, searching for latent concerns in requirements is a
difficult and time-consuming task, mainly due to the semantics and ambiguities of
natural language.

In this context, it is useful for analysts to rely on tool support for processing
requirements and identifying CCCs. Such a tool should be able to quickly gather a list
of candidate CCCs from the text (e.g., integrity, synchronization, access control) and
present it to the analysts. Then, it is up to the analysts to inspect the list to
determine which CCCs are actually relevant. There are several concern mining tools
available that use Natural Language Processing (NLP) techniques and domain-specific
dictionaries (e.g., taxonomies of quality attributes) [3]–[5]. Unfortunately, these
tools have trouble identifying portions of functionality implicitly (or indirectly)
affected by CCCs because they have poor semantic capabilities when processing textual
requirements.

To overcome these limitations, we have developed the REAssistant (REquirements Analysis
Assistant) tool [6]. Our tool supports the search of latent CCCs by relying on advanced
NLP modules and domain knowledge about use cases. REAssistant uses an annotation-based
representation of use cases that holds lexical, syntactic, semantic and domain
information of the text. Furthermore, our tool is equipped with an NLP pipeline,
assembled with the UIMA framework, that decorates the use cases with annotations [7].
The pipeline performs several linguistic analyses on the text, such as dependency
parsing, semantic role labeling and domain action classification [6]. To find candidate
concerns, the tool provides customizable concern-specific rules that can query the
annotations generated earlier to extract not only CCCs but also their crosscutting
relations (i.e., requirements affected by CCCs). The rules take advantage of the
so-called "domain actions", which are a taxonomy of domain-neutral classes applicable to
use cases. Finally, our tool is implemented as a set of Eclipse plugins that provide
mechanisms for the analysis of concerns, including special views for visualizing CCCs at
different levels of granularity.

The rest of this article is organized in three sections. Section II explains the concern
discovery problem with a motivating example. Section III briefly discusses the
architectural design of the tool and its components. Finally, Section IV provides a
quick tour of the working of REAssistant from the viewpoint of an analyst.

Figure 1. Concern impacts in use cases

Figure 2. Overview of the Architecture of REAssistant

II. REVEALING LATENT CONCERNS IN USE CASES

Identifying CCCs in use case specifications demands a careful manual inspection by
analysts, as well as a dose of domain experience. There are three main activities that
analysts should do: i) finding candidate concern(s), ii) determining the “real” concerns
and specifying them, and iii) identifying the points of the specification (e.g., use
case steps) affected by each concern, i.e., finding its crosscutting relations or
impacts. Let us consider the excerpts from three use cases shown in Figure 1. The
sentences of UCS1 and UCS3 qualify functionality with phrases such as “in less than 10
seconds” and “as fast as possible”, which are hints of a PERFORMANCE concern. Thus, an
analyst can interpret that there is a PERFORMANCE CCC at play, and search for
performance-related words to quickly expose the concern. We call these explicit
references in the text direct impacts of the concern. Several tools currently support
keyword-based searches to mine them [3]–[5].

However, there might be other use cases implicitly affected by the same concern as well.
For instance, an experienced analyst could determine that one of the steps of UCS2,
referring to a “computation”, is also constrained by the PERFORMANCE CCC. This relation
can be discovered after making a semantic analysis of the text, rather than a lexical or
syntactic analysis. We call these implicit relations in the text indirect impacts of the
concern. Indirect impacts are usually harder to detect because they require an
interpretation of the semantics in textual requirements. From an automation viewpoint,
indirect impacts can be (approximately) detected by uncovering associations between
specific concerns and “abstract” actions (e.g., compute, calculate, perform, execute)
expressed in use cases, because such associations often hold the key for recognizing
concern impacts. However, it is the analysts’ responsibility to determine if a sentence
is truly affected by a CCC. The role of tool support is then to recommend potential CCCs
and let the analyst make the final decisions. Unfortunately, existing tools for mining
concerns have problems detecting indirect impacts, because their semantic-level features
are limited.

III. ARCHITECTURE OF REAssistant

REAssistant is built on the Eclipse IDE as a set of plugins that support both the
linguistic analysis of textual use cases and the execution of search rules for
identifying concerns [6]. Figure 2 shows the main components of the architecture,
namely: UIMAProcessingEngine, QueryingEngine, and REAssistantGUI. The communication
among these components takes place through files that contain (serialized) EMF models
(http://www.eclipse.org/modeling/emf/). Use cases are first imported into the
UIMAProcessingEngine via the UseCaseReader component, which gathers text from varied
sources, such as PDF, DOC and HTML files, or directly from the use case editor bundled
in REAssistant. Then, the text is stored in the AnnotationSchema, which is a shared data
structure through which the text analytic modules communicate. Once the text is
imported, a pipeline of Annotators takes the text of the use cases, breaks it into
individual sentences (e.g., behavior steps), and automatically generates different
annotations for these sentences. There are two kinds of annotators. The first set of
annotators runs a series of NLP tasks that include standard linguistic analyses (e.g.,
stemming, POS tagging). The second set of annotators runs more complex analyses (e.g.,
semantic role labeling) for extracting the predicate structure of the sentences and for
mapping these predicates to domain actions. The computation of domain actions is
performed with a special classifier reported in [6]. Furthermore, an NLP specialist can
configure annotators via UIMA before running the pipeline. The resulting annotations are
later exported to an EMF model. The QueryingEngine is equipped with the concern-specific
search rules that were defined beforehand by experts. By analyzing the use cases and
their annotations, the QueryingEngine executes the search rules on the text of use
cases. Finally, the REAssistantGUI features different views for browsing candidate CCCs
and their impacts. This component provides editing and visualization support for the
analyst to explore and refine the concerns found.
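
As an illustration of the pipeline idea described above, the following Python sketch
chains three annotators over a shared annotation store. It is only a toy model under
stated assumptions: it does not use UIMA or EMF, the names (Annotation, AnnotatedText,
run_pipeline, DOMAIN_ACTIONS) are hypothetical, lowercased tokens stand in for real
linguistic analysis, and a small lookup table stands in for the domain action classifier
reported in [6].

```python
import re
from dataclasses import dataclass, field

# Hypothetical verb-to-domain-action table (an assumption for this sketch).
# The real tool uses a trained classifier, not a lookup like this one.
DOMAIN_ACTIONS = {
    "computes": "calculation",
    "calculates": "calculation",
    "processes": "process",
    "displays": "display",
}


@dataclass
class Annotation:
    layer: str  # "sentence", "token" or "domain_action"
    begin: int  # offset where the labeled region starts
    end: int    # offset where the labeled region ends (exclusive)
    value: str  # label attached to the region


@dataclass
class AnnotatedText:
    """Shared store: the raw text plus every annotation produced so far."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, layer):
        return [a for a in self.annotations if a.layer == layer]


def sentence_annotator(doc):
    # Naive splitting on ., ! and ?; each sentence stands for a behavior step.
    for m in re.finditer(r"\S[^.!?]*[.!?]", doc.text):
        doc.annotations.append(Annotation("sentence", m.start(), m.end(), "step"))


def token_annotator(doc):
    # First kind of annotator: basic lexical analysis inside each sentence.
    for s in doc.select("sentence"):
        for m in re.finditer(r"[A-Za-z0-9]+", doc.text[s.begin:s.end]):
            doc.annotations.append(
                Annotation("token", s.begin + m.start(), s.begin + m.end(),
                           m.group().lower()))


def domain_action_annotator(doc):
    # Second kind of annotator: map known verbs to domain actions.
    for t in doc.select("token"):
        action = DOMAIN_ACTIONS.get(t.value)
        if action is not None:
            doc.annotations.append(
                Annotation("domain_action", t.begin, t.end, action))


def run_pipeline(text):
    doc = AnnotatedText(text)
    for annotator in (sentence_annotator, token_annotator,
                      domain_action_annotator):
        annotator(doc)
    return doc


doc = run_pipeline("The system computes the route. The system displays it.")
for a in doc.select("domain_action"):
    print(a.value, repr(doc.text[a.begin:a.end]))
```

Each annotator only reads the layers produced before it and appends its own, which is
why the order of the pipeline matters and why later queries can address any layer
independently.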
Figure 3. Annotations in a use case sentence

Figure 4. Rule syntax for searching CCCs

Figure 5. Concern editor provided in REAssistant

REAssistant leverages the UIMA framework (http://uima.apache.org/) [6], [8]. UIMA is an
extensible architecture for building analytic applications that process unstructured
information. The architecture of our tool makes extensive use of the annotation
mechanisms provided by UIMA. An annotation identifies and labels a specific region of a
text document. Figure 3 shows a linguistic analysis of a use case step from a
requirements specification. The annotations of level 1 correspond to tokens. Direct
impacts would typically be discovered by analyzing information at this level. The
annotations of levels 2 and 3 provide richer information, such as the predicate
structure and domain actions, respectively. Indirect impacts can be discovered by
querying information at level 3.

The QueryingEngine is implemented on top of the EMF Query2 project
(http://www.eclipse.org/modeling/emf/downloads/?project=query2), which serves as an
SQL-like language for searching through EMF models. The rule syntax is simple to
understand and powerful enough to express concern-related queries. In addition, we have
developed an abstraction layer that allows analysts to seamlessly incorporate
UIMA-generated annotations in the queries. There are two types of rules: i) direct
rules, responsible for detecting a CCC; and ii) indirect rules, for detecting domain
actions that are potentially related to that concern. Direct rules focus on finding
explicit references to a particular CCC, for example, the word “server” or “database”.
In a complementary way, indirect rules focus on finding more subtle associations that
come from a semantic interpretation of the use cases. Figure 4 illustrates a PERFORMANCE
rule composed of three queries. Query #1 would find parts of the text related to
PERFORMANCE through the analysis of token lemmas such as “response” and “second”,
similarly to keyword-based approaches. Queries #2 and #3 make use of domain actions to
reveal indirect impacts, looking for actions such as “calculation” and “process”. For
more information about the architecture of the tool, the NLP pipeline and the concern
ruleset, the reader is referred to [6]. In that publication, we also report on the
results of an empirical evaluation of REAssistant with three case studies.

IV. REAssistant IN ACTION

The REAssistant tool offers analysts functionality for editing use cases, performing a
linguistic analysis of the use cases, and applying search rules for identifying CCCs. In
this section, we discuss the operation of the tool from the perspective of an analyst
who is using it and explain how she/he interacts with the tool in the concern
identification and analysis process (see a video at https://youtu.be/i3kSJil_2eg).
Initially, the analyst needs to provide the text of use case specifications. Our tool
has a form-based editor that handles the documentation of use cases and stores them in a
persistent file with extension “ucs”. The internal structure of “ucs” files is based on
a standard use case template that contains sections to describe actors, main flow,
alternative flows, supplementary requirements, etc. Once the “ucs” file is complete,
analysts can automatically run a series of NLP analyses on the use cases. From the
user’s viewpoint, the linguistic analyses produce meta-information for the use cases in
the form of layers of annotations, which are later stored in a persistent file with
extension “uima”. This file holds the results of the semantic analysis of the text.
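
Once such annotation layers exist, the two kinds of rules described in Section III can
be pictured with the short Python sketch below. It is a simplified stand-in, not the
EMF Query2 rule syntax of Figure 4: the Step and Rule classes, the apply_rule function
and the three example steps are hypothetical, and only the lemmas (“response”,
“second”) and domain actions (“calculation”, “process”) are taken from the PERFORMANCE
rule discussed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One annotated behavior step (hypothetical, simplified structure)."""
    use_case: str
    text: str
    lemmas: frozenset   # token-level annotations
    actions: frozenset  # domain-action annotations


@dataclass(frozen=True)
class Rule:
    concern: str
    direct_lemmas: frozenset     # explicit references to the concern
    indirect_actions: frozenset  # domain actions associated with the concern


def apply_rule(rule, steps):
    """Return (step, kind) pairs, where kind is 'direct' or 'indirect'."""
    hits = []
    for step in steps:
        if step.lemmas & rule.direct_lemmas:
            hits.append((step, "direct"))
        elif step.actions & rule.indirect_actions:
            hits.append((step, "indirect"))
    return hits


PERFORMANCE = Rule(
    concern="PERFORMANCE",
    direct_lemmas=frozenset({"response", "second"}),
    indirect_actions=frozenset({"calculation", "process"}),
)

# Made-up steps; annotations are written by hand instead of being produced
# by an NLP pipeline.
steps = [
    Step("UC-A", "The system answers in less than 10 seconds.",
         frozenset({"system", "answer", "less", "second"}), frozenset()),
    Step("UC-B", "The system computes the totals.",
         frozenset({"system", "compute", "total"}),
         frozenset({"calculation"})),
    Step("UC-C", "The user enters a name.",
         frozenset({"user", "enter", "name"}), frozenset({"input"})),
]

for step, kind in apply_rule(PERFORMANCE, steps):
    print(step.use_case, kind, "-", step.text)
```

The second step contains no performance keyword at all; it is reported only because its
domain action is associated with the concern, which is the kind of candidate a purely
keyword-based search would miss. The analyst still decides whether each hit is a real
impact.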
Figure 6. Views provided in REAssistant for analyzing CCCs: (a) Global view, (b)
Detailed view, (c) Traceability view

After the text is processed, users can open a new editor to conduct analyses of the CCCs
and their relations with the use cases. The editor will create a persistent file with
extension “rea”. Figure 5 shows a snapshot of this editor, where the analysts are free
to accept, modify or delete any of the concerns detected, based on their understanding
of the requirements. In order to identify CCCs, users just have to press a button
labeled “Rule Mine CCC” to execute the predefined queries loaded in REAssistant with the
rule-based engine. The queries codify knowledge about concerns and how they relate
semantically to natural language expressions, and were defined by experienced analysts
to cover a wide range of software domains. Nonetheless, our tool has an editor in which
analysts can customize the rules at any time. Let us assume that the analyst selects the
rules associated with the PERFORMANCE concern. The execution of the rules will mark the
sentences that are potentially crosscut by the concern. The tool can display the
crosscutting relations using different colors on the text and at two levels of
granularity: at the level of use cases (global view, Figure 6a), or at the level of
behavior steps for a given use case (detailed view, Figure 6b). There is also another
view within the concern editor that computes a traceability matrix among the use cases
and the concerns. In this way, the analyst can easily get insights on how a given
concern impacts the use cases, whether a concern is well-modularized (in terms of a
narrow set of use cases), or how a given use case is affected by several concerns.
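
The traceability matrix mentioned above can be sketched in a few lines of Python. This
is a minimal illustration under assumptions, not the implementation of the tool: the
function names and the example use cases and impacts are hypothetical, and each impact
is modeled simply as a (use case, concern) pair, one per marked step.

```python
from collections import Counter


def traceability_matrix(impacts):
    """Count marked steps for every (use case, concern) combination."""
    counts = Counter(impacts)
    use_cases = sorted({uc for uc, _ in impacts})
    concerns = sorted({c for _, c in impacts})
    return {uc: {c: counts[(uc, c)] for c in concerns} for uc in use_cases}


def use_cases_of(matrix, concern):
    """Use cases crosscut by a concern (a short list suggests modularity)."""
    return [uc for uc, row in matrix.items() if row.get(concern, 0)]


def concerns_of(matrix, use_case):
    """Concerns that affect a given use case."""
    return [c for c, n in matrix.get(use_case, {}).items() if n]


# Made-up impacts, one (use case, concern) pair per marked step.
impacts = [
    ("Checkout", "PERFORMANCE"),
    ("Checkout", "PERFORMANCE"),
    ("Checkout", "SECURITY"),
    ("Login", "SECURITY"),
    ("Search", "PERFORMANCE"),
]

matrix = traceability_matrix(impacts)
for use_case, row in matrix.items():
    print(use_case, row)
print(use_cases_of(matrix, "PERFORMANCE"))
print(concerns_of(matrix, "Checkout"))
```

Reading a column of the matrix answers how widely a concern is scattered, while reading
a row answers how many concerns are tangled in one use case.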

REFERENCES

[1] A. Moreira, R. Chitchyan, J. Araujo, and A. Rashid, Eds., Aspect-Oriented
    Requirements Engineering. Springer Berlin Heidelberg, 2013, vol. XIX.
[2] L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, 3rd ed.,
    ser. SEI Series in Software Engineering. Addison-Wesley Professional, October 2012.
[3] E. Baniassad, P. Clements et al., “Discovering early aspects,” IEEE Software,
    vol. 23, no. 1, pp. 61–70, 2006.
[4] A. Sampaio, A. Rashid, R. Chitchyan, and P. Rayson, “EA-Miner: towards automation in
    aspect-oriented requirements engineering,” Transactions on Aspect-Oriented Software
    Development III, pp. 4–39, 2007.
[5] A. Rago, C. Marcos, and A. Diaz-Pace, “Uncovering quality-attribute concerns in use
    case specifications via early aspect mining,” Requirements Engineering, vol. 18,
    no. 1, pp. 67–84, March 2012. [Online]. Available:
    http://dx.doi.org/10.1007/s00766-011-0142-z
[6] A. Rago, C. Marcos, and A. Diaz-Pace, “Assisting requirements analysts to find
    latent concerns with REAssistant,” Automated Software Engineering, June 2014.
[7] D. Ferrucci and A. Lally, “UIMA: an architectural approach to unstructured
    information processing in the corporate research environment,” Natural Language
    Engineering, vol. 10, no. 3-4, pp. 327–348, 2004.
[8] A. Rago, C. Marcos, and A. Diaz-Pace, “Identifying duplicate functionality in
    textual use cases by aligning semantic actions,” Software and Systems Modeling,
    August 2014.

</pre>