<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>REAssistant: a Tool for Identifying Crosscutting Concerns in Textual Requirements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Alejandro Rago</string-name>
          <email>arago@exa.unicen.edu.ar</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Claudia Marcos</string-name>
          <email>cmarcos@exa.unicen.edu.ar</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>J. Andrés Diaz-Pace</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Superior de Ingeniería de Software (ISISTAN-UNICEN) Tandil</institution>
          ,
          <addr-line>Buenos Aires</addr-line>
          ,
          <country country="AR">Argentina</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Use case modeling is widely used to capture requirements and communicate with stakeholders. Use cases normally have textual specifications that describe the interactions between the system and external actors. However, since use cases are specified from a functional perspective, concerns that do not fit this decomposition criterion well are kept away from the analysts' eye and might end up intermingled in multiple use cases. These crosscutting concerns (CCCs) are generally relevant to analysis, design, and implementation activities and should be dealt with from early stages. Unfortunately, identifying such concerns by hand is a cumbersome and error-prone task, mainly because it requires a semantic interpretation of textual requirements. To ease the analysis of CCCs, we have developed an automated tool called REAssistant that is able to extract semantic information from textual use cases and reveal candidate CCCs, helping analysts to reason about them before making important commitments in the development. Our tool performs a series of advanced NLP analyses based on the UIMA framework. Analysts can define concern-specific queries in the tool to search for CCCs in the requirements via a flexible SQL-like language. In this article, we briefly discuss the technologies behind the tool and explain how an end user can interact with REAssistant to analyze CCCs in use case specifications. A short video explaining the main features of the tool can be found at https://youtu.be/i3kSJil_2eg. The REAssistant tool can be downloaded from https://code.google.com/p/reassistant.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Most software systems have certain concerns that are key
for the success of a project [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These concerns are often
related to business goals of the system (e.g., profit, market
opportunities, or brand positioning) and quality attributes
(e.g., performance, fault tolerance, or security) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since
many requirements modeling techniques (e.g., use cases) are
based on a functional decomposition criterion, some concerns
are likely to be hidden in textual specifications, tangled with
functionality and scattered across documents. These concerns
are referred to as crosscutting concerns (CCCs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For
example, an access control policy (part of a security quality
attribute) can subtly appear in multiple use cases, and might
be overlooked by analysts and architects during the system
design. Since requirements are commonly documented in
natural language, analysts and developers must peruse textual
specifications to reveal CCCs of interest for further analysis.
Still, searching for latent concerns in requirements is a difficult
and time-consuming task, mainly due to the semantics and
ambiguities of natural language.
      </p>
      <p>
In this context, it is useful for analysts to rely on tool support
for processing requirements and identifying CCCs. Such a
tool should be able to quickly gather a list of candidate
CCCs from the text and present it to the analysts (e.g.,
integrity, synchronization, access control, etc.). Then, it is
up to the analysts to inspect the list to determine which
CCCs are actually relevant. There are several concern mining
tools available that use Natural Language Processing (NLP)
techniques and domain-specific dictionaries (e.g., taxonomies
of quality attributes) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]–[
        <xref ref-type="bibr" rid="ref5">5</xref>
]. Unfortunately, these tools have
trouble identifying portions of functionality implicitly (or
indirectly) affected by CCCs, because they have poor semantic
capabilities when processing textual requirements.
      </p>
      <p>
        To overcome these limitations, we have developed the
REAssistant (REquirements Analysis Assistant) tool [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Our
tool supports the search of latent CCCs by relying on
advanced NLP modules and domain knowledge about use cases.
REAssistant uses an annotation-based representation of use
cases that holds lexical, syntactic, semantic, and domain
information of the text. Furthermore, our tool is equipped
with an NLP pipeline assembled with the UIMA framework
that decorates the use cases with annotations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The pipeline
performs several linguistic analyses on the text, such as
dependency parsing, semantic role labeling, and domain-action
classification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To find candidate concerns, the tool provides
customizable concern-specific rules that can query the
annotations generated earlier to extract not only CCCs but also their
crosscutting relations (i.e., requirements affected by CCCs).
The rules take advantage of the so-called "domain actions",
which are a taxonomy of domain-neutral classes applicable to
use cases. Finally, our tool is implemented as a set of Eclipse
plugins that provide mechanisms for the analysis of concerns,
including special views for visualizing CCCs at different levels
of granularity.
      </p>
<p>The rest of this article is organized in three sections. Section
II explains the concern discovery problem with a motivating
example. Section III briefly discusses the architectural design
of the tool and its components. Finally, Section IV provides a
quick tour of how REAssistant works from the viewpoint
of an analyst.</p>
    </sec>
    <sec id="sec-2">
      <title>II. REVEALING LATENT CONCERNS IN USE CASES</title>
      <p>
        Identifying CCCs in use case specifications demands a
careful manual inspection by analysts, as well as a dose of
domain experience. There are three main activities that
analysts should do: i) finding candidate concern(s), ii) determining
the “real” concerns and specifying them, and iii) identifying
the points of the specification (e.g., use case steps) affected by
each concern, i.e., finding its crosscutting relations or impacts.
Let us consider the excerpts from three use cases shown in
Figure 1. The sentences of UCS1 and UCS3 qualify functionality
with phrases such as “in less than 10 seconds” and “as fast as
possible”, which are hints of a PERFORMANCE concern. Thus,
an analyst can interpret that there is a PERFORMANCE CCC
at play, and search for performance-related words to quickly
expose the concern. We call these explicit references in the text
direct impacts of the concern. Several tools currently support
keyword-based searches to mine them [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]–[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>However, there might be other use cases implicitly affected
by the same concern as well. For instance, an experienced
analyst could determine that one of the steps of UCS2, referring
to a “computation”, is also constrained by the PERFORMANCE
CCC. This relation can be discovered after making a semantic
analysis of the text, rather than a lexical or syntactical analysis.
We call these implicit relations in the text indirect impacts
of the concern. Indirect impacts are usually harder to detect
because they require an interpretation of the semantics in
textual requirements. From an automation viewpoint, indirect
impacts can be (approximately) detected by uncovering
associations between specific concerns and “abstract” actions (e.g.,
compute, calculate, perform, execute) expressed in use cases,
because such associations often hold the key for recognizing
concern impacts. However, it is the analysts’ responsibility to
determine if a sentence is truly affected by a CCC. The role of
tool support is then to recommend potential CCCs and let the
analyst make the final decisions. Unfortunately, existing tools
for mining concerns have problems to detect indirect impacts,
because their semantic-level features are limited.</p>
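<p>The distinction between direct and indirect impacts can be sketched in code. The following is an illustrative Python sketch, not REAssistant's actual implementation: the keyword list and the verb-to-domain-action mapping are invented for the example. Direct impacts match concern keywords lexically, while indirect impacts are reached through abstract domain actions associated with the concern.</p>

```python
# Illustrative sketch (not REAssistant's implementation): direct impacts are
# found by keyword matching; indirect impacts by mapping verbs to abstract
# domain actions. All word lists below are invented for this example.
PERFORMANCE_KEYWORDS = {"second", "seconds", "fast", "response", "latency"}
DOMAIN_ACTIONS = {"compute": "calculation", "calculate": "calculation",
                  "perform": "execution", "execute": "execution"}
PERFORMANCE_ACTIONS = {"calculation", "execution"}

def find_impacts(sentences):
    """Classify each sentence as a direct or indirect PERFORMANCE impact."""
    impacts = []
    for sentence in sentences:
        words = sentence.lower().replace(",", "").split()
        if any(w in PERFORMANCE_KEYWORDS for w in words):
            impacts.append((sentence, "direct"))    # explicit lexical reference
        elif any(DOMAIN_ACTIONS.get(w) in PERFORMANCE_ACTIONS for w in words):
            impacts.append((sentence, "indirect"))  # reached via a domain action
    return impacts

steps = [
    "The system shall display the results in less than 10 seconds",
    "The system shall compute the account balance",
    "The actor logs into the system",
]
print(find_impacts(steps))
```

<p>A purely keyword-based miner would stop at the first sentence; the second sentence is flagged only because its verb maps to a domain action linked to the concern, which is the kind of semantic association the analyst must still confirm.</p>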
    </sec>
    <sec id="sec-3">
      <title>III. ARCHITECTURE OF REAssistant</title>
      <p>
        REAssistant is built on the Eclipse IDE as a set of plugins
that support both the linguistic analysis of textual use cases and
the execution of search rules for identifying concerns [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Figure 2 shows the main components of the architecture, namely:
UIMAProcessingEngine, QueryingEngine, and
REAssistantGUI. The communication among these components takes place
through files that contain (serialized) EMF1 models. Use cases
are first imported into the UIMAProcessingEngine via the
UseCaseReader component, which gathers text from varied
sources, such as PDF, DOC, HTML files, or directly from the
use case editor bundled in REAssistant. Then, the text is stored
in the AnnotationSchema, which is a shared data structure
that allows the communication of text analytic modules. Once
imported, a pipeline of Annotators takes the text from use
cases and breaks them into individual sentences (e.g., behavior
steps), and automatically generates different annotations for
these sentences. There are two kinds of annotators. The first
set of annotators runs a series of NLP tasks that include
standard linguistic analyses (e.g., stemming, POS tagging).
The second set of annotators runs more complex analyses (e.g.,
semantic role labeling) for extracting the predicate structure
of the sentences and for mapping these predicates to domain
actions. The computation of domain actions is performed
with a special classifier reported in [
        <xref ref-type="bibr" rid="ref6">6</xref>
]. Furthermore, an NLP
specialist can configure annotators via UIMA before running
the pipeline. The resulting annotations are later exported to
an EMF model. The QueryingEngine is equipped with the
concern-specific searching rules that were defined beforehand
by experts. By analyzing the use cases and their annotations,
the QueryingEngine executes the searching rules on the text
of use cases. Finally, the REAssistantGUI features different
views for browsing candidate CCCs and their impacts. This
component provides editing and visualization support for the
analyst to explore and refine the concerns found.
      </p>
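<p>The annotator pipeline described above can be sketched as follows. This is a minimal illustration in the spirit of UIMA, not the UIMA API: the class and function names are simplified assumptions. Each annotator reads the shared document and decorates it with labeled spans, so later stages can build on earlier annotations.</p>

```python
# Minimal sketch of an annotation pipeline (simplified assumption, not the
# UIMA API): annotators share one data structure and add labeled spans to it.
class Document:
    def __init__(self, text):
        self.text = text
        self.annotations = []  # (start, end, label) tuples

def sentence_splitter(doc):
    # Toy splitter: treats the whole text as a single sentence span.
    doc.annotations.append((0, len(doc.text), "Sentence"))

def tokenizer(doc):
    # Annotates each whitespace-separated token with its character offsets.
    pos = 0
    for token in doc.text.split():
        start = doc.text.index(token, pos)
        doc.annotations.append((start, start + len(token), "Token"))
        pos = start + len(token)

def run_pipeline(text, annotators):
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)  # each stage decorates the shared structure
    return doc

doc = run_pipeline("The system validates the user", [sentence_splitter, tokenizer])
```

<p>The shared document plays the role of the AnnotationSchema: downstream annotators (e.g., a semantic role labeler) would query the token and sentence spans produced by the earlier stages.</p>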
      <p>1http://www.eclipse.org/modeling/emf/</p>
      <p>
REAssistant leverages the UIMA framework2 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
UIMA is an extensible architecture for building analytic
applications that process unstructured information. The architecture
of our tool makes extensive use of the annotation mechanisms
provided by UIMA. An annotation identifies and labels a
specific region of a text document. Figure 3 shows a linguistic
analysis of a use case step from a requirements specification.
The annotations of level 1 correspond to tokens. Direct impacts
would be typically discovered by analyzing information at this
level. The annotations of levels 2 and 3 provide richer
information, such as the predicate structure and domain actions,
respectively. Indirect impacts can be discovered by querying
information at level 3.
      </p>
      <p>
        The QueryingEngine is implemented on top of the EMF
Query23 project, which serves as an SQL-like language for
searching through EMF models. The rule syntax is
simple to understand and powerful enough to express
concern-related queries. In addition, we have developed an abstraction
layer that allows analysts to seamlessly incorporate
UIMA-generated annotations in the queries. There are two types of
rules: i) direct rules, responsible for detecting a CCC; and ii)
indirect rules, for detecting domain actions that are potentially
related to that concern. Direct rules focus on finding
explicit references to a particular CCC, for example, the
word “server” or “database”. In contrast, indirect rules
focus on finding more subtle associations that come from
a semantic interpretation of the use cases. Figure 4 illustrates
a PERFORMANCE rule composed of three queries. Query #1
would find parts of the text related to PERFORMANCE through
the analysis of token lemmas such as “response” and “second”,
similarly to keyword-based approaches. Queries #2 and #3
make use of domain actions to reveal indirect impacts, looking
for actions such as “calculation” and “process”. For more
information about the architecture of the tool, the NLP pipeline
and the concern ruleset, the reader is referred to [
        <xref ref-type="bibr" rid="ref6">6</xref>
]. In this
publication, we also report on the results of an empirical
evaluation of REAssistant with three case studies.
2http://uima.apache.org/
3http://www.eclipse.org/modeling/emf/downloads/?project=query2
      </p>
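<p>A concern rule of this kind can be sketched as a list of queries over annotation layers. The sketch below only illustrates the idea of combining direct and indirect queries; REAssistant's actual rules are expressed in the EMF Query2 language, and the annotated steps here are invented data.</p>

```python
# Hedged sketch of a concern-specific rule (invented data; the real rules are
# EMF Query2 expressions). Each step carries its annotation layers: token
# lemmas (level 1) and domain actions (level 3).
annotated_steps = [
    {"text": "Display results in less than 10 seconds",
     "lemmas": {"display", "result", "second"}, "actions": set()},
    {"text": "The system computes the route",
     "lemmas": {"system", "compute", "route"}, "actions": {"calculation"}},
]

performance_rule = [
    ("direct",   lambda s: {"second", "response"} & s["lemmas"]),  # query #1
    ("indirect", lambda s: {"calculation"} & s["actions"]),        # query #2
    ("indirect", lambda s: {"process"} & s["actions"]),            # query #3
]

def apply_rule(rule, steps):
    """Return (step text, impact kind) for each step matched by a query."""
    matches = []
    for step in steps:
        for impact_kind, query in rule:
            if query(step):
                matches.append((step["text"], impact_kind))
                break  # a step is reported once, with the first matching query
    return matches

matches = apply_rule(performance_rule, annotated_steps)
```

<p>Query #1 operates on the lemma layer, as a keyword-based approach would, while queries #2 and #3 reach into the domain-action layer to expose indirect impacts.</p>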
    </sec>
    <sec id="sec-4">
      <title>IV. REAssistant IN ACTION</title>
      <p>The REAssistant tool offers analysts functionality for editing
use cases, performing a linguistic analysis of the use cases, and
applying searching rules for identifying CCCs. In this section,
we discuss the operation of the tool from the perspective of an
analyst who is using it and explain how she/he interacts with
the tool in the concern identification and analysis process (see
a video at https://youtu.be/i3kSJil_2eg). Initially, the analyst
needs to provide the text of use case specifications. Our tool
has a form-based editor that handles the documentation of use
cases and stores them in a persistent file with extension “ucs”.
The internal structure of “ucs” files is based on a standard
use case template that contains sections to describe actors,
main flow, alternative flows, supplementary requirements, etc.
Once the “ucs” file is complete, analysts can automatically run
a series of NLP analyses on the use cases. From the user’s
viewpoint, the linguistic analyses will produce all kinds of
meta-information for the use cases in the form of layers of
annotations, which are later stored in a persistent file with
extension “uima”. This file holds the results of the semantic
analysis of the text.
</p>
      <p>After the text is processed, users can open a new editor
to conduct analyses for the CCCs and their relations with
the use cases. The editor will create a persistent file with
extension “rea”. Figure 5 shows a snapshot of this editor,
where the analysts are free to accept, modify or delete any
of the concerns detected, based on their understanding of
the requirements. In order to identify CCCs, users just have
to press a button labeled “Rule Mine CCC” to execute the
predefined queries loaded in REAssistant with the rule-based
engine. The queries codify knowledge about concerns and
how they relate semantically to natural language expressions,
and were defined by experienced analysts to cover a wide
range of software domains. In any case, our tool provides an editor
in which analysts can customize the rules at any time. Let
us assume that the analyst selects the rules associated with the
PERFORMANCE concern. The execution of the rules will mark
the sentences that are potentially crosscut by the concern.
The tool can display the crosscutting relations using different
colors on the text and at two levels of granularity: at the level
of use cases (global view, Figure 7a), or at the level of behavior
steps for a given use case (detailed view, Figure 7b). There is
also another view within the concern editor that computes a
traceability matrix among the use cases and the concerns. In
this way, the analyst can easily gain insights into how a given
concern impacts the use cases, whether a concern is
well-modularized (in terms of a narrow set of use cases), or how a
given use case is affected by several concerns.</p>
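<p>The traceability matrix described above can be computed directly from the list of (use case, concern) impacts produced by the rules. The sketch below uses invented data to illustrate the computation; it is not the tool's implementation.</p>

```python
# Illustrative sketch of the use-case-by-concern traceability matrix
# (invented impact data, not REAssistant's internals).
impacts = [
    ("UC1", "PERFORMANCE"), ("UC2", "PERFORMANCE"),
    ("UC2", "SECURITY"), ("UC3", "PERFORMANCE"),
]

def traceability_matrix(impacts):
    """Build a nested dict: matrix[use_case][concern] -> crosscut or not."""
    use_cases = sorted({uc for uc, _ in impacts})
    concerns = sorted({c for _, c in impacts})
    pairs = set(impacts)
    return {uc: {c: (uc, c) in pairs for c in concerns} for uc in use_cases}

matrix = traceability_matrix(impacts)
# Reading the matrix: a concern marked in many rows (here PERFORMANCE, which
# crosscuts all three use cases) is poorly modularized; a use case with many
# marked columns (here UC2) is affected by several concerns at once.
```
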
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
[1]
<string-name><given-names>A.</given-names> <surname>Moreira</surname></string-name>,
<string-name><given-names>R.</given-names> <surname>Chitchyan</surname></string-name>,
<string-name><given-names>J.</given-names> <surname>Araujo</surname></string-name>, and
<string-name><given-names>A.</given-names> <surname>Rashid</surname></string-name>, Eds.,
<source>Aspect-Oriented Requirements Engineering</source>. Springer Berlin Heidelberg,
<year>2013</year>, vol. XIX.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
[2]
<string-name><given-names>L.</given-names> <surname>Bass</surname></string-name>,
<string-name><given-names>P.</given-names> <surname>Clements</surname></string-name>, and
<string-name><given-names>R.</given-names> <surname>Kazman</surname></string-name>,
<source>Software Architecture in Practice</source>, 3rd ed., ser. SEI Series in Software Engineering. Addison-Wesley Professional,
<year>October 2012</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Baniassad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clements</surname>
          </string-name>
et al., “Discovering early aspects,”
<source>IEEE Software</source>
          , vol.
          <volume>23</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>70</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sampaio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chitchyan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Rayson</surname>
          </string-name>
          , “
<article-title>EA-Miner: towards automation in aspect-oriented requirements engineering</article-title>,” <source>Transactions on Aspect-Oriented Software Development III</source>
          , pp.
          <fpage>4</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marcos</surname>
          </string-name>
, and
          <string-name>
            <given-names>A.</given-names>
            <surname>Diaz-Pace</surname>
          </string-name>
          , “
<article-title>Uncovering quality-attribute concerns in use case specifications via early aspect mining</article-title>,” <source>Requirements Engineering</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>84</lpage>
          ,
          <year>March 2012</year>
          . [Online]. Available: http://dx.doi.org/10.1007/s00766-011-0142-z
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
[6] ——, “
          <article-title>Assisting requirements analysts to find latent concerns with REAssistant,”</article-title>
          <source>Automated Software Engineering</source>
          ,
          <year>June 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferrucci</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Lally</surname>
          </string-name>
          , “
<article-title>UIMA: an architectural approach to unstructured information processing in the corporate research environment</article-title>,” <source>Natural Language Engineering</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>3-4</issue>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>348</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marcos</surname>
          </string-name>
, and
          <string-name>
            <given-names>A.</given-names>
            <surname>Diaz-Pace</surname>
          </string-name>
          , “
          <article-title>Identifying duplicate functionality in textual use cases by aligning semantic actions</article-title>
,”
<source>Software and Systems Modeling</source>
          ,
          <year>August 2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>