<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research on NLP for RE at the University of Hamburg: a Report</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Fucci</string-name>
          <email>fucci@informatik.uni-hamburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Timo Johann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Walid Maalej</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HITeC/University of Hamburg</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hamburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Hamburg</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Mobile Applied Software Technology (MAST) group at the University of Hamburg focuses its research on context-aware adaptive systems and the social side of software engineering. In the context of natural language processing for requirements engineering, the group has mostly focused on mining app store reviews. Currently, the group is involved in the OpenReq project, where natural language processing is being used to recommend requirements from diverse sources (e.g., social media, issue trackers) and to improve the structural quality of existing requirements.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes.</p>
      <p><sup>1</sup> https://mast.informatik.uni-hamburg.de/</p>
      <p>SSE pertains to the social and human aspects of software engineering as well as the engineering of social
software. Within SSE, we recognize the importance of software socialness: the systematic involvement of
end users and their communities in the software life cycle, from authoring documentation to development
and integration tasks.</p>
      <p>There are several synergies between the research topics investigated by the group and NLP for RE. With
the advent of app stores, this is especially true in the mobile services domain. Users produce large, complex, yet
information-rich textual data on app stores, which can be analyzed with NLP approaches to extract
requirements. At the same time, recommender systems leverage structured and semi-structured data to support the
work of requirements engineers (e.g., requirements elicitation) together with other stakeholders (e.g.,
requirements negotiation).</p>
    </sec>
    <sec id="sec-2">
      <title>Past Research on NLP for RE</title>
      <p>This section summarizes, in ascending chronological order, the work of the MAST research group that
uses NLP to advance the state of the art in requirements engineering.</p>
      <p>Our investigations focus on user-driven requirements engineering. In particular, our NLP studies target
user-generated textual content in review systems such as app stores (e.g., Google Play, Apple App Store, Amazon
Appstore).</p>
      <sec id="sec-2-1">
        <title>App Reviews</title>
        <p>The data, scripts, and tools for the papers described in this subsection are available on the research group's website.<sup>2</sup></p>
      </sec>
      <sec id="sec-2-2">
        <title>How Do Users Like this Feature? A Fine Grained Sentiment Analysis of App Reviews</title>
        <p>Guzman and Maalej [GM14] use NLP to extract app features from app reviews and to analyze the sentiment users
express when discussing these features. For the feature extraction, they apply standard text preprocessing steps
such as stop-word removal, lemmatization, and part-of-speech filtering. After preprocessing, collocations are
used to find app features in the reviews. The collocation search ignores word order, uses a window
of three words, and keeps a collocation only if it appears in at least three reviews. Collocations, including those
with similar words, are then grouped. Finally, the most frequent collocation within each group is selected as the
representative name for that feature.</p>
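        <p>The collocation step can be illustrated with a minimal sketch. It is not the implementation used in [GM14]: the function name <monospace>find_collocations</monospace>, the restriction to word pairs, the reading of the window as a span of three consecutive tokens, and the toy reviews are all assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations


def find_collocations(reviews, window=3, min_reviews=3):
    """Return unordered word pairs that co-occur within `window` consecutive
    tokens in at least `min_reviews` distinct reviews (illustrative only)."""
    reviews_per_pair = defaultdict(set)
    for review_id, tokens in enumerate(reviews):
        for start in range(len(tokens)):
            span = tokens[start:start + window]
            for first, second in combinations(span, 2):
                if first != second:
                    # frozenset drops word order: (pdf, export) == (export, pdf)
                    reviews_per_pair[frozenset((first, second))].add(review_id)
    return {
        tuple(sorted(pair)): len(ids)
        for pair, ids in reviews_per_pair.items()
        if len(ids) >= min_reviews
    }


# Hypothetical reviews, assumed to be preprocessed already
# (stop words removed, tokens lemmatized).
reviews = [
    ["export", "pdf", "crash"],
    ["pdf", "export", "slow"],
    ["love", "export", "pdf"],
    ["crash", "login"],
]
print(find_collocations(reviews))  # {('export', 'pdf'): 3}
```
]]></preformat>
        <p>Counting distinct reviews rather than raw occurrences mirrors the threshold described above: a pair repeated many times in a single review does not qualify.</p>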
        <p>Moreover, a sentiment analysis was performed using SentiStrength [TBP+10]. This analysis shows how users
express their opinion about specific features or in the review as a whole. SentiStrength calculates a positive and a
negative score for a given text, as both types of expression can be part of a single text.</p>
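        <p>The idea of keeping two separate scores can be sketched as follows. This is not SentiStrength itself: the lexicon <monospace>TOY_LEXICON</monospace> and the function <monospace>dual_sentiment</monospace> are hypothetical, and the scales (1 to 5 for positive, -1 to -5 for negative) and the strongest-word aggregation are a simplified reading of how the tool is usually described.</p>
        <preformat><![CDATA[
```python
# Hypothetical toy lexicon; the real tool ships its own, much larger word list.
TOY_LEXICON = {
    "love": 4, "great": 3, "good": 2,
    "hate": -4, "crash": -3, "slow": -2,
}


def dual_sentiment(tokens, lexicon=TOY_LEXICON):
    """Return (positive, negative) for a tokenized text.

    positive lies in 1..5 and negative in -5..-1; (1, -1) means neutral.
    Each score is driven by the strongest word of its polarity.
    """
    strengths = [lexicon.get(token.lower(), 0) for token in tokens]
    positive = max([1] + [s for s in strengths if s > 0])
    negative = min([-1] + [s for s in strengths if s < 0])
    return positive, negative


print(dual_sentiment("I love the widget but sync is slow".split()))  # (4, -2)
```
]]></preformat>
        <p>Because the two scores are not summed, a review that praises one feature and criticizes another keeps both signals.</p>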
        <p>As a result of this work, we can extract app features with an average f1-score of 55% and show how these
features are perceived (e.g., either positively or negatively) by the users.</p>
      </sec>
      <sec id="sec-2-3">
        <title>On the automatic classification of app reviews</title>
        <p>The paper by Maalej et al. [MKNS16] on the classification of app reviews extends the earlier work of
Maalej and Nabil [MN15], which focuses on automatically classifying app reviews as bug report,
feature request, user experience, or rating. The paper approaches the classification problem by analyzing which
classifier achieves better results and by trying different combinations of machine learning features. In the paper,
we consider metadata and NLP-based information as machine learning features. Results are reported for three
settings: review metadata only, NLP-based features only, and the combination of both. The data used in the
approach are app reviews from the Google Play Store and the Apple App Store. The classification benchmark shows
promising results, with f1-scores ranging from 89% to 99% for the four classes.</p>
        <p>Besides the classification, we developed a prototype of an analytics tool that aggregates the information
retrieved from the classification. The tool shows, for example, how the number of bugs evolved and how the four
classes are distributed for an app in different app stores, and it gives deeper insight by showing concrete reviews in
each class. Finally, the tool was evaluated through interviews with nine practitioners, such as software developers and
analysts. The interviews show that most practitioners need to filter out app reviews that do not contain
useful information, such as "great app" or "I hate it".</p>
        <p><sup>2</sup> https://mast.informatik.uni-hamburg.de/app-review-analysis/</p>
      </sec>
      <sec id="sec-2-4">
        <title>SAFE: A Simple Approach for Feature Extraction from App Descriptions and App Reviews</title>
        <p>In this paper, Johann et al. [JSM+17] describe a uniform approach (SAFE) that extracts app features from
app descriptions and app reviews and matches the two. SAFE extracts app features without prior machine
learning training, in order to analyze which features app developers provide and to understand how users talk about
them. To extract app features, we use NLP to analyze the structure of sentences. Through qualitative analysis, we
found that there are 18 common part-of-speech patterns and four common sentence structures that describe app
features. The extraction from app descriptions achieved an average f1-score of 46%, while the extraction from
reviews had an average f1-score of 35%.</p>
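        <p>Pattern-based extraction of this kind can be sketched as below. The four patterns in <monospace>PATTERNS</monospace> are a hypothetical subset chosen for illustration and are not the 18 patterns reported in [JSM+17]. The input is assumed to be already tokenized and tagged with coarse part-of-speech labels, so no particular tagger is implied.</p>
        <preformat><![CDATA[
```python
# Hypothetical subset of part-of-speech patterns (illustration only).
PATTERNS = [
    ("NOUN", "NOUN"),
    ("VERB", "NOUN"),
    ("ADJ", "NOUN"),
    ("VERB", "ADJ", "NOUN"),
]


def extract_candidates(tagged_sentence, patterns=PATTERNS):
    """Return lowercased word n-grams whose tag sequence equals a pattern.

    `tagged_sentence` is a list of (word, tag) pairs.
    """
    tags = [tag for _, tag in tagged_sentence]
    candidates = []
    for start in range(len(tagged_sentence)):
        for pattern in patterns:
            end = start + len(pattern)
            if tuple(tags[start:end]) == pattern:
                words = [word.lower() for word, _ in tagged_sentence[start:end]]
                candidates.append(" ".join(words))
    return candidates


sentence = [("Share", "VERB"), ("large", "ADJ"), ("files", "NOUN"),
            ("with", "ADP"), ("friends", "NOUN")]
print(extract_candidates(sentence))  # ['share large files', 'large files']
```
]]></preformat>
        <p>Since the patterns are fixed in advance, no labeled training data is needed, which is the property the paragraph above highlights.</p>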
        <p>After SAFE has extracted the app features from the app description and the reviews, the final step is to determine
which features are mentioned in both sources. This information provides insights about the app, such as the
identification of (un)popular features, feature requests, and bug reports. The matching is performed in three
steps. First, SAFE checks whether the terms contained in both sources (i.e., app description and app reviews) are
identical. Second, we tackle language ambiguity using WordNet to compare the synonyms of each word of the
app feature. Third, SAFE computes the semantic similarity of the app features, using cosine similarity
to find a match. The matching procedure achieved an accuracy of 87%.</p>
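        <p>The three-step cascade can be sketched as follows. Several stand-ins are used and should be read as assumptions: <monospace>TOY_SYNONYMS</monospace> replaces the WordNet lookup, a bag-of-words cosine replaces whatever semantic representation SAFE uses, synonyms are compared position by position, and the threshold of 0.7 is arbitrary.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a WordNet lookup.
TOY_SYNONYMS = {
    "send": {"share", "transmit"},
    "picture": {"photo", "image"},
}


def synonyms_of(word):
    """Return the word together with its synonyms from the toy table."""
    group = {word}
    for key, values in TOY_SYNONYMS.items():
        if word == key or word in values:
            group |= {key} | values
    return group


def cosine(a_tokens, b_tokens):
    """Cosine similarity of two bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def match(description_feature, review_feature, threshold=0.7):
    """Return the name of the first step that matches, or None."""
    d = description_feature.lower().split()
    r = review_feature.lower().split()
    if set(d) == set(r):                      # step 1: identical terms
        return "identical terms"
    if len(d) == len(r) and all(rw in synonyms_of(dw) for dw, rw in zip(d, r)):
        return "synonyms"                     # step 2: synonym comparison
    if cosine(d, r) >= threshold:             # step 3: cosine similarity
        return "cosine similarity"
    return None


print(match("send picture", "share photo"))  # synonyms
```
]]></preformat>
        <p>Ordering the steps from cheapest to most permissive means that the similarity computation only runs for pairs that the exact and synonym checks could not resolve.</p>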
      </sec>
      <sec id="sec-2-5">
        <title>Mining User Rationale from Software Reviews</title>
        <p>Kurtanovic and Maalej [KM17b] introduce user rationale for requirements engineering. Given the amount
of data available in social media, user forums, and app stores, software vendors have started to pay increasing
attention to these channels. Software vendors want easy access to users' input to make better decisions about software
design, development, and evolution. This work focuses on the identification of design and user rationale,
which can be valuable for software and requirements engineering. We found, among other things, that
rationale, alternatives, criteria, and decisions often co-occur in user comments and that in 21% to 70% of the
cases they contain justifications.</p>
        <p>In this work, we studied 32,414 reviews for 52 software applications in the Amazon Store. To identify user
rationale, we employ a supervised machine learning approach using text, metadata, sentiment, and syntactic
features, and compare the results of three classification algorithms (Naive Bayes, Support Vector Machine,
and Logistic Regression). The classification is tested with different configurations and predicts user rationale at
the comment and sentence level. The precision and recall for all considered user rationale concepts range between
80% and 99% at the comment level and between 69% and 98% at the sentence level.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Other</title>
        <p>In this section, we report our experience with topics, other than the application of NLP to app reviews, which
we deem interesting for the community.</p>
        <p><bold>Toward Data-Driven Requirements Engineering.</bold> In this paper [MNJR16], we suggest a shift in the
requirements engineering community toward including user feedback to enable user-centered, data-driven identification,
prioritization, and management of software requirements. We show the importance of user feedback and explain
what research has achieved so far. These achievements are scoped to the analytics of user feedback,
such as classifying user feedback into bug reports and feature requests, the classification of stakeholders, and the
summarization of user reviews. One primary focus of the paper is to show how these topics are addressed using
NLP-based approaches.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Automatically Classifying Functional and Non-Functional Requirements Using Supervised Machine Learning</title>
        <p>Kurtanovic and Maalej [KM17a] use a Support Vector Machine, a supervised machine learning classifier,
to automatically classify functional (FR) and non-functional requirements (NFR) using metadata,
lexical features, and syntactic features of the requirement text. We show how to classify fine-grained NFRs, such
as Usability, Security, Operational, and Performance. From a methodological perspective, one contribution of
this paper is the use of under- and over-sampling strategies to handle imbalanced data in the different NFR
classes. The classification of FRs and NFRs achieved an f1-score of up to 93%. The classification of more
specific NFRs achieved f1-scores ranging between 51% and 82%.</p>
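        <p>Under- and over-sampling can be illustrated with simple random resampling. This sketch is an assumption about the general technique, not the specific strategies evaluated in [KM17a]; the function name <monospace>resample</monospace>, its parameters, and the toy data are hypothetical.</p>
        <preformat><![CDATA[
```python
import random
from collections import Counter, defaultdict


def resample(samples, labels, strategy="over", seed=0):
    """Balance classes by random over-sampling (duplicate minority items)
    or random under-sampling (drop majority items)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    sizes = [len(items) for items in by_label.values()]
    target = max(sizes) if strategy == "over" else min(sizes)
    balanced = []
    for label, items in by_label.items():
        if len(items) >= target:
            # keep `target` items, drawn without replacement
            chosen = rng.sample(items, target)
        else:
            # keep every item and add random duplicates up to `target`
            chosen = items + rng.choices(items, k=target - len(items))
        balanced.extend((sample, label) for sample in chosen)
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced data set: 7 Usability vs. 3 Security requirements.
samples = ["req%d" % i for i in range(10)]
labels = ["Usability"] * 7 + ["Security"] * 3
print(Counter(label for _, label in resample(samples, labels, "over")))
print(Counter(label for _, label in resample(samples, labels, "under")))
```
]]></preformat>
        <p>Resampling is normally applied to the training split only, so that duplicated or discarded items do not distort the evaluation on held-out data.</p>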
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Research Plan on NLP for RE</title>
      <p>Currently, the group is involved in the H2020-funded project OpenReq.<sup>3</sup></p>
      <p>The goal of the project is to research, develop, and evaluate intelligent recommendation and decision
technologies that will support communities and individual stakeholders in the gathering and management of software
requirements.</p>
      <p>In particular, OpenReq wants to bridge the gap between the development and usage of software products and
services. To that end, the project aims to take the user community into account as part of the innovation process
and to continuously observe and involve stakeholders and end users in the decision-making process. OpenReq use
cases will cover open-source development, telecommunications, and railway bidding.</p>
      <p>In the context of the project, the group will apply NLP to two specific activities related to requirements
engineering: i) derive/improve requirements from unstructured text, and ii) improve the quality of existing
requirements.</p>
      <p>Activity i) is currently under development. It consists of collecting explicit user feedback from public
channels, such as social media, review systems, ticketing systems, and discussion forums, and then aggregating and
analyzing this large amount of data to facilitate stakeholders' understanding of users' needs and help them to react
quickly.</p>
      <p>From an NLP perspective, we are using such data to tackle four tasks:</p>
      <p>Provide features, based on statistical language processing (e.g., tf-idf, GloVe), for machine learning classifiers.
Here, we want to differentiate between relevant and irrelevant feedback, and to further categorize the relevant
feedback, e.g., to understand whether it contains a request for a new feature, a complaint about an
existing one, or both.</p>
      <p>Perform sentiment analysis to assess, for example, the user base's reception of a new feature and allow
stakeholders to act accordingly.</p>
      <p>Perform summarization and facilitate access to this significant amount of data for stakeholders and
decision makers; in this regard, we are interested in visualization techniques to support this task.</p>
      <p>Perform Named-Entity Recognition (NER) and topic recognition to understand the specific areas
in which the previous tasks can be applied.</p>
      <p>The above points are particularly interesting from a research point of view, as the language used in these
texts is not only English but also Italian. Moreover, since much of the data is collected from channels such as
Twitter, the text tends to be short and colloquial.</p>
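      <p>For the first of the four tasks, tf-idf features can be computed as in the following sketch. It uses one common variant (term frequency normalized by document length, inverse document frequency as the logarithm of N over df) and is not necessarily the weighting used in the project; the function name <monospace>tf_idf</monospace> and the toy feedback are hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    weight = (count / document length) * log(N / document frequency)
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already tokenized user feedback.
feedback = [
    ["app", "crash", "login"],
    ["app", "add", "dark", "mode"],
    ["app", "crash", "crash"],
]
vectors = tf_idf(feedback)
print(vectors[0])
```
]]></preformat>
      <p>A term that occurs in every document, such as "app" here, receives a weight of zero, which is why tf-idf tends to suppress words that carry little information for telling feedback categories apart.</p>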
      <p>Activity ii) is currently in a preliminary phase. Here, we will analyze requirement documents, either
structured (e.g., user stories) or not (e.g., free-form text). NLP techniques will be used to build a recommender
system for improving structural properties of the requirements text.</p>
      <p>In particular, we expect to focus on the following tasks:</p>
      <p>Word sense disambiguation and coreference resolution to identify ambiguous passages in the requirement
text and suggest corrective actions.</p>
      <p>Chunking and relationship extraction to assess (and possibly correct) conformance to templates, such as
user stories.</p>
      <p>Semantic role labeling and textual entailment to assess the completeness of a requirement text with respect to
several concerns (e.g., risk).</p>
      <p>As these documents contain domain-specific knowledge, we are investigating the possibility of supporting the
NLP approaches with ontologies and glossaries.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgment</title>
      <p>We would like to acknowledge the H2020 EU research project OpenReq (ID 732463).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [GM14]
          <string-name>
            <given-names>Emitza</given-names>
            <surname>Guzman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Walid</given-names>
            <surname>Maalej</surname>
          </string-name>
          .
          <article-title>How do users like this feature? a fine grained sentiment analysis of app reviews</article-title>
          .
          <source>In Requirements Engineering Conference (RE)</source>
          ,
          <source>2014 IEEE 22nd International</source>
          , pages
          <fpage>153</fpage>
          –
          <lpage>162</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [JSM+17]
          <string-name>
            <given-names>Timo</given-names>
            <surname>Johann</surname>
          </string-name>
          , Christoph Stanik,
          <string-name>
            <given-names>Walid</given-names>
            <surname>Maalej</surname>
          </string-name>
          , et al.
          <article-title>SAFE: A simple approach for feature extraction from app descriptions and app reviews</article-title>
          .
          <source>In Requirements Engineering Conference (RE)</source>
          ,
          <source>2017 IEEE 25th International</source>
          , pages
          <fpage>21</fpage>
          –
          <lpage>30</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [KM17a]
          <string-name>
            <given-names>Zijad</given-names>
            <surname>Kurtanovic</surname>
          </string-name>
          and
          <string-name>
            <given-names>Walid</given-names>
            <surname>Maalej</surname>
          </string-name>
          .
          <article-title>Automatically classifying functional and non-functional requirements using supervised machine learning</article-title>
          .
          <source>In Requirements Engineering Conference (RE)</source>
          ,
          <source>2017 IEEE 25th International</source>
          , pages
          <fpage>490</fpage>
          –
          <lpage>495</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [KM17b]
          <string-name>
            <given-names>Zijad</given-names>
            <surname>Kurtanovic</surname>
          </string-name>
          and
          <string-name>
            <given-names>Walid</given-names>
            <surname>Maalej</surname>
          </string-name>
          .
          <article-title>Mining user rationale from software reviews</article-title>
          .
          <source>In Requirements Engineering Conference (RE)</source>
          ,
          <source>2017 IEEE 25th International</source>
          , pages
          <fpage>61</fpage>
          –
          <lpage>70</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [MKNS16]
          <string-name>
            <given-names>Walid</given-names>
            <surname>Maalej</surname>
          </string-name>
          , Zijad Kurtanovic, Hadeer Nabil, and
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Stanik</surname>
          </string-name>
          .
          <article-title>On the automatic classification of app reviews</article-title>
          .
          <source>Requirements Engineering</source>
          ,
          <volume>21</volume>
          (
          <issue>3</issue>
          ):
          <fpage>311</fpage>
          –
          <lpage>331</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [MN15]
          <string-name>
            <given-names>Walid</given-names>
            <surname>Maalej</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hadeer</given-names>
            <surname>Nabil</surname>
          </string-name>
          .
          <article-title>Bug report, feature request, or simply praise? on automatically classifying app reviews</article-title>
          .
          <source>In Requirements Engineering Conference (RE)</source>
          ,
          <source>2015 IEEE 23rd International</source>
          , pages
          <fpage>116</fpage>
          –
          <lpage>125</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [MNJR16]
          <string-name>
            <given-names>Walid</given-names>
            <surname>Maalej</surname>
          </string-name>
          , Maleknaz Nayebi, Timo Johann, and
          <string-name>
            <given-names>Guenther</given-names>
            <surname>Ruhe</surname>
          </string-name>
          .
          <article-title>Toward data-driven requirements engineering</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>33</volume>
          (
          <issue>1</issue>
          ):
          <fpage>48</fpage>
          –
          <lpage>54</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [TBP+10]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Thelwall</surname>
          </string-name>
          , Kevan Buckley, Georgios Paltoglou, Di Cai, and
          <string-name>
            <given-names>Arvid</given-names>
            <surname>Kappas</surname>
          </string-name>
          .
          <article-title>Sentiment strength detection in short informal text</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          ,
          <volume>61</volume>
          (
          <issue>12</issue>
          ):
          <fpage>2544</fpage>
          –
          <lpage>2558</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>