<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research on NLP for RE at Università della Svizzera italiana (USI): A Report</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arianna Blasi</string-name>
          <email>arianna.blasi@usi.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Pezzè</string-name>
          <email>mauro.pezze@usi.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Gorla</string-name>
          <email>alessandra.gorla@imdea.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael D. Ernst</string-name>
          <email>mernst@cs.washington.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IMDEA Software Institute</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università della Svizzera italiana (USI)</institution>
          ,
          <addr-line>Lugano</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Washington</institution>
          ,
          <addr-line>Seattle, WA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We report on the activity of the Software Testing and Analysis Research (STAR) laboratory of USI Università della Svizzera italiana on the use of NLP to automatically generate test cases from natural-language documentation. We first introduce the research contributions of the group to contextualize the work related to NLP. We then summarize our techniques that use NLP to automatically generate test oracles from specifications expressed in Javadoc tags. We conclude by presenting the challenges of shifting the focus to more complex software systems and to more complex natural-language artifacts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Overview</title>
      <p>This report describes research on automatically generating test inputs and oracles for software systems. Our research
uses Natural Language Processing (NLP) to automatically produce executable specifications from software
documentation written in natural language. This complements our other work on automatically generating test inputs
for applications with complex structured inputs [BDMP17], interactive and GUI-based applications [MPZ18], and
concurrent and distributed software systems [TP18], and on generating test cases that exercise software in the field [GMPP17].</p>
      <p>This report presents our work on generating test oracles, such as the assertions of test cases. We focus on
the automatic generation of semantically relevant oracles, that is, oracles that can reveal failures due to semantic
mismatches with respect to the software requirements. Such oracles are more powerful than the simple implicit oracles,
which detect generic failures such as crashes and uncaught exceptions, and regression oracles, which check that the
observed behavior does not change with respect to a previous version [BHM+15], that automatic testing tools such as
Randoop [PLEB07] and EvoSuite [FA13] commonly generate.</p>
      <p>In 2009 we started investigating the redundancy intrinsically present in software systems [CGP09, CGPP15],
and we designed a technique that exploits such redundancy to generate test oracles [CGG+14]. Along the
same line of research, we designed and experimented with techniques and prototype tools to automatically
identify semantically equivalent sequences of method calls in Java programs [GGM+14]. We then developed techniques
for automatically generating assertions from specifications expressed in natural language: an approach that
exploits NLP to automatically infer executable specifications from Javadoc comments and uses them as test
oracles [GGEP16, BGK+18]. In the next sections, we describe the results that we have obtained so far, and our
research plans to automatically generate semantically relevant oracles from natural-language specifications with NLP.</p>
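      <p>The idea behind redundancy-based oracles [CGG+14] can be conveyed with a minimal sketch under our own assumptions: execute an operation and a semantically equivalent operation on two copies of the same initial state, and report a failure when the resulting states differ. For illustration only, we assume that StringBuilder.append(s) is equivalent to inserting s at position length(); the class and method names are hypothetical, and this is not the implementation described in [CGG+14].</p>
      <preformat preformat-type="code">
```java
// Hypothetical cross-check oracle in the spirit of redundancy-based oracles:
// run an operation and an equivalent one on copies of the same initial state
// and report a failure when the resulting states differ.
final class CrossCheckOracle {

    // Original operation: append s at the end.
    static String viaAppend(String initial, String s) {
        return new StringBuilder(initial).append(s).toString();
    }

    // Equivalent operation assumed for this sketch: insert s at position length().
    static String viaInsert(String initial, String s) {
        StringBuilder sb = new StringBuilder(initial);
        sb.insert(sb.length(), s);
        return sb.toString();
    }

    // The oracle: both executions must reach the same observable state.
    static boolean consistent(String initial, String s) {
        return viaAppend(initial, s).equals(viaInsert(initial, s));
    }

    public static void main(String[] args) {
        System.out.println(consistent("merge", "d")); // prints true
    }
}
```
      </preformat>
      <p>With a correct JDK the two paths agree, so no failure is reported; a discrepancy would point to a fault in one of them without requiring a hand-written expected value.</p>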
      <p>Copyright © 2019 by the paper's authors. Copying permitted for private and academic purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>Past Research on NLP for RE</title>
      <p>Test cases can be generated from different sources of information [BP06, BHM+15]: requirements specifications
(black-box and model-based testing), source code (white-box testing), possible faults (fault-based testing), and former
versions and similar code (regression and metamorphic testing). When generating test cases from
requirements specifications, the goal is to identify a finite set of test inputs that properly sample the execution
space (partition testing), and a set of assertions (oracles) that check the results of executing the software system on those inputs.
Most of the approaches proposed so far for automatically generating test cases focus on generating test inputs
from formal and semi-formal specifications [PY07]. Relatively little work takes advantage of natural-language
requirements, and it relies on simplistic techniques: pattern matching to determine conditions related to the
nullness of parameters (Tan et al.'s @tComment [TMTL12]), and part-of-speech tagging and pattern matching to
generate simple pre- and post-conditions (Pandita et al.'s ALICS [PXZ+12]). Some approaches exploit
the simplifications induced by the structure of semi-formal specifications to generate test inputs. For instance,
Wang et al. automatically derive test cases from use case specifications [WPG+15]. Many techniques use NLP
to address problems related to requirements quality, such as ambiguity [FDE+17], which are not strictly related to
test oracles.</p>
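      <p>To make the limitation of such simplistic techniques concrete, the following sketch decides whether a @param comment forbids null by matching a regular expression. The patterns and names are illustrative assumptions of ours and are not taken from @tComment.</p>
      <preformat preformat-type="code">
```java
import java.util.regex.Pattern;

// Hypothetical pattern-based nullness detector for @param comments.
final class NullnessPatternMatcher {

    // Illustrative phrasings only: "not null", "non-null", "nonnull", "never null".
    private static final Pattern NOT_NULL =
            Pattern.compile("\\b(not|non|never)[- ]?null\\b", Pattern.CASE_INSENSITIVE);

    // Returns true if the comment text matches one of the anticipated phrasings.
    static boolean forbidsNull(String paramComment) {
        return NOT_NULL.matcher(paramComment).find();
    }

    public static void main(String[] args) {
        System.out.println(forbidsNull("the first array, not null"));   // prints true
        System.out.println(forbidsNull("the array; must not be null")); // prints false
    }
}
```
      </preformat>
      <p>The second call shows the brittleness of the approach: an equivalent phrasing that the pattern does not anticipate is silently missed, which motivates analyzing the structure of the sentence rather than its surface form.</p>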
      <p>We have investigated more powerful and effective approaches. As illustrated in the following simple example,
Javadoc tags indicate the scope of a specification (i.e., whether it is a precondition on a parameter or
a postcondition on the result of the method execution), and this simplifies the task of processing the information
for testing. However, Javadoc tags may predicate on program elements that are partially implicit, for instance
implicit subjects referring to parameters, and often use developers' jargon, for instance "not null". Such features
pose a challenge for traditional natural language processing:</p>
      <preformat preformat-type="code">/**
 * Merges the arrays in input.
 *
 * @param x the first array, not null
 * @param y the second array, not null
 * @return an array which is the result of the merge, empty if both arrays are empty
 * @throws IllegalArgumentException if either array is null
 */
public Object[] merge(Object[] x, Object[] y) throws IllegalArgumentException { ... }</preformat>
      <p>Listing 1: Sample Javadoc specification of a method.</p>
      <p>The @param tags indicate the preconditions on the method input parameters; the @return tag (at most one) and
the @throws tags (one for each exception that the method may raise) indicate the postconditions of the method
execution. The information expressed with Javadoc tags is useful to determine the correctness of the results of
the test executions, but it must be translated into executable assertions to act as test oracles.</p>
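      <p>The mapping from tags to kinds of conditions described above can be sketched as a line-based scan of the comment. This is a simplified, hypothetical illustration: it assumes one tag per line, ignores tag text continued on following lines, and does not reflect how any particular tool parses Javadoc.</p>
      <preformat preformat-type="code">
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: classify Javadoc block tags by the kind of condition they specify.
final class JavadocTagScopes {

    // Matches one tag per line, e.g. "* @param x the first array, not null".
    private static final Pattern TAG = Pattern.compile("@(param|return|throws)\\s+(.*)");

    static String kindOf(String tag) {
        switch (tag) {
            case "param":
                return "precondition";
            case "return":
                return "postcondition";
            default:
                return "exceptional postcondition"; // "throws"
        }
    }

    // Returns one line per tag, "kind: @tag text"; continuation lines are ignored.
    static String scopes(String javadoc) {
        StringBuilder out = new StringBuilder();
        for (String line : javadoc.split("\n")) {
            Matcher m = TAG.matcher(line);
            if (m.find()) {
                out.append(kindOf(m.group(1)))
                   .append(": @").append(m.group(1))
                   .append(' ').append(m.group(2).trim())
                   .append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String javadoc = String.join("\n",
                "/**",
                " * Merges the arrays in input.",
                " *",
                " * @param x the first array, not null",
                " * @param y the second array, not null",
                " * @return an array which is the result of the merge, empty if both arrays are empty",
                " * @throws IllegalArgumentException if either array is null",
                " */");
        System.out.print(scopes(javadoc));
    }
}
```
      </preformat>
      <p>Run on the comment of Listing 1, the scan reports two preconditions, one postcondition, and one exceptional postcondition, which is the scope information that makes Javadoc easier to process than free text.</p>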
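      <p>For example, the specification expressed in the @param tags in Listing 1 translates into the following
executable assertion, which automatically acts as a test oracle:</p>
      <preformat preformat-type="code">x != null &amp;&amp; y != null</preformat>
      <p>The @throws tag translates into:</p>
      <preformat preformat-type="code">(x == null || y == null) → java.lang.IllegalArgumentException</preformat>
      <p>and the @return tag translates into:</p>
      <preformat preformat-type="code">(x.length == 0 &amp;&amp; y.length == 0) → result.length == 0</preformat>
      <p>Here → denotes implication: whenever the condition on the left holds, the method must throw the exception or satisfy the property on the right.</p>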
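      <p>Written as plain Java, the three translated assertions could look as follows. The class and method names are hypothetical, and the implication A → B is encoded as !A || B; this is a hand-written sketch, not code generated by Jdoctor.</p>
      <preformat preformat-type="code">
```java
// Hypothetical hand-written encoding of the three translated assertions for merge.
final class MergeAssertions {

    // From the @param tags: x != null and y != null.
    static boolean paramPrecondition(Object[] x, Object[] y) {
        return !(x == null || y == null);
    }

    // Guard of the @throws tag: when true, IllegalArgumentException is expected.
    static boolean throwsGuard(Object[] x, Object[] y) {
        return x == null || y == null;
    }

    // From the @return tag: (both arrays empty) implies (result is empty).
    static boolean returnPostcondition(Object[] x, Object[] y, Object[] result) {
        // Lengths are non-negative, so their sum is zero exactly when both arrays are empty.
        boolean bothEmpty = x.length + y.length == 0;
        return !bothEmpty || result.length == 0;
    }

    public static void main(String[] args) {
        Object[] empty = {};
        Object[] one = {1};
        System.out.println(paramPrecondition(one, null));           // prints false
        System.out.println(returnPostcondition(empty, empty, one)); // prints false: violated
        System.out.println(returnPostcondition(one, empty, empty)); // prints true: guard is false
    }
}
```
      </preformat>
      <p>The last call shows that an implication holds vacuously when its guard is false: the @return postcondition constrains the result only when both input arrays are empty.</p>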
      <p>We designed and developed Toradocu [GGEP16], later extended into Jdoctor [BGK+18], a technique that
automatically infers executable assertions from natural-language comments in Javadoc tags, as shown in
the example. The early Toradocu approach performs simple translations of exceptional postconditions only. Jdoctor
extends Toradocu to all Javadoc tags and greatly improves its translation abilities, also supporting
semantic similarity for interpreting synonyms [KSKW15].</p>
      <p>Toradocu and Jdoctor use the Stanford Parser to produce a semantic graph for each sentence. Before parsing, they
preprocess the natural-language text to deal with the peculiarities of Javadoc comments, which are rarely complete
and grammatically correct English sentences. For example, most Javadoc specifications lack punctuation, many
have implicit subjects and verbs, and they often intermix mathematical or code notation with English. Moreover, different
types of tags need different preprocessing, and Toradocu and Jdoctor take this into account.</p>
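      <p>A minimal sketch of this kind of preprocessing, for @param comments only, under a few hand-picked rules: normalize jargon such as "non-null", split the comment at commas and semicolons, add the parameter name as the implicit subject of subjectless fragments, and terminate every fragment with a period. These rules and names are hypothetical and are not the actual Toradocu or Jdoctor preprocessing.</p>
      <preformat preformat-type="code">
```java
// Hypothetical preprocessing rules for @param comments; not the actual Toradocu or Jdoctor rules.
final class CommentPreprocessor {

    // Rewrites an @param comment into complete sentences with an explicit subject.
    static String preprocessParam(String paramName, String comment) {
        // Normalize developer jargon such as "non-null" or "nonnull" to plain English.
        String text = comment.trim().replaceAll("(?i)\\bnon-?null\\b", "not null");
        // Drop a trailing period so that every fragment can be terminated uniformly.
        if (text.endsWith(".")) {
            text = text.substring(0, text.length() - 1);
        }
        StringBuilder out = new StringBuilder();
        for (String fragment : text.split("[,;]")) {
            String f = fragment.trim();
            if (f.isEmpty()) {
                continue;
            }
            // Subjectless fragments such as "not null" or "must be sorted" get the parameter as subject.
            if (f.startsWith("not ") || f.startsWith("never ")) {
                f = paramName + " is " + f;
            } else if (f.startsWith("must ") || f.startsWith("may ") || f.startsWith("can ")) {
                f = paramName + " " + f;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(f).append('.');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(preprocessParam("x", "the first array, not null"));
        // prints: the first array. x is not null.
    }
}
```
      </preformat>
      <p>The output consists of punctuated sentences with explicit subjects, a form that a general-purpose parser handles more reliably than the original fragment.</p>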
      <p>The core idea of Toradocu and Jdoctor is to exploit information already present in the source code to produce
ready-to-use executable assertions, without requiring any additional intervention or effort from
developers. The latest experimental results, obtained by executing Jdoctor on 6 popular open-source Java projects,
are encouraging: the tool achieves 92% recall and 83% precision on 829 translations [BGK+18]. Moreover, Jdoctor
assertions are officially integrated with Randoop [PLEB07], and in our evaluation they produce test cases that
raise fewer false alarms and reveal more defects.</p>
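      <p>One way to see why specifications reduce false alarms is to sketch how a test generator could judge a single execution of merge from Listing 1 using the translated @throws and @return conditions, instead of reporting every thrown exception as a failure. The verdict names and the Merge interface are assumptions of this sketch; it does not reproduce Randoop's actual classification logic.</p>
      <preformat preformat-type="code">
```java
// Hypothetical sketch: judging one execution of merge(x, y) with the conditions
// translated from the @throws and @return tags of Listing 1.
final class ExecutionClassifier {

    enum Verdict { PASS, EXPECTED_EXCEPTION, FAILURE }

    // Stand-in for the method under test.
    interface Merge {
        Object[] apply(Object[] x, Object[] y);
    }

    static Verdict judge(Merge merge, Object[] x, Object[] y) {
        // Guard of the @throws tag: either array is null.
        boolean exceptionExpected = x == null || y == null;
        Object[] result;
        try {
            result = merge.apply(x, y);
        } catch (IllegalArgumentException e) {
            // Documented behavior: flagging it as a failure would be a false alarm.
            return exceptionExpected ? Verdict.EXPECTED_EXCEPTION : Verdict.FAILURE;
        } catch (RuntimeException e) {
            return Verdict.FAILURE; // undocumented exception
        }
        if (exceptionExpected) {
            return Verdict.FAILURE; // the documented exception was not thrown
        }
        // @return tag: if both inputs are empty, the result must be empty.
        boolean bothEmpty = x.length + y.length == 0;
        return (!bothEmpty || result.length == 0) ? Verdict.PASS : Verdict.FAILURE;
    }

    public static void main(String[] args) {
        // A correct implementation: concatenation that rejects null inputs.
        Merge concat = (x, y) -> {
            if (x == null || y == null) {
                throw new IllegalArgumentException("either array is null");
            }
            Object[] r = new Object[x.length + y.length];
            System.arraycopy(x, 0, r, 0, x.length);
            System.arraycopy(y, 0, r, x.length, y.length);
            return r;
        };
        System.out.println(judge(concat, null, new Object[0]));          // prints EXPECTED_EXCEPTION
        System.out.println(judge(concat, new Object[0], new Object[0])); // prints PASS
    }
}
```
      </preformat>
      <p>Without the @throws specification, the first call would be reported as a failure even though the implementation behaves exactly as documented.</p>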
    </sec>
    <sec id="sec-3">
      <title>Research Plan on NLP for RE</title>
      <p>The results of our past work confirm the research hypothesis of our long-term research plan: automatically
generating test inputs and oracles with NLP from requirements specifications given in natural language is feasible
and effective.</p>
      <p>In the short term, we plan to analyze the free, unstructured text of Javadoc comments, besides the specific Javadoc tags that
we already support. Moreover, we aim to extract information beyond functional properties, focusing
on temporal, security, performance, and other non-functional properties. As an example, Figure 1 shows some
temporal properties about call protocols<sup>1</sup>. Such information is very useful for reducing the number of false alarms
that affect existing testing approaches.</p>
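      <p>Temporal properties of call protocols can be checked with a finite-state monitor over the sequence of calls that a test performs. The protocol below (open before read, nothing after close) is a hypothetical example chosen for illustration, not one of the properties shown in Figure 1.</p>
      <preformat preformat-type="code">
```java
// Hypothetical call-protocol monitor: open() must precede read(), and nothing may follow close().
final class CallProtocolMonitor {

    enum State { CREATED, OPEN, CLOSED, VIOLATION }

    // Advances the monitor by one observed call.
    static State step(State s, String call) {
        switch (s) {
            case CREATED:
                return call.equals("open") ? State.OPEN : State.VIOLATION;
            case OPEN:
                if (call.equals("read")) {
                    return State.OPEN;
                }
                return call.equals("close") ? State.CLOSED : State.VIOLATION;
            default:
                // CLOSED accepts no further calls, and VIOLATION is a sink state.
                return State.VIOLATION;
        }
    }

    // Returns true if no prefix of the trace violates the protocol.
    static boolean accepts(String[] trace) {
        State s = State.CREATED;
        for (String call : trace) {
            s = step(s, call);
            if (s == State.VIOLATION) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts(new String[] {"open", "read", "close"})); // prints true
        System.out.println(accepts(new String[] {"open", "close", "read"})); // prints false
    }
}
```
      </preformat>
      <p>The monitor checks only a safety property: a trace that never calls close is still accepted. A test generator could use such a monitor to avoid producing call sequences that violate the documented protocol, rather than reporting their exceptions as failures.</p>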
      <p>We will combine different approaches to interpret the various properties expressed in natural language. We will
resort to Open Information Extraction to infer information from unstructured text [DCG13, FSE11, SBS+12],
and to natural language parsing, pattern matching, semantic similarity, and machine translation techniques to
match documentation with code elements.</p>
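      <p>As a simplified stand-in for the step that matches documentation with code elements, the sketch below maps a phrase from a comment to the most similar code identifier by splitting camelCase names into words and computing the Jaccard similarity of the word sets. Unlike the word-embedding distances of [KSKW15], this lexical measure cannot recognize synonyms; the class and method names are hypothetical.</p>
      <preformat preformat-type="code">
```java
import java.util.Arrays;

// Hypothetical lexical matcher from comment phrases to code identifiers (Jaccard similarity).
final class IdentifierMatcher {

    // Splits camelCase identifiers and sentences into distinct lowercase words.
    static String[] words(String text) {
        return Arrays.stream(text.split("(?=[A-Z])|[^A-Za-z]+"))
                .map(String::toLowerCase)
                .filter(w -> !w.isEmpty())
                .distinct()
                .toArray(String[]::new);
    }

    // Jaccard similarity of two sets of distinct words: size of intersection / size of union.
    static double jaccard(String[] a, String[] b) {
        long shared = Arrays.stream(a).filter(w -> Arrays.asList(b).contains(w)).count();
        long union = a.length + b.length - shared;
        return union == 0 ? 0.0 : (double) shared / union;
    }

    // Returns the identifier most similar to the phrase, or null if none shares a word.
    static String bestMatch(String phrase, String[] identifiers) {
        String[] phraseWords = words(phrase);
        String best = null;
        double bestScore = 0.0;
        for (String id : identifiers) {
            double score = jaccard(phraseWords, words(id));
            if (score > bestScore) {
                bestScore = score;
                best = id;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] methods = {"isEmpty", "size", "clear"};
        System.out.println(bestMatch("if the list is empty", methods)); // prints isEmpty
    }
}
```
      </preformat>
      <p>A phrase such as "if the collection has no elements" would match nothing here, which is exactly the gap that semantic similarity is meant to close.</p>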
      <p>Our mid-term plan is to extend Jdoctor to handle information coming from other natural-language artifacts,
such as wikis, issue trackers, and community forums, which are commonly available for popular
applications. Even if these artifacts do not have a narrow scope as Javadoc comments do (a Javadoc comment
refers to a specific method or a specific class), they are still often partially structured, and thus we believe
that our techniques, if properly extended, can deal with them. The software engineering research community
has already produced some techniques that derive test artifacts from system requirements such as use case
specifications [WPG+15, MPGB18].</p>
      <p>Our long-term plan is to define and develop a set of techniques to automatically test human-centric software
systems. Such systems have key features that make them different from traditional software systems. First and
foremost, the user is an integral part of the system. Second, they often integrate different sensors and physical
devices. Lastly, they often rely on machine learning components that drive the decisions of the system based on
the inputs observed from sensors. In a nutshell, human-centric software systems can be seen as an evolution of
ultra-large-scale software systems, also called systems of systems [MPS08].</p>
      <p>To deal with human-centric software systems, we need to radically change the considered scenario and widen the
set of techniques that we plan to use, mostly because the expected behavior of such systems may be hard to predict
and is seldom specified in the requirements. So far, we have studied the problem of automatically generating test
inputs and oracles for functional properties of program units (classes and methods) with deterministic behavior.</p>
      <p>To properly test human-centric software systems, we need to move from functional
properties of software components with deterministic behavior to properties of subsystems with nondeterministic
behavior. Nondeterminism may derive from concurrency, and may be due to machine learning components that
act differently depending on the underlying model they use. Moreover, external physical sensors and the users
involved in the system may increase the uncertainty of the expected behavior of the system.</p>
      <p>Despite these challenges, we still plan to focus our analysis on natural-language artifacts, and aim to infer the
missing information needed to test such systems. No matter how complex such systems may be, their requirements still
have to be expressed in some form, for example as classical user stories. We will investigate which kinds
of artifacts are most commonly used to document this type of system. We will study different ways of contextualizing the
fragmented and incomplete information expressed in natural language to resolve ambiguities and incompleteness,
and we will exploit the inferred specifications to test these complex systems.</p>
      <p><bold>Acknowledgments.</bold> This work was partially supported by the Spanish projects DETEST, by the Madrid Regional projects BLUETS
and MadridFlightOnChip, and by the Swiss project ASTERIx: Automatic System TEsting of inteRactive
software applIcations (SNF-200021 178742). This material is also based on research sponsored by DARPA under
agreement numbers FA8750-12-2-0107, FA8750-15-C-0010, and FA8750-16-2-0032. The U.S. Government is
authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation
thereon.</p>
      <p>[BDMP17] Pietro Braione, Giovanni Denaro, Andrea Mattavelli, and Mauro Pezzè. Combining symbolic
execution and search-based testing for programs with complex heap inputs. In Proceedings of the
International Symposium on Software Testing and Analysis, ISSTA '17, pages 90–101. ACM, 2017.</p>
      <p>[BGK+18] Arianna Blasi, Alberto Goffi, Konstantin Kuznetsov, Alessandra Gorla, Michael D. Ernst, Mauro
Pezzè, and Sergio Delgado Castellanos. Translating code comments to procedure specifications. In
Proceedings of the International Symposium on Software Testing and Analysis, ISSTA '18. ACM,
2018.</p>
      <p>[BHM+15] Earl T. Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo. The oracle problem
in software testing: A survey. IEEE Transactions on Software Engineering, 41(5):507–525, 2015.</p>
      <p>[BP06] Luciano Baresi and Mauro Pezzè. An introduction to software testing. Electronic Notes in Theoretical
Computer Science, 148(1):89–111, 2006.</p>
      <p>[CGG+14] Antonio Carzaniga, Alberto Goffi, Alessandra Gorla, Andrea Mattavelli, and Mauro Pezzè.
Cross-checking oracles from intrinsic software redundancy. In Proceedings of the International Conference
on Software Engineering, ICSE '14, pages 931–942. ACM, 2014.</p>
      <p>[CGP09] Antonio Carzaniga, Alessandra Gorla, and Mauro Pezzè. Fault handling with software redundancy.
In R. de Lemos, J. Fabre, C. Gacek, F. Gadducci, and M. ter Beek, editors, Architecting Dependable
Systems VI, pages 148–171. Springer, 2009.</p>
      <p>[CGPP15] Antonio Carzaniga, Alessandra Gorla, Nicolò Perino, and Mauro Pezzè. Automatic workarounds:
Exploiting the intrinsic redundancy of web applications. ACM Transactions on Software Engineering
and Methodology, 24(3):16, 2015.</p>
      <p>[DCG13] Luciano Del Corro and Rainer Gemulla. ClausIE: Clause-based open information extraction. In
Proceedings of the International Conference on World Wide Web, WWW '13, pages 355–366. ACM,
2013.</p>
      <p>[FA13] Gordon Fraser and Andrea Arcuri. Whole test suite generation. IEEE Transactions on Software
Engineering, 39(2):276–291, 2013.</p>
      <p>[FDE+17] Alessio Ferrari, Felice Dell'Orletta, Andrea Esuli, Vincenzo Gervasi, and Stefania Gnesi. Natural
language requirements processing: A 4D vision. 34(6):28–35, 2017.</p>
      <p>[FSE11] Anthony Fader, Stephen Soderland, and Oren Etzioni. Identifying relations for open information
extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing,
EMNLP '11, pages 1535–1545. Association for Computational Linguistics, 2011.</p>
      <p>[PY07] Mauro Pezzè and Michal Young. Software Testing and Analysis: Process, Principles and Techniques.
Wiley, 2007.</p>
      <p>[SBS+12] Michael Schmitz, Robert Bart, Stephen Soderland, Oren Etzioni, et al. Open language learning for
information extraction. In Proceedings of the Joint Conference on Empirical Methods in Natural
Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12, pages
523–534. Association for Computational Linguistics, 2012.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[GGEP16] <string-name><given-names>Alberto</given-names> <surname>Goffi</surname></string-name>, <string-name><given-names>Alessandra</given-names> <surname>Gorla</surname></string-name>, <string-name><given-names>Michael D.</given-names> <surname>Ernst</surname></string-name>, and <string-name><given-names>Mauro</given-names> <surname>Pezzè</surname></string-name>. <article-title>Automatic generation of oracles for exceptional behaviors</article-title>. In <source>Proceedings of the International Symposium on Software Testing and Analysis, ISSTA '16</source>, pages <fpage>213</fpage>–<lpage>224</lpage>. ACM, <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[GGM+14] <string-name><given-names>Alberto</given-names> <surname>Goffi</surname></string-name>, <string-name><given-names>Alessandra</given-names> <surname>Gorla</surname></string-name>, <string-name><given-names>Andrea</given-names> <surname>Mattavelli</surname></string-name>, <string-name><given-names>Mauro</given-names> <surname>Pezzè</surname></string-name>, and <string-name><given-names>Paolo</given-names> <surname>Tonella</surname></string-name>. <article-title>Search-based synthesis of equivalent method sequences</article-title>. In <source>Proceedings of the ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE '14</source>, pages <fpage>366</fpage>–<lpage>376</lpage>. ACM, <year>2014</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[GMPP17] <string-name><given-names>Luca</given-names> <surname>Gazzola</surname></string-name>, <string-name><given-names>Leonardo</given-names> <surname>Mariani</surname></string-name>, <string-name><given-names>Fabrizio</given-names> <surname>Pastore</surname></string-name>, and <string-name><given-names>Mauro</given-names> <surname>Pezzè</surname></string-name>. <article-title>An exploratory study of field failures</article-title>. In <source>Proceedings of the International Symposium on Software Reliability Engineering, ISSRE '17</source>, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[KSKW15] <string-name><given-names>Matt J.</given-names> <surname>Kusner</surname></string-name>, <string-name><given-names>Yu</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>Nicholas I.</given-names> <surname>Kolkin</surname></string-name>, and <string-name><given-names>Kilian Q.</given-names> <surname>Weinberger</surname></string-name>. <article-title>From word embeddings to document distances</article-title>. In <source>Proceedings of the International Conference on International Conference on Machine Learning, ICML '15</source>, pages <fpage>957</fpage>–<lpage>966</lpage>, <year>2015</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[MPGB18] <string-name><given-names>Phu X.</given-names> <surname>Mai</surname></string-name>, <string-name><given-names>Fabrizio</given-names> <surname>Pastore</surname></string-name>, <string-name><given-names>Arda</given-names> <surname>Goknil</surname></string-name>, and <string-name><given-names>Lionel C.</given-names> <surname>Briand</surname></string-name>. <article-title>A natural language programming approach for requirements-based security testing</article-title>. In <source>Proceedings of the International Symposium on Software Reliability Engineering, ISSRE '18</source>, pages <fpage>58</fpage>–<lpage>69</lpage>. IEEE Computer Society, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[MPS08] <string-name><given-names>Hausi</given-names> <surname>Müller</surname></string-name>, <string-name><given-names>Mauro</given-names> <surname>Pezzè</surname></string-name>, and <string-name><given-names>Mary</given-names> <surname>Shaw</surname></string-name>. <article-title>Visibility of control in adaptive systems</article-title>. In <source>ULSSIS '08: Proceedings of the 2nd International Workshop on Ultra-Large-Scale Software-Intensive Systems</source>, pages <fpage>23</fpage>–<lpage>26</lpage>. ACM, <year>2008</year>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[MPZ18] <string-name><given-names>Leonardo</given-names> <surname>Mariani</surname></string-name>, <string-name><given-names>Mauro</given-names> <surname>Pezzè</surname></string-name>, and <string-name><given-names>Daniele</given-names> <surname>Zuddas</surname></string-name>. <article-title>Augusto: Exploiting popular functionalities for the generation of semantic GUI tests with oracles</article-title>. In <source>Proceedings of the International Conference on Software Engineering, ICSE '18</source>, pages <fpage>280</fpage>–<lpage>290</lpage>, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[PLEB07] <string-name><given-names>Carlos</given-names> <surname>Pacheco</surname></string-name>, <string-name><given-names>Shuvendu K.</given-names> <surname>Lahiri</surname></string-name>, <string-name><given-names>Michael D.</given-names> <surname>Ernst</surname></string-name>, and <string-name><given-names>Thomas</given-names> <surname>Ball</surname></string-name>. <article-title>Feedback-directed random test generation</article-title>. In <source>Proceedings of the International Conference on Software Engineering, ICSE '07</source>, pages <fpage>75</fpage>–<lpage>84</lpage>. ACM, <year>2007</year>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[TMTL12] <string-name><given-names>Shin Hwei</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>Darko</given-names> <surname>Marinov</surname></string-name>, <string-name><given-names>Lin</given-names> <surname>Tan</surname></string-name>, and <string-name><given-names>Gary T.</given-names> <surname>Leavens</surname></string-name>. <article-title>@tComment: Testing Javadoc comments to detect comment-code inconsistencies</article-title>. In <source>Proceedings of the International Conference on Software Testing, Verification and Validation, ICST '12</source>, pages <fpage>260</fpage>–<lpage>269</lpage>. IEEE Computer Society, <year>2012</year>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[TP18] <string-name><given-names>Valerio</given-names> <surname>Terragni</surname></string-name> and <string-name><given-names>Mauro</given-names> <surname>Pezzè</surname></string-name>. <article-title>Effectiveness and challenges in generating concurrent tests for thread-safe classes</article-title>. In <source>Proceedings of the International Conference on Automated Software Engineering, ASE '18</source>. ACM, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[WPG+15] <string-name><given-names>Chunhui</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Fabrizio</given-names> <surname>Pastore</surname></string-name>, <string-name><given-names>Arda</given-names> <surname>Goknil</surname></string-name>, <string-name><given-names>Lionel</given-names> <surname>Briand</surname></string-name>, and <string-name><given-names>Zohaib</given-names> <surname>Iqbal</surname></string-name>. <article-title>Automatic generation of system test cases from use case specifications</article-title>. In <source>Proceedings of the International Symposium on Software Testing and Analysis, ISSTA '15</source>, pages <fpage>385</fpage>–<lpage>396</lpage>. ACM, <year>2015</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>