<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can we automate away the main challenges of end-to-end testing?</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Renaud Rwemalika, Marinos Kintis, Mike Papadakis, Yves Le Traon Interdisciplinary Centre for Security, Reliability and Trust (SnT) University of Luxembourg Luxembourg</institution>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Agile methodologies enable companies to drastically increase software release pace and reduce time-to-market. In a rapidly changing environment, testing becomes a cornerstone of the software development process, guarding the system code base from the insertion of faults. To cater for this, many companies are migrating manual end-to-end tests to automated ones. This migration introduces several challenges for practitioners. These challenges relate to difficulties in the creation of the automated tests, their maintenance and the evolution of the test code base. In this position paper, we discuss our preliminary results on such challenges and present two potential solutions to these problems, focusing on keyword-driven end-to-end tests. Our solutions leverage existing software artifacts, namely the test suite and an automatically-created model of the system under test, to support the evolution of keyword-driven test suites. Index Terms: end-to-end testing, keyword-driven testing, automatic test generation, automatic test repair</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. PROBLEM AND MOTIVATION</title>
      <p>In continuous integration and agile methodologies, rapid
feedback is required for everyone involved and for any code
change, whether it is a small change or a full product
release. A crucial part of this feedback comes from
automated tests and checks that are responsible for code integrity
and reliability. Agile methodologies have enabled companies to
drastically increase software release pace and reduce
time-to-market. Unfortunately, these benefits are accompanied by a
considerable increase in testing costs: more tests need to be
written and executed more frequently.</p>
      <p>
        End-to-end testing aims at ensuring that the system under
test (SUT) is performing as designed from start to finish.
Traditionally, these tests are generated manually and designed
as a set of usage scenarios describing the manual steps to be
performed [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, performing manual end-to-end
testing is tedious, time-consuming and error-prone. Furthermore,
manual tests hinder the continuous integration (CI) pipeline
by creating a blocking point.
      </p>
      <p>
        To mitigate these problems, companies focus on automating
the test execution and test reporting in order to avoid the
CI pipeline bottleneck. However, the main test activities,
i.e., test case design, test scripting and test maintenance, are
challenging and are mainly performed manually [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Enterprise-grade applications typically involve thousands of
tests [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which are derived from the specifications and are
usually written in a natural language. Converting manual tests
to automated tests is a huge amount of work. Consequently,
as projects lack resources, only a fraction of manual tests end
up being automated. Evidently, this compromises the quality
of testing.
      </p>
      <p>
        Even when tests are automated, another problem arises:
tests tend to break easily when the SUT changes, resulting
in a need for test maintenance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, large systems
can involve hundreds of tests, each one composed of multiple
steps, making this task cumbersome, especially when the tests
are not modularized [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Indeed, the literature suggests that
simple modifications to the SUT can result in changes to 30 to 70
percent of the tests [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]–[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Researchers have introduced many methods that
automatically create tests using functional requirements [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]–[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
However, these methods make strong assumptions that are not
applicable in many industrial contexts, i.e., they require having
a model of the SUT or impose a specific formalization of
the requirements. Building a model for non-trivial
applications can be time-consuming and error-prone while using the
formalization of the requirements imposed by a tool might not
be compatible with the development workflow and company
practices. Furthermore, understanding the requirements and
translating them in an automated process might be a challenge
in itself [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        For quality assurance (QA) teams, a common industry
practice is to use lightweight techniques for automating
end-to-end testing, such as Keyword-Driven Testing [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] instead
of model-based testing and formal methods [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Indeed, QA
teams often do not have the knowledge to work with complex
modeling tools or write complex test scripts in full-fledged
programming languages. Thus, in this position paper we focus
on the challenges of creating and maintaining keyword-driven
test suites and propose solutions for automated test creation
and repair with the aim of reducing the overall cost of
keyword-driven test suite evolution.
      </p>
      <p>Based on our experience working for almost one year with
our industrial partners on this topic, we have identified the
following challenges that practitioners are facing:</p>
      <sec id="sec-1-1">
        <title>1) Translating requirements to automated tests can be challenging.</title>
        <p>Usually practitioners do not work with formal
requirements but rather with informal ones that are
written in natural language. The level of abstraction in
the requirements affects the automation effort required.
Analogous problems can arise when converting manual
tests to automated ones. Practitioners would benefit
from automated techniques that support end-to-end
test automation.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2) Automated tests tend to break often.</title>
        <p>
          End-to-end tests are fragile with respect to SUT changes. Thus, the evolution of the
SUT plays an important role in the maintenance cost
of these tests. If the test maintenance cost becomes extremely
high, the risk of abandoning automated end-to-end
testing becomes equally high. Practitioners need
automated techniques to support the evolution of the
tests.
        </p>
        <p>
          3) It is not clear when tests should be automated.
Practitioners face the dilemma of when to automate:
automating early in the SUT development lifecycle can increase
the test maintenance cost; automating late can hinder the
agile workflow. Practitioners need automated
techniques to repair the tests designed early in the SUT
development lifecycle.
        </p>
        <p>
          4) The complexity of the test code base can hinder the
creation and maintenance of the tests. As with any
software artifact, a bigger test code base is more
difficult to maintain, potentially leading to test
code duplication [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], test smells [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] or even dead
test code. Practitioners require effective automated
techniques that can identify problems in the test code
base.
        </p>
        <p>Considering the above-mentioned challenges, it becomes
clear that there is a need to aid practitioners in creating
and maintaining keyword-driven tests. Towards this end, we
propose two solutions that leverage the structure of
Keyword-Driven Testing to extract semantic information from the tests
and support their evolution.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>II. KEYWORD-DRIVEN TESTING</title>
      <p>Keyword-Driven Testing aims at separating test design
from technical implementation, hence limiting exposure to
unnecessary details. Its proponents argue that
this separation of concerns makes tests easier to write
and to maintain, and enables experts
from different fields and backgrounds to work together at
different levels of abstraction using named procedures called
keywords.</p>
      <p>A keyword can itself be composed of other keywords, enabling
an end-to-end test to be expressed in a hierarchical structure.
Keywords at the top of the hierarchy typically
describe the business requirements (behavioral keywords) while
keywords at the bottom of the hierarchy define the technical
implementation (technical keywords) used to interact with the SUT.</p>
      <p>Figure 1 shows an example of a test called “Transfer money
to other bank” expressed as a rooted ordered tree. The root
of the tree (purple rectangle) is the test, which is executed
by calling all the children nodes in a depth-first manner.
The intermediary nodes (beige rectangles) are called User
Keywords since they are created by the tester. Finally, the leaf
nodes (dark green rectangles) are Library Keywords. These
keywords are executed by a driver. Library keywords are
responsible for either defining the control flow of the tests
or interacting with the SUT.</p>
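      <p>The tree structure and its depth-first execution can be illustrated with a minimal sketch. The class Keyword, the function execute and the exact shape of the tree below are assumptions made for illustration; they are not taken from Figure 1 or from any particular framework, and the driver is replaced by a simple log.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class Keyword:
    """A node of the test tree; a node without children is a library keyword."""
    name: str
    children: list["Keyword"] = field(default_factory=list)

    def is_library(self) -> bool:
        return not self.children


def execute(node: Keyword, driver_log: list[str]) -> None:
    """Run a test or keyword by visiting its children depth-first, in order."""
    if node.is_library():
        # A real driver would interact with the SUT here; we only record the call.
        driver_log.append(node.name)
        return
    for child in node.children:
        execute(child, driver_log)


# Hypothetical fragment of the "Transfer money to other bank" test.
login = Keyword("Login into the banking system", [
    Keyword("Open Browser To Login Page", [Keyword("Open Browser")]),
    Keyword("Welcome Page Should Be Open", [
        Keyword("Location Should Be"),
        Keyword("Title Should Be"),
    ]),
])
test = Keyword("Transfer money to other bank", [login])

log: list[str] = []
execute(test, log)
print(log)
```
      </preformat>
      <p>In this sketch only the leaf nodes reach the driver, which mirrors the distinction between user keywords and library keywords described above.</p>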
      <sec id="sec-2-1">
        <title>A. Preliminary results on Keyword-Driven Testing</title>
        <p>The first results of our ongoing analysis show that by
separating concerns in keywords, and by allowing keyword
reuse, Keyword-Driven Testing can reduce the number of
changes performed during the test suite evolution by up to
70% (in comparison with techniques such as capture/replay).
However, the cost of creating and maintaining
keyword-driven tests remains high. Indeed, when a new feature is
created, new keywords need to be created and existing ones
might need refactoring.</p>
        <p>Furthermore, slight modifications to the GUI of the SUT
will break low-level, technical keywords. Our experiments
show that technical keywords responsible for synchronizing the
tests (waiting for an element to appear on the screen, timeouts,
etc.) and locators (finding and accessing GUI elements) are
subject to most of the changes during maintenance activities.
These results suggest that while requirements remain the same,
changes to the technical implementation cause the tests to break and
require maintenance.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>III. RESEARCH SOLUTIONS</title>
      <sec id="sec-3-1">
        <title>A. Towards Automated Test Repair</title>
        <p>As discussed in the previous section, during test suite
evolution, technical keywords are more prone to change than
more abstract ones. This indicates that a large proportion
of test suite maintenance is caused by software changes that
do not affect the behavior of the SUT (in terms of its
requirements), but only its technical details. This is not restricted to
keyword-driven tests: many techniques such as capture/replay
and programmable scripts are susceptible to such changes. In
this position paper, we intend to tackle the automatic repair of
tests broken by such changes and, as a consequence, reduce the maintenance
cost of keyword-driven test suite evolution.</p>
        <p>
          Our approach is partially influenced by the work of Gao et
al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. In their work, they use the event-flow graph [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] of
the SUT obtained by GUI ripping [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] before and after the
SUT changes, referred to as versions V1 and V2 of the SUT,
respectively. Next, they map the end-to-end tests to V1 and
rip the application to find a potential new path so that the
tests comply with V2 of the event-flow graph. This technique
presents two main drawbacks.
        </p>
        <p>First, the creation of the event-flow graph is done in a
depth-first manner. This process can be time-consuming, and
in the case of transactions or text inputs, ripping the application
might become impossible or incomplete. In our approach,
we intend to leverage the hierarchical structure of
Keyword-Driven Testing to limit the GUI exploration space,
ripping only a subset of the application by changing only the children
of an ancestor node. Indeed, each keyword represents a unit;
therefore, the investigation of a new path can be bound to some
specific keywords, unlike in traditional programmable tests.</p>
        <p>Second, the proposed approach does not take oracles into account
and requires user input to decide whether or not
the repaired tests meet the requirements. We intend to use
the information contained in the structure of keywords to
restrict the domain of exploration when searching for test candidates, hence
eliminating repairs that would violate test oracles.</p>
        <p>In a nutshell our approach is composed of the following
steps:</p>
        <p>1) Language model: By construction, Keyword-Driven
Testing offers a mapping between more abstract concepts
(requirements) and their technical representation (actions on
the SUT). For instance, in Figure 1, we can see that “Browser
is opened to login page” is a synonym of “Open Browser
To Login Page” (both keywords perform the same set of
actions on the SUT). Similarly, we can see that “Welcome
Page Should Be Open” performs two assertions on the system,
represented by the leaf nodes of the subtree (“Location should
be” and “Title should be”). Given a large corpus, we can
leverage this property and build a language model that contains
semantic information for keywords. This information includes
the keyword’s name (e.g., “Browser is opened to login page”),
its children and its leaf nodes.</p>
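        <p>As a rough illustration of the information such a model could hold, the sketch below stores, for each user keyword, its children and its ordered leaf nodes, and groups keywords that reach the same leaves as synonyms. The tuple representation, the function names build_model and synonyms, and the nesting of the two login keywords are assumptions made for this example; the actual language model is expected to be richer than a lookup table.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict

# A keyword is modeled as (name, children); library keywords have no children.
Keyword = tuple[str, list]


def leaves(keyword: Keyword) -> tuple[str, ...]:
    """Return the ordered library keywords reached from this keyword."""
    name, children = keyword
    if not children:
        return (name,)
    return tuple(leaf for child in children for leaf in leaves(child))


def build_model(corpus: list[Keyword]) -> dict[str, dict]:
    """Store, for every user keyword, its direct children and its leaves."""
    model: dict[str, dict] = {}
    stack = list(corpus)
    while stack:
        keyword = stack.pop()
        name, children = keyword
        if children:
            model[name] = {
                "children": [child[0] for child in children],
                "leaves": leaves(keyword),
            }
            stack.extend(children)
    return model


def synonyms(model: dict[str, dict]) -> list[set[str]]:
    """Group keywords that perform the same sequence of actions on the SUT."""
    by_leaves = defaultdict(set)
    for name, entry in model.items():
        by_leaves[entry["leaves"]].add(name)
    return [names for names in by_leaves.values() if len(names) > 1]


# Hypothetical corpus built from the keyword names mentioned in the text.
corpus = [
    ("Browser is opened to login page",
     [("Open Browser To Login Page", [("Open Browser", [])])]),
    ("Welcome Page Should Be Open",
     [("Location Should Be", []), ("Title Should Be", [])]),
]
model = build_model(corpus)
print(synonyms(model))
```
        </preformat>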
        <p>2) Base event-flow graph: During execution of the test
suite, each state of the application and each action on the SUT
is saved in an event-flow graph. While the graph is not a
complete model of the SUT, it serves as a baseline for how the
SUT is supposed to react during normal test execution.</p>
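        <p>A minimal sketch of how such a baseline graph might be accumulated while tests run is given below. It assumes that application states can be identified by a string and that the test runner reports each action with the state it leads to; the class EventFlowGraph, the function record_run and the state names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict


class EventFlowGraph:
    """Directed graph: nodes are observed SUT states, edges are actions."""

    def __init__(self) -> None:
        self.edges: dict[str, dict[str, str]] = defaultdict(dict)

    def record(self, state: str, action: str, next_state: str) -> None:
        self.edges[state][action] = next_state

    def actions_from(self, state: str) -> list[str]:
        return sorted(self.edges.get(state, {}))


def record_run(graph: EventFlowGraph, start_state: str, steps) -> None:
    """steps: (action, resulting state) pairs observed while a test executes."""
    state = start_state
    for action, next_state in steps:
        graph.record(state, action, next_state)
        state = next_state


graph = EventFlowGraph()
record_run(graph, "login page", [
    ("submit credentials", "welcome page"),
    ("click transfer", "transfer page"),
])
record_run(graph, "login page", [
    ("submit credentials", "welcome page"),
    ("click logout", "login page"),
])
print(graph.actions_from("welcome page"))
```
        </preformat>
        <p>Because the graph only contains what the test suite exercised, it is partial by design; the ripping step then extends it where a repair is needed.</p>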
        <p>3) GUI ripping: During the GUI ripping phase, paths are
computed using guided GUI ripping. Ripping the GUI allows us
to find alternative valid paths by extending the base
event-flow graph. As explained earlier, the exploration domain is
restricted, allowing changes only in the children keywords of
an ancestor of the keyword that needs maintenance, e.g., one that failed
during the execution of the end-to-end test. The challenge
is to define which ancestor to select. Indeed, choosing an
ancestor too far up the hierarchy might give the algorithm too much latitude and
violate the constraints, while choosing an ancestor too close
might restrict the exploration domain too much, thereby
excluding the valid repair path. We propose to start with a
highly localized domain and extend it if no potential candidate
is found. Empirical results will give us a better indication of
the level to which the search domain can be extended.</p>
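        <p>The strategy of starting with a localized domain and widening it can be sketched as follows. The function repair_scope, the parent map, the max_levels bound and the stand-in ripper fake_rip are all assumptions made for illustration; the ripper is passed in as a callable because the guided GUI ripping itself is outside the scope of this sketch.</p>
        <preformat preformat-type="code">
```python
from typing import Callable, Optional

# Each keyword maps to its parent; the test (root) maps to None.
Parents = dict[str, Optional[str]]


def ancestors(keyword: str, parents: Parents) -> list[str]:
    """Ancestors of a keyword, ordered from closest to farthest."""
    chain = []
    current = parents.get(keyword)
    while current is not None:
        chain.append(current)
        current = parents.get(current)
    return chain


def repair_scope(failed: str, parents: Parents,
                 rip: Callable[[str], list[list[str]]],
                 max_levels: int = 2):
    """Widen the exploration scope one ancestor at a time until ripping
    inside that ancestor yields at least one candidate path."""
    for scope in ancestors(failed, parents)[:max_levels]:
        candidates = rip(scope)
        if candidates:
            return scope, candidates
    return None, []


# Hypothetical hierarchy around a failing technical keyword.
parents = {
    "Click Element": "Submit Credentials",
    "Submit Credentials": "Login into the banking system",
    "Login into the banking system": "Transfer money to other bank",
    "Transfer money to other bank": None,
}


def fake_rip(scope: str) -> list[list[str]]:
    # Stand-in for guided GUI ripping: only the wider scope finds a path.
    if scope == "Login into the banking system":
        return [["Input Text", "Input Password", "Click Button"]]
    return []


print(repair_scope("Click Element", parents, fake_rip))
```
        </preformat>
        <p>The max_levels parameter plays the role of the limit on how far the search domain may be extended, which the text leaves to be determined empirically.</p>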
        <p>4) Validate candidate paths: Using part-of-speech tagging
and natural language processing (NLP) against the corpus of
keywords and elements of the SUT, the algorithm can target
valid candidates for repair. The goal of this step is to restrict
the candidate pool by creating a constraint model based on the
semantics of the keywords’ hierarchy.</p>
        <p>
          5) Suggest best repair candidate: Find and suggest the path
with the minimal changes from V1 as the best repair candidate,
or flag the test as not repairable if no alternative paths are
found in V2 of the SUT. For non-repairable tests, we will
investigate whether a semi-automated solution is beneficial,
i.e., we will turn to the testers and use their guidance to
direct the search. Such a solution has been employed in several
other approaches with success, e.g., [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>Note that while steps 1 and 2 are always performed during
test suite execution, steps 3 to 5 are only executed once a step
fails.</p>
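        <p>The final selection in step 5 can be sketched under the assumption that "minimal changes from V1" is measured as the edit distance between the original sequence of library keywords and each candidate sequence. This metric is our illustrative choice, not one fixed by the approach, and the names edit_distance and best_repair are hypothetical.</p>
        <preformat preformat-type="code">
```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two action sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def best_repair(original: list[str], candidates: list[list[str]]):
    """Return the candidate closest to the V1 path, or None if not repairable."""
    if not candidates:
        return None
    return min(candidates, key=lambda path: edit_distance(original, path))


original = ["Input Text", "Input Password", "Click Element"]
candidates = [
    ["Click Link", "Input Text", "Input Password", "Press Key"],
    ["Input Text", "Input Password", "Click Button"],
]
print(best_repair(original, candidates))
```
        </preformat>
        <p>A test for which best_repair returns None would be flagged as not repairable and handed over to the tester.</p>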
      </sec>
      <sec id="sec-3-2">
        <title>B. Towards Automated Test Creation</title>
        <p>In the previous section, we presented how we intend to
automatically repair keyword-driven tests using a language
model generated from the test suite and the event-flow graph of
the SUT. As we discussed, we restricted the exploration phase
of the SUT in order to reduce the probability of deviating
from the original tests (the tests of version V1 of the SUT).
In this section, we explore the possibility of creating tests
given the requirements, i.e., behavioral keywords. For instance,
if we take Figure 1, given that we have the keywords “Login
into the banking system”, “Check for the balance amount in
the account”, “Transfer amount to other bank account” and
“Logout the banking system”, can we generate the rest of
the tree? To answer this question, we propose the following
approach:</p>
        <p>1) Matching similar keywords: The assumption behind this
step is that similar keywords (similar name) should behave
in a similar way (similar tree structure). The main challenge
of this step is extracting the differences between similar
keywords and mapping them to changes in children nodes. Using
the language model from Section III-A should allow us to build
the similarity metric and give indications on the meaning of
the differences.</p>
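        <p>To make the idea of name similarity concrete, the sketch below uses the Jaccard similarity between the word sets of two keyword names and reports the words that differ. This simple metric is only a placeholder assumed for illustration; the similarity metric derived from the language model is expected to be more elaborate. The function names tokens, similarity and closest are hypothetical.</p>
        <preformat preformat-type="code">
```python
def tokens(name: str) -> set[str]:
    """Lower-cased words of a keyword name."""
    return set(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two keyword names."""
    ta, tb = tokens(a), tokens(b)
    union = ta.union(tb)
    return len(ta.intersection(tb)) / len(union) if union else 0.0


def closest(name: str, corpus: list[str]):
    """Most similar known keyword, its score, and the words that differ."""
    best = max(corpus, key=lambda known: similarity(name, known))
    differing = tokens(name).symmetric_difference(tokens(best))
    return best, similarity(name, best), differing


corpus = ["Welcome Page Should Be Open", "Open Browser To Login Page"]
best, score, differing = closest("Login Page Should Be Open", corpus)
print(best, round(score, 3), sorted(differing))
```
        </preformat>
        <p>Here the differing words ("login" versus "welcome") are the part that would have to be mapped to changes in the children nodes, for example to different expected values in the two assertions.</p>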
        <p>2) Exploring the SUT for new behaviors: New keywords
might not have any similar keywords in the corpus, while others
might have only a partial resolution of their structure. For such
keywords, we explore the SUT using GUI elements to find
meaning for a specific keyword. For instance, for the keyword
“User $username logs in with password $password”, a form
named “Login” with two input fields having labels “user” and
“password” and a validation button is a good candidate. The
keyword would then be composed of three sub-keywords: input
username, input password and validate form. The assumption
under which this technique works is that the test suite uses
naming conventions similar to those of the SUT.</p>
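        <p>The login example can be sketched as a naive matching between the arguments of a keyword and the field labels of the forms found in the GUI. The form description (a dictionary with name, fields and button), the substring matching rule and the generated step names are assumptions made for this illustration only.</p>
        <preformat preformat-type="code">
```python
import re


def arguments(keyword: str) -> list[str]:
    """Names of the $arguments that appear in a keyword name."""
    return re.findall(r"\$(\w+)", keyword)


def match_form(keyword: str, form: dict):
    """Map each keyword argument to a field whose label it contains
    (or that contains it); return None if an argument has no field."""
    mapping = {}
    for arg in arguments(keyword):
        label = next(
            (candidate for candidate in form["fields"]
             if candidate.lower() in arg.lower()
             or arg.lower() in candidate.lower()),
            None,
        )
        if label is None:
            return None
        mapping[arg] = label
    return mapping


def derive_children(keyword: str, forms: list[dict]) -> list[str]:
    """Propose sub-keywords from the first form matching all arguments."""
    for form in forms:
        mapping = match_form(keyword, form)
        if mapping:
            steps = [f"Input {arg} into '{label}'" for arg, label in mapping.items()]
            return steps + [f"Click '{form['button']}'"]
    return []


# Hypothetical forms discovered while exploring the GUI.
forms = [
    {"name": "Search", "fields": ["query"], "button": "Go"},
    {"name": "Login", "fields": ["user", "password"], "button": "Validate"},
]
keyword = "User $username logs in with password $password"
print(derive_children(keyword, forms))
```
        </preformat>
        <p>The sketch only works because the argument names resemble the field labels, which is exactly the naming-convention assumption stated above.</p>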
        <p>3) Keyword validation: Finally, proceeding in a similar
fashion as in Section III-A, each test is executed against the
SUT and the GUI is ripped in case the sequence of actions
is not valid. Unlike the repair phase, the only ground truth
known is the steps provided by the user as input. The process
cannot change input keywords but is not restricted in
the changes it applies to keywords generated in steps 1 and 2, as
long as they respect the language model. When an existing keyword
is similar to the keyword being generated, the new keyword
should behave in a similar fashion at an abstract level. For instance, two
assertions can be derived for a keyword called “Login Page
Should be Open” based on the example from Figure 1 using
the knowledge derived from the “Welcome Page Should be
Open” keyword.</p>
      </sec>
      <sec id="sec-3-3">
        <title>C. Limitations</title>
        <p>While our approach intends to tackle the limitations of
existing techniques (e.g., better adherence to the oracles by using
the language model to limit the set of repair candidates
for automatic test repair, and the relaxation of requirements
formatting for test generation), it still has some limitations.</p>
        <p>Since the language model is solely based on user keywords,
in order to have a robust model, we need a large corpus
of keywords, so that we can learn from it. Furthermore,
automating test creation and repair for completely new features
might be harder to perform. This point is partially addressed
by allowing for an extended exploration space of the SUT
(using information contained in its GUI). Empirical results
should give us a better indication of the performance of
the language model as a function of the size of the corpus and the
heterogeneity of the features tested.</p>
        <p>Another limitation could arise during the ripping phase.
While exploring new paths, some actions might be irreversible
(e.g., creating an account or removing an item) and thus change
the available paths. Indeed, end-to-end tests often involve some
cleaning steps or teardown at the end of their execution to
restore the environment. Under such conditions, GUI ripping
might be inefficient or even impossible.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. RELATED WORK</title>
      <sec id="sec-4-1">
        <title>A. Automated test repair</title>
        <p>
          Choudhary et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] propose a tool called WATER to
automatically fix tests in web applications using differential
testing. Their technique is limited to four types of changes.
        </p>
        <p>
          Chang et al. [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] propose a tool called CHATEM to
automatically repair Android tests. CHATEM extends the previous
work of the authors, based on the ATOM tool [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], and
is built upon Robot Framework, a Keyword-Driven Testing
framework, and Appium. This technique takes as
input two event sequence models (similar to the representation
of the SUT as an event-flow graph but with a slightly different
formalization), one for the previous and one for the current version of
the SUT, computes the changes between them and updates
the tests accordingly. In our approach, we do not require the
user to provide a model of the SUT. Instead, we use the
information available in the test suite and GUI ripping to repair
broken tests. However, unlike ATOM, our approach requires
the test suite to be executed.
        </p>
        <p>
          Gao et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] propose a technique to repair low-level scripts
using a tool called SITAR. In their approach, the authors first
rip the application to generate an event-flow graph of the SUT
and map the test script to the graph. In the event a match is
not found, the script needs to be repaired. In our approach, we
want to repair tests and overcome the limitation of having to
rip the entire SUT to do so. Our technique attempts to provide
a more guided GUI ripping based on the knowledge provided
by the keyword-driven test suite.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>B. Test generation</title>
        <p>
          We are not the first to try to convert requirements written
in natural language into test cases. In the literature, we can
find a large body of work [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]–[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. However, most of
these techniques require a (semi-)formal representation of the
requirements [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and/or manual
generation of models for the SUT [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and/or manual annotation
of the requirements [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In contrast, we try to use available
software artifacts (test suite, SUT, execution results, etc.)
to automatically build the knowledge base required for test
generation.
        </p>
        <p>
          Other similar studies include the one of Blasi et al. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]
who propose a tool called jDoctor to automatically generate
Java unit tests based on JavaDoc. The tool combines
pattern matching with natural language processing and adds a notion
of semantic similarity, allowing terms that are
syntactically different from the methods in the application to be used
in their grammar. The grammar is used to map sentences from
the JavaDoc to Java methods to generate unit tests. We plan
on using a similar approach to create our language model and
map keywords to GUI elements and API entry points.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>V. CONCLUSION</title>
      <p>End-to-end testing is used by companies to evaluate whether
the system under test conforms to its requirements. End-to-end
tests are typically performed manually, which hinders the
adoption of agile methodologies and increases the time required to
find failures. The automation of such tests is desirable but
unfortunately very expensive. Even when such tests are fully
automated, the cost of their maintenance can be prohibitive.
In this position paper, we discuss several challenges faced
by practitioners during end-to-end test suite evolution and
present two potential solutions to the problems of test creation
and maintenance, focusing on keyword-driven, end-to-end test
suites. We argue that by leveraging the hierarchical structure
of such test suites, we can create a language model providing
semantic information for each keyword. We intend to use this
understanding of the test suite to tackle the aforementioned
problems. The evaluation of the solution will be based on real
industrial data, so that we can investigate the true benefits and
drawbacks of the techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Garousi</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Mäntylä</surname>
          </string-name>
          , “
          <article-title>When and what to automate in software testing? A multi-vocal literature review</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>76</volume>
          , pp.
          <fpage>92</fpage>
          -
          <lpage>117</lpage>
          , aug
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Garousi</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Elberzhager</surname>
          </string-name>
          , “
          <article-title>Test Automation: Not Just for Test Execution</article-title>
          ,”
          <source>IEEE Software</source>
          , vol.
          <volume>34</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>96</lpage>
          , mar
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thummalapenta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Singhania</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandra</surname>
          </string-name>
          , “
          <article-title>Automating test automation</article-title>
          ,” in
          <source>2012 34th International Conference on Software Engineering (ICSE)</source>
          . IEEE, jun
          <year>2012</year>
          , pp.
          <fpage>881</fpage>
          -
          <lpage>891</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Pinto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sinha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Orso</surname>
          </string-name>
          , “
          <article-title>Understanding myths and realities of test-suite evolution</article-title>
          ,”
          <source>in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering - FSE '12</source>
          , vol.
          <volume>1</volume>
          . New York, New York, USA: ACM Press,
          <year>2012</year>
          , p.
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yandrapally</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sridhara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sinha</surname>
          </string-name>
          , “
          <article-title>Automated Modularization of GUI Test Cases</article-title>
          ,” in
          <source>2015 IEEE/ACM 37th IEEE International Conference on Software Engineering</source>
          , vol.
          <volume>1</volume>
          . IEEE, may
          <year>2015</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Memon</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Soffa</surname>
          </string-name>
          , “
          <article-title>Regression testing of GUIs</article-title>
          ,” in
          <source>Proceedings of the 9th European software engineering conference held jointly with 10th ACM SIGSOFT international symposium on Foundations of software engineering - ESEC/FSE '03</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>5</issue>
          . New York, New York, USA: ACM Press,
          <year>2003</year>
          , p.
          <fpage>118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grechanik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Fu</surname>
          </string-name>
          , “
          <article-title>Maintaining and evolving GUI-directed test scripts</article-title>
          ,” in
          <source>2009 IEEE 31st International Conference on Software Engineering</source>
          . IEEE,
          <year>2009</year>
          , pp.
          <fpage>408</fpage>
          -
          <lpage>418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Pirzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shanian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Davari</surname>
          </string-name>
          , “
          <article-title>A Novel Framework for Creating User Interface Level Tests Resistant to Refactoring of Web Applications</article-title>
          ,” in
          <source>2014 9th International Conference on the Quality of Information and Communications Technology</source>
          . IEEE, Sep.
          <year>2014</year>
          , pp.
          <fpage>268</fpage>
          -
          <lpage>273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nebut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fleurey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Le Traon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Jezequel</surname>
          </string-name>
          , “
          <article-title>Automatic test generation: a use case driven approach</article-title>
          ,”
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>32</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>155</lpage>
          , Mar.
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Kirchsteiger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grinschgl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trummer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Steger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pistauer</surname>
          </string-name>
          , “
          <article-title>Automatic Test Generation From Semi-formal Specifications for Functional Verification of System-on-Chip Designs</article-title>
          ,” in
          <source>2008 2nd Annual IEEE Systems Conference</source>
          . IEEE, Apr.
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
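          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Tsai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y. C.</given-names>
            <surname>Cheng</surname>
          </string-name>
          , “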
          <article-title>Test-Duo: A framework for generating and executing automated acceptance tests from use cases</article-title>
          ,” in
          <source>2013 8th International Workshop on Automation of Software Test (AST)</source>
          . IEEE, May
          <year>2013</year>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pastore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goknil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Briand</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Iqbal</surname>
          </string-name>
          , “
          <article-title>Automatic generation of system test cases from use case specifications</article-title>
          ,” in
          <source>Proceedings of the 2015 International Symposium on Software Testing and Analysis - ISSTA 2015</source>
          . New York, New York, USA: ACM Press,
          <year>2015</year>
          , pp.
          <fpage>385</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thummalapenta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sinha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandra</surname>
          </string-name>
          , “
          <article-title>Test Generation from Business Rules</article-title>
          ,” in
          <source>2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST)</source>
          . IEEE, Apr.
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , “
          <article-title>RTCM: a natural language based, automated, and practical test case generation framework</article-title>
          ,” in
          <source>Proceedings of the 2015 International Symposium on Software Testing and Analysis - ISSTA 2015</source>
          . New York, New York, USA: ACM Press,
          <year>2015</year>
          , pp.
          <fpage>397</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I. L.</given-names>
            <surname>Araújo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B. F.</given-names>
            <surname>Filho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M. C.</given-names>
            <surname>Andrade</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Neto</surname>
          </string-name>
          , “
          <article-title>Generating test cases and procedures from use cases in dynamic software product lines</article-title>
          ,” in
          <source>Proceedings of the Symposium on Applied Computing - SAC '17</source>
          . New York, New York, USA: ACM Press,
          <year>2017</year>
          , pp.
          <fpage>1296</fpage>
          -
          <lpage>1301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Jingfan</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaohua</given-names>
            <surname>Cao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ma</surname>
          </string-name>
          , “
          <article-title>Towards adaptive framework of keyword driven automation testing</article-title>
          ,” in
          <source>2008 IEEE International Conference on Automation and Logistics</source>
          , no.
          <issue>September</issue>
          . IEEE, Sep.
          <year>2008</year>
          , pp.
          <fpage>1631</fpage>
          -
          <lpage>1636</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Garousi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Memon</surname>
          </string-name>
          , “
          <article-title>Graphical user interface (GUI) testing: Systematic mapping and repository</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>55</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>1679</fpage>
          -
          <lpage>1694</lpage>
          , Oct.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavoie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mérineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Merlo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Potvin</surname>
          </string-name>
          , “
          <article-title>A case study of TTCN-3 test scripts clone analysis in an industrial telecommunication setting</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>87</volume>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V.</given-names>
            <surname>Garousi</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Küçük</surname>
          </string-name>
          , “
          <article-title>Smells in software test code: A survey of knowledge in industry and academia</article-title>
          ,”
          <source>Journal of Systems and Software</source>
          , vol.
          <volume>138</volume>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>81</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Memon</surname>
          </string-name>
          , “
          <article-title>SITAR: GUI Test Script Repair</article-title>
          ,”
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>42</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>170</fpage>
          -
          <lpage>186</lpage>
          , Feb.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Memon</surname>
          </string-name>
          , “
          <article-title>An event-flow model of GUI-based applications for testing</article-title>
          ,”
          <source>Software Testing, Verification and Reliability</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>137</fpage>
          -
          <lpage>157</lpage>
          , Sep.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Memon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Nagarajan</surname>
          </string-name>
          , “
          <article-title>GUI ripping: reverse engineering of graphical user interfaces for testing</article-title>
          ,” in
          <source>10th Working Conference on Reverse Engineering, 2003. WCRE 2003. Proceedings.</source>
          , vol. 2003-Janua. IEEE,
          <year>2003</year>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , “
          <article-title>ATOM: Automatic Maintenance of GUI Test Scripts for Evolving Mobile Applications</article-title>
          ,” in
          <source>2017 IEEE International Conference on Software Testing, Verification and Validation (ICST)</source>
          . IEEE, Mar.
          <year>2017</year>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Mondal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , “
          <article-title>Change-Based Test Script Maintenance for Android Apps</article-title>
          ,” in
          <source>2018 IEEE International Conference on Software Quality, Reliability and Security (QRS)</source>
          . IEEE, Jul.
          <year>2018</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>225</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Versee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Orso</surname>
          </string-name>
          , “
          <article-title>WATER</article-title>
          ,” in
          <source>Proceedings of the First International Workshop on End-to-End Test Script Engineering - ETSE '11</source>
          . New York, New York, USA: ACM Press,
          <year>2011</year>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Blasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goffi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gorla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ernst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pezzè</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Castellanos</surname>
          </string-name>
          , “
          <article-title>Translating code comments to procedure specifications</article-title>
          ,” in
          <source>Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis - ISSTA 2018</source>
          . New York, New York, USA: ACM Press,
          <year>2018</year>
          , pp.
          <fpage>242</fpage>
          -
          <lpage>253</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>