<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman.</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/ICSM.2011.6080813</article-id>
      <title-group>
        <article-title>Utilize Syntax Tree Transformations as a C/C++ Test Seam</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>G A´BOR M A´RTON</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ZOLTA´ N PORKOL A´B</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eo¨tvo¨s Lora´ nd University</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Author's addresses: Eo ̈tvo ̈s Lora ́ nd University, Faculty of Informatics, Dept. of Programming Languages and Compilers, Pa ́ zma ́ ny Pe ́ter se ́ta ́ ny 1/C</institution>
          ,
          <addr-line>Budapest, Hungary, H-1177</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>8</volume>
      <issue>1986</issue>
      <fpage>27</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>In C/C++, test code often in uences the source code of the unit we want to test. During the test development process we often have to introduce new interfaces to replace existing dependencies, e.g. a supplementary setter or constructor function has to be added to a class for dependency injection. In many cases, extra template parameters are used for the same purpose. These solutions may have serious detrimental e ects on the code structure and sometimes on the run-time performance as well. We can use non-intrusive tests (tests which does not require any modi cation in the production code) to avoid these disadvantages. Also, in legacy code bases often there are few or no unit tests. Refactoring such code in order to provide tests is almost impossible because we cannot verify correctness without having unit tests; hence it is a vicious circle. We can break the circle with non-intrusive tests, i.e. without actually modifying the production code. The di erent non-intrusive testing methods have di erent weaknesses. In this paper we introduce our new non-intrusive testing approach which complements the existing techniques. Our solution transforms certain parts of the original abstract syntax tree of the production code for the purpose of testing.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>10:2
We can break the circle with non-intrusive tests, i.e. without actually modifying the production code
[Feathers 2004; Ru¨ egg and Sommerlad 2012]. We refer to all those testing approaches which require
source code modification as intrusive testing and those which does not as non-intrusive testing.</p>
      <p>An abstract syntax tree (AST) is a data structure which represents the hierarchical syntactic
structure of the source program [Aho et al. 1986]. They are widely used in compilers as an intermediate
representation of the program through several stages that the compiler requires. During certain
compilation stages the abstract syntax tree is being transformed, e.g. in case of C++ template instantiation
existing nodes are being modified and new nodes are added to the tree.</p>
      <p>In this paper, we investigate a new, non-intrusive, syntax tree transformation based testing
approach. This paper is organized as follows. In Section 2 we describe existing unit testing methods to
replace dependencies. We show how we can replace dependent functions or types with syntax tree
transformations in Section 3. In Section 4, we outline future work and current limitations. Our paper
concludes in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. TEST SEAMS IN C/C++</title>
      <p>A seam is an abstract concept introduced by Feathers to identify points where we can break
dependencies [Feathers 2004]. The goal is to have a place where we can alter the behaviour of a program
without modifying it in that place; this is important because editing the source code is often not an
option [Ru¨ egg and Sommerlad 2012]. Feathers, Ru¨ egg and Sommerlad define four different kinds of
seams for C++ [Feathers 2004; R u¨egg and Sommerlad 2012; mockator.com 2006]:
(1) Link seam: Change the definition of a function via some linker specific setup.
(2) Preprocessor seam: With the help of the preprocessor, redefine function names to use an alternative
implementation.
(3) Object seam: Based on inheritance to inject a subclass with an alternative implementation.
(4) Compile seam: Inject dependencies at compile-time through template parameters.
The enabling point of a seam is the place where we can make the decision to use one behaviour or
another. Different seams have different enabling points. For example, replacing the constructor
argument for the implementation of an interface with a mock implementation when a unit test is set up is
an object seam with the constructor as an enabling point.</p>
      <p>Link and preprocessor seams can be used to write non-intrusive tests. However, object and compile
seams may be used for such purpose only if the unit under test already has the proper architecture.
For instance, in case of object seams the unit must have a constructor (or setter) function to setup a
different implementation for the dependency. In case of compile seams, the unit must be a template
and it must have a template parameter via which we can mock the dependency. Often, these
architectural requirements are not satisfied, therefore the use of object and compile seams demand that we
intrusively change the source code of the unit.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Link Seams</title>
      <p>We can use a link seam e.g. to replace the implementation of a free function or a member function
as presented in Figure 1. On the one hand, when we need to test the bar() function then we should
link the test executable to the MockA.o object file. On the other hand, we should link the production
code with A.o. Link-time dependency replacement is not possible if the dependency is defined in a static
library or in the same translation unit where the SUT is defined. It is also not feasible to use link seams
if the dependency is implemented as an inline function [Ru¨ egg and Sommerlad 2012]. This makes the
use of this seam cumbersome or practically impossible when the dependant unit is a template or when
// A.hpp
void foo();
// A.cpp
void foo() { ... };
// MockA.cpp
void foo() { ... };
// B.cpp
#include "A.hpp"
void bar() { foo(); ... }
the dependency is a template. The enabling point for a link seam is always outside of the program
text. This makes the use of link seams quite difficult to identify. On top of all, link-time substitution
requires strong support from the build system we are using. Thus, we might have to specialize the
building of the tests for each and every unit. This does not scale well and can be really demanding
regarding to maintenance.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Preprocessor Seams</title>
      <p>Preprocessor seams can be applied to replace the invocation of a global function to an invocation of a
test double [Mihalicza et al. 2011]. Let us consider the code snippet in Figure 2. We can replace the
1 void *my_malloc(size_t size) {
2 //...
3 return malloc(size);
4 }
5 void my_free(void *p) {
6 //...
7 return free(p);
8 }
9 #define free my_free
10 #define malloc my_malloc
11
12 void unitUnderTest() {
13 int *array = (int *)malloc(4 * sizeof(int));
14 // do something with array
15 free(array);
16 }
standard malloc() and free() functions with our own implementation. One example usage may be to
collect statistics or do sanity checks in my_malloc and my_free functions. These seams can be applied
conveniently in C, but not in C++. As soon as we use namespaces, the preprocessor might generate code
which cannot be compiled because of the ambiguous use of names. Hazardous side effects of macros
are also well known.</p>
      <p>In our previous work we introduced a compiler instrumentation based test seam, which does not
have the mentioned drawbacks of link and preprocessor seams [Marton and Porkolab 2017; M a´rton
2017]. However, that technique is still not applicable for all cases, for example we can replace simple
function dependencies only but we cannot replace types.</p>
    </sec>
    <sec id="sec-5">
      <title>3. THE AST TRANSFORMATION SEAM</title>
      <p>Existing test seams for C++ have some disadvantages that prevents us from using them or make us
reluctant to use them. Link seams do not work with inline functions and require patching the build
system. Preprocessor seams are problematic with classes and namespaces. Object seams and compile
seams are intrusive and often demand that we widen the public interface. Therefore, we seek for a
new seam which does not have the above-mentioned disadvantages and makes it possible to write
non-intrusive tests and to replace dependent types with test double types.</p>
    </sec>
    <sec id="sec-6">
      <title>3.1 AST Transformations</title>
      <p>Abstract syntax tree transformations are used for several purposes. In the groovy programming
language [Koenig et al. 2007] it is used as a form of compile-time metaprogramming; it allows the
developers to hook into the compilation process and to modify the AST before bytecode generation.</p>
      <p>The LLVM/Clang compiler infrastructure [Lattner 2008] makes it possible to write certain source
to source transformations which modify the existing AST of a source file and produces the modified
source code as an output. This may be used for instance, to annotate the source code with additional
statements that create statistics about memory allocations.</p>
      <p>CodeBoost is a source-to-source transformation tool for domain-specific optimisation of C++
programs [Bagge et al. 2003; Bagge and Haveraaen 2003]. CodeBoost performs parsing, semantic analysis
and pretty-printing, and transformations can be implemented either in the Stratego program
transformation language, or as user-defined rewrite rules embedded within the C++ program.</p>
      <p>Constant propagation, a well known data-flow optimization problem, can be implemented on
abstract syntax trees in Stratego, a rewriting system extended with programmable rewriting strategies
for the control over the application of rules and dynamic rewrite rules for the propagation of
information [Olmos and Visser 2002].</p>
      <p>Prolog can be retrofitted with concrete syntax and a seamless interaction of concrete syntax
fragments with an existing ”legacy” meta-programming system (based on abstract syntax) can be achieved
[Fischer and Visser 2003]. This can result in a considerable reduction of the code size and an improved
readability of the Prolog source code.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2 The AST Import Algorithm</title>
      <p>Traditionally, C/C++ compilers process one translation unit (TU) at a time. However, there are use
cases when we have to access information from several (or all) translation units of a program. For
example, in case of link time optimization, or in case of cross translation unit static analysis. The
LLVM/Clang cross translation unit static analysis [Horvath 2017] executes the same single translation
unit analysis algorithms but on an AST which is synthesized by merging several ASTs of the individual
translation units. This merged AST is created by first parsing one TU and then importing the ASTs of
the other TUs one-by-one into the first.</p>
      <p>Not considering templates, two AST nodes are structurally equivalent if they are
—builtin types and refer to the same type, e.g. int and int are structurally equivalent,
—function types and all their parameters have structurally equivalent types,
—record types and all their fields in order of their definition have the same identifier names and
structurally equivalent types,
—variable or function declarations and they have the same identifier name and their types are
structurally equivalent.</p>
      <p>Algorithm 1 describes the procedure of importing an AST. The algorithm has to ensure that the
structurally equivalent nodes in the different translation units are not getting duplicated in the merged
AST. E.g. if we include the definition of the vector template (#include &lt;vector&gt;) in two translation
units, then their merged AST should have only one node which represents the template. Also, we have
to discover one definition rule (ODR) violations. For instance if there is a class definition with the same
name in both translation units, but one of the definition contains different number of fields. We refer
to the translation unit into which we import as the to TU and the one which we import from as the
from TU.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Reusing the AST Import Algorithm for Testing</title>
      <p>As our contribution we extended the AST importer mechanism. With this extension we can
transmogrify the original AST of the production code to an AST which contains the nodes of test doubles instead
of the nodes of the original dependencies. The key of our approach is that we import everything into
the test double’s context. When the production code is being merged and when we reach the import
ALGORITHM 1: AST Import Algorithm
for each top level Decl in the F rom TU do</p>
      <p>F oundDeclsList lookup all declarations in the T o TU with the same name of Decl
for each F oundDecl in F oundDeclsList do
if F oundDecl is structurally equivalent with Decl then</p>
      <p>T oDecl F oundDecl
mark Decl as imported
end
else if Decl is a function then overloaded function</p>
      <p>T oDecl create a new AST node in T o TU
import dependent declarations and types of T oDecl
end
else</p>
      <p>report ODR violation
end
end
if F oundDeclsList is empty then</p>
      <p>T oDecl create a new AST node in T o TU
import dependent declarations and types of T oDecl
end
end
of a dependency then the lookup finds the definition of the test double and that definition is getting
used in the rest of the imported AST. We introduced a new attribute ([[test_double]]) which modifies
the behaviour of the structural equivalence check. During the check of two declarations/types, if this
attribute is present in the declaration/type of the ”to” context then we consider the two entity as
equivalent. With the help of this new attribute and the modified equivalence check we are able to change the
import algorithm to use the test double in the production code. We created a prototype implementation
based on the LLVM compiler infrastructure which is publicly available at [M a´rton 2018].</p>
      <p>Figure 3 demonstrates how our new method can be used to replace a simple function in the AST. The
foo() function is defined in the foo.c translation unit and it calls the bar() function, which is defined
in the very same TU. The fake_bar.c file contains the definition for the test double which we want to
use as the replacement function in the test. The test double has the same name as the original function
we want to replace. Normally, having two definitions would cause ODR violation during the import
procedure, but this time the test double has the special attribute attached to its definition. In test.c
we have the actual test code, which exercises the production code (foo()) and formulates expectations
on that. If foo() calls the test double bar() then the return value will be 13 and then main() will
return with success. First we have to create the serialized AST files for each source file (-emit-pch).
Then we import the AST of the production code (foo.ast) and the test code (test.ast) into the AST
of the test double (fake_bar.ast). We achieve this by providing the AST files in the proper order and
with the -ast-merge command line option of the compiler. Once we have the merged AST then we emit
an object file from that (-emit-obj) and during this process all C/C++ semantic checks are executed.
Thus, if the modified AST is semantically incorrect then we will be notified.</p>
      <p>In Figure 4 we show how we can replace a record type. This time we have a class as a dependency
(Bar) defined in foo.h. The test double in fake_bar.h simply replaces the only f() member function,
which returns a fake value. The rest of the mechanism is quite similar as before, the only difference is
that this time we have to generate AST files from headers.</p>
      <p>The previous example in Figure 4 presented that we can easily replace types if the substitute type
has the same functions defined as the original one. Figure 5 demonstrates we can even extend the
// foo.c
int bar() { return 1; }
int foo() {</p>
      <p>return bar();
}
// fake_bar.c
[[test_double]]
int bar() {</p>
      <p>return 13;
}
// test.c
int foo();
int main() {
return</p>
      <p>foo() == 13 ? 0 : 1;
}
# Create the ASTs
clang -cc1 -emit-pch -o foo.ast foo.c
clang -cc1 -emit-pch -o fake_bar.ast fake_bar.c
clang -cc1 -emit-pch -o test.ast test.c
# Merge the ASTs and emit an object
clang -cc1 -ast-merge fake_bar.ast -ast-merge foo.ast -ast-merge test.ast /dev/null \</p>
      <p>-emit-obj -o merged.o
# Link
clang -o output merged.o
# Run the test
./output
replaced type with additional functionality, which may be very useful if we want to create mock test
doubles and not just simple stub objects. In the test (test.c) we would like to use a test double which
can be setup to return with a given value in its member function (f()). To achieve this we provide a
new header file (mock_bar_modifiers_fwd.h) which contains prototype definitions for such functions
which can modify the dependency via a pointer. Since we use a pointer we do not have to see the whole
definition of the Bar class a forward declaration is sufficient. With this auxiliary header file we can
create the AST dump for the test file. The definition of the setter function(s) are placed in the test
double file (mock_bar.h) together with the definition of the mock type. The rest of the mechanism is
similar to the previous examples, we have to merge the AST files of the production and test code into
the AST of the test double. There is no need to create a separate AST file for the auxiliary header.
3.4</p>
    </sec>
    <sec id="sec-9">
      <title>Evaluation</title>
      <p>Compared to link seams, our new seam makes it possible to replace functions even if they are inlined
or statically linked. Besides, we can replace class definitions and virtually anything because we do the
replacement on the AST level. Similarly to link seams, our method requires additional help from the
// foo.h
class Bar {</p>
      <p>int a;
public:</p>
      <p>int f() { return 1; }
class Foo {
public:</p>
      <p>Bar bar;
int ff() { return bar.f(); }
struct [[test_double]] Bar {
int f_return_value = 0;
int f() {</p>
      <p>return f_return_value;
};</p>
      <p>}
}
void set_f_return_value(Bar* bar, int value) {
bar-&gt;f_return_value = value;
// mock_bar_modifiers_fwd.h
struct Bar;
void set_f_return_value(Bar* bar, int value);
// test.c
#include "foo.h"
#include "mock_bar_modifiers_fwd.h"
int main() {</p>
      <p>Foo foo;
set_f_return_value(&amp;foo.bar, 13);
return foo.ff() == 13 ? 0 : 1;
}
build system since we have to create the AST files separately and we have to merge them manually.
Also the enabling point of our seam is in the build scripts, as in the case of the link seams.</p>
      <p>Similarly to the preprocessor seams, with our approach we can replace not just functions but types
or anything which has a name. However, our approach does the replacement on the AST level, while
the preprocessor seam does that on the token level.</p>
    </sec>
    <sec id="sec-10">
      <title>4. LIMITATIONS AND FUTURE WORK</title>
      <p>It is a common limitation with the three seams (link, preprocessor and astimporter) that we can replace
only those dependencies which have an identifier name. E.g., it is not possible to replace a C++11
lambda function nested in a call of an algorithm from the standard template library:
std::vector&lt;int&gt; nums{1, 2, 3, 4, 5};
std::for_each(nums.begin(), nums.end(), [](int &amp;n){ n++; });</p>
      <p>As of this writing, the implementation of AST importer algorithm in LLVM/Clang is quite immature.
Especially, the import of C++ templates and C++11 expressions are not well supported. Thus, it is an
important future work to improve this realization because that would open up the possibility to be able
to experiment with our new technique on real C++ projects.</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION</title>
      <p>Unit testing plays a crucial role in modern software development. The existing testing methods either
require intrusive changes in the original source of the production code or they have some sort of
disadvantages which makes them hard to use. In this paper we presented a new unit testing technique
which is based on transforming the AST of the production code in a way that dependent functions or
types are replaced with test doubles. This method has clear advantages compared to previous
solutions because we can replace types as well, not just simple functions. Also, the technique exploits a
compiler aided syntax tree modification which provides much safer transformation than we can get via
tokeniser based translations with the preprocessor. Similarly to linker based solutions, our method
requires additional help from the build system. Our new approach provides an alternative way to create
non-intrusive tests, thus making it easier to verify legacy software or to write tests without
bringing in unwanted structural changes in the production code. A prototype implementation based on the
LLVM/Clang compiler infrastructure is publicly available at [Ma´ rton 2018].</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>