Bringing Incremental Builds to Continuous Integration

Guillaume Maudoux (guillaume.maudoux@uclouvain.be)
Kim Mens (kim.mens@uclouvain.be)
ICTEAM, Institute for Information & Communication Technologies, Electronics and Applied Mathematics
Université catholique de Louvain, Louvain-la-Neuve, Belgium

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
Proceedings of the Seminar Series on Advanced Techniques and Tools for Software Evolution, SATToSE 2017 (sattose.org), 07-09 June 2017, Madrid, Spain.

Abstract

Incremental builds can considerably speed up the edit-compile-test loop during program development. While this technique is commonly used for local builds, it is seldom enabled during continuous integration. Correctness of continuous integration builds is usually preferred to compilation speed. With current tools, it is not trivial to get both properties, but we show that it is theoretically achievable with a carefully designed system. We first assess the potential benefits of incremental builds in continuous integration environments. We then identify different reasons that prevent that optimization in practice. From these, we derive requirements to be met by future build systems to support incremental continuous integration. These steps are illustrated with existing tools, research insights and sample cases from industry. Ultimately, this paper defines a new research direction at the intersection of build systems and continuous integration.

1 Context

The field of release engineering is concerned with the pipeline of techniques and operations between isolated developers writing source code and end users enjoying released applications. Release engineers maintain and optimize this pipeline from end to end. A classical release engineering pipeline [1], as depicted in Figure 1, starts with source code management (A). When commits are ready, they enter continuous integration (B), where software is built and tested in different stages of increasing complexity and duration. To bring these modifications to end users, code needs to be deployed (D) and released (E). This pipeline is all about tooling, and particular importance is attached to the build system (C) used by developers to test their code locally and by continuous integration to both produce artifacts and drive tests. As pressure increases to get frequent releases and shorter development cycles, release engineers try to optimize this pipeline at each stage. In particular, the goal of continuous integration is to have short build times and fast test feedback to developers. Optimizing the edit-compile-test loop shortens developers' critical paths and accelerates software evolution. A recent study on Travis CI [2] showed that "[t]he main factor for delayed feedback from test runs is the time required to execute the build".

Figure 1: An overview of the release engineering pipeline, borrowed and freely simplified from [1]

Incremental compilation is the technique by which a build system can efficiently update previous build artifacts. With sufficient information about the dependencies between build steps, it is possible to tell which steps are impacted by the updated sources and run only these to generate correct build artifacts. The other optimization based on build task dependencies is the parallel execution of independent tasks. These two features explain why developers use build systems despite the added hassle of configuring them. However, during continuous integration (CI), software is generally built from clean sources, making incremental builds impossible.

This paper is motivated by the need to understand why incremental builds are considered essential to developers on their own machines but are seldom used in continuous integration environments, where they could seemingly bring the same outstanding advantages. Shunning incremental builds, release engineers voluntarily lose the performance gain of updating only the required build products. We will show that release engineers prefer correctness over optimization, and that it is not so trivial to scale incremental compilation to CI environments. Because continuous integration is mostly automated, release engineers trade slower compilations against better quality guarantees on build products. Computer time is quite cheap, and tracking build issues is time consuming. We will see that in that context their decision makes sense.

But decreasing the edit-compile-test loop time is a pressing concern of most organizations. Developers waiting for CI test results start to work on unrelated issues. When the compilation finishes, developers need to switch context to understand test failures on their previous issue. With very slow compilations, developers may need to juggle three issues at the same time, losing more concentration to context switches. Some developers may prefer to wait for CI results instead of switching contexts too often, wasting even more time.

That being said, there are other challenges involved in enabling incremental builds on continuous integration servers. For example, builds are distributed on a cluster of workers. Workers may not be assigned to consecutive builds, making incremental builds less useful.

In this article, we show that the time wasted on compilation is significant, and that there are solutions to make build systems correct. We outline the features required for the next generation of build systems to enable incremental builds in continuous integration setups while guaranteeing correctness in all cases.

2 Incremental build experiment

We claim that incremental builds would benefit CI environments. Were it not the case, there would be no reason to update CI tools to support them. The experiment described in this section illustrates how much resources and developer time could be spared with such an optimization.

To assess the usefulness of incremental builds in continuous integration, we reproduced the workload of an integration server. A list of consecutive commits is built incrementally, each from a complete build of its parent. The build time for the first commit serves as an estimate of the duration of a full rebuild from clean sources. Comparing the duration of incremental builds with full rebuilds gives an idea of the potential gain. This experiment relies on the simplifying assumptions detailed below.

In practice, we analyzed the last 63 commits (hg log -r341198:341136) on mozilla-inbound (https://hg.mozilla.org/integration/mozilla-inbound/). As its name suggests, this repository belongs to the Mozilla foundation. It contains mainly the source code of the web browser Firefox. mozilla-inbound accepts briefly tested patches and gets merged daily into mozilla-central if the branch passes extensive tests. It is the normal landing point for non-critical changes [6]. In their analysis of Mozilla's patch management infrastructure, Rodrigo Souza et al. [8] have shown that continuous integration is key to early patch backout and stable builds on stable branches.

For each of the 62 most recent commits, we launched an incremental build; the 63rd was used to bootstrap the incremental build chain, and its build time was used to estimate the duration of a full rebuild. The results are presented in Figure 2 as an empirical cumulative distribution function (CDF) where the 62 builds are ordered by build time. The horizontal line represents the time required for a full rebuild, and the other one represents the time for an incremental build of a given commit from its parent's build. The coloured area between these lines represents the potential gain in time due to incremental builds. Two different metrics are reported. The wall time corresponds to the real time elapsed between the start and the end of the incremental build. CPU time, on the other hand, is the cumulative time spent on the task by all the CPUs. As the experiment was run on an Intel i5 with two cores and four threads, the CPU time cannot be more than four times the wall time. CPU time can be seen as the expected duration of a sequential build on the same machine.

Figure 2: Empirical distribution function of incremental build times and the potential gain they introduce compared to full rebuilds. (Values annotated in the figure: 56 min and 3 h 31 min for the full rebuilds; potential gains of 87.8% and 89.4%.)

Of course, this experiment is not perfect. From the number of commits analyzed, it is difficult to draw meaningful conclusions. We also conducted two fast experiments on the much smaller i3 project (https://i3wm.org/), which exposed a similar distribution of build times. We can expect that many repositories follow the same pattern, following the intuition that most changes are small and local and very few span the entire project. We made the assumption that the duration of a full rebuild is stable across patches. The duration of the only full rebuild is represented by the horizontal line in Figure 2. It seems reasonable to think that big, long-standing projects like Firefox have little variation in their full rebuild duration, but this hypothesis remains to be checked. Also, all the times were sampled only once. Displaying the results as a CDF provides some smoothing, but we cannot provide an estimate of the variability of build times. This should also be addressed with more experiments. Finally, we assume that it is possible to build each commit incrementally from its parent. As we will see in the next section, this is not trivial when builds are spread over a cluster.

These limitations of our initial experiment do not invalidate its main conclusion, however. A closer analysis of the Firefox case shows that when there is nothing to rebuild, an incremental build takes 29 ±1 seconds (wall-clock time). This is 100 times faster than a full rebuild, and happened for 10 builds out of 61. The median case spent two minutes and 23 seconds, which is still more than 20 times faster than a full rebuild. This means that for a majority of builds, compilation time can be reduced by one and sometimes two orders of magnitude.

The graph shows that 65% of the commits have little impact on the build. For the first 20%, there is no impact at all. The cost of an incremental build in that case is minimal: it is the time needed by the build system to detect that nothing needs to be done. Such commits are more frequent than one would expect. A closer look at their contents shows documentation changes, but also changes to the test suite. For these changes in particular, rebuilding Firefox from scratch seems a real waste of time. After these commits follows a chunk of minor commits. They change files with low impact on the final Firefox executable. They contain for example .js files that are bundled as-is in the final data archive. Incremental builds need only rebuild the said archive to complete. The right part of the curve contains average-impact changes (.c files or their equivalent) requiring some rebuild, and high-impact changes (.h files or equivalent) that trigger the recompilation of complete sub-components of the project. Very high impact changes include changes to the build definition itself (a change in compiler flags) and mass edits of source files (an update of copyright headers, for example). Such big changes were not included in our test set.

The potential gain of achieving correct incremental builds during CI can be represented on the graph as the area between the incremental build time (CDF) and the time required for a full rebuild, represented by the horizontal line of corresponding colour. When applied to CPU time, this gives the amount of wasted CPU time. During that time, CI workers are building products that are known to be the same as before, as per the build specification. Applied to the wall time, this gives the useless latency introduced in the edit-compile-test loop of programmers. This has a real impact on programmer workflow. When they work on updates to the test suite, the result of the tests on the CI infrastructure will be delayed by that amount of time.

This problem is obviously not restricted to Mozilla. If, as preliminary experiments tend to show, the cumulative distribution function of incremental build times has the same shape for most projects, then nearly 90% of CPU time on CI workers is wasted. From this experiment, we see that incremental builds could save a lot of resources, be it developer attention and time or bare computing power. Despite this, the optimization is not used on continuous integration infrastructures. In the next section, we take a closer look at the reasons behind this absence.

3 Build properties

Having illustrated that incremental builds could provide huge performance gains, we now try to explain why such a well-known optimization is never used in automated build farms. We identify three reasons that make incremental builds unfit for CI environments.

a. (lack of) Correctness: Existing build systems implement incremental builds incorrectly. The build result obtained with incremental compilation differs from the one produced by a clean, full rebuild.

b. (lack of) Control over the environment: Build systems rely heavily on their environment. If it is not strictly controlled, products built from the same source may not be the same. After multiple incremental builds, these impurities stack up, and correctness is impacted.

c. (lack of) Linearity: CI uses clusters to build and test patches. A given worker may not have access to the build products of previous builds, forcing it to rebuild from scratch. This requires communication and caching between builds.

We discuss the details of these aspects in the following sections.

3.1 Correctness

At the heart of correctness resides the idea that a build system should behave like a pure function. For a given initial state, the build system should always produce the same result. This means that incremental builds and full rebuilds from the same sources should always yield the same products. There are various ways to break current build systems. While some build systems are more robust than others, most can be tricked into producing wrong results. This topic has been extensively studied by Mike Shal [7]. We will only give a small example of such incorrect behavior in make. Because it has no memory of its previous invocations, make cannot tell sources from old build products. When a target 'old' is renamed into a target 'new', make does not delete 'old' when it produces 'new'. The resulting state has both 'old' and 'new', which would not be the case if the system had been built from clean sources. This example also shows that correctness requires keeping a trace of previous executions. However, make is a stateless program. It explores the working tree at each invocation to discover what needs to be done. If it remains stateless, it will never achieve correctness as described here.

Together with his article, Mike Shal developed tup (http://gittup.org/tup/), a correct build system as per his own extensive definition. Recent build systems like pluto [4], bazel (https://bazel.build/), buildsome (http://buildsome.github.io/buildsome/), button (http://jasonwhite.github.io/button/) and many more claim to be correct. Although their definitions of correctness are not equivalent, most agree on the fact that correct build systems never need full rebuilds because their incremental build algorithms will always manage to "do the right thing". This is a feature of new build systems only. Existing projects will need time to switch to these build systems. For example, it is still recommended (https://www.csee.umbc.edu/courses/undergraduate/CMSC421/fall02/burt/projects/howto_build_kernel.html) to clean the source tree before and after every build of the Linux kernel.

The number of new build systems claiming to be correct is a strong hint that old, approximate build systems do not meet today's expectations of quality and usability.

3.2 Control of the environment

Definitions of correctness used by legacy build systems are local to the source tree. There is a broader scope in which builds happen, commonly referred to as the environment. The environment is composed of all the parameters that are not under the direct control of the build system but that may influence the build results, and therefore its correctness with respect to the whole environment. Variability in build environments appears with the set of available applications, their installed versions, the processor architecture of the build machine and many other parameters. Even small settings like the hostname, the amount of RAM or the current time can impact builds in subtle ways.

For incremental builds to produce correct results, they must either take these impurities into account, or great care should be taken to run successive incremental builds in (nearly) identical environments. Next-gen build systems and package managers are starting to run their subprocesses in controlled sandboxes to tame the impact of developers' environments. For example, sandboxing was added to bazel in late 2015 (https://blog.bazel.build/2015/09/11/sandboxing.html). Also, CI platforms have long seen the usefulness of controlling build environments for reproducibility purposes. Docker and VirtualBox are probably the most used tools to manage build environments. Starting from this, it remains to be ensured that build systems can detect environment changes. Based on that information, they can rebuild the impacted parts or take other appropriate actions. Smarter build systems could track important parameters from the environment and try to hide the others from the build tasks.

The nix package manager [3] is an interesting project in this context. It installs every software package in a different path. As packages cannot conflict, multiple versions of the same package can be installed at the same time. When using full paths, it is trivial to detect different versions of the same binary: they have different paths. This also means that you have control over which versions of libraries are used. Nix packages do not depend on common installation paths like /usr or /lib. They reference their dependencies directly by their full path. nix allows fine-grained control over execution environments and system configuration, and could be used to wrap old-fashioned build systems that do not take the environment into account.

By properly controlling the environment and taking it into account, build systems will produce correct incremental builds at the level required by release engineers.

3.3 Lack of linearity

The last obstacle to CI using incremental builds is that CI builds are usually performed by a cluster of different workers. These workers cannot perform efficient incremental builds because they probably did not build the most recent version before the change being tested. If nothing is done, chances are that the worker will have to catch up with intermediary commits and build more products than strictly needed. This has been observed at Mozilla and motivated the development of a shared build cache named sccache [5]. On a lower level, it also requires workers to maintain a cache of previous builds. This is not easy to do when workers are provisioned dynamically in the cloud. In both cases, the solution is identical: a cache of build results needs to be maintained.

Caching build results is not a new idea. It is exactly what the ccache program performs for the C/C++ compiler. By comparing input files' content and compiler flags, ccache is able to cache object files and detect equivalent compiler invocations. In that situation, ccache does not call the compiler but directly produces the object files from its cache. As explained above, this only works if all the relevant parameters are taken into account. Failing to detect parameter changes, ccache may produce incorrect object files. A similar mechanism could be implemented for entire build steps. Given a build system with enough control over the environment of its sub-processes, it could cache the result of sub-command executions and query the cache before each sub-command. A well-designed cache could be shared by all the CI workers, making the knowledge of one available to the others at once. Many questions remain on the feasibility and practical performance of such an implementation. Much remains to be done in the future.

With these three issues addressed, we have sketched the key issues to be addressed by a build system good enough to be used in continuous integration. With these three features, the build system would meet the correctness and reproducibility requirements of release engineers while still providing the full benefit of incremental builds. All these aspects are individually present in different pieces of software. The challenge resides in making them work together.

4 Ideal build system

In order to support incremental builds in CI, we propose a design for an ideal build system. To anchor this discussion in the real world, we propose to reuse existing software. This would avoid building yet another build system from the ground up. The tools proposed here could most probably be replaced by other equivalent tools.

At the core, we need a correct algorithm to manage incremental builds. The algorithm from tup is correct (see Section 3.1) and could be reused as-is. However, tup does not take into account changes to the host system, and does not provide caching.

To manage the environment, we propose to use nix as described in Section 3.2. Combining tup with nix, we can generate build configuration files containing the full path to the command to run. The way nix installs packages ensures that the command path will change when the executable or one of its dependencies changes. This way tup will detect changes to the build command and trigger rebuilds where needed. As tup already tracks environment variables and nix removes much variability between systems, the build environment would be under control.

Finally, to provide sharing across the workers, we need an efficient and correct caching mechanism. The cache should be implemented at the lowest level, beneath tup. Whenever tup invokes a build command, the cache will first check if there is already a build result for that exact compilation. If the same step already happened, the cache produces the build outputs without running the build step. If not, the build step is executed and the results are added to the cache and shared on the cluster. The cache needs to take into account the content of the input files and all the compilation parameters, like the compiler version and its flags as well as the available packages.

While there exist excellent caching tools, significant research and testing is required for this step. In particular, it requires designing a proper encoding of a build command and all its dependencies to make the lookup of previous identical builds efficient. Also, caching the result of a build may be more complex than it looks.
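The measurement loop used for this experiment can be outlined in code. The sketch below is a simplification, not the scripts we actually ran: `checkout` and `build` are injected callables that would wrap the project's own tooling (for Firefox, presumably an `hg update -r` and a `./mach build` invocation).

```python
import time

def timed_incremental_builds(commits, checkout, build):
    """Replay a CI workload: check out consecutive commits and time an
    incremental build of each one on top of its parent's artifacts.
    `commits` must be ordered oldest to newest; the first build then
    doubles as an estimate of a full rebuild from clean sources."""
    times = {}
    for commit in commits:
        checkout(commit)            # move the working tree to this commit
        start = time.perf_counter()
        build()                     # incremental: reuses previous artifacts
        times[commit] = time.perf_counter() - start
    return times
```

This records wall time; CPU time would have to be sampled separately, for example from `os.times()` or by wrapping the build in the `time` utility.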
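The "potential gain" discussed in this section, the area between the incremental-build CDF and the full-rebuild line, reduces to a simple average over the sampled durations. The numbers below are illustrative, not our measurements:

```python
def potential_gain(incremental_times, full_rebuild_time):
    """Fraction of build time saved by building incrementally: the area
    between the full-rebuild line and the incremental build times,
    normalised by the area under the full-rebuild line."""
    saved = sum(full_rebuild_time - t for t in incremental_times)
    return saved / (full_rebuild_time * len(incremental_times))

# Illustrative numbers only: a 100-minute full rebuild against
# incremental builds that mostly finish in a few minutes.
gain = potential_gain([1, 2, 2, 5, 10, 40], 100)  # → 0.9
```

Applied to CPU times, this is the fraction of wasted worker CPU; applied to wall times, the fraction of avoidable feedback latency.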
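The make shortcoming described above can be reproduced with a toy stateless builder. The sketch below is not make itself, but it fails in the same way: because nothing records that 'old' used to be a target, renaming the target leaves the obsolete output behind.

```python
import os
import tempfile

def stateless_build(targets, outdir):
    """A minimal make-like pass: create every declared target that is
    missing. Like make, it keeps no memory of previous invocations, so
    it cannot know that a file on disk is a leftover of an old target."""
    for name, content in targets.items():
        path = os.path.join(outdir, name)
        if not os.path.exists(path):   # naive up-to-date check
            with open(path, "w") as f:
                f.write(content)

outdir = tempfile.mkdtemp()
stateless_build({"old": "artifact"}, outdir)  # first build: produces 'old'
stateless_build({"new": "artifact"}, outdir)  # target renamed 'old' -> 'new'
leftover = sorted(os.listdir(outdir))         # → ['new', 'old']
```

A build from clean sources would contain only 'new'; the incremental result differs, which is exactly the correctness violation at stake.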
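One way for a build system to detect environment changes, as argued above, is to fingerprint the parameters a step depends on. In the sketch below the nix store paths are invented for the example; the point is that with nix, upgrading a tool changes its full path, so the fingerprint changes too.

```python
import hashlib
import json

def env_fingerprint(tool_paths, env_vars):
    """Hash everything that may influence a build step. With nix-style
    full paths, upgrading a tool changes its path, hence the fingerprint,
    hence triggers a rebuild of the steps that use it."""
    payload = json.dumps({"tools": tool_paths, "env": env_vars},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

before = env_fingerprint({"cc": "/nix/store/aaaa-gcc-9.2.0/bin/gcc"},
                         {"LANG": "C"})
after = env_fingerprint({"cc": "/nix/store/bbbb-gcc-9.3.0/bin/gcc"},
                        {"LANG": "C"})
changed = before != after   # True: the compiler upgrade is detected
```

Parameters deliberately left out of the fingerprint (hostname, current time, ...) are the ones the build system must hide from build tasks, for example through sandboxing.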
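The step-level cache discussed in Sections 3.3 and 4 — key each build step by its command, the content of its inputs and the environment, and replay the recorded outputs on a hit — can be sketched with an in-memory stand-in for what would really be a shared, networked store. All names here are illustrative:

```python
import hashlib
import json

class StepCache:
    """Content-addressed cache for whole build steps (a sketch)."""

    def __init__(self):
        self._store = {}

    def _key(self, command, inputs, env_fingerprint):
        # Encode everything that determines the step's outputs.
        payload = json.dumps({
            "cmd": command,
            "env": env_fingerprint,
            "inputs": {name: hashlib.sha256(data).hexdigest()
                       for name, data in sorted(inputs.items())},
        }, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, command, inputs, env_fingerprint, execute):
        key = self._key(command, inputs, env_fingerprint)
        if key not in self._store:      # miss: actually run the step
            self._store[key] = execute()
        return self._store[key]         # hit: replay recorded outputs

runs = []
def compile_step():
    runs.append(1)
    return {"main.o": b"...object code..."}

cache = StepCache()
cache.run("cc -c main.c", {"main.c": b"int main(){}"}, "env-0", compile_step)
cache.run("cc -c main.c", {"main.c": b"int main(){}"}, "env-0", compile_step)
# the second invocation is served from the cache: compile_step ran only once
```

This sketch deliberately sidesteps the harder questions raised in the text, such as steps that append to existing files, or cache lookups over the network that turn out slower than a local rebuild.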
“Modern is never easy to implement correctly, and may turn Release Engineering in a Nutshell – Why Re- out to be impractical. For example, network access searchers Should Care”. In: IEEE 23rd Interna- could be slower than a local rebuild. This part of the tional Conference on Software Analysis, Evolu- tool requires investigations to ensure its feasibility and tion, and Reengineering (SANER). Vol. 5. Mar. practicality. 2016, pp. 78–90. doi: 10.1109/SANER.2016.108. A tool designed like this should provide correct and efficient incremental builds for distributed continuous [2] Moritz Beller, Georgios Gousios, and Andy Zaid- integration environments. Like most optimizations, man. ńOops, my tests broke the buildż: An Anal- this tool would add complexity and introduce poten- ysis of Travis CI Builds with Github. Tech. rep. tial bugs to be discovered. A careful assessment of the PeerJ Preprints, 2016. final tool would have to be realized. While we cannot [3] Eelco Dolstra, Merijn De Jonge, Eelco Visser, et provide guarantees that it would be adopted by indus- al. “Nix: A Safe and Policy-Free System for Soft- try, we believe that a huge speed-up in compilation ware Deployment.” In: LISA. Vol. 4. 2004, pp. 79– time could make it attractive. 92. [4] Sebastian Erdweg, Moritz Lichter, and Manuel 5 Conclusion Weiel. “A Sound and Optimal Incremental Build With incremental builds, we have argued that it is System with Dynamic Dependencies”. In: SIG- possible to improve build performance by one or two PLAN Not. 50.10 (Oct. 2015), pp. 89–106. issn: orders of magnitude. That speed-up is needed because 0362-1340. doi: 10.1145/2858965.2814316. it is on the critical path of developers that must wait [5] Mike Hommey (glandium). Shared compilation before getting test results. We have analysed the main cache experiment. Jan. 2014. url: https : / / barriers to the usage of incremental builds as an op- glandium.org/blog/?p=3054. timization technique for continuous integration. 
For [6] Kim Moir. “Built to Scale: The Mozilla Re- each of these we explained how it can be handled and provided examples of existing tools capable of doing it. lease Engineering toolbox”. EclipseCon. 2014. Taken together, these features address all the remain- url: https : / / www . eclipsecon . org / na2014 / ing issues to deploy incremental builds to CI. A tool session / built - scale - mozilla - release - that implements them all, possibly by reusing existing engineering-toolbox.html. partial tools, would make it possible to get the same [7] Mike Shal. Build system rules and algorithms. speed-up on CI as for local incremental builds. Having 2009. url: http : / / gittup . org / tup / build _ provided the need and path, we are ready to start the system_rules_and_algorithms.pdf. journey. [8] Rodrigo Souza, Christina Chavez, and Roberto A Bittencourt. “Rapid releases and patch backouts: 6 Acknowledgments A software analytics approach”. In: IEEE Soft- We thank Bram Adams for his comments on an earlier ware 32.2 (2015), pp. 89–96. version of this paper. We would also like to thank Kim Moir and Mike Shal for their attentive reading and their eagerness at discussing release engineering practices at Mozilla. 6