Lightning Talk: Bringing Techniques from Software Engineering into Scientific Software

Eric L. Seidel, University of California San Diego
Gabrielle Allen, University of Illinois at Urbana-Champaign

Abstract—Research software, for example the large body of simulation frameworks and libraries in Computational Science & Engineering (CSE), is often designed, developed, and then supported by students and postdoctoral researchers trained in scientific domains rather than in the discipline of software engineering. While there are a number of initiatives to improve the training and recognition of these research software developers, we believe little attention has been paid to bringing the wealth of research and experience in the computer science field of Software Engineering to bear on the development and support of research software. This paper is written from the experience of working as an undergraduate REU scholar on a team developing the Cactus Framework, a component-based simulation framework for high-performance computing, before training as a PhD researcher in Software Engineering.
This lightning talk will describe several areas of Software Engineering that could be investigated for their benefit in improving the quality and productivity of research software, and will suggest opportunities and challenges for bringing the two communities together.

I. SOFTWARE CORRECTNESS

A fundamental concern for research software is correctness. Incorrect software can crash or deadlock during a long-running computation and waste precious time and money, or, worse, it can complete successfully but produce an incorrect result, e.g. due to rounding error in floating-point arithmetic. Researchers in the Programming Languages and Software Engineering communities have studied techniques for ensuring correctness extensively. Though our motivation is seldom connected to scientific software, many of the techniques and tools should be transferable.

Ensuring correctness usually starts with a high-level specification of a property that we wish our software to exhibit. For example, when writing low-level C code, we would likely want to ensure memory safety. If we are writing concurrent or parallel code, we may want to ensure deadlock freedom. Software that deals with floating-point arithmetic will likely want to ensure numerical stability, i.e. that the rounding error is bounded at some reasonable level. In any program we will want to ensure some form of functional correctness: that the program computes what we expect it to. The specification can be given at varying levels of precision, from a plain-English description to a precise mathematical specification that can be mechanically checked.

Armed with a specification, we have three choices: (1) we can verify (i.e. prove) that our implementation satisfies the specification, (2) validate (i.e. test) the implementation against a range of inputs, or (3) synthesize an implementation directly from the specification. Verification is the most difficult option, as it inevitably requires the programmer or machine to provide internal invariants that are stronger (and often not obvious) than the top-level property we wish to prove, yet researchers have had much success in designing (semi-)automatic systems for proving all of the aforementioned properties [1], [2], [3].

We can trade developer time for a less complete guarantee of correctness by validating our program against a range of inputs, driven by the specification. This can also be done automatically by treating the specification as an oracle: either in a black-box manner, where we sample inputs exhaustively [4], [5] or randomly [6], [7], [8], run the program, and check that the corresponding outputs satisfy the specification; or in a white-box manner, where we observe the path taken by a given set of inputs and then choose a subsequent set in order to trigger a different path [9], [10], [11]. Finally, if the problem domain is restricted enough, we can avoid writing an implementation entirely and synthesize one directly from the specification [12] (or from a naive implementation [13]).
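To make the black-box, specification-as-oracle style of validation concrete, the following sketch samples random inputs, runs an implementation, and checks each output against a specification. The sorting specification and the `merge_sort` implementation under test are hypothetical stand-ins chosen for illustration, not taken from this paper; tools such as QuickCheck [6] automate this pattern.

```python
import random
from collections import Counter

def satisfies_spec(xs, ys):
    """Specification as an oracle: the output must be ordered and
    must be a permutation (same multiset) of the input."""
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    return ordered and Counter(ys) == Counter(xs)

def merge_sort(xs):
    """Implementation under test (a hypothetical example)."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def random_check(impl, trials=500):
    """Black-box random testing: sample inputs, run the program,
    and check each output against the specification."""
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        if not satisfies_spec(xs, impl(xs)):
            return xs  # counterexample: input whose output violates the spec
    return None  # no counterexample found

assert random_check(merge_sort) is None
```

A white-box tester would replace the random sampling in `random_check` with input selection guided by the paths the program actually takes; the oracle check stays the same.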
II. REPRODUCIBILITY

Another concern for scientific software is reproducibility. A key part of the scientific method is reproducing prior experiments to increase our confidence in the results. However, the rapidly changing nature of software (and hardware) can make it very difficult to reproduce the exact experiment that led to the published results. We can improve reproducibility by treating each software artifact as a function of its inputs: the source code, external library dependencies, operating system, and perhaps even the machine architecture. Viewed through this lens, it becomes natural to describe the process of constructing the software artifact in a (quasi-)declarative language [14], [15], [16], [17]. The more precise the specification of the build process (for example, we might specify an exact version requirement for inputs instead of a range, or even provide a cryptographic hash of the inputs to avoid ghost updates), the more confident we can be that future researchers will be able to reproduce our experiments.

This work is licensed under a CC-BY-4.0 license.
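A minimal sketch of the hash-pinning idea: record a cryptographic digest for each build input in a lockfile-style table, and refuse to build if any input on disk no longer matches its recorded digest. The lockfile layout, file name, and placeholder digest below are hypothetical illustrations; purpose-built tools such as Nix [16] implement this far more thoroughly.

```python
import hashlib

# Hypothetical lockfile: each build input is pinned to an exact
# artifact (not a version range) and a SHA-256 digest, so a silent
# ("ghost") update of a dependency is caught before the build runs.
# The digest shown is a placeholder (the SHA-256 of empty input).
LOCKFILE = {
    "deps/solver-lib-1.4.2.tar.gz":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def sha256_of(path):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_inputs(lockfile):
    """Return the inputs whose on-disk contents no longer match the
    pinned digest; an empty list means it is safe to build."""
    return [p for p, digest in lockfile.items() if sha256_of(p) != digest]
```

Checking digests rather than version numbers is what makes the build a function of its inputs: two researchers whose lockfiles verify cleanly are building from byte-identical dependencies.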
III. OPPORTUNITIES AND CHALLENGES

In this section we introduce technical and social challenges to adopting Software Engineering research in the Scientific Software community.

A. Technical

It is possible that research software developers are aware of the advances in software verification, but choose not to adopt the tools and techniques because they are simply too difficult to use. The most advanced and expressive verification engines require semi-manual proofs in a logical language, which can often result in proofs that are significantly longer than the program. On the other hand, many automatic verifiers produce errors that are difficult to decipher, containing either too little or too much contextual information to properly guide the user. Even testing can be difficult when it is unclear what the correct answer to a problem is, as is often the case in scientific simulations [18]. A more human-centered approach to verification and validation may be beneficial.

B. Social

An equally important challenge to address is the lack of collaboration between researchers in the Software Engineering and Scientific Software communities. Without open avenues of communication between the two disciplines, it is hard for software engineers to keep abreast of the state of the art in scientific software development and the challenges they might help address. Similarly, research software developers must go out of their way to stay up to date with advances in Software Engineering and Programming Languages.

One mechanism that could bridge the gap between the academic software engineering community and the teams developing research software would be targeted graduate student internships. Currently, many graduate students in computer science spend short periods working in industry or at national laboratories, but these internships usually remain within the student's own discipline. We propose that software engineering students be encouraged to intern with research software development teams and become embedded in these groups, in order to understand the culture, software stack, and development challenges, and to take these experiences back with them as they develop new tools and approaches. Such internships could be funded by agencies, such as NSF, that already fund software development, potentially as supplements to existing awards.

One challenge of this approach will be to ensure that the students involved can publish a creditable output from their internship that is recognized as part of their career development; there is a danger that work carried out in this area would be viewed by peers as simply engineering rather than research.

Another challenge faced by the Software Engineering community is how to reward researchers for producing not just techniques, but real, usable software artifacts that employ those techniques. As in many fields, the reward structure for Software Engineering researchers heavily emphasizes publication of novel techniques, which is somewhat at odds with providing (and supporting!) a usable artifact. The research software community may be able to help in that regard by making the case to funding agencies, and to hiring and tenure committees, that well-supported software engineering tools would aid their own work.

REFERENCES

[1] D. Beyer, T. A. Henzinger, R. Jhala, and R. Majumdar, "The software model checker BLAST," Int. J. Softw. Tools Technol. Trans., vol. 9, no. 5-6, pp. 505-525, Sep. 2007.
[2] C. Flanagan, K. R. M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe, and R. Stata, "Extended static checking for Java," in Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, ser. PLDI '02. New York, NY, USA: ACM, 2002, pp. 234-245.
[3] P. M. Rondon, M. Kawaguci, and R. Jhala, "Liquid types," in Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '08. New York, NY, USA: ACM, 2008, pp. 159-169.
[4] D. Marinov and S. Khurshid, "TestEra: A novel framework for automated testing of Java programs," in Proceedings of the 16th Annual International Conference on Automated Software Engineering, ser. ASE 2001. IEEE, 2001, pp. 22-31.
[5] C. Runciman, M. Naylor, and F. Lindblad, "SmallCheck and Lazy SmallCheck: Automatic exhaustive testing for small values," in Proceedings of the First ACM SIGPLAN Symposium on Haskell, ser. Haskell '08. New York, NY, USA: ACM, 2008, pp. 37-48.
[6] K. Claessen and J. Hughes, "QuickCheck: A lightweight tool for random testing of Haskell programs," in Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, ser. ICFP '00. New York, NY, USA: ACM, 2000, pp. 268-279.
[7] C. Csallner and Y. Smaragdakis, "JCrasher: An automatic robustness tester for Java," Softw. Pract. Exp., vol. 34, no. 11, pp. 1025-1050, Sep. 2004.
[8] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, "Feedback-directed random test generation," in 29th International Conference on Software Engineering, ser. ICSE '07, 2007, pp. 75-84.
[9] P. Godefroid, N. Klarlund, and K. Sen, "DART: Directed automated random testing," in Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '05. New York, NY, USA: ACM, 2005, pp. 213-223.
[10] K. Sen, D. Marinov, and G. Agha, "CUTE: A concolic unit testing engine for C," in Proceedings of the 10th European Software Engineering Conference Held Jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. ESEC/FSE-13. New York, NY, USA: ACM, 2005, pp. 263-272.
[11] N. Tillmann and J. de Halleux, "Pex: White box test generation for .NET," in Tests and Proofs, ser. Lecture Notes in Computer Science, B. Beckert and R. Hähnle, Eds. Springer Berlin Heidelberg, 2008, pp. 134-153.
[12] E. Darulova and V. Kuncak, "Sound compilation of reals," in Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ser. POPL '14. New York, NY, USA: ACM, 2014, pp. 235-248.
[13] P. Panchekha, A. Sanchez-Stern, J. R. Wilcox, and Z. Tatlock, "Automatically improving accuracy for floating point expressions," in Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2015. New York, NY, USA: ACM, 2015, pp. 1-11.
[14] "Chef - IT automation for speed and awesomeness," https://www.chef.io/chef/, accessed: 2016-7-10.
[15] "Puppet - the shortest path to better software," https://puppet.com/, accessed: 2016-7-10.
[16] E. Dolstra, A. Löh, and N. Pierron, "NixOS: A purely functional Linux distribution," J. Funct. Programming, vol. 20, no. 5-6, pp. 577-615, 2010.
[17] G. Allen, F. Löffler, E. Schnetter, and E. L. Seidel, "Component specification in the Cactus Framework: The Cactus Configuration Language," in GRID. IEEE, 2010, pp. 359-368.
[18] J. C. Carver, R. P. Kendall, S. E. Squires, and D. E. Post, "Software development environments for scientific and engineering software: A series of case studies," in 29th International Conference on Software Engineering, ser. ICSE '07, May 2007, pp. 550-559.