Lightning Talk: Bringing Techniques from Software Engineering into Scientific Software

Eric L. Seidel, University of California San Diego
Gabrielle Allen, University of Illinois at Urbana-Champaign

Abstract—Research software, for example the large body of simulation frameworks and libraries in Computational Science & Engineering (CSE), is often designed, developed, and then supported by students and postdoctoral researchers trained in scientific domains rather than in the discipline of software engineering. While there are a number of initiatives to improve the training and recognition of these research software developers, we believe little attention has been paid to bringing the wealth of research and experience in the computer science field of Software Engineering to bear on the development and support of research software. This paper is written from the experience of working as an undergraduate REU scholar on a team developing the Cactus Framework, a component-based simulation framework for high-performance computing, before training as a PhD researcher in Software Engineering.
This lightning talk will describe several areas of Software Engineering that could be investigated for their benefit in improving the quality and productivity of research software, and will suggest opportunities and challenges for bringing the two communities together.

I. SOFTWARE CORRECTNESS

A fundamental concern for research software is correctness. Incorrect software can crash or deadlock during a long-running computation and waste precious time and money, or, worse, it can complete successfully but produce an incorrect result, e.g. due to rounding error in floating-point arithmetic. Researchers in the Programming Languages and Software Engineering communities have studied techniques for ensuring correctness extensively. Though our motivation is seldom connected to scientific software, many of the techniques and tools should be transferable.

Ensuring correctness usually starts with a high-level specification of a property that we wish our software to exhibit. For example, when writing low-level C code, we would likely want to ensure memory safety. If we are writing concurrent or parallel code, we may want to ensure deadlock freedom. Software that deals with floating-point arithmetic will likely want to ensure numerical stability, i.e. that the rounding error is bounded at some reasonable level. In any program we will want to ensure some form of functional correctness: that the program computes what we expect it to. The specification can be given at varying levels of precision, from a plain-English description to a precise mathematical specification that can be mechanically checked.

Armed with a specification, we have three choices: (1) we can verify (i.e. prove) that our implementation satisfies the specification, (2) validate (i.e. test) the implementation against a range of inputs, or (3) synthesize an implementation directly from the specification. Verification is the most difficult option, as it inevitably requires the programmer or machine to provide internal invariants that are stronger (and often not obvious) than the top-level property we wish to prove, yet researchers have had much success in designing (semi-)automatic systems for proving all of the aforementioned properties [1], [2], [3].

We can trade developer time for a less complete guarantee of correctness by validating our program against a range of inputs, driven by the specification. This can also be done automatically by treating the specification as an oracle: either in a black-box manner, where we sample inputs exhaustively [4], [5] or randomly [6], [7], [8], run the program, and check that the corresponding outputs satisfy the specification; or in a white-box manner, where we observe the path taken by a given set of inputs and then choose a subsequent set in order to trigger a different path [9], [10], [11]. Finally, if the problem domain is restricted enough, we can avoid writing an implementation entirely and synthesize one directly from the specification [12] (or from a naive implementation [13]).
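To make the black-box, specification-as-oracle style of validation concrete, the following sketch samples random inputs, runs an implementation, and checks each output against a specification. The sorting specification and the `merge_sort` implementation under test are hypothetical stand-ins chosen for illustration, not taken from this paper; tools such as QuickCheck [6] automate this pattern.

```python
import random
from collections import Counter

def satisfies_spec(xs, ys):
    """Specification as an oracle: the output must be ordered and
    must be a permutation (same multiset) of the input."""
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    return ordered and Counter(ys) == Counter(xs)

def merge_sort(xs):
    """Implementation under test (a hypothetical example)."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def random_check(impl, trials=500):
    """Black-box random testing: sample inputs, run the program,
    and check each output against the specification."""
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        if not satisfies_spec(xs, impl(xs)):
            return xs  # counterexample: input whose output violates the spec
    return None  # no counterexample found

assert random_check(merge_sort) is None
```

A white-box tester would replace the random sampling in `random_check` with input selection guided by the paths the program actually takes; the oracle check stays the same.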
II. REPRODUCIBILITY

Another concern for scientific software is reproducibility. A key part of the scientific method is reproducing prior experiments to increase our confidence in the results. However, the rapidly changing nature of software (and hardware) can make it very difficult to reproduce the exact experiment that led to the published results. We can improve reproducibility by treating each software artifact as a function of its inputs: the source code, external library dependencies, operating system, and perhaps even the machine architecture. Viewed through this lens, it becomes natural to describe the process of constructing the software artifact in a (quasi-)declarative language [14], [15], [16], [17]. The more precise the specification of the build process (for example, we might specify an exact version requirement for inputs instead of a range, or even provide a cryptographic hash of the inputs to avoid ghost updates), the more confident we can be that future researchers will be able to reproduce our experiments.

This work is licensed under a CC-BY-4.0 license.
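A minimal sketch of the hash-pinning idea: record a cryptographic digest for each build input in a lockfile-style table, and refuse to build if any input on disk no longer matches its recorded digest. The lockfile layout, file name, and placeholder digest below are hypothetical illustrations; purpose-built tools such as Nix [16] implement this far more thoroughly.

```python
import hashlib

# Hypothetical lockfile: each build input is pinned to an exact
# artifact (not a version range) and a SHA-256 digest, so a silent
# ("ghost") update of a dependency is caught before the build runs.
# The digest shown is a placeholder (the SHA-256 of empty input).
LOCKFILE = {
    "deps/solver-lib-1.4.2.tar.gz":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def sha256_of(path):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_inputs(lockfile):
    """Return the inputs whose on-disk contents no longer match the
    pinned digest; an empty list means it is safe to build."""
    return [p for p, digest in lockfile.items() if sha256_of(p) != digest]
```

Checking digests rather than version numbers is what makes the build a function of its inputs: two researchers whose lockfiles verify cleanly are building from byte-identical dependencies.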
III. OPPORTUNITIES AND CHALLENGES

In this section we introduce technical and social challenges to adopting Software Engineering research in the Scientific Software community.

A. Technical

It is possible that research software developers are aware of the advances in software verification, but choose not to adopt the tools and techniques because they are simply too difficult to use. The most advanced and expressive verification engines require semi-manual proofs in a logical language, which can often result in proofs that are significantly longer than the program. On the other hand, many automatic verifiers produce errors that are difficult to decipher, containing either too little or too much contextual information to properly guide the user. Even testing can be difficult when it is unclear what the correct answer to a problem is, as is often the case in scientific simulations [18]. A more human-centered approach to verification and validation may be beneficial.

B. Social

An equally important challenge to address is the lack of collaboration between researchers in the Software Engineering and Scientific Software communities. Without open avenues of communication between the two disciplines, it is hard for software engineers to keep abreast of the state of the art in scientific software development and the challenges they might help address. Similarly, research software developers must go out of their way to stay up to date with advances in Software Engineering and Programming Languages.

One mechanism that could bridge the gap between the academic software engineering community and the teams developing research software would be targeted graduate student internships. Currently, many graduate students in computer science spend short periods working in industry or at national laboratories, but these internships usually remain within the student's own discipline. We propose that software engineering students be encouraged to intern with research software development teams and become embedded in these groups, in order to understand the culture, software stack, and development challenges, and to take these experiences back with them as they develop new tools and approaches. Such internships could be funded by agencies, such as NSF, that already fund software development, potentially as supplements to existing awards.

One challenge of this approach will be to ensure that the students involved can publish a creditable output from their internship that is recognized as part of their career development; there is a danger that work carried out in this area would be viewed by peers as simply engineering rather than research.

Another challenge faced by the Software Engineering community is how to reward researchers for producing not just techniques, but real, usable software artifacts that employ those techniques. As in many fields, the reward structure for Software Engineering researchers heavily emphasizes publication of novel techniques, which is somewhat at odds with providing (and supporting!) a usable artifact. The research software community may be able to help in that regard by making the case to funding agencies, and to hiring and tenure committees, that well-supported software engineering tools would aid their own work.

REFERENCES

[1] D. Beyer, T. A. Henzinger, R. Jhala, and R. Majumdar, "The software model checker BLAST," Int. J. Softw. Tools Technol. Trans., vol. 9, no. 5-6, pp. 505-525, Sep. 2007.
[2] C. Flanagan, K. R. M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe, and R. Stata, "Extended static checking for Java," in Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, ser. PLDI '02. New York, NY, USA: ACM, 2002, pp. 234-245.
[3] P. M. Rondon, M. Kawaguci, and R. Jhala, "Liquid types," in Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '08. New York, NY, USA: ACM, 2008, pp. 159-169.
[4] D. Marinov and S. Khurshid, "TestEra: A novel framework for automated testing of Java programs," in Proceedings of the 16th Annual International Conference on Automated Software Engineering, ser. ASE 2001. IEEE, 2001, pp. 22-31.
[5] C. Runciman, M. Naylor, and F. Lindblad, "SmallCheck and Lazy SmallCheck: Automatic exhaustive testing for small values," in Proceedings of the First ACM SIGPLAN Symposium on Haskell, ser. Haskell '08. New York, NY, USA: ACM, 2008, pp. 37-48.
[6] K. Claessen and J. Hughes, "QuickCheck: A lightweight tool for random testing of Haskell programs," in Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, ser. ICFP '00. New York, NY, USA: ACM, 2000, pp. 268-279.
[7] C. Csallner and Y. Smaragdakis, "JCrasher: An automatic robustness tester for Java," Softw. Pract. Exp., vol. 34, no. 11, pp. 1025-1050, Sep. 2004.
[8] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, "Feedback-directed random test generation," in 29th International Conference on Software Engineering, ser. ICSE '07, 2007, pp. 75-84.
[9] P. Godefroid, N. Klarlund, and K. Sen, "DART: Directed automated random testing," in Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '05. New York, NY, USA: ACM, 2005, pp. 213-223.
[10] K. Sen, D. Marinov, and G. Agha, "CUTE: A concolic unit testing engine for C," in Proceedings of the 10th European Software Engineering Conference Held Jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. ESEC/FSE-13. New York, NY, USA: ACM, 2005, pp. 263-272.
[11] N. Tillmann and J. de Halleux, "Pex: White box test generation for .NET," in Tests and Proofs, ser. Lecture Notes in Computer Science, B. Beckert and R. Hähnle, Eds. Springer Berlin Heidelberg, 2008, pp. 134-153.
[12] E. Darulova and V. Kuncak, "Sound compilation of reals," in Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ser. POPL '14. New York, NY, USA: ACM, 2014, pp. 235-248.
[13] P. Panchekha, A. Sanchez-Stern, J. R. Wilcox, and Z. Tatlock, "Automatically improving accuracy for floating point expressions," in Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2015. New York, NY, USA: ACM, 2015, pp. 1-11.
[14] "Chef - IT automation for speed and awesomeness," https://www.chef.io/chef/, accessed: 2016-7-10.
[15] "Puppet - the shortest path to better software," https://puppet.com/, accessed: 2016-7-10.
[16] E. Dolstra, A. Löh, and N. Pierron, "NixOS: A purely functional Linux distribution," J. Funct. Programming, vol. 20, no. 5-6, pp. 577-615, 2010.
[17] G. Allen, F. Löffler, E. Schnetter, and E. L. Seidel, "Component specification in the Cactus Framework: The Cactus Configuration Language," in GRID. IEEE, 2010, pp. 359-368.
[18] J. C. Carver, R. P. Kendall, S. E. Squires, and D. E. Post, "Software development environments for scientific and engineering software: A series of case studies," in 29th International Conference on Software Engineering, ser. ICSE '07, May 2007, pp. 550-559.