End-user development via sheet-defined functions ∗ Peter Sestoft Jonas Druedahl Rask Simon Eikeland Timmermann ABSTRACT We have developed an implementation of sheet-defined func- tions, a mechanism that allows spreadsheet users to define their own functions, using only spreadsheet concepts such as cells, formulas and references, and no external programming languages. This position paper presents the motivation and vision of this work, describes the features of our prototype implementation, and outlines future work. Keywords Spreadsheets, end-user development, functional programming Figure 1: A sheet-defined function implementing Excel’s NOMINAL built-in. Cells B23 and B24 are in- 1. INTRODUCTION put cells, cell B26 is the output cell, and B25 holds Spreadsheet programs such as Microsoft Excel, OpenOffice an intermediate result. The call to DEFINE in cell A22 Calc, Gnumeric and Google Docs are used by millions of creates the function. Cell A28 contains a call to the people to develop and maintain complex models in finance, defined function. It takes around 200 ns to execute science, engineering and administration. Yet Peyton Jones, it, of which 80 ns is due to exponentiation (ˆ). As Burnett and Blackwell [9] observed that spreadsheet pro- shown in cell A28, a sheet-defined function is called grams lack even the most basic abstraction facility — a way just like a built-in or VBA function. to encapsulate an expression as a reusable function — and proposed a design for such a mechanism. nisms, saying about the traditional combination of Excel We have implemented this idea in the form of sheet-defined and externally defined functions that “change control, static functions. A user may define a function F simply by declar- type checking, abstraction and reuse are almost completely ing which (input) cells will hold F’s formal parameters and lacking” [1]. which (output) cell will compute F’s result, as a function of the input cells. The function definition may involve arbi- 2. THE VISION trary additional cells, spreadsheet formulas, calls to built-in The ultimate goal of this work is to allow spreadsheet users functions, and calls to other sheet-defined functions. Fig- themselves to develop and evolve libraries of user-defined ure 1 shows an example. In the example, values are referred functions to support sophisticated spreadsheet models. Defin- to by cell (such as B25). A mechanism that allows for sym- ing a function requires only well-known spreadsheet concepts bolic names (such as “periods”) instead could be added, but such as cell, cell reference and function, and no external pro- Nardi speculates that end user developers would not neces- gramming languages. Therefore experimentation and adap- sarily find that better [7, page 44]. tation of user-defined functions remain under the control of the spreadsheet users and domain experts, who need not Augustsson et al. from Standard Chartered Bank provide wait for an IT department to understand, describe, imple- further support for the utility of such abstraction mecha- ment and test the desired changes. ∗sestoft@itu.dk, IT University of Copenhagen, Denmark Any spreadsheet computation can be turned into a sheet- defined function. This ensures conceptual and notational simplicity. Moreover, it means that new user-defined func- tions may arise by refactoring of a spreadsheet model as it evolves. As a spreadsheet model becomes more refined and complex, it may be observed that the same cluster of formulas appears again and again. Such a cluster of formu- las may then be encapsulated in a sheet-defined function, and each formula cluster replaced by a call to that function. This both simplifies the spreadsheet model and improves its Table 1: Time to compute the cumulative distribu- Table 2: Time to call a square root function; includes tion function of the normal distribution N (0, 1). recalculation time. Implementation Time/call (ns) Calling Time/call (ns) Sheet-defined function 118 Sheet-defined function from Funcalc 400 C# 47 Excel built-in from Excel 160 C (gcc 4.2.1 -O3) 51 .NET function from Excel/Excel-DNA 4,900 Excel 2007 VBA function 1925 VBA function from Excel 12,000 Excel 2007 built-in NORMSDIST 993 value may be called as APPLY(A42,0.053543,4) using built- maintainability, because a bug-fix or other improvement to in function APPLY. the function will automatically affect all its uses, unlike the traditional situation when there are multiple copies of the Function values are built by applying a sheet-defined func- same cluster of formulas. tion to only some of its arguments, the absent arguments being given as NA(); the resulting function value will dis- Sheet-defined functions may be shared with other users in play as NOMINAL(#NA,4) or similar. the same department or application domain, without pre- venting them from making their own improvements — be- Such a function value may be specialized, or partially eval- cause the domain knowledge is not locked into the notation uated, with respect to its available (non-#NA) arguments. of a “real” programming language, but one that presumably The result is a new function value with the same behavior is familiar to users and that they are (more) comfortable but potentially better performance because the available ar- experimenting with. gument values have been inlined and loops unrolled in the underlying bytecode. For more information, see [5] and [12]. Sheet-defined functions support end-user “tinkering” to de- Specialization provides some amount of incremental compu- velop models and workflows that are appropriate within tation and memoization, and we do not currently have other their application domain [7]. Clearly not all spreadsheet general mechanisms for these purposes. users will be equally competent developers of sheet-defined functions, and clearly not all software should be developed A forthcoming book [13] gives many more details of the im- in this way. However, judging from the huge popularity plementation, more examples of sheet-defined functions, and of spreadsheets within banks, finance, management, science a manual for Funcalc. A previously published paper [15] and engineering, the immediate response and the user con- presents a case study of reimplementing Excel’s built-in fi- trol offered by spreadsheets are attractive features. Also, nancial functions as sheet-defined functions. from anecdotal evidence, structured use of spreadsheets is a flexible, fast and cheap alternative to “big bang” professional A comprehensive list of US spreadsheet patents is given in IT projects. a forthcoming report [14]. 3. THE FUNCALC PROTOTYPE 4. INTEGRATION WITH EXCEL We have created a prototype implementation of sheet-defined In ongoing work [10] we integrate sheet-defined functions functions, called Funcalc. The implementation is written in with the widely used Microsoft Excel spreadsheet program, C#, is quite compact (12,000 lines of code) and compiles rather than our Funcalc prototype, as illustrated in Figures 2 sheet-defined functions to .NET bytecode [3] at run-time. and 3. This enables large-scale experimentation with sheet- As shown by Table 1 execution efficiency is very good; this defined functions because they can be defined in a context is due both to local optimizations performed by our function that is familiar to spreadsheet users and provides charting, compiler and to Microsoft’s considerable engineering effort auditing, and so on. in the underlying .NET just-in-time compiler. The main downside is that calling a sheet-defined function Funcalc features include: from Excel is much slower than from the Funcalc implemen- tation (yet apparently faster than calling a VBA function); see Table 2. However, the sheet-defined function itself will • a “normal” interpretive spreadsheet implementation; execute at the same speed as in Funcalc. This work uses the • a compiled implementation of sheet-defined functions; Excel-DNA runtime bridge between Excel and .NET [4]. • recursive functions and higher-order functions; 5. FUTURE WORK • functions can accept and return array values in addi- So far we have focused mostly on functionality and good tion to numbers and string; performance. We emphasize performance because we want sheet-defined functions to replace not only user-defined func- • automatic specialization, or partial evaluation [12]; tions written in VBA, C++ and other external languages, but to replace built-in functions also. Domain experts in • facilities for benchmarking sheet-defined functions. finance, statistics and other areas of rather general interest should be able to develop well-performing high-quality func- Because Funcalc supports higher-order functions, the value tions themselves and not have to rely on Microsoft or other contained in a cell, say A42, may be a function value. This vendors to do so. Figure 2: Funcalc as Excel plug-in, showing formulas of sheet-defined function TRIAREA with input cells A3, B3 and C3, intermediate cell D3, and output cell E3. The call to DEFINE in cell E4 creates the function. Through the new “Excelcalc” menu one can interact with the underlying Funcalc implementation and the Excel-Funcalc bridge (mostly for development purposes). Figure 3: Same sheet as in Figure 2, here showing values rather than formulas. Note the editing in progress of a call to sheet-defined function TRIAREA in cell E6. However, a well-performing implementation of sheet-defined [7] B. A. Nardi A small matter of programming. functions is just the beginning: one should investigate ad- Perspectives on end user programming. MIT Press, ditional infrastructure and practices to support their use. 1993. For instance, how can we extend the natural collaboration [8] B. A. Nardi and J. R. Miller Twinkling lights and around spreadsheet development [8] in a community of users nested loops: distributed problem solving and to cover also libraries of sheet-defined functions; how can we spreadsheet development. International Journal of support versioning and merging of such libraries in a way Man-Machine Studies, 34:161–184, 1991. that does not preclude individual users’ tinkering and ex- [9] S. Peyton Jones, A. Blackwell, and M. Burnett. A perimentation; how can we support systematic testing; and user-centred approach to functions in Excel. In ICFP so on. ’03: Proceedings of the Eighth ACM SIGPLAN International Conference on Functional Programming, Our concept of sheet-defined functions should be subjected pages 165–176. ACM, 2003. to a systematic usability study; the study conducted by [10] J. D. Rask and S. E. Timmermann. Integration of Peyton-Jones, Blackwell and Burnett [9] assumed that func- sheet-defined functions in Excel using C#. Master’s tions could not be recursive, whereas ours can. thesis, IT University of Copenhagen, 2014. (Expected June 2014). Finally, sheet-defined functions lend themselves well to par- [11] V. Sarkar and J. Hennessy. Compile-time partitioning allelization, because they are pure (yet strict, an unusual and scheduling of parallel programs. In ACM combination) so that computations can be reordered and SIGPLAN ’86 Symposium on Compiler Construction, performed speculatively, and often exhibit considerable ex- pages 17–26, June 1986. plicit parallelism. In fact, they resemble dataflow languages [12] P. Sestoft. Online partial evaluation of sheet-defined such as Sisal [6]. Presumably some of the 1980es techniques functions. In A. Banerjee, O. Danvy, K. Doh, and used to schedule dataflow languages [11] could be used to J. Hatcliff, editors, Semantics, Abstract Interpretation, perform spreadsheet computations efficiently on modern mul- and Reasoning about Programs, volume 129 of ticore machines. The result might be “supercomputing for Electronic Proceedings in Theoretical Computer the masses”, realizing Chandy’s 1984 vision [2]. Science, pages 136–160, 2013. [13] P. Sestoft. Spreadsheet Implementation Technology. 6. CONCLUSION Basics and Extensions. MIT Press, 2014. ISBN We have presented a prototype implementation of sheet- 978-0-262-52664-7. (Expected August 2014). 313 defined functions and outlined some future work. Our hope pages. is that such functionality will become available in widely [14] P. Sestoft. Spreadsheet patents. Technical Report used spreadsheet programs, or via a full-featured version of ITU-TR-2014-178, IT University of Copenhagen, 2014. the plugin described in Section 4, and will enable spread- ISBN 978-87-7949-317-9. (To appear). sheet users to develop their own computational models into [15] P. Sestoft and J. Z. Sørensen. Sheet-defined functions: reusable function libraries, without loss of computational implementation and initial evaluation. In Y. Dittrich efficiency and without handing over control to remote IT et al., editors, International Symposium on End-User departments or software contractors. Moreover, there seems Development, June 2013, volume 7897 of Lecture to be a technological opportunity to harness the power of Notes in Computer Science, pages 88–103, 2013. multicore machines through spreadsheet programming. 7. REFERENCES [1] L. Augustsson, H. Mansell, and G. Sittampalam. Paradise: A two-stage DSL embedded in Haskell. In International Conference on Functional Programming (ICFP’08), pages 225–228. ACM, September 2008. [2] M. Chandy. Concurrent programming for the masses. (PODC 1984 invited address). In Principles of Distributed Computing 1985, pages 1–12. ACM, 1985. [3] Ecma TC39 TG3. Common Language Infrastructure (CLI). Standard ECMA-335. Ecma International, sixth edition, June 2012. [4] Excel DNA Project. Homepage. At http://exceldna.codeplex.com/ on 28 February 2014. [5] N. D. Jones, C. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall, 1993. At http://www.itu.dk/people/sestoft/pebook/pebook.html on 9 June 2013. [6] J. McGraw et al. Sisal. Streams and iteration in a single assignment language. Language reference manual, version 1.2. Technical report, Lawrence Livermore National Labs, March 1985.