Agile model-driven engineering of financial applications Kevin Lano, Howard Haughton, Sobhan Yassipour-Tehrani, Hessa Alfraihi Dept. of Informatics King’s College London, London, UK Email: { kevin.lano, sobhan.yassipour tehrani, hessa.alfraihi }@kcl.ac.uk Abstract—Flexibility and agility are key properties for financial applica- (i) they involve complex mathematical computations, which need to tions development: systems need to be easy to change and evolve to keep be specified in a clear and comprehensible manner and related to the up with changing market conditions and opportunities, and changes in appropriate finance theory, and (ii) they have high requirements for the regulatory environment. efficiency and accuracy. In addition, (iii) a high degree of flexibility In many cases, Excel with VB is the main platform used for the and agility is typically necessary. development of applications in the finance industry. This offers advantages of flexibility and user customisation of applications in rapid development Our solution to these challenges involved using agile MDD and cycles. However, this dependence upon a proprietary software package executable modelling as follows: causes problems when applications must be migrated to an alternative platform, or when precise control over functionality is needed. Model- • Specification of applications in an executable UML dialect, driven development (MDD) and model transformations have a good UML-RSDS [9], which supports efficient code generation in potential for improving these aspects of financial development. In this paper, we describe techniques that enable the use of Excel as an MDD Java, C#, C and C++. front-end for the specification and development of financial applications • Adoption of an agile process based on the Scrum method in a rigorous manner. [14]. I. I NTRODUCTION • Using Excel to provide a user-friendly means of defining UML-RSDS specifications, so that end-users do not need to Software development in the financial services sector is usually write or understand OCL. carried out on Excel and VBA-based platforms, with modelling and prototyping performed using Excel [8], [7]. After prototyping, the We considered that this approach was suitable to address issues Excel/VBA solution may be manually re-coded in an implementation (i), (ii) and (iii), because UML-RSDS supports high-level declarative platform (Figure 1). This approach has deficiencies with regard to specification of systems using UML class diagrams and use cases, maintainability, software architecture and documentation. Separate enabling the system specification to be expressed in the context spreadsheet-based applications are difficult to integrate into composite of relevant financial concepts, and to be structured in a systematic systems, and lack the structure and modularity of more advanced manner. In addition, it supports the execution of specifications by programming environments. the synthesis of designs and programming language implementations. UML-RSDS supports agility by enabling rapid modification of system specifications: only one application model (integrating class diagrams and use cases) needs to be modified, in contrast to MDD approaches which involve multiple linked models whose consistency needs to be managed at each specification change. The integration of agility and MDD in this case is achieved by making the MDD aspect as lightweight as possible, and by adopting a ‘model as code’ viewpoint, with agile practices applied to the application models instead of the code. In Section II we give an overview of the approach. In Sections IV and V we describe an example financial application we have developed. Section VI gives related work, and Section VII gives conclusions. II. I NTEGRATING AGILE AND MODEL - DRIVEN D EVELOPMENT The key principles of agile development include (agilemani- festo.org): (i) satisfy the customer through early and continuous software delivery; (ii) welcome changing requirements; (iii) deliver Figure 1. Conventional financial application development process working software frequently (every 2 weeks to every 2 months); (iv) business people and developers to work together daily; (v) rely on face-to-face communication to convey information; (vi) continuous Starting in 2014, the MDD team at King’s College were asked attention to software quality; (vii) simplicity is essential. to look at alternative model-based approaches for specifying and implementing financial software for multiple platforms. The charac- Integration of agile and MDD aims to achieve these principles teristics of such software represent a challenge for MDD, because: in the context of model-centered development, instead of traditional code-centered development. Research on integration has produced a Typically, a migration or refinement model transformation in- number of agile MDD approaches, such as MDD-SLAP [15] and volves the mapping of instances a : A of a source entity type A to Hybrid MDD [4]. These usually focus on a particular domain, and instances b : B of a target entity type B. Such mappings are formally do not cover all lifecycle stages. In contrast, UML-RSDS can be expressed by postconditions used as the basis of a general-purpose agile MDD approach, with A :: specialised interfaces and tools for particular domains. We adopt the ACond ⇒ B→exists(b | P(self , b)) ideas of three-phase iterations from [15] and parallel tooling teams from [4] (as shown in Figure 2). of a use case that defines the transformation. The postcondition asserts that (at termination of the transformation) for every instance self of A that satisfies ACond, there must exist an instance b of B that satisfies P(self , b). Many financial applications involve processing of a similar form, where some source data, such as market data, is used as input to derive secondary information and analysis via various functions and procedures. For example, the derivation of the internal rate of return or Macaulay duration for a coupon bond, or the derivation of a price for a derivative security, based on market information. In Excel, the application data is typically laid out as a table, with columns for each significant attribute of the data (Table Bond in Figure 3). Derived data may be placed in cells of additional columns of the table, or in the cells of separate tables (such as BondYield). The second situation corresponds directly to rules of migration/refinement model transformations, where A is a source class (eg., Bond), and B a target (eg., BondYield). Figure 2. UML-RSDS agile MDD process: iterations To achieve the agile principles, we adopt the following agile practices with UML-RSDS: (i) short iterations (principles (i) and (iii)); (ii) model refactoring (principle (vi)); (iii) emphasis on model and specification simplicity (principle (vii)); (iv) product and itera- tion backlogs (principles (i), (ii)); (v) Scrumboards (principles (i), (ii)); (vi) daily standup meetings, continuous integration and testing (principle (vi)); (vii) Customer involvement in modelling (principle (iv)). The following MDD practices are used: (i) metamodelling; (ii) transformations; (iii) executable modelling. Within each iteration, phases of requirements analysis, specifi- cation, implementation and testing are applied to each task, using exploratory prototyping based on partial specifications, and review of the prototypes/specifications with stakeholders/team members. Figure Figure 3. Excel spreadsheet example 2 shows these cycles for the main team on the LHS. Reuse of components, and the contribution of new/updated components to a library, can take place during these iterations. A tool development The correspondence between Excel and UML is as follows: team applies a similar agile process (on the RHS) to respond to tool • An Excel table defines a UML class with attributes for support requirements from the main team. each column of the table, with the attribute types obtained Prototyping and testing of completed specifications is followed from the data in an example row of the table. For example, by integration with other software elements, and iterative revision of the Bond table defines a UML class Bond with an integer specifications as necessary to pass tests and efficiency requirements. attribute Settlement, double attribute Price, etc. Multiple source and target classes can be defined as tables in the same spreadsheet. III. MDD SPECIFICATION USING E XCEL • The spreadsheet functionality corresponds to a model trans- Class diagrams and use cases are the core specification notations formation (defined as a use case, sheet) operating on the used in UML-RSDS to specify applications (including model trans- class diagram corresponding to the tables. formations). OCL constraints are used to define the logical postcon- ditions of operations and use cases, and to define the procedural steps • When data item T of table B is defined as a function f (A.S) (activities) of their designs [9]. of the data S of another table A, this corresponds to a postcondition constraint (+ve cash flows to the investor), and repayment of capital at the end of term (Figure 4). Eg.: a £100 bond is purchased for a price A :: of £105, with an investment term of 10 years, it pays 8% annual true ⇒ B→exists(b | b.BId = AId & b.T = f (S)) interest bi-annually (20 payments of £4), then £100 capital repayment in cases where f expects single data values, and to a (redemption) at the end of the term. In contrast, a zero-coupon bond constraint only pays back accumulated gains and capital at the end of its term. :: AS = f (A→collect(S)) ⇒ B→exists(b | b.T = AS) in cases where f expects aggregate data (eg., f is a function such as the SUM, AVERAGE, etc of a column/sequence of individual data items). In the first case, each instance (table row) of A is mapped to an instance (table row) of B. In the second, a single B instance is created with its attribute values derived from aggregates of A data. The dataflow dependencies of the spreadsheet must be acyclic [7], that is, if data item y depends on data x directly or indirectly, there cannot be a dependency in the reverse direction. In the UML representation, this means that the constraints can be ordered in a sequence where all data that is read by one constraint Ci is directly available or has been produced by earlier constraints Cj for j < i. This is the usual Figure 4. Finance domain bond classes in ExcelLib library organisation of use case constraints in UML-RSDS, using the Phased Construction transformation design pattern [10]. Using discrete compounding, the IRR is the rate r such that: An example is shown in Figure 5, which is derived from Figure 3. price = Σflows.size i=1 flows[i].amount/(1 + r)flows[i].timePoint + The use case sheet expresses the overall functionality of the spread- redemption/(1 + r)flows.last.timePoint sheet application. The derivation of UML from Excel spreadsheets in CSV format is performed automatically by the UML-RSDS tools. Time can be measured in days, months, years, etc.: r will Excel functions such as SUM, AVERAGE, DATE, RATE, DURA- be the rate wrt this measure. Generally, r can be estimated by TION, PRICE, etc have been formalised using OCL. These definitions numerical approximation techniques, eg., the secant method or are provided in a library class ExcelLib. For example: bisection. Excel uses the secant procedure to estimate the IRR, in the function RATE(npays : int, payment : double, query SUMPRODUCT(s1 : Sequence(double), price : double, fv : double) : double where npays s2 : Sequence(double)) : double is the number of coupon payments to be made, payment is the constant pre: s1.size = s2.size amount of each payment, price is the present value of the investment, post: and fv (optional) is the redemption amount at termination of the result = Integer.subrange(1,s1.size)->collect( i | investment. The result value is an approximation to the IRR r. In s1[i]*s2[i] )->sum() ExcelLib we provide a mathematical specification of RATE based on the secant procedure. From the user’s perspective, they can use In our style of Excel specification, logical expressions such RATE in the same way as in Excel. Behind the scenes, they are as Bond.Settlement are used to denote data items/groups, instead using the ExcelLib library specification written in OCL, and from this, of physical spreadsheet locations/ranges such as B7:B17. Logical procedural code in Java, C#, C or C++ can be automatically generated. specifications of Excel functions and other financial and numerical Note that in our version of OCL, computational numeric types are functions are provided in ExcelLib using OCL, facilitating analysis used: 32-bit and 64-bit integers, and IEEE 754 standard double- and formal verification. In the reverse direction, the tables of a precision floating-point numbers. This reduces the semantic gap spreadsheet can be synthesised from a class diagram: each concrete between the specification and implementation, improving verifiability. class maps to a table, with the attributes of the class mapped to table columns. The operation defineFlows1() of Figure 4 is specified as: In Sections IV and V we describe two iterations of an agile defineFlows1() MDD case study in finance, for a bond-pricing application. This pre: true was developed in collaboration with an industrial expert in financial post: software construction. Integer.subrange(1, (maturity*frequency).floor)->forAll( t | IV. C ASE STUDY ITERATION 1: I NTERNAL RATE OF RETURN $CashFlow->exists( f | f.amount = coupon & The bond-pricing application iteration 1 computes the internal f.timePoint = (t*1.0)/frequency & rate of return (IRR) of a bond. The IRR measures the quality of an f : flows ) ) investment: the effective rate at which the investment returns value over its term. A coupon bond consists of a series of cash flows: an ini- This creates all the positive cash flows from coupon payments over tial payment (-ve cash flow), followed by coupon payments/dividends the lifetime of the bond. (maturity in $Bond is the term of the bond). To derive the IRR of a series of bonds, the user specifies two tables in a spreadsheet, as in Figure 3. The Bond table identifies the attributes of the Bond class, and their types, whilst the BondYield table also defines the mapping of Bond to BondYield, using the RATE function. The function in cell D12 is RATE((Bond.Maturity - Bond.Settlement) * Bond.Frequency, Bond.Coupon, Bond.Price, 100) The UML-RSDS tools convert the spreadsheet to the class dia- gram of Figure 5, and derive the following postcondition constraint of the use case sheet: Application:: bondx : Bond => BondYield->exists( bondyieldx | bondyieldx.BondId = bondx.BondId & bondyieldx.Yield = RATE( Figure 5. Class diagram derived from Excel spreadsheet ( bondx.Maturity - bondx.Settlement ) * bondx.Frequency, bondx.Coupon,bondx.Price,100) & the equation: bondyieldx.Duration = DURATION( DATE(bondx.Settlement,1,1), y(t) = β1 + β2 ∗ (1 − exp(−t/ λ1 ))/(t/ λ1 ) + DATE(bondx.Maturity,1,1), bondx.Coupon, β3 ∗ ((1 − exp(−t/ λ2 ))/(t/ λ2 ) − exp(−t/ λ2 )) bondyieldx.Yield, bondx.Frequency) ) This models how the yield y(t) of a bond varies depending on its duration t. The meaning of this postcondition is that for each Bond instance, a The yield curve in this model has a long-term rate component BondYield instance will be produced with its Yield value set to the (β1 ), a short-term (2nd factor), and a ‘hump’ (3rd factor). The IRR computed using RATE. In addition, we also compute the discrete problem is to estimate the βi and λj , given market data – ‘fitting the Macaulay duration of the bond (the time to maturity of the equivalent curve’ to this data. Estimation procedures to find the NSS parameters zero-coupon bond): that fit a given set of market bond data include genetic algorithms (GA) and Matlab’s fminsearch using a simplex algorithm [5]. duration = We use a combination of genetic algorithms and the Nelder-Mead (Σflows.size i=1 flows[i].timePoint ∗ flows[i].amount/ simplex algorithm [12]. These algorithms are specified using OCL in (1 + yield)flows[i].timePoint )/ ExcelLib. (Σflows.size i=1 flows[i].amount/(1 + yield)flows[i].timePoint ) The adapted Nelson-Siegal-Svensson model is specified in UML- This computation can use the previously-computed yield value of RSDS by the nelsonsiegalx and nsx functions of the finance library each BondYield instance. In the spreadsheet, the value of the Duration class $Bond (Figure 4). These are then used to evaluate the fitness column (cell E12) is expressed as: of a candidate solution (a tuple of values for the parameters β1 , β2 , β3 , λ1 , λ2 ), based on the sum of squares of differences between the DURATION(DATE(Bond.Settlement,1,1), actual and predicted interest rates for the market data. DATE(Bond.Maturity,1,1), Bond.Coupon, BondYield@pre.Yield, NSS-estimation is performed by an ExcelLib function Bond.Frequency) ESTIMATENSX(yields: Sequence(double), durations: Sequence(double)) : Sequence(double). This is called on The @pre annotation indicates that the Yield derived by the preceding BondYield.Yield, BondYield.Duration (considered as sequence equation should be used. A continuous duration function DURA- data) to populate a result table NelsonSiegal. TIONC is also available. ESTIMATENSX initialises a population for a genetic algorithm using the bond interest rate (yield) and duration data, and iterates V. C ASE STUDY ITERATION 2: Y IELD CURVE FITTING the genetic algorithm until there is no further improvement in the maximum fitness in the population. The best result is then passed to In iteration 2 of the bond-pricing application, the yield and a numerical optimiser, in our case the simplex algorithm. duration derived in iteration 1 are used as input datapoints for a process which attempts to find the best-fitting yield curve for Having derived the NSS parameters, these can then be used interest rate data, according to some yield curve model such as to calculate the fair price of bonds of different durations, using a the Nelson-Siegal or Nelson-Siegal-Svensson models. The adapted formalisation of the Excel PRICE function. This functionality was Nelson-Siegal-Svensson (NSS) model [5] which we use is defined by developed in iteration 3. We compared our formalisation of this bond-pricing procedure These could then be used directly in the Excel-style specifications with a traditional Excel/VBA solution, in terms of software size, used by our approach. accuracy and efficiency A Java implementation of the UML-RSDS specification was used, applied to a test case of 8 coupon bonds VII. C ONCLUSIONS ranging from 1 year to 12 year terms. Code size was reduced from 34K to 22K, accuracy improved from 2 ∗ 10−4 to 6 ∗ 10−6 , however We have described techniques to combine agile MDD with execution time increased from 550ms to 730ms, mainly due to use conventional Excel-based financial application development. Excel of a genetic algorithm. spreadsheets are interpreted as UML class diagrams operated on by model transformation specifications, and executable implementations The curve produced from the estimated parameters using our of these transformations can then be synthesised in production-quality approach is shown in Figure 6. languages. In future work we will evaluate the usability of the approach with financial engineers, and extend the OCL finance library to provide comprehensive facilities for the use of MDD in the finance domain. R EFERENCES [1] Cincom, JP Morgan derives clear benefits from Cincom Smalltalk, www.cincom.com/pdf/CS040819-1.pdf, accessed October 2016. [2] J. Cunha, J. Saraiva, J. Visser, From spreadsheets to relational databases and back, PEPM 2009. [3] J. Cunha, J. Fernandes, J. Mendes, H. Pacheco, J. Saraiva, Bidirectional transformations of model-driven spreadsheets, ICMT 2012. [4] G. Guta, W. Schreiner, D. Draheim, A lightweight MDSD process applied in small projects, Proceedings 35th Euromicro conference on Software Engineering and Advanced Applications, IEEE, 2009. [5] M. Gilli, S. Grosse, E. Schumann, Calibrating the Nelson-Siegal- Svensson model, COMISEF working paper WPS-031, 2010. [6] O. Hammarlid, Aggregating sectors in the infectious defaults model, Figure 6. Estimated NSS curve Quantitative Finance, vol. 4, no. 1, 2004, pp. 64–69. [7] Y. Horry, Financial information description language and visualisa- tion/analysis tools, Computer Languages, Systems and Structures 50 VI. R ELATED WORK (2017), pp. 31–52. [8] A. Kumiega, B. Van Vliet, A software development methodology for The Kapital system used by JP Morgan has been a highly success- research and prototyping in financial markets, Proc. European Spread- ful alternative approach in finance [1], enabling financial engineers to sheet Risks Int. Grp., pp. 107–127, 2006. quickly prototype and implement new financial products. However, [9] K. Lano and S. Kolahdouz-Rahimi, Constraint-based specification of Kapital is based on a specialised platform (Smalltalk). We consider model transformations, Journal of Systems and Software, vol. 88, no. that basing a financial engineering platform on language-independent 2, February 2013, pp. 412–436. UML and OCL specifications will provide more flexible and durable [10] K. Lano, S. Kolahdouz-Rahimi, Model-transformation Design Patterns, IEEE Transactions in Software Engineering, vol 40, 2014. support than a language-specific solution. In [11] we describe the risk [11] K. Lano, H. Alfraihi, S. Yassipour-Tehrani, H. Haughton, Improving evaluation of multiple investment instruments [6] using UML-RSDS, the Application of Agile Model-based Development: Experiences from this direct MDD approach was technically effective but was consid- Case Studies, ICSEA 2015. ered too difficult for financial engineers to use. As an alternative, we [12] J. Nelder, R. Mead, A simplex method for function minimisation, investigated the Excel-based approach described here. An application Computer Journal, 7, pp. 308–313. of agile MDD in finance is given in [13], however this is focussed on [13] M. B. Nakicenovic, An Agile Driven Architecture Modernization to a the management of data and message formats instead of numerical Model-Driven Development Solution, International Journal on Advances computations. In [2], spreadsheets are mapped to relational databases in Software, vol 5, nos. 3, 4, 2012, pp. 308–322. in order to optimise the spreadsheets. In [3], spreadsheet evolution is [14] K. Schwaber, M. Beedble, Agile software development with Scrum, Pearson, 2012. supported via a mapping to a semantic representation. These papers [15] Y. Zhang, S. Patel, Agile model-driven development in practice, IEEE use transformations to support spreadsheet-based development, whilst Software, vol. 28, no. 2, pp. 84–91, 2011. our work uses spreadsheets to define model transformations. A similar data-flow logical specification of financial applications is described in [7], although all the data derivations are performed within a single table, whilst we separate data into possibly multiple source, target and intermediate tables. QuantLib (quantlib.org) is a widely-used library of financial analysis functions, written in C++ (interfaces and partially-ported versions also exist for other programming languages). In contrast to QuantLib, we define financial functions in a platform-independent formalism (OCL), from which language-specific implementations can be automatically generated. In future work, we aim to include substantial parts of the QuantLib libraries in our OCL financial library.