MDSheet – Model-Driven Spreadsheets Jácome Cunha João Paulo Fernandes Jorge Mendes Rui Pereira João Saraiva {jacome,jpaulo,jorgemendes,ruipereira,jas}@di.uminho.pt HASLab/INESC TEC & Universidade do Minho, Portugal CIICESI, ESTGF, Instituto Politécnico do Porto, Portugal RELEASE, Universidade da Beira Interior, Portugal ABSTRACT has been fully incorporated in a tool, MDSheet1 , whose fea- This paper showcases MDSheet, a framework aimed at im- tures include:2 proving the engineering of spreadsheets. This framework is 1) Model inference: we extract the abstract representation model-driven, and has been fully integrated under a spread- from legacy spreadsheets; sheet system. Also, its practical interest has been demon- 2) Embedded models: this abstract representation is manip- strated by several empirical studies. ulated and evolved in spreadsheets themselves; 3) User guidance: relying on this business model, we are able of guiding users in avoiding traditional spreadsheet mistakes; Categories and Subject Descriptors 4) Model/instance synchronization: we support the evolu- tion of model and instances, ensuring an automatic synchro- H.4 [Information Systems Applications]: Spreadsheets; nization of the unevolved artifact; D.2.0 [Software Engineering]: General; D.2.6 [Software 5) Model quality assessment: a set of metrics on the com- Engineering]: Programming Environments—Graphical en- plexity of a spreadsheet model can be computed; vironments, Integrated environments, Interactive environ- 6) Querying: spreadsheet data can be queried. ments 2. SPREADSHEET ENGINEERING General Terms MDSheet is a framework for the engineering of spread- Languages, Design, Human Factors sheets in a model-driven fashion. This framework is highly extensible: we have actually extended it with several new functionalities that we have developed in the last few years. Keywords Model-Driven Spreadsheets, MDSheet, Model Inference, Em- 2.1 Motivational Example bedding, Bidirectional Synchronization, Querying The realization of our approach to spreadsheet engineering builds upon the embedding of ClassSheets in a spreadsheet system. So, we start by introducing ClassSheets within MD- 1. INTRODUCTION Sheet with the example given in Figure 1: we present a model We can not run the modern world without spreadsheets. for a Budget spreadsheet (Figure 1a), which we adapted Spreadsheets are omnipresent, from individuals needing to from [13]3 , and an instance of such model (Figure 1b). cope with simple needs to large companies needing to im- This model holds three classes where data is to be in- plement complex forecasts or to produce advanced reports. serted by end users: i) Year, with a default value of 2010, The realization of such importance has made concrete im- for the budget to accommodate multi-year information, ii) pact in the scientific community as well. This is due to more Category, for assigning a label to each expense and iii), research teams devoting their efforts to improving spread- a(n implicit) relationship class where quantity and costs are sheets, and a growing number of scientific events dedicated registered and totals are calculated based on them. The ac- to them. tual spreadsheet may hold several repetitions of any of these A successful approach to address spreadsheets under a sci- elements, as indicated by the ellipsis. For each expense we entific perspective consists of incorporating well-established record its quantity and its cost (with 0 as the default value), software engineering techniques in the spreadsheet develop- and we calculate the total amount associated with it. Fi- ment process. nally, (simple) summation formulas are used to calculate the Our approach is essentially based on precisely one such global amount spent per year (cell D5), the amount spent technique: we adopt model-driven spreadsheet engineering. per expense type in all years (cell F3) and the total amount In the setting we propose a spreadsheet is abstracted through spent in all years (cell F5) are also calculated. a concise model, which is then used to improve effective- 1 MDSheet is available through the SSaaPP project website: ness and efficiency of spreadsheet users. The framework http://ssaapp.di.uminho.pt. we describe in this paper has been realized in a traditional 2 In the next section, we describe each such feature in a dif- spreadsheet development system, thus not forcing spread- ferent subsection. sheet users to move to a different paradigm. 3 We assume colors are visible in the digital version of this The spreadsheet development framework that we envision paper. (a) Model worksheet. (b) Data/instance worksheet. Figure 1: A bidirectional model-driven environment for a budget spreadsheet. Following is the description of the full set of features of- embedding of spreadsheets has also been implemented under fered by MDSheet. MDSheet [6], which was demonstrated in Section 2.1. More- over, we extended the ClassSheets language with database 2.2 Model Inference constraints, such as unique columns/rows or foreign keys, A model-driven approach to spreadsheet engineering of- which have also been incorporated in MDSheet [11]. In fact, fers an improved development experience: an abstract rep- we have further extended the available restrictions so that resentation of a spreadsheet, i.e., its model, helps us, among the user can specify the contents of a cell using regular ex- other things, in guiding users into preventing errors. This pressions or intervals [8]. Finally, we extended ClassSheets approach, however, requires the definition of a model in par- with references between different models making them more allel with the spreadsheet it abstracts. In order to handle flexible. Note that through this embedding we can guaran- legacy spreadsheets, i.e., the ones that have not been de- tee that spreadsheet data always conforms to a model. veloped from scratch together with their model, we have devised a model inference technique [2], that has been im- 2.4 User Guidance plemented in MDSheet. Concretely, we infer models in the The embedding of our extended version of the ClassSheet ClassSheets language, an object-oriented high-level formal- language allows us to guide the user in inserting correct data. ism to abstract spreadsheets [13]. When a model is designed, it serves as a guider in the cre- ation of a data worksheet, which is initially empty. Only 2.3 Embedded Models cells containing plain data can be edited as all other are The worksheet structure of spreadsheets is a decisive fac- inferred from the model. This prevents, e.g., users from tor in their modularity. In fact, we exploited precisely this making mistakes when defining formulas as they are locked. structure to make the model of a spreadsheet available within Moreover, the restrictions created in the model guarantee spreadsheet systems themselves: one worksheet holds the that the data in the cells respects them. In the model it is model of a spreadsheet, while another holds its data. This possible to define an interval of integers for a cell, or a regu- lar expression that the content must conform to. A column in an agile testing framework of a software house; ii), to or row can be marked as having only unique values or being adapt data produced by different database systems for a a foreign key to another column or row. All these restric- car multimedia production company; and iii), to provide tions are enforced by MDSheet. In the case of foreign keys, spreadsheet models for a food bank. the user can use a combo box to select existing values from the referred column/row. Acknowledgments 2.5 Model/Instance Synchronization This work is part funded by the ERDF - European Regional Development Fund through the COMPETE Programme (op- As any other software artifact, spreadsheets evolve over erational programme for competitiveness) and by National time. MDSheet accommodates changes by allowing the evo- Funds through the FCT - Fundação para a Ciência e a Tec- lution of models and instances, while automatically coevolv- nologia (Portuguese Foundation for Science and Technology) ing the unchanged artifact. For this, we introduced a for- within project FCOMP-01-0124-FEDER-020532. The first au- mal framework to allow evolutions of the model to be au- thor was funded by the FCT grant SFRH/BPD/73358/2010. tomatically spanned to the instances [6, 7, 12]. We have later proposed techniques and tools to the evolution of data by the user and corresponding automatic coevolution of the 5. REFERENCES model [3]. We therefore ensure that model/instance consis- [1] O. Belo, J. Cunha, J. P. Fernandes, J. Mendes, tency is never broken. R. Pereira, and J. Saraiva. Querysheet: A bidirectional query environment for model-driven spreadsheets. In 2.6 Model Quality Assessment VLHCC ’13, pages 199–200. IEEE CS, 2013. In a first attempt to measure the quality of a spreadsheet [2] J. Cunha, M. Erwig, and J. Saraiva. Automatically model, we introduced a set of metrics to calculate the com- inferring classsheet models from spreadsheets. In plexity of ClassSheet models [9]. These metrics are imple- VLHCC ’10, pages 93–100. IEEE CS, 2010. mented under MDSheet and can be calculated for any Class- [3] J. Cunha, J. P. Fernandes, J. Mendes, H. Pacheco, Sheet defined using it. They are then compared to the same and J. Saraiva. Bidirectional transformation of metrics computed for a repository of ClassSheet models so model-driven spreadsheets. In ICMT ’12, volume 7307 users can have a reference point for such values. The evolu- of LNCS, pages 105–120. Springer, 2012. tion mechanisms can then be used to evolve the spreadsheet [4] J. Cunha, J. P. Fernandes, J. Mendes, R. Pereira, and improving it according to the metrics calculated. J. Saraiva. Querying model-driven spreadsheets. In VLHCC ’13, pages 83–86. IEEE CS, 2013. 2.7 Querying [5] J. Cunha, J. P. Fernandes, J. Mendes, R. Pereira, and As many spreadsheets are used as data repositories, the J. Saraiva. Embedding model-driven spreadsheet need to query their data is frequent. MDSheet also inte- queries in spreadsheet systems. In VLHCC ’14, 2014. grates a query system, which allows the definition of model- to appear. oriented queries, in the style of traditional database queries. [6] J. Cunha, J. P. Fernandes, J. Mendes, and J. Saraiva. This allows the writing of queries without having to man- Embedding and evolution of spreadsheet models in ually observe a possibly large number of columns and rows spreadsheet systems. In VLHCC ’11, pages 186–201. of concrete data. Indeed, queries are written, by analyzing [7] J. Cunha, J. P. Fernandes, J. Mendes, and J. Saraiva. models, as abstractions that are simpler to understand. Our MDSheet: A framework for model-driven spreadsheet system was initially presented as a textual language [1, 4], engineering. In ICSE 2012, pages 1412–1415. ACM. very similar to SQL. Even being textual it already was of [8] J. Cunha, J. P. Fernandes, J. Mendes, and J. Saraiva. great help for users [14]. Still, we have further improved it Extension and implementation of classsheet models. In by embedding the language in a worksheet, thus creating a VLHCC ’12, pages 19–22. IEEE CS, 2012. visual language for spreadsheet querying [5]. [9] J. Cunha, J. P. Fernandes, J. Mendes, and J. Saraiva. Complexity Metrics for ClassSheet Models. In 3. EMPIRICAL VALIDATION ICCSA ’13, volume 7972, pages 459–474. LNCS, 2013. One of the purposes of our tool is to help users commit [10] J. Cunha, J. P. Fernandes, J. Mendes, and J. Saraiva. less errors; if possible, it also intends to help users work Embedding, evolution, and validation of spreadsheet faster with spreadsheets. To assess these two concerns we models in spreadsheet systems. 2014. submitted. have run an empirical study and we have found empirical [11] J. Cunha, J. P. Fernandes, and J. Saraiva. From evidence that indeed our model-driven spreadsheet environ- Relational ClassSheets to UML+OCL. In SAC ’12, ment can in fact help users become more efficient and more pages 1151–1158. ACM, 2012. effective [10]. [12] J. Cunha, J. Visser, T. Alves, and J. Saraiva. Type-safe evolution of spreadsheets. In 4. CONCLUSION D. Giannakopoulou and F. Orejas, editors, FASE ’11, We briefly presented MDSheet and all the features it of- volume 6603 of LNCS, pages 186–201. Springer, 2011. fers to its users. Given the fact that it has been built as a [13] G. Engels and M. Erwig. ClassSheets: automatic framework, new tools, even if not proposed by us, can easily generation of spreadsheet applications from be integrated in it. object-oriented specifications. In ASE ’05, pages We believe this tool is in a very mature state and can 124–133. ACM, 2005. be used in real case scenarios. We have thus started its [14] R. Pereira. Querying for model-driven spreadsheets. integration in industry: i), to support test case evolution Master’s thesis, University of Minho, 2013.