The road to highlights is paved with good intentions: envisioning a paradigm shift in OLAP modeling

Panos Vassiliadis, Univ. Ioannina, Ioannina, Greece, pvassil@cs.uoi.gr
Patrick Marcel, University of Tours, Tours, France, patrick.marcel@univ-tours.fr

ABSTRACT
In this vision paper we structure a vision for the Business Intelligence of the near future in terms of a model with novel concepts and operators. We envision systems where the end-user requests information at a very high level, expressed as an intention to discover information, and the system transforms this request into the concrete execution of algorithms in order to compute, visualize and comment on data and important highlights among them, as an answer to the information request made by the end-user.

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1 INTRODUCTION
What will BI look like 10 years from now? What foundations should academia build in order to rigorously support the building of tools, the optimization of OLAP sessions, and the training of new scientists around a logical paradigm? In this vision paper, we make a first attempt to revisit the foundations of OLAP and BI in order to address these questions.

To start with, it is worth briefly revisiting the evolution of OLAP and BI modeling so far.
• At the beginning of time, people worked with relational queries and the recordsets returned by these queries. This treatment of BI was very DBMS-oriented, as the focus of attention was on what the DBMS can do for the users [8].
• Then, both the scientific community and the industry understood that it is possible to provide a layer of abstraction on top of the database. Users would deal (on-line) with cubes, rather than with traditional database data, which gives a very elegant simplification of the data to the user. The operations would be cube-oriented, one level of abstraction above the database operations, and would involve roll-ups, drill-downs, etc. (see for example [15]).
• Rapidly, these abstractions came even closer to the way the multidimensional data space is navigated. Research proposed advanced operators permitting discovery-driven analysis that were basically combinations of OLAP primitives [16–18].

What is the vision for the new generation of BI tools? Broadly speaking, we envision the redefinition of what an OLAP query is, both with respect to what users ask the system and with respect to what the answer entails.
• Concerning the aspect of what users can ask, we wish to further simplify the operations and choices that the users face and add an extra layer of abstraction. Specifically, we propose to replace cube operators with intentional operators over the data – in other words, with an "algebra" of operators close to the user's intentions. Instead of users operating in terms of roll-up and drill-down over the cube, we wish to empower them to state their intentions with respect to an operation. For example, instead of saying "drill down this cell one level", the user might ask "explain the drop in the sales of this product", or "is the main drop in sales observed at the best selling city also observed in similar cities?"; as another example, instead of issuing a roll-up operator, the user can state "verify whether a trend that I observe in a certain context still holds in general".
• Concerning the aspect of what the answer to a query is, we believe we can no longer remain with "sets of tuples" as the answers to queries. We can envision BI tools where the answer to a query includes (i) a set of tuples accompanied with appropriate visualizations, (ii) the automatic mining of models and patterns, (iii) the extraction of important "jewels" hidden in the result (which we call highlights), and naturally, (iv) advanced, intuitive reporting.

Contributions and Benefits. The main contribution of this paper is that it structures a vision for the Business Intelligence of the near future in terms of a model with novel concepts and operators. We aim for definitions that are broad enough, yet as precise as possible; at the same time, we want to link them as much as possible to the intentional nature of the next generation of BI tools, where the end-user requests information at a very high level and the system transforms these requests into the concrete execution of algorithms in order to compute, visualize and comment on data and important highlights among them, as an answer to the information request made by the end-user.

Why bother? We are convinced that after 50 years of naive query answering, it is now time to replace it with effortless, automatic insight gaining by the user. Instead of making the end user dig into sets of records, we can increase productivity and the understanding of the essence of the data by using two pillars, one devoted to querying and another devoted to answering: firstly, by allowing the user to focus on high-level goals of information acquisition, rather than details of what data to bring in, and secondly, by providing focus points in the answers that will move the user effort from manual "jewel mining" to addressing the insights gained.

Several other practical benefits can be envisioned. Our motivation was partially triggered by the problem of generating meaningful, artificial session logs for experimental reasons (e.g., like [14]). Moreover, such efforts also make sense in other domains, e.g., to search Web data sources [3]. Finally, note that this algebra can be seen as a first step towards addressing some of the open challenges proposed by the research community, like e.g., in [9], namely the lack of declarative exploration languages to present and reason about popular navigational idioms, or the various challenges raised by [5] around the benchmarking of interactive data exploration by measuring the user's gain in terms of insights.

2 THE VISION
An OLAP session is a sequence of dashboards that the analyst sees, each with its own information, including data, charts and informative summaries of KPI performance. The sequence is produced by the actions of the analyst, who changes the contents of the dashboard by requesting more information on the basis of a set of operations made available by the tool.

2.1 Preliminaries on multidimensional modeling
For lack of space, we do not go into the details of data modeling. We closely follow the model of [7] and constrain ourselves to a textual summary. As typically happens with multidimensional models, we assume that dimensions provide a context for facts [11]. This is especially important if combined with the fact that dimension values come in hierarchies; therefore, every single fact can be simultaneously placed in multiple hierarchically structured contexts, providing thus the ability to analyze sets of facts from multiple perspectives. The underlying data sets include measures that are characterized with respect to these dimensions. Cube queries involve measure aggregations at specific levels of granularity per dimension, along with filtering of data for specific values of interest.

We model an OLAP session as a directed graph G(V, E). The vertices of the graph represent states of the user's session – practically the dashboard screens the user receives from the OLAP tool – and the arcs represent transitions among states.

2.2 Intentional Queries: The Transitions of a Session
Before describing in detail our vision on dashboards and their internals, we start with the deeper essence of the intentional nature of our proposal: user operations, or, equivalently, transitions between the states of an OLAP session. The main idea behind transitions is that we move from a concrete model of logical operators like roll-ups and drill-downs to an intentional model where the user expresses operators for high-level requirements like "explain a certain phenomenon" or "predict the future values", and this high-level requirement is automatically translated to the specific OLAP and Data Mining operators/algorithms that will produce the answer. This can also greatly facilitate the extraction of highlights, as the user's goal is explicitly stated to the system.

We organize the transition operators that can express the users' requirements for extra information in families, i.e., a taxonomy of types for our transitions, which we call transition types. In contrast to other models where a transition would practically be a query (relational or multidimensional), in our model a transition characterizes the intention of the user with respect to her information need. Each family is a collection of transition operators with the same high-level intention, but of course, different semantics in the details. This set of families intends to cover (a) traditional OLAP operators, (b) various contributions of the literature, in particular discovery-driven analysis operators [16–18], Cinecubes acts [7], and operators for exploratory search in the web [3], as well as (c) novel operators that can be developed by the research community in the future. Before detailing these families, we mention Figure 1, which intuitively describes an analysis phrased with some transitions, using the Star Schema Benchmark data cube [13]. The annotation and highlight generation of the data is explained in Section 2.3.

Figure 1: A BI session phrased with some of our transitions. Numbers show the sequence of dashboard generation. Colored cells are results of data analytics operations annotating the dashboard data.

Compare. A transition of type Compare is used to retrieve more data to be compared with what the current dashboard displays. Intuitively, transitions in this family ask for bringing more data into the ongoing analysis. The family includes the OLAP operators pivot and drill-across, the augment operator of [3], and the put-in-context act of Cinecubes [7]. For instance, in the example of Figure 1, put-in-context [7] is used twice (top left vertical path in Figure 1): first, to compare the sales for a category of products (MFGR#5) in a given supplier country (Argentina) to that of past years (indicated as ❷), and second, to compare these past results to the results for sibling countries in the dimension Supplier (indicated as ❺).

FocusOn. A transition of type FocusOn is used to exclude data from the current analysis. Transitions in this family focus on a particular cell, or group of cells. The family includes the OLAP slice/dice operation, the take operator of [3], and also personalization operators like skyline, winnow, etc. [10]. For instance, in Figure 1, a slice/dice operation is done to focus on sales results for part MFGR#51 in years 2015 and 2016 (indicated as ❼).

Abstract. The transitions of this family are used to reduce the information load of the dashboard, by abstracting its contents in a more compact form. The family includes the traditional roll-up operator, as well as data mining operations like clustering, frequent itemset mining, etc. For instance, in Figure 1, sales results for years 2011 (not all shown in previous dashboards) to 2016 in Argentina are abstracted into 2 classes based on a result threshold (indicated as ❹).

Analyze and Explain. A transition of type Analyze and Explain is used to understand the cause of a phenomenon displayed in the current dashboard and to explore what is displayed at finer levels of detail. With respect to the analysis part, the family includes the drill-down OLAP operation, and the details act of Cinecubes [7]. With respect to the explain part, the family includes Diff [16] (interesting drill-downs at various granularities explaining the content of two cells) and the operators of Cariou et al. [4]. For instance, at the top of Figure 1, a drill-down from countries to cities is used to analyze sales results by cities (indicated as ❶), and, at the bottom left of the figure, a Diff is used to explain an important drop of sales in Argentina in year 2016 compared to year 2015, for the specific category of products MFGR#5 (indicated as ❻).

Verify. A transition of type Verify is used to check that a potential trend or pattern currently observed can be generalized to broader contexts, e.g., we can check whether it occurs at coarser levels of detail. The family includes the Relax operator [18] (interesting roll-ups at various granularities agreeing with or contradicting a trend observed in the content of two cells), the verification of outlier data points in a broader context than the current one, etc. For instance, in Figure 1, a Relax operation is used to verify whether the drop of sales results for category MFGR#5 in Argentina, in 2016, also holds for all products sold (indicated as ❸).

Predict. A transition of type Predict is used for predictive analysis of the displayed data. Typically this involves the entire set of timeseries forecasting methods, as well as classification methods used for assessing future events [1]. For instance, in Figure 1, sales results of product MFGR#51 for years 2015 and 2016 are used to issue predictions for year 2017 in Argentina cities (indicated as ❽).

Suggest.
A transition of type Suggest is used to ask for guidance with respect to where to navigate next in the multidimensional data space. The family includes Sarawagi's Inform [17] (interesting drill-downs at various granularities, deviating from what was already observed), or query recommendation techniques [12]. We note that this kind of transition should incorporate risk control mechanisms to prevent false discoveries [2]. For instance, in Figure 1, a Suggest operation is used to automatically find neighboring sales results deviating from the results of the previous operation (indicated as ❾).

Concluding with transitions, we emphasize that an open research challenge is to define an algebraically closed language of intentional operators, each of which can be translated automatically to the execution of concrete query and data mining operators. The synthesis of these operators can then create entire data stories. Assume the user, being at state ❷, would like to: "verify whether the distribution of sales for MFGR#5 in Argentina from 2011 to 2016 still holds in general for all parts, and build a 2-classes model for it, then backtrack to parts and compare with sibling countries, and finally explain the highest country-wise difference." This request would correspond to the following statement in the intentional language:

explain_[highlight: MaxDifference](compare_[siblingCountries](backtrack_[showTrend](abstract_[2Classes](verify_[allParts, showTrend](❷)))))

An optimizer would then automatically translate this request into the sequence of operations roll-up, cluster, backtrack, Cinecubes' put-in-context, Diff, and optimize it to produce a data story using dashboards ❷ to ❻.

2.3 Dashboards: The states of a session
A state is a dashboard that the user sees. In principle, each dashboard is ultimately based on the generating data provided by a finite collection of queries posed to the underlying database. A sharp distinction of our approach compared to previous models is that we do not restrict ourselves to the data of each state but accompany them with a set of interesting findings, which come in two flavors, specifically, (a) models, i.e., results of data mining or machine learning algorithms applied over the data of a dashboard's state, and (b) important findings that accompany the dashboard, to which we refer as the highlights of the state.

Take, for example, the small scenario of Fig. 2. The dashboard is based on a simple cube, as its generating data, involving Product, Region and Year as dimension levels and Sales and Cost as (aggregate) measures. Then, the dashboard automatically computes the Benefit as the application of a simple arithmetic function, specifically, the difference of the two measures. A second step involves the building of a model, and specifically the classification of sales with respect to the stability of the Benefit measure. Finally, highlights are extracted on the basis of this model; specifically, these highlights are the cube cells pertaining to the class stable benefits. Observe how the final data are extended with the class attribute linking them to their specific model counterpart.

Figure 2: Model extraction, data annotation and highlight production: an example

Steps. In order to construct and visualize a dashboard, we envision several computations taking place. Here is the sequence of the performed actions:
(1) First, the queries of the state's dashboard are issued and their results, the generating data of the dashboard, are computed. Any straightforward computations for extra, derived columns of the dashboard (e.g., gain = price − cost) are performed too.
(2) Then, the available data are fed to model extraction algorithms for the computation of models that abstract, summarize and provide patterns and insights for the data.
(3) The potentially large amount of data and models computed has to be ranked and assessed for interestingness to the analyst; the most important findings are classified as the dashboard's highlights, to be used for providing the main insights and the main directions for future transitions by the analyst.
(4) The above are accompanied by visualization, text construction and reporting tasks that aid the process of understanding and communicating the main findings.

The generating data of the dashboard. The results of the underlying queries are the basis for the subsequent computations that take place for the construction of the dashboard: this is why we refer to them as the generating data of the dashboard.

Models. In contrast to the state of the art in OLAP modeling, in our approach we believe that the static results of aggregate queries, and their visualizations with charts and speedometers, will simply not be enough in the near future. The automatic assessment and critical characterization of the presented data will be part of the BI of the near future. See some simple cases based on the example of observing the sales data of an international company:
• Sales data will be automatically characterized with respect to a decision tree that classifies them (e.g., as "successful", "risky", "potentially hazardous", etc.)
• Sales per country will be automatically clustered to reveal similarities and differences, as a first step towards understanding outliers and non-expected behavior
• Aggregate sales over significant periods will be fed into time series analysis and forecasting methods to automatically detect trends and seasonalities and to deduce future values

We consider the plugging of data analysis algorithms into the backstage of a dashboard as an indispensable part of BI. These algorithms can range from very simple ones (e.g., finding the top values of a cuboid, or detecting whether a dimension value is systematically related to top or bottom sales) to very complicated ones (like, for example, outlier detection, dimensionality reduction, etc.). Most importantly, as the operation of the algorithms will likely be as transparent as possible to the end user, their execution will require an almost automatic tuning of their parameters. The findings of these algorithms will be models of the data that are typically (not always) used to annotate the existing data with characterizations and offer focus points to the visualization of the dashboard (forecasts, outliers, dimension values that dominate top or bottom measures, ...). The models themselves give a multitude of results. However, some of these results indicate that a part of a dashboard's data is of important interestingness value to the end user. Due to that, we collectively refer to the important results of the execution of these algorithms as highlights, in an attempt to show that the aim is to enrich the current data-intensive dashboards with knowledge that is worth exploring or using for decision making.

We are going to treat model extraction algorithms as "black-box" algorithms, without probing into their internals and, most importantly, without assuming any particular properties of their output. What does a model extraction algorithm do? Basically, the algorithm receives as input (a) a set of input data, and (b) a set of execution parameters that have to be fixed for the algorithm's execution. Without loss of generality, we can assume that a subset of these parameters will be bound to string or numerical values and the rest will be mapped to attributes of the input data. The output of a model extraction algorithm is a model of the input data. Depending on the algorithm, the result differs. For instance, a descriptive model built using unsupervised clustering is basically just a labeling of each cube's cells, while a predictive one allows enriching the cube with predictions and comes with an accuracy score. In summary, the main properties of a model extraction algorithm are outlined as follows:
(1) Input: a set of input data, which is the result of an extended cube query set of the dashboard, along with a fixing of the algorithm's input parameters
(2) Output: a (possibly complex) result composed of (a) a model of the input data, and (b) several characterizations of it (precision, strength, p-value, ...)

Each model type can be arbitrarily complex and consists of a set of model components of versatile nature. Examples:
• A time series splits each of its points into 3 measurements, specifically error, trend and seasonality (practically creating 3 time series in the place of one, whose sum reconstructs the original one).
• A clustering scheme includes a set of clusters, each with a set of tuples that constitute it, along with a centroid.
• A classification decision tree includes a tree structure, best expressed as the composition of a set of paths leading to characterization classes; again, each class comprises a set of generating tuples that pertain to it.

Typically, each such component, as well as the model in its entirety, is accompanied by a set of metrics of its statistical power. Given a model type T, we denote its components with TC = {tmc_1, ..., tmc_m}. We avoid resolving the internals of the composition of the tmc parts and treat TC as a set of components. This is realistic, as we can always mask a part-of relation as a set of constituting components, even a recursive one, by allowing an extra composition relation to provide the semantics of the part-of relationship. Apart from this simple modeling, attempts to relationally encode mining results already exist [6]. In other words: assuming a model M of type T (e.g., a particular cluster set M of type ClusterSet), we can use its components (e.g., the clusters of M, M = {cluster_1, ..., cluster_m}) as fundamental parts of subsequent analyses, and, at the same time, link each of these components to their underlying data.

But then, how do we handle the heterogeneity of different model types? Clearly, a cluster is inherently different from a decision tree or the formula for a trend. Is there a unifying model to cover them all? The unifying essence of the plethora of diverse model types is that, at the end of the day, all of them are annotations of the original data. Every component of a complex model type (be it a cluster id, a path in a decision tree and a resulting class, a characterization of the top-k tuples, or a trend formula): (a) refers to a subset of the input data and vice versa, and (b) refers to the overall model via a part-of relationship. So, once a model of the underlying data is available, our solution to the problem is to provide a distinct identity (an id) to the components of a complex type T, retain the membership/part-of relationships of T separately, and annotate or characterize the data with respect to the part of T that pertains to them. In fact, this step can be blended within the model extraction itself. Examples of such annotation follow:
• Assuming a time series model that splits a time series into trend, seasonality and noise, these attributes can be appended to the generating data set.
• Assuming a cluster model, the generating data can be annotated with the id of the cluster to which they belong.
• Assuming a classification model, the input data can be labeled via an extra attribute with respect to the class(es) of the model to which they belong.
• Assuming a model of the top-k values of a measure, the input data can be annotated with their rank, if they belong in the top-k set and have been ranked.

A notable property of our modeling is that we require model components to be directly mapped and linked to their generating data in a bidirectional mapping, so that the end-user can navigate back and forth between cube cells and their models.

Highlight Production. As already mentioned, the set of highlights of the dashboard is a set of important findings that accompany the dashboard. These can be findings of any nature, e.g., important outliers in the contents of the dashboard's data, all the tuples belonging to a certain class of a classification scheme, the top or bottom values of a measure, etc. It would be straightforward (and doable with our modeling) to treat all the contents of an extended query as highlights. However, we would like to stress that we want to restrict our attention to the ones that are really important for the end-user.

Other "local" operations. Once all computations are done, there are several tasks to be taken care of before automatically constructing the dashboard. We envision that the development of principled methods for (a) the visualization of the dashboard data in various ways and (b) the creation of reports, via data storytelling methods, will present major challenges for the future.

Visualize. Operations in this family compute the contents of visual representations (bar charts, speedometers, scatterplots, ...) on the basis of the current contents of the dashboard. We consider them to be local operations, as they are performed without any further interaction with data external to the ones already retrieved for the dashboard.

Data storytelling. Data storytelling is a novel trend that seems to carry significant weight in the way future BI will look. The main idea involves the automatic generation of a textual report from the data of a particular dashboard or of an entire OLAP session. There are already some academic efforts, like Cinecubes [7], as well as some tools (see [19] for a nice discussion).

3 PATHS FOR FUTURE RESEARCH
This paper is a vision paper describing, in broad terms, a potential future for OLAP, to strengthen its place as the cornerstone of BI. Our call to arms to the research community can be structured along several lines.

Population of the families of the transition operators with concrete operators. Each new operator (or each new formalization of existing operators) should hopefully carry (a) clear semantics, (b) execution algorithms as well as their optimizations, (c) fine-tuning algorithms for the parameter fixing of the introduced algorithms and, equally importantly, (d) a graceful linkage to the overall model proposed here (in an attempt to be able to gracefully plug it into the respective BI tools).

Automation of transitions and tuning. We believe this to be the most important piece of the puzzle. How do we fully automate the choice of which model and highlight extraction methods should be employed? This includes the possibility of predicting interesting results for the end-user, which in turn requires appropriate models of interestingness. Along with the necessary run-time optimizations, all these tasks provide challenging and open problems of significant practical importance. Key Performance Indicators (at least in the way they are used today) are examples of simple model extraction, and by definition, KPIs are highlights (or else they wouldn't be "Key" indicators). Linking them explicitly to a model for OLAP is a necessary add-on for a comprehensive view of what OLAP does.

Benchmarking and tools. A free, open-source reference tool and a reference benchmark for the future BI (involving data, model and highlight extraction requests and sessions) can be really handy for the research community (otherwise, each new paper will need to improvise on its experimental assessment). A tool will also trigger other research directions, like e.g., the incorporation of research results on natural language processing to accept the user requests, visualizations to depict the models and highlights, etc.

REFERENCES
[1] Charu C. Aggarwal. 2015. Data Mining - The Textbook. Springer.
[2] Carsten Binnig, Lorenzo De Stefani, Tim Kraska, Eli Upfal, Emanuel Zgraggen, and Zheguang Zhao. 2017. Toward Sustainable Insights, or Why Polygamy is Bad for You. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research.
[3] Alessandro Bozzon, Marco Brambilla, Stefano Ceri, and Davide Mazza. 2013. Exploratory search framework for Web data sources. VLDB J. 22, 5 (2013), 641–663.
[4] Véronique Cariou, Jérôme Cubillé, Christian Derquenne, Sabine Goutier, Françoise Guisnel, and Henri Klajnmic. 2009. Embedded indicators to facilitate the exploration of a data cube. IJBIDM 4, 3/4 (2009), 329–349.
[5] Philipp Eichmann, Emanuel Zgraggen, Zheguang Zhao, Carsten Binnig, and Tim Kraska. 2016. Towards a Benchmark for Interactive Data Exploration. IEEE Data Eng. Bull. 39, 4 (2016), 50–61.
[6] Arnaud Giacometti, Patrick Marcel, and Arnaud Soulet. 2011. A Relational View of Pattern Discovery. In Database Systems for Advanced Applications - 16th International Conference, DASFAA. 153–167.
[7] Dimitrios Gkesoulis, Panos Vassiliadis, and Petros Manousis. 2015. CineCubes: Aiding data workers gain insights from OLAP queries. Inf. Syst. 53 (2015), 60–86.
[8] Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. 1996. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. In Proceedings of the Twelfth International Conference on Data Engineering. 152–159.
[9] Stratos Idreos, Olga Papaemmanouil, and Surajit Chaudhuri. 2015. Overview of Data Exploration Techniques. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 277–281.
[10] Ihab F. Ilyas, George Beskales, and Mohamed A. Soliman. 2008. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40, 4 (2008), 11:1–11:58.
[11] Christian S. Jensen, Torben Bach Pedersen, and Christian Thomsen. 2010. Multidimensional Databases and Data Warehousing. Morgan & Claypool Publishers.
[12] Patrick Marcel and Elsa Negre. 2011. A survey of query recommendation techniques for data warehouse exploration. In Actes des 7èmes journées francophones sur les Entrepôts de Données et l'Analyse en ligne, EDA 2011, Clermont-Ferrand, France, June 2011. 119–134.
[13] Patrick E. O'Neil, Elizabeth J. O'Neil, Xuedong Chen, and Stephen Revilak. 2009. The Star Schema Benchmark and Augmented Fact Table Indexing. In TPCTC 2009. 237–252.
[14] Stefano Rizzi and Enrico Gallinucci. 2014. CubeLoad: A Parametric Generator of Realistic OLAP Workloads. In Advanced Information Systems Engineering - 26th International Conference, CAiSE 2014, Thessaloniki, Greece, June 16-20, 2014. Proceedings. 610–624.
[15] Oscar Romero and Alberto Abelló. 2007. On the Need of a Reference Algebra for OLAP. In Data Warehousing and Knowledge Discovery, 9th International Conference, DaWaK 2007, Regensburg, Germany, September 3-7, 2007. Proceedings. 99–110.
[16] Sunita Sarawagi. 1999. Explaining Differences in Multidimensional Aggregates. In Proceedings of 25th International Conference on Very Large Data Bases (VLDB'99). 42–53.
[17] Sunita Sarawagi. 2000. User-Adaptive Exploration of Multidimensional Data. In Proceedings of 26th International Conference on Very Large Data Bases (VLDB 2000). 307–316.
[18] Gayatri Sathe and Sunita Sarawagi. 2001. Intelligent Rollups in Multidimensional OLAP Data. In Proceedings of 27th International Conference on Very Large Data Bases (VLDB 2001). 531–540.
[19] Alex Wright. 2015. Algorithmic authors. Commun. ACM 58, 11 (2015), 12–14.
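APPENDIX: AN ILLUSTRATIVE SKETCH

To make the intentional-operator idea of Section 2.2 concrete for tool builders, the sketch below models transitions as composable objects that an optimizer flattens into a plan of concrete OLAP and data mining operations, mirroring the running example (verify → roll-up, abstract → cluster, compare → put-in-context, explain → Diff). This is a minimal sketch under our own assumptions; all names (Transition, TRANSLATION, plan) are hypothetical and not part of any existing tool.

```python
# Minimal sketch: intentional operators as composable objects that a
# (hypothetical) optimizer translates into a plan of concrete operations.
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical one-to-many mapping from intentions to concrete operations,
# mirroring the example translation of Section 2.2.
TRANSLATION = {
    "verify":    ["roll-up"],
    "abstract":  ["cluster"],
    "backtrack": ["backtrack"],
    "compare":   ["put-in-context"],
    "explain":   ["Diff"],
}

@dataclass
class Transition:
    intent: str                            # e.g. "verify", "explain"
    params: dict                           # e.g. {"scope": "allParts"}
    inner: Optional["Transition"] = None   # the transition it wraps

def plan(t: Transition) -> List[str]:
    """Flatten a nested intentional expression, innermost intention first,
    into the sequence of concrete operations to be executed."""
    steps = plan(t.inner) if t.inner else []
    return steps + TRANSLATION[t.intent]

# The running example of Section 2.2:
# explain(compare(backtrack(abstract(verify(state ❷)))))
expr = Transition("explain", {"highlight": "MaxDifference"},
        Transition("compare", {"members": "siblingCountries"},
         Transition("backtrack", {"show": "trend"},
          Transition("abstract", {"classes": 2},
           Transition("verify", {"scope": "allParts"})))))

print(plan(expr))
# prints: ['roll-up', 'cluster', 'backtrack', 'put-in-context', 'Diff']
```

The nesting order deliberately matches the algebraic expression: the innermost intention is executed first, so the optimizer's output reproduces the sequence roll-up, cluster, backtrack, put-in-context, Diff of the example.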