             The road to highlights is paved with good intentions:
               envisioning a paradigm shift in OLAP modeling
                             Panos Vassiliadis                                                           Patrick Marcel
                                Univ. Ioannina                                                         University of Tours
                               Ioannina, Greece                                                           Tours, France
                               pvassil@cs.uoi.gr                                                  patrick.marcel@univ-tours.fr

ABSTRACT
In this vision paper we structure a vision for the Business Intelligence of the near future in terms of a model with novel concepts and operators. We envision systems where the end-user requests information at a very high level, expressed as an intention to discover information, and the system transforms this request into the concrete execution of algorithms in order to compute, visualize and comment on the data and the important highlights among them, as an answer to the information request made by the end-user.

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1 INTRODUCTION
What will BI look like 10 years from now? What foundations should academia build in order to rigorously support the building of tools, the optimization of OLAP sessions, and the training of new scientists around a logical paradigm? In this vision paper, we make a first attempt to revisit the foundations of OLAP and BI in order to address these questions.
   To start with, it is worth briefly revisiting the evolution of OLAP and BI modeling so far.
   • At the beginning, people worked with relational queries and the recordsets returned by them. This treatment of BI was very DBMS-oriented, as the focus of attention was on what the DBMS can do for the users [8].
   • Then, both the scientific community and the industry understood that it is possible to provide a layer of abstraction on top of the database. Users would deal (on-line) with cubes, rather than with traditional database data, which offers a very elegant simplification of the data to the user. The operations would be cube-oriented, one level of abstraction above the database operations, and would involve roll-ups, drill-downs, etc. (see for example [15]).
   • Rapidly, these abstractions moved even closer to the way the multidimensional data space is navigated. Research proposed advanced operators permitting discovery-driven analysis, which were basically combinations of OLAP primitives [16–18].
   What is the vision for the new generation of BI tools? Broadly speaking, we envision the redefinition of what an OLAP query is, both with respect to what users ask the system and with respect to what the answer entails.
   • Concerning the aspect of what users can ask, we wish to further simplify the operations and choices that users face and add an extra layer of abstraction. Specifically, we propose to replace cube operators with intentional operators over the data – in other words, with an "algebra" of operators close to the user's intentions. Instead of users operating in terms of roll-up and drill-down on the cube, we wish to empower them to state their intentions with respect to an operation. For example, instead of saying "drill down this cell one level", the user might ask "explain the drop in the sales of this product", or "is the main drop in sales observed at the best-selling city also observed in similar cities?"; as another example, instead of issuing a roll-up operator, the user can state "verify whether a trend that I observe in a certain context still holds in general".
   • Concerning the aspect of what the answer to a query is, we believe we can no longer remain with "sets of tuples" as the answers to queries. We can envision BI tools where the answer to a query includes (i) a set of tuples accompanied by appropriate visualizations, (ii) the automatic mining of models and patterns, (iii) the extraction of important "jewels" hidden in the result (which we call highlights), and naturally, (iv) advanced, intuitive reporting.
   Contributions and Benefits. The main contribution of this paper is that it structures a vision for the Business Intelligence of the near future in terms of a model with novel concepts and operators. We aim for our definitions to be broad enough, yet as precise as possible; at the same time, we want to link them as much as possible to the intentional nature of the next generation of BI tools, where the end-user requests information at a very high level and the system transforms these requests into the concrete execution of algorithms in order to compute, visualize and comment on the data and the important highlights among them, as an answer to the information request made by the end-user.
   Why bother? We are convinced that after 50 years of naive query answering, it is now time to replace it with effortless, automatic insight gaining by the user. Instead of making the end user dig into sets of records, we can increase productivity and the understanding of the essence of the data by building on two pillars, one devoted to querying and the other devoted to answering: firstly, by allowing the user to focus on high-level goals of information acquisition, rather than the details of what data to bring in, and secondly, by providing focus points in the answers that will move the user effort from manual "jewel mining" to acting on the insights gained.
   Several other practical benefits can be envisioned. Our motivation was partially triggered by the problem of generating meaningful, artificial session logs for experimental purposes (e.g., as in [14]). Moreover, such efforts also make sense in other domains, e.g., to search Web data sources [3]. Finally, note that this algebra can be seen as a first step towards addressing some of the open challenges raised by the research community, e.g., in [9], namely the lack of declarative exploration languages to present and reason about popular navigational idioms, or the various challenges raised by [5] around the benchmarking of interactive data exploration by measuring the user's gain in terms of insights.
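The two directions above – intentional requests instead of operator-level queries, and enriched answers instead of plain record sets – can be made concrete with a small sketch (all names here are hypothetical illustrations; the paper defines no concrete API):

```python
from dataclasses import dataclass, field

# Hypothetical illustration: an operator-level request names a navigation
# primitive, while an intentional request names the analytical goal and
# leaves the choice of concrete operators to the system.

@dataclass
class OperatorQuery:          # today's style: "drill down this cell one level"
    operator: str             # e.g., "drill-down"
    target: str               # e.g., a cell reference
    level: str                # e.g., "city"

@dataclass
class IntentionalQuery:       # envisioned style: "explain the drop in sales"
    intention: str            # e.g., "explain", "verify", "predict"
    phenomenon: str           # what the user observed
    context: dict = field(default_factory=dict)

@dataclass
class ExtendedAnswer:         # the envisioned four-part answer
    tuples: list              # (i) the classical result set
    visualizations: list      # (i) accompanying charts
    models: list              # (ii) automatically mined models and patterns
    highlights: list          # (iii) the important "jewels" in the result
    report: str = ""          # (iv) advanced, intuitive reporting

old = OperatorQuery("drill-down", "sales[Argentina, 2016]", "city")
new = IntentionalQuery("explain", "drop in sales of MFGR#5",
                       {"country": "Argentina", "year": 2016})
```

The point of the sketch is only the shift in the unit of interaction: the intentional query carries no operator at all, and the answer type bundles data, models and highlights together.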
2 THE VISION
An OLAP session is a sequence of dashboards that the analyst sees, each with its own information, including data, charts and informative summaries of KPI performance. The sequence is produced by the actions of the analyst, who changes the contents of the dashboard by requesting more information on the basis of a set of operations made available by the tool.

2.1 Preliminaries on multidimensional modeling
For lack of space, we do not go into the details of data modeling. We closely follow the model of [7] and confine ourselves to a textual summary. As typically happens with multidimensional models, we assume that dimensions provide a context for facts [11]. This is especially important when combined with the fact that dimension values come in hierarchies; therefore, every single fact can be simultaneously placed in multiple hierarchically structured contexts, thus providing the ability to analyze sets of facts from multiple perspectives. The underlying data sets include measures that are characterized with respect to these dimensions. Cube queries involve measure aggregations at specific levels of granularity per dimension, along with the filtering of data for specific values of interest.
   We model an OLAP session as a directed graph G(V, E). The vertices of the graph represent the states of the user's session – practically, the dashboard screens the user receives from the OLAP tool – and the arcs represent transitions among states.

2.2 Intentional Queries: The Transitions of a Session
Before describing in detail our vision of dashboards and their internals, we start with the deeper essence of the intentional nature of our proposal: user operations or, equivalently, transitions between the states of an OLAP session. The main idea behind transitions is that we move from a concrete model of logical operators, like roll-ups and drill-downs, to an intentional model where the user expresses high-level requirements in terms of operators like "explain a certain phenomenon" or "predict the future values", and this high-level requirement is automatically translated to the specific OLAP and Data Mining operators/algorithms that will produce the answer. This can also greatly facilitate the extraction of highlights, as the user's goal is explicitly stated to the system.
   We organize the transition operators that express the users' requirements for extra information into families, i.e., a taxonomy of types for our transitions, which we call transition types. In contrast to other models, where a transition would practically be a query (relational or multidimensional), in our model a transition characterizes the intention of the user with respect to her information need. Each family is a collection of transition operators with the same high-level intention but, of course, different semantics in the details. This set of families intends to cover (a) traditional OLAP operators, (b) various contributions of the literature, in particular discovery-driven analysis operators [16–18], Cinecubes acts [7], and operators for exploratory search on the web [3], as well as (c) novel operators that can be developed by the research community in the future. Before detailing these families, we refer to Figure 1, which intuitively describes an analysis phrased with some transitions, using the Star Schema Benchmark data cube [13]. The annotation and highlight generation for the data is explained in Section 2.3.

   Compare. A transition of type Compare is used to retrieve more data to be compared with what the current dashboard displays. Intuitively, transitions in this family ask for bringing more data into the ongoing analysis. The family includes the OLAP operators pivot and drill-across, the augment operator of [3], and the put-in-context act of Cinecubes [7]. For instance, in the example of Figure 1, put-in-context [7] is used twice (top-left vertical path in Figure 1): first, to compare the sales for a category of products (MFGR#5) in a given supplier country (Argentina) to those of past years (indicated as ❷), and second, to compare these past results to the results for sibling countries in the dimension Supplier (indicated as ❺).

   FocusOn. A transition of type FocusOn is used to exclude data from the current analysis. Transitions in this family focus on a particular cell or group of cells. The family includes the OLAP slice/dice operation, the take operator of [3], and also personalization operators like skyline, winnow, etc. [10]. For instance, in Figure 1, a slice/dice operation is performed to focus on sales results for part MFGR#51 in years 2015 and 2016 (indicated as ❼).

   Abstract. The transitions of this family are used to reduce the information load of the dashboard by abstracting its data into a more compact form. The family includes the traditional roll-up operator, as well as data mining operations like clustering, frequent itemset mining, etc. For instance, in Figure 1, sales results for years 2011 (not all shown in previous dashboards) to 2016 in Argentina are abstracted into 2 classes based on a result threshold (indicated as ❹).

   Analyze and Explain. A transition of type Analyze and Explain is used to understand the cause of a phenomenon displayed in the current dashboard and to explore what is displayed at finer levels of detail. With respect to the analysis part, the family includes the drill-down OLAP operation and the details act of Cinecubes [7]. With respect to the explain part, the family includes Diff [16] (interesting drill-downs at various granularities explaining the content of two cells) and the operators of Cariou et al. [4]. For instance, at the top of Figure 1, a drill-down from countries to cities is used to analyze sales results by city (indicated as ❶), and, at the bottom left of the figure, a Diff is used to explain an important drop of sales in Argentina in year 2016 compared to year 2015, for the specific category of products MFGR#5 (indicated as ❻).

   Verify. A transition of type Verify is used to check whether a potential trend or pattern currently observed can be generalized to broader contexts; e.g., we can check whether it occurs at coarser levels of detail. The family includes the Relax operator [18] (interesting roll-ups at various granularities agreeing with or contradicting a trend observed in the content of two cells), the verification of outlier data points in a broader context than the current one, etc. For instance, in Figure 1, a Relax operation is used to verify whether the drop of sales results for category MFGR#5 in Argentina, in 2016, also holds for all products sold (indicated as ❸).

   Predict. A transition of type Predict is used for predictive analysis of the displayed data. Typically this involves the entire set of time-series forecasting methods, as well as classification methods used for assessing future events [1]. For instance, in Figure 1, sales results of product MFGR#51 for years 2015 and 2016 are used to issue predictions for year 2017 in Argentinian cities (indicated as ❽).
Figure 1: A BI session phrased with some of our transitions. Numbers show the sequence of dashboard generation. Colored
cells are results of data analytics operations annotating the dashboard data.
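The transition families of Section 2.2, together with the concrete operators the text assigns to each of them, can be summarized as a dispatch table from intentions to candidate operators (a sketch; the grouping follows the text, the Python representation is ours):

```python
# Transition families (user intentions) mapped to the concrete operators
# that the text places in each family; an optimizer would choose among the
# candidates when translating an intentional request.
TRANSITION_FAMILIES = {
    "Compare":        ["pivot", "drill-across", "augment [3]", "put-in-context [7]"],
    "FocusOn":        ["slice/dice", "take [3]", "skyline", "winnow [10]"],
    "Abstract":       ["roll-up", "clustering", "frequent itemset mining"],
    "AnalyzeExplain": ["drill-down", "details act [7]", "Diff [16]", "Cariou et al. [4]"],
    "Verify":         ["Relax [18]", "outlier verification in a broader context"],
    "Predict":        ["time-series forecasting", "classification [1]"],
    "Suggest":        ["Inform [17]", "query recommendation [12]"],
}

def candidate_operators(intention):
    """Return the concrete operators an optimizer could pick for an intention."""
    return TRANSITION_FAMILIES.get(intention, [])
```

The table makes explicit that one intention (key) may be served by several concrete operators (values), which is exactly the degree of freedom an optimizer would exploit.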


   Suggest. A transition of type Suggest is used to ask for guidance with respect to where to navigate next in the multidimensional data space. The family includes Sarawagi's Inform [17] (interesting drill-downs at various granularities, deviating from what was already observed), or query recommendation techniques [12]. We note that this kind of transition should incorporate risk-control mechanisms to prevent false discoveries [2]. For instance, in Figure 1, a Suggest operation is used to automatically find neighboring sales results deviating from the results of the previous operation (indicated as ❾).
   Concluding with transitions, we emphasize that an open research challenge is to define an algebraically closed language of intentional operators, each of which can be translated automatically to the execution of concrete query and data mining operators. The synthesis of these operators can then create entire stories. Assume the user, being at state ❷, would like to: "verify whether the distribution of sales for MFGR#5 in Argentina from 2011 to 2016 still holds in general for all parts, and build a 2-classes model for it, then backtrack to parts and compare with sibling countries, and finally explain the highest country-wise difference." This request would correspond to the following statement in the intentional language:

   explain_{highlight: MaxDifference}(
       compare_{siblingCountries}(
           backtrack_{showTrend}(
               abstract_{2Classes}(
                   verify_{allParts, showTrend}(❷))))).

An optimizer would then automatically translate this request into the sequence of operations roll-up, cluster, backtrack, Cinecubes' put-in-context, Diff, and optimize it to produce a data story using dashboards ❷ to ❻.

2.3 Dashboards: The states of a session
A state is a dashboard that the user sees. In principle, each dashboard is ultimately based on the generating data provided by a finite collection of queries posed to the underlying database. A sharp distinction of our approach compared to previous models is that we do not restrict ourselves to data for each state, but accompany the data with a set of interesting findings, which come in two flavors: (a) models, i.e., results of data mining or machine learning algorithms applied over the data of a dashboard's state, and (b) important findings that accompany the dashboard, to which we refer as the highlights of the state.
   Take, for example, the small scenario of Fig. 2. The dashboard is based on a simple cube, as its generating data, involving Product, Region and Year as dimension levels and Sales and Cost as (aggregate) measures. Then, the dashboard automatically computes the Benefit as the application of a simple arithmetic function, specifically, the difference of the two measures. A second step
                       Figure 2: Model extraction, data annotation and highlight production: an example
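The Fig. 2 pipeline – derived-measure computation, model extraction, data annotation, and highlight production – can be sketched end to end on toy data (the data, the stability threshold, and the simple classifier standing in for the model extraction algorithm are our own illustrative assumptions):

```python
# Toy generating data: Product, Region, Year dimensions; Sales, Cost measures.
rows = [
    {"product": "P1", "region": "EU", "year": 2015, "sales": 120, "cost": 90},
    {"product": "P1", "region": "EU", "year": 2016, "sales": 125, "cost": 93},
    {"product": "P2", "region": "EU", "year": 2015, "sales": 200, "cost": 110},
    {"product": "P2", "region": "EU", "year": 2016, "sales": 80,  "cost": 70},
]

# Step 1: derived measure, Benefit = Sales - Cost.
for r in rows:
    r["benefit"] = r["sales"] - r["cost"]

# Step 2: a stand-in model extraction algorithm: classify each product by the
# stability of its benefit across years, using an arbitrary threshold.
def extract_model(rows, threshold=10):
    by_product = {}
    for r in rows:
        by_product.setdefault(r["product"], []).append(r["benefit"])
    return {p: ("stable" if max(b) - min(b) <= threshold else "unstable")
            for p, b in by_product.items()}

model = extract_model(rows)

# Step 3: annotate the generating data with the class attribute, keeping the
# link between cube cells and their model counterpart.
for r in rows:
    r["class"] = model[r["product"]]

# Step 4: highlights = the cube cells pertaining to the class "stable benefits".
highlights = [r for r in rows if r["class"] == "stable"]
```

Note how the annotation (step 3) is what makes the highlight extraction (step 4) a plain selection over the extended data, mirroring the class attribute of Fig. 2.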


involves the building of a model, and specifically the classification of the sales with respect to the stability of the benefit measure. Finally, highlights are extracted on the basis of this model; specifically, these highlights are the cube cells pertaining to the class stable benefits. Observe how the final data are extended with the class attribute, linking them to their specific model counterpart.
   Steps. In order to construct and visualize a dashboard, we envision several computations taking place. Here is the sequence of the performed actions:
   (1) First, the queries of the state's dashboard are issued and their results, the generating data of the dashboard, are computed. Any straightforward computations of extra, derived columns of the dashboard (e.g., gain = price − cost) are performed too.
   (2) Then, the available data are fed to model extraction algorithms for the computation of models that abstract, summarize and provide patterns and insights for the data.
   (3) The potentially large amount of data and models computed has to be ranked and assessed for its interestingness to the analyst; the most important findings are classified as the dashboard's highlights, to be used for providing the main insights and the main directions for future transitions by the analyst.
   (4) The above are accompanied by visualization, text construction and reporting tasks that aid the process of understanding and communicating the main findings.
   The generating data of the dashboard. The results of the underlying queries are the basis for the subsequent computations that take place for the construction of the dashboard: this is why we refer to them as the generating data of the dashboard.
   Models. In contrast to the state of the art in OLAP modeling, in our approach we believe that the static results of aggregate queries, and their visualizations with charts and speedometers, will simply not be enough in the near future. The automatic assessment and critical characterization of the presented data will be part of the BI of the near future. Consider some simple cases based on the example of observing the sales data of an international company:
   • Sales data will be automatically characterized with respect to a decision tree that classifies them (e.g., as "successful", "risky", "potentially hazardous", etc.)
   • Sales per country will be automatically clustered to reveal similarities and differences, as a first step towards understanding outliers and unexpected behavior
   • Aggregate sales over significant periods will be fed into time series analysis and forecasting methods to automatically detect trends and seasonalities and to deduce future values
We consider the plugging of data analysis algorithms into the backstage of a dashboard as an indispensable part of BI. These algorithms can range from very simple ones (e.g., finding the top values of a cuboid, or detecting whether a dimension value is systematically related to top or bottom sales) to very complicated ones (like, for example, outlier detection, dimensionality reduction, etc.). Most importantly, as the operation of the algorithms will likely be as transparent as possible to the end user, their execution will require an almost automatic tuning of their parameters. The findings of these algorithms will be models of the data that are typically (not always) used to annotate the existing data with characterizations and offer focus points for the visualization of the dashboard (forecasts, outliers, dimension values that dominate top or bottom measures, . . .). The models themselves give a multitude of results. However, some of these results indicate that a part of a dashboard's data is of important interestingness value to the end user. Due to that, we collectively refer to the important results of the execution of these algorithms as highlights, in an attempt to show that the aim is to enrich
the current data-intensive dashboards with knowledge that is worth exploring or using for decision making.
   We are going to treat model extraction algorithms as "black-box" algorithms, without probing into their internals and, most importantly, without assuming any particular properties of their output. What does a model extraction algorithm do? Basically, the algorithm receives as input (a) a set of input data and (b) a set of execution parameters that have to be fixed for the algorithm's execution. Without loss of generality, we can assume that a subset of these parameters will be bound to string or numerical values and the rest will be mapped to attributes of the input data. The output of a model extraction algorithm is a model of the input data. Depending on the algorithm, the result differs. For instance, a descriptive model built using unsupervised clustering is basically just a labeling of each cube's cells, while a predictive one allows enriching the cube with predictions and comes with an accuracy score. In summary, the main properties of a model extraction algorithm are outlined as follows:
   (1) Input: a set of input data, which is the result of an extended cube query set of the dashboard, along with a fixing of the algorithm's input parameters
   (2) Output: a (possibly complex) result composed of (a) a model of the input data and (b) several characterizations of it (precision, strength, p-value, . . .)
   Each model type can be arbitrarily complex and consists of a set of model components of a versatile nature. Examples:
   • A time series splits each of its points into 3 measurements, specifically error, trend and seasonality (practically creating 3 time series in the place of one, whose sum reconstructs the original one).
   • A clustering scheme includes a set of clusters, each with the set of tuples that constitute it, along with a centroid.
   • A classification decision tree includes a tree structure, best expressed as the composition of a set of paths, leading to characterization classes; again, each class comprises a set of generating tuples that pertain to it.
   Typically, each such component, as well as the model in its entirety, is accompanied by a set of metrics of its statistical power. Given a model type T, we denote its components by TC = {tmc_1, . . ., tmc_m}. We avoid resolving the internals of the composition of the tmc parts and treat TC as a set of components. This is realistic, as we can always mask a part-of relation as a set of constituting components, even a recursive one, by allowing an extra composition relation to provide the semantics of the part-of relationship. Apart from this simple modeling, attempts to relationally encode mining results already exist [6]. In other words: assuming a model M of type T (e.g., a particular cluster set M of type ClusterSet), we can use its components (e.g., the clusters of M, M = {cluster_1, . . ., cluster_m}) as fundamental parts of subsequent analyses and, at the same time, link each of these components to their underlying data.
   But, then, how do we handle the heterogeneity of different model types? Clearly, a cluster is inherently different from a decision tree or the formula for a trend. Is there a unifying model to cover them all? The unifying essence of the plethora of diverse model types is that, at the end of the day, all of them are annotations of the original data. Every component of a complex model type (be it a cluster id, a path in a decision tree and its resulting class, a characterization of the top-k tuples, or a trend formula): (a) refers to a subset of the input data and vice-versa, and (b) refers to the overall model via a part-of relationship. So, once a model of the underlying data is available, our solution to the problem is to provide a distinct identity (an id) to the components of a complex type T, retain the membership/part-of relationships of T separately, and annotate or characterize the data with respect to the part of T that pertains to them. In fact, this step can be blended within the model extraction itself. Examples of such annotation follow:
   • Assuming a time series model that splits a time series into trend, seasonality and noise, these attributes can be appended to the generating data set.
   • Assuming a cluster model, the generating data can be annotated with the id of the cluster to which they belong.
   • Assuming a classification model, the input data can be labeled via an extra attribute with respect to the class(es) of the model to which they belong.
   • Assuming a model of the top-k values of a measure, the input data can be annotated with their rank, if they belong in the top-k set and have been ranked.
   A notable property of our modeling is that we require model components to be directly mapped and linked to their generating data in a bidirectional mapping, so that the end-user can navigate back and forth between cube cells and their models.
   Highlight Production. As already mentioned, the set of highlights of the dashboard is a set of important findings that accompany the dashboard. These can be findings of any nature, e.g., important outliers in the contents of the dashboard's data, all the tuples belonging to a certain class of a classification scheme, the top or bottom values of a measure, etc. It would be straightforward (and doable with our modeling) to treat all the contents of an extended query as highlights. However, we would like to stress that we want to restrict our attention to the ones that are really important for the end-user.
   Other "local" operations. Once all computations are done, there are several tasks to be taken care of before automatically constructing the dashboard. We envision that the development of principled methods for (a) the visualization of the dashboard data in various ways and (b) the creation of reports via data storytelling methods will present major challenges for the future.
   Visualize. Operations in this family compute the contents of visual representations (bar charts, speedometers, scatterplots, . . .) on the basis of the current contents of the dashboard. We consider them to be local operations, as they are performed without any further interaction with data external to the ones already retrieved for the dashboard.
   Data storytelling. Data storytelling is a novel trend that seems to carry significant weight in the way future BI will look. The main idea involves the automatic generation of a textual report from the data of a particular dashboard or of an entire OLAP session. There are already some academic efforts, like Cinecubes [7], as well as some tools (see [19] for a nice discussion).

3 PATHS FOR FUTURE RESEARCH
This paper is a vision paper describing, in broad terms, a potential future for OLAP, to strengthen its place as the cornerstone of BI. Our call to arms to the research community can be structured along several lines. Population of the families of the transition operators with concrete operators. Each new operator (or each new formalization of existing operators) should hopefully carry (a) clear semantics, (b) execution algorithms as well as their optimizations, (c) fine-tuning algorithms for the parameter fixing of
the introduced algorithms and, equally importantly, (d) a graceful linkage to the overall model proposed here (in an attempt to be able to gracefully plug it into the respective BI tools). Automation of transitions and tuning. We believe this to be the most important piece of the puzzle. How do we fully automate the choice of which model and highlight extraction methods should be employed? This includes the possibility of predicting interesting results for the end-user,
which, in turn, requires appropriate models of interestingness. Along with the necessary run-time optimizations, all these tasks pose challenging open problems of significant practical importance. Key Performance Indicators (at least in the way they are used today) are examples of simple model extraction, and, by definition, KPIs are highlights (or else they would not be "Key" indicators). Linking them explicitly to a model for OLAP is a necessary add-on for a comprehensive view of what OLAP does.
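To hint at what such an interestingness model might look like, the following deliberately simplistic sketch scores sibling cells by the absolute z-score deviation of their measure, in the spirit of discovery-driven exploration [16]; the function and data names are our own illustrative assumptions, not an operator proposed in this paper:

```python
from statistics import mean, stdev

def surprise_scores(cells):
    """Rank cells by how far each measure deviates from its siblings
    (absolute z-score): a crude interestingness model for deciding
    which drill-down results deserve promotion to highlights."""
    values = [v for _, v in cells]
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:  # all siblings identical: nothing is surprising
        return [(label, 0.0) for label, _ in cells]
    scored = [(label, abs(v - mu) / sigma) for label, v in cells]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Sales of one product per city; one city deviates from its peers.
sales = [("Athens", 100), ("Ioannina", 98), ("Tours", 103), ("Patras", 55)]
top_city, top_score = surprise_scores(sales)[0]  # Patras stands out
```

A real system would of course combine several such signals (statistical surprise, novelty with respect to the session history, user preferences) rather than a single z-score.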
Benchmarking and tools. A reference free, open-source tool and a
reference benchmark for the future BI (involving data, model and
highlight extraction requests and sessions) can be a really handy
tool for the research community (otherwise, each new paper will
need to improvise on its experimental assessment). A tool will
also trigger other research directions, like e.g., the incorporation
of research results on natural language processing to accept the
user requests, visualizations to depict the models and highlights,
etc.
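To make the intention-level interaction such a tool or benchmark would have to cover tangible, here is a toy sketch of how a high-level intention (cf. the "explain the drop in sales" example of the introduction) could be dispatched to concrete extraction steps; all names and the dispatch table are hypothetical illustrations, not a proposed API:

```python
def explain_drop(cube, cell):
    """Drill down around the cell and rank sub-cells by their
    contribution to the drop (steps are illustrative)."""
    return {"intention": "explain",
            "plan": ["drill-down", "rank contributions"],
            "target": cell}

def compare_to_similar(cube, cell):
    """Find cells similar to the target and contrast their trends."""
    return {"intention": "compare",
            "plan": ["find similar cells", "contrast trends"],
            "target": cell}

# Registry mapping user intentions to concrete extraction methods;
# the user states the goal, the system picks the algorithm.
DISPATCH = {"explain": explain_drop, "compare": compare_to_similar}

def serve_intention(intention, cube, cell):
    """Translate a high-level intention into an executable plan."""
    if intention not in DISPATCH:
        raise ValueError(f"no operator family registered for {intention!r}")
    return DISPATCH[intention](cube, cell)

plan = serve_intention("explain", cube="sales", cell=("ProductA", "2017/Q4"))
```

The interesting research questions hide behind the registry: which concrete operators populate each family, and how their parameters are tuned automatically per request.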


REFERENCES
 [1] Charu C. Aggarwal. 2015. Data Mining - The Textbook. Springer.
 [2] Carsten Binnig, Lorenzo De Stefani, Tim Kraska, Eli Upfal, Emanuel Zgraggen,
     and Zheguang Zhao. 2017. Toward Sustainable Insights, or Why Polygamy is
     Bad for You. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems
     Research.
 [3] Alessandro Bozzon, Marco Brambilla, Stefano Ceri, and Davide Mazza. 2013.
     Exploratory search framework for Web data sources. VLDB J. 22, 5 (2013),
     641–663.
 [4] Véronique Cariou, Jérôme Cubillé, Christian Derquenne, Sabine Goutier,
     Françoise Guisnel, and Henri Klajnmic. 2009. Embedded indicators to fa-
     cilitate the exploration of a data cube. IJBIDM 4, 3/4 (2009), 329–349.
 [5] Philipp Eichmann, Emanuel Zgraggen, Zheguang Zhao, Carsten Binnig, and
     Tim Kraska. 2016. Towards a Benchmark for Interactive Data Exploration.
     IEEE Data Eng. Bull. 39, 4 (2016), 50–61.
 [6] Arnaud Giacometti, Patrick Marcel, and Arnaud Soulet. 2011. A Relational
     View of Pattern Discovery. In Database Systems for Advanced Applications -
     16th International Conference, DASFAA. 153–167.
 [7] Dimitrios Gkesoulis, Panos Vassiliadis, and Petros Manousis. 2015. CineCubes:
     Aiding data workers gain insights from OLAP queries. Inf. Syst. 53 (2015),
     60–86.
 [8] Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. 1996. Data
     Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab,
     and Sub-Total. In Proceedings of the Twelfth International Conference on Data
     Engineering. 152–159.
 [9] Stratos Idreos, Olga Papaemmanouil, and Surajit Chaudhuri. 2015. Overview
     of Data Exploration Techniques. In Proceedings of the 2015 ACM SIGMOD
     International Conference on Management of Data. 277–281.
[10] Ihab F. Ilyas, George Beskales, and Mohamed A. Soliman. 2008. A survey
     of top-k query processing techniques in relational database systems. ACM
     Comput. Surv. 40, 4 (2008), 11:1–11:58.
[11] Christian S. Jensen, Torben Bach Pedersen, and Christian Thomsen. 2010.
     Multidimensional Databases and Data Warehousing. Morgan & Claypool
     Publishers.
[12] Patrick Marcel and Elsa Negre. 2011. A survey of query recommendation
     techniques for data warehouse exploration. In Actes des 7èmes journées fran-
     cophones sur les Entrepôts de Données et l’Analyse en ligne, Clermont-Ferrand,
     France, EDA 2011, Juin 2011. 119–134.
[13] Patrick E. O’Neil, Elizabeth J. O’Neil, Xuedong Chen, and Stephen Revilak.
     2009. The Star Schema Benchmark and Augmented Fact Table Indexing. In
     TPCTC 2009. 237–252.
[14] Stefano Rizzi and Enrico Gallinucci. 2014. CubeLoad: A Parametric Generator
     of Realistic OLAP Workloads. In Advanced Information Systems Engineering
     - 26th International Conference, CAiSE 2014, Thessaloniki, Greece, June 16-20,
     2014. Proceedings. 610–624.
[15] Oscar Romero and Alberto Abelló. 2007. On the Need of a Reference Algebra
     for OLAP. In Data Warehousing and Knowledge Discovery, 9th International
     Conference, DaWaK 2007, Regensburg, Germany, September 3-7, 2007, Proceed-
     ings. 99–110.
[16] Sunita Sarawagi. 1999. Explaining Differences in Multidimensional Aggre-
     gates. In Proceedings of 25th International Conference on Very Large Data Bases
     (VLDB’99). 42–53.
[17] Sunita Sarawagi. 2000. User-Adaptive Exploration of Multidimensional Data.
     In Proceedings of 26th International Conference on Very Large Data Bases (VLDB
     2000). 307–316.
[18] Gayatri Sathe and Sunita Sarawagi. 2001. Intelligent Rollups in Multidimen-
     sional OLAP Data. In Proceedings of 27th International Conference on Very
     Large Data Bases (VLDB 2001). 531–540.
[19] Alex Wright. 2015. Algorithmic authors. Commun. ACM 58, 11 (2015), 12–14.