<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The road to highlights is paved with good intentions: envisioning a paradigm shift in OLAP modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Panos Vassiliadis</string-name>
          <email>pvassil@cs.uoi.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Marcel</string-name>
          <email>patrick.marcel@univ-tours.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Univ. Ioannina</institution>
          ,
          <addr-line>Ioannina</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Tours</institution>
          ,
          <addr-line>Tours</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this vision paper, we structure a vision for the Business Intelligence of the near future in terms of a model with novel concepts and operators. We envision systems where the end-user requests information at a very high level, expressed as an intention to discover information, and the system transforms this request into the concrete execution of algorithms that compute, visualize and comment on data, along with the important highlights among them, as the answer to the end-user's information request.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
<p>What will BI look like 10 years from now? What foundations
should academia build in order to rigorously support the construction
of tools, the optimization of OLAP sessions, and the training of
new scientists around a logical paradigm? In this vision paper,
we make a first attempt to revisit the foundations of OLAP and
BI in order to address these questions.</p>
<p>To start with, it is worth briefly revisiting the evolution of
OLAP and BI modeling so far.</p>
      <p>
        • In the beginning, people worked with
relational queries and the recordsets these queries returned.
This treatment of BI was very DBMS-oriented, as the focus
of attention was on what the DBMS can do for the users
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
• Then, both the scientific community and the industry
understood that it is possible to provide a layer of abstraction
on top of the database. Users would thus deal (on-line) with
cubes, rather than with traditional database data, which
gives the user a very elegant simplification of the data.
The operations would be cube-oriented, one level of
abstraction above the database operations, and would involve
roll-ups, drill-downs, etc. (see for example [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]).
• Rapidly, these abstractions moved even closer to the way
the multidimensional data space is navigated. Research
proposed advanced operators permitting discovery-driven
analysis, which were basically combinations of OLAP primitives
[
        <xref ref-type="bibr" rid="ref16 ref17 ref18">16–18</xref>
        ].
      </p>
      <p>What is the vision for the new generation of BI tools? Broadly
speaking, we envision the redefinition of what an OLAP query is,
both with respect to what users ask the system and with respect to
what the answer entails.</p>
      <p>• Concerning what users can ask, we wish to
further simplify the operations and choices that users
face and add an extra layer of abstraction. Specifically, we
propose to replace cube operators with intentional
operators over the data, i.e., with an "algebra" of
operators close to the user's intentions. Instead of
operating on the cube in terms of roll-up and drill-down,
we wish to empower users to state their
intentions with respect to an operation. For example, instead of
saying "drill down this cell one level", the user might ask
"explain the drop in the sales of this product", or "is the
main drop in sales observed at the best-selling city also
observed in similar cities?"; as another example, instead
of issuing a roll-up operator, the user can state "verify
whether a trend that I observe in a certain context still
holds in general".
• Concerning what the answer to a query is,
we believe we can no longer remain with "sets of tuples" as
the answers to queries. We envision BI tools where the
answer to a query includes (i) a set of tuples accompanied
by appropriate visualizations, (ii) the automatic mining
of models and patterns, (iii) the extraction of important
"jewels" hidden in the result (which we call highlights),
and naturally, (iv) advanced, intuitive reporting.</p>
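      <p>The translation of an intentional request into cube operations can be sketched as a simple lookup from intentions to operator plans. The following is a minimal, hypothetical illustration; all intention and operator names are our own, not an existing API:</p>
```python
# Hypothetical sketch: mapping a high-level intentional request to a
# sequence of concrete cube operations. Intention and operator names
# are illustrative only, not part of any existing system.

def translate_intention(intention):
    """Map a user intention to a plan of cube primitives."""
    plans = {
        # "does it still hold in general?": re-aggregate coarser, compare
        "verify_trend_in_general": ["roll-up", "compare-trends"],
        # "explain the drop": descend to detailed cells and diff them
        "explain_drop": ["drill-down", "diff-cells"],
        # "also in similar cities?": locate siblings, fetch their data
        "compare_similar_cities": ["find-siblings", "drill-across"],
    }
    return plans.get(intention, [])

print(translate_intention("explain_drop"))
```
      <p>In a full system, each plan would of course be produced by an optimizer rather than a static table, as discussed in Section 2.2.</p>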
      <p>Contributions and Benefits. The main contribution of this
paper is that it structures a vision for the Business Intelligence
of the near future in terms of a model with novel concepts and
operators. We aim for definitions that are broad enough, yet as precise
as possible; at the same time, we want to link them as much as
possible to the intentional nature of the next generation of BI tools,
where the end-user requests information at a very high level and the
system transforms these requests into the concrete execution of algorithms
that compute, visualize and comment on data, along with the important
highlights among them, as the answer to the information request
made by the end-user.</p>
      <p>Why bother? We are convinced that after 50 years of naive
query answering, it is now time to replace it with effortless,
automatic insight gaining for the user. Instead of making the end-user
dig into sets of records, we can increase productivity and the
understanding of the essence of the data via two pillars, one
devoted to querying and another devoted to answering:
first, by allowing the user to focus on high-level goals
of information acquisition, rather than on details of what data to
bring in, and second, by providing focus points in the answers
that move the user's effort from manual "jewel mining" to
acting upon the insights gained.</p>
      <p>
        Several other practical benefits can be envisioned. Our
motivation was partially triggered by the problem of generating
meaningful, artificial session logs for experimental purposes (e.g.,
as in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). Moreover, such efforts also make sense in other
domains, e.g., for searching Web data sources [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Finally, note that this
algebra can be seen as a first step towards addressing some of
the open challenges proposed by the research community, like,
e.g., in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], namely the lack of declarative exploration languages
to express and reason about popular navigational idioms, or the
various challenges raised by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] around benchmarking
interactive data exploration by measuring the user's gain in terms of
insights.
      </p>
    </sec>
    <sec id="sec-2">
      <title>THE VISION</title>
      <p>An OLAP session is a sequence of dashboards that the analyst
sees, each with its own information, including data, charts and
informative summaries of KPI performance. The sequence is
produced by the actions of the analyst, who changes the contents
of the dashboard by requesting more information on the basis of
a set of operations made available by the tool.</p>
    </sec>
    <sec id="sec-3">
      <title>Preliminaries on multidimensional modeling</title>
      <p>
        For lack of space, we do not go into the details of data modeling.
We closely follow the model of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and confine ourselves to
a textual summary. As typically happens with
multidimensional models, we assume that dimensions provide a context
for facts [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This is especially important when combined with the
fact that dimension values come in hierarchies; therefore, every
single fact can be simultaneously placed in multiple
hierarchically structured contexts, thus providing the ability to analyze
sets of facts from multiple perspectives. The underlying data sets
include measures that are characterized with respect to these
dimensions. Cube queries involve measure aggregations at specific
levels of granularity per dimension, along with filtering of the data
for specific values of interest.
      </p>
      <p>We model an OLAP session as a directed graph G(V, E). The
vertices of the graph represent the states of the user's session
(practically, the dashboard screens the user receives from the
OLAP tool) and the arcs represent transitions among states.</p>
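      <p>The session graph G(V, E) can be sketched as a tiny data structure; a minimal, hypothetical illustration (state and transition names are our own):</p>
```python
# Minimal sketch of the session graph G(V, E): vertices are dashboard
# states, arcs are transitions labeled with their transition type.

class SessionGraph:
    def __init__(self):
        self.states = set()    # V: dashboard states seen by the user
        self.transitions = []  # E: (source, target, transition_type) arcs

    def add_transition(self, source, target, transition_type):
        self.states.add(source)
        self.states.add(target)
        self.transitions.append((source, target, transition_type))

g = SessionGraph()
g.add_transition("s1", "s2", "Compare")
g.add_transition("s2", "s3", "Verify")
```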
    </sec>
    <sec id="sec-4">
      <title>Intentional Queries: The Transitions of a Session</title>
      <p>Before describing in detail our vision of dashboards and their
internals, we start with the deeper essence of the intentional
nature of our proposal: user operations or, equivalently, transitions
between the states of an OLAP session. The main idea behind
transitions is that we move from a concrete model of logical operators,
like roll-ups and drill-downs, to an intentional model where the
user expresses herself in terms of operators for high-level requirements
like "explain a certain phenomenon" or "predict the future values",
and this high-level requirement is automatically
translated into the specific OLAP and data mining operators/algorithms
that will produce the answer. This also greatly facilitates the
extraction of highlights, as the user's goal is explicitly stated to
the system.</p>
      <p>
        We organize the transition operators that can express the users'
requirements for extra information into families, i.e., a taxonomy
of types for our transitions, which we call transition types. In
contrast to other models, where a transition would practically be a
query (relational or multidimensional), in our model a transition
characterizes the intention of the user with respect to her
information need. Each family is a collection of transition operators
with the same high-level intention but, of course, different
semantics in the details. This set of families is intended to cover (a)
traditional OLAP operators, (b) various contributions of the
literature, in particular discovery-driven analysis operators [
        <xref ref-type="bibr" rid="ref16 ref17 ref18">16–18</xref>
        ],
Cinecubes acts [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and operators for exploratory search on the
Web [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], as well as (c) novel operators that the research community
may develop in the future. Before detailing these
families, we point to Figure 1, which intuitively describes an analysis
phrased with some transitions, using the Star Schema Benchmark
data cube [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The annotation and highlight generation for the
data is explained in Section 2.3.
      </p>
      <p>
        Compare. A transition of type Compare is used to retrieve more
data to be compared with what the current dashboard displays.
Intuitively, transitions in this family ask for bringing more data
into the ongoing analysis. The family includes the OLAP operators
pivot and drill-across, the augment operator of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and the put-in-context
act of Cinecubes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For instance, in the example of Figure 1,
put-in-context [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is used twice (top left vertical path in Figure 1):
first, to compare the sales for a category of products (MFGR#5)
in a given supplier country (Argentina) to those of past years
(indicated as ❷), and second, to compare these past results to the
results for sibling countries in the dimension Supplier (indicated
as ❺).
      </p>
      <p>
        FocusOn. A transition of type FocusOn is used to exclude data
from the current analysis. Transitions in this family focus on a
particular cell, or group of cells. The family includes the OLAP
slice/dice operation, the take operator of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and also
personalization operators like skyline, winnow, etc. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For instance, in
Figure 1, a slice/dice operation is used to focus on sales results
for part MFGR#51 in years 2015 and 2016 (indicated as ❼).
      </p>
      <p>Abstract. The transitions of this family are used to reduce the
information load of the dashboard, by abstracting its contents into a more
compact form. The family includes the traditional roll-up
operator, as well as data mining operations like clustering, frequent
itemset mining, etc. For instance, in Figure 1, sales results for
years 2011 (not all shown in previous dashboards) to 2016 in
Argentina are abstracted into two classes based on a result threshold
(indicated as ❹).</p>
      <p>
        Analyze and Explain. A transition of type Analyze and Explain
is used to understand the cause of a phenomenon displayed in the
current dashboard and to explore what is displayed at finer levels
of detail. With respect to the analysis part, the family includes
the drill-down OLAP operation and the details act of Cinecubes
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. With respect to the explain part, the family includes Diff [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
(interesting drill-downs at various granularities explaining the
content of two cells) and the operators of Cariou et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For
instance, at the top of Figure 1, a drill-down from countries to
cities is used to analyze sales results by city (indicated as ❶),
and, at the bottom left of the figure, a Diff is used to explain an
important drop in sales in Argentina in year 2016, compared to
year 2015, for the specific category of products MFGR#5 (indicated
as ❻).
      </p>
      <p>
        Verify. A transition of type Verify is used to check whether a
potential trend or pattern currently observed can be generalized to
broader contexts; e.g., we can check whether it occurs at coarser
levels of detail. The family includes the Relax operator [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
(interesting roll-ups at various granularities agreeing with or contradicting a
trend observed in the content of two cells), the verification of
outlier data points in a broader context than the current one, etc. For
instance, in Figure 1, a Relax operation is used to verify whether
the drop in sales results for category MFGR#5 in Argentina, in
2016, also holds for all products sold (indicated as ❸).
      </p>
      <p>
        Predict. A transition of type Predict is used for predictive
analysis of the displayed data. Typically, this involves the entire set of
time series forecasting methods, as well as classification methods
used for assessing future events [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For instance, in Figure 1,
the sales results of product MFGR#51 for years 2015 and 2016 are used
to issue predictions for year 2017 in Argentinian cities (indicated
as ❽).
      </p>
      <p>
        Suggest. A transition of type Suggest is used to ask for
guidance with respect to where to navigate next in the
multidimensional data space. The family includes Sarawagi's Inform [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
(interesting drill-downs at various granularities, deviating from
trends already observed), and query
recommendation techniques [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We note that this kind of transition should
incorporate risk control mechanisms to prevent false discoveries
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For instance, in Figure 1, a Suggest operation is used to
automatically find neighboring sales results deviating from the results
of the previous operation (indicated as ❾).
      </p>
      <p>Concluding with transitions, we emphasize that an open
research challenge is to define an algebraically closed language
of intentional operators, each of which can be translated
automatically into the execution of concrete query and data mining
operators. The synthesis of these operators can then create entire
stories. Assume the user, being at state ❷, would like to: "verify
whether the distribution of sales for MFGR#5 in Argentina from
2011 to 2016 still holds in general for all parts, and build a two-class
model for it, then backtrack to parts and compare with sibling
countries, and finally explain the highest country-wise
difference." This request would correspond to the following statement
in the intentional language: explain[highlight: MaxDifference](compare[siblingCountries](backtrack[showTrend](abstract[2classes](verify[allParts, showTrend](❷))))).</p>
      <p>An optimizer would then automatically translate this request into
the sequence of operations roll-up, cluster, backtrack, Cinecubes'
put-in-context, Diff, and optimize it to produce a data story using
dashboards ❷ to ❻.</p>
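      <p>The nested intentional statement above can be sketched as function composition; in the following minimal illustration, each function stands for an intentional operator, a state is modeled as a list of applied steps, and all names are our own simplifications:</p>
```python
# Sketch of the composed intentional statement as nested calls.
# A "state" is a list of (operator, parameters...) steps applied so far.

def verify(state, scope, mode):
    return state + [("verify", scope, mode)]

def abstract(state, classes):
    return state + [("abstract", classes)]

def backtrack(state, mode):
    return state + [("backtrack", mode)]

def compare(state, scope):
    return state + [("compare", scope)]

def explain(state, highlight):
    return state + [("explain", highlight)]

# explain[highlight: MaxDifference](compare[siblingCountries](
#   backtrack[showTrend](abstract[2classes](verify[allParts, showTrend](s2)))))
story = explain(
    compare(
        backtrack(
            abstract(
                verify(["state-2"], "allParts", "showTrend"),
                classes=2),
            "showTrend"),
        "siblingCountries"),
    highlight="MaxDifference")
```
      <p>Reading the resulting step list from left to right yields exactly the operator sequence an optimizer would have to realize with concrete operations.</p>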
    </sec>
    <sec id="sec-6">
      <title>Dashboards: The states of a session</title>
      <p>A state is a dashboard that the user sees. In principle, each
dashboard is ultimately based on its generating data, provided by
a finite collection of queries posed to the underlying database. A
sharp distinction of our approach compared to previous models
is that we do not restrict each state to data, but
accompany the data with a set of interesting findings, which come in
two flavors: (a) models, i.e., results of data
mining or machine learning algorithms applied over the data of
a dashboard's state, and (b) important findings that accompany
the dashboard, to which we refer as the highlights of the state.</p>
      <p>Take, for example, the small scenario of Fig. 2. The dashboard is
based on a simple cube, as its generating data, involving Product,
Region and Year as dimension levels and Sales and Cost as
(aggregate) measures. Then, the dashboard automatically computes
the Benefit as the application of a simple arithmetic function,
specifically, the difference of the two measures. A second step
involves the building of a model, specifically the
classification of sales with respect to the stability of the benefit measure.
Finally, highlights are extracted on the basis of this model;
specifically, these highlights are the cube cells pertaining to the
class stable benefits. Observe how the final data are extended
with the class attribute linking them to their specific model
counterpart.</p>
      <p>Steps. In order to construct and visualize a dashboard, we
envision several computations taking place. Here is the sequence
of the performed actions:
(1) First, the queries of the state's dashboard are issued and
their results, the generating data of the dashboard, are
computed. Any straightforward computations of extra,
derived columns of the dashboard (e.g., gain = price - cost) are
performed too.
(2) Then, the available data are fed to model extraction
algorithms for the computation of models that abstract, summarize
and provide patterns and insights for the data.
(3) The potentially large amount of data and models computed
has to be ranked and assessed with respect to its interestingness for
the analyst; the most important findings are classified
as the dashboard's highlights, to be used for providing the
main insights and the main directions for future transitions
by the analyst.
(4) The above are accompanied by visualization, text
construction and reporting tasks that aid the process of
understanding and communicating the main findings.</p>
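      <p>The four steps above can be sketched as a pipeline; in this minimal, hypothetical illustration, all function names and the trivial stand-in implementations are our own placeholders for the envisioned components:</p>
```python
# Sketch of the four-step dashboard construction pipeline.

def build_dashboard(queries):
    # (1) evaluate the queries and derive extra columns (generating data)
    data = [row for q in queries for row in run_query(q)]
    for row in data:
        row["gain"] = row["price"] - row["cost"]  # derived column
    # (2) feed the data to model-extraction algorithms
    models = extract_models(data)
    # (3) rank findings by interestingness; keep the top ones as highlights
    highlights = rank_by_interestingness(data, models)[:3]
    # (4) visualization / reporting tasks would be attached here
    return {"data": data, "models": models, "highlights": highlights}

# Trivial stand-ins so the sketch runs end to end:
def run_query(q):
    return [{"price": 10.0, "cost": 7.0}, {"price": 5.0, "cost": 6.0}]

def extract_models(data):
    return [{"type": "threshold", "boundary": 0.0}]

def rank_by_interestingness(data, models):
    return sorted(data, key=lambda r: abs(r["gain"]), reverse=True)

dash = build_dashboard(["q1"])
```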
      <p>The generating data of the dashboard. The results of the
underlying queries are the basis for the subsequent computations
that take place for the construction of the dashboard: this is why
we refer to them as the generating data of the dashboard.</p>
      <p>Models. In contrast to the state of the art in OLAP modeling,
we believe that the static results of aggregate
queries, and their visualizations with charts and speedometers,
will simply not be enough in the near future. The automatic
assessment and critical characterization of the presented data
will be part of the BI of the near future. Consider some simple cases
based on the example of observing the sales data of an international
company:
• Sales data will be automatically characterized with respect
to a decision tree that classifies them (e.g., as "successful",
"risky", "potentially hazardous", etc.)
• Sales per country will be automatically clustered to
reveal similarities and differences, as a first step towards
understanding outliers and unexpected behavior
• Aggregate sales over significant periods will be fed into
time series analysis and forecasting methods to
automatically detect trends and seasonalities and to deduce future
values
We consider the plugging of data analysis algorithms into the
backstage of a dashboard an indispensable part of BI. These
algorithms can range from very simple ones (e.g., finding the top
values of a cuboid, or detecting whether a dimension value is
systematically related to top or bottom sales) to very
complicated ones (like, for example, outlier detection, dimensionality
reduction, etc.). Most importantly, as the operation of the
algorithms will likely be as transparent as possible to the end user,
their execution will require an almost automatic tuning of their
parameters. The findings of these algorithms will be models of
the data that are typically (though not always) used to annotate the
existing data with characterizations and offer focus points to
the visualization of the dashboard (forecasts, outliers, dimension
values that dominate top or bottom measures, . . .). The models
themselves give a multitude of results. However, some of these
results indicate that a part of a dashboard's data is of high
interestingness value to the end user. Due to that, we collectively
refer to the important results of the execution of these algorithms
as highlights, in an attempt to show that the aim is to enrich
the current data-intensive dashboards with knowledge that is
worth exploring or using for decision making.</p>
      <p>We are going to treat model extraction algorithms as
"black-box" algorithms, without probing into their internals and, most
importantly, without assuming any particular properties of their
output. What does a model extraction algorithm do? Basically,
the algorithm receives as input (a) a set of input data, and (b) a set
of execution parameters that have to be fixed for the algorithm's
execution. Without loss of generality, we can assume that a subset
of these parameters will be bound to string or numerical values
and the rest will be mapped to attributes of the input data. The
output of a model extraction algorithm is a model of the input data.
Depending on the algorithm, the result differs. For instance, a
descriptive model built using unsupervised clustering is basically
just a labeling of each of the cube's cells, while a predictive one allows
enriching the cube with predictions and comes with an accuracy
score. In summary, the main properties of a model extraction
algorithm are outlined as follows:
(1) Input: a set of input data, which is the result of an extended
cube query set of the dashboard, along with a fixing of the
algorithm's input parameters
(2) Output: a (possibly complex) result composed of (a) a
model of the input data, and (b) several characterizations
of it (precision, strength, p-value, . . . )</p>
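      <p>The black-box contract can be sketched as a single function signature: data plus fixed parameters in, a model plus its characterizations out. The following minimal illustration uses a toy two-class thresholding algorithm of our own invention:</p>
```python
# Sketch of the black-box contract for a model-extraction algorithm:
# input data and fixed parameters in, (model, characterizations) out.

def extract_model(data, algorithm, params):
    """Black-box model extraction: returns (model, characterizations)."""
    if algorithm == "threshold_2classes":
        boundary = params["boundary"]
        # the model here is just a labeling of each input value
        labels = ["high" if v >= boundary else "low" for v in data]
        model = {"type": "2-class threshold", "boundary": boundary,
                 "labels": labels}
        # a crude statistical characterization of the model
        quality = {"support": len(data)}
        return model, quality
    raise ValueError("unknown algorithm")

model, quality = extract_model([120, 40, 95], "threshold_2classes",
                               {"boundary": 90})
```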
      <p>Each model type can be arbitrarily complex and consists of a
set of model components of versatile nature. Examples:
• A time series model splits each of its points into three measurements,
specifically error, trend and seasonality (practically
creating three time series in the place of one, whose sum
reconstructs the original one).
• A clustering scheme includes a set of clusters, each with the
set of tuples that constitute it, along with a centroid.
• A classification decision tree includes a tree structure, best
expressed as the composition of a set of paths leading to
characterization classes; again, each class comprises the set
of generating tuples that pertain to it.</p>
      <p>
        Typically, each such component, as well as the model in its
entirety, is accompanied by a set of metrics of its statistical
power. Given a model type T, we denote its components by
TC = {tmc1, . . . , tmcm}. We avoid resolving the internals of the
composition of the tmc parts and treat TC as a set of components.
This is realistic, as we can always mask a part-of relation as a set
of constituting components, even a recursive one, by allowing
an extra composition relation to provide the semantics of the
part-of relationship. Apart from this simple modeling, attempts
to relationally encode mining results already exist [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In other
words: assuming a model M of type T (e.g., a particular cluster
set M of type ClusterSet), we can use its components (e.g., the
clusters of M, M = {cluster1, . . . , clusterm}) as fundamental parts
of subsequent analyses and, at the same time, link each of these
components to its underlying data.
      </p>
      <p>But, then, how do we handle the heterogeneity of different model
types? Clearly, a cluster is inherently different from a decision
tree or the formula of a trend. Is there a unifying model to cover
them all? The unifying essence of all the plethora of diverse model
types is that, at the end of the day, all of them are annotations
of the original data. Every component of a
complex model type (be it a cluster id, a path in a decision tree and
its resulting class, a characterization of the top-k tuples, or a trend
formula): (a) refers to a subset of the input data, and vice versa,
and (b) refers to the overall model via a part-of relationship. So,
once a model of the underlying data is available, our solution
to the problem is to provide a distinct identity (an id) to the
components of a complex type T, retain the membership/part-of
relationships of T separately, and annotate or characterize the data
with respect to the part of T that pertains to them. In fact, this step
can be blended within the model extraction itself. Examples of
such annotation follow:
• Assuming a time series model that splits a time series
into trend, seasonality and noise, these attributes can be
appended to the generating data set.
• Assuming a cluster model, the generating data can be
annotated with the id of the cluster to which they belong.
• Assuming a classification model, the input data can be
labeled, via an extra attribute, with respect to the class(es)
of the model to which they belong.
• Assuming a model of the top-k values of a measure, the input
data can be annotated with their rank, if they belong to
the top-k set.</p>
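      <p>The annotation mechanism can be sketched concretely for the cluster case: components get ids, and each cell of the generating data is annotated with the id of the component that pertains to it, giving the bidirectional cell/model linkage. All data values below are invented for illustration:</p>
```python
# Sketch of annotation: cluster components get ids, and each cell of the
# generating data is tagged with the id of its cluster.

cells = [{"country": "Argentina", "sales": 120},
         {"country": "Peru", "sales": 118},
         {"country": "France", "sales": 40}]

# a toy 2-cluster model: component ids mapped to member row indexes
clusters = {"c1": [0, 1], "c2": [2]}

# annotate each cell with the id of the cluster component it belongs to
for cid, members in clusters.items():
    for i in members:
        cells[i]["cluster"] = cid
```
      <p>The clusters mapping preserves the component-to-cells direction, while the added attribute preserves the cell-to-component direction, so the user can navigate both ways.</p>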
      <p>A notable property of our modeling is that we require model
components to be directly mapped and linked to their generating
data in a bidirectional mapping, so that the end-user can navigate
back and forth between cube cells and their models.</p>
      <p>Highlight Production. As already mentioned, the set of
highlights of the dashboard is a set of important findings that
accompany the dashboard. These can be findings of any nature, e.g.,
important outliers in the contents of the dashboard’s data, all
the tuples belonging to a certain class of a classification scheme,
the top or bottom values of a measure, etc. It would be
straightforward (and doable with our modeling) to treat all the contents
of an extended query as highlights. However, we would like to
stress that we want to restrict our attention to the ones that are
really important for the end-user.</p>
      <p>Other "local’ operations. Once all computations are done,
there are several tasks to be taken care of before automatically
constructing the dashboard. We envision that the development
of principled methods for (a) the visualization of the dashboard
data in various ways and (b) the creation of reports, via data
storytelling methods will present major challenges for the future.</p>
      <p>Visualize. Operations in this family compute the contents of
visual representations (bar charts, speedometers, scatterplots, ...)
on the basis of the current contents of the dashboard. We
consider them to be local operations as they are performed without
any further interaction with data external to the ones already
retrieved from the dashboard.</p>
      <p>
        Data storytelling. Data storytelling is a novel trend that seems
to carry significant weight in how future BI will look.
The main idea involves the automatic generation of a textual
report from the data of a particular dashboard or of an entire
OLAP session. There are already some academic efforts, like
Cinecubes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], as well as some tools (see [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] for a nice discussion).
      </p>
    </sec>
    <sec id="sec-7">
      <title>PATHS FOR FUTURE RESEARCH</title>
      <p>This paper is a vision paper describing, in broad terms, a potential
future for OLAP, strengthening its place as the cornerstone of BI.
Our call to arms to the research community can be structured
along the following lines.</p>
      <p>Population of the families of transition operators with
concrete operators. Each new operator (or each new formalization of an
existing operator) should hopefully come with (a) clear semantics,
(b) execution algorithms along with their optimizations, (c) fine-tuning
algorithms for fixing the parameters of the introduced algorithms and,
equally importantly, (d) a graceful linkage to the overall model
proposed here (so that it can be gracefully plugged into the
respective BI tools).</p>
      <p>Automation of transitions and tuning. We believe this to be
the most important piece of the puzzle. How do we fully automate the
choice of model and highlight extraction methods to be employed? This
includes the possibility of predicting results that are interesting to
the end-user, which in turn requires appropriate models of
interestingness. Along with the necessary run-time optimizations, all
these tasks provide challenging open problems of significant practical
importance. Key Performance Indicators (at least in the way they are
used today) are examples of simple model extraction and, by
definition, KPIs are highlights (or else they would not be "Key"
indicators). Linking them explicitly to a model for OLAP is a
necessary add-on for a comprehensive view of what OLAP does.</p>
      <p>Benchmarking and tools. A free, open-source reference tool and
a reference benchmark for the future BI (involving data, model and
highlight extraction requests, and sessions) would be really handy for
the research community (otherwise, each new paper will need to
improvise its experimental assessment). A tool would also trigger
other research directions, such as the incorporation of research
results on natural language processing to accept user requests,
visualizations to depict the models and highlights,
etc.</p>
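      <p>To make the notion of model and highlight extraction slightly more concrete, the following is a minimal, purely illustrative sketch (ours, not an operator of the envisioned model; the function names and the z-score criterion are assumptions): a trivial model, the per-group mean and deviation of a measure, is fit over the cells of a cube, and the cells that deviate strongly from their group are returned as highlights.</p>

```python
# Illustrative sketch of "model and highlight extraction" over cube cells.
# The model here is deliberately trivial: per-group mean and standard
# deviation of a measure; highlights are the cells with large z-scores.
from statistics import mean, pstdev
from collections import defaultdict

def extract_highlights(cells, group_key, measure, z_threshold=2.0):
    """cells: list of dicts, one per cube cell; returns the flagged cells."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell[group_key]].append(cell)
    highlights = []
    for members in groups.values():
        values = [c[measure] for c in members]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # no variation inside this group, nothing to flag
        for c in members:
            z = (c[measure] - mu) / sigma
            if abs(z) > z_threshold:
                highlights.append({**c, "z": round(z, 2)})
    return highlights

# A tiny cube slice: monthly sales for one region, with one outlier month.
cube = [
    {"region": "North", "month": m, "sales": s}
    for m, s in [(1, 100), (2, 105), (3, 98), (4, 102), (5, 300)]
]
print(extract_highlights(cube, "region", "sales", z_threshold=1.5))
```

      <p>In a real system, the same interface could host richer models (regression residuals, interestingness scores), which is exactly the kind of operator population and tuning called for above.</p>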
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Charu C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Data Mining - The Textbook</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Binnig</surname>
          </string-name>
          , Lorenzo De Stefani, Tim Kraska, Eli Upfal, Emanuel Zgraggen, and
          <string-name>
            <given-names>Zheguang</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Toward Sustainable Insights, or Why Polygamy is Bad for You</article-title>
          .
          <source>In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Bozzon</surname>
          </string-name>
          , Marco Brambilla, Stefano Ceri, and
          <string-name>
            <given-names>Davide</given-names>
            <surname>Mazza</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Exploratory search framework for Web data sources</article-title>
          .
          <source>VLDB J</source>
          .
          <volume>22</volume>
          ,
          <issue>5</issue>
          (
          <year>2013</year>
          ),
          <fpage>641</fpage>
          -
          <lpage>663</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Véronique</given-names>
            <surname>Cariou</surname>
          </string-name>
          , Jérôme Cubillé, Christian Derquenne, Sabine Goutier, Françoise Guisnel, and Henri Klajnmic
          .
          <year>2009</year>
          .
          <article-title>Embedded indicators to facilitate the exploration of a data cube</article-title>
          .
          <source>IJBIDM</source>
          <volume>4</volume>
          ,
          <issue>3/4</issue>
          (
          <year>2009</year>
          ),
          <fpage>329</fpage>
          -
          <lpage>349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Eichmann</surname>
          </string-name>
          , Emanuel Zgraggen,
          <string-name>
            <given-names>Zheguang</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Binnig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Kraska</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Towards a Benchmark for Interactive Data Exploration</article-title>
          .
          <source>IEEE Data Eng. Bull. 39</source>
          ,
          <issue>4</issue>
          (
          <year>2016</year>
          ),
          <fpage>50</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Arnaud</given-names>
            <surname>Giacometti</surname>
          </string-name>
          , Patrick Marcel, and
          <string-name>
            <given-names>Arnaud</given-names>
            <surname>Soulet</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A Relational View of Pattern Discovery</article-title>
          .
          <source>In Database Systems for Advanced Applications - 16th International Conference</source>
          , DASFAA.
          <fpage>153</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Dimitrios</given-names>
            <surname>Gkesoulis</surname>
          </string-name>
          , Panos Vassiliadis, and
          <string-name>
            <given-names>Petros</given-names>
            <surname>Manousis</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>CineCubes: Aiding data workers gain insights from OLAP queries</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>53</volume>
          (
          <year>2015</year>
          ),
          <fpage>60</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jim</given-names>
            <surname>Gray</surname>
          </string-name>
          , Adam Bosworth,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Layman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hamid</given-names>
            <surname>Pirahesh</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total</article-title>
          .
          <source>In Proceedings of the Twelfth International Conference on Data Engineering</source>
          .
          <fpage>152</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Stratos</given-names>
            <surname>Idreos</surname>
          </string-name>
          , Olga Papaemmanouil, and
          <string-name>
            <given-names>Surajit</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Overview of Data Exploration Techniques</article-title>
          .
          <source>In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</source>
          .
          <fpage>277</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Ihab F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          , George Beskales, and
          <string-name>
            <given-names>Mohamed A.</given-names>
            <surname>Soliman</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>A survey of top-k query processing techniques in relational database systems</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>40</volume>
          ,
          <issue>4</issue>
          (
          <year>2008</year>
          ),
          <fpage>11:1</fpage>
          -
          <lpage>11:58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Christian S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          , Torben Bach Pedersen, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Thomsen</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Multidimensional Databases and Data Warehousing</article-title>
          . Morgan &amp; Claypool Publishers.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Marcel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elsa</given-names>
            <surname>Negre</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A survey of query recommendation techniques for data warehouse exploration</article-title>
          .
          <source>In Actes des 7èmes journées francophones sur les Entrepôts de Données et l'Analyse en ligne (EDA 2011)</source>
          , Clermont-Ferrand, France, June 2011
          .
          <fpage>119</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Patrick E.</given-names>
            <surname>O'Neil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Elizabeth J.</given-names>
            <surname>O'Neil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xuedong</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Revilak</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The Star Schema Benchmark and Augmented Fact Table Indexing</article-title>
          .
          <source>In TPCTC (2009-10-28)</source>
          .
          <fpage>237</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Rizzi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Enrico</given-names>
            <surname>Gallinucci</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>CubeLoad: A Parametric Generator of Realistic OLAP Workloads</article-title>
          .
          <source>In Advanced Information Systems Engineering - 26th International Conference, CAiSE 2014</source>
          , Thessaloniki, Greece, June 16-20,
          <year>2014</year>
          . Proceedings.
          <fpage>610</fpage>
          -
          <lpage>624</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Oscar</given-names>
            <surname>Romero</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alberto</given-names>
            <surname>Abelló</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>On the Need of a Reference Algebra for OLAP</article-title>
          .
          <source>In Data Warehousing and Knowledge Discovery</source>
          , 9th International Conference, DaWaK
          <year>2007</year>
          , Regensburg, Germany, September 3-7,
          <year>2007</year>
          , Proceedings.
          <fpage>99</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Sunita</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Explaining Differences in Multidimensional Aggregates</article-title>
          .
          <source>In Proceedings of 25th International Conference on Very Large Data Bases (VLDB'99)</source>
          .
          <fpage>42</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Sunita</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>User-Adaptive Exploration of Multidimensional Data</article-title>
          .
          <source>In Proceedings of 26th International Conference on Very Large Data Bases (VLDB 2000)</source>
          .
          <fpage>307</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Gayatri</given-names>
            <surname>Sathe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sunita</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Intelligent Rollups in Multidimensional OLAP Data</article-title>
          .
          <source>In Proceedings of 27th International Conference on Very Large Data Bases (VLDB 2001)</source>
          .
          <fpage>531</fpage>
          -
          <lpage>540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Wright</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Algorithmic authors</article-title>
          .
          <source>Commun. ACM</source>
          <volume>58</volume>
          ,
          <issue>11</issue>
          (
          <year>2015</year>
          ),
          <fpage>12</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>