<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Conceptual Model for Data Analysis Highlights</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Panos Vassiliadis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veronika Peralta</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Marcel</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimos Gkitsakis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angeliki Dougia</string-name>
          <email>adougia@pi-ag.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faten El Outa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>P&amp;I Hellas</institution>
          ,
          <addr-line>Ioannina, Greece; work done while with the Univ. Ioannina</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Univ. Ioannina</institution>
          ,
          <addr-line>Ioannina</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Orleans</institution>
          ,
          <addr-line>Orleans</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We introduce a conceptual model for highlights to support automated data analysis and storytelling. Highlights reveal key facts, of high significance, that are hidden in the data with which a data analyst works. The model builds on the concepts of Holistic and Elementary Highlights, along with their context, constituents and interrelationships, whose synergy can identify internal properties, patterns and key facts in a dataset being analyzed. We also report how the related literature fits within the model, as well as a first implementation of it.</p>
      </abstract>
      <kwd-group>
        <kwd>Highlights</kwd>
        <kwd>Data exploration</kwd>
        <kwd>Data storytelling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Data analysis concerns the processing of large volumes of data for the identification of important
information hidden in them, or derivable from them. Meliou et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] offer an excellent classification of
analysis tasks as (a) descriptive analytics, that report properties and extract patterns from the underlying
data, (b) diagnostic analytics, conducting analyses (e.g., causal inference) to explain the reasons behind
the observed state of affairs, (c) prescriptive analytics, mainly solving optimization problems to derive
the best configuration of parameters for optimal solutions to complex problems, and, (d) predictive
analytics, mainly using historical data and simulation models to predict future trends as well as the
effects of interventions. The results of such analysis tasks must first be extracted by the analysts,
then interpreted and evaluated for their significance, compiled into a data-based story of “what – why –
how – next”, and finally communicated, in a humanly understandable format, to broader audiences of
non-technical users and decision-makers via data narration methods.
      </p>
      <p>
        As data analysts increasingly have to overcome time pressure, the learning curve of analytics
tools, and the growing volume and variety of data in their attempt to extract meaningful results,
there is unavoidably a growing need for automating the process of (a) efficiently and (b) effectively
discovering significant characteristics of the underlying datasets, which are subsequently used to derive
decisions and actions. Already, several efforts towards automating the answering of analytical questions
have been made [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6 ref7">2, 3, 4, 5, 6, 7</xref>
        ]. The typical terms for these automatically extracted answers
are findings, also called insights, or data facts. These concepts are data-oriented and concern the
identification of parts of the data space that demonstrate interesting properties (typically a pattern or
relationship between its data). When, for example, a time series demonstrates a trend, or a histogram
follows a certain distribution, then the data demonstrate a phenomenon that may, or may not, be
interesting to the analyst. The key contribution of this paper is the introduction of a richer conceptual
model for highlights, which, much like findings, isolate the parts of the data that make the existence of an
interesting phenomenon true, but also come with structured extra information with respect to (i) their roles,
relationships and provenance, as well as (ii) the necessary information that explains why these data are of
importance. But first, we will illustrate the case with an example.
      </p>
      <p>Working Example. Assume we have a relational database with product sales, and for reasons
of comprehension, we structure it as a cube. The fundamental source of information is a fact table
SalesFact(ProductId, TimeId, CityId, PromotionId, CustomerId, Sales, Costs, UnitSales, AvgSalesPerUnit,
Profit) , with all the monetary measures expressed in thousands of Euros. Around the aforementioned
fact table, we also have several lookup dimension tables, namely Products, Time, Cities, Promotions,
Customers, joinable to the fact table via the respective Id attributes (i.e., we have a PK-FK relationship
between fact and dimension tables for each Id attribute).</p>
      <p>Assume a query that selects the total sum of sales of the product Wine for the 2nd quarter of 2023 in
Greece, grouped by month and city (Table 1). Observe the following highlights of the result set:
• The city Athens dominates all other cities: for every month, the sales of Athens are higher than
the sales of every other city.
• The month May 2023 dominates all other months: for every city, the sales of May are higher than
the sales of other months.
• The city of Athens is a mega-contributor to the total sales: the sales of Athens are 75% of the total
sales.
• If one observes the time-series of the marginal sales per month, there is no trend or seasonality;
however, there is a unimodality in the time-series: sales rise, reach a peak (in May), and then drop.</p>
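<p>As an illustration (not part of the paper's formalism), the first three highlights above can be checked with a short sketch. The per-cell figures and the second city (Patras) below are hypothetical, chosen only to be consistent with the totals stated in the text (Athens = 2100, i.e., 75% of 2800):</p>

```python
# Hypothetical cell values consistent with the stated totals; the real
# Table 1 values are not reproduced here.
sales = {
    ("Athens", "2023-04"): 600, ("Athens", "2023-05"): 1000, ("Athens", "2023-06"): 500,
    ("Patras", "2023-04"): 200, ("Patras", "2023-05"): 300,  ("Patras", "2023-06"): 200,
}
cities = sorted({c for c, _ in sales})
months = sorted({m for _, m in sales})

def dominates_all_cities(city):
    """True if `city` outsells every other city in every month."""
    return all(sales[(city, m)] > sales[(c, m)]
               for m in months for c in cities if c != city)

def mega_contributor(city, threshold=0.5):
    """True if `city` holds more than `threshold` of total sales."""
    total = sum(sales.values())
    return sum(sales[(city, m)] for m in months) / total > threshold

print([c for c in cities if dominates_all_cities(c)])  # ['Athens']
print(mega_contributor("Athens"))                      # True (2100/2800 = 0.75)
```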
      <p>[Table 1: Sales per Month (April 2023, May 2023, June 2023) and City, with a Total row; the table body was lost in extraction.]</p>
      <p>The following (automatically derivable) textual summary reports the discovered highlights grouped
around the main characters of Athens and May:</p>
      <p>In terms of geography, Athens dominates all other cities, in every month. In fact, Athens
is a mega-contributor to total sales, by contributing 75% of all sales. In terms of time, the
progression in time shows a peak in May; in fact, May dominates all other months in terms of
total sales. No trend or seasonality were detected.</p>
      <p>Besides data analysis, highlights are of capital importance for data narration. When a data narrative
is ultimately constructed (via a process also known as data storytelling, which falls outside the scope of
this paper), these highlights will provide the basis for automatically generating text, chart annotations
or choosing of visualizations (e.g., type, style, position). Highlight constituents, in particular main
Characters (as Athens or May 2023 in the example) and Measure values (e.g., Athens' total of 2100K€,
representing 75% of total sales), may also guide the choice of specialized visual artifacts (e.g., colors or
effects).</p>
      <p>To support data analysts in the retrieval, understanding and communication of highlights, we first
need to handle two important problems: (a) to clarify the involved concepts in the production of
highlights, and (b) to present a unifying conceptual model that covers a large number of automatically
extracted highlights and allows handling them uniformly via automated tools under a common syntax
and semantics.</p>
      <p>Contribution. The main contribution of this paper lies in the provision of a comprehensive and precise
conceptual model for highlights, along with their constituents and their interrelationships, with the goal of
helping system builders implement tools and algorithms that facilitate the automated extraction, representation,
and exploitation of highlights in data, for data analysis and storytelling purposes.</p>
      <p>Outline. In Section 2, we survey related work. In Section 3, we introduce the model for highlights.
We first describe supporting concepts and introduce the main concepts of Character and Measure Value,
and then define highlights, organizing them as (a) Holistic Highlights, which are properties of the entire
dataset being examined, and, (b) Elementary Highlights which concern individual Characters, or sets
of them, that play a crucial role to Holistic Highlights. In Section 4, we discuss the relationship of the
proposed model with the state of the art. In Section 5, we discuss the practical usage of the proposed
model. Finally, in Section 6 we conclude with a summary and ideas for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Data analysis. As already mentioned, data analysis can be (a) descriptive, (b) diagnostic, (c) prescriptive,
and, (d) predictive [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Traditionally, statistics, data mining and business intelligence offered descriptive
and diagnostic analysis support. In recent years, these traditional techniques have been complemented
with Exploratory Data Analysis (EDA), where users are interactively analyzing datasets to gain insights
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref8 ref9">8, 9, 10, 11, 12, 13, 14</xref>
        ]. This task can be supported, e.g., by generating EDA notebooks using deep
learning [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which presupposes access to many former analyses, or by pre-analyzing datasets to
compute highlights [
        <xref ref-type="bibr" rid="ref16 ref3 ref5">3, 16, 5</xref>
        ]. New tools combine LLMs with data exploration techniques [
        <xref ref-type="bibr" rid="ref14 ref7">7, 14</xref>
        ]. EDA
is similar to Discovery-Driven Exploration (DDE) of data cubes [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], essentially motivated by explaining
unexpected data in the result of a cube query. Gkesoulis et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] demonstrated how to enrich query
answering with a short data movie that provides highlights for the results of an OLAP query, albeit
with hard-coded highlights and without a general model. Many recent tools propose automating not
only data exploration but the whole data narration process [
        <xref ref-type="bibr" rid="ref18 ref19 ref4 ref6 ref7">4, 6, 18, 19, 7</xref>
        ].
      </p>
      <p>
        Defining highlights. There is no clear consensus on the terminology used for the case of highlights.
We believe that a clarification of the related concepts is one of the contributions of this paper. The term
insight is well adopted in the data management and data visualization communities [
        <xref ref-type="bibr" rid="ref16 ref5">16, 5</xref>
        ], and other
terms are also used, like discoveries [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], data facts [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or findings [
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ]. To the extent that insight
is “the act or result of apprehending the inner nature of things or of seeing intuitively” according to
Merriam-Webster, and, hence, a perception-based concept, we prefer to adopt the term highlights for
the discoveries made in data, together with the assessment of their significance.
      </p>
      <p>
        In the conceptual model of [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], highlights are the most striking, surprising or relevant facts of a
query result. Similar definitions are given in VOOL [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] (interesting facts in a cube) and DAISY [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] (a
table containing interesting data). In these works, however, the structure of highlights is not detailed.
In a large number of works [
        <xref ref-type="bibr" rid="ref16 ref3 ref4">3, 16, 4</xref>
        ], and most notably MetaInsight [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a highlight is characterized by
the subset of the data space that generates it (possibly with a group-by clause), the type of pattern
identified over a measure of this data sub-space, and a score of its importance. These definitions are
fairly similar in other tools: Calliope [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Erato [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Notable [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and InsightPilot [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Interestingness of highlights. Characterizing meaningful highlights in data has attracted a lot of
attention, since the seminal works on discovery driven exploration [
        <xref ref-type="bibr" rid="ref17 ref24">17, 24</xref>
        ] and knowledge discovery in
databases [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Often, this characterization takes the form of interestingness scores for retrieved data [26]
or patterns [27, 28]. Scoring a highlight is used to express its importance, or interestingness, for the user.
As explained in [26, 29], interestingness is manifold: scores can be computed for different dimensions
of interestingness. In the taxonomy proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], following [30], interestingness of highlights can
be characterized with human, system or data metrics. Chen et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] stress the importance of the
relationships between highlights for the selection of the more important ones, while other authors
exploit their significance, defined in terms of statistical tests [
        <xref ref-type="bibr" rid="ref3 ref4">31, 3, 32, 33, 34, 4</xref>
        ], Shapley values [35], or
information theory [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. A model for highlights</title>
      <p>In general, the process of automated highlight extraction (Fig. 1) applies a set of Highlight Extraction
Algorithms over a dataset, which can be, e.g., the result of a query. These algorithms operate like
pattern-matching testers, checking whether the data abide by a certain pattern or not. Examples of
possible questions such algorithms might ask are: (a) is there a bimodality in a time-series produced as
a query result, and if yes, which are the peaks?, (b) if we break down the total sum of sales by product
type, is there a “mega-contributor” product type with more than 40% of total sales, and if yes, which
is it?, (c) assuming we sum total sales per month and product, is there any month that systematically
outperforms all other months for all products, and if yes, which is it?</p>
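<p>Question (a) above can be sketched as such a pattern-matching tester. The following is a minimal, noise-free check for a single peak (real algorithms would tolerate fluctuations); the monthly totals are hypothetical:</p>

```python
def unimodal_peak(series):
    """Return the index of the single peak if `series` strictly rises then
    strictly falls, else None (a minimal sketch; a robust tester would
    tolerate noise and plateaus)."""
    peak = max(range(len(series)), key=series.__getitem__)
    rising = all(series[i] < series[i + 1] for i in range(peak))
    falling = all(series[i] > series[i + 1] for i in range(peak, len(series) - 1))
    return peak if rising and falling and 0 < peak < len(series) - 1 else None

monthly_totals = [800, 1300, 700]   # hypothetical marginal sums per month
print(unimodal_peak(monthly_totals))  # 1, i.e. the middle month
```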
      <p>Given a dataset, highlights are important properties of the entire dataset, extracted via the respective
algorithms, along with the necessary data and statistical properties, as answers to the above questions.
In this Section, we present a model for highlights, with Fig. 2 serving as its visual depiction.</p>
      <sec id="sec-3-1">
        <title>3.1. Supporting concepts</title>
        <p>A Dataset is a set of Facts, i.e., structured observations of a domain, under a Schema. Schemata
define a template internal structure for the facts of the dataset, and include a set of Features, each
with a domain of values. For the purpose of our model, features are of two kinds: Measure Types, for
measurable quantities of facts, and, Character Types for contextual, lookup dimensions of facts. In
our reference example, the query result is a Dataset with Month and City as its Character Types, and
Sales as its Measure.</p>
        <p>Characters are the main entities of the domain being modeled (which in the OLAP terminology
would be named as Dimension Members) and belong to the domain of Character Types. A Character
Type comes with the following characteristics: (a) an Id that uniquely identifies each Character, (b)
a Description that provides a textual, human-relatable description of the Character, and (c) a set
of characteristic properties that each Character Type carries with it. For example, assume that
the Character Type City has the structure (Id, Description, Area, Population). A possible
character would be Athens 〈101, “Athens Metropolitan Area”, 2928 km², 3.7M〉. A Measure Value is an
instance of a Measure Type and belongs to its domain.</p>
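<p>A minimal sketch of these supporting concepts follows; the class and field names echo the City example above (Id, Description, plus characteristic properties), but are otherwise our own naming, not the paper's formal notation:</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharacterType:
    name: str          # e.g. "City"
    properties: tuple  # names of the characteristic properties it carries

@dataclass(frozen=True)
class Character:
    ctype: CharacterType
    id: int            # uniquely identifies the Character
    description: str   # human-relatable description
    properties: dict   # property name -> value

city = CharacterType("City", ("Area", "Population"))
athens = Character(city, 101, "Athens Metropolitan Area",
                   {"Area": 2928, "Population": 3_700_000})
print(athens.description)  # Athens Metropolitan Area
```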
        <p>To be able to discriminate between attributes and values playing different roles in the same highlight,
we first need to introduce the concept of a Role as a template that annotates a class with specific
attributes, specifically: (a) a unique name, and (b) an accompanying textual description. The Main Measure
Role and the Explanator Role, introduced in the sequel, are incarnations of roles, each with a name and
textual information that describe their function and discriminate them from other Roles.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Holistic Highlights</title>
        <p>We start our discussion at the Metamodel level, which involves abstract concepts, not bound to any
specific dataset. We will handle how the abstract concepts are materialized into discoveries of significance
for a specific dataset at the Model level, introduced in the sequel.</p>
        <p>An Archetype Property is a specific pattern of internal relationships that can appear between the
contents of a dataset. We assume an extensible set of Archetype Properties, each with well-known
semantics.</p>
        <p>Examples of Archetype Properties include the existence of a trend in a timeseries, the correlation of
two Measure Types, the existence of mega-contributors, etc. One can think of Archetype Properties
as hypotheses that are to be verified when tested over a specific dataset. As we will see in the sequel,
whenever a data set supports an Archetype Property, a Holistic Highlight is born as a testimony for the
validation of the hypothesis (Fig. 3).</p>
        <p>Since we work in the area of data analysis, each Archetype Property operates over a Measure Type
of a dataset. The Main Measure Role of the Archetype Property is a placeholder that captures the details
and constraints of the studied Measure Type. For example, when testing for mega-contributors, where
a certain data point contains a large percentage of a marginal sum, we need a Main Measure that is
additive, so that summation is allowed.</p>
        <p>Similarly, for the Archetype Property to eventually create a highlight, certain Features of a data set
need to be reserved for specific roles. We reserve an Explanator Role for each such Feature. For
example, when studying a time series trend, apart from the Main Measure being studied, we need
to specify a feature playing the role of a sorter that defines the time ordering of the measure values
(equivalently: what would be in the x-axis of a line chart, if the y-axis would be the measure). For
instance, a Sales dataset can have several time attributes, e.g., Day, Month, and Year,
while the testing of the trend needs to be done with respect to one of them.</p>
        <p>To test the hypothesis that an Archetype Property poses, we need a family of generic, parameterizable
Highlight Extraction Algorithms, or Algorithms for short, that are to be applied over incoming
datasets. For example, algorithms like Shapiro-Wilk or Kolmogorov-Smirnov test the normality of the
distribution of a measure; Pearson, Spearman or Kendall assess correlation; Difference-Precomputation
or Boolean-based tests can compute Uni/Bi-modality, etc. In all these cases, there are several candidate,
schema-agnostic algorithms to be applied over any possible Dataset, in order to check an Archetype
Property.</p>
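<p>As a concrete instance of such a schema-agnostic algorithm, here is a plain-Python Pearson correlation tester for the Correlation Archetype Property (a sketch only; production tools would use tested library implementations such as scipy's, which also report p-values):</p>

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two measure-value sequences.
    Operates on any pair of numeric columns, regardless of schema."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 1.0 (perfect linearity)
```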
        <p>All these algorithms have a set of parameters that have to be fixed whenever executed, as well as a
result. At the meta-level, parameters are modeled as Parameter Roles, and results as Result Types.
Assume, for example, a linear regression algorithm that can optionally take a pre-specified intercept as
parameter: in this case, the Algorithm’s signature at the meta-level contains the respective Parameter
Role. Result Types need the actual Results to have been introduced, so we defer their discussion for the
next paragraph.</p>
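<p>The linear-regression example above can be sketched as follows: the optional <code>intercept</code> argument plays the Parameter Role in the Algorithm's signature (the function name and its closed-form fit are our own sketch, not the paper's implementation):</p>

```python
def slr(xs, ys, intercept=None):
    """Simple linear regression returning (intercept, slope).
    `intercept`, when given, is a Parameter Instantiation of the optional
    pre-specified-intercept Parameter Role: only the slope is then fitted."""
    n = len(xs)
    if intercept is None:
        mx, my = sum(xs) / n, sum(ys) / n
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        return my - slope * mx, slope
    # Least squares for the slope alone, with the intercept held fixed.
    slope = sum(x * (y - intercept) for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return intercept, slope

print(slr([1, 2, 3], [2, 4, 6]))       # (0.0, 2.0) - both fitted
print(slr([1, 2, 3], [3, 5, 7], 1.0))  # (1.0, 2.0) - intercept fixed
```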
        <p>Whenever an entire Dataset is tested for the existence of an Archetype Property in its contents via a
specific algorithm, the resulting verdict of the algorithm’s execution along with important facts and
their interrelationships that verify the Archetype Property constitute a Holistic Highlight.</p>
        <p>A Holistic Highlight is a significant, structured testimony for the existence of an Archetype
Property over a specific dataset that is automatically tested via a dedicated algorithm and characterized
accordingly.</p>
        <p>Holistic Highlights characterize an entire dataset with respect to an Archetype Property and not
specific data points. A couple of illustrative examples follow:
1. Distribution. The distribution of the values of a certain Measure Type, say M, follows the Normal
distribution model. The test was performed by a Shapiro-Wilk test and the p-value is 10⁻⁴.
2. Correlation. The correlation of a Measure Type, say M, with another Measure Type M′, is
characterized as significant. The test was performed via a Kendall test and the tau-value is 0.83.</p>
        <p>A Holistic Highlight materializes an Archetype Property over a Dataset. The rest of the elements of
a Holistic Highlight are structured and instantiated in direct correspondence to the elements of the
Archetype Property they testify. Hence, a Holistic Highlight concerns a Main Measure, i.e., a specific
Measure Type of its Dataset, and a set of Explanators, i.e., specific Features. Moreover, a Holistic
Highlight comes with an Algorithm Execution, (i.e., the execution of a specific Algorithm among the
Algorithms of its Archetype Property), with each Parameter Role being assigned a concrete Parameter
Instantiation, and producing a concrete Result.</p>
        <p>The Result of an Algorithm Execution is a structured testimony that tells us whether the Archetype
Property exists or not. To define results, we first need to introduce an illustrative example. Assume
we run a Simple Linear Regression over a dataset measuring the sales of cities as a function of the
population in thousands. The result of an algorithm assessing what is the best linear relationship
possible, will produce a technical result containing, among others:
• The constituents of the resulting Model, specifically, an Intercept and a Slope.
• A set of auxiliary metrics characterizing the result; in the case of an SLR these will include the MSE, R², and a p-value.</p>
        <p>[Figure 3: Examples of holistic highlights]</p>
        <p>Moreover, based on these results, we can have a qualitative assessment of the result – e.g., when
p-value &lt; 0.05 and R² &gt; 0.7, we can call the hypothesis of a linear relationship verified.
Conceptually, since a result is the automated, data-based assessment for the existence of an Archetype
Property in a data set, we can report the result of this assessment – equiv., the verification of whether the
Archetype Property holds, and to what extent– via a predicate. In fact, we can have two predicates, a
Simple Qualitative Report and a Detailed Report:
• A Simple Qualitative Report is a generic predicate, common to all Result Types, of the form:
holds(Dataset, Main Measure, Algorithm) : Boolean,
e.g., linearity(Q, Sales, SLR) → true
• A Detailed Report customizes the predicate according to the model of the algorithm and extends
the list of generic variables with the technical results:
holds(..., intercept, slope, R², p-value, MSE) : Boolean,
e.g., linearity(Q, Sales, SLR, 756, 0.5, 0.87, 4·10⁻⁶, 900) → true</p>
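<p>A sketch of how the two report predicates could be represented in code (the function names and the SLR figures, taken from the example above, are illustrative, not an API of any described system):</p>

```python
def simple_report(dataset, main_measure, algorithm, verdict):
    """Simple Qualitative Report: the generic predicate shared by all Result Types."""
    return {"dataset": dataset, "measure": main_measure,
            "algorithm": algorithm, "verdict": verdict}

def detailed_report(dataset, main_measure, algorithm, verdict, **technical):
    """Detailed Report: extends the generic variables with the model's
    technical results (e.g. intercept, slope, R2, p-value, MSE for an SLR)."""
    return {**simple_report(dataset, main_measure, algorithm, verdict), **technical}

r = detailed_report("Q", "Sales", "SLR", True,
                    intercept=756, slope=0.5, r2=0.87, p_value=4e-6, mse=900)
print(r["verdict"], r["r2"])  # True 0.87
```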
        <p>To support Results at the Metamodel level, Result Types come with (a) a static Simple Qualitative
Report, common to all Result Types, (b) a Model, which internally holds all the essential attributes for
the result quantification (e.g., an intercept and a slope for the SLR), (c) a set of Auxiliary Metrics, each
with a name and a definition.</p>
        <p>
          Finally, Holistic Highlights are assessed and labeled for their significance. The significance of a
result can be evaluated via an extensible palette of (a) characteristics directly measurable from a query
result, or, (b) session-level characteristics – for both cases, see e.g., the families of Novelty, Relevance,
Peculiarity and Surprise [36, 37] as well as Commonness and Exception [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Any significance facet can
be assessed either via an arithmetic score or an enumerated label, but in any case, it should have
an ordinal domain, such that different highlights can be compared to each other. To this end, we
introduce (a) at the Metamodel level, Score Types having names with well-known semantics and
ordinal domains, and (b) at the Model level, Scores, such that the individual highlights can be annotated
with &lt;Score Type, Score&gt; pairs.
        </p>
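<p>The ordinal-domain requirement can be sketched as follows: even an enumerated label domain stays comparable, so highlights can be ranked. The label set below is hypothetical:</p>

```python
from functools import total_ordering

@total_ordering
class Score:
    """A Score whose enumerated domain is ordinal: labels compare by
    their position in DOMAIN, so highlights can be ranked against each other."""
    DOMAIN = ("low", "medium", "high")  # hypothetical ordinal labels

    def __init__(self, score_type, label):
        self.score_type, self.label = score_type, label

    def __eq__(self, other):
        return self.label == other.label

    def __lt__(self, other):
        return self.DOMAIN.index(self.label) < self.DOMAIN.index(other.label)

print(Score("Surprise", "high") > Score("Surprise", "low"))  # True
```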
        <p>A possible textual description of a Holistic Highlight is as follows:</p>
        <p>The &lt; ℎ  &gt; for &lt;    &gt; in &lt;  &gt;, tested via
&lt; ℎ  &gt; and supported by {}⋆, results in &lt;  &gt;
with {&lt;   &gt;=&lt;  &gt;}⋆.</p>
        <p>Last but not least, a Holistic Highlight can include a set of details in the form of Elementary Highlights,
to be discussed in the sequel.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Elementary Highlights</title>
        <sec id="sec-3-3-1">
          <title>3.3.1. Model level</title>
          <p>In some cases, like for instance in the case of modality peaks, top-k values, or peer-domination, it is
possible that there exist Facts, i.e., specific members of the Dataset, that play specific roles to facilitate
the verification of the Archetype Property in the context of a Holistic Highlight. Depending on the
Archetype Property, these can be its important details, without which any reporting is useless. For
instance, an analyst needs to know at which time point the time series reaches a peak. In other cases,
e.g., in the cases of a distribution check, or the correlation of two Measure Types, these details are not
present.</p>
          <p>An Elementary Highlight is a fact, determined by the combination of a set of contextualizing
Characters demonstrating a behavior measured by a Measure Value, which plays an important role
to the interpretation of a Holistic Highlight. The fundamental difference from Holistic Highlights is
that instead of annotating the entire Dataset with an Archetype Property, Elementary Highlights refer
to specific data points. Two examples (see also Fig. 4):
• Top-k. Character Athens, with a Measure Value of 2100 for Total Sales, is in the top-k Facts for a
dataset. Its rank=1 is a score that denotes how high this particular fact is in the list of top-k facts.
• Unimodality peak. The time-series of Athens has a unimodality peak for the Fact determined by
the combination of Characters City=Athens and Month=2023-05 and a Measure Value of 1000
for Total Sales.</p>
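<p>The top-k example above can be sketched as an extractor that turns facts into Elementary Highlights, with rank as the score. The Athens figure matches the working example; the other cities and their values are invented for illustration:</p>

```python
def top_k_highlights(facts, measure, k=3):
    """Derive top-k Elementary Highlights from a list of facts (dicts mixing
    Character values and Measure Values); rank serves as the score."""
    ranked = sorted(facts, key=lambda f: f[measure], reverse=True)[:k]
    return [{"characters": {c: v for c, v in f.items() if c != measure},
             "measure_value": f[measure], "rank": i + 1}
            for i, f in enumerate(ranked)]

facts = [{"City": "Athens", "TotalSales": 2100},   # from the working example
         {"City": "Patras", "TotalSales": 400},    # hypothetical
         {"City": "Larissa", "TotalSales": 300}]   # hypothetical
print(top_k_highlights(facts, "TotalSales", k=1))  # Athens ranked 1st
```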
          <p>Scores are handled in a symmetrical manner to Holistic Highlights.</p>
          <p>A possible textual description of an Elementary Highlight is as follows:</p>
          <p>The combination of characters {&lt;Character&gt;}⋆ with value &lt;Measure Value&gt;
serves as &lt;Elementary Highlight Role&gt;
with &lt;Score Type&gt; = &lt;Score&gt;.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Metamodel level</title>
          <p>In full accordance with the Model Level, each Archetype Property comes with a set of characteristic
Elementary Highlight Roles. An Elementary Highlight Role has a set of identifying Character
Roles and a Measure Role, for the specific Elementary Highlights to materialize. Moreover, it carries
Score Types with formulae for individual scores.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>Who are the beneficiaries of the model? A first contribution of the proposed conceptual model
is that it clarifies both the concepts and the terminology for data storytelling, for every stakeholder
involved. Concerning data analysts, the conceptual model allows the structuring of important parts
of the problem in a way that is exploitable later: as soon as Archetype Properties, Characters and
important Measure Values become part of a structured solution, the data analyst can think on the
problem in terms of them (e.g., “What are the main Archetype Properties hiding in my data? Who are
the main Characters in these data?"). Concerning tool builders, it is absolutely feasible to direct the
automation of algorithm execution and result structuring along the concepts of our model; once this
automated extraction and representation of highlights is achieved, their exploitation for storytelling
purposes is straightforward (see Fig. 3 and 4).</p>
      <p>How realistic is the proposed model? Fig. 3 and 4 report on a large number of typical data analysis
algorithms. The fact that, for all these heterogeneous algorithms, there is a straightforward translation
to a common, structured result, immediately exploitable for data storytelling purposes, testifies in favor
of the model proposed in this paper.</p>
      <p>Furthermore, as shown in Fig. 5, the Holistic and Elementary Highlights proposed in this paper
cover the ones reported in the literature for tools automatically producing highlights. All of them deal
with a dataset (typically a query result) and, excepting VOOL and DAISY, they distinguish at least one
measure and a breakdown dimension (there are no other types of explanators). Many highlight types
are proposed, and there is a set of underlying extraction algorithms, albeit not thoroughly modeled.
Similarly, all tools score highlights, but scores are frequently not part of the highlight. Highlight models
are typically not represented or are limited to one parameter. Finally, although many tools distinguish
a subset of breakdown values as focus, or a set of important facts, which correspond to Elementary
Highlights, no work provides details on them.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Application of the model</title>
      <p>We have implemented our model in two tools. The first implementation involves a data profiler tool
that automatically profiles the columns of submitted datasets for their descriptive statistics, histograms,
correlations, decision trees, outliers and dominance patterns, along with the respective highlights. The
second implementation involves the extension of a Business Intelligence system with a new subsystem
that answers time series queries with a data story based on highlights. In this section we describe the
second tool, as an illustration of the application of our model.</p>
      <p>The analyst is allowed to define a time-related query as a chart that needs to be constructed, and
the system responds by (a) obtaining the result of the query, (b) enriching its result with (b1) results of
auxiliary queries and (b2) highlights from the highlight extraction algorithms, applied over the query
results, and, (c) ultimately, returning all the interesting findings wrapped as a data story composed of
text and graphical representations.</p>
      <p>The main steps of the process (see Figure 6) are the following:
1. Using the Query generator GUI, the analyst specifies the intended query as a chart, which
can be a bar-chart, line-chart or scatter-plot. This means that, practically, the query will have
two grouper attributes (dimension levels) that will be visualized on the two axes of the chart,
and, consequently, an aggregated measure. Filters are also part of the specification. The Query
generator module automatically generates a set of auxiliary queries in order to contextualize
and assess the results of the original query (see next).
2. All the queries are executed by the Query server module, over the underlying data set, and their
results are produced.
3. The results can be immediately visualized but, most importantly, they are passed through a set
of Highlight extractors. This is the part that is mostly related to our deliberations in this paper.
Highlights include the existence of a trend, unimodality, and bimodality in time-series query results
(where one grouper is time), the existence of strong linearity in the results as demonstrated by
a strong linear regression score, and others.
4. A Story-maker module receives query results and highlights, and compares and combines them
into a data story with automatically generated charts and text (the explanation of this part falls
outside the scope of this paper).
1Available at https://github.com/DAINTINESS-Group/Pythia
2Available at https://github.com/DAINTINESS-Group/DelianCubeEngine</p>
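<p>As a rough sketch of the four steps above (in Python, with invented module and column names; this is not the actual code of the tools), the pipeline can be wired together as follows:</p>

```python
# Sketch of the Fig. 6 pipeline: query generation -> execution ->
# highlight extraction -> (story making, omitted). All names are illustrative.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class QuerySpec:
    groupers: Tuple[str, str]              # two dimension levels (the chart axes)
    measure: str                           # the aggregated measure
    filters: Dict[str, str] = field(default_factory=dict)

def generate_queries(original: QuerySpec) -> List[QuerySpec]:
    """Step 1: the original query plus auxiliary (e.g., sibling) queries."""
    auxiliaries: List[QuerySpec] = []      # a real generator relaxes each filter
    return [original] + auxiliaries

def execute(query: QuerySpec, data: List[dict]) -> List[Tuple[str, float]]:
    """Step 2: a toy query server; group by the first grouper, sum the measure."""
    acc: Dict[str, float] = {}
    for row in data:
        if all(row.get(k) == v for k, v in query.filters.items()):
            key = row[query.groupers[0]]
            acc[key] = acc.get(key, 0.0) + row[query.measure]
    return sorted(acc.items())

def extract_highlights(series, extractors: Dict[str, Callable]) -> Dict[str, float]:
    """Step 3: run every highlight extractor over the result; keep its score."""
    return {name: extractor(series) for name, extractor in extractors.items()}
# Step 4 (Story-maker) would combine results and highlights into text and charts.
```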
      <p>Queries. We consider several query classes over cubes in our setup. The most relevant query
class is Time-series queries. For this query class, the first grouper (obligatorily) concerns a level of a
time-related dimension (e.g., month, year, decade). This allows the result to be a timeseries of aggregate
measurements for the second grouper (and, thus, visualized as a line-chart, with time on the horizontal
axis). The relationship of groupers with filters can be arbitrary. The general form of the query is</p>
      <p>q = γ_{D1.L1, D2.L2}^{agg(m)}(σ_φ(C)), φ: ⋀ D.L = v</p>
      <p>where C is a cube name, σ_φ is the selection operator applying the conjunctive selection condition φ to
C, D refers to a cube dimension, L to a dimension level, v to a value in dom(D.L), m is a measure,
agg is the aggregate function applied to it, and γ is the group-by operator with the two grouper levels
as subscripts (the first being time-related) and the aggregated measure as superscript.</p>
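<p>A minimal sketch of this query form in Python (assuming the cube is stored as a list of fact records and agg is fixed to sum; the cube contents and column names below are invented for illustration):</p>

```python
from collections import defaultdict

# Sketch of q = γ_{D1.L1, D2.L2}^{agg(m)}(σ_φ(C)) with agg = sum.
def cube_query(cube, groupers, measure, phi):
    """Apply σ_φ (conjunctive equality filters), then γ (group-by + sum)."""
    acc = defaultdict(float)
    for fact in cube:
        if all(fact[level] == value for level, value in phi.items()):  # σ_φ
            acc[tuple(fact[g] for g in groupers)] += fact[measure]     # γ, agg=sum
    return dict(acc)
```

<p>For instance, grouping a toy sales cube by (month, country) under the filter country = Greece yields one aggregate value per month.</p>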
      <p>A special case of time-series queries occurs when the second grouper has been pinned to a single
value, i.e., φ: D2.L2 = v. In other words, the first grouper concerns time, and the second grouper
includes a filter at the same level as the grouper, resulting in a single timeseries as the query result.</p>
      <p>
        The construction of sibling queries exploits the dimension hierarchies of the involved cubes to contrast
the result of the original query to the results of “peer” queries. To avoid long formalities, here, we restrict
ourselves to an indicative example and refer the interested reader to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for a rigorous definition and
discussion. In the example of the introduction, the selection condition of the original query restricts the
country to Greece and the time to 2023-Q2. We assume that all such values pertain to dimension
hierarchies, creating trees of values with ancestor and child values: e.g., the mother of Greece (at the
Country level) is Europe (at the Continent level), and the mother of 2023-Q2 is 2023. To generate
siblings, we need to construct two auxiliary queries, one per filter: the first is based on contrasting
Greece to all its siblings, i.e., children(mother(Greece)) = siblings(Greece) at the country level (all
European countries), whereas the second contrasts 2023-Q2 to its siblings, i.e., all the quarters of
2023. Both (a) the results and (b) the highlights of sibling queries can be later contrasted for similarities
and exceptions during the story making process.
      </p>
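<p>The sibling construction can be sketched as follows (a toy hierarchy with invented values; see [2] for the rigorous definition):</p>

```python
# Toy dimension hierarchy: siblings(v) = children(mother(v)),
# as in the Greece / 2023-Q2 example. Values are invented.
PARENT = {
    "Greece": "Europe", "France": "Europe", "Italy": "Europe",
    "2023-Q1": "2023", "2023-Q2": "2023", "2023-Q3": "2023", "2023-Q4": "2023",
}

def siblings(value):
    """All values sharing the same parent in the dimension hierarchy."""
    parent = PARENT[value]
    return sorted(v for v, p in PARENT.items() if p == parent)

def sibling_queries(filters):
    """One auxiliary query per filter: replace the pinned value by its siblings."""
    return [{**filters, level: siblings(value)} for level, value in filters.items()]
```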
      <p>Highlights. We test query results for archetype properties via a set of highlight extraction algorithms,
applied to both the original and the auxiliary queries. We have implemented the following highlight
types, along with their respective extraction algorithms, albeit specifically tailored for the aforementioned
classes of time-series queries:
• AbsoluteTrend, to detect whether a time-series is absolutely monotonically increasing (uptrend)
or decreasing (downtrend), or neither of them.
• KendallBased, to detect whether a timeseries is monotonically increasing (uptrend) or decreasing
(downtrend) based on the Kendall tau coefficient.
• Contributor, to detect whether a timeseries has a value at the x-axis that contributes more than
50% (Mega contributor) to the produced results.
• Modality, to detect whether a Unimodality or Bimodality shape governs the timeseries. A
timeseries is considered unimodal when it forms a U-shaped valley or peak shape. A timeseries is
considered bimodal when it has two distinct peaks or modes in its distribution.
• Regression, to perform a Simple Linear Regression.</p>
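<p>For illustration, two of these extractors can be sketched as follows (simplified re-implementations in Python, not the tools' actual code):</p>

```python
# Sketches of the AbsoluteTrend and Contributor extractors over a list of
# aggregate measurements ordered by time.
def absolute_trend(values):
    """'uptrend' if strictly monotonically increasing, 'downtrend' if strictly
    decreasing, 'none' otherwise."""
    if all(a < b for a, b in zip(values, values[1:])):
        return "uptrend"
    if all(a > b for a, b in zip(values, values[1:])):
        return "downtrend"
    return "none"

def mega_contributor(values):
    """Index of the x-axis position contributing more than 50% of the total
    (the Mega contributor), or None if no such position exists."""
    total = sum(values)
    for i, v in enumerate(values):
        if total and v / total > 0.5:
            return i
    return None
```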
      <p>
        Score. To determine the importance of the results of the model, we introduce a score function for each
algorithm, producing a score in the range [-1, 1]:
• AbsoluteTrend score: the score equals -1 if there is an absolute downtrend, 0 if there is no
absolute trend, and 1 if there is an absolute uptrend.
• KendallBased score: score = τ, where τ is Kendall’s tau coefficient.
• Contributor score: the score is the ratio of the maximum measure of a grouper’s values over the
sum of measures across all grouper instances.
• Modality score: the following steps determine the modality score for the time series:
– Divide the series into segments from the start to each point where the sign of y_i − y_{i−1}
changes.
– For each segment, we compute its score as a ratio of absolute differences of its measurements
over their absolute magnitude (with an offset in the denominator to avoid zero divisions).
– Finally, we compute the score for the timeseries as the average of the scores of all segments:
score = (1/n) Σ_{i=0..n−1} score(s_i), where n is the number of segments.
• Regression score: the formula score = 1 − |tanh(MSE)| (normalized Mean Square Error via the
hyperbolic tangent) of the Simple Linear Regression.
      </p>
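<p>As an indication, the KendallBased and Regression scores can be sketched as follows (a simplified Python rendering; tie handling and the exact normalization may differ in the actual tool):</p>

```python
import math

def kendall_tau(values):
    """Kendall's tau of a series against time: (concordant - discordant) pairs
    over the total number of pairs. Lands in [-1, 1]."""
    n = len(values)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    concordant = sum(1 for i, j in pairs if values[j] > values[i])
    discordant = sum(1 for i, j in pairs if values[j] < values[i])
    return (concordant - discordant) / (n * (n - 1) / 2)

def regression_score(mse):
    """score = 1 - |tanh(MSE)|: close to 1 for a tight linear fit (small MSE),
    approaching 0 as the fit error grows."""
    return 1.0 - abs(math.tanh(mse))
```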
      <p>
        Score-based pruning. To relieve the user of the burden of going through all the model results,
we need to identify highlights with a significant score, i.e., above a pre-determined threshold.
As the score range is always [-1, 1], and we do recognize that some negative scores are important
(for example, a downtrend), we employ the absolute value of every score as the criterion of its significance.
In all our deliberations, we employ a threshold t = 0.5 for the absolute score, above which a model is
considered to be an important highlight.
      </p>
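<p>The pruning rule itself is a one-liner; a sketch follows (the highlight names in the usage example are invented):</p>

```python
# Score-based pruning: keep a highlight only if the absolute value of its
# score exceeds the threshold t = 0.5.
THRESHOLD = 0.5

def prune(highlights, threshold=THRESHOLD):
    """highlights: mapping name -> score in [-1, 1]; keep the significant ones."""
    return {name: score for name, score in highlights.items()
            if abs(score) > threshold}
```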
      <p>As a side note, with respect to the characterizations of interestingness along different dimensions, all
the aforementioned scores are related to peculiarity: the more intensely present a property is, the more
extreme the score the data receive. Thus, subsets of the data space stand out with a higher score than
the rest of the data, because of their high adherence to the archetype property (correlation, modality,
mega-contributor presence, etc.).</p>
      <p>In summary. An observant reader will have already noticed how all these inherently different
algorithms (regression, modalities, contributors) are all modeled within the same tool and with the
same highlights model. The handling of this heterogeneity via a single, common, and, as this section
shows, feasible model is one of the main contributions of this paper.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this paper, we have presented a model that facilitates (a) the modeling, (b) the clarification of
terminology, and (c) the automation of the production of highlights, i.e., structured characterizations of
subsets of the data space that are worth reporting due to their support of archetype statistical properties
of interest, demonstrated as phenomena. We have introduced the most important entities of the domain
of highlight extraction and discussed their inter-relationships. Holistic highlights are global properties
that pertain to an entire set of facts, whereas elementary highlights are their constituents, identifying
facts that play a particular role in the holistic highlight taking place. We have also demonstrated
that (a) frequently encountered archetype properties are nicely covered by our modeling and (b) the
highlight structure facilitates narratives straightforwardly.</p>
      <p>The evaluation of highlight interestingness in a fully automated way, such that we can rank and prune
highlights in an even more precise and context-aware fashion, is an open research issue. Structuring
data stories in an efficient way, by taking advantage of the complementarity, overlap, discrepancy, or
other properties of a set of highlights, is another open research issue.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>[26] P. Marcel, V. Peralta, P. Vassiliadis, A framework for learning cell interestingness from cube
explorations, in: ADBIS, 2019.
[27] L. Geng, H. J. Hamilton, Interestingness measures for data mining: A survey, ACM Comput. Surv.
(2006).
[28] T. D. Bie, Subjective interestingness in exploratory data mining, in: IDA, 2013.
[29] T. Milo, A. Somech, Automating exploratory data analysis via machine learning: An overview, in:
SIGMOD, 2020.
[30] Y. Patil, S. Amer-Yahia, S. Subramanian, Designing the evaluation of operator-enabled interactive
data exploration in VALIDE, in: HILDA, 2022.
[31] H. Guo, S. R. Gomez, C. Ziemkiewicz, D. H. Laidlaw, A case study using visualization interaction
logs and insight metrics to understand how analysts arrive at insights, IEEE Trans. Vis. Comput.
Graph. (2016).
[32] E. Zgraggen, Z. Zhao, R. C. Zeleznik, T. Kraska, Investigating the effect of the multiple comparisons
problem in visual analysis, in: CHI, 2018.
[33] M. Joglekar, H. Garcia-Molina, A. G. Parameswaran, Interactive data exploration with smart
drill-down, IEEE Trans. Knowl. Data Eng. (2019).
[34] A. Giuzio, G. Mecca, E. Quintarelli, M. Roveri, D. Santoro, L. Tanca, INDIANA: an interactive
system for assisting database exploration, Inf. Syst. (2019).
[35] D. Deutch, A. Gilad, T. Milo, A. Somech, Explained: Explanations for EDA notebooks, Proc. VLDB
Endow. (2020).
[36] D. Gkitsakis, S. Kaloudis, E. Mouselli, V. Peralta, P. Marcel, P. Vassiliadis, Assessment methods for
the interestingness of cube queries, in: DOLAP, 2023.
[37] D. Gkitsakis, S. Kaloudis, E. Mouselli, V. Peralta, P. Marcel, P. Vassiliadis, Cube query
interestingness: Novelty, relevance, peculiarity and surprise, Inf. Syst. 123 (2024) 102381. URL:
https://doi.org/10.1016/j.is.2024.102381. doi:10.1016/J.IS.2024.102381.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Meliou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abouzied</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Haque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Mai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vittis</surname>
          </string-name>
          ,
          <article-title>Data management perspectives on prescriptive analytics (invited talk)</article-title>
          ,
          <source>in: ICDT</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gkesoulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vassiliadis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manousis</surname>
          </string-name>
          , Cinecubes:
          <article-title>Aiding data workers gain insights from OLAP queries</article-title>
          ,
          <source>Inf. Syst</source>
          .
          <volume>53</volume>
          (
          <year>2015</year>
          )
          <fpage>60</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Tang</surname>
          </string-name>
          , S. Han,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Yiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Extracting top-k insights from multi-dimensional data</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Cui,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          , D. Zhang, Datashot:
          <article-title>Automatic generation of fact sheets from tabular data</article-title>
          ,
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>26</volume>
          (
          <year>2020</year>
          )
          <fpage>895</fpage>
          -
          <lpage>905</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ding</surname>
          </string-name>
          , S. Han, D. Zhang, MetaInsight:
          <article-title>Automatic discovery of structured knowledge for exploratory data analysis</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cao</surname>
          </string-name>
          , Calliope:
          <article-title>Automatic visual data story generation from a spreadsheet</article-title>
          ,
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>27</volume>
          (
          <year>2021</year>
          )
          <fpage>453</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , S. Han, D. Zhang, InsightPilot:
          <article-title>An llm-empowered automated data exploration system</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Papaemmanouil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <article-title>Overview of data exploration techniques</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O. B.</given-names>
            <surname>El</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Milo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Somech</surname>
          </string-name>
          , ATENA:
          <article-title>an autonomous system for data exploration based on deep reinforcement learning</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Personnaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amer-Yahia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Berti-Équille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fabricius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <article-title>DORA THE EXPLORER: exploring very large data with interactive deep reinforcement learning</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>De Bie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>De Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hernández-Orallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Hoos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K. I.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Automating data science</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>65</volume>
          (
          <year>2022</year>
          )
          <fpage>76</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Marcel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peralta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amer-Yahia</surname>
          </string-name>
          ,
          <article-title>Data narration for the people: Challenges and opportunities</article-title>
          , in: EDBT,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Amer-Yahia</surname>
          </string-name>
          ,
          <article-title>Intelligent agents for data exploration</article-title>
          ,
          <source>VLDB Endow</source>
          .
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <fpage>4521</fpage>
          -
          <lpage>4530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lipman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Milo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Somech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolfson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <article-title>LINX: A language driven generative system for goal-oriented automated data exploration</article-title>
          ,
          <source>in: EDBT</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>O. B.</given-names>
            <surname>El</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Milo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Somech</surname>
          </string-name>
          ,
          <article-title>Automatically generating data exploration sessions using deep reinforcement learning</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ding</surname>
          </string-name>
          , S. Han, Y. Xu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , D. Zhang, Quickinsights:
          <article-title>Quick and automatic discovery of insights from multi-dimensional data</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Megiddo</surname>
          </string-name>
          ,
          <article-title>Discovery-driven exploration of OLAP data cubes</article-title>
          ,
          <source>in: EDBT</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cao</surname>
          </string-name>
          , Erato:
          <article-title>Cooperative data story editing via fact interpolation</article-title>
          ,
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>29</volume>
          (
          <year>2023</year>
          )
          <fpage>983</fpage>
          -
          <lpage>993</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Y. Wu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , Notable:
          <article-title>On-the-fly assistant for data storytelling in computational notebooks</article-title>
          ,
          <source>in: CHI</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F. E.</given-names>
            <surname>Outa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Francia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Marcel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peralta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vassiliadis</surname>
          </string-name>
          ,
          <article-title>Towards a conceptual model for data narratives</article-title>
          ,
          <source>in: ER</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Andrienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Andrienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Turkay</surname>
          </string-name>
          ,
          <article-title>Supporting story synthesis: Bridging the gap between visual analytics and storytelling</article-title>
          ,
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>26</volume>
          (
          <year>2020</year>
          )
          <fpage>2499</fpage>
          -
          <lpage>2516</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Francia</surname>
          </string-name>
          , E. Gallinucci,
          <string-name>
            <given-names>M.</given-names>
            <surname>Golfarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <article-title>VOOL: A modular insight-based framework for vocalizing OLAP sessions</article-title>
          ,
          <source>Inf. Syst</source>
          .
          <volume>129</volume>
          (
          <year>2025</year>
          )
          <fpage>102496</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <article-title>Data-driven insight synthesis for multi-dimensional data</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <fpage>1007</fpage>
          -
          <lpage>1019</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          ,
          <article-title>User-adaptive exploration of multidimensional data</article-title>
          ,
          <source>in: VLDB</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          ,
          <article-title>Fast algorithms for mining association rules in large databases</article-title>
          ,
          <source>in: VLDB</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>