=Paper=
{{Paper
|id=Vol-3931/paper2
|storemode=property
|title=Finding comparison insights in multidimensional datasets
|pdfUrl=https://ceur-ws.org/Vol-3931/paper2.pdf
|volume=Vol-3931
|authors=Patrick Marcel,Claire Antoine,Alexandre Chanson,Nicolas Labroche
|dblpUrl=https://dblp.org/rec/conf/dolap/MarcelACL25
}}
==Finding comparison insights in multidimensional datasets==
Claire Antoine¹,†, Alexandre Chanson¹, Nicolas Labroche¹ and Patrick Marcel²,*,‡

¹ LIFAT, Université de Tours, France
² LIFO, Université d’Orléans, France
Abstract
What are the best comparisons that can be found in a multidimensional dataset? In this paper, we propose to support exploratory data
analysis by answering this question, resorting to both input space sampling and output space sampling. We sample both the dataset
and the set of aggregate queries that a user would have to ask to find significant comparisons that are frequent in the data cube of
the dataset. Our tests show that our approach is effective in retrieving significant comparisons, and that comparison insights can be
computed in seconds even for fairly large datasets, if the number of values compared is small.
Keywords
Exploratory data analysis, Comparison queries, Data cube, Statistical tests, Sampling
1. Introduction

Exploratory Data Analysis (EDA) is the notoriously tedious activity of Data Science consisting of interactively analyzing a dataset to gain insights. According to De Bie et al. [1], EDA poses the greatest challenges for automation, since background knowledge and human judgment are the keys to success. Recently, many approaches have been proposed to support EDA, including approaches to automatically generate EDA sessions, often defined as maximization problems (see, e.g., [2, 3, 4, 5]).

In this work, we target a specific type of insight, comparisons, which are very popular among data workers (see, e.g., [6, 7, 8]). For a given multidimensional dataset 𝑅 with a categorical attribute 𝐴 and a numerical attribute 𝑀, a comparison insight is noted 𝑎 < 𝑏, where 𝑎 and 𝑏 are values in the active domain of 𝐴, and is such that there is enough evidence that the value of 𝑀 for 𝐴 = 𝑎 is less than the value of 𝑀 for 𝐴 = 𝑏.

Example 1.1. Consider the excerpt of the flight dataset displayed in Table 1, loaded in a relational database. Suppose the analyst is interested in comparing airlines (the categorical attribute 𝐴) to each other regarding the average departure delay (the numerical attribute 𝑀). To do so, the analyst could for instance run a SQL query selecting two airlines, say WN and OO (the values 𝑎 and 𝑏), grouping by airline and some other attributes (e.g., departure airport), and aggregating measure departure delay. This implies a full table scan, which may be prohibitively costly for large tables, and results in a very summarized view of the data, which may hide some "local" effects at more granular levels of detail. To obtain interesting comparisons, the analyst would have to run enough aggregate queries to check that the same insights for WN and OO are found at various levels of detail.

Table 1: Excerpt of table flight

departure airport | departure date | departure hour | airline | departure delay
DCA | 1-1-2015 |  945 | WN |  -5
LAX | 1-1-2015 | 1951 | OO |  -4
DCA | 1-1-2015 | 2233 | AA |  93
SEA | 2-1-2015 |  547 | WN |  -3
LAX | 2-1-2015 | 2143 | OO |  63
LAX | 3-1-2015 | 1143 | OO | 206
LAX | 3-1-2015 |  927 | AA |  -3
LAX | 3-1-2015 | 1346 | OO |  -9
SEA | 3-1-2015 | 1654 | OO |  54
SEA | 4-1-2015 |  155 | AA |  85

Our goal is to efficiently extract such insights from a given multidimensional dataset. To do so, we resort to sampling the dataset and use existing optimization structures. Sampling the dataset allows us to obtain a candidate comparison insight quickly, using statistical tests to check its significance. Besides, the candidate insights should be validated against the possible group-bys that can be computed over the multidimensional dataset. To validate an insight, we evaluate aggregate queries against the real data, using a sample of queries over existing materialized views.

Our approach to extracting comparison insights from a dataset contributes:

• a robust way of obtaining candidate comparison insights, by devising a statistic to choose between a parametric and a non-parametric test of significance, based on the distribution of values in the sample of the dataset,
• an algorithm for generating queries over existing materialized views to validate candidate comparison insights,
• a series of experiments on artificial and real datasets to assess the effectiveness and efficiency of the approach.

The outline of the paper is as follows. The next section presents the formal background. Section 3 introduces our approach to extracting comparisons. Section 4 details how candidate comparisons are obtained, while Section 5 presents the validation of candidates. Experiments are reported in Section 6. Section 7 discusses related work and Section 8 concludes by outlining perspectives.

DOLAP 2025: 27th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, co-located with EDBT/ICDT 2025, March 25, 2025, Barcelona, Spain
* Corresponding author and main contributor.
† The work was done when the author was an intern at LIFAT.
‡ This work was partially supported by JUNON Program (DATA/LIFO) with financial support of Région Centre-Val de Loire, France.
Emails: claire.antoine56@gmail.com (C. Antoine); Alexandre.Chanson@univ-tours.fr (A. Chanson); Nicolas.Labroche@univ-tours.fr (N. Labroche); Patrick.Marcel@univ-orleans.fr (P. Marcel)
ORCID: 0000-0003-3171-1174 (P. Marcel)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
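The aggregate comparison of Example 1.1 can be sketched in a few lines of Python over the Table 1 excerpt (a stdlib-only illustration, not the paper's prototype; in practice this would be a SQL group-by over the full table):

```python
from collections import defaultdict
from statistics import mean

# Excerpt of table flight (Table 1): (airport, date, hour, airline, delay).
flight = [
    ("DCA", "1-1-2015",  945, "WN",  -5),
    ("LAX", "1-1-2015", 1951, "OO",  -4),
    ("DCA", "1-1-2015", 2233, "AA",  93),
    ("SEA", "2-1-2015",  547, "WN",  -3),
    ("LAX", "2-1-2015", 2143, "OO",  63),
    ("LAX", "3-1-2015", 1143, "OO", 206),
    ("LAX", "3-1-2015",  927, "AA",  -3),
    ("LAX", "3-1-2015", 1346, "OO",  -9),
    ("SEA", "3-1-2015", 1654, "OO",  54),
    ("SEA", "4-1-2015",  155, "AA",  85),
]

# Aggregate query: average departure delay grouped by
# (departure airport, airline), restricted to the two compared
# airlines a = WN and b = OO.
groups = defaultdict(list)
for airport, _date, _hour, airline, delay in flight:
    if airline in ("WN", "OO"):
        groups[(airport, airline)].append(delay)
avg_delay = {k: mean(v) for k, v in groups.items()}
print(avg_delay)
```

On this excerpt, WN has a lower average delay than OO at every airport where both appear, hinting at the candidate insight WN < OO at this level of detail.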
2. Formal background

We give the definition of the comparison queries we consider. This definition extends that of [4, 9] by considering more than one attribute in the group-by set. We consider an instance of relation 𝑅 of schema 𝑅[𝐴1, ..., 𝐴𝑛, 𝑀1, ..., 𝑀𝑚]. The 𝐴𝑖's are categorical attributes and the 𝑀𝑗's are numerical attributes, called measures in what follows. The active domain of attribute 𝐴 is noted 𝑎𝑑𝑜𝑚(𝐴). As usual for multidimensional data, we require 𝑅 to be in Boyce-Codd Normal Form.

Definition 2.1 (Comparison queries). Comparison queries are extended relational queries of the form:

𝛾_{𝐺, 𝑎𝑔𝑔(𝑀)→𝑣𝑎𝑙}(𝜎_{𝐴=𝑎}(𝑅)) ◁▷ 𝛾_{𝐺, 𝑎𝑔𝑔(𝑀)→𝑣𝑎𝑙′}(𝜎_{𝐴=𝑏}(𝑅))

where 𝐴 is a categorical attribute in {𝐴1, ..., 𝐴𝑛}, 𝐺 is a group-by set in 2^{𝐴1,...,𝐴𝑛}, 𝑀 is a measure attribute in {𝑀1, ..., 𝑀𝑚}, 𝑎𝑔𝑔 is an aggregate function, and 𝑎, 𝑏 ∈ 𝑎𝑑𝑜𝑚(𝐴). 𝛾 denotes the grouping/aggregation operator.

For a group-by set 𝐺 of relation 𝑅, we call the result of the query 𝛾_{𝐺; 𝑎𝑔𝑔(𝑀)}(𝑅) a cuboid of 𝑅 for function 𝑎𝑔𝑔 and measure 𝑀. The cuboids of 𝑅 form the data cube [10] of 𝑅. The number of cuboids for 𝑅 and 𝑎𝑔𝑔(𝑀) is 2^|{𝐴1,...,𝐴𝑛}|, and the number of cuboids for 𝑅 and 𝑎𝑔𝑔(𝑀) whose group-by set includes attribute 𝐴 is 2^(|{𝐴1,...,𝐴𝑛}|−1).

For two values 𝑎, 𝑏 of an attribute 𝐴 and a measure 𝑀, a comparison insight 𝑎 < 𝑏 indicates that, on average, the values of 𝑀 for 𝑎 are statistically lower than the values of 𝑀 for 𝑏. The result of a comparison query can present evidences (if the value of 𝑀 for 𝑎 is indeed lower than the value of 𝑀 for 𝑏) or violations (otherwise) of the insight. An actual insight must have enough evidences supporting it. If this is not known, we call it a candidate comparison insight.

Definition 2.2 (Candidate comparison insight). A candidate comparison insight 𝑎 < 𝑏, where 𝑎, 𝑏 are in 𝑎𝑑𝑜𝑚(𝐴) for attribute 𝐴, measure 𝑀 of 𝑅, aggregation function 𝑎𝑔𝑔, and instance 𝑔 of a group-by set 𝐺 over 𝑅 including 𝐴, is a pair of tuples (𝑔, 𝑎, 𝑚𝑎), (𝑔, 𝑏, 𝑚𝑏) of the result of a comparison query over 𝑅 such that 𝑚𝑎 < 𝑚𝑏.

A candidate comparison insight can be violated depending on the group-by set instance, where a violation of candidate 𝑎 < 𝑏 is when 𝑚𝑎 ≥ 𝑚𝑏.

Table 2: Cuboid by airline and departure airport (crosstab view) over the flight table

departure airport | AA | OO | WN
DCA | 11.92 | -1.67 | 8.13
LAX | 10.15 | 3.18 | 19.09
SEA | 6.22 | 31.0 | 4.5

Example 2.3. Assume Table 2 shows the result of the aggregate query 𝑞 = 𝛾_{airline, departure_airport; avg(departure_delay)}(flight) over the flight table of the previous example. The candidate comparison insight AA > OO holds for the DCA and LAX airports, while the SEA airport shows a violation of AA > OO. This candidate insight can be obtained from the comparison query 𝛾_{𝐺, avg(delay)→AA}(𝜎_{airline=AA}(flight)) ◁▷ 𝛾_{𝐺, avg(delay)→OO}(𝜎_{airline=OO}(flight)), where 𝐺 denotes all the attributes of table flight (see the previous example).

Definition 2.4 (Validated comparison insight). Let 𝑎 < 𝑏 be a candidate insight, and 𝐶 be a cuboid over 𝑅 including attribute 𝐴. We say that 𝐶 supports 𝑎 < 𝑏 if the ratio of violations of 𝑎 < 𝑏 is below a user-defined threshold 𝜏. Finally, we say that the insight is validated if the ratio of cuboids supporting the insight is above a user-defined threshold 𝜌.

3. Approach overview

Figure 1: Approach overview

The computation of comparison insights requires exploring the data cube of 𝑅, which may be too computationally costly if 𝑅 is large. We developed an approach based on sampling to tackle those cases of discovering comparison insights from large relations. The principle is to compute candidate comparison insights on a sample of 𝑅 and validate them over a sample of the cuboids of 𝑅.

Our approach is illustrated in Figure 1. We first sample 𝑅, so that this sample fits in main memory. Let 𝑆 be the sample of 𝑅. Over 𝑆, we compute a candidate comparison insight 𝑎 < 𝑏, for the comparison query 𝛾_{𝐺, 𝑎𝑔𝑔(𝑀)→𝑣𝑎𝑙}(𝜎_{𝐴=𝑎}(𝑆)) ◁▷ 𝛾_{𝐺, 𝑎𝑔𝑔(𝑀)→𝑣𝑎𝑙′}(𝜎_{𝐴=𝑏}(𝑆)), where 𝐺 includes all categorical attributes of 𝑅. To check whether the comparison is significant, we run a statistical test assessing whether the mean for 𝑎 is significantly greater than that for 𝑏 over the sample.

To validate the candidate comparisons found with the statistical tests over 𝑆, we sample the set 𝐿 of cuboids over 𝑅 as follows. From this set 𝐿, we assume that some cuboids are materialized on disk under the form of materialized views (𝑀𝐶, the dark green cuboids in Figure 1). From this set 𝑀𝐶 of materialized views (excluding 𝑅), we consider the set 𝐶𝑄 of all comparison queries that can be answered using 𝑀𝐶 (purple-bordered in Figure 1). Generally, |𝐶𝑄| > |𝑀𝐶|. We sample this set 𝐶𝑄; this sample is called 𝑄. Over 𝑄, we approximate the ratio of queries of 𝐶𝑄 satisfying the user threshold 𝜏. We postulate that this approximation is close enough to the ratio of queries of 𝐿 satisfying the user threshold.

We now detail these two steps.
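As a rough sketch of the validation criterion of Definition 2.4 (a stdlib-only Python illustration, not the paper's prototype; the per-cuboid evidence/violation counts below are hypothetical, and 𝜏 = 𝜌 = 0.4 matches the defaults later used in Section 6):

```python
def supports(evidences: int, violations: int, tau: float) -> bool:
    """A cuboid supports a < b if its ratio of violations is below tau."""
    total = evidences + violations
    return total > 0 and violations / total < tau

def validated(cuboids: list[tuple[int, int]], tau: float, rho: float) -> bool:
    """The insight is validated if the ratio of supporting cuboids exceeds rho."""
    supporting = sum(supports(e, v, tau) for e, v in cuboids)
    return supporting / len(cuboids) > rho

# Toy example: three cuboids with (evidences, violations) counts for a < b.
cuboids = [(2, 1), (3, 0), (1, 3)]
print(validated(cuboids, tau=0.4, rho=0.4))
```

Here the first two cuboids support the candidate (violation ratios 1/3 and 0), the third does not (3/4), so 2/3 of the cuboids support it and the insight is validated.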
4. Statistical test to obtain candidate comparisons

As we do not have all of 𝑅 but a sample 𝑆, the pairwise comparison of 𝑎 and 𝑏 is made using a statistical test over 𝑆. The null hypothesis of the test should be that the mean for 𝑎 is greater than that for 𝑏. The choice of accepting or rejecting the hypothesis is made based on the p-value of the chosen test. As we will be making many comparisons, the p-values should be corrected, using, e.g., the Benjamini-Hochberg FDR correction [11].

We could use a non-parametric test, as in [4], since such tests make few or no assumptions about the data distribution. However, the statistical power of non-parametric tests is generally lower, and some tests, e.g., permutation tests, require more computation time, as they estimate the distribution by performing numerous re-samplings.

We chose to use the parametric Welch one-sided t-test, which has the advantage of not making assumptions about the variance or sample size, and is thus well suited to a context where there is no prior knowledge of the data. However, Welch's test still assumes normality of the data and is mainly impacted by skewness and sample size. Indeed, the type I error tends to increase with the rise in skewness of the distributions [12].

The question becomes: how to build a decision model, based on a dedicated statistic, to verify whether the samples used in the comparison are sufficiently normal, and decide whether to use Welch's test or a non-parametric test. We therefore present our methodology relying on the parametric Welch t-test to devise potentially discriminating candidate statistics (Section 4.1) and then train a simple decision model based on these candidates (Section 4.2).

4.1. Devising candidate statistics

To devise a statistic for deciding which test to use, we considered 8 candidate statistics, devised based on the factors cited in the literature as influencing Welch's test robustness, namely sample size (𝑛) and skewness (𝑠𝑘𝑒𝑤) for each sample, 𝑎 and 𝑏: (𝑛𝑎 × 𝑛𝑏), (𝑛𝑎 + 𝑛𝑏), (𝑠𝑘𝑒𝑤𝑎 × 𝑠𝑘𝑒𝑤𝑏), (𝑠𝑘𝑒𝑤𝑎 + 𝑠𝑘𝑒𝑤𝑏), |𝑠𝑘𝑒𝑤𝑎 − 𝑠𝑘𝑒𝑤𝑏|, |𝑠𝑘𝑒𝑤𝑎/𝑛𝑎 − 𝑠𝑘𝑒𝑤𝑏/𝑛𝑏|, (𝑠𝑘𝑒𝑤𝑎/𝑛𝑎² + 𝑠𝑘𝑒𝑤𝑏/𝑛𝑏²), and (𝑠𝑘𝑒𝑤𝑎/𝑛𝑎 + 𝑠𝑘𝑒𝑤𝑏/𝑛𝑏).

We generated artificial datasets for 𝑎 and 𝑏 with various distributions (standard normal, normal with 𝜎 = 20, uniform, chi-squared, and exponential) and different sample sizes 𝑛 ∈ {10, 50, 100, 200, 500, 1000}. The goal was to obtain a varied panel of pairs (distribution for 𝑎, 𝑛𝑎) × (distribution for 𝑏, 𝑛𝑏): distributions with high asymmetry and completely symmetric ones, equal or unequal sample sizes, with large and small size differences. Each sample of the same pair was generated with the same expectation 𝜇, so as to be under the hypothesis 𝐻0 : 𝜇𝑎 = 𝜇𝑏 (= 𝜇) of Welch's test. If the test rejects 𝐻0, we know it makes a type I error. For each pair considered, we draw 10,000 replicas of different sample pairs, perform Welch's test at the nominal threshold 𝛼 = 5%, and calculate the rejection rate among the replicas, giving us an estimator of the type I error rate for this given combination. The number of replicas was chosen to be large enough to ensure that the rejection rate is a good estimator of the type I error rate [12, 13]. We also considered several expectations 𝜇 ∈ {2, 5, 10, 100} in case this parameter also impacts the type I error rate.

4.2. Decision model

We want a model that uses only one statistic to classify pairs, so as to have a quick and simple decision rule: a decision threshold on the statistic in question. We trained two models (decision tree, logistic regression) to classify pairs whose type I error is out of bounds (1) and those for which it is controlled (0), based on each of the candidate statistics. The training sample transmitted to our models was balanced by undersampling, to avoid favoring the majority class.

For each model type, we selected the most interesting statistic (the one creating the greatest heterogeneity for the decision tree, and the one selected by backward selection for logistic regression). We took the highest probability threshold for logistic regression that ensures 0% false negatives on the training sample. In our case, false negatives (not detecting a pair whose type I error is out of bounds) are considered more serious errors than false positives (thinking that a pair with controlled type I error is out of bounds) because in the first case we present a false test result to the user, while in the second case we postpone the decision to more tests but do not give false conclusions. We compared the two models based on the 𝐹2-score on the rest of the data (excluding the training sample), which provides a more severe estimation of recall, i.e., it maximizes the portion of out-of-bounds pairs detected, which avoids false conclusions.

Both models agree on the following: the best statistic is |𝑠𝑘𝑒𝑤𝑎/𝑛𝑎 − 𝑠𝑘𝑒𝑤𝑏/𝑛𝑏|, and a threshold of 0.049 achieves the best 𝐹2-score of 0.99 (with no false negatives) to separate valid/invalid pairs on the training sample. Below this threshold, pairs are valid for using Welch's test. On the test sample, the 𝐹2-score is 0.73 (with a recall of 1). We therefore chose to use this statistic and threshold for deciding which test to run.
5. Validating candidate comparisons

The validation of a candidate comparison insight is described in Algorithm 1. Recall from Section 3 that the algorithm validates the candidate insight 𝑎 < 𝑏 on a sample of queries 𝑄. Given the sparsity of the data, the sample 𝑆 may return no data for 𝑎 or 𝑏. If so, the candidate is not considered, and the next candidate comparison is checked (line 4). The candidate comes with a p-value indicating whether the difference 𝑎 < 𝑏 is significant in 𝑆. If it is not, the next candidate comparison is checked (line 4). If the candidate is considered and significant, the queries to check violations over the sample 𝑄 are generated (line 5) and run over the materialized cuboids (line 8). Again, a query result may return no data for 𝑎 or 𝑏; in that case, the cuboid is not counted in the denominator of the prediction (line 10). If the prediction is over the threshold (line 15), or if the threshold can no longer be reached given the remaining cuboids to check (line 18), then no more validation queries are run and another candidate is checked. Otherwise, the algorithm runs until no more queries are left.

Algorithm 1 Candidate comparison validation
Require: a sample of queries 𝑄, two user thresholds 𝜏 and 𝜌, a candidate comparison insight 𝑎 < 𝑏
Ensure: Valid if 𝑎 < 𝑏 is valid, Invalid if it is not, Inconclusive if 𝑎 < 𝑏 is not considered or not significant
1: nbQueries = |𝑄|
2: nbOk = 0
3: nbTest = 0
4: if 𝑎 < 𝑏 is considered and significant then
5:    𝑉 = generate validation queries over 𝑄
6:    for 𝑞 ∈ 𝑉 do
7:       nbTest = nbTest + 1
8:       𝑟 = evaluate 𝑞
9:       if 𝑎 and 𝑏 are not in 𝑟 then
10:         nbQueries = nbQueries − 1
11:      end if
12:      if ratio of violations in 𝑟 < 𝜏 then
13:         nbOk = nbOk + 1
14:      end if
15:      if nbOk/nbQueries > 𝜌 then
16:         return Valid
17:      end if
18:      if (nbOk + |𝑄| − nbTest)/nbQueries < 𝜌 then
19:         return Invalid
20:      end if
21:   end for
22:   if nbOk/nbQueries > 𝜌 then
23:      return Valid
24:   else
25:      return Invalid
26:   end if
27: end if
28: return Inconclusive

Phrased in SQL, the queries generated to check violations of 𝑎 < 𝑏 are of the following form:

WITH
Q1 AS (SELECT G
       FROM (SELECT G, A FROM view_G
             WHERE A IN (a, b) GROUP BY G, A)
       GROUP BY G HAVING count(*) > 1),
Q2 AS (SELECT G, A, agg(M),
              rank() OVER (PARTITION BY G
                           ORDER BY agg(M) DESC)
       FROM view_G
       WHERE A IN (a, b)
       GROUP BY G, A)
SELECT G, string_agg(A::text, ',')
FROM Q2
WHERE (G) IN (SELECT G FROM Q1)
GROUP BY G;

where view_G is the materialized view over which the query is evaluated, Q2 is used to order 𝑎 and 𝑏 in the result of the comparison query over view_G, and Q1 removes group-by instances where 𝑎 or 𝑏 do not appear.

Note that Algorithm 1 generates one query per pair (𝑎, 𝑏) per element of the sample 𝑄, allowing it to leverage existing indexes over attribute 𝐴. Another strategy consists of leaving the DBMS to actually compute the ratio of violations of all pairs in a cuboid, by generating one query per element of the sample 𝑄. Phrased in SQL, these queries are of the following form:

WITH
Q1 AS (SELECT G, c1.A A1, c2.A A2,
              sign(c1.M - c2.M) sign
       FROM (SELECT G, A, agg(M) M
             FROM view_G GROUP BY G, A) c1,
            (SELECT G, A, agg(M) M
             FROM view_G GROUP BY G, A) c2
       WHERE c1.A ...),
Q2 AS (SELECT ...
       FROM Q1
       GROUP BY A1, A2)
SELECT A1, A2, (pos-neg)/cnt score
FROM Q2;

We postulate that such queries may not be able to exploit any index and that the self-join between aggregates over the materialized view is likely to be very costly. Therefore, only small datasets may benefit from this strategy.

6. Experiments

This section reports the tests we did to assess the effectiveness and efficiency of our approach.

6.1. Settings

We developed a prototype in Python 3.10 over the Postgres 16 RDBMS¹. We use the SAMPLE method of Postgres to sample the initial dataset, with the 'SYSTEM_ROWS' parameter². As Postgres does not support query rewriting using materialized views, we implemented a simple rewriting method based only on the list of attributes in the group-by set of the query. Tests are run on a MacBook Pro with an Apple M2 Pro chip and 32GB of RAM. Postgres was run with the default configuration, i.e., with 128MB of shared memory buffers and 4MB of working memory.

Table 3: Dataset statistics

Dataset | #tuples | Disk sp. (MB) | #cat. attr. | #cub. | Density
F100K  |    110,358 |   10 | 6 | 32 | 4.4·10⁻¹⁰
F600K  |    633,938 |  129 | 7 | 64 | 5.7·10⁻¹²
SSB    |  6,001,171 | 1406 | 5 | 16 | 2.7·10⁻⁸
Health | 12,694,445 | 2309 | 6 | 32 | 1.3·10⁻²

The datasets used are listed in Table 3. F100K and F600K are excerpts from the flight delays dataset³. SSB is the artificial dataset of the SSB benchmark [14]. Health is the Rate table from the health insurance dataset⁴.

We ran two sets of tests: the first compares the result of our approach to the ground truth on the small F100K dataset, for the airline attribute. The second set of tests, on all datasets, reports the time to compute 90 pairs for one attribute: the airline attribute, having 14 values (91 pairs in total), for F100K and F600K, the statecode attribute for Health, and the lo_suppkey attribute for SSB.

6.2. Effectiveness

Since our approach uses sampling, we ran effectiveness tests to compare its outcome (the comparison insights) to the ground truth, i.e., when comparisons are computed on the non-sampled relation 𝑅 and all its cuboids. Unless otherwise stated, we arbitrarily fixed 𝜏 = 0.4 for the ratio of violations in a cuboid and 𝜌 = 0.4 for the ratio of cuboids showing less than 0.4 violations. The percentage of the lattice materialized is fixed to 40%. Note that the materialized