=Paper=
{{Paper
|id=None
|storemode=property
|title=Structural Repairs of Multidimensional Databases
|pdfUrl=https://ceur-ws.org/Vol-749/paper15.pdf
|volume=Vol-749
|dblpUrl=https://dblp.org/rec/conf/amw/AriyanB11
}}
==Structural Repairs of Multidimensional Databases==
Structural Repairs of Multidimensional Databases∗
Sina Ariyan and Leopoldo Bertossi †
Carleton University, School of Computer Science, Ottawa, Canada.
{mariyan,bertossi}@scs.carleton.ca
Abstract. In a multidimensional (MD) database, dimensions may be subject to
semantic conditions that are not enforced by MD DBMSs or data warehouse ap-
plications. Strictness and homogeneity are possibly two of them; and are crucial
for the efficiency and correctness of answering MD aggregate queries and up-
dating materialized aggregate views. Dimensions may become inconsistent, i.e.
non-strict or heterogeneous, as the result of update operations. As a methodol-
ogy to restore consistency, we propose and investigate changes to the dimension
schema, as an alternative to changes on the dimension instance. We introduce
the notion of minimal structural repair, and establish that under certain condi-
tions, a structural repair reduces the cost wrt changing the dimension instance.
We also show that it allows for a correct rewriting of queries posed to the orig-
inal MD model into queries in terms of the new schema. Finally, we show how
query-scoped calculated members in MDX can be used to create virtual repairs
that simulate structural repairs.
1 Introduction
Multidimensional databases (MDDBs) represent data in terms of dimensions and fact
tables. Now, a dimension is represented as a dimension schema, i.e. a hierarchy (or
lattice) of categories [13, 10], plus a dimension instance that assigns data elements to
categories, organizing them in a hierarchy that parallels the category hierarchy. Facts
are quantitative measures assigned to dimension instances. This MD organization al-
lows users to aggregate data at different levels of granularity. Aggregation is the most
important operation in OLAP applications. Performing fast query answering becomes
crucial since queries are complex, involve aggregation and also much data [15].
Figure 1 shows the schema and instance of a Location dimension. The categories
are: City, State, Province, Country, All. Here, for example, USA,
England, UK, Canada are data elements, of the Country category. Categories
(and elements therein) are partially ordered upwards. We say, for example, that Country
is an ancestor category of City. If an element a is connected to an element b that be-
longs to a higher category, we say that a rolls up to b.
Fig. 1: Schema(left) and instance(right) of a dimension
In an ideal situation, MDDBs are expected to satisfy the strictness (aka. no double
counting) and homogeneity (aka. covering) conditions. These become semantic con-
∗
This research was funded by an NSERC Discovery Grant and the NSERC Strategic Network
on Business Intelligence (BIN), ADC02 project.
†
Faculty Fellow of the IBM CAS, Toronto.
straints on dimension instances. In a strict dimension, every element of a category rolls
up to at most one element in a same ancestor category.
In a homogeneous dimension, all elements in a category have the same structure.
More precisely, if a category C has C as an ancestor category, then all elements of C
must roll up to some element in C . The dimension instance in Figure 1 is not homoge-
neous, i.e. heterogeneous, because the element London does not roll up to any element
in categories State or Province. It is also non-strict, because London rolls up to
the two different elements, England and UK, in the Country category.
Dimension instances that do not satisfy strictness or homogeneity (or both) are said
to be inconsistent. Hence, the dimension in Figure 1 is inconsistent. Dimensions may
become inconsistent for several reasons, in particular, as the result of a poor design or a
series of update operations. Inconsistent dimensions reduce soundness and efficiency of
OLAP systems. In them it is not possible to assume that the summarizability property
holds [13, 14]. In a summarizable dimension, an aggregate view defined for a parent cat-
egory can be correctly derived from a set of pre-computed views for its child categories.
Summarizability allows for the correct reuse of materialized aggregate views.
To restore consistency, changes have to be made to the dimension schema and/or the
dimension instance. In accordance with the area of consistent query answering (CQA)
in relational databases [1, 3], the resulting dimension is called a repair of the original
one. A minimal repair is one that minimally differs from the original dimension. As
expected, minimality can be defined and characterized in different ways [3]. Previous
work on repairing MDDBs has focused mainly on changing the dimension instance by
modifying data elements or links between them [4, 5, 6]. Repairs obtained in this way
are called data repairs. In them, the proposed approach to removing heterogeneity is
the addition of different null values in the dimension instance [14, 18].
As indicated above, inconsistency may be caused by a problematic dimension sche-
ma. Therefore, changing the dimension instance may be only a temporary solution.
Future update operations will probably produce new inconsistencies. So, exploring and
considering structural changes, i.e. changes in the schema, is a natural way to go. This
is an idea that has been suggested in [14], to overcome heterogeneity. In this work
we introduce and investigate the notion of structural repair, as an alternative to data
repair. They modify the schema of an inconsistent dimension, to restore strictness and
homogeneity.
Structural repairs restrict changes to dimension instances by allowing changes that
affect only the category lattice and the distribution of the data elements in different
categories. In particular, there are no “data changes” (as in data repairs), in the sense
that the data elements and also the edges between them remain. In the case of a ROLAP
data warehouse (with relational schemas, like star or snow flake) fewer data changes
on a dimension instance directly results in fewer changes on records in the underlying
database.
We establish that any minimal structural repair enables query rewriting: Given a
query, Q, posed to the original, inconsistent MDDB D and any of its minimal repairs,
D , Q can be translated into a query Q to be posed to D , obtaining the same results
in terms of selected data elements and their respective aggregate values. In particular,
if in D some summarizability properties applied or “local” homogeneity and strictness
held (which, e.g. can be enforced by local semantic constraints [5]), then the query
answers for the “correct” cases in D are reobtained via Q from D . Even more, the
same (correct) summarizability properties that held in D will hold in D . Since repairs
are now consistent, summarizability will be a global property in each of them. 1
Structural repairs can be used to solve both non-strictness and heterogeneity, and
have major advantages wrt to other approaches, specially when dealing with hetero-
geneity. On the other hand, when fixing non-strictness, structural repairs have good
properties wrt the number of changes when the parent elements in a same category in-
volved in a violation of strictness (i.e. they have a descendant in common) do not have
descendants in many different categories. In such undesirable cases, changing the par-
ent category may have to be propagated to categories belonging to its lower subgraph,
causing several additional changes (cf. Section 6). Thus, a data repair or a combination
of data and structural repair may result in fewer overall changes.
Structural repairs may be virtually implemented through appropriate view defini-
tions, an idea we develop in this work. As a view definition language, we consider
MDX, the language commonly used to query MDDBs [17, 19]. In this work we also
show how query-scoped calculated members in the MDX language can be used to create
virtual structural repairs. In this way, the behavior of structural repairs can be explored
at query time, which is useful if a structural repair of choice is going to be materialized
at some point.
This paper is organized as follows: Section 2 presents the MD data model. Section 3
formalizes the notion of (minimal) structural repair. Section 4 discusses properties of an
ideal repair and shows that, by satisfying certain conditions, a minimal structural repair
will have those properties. Section 5 shows how virtual structural repairs can be created
in MDX. Section 6 provides a discussion of previous work, a comparison between data
repairs and structural repairs, directions for future work, and some concluding remarks.
Proofs of results, additional examples, and a brief introduction to MDX can be found
in an extended version. 2
2 The Multidimensional Model
In this work we adopt as a basis the Hurtado-Mendelzon model of MDDBs [10, 13].
However, since we will deal with changes in the schema, we will use a two-sorted
structural representation of a MDDB (as in many-sorted predicate logic [8]). This will
flatten out the representation and will allow us to talk about categories as elements of the
structure (as opposed to sets of elements). In consequence, a dimension is a set-theoretic
structure of the form S = U, C U , E U , LU U U U U
C , LE , EC , where C and E are unary
relations, and L UC , L U
E , EC U
are binary relations, all of them over the universe U . More
precisely:
1. U is a possibly infinite and non-empty set that is the disjunct union of a set of
categories and a set of data elements, say U = U C ∪ UE .
2. C U ⊆ UC is a finite, non-empty set of “active” categories. In the example above,
C U = {City, State, Province, Country, All}.
3. E U ⊆ UE is a finite, non-empty set of “active” data elements. In the example,
E U = {LA, ..., all}.
4. LUC is the child/parent relationship between categories, i.e. edges between cate-
gories in the dimension schema. In the example, L U C ={(City, State), (State,
Country), . . . , (Country, All)}.
1
Actually, all these claims still hold under slightly softer conditions than those imposed by the
notion of minimal repair.
2
http://people.scs.carleton.ca/∼bertossi/papers/strucRep18Ext.pdf
5. LUE is the child/parent relationship between data elements, i.e. edges between data
elements in the dimension instance. In the example, L U E = {(LA,California),
(California, USA), . . . , (Canada,all)}.
6. EC U is a relation that maps data elements to their categories. In the example,
EC U = {(LA,City), (California, State), . . . , (all,All)}. all is the
only “element” of All. 3
From a dimension S as above we can define its dimension schema structure: K(S) :=
UC , C U , LU U U U
C . We can also define its dimension instance: M (S) := U, E , LE , EC .
Loosely speaking, we can say that “M (S) is the instance for schema K(S)”.
For a structure S to represent a valid MDM, it has to satisfy the following basic
MD conditions (BMDCs): (below e i and ci represent members of the universe and R ∗
denotes the transitive closure of a binary relation R):
U
1. C U ∩ E U = ∅. 2. LU U U U U
C ⊆ C × C . 3. LE ⊆ E × E . 4. EC
U
⊆ E U × C U.
5. For all e1 , e2 , c1 , c2 : If (e1 , e2 ) ∈ LE and (e1 , c1 ) ∈ EC and (e2 , c2 ) ∈ EC U,
U U
then (c1 , c2 ) ∈ LU C.
6. For all c1 : If c1 ∈ C U, then (c1 , c1 ) ∈ / (LU ∗
C) .
7. For all e1 : If e1 ∈ E , then there exists c1 with (e1 , c1 ) ∈ EC U.
U
8. For all c1 , c2 : If there is e with (e, c1 ) ∈ EC U and (e, c2 ) ∈ EC U , then c1 = c2 .
9. For all c1 , c2 : c1 = c2 iff, for all e ∈ E U, it holds: (e, c1 ) ∈ EC U iff (e, c2 )
∈ EC U .
Notice that from these conditions it follows that 2. an 3. are proper inclusions.
Definition 1. Dimension S = U, C U , E U , LU U U
C , LE , EC is: (a) Strict if, for all
e1 , e2 , e3 , c : If (e1 , e2 ) ∈ (LE ) , (e1 , e3 ) ∈ (LE ) , (e2 , c) ∈ EC U , (e3 , c) ∈ EC U ,
U ∗ U ∗
then e2 = e3 . (b) Homogeneous if, for all e 1 , c1 , c2 : If (e1 , c1 ) ∈ EC U , (c1 , c2 ) ∈ LU C,
then there is e2 , with (e2 , c2 ) ∈ EC U and (e1 , e2 ) ∈ LU E . (c) Consistent if it satisfies
the two previous conditions; and inconsistent otherwise.
3 Structural Repairs
Definition 2. A data repair for an inconsistent dimension S = U, C U , E U , LU U
C , LE ,
U
U , LU , L
EC U is a structure S = U, C U , E U , EC
that satisfies the dimension
C E
4
conditions and the BMDCs, and is strict and homogeneous.
Notice that S and S have the same universe, the same categories, and the same category
links (but other relations may differ). We can see that, in order to obtain a data repair,
one can only add or remove data elements, change links between elements or move
elements between different categories. The following notion of structural repair (SR) is
given in terms of a new MD structure, possibly with a new schema, and a mapping that
establishes the correspondence between the original dimension and the new structure.
Definition 3. A structural repair for an inconsistent dimension S = U, C U , E U ,
U
LU U U U U U U
C , LE , EC is a pair S , g, with S a structure U, C , E , LC , LE , EC , with
the following properties: (a) S is strict and homogeneous. (b) Elements cannot move
3
We assume that in MDM all dimensions have this top category All with the single element
all, to which all the other elements roll up.
4
In the following this will be implicity assumed.
between categories that exist both in S and S : For all e, c1 , c2, if (e, c1 ) ∈ EC U , (e, c2 )
U
, c1 = c2 , then {c1 , c2 } (C U ∩ C
∈ EC U ). (c) Any new category in S must have
U
at least one data element. (d) g : C U → 2C , the schema mapping, is a total function
that maps each category of S to a set of categories in CU (which is finite in S ). (e) If
c ∈ g(c), then c and c share at least one data element.
The role of g is to establish a relationship between the schemas of S and S . Notice
that, for each category c, the set of its “elements”, i.e. {e | (e, c) ∈ EC U }, may be
U and c ∈ g(c)}).
⊆-incomparable with {e | (e, c ) ∈ EC
Fig. 2: An SR for the dimension of Figure 1
Fig. 3: Another SR for the dimension of Figure 1
A structural repair has the same domain, including categories and elements, as the initial
dimension. However, the finite sets of active categories, i.e. C U , CU , resp., may be
different. On the other side, the finitely many active elements are the same. Condition
(b) in Definition 3 ensures that in a structural repair, data elements move from one
category to another only as a result of changes made to the dimension schema (splitting
or merging categories of S). 5
Figures 2 and 3 show two of the possible structural repairs for the inconsistent
dimension of Figure 1. They show that strictness forces us to put elements England
and UK in different categories. On the other hand, homogeneity forces us to either merge
categories State and Province or isolate element LA in a new category.
Proposition 1. Every inconsistent dimension has a structural repair.
This proposition can be proved by using a simple structural repair that essentially cre-
ates a new one-element category for each element in the dimension instance, and also
links between the newly created categories whenever their single elements are con-
nected in the original instance. Figure 4 illustrates this kind of structural repair as a
third repair for the instance in Figure 1.
5
Isomorphic structural repairs, i.e. that differ only in the names of active categories, will treated
as being the same repair.
Example 1. The following mapping is a schema mapping between the dimension of
Figure 1 and the structural repair of Figure 2:
g : City → {City1, City2}, State → {StateProv}, Province → {StateProv},
Country → {Country1, Country2}, All → {All}.
Fig. 4: Another SR for instance in Figure 1
Now we define minimal structural repairs (MSRs), as preferred repairs among the struc-
tural repairs. This requires comparing structural repairs. The data movement set, defined
next, has useful properties for enabling this comparison (cf. Section 4).
Definition 4. Let S , g be an SR for dimension S, and c a category of S. (a) The
data movement set of category c is defined by:
U },
DMSet g (c) = {e | (e, c) ∈ EC U } {e | (e, c ) ∈ EC
c ∈g(c)
where denotes the symmetric difference oftwo sets. (b) The overall data movement
set between S and S is DMSet g (S, S ) := c∈C U DMSet g (c).
Example 2. (example 1 cont.) For the schema mapping in the example: DMSet g (City)
= ∅, DMSet g (State) = {BC, ON}, DMSet g (Province) = {California},
DMSet g (Country) = ∅, DMSet g (All) = ∅.
Intuitively, an MSR is a new dimension that is obtained by applying a minimal set of
changes to the schema of an inconsistent dimension. Inspired by the notion of priori-
tized minimization [16], we propose to minimize both data movement and the changes
in the set of categories, but assigning higher priority to minimizing the former.
Definition 5. For a dimension S and two SRs S 1 , g1 and S2 , g2 :
S1 , g1 ≤S S2 , g2 iff DMSet g1 (S, S1 ) ⊆ DMSet g2 (S, S2 ) and
DMSet g1 (S, S1 ) = DMSet g2 (S, S2 ) ⇒ (C S C S1 ) ⊆ (C S C S2 ).
Here, C S , C S1 and C S2 denote the finite sets (C U ) of active categories for structures
S, S1 and S2 , respectively.
Definition 6. S1 , g1 is a minimal structural repair (MSR) of dimension S iff it is an
SR of S and there is no other SR S 2 , g2 for S, such that S 2 , g2