Logical Multidimensional Database Design for Ragged and
                        Unbalanced Aggregation Hierarchies
                                     Tapio Niemi                                                              Jyrki Nummenmaa
                     Department of Computer and Information                                      Department of Computer and Information
                        Sciences, University of Tampere                                             Sciences, University of Tampere
                          FIN-33014 University of Tampere,                                            FIN-33014 University of Tampere,
                                        Finland                                                                     Finland
                                    tapio@cs.uta.fi                                                             jyrki@cs.uta.fi

                                                                                  Peter Thanisch
                                                            Department of Computer Science, University of Edinburgh
                                                                         Edinburgh, EH9 3JZ, Scotland
                                                                                 pt@dcs.ed.ac.uk


                                         Abstract                                       modelling methods of these hierarchies differ from the
                                                                                        methods used in database design. Thus, dependencies
     Research on logical design of OLAP cubes has                                       used in database design are not enough for modelling
     tended to assume that the rollup hierarchy in a                                    OLAP hierarchies. In this work our aim is to study which
     cube dimension takes the form of a balanced                                        kinds of dependencies would be needed to get aggrega-
     tree. However, experience from industry                                            tion hierarchies more desirable for OLAP.
     indicates that a much broader class of rollup                                          Much of the research on logical design for OLAP
     hierarchies is of interest in practice. For                                        cubes has concentrated on the class of aggregation
     example, the hierarchy tree might be unbalanced                                    hierarchies that arise from star and snowflake schemata.
     or ragged (or both), or indeed the hierarchy                                       In such aggregation hierarchies:
     might not be a tree at all, but a more general                                     (1) the attribute labelling a column in a dimension table
     acyclic directed graph structure such as a lattice.                                    will correspond to a level in the aggregation hierarchy
     We demonstrate how dependency information                                              of the corresponding cube dimension, and
     can assist in the design of aggregation                                            (2) data values occurring in that column of the dimension
     hierarchies. In addition to dependencies familiar                                      table will become the members of the corresponding
     from relational database theory, we use new                                            level.
     classes of dependencies to extend logical design                                   OLAP practitioners need a much more general notion of
     principles so that they include this more general                                  aggregation hierarchies than this. For example, an
     notion of cube dimension hierarchies.                                              aggregation hierarchy in one of the cube’s dimensions
                                                                                        might be built from a hierarchical relationship that is
1 Introduction                                                                          modelled in one of the database tables. Example 1
                                                                                        illustrates this.
Hierarchical dimensions are an essential part of On-Line
Analytical Processing (OLAP). The hierarchical structure                                Example 1 Consider a management hierarchy in an
allows the user to study facts in different levels of details.                          Employee table in which each row contains the
Good hierarchical structures ensure correct and efficient                               information pertaining to a particular employee and one
calculation of aggregation functions: consistent and                                    column in the table contains the employee number
coherent structure of hierarchies decreases logical errors                              (attribute EmpID) and another contains the number of
while efficiency of calculation is increased, for example,                              that employee’s manager (attribute MgrID). In the OLAP
by decreasing redundancy in rollup paths. Logical                                       cube, we have Salary as the measure and an Employee
                                                                                        dimension. The aggregation hierarchy in the Employee
 The copyright of this paper belongs to the paper’s authors. Permission to copy         dimension is the management hierarchy modelled by the
 without fee all or part of this material is granted provided that the copies are not   EmpID and MgrID columns. This aggregation hierarchy
 made or distributed for direct commercial advantage.                                   allows us to query the total salary bill for all employees
 Proceedings of the International Workshop on Design and                                below any given manager.
 Management of Data Warehouses (DMDW’2001)
 Interlaken, Switzerland, June 4, 2001                                                  We note that in Example 1, we do not have levels
 (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)
                                                                                        corresponding to attributes. Rather, levels correspond to
                                                                                        data values occurring in the table. Thus two quite
 http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/


T. Niemi, J. Nummenmaa, P. Thanisch                                                                                                            7-1
distinct kinds of aggregation hierarchies can arise in         dimension can have hierarchies like day-week, and
practice and we refer to these as “attribute hierarchies”      day-month-year. The third item, correct aggregation of
and “data hierarchies”.                                        data, ensures that aggregated values are not misleading.
    With regard to the logical design of cubes, much           Good structure for dimension hierarchies is essential in
research has been published on the role of dependencies,       guaranteeing this. The idea of the fourth item, support for
such as functional dependencies, on the design of              non-strict hierarchies, means that many to many
attribute hierarchies. In the present paper, we demon-         relationships between different levels in a dimension are
strate that dependency theory is also relevant to cube         allowed. Finally, the last item, data with different levels
design for data hierarchies. In addition to the functional     of granularity should be allowed, means that it should be
dependencies, we give three new dependency classes for         possible to store data in different levels of hierarchy, not
data hierarchies. The anti-closure dependency prevents         only to the lowest level.
calculating redundant aggregation values by disallowing            Jagadish et al. [JLS99] argue that balanced and tree
“shortcuts” in hierarchies, while non-raggedness and           structured hierarchies are not enough for many real
balance dependencies ensure complete aggregations on           situations but more flexible structures are needed. Zurek
each level in a hierarchy.                                     and Sinnwell [ZuSi99] list some non-standard require-
    Correct summarizability is the most essential design       ments for data warehouses. Especially, realignment of a
goal for dimension hierarchies in OLAP. There are three        data warehouse schema is needed quite often since
necessary conditions for summarizability and these             dimension hierarchies can change even regularly.
conditions are also assumed to be sufficient [LeSh97]:             Hurtado et al. [HMV99] study how data cubes can be
1. disjointness of category attributes,                        maintained under dimension updates. They have noticed
2. completeness, and                                           that dimensions are not so static than it is often assumed,
3. correct use of measure (summary) attributes with            thus efficient maintaining of dimensions can remarkably
    statistical functions.                                     increase the efficiency of the system. A formal model for
Disjointness requires that attributes in dimensions form       dimension updates is presented. The model covers both
disjoint subsets over the elements. Completeness means         domain updates and structural updates of dimensions. An
that all elements exist and every element is assigned to       algorithm to maintain a relational stored data cube under
some category on the level above it in the hierarchy.          dimension updates is also presented.
Correct use of measure attributes with statistical functions       Lehner et al. [LAW98] define the dimensional normal
is the most complex of these requirements since it             form, which is a normal form for OLAP hierarchies. A
depends on the type of the attribute and the type of the       dimension is in dimensional normal form if there is a
statistical function. Without any prior knowledge about        functional dependency from each lower level to the level
the semantics of the attributes, we can verify the first two   above it. In many real situations, however, the dimen-
conditions by using information on dependencies among          sional normal form may be too strict. The authors also
attributes in a dimension.                                     study a problem related to attributes that are not
    We have organised the rest of the paper as follows.        applicable for all instances, for example, all products do
Next a review of the related work is given in Section 2.       not have a colour.
After that we present the basis of hierarchy types in              Khardon et al. [KMR99] have studied Boolean
Section 3. In Section 4 the new dependencies are studied.      dependencies in a context of example based reasoning.
Finally, the conclusions are presented in Section 5.           Boolean dependencies are a generalisation to the
                                                               functional dependencies [Cod72]. While the functional
2 Related Work                                                 dependency has a form X→Y, where X and Y are sets of
                                                               attributes, the Boolean dependencies allow us to use
Pedersen and Jensen [PeJe99] discuss requirements for          Boolean operators between attributes. Boolean dependen-
multidimensional data models. Five of their nine               cies can be extended from simple equality comparison of
requirements are for dimension hierarchies:                    values to compare whether the values belong to the same
1) explicit hierarchies in dimensions,                         equivalence class. This makes it possible to define
2) support for multiple hierarchies in a dimension,            constraints between value groups, e.g. age → age group.
3) correct aggregation of data,                                    Wijsen et al. [WNC99] study generalising temporal
4) support for non-strict hierarchies, and                     dependencies for non-temporal dimensions. They
5) data with different levels of granularity should be         generalise temporal functional dependencies into rollup
     allowed.                                                  dependencies and illustrate their usability in conceptual
Explicit hierarchies in dimensions means that hierarchies      modelling and data mining. The rollup dependencies can
should be explicitly shown in the schema to help user's        be used in modelling generalisation hierarchies if
understanding of hierarchical structures in dimensions.        hierarchical levels are ordered by the finer than relation.
By multiple hierarchies in a dimension, the authors mean       The rollup dependencies ensure comparing equality of
that a dimension hierarchy is allowed to be an acyclic         values in a specific level. In this way, they are a
graph, that is, there can be several paths from the top        generalisation of the functional dependencies. As an
level to the most detailed level. For example, the time        example, consider a relation on hotel room reservations


T. Niemi, J. Nummenmaa, P. Thanisch                                                                                    7-2
having attributes (ROOM_NUMBER, DATE, PRICE).                   relationship between just two columns, for example
An example of a rollup dependency in this relation is           Employee ID and Manager ID. In addition to this, a data
ROOM_NUMBER DATEWEEK → PRICE, meaning that                      hierarchy can have additional attributes to indicate the
the price of the room cannot change during a week. An           level of the members, for example Job Grade for
axiomatisation for rollup dependencies is also presented        employees. This level does not necessarily correspond
and negation with the rollup dependencies is studied.           with the actual level of the member in the parent-child
                                                                hierarchy.
3 OLAP Hierarchies                                                  Hierarchies can be classified according to their
                                                                generality. Figure 1 presents a hierarchy of different data
In this section we first formalise our model for the OLAP       hierarchies. The most general is the acyclic digraph and
cube and then study different kinds of data hierarchies         the strictest is the balanced and non-ragged tree.
and their properties. Our formalisation is based on
relational database theory and we assume that the reader                             acyclic digraph
knows the basics of it.

3.1 OLAP Model                                                                    transitive anti-closed
To be able to operate with dependencies, we must                                          digraph
formalise our notion of the OLAP cube. Generally, we
assume that the cube consists of dimensions (measures
can also be handled as dimensions). We use a relational                                     tree
approach [Cod70] that is close to the often-used star
schema. However, we do not locate dimension hierar-
chies and the fact table in different relations but assume
                                                                             balanced,             non-ragged,
that the fact table also contains the dimension hierarchies.
                                                                              ragged               unbalanced
It is important to notice that this is only an abstract model
used as a base of the logical design and totally independ-
ent of the implementation.
Definition 1 The OLAP cube is a relation over the                                         balanced,
relation schema H = D1 ∪ D2 ∪ ...Dn, where each Dk is a                                 non-ragged
set of attributes and it is called a dimension schema.
If D is a dimension schema, then a relation d over D is                Figure 1: Hierarchy of dimension hierarchies
called a dimension. The hierarchy representing the
dimension is called the dimension hierarchy. The nodes          The transitive anti-closed graph is the complement of the
in the dimension schema are called attribute hierarchy          transitive closure graph. The transitive closure can be
levels. The tuples in the dimension relation are members        constructed as follows. If there is a path from node A to
of the attribute hierarchy levels. Instead, in data             node B of length 2 or more, then add an edge from A to
hierarchies the levels and members of the levels are            B. Instead, the anti-closure can be produced as follows. If
implied by the actual data.                                     there is a path from node A to node B of length 2 or
    In what follows we mostly use the following example         more, then remove an edge from A to B.
of a dimension.                                                     These balance and raggedness properties require that
                                                                the level code is attached to each node. A graph is ragged
Example 2 We have a dimension of sales employees. The           if each node does not have a level code corresponding to
schema has three attributes:
                                                                its actual level in the hierarchy, whereas a graph is
E: Employee ID
                                                                balanced if all leaf nodes have the same level code.
M: Manager ID
                                                                    As Figure 1 indicates, we have six different kinds of
G: Job grade
                                                                hierarchies but two of them have similar properties
Each tuple, t, in a relation over {E, M, G} contains details
                                                                related to OLAP hierarchies:
about an employee, namely t[E], the employee’s
                                                                (1) Acyclic. This is the most general class. It allows the
Employee ID, t[G], his or her Job grade, and t[M], the
                                                                     day-month-quarter-year; day-year structure, where
Employee ID of his or her manager.
                                                                     the day-year could be called a "direct shortcut". A
                                                                     direct shortcut means redundant aggregation paths.
3.2 Data Hierarchies                                            (2) Transitive anti-closed digraph. This is an acyclic
                                                                     graph with no direct shortcuts. No redundant aggre-
In the attribute hierarchy, the levels correspond to                 gation paths are possible.
attributes and the members of the level have the names of       (3) Tree. Unique aggregation paths are guaranteed.
the values that occur in the corresponding attribute's
column. In a data hierarchy we have a parent-child


T. Niemi, J. Nummenmaa, P. Thanisch                                                                                    7-3
(4) Unbalanced but non-ragged or balanced but                    if there exists a path from t[X] to t’[Y], then a path from
     ragged. The available levels of aggregation are not         t’[Y] to t[X] does not exist.
     equal.
                                                                 For simplicity we assume that the left part of the
(5) Balanced and non-ragged. Equal levels of
                                                                 hierarchy cannot contain null values. We use the example
     aggregation are available.
                                                                 data presented in Table 1 to illustrate different classes of
In the next section, we study dependencies that guarantee
                                                                 hierarchies.
particular hierarchies and show examples about the
hierarchies.
                                                                 Table 1: An example relation on management hierarchy
4 Dependencies for Hierarchies                                                empID        mgrID         grade
In this section we study relational dependencies in the                         1          NULL            A
context of OLAP hierarchies. We mainly concentrate on                           2            1             B
dependencies for data hierarchies. We may consider two                          3            1             B
types of dependencies:                                                          4            1             C
(1) dependencies that must always hold because they                             5            2             C
      say something about the semantics of the data, and                        5            1             C
(2) dependencies that might hold in the database at the
      point that we want to construct the parent/child           The graph representing the relation of Table 1 is given in
      aggregation hierarchy, but which do not always             Figure 2. We use the relation of Table 1 as a running
      hold.                                                      example. Typically, stricter hierarchies imply deletion of
For instance, it may be that at some moment each                 some rows from the relation.
employee has a unique manager. This might be because
there is a dependency, which always holds, or, alterna-
tively, it just happens to be so at that moment but                                        1
generally it is possible that an employee has several
managers.                                                                                  3       2
    We note that, in practice, a number of other
constraints on the data are needed. For example, it may be
                                                                                   5                     4
that the data hierarchy should have a unique root element
for aggregation purposes. We ignore these considerations
in the present paper                                                   Figure 2: Directed acyclic graph representing the
                                                                                           relation in Table 1
4.1 Data Hierarchy
                                                                 4.2 Anti-Closure Dependency
For data hierarchies, we need the inclusion dependencies,
but only in a simplified form where only a single relation       Unbalanced hierarchies often exist in applications and the
is considered.                                                   vendors of OLAP products have started support them. For
                                                                 example, in the latest release of Microsoft Analysis
Definition 2 Let r be a relation over R and X, Y ⊆ R.            Services [Mic00], it is possible to define an "unbalanced"
There is an inclusion dependency [X] ⊆ [Y] in r, if X(r)         tree. This allows us to have an EMPLOYEES relation
⊆ Y(r).                                                          which includes the columns EmpID and MgrID, where
For the data hierarchy we first define a path in a relation.     (as usual) the MgrID entry in a row identifies the
                                                                 employee's manager. Suppose that all employees are
Definition 3 Let r be a relation over R and X, Y ⊆ R. A          Salesmen or their managers. We want to aggregate sales
sequence of tuples t0, t1, ..., tn∈r forms a path from t0[X]     by manager, by aggregating the sales of all salesmen
to tn[Y] with length n, if t0[Y] = t1[X], t1[Y] = t2[X], ...,    below a manager. This is tricky because the leaf nodes
tn-1[Y] = tn[X].                                                 can have different depths. For example, in Table 1 (see
                                                                 also Figure 2) leaf nodes ‘3’ and ‘4’ have different levels
We write <t0, t1, ..., tn> for a path from the tuple t0 to the
                                                                 in the tree.
tuple tn. A relation may contain null values but a null
                                                                     In a different context, Gottlob et al. defined the
value cannot exist in a valid path. If the designer is
                                                                 closure dependency [Got89]:
considering the use of some attributes as a data hierarchy,
the conditions described in the following definition must        Definition 5 Let r be a relation over R and X, Y ⊆ R. The
be met.                                                          relation r obeys the closure dependency X@Y if for each
                                                                 pair of tuples t and t’ in r it holds: if t[Y] = t’[X], then
Definition 4 A relation h over the relation schema H(XY)
                                                                 there exists a tuple t’’ such that t’’[X] = t[X] and
forms a data hierarchy if [Y] ⊆ [X] and for each t, t’ ∈ h
                                                                 t’’[Y]=t’[Y].


T. Niemi, J. Nummenmaa, P. Thanisch                                                                                      7-4
With regard to OLAP hierarchies, the complement of the                                       year
closure dependency, which we refer to as the anti-closure
dependency, is of relevance.
     We need the anti-closure dependency in order to                               quarter          week
prevent a hierarchy in which a rollup path “by-passes”
intermediate nodes. The transitive anti-closure of the
hierarchy in Figure 2 is given in Figure 3. There are two                                month
rollup paths from ‘4’ to ‘1’ in Figure 2. We want to
eliminate the direct path, so that rollup never by-passes
‘2’.                                                                                          day

                           1                                         Figure 5: A dimension schema hierarchy that obeys
                                                                                               ACD

                             3     2                             Multiple paths between nodes are allowed but if there is a
                                                                 longer path between two nodes, the direct path is not
                  5                       4                      allowed. Or, the same can be said in the opposite way: if
                                                                 there is a direct path between two nodes, the longer path
           Figure 3: A transitive anti-closed graph              cannot exist. The hierarchy in Figure 4 does not obey the
                                                                 ACD while the hierarchy in Figure 5 obeys it.
    For the hierarchy of Example 2, we want to ensure            Theorem 1 Let r be a relation over R and X, Y, Z, W ⊆
that the transitive anti-closure of the graph, i.e. if t[M] =    R. If X #> Y, then XZ #> YW.
t’[E], there should not be some longer “path” from t’ to t
using E and M values to connect tuples. This can be done         Proof. Let t, t’∈r. Since X #> Y, there is not a direct and
by the anti-closure dependency. It guarantees that the           indirect path from t[X] to t’[Y]. Adding more attributes
corresponding hierarchy graph is transitively anti-closed,       cannot increase the number of matching values, thus there
but this does not disallow the possibility that a node can       cannot exist more paths from t[XZ] to t’[YW] than from
have multiple parents, i.e. an employee reports to more          t[X] to t’[Y].
than one manager.                                                The anti-closure dependency is not transitive. A relation
Definition 6 Let r be a relation over R and X, Y ⊆ R. The        presented in Table 2 obeys dependencies A#>B and
relation r obeys the anti-closure dependency (ACD)               B#>C but not A#>C.
X#>Y, if for any path <t0, t1, ..., tn>, n ≥ 2, there does not
exist a direct path <v1, vm> in r such that t1 [X] = v1[X]                 Table 2: Anti-closure is not transitive
and tn[Y] = vm[Y].
                                                                                   A          B            C
                                                                                   1          4            2
                                 year                                              2          5            3
                                                                                   1          6            3

                   quarter                                       4.3 Functional Dependencies
                                                                 Although more general rollup hierarchies are useful, often
                        month                                    stricter hierarchies, such as trees, are needed. The
                                                                 well-known functional dependencies [Cod72] can be used
                                                                 to force a hierarchy to a tree. In Figure 3 there is also a
                                 day                             functional dependency between nodes. The functional
                                                                 dependencies can be used in both attribute and data
     Figure 4: A dimension schema hierarchy violating            hierarchies.
                               ACD                                   In data hierarchies, we may want to restrict our
                                                                 attention to Parent/Child dimensions in which each node
                                                                 in the dimension has just one parent. In Example 2, we
                                                                 can enforce this rule with the functional dependency E
                                                                 M. The functional dependency E  * PXVW DOVR KROG LQ
                                                                 our example.


T. Niemi, J. Nummenmaa, P. Thanisch                                                                                     7-5
Definition 7 A relation r over R obeys a functional           this a subtler dependency than a conventional FD is
dependency X→Y, if for each pair of tuples t, t’ ∈ r,         needed as Example 3 shows.
whenever t[X] = t’[X], then t[Y] = t’[Y].
                                                              Example 3 We have the following rule:
    OLAP hierarchies often contain non-applicable
                                                              If two rows agree on City and Neighbourhood then
NULLs. With regard to functional dependencies, suppose
                                                                  If State is NULL in both rows then
two of the columns in the relation are ProductID (which
                                                                      They must disagree on Country
will be used as a member in the cube) and Colour (which
                                                                  else
will be incorporated as a member property). ProductId →               Either One has a NULL in State
Colour holds. We assume that two products are given                   Or they are both non-NULL in State and the State
different ProductIds if their colours differ, even if they            entries are different.
are identical in every other respect. However, it could be
that some products do not have a colour, so in a relation,    The Boolean dependency corresponding to the example
the entry in the Colour column for such products will be      would be: City AND Neighbourhood → State OR
NULL. The functional dependency means that two rows           Country. Next we give a more formal definition for
that agree on ProductId cannot disagree on Colour. In         Boolean dependencies.
particular, if one such row has a non-NULL entry in the
                                                              Definition 8 Let R be a relation schema. The following
Colour column, the other row must have an identical
                                                              expressions are valid Boolean dependencies in R:
entry.
                                                              1. A, if A∈R,
    The functional dependency ProductId → Colour must
                                                              1. !A, if A is a valid expression,
hold before we can consider including the Colour column
                                                              2. (A), if A is a valid expression, and
as a member property of ProductId. If we want to add
Colour as a level above the ProductId level, we would         3. A o B, if A and B are valid expressions and o∈{AND,
need to include a special member in the Colour level for         OR, XOR, →, ↔}.
products that have NULL entries in the Colour column.         Definition 9 Let r be a relation over R. The relation r
Lemma 1 Let r be a relation over R and X, Y ⊆ R. If the       obeys a Boolean dependency X, if r has only one tuple or
                                                              for each pair of distinct tuples t1 and t2 in r obeys the
functional dependency X→Y holds in r, then for all two
                                                              dependency X’ that is constructed from X by replacing
tuples t, t’∈ r there is at most one path between t[X] and    each term A in X to t1[A] = t2[A].
t’[Y].
                                                              In Definition 9, we assume for the negation that !(t1[A] =
Proof. (By contradiction) Assume there exist two paths
                                                              t2[A]) ⇔ t1[A] ≠ t2[A]. The tuples t1 and t2 have to be
from t[X] to t’[Y]. Then there must exist an X value
                                                              distinct because of problems with negation: the
related to two different Y values, that is, the functional
dependency does not hold.                                     dependency A → !B would be never valid if t1 = t2.
                                                              Although a Boolean dependency can be an arbitrary
Theorem 2 Let r be a relation over R and X, Y ⊆ R. If         Boolean expression, we restrict our study to ‘implication’
X→Y, then X #> Y.                                             dependencies. Example 4 illustrates such a Boolean
                                                              dependency.
Proof. Let t, t’∈r. According to Lemma 1 there can be at
most one path between t[X] and t’[Y]. Thus, X#>Y holds.       Example 4 Table 3 obeys a Boolean dependency A→B
                                                              OR C, but it does not obey A→B OR A→C.
4.4 Boolean Dependencies
                                                                               Table 3: A→B OR C
The following example illustrates the use of the Boolean
dependencies [KRM99]. Consider the Geography
dimension: Country – State – City – Neighbourhood.                                A      B      C
With ragged hierarchies, we allow NULLs in the column                             a1     b1     c1
corresponding to potentially-missing parents (e.g., the                           a1     b1     c2
State column for the row which has "Washington DC" in                             a1     b2     c2
its City column). But we could insist that if we allow such
raggedness, there must be a "partial" functional              4.5 Enforcing Non-Raggedness
dependency in force which means that we must restrict         In our example non-raggedness means that all employees
the numbers of cities called "Washington DC" to one per       reporting to the same manager have the same job grade.
country. There are several different cities in the USA        We use the terms of graph theory. If t[X, Y] ∈ h, then
called "Springfield", but that is acceptable, because each    t[X] is called an underling of t[Y].
Springfield is located in a different State. If there were
two "Washington DC"s, it would not be a problem, so           Definition 10 Let h be a data hierarchy over a relation
long as at least one is in a State. But if neither Washing-   schema H and X, Y ⊆ H. The set of direct underlings of
ton DC is in a State, then we have ambiguity. To forbid       t[Y] is Underlings(t) = {t’ ∈ h | t’[Y] = t[X]}.


T. Niemi, J. Nummenmaa, P. Thanisch                                                                                 7-6
Definition 11 Let r be a relation over R and X, Y, W ⊆           an unbalanced tree can be seen in Figure 7. We define the
R. The relation r obeys a non-raggedness dependency              set of the leaf nodes of a hierarchy as follows:
(NRD) XY →NRD : LI (XY)r is a data hierarchy and for          Definition 12 Let h be a data hierarchy H. The set of the
all tuples t ∈ r, for all tuples t’, t” ∈ Underlings(t), t’[W]   leaf nodes of h is Leaves(h) = {t ∈ h | Underlings(t) is
= t”[W].                                                         empty}.
In the definition, the role of W is to be a level identifier.
                                                                 Definition 13 Let r be a relation over R and X, Y, W ⊆
There can be two types of hierarchies according to the
                                                                 R. The relation r obeys a balance dependency (BD)
values of the level identifier: 1) strict and 2) monotonic.
                                                                 XY→BD:LI (XY)r is a data hierarchy and for all tuples t,
Strict means that the child must have a different and
strictly lower level than the parent, whereas monotonic          t’∈ Leaves(h), t[W] = t’[W].
means that the child must not have a higher level value
than its parent has. Moreover, we say that a level                                        1
identifier is consistent if there is a one-to-one mapping
between the values of the level identifier and the actual
levels of the nodes. Consistent level identifiers and                                     3       2
non-raggedness dependency guarantee that no levels can
be by-passed in rollup paths.                                             Figure 8: A non-ragged and balanced tree

                           1                                         Non-raggedness with consistent level identifiers and
                                                                 balance guarantee that aggregations are complete on
                                                                 every level in the hierarchy. In the balanced hierarchy all
                                   2                             leaf nodes are on the same level, and in the non-ragged
                                                                 hierarchy with consistent level identifiers there must exist
                  5                        4                     a node on every hierarchy level in each rollup path. An
                                                                 example about non-ragged and balanced hierarchy can be
                                                                 seen in Figure 8.
             Figure 6: Ragged but balanced tree

Figure 6 shows that the completeness condition of                5 Conclusions
aggregations can easily be violated in ragged hierarchies.       We have studied different aggregation hierarchies and
For example, if we summarize the members of the second           dependencies that hold in a particular hierarchy. We have
level, we lost the value of node ‘5’, because it does not        also classified the hierarchies: the most general is the
rollup to any member on the second level. However, the           acyclic digraph and the strictest is the balanced and
non-raggedness with consistent level identifiers is not a        non-ragged tree. The structure of the hierarchy has
sufficient condition for complete aggregations. For              significant effect on correctness of aggregations and the
example, the hierarchy in Figure 7 is non-ragged but             storage space needed to store pre-calculated aggregation
unbalanced, i.e. the leaf nodes are not on the same level.       values. In general it can be said that a stricter hierarchy
Now, if we summarize the members of the lowest level,            makes it easier to get correct aggregations. When a data
we will lose the member ‘3’.                                     hierarchy is balanced and non-ragged with consistent
                                                                 level identifiers, aggregations are complete on every level
                       1                                         in the hierarchy. On the other hand, a hierarchy, which is
                                                                 too strict, reduces the number of possible rollup paths and
                                                                 thus decreases the expressive power of the OLAP cube.
                      3        2
                                                                 6 References
                                       4
                                                                 [Cod70] Codd, E. F.: A relational model for large
                                                                     shared data banks, Communications of the ACM,
          Figure 7: Non-ragged but unbalanced tree                   13(6), 1970.
                                                                 [Cod72] Codd, E. F.: Further normalization of the data
                                                                     base relational model, Data Base Systems, Courant
4.6 Enforcing Balance                                                Computer Science Symposia Series 6, R. Rustin (ed.),
In a balanced hierarchy, all employees who are at the                Prentice-Hall, New Jersey, 1972.
leaves of the organisation hierarchy have the same job           [Got89] Gottlob, G., Schrefl, M., and Stumptner, M.:
grade. A hierarchy in Figure 6 is balanced, since all leaf           On the interaction between transitive closure and
nodes, ‘5’ and ‘4’ are on the same level. An example of              functional dependencies, MFDBS 89, the 2nd Sym-
                                                                     posium on Mathematical Fundamentals of Database


T. Niemi, J. Nummenmaa, P. Thanisch                                                                                      7-7
    Systems, J. Demetrovics et al. (eds), LNCS, Vol. 364,
    Springer, 1989.
[HMV99] Hurtado, A., Mendelson, A., and Vaisman A.:
    Maintaining data cubes under dimension updates,
    Proceedings of the 15th International Conference on
    Data Engineering, IEEE Computer Society Press,
    1999.
[JLS99] Jagadish, H., Lakshmanan, L., and Srivastava
    D.: What can hierarchies do for data warehouses?,
    Proceedings of the 25th International Conference on
    Very Large Data Bases, M. Atkinson et al. (eds),
    Morgan Kaufmann, 1999.
[KMR99] Khardon, R., Mannila, H., Roth, D.:
    Reasoning with examples: propositional formulae
    and database dependencies, Acta Informatica, 36(4):
    267-286, 1999.
[LAW98] Lehner, W., Albrecht, J., and Wedekind, H.:
    Normal forms for multidimensional databases, Pro-
    ceedings of the 10th International Conference on
    Scientific and Statistical Data Management
    (SSDBM’98), 1998.
[LeSh97] Lenz, H.-J. and Shoshani, A.: Summarizability
    in OLAP and statistical data bases, Ninth Interna-
    tional Conference On Scientific And Statistical
    Database Management (SSDBM), 1997.
[Mic00] Jacobson, R.: Microsoft SQL Server 2000
    Analysis Services Step By Step, Microsoft Press,
    2000.
[PeJe99] Pedersen, T. and Jensen, C.: Multidimensional
    data modeling for complex data, Proceedings of the
    International Conference on Data Engineering
    (ICDE’99), 1999.
[WNC99] Wijsen, J., Ng, R., and Calders, T.: Discover-
    ing roll-up dependencies, The Fifth ACM SIGKDD
    International Conference on Knowledge Discovery
    and Data Mining, KDD 1999, ACM, 1999.
[ZuSi99] Zurek, T. and Sinnwell, M.: Data warehousing
    has more colours than just black & white, VLDB’99,
    Proceedings of 25th International Conference on
    Very Large Data Bases, M. Atkinson et al. (eds.),
    Morgan Kaufmann, 1999.


T. Niemi, J. Nummenmaa, P. Thanisch                         7-8