<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Logical Multidimensional Database Design for Ragged and Unbalanced Aggregation Hierarchies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tapio Niemi</string-name>
          <email>tapio@cs.uta.fi</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jyrki Nummenmaa</string-name>
          <email>jyrki@cs.uta.fi</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Thanisch</string-name>
          <email>pt@dcs.ed.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Edinburgh</institution>
          ,
          <addr-line>Edinburgh, EH9 3JZ, Scotland</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer and Information, Sciences, University of Tampere, FIN-33014 University of Tampere</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Research on logical design of OLAP cubes has tended to assume that the rollup hierarchy in a cube dimension takes the form of a balanced tree. However, experience from industry indicates that a much broader class of rollup hierarchies is of interest in practice. For example, the hierarchy tree might be unbalanced or ragged (or both), or indeed the hierarchy might not be a tree at all, but a more general acyclic directed graph structure such as a lattice. We demonstrate how dependency information can assist in the design of aggregation hierarchies. In addition to dependencies familiar from relational database theory, we use new classes of dependencies to extend logical design principles so that they include this more general notion of cube dimension hierarchies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Hierarchical dimensions are an essential part of On-Line
Analytical Processing (OLAP). The hierarchical structure
allows the user to study facts in different levels of details.
Good hierarchical structures ensure correct and efficient
calculation of aggregation functions: consistent and
coherent structure of hierarchies decreases logical errors
while efficiency of calculation is increased, for example,
by decreasing redundancy in rollup paths. Logical
The copyright of this paper belongs to the paper’s authors. Permission to copy
without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage.</p>
      <p>Proceedings of the International Workshop on Design and
Management of Data Warehouses (DMDW’2001)
Interlaken, Switzerland, June 4, 2001
(D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)
modelling methods of these hierarchies differ from the
methods used in database design. Thus, dependencies
used in database design are not enough for modelling
OLAP hierarchies. In this work our aim is to study which
kinds of dependencies would be needed to get
aggregation hierarchies more desirable for OLAP.</p>
      <p>Much of the research on logical design for OLAP
cubes has concentrated on the class of aggregation
hierarchies that arise from star and snowflake schemata.
In such aggregation hierarchies:
(1) the attribute labelling a column in a dimension table
will correspond to a level in the aggregation hierarchy
of the corresponding cube dimension, and
(2) data values occurring in that column of the dimension
table will become the members of the corresponding
level.</p>
      <p>OLAP practitioners need a much more general notion of
aggregation hierarchies than this. For example, an
aggregation hierarchy in one of the cube’s dimensions
might be built from a hierarchical relationship that is
modelled in one of the database tables. Example 1
illustrates this.</p>
      <p>Example 1 Consider a management hierarchy in an
Employee table in which each row contains the
information pertaining to a particular employee and one
column in the table contains the employee number
(attribute EmpID) and another contains the number of
that employee’s manager (attribute MgrID). In the OLAP
cube, we have Salary as the measure and an Employee
dimension. The aggregation hierarchy in the Employee
dimension is the management hierarchy modelled by the
EmpID and MgrID columns. This aggregation hierarchy
allows us to query the total salary bill for all employees
below any given manager.</p>
      <p>We note that in Example 1, we do not have levels
corresponding to attributes. Rather, levels correspond to
data values occurring in the table. Thus two quite
distinct kinds of aggregation hierarchies can arise in
practice and we refer to these as “attribute hierarchies”
and “data hierarchies”.</p>
      <p>With regard to the logical design of cubes, much
research has been published on the role of dependencies,
such as functional dependencies, on the design of
attribute hierarchies. In the present paper, we
demonstrate that dependency theory is also relevant to cube
design for data hierarchies. In addition to the functional
dependencies, we give three new dependency classes for
data hierarchies. The anti-closure dependency prevents
calculating redundant aggregation values by disallowing
“shortcuts” in hierarchies, while non-raggedness and
balance dependencies ensure complete aggregations on
each level in a hierarchy.</p>
      <p>Correct summarizability is the most essential design
goal for dimension hierarchies in OLAP. There are three
necessary conditions for summarizability and these
conditions are also assumed to be sufficient [LeSh97]:
1. disjointness of category attributes,
2. completeness, and
3. correct use of measure (summary) attributes with
statistical functions.</p>
      <p>Disjointness requires that attributes in dimensions form
disjoint subsets over the elements. Completeness means
that all elements exist and every element is assigned to
some category on the level above it in the hierarchy.
Correct use of measure attributes with statistical functions
is the most complex of these requirements since it
depends on the type of the attribute and the type of the
statistical function. Without any prior knowledge about
the semantics of the attributes, we can verify the first two
conditions by using information on dependencies among
attributes in a dimension.</p>
      <p>We have organised the rest of the paper as follows.
Next a review of the related work is given in Section 2.
After that we present the basis of hierarchy types in
Section 3. In Section 4 the new dependencies are studied.
Finally, the conclusions are presented in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>Pedersen and Jensen [PeJe99] discuss requirements for
multidimensional data models. Five of their nine
requirements are for dimension hierarchies:
1) explicit hierarchies in dimensions,
2) support for multiple hierarchies in a dimension,
3) correct aggregation of data,
4) support for non-strict hierarchies, and
5) data with different levels of granularity should be
allowed.</p>
      <p>Explicit hierarchies in dimensions means that hierarchies
should be explicitly shown in the schema to help user's
understanding of hierarchical structures in dimensions.
By multiple hierarchies in a dimension, the authors mean
that a dimension hierarchy is allowed to be an acyclic
graph, that is, there can be several paths from the top
level to the most detailed level. For example, the time
dimension can have hierarchies like day-week, and
day-month-year. The third item, correct aggregation of
data, ensures that aggregated values are not misleading.
Good structure for dimension hierarchies is essential in
guaranteeing this. The idea of the fourth item, support for
non-strict hierarchies, means that many to many
relationships between different levels in a dimension are
allowed. Finally, the last item, data with different levels
of granularity should be allowed, means that it should be
possible to store data in different levels of hierarchy, not
only to the lowest level.</p>
      <p>Jagadish et al. [JLS99] argue that balanced and tree
structured hierarchies are not enough for many real
situations but more flexible structures are needed. Zurek
and Sinnwell [ZuSi99] list some non-standard
requirements for data warehouses. Especially, realignment of a
data warehouse schema is needed quite often since
dimension hierarchies can change even regularly.</p>
      <p>Hurtado et al. [HMV99] study how data cubes can be
maintained under dimension updates. They have noticed
that dimensions are not so static than it is often assumed,
thus efficient maintaining of dimensions can remarkably
increase the efficiency of the system. A formal model for
dimension updates is presented. The model covers both
domain updates and structural updates of dimensions. An
algorithm to maintain a relational stored data cube under
dimension updates is also presented.</p>
      <p>Lehner et al. [LAW98] define the dimensional normal
form, which is a normal form for OLAP hierarchies. A
dimension is in dimensional normal form if there is a
functional dependency from each lower level to the level
above it. In many real situations, however, the
dimensional normal form may be too strict. The authors also
study a problem related to attributes that are not
applicable for all instances, for example, all products do
not have a colour.</p>
      <p>Khardon et al. [KMR99] have studied Boolean
dependencies in a context of example based reasoning.
Boolean dependencies are a generalisation to the
functional dependencies [Cod72]. While the functional
dependency has a form X→Y, where X and Y are sets of
attributes, the Boolean dependencies allow us to use
Boolean operators between attributes. Boolean
dependencies can be extended from simple equality comparison of
values to compare whether the values belong to the same
equivalence class. This makes it possible to define
constraints between value groups, e.g. age → age group.</p>
      <p>Wijsen et al. [WNC99] study generalising temporal
dependencies for non-temporal dimensions. They
generalise temporal functional dependencies into rollup
dependencies and illustrate their usability in conceptual
modelling and data mining. The rollup dependencies can
be used in modelling generalisation hierarchies if
hierarchical levels are ordered by the finer than relation.
The rollup dependencies ensure comparing equality of
values in a specific level. In this way, they are a
generalisation of the functional dependencies. As an
example, consider a relation on hotel room reservations
7-2
having attributes (ROOM_NUMBER, DATE, PRICE).
An example of a rollup dependency in this relation is
ROOM_NUMBER DATEWEEK → PRICE, meaning that
the price of the room cannot change during a week. An
axiomatisation for rollup dependencies is also presented
and negation with the rollup dependencies is studied.</p>
    </sec>
    <sec id="sec-3">
      <title>3 OLAP Hierarchies</title>
      <p>In this section we first formalise our model for the OLAP
cube and then study different kinds of data hierarchies
and their properties. Our formalisation is based on
relational database theory and we assume that the reader
knows the basics of it.</p>
      <sec id="sec-3-1">
        <title>3.1 OLAP Model</title>
        <p>To be able to operate with dependencies, we must
formalise our notion of the OLAP cube. Generally, we
assume that the cube consists of dimensions (measures
can also be handled as dimensions). We use a relational
approach [Cod70] that is close to the often-used star
schema. However, we do not locate dimension
hierarchies and the fact table in different relations but assume
that the fact table also contains the dimension hierarchies.
It is important to notice that this is only an abstract model
used as a base of the logical design and totally
independent of the implementation.</p>
        <p>Definition 1 The OLAP cube is a relation over the
relation schema H = D1 ∪ D2 ∪ ... Dn, where each Dk is a
set of attributes and it is called a dimension schema.
If D is a dimension schema, then a relation d over D is
called a dimension. The hierarchy representing the
dimension is called the dimension hierarchy. The nodes
in the dimension schema are called attribute hierarchy
levels. The tuples in the dimension relation are members
of the attribute hierarchy levels. Instead, in data
hierarchies the levels and members of the levels are
implied by the actual data.</p>
        <p>In what follows we mostly use the following example
of a dimension.</p>
        <p>Example 2 We have a dimension of sales employees. The
schema has three attributes:
E: Employee ID
M: Manager ID
G: Job grade
Each tuple, t, in a relation over {E, M, G} contains details
about an employee, namely t[E], the employee’s
Employee ID, t[G], his or her Job grade, and t[M], the
Employee ID of his or her manager.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Data Hierarchies</title>
        <p>In the attribute hierarchy, the levels correspond to
attributes and the members of the level have the names of
the values that occur in the corresponding attribute's
column. In a data hierarchy we have a parent-child
relationship between just two columns, for example
Employee ID and Manager ID. In addition to this, a data
hierarchy can have additional attributes to indicate the
level of the members, for example Job Grade for
employees. This level does not necessarily correspond
with the actual level of the member in the parent-child
hierarchy.</p>
        <p>Hierarchies can be classified according to their
generality. Figure 1 presents a hierarchy of different data
hierarchies. The most general is the acyclic digraph and
the strictest is the balanced and non-ragged tree.</p>
        <p>acyclic digraph
transitive anti-closed
digraph</p>
        <p>tree
balanced,
ragged
non-ragged,
unbalanced
balanced,
non-ragged
The transitive anti-closed graph is the complement of the
transitive closure graph. The transitive closure can be
constructed as follows. If there is a path from node A to
node B of length 2 or more, then add an edge from A to
B. Instead, the anti-closure can be produced as follows. If
there is a path from node A to node B of length 2 or
more, then remove an edge from A to B.</p>
        <p>These balance and raggedness properties require that
the level code is attached to each node. A graph is ragged
if each node does not have a level code corresponding to
its actual level in the hierarchy, whereas a graph is
balanced if all leaf nodes have the same level code.</p>
        <p>As Figure 1 indicates, we have six different kinds of
hierarchies but two of them have similar properties
related to OLAP hierarchies:
(1) Acyclic. This is the most general class. It allows the
day-month-quarter-year; day-year structure, where
the day-year could be called a "direct shortcut". A
direct shortcut means redundant aggregation paths.
(2) Transitive anti-closed digraph. This is an acyclic
graph with no direct shortcuts. No redundant
aggregation paths are possible.
(3) Tree. Unique aggregation paths are guaranteed.
7-3
if there exists a path from t[X] to t’[Y], then a path from
t’[Y] to t[X] does not exist.</p>
        <p>For simplicity we assume that the left part of the
hierarchy cannot contain null values. We use the example
data presented in Table 1 to illustrate different classes of
hierarchies.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Dependencies for Hierarchies</title>
      <p>In this section we study relational dependencies in the
context of OLAP hierarchies. We mainly concentrate on
dependencies for data hierarchies. We may consider two
types of dependencies:
(1) dependencies that must always hold because they
say something about the semantics of the data, and
(2) dependencies that might hold in the database at the
point that we want to construct the parent/child
aggregation hierarchy, but which do not always
hold.</p>
      <p>For instance, it may be that at some moment each
employee has a unique manager. This might be because
there is a dependency, which always holds, or,
alternatively, it just happens to be so at that moment but
generally it is possible that an employee has several
managers.</p>
      <p>We note that, in practice, a number of other
constraints on the data are needed. For example, it may be
that the data hierarchy should have a unique root element
for aggregation purposes. We ignore these considerations
in the present paper</p>
      <sec id="sec-4-1">
        <title>4.1 Data Hierarchy</title>
        <p>For data hierarchies, we need the inclusion dependencies,
but only in a simplified form where only a single relation
is considered.</p>
        <p>Definition 2 Let r be a relation over R and X, Y ⊆ R.
There is an inclusion dependency [X] ⊆ [Y] in r, if X(r)
⊆ Y(r).</p>
        <p>For the data hierarchy we first define a path in a relation.
Definition 3 Let r be a relation over R and X, Y ⊆ R. A
sequence of tuples t0, t1, ..., tn∈r forms a path from t0[X]
to tn[Y] with length n, if t0[Y] = t1[X], t1[Y] = t2[X], ...,
tn-1[Y] = tn[X].</p>
        <p>We write &lt;t0, t1, ..., tn&gt; for a path from the tuple t0 to the
tuple tn. A relation may contain null values but a null
value cannot exist in a valid path. If the designer is
considering the use of some attributes as a data hierarchy,
the conditions described in the following definition must
be met.</p>
        <p>Definition 4 A relation h over the relation schema H(XY)
forms a data hierarchy if [Y] ⊆ [X] and for each t, t’ ∈ h
empID
1
2
3
4
5
5</p>
        <p>5
The graph representing the relation of Table 1 is given in
Figure 2. We use the relation of Table 1 as a running
example. Typically, stricter hierarchies imply deletion of
some rows from the relation.
Unbalanced hierarchies often exist in applications and the
vendors of OLAP products have started support them. For
example, in the latest release of Microsoft Analysis
Services [Mic00], it is possible to define an "unbalanced"
tree. This allows us to have an EMPLOYEES relation
which includes the columns EmpID and MgrID, where
(as usual) the MgrID entry in a row identifies the
employee's manager. Suppose that all employees are
Salesmen or their managers. We want to aggregate sales
by manager, by aggregating the sales of all salesmen
below a manager. This is tricky because the leaf nodes
can have different depths. For example, in Table 1 (see
also Figure 2) leaf nodes ‘3’ and ‘4’ have different levels
in the tree.</p>
        <p>In a different context, Gottlob et al. defined the
closure dependency [Got89]:
Definition 5 Let r be a relation over R and X, Y ⊆ R. The
relation r obeys the closure dependency X@Y if for each
pair of tuples t and t’ in r it holds: if t[Y] = t’[X], then
there exists a tuple t’’ such that t’’[X] = t[X] and
t’’[Y]=t’[Y].
7-4
With regard to OLAP hierarchies, the complement of the
closure dependency, which we refer to as the anti-closure
dependency, is of relevance.</p>
        <p>We need the anti-closure dependency in order to
prevent a hierarchy in which a rollup path “by-passes”
intermediate nodes. The transitive anti-closure of the
hierarchy in Figure 2 is given in Figure 3. There are two
rollup paths from ‘4’ to ‘1’ in Figure 2. We want to
eliminate the direct path, so that rollup never by-passes
‘2’.</p>
        <p>5
1
3
2</p>
        <p>4</p>
        <p>For the hierarchy of Example 2, we want to ensure
that the transitive anti-closure of the graph, i.e. if t[M] =
t’[E], there should not be some longer “path” from t’ to t
using E and M values to connect tuples. This can be done
by the anti-closure dependency. It guarantees that the
corresponding hierarchy graph is transitively anti-closed,
but this does not disallow the possibility that a node can
have multiple parents, i.e. an employee reports to more
than one manager.</p>
        <p>Definition 6 Let r be a relation over R and X, Y ⊆ R. The
relation r obeys the anti-closure dependency (ACD)
X#&gt;Y, if for any path &lt;t0, t1, ..., tn&gt;, n ≥ 2, there does not
exist a direct path &lt;v1, vm&gt; in r such that t1 [X] = v1[X]
and tn[Y] = vm[Y].</p>
        <p>quarter
month
year
day
Multiple paths between nodes are allowed but if there is a
longer path between two nodes, the direct path is not
allowed. Or, the same can be said in the opposite way: if
there is a direct path between two nodes, the longer path
cannot exist. The hierarchy in Figure 4 does not obey the
ACD while the hierarchy in Figure 5 obeys it.</p>
        <p>Theorem 1 Let r be a relation over R and X, Y, Z, W ⊆
R. If X #&gt; Y, then XZ #&gt; YW.</p>
        <p>Proof. Let t, t’∈r. Since X #&gt; Y, there is not a direct and
indirect path from t[X] to t’[Y]. Adding more attributes
cannot increase the number of matching values, thus there
cannot exist more paths from t[XZ] to t’[YW] than from
t[X] to t’[Y].</p>
        <p>The anti-closure dependency is not transitive. A relation
presented in Table 2 obeys dependencies A#&gt;B and
B#&gt;C but not A#&gt;C.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.3 Functional Dependencies</title>
        <p>Although more general rollup hierarchies are useful, often
stricter hierarchies, such as trees, are needed. The
well-known functional dependencies [Cod72] can be used
to force a hierarchy to a tree. In Figure 3 there is also a
functional dependency between nodes. The functional
dependencies can be used in both attribute and data
hierarchies.</p>
        <p>In data hierarchies, we may want to restrict our
attention to Parent/Child dimensions in which each node
in the dimension has just one parent. In Example 2, we
can enforce this rule with the functional dependency E
M. The functional dependency E *PXVWDOVRKROGLQ
our example.
7-5
Definition 7 A relation r over R obeys a functional
dependency X→Y, if for each pair of tuples t, t’ ∈ r,
whenever t[X] = t’[X], then t[Y] = t’[Y].</p>
        <p>OLAP hierarchies often contain non-applicable
NULLs. With regard to functional dependencies, suppose
two of the columns in the relation are ProductID (which
will be used as a member in the cube) and Colour (which
will be incorporated as a member property). ProductId →
Colour holds. We assume that two products are given
different ProductIds if their colours differ, even if they
are identical in every other respect. However, it could be
that some products do not have a colour, so in a relation,
the entry in the Colour column for such products will be
NULL. The functional dependency means that two rows
that agree on ProductId cannot disagree on Colour. In
particular, if one such row has a non-NULL entry in the
Colour column, the other row must have an identical
entry.</p>
        <p>The functional dependency ProductId → Colour must
hold before we can consider including the Colour column
as a member property of ProductId. If we want to add
Colour as a level above the ProductId level, we would
need to include a special member in the Colour level for
products that have NULL entries in the Colour column.
Lemma 1 Let r be a relation over R and X, Y ⊆ R. If the
functional dependency X→Y holds in r, then for all two
tuples t, t’∈ r there is at most one path between t[X] and
t’[Y].</p>
        <p>Proof. (By contradiction) Assume there exist two paths
from t[X] to t’[Y]. Then there must exist an X value
related to two different Y values, that is, the functional
dependency does not hold.</p>
        <p>Theorem 2 Let r be a relation over R and X, Y ⊆ R. If
X→Y, then X #&gt; Y.</p>
        <p>Proof. Let t, t’∈r. According to Lemma 1 there can be at
most one path between t[X] and t’[Y]. Thus, X#&gt;Y holds.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.4 Boolean Dependencies</title>
        <p>The following example illustrates the use of the Boolean
dependencies [KRM99]. Consider the Geography
dimension: Country – State – City – Neighbourhood.
With ragged hierarchies, we allow NULLs in the column
corresponding to potentially-missing parents (e.g., the
State column for the row which has "Washington DC" in
its City column). But we could insist that if we allow such
raggedness, there must be a "partial" functional
dependency in force which means that we must restrict
the numbers of cities called "Washington DC" to one per
country. There are several different cities in the USA
called "Springfield", but that is acceptable, because each
Springfield is located in a different State. If there were
two "Washington DC"s, it would not be a problem, so
long as at least one is in a State. But if neither
Washington DC is in a State, then we have ambiguity. To forbid
this a subtler dependency than a conventional FD is
needed as Example 3 shows.</p>
        <sec id="sec-4-3-1">
          <title>Example 3 We have the following rule:</title>
          <p>If two rows agree on City and Neighbourhood then
If State is NULL in both rows then</p>
          <p>They must disagree on Country
else</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>Either One has a NULL in State</title>
          <p>Or they are both non-NULL in State and the State
entries are different.</p>
          <p>The Boolean dependency corresponding to the example
would be: City AND Neighbourhood → State OR
Country. Next we give a more formal definition for
Boolean dependencies.</p>
          <p>Definition 8 Let R be a relation schema. The following
expressions are valid Boolean dependencies in R:
1. A, if A∈R,
1. !A, if A is a valid expression,
2. (A), if A is a valid expression, and
3. A o B, if A and B are valid expressions and o∈{AND,</p>
          <p>OR, XOR, →, ↔}.</p>
          <p>Definition 9 Let r be a relation over R. The relation r
obeys a Boolean dependency X, if r has only one tuple or
for each pair of distinct tuples t1 and t2 in r obeys the
dependency X’ that is constructed from X by replacing
each term A in X to t1[A] = t2[A].</p>
          <p>In Definition 9, we assume for the negation that !(t1[A] =
t2[A]) ⇔ t1[A] ≠ t2[A]. The tuples t1 and t2 have to be
distinct because of problems with negation: the
dependency A → !B would be never valid if t1 = t2.
Although a Boolean dependency can be an arbitrary
Boolean expression, we restrict our study to ‘implication’
dependencies. Example 4 illustrates such a Boolean
dependency.</p>
          <p>Example 4 Table 3 obeys a Boolean dependency A→B
OR C, but it does not obey A→B OR A→C.
In our example non-raggedness means that all employees
reporting to the same manager have the same job grade.
We use the terms of graph theory. If t[X, Y] ∈ h, then
t[X] is called an underling of t[Y].</p>
          <p>Definition 10 Let h be a data hierarchy over a relation
schema H and X, Y ⊆ H. The set of direct underlings of
t[Y] is Underlings(t) = {t’ ∈ h | t’[Y] = t[X]}.
7-6
Definition 11 Let r be a relation over R and X, Y, W ⊆
R. The relation r obeys a non-raggedness dependency
(NRD) XY →NRD: LI (XY)r is a data hierarchy and for
all tuples t ∈ r, for all tuples t’, t” ∈ Underlings(t), t’[W]
= t”[W].</p>
          <p>In the definition, the role of W is to be a level identifier.
There can be two types of hierarchies according to the
values of the level identifier: 1) strict and 2) monotonic.
Strict means that the child must have a different and
strictly lower level than the parent, whereas monotonic
means that the child must not have a higher level value
than its parent has. Moreover, we say that a level
identifier is consistent if there is a one-to-one mapping
between the values of the level identifier and the actual
levels of the nodes. Consistent level identifiers and
non-raggedness dependency guarantee that no levels can
be by-passed in rollup paths.</p>
          <p>5
1
2</p>
          <p>4
Figure 6 shows that the completeness condition of
aggregations can easily be violated in ragged hierarchies.
For example, if we summarize the members of the second
level, we lost the value of node ‘5’, because it does not
rollup to any member on the second level. However, the
non-raggedness with consistent level identifiers is not a
sufficient condition for complete aggregations. For
example, the hierarchy in Figure 7 is non-ragged but
unbalanced, i.e. the leaf nodes are not on the same level.
Now, if we summarize the members of the lowest level,
we will lose the member ‘3’.</p>
          <p>1
3
2</p>
          <p>4
In a balanced hierarchy, all employees who are at the
leaves of the organisation hierarchy have the same job
grade. A hierarchy in Figure 6 is balanced, since all leaf
nodes, ‘5’ and ‘4’ are on the same level. An example of
an unbalanced tree can be seen in Figure 7. We define the
set of the leaf nodes of a hierarchy as follows:
Definition 12 Let h be a data hierarchy H. The set of the
leaf nodes of h is Leaves(h) = {t ∈ h | Underlings(t) is
empty}.</p>
          <p>Definition 13 Let r be a relation over R and X, Y, W ⊆
R. The relation r obeys a balance dependency (BD)
XY→BD:LI (XY)r is a data hierarchy and for all tuples t,
t’∈ Leaves(h), t[W] = t’[W].</p>
          <p>1
3</p>
          <p>2</p>
          <p>Non-raggedness with consistent level identifiers and
balance guarantee that aggregations are complete on
every level in the hierarchy. In the balanced hierarchy all
leaf nodes are on the same level, and in the non-ragged
hierarchy with consistent level identifiers there must exist
a node on every hierarchy level in each rollup path. An
example about non-ragged and balanced hierarchy can be
seen in Figure 8.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusions</title>
      <p>We have studied different aggregation hierarchies and
dependencies that hold in a particular hierarchy. We have
also classified the hierarchies: the most general is the
acyclic digraph and the strictest is the balanced and
non-ragged tree. The structure of the hierarchy has
significant effect on correctness of aggregations and the
storage space needed to store pre-calculated aggregation
values. In general it can be said that a stricter hierarchy
makes it easier to get correct aggregations. When a data
hierarchy is balanced and non-ragged with consistent
level identifiers, aggregations are complete on every level
in the hierarchy. On the other hand, a hierarchy, which is
too strict, reduces the number of possible rollup paths and
thus decreases the expressive power of the OLAP cube.</p>
    </sec>
    <sec id="sec-6">
      <title>6 References</title>
      <p>7-7</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Cod70]
          <string-name>
            <surname>Codd</surname>
            ,
            <given-names>E. F.</given-names>
          </string-name>
          :
          <article-title>A relational model for large shared data banks</article-title>
          ,
          <source>Communications of the ACM</source>
          ,
          <volume>13</volume>
          (
          <issue>6</issue>
          ),
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Cod72]
          <string-name>
            <surname>Codd</surname>
            ,
            <given-names>E. F.</given-names>
          </string-name>
          :
          <article-title>Further normalization of the data base relational model</article-title>
          ,
          <source>Data Base Systems, Courant Computer Science Symposia Series 6</source>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          Rustin (ed.), Prentice-Hall, New Jersey,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Got89]
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schrefl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stumptner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>On the interaction between transitive closure and functional dependencies</article-title>
          ,
          <source>MFDBS 89, the 2nd Symposium on Mathematical Fundamentals of Database Systems, J. Demetrovics</source>
          et al. (eds),
          <source>LNCS</source>
          , Vol.
          <volume>364</volume>
          , Springer,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [HMV99]
          <string-name>
            <surname>Hurtado</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendelson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vaisman</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Maintaining data cubes under dimension updates</article-title>
          ,
          <source>Proceedings of the 15th International Conference on Data Engineering</source>
          , IEEE Computer Society Press,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [JLS99]
          <string-name>
            <surname>Jagadish</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakshmanan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Srivastava</surname>
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>What can hierarchies do for data warehouses?</article-title>
          ,
          <source>Proceedings of the 25th International Conference on Very Large Data Bases</source>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atkinson</surname>
          </string-name>
          et al. (eds), Morgan Kaufmann,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [KMR99]
          <string-name>
            <surname>Khardon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Reasoning with examples: propositional formulae and database dependencies</article-title>
          ,
          <source>Acta Informatica</source>
          ,
          <volume>36</volume>
          (
          <issue>4</issue>
          ):
          <fpage>267</fpage>
          -
          <lpage>286</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [LAW98]
          <string-name>
            <surname>Lehner</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albrecht</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wedekind</surname>
          </string-name>
          , H.:
          <article-title>Normal forms for multidimensional databases</article-title>
          ,
          <source>Proceedings of the 10th International Conference on Scientific and Statistical Data Management (SSDBM'98)</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [LeSh97]
          <string-name>
            <surname>Lenz</surname>
          </string-name>
          , H.-J. and
          <string-name>
            <surname>Shoshani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Summarizability in OLAP and statistical data bases</article-title>
          ,
          <source>Ninth International Conference On Scientific And Statistical Database Management (SSDBM)</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Mic00]
          <string-name>
            <surname>Jacobson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Microsoft SQL Server 2000 Analysis Services Step By Step</surname>
          </string-name>
          , Microsoft Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [PeJe99]
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Multidimensional data modeling for complex data</article-title>
          ,
          <source>Proceedings of the International Conference on Data Engineering (ICDE'99)</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [WNC99]
          <string-name>
            <surname>Wijsen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Calders</surname>
          </string-name>
          , T.:
          <article-title>Discovering roll-up dependencies</article-title>
          ,
          <source>The Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD</source>
          <year>1999</year>
          , ACM,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [ZuSi99]
          <string-name>
            <surname>Zurek</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sinnwell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data warehousing has more colours than just black &amp; white</article-title>
          ,
          <source>VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases</source>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atkinson</surname>
          </string-name>
          et al. (eds.), Morgan Kaufmann,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>