HINTA: A Linearization Algorithm for Physical Clustering of
Complex OLAP Hierarchies
Roland Pieringer
TransAction Software GmbH
Thomas-Dehler-Str. 18, 81737 München, Germany
pieringer@transaction.de

Volker Markl
IBM Almaden Research Center K55/B1
650 Harry Road, San Jose, CA 95120-6099
marklv@us.ibm.com

Frank Ramsak
Bayerisches Forschungsinstitut für wissensbasierte Systeme
Orleansstr. 34, 81667 München
frank.ramsak@forwiss.de

Rudolf Bayer
Institut für Informatik, Technische Universität München
Orleansstr. 34, 81667 München
bayer@in.tum.de

Abstract
Hierarchies are an important means to categorize data stored in OLAP systems. OLAP queries follow the drill/slice/dice paradigm and therefore exhibit navigation patterns that follow the hierarchy of a dimension. In real-world applications, hierarchies are often unbalanced and share levels, resulting in complex hierarchy structures. So far, encoding methods have been introduced that handle simply structured hierarchies efficiently for query processing. In this paper we propose the HINTA algorithm, which computes the clustering order for complex hierarchies by linearization. The physical clustering of OLAP data computed by HINTA significantly improves the performance of OLAP queries. HINTA enables clustering of complex hierarchies that can share hierarchy levels among several classifications of one dimension.

1 Introduction
A data warehouse (DW) is a physical database with an integrated view onto arbitrary data. A multidimensional (MD) view enables complex, interactive, explorative data analysis (OLAP, i.e., OnLine Analytical Processing). Conceptually, the data of a DW is stored in data cubes. A data cube consists of a set of dimensions and a set of measures. Dimensions provide categorical (qualitative) data (e.g., products, customers, time), which determine the context of the measures (e.g., items sold, cost, turnover).
The set of base values forming a dimension is generally classified according to a set of hierarchies. For instance, the time dimension may have a hierarchy all-year-month-day or all-year-week-day. In this paper we discuss in detail how the set of hierarchies can be represented and efficiently utilized for query processing.
Multidimensional clustering indexes (e.g., UB-Tree, R-Tree) handle multiple dimensions for multidimensional range queries ([Mar99]). Encoding methods prepare a hierarchical classification for use with clustering B-Trees for one hierarchy ([ZSL98], [MRB99]). This encoding, however, is only useful for a special case of hierarchies, i.e., hierarchy trees or simple hierarchies. In reality, hierarchies are more complex: they are unbalanced and have alternative paths and shared levels. To solve this severe problem and make encoding techniques useful for real-world scenarios, we propose HINTA, an algorithm that transforms an instantiation of a complex hierarchy into a hierarchy tree. In combination with the encoding schemes mentioned above, the resulting hierarchy can be used for clustering.
In this paper, we present a formal hierarchy model that is based on graph algorithms and is introduced via the instantiation of the hierarchies.
The rest of the paper is organized as follows. Section 2 lists related work. Section 3 gives a motivating example of how to use hierarchy encoding and HINTA. In Section 4, we present the hierarchy model. Section 5 describes HINTA, an algorithm that transforms complex hierarchies into simple hierarchies. Section 6 summarizes this paper and gives an outlook on future work.


The copyright of this paper belongs to the paper’s authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.
Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2001)
Interlaken, Switzerland, June 4, 2001
(D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/

R. Pieringer, V. Markl, F. Ramsak, R. Bayer 11-1

[Figure 3-1: Hierarchy with Encoding. Left: hierarchy schema with the levels Dimension, Country, Region, MicroMarket, TurnoverClass, Outlet. Right: hierarchy instance below the root Segment (countries Germany 0, Austria 1; regions North, South, East, West; turnover classes TG1, TG2, TG5, TA1, TA2), with the children of every member numbered. Compound surrogates of the outlets: A1 0000, A2 0001, S1 0010, S2 0011, A3 0100, H1 1000, H2 1001, S4 1010, H3 1100, H4 1101, S5 1110, S6 1111.]

2 Related Work
In the DW community, several formal models of DWs, dimensions, hierarchies, etc. have already been worked out. Some approaches do not explicitly include hierarchical classification in their data model ([AGS97], [BPT97]). In [Sap01], [Leh98a] and [Alb01], the authors work out a hierarchical classification, defining hierarchy schemata
with classify-relationships. In [LW96], an MD model based on relational elements is discussed.
Many publications propose first establishing the conceptual model and then doing the actual implementation ([WB97], [CT98], [GMR98]). [HLV00] show how to systematically derive a conceptual warehouse schema from a generalized multidimensional normal form. [FS99] introduce a conceptual data model that allows complex descriptions of the structure of aggregated entities and of dimensions organized in multiple hierarchies. [VS99] presents an overview of commercial and scientific concepts of DW modeling.
For single hierarchies, [ZSL98] discusses linearization and presents the physical representation within a DBMS. [MRB99] extend the linearization to multiple dimensions and hierarchies and discuss query processing of hierarchically organized multidimensional data.
In this paper, we present a linearization method for complex hierarchies that transforms complex hierarchies into simple hierarchies and then applies the linearization method already published in [MRB99].
[PJD99] discuss a transformation algorithm to achieve summarizability on unbalanced hierarchies.

3 Motivation
In a star schema ([Kim96]), dimension tables are connected to a large fact table via dimension attributes (join attributes). The dimension table usually contains the hierarchies of the dimension, where for every path through the hierarchy an artificial unique id (dimID) is used as the join attribute. This dimID can be a number computed from the encoding of the hierarchy for hierarchical clustering: $dimID = surr(v_m, v_{m-1}, \ldots, v_{leaf})$. The function surr computes a surrogate id for the path of the dimension tuple. The schema of a dimension table usually includes the hierarchy attributes of all simple hierarchies.
Conventional approaches to processing queries on DW schemata in relational DBMSs are star join algorithms, where restrictions on the dimension tables result in a number of dimension values that are joined with the fact table. Queries that restrict dimensions have predicates on hierarchy levels. These predicates usually are point or interval restrictions ([Sar97]) and result in large point sets at base granularity (i.e., the leaf level of the hierarchy). Depending on the predicate, such point sets can be replaced by a smaller set of interval restrictions. The predicate “Germany” on the hierarchy in Figure 3-1 would result in the leaf members {“A1”, “A2”, “S1”, “S2”, “A3”}, and
every such member is a join predicate to the fact table.
Figure 3-1 shows a hierarchy schema (on the left) and one hierarchy instance (on the right). The hierarchy is a complex hierarchy with the paths Dimension-Country-Region-MicroMarket-Outlet (solid arrows) or alternatively Dimension-Country-TurnoverClass-Outlet (dashed arrows).

3.1 Hierarchy Encoding
The identifier of a path must be unique. Thus, a number can be used to represent the corresponding path in the hierarchy. We establish an encoding scheme on the hierarchy that numbers the children of every level (surrogate number). The resulting identifier, called compound surrogate, is the concatenation of the surrogates along the path, one for each level. It is shown in the rectangles in Figure 3-1. With this encoding ([ZSL98], [MRB99]), hierarchical point sets can be replaced by intervals. The predicate “Germany” is mapped to the interval [0000; 0100]. This new interval predicate speeds up query execution on the fact table when corresponding clustering indexes are used, because a local interval predicate can be evaluated on the fact table instead of a join. Such an encoding is known for simple hierarchies. But predicates on a complex hierarchy often result in point restrictions on the leaf members. The predicate “TG2” specifies the leaf members {“A2”, “S2”, “A3”} (encoded 0001, 0011, 0100), which cannot be expressed by one interval when the hierarchy is encoded as in the previous case.
A solution to speed up queries for DW applications with complex hierarchies is to transform the complex hierarchy into a simple hierarchy while preserving the hierarchical dependencies. With this transformation and the encoding mentioned above, a predicate on the dimension hierarchy can be mapped to a relatively small number of intervals on the fact table. Thus, a query with a number of intervals on the fact table is performed instead of a complex join operation between dimension and fact table.
HINTA changes the hierarchy from a complex to a simple hierarchy, where alternative paths are concatenated while preserving hierarchical dependencies. Figure 3-2 shows the result of HINTA for the complex hierarchy of Figure 3-1 (the detailed transformation algorithm is discussed in Section 5).

[Figure 3-2: Transformed Hierarchy. Levels from root to leaves: Dimension (Segment), Country (Germany, Austria), Region (North, South, East, West), MicroMarket (AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, SaturnW), TurnoverClass (TG11, TG21, TG51, TG22, TG23, TA21, TA11, TA12, TA13, TA22, TA14), Outlet (A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6).]

3.2 HINTA for Star Schemata
The advantage of using HINTA in combination with hierarchy encoding is that the dimension is left unchanged for the members of the hierarchy. Only the artificial key has to be recomputed. A dimension table D for Figure 3-1 may have the schema D(country, region, micromarket, turnoverclass, outlet, dimID). For the geographical hierarchy, $dimID = surr_{geo}(country, region, micromarket, outlet)$; for the transformed hierarchy, $dimID = surr_{geotc}(country, region, micromarket, turnoverclass, outlet)$, where surr is a function that computes the encoding for the corresponding hierarchy path.
These physical properties do not affect the schema. If the optimizer is able to handle hierarchy encoding, another hierarchy schema, and therefore another encoding, is even transparent to the SQL statements (i.e., the optimizer recognizes that a predicate on the dimension table with a corresponding join to the fact table can be replaced by a number of local interval predicates on the fact table). In such a case, the generated operator tree avoids expensive join operations. However, the so-called residual join, i.e., the join of the result set of the fact table with the dimension table in order to perform grouping, sorting, feature evaluation, postfiltering, etc., cannot be avoided. Compared to the first pass of query evaluation, this residual join is performed on a relatively small number of tuples and thus usually is not critical for query execution.

4 A Hierarchy Model
Graphs represent relationships between vertices. Members in hierarchies are classified by relationships (usually 1:n relationships), which in the following we call hierarchical relationships. These hierarchical relationships can be represented in a directed graph. A hierarchy instance is the actual instantiation of the hierarchical relationship. A special case of a hierarchy instance is a hierarchy tree. In this paper, we extend the simple structure of a hierarchy tree to a more complex hierarchy graph. We use equivalence classes defined on the graph to describe hierarchy instances.
In the first part of this section, we work out properties of directed acyclic graphs (DAGs) as a model to describe hierarchies. The second part introduces hierarchy instances and schemata. We define some special hierarchies and describe typical hierarchies of data warehouses.
Basically, a hierarchy instance H corresponds to a graph G = (V, E) with vertices $v_i \in V$ and typed edges $e_j \in E$. V is a finite set and E is a subset of $V \times V \times \mathbb{N}$: $e_t = (v_1, v_2)_t \in E$, where $v_1, v_2 \in V$ and $t \in \mathbb{N}$ is a type determinator (type) specifying the type of the edge. We define a function $T: V \times V \times \mathbb{N} \to \mathbb{N}$ that returns the type of an edge e: $T(e) = T((v_1, v_2)_t) = t$.

4.1 Typed Directed Acyclic Graphs
We concentrate on DAGs ([CLR90]) with typed edges, abbreviated by tDAG. In a DAG, a vertex v is adjacent to u if $u \to v$, i.e., $(u, v) \in E$.
Example 4-1 (Graph):
Figure 4-1 illustrates a sample graph. This graph is a tDAG (the direction of the edges is denoted by arrows and the type of the edges by the edge style: a solid arrow denotes type 1, a dashed arrow denotes type 2). The vertices $v_i$ are {Germany, Austria, North, South, East, West, …, S6}, and the edges are E = {A1 → AldiN, AldiN → North, North → Germany, …, West → Austria} or, equivalently, the set of typed pairs E = {(A1, AldiN)$_1$, (AldiN, North)$_1$, …, (TG2, Germany)$_2$, …, (West, Austria)$_1$}.
Definition 4-1 (Path φ, Typed Path $\varphi_t$, Pathlength):
A path φ from u to v is a sequence of adjacent vertices $(v_1, v_2, \ldots, v_n)$, where $v_i \to v_{i+1}$ for $i = 1, \ldots, n-1$, $v_1 = u$ and $v_n$
$= v$. We say v is reachable from u via φ: $u \xrightarrow{\varphi} v$. We say φ contains the vertices $v_1, v_2, \ldots, v_n$.
A typed path $\varphi_t$ is a path with a type t; the function $T: (E \times \ldots \times E) \to \mathbb{N}$ returns the type:
$$T(\varphi) = \begin{cases} t & \text{if } \forall v_i, v_{i+1} \in \varphi: T((v_i, v_{i+1})) = t \quad (i = 1, \ldots, n-1) \\ \bot & \text{otherwise} \end{cases}$$
Two paths $\varphi_1 = (v_{11}, v_{12}, \ldots, v_{1n})$ and $\varphi_2 = (v_{21}, v_{22}, \ldots, v_{2n})$ have the same type t if the types of all edges of $\varphi_1$ and $\varphi_2$ are the same: $T(\varphi_1) = T(\varphi_2) = t$.
The pathlength is the number of edges in path φ:
$$pathlength(\varphi) = pathlength(path(u, v)),$$
if φ leads from u to v ($u \xrightarrow{\varphi} v$) and path(u, v) denotes this path φ from u to v. The type of a path is only defined if all edges in the path have the same type.

[Figure 4-1: Rooted Directed Acyclic Graph. Root Segment; countries Germany, Austria; regions North, South, East, West; turnover classes TG1, TG2, TG5, TA1, TA2; micromarkets AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, SaturnW; outlets A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6.]

Example 4-2 (Path, Pathlength):
We use Figure 4-1 as the example graph. There are two paths from “A1” to “Segment”: $\varphi_1$ = (“A1”, “AldiN”, “North”, “Germany”, “Segment”) and $\varphi_2$ = (“A1”, “TG1”, “Germany”, “Segment”), with $pathlength(\varphi_1) = 4$ and $pathlength(\varphi_2) = 3$, where $T(\varphi_1) = 1$ and $T(\varphi_2) = 2$.
Definition 4-2 (Rooted tDAG):
A rooted tDAG is a tDAG that has one vertex r that is reachable from all vertices $v_i \in V \setminus \{r\}$. Thus, there is a path from every $v_i \in V$, $v_i \neq r$, to r. Vertex r is called the root vertex (or root).
If the union of two tDAGs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ is not rooted (i.e., $G = G_1 \cup G_2 = (V_1 \cup V_2, E_1 \cup E_2)$ is not a rooted tDAG), but $G_1$ and $G_2$ are rooted tDAGs, we can construct a rooted tDAG G of $G_1 \cup G_2$ by adding a new vertex r and two edges $e_1 = (r_{G_1}, r)$ and $e_2 = (r_{G_2}, r)$, where $r_{G_1}$ is the root of $G_1$ and $r_{G_2}$ is the root of $G_2$: $G = (V_1 \cup V_2 \cup \{r\}, E_1 \cup E_2 \cup \{(r_{G_1}, r), (r_{G_2}, r)\})$.

Definition 4-3 (Outdegree, Indegree):
The out-degree of a vertex u, outdegree(u), is the number of edges leaving u; $outdegree_t(u)$ is the number of edges of type t leaving u.
The in-degree of u, indegree(u), is the number of edges entering u; correspondingly, $indegree_t(u)$ is the number of edges of type t entering u.
The degree of u is the sum of indegree(u) and outdegree(u).
A rooted tDAG has a number of vertices $v_i$ with $indegree(v_i) = 0$. These vertices are called leaf vertices $v_{leaf}$ (or leaves). In the graph of Figure 4-1, the leaf vertices are {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6}. A root vertex (root) r has an out-degree of 0.
We further consider graphs where every leaf vertex has at least one typed path to the root. We additionally require that $outdegree_t(v) = 1$ for every vertex v.
Example 4-3 (Indegree, Outdegree, Degree):
In Figure 4-1, the vertex SaturnN has the following degrees: $indegree(SaturnN) = indegree_1(SaturnN) = 2$, $outdegree(SaturnN) = 1$, $degree(SaturnN) = 3$.
Definition 4-4 (Subgraph):
A subgraph G' of a graph G = (V, E) is a graph whose vertices V' and edges E' are subsets of the vertices V and edges E of G: $G' = (V', E')$, $V' \subseteq V$, $E' \subseteq E$.
Definition 4-5 (Simple tDAG):
A simple tDAG (stDAG) $T_S = (V_S, E_S)$ is a subgraph of G with edges of one type t. The vertices of $T_S$ are the vertices on all type-t paths $\varphi_{t_k}$ from the leaves of G to the root, and $pathlength(\varphi_{t_i}) = pathlength(\varphi_{t_j})$, i.e., all paths from the leaves to the root have the same type and length.
Theorem 4-1:
A simple tDAG $T_S$ is a balanced tree.
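The degree, path and simple-tDAG definitions above can be checked mechanically on a small example. The following Python sketch is an illustration only, not part of HINTA: it encodes just the “Germany” part of Figure 4-1 as a set of typed edges (child, parent, type). It assumes, as Example 4-2 implies, that Germany has one edge of each type to Segment, and it assigns S1 to TG5, which is read from the figure layout and is therefore an assumption. All helper names (`outdegree`, `indegree`, `path_to_root`, `simple_tdag`, `is_balanced_tree`) are hypothetical.

```python
"""Checks of Definitions 4-1 to 4-5 and Theorem 4-1 on a fragment of Figure 4-1."""

# Typed edges (child, parent, type) for the "Germany" part of Figure 4-1.
# Type 1 (solid):  Outlet -> MicroMarket -> Region -> Country -> Segment
# Type 2 (dashed): Outlet -> TurnoverClass -> Country -> Segment
# Assumption: Germany -> Segment exists once per type (needed for Example 4-2).
EDGES = {
    ("A1", "AldiN", 1), ("A2", "AldiN", 1),
    ("S1", "SaturnN", 1), ("S2", "SaturnN", 1),
    ("A3", "AldiS", 1),
    ("AldiN", "North", 1), ("SaturnN", "North", 1),
    ("AldiS", "South", 1),
    ("North", "Germany", 1), ("South", "Germany", 1),
    ("Germany", "Segment", 1),
    ("A1", "TG1", 2), ("S1", "TG5", 2),  # S1 -> TG5 is an assumption
    ("A2", "TG2", 2), ("S2", "TG2", 2), ("A3", "TG2", 2),
    ("TG1", "Germany", 2), ("TG2", "Germany", 2), ("TG5", "Germany", 2),
    ("Germany", "Segment", 2),
}


def outdegree(u, edges, t=None):
    """Edges leaving u; restricted to type t if t is given."""
    return sum(1 for (a, _, et) in edges if a == u and (t is None or et == t))


def indegree(u, edges, t=None):
    """Edges entering u; restricted to type t if t is given."""
    return sum(1 for (_, b, et) in edges if b == u and (t is None or et == t))


def degree(u, edges):
    """Sum of in-degree and out-degree (Definition 4-3)."""
    return indegree(u, edges) + outdegree(u, edges)


def simple_tdag(edges, t):
    """Subgraph that keeps only the edges of type t (Definition 4-5)."""
    return {e for e in edges if e[2] == t}


def path_to_root(u, root, edges):
    """Follow the unique outgoing edge from u until root is reached."""
    path = [u]
    while path[-1] != root:
        parents = [b for (a, b, _) in edges if a == path[-1]]
        if len(parents) != 1:
            raise ValueError(f"{path[-1]} has {len(parents)} outgoing edges")
        path.append(parents[0])
    return path


def pathlength(path):
    """Number of edges on a path given as a vertex sequence."""
    return len(path) - 1


def is_balanced_tree(edges, root):
    """Theorem 4-1: outdegree 1 for every non-root vertex and
    equal pathlength for all leaf-to-root paths."""
    vertices = {a for (a, _, _) in edges} | {b for (_, b, _) in edges}
    if any(outdegree(v, edges) != 1 for v in vertices - {root}):
        return False
    leaves = [v for v in vertices if indegree(v, edges) == 0]
    lengths = {pathlength(path_to_root(v, root, edges)) for v in leaves}
    return len(lengths) == 1


if __name__ == "__main__":
    t1, t2 = simple_tdag(EDGES, 1), simple_tdag(EDGES, 2)
    print(path_to_root("A1", "Segment", t1))  # phi_1 of Example 4-2
    print(pathlength(path_to_root("A1", "Segment", t1)),
          pathlength(path_to_root("A1", "Segment", t2)))  # 4 3
    print(indegree("SaturnN", EDGES), outdegree("SaturnN", EDGES),
          degree("SaturnN", EDGES))  # 2 1 3 (Example 4-3)
    print(is_balanced_tree(t1, "Segment"), is_balanced_tree(t2, "Segment"),
          is_balanced_tree(EDGES, "Segment"))  # True True False
```

Restricting the edge set to one type yields the two simple tDAGs of this fragment, and each passes the balanced-tree check, whereas the full graph fails it because A1 (and Germany) has two outgoing edges.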
Proof:
1. A stDAG is a tree:
According to the definition of trees ([Knu99])¹, a tree T has the following properties: T = (V, E), where $v_i \in V$ are the vertices and $e_i \in E$ are directed edges, where $e_i = (root(T_j), root(T))$ and $1 \le j \le m$. T is a special case of a DAG, where $outdegree(v_i) = 1$ for all $v_i \in V \setminus \{root(T)\}$. For every $v_i \in V \setminus \{root(T)\}$, there is a path from $v_i$ to $r = root(T)$: $\forall v_i \in V \setminus \{root(T)\}\ \exists \varphi: v_i \xrightarrow{\varphi} r$.
A stDAG (V, E) is a rooted DAG with edges of type t, and $outdegree_t(v_i) = 1 = outdegree(v_i)$ (see Definition 4-5) for $v_i \in V \setminus \{r\}$, where r is the root. For every vertex $v_i$ of the stDAG, there is a path from $v_i$ to the root: $\forall v_i \in V \setminus \{r\}\ \exists \varphi: v_i \xrightarrow{\varphi} r$.
Thus, a stDAG is a tree.
2. A stDAG is a balanced tree:
In a balanced tree, the heights of the subtrees (i.e., the maximum pathlength of the paths from the leaves to the root) are equal or differ by at most 1. In a stDAG, the pathlength of all paths from the leaves to the root is equal.
Thus, a stDAG is a balanced tree.
q.e.d.
Definition 4-6 (Equivalence Class):
An equivalence class is a set of vertices with the following property: two vertices u, v of a simple tDAG $T_S = (V_S, E_S)$, $u, v \in V_S$, are elements of the same equivalence class c if pathlength(path(u, root)) = pathlength(path(v, root)), i.e., if the paths from the vertices of c to the root have the same length (same distance).
Example 4-4 (Simple tDAG, Equivalence Class):
In the graph of Figure 4-1, two simple tDAGs $T_1$ and $T_2$ are defined:
$T_1 = (V_1, E_1)$, where $V_1$ = {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6, AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, SaturnW, North, South, East, West, Germany, Austria, Segment}
$T_2 = (V_2, E_2)$, where $V_2$ = {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6, TG1, TG2, TG5, TA1, TA2, Germany, Austria, Segment}
Equivalence classes of $T_1$ are $c_1^1$ = {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6}, $c_2^1$ = {AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, SaturnW}, $c_3^1$ = {North, South,

tDAGs to hierarchies. This section describes hierarchies and their properties.
Definition 4-7 (Hierarchy Instance):
A hierarchy instance H is a rooted tDAG H = (V, E) with members $m_i \in V$ and directed, typed edges $e_j \in E$. The edges are called hierarchical relationships. We call a member $m_i$ hierarchically dependent on $m_j$ if $m_j \to m_i$ (or equivalently $(m_j, m_i) \in E$). We call a member $m_i$ indirectly hierarchically dependent on $m_j$ if $m_i$ is reachable from $m_j$ via a path φ: $m_j \xrightarrow{\varphi} m_i$, also denoted by $m_j \xrightarrow{*} m_i$.
We additionally define sub-hierarchies, called simple hierarchies $H_S = (V_S, E_S)$, that correspond to simple graphs. All simple hierarchies $H_S^i$ of a hierarchy instance H are partitions of H. The union of the simple hierarchies is the hierarchy instance H: $\bigcup_i H_S^i = H$. This follows from the definition of simple graphs.
Definition 4-8 (Hierarchy Level):
A hierarchy level, or level, is an equivalence class of a simple hierarchy containing members with the same distance from the root. We call the level consisting of the leaves the leaf level, and the level consisting of the root the root level. A simple hierarchy is a balanced hierarchy tree with a depth equal to the pathlength of the path from the leaves to the root.
Example 4-5 (Hierarchy Instance, Simple Hierarchy, Hierarchy Level):
The graph illustrated in Figure 4-1 is a hierarchy instance with two simple hierarchies $H_1$ and $H_2$.
The levels of $H_1 = (V_1, E_1)$ are $V_1 = \{h_1^1, h_2^1, h_3^1, h_4^1, h_5^1\}$, where $h_1^1$ = {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6}, $h_2^1$ = {AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, SaturnW}, $h_3^1$ = {North, South, East, West}, $h_4^1$ = {Germany, Austria} and $h_5^1$ = {Segment}.
Definition 4-9 (Hierarchically Dependent Levels):
A level $h_j$ is hierarchically dependent on $h_i$ if all members $m_{jk} \in h_j$ are hierarchically dependent on members $m_{ih} \in h_i$, i.e., $\forall m_{jk} \in h_j\ \exists m_{ih} \in h_i: (m_{ih}, m_{jk}) \in E$. The function $HD: (V, V \times V) \to (V \times V)$ computes the hierarchical relationships $\{(h_i, h_j)\}$ of the levels of a hierarchy instance H = (V, E), where $h_i, h_j$ are levels of H and $h_i$ is hierarchically de-
East, West}, c41={Germany, Austria} and c51={Segment}.
                                                                        pendent on hj.
Equivalence classes of T2 are c12={A1, A2, S1, S2, A3, H1,
                                                                        A level hj is indirect hierarchically dependent on hi, if all
H2, S4, H3, H4, S5, S6}, c22={TG1, TG2, TG5, TA1,
                                                                        members mjk∈ hj are indirect hierarchically dependent on
TA2}, c32={Germany, Austria} and c42={Segment}.
                                                                        members mih∈ hi.
4.2     Hierarchies
                                                                        The order of dependencies often intuitively is misunder-
With the definitions of graphs, we now can define hierar-               stood. A simple example illustrates the correct hierarchi-
chies. We draw a parallel from the concepts of rooted                   cal dependencies: In a geographic hierarchy with levels
                                                                        country, state and town, the level state is hierarchically
                                                                        dependent on town, because state is determined by the
1
  A tree is a finite set T of one or more vertices such that there is   towns. The level country also is hierarchically dependent
one specially designated node called the root of the tree, root(T),     on town, however indirect (via level state).
and the remaining nodes (excluding the root) are partitioned into
m≥0 disjoint sets T1, …, Tm and each of these sets in turn is a
tree. The trees T1, …, Tm are called the subtrees of the root.
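The level notions above (Definitions 4-6, 4-8 and 4-9) can be made concrete with a small sketch. The following Python code is a minimal illustration, not part of the paper. It groups the members of H1 from Figure 4-1 into equivalence classes by their distance from the root, then checks which levels are hierarchically dependent on which. It rests on three assumptions:

- H1 is given as a child-to-parent edge list, in the same direction as HD(H1) in Example 4-6.
- The assignment of micro markets to regions is read off Figures 5-5 and 5-6.
- All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list of the simple hierarchy H1 of Figure 4-1, stored
# child -> parent as in HD(H1) (Example 4-6). The MicroMarket -> Region
# assignment is read off Figures 5-5 and 5-6 and is an assumption.
H1_EDGES = [
    ("A1", "AldiN"), ("A2", "AldiN"), ("S1", "SaturnN"), ("S2", "SaturnN"),
    ("A3", "AldiS"), ("H1", "HoferE"), ("H2", "HoferE"), ("S4", "SaturnE"),
    ("H3", "HoferW"), ("H4", "HoferW"), ("S5", "SaturnW"), ("S6", "SaturnW"),
    ("AldiN", "North"), ("SaturnN", "North"), ("AldiS", "South"),
    ("HoferE", "East"), ("SaturnE", "East"),
    ("HoferW", "West"), ("SaturnW", "West"),
    ("North", "Germany"), ("South", "Germany"),
    ("East", "Austria"), ("West", "Austria"),
    ("Germany", "Segment"), ("Austria", "Segment"),
]


def distance_to_root(edges):
    """Pathlength from every member to the root of a simple hierarchy.

    A simple hierarchy is a tree, so every non-root member has exactly one parent.
    """
    parent = dict(edges)
    members = set(parent) | set(parent.values())
    dist = {}
    for m in members:
        d, v = 0, m
        while v in parent:  # walk upwards until the root (no parent) is reached
            v, d = parent[v], d + 1
        dist[m] = d
    return dist


def levels(edges):
    """Equivalence classes (Defs 4-6, 4-8), ordered from the leaf level to the root level."""
    by_dist = defaultdict(set)
    for m, d in distance_to_root(edges).items():
        by_dist[d].add(m)
    return [by_dist[d] for d in sorted(by_dist, reverse=True)]


def is_dependent(hj, hi, edges):
    """Def 4-9: hj depends on hi if every member of hj has an edge from some member of hi."""
    edge_set = set(edges)
    return all(any((mi, mj) in edge_set for mi in hi) for mj in hj)


def level_dependencies(edges):
    """Level-granularity HD: 1-based pairs (i, j) such that level j depends on level i."""
    lv = levels(edges)
    return [(i + 1, j + 1)
            for i in range(len(lv)) for j in range(len(lv))
            if i != j and is_dependent(lv[j], lv[i], edges)]


print(level_dependencies(H1_EDGES))  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```

With this edge list, levels(H1_EDGES) reproduces the levels h11 to h51 of Example 4-5. The dependency pairs correspond to (h11, h21), (h21, h31), (h31, h41) and (h41, h51), which also appear in HD(H) of Example 4-8. Storing the edges child to parent turns the test of Definition 4-9 into a direct membership check on (mih, mjk).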


R. Pieringer, V. Markl, F. Ramsak, R. Bayer                                                                                      11-5
Example 4-6 (Hierarchically Dependent Levels):
HD(H1) returns the following hierarchical dependencies:
HD(H1) = {(A1, AldiN), (A2, AldiN), (S1, SaturnN), (S2, SaturnN), (A3, AldiS), …, (Germany, Segment), (Austria, Segment)}, i.e., all edges of the graph representing H1.

Definition 4-10 (Shared Level):
Two levels h1={mk1} and h2={mj2} are shared levels if the intersection of h1 and h2 is not empty. Otherwise, the levels h1 and h2 are not shared; we call such levels disjoint levels.

Definition 4-11 (Distinct Operator):
The distinct operator (L → L) returns a subset of levels L'={hk} of a set of levels L={hi}: distinct(L) = {hk} = L', where ∀hk, hh ∈ L' with k ≠ h: hk ≠ hh, i.e., there are no equal levels in L'.
If there are several paths from a member to the root (usually the case for shared levels), we call these paths alternative paths.

Example 4-7 (Shared Level, Distinct Operator):
According to Example 4-5, the shared levels are h11=h12, h41=h32 and h51=h42.
For the hierarchy instance H = H1∪H2, the distinct operator returns the following levels:
distinct(H) = {h11, h21, h31, h41, h51, h22}.
The distinct operator is generally not deterministic. However, the members of the levels returned by the distinct operator are deterministic (e.g., the members of h11 and h12 are the same).

Definition 4-12 (Balanced Hierarchy):
A balanced hierarchy is a hierarchy whose leaf members are contained in one (shared) level hl. Simple hierarchies are always balanced hierarchies, because they have only one leaf level (balanced tree).

Definition 4-13 (Hierarchy Schema):
The hierarchy schema HS is a rooted tDAG specified by HS=(LS, ES), where LS is a set of levels hi and ES is a set of hierarchical relationships (hi, hj) between the levels, i.e., hj is hierarchically dependent on hi.

Definition 4-14 (Schema-Instance Conformity):
A hierarchy schema HS=(LS, ES) conforms to a hierarchy instance H=(VH, EH) if the number of levels of HS and H is equal and the hierarchical dependencies of these levels are equal:
∀(hiS, hjS) ∈ ES: ∃(hiH, hjH) ∈ HD(H) ∧ ∀(hiH, hjH) ∈ HD(H): ∃(hiS, hjS) ∈ ES

Example 4-8 (Hierarchy Schema and Instance):
On the left side of Figure 3-1, a hierarchy schema is illustrated: HS=(L, ES), where L={Outlet, MicroMarket, Region, TurnoverClass, Country, Dimension} and ES={(Outlet, MicroMarket), (MicroMarket, Region), (Region, Country), (Outlet, TurnoverClass), (TurnoverClass, Country), (Country, Dimension)}.
As hierarchy instance H, we use Example 4-5: H = H1∪H2 = (V, EH), where the levels are LH={h11, h21, h31, h41, h51, h12, h22, h32, h42}. The distinct levels are distinct(LH) = {h11, h21, h31, h41, h51, h22}, and the hierarchical dependencies are HD(H) = {(h11, h21), (h21, h31), (h31, h41), (h41, h51), (h11, h22), (h22, h41)}. If we map Outlet to h11, MicroMarket to h21, Region to h31, Country to h41, Dimension to h51 and TurnoverClass to h22, the hierarchy schema HS conforms to the hierarchy instance H of Figure 4-1.

4.3 Hierarchies in Data Warehouses
Hierarchies are used to classify the dimensions of a DW. DWs model complex business contexts. Additional attributes are used to provide additional classification information, e.g., the screen size of TV sets. For this reason, the members can be augmented by classification features. Therefore, a member v in a hierarchy graph is a pair v=(id, {fi}), where id is a unique identifier of the vertex, called member label (or label), and {fi} is a set of additional attributes, called feature attributes. We call such a graph an attributed tDAG.
Feature attributes are assigned to hierarchy members. Generally, a member can have an arbitrary number of features. In many DW hierarchies, however, the members of one hierarchy level have the same number of feature attributes². In this case, features are assigned to hierarchy levels.
In a DW, hierarchies are assigned to dimensions. One dimension can contain several hierarchies. We combine all hierarchies of one dimension into one hierarchy instance corresponding to a rooted tDAG, where the root is the "All" level. Such a hierarchy instance is called a DW-hierarchy. Usually, facts have a base granularity with respect to every dimension. This base granularity corresponds to one leaf level of the DW-hierarchy. Thus, a DW-hierarchy has only one (shared) leaf level.
If facts are classified with respect to different granularities³ (leaf hierarchy levels), new aggregation and grouping semantics have to be introduced ([Leh98a]). The hierarchy model, however, supports such degenerate hierarchies.

² Hierarchy members of one hierarchy level usually categorize the same information, e.g., the level "country" may have feature attributes like number of inhabitants, gross national product, etc. for every country stored in the hierarchy.
³ Unbalanced hierarchies.

5 Transforming Hierarchy Instances to Simple Hierarchies
First, we discuss some algorithms that are used by HINTA. Then we discuss HINTA in detail and show a complete example of HINTA for the hierarchy instance of Figure 3-1.

5.1 Primitive Hierarchy Instances (phi)
We use the term primitive hierarchy instance, phi, for hierarchy instances that consist of two simple hierarchies with one shared leaf level, where the remaining levels are disjoint. Such a phi can be transformed into one simple hierarchy instance. A phi is some kind of sub-hierarchy of a
conventional hierarchy instance consisting of two simple hierarchies.
A phi H consists of a number of hierarchically dependent disjoint levels and one shared leaf level. Figure 5-1 illustrates all possible hierarchy schemata of phi (phi1, phi2, phi3).
phi1 contains only one shared level, i.e., the leaf level of H. Such a phi can be constructed if a hierarchy has several hierarchically dependent shared levels. This sequence of levels is split into phi's of type phi1, one for every level. Usually, edges of both simple hierarchies "leave" phi1 (illustrated by dotted arrows). Thus, the original hierarchy has a level hierarchically dependent on hi if hi is not the root level.
phi2 is the general case for a phi. Two simple hierarchies H1 and H2 have one shared leaf level hi and a number of hierarchically dependent levels hk, …, hh for H1 and hj, …, hl for H2. Usually, a level hx (shared level) is hierarchically dependent on hh and hl. The dotted arrows denote these hierarchical relationships.
phi3 is a special case of phi2, where H1 consists only of the shared leaf level hi, and H2 consists of additional hierarchically dependent levels hj, …, hl.
In Figure 5-2, the splitting of a hierarchy schema of hierarchy H with the two simple hierarchies HS1 and HS2 into phi's is illustrated. HS1 consists of the levels {A, B, C, D, G, J}, HS2 consists of the levels {A, B, E, F, G, I, J}. The shared hierarchy paths (levels A and B) are of type phi1, the alternative paths for levels G→D→C and G→F→E are of type phi2, and the alternative paths J and J→I are of type phi3.
No other types of phi are possible for two simple hierarchies, because all hierarchy instances of two simple hierarchies can be constructed by concatenating phi's.

[Figure 5-1: Hierarchy Schema for Primitive Hierarchy Instances]
[Figure 5-2: Example of phi's]

A phi of a hierarchy instance H=H1∪H2 is formally defined in the following way:

Definition 5-1 (Primitive Hierarchy Instance, phi):
The primitive hierarchy instance (phi) of a hierarchy instance H consisting of two simple hierarchies HS1=(VS1, ES1) and HS2=(VS2, ES2) is a hierarchy instance Hphi=(Vphi, Ephi):
Vphi = {h1m, h1m-1, …, h1k, h2n, h2n-1, …, h2h}, where h1m resp. h2n are the root levels of HS1 resp. HS2, h1k and h2h are shared levels, and ∀h1j, k<j≤m: ∀h2i, h<i≤n: h1j and h2i are disjoint levels.
Ephi = {ei}, where ei = (vi, vj) ∈ (ES1 ∪ ES2) with vi, vj ∈ Vphi ∪ {vx | vk → vx, vk ∈ Vphi}.
Ephi contains all original edges between the members of Vphi and the "leaving" edges.

Example 5-1 (Primitive Hierarchy Instance):
This example shows the phi's of the sample hierarchy instance H of Figure 4-1. H consists of three phi's: Hp1, Hp2 and Hp3.
Hp1 is of type phi1: Vp1 = {Segment} and Ep1 = ∅, because the root does not have leaving edges.
Hp2 is again of type phi1 and consists of the members Vp2 = {Germany, Austria} and the edges Ep2 = {(Germany, Segment)₁, (Austria, Segment)₁, (Germany, Segment)₂, (Austria, Segment)₂}.
The edges are of type 1 and 2 (see Figure 4-1).
Hp3 is of type phi2 and consists of two alternative paths with the shared leaf level Outlet (see Figure 5-3):
Vp3 = {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6, AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, SaturnW, TG1, TG2, TG5, TA1, TA2, North, South, East, West}
Ep3 = {(A1, AldiN), (A2, AldiN), (S1, SaturnN), (S2, SaturnN), (A3, AldiS), …, (South, Germany), (East, Austria), (West, Austria)}

5.2 Transformation of Primitive Hierarchy Instances
A phi can be transformed into a simple hierarchy. In this section, members are denoted by v. If v is in level hi, i.e., v ∈ hi, we write vi; if v ∈ hj, we write vj, etc. We write vx and vy for members not within the hierarchies. An edge (vh, vx) is a leaving edge of vh.

[Figure 5-4: Transformation of a phi2]

Depending on the type of the phi, the hierarchy is transformed by deleting and adding special members and edges. A phi consists of two simple hierarchies. For the transformation, one hierarchy is preferred, i.e., the levels of the preferred hierarchy are usually more significant for the encoding than the levels of the other hierarchy (predicate isPreferred). The isPreferred:
E→Bool predicate (i.e., isPreferred(e) = TRUE | FALSE) returns TRUE if edge e is an edge of the preferred hierarchy. Usually, a hierarchy is preferred if it is used in more queries than the other hierarchy. There can be many preference criteria (e.g., number, importance or kind of queries).
The transformation algorithm is specified in pseudo code:
TransformPhiToSimpleHierarchy(Hphi):
if type(Hphi) = phi1 then
  forall edges (vi, vx)
    if not isPreferred(vi, vx) then
      delete edge (vi, vx)
  /* delete the leaving edges of the non-preferred
     hierarchy; the leaving edges of the preferred
     hierarchy remain */
if type(Hphi) = phi2 then
  forall edges (vl, vx)
    if not isPreferred(vl, vx) then
      delete edge (vl, vx)
      /* delete the leaving edges of the non-preferred
         hierarchy */
  forall edges (vi, vk)
    if not pathexists(vi → vj' → … → vl' → vk) then
      insertpath(vi → vj' → … → vl' → vk)
    delete edge (vi, vk)
  forall edges (vi, vj)
    delete edge (vi, vj)
  /* make vk indirectly hierarchically dependent on vi
     (instead of directly hierarchically dependent) by
     duplicating vertices and edges */
if type(Hphi) = phi3 then
  forall edges (vi, vx)
    delete edge (vi, vx)

[Figure 5-3: Schema and Instance of phi's (schema levels Outlet, MicroMarket, TurnoverClass, Region, Country, Dimension)]

For phi's of type phi1 and phi3, only "leaving" edges must be removed. For a phi1, all leaving edges of one of the two hierarchies must be removed (in this case, those of the non-preferred hierarchy). For a phi3, we must remove the "leaving" edges of the "small" hierarchy, because the levels of the other hierarchy must remain for hierarchical classification.
For phi's of type phi2, a kind of hierarchy interleaving is performed. The alternative paths are concatenated in the sense that the levels of the non-preferred hierarchy are made hierarchically dependent on the levels of the preferred hierarchy. Members of the shared leaf level hi are no longer directly hierarchically dependent on members of hk (see Figure 5-4). The operation insertpath(path) inserts the members and edges of path. This is necessary because a member vj ∈ hj can be adjacent to several members vi ∈ hi that do not correspond to an equal number of members vk ∈ hk. Thus, we have to duplicate the path to preserve hierarchical dependencies.
Instead of the discussed transformation algorithm, which concatenates two simple hierarchies, a hierarchy level interleaving is also possible. This interleaving corresponds to a topological sorting of the hierarchies. Further research is necessary to work out the advantages of the two transformation methods.

5.3 Hierarchy Instance Transformation Algorithm (HINTA)
The Hierarchy INstance Transformation Algorithm, HINTA, transforms a hierarchy instance H=(V, E), represented by a rooted tDAG (e.g., a DW-hierarchy), into a simple hierarchy HS = HINTA(H) = (VS, ES). The input of HINTA is a hierarchy instance that consists of an arbitrary number n of simple hierarchies HSk. We transform two simple hierarchies HS1 and HS2 into one simple hierarchy by splitting them into phi's and transforming each phi with TransformPhiToSimpleHierarchy into a primitive simple hierarchy. The primitive simple hierarchies are merged into the resulting simple hierarchy HS12. We then transform HS12 and the next simple hierarchy HS3 into HS123 according to the previously described steps, and so on. Thus, at the end of HINTA, we get one simple hierarchy HS123..n.
In the following, we describe the process more formally.
The input hierarchy H is split into simple hierarchies HSi: $H = \bigcup_i H^S_i$, where HSi is preferred to HSi+1.
HINTA: HS = HINTA(H)
According to the informal description of HINTA above, we transform a pair of simple hierarchies into one simple hierarchy, starting with the first two simple hierarchies in preference order.
H12 = Transform(HS1 ∪ HS2)
The resulting simple hierarchy H12 and the next preferred simple hierarchy HS3 are transformed:
H123 = Transform(H12 ∪ HS3)
The resulting simple hierarchy H123 and the next preferred simple hierarchy HS4 are transformed, and so on. Thus, we have n-1 calls of Transform for n simple hierarchies of H.
The transformation calls can also be summed up in one expression:
H12 = Transform(HS1 ∪ HS2)
H123 = Transform(H12 ∪ HS3) = Transform(Transform(HS1 ∪ HS2) ∪ HS3)
…
H123..n = Transform(H12..n-1 ∪ HSn) = Transform(Transform(… Transform(HS1 ∪ HS2) ∪ HS3) ∪ … ∪ HSn-1) ∪ HSn)
The function Transform splits a hierarchy instance H consisting of two simple hierarchies into phi's, transforms each phi into a simple hierarchy (TransformPhiToSimpleHierarchy) and concatenates the resulting simple hierarchies into one simple hierarchy:
Transform(H):
if H is a phi then
   Transform(H) = TransformPhiToSimpleHierarchy(H)
otherwise
   Transform(H) = TransformPhiToSimpleHierarchy(phi(H)) ∪ Transform(H \ phi(H))
Transform is a recursive function that transforms the first phi of H into a simple hierarchy HS and concatenates HS with the rest of the transformed phi's by calling Transform again. It terminates when H is already a phi, i.e., when the last phi of the original hierarchy instance is the input parameter.
The "∪" operator means that for H1∪H2 the hierarchies H1 and H2 are concatenated via the existing edges of members of H1 and H2.
The "\" operator is a splitting of the hierarchies H1 and H2, i.e., H* = H1 \ H2 means that H* is the hierarchy H1 without the members and edges of H2. Thus, H \ phi(H) is the hierarchy H without the first phi of H.
Transform is called n times if H consists of n phi's. Thus, Transform terminates, because a hierarchy instance H consists of a finite number n of phi's.

[Figure 5-5: Deleting Members and Edges (levels Outlet, MicroMarket, TurnoverClass, Region)]
[Figure 5-6: Final Simple Hierarchy Instance and Schema (levels from the root down: Dimension, Country, Region, MicroMarket, TurnoverClass, Outlet)]

5.4 Example of HINTA
To illustrate HINTA, we use the hierarchy instance H=(VH, EH) of Figure 4-1, where H contains two simple hierarchies H1S and H2S (see Example 4-5). We assume that H1S is preferred to H2S. The resulting simple hierarchy HS is computed by:
HS = HINTA(H)
The pair of simple hierarchies H1 and H2 is transformed by Transform(H1∪H2), where H1∪H2 is not a phi. H1∪H2 consists of three phis, i.e., Hp1, Hp2 and Hp3.
Hp1 (of type phi1) is the root level without edges, because the root level has no outgoing edges.
Hp2 (of type phi1) consists of the members {Germany, Austria} and the corresponding edges to the root (see Figure 5-3 and Example 5-1).
Hp3 = (V, E) (of type phi2) consists of two alternative paths with the shared leaf level Outlet (see Figure 5-3 and Example 5-1).

Now we transform Hp1, Hp2 and Hp3 into simple hierarchies:
HS1 = TransformPhiToSimpleHierarchy(Hp1)
HS2 = TransformPhiToSimpleHierarchy(Hp2)
HS3 = TransformPhiToSimpleHierarchy(Hp3)

HS1 = ({Segment}, ∅), i.e., the root without edges.

HS2 = ({Germany, Austria}, {(Germany, Segment), (Austria, Segment)}): because Hp2 is a phi of type phi1, the edges of type 2 are deleted.

HS3 = (V, E): Hp3 is of type phi2, and we delete the edges {(A1, AldiN), (A2, AldiN), (S1, SaturnN), (S2, SaturnN), (A3, AldiS), (H1, HoferE), (H2, HoferE), (S4, SaturnE), (H3, HoferW), (H4, HoferW), (S5, SaturnW), (S6, SaturnW)} and the edges {(TG1, Germany), (TG2, Germany), (TG5, Germany), (TA1, Austria), (TA2, Austria)}. The edges between the leaves and the preferred hierarchy are deleted because the members of the non-preferred hierarchy now depend directly on the leaves. Figure 5-5 illustrates which edges are deleted.
We delete the members {TG1, TG2, TG5, TA1, TA2}, insert the new members {TG11, TG21, TG51, TG22, TG23, TA21, TA11, TA12, TA13, TA22, TA14} and get the set of members:
V = {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6, AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, SaturnW, North, South, East, West, TG11, TG21, TG51, TG22, TG23, TA21, TA11, TA12, TA13, TA22, TA14}
We insert new edges that place the level TurnoverClass between Outlet and Micromarket, preserving the hierarchical dependencies: {(A1, TG11), (TG11, AldiN), (A2, TG21), (TG21, AldiN), (S1, TG51), (TG51, SaturnN), (S2, TG22), (TG22, SaturnN), (A3, TG23), …, (TA14, SaturnW)}
After deleting and inserting, the edges are:
E = {(A1, TG11), (A2, TG21), (S1, TG51), (S2, TG22), (A3, TG23), (H1, TA21), (H2, TA11), (S4, TA12), (H3, TA13), …, (North, Germany), (South, Germany), (East, Austria), (West, Austria)}

The resulting simple hierarchy of HINTA is the union of HS1, HS2 and HS3, as illustrated in Figure 5-6.

5.5 Effects of HINTA
This section illustrates the benefit of HINTA with an example. In general, the larger the hierarchies, the greater the benefit of HINTA; for reasons of presentation, however, only small hierarchies can be drawn. Figure 5-7 shows a complex hierarchy consisting of the simple hierarchies H1 = Country-State-Town and H2 = Country-Category-Town, where Category characterizes the size of a town (large, middle, small). If Hierarchy Encoding is applied to H1, a predicate "Category = L" selects the leaves {T11, T13, T21, T23, T31, T34}, or equivalently the dimIDs {00, 02, 10, 12, 20, 23}, i.e., 6 intervals that each contain a single value. With the transformed hierarchy (as in Figure 5-8), we get the three intervals [000, 001], [100, 101] and [200, 201]. With these fewer intervals, the query is processed faster on the fact table.

Figure 5-7: Complex Hierarchy (country C1; states S1, S2, S3; towns T11-T14, T21-T24, T31-T34 with dimIDs 00-03, 10-13, 20-23; categories L, M, S)

Figure 5-8: Transformed Hierarchy (country C1; states S1, S2, S3, each with the categories L, M, S; leaves T11, T13, T12, T14, T21, T23, T22, T24, T31, T34, T32, T33 with dimIDs 000, 001, 010, 020, 100, 101, 110, 120, 200, 201, 210, 220)

5.6 Remarks
This section briefly discusses the quality of HINTA. We consider a complex hierarchy such as the one in Figure 3-1, consisting of two simple hierarchies: one is the preferred hierarchy, the other is the non-preferred hierarchy. We compare an encoding of the preferred hierarchy (EPH), an encoding of the non-preferred hierarchy (ENPH) and an encoding of the hierarchy transformed with HINTA (ETH).

Figure 6-1: Time Hierarchy (Year above Week and Month, both above Day)

In general, no linearization can be perfect for all simple hierarchies at once. However, owing to the transformation algorithm, the preferred hierarchy is still encoded perfectly, i.e., a predicate (point restriction) on EPH that results in one interval also results in one interval for ETH, because the affected leaves are the leaves of a complete subtree. This is due to the higher priority of the preferred hierarchy (the levels of the preferred hierarchy are usually higher levels than those of the non-preferred hierarchy). Thus, HINTA causes no disadvantages in query execution for such predicates.
With respect to ENPH, however, the clustering of ETH is not optimal: a predicate on a level of ENPH may result in several intervals instead of one. This number depends strongly on the correlation of the two simple hierarchies. If the hierarchies are strongly correlated (e.g., the time hierarchy in Figure 6-1 with the preferred hierarchy year – month – day), most restrictions such as year = 1999 and week = 33 result in a single interval on dimID for the transformed hierarchy year – month – week – day.

6 Conclusions and Future Work
In this paper, we present a concept for representing hierarchies as directed acyclic typed graphs. We concentrate on complex hierarchies and describe HINTA, a method that linearizes complex hierarchies by transforming them into simple hierarchies and thus allows hierarchical clustering on such hierarchies.
The quality of the clustering depends on the correlation of the simple hierarchies. For query processing, predicates on hierarchy levels are mapped to intervals by an encoding that preserves hierarchical clustering.
Currently, in the EDITH project ([Edi01]), we are integrating encoding algorithms into the kernel of the relational DBMS TransBase ([Tra01]) and are investigating query processing algorithms and optimizer strategies that support these methods efficiently. Application partners will test the DBMS implementation. We are going to analyze HINTA by transforming complex real-world hierarchies and comparing their clustering properties.
We thank Martin Zirkel and Robert Fenk for fruitful discussions and for reading and correcting this paper.

7 References
[AGS97] R. Agrawal, A. Gupta, S. Sarawagi: Modelling Multidimensional Databases. In Proceedings of the 13th International Conference on Data Engineering (ICDE 97), Birmingham, UK, pp. 232-243, IEEE Computer Society, 1997.
[Alb01] J. Albrecht: Anfrageoptimierung in Data-Warehouse-Systemen auf Grundlage des multidimensionalen Datenmodells (in German). Dissertation, Universität Erlangen-Nürnberg, 2001.
[BPT97] E. Baralis, S. Paraboschi, E. Teniente: Materialized Views Selection in a Multidimensional Database. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 97), Athens, Greece, pp. 156-165, Morgan Kaufmann, 1997.
[CLR90] T. H. Cormen, C. E. Leiserson, R. L. Rivest: Introduction to Algorithms. MIT Press, Cambridge, Massachusetts, and London, 1990.
[CT98] L. Cabibbo, R. Torlone: A logical approach to multidimensional databases. Proc. 6th EDBT, LNCS 1377, pp. 183-197, 1998.
[Edi01] EDITH: European Development on Indexing Techniques for Databases with Multidimensional Hierarchies. http://edith.in.tum.de/
[FS99] E. Franconi, U. Sattler: A Data Warehouse Conceptual Data Model for Multidimensional Aggregation. Proc. DMDW, 1999.
[GMR98] M. Golfarelli, D. Maio, S. Rizzi: Conceptual design of data warehouses from E/R schemes. Proc. HICSS, 1998.
[HLV00] B. Hüsemann, J. Lechtenbörger, G. Vossen: Conceptual Data Warehouse Design. Proc. DMDW, 2000.
[Kim96] R. Kimball: The Data Warehouse Toolkit. John Wiley & Sons, New York, 1996.
[Knu99] D. E. Knuth: The Art of Computer Programming, Volume 1: Fundamental Algorithms, Third Edition. Addison Wesley, sixth printing, 1999.
[Leh98a] W. Lehner: Modeling Large Scale OLAP Scenarios. In Proceedings of the 6th International Conference on Extending Database Technology (EDBT 98), Valencia, Spain, LNCS Vol. 1377, pp. 153-167, Springer Verlag, 1998.
[Leh98b] W. Lehner: Adaptive Preaggregations-Strategien für Data Warehouses (in German). Dissertation, Universität Erlangen-Nürnberg, 1998.
[LW96] C. Li, X. Sean Wang: A Data Model for Supporting On-Line Analytical Processing. CIKM 1996.
[Mar99] V. Markl: MISTRAL: Processing Relational Queries using a Multidimensional Access Technique. Ph.D. Thesis, Technische Universität München, 1999.
[MRB99] V. Markl, F. Ramsak, R. Bayer: Improving OLAP Performance by Multidimensional Hierarchical Clustering. Proc. IDEAS 99, Montreal, Canada, 1999.
[PJD99] T. B. Pedersen, C. S. Jensen, C. E. Dyreson: Extending Practical Pre-Aggregation in On-Line Analytical Processing. Proc. 25th VLDB, pp. 663-674, 1999.
[Sap01] C. Sapia: PROMISE: Modeling and Predicting User Behavior for Online Analytical Processing Applications. Ph.D. Thesis (submitted), Technische Universität München, 2001.
[Sar97] S. Sarawagi: Indexing OLAP data. Data Engineering Bulletin 20(1), pp. 36-43, 1997.
[Tra01] TransAction Software GmbH: TransBase Documentation, 2001. www.transaction.de
[VS99] P. Vassiliadis, T. Sellis: A Survey on Logical Models for OLAP Databases. ACM SIGMOD Record 28(4), pp. 64-69, 1999.
[WB97] M.-C. Wu, A. P. Buchmann: Research Issues in data warehousing. Proc. BTW 1997, pp. 61-82.
[ZSL98] C. Zou, B. Salzberg, R. Ladin: Back to the Future: Dynamic Hierarchical Clustering. Proc. ICDE 1998, pp. 578-587.
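As a cross-check of the interval counts in the Section 5.5 example, the following Python sketch encodes the leaves of Figure 5-7 and Figure 5-8 and counts the dimID intervals selected by the predicate Category = L. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a simplified Hierarchy Encoding in which a dimID concatenates the ordinal positions along a leaf's path, one digit per level (so every node has fewer than ten children), and it reads the assignment of the M and S towns off Figure 5-8. All names (STATES, CATEGORY, encode_original, encode_transformed, intervals) are hypothetical.

```python
from itertools import groupby

# Hypothetical transcription of Figure 5-7: the towns of each state in their
# original order, and the size category (L, M, S) of every town.
# Category assignments for M and S towns are read off Figure 5-8.
STATES = {
    "S1": ["T11", "T12", "T13", "T14"],
    "S2": ["T21", "T22", "T23", "T24"],
    "S3": ["T31", "T32", "T33", "T34"],
}
CATEGORY = {
    "T11": "L", "T12": "M", "T13": "L", "T14": "S",
    "T21": "L", "T22": "M", "T23": "L", "T24": "S",
    "T31": "L", "T32": "M", "T33": "S", "T34": "L",
}
CATEGORY_ORDER = ["L", "M", "S"]


def encode_original():
    """Country-State-Town: dimID = state position + town position.

    Simplified encoding (assumption): one decimal digit per level,
    so every node must have fewer than ten children.
    """
    ids = {}
    for s_pos, towns in enumerate(STATES.values()):
        for t_pos, town in enumerate(towns):
            ids[town] = f"{s_pos}{t_pos}"
    return ids


def encode_transformed():
    """Country-State-Category-Town: the Category level of the
    non-preferred hierarchy is placed below State, and towns are
    numbered within their (state, category) node."""
    ids = {}
    for s_pos, towns in enumerate(STATES.values()):
        for c_pos, cat in enumerate(CATEGORY_ORDER):
            members = [t for t in towns if CATEGORY[t] == cat]
            for t_pos, town in enumerate(members):
                ids[town] = f"{s_pos}{c_pos}{t_pos}"
    return ids


def intervals(ids, selected):
    """Return the maximal runs of selected leaves in dimID order,
    as (first dimID, last dimID) pairs."""
    # Equal-length digit strings sort in the same order as their numbers.
    order = sorted(ids, key=ids.get)
    runs = []
    for hit, group in groupby(order, key=lambda town: town in selected):
        if hit:
            run = list(group)
            runs.append((ids[run[0]], ids[run[-1]]))
    return runs


large = {town for town, cat in CATEGORY.items() if cat == "L"}
print(intervals(encode_original(), large))     # six single-value intervals
print(intervals(encode_transformed(), large))  # three intervals
```

On the original encoding, the selection splits into the six single-value intervals 00, 02, 10, 12, 20 and 23; on the transformed encoding, it collapses into the three intervals [000, 001], [100, 101] and [200, 201], matching the counts given in Section 5.5.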

