HINTA: A Linearization Algorithm for Physical Clustering of Complex OLAP Hierachies Roland Pieringer Volker Markl TransActon Software GmbH IBM Almaden Research Center K55/B1, Thomas-Dehler-Str. 18, 650 Harry Road, 81737 München, Germany San Jose, CA 95120-6099 pieringer@transaction.de marklv@us.ibm.com Rudolf Bayer Frank Ramsak Institut für Informatik Bayerisches Forschungsinstitut für Technische Universität München, wissensbasierte Systeme Orleansstr. 34, 81667 München Orleansstr. 34, 81667 München bayer@in.tum.de frank.ramsak@forwiss.de The set of base values forming a dimension generally is Abstract classified according to a set of hierarchies. For instance, Hierarchies are an important means to categorize the time dimension may have a hierarchy all-year-month- data stored in OLAP systems. OLAP queries fol- day or all-year-week-day. In this paper we will discuss low the drill/slice/dice-paradigm and therefore and further detail how the set of hierarchies can be repre- exhibit navigation patterns that follow the hierar- sented and efficiently utilized for query processing. chy of a dimension. In real-world applications, Multidimensional clustering indexes (e.g., UB-Tree, R- hierarchies are often unbalanced and share levels, Tree) handle multiple dimensions for multidimensional resulting in complex hierarchy structures. So far, range queries ([Mar99]). Encoding methods prepare hier- encoding methods for simple structured hierar- archical classification for the use of clustering B-Trees for chies have been introduced to handle hierarchies one hierarchy ([ZSL98], [MRB99]). This encoding, how- efficiently for query processing. In this paper we ever, is only useful for a special case of hierarchies, i.e., propose the HINTA algorithm to compute the hierarchy trees or simple hierarchies. In reality, hierar- clustering order for complex hierarchies by lin- chies are more complex, e.g., hierarchies are unbalanced, earization. The physical clustering of OLAP data have alternative paths and shared levels. To solve this computed by HINTA significantly improves the severe problem and make encoding techniques useful for performance of OLAP queries. HINTA enables real world scenarios, we propose HINTA, an algorithm clustering of complex hierarchies that can share that transforms an instantiation of a complex hierarchy to hierarchy levels in several classifications over a hierarchy tree. In combination with the above mentioned one dimension. encoding schemes, the resulting hierarchy can be used for clustering. In this paper, we present a formal hierarchy model, that is 1 Introduction based on graph algorithms and is introduced by the instan- A data warehouse (DW) is a physical database with an tiation of the hierarchies. integrated view onto arbitrary data. A multidimensional The rest of the paper is organized as follows. Section 2 (MD) view enables complex interactive, explorative data lists related work. Section 3 gives a motivating example analysis (OLAP, i.e. OnLine Analytical Processing). how to use hierarchy encoding and to make use of Conceptually, the data of a DW is stored in data cubes. A HINTA. In Section 4, we present the hierarchy model. data cube consists of a set of dimensions and a set of Section 5 describes HINTA, a transformation algorithm of measures. Dimensions provide categorical (qualitative) complex hierarchies to simple hierarchies. Section 6 data (e.g., products, customers, time), which determine the summarizes this paper and gives an outlook to future context of the measures (e.g., items sold, cost, turnover). work. The copyright of this paper belongs to the paper’s authors. Permission to copy 2 Related Work without fee all or part of this material is granted provided that the copies are not In the DW community, some formal models of DW, di- made or distributed for direct commercial advantage. mensions, hierarchies etc. already have been worked out. Proceedings of the International Workshop on Design and Some approaches do not explicitly include hierarchical Management of Data Warehouses (DMDW'2001) classification in their data model ([AGS97], [BPT97]). In Interlaken, Switzerland, June 4, 2001 [Sap01], [Leh98a] and [Alb01], the authors work out a (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.) hierarchical classification, defining hierarchy schemata http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/ R. Pieringer, V. Markl, F. Ramsak, R. Bayer 11-1 Dimension Segment Country 0 Germany 1 Austria Region 0 North 1 South 0 East 1 West TurnoverClass TG1 TG2 TG5 TA1 TA2 0 Aldi 1 Saturn 0 Aldi 0 Hofer 1Saturn 0 Hofer 1 SaturnW MicroMarket N N S E E W 0 1 A1 A2 0 1 0 0 1 1 0 1 S1 S2 A3 H1 H2 0 S4 0H3 H4 S5 S6 Outlet 0000 0001 0010 0011 0100 1000 1001 1010 1100 1101 1110 1111 Figure 3-1: Hierarchy with Encoding with classify-relationships. In [LW96], a MD model is discussed, based on relational elements. 3 Motivation Many publications propose first to establish the concep- tual model and then to do the actual implementation In a star schema ([Kim96]), dimension tables are con- ([WB97], [CT98], [GMR98]). [HLV00] show how to nected to a large fact table via dimension attributes (join systematically derive a conceptual warehouse schema attributes). The dimension table usually contains the hier- from a generalized multidimensional normal form. archies of the dimension, where for every path through the [FS99] introduce a conceptual data model, that allows hierarchy an artificial unique id (dimID) is used as join complex descriptions of the structure of aggregated enti- attribute. This dimID can be a computed number with ties and multiply hierarchically organized dimensions. respect to the encoding of the hierarchy for hierarchical [VS99] presents an overview of the understanding of clustering: dimID=surr(vm, vm-1, …, vleaf). The function commercial and scientific concepts of DW modeling. surr computes a surrogate id for the path of the dimension For single hierarchies, [ZSL98] discusses the linearization tuple. The schema of a dimension table usually includes and presents the physical representation within DBMS. the hierarchy attributes of all simple hierarchies. [MRB99] extend the linearization to multiple dimensions Conventional approaches to process queries in DW sche- and hierarchies and discuss query processing of hierarchi- mata in relational DBMS are star join algorithms, where cally organized multidimensional data. restrictions on the dimension tables result in a number of In this paper, we further present a linearization method for dimension values that are joined with the fact table. Que- complex hierarchies by transforming complex hierarchies ries that restrict dimensions, have predicates on hierarchy to simple hierarchies and using the linearization method levels. These predicates usually are point or interval re- already published in [MRB99]. strictions ([Sar97]) and result in large point sets on base [PJD99] discuss a transformation algorithm to achieve granularity (i.e., the leaf level of the hierarchy). Such summarizability on unbalanced hierarchies. point sets can be replaced by a smaller set of interval re- strictions depending on the predicate. The predicate “Germany” of the hierarchy in Figure 3-1 would result in the leaf members {“A1”, “A2”, “S1”, “S2”, “A3”}, and Segment Dimension Country Germany Austria Region North South East West AldiN SaturnN AldiS HoferE SaturnE HoferW SaturnW MicroMarket TG11 TG21 TG51 TG22 TG23 TA21 TA11 TA12 TA13 TA22 TA14 TurnoverClass Outlet A1 A2 S1 S2 A3 H1 H2 S4 H3 H4 S5 S6 Figure 3-2: Transformed Hierarchy R. Pieringer, V. Markl, F. Ramsak, R. Bayer 11-2 every such member is a join predicate to the fact table. ent to the SQL statements (i.e., the optimizer recognizes, Figure 3-1 shows a hierarchy schema (on the left) and one that a predicate on the dimension table with a correspond- hierarchy instance (on the right). The hierarchy is a com- ing join to the fact table can be replaced by a number of plex hierarchy with the paths Dimension-Country-Region- local interval predicates on the fact table). In such a case, MicroMarket-Outlet (solid arrows) or alternatively Di- the generated operator tree avoids expensive join opera- mension-Country-TurnoverClass-Outlet (dashed arrows). tions. However, for the so called residual join, i.e., the join for the result set of the fact table to the dimension 3.1 Hierarchy Encoding table in order to perform grouping, sorting, feature evalua- The identifier of the paths must be unique. Thus, a number tion, postfiltering etc., the join cannot be prevented. Com- can be used to represent the corresponding path in the pared to the first pass of query evaluation, this residual hierarchy. We establish an encoding schema on the hier- join will be performed on a relatively small number of archy, that numbers (surrogate number) the children of tuples and thus usually will not be critical for query execu- every level. The resulting identifier, called compound tion. surrogate, are the concatenated surrogates of the path, one for each level. It is shown in Figure 3-1 in the rectangles. 4 A Hierarchy Model With this encoding ([ZSL98], [MRB99]), hierarchical Graphs represent relationships between vertices. Mem- point sets can be replaced by intervals. The predicate bers in hierarchies are classified by relationships (usually “Germany”, is mapped to the interval [000; 0100]. This 1:n relationships), which we in the following call hierar- new interval predicate speeds up query execution on the chical relationships. These hierarchical relationships can fact table, when using corresponding clustering indexes be represented in a directed graph. A hierarchy instance is (because a local interval predicate can be performed on the actual instantiation of the hierarchical relationship. A the fact table instead of a join). Such an encoding is special case of a hierarchy instance is a hierarchy tree. In known for simple hierarchies. But predicates on a com- this paper, we extend the simple structure of a hierarchy plex hierarchy often result in point restrictions on the leaf tree to a more complex hierarchy graph. We use equiva- members. The predicate “TG2” specifies the leaf members lence classes defined on the graph to describe hierarchy {“A2”, “S2”, “A3”}, that cannot be expressed by an inter- instances. val when encoding the hierarchy with respect to the previ- In the first part of this section, we work out properties of ous case. directed acyclic graphs (DAG) as model to describe hier- A solution to speed up queries for DW applications with archies. The second part introduces hierarchy instances complex hierarchies is to transform the complex hierarchy and schemata. We define some special hierarchies and into a simple hierarchy while leaving hierarchical depend- describe typical hierarchies of data warehouses. encies. With this transformation and the mentioned encod- Basically, a hierarchy instance H corresponds to a graph ing, a predicate on the dimension hierarchy can be G = (V, E) with vertices vi ∈ V and typed edges ej ∈ E. V mapped to a relatively small number of intervals on the is a finite set and E is a subset of V×V×N: et ∈ E = (v1, fact table. Thus, a query with a number of intervals on the v2,)t, where v1, v2 ∈ V and t ∈ N is a type determinator fact table is performed instead of a complex join operation (type) specifying the type of the edge. We define a func- between dimension and fact table. tion T: V×V×N!N that returns the type of an edge e: HINTA changes the hierarchy from a complex to a simple T(e) = T((v1, v2)t) = t. hierarchy, where alternative paths are concatenated by preserving hierarchical dependencies. Figure 3-2 shows, 4.1 Typed Directed Acyclic Graphs the result of HINTA for the complex hierarchy of Figure 3-1 (the detailed transformation algorithm is discussed in We concentrate on DAGs ([CLR90]) with typed edges, Section 5). abbreviated by tDAG. In a DAG, a vertex v is adjacent to u, if u ! v or (u, v) ∈E. 3.2 HINTA for Star Schemata Example 4-1 (Graph): The advantage of using HINTA in combination with hier- Figure 4-1 illustrates a sample graph. This graph is a archy encoding is, that the dimension is left unchanged for tDAG (the direction of the edges is denoted by arrows, the the members of the hierarchy. Only the artificial key has type of the edges is denoted by the edge style, a solid ar- to be recomputed. A dimension table D for Figure 3-1 row denotes type 1, a dashed arrow denotes type 2). The may have the schema D(country, region, micromarket, vertices vi are { Germany, Austria, North, South, East, turnoverclass, outlet, dimID). For the geographical hierar- West, …, S6 }, the edges are: E={A1!AldiN, AldiN! chy, dimID=surrgeo(country, region, micromarket, outlet), North, North!Germany, …, West!Austria} or equiva- for the transformed hierarchy, dimID=surrgeotc(country, lently a set of pairs E={(A1, AldiN)1, (AldiN, North)1, …, region, micromarket, turnoverclass, outlet), where surr is a (TG2, Germany) 2, …, (West, Austria)1}. function that computes the encoding for the corresponding hierarchy path. Definition 4-1 (Path φ, Typed Path φt, Pathlength): These physical properties do not affect the schema. If the A path φ from u to v is a sequence of adjacent vertices (v1, optimizer is able to handle hierarchy encoding, another v2, …, vn), where vi ! vi+1, i = 1, …, n-1 and v1 = u and vn hierarchy schema and therefore encoding even is transpar- R. Pieringer, V. Markl, F. Ramsak, R. Bayer 11-3 Φ = v. We say, v is reachable from u via φ: u  → v . We Definition 4-3 (Outdegree, Indegree): say, φ contains the vertices v1, v2, …, vn. The out-degree of a vertex u (outdegree(u)) is the number A typed path φt is a path with a type t, the function T: (E× of edges leaving u, outdegreet(u) is the number of edges .. ×E)! N returns the type: with type t, leaving u. The in-degree of u (indegree(u)) is the number of edges t if ∀vi , vi+1 ∈Φ: T((vi , vi+1)) = t (i =1,...,n −1) entering u, indegreet(u) is the number of edges with type t T(Φ) =  entering u, correspondingly. ⊥ otherwise The degree of u is the sum of indegree(u) and outde- gree(u). Two paths φ1 = (v11, v12, …, v1n) and φ2 = (v21, v22, …, v2n) A rooted tDAG has a number of vertices vi with inde- have the same type t, if the types of all edges of φ1 and φ2 gree(vi) = 0. These vertices are called leaf vertices vleaf (or are the same: T(φ1) = T(φ2) = t. leaves). In the graph of Figure 4-1, the leaf vertices are The pathlength is the number of edges in path φ. {A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, S6}. A root Segment Germany Austria North South East West TG1 TG2 TG5 TA1 TA2 AldiN SaturnN AldiS HoferE SaturnE HoferW SaturnW A1 A2 S1 S2 A3 H1 H2 S4 H3 H4 S5 S6 Figure 4-1: Rooted Directed Acyclic Graph pathlength(φ) = pathlength(path(u, v)), vertex (root) r has an out-degree of 0. Φ if φ = u  → v and path(u,v) is the path φ from u to v. We further consider graphs, where every leaf vertex has at The type of a path is only defined, if all edges in the path least one typed path to the root. We additionally require, have the same type. that for every vertex v outdegreet(v)=1. Example 4-2 (Path, Pathlength): Example 4-3 (Indegree, Outdegree, Degree): We use Figure 4-1 as example graph. There are two paths In Figure 4-1, the vertex SaturnN has the following de- from “A1” to “Segment” φ1=(“A1”, “AldiN”, “North”, grees: indegree(SaturnN) = indegree1(SaturnN) = 2, “Germany”, “Segment”) and φ2=(“A1”, “TG1”, “Ger- outdegree(SaturnN)=1, degree(SaturnN)=3. many”, “Segment”). pathlength(φ1) = 4 and path- length(φ2) = 3, where T(φ1)=1 and T(φ2) = 2. Definition 4-4 (Subgraph): A subgraph G’ of graph G=(V,E) is a graph, whose verti- Definition 4-2 (Rooted tDAG): ces V’ and edges E’ are subsets of vertices V and edges E A rooted tDAG is a tDAG that has one vertex r that is of G: G’=(V’, E’), V’⊆V, E’⊆E. reachable from all vertices vi ∈ V \{r}. Thus, there is a path from all vi∈V to r, vi≠r. Vertex r is called root vertex Definition 4-5 (Simple tDAG): (or root). A simple tDAG (stDAG) Ts=(Vs, Es) is a subgraph of G If the union of two tDAGs G1=(V1, E1) and G2=(V2, E2) is with edges of one type t. The vertices of Ts are the vertices not rooted (i.e., G=G1∪G2=(V1∪V2, E1∪E2) is not a contained in all paths φtk from leaves of G to the root, and rooted tDAG), but G1 and G2 are rooted tDAGs, we can pathlength(φti)= pathlength(φtj), i.e., all paths from leaves construct a rooted tDAG G of G1∪G2 by adding a new to root with same type and length. vertex r and two edges e1 = (rG1, r) and e2 = (rG2, r), where rG1 is root of G1 and rG2 is root of G2: G=( V1 ∪ V2 Theorem 4-1: ∪ r, E1 ∪ E2 ∪ (rG1, r) ∪ (rG1, r)). A simple tDAG TS is a balanced tree. R. Pieringer, V. Markl, F. Ramsak, R. Bayer 11-4 Proof: tDAGs to hierarchies. This section describes hierarchies 1. A stDAG is a tree: and their properties. According to the definition of trees ([Knu99])1, a tree T has the following properties: Definition 4-7 (Hierarchy Instance): T=(V, E), where vi∈V are the vertices and ei∈E are A hierarchy instance H is a rooted tDAG H=(V, E) with directed edges, where ei = (root(Tj), root(T)) and 1 ≤ j members mi∈V and directed, typed edges ej∈E. The edges ≤ m. T is a special case of a DAG, where outde- are called hierarchical relationships. We call a member mi gree(vi)=1 for all vi∈V \{root(T)}. For every vi∈V hierarchically dependent on mj, if mj!mi (or equivalently \{root(T)}, there is a path from vi to r = root(T): (mj, mi) ∈ E). We call a member mi indirect hierarchically Φ ∀ vi∈V\{root(T)} ∃ φ: v  → r. dependent on mj, if mi is reachable from mj via a path φ: Φ A stDAG (V, E) is a rooted DAG with edges of t. m j → mi , also denoted by m j  → * mi . outdegreet(vi)=1 = outdegree(vi) (see Definition 4-5) for vi ∈ V \{r}, where r is the root. For every vertex vi We additionally define sub-hierarchies, called simple of the stDAG, there is a path from vi to root: ∀ vi∈V hierarchies HS=(VS,ES) that correspond to simple graphs. Φ \{r} ∃ φ: v  → r. All simple hierarchies HiS of a hierarchy instance H are Thus, a stDAG is a tree. partitions of H. The union of the simple hierarchies is the 2. A stDAG is a balanced tree: In a balanced tree, the height (i.e., the maximum path- hierarchy instance H: U H = H . This follows from i i S length of the path from leaves to the root) of the sub- trees is equal or has a difference of at most 1. the definition of simple graphs. In a stDAG, the pathlength of all paths from the Definition 4-8 (Hierarchy Level): leaves to the root is equal. A hierarchy level or level is an equivalence class of a Thus, a stDAG is a balanced tree. simple hierarchy containing members with the same dis- q.e.d. tance from the root. We call the level, consisting of Definition 4-6 (Equivalence Class): leaves, leaf level and the level, consisting of the root, root An equivalence class is a set of vertices with the follow- level. A simple hierarchy is a balanced hierarchy tree ing properties: Two vertices u, v of a simple tDAG with a depth equal to the pathlength of the path from the TS=(VS, ES), u, v ∈ Vs are elements of equivalence class c, leaves to the root. if pathlength(path(u, root)) = pathlength(path(v, root)), Example 4-5 (Hierarchy Instance, Simple Hierarchy, i.e., if the path length of the path from the vertices of c to Hierarchy Level): the root is identical (same distance). The graph illustrated in Figure 4-1, is a hierarchy instance Example 4-4 (Simple tDAG, Equivalence Class): with two simple hierarchies H1 and H2. In the graph of Figure 4-1, two simple tDAGs T1 and T2 The levels of H1=(V1, E1) are V1={h11, h21, h31, h41, h51), are defined: where h11={A1, A2, S1, S2, A3, H1, H2, S4, H3, H4, S5, T1 = (V1, E1), where V1 = {A1, A2, S1, S2, A3, H1, H2, S4, S6}, h21={AldiN, SaturnN, AldiS, HoferE, SaturnE, HoferW, H3, H4, S5, S6, AldiN, SaturnN, AldiS, HoferE, SaturnE, SaturnW}, h31={North, South, East, West}, h41={Germany, HoferW, SaturnW, North, South, East, West, Germany, Austria} and h51={Segment}. Austria, Segment} Definition 4-9 (Hierarchically Dependent Levels): T2 = (V2, E2), where V2 = {A1, A2, S1, S2, A3, H1, H2, S4, A level hj is hierarchically dependent on hi, if all mem- H3, H4, S5, S6, TG1, TG2, TG5, TA1, TA2, Germany, bers mjk∈hj are hierarchically dependent on members Austria, Segment} mih∈hi, i.e., ∀mjk∈hj ∃mih∈hi: (mih, mjk) ∈E. The function Equivalence classes of T1 are c11={A1, A2, S1, S2, A3, H1, HD: (V, V×V)!(V×V) computes the hierarchical relation- H2, S4, H3, H4, S5, S6 }, c21={AldiN, SaturnN, AldiS, ships {(hi, hj)} of the levels of a hierarchy instance H=(V, HoferE, SaturnE, HoferW, SaturnW}, c31={North, South, E), where hi, hj are levels of H and hi is hierarchically de- East, West}, c41={Germany, Austria} and c51={Segment}. pendent on hj. Equivalence classes of T2 are c12={A1, A2, S1, S2, A3, H1, A level hj is indirect hierarchically dependent on hi, if all H2, S4, H3, H4, S5, S6}, c22={TG1, TG2, TG5, TA1, members mjk∈ hj are indirect hierarchically dependent on TA2}, c32={Germany, Austria} and c42={Segment}. members mih∈ hi. 4.2 Hierarchies The order of dependencies often intuitively is misunder- With the definitions of graphs, we now can define hierar- stood. A simple example illustrates the correct hierarchi- chies. We draw a parallel from the concepts of rooted cal dependencies: In a geographic hierarchy with levels country, state and town, the level state is hierarchically dependent on town, because state is determined by the 1 A tree is a finite set T of one or more vertices such that there is towns. The level country also is hierarchically dependent one specially designated node called the root of the tree, root(T), on town, however indirect (via level state). and the remaining nodes (excluding the root) are partitioned into m≥0 disjoint sets T1, …, Tm and each of these sets in turn is a tree. The trees T1, …, Tm are called the subtrees of the root. R. Pieringer, V. Markl, F. Ramsak, R. Bayer 11-5 Example 4-6 (Hierarchically Dependent Levels): {h11, h21, h31, h41, h51, h22} and the hierarchical dependen- HD(H1) returns the following hierarchical dependencies: cies are HD(H) = {(h11, h21), (h21, h31), (h31, h41), (h41, h51), HD(H1) = {(A1, AldiN), (A2, AldiN), (S1, SaturnN), (S2, (h11, h22), (h22, h41)}. If we map Outlet to h11, MicroMarket SaturnN), (A3, AldiS), …, (Germany, Segment), (Austria, to h21, Region to h31, Country to h41, Dimension to h51 and Segment)}, i.e., all edges of the graph representing H1. TurnoverClass to h22, the hierarchy schema HS conforms to hierarchy instance H of Figure 4-1. Definition 4-10 (Shared Level): Two levels h1={mk1} and h2={mj2} are shared levels, if the 4.3 Hierarchies in Data Warehouses intersection of h1 and h2 is not empty. Otherwise the levels Hierarchies are used to classify the dimensions of a DW. h1 and h2 are not shared. We call such levels h1 and h2 DW model complex business contexts. Additional attrib- disjoint levels. utes are used to provide additional classification informa- Definition 4-11 (Distinct Operator): tion, e.g., the screen size of TV sets. For this reason, the The distinct operator L!L returns a subset of levels members can be augmented by classification features. L’={hk} of a set of levels L={hi}: distinct(L)={hk}=L’, Therefore, a member v in a hierarchy graph is a pair where ∀hk, hh ∈ L’: hk ≠ hh. There are no equal levels in v=(id, {fi}), where id is a unique identifier of the vertex, L’. called member label (or label), and {fi} is a set of addi- If there are several paths from a member to the root (usu- tional attributes, called feature attributes. We call such a ally true for shared levels), we call these paths alternative graph an attributed tDAG. paths. Feature attributes are assigned to hierarchy members. Generally, a member can have an arbitrary number of fea- Example 4-7 (Shared Level, Distinct Operator): tures. In many DW hierarchies, however, the hierarchy According to Example 4-5, shared levels are: h11=h12, members of one hierarchy level have the same number of h41=h32, h51=h42. feature attributes2. In this case, features are assigned to For the hierarchy instance H = H1∪H2, the distinct opera- hierarchy levels. tor returns the following levels: In a DW, hierarchies are assigned to dimensions. One distinct(H) ={ h11, h21, h31, h41, h51, h22}. dimension can contain several hierarchies. We combine The distinct operator generally is not deterministic. How- all hierarchies of one dimension to one hierarchy instance ever, the members of the levels specified by the distinct corresponding to a rooted tDAG, where the root is the 1 operator, are deterministic (e.g., the members of h1 and “All” level. Such a hierarchy instance is called DW- 2 h1 are the same). hierarchy. Usually, facts have a base granularity with re- spect to every dimension. This base granularity corre- Definition 4-12 (Balanced Hierarchy): sponds to one leaf level of the DW hierarchy. Thus, a A balanced hierarchy is a hierarchy, whose leaf members DW-hierarchy only has one (shared) leaf level. are contained in one (shared) level hl. Simple hierarchies If facts are classified with respect to different are always balanced hierarchies, because they have only granularities3 (leaf hierarchy levels), new aggregation and one leaf level (balanced tree). grouping semantics have to be introduced ([Leh98a]). The hierarchy model, however, supports such degenerated Definition 4-13 (Hierachy Schema): hierarchies. The hierarchy schema HS is a rooted tDAG specified by HS=(LS, ES), where L is a set of levels hi, and ES is a set of hierarchical relationships ES=(hi, hj) between the levels, 5 Transforming Hierarchy Instances to i.e., hj is hierarchically dependent on hi. Simple Hierarchies Definition 4-14 (Schema-Instance Conformity): First, we discuss some algorithms, that are used by A hierarchy schema HS=(LS, ES) conforms to a hierarchy HINTA. Then we discuss HINTA in detail and show a instance H=(VH, EH), if the number of levels of HS and H complete example of HINTA for the hierarchy instance of is equal, and the hierarchical dependencies of these levels Figure 3-1. are equal: 5.1 Primitive Hierarchy Instances (phi) ∀ (hiS, hjS) ∈ ES: ∃( hiH, hjH) ∈ HD(H) ∧ ∀( hiH, hjH) ∈ HD(H): ∃(hiS, hjS) ∈ ES We use the term primitive hierarchy instance, phi, for hierarchy instances, that consist of two simple hierarchies Example 4-8 (Hierarchy Schema and Instance): with one shared leaf level - the remaining levels are dis- On the left side of Figure 3-1, a hierarchy schema is illus- joint. Such a phi can be transformed to one simple hierar- trated: HS=(L, ES), where L={Outlet, MicroMarket, Re- chy instance. A phi is some kind of sub-hierarchy of a gion, TurnoverClass, Country, Dimension} and ES={(Outlet, MicroMarket), (MicroMarket, Region), (Re- gion, Country), (Outlet, TurnoverClass), (TurnoverClass, 2 Hierarchy members of one hierarchy level usually categorize Country), (Country, Dimension)}. the same information, e.g., the level “country“ may have feature As hierarchy instance H, we use Example 4-5. H = H1∪H2 attributes like number of inhabitants, gross national product, etc. = (V, EH), where the levels are LH={h11, h21, h31, h41, h51, for every country stored in the hierarchy. h12, h22, h32, h42}. The distinct levels are distinct(LH) = 3 unbalanced hierarchies R. Pieringer, V. Markl, F. Ramsak, R. Bayer 11-6 conventional hierarchy instance consisting of two simple ES1) and HS2=(VS2, ES2) is a hierarchy instance Hphi=(Vphi, hierarchies. Ephi): A phi H consists of a number of hierarchically dependent Vphi = {h1m, h1m-1, …, h1k, h2n, h2n-1, …, h2h}, where h1m disjoint levels and one shared leaf level. Figure 5-1 illus- resp. h2n are root levels of HS1 resp. HS2 and h1k and h2h are trates all possible hierarchy schemata of phi (phi1, phi2, shared levels and ∀h1j, k