SISYPHUS: A Chunk-Based Storage Manager for OLAP Cubes
Nikos Karayannidis and Timos Sellis
Knowledge and Database Systems Laboratory
Department of Electrical and Computer Engineering
National Technical University of Athens (NTUA)
Zografou 15773, Athens, Greece
nikos@dblab.ece.ntua.gr, timos@dblab.ece.ntua.gr

Abstract

In this paper, we present SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing OLAP operations. On-Line Analytical Processing (OLAP) poses new requirements on the physical storage layer of a database management system. Special characteristics of OLAP cubes, such as multidimensionality, the hierarchical structure of dimensions and data sparseness, are difficult to handle with ordinary record-oriented storage managers. The SISYPHUS storage manager is based on a chunk-based data model that enables the hierarchical clustering of data at a very low storage cost. Moreover, it provides an access interface that is "hierarchy aware" and thus native to the OLAP data space. This interface can be used to implement efficient access paths to cube data.

The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.
Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2001), Interlaken, Switzerland, June 4, 2001 (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)

1 Introduction

On-Line Analytical Processing (OLAP) is a trend in database technology, based on the multidimensional view of data. A good definition of the term OLAP is found in [OLAP97]: "…On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP functionality is characterized by dynamic multidimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities including calculations and modeling applied across dimensions, through hierarchies and/or across members, trend analysis over sequential time periods, slicing subsets for on-screen viewing, drill-down to deeper levels of consolidation, rotation to new dimensional comparisons in the viewing area etc. …".
The OLAP data space is composed of measures (alternatively facts; the terms "fact" and "measure" will be used interchangeably throughout this text) and dimensions. In the real world, a measure is typically an attribute in some enterprise model that changes constantly and whose values we are interested in measuring at regular intervals. Common examples of measures are the total sales during a day, the balance snapshots of a bank account, the inventory levels of a warehouse, etc.
A dimension is another enterprise attribute that does not change with time (and if it does, it changes very slowly compared to measures) and has a constant value for a specific measure value. For example, the date, the name of the store and the specific product that a total refers to characterize a sales total at the end of a day for a large retail store. At least one of these constants will have a different value for a different measure value. Therefore, dimension values can uniquely identify a fact value, in the same sense that a set of coordinates uniquely identifies a point in space.
A cube can be envisioned as a multi-dimensional grid built from the dimension values. Each cell in this grid contains a set of measure values, which are all characterized by the same combination of coordinates. Note that in the literature the term "cube" usually implies a set of pre-computed aggregates along all possible dimension combinations. In what follows, by "cube" we mean just a set of facts organized as described above (in SISYPHUS, a cell is simply defined as a set of measures).
OLAP poses new requirements on storage management.




Ordinary record-oriented storage managers have been designed to fulfill mainly the needs of on-line transaction processing (OLTP) systems and thus fail to serve as an efficient storage basis for doing OLAP. Therefore, the need for storage managers that adapt well to OLAP characteristics is essential.
Our contribution to this problem can be summarized as follows:
     • We present the design of a storage manager specific to OLAP cubes, based on a chunk-oriented file system, called SISYPHUS.
     • The chunk-oriented file system offered by SISYPHUS:
          o is natively multi-dimensional and supports hierarchies,
          o clusters data hierarchically,
          o is space conservative in the sense that it copes with cube sparseness, and
          o adopts a location-based data-addressing scheme instead of a content-based one.
     • SISYPHUS provides a data-access interface that enables navigation in the multi-dimensional and multi-level data space of a cube. This interface can be used for defining more elaborate cube-oriented access paths.
SISYPHUS is implemented on top of the SHORE Storage Manager (SSM), a C++ library for building object repository servers developed at the University of Wisconsin-Madison [SSMP97].
The structure of this paper is as follows: in section 2 we discuss the new requirements posed on storage managers in the context of OLAP. In section 3, we present the hierarchy of abstraction levels offered by the SISYPHUS modules. In section 4, we present the heart of SISYPHUS, which is the chunk-oriented file system. In section 5, we present a set of access operations offered by SISYPHUS with which more elaborate OLAP access methods and operations can be defined. First, however, we give a brief introduction to chunking.
Chunking is not a new concept in the relevant literature. Several papers exploit chunks; to our knowledge, the first paper to introduce the notion of the chunk was [SaSt94]. Very simply put, a chunk is a sub-cube within a cube, with the same dimensionality as the encompassing cube. A chunk is created by defining distinct ranges of members along each dimension of the cube. In other words, by applying chunking to the cube we essentially perform a kind of grouping of the data. It has been observed ([SaSt94, DeRaSh+98]) that chunks provide excellent clustering of cells, which results in lower I/O cost when reading data from disk and also in better caching, if the chunk is used as the caching unit.
Chunks can be of uniform size [SaSt94, ChIo99] or of variable size [DeRaSh+98]. Our chunking approach deals with variable-size chunks that are built according to the parent-child relationships of the dimension members along an aggregation path. A similar approach has been adopted in [DeRaSh+98] for caching OLAP query results.

2 OLAP requirements relative to storage management

A typical RDBMS storage manager offers the storage structures, the operations, in a word the framework, needed to implement a tuple- (or record-) oriented file system on top of an operating system's file system or storage device interface. Valuable services, such as the management of a buffer pool, in which pages are fetched from permanent storage and "pinned" into some page slot in main memory, concurrency control with different kinds of locks offered at several granularities, and even the recovery management done by a log manager, can all gracefully be included in a storage manager system. The record-oriented SHORE storage manager [SSMP97] offers all of these functionalities.
However, in the context of OLAP some of these services have "restricted usefulness", while other characteristics that are really needed are not supported by a record-oriented storage manager. For example, it is known that in OLAP there are no transaction-oriented workloads with frequent updates to the database. Most of the loads are read-only. Moreover, queries in OLAP are much more demanding than in OLTP systems and thus pose an imperative need for small response times, which in storage management terms translates to efficient access to the stored data. Also, concurrent access to the data is not as important in OLAP as it is in OLTP. This is due to the read-oriented profile of OLAP workloads and the different end-user target groups of the two.
Additionally, OLAP data are natively multi-dimensional. This means that the underlying storage structures should provide efficient access to the data when the latter are addressed by dimension content. Unfortunately, record-oriented storage managers are natively one-dimensional and cannot adapt well to this requirement. Moreover, the intuitive view of the cube as a multidimensional grid, with facts playing the role of the data points within this grid, points out the need for addressing data by location and not by content, as is done in ordinary storage managers.
Finally, dimensions in OLAP contain hierarchies. The most typical dimensional restriction is to select some point at a higher aggregation level, e.g. the year "1998", which will then typically be translated into a range over the most detailed data. Again, ordinary storage managers provide no particular support for hierarchies.
The need for smaller response times makes good physical clustering of the data a central issue in storage management. Sometimes this might cause inflexibility in updating; however, considering the profile of typical OLAP workloads, this is acceptable.
As a last point, we should not forget that OLAP cubes are usually very sparse. [Co96] argues that only 20% of the cube contains real data. Therefore, the storage manager must cope with sparseness and achieve good space utilization.




3 Levels of abstraction in SISYPHUS

The levels of abstraction in a storage manager are guided by the principles of data abstraction and module design [GrRe93]. Each level plays its own role in storage management by hiding details of the levels below from the levels above. Figure 1 depicts the abstraction levels implemented in SISYPHUS. This hierarchy of levels had to stand upon the corresponding abstraction levels provided by the record-oriented SHORE storage manager (SSM) [SSMP97]. We will describe Figure 1 in a bottom-up fashion.
The SSM provides a hierarchy of storage structures. A device corresponds to a disk partition or an operating system file used for storing data. A device contains volumes. A volume is a collection of files and indexes managed as a unit. A file is a collection of records. A record is an un-typed container of bytes consisting basically of a header and a body. The body of a record is the primary data storage location and can range in size from zero bytes to 4 GB.
The SISYPHUS file manager's primary task is to hide all the SSM details. The higher levels don't have to know anything about devices, disk volumes, SSM files, SSM records, etc. The abstraction provided by this module is that the basic file system consists of a collection of cubes, where each cube is a collection of buckets.
Each cube is stored in a single SSM file. We use an SSM record to implement a bucket. In our case, however, a bucket is of fixed size. This typically equals the size of the operating system's disk page (e.g. 8,192 bytes). A bucket is identified within a cube by its bucket id, which encapsulates its record counterpart.
The file manager communicates with the SSM level via the record access operations provided by SSM. A typical subset of the operations offered by the file manager is:
     • Create cube: allocates a new file for the new cube and registers the new cube in the catalog.
     • Destroy cube: the destruction of a cube.
     • Create bucket: allocates a new bucket for a cube.
     • Destroy bucket: destroys a specified bucket.
     • Bucket scan: iterates through all buckets of a cube in the order of their physical storage.
The next level of abstraction is the buffer manager. This level's basic concern is to hide all the file-system-specific details and give the impression of a virtual memory space of buckets, as if the whole database were in main memory. It is a client of the file manager in the sense that buckets have to be pinned from a cube into a page in the buffer pool. The underlying page-oriented SSM buffer manager implements the replacement policies and also the collaboration with the log manager, for logging of transactions and recovery precautions. Typical operations offered are:
     • Pin bucket: pins a bucket in the buffer pool. Also, locks the bucket with a read (shared) or write (exclusive) lock.
     • Unpin bucket: unpins the bucket from the buffer pool and, if it has been updated, writes it back to permanent storage.
The interface provided by the buffer manager to the next higher level views a bucket as an array of chunks. Therefore, appropriate chunk-access operations are used in this interface.

Figure 1: The abstraction levels in the SISYPHUS storage manager. From bottom to top: the SSM (record-oriented storage manager), accessed through record-oriented access by the file manager (bucket-oriented file management); bucket-oriented access leads to the buffer manager and the logging/recovery component (buffer management); chunk-oriented access leads to the access manager (chunk-oriented file management, offset-based access); and cell-oriented access leads to the cube access methods (OLAP processing).

The access manager is concerned with all the details of managing the chunks, such as which chunks to place in which bucket, etc. A typical sample of these administrative operations is the following:
     • Create cube: actually this is just a wrapper that calls the underlying counterpart offered by the file manager.
     • Drop cube: similarly, this method calls the file manager's counterpart.
     • Load cube: receives as input a file containing the detailed data and the schema of the cube (i.e. dimensions, hierarchies, etc.) and loads these into a SISYPHUS cube.
     • Incremental load: receives detailed data that are incrementally loaded into a cube that has already been loaded with data.
However, the most important responsibility of the access manager is to give the illusion of a multi-dimensional and multi-level space of cube cells, i.e. cube data points. The important thing to emphasize here is that this is also the native data space of an OLAP cube, consisting of many dimensions with each dimension having at least one hierarchy of levels.
Each cell in this data space is characterized by a chunk-id, which we will discuss in detail later.
                                                                 navigation in the multi-dimensional and multi-level space



The access manager provides a set of access operations for seamless navigation in the multi-dimensional and multi-level space of cube data cells. These operations are the interface used by higher-level access methods, or OLAP operators, in order to access the cube data. We defer the description of these "access operations" until section 5.1, where we will take a detailed look at them.
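To summarize the three layers described above, the following is a minimal C++ sketch of how their operations might be exposed. The class and method names (FileManager, BufferManager, AccessManager, etc.) are our own illustration and not the actual SISYPHUS or SSM API.

#include <string>
#include <vector>

using BucketID = unsigned long;   // assumed wrapper around the underlying SSM record id

// Bucket-oriented file management on top of the SSM (illustrative interface only).
class FileManager {
public:
    void createCube(const std::string& cube);          // new SSM file + catalog entry
    void destroyCube(const std::string& cube);
    BucketID createBucket(const std::string& cube);    // fixed-size bucket, e.g. 8,192 bytes
    void destroyBucket(const std::string& cube, BucketID b);
    std::vector<BucketID> bucketScan(const std::string& cube);  // physical storage order
};

// Buffer management: a pinned bucket is exposed to the layer above as an array of chunks.
class BufferManager {
public:
    void* pinBucket(const std::string& cube, BucketID b, bool exclusive); // shared/exclusive lock
    void unpinBucket(BucketID b, bool dirty);           // write back to permanent storage if updated
};

// Chunk-oriented file management: chunk placement, loading, and navigation in the
// multi-dimensional, multi-level space of cube cells.
class AccessManager {
public:
    void createCube(const std::string& cube);           // wrapper over FileManager::createCube
    void dropCube(const std::string& cube);              // wrapper over FileManager::destroyCube
    void loadCube(const std::string& cube, const std::string& dataFile);        // bulk load
    void incrementalLoad(const std::string& cube, const std::string& dataFile); // periodic batch load
};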

4 A chunk-oriented file system

The basic file system based on fixed-size buckets mentioned earlier is used as the foundation for implementing a chunk-oriented file system. Each chunk represents a semantic subset of the cube and therefore chunks are of variable size. The semantics are drawn from the parent-child relationships along the aggregation paths of each dimension. A chunk-oriented file system destined to serve as a storage base for OLAP cubes has to provide the following services:
     • Storage allocation: it has to store chunks into the buckets provided by the underlying bucket-oriented file system.
     • Chunk addressing: a single chunk must be made addressable from other modules. This means that an identifier must be assigned to each chunk. Moreover, an efficient access path must exist via that identifier.
     • Enumeration: there must be a fast way to get from one chunk to the "next" one. However, as we will see, in a multi-dimensional multi-level space "next" can have many interpretations.
     • Data point location addressing: cube data points should be made accessible via their location in the multi-dimensional multi-level space.
     • Data sparseness management: the space allocated should not be wasteful and the native sparseness of cube data must be handled efficiently.
     • Maintenance: although transaction-oriented workloads are not expected in OLAP environments, the system must be able to support at least periodic incremental loads in a batch form.
In the following sections we describe the chunking method used, called hierarchical chunking, and discuss the details of mapping chunks into buckets. First, however, we briefly discuss how we model dimension data.

4.1 Dimension Data

In many cases, dimension values are organized into levels of consolidation defining a hierarchy, i.e. an aggregation path. For example, the Time dimension consists of day values, month values and year values, which belong to the day level, month level and year level respectively. It is typical for a dimension to be comprised of more than one aggregation path. In our model, all the paths of a dimension always have a common level containing the most detailed data possible. We call this the grain level of the dimension.
As an example, in Figure 2 we depict a CUSTOMER dimension consisting of two paths. We call a specific instantiation of a level L of a dimension D a member of L; e.g., "1997" is a member of the Year level of dimension Date.

Figure 2: An example of a dimension. The CUSTOMER dimension consists of two paths that share the grain level Store (0): PATH 0 with levels Store (0), City (1), State (2), Country (3), and PATH 1 with levels Store (0), Sales district (1), City (2), Sales region (3).

The chunk-oriented file system will be based on a single hierarchy path from each dimension. We call this path the primary path of the dimension. Data will be physically clustered according to the dimensions' primary paths. Since queries based on primary paths are likely to be favored in terms of response time, it is crucial for the designer to decide which paths will play the role of the primary paths.
A very useful characteristic of OLAP is that the members of a level are typically known a priori. Moreover, this domain remains unchanged for sufficiently long periods. A very common trend in the literature [Sa97, RoKoRo97, DeRaSh+98, MaRaBa99, VaSk00] is to impose a specific ordering on these members. One can implement this ordering through an integer mapping for the members of each level. Obviously, this total ordering among the members can be either inherent (e.g. for day values) or arbitrarily set (e.g. for city values). Either way, it is very useful to assign a distinct value to each member. This distinct value can play the role of a surrogate key [Ki96] in relational OLAP (ROLAP) systems or the role of an index value for computing cell offsets in multidimensional OLAP (MOLAP) systems [Sa97]. Moreover, it is far more efficient to handle simple integers than non-numeric data types, e.g. character strings. We call this distinct value the order code of a member.
In our model, we choose to order the members of a level according to the primary path that the level belongs to. We start from 0 and assign consecutive order codes to members with a common parent member. The sequence is never reset but is continuously incremented until we reach the end of the level's domain. This way an order code uniquely specifies a member within a level. Moreover, order codes can be easily implemented with common RDBMS data types such as sequence or serial, where the increment is taken care of automatically by the system. Similar "hierarchical" ordering approaches have been used in [DeRaSh+98, MaRaBa99].
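As an illustration of this numbering scheme, here is a minimal C++ sketch (our own, not part of SISYPHUS) that assigns order codes to the members of one level by visiting the parents of that level in their own order-code order and numbering their children consecutively, without ever resetting the counter.

#include <string>
#include <vector>

// Hypothetical in-memory representation of a dimension member on the primary path.
struct Member {
    std::string name;
    int orderCode = -1;              // to be assigned, unique within the level
    std::vector<Member*> children;   // members of the next lower level
};

// Assign order codes to one level, given its parent level already ordered:
// children of the same parent get consecutive codes, and the counter is never
// reset (e.g. the city level of LOCATION ends up numbered 0..18).
void assignOrderCodes(const std::vector<Member*>& parentsInOrderCodeOrder) {
    int next = 0;
    for (Member* parent : parentsInOrderCodeOrder)
        for (Member* child : parent->children)
            child->orderCode = next++;
}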



Figure 3: A member code denotes the whole path of a member in a specific level hierarchy. In the example, CountryA (0) has children StateA (0) and StateB (1); StateA has children CityA (0) and CityB (1), while StateB has children CityC (2) and CityD (3).

In order to uniquely identify a member within a dimension, we also assign to each member a member code. This is constructed from the order codes of all its ancestor members along the primary path, separated by dots. For example, the member code of CityC along path 0 is "0.1.2" (Figure 3).

4.2 The hierarchically chunked cube

In this sub-section we discuss our proposal for a chunking method to organize the data of the cube. We believe that this method matches the OLAP requirements that we posed in section 2. Intuitively, one can argue that a typical OLAP workload, where consecutive drill-downs into detail data or roll-ups to more consolidated views of the data are common, essentially involves swing movements along one or more aggregation paths. Moreover, in [DeRaSh+98] this property of OLAP queries is characterized as "hierarchical locality". The basic incentive behind hierarchical chunking is to partition the data space by forming a hierarchy of chunks that is based on the dimensions' hierarchies.
We model the cube as a large multidimensional array, which consists only of the most detailed data possible. In this primary definition of the cube, we assume no pre-computation of aggregates. Therefore, a cube C is formally defined as the following (n+m)-tuple:
C ≡ (D1, …, Dn, M1, …, Mm)
where Di, for 1 <= i <= n, is a dimension and Mj, for 1 <= j <= m, is a measure.
Initially we partition the cube into a very few regions (i.e. chunks) corresponding to the most aggregated levels of the dimensions' hierarchies. Then we recursively re-partition each region as we drill down the hierarchies of all dimensions in parallel. We define a measure to distinguish each recursion step, called the chunking depth D.
For visualization reasons we will use an example of a 2-dimensional cube, hosting sales data for a fictitious company. The dimensions of our cube, namely LOCATION and PRODUCT, are depicted in Figure 4. In Table 1 and Table 2, we can see the members of each level of these dimensions, each appearing with its member code.
In order to apply our method, we need to have hierarchies of equal length. For this reason, we insert pseudo-levels P into the shorter hierarchies until they reach the length of the longest one. This "padding" is done right after the level that is just above the grain level. In our example, the PRODUCT dimension has only three levels and needs one pseudo-level in order to reach the length of the LOCATION dimension. This is depicted next, where we also note the order-code range at each level:
LOCATION: [0-2].[0-4].[0-10].[0-18]
PRODUCT:  [0-1].[0-2].P.[0-5]
In Figure 5, we show the hierarchical chunking of our example cube. We begin our chunking method at chunking depth D = 1. We choose the top level from each dimension and insert it into a set called the set of pivot levels PVT. Therefore, initially PVT = {LOCATION: continent, PRODUCT: category}. This set will guide the chunking process at each step.

Figure 4: Dimensions of our 2-dimensional example cube. LOCATION consists of the levels continent, country, region and city (grain level); PRODUCT consists of the levels category, type and item (grain level).

On each dimension, we define discrete ranges of grain-level members, denoted in the figure as [a..b], where a and b are grain-level order codes. Each such range is defined as the set of members with the same parent (member) in the pivot level. Due to the imposed ordering, these members have consecutive order codes; thus, we can talk about "ranges" of grain-level members on each dimension. For example, if we take member 0 of pivot level continent of the LOCATION dimension, then the corresponding range at the grain level is cities [0..5].

Table 1: Members of dimension PRODUCT

Category    Type              Item
Books (0)   Literature (0.0)  "Murderess", A. Papadiamantis (0.0.0)
                              "Karamazof brothers", F. Dostoiewsky (0.0.1)
            Philosophy (0.1)  "Zarathustra", F. W. Nietzsche (0.1.2)
                              "Symposium", Plato (0.1.3)
Music (1)   Classical (1.2)   "The Vivaldi Album Special Edition" (1.2.4)
                              "Mozart: The Magic Flute" (1.2.5)
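The following is a small sketch of this range computation (our own illustration, not the SISYPHUS code): because order codes are assigned in hierarchy order, the grain-level members under a pivot-level member always form a contiguous range.

#include <string>
#include <utility>
#include <vector>

// Same hypothetical Member structure as in the sketch of section 4.1.
struct Member {
    std::string name;
    int orderCode = -1;
    std::vector<Member*> children;
};

// Collect the grain-level order codes under a member by descending to the leaves.
static void collectGrainCodes(const Member* m, std::vector<int>& out) {
    if (m->children.empty()) { out.push_back(m->orderCode); return; }
    for (const Member* child : m->children) collectGrainCodes(child, out);
}

// Grain-level range [a..b] defined by a pivot-level member; e.g. continent 0 (Europe)
// of LOCATION yields the city range [0..5] in the running example, because the codes
// under a common ancestor are contiguous.
std::pair<int, int> grainRange(const Member* pivotMember) {
    std::vector<int> codes;
    collectGrainCodes(pivotMember, codes);
    return {codes.front(), codes.back()};
}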



Table 2: Members of dimension LOCATION

Continent          Country       Region                City
Europe (0)         Greece (0.0)  Greece-North (0.0.0)  Salonica (0.0.0.0)
                                 Greece-South (0.0.1)  Athens (0.0.1.1), Rhodes (0.0.1.2)
                   U.K. (0.1)    U.K.-North (0.1.2)    Glasgow (0.1.2.3)
                                 U.K.-South (0.1.3)    London (0.1.3.4), Cardiff (0.1.3.5)
North America (1)  USA (1.2)     USA-East (1.2.4)      New York (1.2.4.6), Boston (1.2.4.7)
                                 USA-West (1.2.5)      Los Angeles (1.2.5.8), San Francisco (1.2.5.9)
                                 USA-North (1.2.6)     Seattle (1.2.6.10)
Asia (2)           Japan (2.3)   Kiusiu (2.3.7)        Nagasaki (2.3.7.11)
                                 Hondo (2.3.8)         Tokyo (2.3.8.12), Yokohama (2.3.8.13), Kioto (2.3.8.14)
                   India (2.4)   India-East (2.4.9)    Calcutta (2.4.9.15), New Delhi (2.4.9.16)
                                 India-West (2.4.10)   Karachi (2.4.10.17), Bombay (2.4.10.18)

The definition of such a range for each dimension defines a chunk. For example, the chunk defined by members 2 and 1 of the pivot levels continent and category respectively consists of the following grain data: (LOCATION: 2.[3-4].[7-10].[11-18], PRODUCT: 1.2.P.[4-5]). The '[]' notation denotes a range of members. This chunk appears in gray in Figure 5 at D = 1. Ultimately, at D = 1 we have a chunk for each possible combination between the members of the pivot levels, that is a total of [0-1]x[0-2] = 2x3 = 6 chunks in this example.
Next we proceed to D = 2, with PVT = {LOCATION: country, PRODUCT: type}, and we recursively re-chunk each chunk of depth D = 1. This time we define ranges within the previously defined ranges. For example, within the range corresponding to continent member 2 that we saw before, we define discrete ranges corresponding to each country of this continent (i.e. to each member of the country level that has parent 2).
Let's look at the chunk defined by the members 2.3 and 1.2 of the pivot levels country and type respectively. It consists of the following grain data: (LOCATION: 2.3.[7-8].[11-14], PRODUCT: 1.2.P.[4-5]). This chunk is a child of the chunk mentioned in the previous paragraph and is also shown in gray in the figure at D = 2.
Similarly, we proceed with the chunking by descending all dimension hierarchies in parallel, and at each depth D we create new chunks within the existing ones. The total number of chunks created at depth D, #chunks(D), equals the number of possible combinations between the members of the pivot levels. That is,
#chunks(D) = card(pivot_level_dim_1) x … x card(pivot_level_dim_N)
where card() denotes the cardinality of a pivot level and N is the number of dimensions of the cube.
If at a particular depth one or more pivot levels are pseudo-levels, then these levels do not take part in the chunking. This means that we do not define any new ranges within the previously defined range for the specific dimension(s); instead we keep the old range with no further refinement. In our example this occurs at D = 3 for the PRODUCT dimension. In the case of a pseudo-level for a dimension, we use in the above formula the pivot level of the previous step for this dimension.
The procedure ends when the next levels to include in the pivot set are the grain levels. Then we do not need to perform any further chunking, because the chunks that would be produced by such a chunking would be the cells of the cube. In this case, we have reached the maximum chunking depth Dmax. Note that with this scheme we handle chunks and cells in a completely uniform way, in the sense that the cells of a chunk at depth D = d represent the chunks at depth D = d+1. Depth 3 is the maximum depth in our running example, since at the next step we hit the grain levels of the dimensions.
If we interleave the member codes of the pivot-level members that define a chunk, we get a code that we call the chunk id. This is a unique identifier for a chunk within a cube in our model. Moreover, this id depicts the whole path of a particular chunk. Let's look at the previously defined chunk at D = 2, generated from the pivot-level members LOCATION:2.3 and PRODUCT:1.2. For an interleaving order O = (LOCATION, PRODUCT) (major-to-minor from left to right), the chunk id in question is 2|1.3|2, with the "|" character acting as a dimension separator. This id expresses the fact that this is a chunk at depth D = 2 and that it is defined within chunk 2|1 at D = 1 (its parent chunk).
Finally, the cells of the cube also have chunk ids since, as we have already mentioned, we can consider them as the smallest possible chunks. For instance, the cell with coordinates (LOCATION: 0.1.2.3 and PRODUCT: 0.0.P.1) can be assigned the chunk id 0|0.1|0.2|P.3|1. The part of a chunk id that is contained between dots and corresponds to a specific depth D is called the D-domain.
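To make the interleaving step concrete, here is a minimal sketch (a hypothetical helper of our own, not the SISYPHUS interface) that builds a chunk id from one member code per dimension, assuming the codes are given in the chosen interleaving order O and have equal length after pseudo-level padding.

#include <sstream>
#include <string>
#include <vector>

// Split a member code such as "2.3" or "0.0.P.1" into its per-depth parts.
static std::vector<std::string> splitCode(const std::string& code) {
    std::vector<std::string> parts;
    std::stringstream ss(code);
    std::string part;
    while (std::getline(ss, part, '.')) parts.push_back(part);
    return parts;
}

// Interleave member codes (one per dimension, major-to-minor order) into a chunk id:
// e.g. {"2.3", "1.2"} -> "2|1.3|2" and {"0.1.2.3", "0.0.P.1"} -> "0|0.1|0.2|P.3|1".
std::string makeChunkId(const std::vector<std::string>& memberCodes) {
    std::vector<std::vector<std::string>> split;
    for (const auto& code : memberCodes) split.push_back(splitCode(code));

    std::string id;
    for (size_t depth = 0; depth < split[0].size(); ++depth) {   // one D-domain per depth
        if (depth > 0) id += '.';
        for (size_t dim = 0; dim < split.size(); ++dim) {
            if (dim > 0) id += '|';
            id += split[dim][depth];
        }
    }
    return id;
}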




Figure 5: The cube from our running example hierarchically chunked. The grain data space (LOCATION cities [0..18] by PRODUCT items [0..5]) is partitioned at D = 1 into the ranges [0..5], [6..10], [11..18] on LOCATION and [0..3], [4..5] on PRODUCT; at D = 2 and D = 3 each of these ranges is recursively subdivided according to the members of the corresponding pivot levels, down to the grain-level members.

The formal definition of a chunk Ch of a cube C is given by the following triplet:
Ch ≡ (PL, MB, D)
where PL is the set of pivot levels that generated this chunk, MB is the set of members, one from each pivot level, that define (through the member hierarchies) the grain-level ranges on each dimension of the chunk, and D is the chunking depth of the chunk. For example, the grayed chunk at D = 1 of Figure 5 is defined as Ch = ({LOCATION: continent, PRODUCT: category}, {2, 1}, 1). A cell is a chunk where PL contains all the grain levels and D = Dmax + 1.
Next we will see how the chunks of Figure 5 at D = 3 can be stored into the buckets provided by the underlying file system.

4.3 Mapping of chunks into buckets

We begin our discussion with a description of the internal organization of a bucket, which is our basic chunk container. In order to store chunks into buckets, we need some sort of internal directory that will guide us to the appropriate chunk. Moreover, since we have devised a unique identifier for each chunk within a cube, called the chunk id, chunks should be made addressable by their chunk id. We have seen that the hierarchical chunking method described previously results in chunks at different depths (Figure 5). One idea is to use the intermediate-depth chunks as directory chunks that guide us to the chunks at depth Dmax + 1, which contain the data and are thus called data chunks. This is depicted in Figure 6 for our example cube.
In Figure 6 we have expanded, from our hierarchically chunked cube, the chunk sub-tree under the root-chunk cell with chunk id 00. Above each chunk we note its chunk id. We can see the directory chunks containing "pointer" entries that lead to larger-depth directory chunks and finally to data chunks.
In general, a chunk sub-tree consists of some directory chunks and some data chunks. In Figure 7, we depict the structure of a bucket. It is composed of three parts: the bucket header and two vectors for storing chunks, one for the directory chunks and one for the data chunks. In the same figure we can see the implementation of a bucket over an SSM record.
Chunk vectors are essentially arrays of chunks with the capability of handling variable-size chunk entries. Actually, the in-memory structure used for a chunk vector is the C++ STL vector container [STL99]. Prior to disk storage, we "pack" the whole in-memory vector into a byte stream and then store it on secondary media.
The basic idea in this file organization is to try to include in the same bucket as many chunks of the same family (i.e. sub-tree) as possible. The incentive behind this lies in the hierarchical nature of OLAP query loads. By imposing this "hierarchical clustering" of data we aim at improving query response time by reducing page accesses significantly.
The order in which chunks are laid out in their corresponding vector is as follows: when we have to store a sub-tree in a bucket, we descend the sub-tree in a depth-first manner and we store each chunk the first time we visit it. The chunk is stored in one of the two vectors, depending on whether it is a directory or a data chunk. Parent cells are visited in the lexicographic order of their chunk ids, and thus their corresponding child chunks are stored accordingly. The discrimination between directory and data chunks is based on the depth, which is reflected in the length of the chunk ids. In Figure 6 we show the corresponding index value for each directory and data chunk respectively.
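As a rough illustration of this layout, the following is a minimal sketch (our own simplification, not the actual SISYPHUS structures) of a bucket with its two chunk vectors and of the depth-first traversal that fills them.

#include <string>
#include <vector>

// Simplified stand-ins for the SISYPHUS structures (illustration only).
struct Chunk {
    std::string chunkId;               // e.g. "2|1.3|2"
    bool isDataChunk = false;          // true only for chunks at depth Dmax + 1
    std::vector<Chunk*> children;      // empty for data chunks
};

struct Bucket {
    // Bucket header omitted; the whole bucket is serialized into one SSM record body.
    std::vector<Chunk*> dirChunks;     // directory-chunk vector
    std::vector<Chunk*> dataChunks;    // data-chunk vector
};

// Store a chunk sub-tree into a bucket: depth-first descent, appending each chunk to
// the appropriate vector the first time it is visited; children are assumed to be
// kept in the lexicographic order of their chunk ids.
void storeSubTree(Chunk* root, Bucket& bucket) {
    (root->isDataChunk ? bucket.dataChunks : bucket.dirChunks).push_back(root);
    for (Chunk* child : root->children)
        storeSubTree(child, bucket);
}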




Figure 6: The whole sub-tree up to the data chunks under chunk 00. The root chunk at D = 1 contains entries pointing to the directory chunks at D = 2 (00.00, 00.01, 00.10, 00.11), which in turn point to the data chunks at the maximum depth (00.00.0P, 00.00.1P, 00.01.0P, 00.01.1P, 00.10.2P, 00.10.3P, 00.11.2P, 00.11.3P); next to each chunk its index value within the corresponding chunk vector is noted.

To increase space utilization we have imposed a bucket occupancy threshold B. A typical value for B could be 50%. We distinguish four different cases regarding the storage of a sub-tree inside a bucket. In a bucket we can store:
     a) a single sub-tree of chunks,
     b) many sub-trees of chunks that form a cluster (or bucket region),
     c) a single data chunk, or
     d) a single tree of directory chunks (the root bucket).
The first case occurs when a sub-tree's size falls in the range between B and the bucket size. The second case occurs when a sub-tree's size is below B. Then, we look for other sub-trees with the same property and we "pack" them all into one bucket, calling this grouping of sub-trees a cluster or a bucket region. The third case refers to the situation where we have descended the chunk-tree, we are unable to find a sub-tree that can fit in a bucket, and have finally hit a leaf (i.e. a data chunk). In this case, either we store the entire data chunk in a bucket, or, if it still does

4.3.1 Chunk Internal Organization

The data structure used for implementing a chunk is the multidimensional array (md-array). Multidimensional arrays are very similar in concept to cubes, in the sense that values are accessed by specifying a coordinate (index value) on each dimension. Moreover, we have seen that each chunk corresponds essentially to a data point in the multi-dimensional multi-level data space. The chunk id that we have assigned to each chunk contains, for each dimension of the cube, information about both the specific level and the coordinate (i.e. member) within that level to which the chunk corresponds. Thus, the access by location, rather than by content, that is offered by md-arrays is native to our case and gives us the chance to exploit chunk ids. Moreover, exactly because of this address-computation access, we don't have to store the chunk id for each cell, as would have been the case in a record-oriented storage manager, where the coordinates of the cell would also have been stored with the fact values. In addition, the simple offset computation needed in order to access an md-array cell is very efficient.
                                                                                                                                                        660 UHFRUG
                                                                                                                                                                                                 660 UHFRUG ERG\
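To make the offset-based cell access concrete, here is a minimal sketch (ours, not SISYPHUS code) of how a cell's position inside an md-array chunk can be computed from its per-dimension order-codes:

    #include <cstddef>
    #include <vector>

    // Hypothetical illustration: a chunk cell is identified by one order-code per
    // dimension; a row-major computation turns these coordinates into an offset
    // inside the chunk's array, so no per-cell coordinates need to be stored.
    std::size_t cellOffset(const std::vector<std::size_t>& orderCodes,
                           const std::vector<std::size_t>& dimSizes)
    {
        std::size_t offset = 0;
        for (std::size_t d = 0; d < orderCodes.size(); ++d)
            offset = offset * dimSizes[d] + orderCodes[d];
        return offset;
    }

In a hypothetical 4x6 data chunk, for instance, the cell with order-codes (2, 3) would map to offset 2*6 + 3 = 15.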
To increase space utilization we have imposed a bucket occupancy threshold B. A typical value for B could be 50%. We distinguish four different cases regarding the storage of a sub-tree inside a bucket. In a bucket we can store:
     a) A single sub-tree of chunks.
     b) Many sub-trees of chunks that form a cluster (or bucket region).
     c) A single data chunk.
     d) A single tree of directory chunks (root bucket).
The first case occurs when a sub-tree's size falls in the range between B and the bucket size. The second case occurs when a sub-tree's size is below B; then we look for other sub-trees with the same property and "pack" them all into one bucket, calling this grouping of sub-trees a cluster or a bucket region. The third case refers to the situation where we have descended the chunk-tree, are unable to find a sub-tree that can fit in a bucket, and have finally hit a leaf (i.e. a data chunk). In this case we either store the entire data chunk in a bucket or, if it still does not fit, we partition it and store it in a bucket overflow chain. The last case is that of the bucket used for storing the root chunk together with all the "roots" of sub-trees that are stored in other buckets; this is called the root bucket. In case of an overflow of the root bucket, we resort to a bucket overflow chain again.

Figure 7: The structure of a bucket — an SSM record whose header is followed by a record body consisting of the bucket header, the directory chunk vector and the data chunk vector.
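These rules can be restated as a small decision function. The sketch below is ours (the enum, the names and the threshold handling are illustrative only), and the root bucket of case d) is handled separately:

    #include <cstddef>

    // Hypothetical restatement of the chunk-to-bucket placement rules for a
    // sub-tree met while descending the chunk-tree.  bucketSize is the bucket
    // capacity and B the occupancy threshold (e.g. 0.5).
    enum class Placement {
        OwnBucket,       // case a): the sub-tree alone fills a bucket well enough
        BucketRegion,    // case b): pack it with other small sub-trees (a cluster)
        DescendFurther   // too large for one bucket: recurse into its children; if we
                         // finally hit a single data chunk that still does not fit,
                         // it is partitioned into a bucket overflow chain (case c))
    };

    Placement placeSubTree(std::size_t subTreeBytes, std::size_t bucketSize, double B)
    {
        if (subTreeBytes > bucketSize)
            return Placement::DescendFurther;
        if (subTreeBytes >= static_cast<std::size_t>(B * bucketSize))
            return Placement::OwnBucket;     // size falls between B and the bucket size
        return Placement::BucketRegion;      // size below the threshold B
    }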
Another argument against md-arrays would be that they are not so flexible with deletions and updates, because a cell rearrangement might be needed. However, since we only aim at incremental bulk updating, and not at transaction-oriented updating that would require very frequent reorganization of each chunk, we felt we should not impose the overhead of complicated structures based on linked lists, which would also slow down query processing. These were the major justifications for our design choice.
Note that we do not allocate all the cells of a data chunk, just the non-empty ones. Usually the data cube is sparse, so it is reasonable to assume that most of the chunks will also be sparse. We have used a simple compression method that helps us keep track of the "holes" of each data chunk. In particular, we maintain a bitmap for each data chunk indicating which cell is empty and which is not.
Since all the information about data existence is kept in the compression bitmap, we can allocate space only for the non-empty cells and still be able to reach a cell on disk with just an algebraic computation.
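A minimal sketch of this computation (again ours, not the actual code) uses the bitmap to translate a cell's offset in the full md-array into its position among the stored non-empty cells:

    #include <cstddef>
    #include <vector>

    // Hypothetical sketch: given the chunk's bitmap (true = cell present), map the
    // uncompressed md-array offset of a cell to its position among the stored
    // non-empty cells.  Returns false if the requested cell is a "hole".
    bool compressedPosition(const std::vector<bool>& bitmap,
                            std::size_t arrayOffset,
                            std::size_t& storedPos)
    {
        if (!bitmap[arrayOffset])
            return false;                      // empty cell: nothing is stored for it
        storedPos = 0;
        for (std::size_t i = 0; i < arrayOffset; ++i)
            storedPos += bitmap[i] ? 1 : 0;    // count the non-empty cells before it
        return true;
    }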
The "compression" that we apply to directory chunks is somewhat different. Here, too, we might find many cells with no values; in this case, however, an empty cell corresponds to the absence of a whole sub-tree of chunks. For a directory chunk that contains at least one non-empty cell we allocate all of its cells and mark the empty ones with a special value, whereas no allocation at all is done for empty sub-trees. Therefore, large families of chunks that end up in many data chunks and are entirely empty will not consume any space.
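Purely for illustration, a directory-chunk cell could be pictured as the following entry; the fields and the sentinel value are our own assumptions, since the paper does not spell out the exact layout:

    #include <cstdint>

    // Hypothetical directory-chunk entry.  A non-empty cell points to the bucket
    // holding the child chunk; an empty cell (i.e. an entirely empty sub-tree) is
    // marked with the reserved value and no child chunk is allocated for it.
    struct DirEntry {
        static constexpr std::uint32_t EMPTY = 0xFFFFFFFFu;  // the "special value"
        std::uint32_t bucketId   = EMPTY;   // bucket hosting the child chunk
        std::uint32_t chunkIndex = EMPTY;   // slot of the child in that bucket's chunk vector

        bool isEmpty() const { return bucketId == EMPTY; }
    };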
Finally, we briefly refer to the issue of maintenance. As already mentioned, an OLAP environment is far more inclined to read operations than to transaction-oriented updates. Moreover, deletions are significantly rare in OLAP and data warehousing in general, since we are always interested in the history of our data. We therefore anticipate mainly incremental batch updates. A typical situation that falls in this category is the loading of new data at the end of some time period (e.g. a day). However, there might be other, less frequent updates, such as the sales for some new product category, etc.
In the chunk-oriented file system the arrival of, say, the sales of a new day would trigger the need for creating the chunks corresponding to the current month. Therefore, we have to spot the directory chunks that contain an empty cell entry corresponding to this month. Then we have to remove the empty tag from the respective cells and "hang" the new sub-tree. Each new sub-tree will be stored either in the same bucket as its "parent" chunk or, if there is no space, in a new bucket allocated for it. This will not result in poor bucket space utilization, even if the new sub-tree's size is below the bucket threshold B, because we will use the new bucket in the future to store more new sub-trees corresponding to the months that follow, and thus form a bucket region.
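As a rough illustration of this load step, the following sketch (ours; the types and helpers are placeholders, not SISYPHUS structures) clears the empty tag of the directory cell for the new period and hangs the freshly built sub-tree, reusing the parent chunk's bucket when it has room:

    #include <cstddef>

    // Hypothetical structures standing in for the real SISYPHUS ones.
    struct Bucket  { std::size_t freeBytes; };
    struct SubTree { std::size_t bytes; };
    struct DirCell { bool empty; Bucket* childBucket; };

    Bucket* allocateBucket();                          // assumed: create a fresh bucket
    void storeSubTree(Bucket& b, const SubTree& t);    // assumed: write t's chunks into b

    // Hang a newly built sub-tree (e.g. the chunks of a new month) under the
    // directory cell that was previously tagged as empty.
    void hangNewSubTree(DirCell& parentCell, Bucket& parentBucket, const SubTree& fresh)
    {
        parentCell.empty = false;                      // remove the "empty" tag
        Bucket& target = (parentBucket.freeBytes >= fresh.bytes)
                             ? parentBucket            // reuse the parent chunk's bucket
                             : *allocateBucket();      // otherwise open a new bucket;
                                                       // sub-trees of the following months
                                                       // will join it and form a bucket region
        storeSubTree(target, fresh);
        parentCell.childBucket = &target;
    }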
In the next section we will discuss the access interface provided by our chunk-oriented file system.

5 Access Paths
In this section we will look in more detail at the access manager abstraction level of Figure 1. Essentially, the basic operations offered by this module play the role of the data access interface of SISYPHUS. As mentioned earlier, the primary responsibility of the access manager is to provide the illusion of a multi-dimensional and multi-level space of cube cells, a space that naturally represents the OLAP data space. Moreover, we will see that through this set of primary access operations more elaborate access paths on cube data can be defined.

5.1 Primary access operations to cube data
In previous sections we have seen that the data space is modeled as a hierarchy of chunks (refer to Figure 6). At the bottom of this hierarchy lie the actual data of the most detailed level, contained inside data chunks. Each chunk is assigned a chunk id, depicting its location with respect to the dimensions and to the hierarchy levels.
At the access manager level, access to a cube begins with the instantiation of a special Cube class. This class implements the notion of the current position in the cube file. It simulates a "pointer" to the current cell of the current chunk in the hierarchy, which resides in the current bucket of the file hosting the cube's data.
An instantiation of this class generates an in-memory representative of a cube, which normally resides on disk. For each cube "opened" for access, it is sufficient to keep a pointer to an in-memory instance of the root chunk. This distinguishes one cube from another and can provide access to all of the cube's data. The Cube instantiation is achieved with the operation open_cube, which returns a pointer to an instance of the Cube class. open_cube searches by the cube name in the SISYPHUS catalog and retrieves the appropriate CubeInfo structure containing the cube's meta-data. Then it accesses the underlying SSM file dedicated to this cube, retrieves the root bucket and creates the corresponding Bucket object. Finally, it creates a Cell object with coordinates set by default to 0 for all dimensions of the cube.
A Cube instance has a state that is characterized by the values stored in its members. A change in this state implements a "move" from the current position in the multi-dimensional, multi-level space. There are four basic operations offered by the access manager level for achieving this, namely move_to(), get_next(), roll_up() and drill_down(). In addition, there is a read() operation for retrieving the content of the cell at the current position, and a write() operation for updating the current position's entry, allowed only if this position corresponds to a data chunk cell. These operations enable seamless navigation in the cube data space and access to any cell of the hierarchically chunked cube. We discuss them in more detail next.
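The paper does not give exact signatures for these operations, so the following class declaration is our own hypothetical C++ rendering of the interface, introduced only to make the later examples concrete:

    #include <string>

    // Hypothetical rendering of the access-manager interface (illustrative only).
    struct ChunkID {                      // encodes a position in the chunk hierarchy
        std::string id;                   // dot-separated, e.g. "00.01.1" as in Figure 6
        bool hasPrefix(const ChunkID& p) const { return id.rfind(p.id, 0) == 0; }
    };
    struct CellValue { double measure; }; // contents of a data-chunk cell

    class Cube {
    public:
        // Look the cube up in the catalog, pin its root bucket and place the current
        // position at the cell with all coordinates set to 0.
        static Cube* open_cube(const std::string& cubeName);

        void move_to(const ChunkID& cell);    // jump to the cell denoted by a chunk id
        bool get_next();                      // next stored non-empty cell at this depth
        bool get_next(int dim, bool above);   // next ("above") / previous ("below") cell
                                              // along the dimension at position dim
        void roll_up();                       // current position := parent cell
        void drill_down();                    // first non-empty cell of the child chunk
        CellValue read() const;               // contents of the current cell
        void write(const CellValue& v);       // allowed only inside a data chunk

        // Helpers assumed for the examples that follow:
        bool atDataChunk() const;             // is the current chunk a data chunk?
        ChunkID currentChunkID() const;       // chunk id of the current cell
    };

    void close_cube(Cube* c);                 // flush buffers and release the cube

A caller would typically obtain a Cube* through open_cube and then drive the navigation with move_to, drill_down and get_next, exactly as the Range-scan of Section 5.2 does.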
move_to operation
The primary goal of this operation is to provide an easy way to navigate in the hierarchy-enabled multidimensional space, exploiting the chunk id representation that we have proposed. In particular, this method receives as input a chunk id corresponding to a specific point (i.e. cell) in our data space that we would like to set as the current position. The outcome of this operation is a change to the state of the corresponding Cube object, in order to reflect the new position in the cube file.

roll_up & drill_down operations
These two operations provide the ability to navigate along the chunk hierarchy. With the former, we "roll up" to the parent cell of the current cell, and with the latter we "drill down" to the child chunk node and set the current position to the first non-empty cell of this chunk.
get_next operation
The get_next operation provides an enumeration facility for visiting the cells at a specific depth in the hierarchically chunked cube space. It is actually an overloaded method with two flavors. The first form offers cell enumeration along a certain dimension. The desired dimension is specified through its position in the interleaving order; for example, if the interleaving order is (LOCATION, PRODUCT), then position 0 means LOCATION and position 1 means PRODUCT. We can get to the "next" cell from the current position along a dimension D if we simply move on to the next member in the domain of the level L of D that corresponds to the current chunk. Note that the "next member" has a twofold meaning in this case: it might mean that we have to increase the corresponding order code by one, or that we have to decrease it by one, thus essentially obtaining the "previous" cell. The input arguments consist of the dimension along which we will move and a direction specification with possible values "above" (the default) and "below", corresponding to the two aforementioned cases.
The second form of get_next receives no input arguments. It enables an enumeration of the cells at a specific depth in the order of physical storage. A call to this get_next places the current position at the next stored non-empty cell within the current chunk; this new cell has the "next" chunk id in the lexicographic order that corresponds to a non-empty cell. When we reach the end of the current chunk we move to the next stored chunk of the same depth in the current bucket. Recall from the previous section that chunks are stored in one of the two bucket vectors in the lexicographic order of their chunk ids (see also Figure 6). When there are no more chunks of the desired depth in the current bucket, we advance to the next stored bucket of the cube (i.e. SSM file).
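For instance, reusing the hypothetical Cube sketch given above and assuming the current position already lies on a stored non-empty cell of the desired depth, the argument-less form can drive a full physical-order scan:

    // Hypothetical scan over every stored non-empty cell at the current depth,
    // in physical (lexicographic chunk-id) order, using the sketch above.
    void scanCurrentDepth(Cube& cube)
    {
        do {
            CellValue v = cube.read();    // value at the current non-empty cell
            (void)v;                      // ... process the value here ...
        } while (cube.get_next());        // advances within the chunk, then to the next
                                          // chunk of the same depth, then to the next bucket
    }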
Finally, there is a read, a write, and a close_cube operation with the obvious meanings.

5.2 Defining Access Methods
Next, we give an example of how the primary operations of the previous sub-section can be used in order to create access paths² to cube data. These access paths actually correspond to the topmost abstraction level of Figure 1.

² The terms access path and access method are identical for our discussion and will be used interchangeably.

In our example we will define a very common access method for multi-dimensional data, the Range-scan. The operator will be defined with an iterator interface [Gr93]. Essentially this means that it will receive as input a range and then provide a "next" operation for iterating through the values falling into this range. The range will be provided in the form of a chunk id; thus it refers to the data cells in the leaves (i.e. data chunks) of the specific sub-tree hanging from this point.
Range-scan is made up of three methods: Open, Next and Close. Open is responsible for opening the cube for data access and positioning the current cell at the first data cell in the specified range. This can easily be achieved with a call to open_cube to initialize data access to the cube, then a call to move_to to change the current position in the cube file to the location represented by the input chunk id, and finally by repeatedly drilling down (i.e. calling the access operation drill_down) until we hit the first non-empty data cell in the specified range.
Method Next returns the data entry at the current position and advances to the next cell. If the next cell is out of range, or if we have reached the end of the data, it fails. The preservation of the range limits is guaranteed through the chunk id prefixes, which denote chunks of the same sub-tree. Next can easily be implemented with a call to read followed by a call to get_next, and a subsequent check of whether the "new" position's chunk id is still prefixed by the input chunk id or the end of the file has been reached. Note also that, due to the hierarchical clustering imposed, the retrieved data points are very likely to reside in the same bucket; thus this should be quite an efficient operation. Finally, Close invokes the close_cube method to perform all cleaning-up tasks.
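Putting the three methods together, a Range-scan iterator in the spirit of [Gr93] might be sketched as follows; it reuses the hypothetical Cube interface of Section 5.1, and error handling as well as the end-of-file test are simplified:

    #include <string>
    #include <utility>

    // Hypothetical Range-scan iterator assembled from the primary access operations,
    // following the Open/Next/Close recipe described above (illustrative only).
    class RangeScan {
    public:
        RangeScan(std::string cubeName, ChunkID range)
            : cubeName_(std::move(cubeName)), range_(std::move(range)) {}

        void Open() {
            cube_ = Cube::open_cube(cubeName_);  // open the cube for data access
            cube_->move_to(range_);              // jump to the sub-tree denoted by the range
            while (!cube_->atDataChunk())        // drill down to the first non-empty
                cube_->drill_down();             // data cell inside the range
            done_ = false;
        }

        // Writes the entry at the current position to out and advances; returns false
        // once the scan has left the input sub-tree or reached the end of the data.
        bool Next(CellValue& out) {
            if (done_ || cube_ == nullptr) return false;
            out = cube_->read();                                   // read current entry
            done_ = !cube_->get_next() ||                          // end of data, or
                    !cube_->currentChunkID().hasPrefix(range_);    // outside the range
            return true;
        }

        void Close() { close_cube(cube_); cube_ = nullptr; }       // clean up

    private:
        std::string cubeName_;
        ChunkID     range_;
        Cube*       cube_ = nullptr;
        bool        done_ = false;
    };

A client would call Open once, loop on Next, and finish with Close; the chunk-id prefix test plays the role of the range check described above.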
This was a rather simple case of an access path. However, other, more elaborate access methods can be defined in a similar way: for example, a range-scan that operates on an arbitrary range and not only on the range defined by a specific sub-tree, or a range-scan-sort that returns the values sorted along a dimension, and so on.

6 Conclusions and future work
In this paper we have focused on the special requirements posed by OLAP applications on storage management. We have argued that conventional record-oriented storage managers fail to fulfill these requirements to a large extent. To this end, we have presented the design of a storage manager specific to OLAP cubes, based on a chunk-oriented file system, called SISYPHUS. SISYPHUS has been implemented on top of a record-oriented storage manager [SSMP97] and provides a set of abstraction levels typical of storage management, which have been modified to fit the multidimensional, hierarchy-enabled data space of OLAP.
We have seen the hierarchical chunking method used in SISYPHUS and the corresponding file organization adopted. The chunk-oriented file system offered by SISYPHUS is natively multi-dimensional and supports hierarchies.
It clusters data hierarchically and it is space conservative, in the sense that it copes with cube sparseness. Also, it adopts a location-based data-addressing scheme instead of a content-based one. Finally, we have seen the data-access interface provided by SISYPHUS that enables navigation in the multi-dimensional and multi-level data space of a cube. This interface can be used for defining more elaborate cube-oriented access paths.
In the future, we plan to test the proposed file organization extensively through experiments. In addition, we will design and implement algorithms for typical OLAP operations. From the viewpoint of research, several issues remain open, such as finding optimal clusters (i.e. bucket regions) for a specific workload and developing efficient file system operations for typical OLAP updating loads (e.g. slowly changing dimensions). Finally, the issue of an efficient file organization for dimension data remains open.

Acknowledgements
This work has been partially funded by the European Union's Information Society Technologies Programme (IST) under project EDITH (IST-1999-20722).

7 References
[ChIo99] C.-Y. Chan, Y. Ioannidis. Hierarchical Cubes for Range-Sum Queries. In Proc. of the 25th International Conference on Very Large Data Bases, Edinburgh, UK, 1999.

[Co96] G. Colliat. OLAP, relational and multidimensional database systems. SIGMOD Record, 25(3):64-69, Sept. 1996.

[DeRaSh+98] P. Deshpande, K. Ramasamy, A. Shukla, J. Naughton. Caching Multidimensional Queries Using Chunks. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pp. 259-270, 1998.

[Gr93] G. Graefe. Query Evaluation Techniques for Large Databases. ACM Computing Surveys, 25(2), 1993.

[GrRe93] J. Gray, A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.

[Ki96] R. Kimball. The Data Warehouse Toolkit. John Wiley & Sons, 1996.

[MaRaBa99] V. Markl, F. Ramsak, and R. Bayer. Improving OLAP Performance by Multidimensional Hierarchical Clustering. In Proc. of the IDEAS Conf., Montreal, Canada, 1999.

[OLAP97] OLAP Council. OLAP and OLAP Server Definitions. 1997. Available at http://www.olapcouncil.org/research/glossaryly.htm

[RoKoRo97] N. Roussopoulos, Y. Kotidis, and M. Roussopoulos. Cubetree: Organization of and Bulk Incremental Updates on the Data Cube. In Proc. of the ACM SIGMOD International Conference on Management of Data, pp. 89-99, Tucson, Arizona, May 1997.

[Sa97] S. Sarawagi. Indexing OLAP Data. IEEE Data Engineering Bulletin, March 1997.

[SaSt94] S. Sarawagi and M. Stonebraker. Efficient Organization of Large Multidimensional Arrays. In Proc. of the 11th Int. Conf. on Data Engineering, 1994.

[SSMP97] The Shore Project Group. The Shore Storage Manager Programming Interface. CS Dept., Univ. of Wisconsin-Madison, 1997.

[STL99] Standard Template Library Programmer's Guide. Available at: http://www.sgi.com/Technology/STL/index.html

[VaSk00] P. Vassiliadis, S. Skiadopoulos. Modelling and Optimization Issues for Multidimensional Databases. In Proc. of the 12th Conference on Advanced Information Systems Engineering (CAiSE '00), pp. 482-497, Stockholm, Sweden, 5-9 June 2000.