=Paper= {{Paper |id=Vol-1558/paper18 |storemode=property |title=Template-based Time Series Generation with Loom |pdfUrl=https://ceur-ws.org/Vol-1558/paper18.pdf |volume=Vol-1558 |authors=Lars Kegel,Martin Hahmann,Wolfgang Lehner |dblpUrl=https://dblp.org/rec/conf/edbt/KegelHL16 }} ==Template-based Time Series Generation with Loom== https://ceur-ws.org/Vol-1558/paper18.pdf
          Template-based Time Series Generation with Loom

                                    Lars Kegel, Martin Hahmann, Wolfgang Lehner
                                                     Technische Universität Dresden
                                                       01062 Dresden, Germany
                                             {firstname.lastname}@tu-dresden.de



ABSTRACT

Time series analysis and forecasting are important techniques for decision-making in many domains. They are typically evaluated on given sets of time series that have a constant size and specified characteristics. Synthetic datasets are relevant because they are flexible in both size and characteristics. In this demo, we present our prototype Loom, which generates datasets according to the user's configuration of categorical information and time series characteristics. The prototype allows for the comparison of different analysis techniques.

Categories and Subject Descriptors

I.6.7 [Simulation and Modeling]: Simulation Support Systems; H.2.8 [Database Management]: Database Applications—Statistical databases

Keywords

Time series analysis, Data generation

1. INTRODUCTION

Time series describe the dynamic behavior of a monitored object, parameter, or process over time and are one of the most popular and useful data types. They can be found in a multitude of application domains, e.g. as item sales in commerce, various sensor readings in manufacturing processes, or as demand and production in the energy domain. Obviously, this makes them a valuable source for diverse data analysis techniques, such as forecasting [8]. This holds especially in the domain of renewable energy production, where the fluctuating character of renewable energy sources makes accurate forecasts vital in order to match electricity production and demand. Further applications on time series data include querying, classification, efficient storage, and much more. The ubiquity of this data type and the ongoing trend for data collection, storage, and analysis have led to a substantial amount of research that is dedicated to the handling and processing of large sets of time series data. While all these research endeavors can differ greatly with respect to their individual goals and application scenarios, they have one thing in common: they require large amounts of time series data in order to evaluate, verify, and optimize their findings. Although there are many stakeholders that have a substantial interest in using and exploiting time series, acquiring sophisticated data is not easy. Basically, there are two sources: first, public open repositories or single datasets [10], which are tailored to specific applications and only offer a small selection; second, "real" data owned by companies/organizations, which is sometimes made available to partners in the context of closed research projects but rarely to the general public. Moreover, obtaining real data can be tedious due to the time and cost that is necessary to collect it [11]. Based on our own experience, we can state that some data is always available that allows us to conduct basic evaluations. This situation normally becomes problematic when scalability, versatility, and robustness have to be examined. These require a more versatile selection of data, containing datasets with varying size, time series length, trends, seasonality, or just a different blend of time series characteristics. In general, this is not available, which often leads to researchers using workarounds to create more data, e.g. duplication to increase the number of time series or their length.

To cope with this problem, we demonstrate Loom, a user-friendly and flexible approach to generating sets of time series for the evaluation of arbitrary analysis techniques or the benchmarking of time series management systems. Loom stands for the process of weaving different time series generators into datasets of arbitrary size. In addition, users can generate categorical information to structure the time series hierarchically. Thus, they form a data cube that can be explored by usual OLAP queries, such as roll-up or drill-down. Generated datasets can be directly exported to relational databases or flat file formats in order to easily utilize them in different applications. The usage of Loom is template-driven at its core. This means that given datasets can be analyzed in order to extract a template containing their defining characteristics. These templates are then used to create different variants of datasets that are still similar to the template. This approach eases the application of our tool as users do not have to specify a completely synthetic time series model. In addition, this mechanism offers a certain degree of anonymization for otherwise closed data.

(c) 2016, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

In the remainder of this article, we present a general system overview in Section 2 before we describe our demonstration in detail in Section 3. Previous work related to dataset generation and time series models is presented in Section 4 before the concluding remarks and pointers to future work are given in Section 5.

Figure 1: Workflow overview — Template creation (Dataset) -> Data cube modelling (Dimensions, Scale Factor) -> Time series generation (Time Frame, Time Series Models) -> Time series mapping (Model Frequency) -> Dataset export (Destination, Schema)

2. SYSTEM OVERVIEW

The main workflow of Loom is depicted in Figure 1 and shows all steps necessary to create a set of time series. In this section, we give an outline of the idea behind each step before we describe its implementation in Section 3.

Template creation. This optional step at the beginning of the workflow allows the user to upload and analyze given time series data in order to create templates that can be used during the latter steps of the data generation process. Currently, Loom employs three types of template creation: (1) If present, existing hierarchies of categorical information are extracted and stored. In order to anonymize the data, the original attributes are replaced with synthetic ids. (2) Given time series are extracted and taken as samples for time series generation. (3) The whole set of time series is analyzed to create a template that represents its characteristics and can be used to create multiple datasets that are similar to the original. While the first two types are straightforward, the third one is more complex. In order to create the described template, Loom uses an approach based on hierarchical divisive analysis clustering (DIANA) [12]. With this method, the dataset is partitioned into groups of similar time series. From each of these clusters, the time series with the lowest average distance to the remaining members is selected as a prototype. By fitting an analytical time series model, e.g. ARIMA, to this time series, a generator that represents the characteristics of its underlying cluster is created. This generator can be used to create multiple variations of the original time series. To complete the template, the size of each group in relation to the whole dataset is stored. With the collected information, it is possible to create multiple datasets that are different but still share the characteristic time series of the original and their distribution.

Data cube modelling. Usually, a set of time series does not only contain sequences of measured values, but also categorical information, e.g. geography, purpose, or color. Sets of these attributes are organized in hierarchies, called dimensions, while sets of these dimensions form the skeleton of a data cube. The goal of this step is to allow users the configuration and creation of such cubes. In short, data cubes can be formally described as follows [17]:

A data cube skeleton consists of a set of dimensions. A dimension is a lattice of levels L = {l1, l2, ..., lm}. The constraint of this lattice states that the values of a level, called category attributes, functionally determine the values of its parent level, e.g., l′ → l″. More formally, for each level l, l′, l″ of the same dimension:

    l → l                                  (reflexivity)
    l → l′ ∧ l′ → l  ⇒  l = l′             (antisymmetry)
    l → l′ ∧ l′ → l″  ⇒  l → l″            (transitivity)

For the sake of simplicity, we implemented totally ordered dimensions in Loom; this will be discussed in Section 5. A total order has the following additional condition:

    l → l′ ∨ l′ → l                        (totality)

As an example, Figure 2 shows the totally ordered dimension Geography of Australia with the two levels State and Region. The category attributes of Region functionally determine the category attributes of State, e.g. Melbourne and Ballarat determine Victoria.

Figure 2: Dimension with levels State and Region — Top; State: New South Wales, Victoria; Region: Sydney, Blue Mountains (under New South Wales), Melbourne, Ballarat (under Victoria)

Three parameters are necessary to configure a data cube skeleton: the number of dimensions, the number of levels per dimension, and the outdegree of a category attribute. While the first two parameters are straightforward, the last one needs more explanation. The outdegree of a category attribute defines the number of subcategories within a category and thus describes the branching between the levels of a dimension. To illustrate this, we regard the example from Figure 2, where we observe an outdegree of 2. This means a category attribute on the State level, e.g. New South Wales, is related to two category attributes on the lower Region level, e.g. Sydney and Blue Mountains. Category attributes that have no further subcategories are called base category attributes and form the leaves of a dimension's hierarchy. Thus, a data cube skeleton is the cross-product of all base category attributes of all dimensions.

In the energy domain, categorical information is also beneficial because forecast models that are built for categories may lead to more accurate and robust forecast results.
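To make the outdegree and cross-product construction concrete, the following is a minimal Python sketch. The attribute names and the uniform outdegree per level are our own simplification; Loom also supports randomly distributed outdegrees, which this sketch does not model.

```python
def build_dimension(num_levels, outdegree):
    """One totally ordered dimension: level i holds outdegree**i
    synthetic category attributes (placeholder names of our own)."""
    return [[f"L{i}_{j}" for j in range(outdegree ** i)]
            for i in range(1, num_levels + 1)]

def skeleton_size(dimensions):
    """The data cube skeleton is the cross-product of all base
    category attributes (the leaves) of all dimensions."""
    size = 1
    for levels in dimensions:
        size *= len(levels[-1])
    return size

# One dimension like Figure 2 (2 levels, outdegree 2) and a
# deeper one (3 levels, outdegree 3).
dims = [build_dimension(2, 2), build_dimension(3, 3)]
print(len(dims[0][-1]))     # 4 base attributes on the lowest level
print(skeleton_size(dims))  # 4 * 27 = 108 cells for time series facts
```

With a fixed outdegree d and m levels, one dimension contributes d^m base attributes, so the skeleton grows multiplicatively with every added dimension.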
For instance, the Irish Smart Metering Project [1] gathers time series of smart meters in over 5,000 Irish households and businesses. In a survey, the owners give additional information, such as social class, house type, and age of the house. We identify 8 dimensions, each consisting of one dimension level. Thus, forecast models can be created for individual time series or for time series aggregated along one or more dimensions.

Time series generation. The data cube skeleton that was configured and created in the previous step must now be filled with facts, which in the case of our system are time series. In the following, we again give a quick formalization of time series and explain what is necessary to configure their creation. A time series is a sequence of successive observations x_t (1 ≤ t ≤ T) recorded at specified time instances t. For this demonstration, we assume that observations are complete and equidistant, i.e. there exists an observation for every time instance and all time instances have the same distance. For configuration, a time frame must be defined that consists of a start and an end time instance as well as the distance between time instances. In addition, one or more measure columns must be defined, depending on whether univariate or multivariate time series should be generated. To fill these measures with actual values, our system uses time series models and samples.

As an example, the user can create a synthetic model with a base b_t, season s_t, and error component e_t:

    x_t = b_t · s_(t mod L) + e_t

The season length is given by L. The component s_t is a seasonal mask of length L. Thus, the weights repeat every L time instances. The error component e_t is normally distributed, N(0, σ²), and overlays the "perfect" model x*_t = b_t · s_(t mod L). The standard deviation σ depends on the user's accuracy expectation, expressed as the mean absolute percentage error (MAPE):

    MAPE = (1/T) Σ_{t=1}^{T} |(x*_t − x_t) / x*_t| = (1/T) Σ_{t=1}^{T} |e_t / (b_t · s_(t mod L))|

The error distribution is calculated such that the user-given MAPE holds on average over the whole time series.

Datasets may also be created from a template. During template creation, a set of ARIMA models is created that is based on user-given data. Synthetic time series are then generated by those ARIMA models, incorporating normally distributed errors or errors sampled from the template [4].

Time series mapping. The mapping links time series to the data cube, i.e. populates the skeleton with facts. As manual insertion of thousands of time series into a large data cube would not be feasible, Loom features an automatic mapping that randomly distributes the generated time series over the data cube skeleton. The user may set the model frequency by weighting each time series model. If templates are used, their distribution is used as the default.

Dataset export. After the configuration and mapping steps are done, the actual data is created and can be exported in a suitable format for further use. Our application offers two different export destinations: either file or database. File export offers general formats like CSV and SQL script, allowing the use of our generated data in almost every application. In addition, we offer export as RData in the data.table format for the popular statistical workbench R [3]. The database export transforms the created data into fact tables that can be imported into any RDBMS. These tables have one time column and at least one measure column; categorical information may be stored in fact tables as well as in dimension tables. As we deal with a high amount of structured data, it is necessary to bring the dataset into an appropriate schema depending on the chosen export option.

3. DEMONSTRATION

This section demonstrates the usage of Loom. Starting the application, the user sets a workspace directory that is used for configuration files and generated datasets. Below, we describe the configuration of a template, a dataset, and further generation steps. These steps correspond to the workflow shown in Figure 1.

3.1 Template configuration

The optional template configuration annotates a user-given dataset with information that is further needed for the template creation. The user inputs a CSV file and annotates each column with the corresponding semantics (time, measure, or category). Moreover, the time column needs information about the time type (integer, date, time) and the respective format. As shown in Figure 3, the user configures dimensions by drag and drop of the category names. Optionally, the user may indicate the primary attribute, which is the lowest-level category in every dimension. Once this configuration is finished, the template may be used for time series and/or data cube generation.

Figure 3: Template configuration
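The synthetic base/season/error model from the generation step above can be sketched in a few lines. The paper does not spell out how σ is derived from the user-given MAPE; a plausible derivation (our assumption) uses the fact that E|e_t| = σ·sqrt(2/π) for e_t ~ N(0, σ²), so the expected MAPE equals σ·sqrt(2/π) times the mean of 1/(b_t · s_(t mod L)), which can be solved for σ.

```python
import math
import random

def calibrate_sigma(base, season, L, target_mape):
    """Pick sigma so that x_t = b_t * s_(t mod L) + e_t, e_t ~ N(0, sigma^2),
    reaches target_mape in expectation (assumes E|e_t| = sigma*sqrt(2/pi))."""
    T = len(base)
    mean_inv = sum(1.0 / (base[t] * season[t % L]) for t in range(T)) / T
    return target_mape / (math.sqrt(2.0 / math.pi) * mean_inv)

def generate(base, season, L, sigma, rng):
    """Overlay the 'perfect' model x*_t with normally distributed errors."""
    return [base[t] * season[t % L] + rng.gauss(0.0, sigma)
            for t in range(len(base))]

rng = random.Random(42)
L = 4
season = [0.8, 1.2, 1.0, 1.0]                  # seasonal mask, repeats every L steps
base = [100.0 + 0.5 * t for t in range(2000)]  # slowly rising base level
sigma = calibrate_sigma(base, season, L, target_mape=0.05)
xs = generate(base, season, L, sigma, rng)

# Empirical MAPE against the "perfect" model x*_t = b_t * s_(t mod L)
mape = sum(abs(xs[t] - base[t] * season[t % L]) / (base[t] * season[t % L])
           for t in range(len(xs))) / len(xs)
print(round(mape, 3))  # close to the requested 0.05
```

This is only a sketch of one consistent calibration, not Loom's actual implementation.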
3.2 Dataset configuration

After login, the user is greeted by an overview window that displays all datasets that he/she has already generated and those that are queued for generation (Figure 4). Clicking "Create new" opens a wizard dialog that guides the user through the configuration. The first input by the user is a dataset name, which is needed to reference and handle the configuration and its results.

Figure 4: Dataset overview

3.2.1 Data cube configuration

The next dialog (Figure 5) allows the configuration of the data cube. Loom offers two ways of configuration: template and synthetic.

   • Template configuration: The user can select a template as the basis for the data cube skeleton. Templates are derived from real-world datasets or from existing data cubes and are ready-made configurations featuring default values for the number of dimensions, the number of levels, and the outdegrees. Thus, this type of configuration is more user-friendly than the synthetic one. Users can still customize the configuration by making selective adjustments to the template. It is possible to create different variants, e.g. smaller, larger, highly branched, etc., of an existing data cube.

   • Synthetic configuration: Alternatively, this type of configuration allows users the full manual customization of the data cube skeleton. Users specify the number of dimensions, the number of levels per dimension, and the outdegree per level. These parameters can be provided fixed for each element or for the whole cube. In addition, Loom offers a random parameter distribution, allowing a probabilistic setting with randomly structured dimensions.

Figure 5: Data cube configuration

Both template and synthetic configurations employ random distribution of outdegrees to a certain extent. This means the specific number of facts that can be accommodated by the data cube skeleton is not known during configuration. As the user must know the actual structure of the data cube in order to configure an appropriate number of time series, our system offers a preview of the data cube skeleton directly after configuration. This preview is depicted in Figure 6 and shows all category attributes ordered by their size. The user may restart the modeling or accept the generated result.

Figure 6: Preview of generated dimension

In many benchmarks, such as TPC-H [2], it is common to set a scale factor SF for the database size, such as 10, 30, 100. Loom adopts this functionality by using a scale dimension that consists of exactly one level with SF ≥ 1 category attributes and can be used to adapt the size of the data cube as desired. Thus, the resulting data cube is inflated by this factor.

While the parameters and configuration types described in this section provide users with a versatile and comfortable way of modeling the categorical information of a data cube, its configuration is not mandatory. If a user wishes for a plain set of unlabeled data, only a primary attribute is generated to identify the created time series.

3.2.2 Time series configuration

Time series configuration is split into three separate dialogs: time attribute, measure attributes, and time series models. The time attribute needs parameters such as the time type and the timeframe, i.e. start time, granularity, end time. In the measure dialog, the user sets the number of measure columns of the dataset.
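The timeframe parameters just mentioned (start time, granularity, end time) expand into the complete, equidistant time instances assumed in Section 2. A small illustration follows; the parameter names are ours, not Loom's actual dialog labels, and only the date/time type is shown.

```python
from datetime import datetime, timedelta

def expand_time_frame(start, end, granularity):
    """Expand (start, end, granularity) into equidistant time instances,
    one observation slot per instance, endpoints inclusive."""
    instances = []
    t = start
    while t <= end:
        instances.append(t)
        t += granularity
    return instances

idx = expand_time_frame(datetime(2016, 3, 1), datetime(2016, 3, 2),
                        timedelta(hours=6))
print(len(idx))  # 5 instances: 00:00, 06:00, 12:00, 18:00 on Mar 1, plus Mar 2 00:00
```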
For each measure attribute, the user sets a data type. Usually, a measure is of double-precision floating-point format, but in order to decrease the dataset size in a database, the user can also set single-precision floating-point or integer format.

Most importantly, time series models have to be added to the configuration. For this, Loom offers four options:

   • Sampled from Template: All time series data is generated "as is" by taking values from given time series from existing datasets uploaded by users. According to the timeframe configuration, data is extracted from the original time series and added to the generated one. Mismatches with the timeframe or data cube size are resolved using duplication, cutting, granularity conversion, etc.

   • Recombined from Template: Time series are created via decomposition and recombination. Classical decomposition strategies like decompose [13] and stl [6] are used to extract the defining components trend, seasonality, and noise from an existing set of time series. By recombining these components, new time series are created and can be used to add volume and variety to a dataset.

   • Modeled from Template: This option uses the third type of templates we described in Section 2. Users can load the time series generators and their distributions of an existing dataset. Customization is possible by changing the number of time series an individual generator creates or by removing/adding certain generators.

   • Synthetic time series: The user configures a time series model from scratch, without relying on any given measures. Time series properties are defined freely and are synthetically generated, e.g. with a linearly rising trend, a regular Gauss-shaped seasonal component, and a normally distributed error series.

3.2.3 Export configuration

In the last dialog, the user sets the export configuration. The export destination has different options. CSV, SQL, and RData create flat files within the workspace. Alternatively, time series are exported to a database via a JDBC driver. Thus, the user sets up a database connection with a database location and login credentials. Finally, the user sets the schema. Loom supports different schemas: (1) Basic unnormalized export in a Universal schema, which creates high redundancy as it stores the categorical information with each value of a time series. (2) The partly resp. fully normalized Star and Snowflake schema, which allow more compact exports and are common in database design. (3) The Parent-Child schema, which stores each functional dependency as a pair of parent attribute and child attribute in the respective dimension table.

Particular attention has to be paid to the export when dimensions are unbalanced, i.e. their base categories are not on the same level. Figure 7 shows an example of such a dimension, where base category attributes are either on the Region level (Darwin, Alice Springs) or on the State level (Australian Capital Territory). While both the Universal schema and the Parent-Child schema support unbalanced dimensions, the Star and Snowflake schema only support unbalanced dimensions when referential integrity can be guaranteed. To achieve this, the user may add a primary attribute column and a minimum outdegree. If the user did not set a data cube skeleton, then a primary attribute is automatically generated.

Figure 7: Unbalanced dimension with two levels — Top; State: Northern Territory, Australian Capital Territory; Region: Darwin, Alice Springs (under Northern Territory); Australian Capital Territory has no Region-level subcategory

3.3 Table generation

Closing the configuration window brings the user back to the initial overview, where a new entry has been added (see Figure 4). Now, the generation process can be invoked by clicking "Start Selected". The state switches to "In Progress" and indicates the current amount of data that has been generated.

4. RELATED WORK

Workload generation has been studied in many papers, each of which focuses either on the generation process or on time series modelling. To our knowledge, Loom is the first application that integrates both techniques and allows for flexible data cube and time series characteristics. In the following, we present selected sources that relate to our prototype.

The IDAS dataset generator [11] offers data generation based on statistical distributions. Attributes form dependency graphs that are not necessarily lattices. The goal is the creation of a synthetic dataset for testing data mining algorithms. The workflow is similar to Loom since the user creates a dataset by specifying the number of tables, setting the attributes, and initiating the data generation. Moreover, the authors experienced similar shortcomings with real data, such as privacy issues, a lack of training data, or unsatisfying categorical information. Still, measures do not depend on time, thus time series generation is not supported.

Schaffner and Januschowski [14] focus on the benchmarking of databases under varying request rates. Request rates can be seen as time series of aggregate tenant traces. Since there is not enough real data available, they provide two methodologies for generating synthetic tenant traces. (1) The modelling approach fits a function as a model for a given aggregate tenant trace. The function's shape has been determined empirically. By adding an error term, they create diversity among the synthetic tenant traces. (2) Another way is the decomposition of time series by bootstrapping. Thus, a given trace is split into windows that are randomly shifted and result in synthetic traces. Both approaches are similar to Loom's template creation in that synthetic time series are either modelled or recombined from a template.

The F-TPC-H benchmark [7] is a modified TPC-H benchmark for time series generation. This work reuses the given TPC-H schema in that customers submit orders of products for a certain quantity.
   The F-TPC-H benchmark [7] is a modified TPC-H benchmark for time series generation. It reuses the given TPC-H schema, in which customers submit orders of products for a certain quantity. While this quantity does not depend on time in TPC-H, F-TPC-H adds a dependency on time, representing trend and seasonal effects via ARIMA models. Thus, this work represents a subset of Loom's functionality: it consists of a given schema and allows for synthetic time series in the sales domain, whereas Loom additionally supports schema flexibility and the composition of different time series generators.
   A specific use case for managing energy data is given by [16]. This work proposes a unified data warehouse schema for storing workloads, given as information about actors (such as producers and consumers), offers, and time series of past measures. Their time series schema involves measures of different types, such as energy, power, and price. Categorical information is necessary in order to store special annotations, such as the aggregation level of a time series or additional information for each time series type. A time series is represented by several tables: (1) a time series table stores the primary attribute that identifies the time series and links to each category, (2) another dimension table stores time frames with an identifier and the respective time frame information, (3) the fact table itself consists of the primary attribute, a measure column, and a foreign key to the time frame. Thus, this schema is not a traditional star or snowflake schema and cannot directly be covered by Loom. Moreover, Loom keeps time and measure together as a fact. This may increase redundancy, but we opt for this solution for several reasons: (1) there is no need for an additional description of time frames, (2) a time frame is encoded either as an integer or a short string, so the storage overhead is still affordable, (3) no join operation is needed in order to retrieve a time series. After all, time series from the energy domain may be generated by models integrated in Loom.

5.   CONCLUSIONS AND FUTURE WORK
   In this paper, we introduced Loom, a tool for generating large sets of synthetic time series data. Our prototype utilizes different time series generators to create multiple time series that share certain characteristics. In addition, it allows the creation of dimensional categorical information for the description of time series. Besides the fully manual definition of a dataset, Loom features a template-driven approach that analyses given datasets and allows the creation of synthetic variants of this template data.
   Currently, our approach only generates complete time series with equidistant time stamps, since forecast methods like exponential smoothing [9] and ARIMA [5] rely on these properties, with few exceptions such as the models in [15]. Part of our future work will be the integration of functions for generating incomplete time series with configurable gap patterns.
   Regarding data cube modelling, we assume that a dimension is a totally ordered set of levels, which is the case in most real-world datasets. However, there are exceptions, such as the modelling of a time dimension with the levels day, week, and month: a day functionally determines a week and a month, but a week does not determine the month. Such lattices are not supported by our prototype.
   Further future work will focus on the mapping of time series to the data cube. Right now, we use a very simple approach and randomly distribute the time series over the data cube skeleton. This approach will be replaced by a more sophisticated method that allows the configuration of the distribution, e.g. such that time series from a certain generator occur only in a specified subset of the data cube skeleton.

6.   REFERENCES
[1] CER Smart Metering Project. http://www.ucd.ie/issda/data, 2010.
[2] TPC Benchmark H. http://www.tpc.org/TPC_Documents_Current_Versions/pdf/tpch2.17.1.pdf, 2014.
[3] R data.table package. https://cran.r-project.org/web/packages/data.table/index.html, 2015.
[4] R forecast package. https://cran.r-project.org/web/packages/forecast/index.html, 2015.
[5] G. E. P. Box and G. M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, 1970.
[6] R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6:3–73, 1990.
[7] U. Fischer. Forecasting in database systems, 2014.
[8] U. Fischer, F. Rosenthal, and W. Lehner. F2DB: The Flash-Forward Database System. In ICDE, pages 1245–1248, 2012.
[9] C. C. Holt. Forecasting seasonals and trends by exponentially weighted moving averages. Office of Naval Research Memorandum, 52, 1957. Reprinted in: International Journal of Forecasting, 20(1):5–10, 2004.
[10] R. J. Hyndman. Time Series Data Library. http://data.is/TSDLdemo. Accessed on 9-24-15.
[11] D. R. Jeske et al. Generation of synthetic data sets for evaluating the accuracy of knowledge discovery systems. In Proc. of KDD, pages 756–762, 2005.
[12] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data. Wiley, 1990.
[13] M. Kendall and A. Stuart. The Advanced Theory of Statistics, volume 3. Griffin, 1983.
[14] J. Schaffner and T. Januschowski. Realistic tenant traces for enterprise DBaaS. In Workshops Proc. of ICDE, pages 29–35, 2013.
[15] R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications. Springer, 2011.
[16] L. Siksnys, C. Thomsen, and T. B. Pedersen. MIRABEL DW: Managing complex energy data in a smart grid. In Proc. of DaWaK, pages 443–457, 2012.
[17] P. Vassiliadis. Modeling multidimensional databases, cubes and cube operations. In Proc. of SSDBM, pages 53–62, 1998.