<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Model-Driven Integration of Compression Algorithms in Column-Store Database Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Juliana</forename><surname>Hildebrandt</surname></persName>
							<email>juliana.hildebrandt@tu-dresden.de</email>
							<affiliation key="aff0">
								<orgName type="department">Database Systems Group</orgName>
								<orgName type="institution">Technische Universität Dresden</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dirk</forename><surname>Habich</surname></persName>
							<email>dirk.habich@tu-dresden.de</email>
							<affiliation key="aff0">
								<orgName type="department">Database Systems Group</orgName>
								<orgName type="institution">Technische Universität Dresden</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Wolfgang</forename><surname>Lehner</surname></persName>
							<email>wolfgang.lehner@tu-dresden.de</email>
							<affiliation key="aff0">
								<orgName type="department">Database Systems Group</orgName>
								<orgName type="institution">Technische Universität Dresden</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Model-Driven Integration of Compression Algorithms in Column-Store Database Systems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">56952320FC8EF438D94E33A3057B52A0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>Processing Layer, Storage Layer, Durability Layer, Storage Format, Compression Format</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Modern database systems can very often store their entire data in main memory. Aside from increased main memory capacities, a further driver for in-memory database systems was the shift to a decomposition storage model in combination with lightweight data compression algorithms. Using both of these storage design concepts, large datasets can be held and processed in main memory with a low memory footprint. In recent years, a large corpus of lightweight data compression algorithms has been developed to efficiently support different data characteristics. In this paper, we present our novel model-driven concept to integrate this large and evolving corpus of lightweight data compression algorithms in column-store database systems. Core components of our concept are (i) a unified conceptual model for lightweight compression algorithms, (ii) specifying algorithms as platform-independent model instances, (iii) transforming model instances into low-level system code, and (iv) integrating low-level system code into a storage layer.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>With an ever-increasing volume of data in the era of Big Data, the storage requirements of database systems grow quickly. Likewise, the pressure to achieve the required processing performance increases. For tackling both aspects in a consistent and uniform way, data compression plays an important role. On the one hand, compression drastically reduces storage requirements. On the other hand, compression is also the cornerstone of efficient processing by enabling in-memory technologies. This aspect is heavily utilized in modern in-memory database systems <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b22">23]</ref> based on a decomposition storage model (DSM) <ref type="bibr" target="#b6">[7]</ref>. For the compression of sequences of integer values, as applied in DSM-based systems or for posting lists in information retrieval, a large corpus of lightweight data compression algorithms has been developed to efficiently support different data characteristics. Examples of lightweight compression techniques are: frame-of-reference (FOR) <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b22">23]</ref>, delta coding (DELTA) <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b18">19]</ref>, dictionary compression (DICT) <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b22">23]</ref>, bit vectors (BV) <ref type="bibr" target="#b21">[22]</ref>, run-length encoding (RLE) <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b18">19]</ref>, and null suppression (NS) <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b18">19]</ref>. These algorithms achieve good compression rates and provide fast compression as well as decompression. 
The corpus evolves further because it is impossible to design an algorithm that always produces optimal results for all kinds of data. As shown, e.g., in <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b23">24]</ref>, the query performance gain for analytical queries is massive.</p><p>From the database system architecture perspective, the most challenging task is now to define an approach allowing us to integrate this large and evolving corpus of lightweight compression algorithms in an efficient way. The naïve approach would be to natively implement the algorithms in the storage layer of an in-memory database system as done today. However, this naïve approach has several drawbacks, e.g., (1) the massive effort of implementing every possible lightweight compression algorithm, and (2) the fact that database developers or even users cannot integrate a specific compression algorithm without a deep system understanding and a system-level implementation of the algorithm. Generally, users know their data and could design appropriate compression algorithms; however, database developers have to preserve the generality of the system and cannot implement many algorithms for a small range of applications. To overcome this situation, we propose our novel model-driven approach to easily and automatically integrate almost every lightweight data compression algorithm in an in-memory DSM storage layer. In detail, we address the following points:</p><p>1. We start with a brief solution overview in Section 2. As we are going to show, our solution consists of four components: (i) a unified conceptual model for lightweight compression algorithms, (ii) a description approach for algorithms as model instances, (iii) transforming model instances to executable code, and (iv) integration of generated code into the storage layer. 2. 
Based on this solution overview, we summarize our conceptual model for lightweight compression algorithms in Section 3. This specific conceptual model helps in describing, understanding, communicating, and comparing the algorithms on a conceptual level. 3. Then, in Section 4, we introduce our description language enabling database developers to specify algorithms in a platform-independent form based on our model. 4. These platform-independent model instances are the foundation for database system integration as described in Section 5. Here, we introduce our approach to transform the platform-independent instances to executable algorithms. 5. We conclude with related work and a summary in Sections 6 and 7.</p></div>
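To make the techniques named in the introduction concrete, the following is a minimal sketch of two of them, run-length encoding (RLE) and delta coding (DELTA), for integer sequences. The function names and representations are illustrative and not taken from the paper.

```python
# Hedged sketch: minimal versions of two lightweight compression techniques.
# Representations (pairs, difference lists) are illustrative assumptions.

def rle_compress(values):
    """Run-length encoding: store each run as a (value, run length) pair."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1          # extend the current run
        else:
            out.append([v, 1])       # start a new run
    return [(v, n) for v, n in out]

def delta_encode(values):
    """Delta coding: keep the first value, store differences to predecessors."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]
```

Both techniques shrink the value range rather than the value count, which is why they are typically combined with null suppression or bit packing afterwards.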
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Solution Overview</head><p>Fundamentally, the model-driven architecture (MDA) is a software design approach for the development of software systems <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b20">21]</ref>. In the MDA approach, the system functionality is defined with a platform-independent model (PIM) using an appropriate domain-specific language <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b20">21]</ref>. Then, the PIM is translated into one or more platform-specific models (PSMs) that can be executed <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b20">21]</ref>. The MDA paradigm is widely used in the area of database applications for database creation. On the one hand, model-driven data modeling and the generation of normalized database schemas should be mentioned. (Fig. <ref type="figure">1</ref>: Model-driven approach for the integration of lightweight data compression algorithms.)</p><p>On the other hand, there is the generation of full database applications, including the data schema as well as data layer code, business logic layer code, and even user interface code <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b17">18]</ref>. Furthermore, the MDA approach has been successfully applied in the area of data warehouse schema creation <ref type="bibr" target="#b16">[17]</ref>, as well as the modeling and generation of data integration processes <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. However, there is little to no work on the utilization of MDA for database system-internal components.</p><p>As depicted on the left side in Fig. <ref type="figure">1</ref>, the storage layer of an in-memory database system usually consists of two important components. The first component is the storage format, for which several formats have been proposed. 
One well-known format is the N-ary storage model (NSM) storing tuples coherently <ref type="bibr" target="#b12">[13]</ref>. That means, the tuple unit is preserved in this storage format. The decomposition storage model (DSM), proposed in 1985 <ref type="bibr" target="#b6">[7]</ref>, is a second widely utilized format. The DSM partitions an n-attribute relation vertically into n sub-relations. Each sub-relation contains two attributes, a logical record id (surrogate) and an attribute value <ref type="bibr" target="#b6">[7]</ref>. In most cases, the surrogate can be neglected due to the order of the tuples. That means, sub-relation storage equals a value-based storage model in the form of a sequence of values. While the NSM storage format is used in disk-based database systems, the DSM is the preferred layout of in-memory databases <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b23">24]</ref>. Compression is the second component of the storage layer; here, a large and evolving corpus of lightweight data compression algorithms has been developed to efficiently support different data characteristics <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b22">23]</ref>.</p><p>For this compression component, we have developed an MDA-based approach as illustrated on the right side in Fig. <ref type="figure">1</ref>. According to the MDA paradigm, the lightweight data compression algorithms have to be defined in a platform-independent model. To achieve this, we developed a conceptual model called COLLATE for this specific class of algorithms. The aim of COLLATE is to provide a holistic and platform-independent view of necessary concepts including all aspects of data (structure), behavior (function), and interaction (component interaction). 
In particular, COLLATE consists of only five main concepts, and we can specify a basic functional arrangement of these concepts as a blueprint for every lightweight data compression algorithm for DSM (see Section 3). That means, the platform-independent model of a lightweight data compression algorithm is a specific model instance of COLLATE expressed in an appropriate language (see Section 4). Then, the platform-independent model instance can be transformed into platform-specific executable code as presented in Section 5. The generated code can be used in the database system in a straightforward way. In this paper, we focus on data compression algorithms. The same can be done for the decompression algorithms with a slightly adjusted conceptual model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conceptual Model</head><p>One of our main challenges is the definition of a conceptual model for the class of lightweight data compression algorithms. This is the starting point and anchor of our approach, since all algorithms can be consistently described with this unified and specific model in a platform-independent manner. In <ref type="bibr" target="#b11">[12]</ref>, we have proposed an appropriate model called COLLATE and described its development in detail. In the remainder of this section, we briefly summarize its main aspects.</p><p>The input for COLLATE is a sequence of uncompressed (integer) values due to the DSM storage format. The output is a sequence of compressed values. Input and output data have a logical representation (semantic level) and a physical representation (bit or encoding level). Through the analysis of the available algorithms, we have identified three important aspects. First, there are only six basic techniques which are used in the algorithms. These basic techniques are parameter-dependent and the parameter values are calculated within the algorithms. Second, a lot of algorithms subdivide the input data hierarchically into subsequences for which the parameters can be calculated. The subsequent processing of a subsequence depends on the subsequence itself. That means, data subdivision and parameter calculation are the adjustment points and the application of the basic techniques is straightforward. Third, for an exact algorithm description, the combination and arrangement of codewords and parameters have to be defined. Here, the algorithms differ widely.</p><p>Based on a systematic algorithm analysis, we defined our conceptual model for this class of algorithms. The COLLATE model consists of five main concepts, or building blocks, required to transform a sequence of uncompressed values into a sequence of compressed values: Recursion, Tokenizer, Parameter Calculator, Encoder, and Combiner (see the concept descriptions accompanying Fig. <ref type="figure" target="#fig_0">2</ref>). In addition to these individual concepts, Fig. 
<ref type="figure" target="#fig_0">2</ref> illustrates the interactions and the data flow through our concepts. In this figure, a simple case with only one pair of Parameter Calculator and Encoder is depicted and can be described as follows. The input data is first processed by a Tokenizer. Most Tokenizers need only a finite prefix of a data sequence to decide how many values to output. The rest of the sequence is used as further input for the Tokenizer and processed in the same manner (shown with a dashed line). Moreover, there are Tokenizers needing the whole (finite) input sequence to decide how to subdivide it. A second task of the Tokenizer is to decide for each output sequence which pair of Parameter Calculator and Encoder is used for further processing. Most algorithms process all data in the same way, so we need only one pair of Parameter Calculator and Encoder. Some of them distinguish several cases, so that this choice between several pairs is necessary. The finite Tokenizer output sequences serve as input for the Parameter Calculator and the Encoder.</p><p>Parameters are often required for the encoding and decoding. Therefore, we defined the Parameter Calculator concept, which knows special rules (parameter definitions) for the calculation of several parameters. Parameters can be used to store a state during data processing. This is depicted with a dashed line. Calculated parameters have a logical representation for further calculations and the encoding of values as well as a representation at bit level, because on the one hand they are needed to calculate the encoding of values, on the other hand they have to be stored additionally to allow the decoding.</p><p>The Encoder processes an atomic input, where the output of the Parameter Calculator and other parameters are additional inputs. The input is a token that cannot or shall not be subdivided anymore. 
In practice, the Encoder mostly receives a single integer value to be mapped to a binary code. Similar to the parameter definitions, the Encoder calculates a logical representation of its input value and an encoding at bit level using functions. Finally, the Combiner arranges the encoded values and the calculated parameters for the output representation. Based on our conceptual model, we are able to specify lightweight data compression algorithms in a platform-independent way <ref type="bibr" target="#b11">[12]</ref>. For the specification and our overall MDA-based solution, we further require an appropriate language approach. We introduce our language concept in Section 4.1 and describe an example in Section 4.2.</p></div>
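The interplay of the concepts described in this section can be sketched for a frame-of-reference scheme on fixed-size blocks. The concept names follow the paper; the concrete Python code is an illustration of the data flow, not generated system code.

```python
# Hedged sketch of the COLLATE data flow: Tokenizer -> Parameter Calculator
# -> Encoder -> Combiner, here instantiated for FOR on blocks of n values.

def tokenizer(seq, n=4):
    """Subdivide the input sequence into finite subsequences of n values."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

def parameter_calculator(token):
    """Compute the per-token parameter: the minimum as FOR reference value."""
    return {"ref": min(token)}

def encoder(value, params):
    """Encode one atomic value as its offset from the reference value."""
    return value - params["ref"]

def combiner(encoded_tokens):
    """Arrange encoded values and calculated parameters for the output."""
    return [(params["ref"], enc) for params, enc in encoded_tokens]

def compress(seq, n=4):
    result = []
    for token in tokenizer(seq, n):
        params = parameter_calculator(token)
        result.append((params, [encoder(v, params) for v in token]))
    return combiner(result)
```

Note how the reference value must be emitted alongside the encoded offsets, mirroring the paper's observation that parameters need both a logical and a stored bit-level representation to allow decoding.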
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Description Language for Model Instances</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Language Approach</head><p>Instead of developing a completely new language, we decided to use the GNU Octave<ref type="foot" target="#foot_0">1</ref> high-level programming language as a foundation for the functional behavior of COLLATE. A COLLATE instance is always a Recursion concept. A Recursion consists of (1) a Tokenizer, (2) a set of pairs of Parameter Calculator and a Scheme, and (3) a Combiner concept. A Scheme is either an Encoder or a Recursion. This relationship can be summarized using a set-oriented notation as follows:</p><p>ModelInstance ∈ Recursion, Recursion = Tokenizer × P(Pair) × Combiner, Pair = ParameterCalculator × Scheme, Scheme = Recursion ∪ Encoder.</p><p>(1) Fig. <ref type="figure" target="#fig_1">3</ref> depicts this organization of the COLLATE concept as a class diagram in more detail. As described in the previous section, each Parameter Calculator can contain several parameter definitions. A parameter definition consists of a logical mapping to calculate a semantic parameter for a subsequence and a bit-level mapping (physical level) to calculate the encoding for the semantic parameter in case the parameter has to be encoded. Likewise, an Encoder consists of a logical mapping and a physical mapping.</p><p>As mentioned in Section 3, concepts contain functions for data processing. Therefore, we combine functional and object-oriented programming for the specification of an algorithm in order to preserve the component interaction. According to the class diagram of Fig. <ref type="figure" target="#fig_1">3</ref> and the set-oriented notation, we are able to specify a COLLATE model instance as an object of the class Recursion in the Octave high-level programming language. The Recursion class contains (i) a Tokenizer object which is characterized by a function handle, (ii) an array of pairs, and (iii) a Combiner object which is also defined by a function handle. 
A function handle is an anonymous function with the syntax @(argument-list) expression. The input for a Tokenizer's function handle is (1) a sequence inp of values and (2) a structure par. This structure contains one field (attribute) for each valid parameter. These attributes might be global parameters that are valid for each subsequence or each single value, or other calculated parameters. A Tokenizer's function handle returns a triple of values. The first element is the number of output values, the second element is a finite subsequence of the input sequence, and the third element is a number indicating which of the pairs of Parameter Calculator and Scheme is chosen for further data processing. A Combiner's function handle has two input values. The first one is an array inp of tuples, each consisting of a compressed sequence or a compressed value inp.enc (the output of a Scheme) and the corresponding output of the Parameter Calculator, inp.par. The second one is the structure par that contains all other parameters that are valid for the whole array inp. The output of a Combiner is a concatenation of compressed sequences or values.</p><p>Each pair consists of a Parameter Calculator object, which is defined by several parameter definition objects, and an Encoder or a Recursion object. Each parameter definition contains a function handle for the calculation of logical parameters and a function handle for the calculation of the encoding of the logical parameter. The input for the logical calculation is (1) the subsequence inp that is an output of the Tokenizer and (2) a structure par of known and valid parameters. The output is a logical parameter. Its type is not fixed. Often it is one single value, but it can be a more complex parameter, e.g., a mapping such as a dictionary. The input of the physical calculation is (1) the logical parameter inp and (2) all other known and valid parameters par. Its output is the encoding for the parameter. 
The function composition of both function handles maps a finite subsequence to a bit-level representation of a parameter that is valid for the sequence inp and the parameters par. The same holds for the logical and physical function handles of the Encoder.</p></div>
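The object structure described above, a Recursion holding a Tokenizer handle, pairs of Parameter Calculator and Scheme, and a Combiner handle, can be mirrored with Python lambdas standing in for Octave function handles. The class shape and the tiny model instance below are assumptions for illustration, not the paper's Octave code.

```python
# Hedged sketch: the Recursion structure of Fig. 3 with Python lambdas in
# place of Octave function handles. Signatures follow the textual description:
# a Tokenizer handle returns (count, token, pair index).

class Recursion:
    def __init__(self, tokenizer, pairs, combiner):
        self.tokenizer = tokenizer  # (inp, par) -> (count, token, pair index)
        self.pairs = pairs          # list of (parameter calculator, scheme)
        self.combiner = combiner    # (inp, par) -> compressed output

    def apply(self, inp, par):
        out = []
        while inp:
            count, token, which = self.tokenizer(inp, par)
            calc, scheme = self.pairs[which]
            token_par = calc(token, par)
            out.append({"par": token_par, "enc": scheme(token, token_par)})
            inp = inp[count:]       # the tail is fed back to the Tokenizer
        return self.combiner(out, par)

# A tiny model instance: blocks of 2 values, minimum as parameter,
# offset encoding, and a Combiner pairing parameters with encodings.
instance = Recursion(
    tokenizer=lambda inp, par: (2, inp[:2], 0),
    pairs=[(lambda tok, par: min(tok),
            lambda tok, ref: [v - ref for v in tok])],
    combiner=lambda out, par: [(t["par"], t["enc"]) for t in out],
)
```

Applying `instance.apply([5, 6, 1, 4], {})` walks the sequence block by block, just as the dashed feedback edge in Fig. 2 suggests.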
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Example algorithm</head><p>To illustrate our Octave language approach, we now present a simple example of a lightweight data compression algorithm: frame-of-reference with binary packing for n values (forbp). Here, an input sequence of arbitrary length is subdivided into subsequences of n values. For each subsequence of n values, the minimum is calculated as the reference value. Thus, each value of the subsequence can be mapped to its difference from the reference value at the logical data level. This technique is </p></div>
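The forbp scheme described above can be sketched end to end: subtract the block minimum, then pack the offsets with the smallest sufficient bit width. This is a hedged illustration of the algorithm's logic; the actual packing layout and the n=128 default of the paper's Octave specification are not reproduced here.

```python
# Hedged sketch of forbp: frame-of-reference plus binary packing per block.
# The (ref, width, offsets) block layout is an illustrative assumption.

def forbp_compress(values, n=4):
    blocks = []
    for i in range(0, len(values), n):
        block = values[i:i + n]
        ref = min(block)                            # reference value
        offsets = [v - ref for v in block]          # logical-level mapping
        width = max(1, max(offsets).bit_length())   # bits per packed offset
        blocks.append((ref, width, offsets))
    return blocks

def forbp_decompress(blocks):
    return [ref + off for ref, width, offsets in blocks for off in offsets]
```

The stored bit width plays the role of the calculated parameter that must accompany the encoded values to make decompression possible.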
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">System Integration</head><p>The result of the previous two sections is that we are now able to specify lightweight compression algorithms in a platform-independent way by defining model instances with an Octave notation. As depicted in Fig. <ref type="figure" target="#fig_2">4</ref>, the first system integration step is done using a CREATE COMPRESSION statement to register algorithms in the database system under a user-defined name, e.g., forbp as in this example. This system integration continues with (i) an approach to transform model instances to executable code and (ii) the specification of the application of the compression algorithm. For this application, we extended the CREATE TABLE syntax in a straightforward way to allow the specification of the compression algorithm which should be used for each attribute separately: CREATE TABLE Test (attribute_a int compress forbp(struct("num",struct("log",128,"phys",dec2bin(128,32)))));</p><p>In order to execute model instance specifications inside a database system (e.g., MonetDB <ref type="bibr" target="#b3">[4]</ref>), we require an approach to transform model instances to executable, platform-specific code. Fig. <ref type="figure">5</ref> depicts our developed overall approach for this challenge. Generally, we follow a generator approach in our Model-2-Code Transformer. At the moment, the input is (i) an Octave specification of a model instance and (ii) code templates for our model concepts. On the COLLATE model level, we have five specific concepts. That means, we require one code template for each model concept to generate executable code (Fig. <ref type="figure">5</ref>: Transformation of model instances to executable code). The code templates have to be implemented once for each specific database system, e.g., MonetDB <ref type="bibr" target="#b3">[4]</ref>. 
This is necessary to get access to data on the specific storage layer implementation.</p><p>Based on this input, our Model-2-Code Transformer generates the executable code using a replacement and optimization strategy. In this case, the Octave specification is parsed and a corresponding arrangement of the code templates is constructed. The code templates are enriched with the corresponding parameter calculations or encoding functions. Then, the code template arrangement is optimized by applying typical compiler techniques like loop unrolling or load constant replacement <ref type="bibr" target="#b10">[11]</ref>. The goal of this optimization is to generate code with fewer instructions to save as many CPU cycles as possible. This is particularly crucial for reducing the compression overhead.</p><p>The generated code can be seamlessly integrated into the specific database system since the templates are implemented for the particular system. At the moment, we are implementing and integrating the whole approach for MonetDB <ref type="bibr" target="#b3">[4]</ref>, while our Model-2-Code Transformer is implemented using the LLVM compiler infrastructure. Unfortunately, the current status does not allow a meaningful evaluation with regard to performance.</p></div>
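The template-replacement step of the Model-2-Code Transformer can be sketched as filling per-concept code templates with the concept-specific fragments parsed from the model instance. The template text, placeholder syntax, and helper names (`block_min`, `pack_offsets`) are assumptions for illustration; real templates would target a concrete system such as MonetDB.

```python
# Hedged sketch of the generator step: a Tokenizer code template whose
# placeholders are filled from a parsed model instance. The emitted C-like
# code and its helper functions are illustrative assumptions.

TOKENIZER_TEMPLATE = """\
for (size_t i = 0; i < len; i += {block_size}) {{
    size_t token_len = min((size_t){block_size}, len - i);
    {parameter_calculation}
    {encoder_loop}
}}"""

def generate_tokenizer_code(block_size, parameter_calculation, encoder_loop):
    """Instantiate the Tokenizer template with concept-specific fragments."""
    return TOKENIZER_TEMPLATE.format(
        block_size=block_size,
        parameter_calculation=parameter_calculation,
        encoder_loop=encoder_loop,
    )

code = generate_tokenizer_code(
    block_size=128,
    parameter_calculation="uint32_t ref = block_min(in + i, token_len);",
    encoder_loop="pack_offsets(in + i, token_len, ref, out);",
)
```

Because the block size is substituted as a constant, later compiler passes such as loop unrolling and constant propagation, as mentioned above, have concrete values to work with.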
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Related Work</head><p>In the previous sections, we explained our overall approach for the integration of different data compression algorithms in a column-store database system. With our approach, an in-memory column-store system is extendable with regard to the bit-level data representation on the storage layer level. Generally, the extensibility of DBMSs was a research field in the late 1980s and early 1990s <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b19">20]</ref>. Here, the focus was the integration of new user-defined data types, storage methods, and indexes, but not the extensibility with regard to new algorithms on top of storage formats. Furthermore, there is also work focusing on code generation in database systems. For example, Neumann has proposed a completely new query processing architecture <ref type="bibr" target="#b15">[16]</ref>. Here, a code generator is used to build specialized query execution plans on demand based on operator templates. He also used the LLVM framework to merge operations into one machine code function. In our work, we follow a similar approach for lightweight data compression algorithms. An interesting research direction would be to combine both approaches for a compression-aware query processing <ref type="bibr" target="#b8">[9]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion</head><p>The efficient storage and processing of large datasets is a challenge in the era of Big Data. To tackle this challenge, data compression plays an important role from a system-level perspective. Aside from drastically reducing the storage requirement, data compression also enables efficient processing using "in-memory" technologies. To this end, we have presented in this paper a model-based approach to integrate a large and evolving corpus of lightweight data compression algorithms in column-store database systems like MonetDB <ref type="bibr" target="#b3">[4]</ref>. Generally, our approach consists of four components: (i) a unified conceptual model for lightweight compression algorithms, (ii) a description approach for algorithms as model instances, (iii) transforming model instances to executable code, and (iv) integration of generated code into the storage layer. In this paper, we concentrated on data compression. The same is also possible for data decompression. In our further research activities, we want to shift our focus to other storage formats, for example row-stores, so that our overall approach covers not only column-store database systems.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: Interaction and data flow of COLLATE.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 :</head><label>3</label><figDesc>Fig. 3: Class diagram of the COLLATE model and data types in the domain-specific language Octave</figDesc><graphic coords="6,143.41,115.83,328.54,125.33" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 4 :</head><label>4</label><figDesc>Fig. 4: Graphical representation and Octave code representation example</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Recursion: Each model instance includes a Recursion per se. This concept is responsible for the hierarchical sequence subdivision and for applying the included concepts in the Recursion on each data subsequence. Tokenizer: This concept is responsible for dividing an input sequence into finite subsequences of k values (or single values). Parameter Calculator: The concept Parameter Calculator determines parameter values for finite subsequences or single values. The specification of the parameter values is done using parameter definitions. Encoder: The third concept determines the encoded form for values to be compressed at bit level. Again, the concrete encoding is specified using functions representing the basic techniques.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.gnu.org/software/octave/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was partly funded (1) by the German Research Foundation (DFG) in the context of the project "Lightweight Compression Techniques for the Optimization of Complex Database Queries" (LE-1416/26-1) and (2) by the German Federal Ministry of Education and Research (BMBF) in the EXPLOIDS project under grant 16KIS0523.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Integrating compression and execution in column-oriented database systems</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Ferreira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="671" to="682" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Model-driven generation and optimization of complex integration processes</title>
		<author>
			<persName><forename type="first">M</forename><surname>Böhm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Wloka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Habich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICEIS</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="131" to="136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">GCIP: exploiting the generation and optimization of integration processes</title>
		<author>
			<persName><forename type="first">M</forename><surname>Böhm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Wloka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Habich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EDBT</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1128" to="1131" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Breaking the memory wall in MonetDB</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Boncz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Kersten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Manegold</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="77" to="85" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Extensible database management systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Carey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Haas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGMOD Rec</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="54" to="60" />
			<date type="published" when="1990-12">Dec 1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The architecture of the EXODUS extensible DBMS</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Carey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>DeWitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Muralikrishna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Graefe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Richardson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Shekita</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">OODS</title>
		<imprint>
			<biblScope unit="page" from="52" to="65" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A decomposition storage model</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">P</forename><surname>Copeland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">N</forename><surname>Khoshafian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGMOD Rec</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="268" to="279" />
			<date type="published" when="1985-05">May 1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Compressing relations and indexes</title>
		<author>
			<persName><forename type="first">J</forename><surname>Goldstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Shaft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="370" to="379" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Optimierung der Anfrageverarbeitung mittels Kompression der Zwischenergebnisse</title>
		<author>
			<persName><forename type="first">D</forename><surname>Habich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Damme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">BTW</title>
		<imprint>
			<biblScope unit="page" from="259" to="278" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">GignoMDA - exploiting cross-layer optimization for complex database applications</title>
		<author>
			<persName><forename type="first">D</forename><surname>Habich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Richly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>VLDB</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Plan operator specialization using reflective compiler techniques</title>
		<author>
			<persName><forename type="first">C</forename><surname>Hänsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kissinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Habich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">BTW</title>
		<imprint>
			<biblScope unit="page" from="363" to="382" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">COLLATE - a conceptual model for lightweight data compression algorithms</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hildebrandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Habich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Damme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
		<ptr target="https://goo.gl/SgXm5z" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
		<respStmt>
			<orgName>Technische Universität Dresden, Database Systems Group</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Tech. rep.</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Operations and implementation of complex objects</title>
		<author>
			<persName><forename type="first">W</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Chou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Banerjee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE</title>
				<imprint>
			<date type="published" when="1987">1987</date>
			<biblScope unit="page" from="626" to="633" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Kleppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Warmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Bast</surname></persName>
		</author>
		<title level="m">MDA Explained. The Model Driven Architecture: Practice and Promise</title>
				<imprint>
			<publisher>Addison-Wesley</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Decoding billions of integers per second through vectorization</title>
		<author>
			<persName><forename type="first">D</forename><surname>Lemire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Boytsov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Softw., Pract. Exper</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="29" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Efficiently compiling efficient query plans for modern hardware</title>
		<author>
			<persName><forename type="first">T</forename><surname>Neumann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endow</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="539" to="550" />
			<date type="published" when="2011-06">Jun 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Common Warehouse Metamodel (CWM), Version 1.0</title>
		<author>
			<orgName>Object Management Group (OMG)</orgName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">GignoMDA - generation of complex database applications</title>
		<author>
			<persName><forename type="first">S</forename><surname>Richly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Habich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Grundlagen von Datenbanken</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Database compression</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Roth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Van Horn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGMOD Rec</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="31" to="39" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Extensibility in the starburst database system</title>
		<author>
			<persName><forename type="first">P</forename><surname>Schwarz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Freytag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lohman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>McPherson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mohan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Pirahesh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">OODS</title>
		<imprint>
			<biblScope unit="page" from="85" to="92" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Model driven development: the case for domain oriented programming</title>
		<author>
			<persName><forename type="first">D</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Barry</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">OOPSLA</title>
				<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Adaptive Data Compression</title>
		<author>
			<persName><forename type="first">R</forename><surname>Williams</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Kluwer international series in engineering and computer science: Communications and information theory</title>
				<imprint>
			<publisher>Springer US</publisher>
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Super-scalar RAM-CPU cache compression</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zukowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Heman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Boncz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE</title>
		<imprint>
			<biblScope unit="page">59</biblScope>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zukowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Boncz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">DaMoN</title>
		<imprint>
			<biblScope unit="page" from="47" to="54" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
