A Comprehensive Study of Late Materialization Strategies for a Disk-Based Column-Store

George Chernishev, Viacheslav Galaktionov, Valentin Grigorev, Evgeniy Klyuchikov, Kirill Smirnov
Saint Petersburg State University, Saint Petersburg, Russia
chernishev@gmail.com, viacheslav.galaktionov@gmail.com, valentin.d.grigorev@gmail.com, evgeniy.klyuchikov@gmail.com, kirill.k.smirnov@gmail.com

ABSTRACT

By allowing operations on positions (row IDs, offsets), column-stores increase the overall number of admissible query plans. Their plans can be classified into a number of so-called materialization strategies, which describe the moment when positions are switched to tuples. Despite being a well-studied topic with several different implementations, there is still no formal definition of materialization, nor a classification of the existing approaches.

In this paper we review and classify these approaches. Our classification shows that, for disk-based systems, none of the existing implementation variants efficiently combines position manipulation inside both selections and joins. For this reason, we propose such an approach, which we name "ultra-late materialization".

Further, we describe recent modifications of PosDB — a distributed, disk-based column-store. These modifications allowed us to implement a flexible query processing model. Relying on it, we have implemented a number of late materialization variants, including our approach.

Finally, we empirically evaluate the performance of ultra-late materialization and the classic strategies. We also compare it with two industrial-grade disk-based systems: PostgreSQL and MariaDB ColumnStore. Experiments demonstrate that our variant of late materialization outperforms the closest competitor (MariaDB ColumnStore) by 50%, which makes further investigation worthwhile.

© Copyright 2022 for this paper by its author(s). Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
1 INTRODUCTION

Unlike row-stores, column-stores can operate not only on data, but on positions as well (row IDs, offsets, etc.) [3]. Usually, positions are employed at the lower levels of a query plan: e.g., if there is a very selective predicate on some column, then it may be worthwhile to evaluate it first, and then perform positional lookups for the values of other attributes.

Operating on positions opens up a number of different query processing strategies centered around so-called "materialization". Roughly speaking, materialization is the process of switching positions back into values. Materialization eventually has to happen at some point during query execution, since the user needs tuples instead of positions.

The literature presents two main strategies: early (EM) and late materialization (LM). The basic idea of LM is to operate on positions and defer tuple reconstruction for as long as possible. This strategy is capable of significantly conserving I/O bandwidth and reducing the CPU processing load for appropriate queries. It was extensively studied in the earliest of the new-wave column-stores [21, 33] and was deemed useful in cases when a lot of tuples are discarded by plan operators. However, the majority of modern industrial column-stores do not employ this strategy and instead rely on EM. The idea of this approach can be described as not allowing positions past filter operators. The reasons for this are the complexity of implementing an optimizer and issues related to processing disk-spilling joins.

However, there is no formal definition of late materialization; it is more of a paradigm than a fixed set of rules. Over the years, there have been multiple proposals concerning position-based query processing with LM support, each involving different techniques, use-cases, and aims. Despite late materialization being a well-established topic, there is motivation to study the various approaches:

• More than ten years have passed since the last materialization experiments. Various changes in hardware have accumulated, such as the improvement of SSDs and the appearance of novel types of storage, such as NVMe.
• Furthermore, our review demonstrates that the research landscape concerning the processing of joins in a late materialization environment is incomplete and needs expansion. The majority of the reviewed studies addressed queries containing no more than a single join.
• There are novel applications of position-enabled processing, such as systems for visual analytics.

Thus, there is a strong call for taking a second look at materialization.

In this paper we review existing approaches in order to compare and classify them. For this, we examine query plans and study the position-related techniques that they employ. As a result, we come to the conclusion that there are two types of late materialization: late materialization in the selection parts of the plan and late materialization in joins. We demonstrate that there are yet to be studies on how to efficiently combine these approaches for disk-based systems.

Following this, we propose the ultra-late materialization approach, which efficiently combines both types of late materialization techniques.

Next, we describe the recent modifications of PosDB which allowed us to greatly expand its query processing model. It now offers substantial flexibility in terms of admissible query plans and makes it possible to implement different types of materialization strategies. We have implemented the proposed ultra-late materialization as well as several other materialization approaches.

Finally, we evaluate these strategies, comparing them with each other and running additional experiments with two industrial DBMSes, one of which is a column-store.

Overall, the contributions of this paper are the following:

(1) A discussion of LM approaches, an examination of their supported query plans, and their comparison and classification.
(2) A proposal for ultra-late materialization: a materialization strategy that allows operating on positions in both selections and joins.
(3) A description of the PosDB architecture and its recent modifications, which made it possible to implement a number of different materialization strategies.
(4) An experimental evaluation of these strategies inside PosDB and, additionally, with two industrial systems (a row-store and a column-store), using the Star Schema Benchmark.
2 BACKGROUND, RELATED WORK, AND MOTIVATION

2.1 What are Early and Late Materialization? A Classification of Approaches

Materialization is the process of combining data from individual columns into wide records [3]. There are several possible materialization strategies, each of which defines its own query plans and types of employed operators.

There are two types of approaches: early materialization and late materialization. The first one is adopted by many so-called naive column-stores, which only store data in a columnar format and use this feature only to read the subset of columns necessary for a particular query. During query processing, they immediately decompress data, form rows, and then proceed to evaluate the query similarly to row-stores. Therefore, they do not process data in columnar form.

Although early materialization provides some benefits — lower volumes of data read from disk (compared to row-stores) — it is possible to do even better. The idea of late materialization is to defer record reconstruction to a later moment. The goal is to find a better query plan that becomes available in the context of this strategy. Usually, implementing late materialization requires additional effort, such as devising novel query processing models and operators.

Unlike early materialization systems, late materialization ones not only store, but also process columnar data, aggressively using positions inside the query executor. The idea is to conserve disk bandwidth using knowledge of selectivities.

We can classify all studies by the place in the query plan which employs positions. Query plans have a fairly regular structure: selections are placed at the bottom and joins usually reside above. The capability to employ positions inside each of these parts corresponds to a variant of late materialization. Thus, there are two places where late materialization can be implemented: inside selections and inside joins.

Late materialization has been studied for several decades. However, it was successfully implemented in only a handful of studies. We will review them below.

2.2 Early Days of Late Materialization: Works of G. Copeland et al.

One of the earliest studies that considered a variant of late materialization was a series of papers by G. Copeland et al. [14, 22], published in the 80's. The authors proposed a query execution approach that took positions in the table into account, which they called surrogate attributes. Data was stored in two-column tables made up of records containing ⟨surrogate, value⟩ pairs.

To represent joins, the authors employed a join index [36]. However, unlike modern systems, they proposed to pre-compute the join indexes and store them with the rest of the data on disk.

The central idea of the approach is to process sets consisting of two-column tables, which may be either data columns with assigned pivot attributes or join indexes. The processing is based on a pivot surrogate — a column which other columns are joined to using an N-ary join.

The overall process consisted of several phases:

(1) The Selection phase, during which predicates are applied to individual tables. This process resulted in two types of intermediates: ones which contained only the surrogate attribute, and ⟨surrogate, value⟩ records.
(2) The Pivot phase, during which results from individual tables are "stitched" together. For this, the intermediates obtained in the previous phase and the stored join indexes are joined via an N-ary join. An optimal pivot surrogate is selected and all other intermediates are joined to it. This phase results in a set of two-column tables consisting of ⟨surrogate, surrogate⟩ records.
(3) The Materialization phase, which processes each table individually: each attribute necessary to answer the query [3] is read from disk to construct a set of two-column tables containing ⟨surrogate, value⟩ records.
(4) The Composition phase uses the N-ary join of the intermediates obtained in the previous phase to construct the resulting records.

Thus, this approach may be considered a variant of late materialization which operated on positions inside both selections and joins. However, one may say that it was not pure late materialization, but a hybrid instead, since after the value materialization phase the query executor continued to operate on both positions (the pivot surrogate) and values.

The proposed processing scheme was rigid and offered little to no opportunity for query optimization; more importantly, it did not explore the available options of performing selections the way C-Store did, which is discussed further. These drawbacks, at least partly, come from the fact that the paper was published several years before the release of the famous Volcano [16] paper.
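To make the surrogate-based representation more concrete, below is a minimal C++ sketch of the Selection phase over ⟨surrogate, value⟩ tables. All identifiers are ours and the flat in-memory layout is a simplification for illustration; the original works predate both C++ and the Volcano model.

#include <cstddef>
#include <cstdint>
#include <vector>

// A two-column table in the decomposed storage model: each record is a
// <surrogate, value> pair. All names here are illustrative.
using Surrogate = std::uint64_t;

template <typename V>
struct SurrogateColumn {
    std::vector<Surrogate> surrogates;
    std::vector<V> values;  // values[i] belongs to surrogates[i]
};

// Selection phase: apply a predicate to a single column. Depending on what
// later phases need, the intermediate keeps either only the surrogates of
// the qualifying records or the full <surrogate, value> pairs.
template <typename V, typename Pred>
SurrogateColumn<V> selectPhase(const SurrogateColumn<V>& col, Pred p) {
    SurrogateColumn<V> out;
    for (std::size_t i = 0; i < col.values.size(); ++i) {
        if (p(col.values[i])) {
            out.surrogates.push_back(col.surrogates[i]);
            out.values.push_back(col.values[i]);
        }
    }
    return out;
}

The Pivot phase would then join several such intermediates on their surrogate columns, which is exactly the role the stored join indexes play.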
2.3 Late Materialization in Fractured Mirrors

The Fractured Mirrors paper [29] also explored late materialization. Its authors designed a system in which row- and column-store data layouts were combined. They used TPC-H [1] queries to benchmark and study the plans produced by their approach.

The core of the approach is the DSMScan operator, which is a specialized variant of a multi-way merge-join. To demonstrate the produced plans, the authors used several queries, two of which we will consider. They are presented in Figure 1.

Figure 1: Query plans in Fractured Mirrors, adapted from the original paper [29]

The first one is Q1*, which is essentially the TPC-H Q1 query in which the predicate was inverted to make it more selective. The leftmost plan in the figure shows the row-store mode. There, the query executor performs an index scan and sends the qualifying tuples into the aggregation operator. The columnar plan has a modified index scan operator that returns record ids, which are then fed into six columnar access operators. DSMScan orchestrates them and creates records which are passed to the aggregation operator. Therefore, this query plan has late materialization in the selection part.

The right part of Figure 1 contains the plans for TPC-H Q19, which consists of selections and a single join. It shows two DSM plans — with early and late materialization. In the case of early materialization, tuples are formed at the bottom levels of the plan and then joined with the tuple-based join. The late materialization plan is different: at the first step, only four out of six attributes belonging to the LineItem (LItem in the figure) table are accessed. Then the tuple-based join is run and the remaining two attributes are read from disk. Therefore, one may say that late materialization is performed for l_extprice and l_discount. However, in order to perform this reaccess, positions are kept together with values after the join. This way, it is more of a hybrid than a late materialization technique.

Therefore, this variant includes late materialization in selections and partially in joins. However, similarly to the approach described in Section 2.2, it lacked C-Store-like scans and did not deeply explore join performance. In this study, only a single considered query contained more than one join.

2.4 FlashJoin: late materialization inside PostgreSQL

In the previous study, joins "captured" some of the attributes that were materialized earlier. An alternative approach was proposed by the HP Labs research group [34].

They proposed a new join operator called FlashJoin, consisting of two components: a fetch kernel and a join kernel. The join kernel processes only the two necessary attributes, one for each table. The goal of the fetch kernel is to read the attributes necessary for further query processing.

An example of this processing scheme is presented in Figure 2 (adapted from the original paper).

Figure 2: FlashJoin example for the query: SELECT R1.B, R1.C, R2.E, R2.H, R3.F FROM R1, R2, R3 WHERE R1.A = R2.D AND R2.G = R3.K

In this query, the first operator joins R1 and R2 using the R1.A = R2.D condition. The result of its join kernel is a list of position pairs, one position for each table. The next join requires the G attribute, therefore the fetch kernel reads it. In a similar manner, the next operator performs its join. However, this time the fetch kernel reads five attributes, belonging to all three tables. Thus, this approach supports late materialization inside join sequences.

The approach was implemented inside the PostgreSQL system and evaluated in an SSD environment. However, their experiments used queries featuring no more than two joins. Finally, this study did not consider late materialization inside selections.
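The division of labor between the two kernels can be illustrated with a small C++ sketch. This is only a reading of the idea under simplified assumptions (uncompressed in-memory columns, a hypothetical PosPair layout), not the actual FlashJoin implementation.

#include <cstddef>
#include <vector>

// One result row of a join kernel: a position in each of the two inputs.
struct PosPair {
    std::size_t left;
    std::size_t right;
};

// A fetch kernel in the FlashJoin spirit: given the position pairs produced
// by a join kernel, read only the attribute that the next operator needs,
// one positional lookup per surviving row. Cheap random reads on flash are
// what make this access pattern attractive.
template <typename V>
std::vector<V> fetchAttribute(const std::vector<V>& column,
                              const std::vector<PosPair>& joined,
                              bool fromLeft) {
    std::vector<V> out;
    out.reserve(joined.size());
    for (const PosPair& p : joined)
        out.push_back(column[fromLeft ? p.left : p.right]);
    return out;
}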
2.5 Late materialization in C-Store

In the C-Store [33] system, late materialization was implemented differently. It relied on a special data structure: the multi-column block. It describes a consecutive horizontal fragment of a table, transferred between operators in a Volcano-style manner. Each block contains [3]:

• a position descriptor which indicates the status of each record — valid (i.e. satisfying the predicate) or not, and
• the values of several columns, each one of which may be compressed by its own algorithm.

The multi-column block made it possible to implement late materialization inside selections [6]. Consider, for example, a query which scans the LineItem table (the SSB [2] one) and selects records consisting of the Linenum and Shipdate attributes which satisfy the condition Shipdate < const1 AND Linenum < const2.

Figure 3: Possible query plans, figure taken from the original paper [6]

The paper discussed four approaches to evaluating this query, which are presented in Figure 3.

The first plan uses the DS2 operator to read the Shipdate column and filter it to produce positions with the corresponding values. Next, DS4 fetches values from the Linenum column using the positions, applies the predicate, and constructs the records that satisfy it. The second plan does not use positions at all; it simply reads both columns synchronously and constructs the result. Thus, the query executor reads each column exactly once.

The paper called the first two plans early materialization, but in fact the first one is a hybrid, since it operates both on positions and values.

The last two plans belong to the late materialization class. The third plan first obtains the positions satisfying each predicate, then intersects them and uses the result to read the corresponding values. In the end, these are merged to construct records. The fourth plan filters Shipdate first, and then the resulting positions are used to read Linenum and filter it using the corresponding predicate. These refined positions are used to read the values, and then a record is constructed similarly to the previous plan. The fourth plan is beneficial when the Shipdate predicate is very selective, which makes it possible to read a very small number of blocks from the Linenum column.

C-Store was oriented towards projection-based processing. A projection is a pre-joined collection of columns that may involve multiple logical tables. For this reason, C-Store preferred joinless queries, although it was possible to run joins between projections using pre-computed join indexes. Later studies [24] revealed that this idea was abandoned due to excessive computing costs.

A subsequent paper [6] evaluated late materialization in joins, but using only a single query, and this query had only a single join. One of the reasons was the use of an unsuitable data structure (the multi-column block) to store post-join intermediates, which restricted further experiments.
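For illustration, the fourth plan can be sketched in a few lines of C++. The function name and the flat std::vector storage are our simplifications; the real system operates on compressed multi-column blocks.

#include <cstddef>
#include <vector>

// The fourth plan in miniature: filter Shipdate first, then probe Linenum
// only at the qualifying positions and refine the position list. The output
// values would then be fetched using the refined positions.
std::vector<std::size_t> plan4Positions(const std::vector<int>& shipdate,
                                        const std::vector<int>& linenum,
                                        int const1, int const2) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < shipdate.size(); ++i)  // full scan of Shipdate
        if (shipdate[i] < const1)
            positions.push_back(i);

    std::vector<std::size_t> refined;
    for (std::size_t pos : positions)                  // probe Linenum by position
        if (linenum[pos] < const2)
            refined.push_back(pos);
    return refined;
}

The payoff of this plan is visible in the second loop: when few positions survive the first predicate, only a handful of Linenum blocks ever need to be touched.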
2.6 Late materialization in MonetDB family

The MonetDB [10, 21] system offers the largest number of materialization processing options. Its idea is to operate on BATs (Binary Association Tables) — special tables that consist of one or two columns. Each column may contain either the data itself (values) or positions. All MonetDB operators take BATs as input and also output BATs, thus forming an algebra.

In this query processing model, late materialization naturally appears both in selections and in joins [10]:

• After a selection, the result is a BAT containing positions.
• Join processing also results in BATs that contain positions.

Moreover, MonetDB/X100 [9, 19] offered another late materialization option — selection vectors. The idea is to select a subset of data using a vector of positions inside some operator, without a dedicated filtration phase.

However, MonetDB is an in-memory system and its query processing model is defined by this fact. Likewise, its late materialization techniques depend on having all data in memory and thus have limited applicability in disk-based systems.
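A minimal sketch of the selection vector idea is given below. The primitive shown merely mimics the X100 style and is hypothetical; it is not MonetDB's actual MIL or X100 API.

#include <vector>

// A selection-vector style primitive: instead of copying the survivors of a
// filter, the operator receives the full data vector plus the positions that
// passed earlier predicates, and touches only those slots.
void addConstSelected(const std::vector<int>& in,
                      const std::vector<int>& sel,  // qualifying positions
                      int c,
                      std::vector<int>& out) {
    out.resize(in.size());
    for (int pos : sel)
        out[pos] = in[pos] + c;  // untouched slots are never read downstream
}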
2.7 Late materialization in Hyrise

HYRISE [18] has also explored late materialization. The cited paper was an extension of study [6] (discussed in Section 2.5), whose authors studied the same query plans and adapted them for in-memory processing. Thus, they did not study the impact of late materialization on joins and instead concentrated on scans.

2.8 Wrap-up and motivation for further research

Concluding this section, we can say that all existing late materialization models have some issues:

• They either support late materialization in joins only, but not in selections [29, 34], or they support a rich operator set for selections but fall behind in joins [6, 18, 33].
• They do not follow [14, 22] the now-standard Volcano processing scheme, and thus are inapplicable in contemporary systems.
• They employ in-memory processing [10, 18, 21] and thus are inapplicable for disk-based DBMSes. Despite the boom of in-memory systems, there are still plenty of use-cases for disk-based ones.

The summarized state of affairs in the area of late materialization is presented in Table 1. There, we describe how well late materialization is elaborated inside selections and joins, as well as the system type (main memory or disk-based).

Table 1: Classification of LM approaches

Study                       | Focus on LM in selections | Focus on LM in joins | Type
G. Copeland et al. [14, 22] | partial                   | full                 | disk
Fractured Mirrors [29]      | partial                   | partial              | disk
FlashJoin [34]              | none                      | full                 | disk
C-Store [6, 33]             | full                      | partial              | disk
MonetDB family [10, 21]     | full                      | full                 | mem
Hyrise [18]                 | full                      | none                 | mem

Therefore, a new disk-oriented late materialization model is needed, which will:

(1) support C-Store-style late materialization in selections,
(2) support late materialization inside joins and allow handling an arbitrary number of joins.

In this paper, we propose such an approach, which we call ultra-late materialization. We implement it inside PosDB — a distributed, disk-based column-store engine.

We run an extensive experimental study featuring several existing materialization approaches. The goal is to compare them with each other and study the resulting performance.

Finally, it is important to discuss three other "general" reasons for the re-evaluation of late materialization approaches:

• More than ten years have passed since the last experiments with late materialization. Except for a single paper [34], these experiments considered HDDs, the state-of-the-art hardware of that time. Nowadays, SSDs are ubiquitous, and their characteristics, in particular lower random access times, may seriously improve the performance of late materialization inside joins. This may help overcome the out-of-order probing problem, which arises during late materialization inside joins. Modern SSDs have significantly reduced the cost of disk seeks, which may provide efficient reading of values by positions even when position lists are not fully sorted.
• There is a resurgence of interest in late materialization-enabled systems, driven by the need to support provenance in systems for visual analytics [27, 28, 37]. A typical use-case is as follows: a user runs a query, visualizes it as a bar chart, and then wants to "zoom in" — that is, perform additional actions with a subset of the visualized data which they select with a mouse. In such scenarios, the query executor needs to obtain the records that contributed to the selected buckets, and thus position-oriented query processing is useful [37]. Therefore, an efficient late materialization model may improve the performance of such applications.
• Position-enabled processing will be extremely useful for implementing functional dependency predicates inside queries [12]. A number of promising operators is described in a draft [8]. During their implementation, position-enabled query processing will play a vital role, since a lot of dependency discovery algorithms rely on partitions [7, 26], which are essentially positions.
3 SYSTEM ARCHITECTURE

The PosDB [13] project started in 2017 as a research effort to study late materialization-oriented query processing. We focused on query processing issues not studied in PosDB's predecessors, namely: LM and the processing of aggregation queries (including window function processing), distributed processing in LM environments, subqueries and LM, etc. We were also interested in solving the problem of disk-spilling joins given the advancement of hardware (mostly SSDs) that has happened since the time of the pioneers. In short, PosDB is a distributed and parallel column-store oriented at analytical processing in a disk-based environment.

As of 2017, query processing in PosDB was organized according to the pull-based Volcano model [16] with block-oriented processing. This model assumes that query plans are represented via trees that have operators as nodes and data flows as edges. Each operator supports an iterator interface and can produce either positions or tuples exclusively.

To represent intermediate positional data, PosDB uses a generalized join index [36]: a data structure that essentially encodes the result of a series of joins. The join index states that row r1 of table T1 was joined with row r2 of table T2, and so on. An example is presented in Figure 4a. Past the materialization point, PosDB represents data in tuples, similarly to row-stores.

Figure 4: PosDB'17 internals — (a) example of a join index, (b) a query plan example for the following SSB query:

SELECT SUM(lo_revenue), d_year, p_brand1
FROM lineorder, date, part, supplier
WHERE lo_orderdate = d_datekey
  AND lo_partkey = p_partkey
  AND lo_suppkey = s_suppkey
  AND p_category = 'MFGR#12'
  AND s_region = 'AMERICA'
GROUP BY d_year, p_brand1
ORDER BY d_year, p_brand1;

The query plans in classic PosDB contain two parts: positional (columnar) and tuple-based. Operators that manipulate positions (joins, filters, positional set operators) use join indexes to represent intermediate results, and the tuple part (aggregation and sort operators) is similar to row-stores. An example plan is presented in Figure 4b, where the dashed line denotes the materialization point (or the Tuples-Columns border).

Several operators, such as the positional set AND and OR, need only positional data, while others, i.e. join, filter, and aggregation, also require the corresponding values. To fetch values by positions from the join index, we have introduced special entities named readers. For example, ColumnReader is designed to retrieve the values of a specific attribute, and SyncReader is designed to read the values of several attributes synchronously.
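The following C++ sketch shows a simplified reading of the join index of Figure 4a together with a reader-style positional fetch. The class layout and the readColumn function are illustrative assumptions, not PosDB's actual interfaces, which also go through the buffer manager and may read remotely.

#include <cstddef>
#include <string>
#include <vector>

// A generalized join index: one column of row ids per joined table; the i-th
// entry of every column together describes one intermediate result row.
struct JoinIndex {
    std::vector<std::string> tables;               // e.g. {"T1", "T2", "T3"}
    std::vector<std::vector<std::size_t>> rowids;  // rowids[t][i]: row of tables[t]
                                                   // in result row i
};

// A reader-style positional fetch: materialize one attribute of one table
// for every intermediate row encoded by the join index.
template <typename V>
std::vector<V> readColumn(const JoinIndex& ji, std::size_t tableIdx,
                          const std::vector<V>& storedColumn) {
    std::vector<V> out;
    out.reserve(ji.rowids[tableIdx].size());
    for (std::size_t rid : ji.rowids[tableIdx])
        out.push_back(storedColumn[rid]);
    return out;
}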
PosDB is not only distributed, but also parallel: it includes the Asynchronizer and UnionAll operators. The former allows starting the execution of an operator subtree in a separate thread, similarly to the Exchange operator [17]. The latter enables the collection of data from several subtrees, all executed in their own threads, effectively providing data parallelism inside query plans.

A detailed description of the baseline architecture can be found in paper [13]. Since 2017, we have been exploring various aspects of LM-oriented processing: aggregation [35], window functions [25], compression [31], intermediate result caching [15], and distributed processing [11].

Recently, the engine has been significantly extended. First of all, we have improved the disk-based capabilities by introducing a proper buffer manager and reworking the access methods. This moved PosDB closer to industrial systems and allowed us to obtain higher-quality experimental data.

Next, we have extended the support of partitioning by adding hash- and range-based partitioning methods. We have also switched from a column-based partitioning scheme to a row-based one due to the former's excessive maintenance complexity. On top of that, we have added compression support.

Query processing underwent the most extensive modifications. First of all, truly distributed operators like distributed join [23] and distributed aggregation [32] have been implemented. Next, we have implemented a number of novel tuple-based operators that significantly extended the number of allowed query plan types.

For the current study, the most important addition was the introduction of tuple-based join, filter, cross-product, and aggregation operators. This allowed us to greatly expand the number of admissible query plans and made possible the implementation of several late materialization strategies. Unlike PosDB'17, now there may be multiple materialization points inside the query plan, as shown in Figure 5. In this figure, the top join and the filter are tuple-based operators. Thus, it is now possible to construct query plans that have a tuple data representation inside subplans. This may be beneficial in cases when running the whole query using late materialization is suboptimal.

Figure 5: Novel query plans, possible in PosDB'21

Finally, we have laid the foundations for hybrid materialization (HM) by introducing a special intermediate representation. For now, it is supported only in single-table parts of a plan.

4 QUERY EVALUATION STRATEGIES

From the query processing standpoint, the modifications that PosDB has undergone over the years have allowed us to implement several different approaches to materialization:

(1) early materialization,
(2) late materialization,
(3) hybrid materialization,
(4) ultra-late materialization.

The first two are the classic approaches, described in the early 10's [3]. They will be used as the baselines. Hybrid materialization is our recent development, aimed specifically at solving a particular issue that arises in the distributed case during position-based query processing. We do not include it in the benchmark since it does not fit into the evaluation scenario. Nevertheless, we believe it is important to mention it here, since it demonstrates an important issue that will inevitably arise in any position-based query processing model run in a distributed environment. Finally, ultra-late materialization is the core approach discussed in this paper.

Consider, for example, query Q1.1 of the Star Schema Benchmark [2]:

SELECT SUM(lo_extendedprice * lo_discount) AS revenue
FROM lineorder, date
WHERE lo_orderdate = d_datekey
  AND d_year = 1993
  AND lo_discount BETWEEN 1 AND 3
  AND lo_quantity < 25;

This query has a join, three predicates, and an aggregation. The query processing model available in PosDB allows producing several different query plans for each of the materialization strategies. Let us consider them.

4.1 Ultra-Late Materialization

As stated in Section 2.8, the idea of ultra-late materialization is to combine late materialization in selections and in joins. This makes it possible to operate on positions and thus defer tuple reconstruction for as long as possible. Therefore, such plans should perform all filters and joins below the materialization point, thus using the positional representation (join indexes).

Figure 6: Query evaluation strategies for Q1.1: ultra-late materialization

The ultra-late materialization plan for the example query is presented in Fig. 6. At the first step, two position lists for the LINEORDER table are obtained by applying predicates to each of the involved columns. This is achieved via Filter operators with readers corresponding to the involved columns.

Next, the PosAND operator is used to intersect the position lists in order to obtain the IDs of the LINEORDER records that conform to both predicates from the query.

The DATE table is processed in the same way, but using only a single Filter operator, which works with the d_year column.

The Join operator is located higher up in the query plan. Note that this is a positional join operator, i.e. it operates on join indexes. In this particular case, they are one-dimensional, i.e. plain position lists.

Despite being positional, this join still requires the data values themselves. Therefore, it needs readers for the corresponding columns, which can be seen in the figure.

This operator produces a list of position pairs, each corresponding to LINEORDER and DATE respectively. The list is fed into the aggregation operator, which also materializes the final result.
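The positional core of this plan can be illustrated with a sketch of the PosAND step. We assume sorted, in-memory position lists; the actual operator consumes join indexes block by block.

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// The PosAND step in miniature: both inputs are sorted position lists over
// the same table (one per predicate); the output contains the rows that
// satisfy the conjunction of the predicates.
std::vector<std::size_t> posAnd(const std::vector<std::size_t>& a,
                                const std::vector<std::size_t>& b) {
    std::vector<std::size_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

Note that no attribute values are touched here at all: until the final aggregation, the plan moves only row identifiers.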
4.2 Early Materialization

Despite being well-known [3], early materialization is a query evaluation strategy that has been implemented in PosDB only recently. The idea of this strategy is to imitate a classic row-store: at first, a tuple representation is built using only the necessary attributes, and then all the required operators are run. Therefore, filters, joins, and other operators are performed on the tuple-based representation.

Figure 7: Query evaluation strategies for Q1.1: early materialization

Fig. 7 contains the EM plan for the considered query. It possesses several distinctive features:

• Tuple materialization happens at the bottom of the plan, i.e. materialization operators are essentially the first non-trivial ones.
• In the EM strategy, readers are present only in materialization operators — all other operators are tuple-based. The Tuples-Columns border resides at the lowest level among all the considered strategies.

Finally, note that in this strategy only the attributes necessary for further query execution are materialized, i.e. not every attribute of the corresponding tables.

4.3 Late Materialization

Another baseline approach is C-Store-style late materialization in selections. As described in Section 2.5, its core idea concerns operating on positions in the bottom parts of the plan only. That is, predicates are evaluated using the join index representation, but costlier operators such as joins are evaluated using the classic tuple-based representation.

Figure 8: Query evaluation strategies for Q1.1: late materialization

The LM plan of Q1.1 is presented in Fig. 8. It can be seen that in this query evaluation strategy, predicates are evaluated in the positional part of the plan. Similarly to the ultra-late materialization plan, the individual position lists for the LINEORDER table are intersected. However, an operator of explicit materialization is run next. It transforms the join-index representation into a tuple-based one. Note that, similarly to the EM approach, this tuple representation contains only those attributes that will be requested by subsequent plan operators.

Unlike the EM strategy, which materialized four attributes, only three attributes will be materialized for the LINEORDER table: lo_orderdate, lo_discount, and lo_extendedprice. The lo_quantity attribute will not be materialized, since LINEORDER has already been filtered by the corresponding predicate and subsequent plan operators do not need this attribute.

Concerning the DATE table, only the d_datekey attribute will be materialized. Therefore, higher plan levels will be reached only by two attributes of limited use: lo_orderdate and d_datekey. Both will be discarded immediately after the join operator, since they do not belong to the answer. At the same time, they must be materialized, since they are needed to perform this join.

Finally, one can see that the tuple-based part of the plan contains aggregation and join operators. Both of them are tuple-based and therefore contain no readers.
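A minimal sketch of such an explicit materialization step is shown below, hard-coding the three LINEORDER attributes of the example. It illustrates the idea only and is not PosDB's actual Materialize operator.

#include <cstddef>
#include <tuple>
#include <vector>

// An explicit materialization step: take the surviving positions of one
// table and build tuples containing only the attributes that later plan
// operators will request.
using LineorderTuple =
    std::tuple<int /*lo_orderdate*/, int /*lo_discount*/, int /*lo_extendedprice*/>;

std::vector<LineorderTuple> materialize(const std::vector<std::size_t>& positions,
                                        const std::vector<int>& lo_orderdate,
                                        const std::vector<int>& lo_discount,
                                        const std::vector<int>& lo_extendedprice) {
    std::vector<LineorderTuple> out;
    out.reserve(positions.size());
    for (std::size_t pos : positions)
        out.emplace_back(lo_orderdate[pos], lo_discount[pos], lo_extendedprice[pos]);
    return out;
}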
4.4 Hybrid Materialization

Intensive data reaccess within a query plan is an inherent characteristic of the late materialization approach. The existing papers usually consider the local case [3, 6, 30], where the induced overhead can be reduced by an efficient buffer manager. Then, the advantages of late materialization (compression [4], cache-consciousness [3]) outweigh its drawbacks.

However, in the distributed case, the overhead is significantly higher, because data has to be reaccessed via the network. At the same time, a network cache is usually less robust than a buffer manager. Moreover, the common query processing workflow may require meaningless round-trips of position blocks between nodes.

Figure 9: PosDB: Join-Filter in a distributed case — (a) late materialization, (b) hybrid materialization

Consider the execution of a Filter-Join operator sequence in PosDB using tables A and B, shown in Figure 9a. This operator sequence is commonly found in query plans. It also has the same workflow as a necessary step of the reshuffling process during a distributed join.

With the late materialization approach, the Filter operator processes table A and passes the resulting position blocks over the network to Node1. Then the Join operator has to return the same blocks to Node2 with a request for table A data. Thus, a meaningless network communication cycle exists.

Moreover, in the general case, the Join operator can process position blocks from a set of Filter operators executed both locally and on different remote nodes. Table A data may be requested by the next query plan operator again, immediately after the Join. Therefore, a specific optimization for the particular "Filter-Join" case is not sufficient. Instead, the whole query execution model should be improved.

PosDB solves the problem with two major changes: hybrid materialization support and moving the data acquisition logic out of operator internals into a separate optimizer-configurable entity.

First, hybrid materialization [3, 6] allows passing both data and positions between operators. Thus, the required table A data is materialized immediately after the Filter operator and sent to Join together with the position blocks, see Figure 9b. The meaningless position round-trip is eliminated, and the network communication overhead decreases nearly twofold.

In existing articles, hybrid materialization is treated as a one-way step from a purely position-based to a tuple-based representation [3]. However, the distributed case, which we have just described, establishes a novel, completely different context. In this context, reverting to late materialization just after the Join operator may be preferable.

Another major change is the encapsulation of the data acquisition process in an independent entity. This allows combining both late and hybrid materialization when accessing several replicas located both locally and remotely. Moreover, it can be done seamlessly to the operator itself due to the unified interface and C++ template techniques.

Encapsulating the data acquisition strategy also increases the opportunities for query plan tuning. A query optimizer can freely choose not only the particular nodes the data is received from, but also the order and intensity of their usage. This can be especially important in the case of a distributed join. The dynamic reshuffling stage may consider several data streams from remote Multiplexors as well as local data from disk.
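The hybrid intermediate can be sketched as follows. The HybridBlock layout and the attachColumn helper are our illustrative assumptions rather than PosDB's actual wire format.

#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A hybrid intermediate: a block carries positions and, optionally, columns
// already materialized by the sender, so that a remote Join does not have to
// request table A data from the sending node again.
struct HybridBlock {
    std::vector<std::size_t> positions;  // row ids of table A
    std::unordered_map<std::string, std::vector<int>> materialized;  // attr -> values,
                                                                     // aligned with positions
};

// The sender (right after Filter) attaches the columns the receiver needs.
HybridBlock attachColumn(std::vector<std::size_t> positions,
                         const std::string& attr,
                         const std::vector<int>& storedColumn) {
    HybridBlock block{std::move(positions), {}};
    std::vector<int>& vals = block.materialized[attr];
    vals.reserve(block.positions.size());
    for (std::size_t pos : block.positions)
        vals.push_back(storedColumn[pos]);
    return block;
}

Because positions travel alongside the values, a downstream operator remains free to drop the materialized columns and continue in a purely positional fashion, which is exactly the reversion discussed above.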
5 EXPERIMENTAL EVALUATION

To study the considered strategies, we have performed a twofold experimental evaluation. First, we have compared them to each other within PosDB. Second, we have selected the two most similar industrial competitors: a disk-based row-store (PostgreSQL) and a column-store (MariaDB ColumnStore). We centered our baseline around open-source-licensed systems due to the DeWitt Clause [20], which they are free of. At the same time, they are mature enough to be used in industrial applications.

Since our evaluation was performed in a centralized environment, we did not consider the HM strategy.

In order to run our experiments, we have used the Star Schema Benchmark [2], varying the Scale Factor (SF) in the [1, 100] range. Thus, the total data volume ranged from 0.8 to 80 GB. We have selected SSB because of its focus on engine comparison instead of optimizer output. This is achieved by the relative simplicity of its queries: they do not allow much variability in terms of admissible query plans.

In order to provide an equal environment for all systems, the following was done during the test runs:

• Data compression, JIT-compilation, SIMD, and indexes were not used.
• DBMSes were not tuned; default parameters were used.
• Default data plans were used.
• Hash-based versions of joins were used: the smaller table was put into a hash table.
• Intra- and inter-query parallelism was turned off. Distributed capabilities were not used.

All systems were run without warm-up; the OS page cache was dropped before launching each experiment run. Since each run yielded almost the same numbers, we present results averaged over ten launches. Experiments were run on a desktop with the following hardware: AMD Ryzen 9 3900X, GIGABYTE X570 AORUS ELITE, Kingston HyperX FURY Black HX434C16FB3K2/32 32 GB, 512 GB SSD M.2 Patriot Viper VPN100-512GM28H. Software: Ubuntu 20.04 LTS, GCC 9.3.0, PostgreSQL 12.5, MariaDB ColumnStore 1.5.2 on MariaDB Community Server 10.5.8.
Figure 10: Total runtime of considered strategies

First of all, we have measured the total runtime on the whole SSB workload (Fig. 10). Here, it can be seen that PosDB's ultra-late materialization strategy performs the best at all SF values. The second best, MariaDB ColumnStore, is almost 50% slower. The next three, LM, EM, and PostgreSQL, show more interesting behavior. In the [0, 40] range, all three of them have approximately the same run times. However, starting from SF 40 we can see the following:

(1) Late Materialization is consistently ≈5% faster than Early Materialization.
(2) PostgreSQL loses about 15% to both Early and Late Materialization. This happens due to PostgreSQL running out of memory to cache pages in its buffer manager.
(3) MariaDB ColumnStore beats PostgreSQL by more than two times.

The second experiment provides an in-depth view into the results by studying the performance of individual queries. We have selected two important points: SF 40 and SF 80. The first one is the point where all tuple-oriented strategies show approximately equal performance, and the second one is the point where the graphs fully diverge.

Figure 11: Per-query breakdown — (a) Scale Factor 40, (b) Scale Factor 80

The results are presented in Fig. 11:

• For almost all queries, PosDB's ultra-late materialization yields the best result. The only exceptions are the queries of the first flight, on which MariaDB ColumnStore shows very close or better results.
• Fig. 11b shows that increasing the data size leads to PostgreSQL losing on the first three query flights out of the total four.
• It is interesting to note that PosDB's LM and EM strategies are in-between in terms of performance on all but the fourth flight. This query flight is the most complex one, featuring four joins.

6 FUTURE PLANS

(1) The critical issue for the adoption of late materialization in the 00's was disk-spilling joins. LM-induced out-of-order probing [3, 5] killed the performance on HDDs and even on the SSDs of that time. Nowadays, hardware and software have improved, and novel storage such as NVMe has appeared. All this calls for a second round of materialization evaluation that will employ joins with tables larger than main memory.
(2) We plan to further explore hybrid materialization, not only because it looks promising for addressing the disk-spilling join problem, but also because it offers query plans that may improve query performance further.
(3) Extending the system itself: implementing a parser and a query optimizer, supporting subqueries and indexes, implementing FD-manipulation predicates [8, 12], and so on.

7 CONCLUSION

In this paper, we reviewed existing approaches to implementing late materialization in order to compare and classify them. Then we demonstrated that there are two types of late materialization — one concerning selections and one concerning joins. We showed that there were no studies on how to efficiently combine these approaches for disk-based systems. Following this, we proposed ultra-late materialization, which makes it possible to implement both of these approaches inside a single query processing model.

Further, we have presented a new version of PosDB. The modifications that it underwent over the years allowed us to greatly expand its query processing model. Now it offers substantial flexibility in terms of possible query plans and allows implementing different types of late materialization strategies.
We have implemented the proposed ultra-late materialization as well as several other materialization approaches. Following the description of the basic system architecture and recent modifications, we have discussed these strategies, using an SSB query as an example.

Next, in order to study the materialization strategies, we have performed an experimental evaluation. We have compared them not only with each other, but with industrial-grade systems as well — PostgreSQL and MariaDB ColumnStore. Our experiments demonstrated that our ultra-late materialization provides 50% better performance than MariaDB ColumnStore, and is more than three times faster than PostgreSQL.

ACKNOWLEDGMENTS

We would like to thank Anna Smirnova for her help with the preparation of the paper.

REFERENCES

[1] [n.d.]. TPC Benchmark H. Decision Support. Version 2.17.1. http://www.tpc.org/tpch.
[2] P. E. O'Neil, E. J. O'Neil, and X. Chen. 2009. The Star Schema Benchmark (SSB). http://www.cs.umb.edu/~poneil/StarSchemaB.PDF. Accessed: 10/09/2017.
[3] Daniel Abadi, Peter Boncz, and Stavros Harizopoulos. 2013. The Design and Implementation of Modern Column-Oriented Database Systems. Now Publishers Inc., Hanover, MA, USA.
[4] Daniel Abadi, Samuel Madden, and Miguel Ferreira. 2006. Integrating Compression and Execution in Column-oriented Database Systems. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06). ACM, New York, NY, USA, 671–682. https://doi.org/10.1145/1142473.1142548
[5] Daniel J. Abadi, Peter A. Boncz, and Stavros Harizopoulos. 2009. Column-oriented Database Systems. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1664–1665. https://doi.org/10.14778/1687553.1687625
[6] Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel Madden. 2007. Materialization Strategies in a Column-Oriented DBMS. In ICDE, Rada Chirkova, Asuman Dogac, M. Tamer Özsu, and Timos K. Sellis (Eds.). IEEE, 466–475.
[7] Tobias Bleifuß, Susanne Bülow, Johannes Frohnhofen, Julian Risch, Georg Wiese, Sebastian Kruse, Thorsten Papenbrock, and Felix Naumann. 2016. Approximate Discovery of Functional Dependencies for Large Datasets. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM '16). Association for Computing Machinery, New York, NY, USA, 1803–1812. https://doi.org/10.1145/2983323.2983781
[8] Nikita Bobrov, Kirill Smirnov, and George A. Chernishev. 2020. Extending Databases to Support Data Manipulation with Functional Dependencies: a Vision Paper. CoRR abs/2005.07992 (2020). arXiv:2005.07992 https://arxiv.org/abs/2005.07992
[9] Peter Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-pipelining Query Execution. In CIDR '05.
[10] Peter A. Boncz and Martin L. Kersten. 1999. MIL Primitives for Querying a Fragmented World. The VLDB Journal 8, 2 (Oct. 1999), 101–119. https://doi.org/10.1007/s007780050076
[11] George Chernishev, Viacheslav Galaktionov, Valentin Grigorev, Evgeniy Klyuchikov, and Kirill Smirnov. 2017. A Study of PosDB Performance in a Distributed Environment. In Proceedings of the 2017 Software Engineering and Information Management (CEUR Workshop Proceedings), Vol. 1864. CEUR-WS.org, Saint-Petersburg, Russia.
[12] George A. Chernishev. 2020. Making DBMSes Dependency-Aware. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2020/gongshow2020/gongshow/abstracts/cidr2020_abstract67.pdf
[13] G. A. Chernishev, V. A. Galaktionov, V. D. Grigorev, E. S. Klyuchikov, and K. K. Smirnov. 2018. PosDB: An Architecture Overview. Programming and Computer Software 44, 1 (Jan. 2018), 62–74. https://doi.org/10.1134/S0361768818010024
[14] George P. Copeland and Setrag N. Khoshafian. 1985. A Decomposition Storage Model. SIGMOD Rec. 14, 4 (1985), 268–279. https://doi.org/10.1145/971699.318923
[15] Viacheslav Galaktionov, Evgeniy Klyuchikov, and George A. Chernishev. 2020. Position Caching in a Column-Store with Late Materialization: An Initial Study. In Proceedings of DOLAP@EDBT/ICDT 2020 (CEUR Workshop Proceedings), Il-Yeol Song, Katja Hose, and Oscar Romero (Eds.), Vol. 2572. CEUR-WS.org, 89–93. http://ceur-ws.org/Vol-2572/short14.pdf
[16] Goetz Graefe. 1993. Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25, 2 (June 1993), 73–169. https://doi.org/10.1145/152610.152611
[17] G. Graefe. 1994. Volcano — An Extensible and Parallel Query Evaluation System. IEEE Trans. on Knowl. and Data Eng. 6, 1 (Feb. 1994), 120–135. https://doi.org/10.1109/69.273032
[18] Martin Grund, Jens Krueger, Matthias Kleine, Alexander Zeier, and Hasso Plattner. 2011. Optimal Query Operator Materialization Strategy for Hybrid Databases. In Proceedings of the 2011 Third International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA '11). IARIA, 169–174.
[19] Stavros Harizopoulos, Daniel Abadi, and Peter Boncz. 2009. Column-Oriented Database Systems, VLDB 2009 Tutorial. nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf
[20] Joseph M. Hellerstein and Michael Stonebraker. 2005. Readings in Database Systems. The MIT Press, 96–.
[21] Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, K. Sjoerd Mullender, and Martin L. Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35, 1 (2012), 40–45. http://sites.computer.org/debull/A12mar/monetdb.pdf
[22] Setrag Khoshafian, George P. Copeland, Thomas Jagodis, Haran Boral, and Patrick Valduriez. 1987. A Query Processing Strategy for the Decomposed Storage Model. In Proceedings of the Third International Conference on Data Engineering. IEEE Computer Society, Washington, DC, USA, 636–643. http://dl.acm.org/citation.cfm?id=645472.655555
[23] Donald Kossmann. 2000. The State of the Art in Distributed Query Processing. ACM Comput. Surv. 32, 4 (Dec. 2000), 422–469. https://doi.org/10.1145/371578.371598
[24] Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi, and Chuck Bear. 2012. The Vertica Analytic Database: C-Store 7 Years Later. Proc. VLDB Endow. 5, 12 (Aug. 2012), 1790–1801. https://doi.org/10.14778/2367502.2367518
[25] Nadezhda Mukhaleva, Valentin D. Grigorev, and George A. Chernishev. 2019. Implementing Window Functions in a Column-Store with Late Materialization. In Model and Data Engineering - 9th International Conference, MEDI 2019, Toulouse, France, October 28-31, 2019, Proceedings (Lecture Notes in Computer Science), Klaus-Dieter Schewe and Neeraj Kumar Singh (Eds.), Vol. 11815. Springer, 303–313. https://doi.org/10.1007/978-3-030-32065-2_21
[26] Thorsten Papenbrock, Jens Ehrlich, Jannik Marten, Tommy Neubert, Jan-Peer Rudolph, Martin Schönberg, Jakob Zwiener, and Felix Naumann. 2015. Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. Proc. VLDB Endow. 8, 10 (June 2015), 1082–1093. https://doi.org/10.14778/2794367.2794377
[27] Fotis Psallidas and Eugene Wu. 2018. Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein (Eds.). ACM, 1781–1784. https://doi.org/10.1145/3183713.3193537
[28] Fotis Psallidas and Eugene Wu. 2018. Smoke: Fine-grained Lineage at Interactive Speed. Proc. VLDB Endow. 11, 6 (2018), 719–732. https://doi.org/10.14778/3184470.3184475
[29] Ravishankar Ramamurthy, David J. DeWitt, and Qi Su. 2002. A Case for Fractured Mirrors. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB '02). VLDB Endowment, 430–441. http://dl.acm.org/citation.cfm?id=1287369.1287407
[30] L. Shrinivas, S. Bodagala, R. Varadarajan, A. Cary, V. Bharathan, and C. Bear. 2013. Materialization Strategies in the Vertica Analytic Database: Lessons Learned. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). 1196–1207. https://doi.org/10.1109/ICDE.2013.6544909
[31] Alexander Slesarev, Evgeniy Klyuchikov, Kirill Smirnov, and George A. Chernishev. 2021. Revisiting Data Compression in Column-Stores. In Model and Data Engineering - 10th International Conference, MEDI 2021, Tallinn, Estonia, June 21-23, 2021, Proceedings (Lecture Notes in Computer Science), J. Christian Attiogbé and Sadok Ben Yahia (Eds.), Vol. 12732. Springer, 279–292. https://doi.org/10.1007/978-3-030-78428-7_22
[32] Sophie Cluet and Guido Moerkotte. 1995. Efficient Evaluation of Aggregates on Bulk Types. In Proceedings of the Fifth International Workshop on Database Programming Languages. 8.
[33] Mike Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. 2005. C-store: A Column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05). VLDB Endowment, 553–564. http://dl.acm.org/citation.cfm?id=1083592.1083658
[34] Dimitris Tsirogiannis, Stavros Harizopoulos, Mehul A. Shah, Janet L. Wiener, and Goetz Graefe. 2009. Query Processing Techniques for Solid State Drives. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09). ACM, New York, NY, USA, 59–72. https://doi.org/10.1145/1559845.1559854
[35] A. Tuchina, V. Grigorev, and G. Chernishev. 2018. On-the-fly Filtering of Aggregation Results in Column-stores. CEUR Workshop Proceedings 2135 (2018), 53–60.
[36] Patrick Valduriez. 1987. Join Indices. ACM Trans. Database Syst. 12, 2 (June 1987), 218–246. https://doi.org/10.1145/22952.22955
[37] Eugene Wu. 2021. Systems for Human Data Interaction (keynote). In Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA-Data 2021), co-located with the 47th International Conference on Very Large Data Bases (VLDB 2021), Copenhagen, Denmark, August 20, 2021, Davide Mottin, Matteo Lissandrini, Senjuti Basu Roy, and Yannis Velegrakis (Eds.).