                                Future Fetch
 Towards a ticket-based data access for secondary storage
                   in database systems

                 Demian E. Vöhringer and Klaus Meyer-Wegener

     FAU Erlangen-Nuremberg, Chair for Computer Science 6 (Data Management)
            {firstname.surname}@fau.de, https://www.cs6.tf.fau.de/



        Abstract. When accessing data from a database, a database manage-
        ment system has to accomplish many tasks. Checking the conformity
        of the query, generating a reasonably good query plan and executing it
        in a concurrent system are just a few of them. During query execution,
        there are some points at which the main execution thread must stop and
        wait for data. This synchronous waiting can have a major impact on
        the overall query performance. In this paper we introduce the idea of a
        ticket-based fetch, which supports the database management system by
        waiting for data asynchronously. This can improve the query-execution
        performance in database systems.

        Keywords: Asynchronism · Data Flow · Data Access · Ticket · Database
        System · Data Warehouse · Big Data · Concurrent


1     Introduction
Large database systems (DBS) depend heavily on fast access to huge volumes of
data, which can easily exceed the capacity of a server’s random-access memory
(RAM) by orders of magnitude.
     These data can be stored in three different storage spaces: the primary, the secondary, and the archival storage space [9]. They all have different properties that
allow them to be used for different tasks. Primary storage uses RAM, while
solid-state and hard-disk drives are found in secondary storage, which is orders
of magnitude slower. If the data a query operates on are stored in the secondary storage space, they must be transferred to the much faster primary storage space before operations can be performed on them. It is therefore crucial that algorithms of the DBS depend as little as possible on data in slower storage spaces and try to rely only on data that have already been loaded into primary storage. However, if data are not in main memory, they still must be fetched from disk, which can cause long latencies and slow down the execution of queries.
     G. Graefe [6] summarized ideas like indices, buffer management, and parallelism in DBS, which help to cope with these latencies. With the help of indices,
data can be found much faster on secondary and archival storage, while good buffer management helps keep the right data in main memory. Parallelism in database systems is divided into three categories, namely interquery parallelism,
horizontal interoperator parallelism, and vertical interoperator parallelism. With
interquery parallelism, a DBS can execute several queries simultaneously and
therefore use waiting times for data loading by switching to another query that
already has its data. Horizontal interoperator parallelism performs the same op-
eration on several partitions of the tuple set at the same time and can take
advantage of modern multi-core CPUs. Vertical interoperator parallelism exe-
cutes operators in pipelines, as shown in [10].
    These ideas about parallelism focus more or less on data already accessible
in primary storage, or on finding data in the storage spaces, ignoring the fact
that the transfer of these data to primary storage adds latency to the execution
of each single query. The question is whether we can support these parallelism
ideas even better by providing an asynchronous data flow inside a DBS.
    This can be useful in a query that connects many tuples from different tables
to create one large result tuple, as it is done in star-shaped queries (e. g. List-
ing 1.1). A star-shaped query is similar to a star query, which uses a main table
(fact table) that contains information about how to connect the other tables
(dimension tables). The only difference is that star queries are bound to data-
warehouse systems and use information from the dimension tables mostly for
reducing the result-set size (cf. http://www.orafaq.com/tuningguide/star%20query.html). With a limited number of values in the dimension
tables, star queries can be optimized by using bitmap indices, while star-shaped
queries cannot use this kind of optimization and are thus much harder to opti-
mize.
    In this paper we present first steps towards the idea of ticket-based data loading from secondary storage. We do this by introducing a ticket system for database operators, which desynchronizes the data flow and the execution flow of query processing.
    We begin our journey with a simple example. We describe the problem (Sec-
tion 2), find a first solution (Section 2.1), explain that solution in depth (Sec-
tion 2.2), and try to expand that solution, so it can be executed in a more
general workflow of query execution (Section 2.3). After that we discuss related
work (Section 3), draw a conclusion, and present future work (Section 4).


2     The Idea
To work with the data, a classical DBS must transfer it from secondary or
archival storage to primary storage. This process is shown in Figure 1a for a
star-shaped query. Because the data-transfer rate of secondary storage is limited, access to the data takes some time, so any query execution in need of these data must interrupt its execution flow and wait for the data to arrive.
    This pause can become large depending on the exact location on disk and the access path for it, and the query execution must wait, even if the data is not needed immediately. To improve the search, indices have been introduced,
which can help to find the required data faster on secondary storage. Once the
data is collected, query execution proceeds to the next step, which may require
some more data. These steps must be repeated for each tuple-fetch operation,
until the result set is complete and can be returned to the caller. So waiting for
data can take a large portion of the execution time of a single query.
    To reduce this data latency, we present an initial idea of ticket-based data
access, describe it in depth, and expand it towards a more general query execu-
tion.


2.1   A first step

To make the idea understandable, we simplify the problem a bit. We ignore some
ideas on parallelism presented in [6] and only focus on one query that works on
one large set of tuples. This way we still have vertical interoperator parallelism,
but neither interquery parallelism nor horizontal interoperator parallelism. Ad-
ditionally we assume that secondary indices (indices that do not contain the data themselves, but only information about where to find them in the storage spaces) can be kept in primary storage for fast access.
    With our idea we want to intervene exactly where the query execution must
wait for data and try to improve the execution time with asynchronous data
access. As shown in Figure 1c, our first step towards asynchronous data access
focuses on an index scan (a scan operator using an index) combined with a
special Data Access Manager, our abstraction for access to secondary storage
and moving data to primary storage.
    Traditionally, an index scan performs three steps in accessing the data from
secondary storage. First it uses a secondary index and checks whether and where
the corresponding tuple can be found. Because of our assumption, this is done in
main memory and thus very quickly. So the location of the tuple on secondary
storage is known. In the second step, this tuple is loaded to primary storage,
where the third step continues and returns the tuple to the calling operator. In
our example, this is a join operator.
    If we now rework this index-scan operator to use the Data Access Manager,
we obtain two operators. We get a “check for existence, get the storage-space
address and start loading” (∃) and a “wait for loading and combine the loaded
tuple with the existing tuple” (m) operator, as seen in Figure 1b. We describe
these operators and the Data Access Manager in detail in Section 2.2. You can
easily see that the second step has been delegated to the Data Access Manager.
    Now that we have refactored our index scan into two operators, we move
them in our query-execution tree to optimize it, as shown in Figure 1c. The ∃
operator remains in the same place where the scan was, while the m operator is
moved to where the data is actually needed. In our example, this is the end of
query execution just before returning the result tuple, where we insert the “wait
and combine all data” operator. Because the data from R1 are needed right now,
we can use the old scan operator for simplicity.
Listing 1.1: Exemplary SQL for a star-shaped query

    SELECT *
    FROM R1, R2, R3, R4, R5
    WHERE R1.id = 42
      AND R1.2id = R2.id
      AND R1.3id = R3.id
      AND R1.4id = R4.id
      AND R1.5id = R5.id;

[Fig. 1: Execution trees of the star-shaped query presented in Listing 1.1. Access to data from secondary storage is presented in red.
 (a) Traditional access approach: a join tree with the selection σR1.id=42 over ScanR1 ... ScanR5, where every scan reads directly from secondary storage.
 (b) Traditional access approach with our newly designed operators and the Data Access Manager: each scan is replaced by an ∃ operator and a corresponding m operator, and the Data Access Manager mediates all access to secondary storage.
 (c) The complete ticket-based system approach: the ∃ operators take the places of the scans of R2 ... R5, ScanR1 stays at the bottom, and a single m_all operator at the top of the tree waits for and combines all loaded data.]
   We can now easily see that the main execution path just informs the Data
Access Manager which tuples need to be moved from secondary storage to pri-
mary storage for easy access. This has several advantages, as described in the
details (Section 2.2).
   We expect that this change in strategy can already improve the execution
time of quite small star-shaped queries in huge DBS, and we hope to show in
the future that this also holds for other query types.


2.2   The operators in detail

Here we explain in more detail our thoughts on the two operators and the Data
Access Manager mentioned above.
    The ∃ operator is designed to be called with a table and an expression (e. g. a simple search argument). It queries a secondary index structure for the position of the tuple in the secondary storage space. With this address and a certain priority, it calls the Data Access Manager and receives a ticket (e. g. a pointer or “future object”, cf. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html) with information that later allows access to the fetched tuple.
This ticket is then forwarded to the next operator. Now the data flow is sepa-
rated from the execution flow and the current tuple is much smaller, since data
not currently needed are not yet part of the tuple. This may have the additional
advantage of being able to store the tuple closer to the CPU.
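    To illustrate this, a minimal sketch of the ticket and the ∃ operator is given below, assuming a Java-style implementation. The types Tuple (with its withTicket method), SecondaryIndex, and DataAccessManager, as well as all method names, are hypothetical placeholders and not part of any existing system; the Data Access Manager itself is sketched further below.

    import java.util.concurrent.CompletableFuture;

    // Hypothetical sketch: a ticket stands for a tuple that is being moved from
    // secondary to primary storage; the future completes once the tuple has arrived.
    final class Ticket {
        final CompletableFuture<Tuple> future;
        final long diskAddress;                  // location on secondary storage
        Ticket(CompletableFuture<Tuple> future, long diskAddress) {
            this.future = future;
            this.diskAddress = diskAddress;
        }
        boolean isLoaded() { return future.isDone(); }
    }

    // The ∃ operator: look up the tuple's address in a secondary index kept in
    // primary storage, ask the Data Access Manager to start loading it, and
    // forward only the small input tuple plus the ticket.
    final class ExistsOperator {
        private final SecondaryIndex index;      // assumed to reside in primary storage
        private final DataAccessManager dam;
        ExistsOperator(SecondaryIndex index, DataAccessManager dam) {
            this.index = index;
            this.dam = dam;
        }
        Tuple next(Tuple input, Object searchKey) {
            // the existence check is omitted for brevity
            long address = index.lookup(searchKey);                  // where on disk?
            Ticket ticket = dam.startFetch(address, /*priority*/ 1); // start loading asynchronously
            return input.withTicket(ticket);                         // data flow is now detached
        }
    }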
    The ticket is now part of the tuple and is resolved by the m operator when
the data are needed. The m operator looks at the ticket and determines whether
the data is already stored in primary storage. If this is not the case, it calls the Data Access Manager again with the ticket, requests a higher priority, and waits for
the data to be delivered. This results in a delay that should not be greater than
the normal access delay in a synchronous call. Hopefully the data can already
be found and can be accessed directly without any delay. Now the data can
be combined with the tuple and can be presented to the next operator. If the
m operator needs to get the data from multiple tickets at the same time, it can
prioritize the already loaded data and can give the Data Access Manager more
time for completing the other tickets.
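    Under the same hypothetical interfaces, the m operator could be sketched as follows; it blocks at most as long as a synchronous fetch would, and only if the ticket is not yet resolved.

    // Hypothetical sketch of the m operator: resolve the ticket carried by the
    // tuple; if the data has not arrived yet, raise the priority at the Data
    // Access Manager and wait, then combine the loaded data with the tuple.
    final class MergeOperator {
        private final DataAccessManager dam;
        MergeOperator(DataAccessManager dam) { this.dam = dam; }

        Tuple next(Tuple input) {
            Ticket ticket = input.ticket();
            if (!ticket.isLoaded()) {
                dam.raisePriority(ticket);        // the data is needed right now
            }
            Tuple loaded = ticket.future.join();  // returns immediately if already loaded
            return input.combineWith(loaded);     // hand the enlarged tuple to the next operator
        }
    }

    For several tickets at once, the operator would first combine the tuples whose isLoaded check succeeds and only then wait for the remaining ones.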
    The Data Access Manager has a list of all tuples that must be moved to
primary storage and can therefore improve throughput by taking advantage of
the secondary storage device’s properties, reorganizing the tuples’ access order accordingly (cf. http://www.cs.iit.edu/~cs561/cs450/disksched/disksched.html). If a tuple is needed right now, the access order can be rescheduled
to make that data accessible sooner.


2.3   The next steps

The first step exploits some simplifications, like reducing the parallelism and
keeping the secondary indices in primary storage. We see no problem in bringing the two other parallelism ideas back. The secondary indices in primary storage give us the advantage of knowing how many tuples to expect without needing to access the secondary storage. Additionally, we have dropped the idea of batch execution, where multiple tuples are given to an operator as input at the same time.
    We would now like to expand the idea to overcome these simplifications.
    For querying a secondary index that is not in primary storage, we move the
querying of the secondary index into the Data Access Manager. The ∃ operator
then does not give an address to the Data Access Manager, but just the table and
the expression. By limiting the information, we force the Data Access Manager to
do a normal index scan, where the index might not be stored in primary storage.
This way we cannot predict how many tuples are going to be returned. Based
on our ticket system, the Data Access Manager now creates a data structure that can hold zero, one, or multiple tuples. The m operator reacts to this by
either dropping the given tuple (it has no join partner), or by iterating through
the data structure and creating a combination with each of the retrieved tuples,
resulting in multiple tuples for the next operator. Depending on the query and
the expected drop rate, the query optimization may place the m operator closer
to or farther away from the ∃ operator to balance between reduced unnecessary
workload and increased parallel execution.
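    To make this change concrete, a minimal sketch of the adapted m operator is given below; MultiTicket and its awaitAll method are hypothetical and stand for whatever data structure the Data Access Manager fills with zero, one, or more retrieved tuples.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: when the secondary index is scanned inside the Data
    // Access Manager, a ticket may resolve to any number of matching tuples.
    final class MultiMergeOperator {
        List<Tuple> next(Tuple input, MultiTicket ticket) {
            List<Tuple> matches = ticket.awaitAll();     // blocks only while still loading
            List<Tuple> output = new ArrayList<>();
            for (Tuple match : matches) {                // empty result: the input tuple is dropped
                output.add(input.combineWith(match));    // one output tuple per retrieved partner
            }
            return output;
        }
    }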
    With this change, we can use arbitrary scan operators with the Data Access
Manager, and may also be able to include basic operations such as a projection or
selection to increase the amount of parallel execution and to reduce the amount
of space on primary storage needed to store the tuples.
    To handle batch execution, we expect the ∃ operator to open a ticket for
each of the tuples in the batch. No further adjustments are needed, because
the remainder of the execution can proceed as if there were only one tuple; we simply have to do this for each tuple of the batch. The m operator has
another small advantage in being able to check tickets from multiple tuples at
once, rather than waiting for the data of a particular tuple to be accessible right
now.
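    A small sketch of the batch variant of the m operator, under the same hypothetical interfaces as above, shows this advantage: the tuples whose tickets are already resolved are combined first, while the Data Access Manager gets more time for the rest.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: the ∃ operator has opened one ticket per tuple of the
    // batch; the m operator serves the already loaded tuples first.
    final class BatchMergeOperator {
        private final DataAccessManager dam;
        BatchMergeOperator(DataAccessManager dam) { this.dam = dam; }

        List<Tuple> next(List<Tuple> batch) {
            List<Tuple> ready = new ArrayList<>();
            List<Tuple> waiting = new ArrayList<>();
            for (Tuple t : batch) {
                if (t.ticket().isLoaded()) ready.add(t); else waiting.add(t);
            }
            List<Tuple> output = new ArrayList<>(batch.size());
            for (Tuple t : ready) {                       // no waiting needed here
                output.add(t.combineWith(t.ticket().future.join()));
            }
            for (Tuple t : waiting) {                     // only now block for the rest
                dam.raisePriority(t.ticket());
                output.add(t.combineWith(t.ticket().future.join()));
            }
            return output;
        }
    }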
    With these steps we expect to be able to handle arbitrary queries in current
DBS.


3   State of the Art

We found the idea of speeding up access to slower devices already in the work of Sarawagi [12], which discusses the optimization of access to tertiary memory (archival storage). There, the author suggests using a scheduler to first bring all needed data into the secondary storage space and then start executing the query on it. Apart from the fact that we consider the transfer from secondary to primary storage rather than from archival to secondary storage, our idea also differs in that we try to make data accessible while the query is running, instead of loading the data into a faster storage space before executing a query that may not even need all of them.
    Codd described a restriction in [4] that was later used in the concept of semi-
joins [2]. This operator exploits the idea of reducing a relation to the tuples
needed before joining it with another relation. With this reduction all unneeded
tuples are dropped early and the join can be performed much faster. In particular when working over a network, the data to be transferred can be reduced substantially. The idea seems similar to ours, but semi-joins do not use the potential of asynchronous data flow, and the relations are reduced during the execution of a join and not before.
   For asynchronous data access there are some implementation concepts like asynchronous I/O (https://man7.org/linux/man-pages/man7/aio.7.html) or future objects (https://en.cppreference.com/w/cpp/thread/future). These are some basic techniques for adding
asynchronism to software systems and we plan to use them for implementing the
Data Access Manager.
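   As a toy illustration of the future-object technique (not of the Data Access Manager itself), the following self-contained Java snippet starts a simulated disk read on a background thread and blocks only when the result is actually needed; all names and the 50 ms delay are invented for this example.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FutureFetchToy {
        public static void main(String[] args) {
            ExecutorService io = Executors.newFixedThreadPool(2);

            // start the "fetch" asynchronously; the simulated disk read takes 50 ms
            CompletableFuture<String> ticket = CompletableFuture.supplyAsync(() -> {
                try { Thread.sleep(50); } catch (InterruptedException e) { }
                return "tuple 42";
            }, io);

            // the execution flow continues with other work while the data is loading
            System.out.println("query execution continues ...");

            // ... and blocks only when the data is actually needed
            System.out.println("fetched: " + ticket.join());
            io.shutdown();
        }
    }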
   The idea of splitting an operator into smaller bits and looking at these bits
was recently proposed by Dittrich and Nix [5]. This paper focuses on the idea of
creating physical operators that fit the data better, while we focus on establishing
a new asynchronous data flow that ignores the borders of scan operators.
   Gurumurthy et al. [7] gave an overview of constructing operators from smaller
parts and adapting more easily to new specialized hardware like GPUs [8] and
FPGAs [1]. We share the idea of having smaller parts in operators, but focus on
data access and not on adaptability to new hardware.
    There have been some ideas for operators that can take advantage of better
disk-scheduling algorithms, e. g. [3]. Here the authors present a physical scan
operator that can adapt to errors in cardinality estimation by prefetching more
blocks with tuples to tune between an index and a table scan. We, in contrast,
are not defining a new physical operator to replace a logical operator, but change
the size and borders of existing scan operators to better optimize the data flow
in the query execution.
    We found the idea of optimizing star queries inside the Oracle Documenta-
tion [11], but this is very specific for star queries in data warehouses. The data
from the dimension tables is only used for comparison with constraints and is
dropped afterwards. This allows bitmap indices on the dimension tables to be used for deciding whether a tuple satisfies the constraints, which can increase the speed of star queries in data warehouses. We just use star-shaped queries as an easy example
and must return the data from the “dimension” tables.
    In HyPer [10], a data-centric processing model was introduced by switching the execution direction inside a pipeline from pulling to pushing. By doing this, the data can be kept in CPU registers and need not be stored in and loaded from
slower storage areas. We do not focus on keeping the data close to the CPU,
but instead try to minimize the waiting time when loading the data into a faster
storage space. We think it might be possible to combine these two ideas.


4     Conclusion and Future Work

In this paper we present our idea of desynchronizing the data and execution flow
in query processing. We do this by introducing a Data Access Manager and by
splitting scan operators into new operators focused on data access. This way
we are able to place them independently in the query-operator tree. We expect
to reduce the latency for data access in a DBS and the overall query-execution
time.
    We are fully aware that this idea increases the search space of query opti-
mization even more, but we see this as a challenge for research rather than a problem.
    As future work we plan to implement and test our idea. Because we expect
the rework of the software to be quite large, we have decided to use a lightweight,
relatively simple, and open-source database management system, namely Apache Derby (https://db.apache.org/derby/).
    We plan to do this by dividing the task into subtasks, which are: the imple-
mentation and evaluation of the Data Access Manager with the ticket system,
the splitting of the scan operator, and an additional optimizer step for the phys-
ical execution plan that replaces the standard operators with our operators and
moves them to fit our idea. Afterwards, we would like to extend the implementa-
tion with ideas from our next steps (Section 2.3). For evaluation we plan to run
some queries from the TPC-H benchmark on datasets orders of magnitude larger than the RAM of the specific machine. Furthermore, we are thinking about applying the idea to distributed or federated DBS, to reduce the latency of accessing data on a remote system.


References
 1. Becher, A., B.G., L., Broneske, D., Drewes, T., Gurumurthy, B., Meyer-Wegener, K., Pionteck, T., Saake, G., Teich, J., Wildermann, S.: Integration of FPGAs in database management systems: Challenges and opportunities. Datenbank-Spektrum 18(3), 145–156 (Nov 2018). https://doi.org/10.1007/s13222-018-0294-9
 2. Bernstein, P.A., Chiu, D.M.W.: Using semi-joins to solve relational queries. J.
    ACM 28(1), 25–40 (Jan 1981). https://doi.org/10.1145/322234.322238
 3. Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan:
    robust access path selection without cardinality estimation. The VLDB Journal
    27(4), 521–545 (2018). https://doi.org/10.1007/s00778-018-0507-8
 4. Codd, E.F.: A relational model of data for large shared data banks. Commun.
    ACM 13(6), 377–387 (Jun 1970). https://doi.org/10.1145/362384.362685
 5. Dittrich, J., Nix, J.: The case for deep query optimisation. In: CIDR 2020, 10th Conference on Innovative Data Systems Research, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings. www.cidrdb.org (2020), http://cidrdb.org/cidr2020/papers/p3-dittrich-cidr20.pdf
 6. Graefe, G.: Query evaluation techniques for large databases. ACM Comput. Surv.
    25(2), 73–169 (Jun 1993). https://doi.org/10.1145/152610.152611
 7. Gurumurthy, B., Broneske, D., Drewes, T., Pionteck, T., Saake, G.: Cooking DBMS operations using granular primitives. Datenbank-Spektrum 18(3), 183–193 (Nov 2018). https://doi.org/10.1007/s13222-018-0295-8
 8. He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.:
    Relational query coprocessing on graphics processors. ACM Trans. Database Syst.
    34(4) (Dec 2009). https://doi.org/10.1145/1620585.1620588
 9. Kemper, A., Eickler, A.: Speichermedien. In: Datenbanksysteme, pp. 211–212. Old-
    enbourg Wissenschaftsverlag GmbH (2013), 9th edition
10. Neumann, T.: Efficiently compiling efficient query plans for modern hardware. Proc. VLDB Endow. 4(9), 539–550 (Jun 2011). https://doi.org/10.14778/2002938.2002940
11. Rich, B.: Oracle database reference, 12c release 1 (12.1) e17615-20 (2017), https://docs.oracle.com/database/121/DWHSG/schemas.htm#DWHSG9069
12. Sarawagi, S.: Database systems for efficient access to tertiary memory. In: Pro-
    ceedings of IEEE 14th Symposium on Mass Storage Systems. pp. 120–126 (1995).
    https://doi.org/10.1109/MASS.1995.528222