Design Considerations Towards AI-Driven Co-Processor Accelerated Database Management

Anh Trang Le, Bala Gurumurthy, Christoph Steup, Gabriel Campero Durand, David Broneske, Gunter Saake
Otto-von-Guericke-Universität Magdeburg, Germany
firstname.lastname@ovgu.de

32nd GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), September 01-03, 2021, Munich, Germany. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
Adopting AI techniques for query optimization is an ongoing research interest in the database community. Currently, the search space for the best plan increases drastically with the growing heterogeneity of the target hardware, the novel tuning choices offered, and co-processing. Hence, the need for AI techniques that identify such a best plan in a reasonable time-frame is imminent. Though AI-based solutions for improving query processing exist, there is still a need for principled system designs able to incorporate the different innovations, leverage synergy effects, and keep with production-readiness expectations when using AI. In this paper, we propose a series of seven ideal design characteristics we envision for such systems. We then make the case for revisiting the traditional Mariposa system, considering its market concepts as a useful starting point for new system designs that support the identified characteristics. Altogether, we expect that this short paper can be a modest contribution towards AI-driven heterogeneous processing, emphasizing the practical aspects of a supportive and principled overall design.

Keywords
AI for DBMS, Self-driving DBMS, Heterogeneous query processing, Hardware-accelerated query processing

1. INTRODUCTION
In the recent decade, computer systems composed of multiple heterogeneous processors have quickly become the norm rather than the exception [27]. Along with this rapid growth, we also witness an increasing adoption of hybrid-processor database systems that circumvent the 'power wall' [3] and show great potential for speeding up query processing [24]. However, without tailored optimization strategies, these systems cannot achieve the best performance gains. In fact, studies show that some GPU-accelerated systems with better operator-level implementations than their CPU counterparts still have poor performance for overall query processing in analytical benchmarks [28]. This is partly explained by the fact that optimizing query processing for such systems encounters numerous intrinsic challenges caused by the diversity of tuning techniques per device, the uncertainty (given such techniques) in accurately modelling real-world performance impact factors (through parametric cost models), the influence of workloads, as well as the need to support scalability to more devices, data, and new tuning choices [6]. All these aspects create a huge optimization space, which is hard to evaluate [17], turning the task of establishing a uniform research prototype for their detailed study into a tough nut to crack.

In this paper, we propose an early vision for a principled system architecture with the goal of exposing and easing query optimization in a co-processor accelerated data system. In traditional systems, optimization decisions are commonly addressed with hard-coded rules and heuristics. Such systems can exhibit weaknesses in generalizing to unknown workloads or devices, as well as difficulties in extending and maintaining the heuristics.
Throughout the last decade, there has been a shift towards employing AI techniques to handle these tasks more efficiently, alleviating the mentioned drawbacks of traditional methods. In the database community, there is a strong research trend that studies how AI can benefit database optimizations [20, 14, 21, 29, 4]. Though the prospect is bright, there are several obstacles coming from AI itself, especially in the deployment of AI solutions [13, 19]. Hence, adopting them in the co-processing domain forces us to solve a dual challenge: enhancing the performance of hybrid-processor databases, as well as maintaining the AI, and especially machine learning (ML) models, in production.

In order to confront this situation, we consider that principled designs are needed for heterogeneous hardware database systems, which can facilitate the inclusion of learning from the ground up, to address the different challenges in these systems. More precisely, we propose seven characteristics we deem essential for such a system: C1) task modularization, C2) collaborative agents as building blocks, C3) exchangeability of optimizers, C4) the separation of representation and policies, C5) concepts for database administrators (DBAs) to manage AI components, C6) ease of adaptation to training scenarios, and C7) learning from demonstrations. Overall, we propose that all these characteristics would contribute to improving the solutions for heterogeneous database query processing, while simultaneously addressing the needs for AI production readiness.

In more detail, our core contributions in this paper are:

• We present to the community a first proposal of seven ideal features that we deem central to building and maintaining a practical AI-based DBMS for co-processor systems.

• We propose an early high-level design that builds on the ideas of the market components of the classical Mariposa system developed by Stonebraker et al. [26], while seeking to support the ideal system features we identified.

The remainder of this paper is structured as follows: Sec. 2 points out various challenges in incorporating AI components to support co-processor database management. Sec. 3 outlines our proposed design needs and describes the high-level architecture of a system able to serve these needs; this section covers the system features as well as the main workflow. Sec. 4 formalizes research questions that we aim to address with our proposed system design. Sec. 5 provides context to the design we consider by reviewing related work. Finally, Sec. 6 wraps up this paper with a summary and points for future work.
2. CHALLENGES OF HETEROGENEOUS DATA MANAGEMENT
In developing an AI-driven DBMS for heterogeneous processors, the literature suggests several challenges.

Storage Engine Design: From the perspective of a storage engine, the trade-off between consistency, availability and partition-tolerance, for a given workload, is a foremost concern [22]. Addressing availability, some challenges are: mechanisms to efficiently use scale-out processing on an increasing number of heterogeneous processors, while keeping in mind aspects such as different data transfer rates [1]. Up-front data distribution strategies, such as hardware islands [24] or layered designs [28] for hot-cold data, are commonly adopted, but the exploration of alternatives is limited in the domain (e.g., [16]). Furthermore, co-processor-friendly data structures and storage optimizations adapted to the diversity of application scenarios, devices, and data characteristics (e.g., the increasing relevance of textual and semi-structured data) remain important for availability. Addressing consistency, strong mechanisms for supporting isolation-level guarantees are necessary for increasing system maturity.
Summary: Relevant directions for storage technologies to enhance their co-processing efficiency, while keeping with consistency constraints, include: workload-tuned data distribution, transfer-aware processing, the ease of incorporating alternative processor-specific data optimizations (e.g., layouts or compression), and finally the ability to seamlessly share data structures across processors.

Query Engine Design: The processing of single queries over heterogeneous hardware offers numerous optimization choices compared to the case of homogeneous hardware (e.g., just-in-time code generation to fuse pipelined operators into unified kernels, more diversity of operator variants, or opportunities for resource sharing of concurrent kernels). Furthermore, there are many variants of a single operator, depending on the underlying device [3, 5]. The number of choices only increases when considering distributing single plans across multiple devices, or optimizing groups of queries at a time (which is pertinent for high-parallelism devices). In addition, many relevant performance factors (e.g., device saturation, query expression complexity) or implementation details (e.g., cache consistency) can be difficult to model accurately for cost estimation.
Summary: For efficient query processing over varied hardware, it is important to consider methods that are able to deal with large-scale optimization, and to work with uncertain models for performance factors.
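To give a rough feeling for how quickly this space grows, the following toy enumeration multiplies per-operator placement, variant, and fusion choices for a small pipeline. It is an illustrative sketch only; the operator names, variant counts, and device classes are hypothetical and not measurements of any concrete system.

```python
# Toy illustration of the combinatorial plan space for one short pipeline.
devices = ["CPU", "GPU", "FPGA"]                       # hypothetical device classes
variants = {"scan": 2, "filter": 3, "join": 4, "aggregate": 3}  # per-operator variants
pipeline = ["scan", "filter", "join", "aggregate"]

placements = len(devices) ** len(pipeline)             # device per operator
variant_choices = 1
for op in pipeline:
    variant_choices *= variants[op]                    # implementation per operator
fusion_choices = 2 ** (len(pipeline) - 1)              # fuse or not at each pipeline edge

print(placements * variant_choices * fusion_choices)   # 81 * 72 * 8 = 46656
```

Even with these small, invented numbers, a single four-operator pipeline already yields tens of thousands of candidate configurations, before workload interactions or multi-query optimization are even considered.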
AI Adoption: It might be conceptually simple to alter a certain computer system task (e.g., magic-number selection to build a hash function), replacing hard-coded rules with an AI-friendly interface that provides experience to a model, so that it eventually masters the task. However, in practice it is far from trivial to build and maintain such models at the highest production-readiness levels [13]. In essence, unlike traditional software components, AI models can be harder to test and can fail in unexpected ways, especially for deep learning. Models can often be black boxes, or require copious training to be used efficiently. Machine learning models, specifically, follow a lifecycle of their own, going from data management tasks (including data collection and feature engineering) to model learning (and tuning), validation and deployment, with challenges and cross-cutting concerns all through this lifecycle [19]. Some common challenges are: insufficient data, concept drift, and adversarial attacks.
Summary: The incorporation of AI components might affect the guarantees that a system can provide. To overcome this, it is fundamental that the system is made safe against critical AI errors with fall-back mechanisms, and that an easy-to-use interface is offered for administrators to engage with AI metrics and model lifecycle management.

3. DEVELOPING AN AI-ENABLED CO-PROCESSOR ACCELERATED DATABASE PROTOTYPE
In order to overcome the identified challenges of scalability, as well as the need for adaptability and support for the AI/ML lifecycle, a principled design for building a system is required. Similar designs have already been considered by researchers in other areas, offering design principles. In this section, we briefly consider some of these design concepts, from a general to a more specific case, which we then use to propose the series of ideal characteristics that become the basis of our system design.

From a general application perspective, university courses (for example, SE4AI offered by Christian Kästner at CMU: https://github.com/ckaestne/seaibib) and textbooks already study good practices for building systems that incorporate machine learning [9]. Furthermore, there are several papers that highlight intrinsic challenges which require designs to adapt to them [25]. They usually refer to difficulties such as modularization, or to specific problems of an ML approach (e.g., [7]).

Moving to a more specific application, the authors of the AutoSys framework [18] suggest four principles organized around the goals of making systems learnable and making the learning manageable: exposing system behavioral features for learning through well-defined interfaces (P1), careful monitoring of model behavior (P2), modularization of the learning to scope complexity (P3), and resource management for system exploration and maintenance (P4). In further work, the authors report experience in applying their framework, providing further practical advice.

Similar design considerations have a long history in the community that studies self-driving data management, albeit not often coupled with AI (e.g., Babu et al. have argued for experiment-driven adaptive tuning by having replicated test databases [2]); most recently, Kossmann and Schlosser also highlight the importance of modular designs and the plug-and-play nature of optimizations in designing such adaptive systems [11]. Furthermore, these authors identify co-related tasks as a core challenge to efficient modularization, proposing and testing a linear programming framework that enables them to deal with such complications.

In recent years, research in AI-based databases has proposed designs tailored to the needs of the area. Due to space limitations, we discuss in the following a few key ideas from a limited set of them: Pavlo et al. [20] build a system designed with a principled distinction between workload modeling (i.e., representation learning) and system control (i.e., policies). In further work, they continue their approach while distinguishing between externally and internally coupled intelligent mechanisms [21], illustrated by their work in OtterTune and NoisePage, respectively. Within the research scope of SageDB, Kraska et al. [12] present and evaluate a comprehensive vision for how common database components can be replaced with AI. Among their core ideas, the authors develop the concept of instance optimality, which posits that a learned model for a database needs only to be provably optimal for the intended workload and system configuration. Finally, the authors of XuanYuan [14] present a broad high-level design that focuses on identifying the learnable components of current databases, considering task modularization, and categorizing tasks according to the functionality they offer to the overall system (e.g., self-healing, self-assembling, self-optimizing).

Based on this preceding work, and on the challenges identified for heterogeneous co-processing, we propose the following seven characteristics that we deem an AI-based database should reasonably offer for this domain. We should note that these characteristics might not be exhaustive, but aim to serve as a starting point towards a principled design.
C1- Task Modularization: The growing hardware heterogeneity increasingly expands the space of all possible optimization choices for DBMSs. Query optimizers for such systems already employ staging, whereby optimizations are configured into stages and at each stage there is a specific set of rules and mechanisms that can be adopted. Concerning ML components, employing a single, monolithic model to learn such a complex space and address the optimization in a single shot can escalate the learning cost and complexity. Task modularization is a good alternative, since the optimization problems to be tackled can be decomposed and solved separately, making them easier to learn.

C2- Collaborative agents as building blocks: The best models for a given task on a selected device are only required to be instance optimal (i.e., their strategies do not need to generalize to other devices). Hence, as much as possible, it might be beneficial for designs to strive towards supporting device-specific, simple, instance-optimal models by decomposing a task (e.g., a single query optimization) into parts that can be solved in a collaboration among agents (e.g., sub-query selection and optimization per device). To this end, clean abstractions for the task and communication protocols are required. This characteristic seeks to address the storage engine challenge of high device adaptability.

C3- Exchangeability of optimizers: Similar to current databases that already employ alternative optimizers in tasks like join order optimization, for a research-oriented prototype it becomes essential to support the use of alternative optimizers/models in a plug-and-play manner for a specifically modularized task. To that end, extending optimizers (with new features) and integrating new optimizers should also be supported with ease, to facilitate the evolution of the overall system.
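As a concrete but purely illustrative reading of C1-C3, the following Python sketch outlines how a modularized sub-task could be exposed behind a small, exchangeable optimizer interface. All names (SubPlan, LocalDecision, TaskOptimizer, RuleBasedVariantSelector) are hypothetical and not part of any existing system.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class SubPlan:
    """A fragment of a global plan assigned to one device class (C1)."""
    operators: list[str]          # e.g. ["scan", "filter", "aggregate"]
    device_class: str             # e.g. "GPU"

@dataclass
class LocalDecision:
    variant_per_operator: dict[str, str]  # chosen implementation variants
    estimated_cost: float                 # price/cost reported back

class TaskOptimizer(Protocol):
    """Any optimizer, heuristic or learned, for one modularized task (C3)."""
    def optimize(self, subplan: SubPlan) -> LocalDecision: ...

class RuleBasedVariantSelector:
    """A trivial heuristic stand-in that a learned model could replace."""
    def optimize(self, subplan: SubPlan) -> LocalDecision:
        variants = {op: f"{op}_default_{subplan.device_class.lower()}"
                    for op in subplan.operators}
        return LocalDecision(variants, estimated_cost=float(len(subplan.operators)))

def run_agents(subplans: list[SubPlan],
               optimizers: dict[str, TaskOptimizer]) -> list[LocalDecision]:
    # Each device-class agent solves only its own part of the task (C2);
    # optimizers can be swapped per device class without touching the caller (C3).
    return [optimizers[p.device_class].optimize(p) for p in subplans]

plans = [SubPlan(["scan", "filter"], "GPU"), SubPlan(["join"], "CPU")]
opts = {"GPU": RuleBasedVariantSelector(), "CPU": RuleBasedVariantSelector()}
print(run_agents(plans, opts))
```

A learned optimizer could replace RuleBasedVariantSelector for one device class without changing the surrounding orchestration, which is the plug-and-play property C3 aims for.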
C4- The separation of representation and policies: Following related work [20], we propose that a smart separation between representation learning (i.e., how a model decides to represent an entity) and policies (i.e., the decisions made by a model, for a task, given a representation) will be of value. This separation would facilitate representation re-use (transfer learning) across tasks that work on similar entities (e.g., a query), as well as the analysis of alternative multi-modal solutions for a task, which could provide benefits in different scenarios (e.g., a query can be represented as multi-sets of traditionally encoded predicates, joins, and tables; but it can also be represented as a graph of such features). As different policies can benefit from stable, compact, learned representations, this characteristic seeks to help with the aforementioned query engine challenges of large-scale optimization.
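A minimal sketch of what such a separation could look like is given below; the encoder and the two policy functions are hypothetical stand-ins for learned components (e.g., a multi-set or graph encoder and task-specific heads).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Query:
    tables: list[str]
    joins: list[tuple[str, str]]
    predicates: list[str]

def encode_query(q: Query) -> list[float]:
    """Shared representation: a tiny hand-crafted feature vector standing in
    for a learned encoder (multi-set or graph based)."""
    return [float(len(q.tables)), float(len(q.joins)), float(len(q.predicates))]

# Policies are separate consumers of the same representation (C4).
Policy = Callable[[list[float]], str]

def device_placement_policy(features: list[float]) -> str:
    # Toy rule standing in for a learned policy head.
    return "GPU" if features[1] >= 2 else "CPU"

def join_order_policy(features: list[float]) -> str:
    return "bushy" if features[0] > 3 else "left-deep"

q = Query(tables=["a", "b", "c"], joins=[("a", "b"), ("b", "c")], predicates=["a.x>5"])
rep = encode_query(q)                      # computed once, re-used by every policy
print(device_placement_policy(rep), join_order_policy(rep))
```

The point is only structural: the representation is computed once and shared, while the policies that consume it remain independently exchangeable.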
C5- Concepts for DBAs to manage AI components: In consideration of the many steps requiring human management in the ML lifecycle, we envision that the role of the DBA could be extended with a set of actions for managing this lifecycle. To support this, novel services exposing ML management with clearly defined interfaces will be needed in the database context.

C6- Ease of adaptation to training scenarios: Different user scenarios will create different alternatives for training the ML models. Some scenarios might accept live training in the background for a given task, while other scenarios might require collecting experience data for offline learning at a later stage. Some scenarios might allow for ample large-scale training, while training in other scenarios might be severely resource-constrained. In either case, the design of the system components in charge of scheduling model training, with its intrinsic resource management, should be able to cater to such variations. The ability of models to schedule self-training should also be supported.

C7- Learning from demonstrations: The final feature that we believe is essential for the successful adoption of AI models to solve computer system tasks has to do with robustness. In order for a model to be able to replace a current strategy, an efficient and reasonable solution is to start with a model pre-trained on experience collected from the current strategy. Hence, mechanisms for creating and using demonstrations for training are important.
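The following hedged sketch illustrates one simple way to realize C7, assuming scikit-learn is available: the decisions of an incumbent heuristic are logged as demonstrations and a small model is pre-trained on them (behaviour cloning) before it is ever allowed to act on its own. The heuristic, features, and thresholds are invented for illustration.

```python
import random
from sklearn.tree import DecisionTreeClassifier  # assumes scikit-learn is installed

def heuristic_device_choice(rows: int, selectivity: float) -> str:
    """The incumbent hard-coded rule whose behaviour we want to clone."""
    return "GPU" if rows > 1_000_000 and selectivity > 0.3 else "CPU"

# 1) Collect demonstrations by logging the current strategy's decisions (C7).
demos = []
for _ in range(5_000):
    rows = random.randint(1_000, 10_000_000)
    sel = random.random()
    demos.append(([rows, sel], heuristic_device_choice(rows, sel)))

X = [features for features, _ in demos]
y = [decision for _, decision in demos]

# 2) Pre-train a model on the demonstrations (behaviour cloning) so that it
#    starts at least as safe as the heuristic, before any online refinement.
model = DecisionTreeClassifier(max_depth=4).fit(X, y)
print(model.predict([[5_000_000, 0.8]]))  # expected to mimic the heuristic: ['GPU']
```

Online refinement, curriculum management, or reinforcement learning could then start from this cloned model rather than from scratch.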
After listing these ideal system characteristics, we can now present a tentative design that can be adopted to fulfill them. To be precise, we make the case for a design based on the Mariposa system, a market-based distributed DBMS [26], which we discuss further in Sec. 5. In general, Mariposa operates query processing in a decentralized manner that allows for local autonomy regarding query execution at each site contained within a network, instead of centralized management. Mariposa's working mechanism primarily bases itself on an economic paradigm, focusing on two separate markets, for data and for query distribution, respectively. In our research, we argue for building on Mariposa's market concepts in two key ways: first, by considering an architecture with heterogeneous processing capabilities; second, by investigating how AI-based solutions can augment the proposed markets. In this regard, we take as a hypothesis that the modularization of the optimization mechanisms presented in Mariposa (C1, C2) serves to scope the complexity of the learning tasks, serving as a workable basis for incorporating technological innovations as well as production readiness. Figure 1 envisions a general architecture of our proposed design, which employs the market concepts from the Mariposa system as a starting point.

[Figure 1: General Architecture of our Proposed System — Global Optimizer, Storage Manager, Device Processor Class Optimizer, and AI Support System, connected via query market and data market (bidding) interfaces.]

At a high level, four components are involved:

Global Optimizer: This component maps SQL queries to actual plans. It is in charge of global query optimization, including: the generation of global plans, partial splitting of the plans (to distribute among devices), decision support for the selection of plans returned from the device processor class optimizers, and (optionally) requests for data re-distribution.

Storage Manager: This component provides a centralized collection of statistics about devices and a tracking mechanism for data distribution schemes. It enables user-facing configuration of the overall storage, including schema management, index selection and coarse-grained partitioning. It is also intended to provide the DBA with an interface to the AI components, including learning from demonstration. Hence, this component realizes C5 and C7.

Device Processor Class Optimizer: This is the key component for decentralized, modularized data management. While we could propose a component per processor, or per compute node/device (i.e., irrespective of the co-processor variety included), we find that a component per type of processor serves as a workable middle ground. This component is in charge of local data fragmentation, local query optimization (and pricing), algorithm selection and the actual execution of queries. It is also responsible for autonomous data sharing.

AI Support System: This element encompasses the functionality required to support the ML lifecycle. It includes model management and model training, among others. It is intended to facilitate C6 and C7.

Our architecture enables the distribution of query and sub-query plans for cost estimation on the device processor class optimizers, as well as the distribution of data, driven by the device-specific component with (partial) input from the global optimizer.
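To make the interplay of these components more tangible, the following sketch outlines the bid exchange between the global optimizer and the device processor class optimizers. The names and signatures are hypothetical, a sketch under our stated assumptions rather than a fixed API of the prototype.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class SubQuery:
    query_id: int
    fragment_ids: list[int]      # data fragments the sub-query touches
    plan: str                    # serialized partial plan

@dataclass
class Bid:
    query_id: int
    bidder: str                  # device processor class, e.g. "GPU"
    estimated_cost: float        # price asked for executing the sub-query
    covered_subqueries: list[int]

class DeviceProcessorClassOptimizer(Protocol):
    """One instance per processor type; prices and executes sub-queries."""
    def price(self, subqueries: list[SubQuery]) -> list[Bid]: ...
    def execute(self, accepted: list[Bid]) -> None: ...

class GlobalOptimizer(Protocol):
    """Splits global plans, gathers bids, and selects winners."""
    def partition(self, sql: str) -> list[SubQuery]: ...
    def select_winners(self, bids: list[Bid]) -> list[Bid]: ...
```

Either side of these interfaces could be backed by heuristics or by learned models (C3), which is precisely what the query processing workflow below relies on.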
Query Processing: At the start, a group of queries enters the system at a given time step and, at the global optimizer, they are ranked by their importance to the overall performance goals. Second, they are globally partitioned and subsets of their plans are shipped to the devices for pricing. Third, the different device processor class optimizers provide a set of optimizations and prices for the queries requested. To do this, they featurize the query plans (C4) and suggest different combinations of sub-queries to execute with different costs (for this they consider local data statistics and learned models for algorithm selection). The prices are then returned, in a fourth step, to the global optimizer, so that this optimizer can select among the bids until all queries are served. Once choices are made, the queries can finally be executed on the devices. This scheme describes a query market, like the one proposed by Stonebraker et al. [26]. By framing the problem in economic terms, this approach helps decentralized coordination and favors local strategies for optimization (C2). Some optimization choices include: variant selection, operator merging into unified kernels, different sub-query splitting and pipelining strategies, parallelism tuning with morsel-driven execution, locality awareness, intermediate result reuse, and operator sharing across queries.
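A toy version of such a bidding round, under the simplifying assumption that the global optimizer simply accepts the cheapest bid per ranked query, could look as follows. The winner-selection rule is deliberately naive; in the envisioned system it would itself be a candidate for a learned policy.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    query_id: int
    bidder: str          # device processor class offering to run the sub-plan
    estimated_cost: float

def run_bidding_round(queries: list[int], bids: list[Bid]) -> dict[int, Bid]:
    """Toy winner selection: accept, per ranked query, the cheapest bid."""
    winners: dict[int, Bid] = {}
    for q in queries:                                # queries arrive already ranked
        candidates = [b for b in bids if b.query_id == q]
        if not candidates:
            # An unserved query could instead trigger a data re-organization
            # request towards the storage manager (see Data Management below).
            continue
        winners[q] = min(candidates, key=lambda b: b.estimated_cost)
    return winners

bids = [Bid(1, "CPU", 4.0), Bid(1, "GPU", 2.5), Bid(2, "GPU", 7.0)]
print(run_bidding_round([1, 2], bids))
```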
Data Management: The data distribution lifecycle, which occurs in the background, can be understood as follows. On system start, given the lack of information for distributing the data, some assumptions can be made by the storage manager to achieve fragmentation and distribute the data for load balancing. In general, data can be grouped into fragments that are commonly co-accessed and that provide a given utility. While the system is online, there are two ways in which data can be redistributed: First, when an optimal plan for a query cannot be found, global requests for data reorganization can be made by the global optimizer (with some pre-designed mechanism). Following these requests, the device optimizers can organize autonomously how to serve the global hints. Second, device optimizers themselves are responsible for tracking the utility derived from a given data fragment (i.e., depending on the queries that can be served by such a fragment). This gives devices metrics with which to assess how much utility can be derived from fragments that are not locally available. Hence, by using information from the storage manager, device optimizers can participate in a data market. In this market, devices buy copies of fragments and delete local copies of fragments, while keeping with some constraints (e.g., for co-location or availability). The market formulation is expected to facilitate adaptivity and work distribution (C2). Some learning tasks to be tackled with this system include: local algorithm selection, local query optimization, global plan selection, local fragment partitioning, data sharing, global management for data redistribution, and query classification/prioritization.
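The fragment-utility bookkeeping that device optimizers would need for this data market can be sketched as follows; the utility measure and the buy/evict rules are placeholders for what would, in the envisioned system, be learned or tuned.

```python
from collections import defaultdict
from typing import Optional

class FragmentUtilityTracker:
    """Toy utility bookkeeping one device optimizer might keep for the data
    market: the utility of a fragment grows with the queries it serves locally."""

    def __init__(self) -> None:
        self.utility = defaultdict(float)   # fragment id -> accumulated utility

    def record_query(self, fragment_ids: list[int], query_benefit: float) -> None:
        for fid in fragment_ids:
            self.utility[fid] += query_benefit

    def should_buy(self, fragment_id: int, asking_price: float,
                   expected_benefit: float) -> bool:
        # Buy a remote fragment when its expected benefit exceeds the price asked.
        return expected_benefit > asking_price

    def eviction_candidate(self) -> Optional[int]:
        # Sell or evict the locally least useful fragment first.
        if not self.utility:
            return None
        return min(self.utility, key=self.utility.get)

tracker = FragmentUtilityTracker()
tracker.record_query([10, 11], query_benefit=3.0)
tracker.record_query([11], query_benefit=1.5)
print(tracker.eviction_candidate())                                    # fragment 10
print(tracker.should_buy(12, asking_price=2.0, expected_benefit=5.0))  # True
```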
Altogether, the proposed design is intended to realize the ideal characteristics we set as goals (C1-C7). To achieve this, the characteristics of the original Mariposa design are leveraged (C1, C2). Furthermore, the design seeks to facilitate the use of alternative optimizers or models for a task (C3), to provide chances for reusing representations across components (e.g., of queries with respect to the device optimizers), which can then adopt different policies (C4), and to create opportunities for components to have flexible AI training, engaging the DBA in the process and enabling learning from demonstration data (C5-C7).

4. OPEN QUESTIONS
Based on our proposed design, in this section we turn to open questions that we envision our design to be able to help address. These questions relate to query engine (Q1-Q2), storage engine (Q3) or machine learning (Q4-Q5) challenges.

Q1: What building blocks for intelligent and collaborative query processing are necessary to achieve improvements on heterogeneous processors, considering single-query optimization – focusing on algorithm selection, parallelism tuning, splitting, merging and pipelining of operators – compared to strong baselines?

Q2: What strategic designs for intelligent and collaborative query processing lead to performance gains on heterogeneous processors, considering multi-query optimization (MQO) – with a focus on intermediate result reuse and operator sharing? To what degree do intelligent methods compete with non-AI-based alternatives?

Q3: What precise contributions are brought by different applications of AI to the efficiency of data sharing across co-processors, contrasted with competitive baselines?

Q4: How do AI-based approaches perform in robustness tests, compared to heuristic baselines, with respect to changing assumptions such as novel processors or unseen workloads/kinds of queries? What level of improvement does curriculum management bring regarding robustness and sample-efficiency?

Q5: What techniques from learning management (such as learning from demonstrations, or transfer optimization) or from database implementation contribute the most to an efficient integration of the AI components into the lifecycle of data management? Do these techniques contribute to trade-off management between approaches? To what extent do these techniques improve the overall readiness of our solution over baseline choices?

5. RELATED WORK

5.1 AI-based database systems
Incorporating AI components into traditional systems, to improve overall system performance, is a significant topic that is currently attracting great attention from researchers, in both theoretical and applied aspects. On the one hand, multiple studies investigate strategies for an overall co-design of systems and AI, fitting for general applications [18] and more specific ones in databases [21]. On the other hand, much research zooms into the particular problems that can benefit from using suitable AI techniques. Those, in databases, range from cost and cardinality estimation assisted by deep neural networks [10], to join order selection or partitioning supported by reinforcement learning [15, 4, 8], and many more [29, 14]. In a bigger scope, other recent literature also studies complete database management systems assisted by machine intelligence, instead of focusing on only one or a few tasks within them. Some highlighted work, which can be considered to be at a relatively early stage, includes Peloton [20], SageDB [12] and GaussDB (https://e.huawei.com/en/solutions/cloud-computing/big-data/gaussdb-distributed-database).

5.2 Market-based distributed database systems
In economics, a market is defined as any structure that enables trading activities among its participants, for any type of goods, services or information, following a pricing mechanism that aims for an optimal distribution and allocation of resources. Interestingly, this concept has been reformulated to efficiently solve the problems of query optimization in many distributed data management systems. To help with some of our ideal design characteristics (C2), we consider this concept to be relevant.

The system that we base our design on is Mariposa [26], which adopts market concepts to achieve autonomous data sharing and query processing. In a wide-area network, Mariposa allows each single site to take full control over its own resources, enabling it to decide on data objects to buy or sell and on queries to bid on for execution. A bidding protocol is defined to regulate the transactions among all sites within two markets: 1) Query Market: each query Q enters the system with a budget B(t) indicating the price that the user is willing to pay for running Q within time t. Also, Q is administered by a broker, which sends out to bidder sites the requests for bids to execute subqueries Q1, ..., Qn, and then decides on the winning sites. 2) Data Market: each table included in the FROM clause of a query can be split into a set of fragments. A site needs to buy the fragments referenced in a subquery that it wants to bid on, and can sell fragments it must evict at any time by conducting an auction, following the system's pricing mechanism. The trading process runs continuously. Each site makes decisions on storing, buying and selling fragments, or replicas of fragments made by the site itself, aiming at maximizing its profit per unit time.
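As a toy rendering of the budget idea (not Mariposa's actual protocol or pricing functions), a broker could score bids by the surplus between a non-increasing budget curve B(t) and the asked price; the curve shape and the numbers below are purely illustrative.

```python
def budget(t: float, b0: float = 100.0, deadline: float = 10.0) -> float:
    """Toy non-increasing budget curve B(t): full price up to the deadline,
    decaying linearly to zero afterwards (shape chosen only for illustration)."""
    if t <= deadline:
        return b0
    return max(0.0, b0 * (1.0 - (t - deadline) / deadline))

# Bids for one subquery: (site, asked price, promised completion time).
bids = [("site_a", 60.0, 8.0), ("site_b", 35.0, 14.0), ("site_c", 20.0, 25.0)]

def broker_choice(bids):
    # Accept the bid with the largest surplus B(t) - price, if any is profitable.
    scored = [(budget(t) - price, site) for site, price, t in bids]
    surplus, site = max(scored)
    return site if surplus > 0 else None

print(broker_choice(bids))  # site_a: faster completion outweighs its higher price
```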
Another framework based on an economic paradigm, aimed at self-adaptive query allocation in large-scale distributed systems, is SQLB [23], in which the authors highlight the importance of constantly maintaining the interests of the participants throughout the ongoing market. The system targets preserving participants' satisfaction with query allocation/execution and guaranteeing query load balancing within the system, which then helps in minimizing response time and maximizing system throughput.

NashDB [16] is a more recent framework that shows efficiency in autonomously handling data fragmentation, replica generation, allocation, and cluster sizing to attain a Nash equilibrium, i.e., a supply-demand balance in markets.

6. CONCLUSION
The search space of traditional query optimizers is very large. This search space is further increased manyfold with the introduction of co-processors for query execution. AI techniques are promising solutions for traversing such a large search space, identifying the best plan in an effective and time-efficient way. As a consequence, there is a growing body of research devoted to AI-based solutions [29]. However, turning these solutions into a production-ready contribution remains a challenge, since AI, and machine learning specifically, involves many intrinsic challenges that require overall system considerations. In sum, system builders are placed in the difficult position of having to simultaneously tackle a set of heterogeneous co-processing challenges next to a set of AI adoption challenges.

As a motivation for this work, we considered that a principled system design could contribute to addressing the aforementioned two sets of challenges, while at the same time helping in the integration of different technological innovations. In order to contribute towards this goal, in this paper we summarized a list of preceding work that helped us to identify seven design characteristics (C1-C7), addressing needs for scoping the complexity and difficulty of learning (C1, C7), high adaptability/instance optimality (C2-C3), scale (C4), and machine learning issues in general (C5-C6). Based on this, we proposed an early overall system design built on concepts originally studied in the visionary Mariposa system, specifically the economic concepts of a data and a query market. We propose this design to fulfill the design characteristics, while offering an AI-based heterogeneous co-processing database. To conclude, we listed open questions that we would like to review, moving forward, by using our proposed design.

7. ACKNOWLEDGMENTS
This work was partially funded by the DFG (grant no.: SA 465/51-1 and PI 447/9). The authors would like to thank Marcus Pinnecke, Andrey Kharitonov, Rajatha Rao and Yash Shah for collaborations related to this work.

8. REFERENCES
[1] I. Arefyeva, D. Broneske, G. Campero, M. Pinnecke, and G. Saake. Memory management strategies in CPU/GPU database systems: A survey. In BDAS. Springer, 2018.
[2] S. Babu, N. Borisov, S. Duan, H. Herodotou, and V. Thummala. Automated experiment-driven management of (database) systems. In HotOS, 2009.
[3] D. Broneske, S. Breß, M. Heimel, and G. Saake. Toward hardware-sensitive database operations. In EDBT. OpenProceedings.org, 2014.
[4] G. C. Durand, R. Piriyev, M. Pinnecke, D. Broneske, B. Gurumurthy, and G. Saake. Automated vertical partitioning with deep reinforcement learning. In ADBIS. Springer, 2019.
[5] B. Gurumurthy, D. Broneske, M. Pinnecke, G. Campero, and G. Saake. SIMD vectorized hashing for grouped aggregation. In ADBIS. Springer, 2018.
[6] B. Gurumurthy, T. Drewes, D. Broneske, G. Saake, and T. Pionteck. Adaptive data processing in heterogeneous hardware systems. In GvDB, 2018.
[7] A. Haj-Ali, N. K. Ahmed, T. Willke, J. Gonzalez, et al. A view on deep reinforcement learning in system optimization. arXiv preprint arXiv:1908.01275, 2019.
[8] B. Hilprecht, C. Binnig, and U. Röhm. Learning a partitioning advisor for cloud databases. In SIGMOD, 2020.
[9] G. Hulten. Building Intelligent Systems. Springer, 2019.
[10] A. Kipf, D. Vorona, J. Müller, T. Kipf, et al. Estimating cardinalities with deep sketches. In SIGMOD. ACM, 2019.
[11] J. Kossmann and R. Schlosser. Self-driving database systems: A conceptual approach. DAPD, 2020.
[12] T. Kraska, M. Alizadeh, A. Beutel, H. Chi, et al. SageDB: A learned database system. In CIDR, 2019.
[13] A. Lavin, C. M. Gilligan-Lee, A. Visnjic, S. Ganju, et al. Technology readiness levels for machine learning systems. arXiv, 2021.
[14] G. Li, X. Zhou, and S. Li. XuanYuan: An AI-native database. IEEE Data Eng. Bull., 42(2), 2019.
[15] R. Marcus and O. Papaemmanouil. Deep reinforcement learning for join order enumeration. In aiDM@SIGMOD. Association for Computing Machinery, 2018.
[16] R. Marcus, O. Papaemmanouil, S. Semenova, and S. Garber. NashDB: An end-to-end economic method for elastic database fragmentation, replication, and provisioning. In SIGMOD. Association for Computing Machinery, 2018.
[17] A. Meister, S. Breß, and G. Saake. Toward GPU-accelerated database optimization. Datenbank-Spektrum, 15(2), 2015.
[18] C.-J. Mike Liang, H. Xue, M. Yang, and L. Zhou. The case for learning-and-system co-design. ACM SIGOPS Operating Systems Review, 53(1), 2019.
[19] A. Paleyes, R.-G. Urma, and N. Lawrence. Challenges in deploying machine learning: A survey of case studies. arXiv, abs/2011.09926, 2020.
[20] A. Pavlo, G. Angulo, J. Arulraj, H. Lin, et al. Self-driving database management systems. In CIDR, volume 4, 2017.
[21] A. Pavlo, M. Butrovich, A. Joshi, L. Ma, et al. External vs. internal: An essay on machine learning agents for autonomous database management systems. IEEE Data Eng. Bull., 42, 2019.
[22] M. Pinnecke, D. Broneske, G. C. Durand, and G. Saake. Are databases fit for hybrid workloads on GPUs? A storage engine's perspective. In ICDE. IEEE, 2017.
[23] J.-A. Quiane-Ruiz, P. Lamarre, and P. Valduriez. SQLB: A query allocation framework for autonomous consumers and providers. 2007.
[24] A. Raza, P. Chrysogelos, P. Sioulas, V. Indjic, et al. GPU-accelerated data management under the test of time. In CIDR, 2020.
[25] I. Stoica, D. Song, R. A. Popa, D. Patterson, et al. A Berkeley view of systems challenges for AI. arXiv preprint arXiv:1712.05855, 2017.
[26] M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, et al. Mariposa: A wide-area distributed database system. VLDB Journal, 5(1), 1996.
[27] M. Zahran. Heterogeneous computing: Hardware and software perspectives. ACM, 2016.
[28] Y. Zhang, Y. Zhang, J. Lu, S. Wang, et al. One size does not fit all: Accelerating OLAP workloads with GPUs. DAPD, 38, 2020.
[29] X. Zhou, C. Chai, G. Li, and J. Sun. Database meets artificial intelligence: A survey. TKDE, 2020.