=Paper=
{{Paper
|id=Vol-3946/PhDW_paper3
|storemode=property
|title=Towards a Neural Database Execution Engine
|pdfUrl=https://ceur-ws.org/Vol-3946/PhD-Workshop-3.pdf
|volume=Vol-3946
|authors=Christos Tsapelas
|dblpUrl=https://dblp.org/rec/conf/edbt/Tsapelas25
}}
==Towards a Neural Database Execution Engine==
Christos Tsapelas (1,2)
Supervised by Georgia Koutrika (2)
(1) Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
(2) Archimedes, Athena Research Center, Greece
Abstract
Recent advances in natural language understanding have heightened interest in AI systems capable of answering queries across multiple data modalities, such as structured database tables and unstructured text. Current approaches typically rely on Large Language Models (LLMs) to facilitate queries between these modalities, which incurs substantial computational costs and often yields suboptimal performance. To this end, this research introduces a novel query execution engine designed to bridge diverse data modalities, combining the high-efficiency querying capabilities of database systems with the advanced reasoning capacities of LLMs. This paper presents a prototype architecture for such a multi-modal database system, detailing its core components and their functionalities to demonstrate how it can achieve effective, scalable query processing across structured and unstructured data.
Keywords
database systems, large language models, memory networks, hybrid query execution, virtual knowledge bases
Published in the Proceedings of the Workshops of the EDBT/ICDT 2025 Joint Conference (March 25-28, 2025), Barcelona, Spain. © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

The evolution of modern data warehouses has introduced unprecedented challenges and opportunities, with data volumes now encompassing multiple modalities, such as structured table data, unstructured text, and images. Each data type possesses a unique structure, necessitating tailored querying methods that effectively harness the properties of each modality. However, despite these advances, existing systems struggle to generalize queries across multiple modalities, presenting a key limitation in addressing the needs of diverse, cross-modal data integration tasks.

Database management systems (DBMSs) excel at performing rapid, efficient, and precise queries on extremely large data volumes at scale. However, their primary focus remains on exact computation at scale, with limited reasoning capabilities [1]. In contrast, Large Language Models (LLMs) excel at processing natural language across massive textual corpora, offering logical reasoning over unstructured data thanks to their ability to embed knowledge within their weights [2]. This distinction highlights a crucial gap between traditional DBMS architectures and the reasoning and flexibility that LLMs bring to unstructured data processing.

Numerous contemporary applications demand complex queries that integrate information across multiple modalities [3]. The current dominant approaches for multi-modal data integration include retrieval-augmented generation (RAG), similarity-based search, and Text2SQL. These techniques, however, exhibit limitations in both the diversity of query types they can accommodate and their query execution performance. Text2SQL methods, for instance, are effective for natural language queries that have a direct SQL equivalent, whereas RAG systems are constrained to point lookups involving only a limited number of records, requiring an LLM to execute the join operation.

To this end, my doctoral research seeks to bridge the gap between LLM-based approaches and traditional database systems to enable efficient, flexible hybrid search queries. The objective is to develop a prototype neural query execution engine equipped with novel algorithms for efficient data access and join operations, leveraging the strengths of both learned models and traditional database methods for rapid and precise query execution across multiple data modalities. The proposed neural execution engine seeks to empower database systems with the flexibility to handle diverse data modalities and complex query types, addressing a critical need in the field of data management and paving the way for next-generation data retrieval solutions.

2. Related Work

Recent tasks in natural language understanding, such as question answering, require both retrieval and reasoning. For such knowledge-intensive tasks, a system must assimilate information from different sections of large and diverse inputs, such as books and article collections [4]. To this end, the notions of Virtual Knowledge Bases (VKBs) [5, 6, 4] and Memory Networks [7, 8] have been proposed, in which entity mentions in text are transformed into dense representations that capture properties or relations expressed in text passages.

Moreover, the advanced reasoning capabilities of LLMs used with RAG for question answering have given rise to a new area of research, in which a system takes as input both structured and unstructured data and reasons over the different modalities [1, 9, 3, 10, 11, 12, 13, 14], or uses an LLM as a query engine for posing SQL queries [2].

Simultaneously, the database community has introduced an innovative research direction involving learning-based techniques to enhance query execution. Advances such as learned sorting [15] and learned scan and join algorithms [16, 17] have demonstrated highly promising results in optimizing traditional query processing. These methods indicate that machine learning techniques can significantly improve fundamental database operations, reinforcing the potential for a hybrid approach that integrates both database and LLM methodologies.

3. Research Questions

The main purpose of this research proposal is the development of a prototype execution engine able to execute queries combining different data modalities. The implementation of such a novel system raises a set of research questions:
RQ1 How can structured database tables and unstructured data sources, like text documents, be effectively associated to form a unified querying framework?

RQ2 How can dense vector representations be constructed to preserve both the semantic richness of text and the structural integrity of database entities?

RQ3 What new operators are required, and how can database operators be adapted to process queries involving structured and unstructured data?

RQ4 How can a cost-based query optimizer be designed to generate efficient execution plans for hybrid queries involving structured and unstructured data?

RQ5 What techniques can be employed to ensure the query engine scales efficiently for large datasets across multiple modalities?

3.1. Research Opportunities

Building upon the related work, the proposed query engine represents a significant advancement in addressing the previously outlined research questions.

Virtual Knowledge Bases (VKBs) generate dense representations of real-world entities, such as those found within Wikipedia, to enable querying. However, these representations have not been applied within the context of database systems. A key component of this research involves establishing connections between database entities, as defined by the data model of each database, and external text corpora or additional modalities, such as images.

Furthermore, the current state-of-the-art approach for integrating multiple data modalities relies on Multi-Modal Large Language Models (MLLMs). This methodology typically employs large-scale LLMs to process queries, a strategy that is computationally expensive and constrained by the input size limitations inherent to LLMs.

The primary objective of this research is to enable efficient execution of multi-modal queries capable of managing large-scale data in a manner aligned with traditional database systems. To achieve this, the research will extend conventional database operators, such as scans and joins, by developing novel implementation algorithms designed to process diverse data modalities.

4. A Prototype of a Neural Database Engine

In this section, an overview of the proposed neural database execution engine is provided. Consider the example query: "Find all customers who purchased 'Product X' and had a positive experience regarding the quality of the product, within the past six months." This query is transformed into a multi-modal SQL query and sent to the engine for execution. The example query will help describe several aspects of the proposed system, along with the execution flow of a hybrid query between database tables and a text corpus.

Figure 1: Architecture of the Neural Execution Engine. The system accepts a multi-modal SQL query and fetches the related database and mention tables. Next, the optimizer generates a hybrid execution plan with: database scans (rectangles), mention table scans (circles), a database join (triangle), and a database-mention table join (trapezoid).

In Figure 1, we present the architecture of our query engine. The example query is parsed, and the system fetches the Product and Sales database tables and their related mention tables. Then, the optimizer is invoked to generate an optimal execution plan for the given query. The optimizer faces many challenges, like selecting the appropriate scan and join operators within and across modalities, while predicting their optimal order in the execution plan. In Figure 1, the different physical operators are depicted separately to make the different processing steps clear. Finally, the generated execution plan is submitted to the neural engine for execution.

Before the query engine can execute queries across both data modalities, a preparatory phase, referred to as mention tables construction, is required. In the provided example, mentions of 'Product X' must be recognized within text passages. In this phase, the system generates a series of key-value (KV) tables that bridge the information within database tables and text documents stored in blob storage. The objective of these KV tables is to create dense vector representations of entities (keys) that encapsulate the knowledge embedded in the text corpus (values). These representations are structured to seamlessly integrate with a Transformer model, enabling efficient and effective processing by the query engine in subsequent stages.

Upon initializing these learned tables, the execution engine is ready to process queries. When a query is posed, the system parses it and generates an optimal execution plan. The plan selection process resembles that of traditional database systems, wherein the optimizer explores the space of possible execution plans and evaluates candidate plans based on a cost model.

Given the hybrid nature of the proposed query engine, which supports queries across multiple data modalities, it is necessary to define new hybrid operators, such as scans, projections, and joins, capable of handling data from both structured and unstructured sources. These operators are designed to facilitate seamless integration and processing of data across the diverse modalities covered by the system.

In the next subsections, the main components of the proposed query engine are described. First, the construction of mention tables is described, a methodology to associate table data with the available text corpus. Next, the core of the execution engine is detailed, focusing on the required operators and the query optimizer of the system.

4.1. Mention Tables

As previously noted, it is essential to establish associations between data from database tables and the available text documents, defining the specific types of information to be retrieved and assimilated across these data sources [4]. The database schema provides a structured representation of the entities within the database, with clearly defined properties for each entity type.
Thus, an initial processing step is proposed between the different data sources, where each passage in the text documents is annotated with the main entities (fact tables) from the database, and entity mentions in the passage are highlighted with special tokens. Figure 2 shows the construction of mention tables. Representations of these tokens are later used to generate entity encodings.

Figure 2: Mention tables represent entities inside database tables as vector representations.

The goal of mention tables is to gather these database entity encodings into matrices, constructing key-value stores that contain the dense vector representations for each entity in the text documents, forming a virtual knowledge base of the available text documents, as in [7, 4, 18, 8].
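As a minimal sketch of the construction step described above, the snippet below annotates passages with entity mentions, marks them with special tokens, and collects per-entity encodings into a key-value mention table. All names here ('[E]'/'[/E]' tokens, `encode_mention`, `build_mention_table`) are illustrative assumptions, and a hash-based stand-in replaces the Transformer encoder the paper envisions.

```python
import hashlib
from collections import defaultdict

def encode_mention(text: str, dim: int = 8) -> list[float]:
    """Stand-in encoder: in the actual system this would be a
    Transformer producing a dense contextual representation."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def build_mention_table(passages, entities):
    """Annotate each passage with known database entities and gather
    per-entity dense encodings into a key-value mention table."""
    mention_table = defaultdict(list)  # entity_id -> [(passage_id, vector)]
    for pid, passage in enumerate(passages):
        for entity_id, name in entities.items():
            if name in passage:
                # Highlight the mention with special tokens before encoding,
                # mirroring the annotation step of Section 4.1.
                marked = passage.replace(name, f"[E] {name} [/E]")
                mention_table[entity_id].append((pid, encode_mention(marked)))
    return mention_table

entities = {42: "Product X"}  # fact-table entity id -> surface name
passages = [
    "I bought Product X last month and the quality is excellent.",
    "Shipping was slow, unrelated to any product.",
]
table = build_mention_table(passages, entities)
# Entity 42 is mentioned only in the first passage.
```

In the real system the value vectors would come from a fine-tuned encoder and be stored as matrices for efficient batched similarity computation.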
4.2. Neural Query Execution

To extend the querying capabilities of traditional database systems across multiple data modalities, it becomes necessary to adapt and expand conventional database operators to effectively manage and process data from both structured and unstructured sources.

Within a neural database system, operators are categorized into two distinct types: a) single-modal operators, which are designed to process a single data modality (e.g., structured table data or unstructured text passages), such as scan operations, and b) multi-modal operators, which are capable of processing and associating inputs across multiple data modalities, such as join and aggregation operations that integrate information from both structured and unstructured sources. Thus, there is an emerging need to extend the traditional relational algebra used in database systems to describe the new neural operators for the different modalities.

In database systems, every operator in relational algebra takes a relation as input, and the output of the operator applied to the input relation is again a relation. In the case of the proposed query engine, we need to define the neural operators for the scan, filter, and projection of mention tables, as well as the join implementation between a database table and a mention table.
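The closure property just described (operators consume relations and produce relations) could be mirrored by a common operator interface in a neural engine. The sketch below is one possible shape, not the paper's implementation; the class and field names are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable

Row = Dict[str, Any]  # one tuple of a (database or mention) relation

class Operator(ABC):
    """Each operator yields a relation, so operators compose freely,
    preserving the closure property of relational algebra."""
    @abstractmethod
    def execute(self) -> Iterable[Row]:
        ...

class MentionScan(Operator):
    """Single-modal neural operator: scan a mention table and emit the
    stored (passage, vector) pairs for one entity."""
    def __init__(self, mention_table: Dict[int, list], entity_id: int):
        self.mention_table = mention_table
        self.entity_id = entity_id

    def execute(self) -> Iterable[Row]:
        for passage_id, vec in self.mention_table.get(self.entity_id, []):
            yield {"entity_id": self.entity_id,
                   "passage_id": passage_id,
                   "mention_vec": vec}

# A mention table keyed by entity id, as built in Section 4.1.
mention_table = {42: [(0, [0.1, 0.9])]}
rows = list(MentionScan(mention_table, 42).execute())
```

Because every operator emits rows, a downstream join or filter can consume `MentionScan`'s output exactly as it would a table scan's.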
Scan Mention Tables. Scan operations over mention tables enable efficient querying and processing of entity-associated text passages. These include entity retrieval scans, which extract passages linked to specific entities, like 'Product X' in the example query, and mention highlight scans, which identify all occurrences of targeted entities. Contextual similarity scans rank passages based on semantic relevance to a query vector, while entity-to-entity relationship scans reveal co-occurrences within text. Additional methods, such as aggregated entity statistics scans and temporal or categorical filters, allow for deeper insights by analyzing mention frequency and context diversity, or by filtering on specific attributes. Advanced operations, like neighborhood scans for exploring entity connections and multi-modal entity scans for linking to database tables, further enhance the querying capabilities of mention tables. These methods leverage the dense vector representations of entities to facilitate robust and flexible data exploration.
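Of the scans above, the contextual similarity scan is the most directly expressible over the dense vectors a mention table stores. A minimal sketch, assuming cosine similarity as the relevance measure (the paper does not fix a particular metric):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def contextual_similarity_scan(mention_rows, query_vec, k=2):
    """Rank entity-associated passages by semantic relevance to the
    query vector and return the top-k (passage_id, score) pairs."""
    scored = [(pid, cosine(vec, query_vec)) for pid, vec in mention_rows]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Toy 2-d mention vectors; passage 0 aligns exactly with the query.
rows = [(0, [1.0, 0.0]), (1, [0.6, 0.8]), (2, [0.0, 1.0])]
top = contextual_similarity_scan(rows, [1.0, 0.0], k=2)
```

At scale this linear scan would be replaced by an approximate nearest-neighbor index over the mention table's vectors.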
Join Mention & Database Tables. For join operations in the proposed query engine, two types of joins are possible: a) joins within mention tables and b) hybrid joins between database and mention tables.

Joins within mention tables enable the discovery of relationships between entities based on shared textual contexts or semantic relevance. These include entity co-occurrence joins, which retrieve passages where multiple entities are mentioned together; contextual similarity joins, which link entities based on the similarity of their dense vectors; and passage-level joins, which connect text passages that reference related entities, enabling richer narratives.

Joins between mention tables and database tables integrate structured and unstructured data to provide a unified query interface. Entity-ID joins link entities in mention tables to their corresponding database records, while property-based joins combine entities based on shared attributes, such as linking customer mentions with their structured profiles. In the example query, the first join of the execution plan is an Entity-ID join between the Product database table and the mention tables. Aggregated knowledge joins enrich database records with insights from text passages, and hybrid semantic joins bridge structured relationships in the database with semantic similarity in mention tables, enabling advanced querying across diverse data modalities.

While traditional database operators are well defined, the landscape of neural operators for execution is an active area of research. There are efforts from the database community to enhance traditional operators with neural models for faster query processing [15, 16, 13], while other approaches propose learned operators, e.g., learned scans and joins [17]. Further, the proposed query engine can utilize the reasoning capabilities of LLMs to provide reasoning or summaries over the results of the aforementioned operators, either on the final query results or at some intermediate step of query processing. In the provided example, the LLM is invoked to evaluate all review text passages per product.
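The Entity-ID join, the first join in the example plan, reduces to matching structured records against mention-table entries on the shared entity identifier. The following sketch assumes the toy mention-table layout used above (`entity_id -> [(passage_id, vector)]`); the function name and row shapes are illustrative.

```python
def entity_id_join(db_rows, mention_table):
    """Hybrid join: match structured records to mention-table entries
    sharing the same entity identifier, yielding combined rows that
    carry both the database attributes and the mention vector."""
    for row in db_rows:
        for passage_id, vec in mention_table.get(row["id"], []):
            yield {**row, "passage_id": passage_id, "mention_vec": vec}

products = [{"id": 42, "name": "Product X"},
            {"id": 7, "name": "Product Y"}]
# entity id -> [(passage_id, dense vector)], as built in Section 4.1
mention_table = {42: [(0, [0.1, 0.9]), (3, [0.2, 0.8])]}

joined = list(entity_id_join(products, mention_table))
# Product X joins with its two mentioning passages; Product Y has none.
```

The joined rows could then feed a downstream neural operator, e.g. an LLM that evaluates the sentiment of each mentioning passage.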
4.3. Query Optimization

The query optimizer for the proposed neural execution engine bridges the gap between structured and unstructured data processing, enabling efficient query execution across database tables and mention tables. By integrating traditional database strategies with neural processing, it ensures scalability and adaptability for hybrid, multi-modal queries.

A core characteristic of the optimizer is its cost-based approach, which evaluates potential query execution plans based on resource consumption, including computation time, cardinality estimates on both database and mention tables, memory usage, and I/O overhead. For neural operators, additional factors, such as the cost of vector similarity computations and embedding generation, are built into the cost model, ensuring an accurate evaluation of query plans.

Moreover, two very important aspects are query decomposition and cross-modal data flow. The optimizer decomposes complex queries into modality-specific sub-queries, ensuring efficient processing of structured and unstructured data by simultaneously selecting the most appropriate operators as well as their optimal order in the execution plan. In this context, the proposed execution plan should minimize redundant computations and intermediate results, optimizing data transfer between operators for efficient cross-modal data flow.
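To make the cost-based comparison concrete, here is a deliberately toy model, not the paper's actual cost model: every constant, the plan encoding, and the operator kinds are invented. It only illustrates why operator ordering matters when neural operators pay an extra per-row, per-dimension vector cost.

```python
def plan_cost(ops, io_cost=1e-3, vec_cost=1e-4, dim=128):
    """Toy cost model: each operator is charged I/O per input row;
    neural operators additionally pay for vector computations
    proportional to the embedding dimension."""
    total = 0.0
    for kind, rows in ops:
        total += rows * io_cost
        if kind == "neural":
            total += rows * dim * vec_cost
    return total

# Plan A: join all 10,000 sales rows first, then run the neural
# similarity operator on every joined row.
plan_a = [("relational", 10_000), ("neural", 10_000)]
# Plan B: run the selective mention scan first (500 relevant
# passages), then join only the qualifying rows.
plan_b = [("neural", 500), ("relational", 500)]

best = min([plan_a, plan_b], key=plan_cost)  # the optimizer picks plan B
```

A real optimizer would derive the row counts from cardinality estimates over both database and mention tables, and calibrate the constants per hardware, but the ordering effect shown here is the same.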
Finally, the optimizer is designed to adapt to dynamic query workloads and evolving data characteristics by supporting runtime re-optimization. It monitors operator performance during execution and adjusts plans as needed. Furthermore, it integrates pre-trained or fine-tuned neural models for unstructured data processing, ensuring their effective and efficient use in query execution.

5. Conclusions and Future Work

This paper presents a prototype query engine designed to execute queries across multiple data modalities efficiently and at scale. A central proposition of this work is the initial association of key entities from the database with their corresponding references within the text corpus. The system then constructs mention tables, key-value tables containing dense vector representations of entities and their related textual passages. The results of this approach have the potential to inspire new algorithms that link structured database information with external unstructured data sources.

Furthermore, this novel tabular representation of unstructured text enables the development of specialized operators that the query engine must support. The design and implementation of these operators establish the foundational components of the envisioned query engine. Additionally, the integration of these operators calls for the development of a new generation of query optimizers capable of generating efficient execution plans across both structured and unstructured data modalities. This research lays the foundations for a novel approach to querying multi-modal data and opens new avenues for future exploration in hybrid query optimization and execution strategies.

Acknowledgments

This work has been partially supported by DataGEMS, funded by the European Union's Horizon Europe Research and Innovation programme under grant agreement No 101188416, and by project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0, funded by the European Union under the NextGenerationEU Program.

References

[1] A. Biswal, L. Patel, S. Jha, A. Kamsetty, S. Liu, J. E. Gonzalez, C. Guestrin, M. Zaharia, Text2SQL is not enough: Unifying AI and databases with TAG, arXiv preprint arXiv:2408.14717 (2024).
[2] M. Saeed, N. De Cao, P. Papotti, Querying large language models with SQL, arXiv preprint arXiv:2304.00472 (2023).
[3] L. Patel, S. Jha, C. Guestrin, M. Zaharia, LOTUS: Enabling semantic queries with LLMs over tables of unstructured and structured data, arXiv preprint arXiv:2407.11418 (2024).
[4] Y. Zemlyanskiy, J. Ainslie, M. de Jong, P. Pham, I. Eckstein, F. Sha, ReadTwice: Reading very large documents with memories, in: Proceedings of NAACL, 2021.
[5] B. AlKhamissi, M. Li, A. Celikyilmaz, M. Diab, M. Ghazvininejad, A review on language models as knowledge bases, 2022. URL: https://arxiv.org/abs/2204.06031. arXiv:2204.06031.
[6] M. de Jong, Y. Zemlyanskiy, N. A. FitzGerald, F. Sha, W. W. Cohen, Mention memory: incorporating textual knowledge into transformers through entity mention attention, in: 10th International Conference on Learning Representations, ICLR 2022, 2022.
[7] S. Sukhbaatar, J. Weston, R. Fergus, et al., End-to-end memory networks, Advances in Neural Information Processing Systems 28 (2015).
[8] Z. Zhong, T. Lei, D. Chen, Training language models with memory augmentation, in: Proceedings of the 2022 Conference on EMNLP, Association for Computational Linguistics, 2022, pp. 5657–5673. URL: https://aclanthology.org/2022.emnlp-main.382/. doi:10.18653/v1/2022.emnlp-main.382.
[9] L. Patel, P. Kraft, C. Guestrin, M. Zaharia, ACORN: Performant and predicate-agnostic search over vector embeddings and structured data, Proceedings of the ACM on Management of Data 2 (2024) 1–27.
[10] A. Dargahi Nobari, D. Rafiei, DTT: An example-driven tabular transformer for joinability by leveraging large language models, Proceedings of the ACM on Management of Data (SIGMOD) 2 (2024). URL: https://doi.org/10.1145/3639279. doi:10.1145/3639279.
[11] G. Badaro, M. Saeed, P. Papotti, Transformers for tabular data representation: A survey of models and applications, Transactions of the Association for Computational Linguistics 11 (2023) 227–249. URL: https://aclanthology.org/2023.tacl-1.14/. doi:10.1162/tacl_a_00544.
[12] M. J. Cafarella, C. Re, D. Suciu, O. Etzioni, M. Banko, Structured querying of web text, in: 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, 2007.
[13] C. Liu, M. Russo, M. Cafarella, L. Cao, P. B. Chen, Z. Chen, M. Franklin, T. Kraska, S. Madden, G. Vitagliano, A declarative system for optimizing AI workloads, arXiv preprint arXiv:2405.14696 (2024).
[14] Y. Lin, M. Hulsebos, R. Ma, S. Shankar, S. Zeigham, A. G. Parameswaran, E. Wu, Towards accurate and efficient document analytics with large language models, 2024. URL: https://arxiv.org/abs/2405.04674. arXiv:2405.04674.
[15] A. Kristo, K. Vaidya, U. Çetintemel, S. Misra, T. Kraska, The case for a learned sorting algorithm, in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1001–1016. URL: https://doi.org/10.1145/3318464.3389752. doi:10.1145/3318464.3389752.
[16] I. Sabek, T. Kraska, The case for learned in-memory joins, Proc. VLDB Endow. 16 (2023) 1749–1762. URL: https://doi.org/10.14778/3587136.3587148. doi:10.14778/3587136.3587148.
[17] M. Urban, C. Binnig, ELEET: Efficient learned query execution over text and tables, Proc. VLDB Endow. 17 (2024) 4867–4880.
[18] A. Miller, A. Fisch, J. Dodge, A.-H. Karimi, A. Bordes, J. Weston, Key-value memory networks for directly reading documents, in: Proceedings of the 2016 Conference on EMNLP, Association for Computational Linguistics, Austin, Texas, 2016. URL: https://aclanthology.org/D16-1147. doi:10.18653/v1/D16-1147.