=Paper=
{{Paper
|id=Vol-3946/PhDW_paper3
|storemode=property
|title=Towards a Neural Database Execution Engine
|pdfUrl=https://ceur-ws.org/Vol-3946/PhD-Workshop-3.pdf
|volume=Vol-3946
|authors=Christos Tsapelas
|dblpUrl=https://dblp.org/rec/conf/edbt/Tsapelas25
}}
==Towards a Neural Database Execution Engine==
Christos Tsapelas (1,2)
Supervised by Georgia Koutrika (2)
(1) Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
(2) Archimedes, Athena Research Center, Greece
Abstract
Recent advances in natural language understanding have heightened interest in AI systems capable of answering queries across multiple data modalities, such as structured database tables and unstructured text. Current approaches typically rely on Large Language Models (LLMs) to facilitate queries between these modalities, which incurs substantial computational costs and often yields suboptimal performance. To this end, this research introduces a novel query execution engine designed to bridge diverse data modalities, combining the high-efficiency querying capabilities of database systems with the advanced reasoning capacities of LLMs. This paper presents a prototype architecture for such a multi-modal database system, detailing its core components and their functionalities to demonstrate how it can achieve effective, scalable query processing across structured and unstructured data.
Keywords
database systems, large language models, memory networks, hybrid query execution, virtual knowledge bases
Published in the Proceedings of the Workshops of the EDBT/ICDT 2025 Joint Conference (March 25-28, 2025), Barcelona, Spain. © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

The evolution of modern data warehouses has introduced unprecedented challenges and opportunities, with data volumes now encompassing multiple modalities, such as structured table data, unstructured text, and images. Each data type possesses a unique structure, necessitating tailored querying methods that effectively harness the properties of each modality. However, despite these advances, existing systems struggle to generalize queries across multiple modalities, presenting a key limitation in addressing the needs of diverse, cross-modal data integration tasks.

Database management systems (DBMSs) excel at performing rapid, efficient, and precise queries on extremely large data volumes at scale. However, their primary focus remains on exact computation at scale, with limited reasoning capabilities [1]. In contrast, Large Language Models (LLMs) excel at processing natural language across massive textual corpora, offering logical reasoning over unstructured data thanks to their ability to embed knowledge within their weights [2]. This distinction highlights a crucial gap between traditional DBMS architectures and the reasoning and flexibility that LLMs bring to unstructured data processing.

Numerous contemporary applications demand complex queries that integrate information across multiple modalities [3]. The current dominant approaches for multi-modal data integration include retrieval-augmented generation (RAG), similarity-based search, and Text2SQL. These techniques, however, exhibit limitations in both the diversity of query types they can accommodate and their query execution performance. Text2SQL methods, for instance, are effective for natural language queries that have a direct SQL equivalent, whereas RAG systems are constrained to point lookups involving only a limited number of records, requiring an LLM to execute the join operation.

To this end, my doctoral research seeks to bridge the gap between LLM-based approaches and traditional database systems to enable efficient, flexible hybrid search queries. The objective is to develop a prototype neural query execution engine equipped with novel algorithms for efficient data access and join operations, leveraging the strengths of both learned models and traditional database methods for rapid and precise query execution across multiple data modalities. The proposed neural execution engine seeks to empower database systems with the flexibility to handle diverse data modalities and complex query types, addressing a critical need in the field of data management and paving the way for next-generation data retrieval solutions.

2. Related Work

Recent tasks in natural language understanding, such as question answering, require both retrieval and reasoning. For such knowledge-intensive tasks, a system must assimilate information from different sections of large and diverse inputs, such as books and article collections [4]. To this end, the notions of Virtual Knowledge Bases (VKBs) [5, 6, 4] and Memory Networks [7, 8] have been proposed, in which entity mentions in text are transformed into dense representations that capture properties or relations expressed in text passages.

Moreover, the advanced reasoning capabilities of LLMs used with RAG for question answering have given rise to a new area of research, in which a system takes as input both structured and unstructured data and reasons over the different modalities [1, 9, 3, 10, 11, 12, 13, 14], or uses an LLM as a query engine for posing SQL queries [2].

Simultaneously, the database community has introduced an innovative research direction involving learning-based techniques to enhance query execution. Advances such as learned sorting [15] and learned scan and join algorithms [16, 17] have demonstrated highly promising results in optimizing traditional query processing. These methods indicate that machine learning techniques can significantly improve fundamental database operations, reinforcing the potential for a hybrid approach that integrates both database and LLM methodologies.

3. Research Questions

The main purpose of this research proposal is the development of a prototype execution engine able to execute queries combining different data modalities. The implementation of such a novel system raises a set of research questions:
RQ1 How can structured database tables and unstructured data sources, like text documents, be effectively associated to form a unified querying framework?

RQ2 How can dense vector representations be constructed to preserve both the semantic richness of text and the structural integrity of database entities?

RQ3 What new operators are required, and how can database operators be adapted to process queries involving structured and unstructured data?

RQ4 How can a cost-based query optimizer be designed to generate efficient execution plans for hybrid queries involving structured and unstructured data?

RQ5 What techniques can be employed to ensure the query engine scales efficiently for large datasets across multiple modalities?

3.1. Research Opportunities

Building upon the related work, the proposed query engine represents a significant advancement in addressing the previously outlined research questions.

Virtual Knowledge Bases (VKBs) generate dense representations of real-world entities, such as those found within Wikipedia, to enable querying. However, these representations have not been applied within the context of database systems. A key component of this research involves establishing connections between database entities, as defined by the data model of each database, and external text corpora or additional modalities, such as images.

Furthermore, the current state-of-the-art approach for integrating multiple data modalities relies on Multi-Modal Large Language Models (MLLMs). This methodology typically employs large-scale LLMs to process queries, a strategy that is computationally expensive and constrained by the input size limitations inherent to LLMs.

The primary objective of this research is to enable efficient execution of multi-modal queries capable of managing large-scale data in a manner aligned with traditional database systems. To achieve this, the research will extend conventional database operators, such as scans and joins, by developing novel implementation algorithms designed to process diverse data modalities.

4. A Prototype of a Neural Database Engine

In this section, an overview of the proposed neural database execution engine is provided. Consider the example query: "Find all customers who purchased 'Product X' and had a positive experience regarding the quality of the product, within the past six months." This query is transformed into a multi-modal SQL query and sent to the engine for execution. The example query will help describe several aspects of the proposed system, along with the execution flow of a hybrid query between database tables and a text corpus.

Figure 1: Architecture of the Neural Execution Engine. The system accepts a multi-modal SQL query and fetches the related database and mention tables. Next, the optimizer generates a hybrid execution plan with: database scans (rectangles), mention table scans (circles), a database join (triangle), and a database-mention table join (trapezoid).

In Figure 1, we present the architecture of our query engine. The example query is parsed, and the system fetches the Product and Sales database tables and their related mention tables. Then, the optimizer is invoked to generate an optimal execution plan for the given query. The optimizer faces many challenges, like selecting the appropriate scan and join operators within and across modalities, while predicting their optimal order in the execution plan. In Figure 1, the different physical operators are depicted separately to make the different processing steps clear. Finally, the generated execution plan is submitted to the neural engine for execution.

Before the query engine can execute queries across both data modalities, a preparatory phase, referred to as mention tables construction, is required. In the provided example, mentions of 'Product X' must be recognized within text passages. In this phase, the system generates a series of key-value (KV) tables that bridge the information within database tables and text documents stored in blob storage. The objective of these KV tables is to create dense vector representations of entities (keys) that encapsulate the knowledge embedded in the text corpus (values). These representations are structured to seamlessly integrate with a Transformer model, enabling efficient and effective processing by the query engine in subsequent stages.

Upon initializing these learned tables, the execution engine is ready to process queries. When a query is posed, the system parses it and generates an optimal execution plan. The plan selection process resembles that of traditional database systems, wherein the optimizer explores the space of possible execution plans and evaluates candidate plans based on a cost model.

Given the hybrid nature of the proposed query engine, which supports queries across multiple data modalities, it is necessary to define new hybrid operators, such as scans, projections, and joins, capable of handling data from both structured and unstructured sources. These operators are designed to facilitate seamless integration and processing of data across the diverse modalities covered by the system.

In the next subsections, the main components of the proposed query engine are described. First, the construction of mention tables is described, a methodology to associate table data with the available text corpus. Next, the core of the execution engine is detailed, focusing on the required operators and the query optimizer of the system.

4.1. Mention Tables

As previously noted, it is essential to establish associations between data from database tables and the available text documents, defining the specific types of information to be retrieved and assimilated across these data sources [4]. The database schema provides a structured representation of the entities within the database, with clearly defined properties for each entity type.
Thus, an initial processing step is proposed between the different data sources, where each passage in the text documents is annotated with the main entities (fact tables) from the database, and entity mentions in the passage are highlighted with special tokens. Figure 2 shows the construction of mention tables. Representations of these tokens are later used to generate entity encodings.

Figure 2: Mention tables represent entities inside database tables as vector representations.

The goal of mention tables is to gather these database entity encodings into matrices, constructing key-value stores that contain the dense vector representations for each entity in the text documents, forming a virtual knowledge base of the available text documents, as in [7, 4, 18, 8].
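As a minimal sketch of the construction step described above, the snippet below annotates passages with entity mentions, marks them with special tokens, and collects per-entity encodings into a key-value mention table. All names here ('[E]'/'[/E]' tokens, `encode_mention`, `build_mention_table`) are illustrative assumptions, and a hash-based stand-in replaces the Transformer encoder the paper envisions.

```python
import hashlib
from collections import defaultdict

def encode_mention(text: str, dim: int = 8) -> list[float]:
    """Stand-in encoder: in the actual system this would be a
    Transformer producing a dense contextual representation."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def build_mention_table(passages, entities):
    """Annotate each passage with known database entities and gather
    per-entity dense encodings into a key-value mention table."""
    mention_table = defaultdict(list)  # entity_id -> [(passage_id, vector)]
    for pid, passage in enumerate(passages):
        for entity_id, name in entities.items():
            if name in passage:
                # Highlight the mention with special tokens before encoding,
                # mirroring the annotation step of Section 4.1.
                marked = passage.replace(name, f"[E] {name} [/E]")
                mention_table[entity_id].append((pid, encode_mention(marked)))
    return mention_table

entities = {42: "Product X"}  # fact-table entity id -> surface name
passages = [
    "I bought Product X last month and the quality is excellent.",
    "Shipping was slow, unrelated to any product.",
]
table = build_mention_table(passages, entities)
# Entity 42 is mentioned only in the first passage.
```

In the real system the value vectors would come from a fine-tuned encoder and be stored as matrices for efficient batched similarity computation.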
4.2. Neural Query Execution

To extend the querying capabilities of traditional database systems across multiple data modalities, it becomes necessary to adapt and expand conventional database operators to effectively manage and process data from both structured and unstructured sources.

Within a neural database system, operators are categorized into two distinct types: a) single-modal operators, which are designed to process a single data modality (e.g., structured table data or unstructured text passages), such as scan operations, and b) multi-modal operators, which are capable of processing and associating inputs across multiple data modalities, such as join and aggregation operations that integrate information from both structured and unstructured sources. Thus, there is an emerging need to extend the traditional relational algebra used in database systems to describe the new neural operators for the different modalities.

In database systems, every operator in relational algebra takes a relation as input, and the output of the operator applied to the input relation is again a relation. In the case of the proposed query engine, we need to define the neural operators for the scan, filter, and projection of mention tables, as well as the join implementation between a database table and a mention table.
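The closure property just described (operators consume relations and produce relations) could be mirrored by a common operator interface in a neural engine. The sketch below is one possible shape, not the paper's implementation; the class and field names are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable

Row = Dict[str, Any]  # one tuple of a (database or mention) relation

class Operator(ABC):
    """Each operator yields a relation, so operators compose freely,
    preserving the closure property of relational algebra."""
    @abstractmethod
    def execute(self) -> Iterable[Row]:
        ...

class MentionScan(Operator):
    """Single-modal neural operator: scan a mention table and emit the
    stored (passage, vector) pairs for one entity."""
    def __init__(self, mention_table: Dict[int, list], entity_id: int):
        self.mention_table = mention_table
        self.entity_id = entity_id

    def execute(self) -> Iterable[Row]:
        for passage_id, vec in self.mention_table.get(self.entity_id, []):
            yield {"entity_id": self.entity_id,
                   "passage_id": passage_id,
                   "mention_vec": vec}

# A mention table keyed by entity id, as built in Section 4.1.
mention_table = {42: [(0, [0.1, 0.9])]}
rows = list(MentionScan(mention_table, 42).execute())
```

Because every operator emits rows, a downstream join or filter can consume `MentionScan`'s output exactly as it would a table scan's.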
Scan Mention Tables. Scan operations over mention tables enable efficient querying and processing of entity-associated text passages. These include entity retrieval scans, which extract passages linked to specific entities, like 'Product X' in the example query, and mention highlight scans, which identify all occurrences of targeted entities. Contextual similarity scans rank passages based on semantic relevance to a query vector, while entity-to-entity relationship scans reveal co-occurrences within text. Additional methods, such as aggregated entity statistics scans and temporal or categorical filters, allow for deeper insights by analyzing mention frequency and context diversity, or by filtering on specific attributes. Advanced operations, like neighborhood scans for exploring entity connections and multi-modal entity scans for linking to database tables, further enhance the querying capabilities of mention tables. These methods leverage the dense vector representations of entities to facilitate robust and flexible data exploration.
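Of the scans above, the contextual similarity scan is the most directly expressible over the dense vectors a mention table stores. A minimal sketch, assuming cosine similarity as the relevance measure (the paper does not fix a particular metric):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def contextual_similarity_scan(mention_rows, query_vec, k=2):
    """Rank entity-associated passages by semantic relevance to the
    query vector and return the top-k (passage_id, score) pairs."""
    scored = [(pid, cosine(vec, query_vec)) for pid, vec in mention_rows]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Toy 2-d mention vectors; passage 0 aligns exactly with the query.
rows = [(0, [1.0, 0.0]), (1, [0.6, 0.8]), (2, [0.0, 1.0])]
top = contextual_similarity_scan(rows, [1.0, 0.0], k=2)
```

At scale this linear scan would be replaced by an approximate nearest-neighbor index over the mention table's vectors.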
Join Mention & Database Tables. For join operations in the proposed query engine, two types of joins are possible: a) joins within mention tables and b) hybrid joins between database and mention tables.

Joins within mention tables enable the discovery of relationships between entities based on shared textual contexts or semantic relevance. These include entity co-occurrence joins, which retrieve passages where multiple entities are mentioned together; contextual similarity joins, which link entities based on the similarity of their dense vectors; and passage-level joins, which connect text passages that reference related entities, enabling richer narratives.

Joins between mention tables and database tables integrate structured and unstructured data to provide a unified query interface. Entity-ID joins link entities in mention tables to their corresponding database records, while property-based joins combine entities based on shared attributes, such as linking customer mentions with their structured profiles. In the example query, the first join of the execution plan is an Entity-ID join between the Product database table and the mention tables. Aggregated knowledge joins enrich database records with insights from text passages, and hybrid semantic joins bridge structured relationships in the database with semantic similarity in mention tables, enabling advanced querying across diverse data modalities.

While traditional database operators are well defined, the landscape of neural operators for execution is an active area of research. There are efforts from the database community to enhance traditional operators with neural models for faster query processing [15, 16, 13], while other approaches propose learned operators, e.g., learned scans and joins [17]. Further, the proposed query engine can utilize the reasoning capabilities of LLMs to provide reasoning or summaries over the results of the aforementioned operators, either on the final query results or at some intermediate step of query processing. In the provided example, the LLM is invoked to evaluate all review text passages per product.
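The Entity-ID join, the first join in the example plan, reduces to matching structured records against mention-table entries on the shared entity identifier. The following sketch assumes the toy mention-table layout used above (`entity_id -> [(passage_id, vector)]`); the function name and row shapes are illustrative.

```python
def entity_id_join(db_rows, mention_table):
    """Hybrid join: match structured records to mention-table entries
    sharing the same entity identifier, yielding combined rows that
    carry both the database attributes and the mention vector."""
    for row in db_rows:
        for passage_id, vec in mention_table.get(row["id"], []):
            yield {**row, "passage_id": passage_id, "mention_vec": vec}

products = [{"id": 42, "name": "Product X"},
            {"id": 7, "name": "Product Y"}]
# entity id -> [(passage_id, dense vector)], as built in Section 4.1
mention_table = {42: [(0, [0.1, 0.9]), (3, [0.2, 0.8])]}

joined = list(entity_id_join(products, mention_table))
# Product X joins with its two mentioning passages; Product Y has none.
```

The joined rows could then feed a downstream neural operator, e.g. an LLM that evaluates the sentiment of each mentioning passage.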
4.3. Query Optimization

The query optimizer for the proposed neural execution engine bridges the gap between structured and unstructured data processing, enabling efficient query execution across database tables and mention tables. By integrating traditional database strategies with neural processing, it ensures scalability and adaptability for hybrid, multi-modal queries.

A core characteristic of the optimizer is its cost-based approach, which evaluates potential query execution plans based on resource consumption, including computation time, cardinality estimates on both database and mention tables, memory usage, and I/O overhead. For neural operators, additional factors, such as the cost of vector similarity computations and embedding generation, are built into the cost model, ensuring an accurate evaluation of query plans.

Moreover, two very important aspects are query decomposition and cross-modal data flow. The optimizer decomposes complex queries into modality-specific sub-queries, ensuring efficient processing of structured and unstructured data by simultaneously selecting the most appropriate operators as well as their optimal order in the execution plan. In this context, the proposed execution plan should minimize redundant computations and intermediate results, optimizing data transfer between operators for efficient cross-modal data flow.
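To make the cost-based comparison concrete, here is a deliberately toy model, not the paper's actual cost model: every constant, the plan encoding, and the operator kinds are invented. It only illustrates why operator ordering matters when neural operators pay an extra per-row, per-dimension vector cost.

```python
def plan_cost(ops, io_cost=1e-3, vec_cost=1e-4, dim=128):
    """Toy cost model: each operator is charged I/O per input row;
    neural operators additionally pay for vector computations
    proportional to the embedding dimension."""
    total = 0.0
    for kind, rows in ops:
        total += rows * io_cost
        if kind == "neural":
            total += rows * dim * vec_cost
    return total

# Plan A: join all 10,000 sales rows first, then run the neural
# similarity operator on every joined row.
plan_a = [("relational", 10_000), ("neural", 10_000)]
# Plan B: run the selective mention scan first (500 relevant
# passages), then join only the qualifying rows.
plan_b = [("neural", 500), ("relational", 500)]

best = min([plan_a, plan_b], key=plan_cost)  # the optimizer picks plan B
```

A real optimizer would derive the row counts from cardinality estimates over both database and mention tables, and calibrate the constants per hardware, but the ordering effect shown here is the same.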
Finally, the optimizer is designed to adapt to dynamic query workloads and evolving data characteristics by supporting runtime re-optimization. It monitors operator performance during execution and adjusts plans as needed. Furthermore, it integrates pre-trained or fine-tuned neural models for unstructured data processing, ensuring their effective and efficient use in query execution.

5. Conclusions and Future Work

This paper presents a prototype query engine designed to execute queries across multiple data modalities efficiently and at scale. A central proposition of this work is the initial association of key entities from the database with their corresponding references within the text corpus. The system then constructs mention tables, key-value tables containing dense vector representations of entities and their related textual passages. The results of this approach have the potential to inspire new algorithms that link structured database information with external unstructured data sources.

Furthermore, this novel tabular representation of unstructured text enables the development of specialized operators that the query engine must support. The design and implementation of these operators establish the foundational components of the envisioned query engine. Additionally, the integration of these operators calls for the development of a new generation of query optimizers capable of generating efficient execution plans across both structured and unstructured data modalities. This research lays the foundations for a novel approach to querying multi-modal data and opens new avenues for future exploration in hybrid query optimization and execution strategies.

Acknowledgments

This work has been partially supported by DataGEMS, funded by the European Union's Horizon Europe Research and Innovation programme under grant agreement No 101188416, and by project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0, funded by the European Union under the NextGenerationEU Program.

References

[1] A. Biswal, L. Patel, S. Jha, A. Kamsetty, S. Liu, J. E. Gonzalez, C. Guestrin, M. Zaharia, Text2SQL is not enough: Unifying AI and databases with TAG, arXiv preprint arXiv:2408.14717 (2024).
[2] M. Saeed, N. De Cao, P. Papotti, Querying large language models with SQL, arXiv preprint arXiv:2304.00472 (2023).
[3] L. Patel, S. Jha, C. Guestrin, M. Zaharia, LOTUS: Enabling semantic queries with LLMs over tables of unstructured and structured data, arXiv preprint arXiv:2407.11418 (2024).
[4] Y. Zemlyanskiy, J. Ainslie, M. de Jong, P. Pham, I. Eckstein, F. Sha, ReadTwice: Reading very large documents with memories, in: Proceedings of NAACL, 2021.
[5] B. AlKhamissi, M. Li, A. Celikyilmaz, M. Diab, M. Ghazvininejad, A review on language models as knowledge bases, 2022. URL: https://arxiv.org/abs/2204.06031. arXiv:2204.06031.
[6] M. de Jong, Y. Zemlyanskiy, N. A. FitzGerald, F. Sha, W. W. Cohen, Mention memory: incorporating textual knowledge into transformers through entity mention attention, in: 10th International Conference on Learning Representations, ICLR 2022, 2022.
[7] S. Sukhbaatar, J. Weston, R. Fergus, et al., End-to-end memory networks, Advances in Neural Information Processing Systems 28 (2015).
[8] Z. Zhong, T. Lei, D. Chen, Training language models with memory augmentation, in: Proceedings of the 2022 Conference on EMNLP, Association for Computational Linguistics, 2022, pp. 5657–5673. URL: https://aclanthology.org/2022.emnlp-main.382/. doi:10.18653/v1/2022.emnlp-main.382.
[9] L. Patel, P. Kraft, C. Guestrin, M. Zaharia, ACORN: Performant and predicate-agnostic search over vector embeddings and structured data, Proceedings of the ACM on Management of Data 2 (2024) 1–27.
[10] A. Dargahi Nobari, D. Rafiei, DTT: An example-driven tabular transformer for joinability by leveraging large language models, Proceedings of the ACM on Management of Data (SIGMOD) 2 (2024). URL: https://doi.org/10.1145/3639279. doi:10.1145/3639279.
[11] G. Badaro, M. Saeed, P. Papotti, Transformers for tabular data representation: A survey of models and applications, Transactions of the Association for Computational Linguistics 11 (2023) 227–249. URL: https://aclanthology.org/2023.tacl-1.14/. doi:10.1162/tacl_a_00544.
[12] M. J. Cafarella, C. Re, D. Suciu, O. Etzioni, M. Banko, Structured querying of web text, in: 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, 2007.
[13] C. Liu, M. Russo, M. Cafarella, L. Cao, P. B. Chen, Z. Chen, M. Franklin, T. Kraska, S. Madden, G. Vitagliano, A declarative system for optimizing AI workloads, arXiv preprint arXiv:2405.14696 (2024).
[14] Y. Lin, M. Hulsebos, R. Ma, S. Shankar, S. Zeigham, A. G. Parameswaran, E. Wu, Towards accurate and efficient document analytics with large language models, 2024. URL: https://arxiv.org/abs/2405.04674. arXiv:2405.04674.
[15] A. Kristo, K. Vaidya, U. Çetintemel, S. Misra, T. Kraska, The case for a learned sorting algorithm, in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1001–1016. URL: https://doi.org/10.1145/3318464.3389752. doi:10.1145/3318464.3389752.
[16] I. Sabek, T. Kraska, The case for learned in-memory joins, Proc. VLDB Endow. 16 (2023) 1749–1762. URL: https://doi.org/10.14778/3587136.3587148. doi:10.14778/3587136.3587148.
[17] M. Urban, C. Binnig, ELEET: Efficient learned query execution over text and tables, Proc. VLDB Endow. 17 (2024) 4867–4880.
[18] A. Miller, A. Fisch, J. Dodge, A.-H. Karimi, A. Bordes, J. Weston, Key-value memory networks for directly reading documents, in: Proceedings of the 2016 Conference on EMNLP, Association for Computational Linguistics, Austin, Texas, 2016. URL: https://aclanthology.org/D16-1147. doi:10.18653/v1/D16-1147.