<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ORB: Empowering Graph Queries through Inference</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sonia Horchidan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paris Carbone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>KTH Royal Institute of Technology</institution>
          ,
          <addr-line>Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Executing queries on incomplete, sparse knowledge graphs yields incomplete results, especially when it comes to queries involving traversals. In this paper, we question the applicability of all known architectures for incomplete knowledge bases and propose ORB: a clear departure from existing system designs, relying on Machine Learning-based operators to provide inferred query results. At the same time, ORB addresses peculiarities inherent to knowledge graphs, such as schema evolution, dynamism, scalability, as well as high query complexity via the use of embedding-driven inference. Through ORB, we stress that approximating complex processing tasks is not only desirable but also imperative for knowledge graphs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>QC
QC</p>
      <sec id="sec-1-1">
        <title>DSRelationaDlI</title>
        <p>Databases
S321 E
DD
QC
DD
QC</p>
      </sec>
      <sec id="sec-1-2">
        <title>DS Graph DI</title>
        <p>Databases</p>
        <p>S321 E</p>
        <p>DD</p>
        <p>DD
QC</p>
        <p>DS DI
Graph Processing</p>
        <p>Systems</p>
        <p>S321 E</p>
        <p>DD</p>
        <p>QC</p>
        <p>DS DI
Graph Streaming</p>
        <p>Frameworks
3SE
2
1</p>
        <p>DD
MaDcShiGneraLpeharDnIing DS ORBDI
Figure 1: Systems coverage for
the key challenges of KGs.</p>
        <p>SC SL Definition</p>
        <p>1 The schema cannot change.</p>
        <p>SE 2 Requires additional operations to support schema changes.</p>
        <p>3 The schema can change with no additional operations.</p>
        <p>1 Can only accommodate new entries via full reconstruction.</p>
        <p>DD 2 Ingests new data entries in batches.</p>
        <p>3 Ingests new data on-the-fly.</p>
        <p>1 Cannot infer new knowledge.</p>
        <p>DI 2 Manual inference logic possible using a query language.</p>
        <p>3 Can natively infer new knowledge.</p>
        <p>1 Can scale to non-skewed graphs.</p>
        <p>DS 2 Can scale to skewed graphs using user-defined partitioning.</p>
        <p>3 Can scale to any graph size and skew automatically.</p>
        <p>1 Can eficiently process local queries.</p>
        <p>QC 2 Can eficiently compute global iterative queries.</p>
        <p>3 Can eficiently compute nested queries.
on KGs; employing approximation is not a design option but a necessity in overcoming a series
of critical shortcomings in graph reasoning today.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Key Challenges</title>
      <p>
        To elaborate further on the complex dimensions surrounding KGs and inspired by the work of
Lissandrini et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we compose a framework of properties (depicted in Table 1) that reflect
core needs in modern query-serving systems.
      </p>
      <p>
        Schema evolution (SE) measures the adaptability of a system to structural changes in the data.
Knowledge graphs lack data schematization, allowing data of any type to be ingested ad
infinitum. On the other hand, data dynamism (DD) defines the system’s capability to enable evolution
in terms of additions or updates that comply with the current schema. Data incompleteness (DI)
involves answering questions given partial knowledge, where, naturally, the outcome is also
bound to be incomplete. This problem is especially daunting for traversal-based computations,
where the missing links strip away potential answers with each intermediate hop. Dealing with
incompleteness implies the need to find novel methods for deriving new conclusions based on
stored historical information. Data scalability (DS) measures the ability to scale to any graph
size and skew. Even though it has been thoroughly researched for decades, scaling out to
massive, power-law graphs that are at the same time, dynamic is still an open challenge [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Choosing an unsuitable partitioning algorithm can lead to increased query latency, especially
for data traversals that combine information across partitions. Finally, query complexity (QC)
refers to algorithmic computational complexity. For this framework, we define three diferent
support levels as follows: (1) local queries that consist of point look-ups or local traversals,
(2) global fixed-point iterative queries that compute a graph property in polynomial time (e.g.,
PageRank), and (3) nested queries that refer to intractable problems (such as counting motifs
and multi-hop reasoning). The latter currently lacks efective optimization support in most
modern systems, despite their critical need in the graph community [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>ORB targets the data incompleteness challenge but also pursues the other dimensions through
the lens of approximations via inference-based computations.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Overview of Existing Systems</title>
      <p>
        We include a short overview of the state-of-the-art graph data management systems and discuss
their coverage with respect to the core challenges identified in the previous section (Figure 1).
Relational Databases. The relational model requires manual interventions in the shape of
expensive table alterations when ingesting out-of-schema data points. Hence, relational DBs
ofer partial SE support in our taxonomy. On the other hand, the relational model allows new
data points to be inserted into the tables on-the-fly, thus fully supporting DD. As of today, no
RDBMS can infer new conclusions automatically, but the capability to write custom inference
logic in SQL is possible. For this reason, RDBMSes ofer limited support only for DI. Next, a
RDBMS can natively scale out but does not ofer distribution-aware data partitioning, meaning
that relational DBs ofer no support for DS. Lastly, worst-case optimal join algorithms for
subgraph queries on relational underlying data representation have shown potential in ofering
limited support for QC in our taxonomy [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ].
      </p>
      <p>
        Graph Databases. Graph DBs (e.g., Neo4j [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) ofer better support for SE than their relational
counterpart through the adoption of the property graph data model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The model is flexible
enough to fully support DD. Regarding DI, the inference capability is partially ensured through
the graph-tailored query language. Most native graph DBs currently do not ofer automatic data
sharding or do so by relying on naive hash partitioning, with the exception of Neo4j, which
supports manual data partitioning only. For this reason, graph DBs ofer only limited support for
DS. Lastly, when it comes to QC, some specialized tools can eficiently compute global iterative
queries (e.g., Neo4j’s GDS library). However, nested graph problems have been largely
overlooked due to their computational unfeasibility [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Thus, graph DBs ofer limited support for QC.
Graph Processing Systems. Graph Processing Systems (GPSes) such as Pregel [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
GraphLab [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], are data-intensive graph system libraries built on MapReduce-based frameworks.
Their focus is computation at scale on stored, static data, which requires the schema to be
known at compile time. Therefore, they are incapable of dealing with SE. In some cases, updates
can occur in batches, making GPSes partially capable of DD. Next, GPSes can be programmed
using simple, limited APIs such as the vertex-centric BSP model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Using such APIs, the user
can compose graph-global measurements that activate per existing vertex or neighborhood.
However, this level of simplicity disallows computation on non-existing inferred elements of
a graph; therefore, no support for DI is feasible. On the other hand, GPSes are purposefully
designed to scale out to massive, skewed graphs, qualifying them for full support level for
DS [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Lastly, they can handle global iterative queries but cannot eficiently compute nested
queries, thus having limited support for QC in our framework.
      </p>
      <p>Graph Streaming Frameworks. Graph Streaming Frameworks (GSFs) target high ingestion
rates and compute global aggregates over streaming data. Similarly to GPSes, GSFs (1) require
a-priori knowledge of the schema, resulting in limited support for SE, (2) provide restricted</p>
      <p>APIs, having no support for DI, (3) are designed to scale out and have full support for DS, and
(4) cannot compute nested workloads eficiently, thus supporting QC only partially. GSFs difer
from GPSes in terms of DD support: GSFs are designed to efectively ingest new data in a
streaming fashion.</p>
      <p>This analysis concludes that no current system architecture is equipped to deal with
incomplete knowledge bases, as their querying capabilities are built upon data traversals. Furthermore,
challenges such as data scalability and query complexity still remain critical open problems.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Graph Machine Learning: The Promise</title>
      <p>The emergent need for a sustainable solution to graph computing is driving researchers to seek
a paradigm shift toward methodologies tolerating increasing levels of data incompleteness while
avoiding the penalty of expensive graph traversals and computations. This section discusses
the properties of graph ML, a promising direction to overcome these challenges.</p>
      <p>
        Graph Representation Learning, specifically Graph Neural Networks (GNNs) [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], have been
showing outstanding results in discovering unknown patterns and predicting insightful facts.
Their innovation stems from the idea of translating the high-dimensional, non-euclidean space
of graphs into latent, low-dimensional spaces. Each graph node, edge, or even subgraph can
be transformed into a continuous, dense vector (i.e., embedding) that condenses both intrinsic
features and adjacent topological information. Computations can then, in turn, be translated
from non-euclidean spaces to the Euclidean latent spaces of embeddings.
      </p>
      <p>
        The known capabilities of graph ML currently go beyond classifications or regressions,
obtaining impressive estimates for intricate problems and alleviating the computational cost
of running complex queries. Recent work hints at the capabilities of GNNs to approximate
dynamic programming methods and solve any graph algorithm [
        <xref ref-type="bibr" rid="ref25 ref26 ref27">25, 26, 27</xref>
        ]. Table 2 shows
examples of graph ML methods proposed in the literature to solve and approximate a selection
of graph queries with remarkable inference results. With ORB, we acknowledge the potential
of graph ML to enhance the capabilities of query-serving systems. To prove the outstanding
benefits of graph ML in this context, we dedicate the following section to Query2Box [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], an
embedding-based query execution method on KGs.
      </p>
      <sec id="sec-4-1">
        <title>4.1. The Case of Query2Box</title>
        <p>
          Query2Box [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] approximates first-order, multi-hop queries and ofers accurate, predicted
answers. Moreover, it computes the results in constant time (i.e., time complexity does not
depend on the graph size), thus eliminating the need for expensive traversals on explicitly
stored data. Query processing boils down to performing box projections and intersections
in the latent space, followed by a cheap look-up on raw data to retrieve the results’ explicit
information [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. Therefore, Query2Box caters to missing data in the knowledge bases with the
additional advantage of accelerating multi-hop queries.
        </p>
        <p>
          The source paper of Query2Box stands as a testimony to the practice of applying inference in
extracting insightful answers even in incomplete data scenarios [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. We now conduct a set of
preliminary experiments to also quantify its expected query acceleration benefits in comparison
to systems relying on traversal-based methods, such as Neo4j 1 . We choose multi-hop queries of
diferent lengths and measure individual query latencies, which we define as the time it takes the
system to process and return the results of a specific query. Rather than absolute values, we
observe the trends and behavior of the two alternatives. The results (Figure 2) show that Query2Box
answers reasoning queries in constant, reliable time, regardless of the cardinality of the result set
and despite the non-optimized Python implementation. The query latency grows linearly with
the query depth and has very low variation across the query spectrum. Whereas, Neo4j’s query
latency grows exponentially and results in a high number of outliers, hinting at an unpredictable
execution time that depends on the degree of the nodes touched during the traversal.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Tackling Key System Challenges</title>
        <p>Compared to the considered graph-oriented systems (Section 3), the inference capabilities of
graph ML make it the only viable candidate to conquer DI. Furthermore, as shown in the previous
section, a learned model has the potential to approximate nested, non-polynomial queries in
polynomial time, proving that ML can achieve full support for QC. Regarding DS, we argue
that embedding-driven query-answering does not depend on data locality because traversing
raw data is replaced with traversing latent spaces. Still, DD and SE are not trivial to achieve.
Most ML methods are inherently transductive; the ingestion of new data points is impossible
after model deployment. Furthermore, inferring using out-of-schema data is unsuccessful since
learned models usually require complete knowledge of all the data types at training time.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. What about Uncertainty?</title>
        <p>
          Relaxing the preciseness of graph queries to achieve fast answers comes at the cost of increased
mistrust, which is especially problematic in this case since KGs model sensitive domains (e.g.,
1The VM was equipped with Intel(R) Xeon(R) @2.00GHz 32-core CPU, 120GB RAM, Nvidia Tesla T4
GPU, 8GB VRAM, Python 3.8. We compared Neo4j Community 4.4.9 against the original Query2Box
implementation2on the FB15K-237 dataset [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. We use the first 5000 queries defined as training data in Query2Box
for each query length. Only non-trivial queries whose result set’s cardinality is higher than 1 were depicted.
2https://github.com/hyren/query2box
Optimal Physical
PBlaonunSdeedarqcuhery Error CDFs Logical Plan
        </p>
        <p>uncertainty Physical Plan
Hybrid Query Execution
iFnalsatteenxtescpuaticoens. CPU
Precise execution GPU TPU
on raw data.
Query Executor
ireyengnuE treaoghpS MoEdmMeblasendMadgaiennrgagser
hQ raG I/O Manager
p
a
rG Physical Layer</p>
        <p>Constrained Graph Queries
MATCH (TH:Person {name: 'Tom Hanks’})-[d:DIRECTED]-&gt;</p>
        <p>(M:Movie)-[nf:NOMINATED_FOR]-&gt;(A:Award)
RWEITTUHRNMAAXIMUM UNCERTAINTY 0.05; [CQ]
Data Maintenance and Access</p>
        <p>Drift monitoring</p>
        <p>
          ML models updates
Explicit data Latent Space Embeddings generation
cybersecurity, autonomous vehicles, or medicine). The uncertainty of the predictions must
be quantified on a per-query basis. To the best of our knowledge, no general-purpose
framework exists to quantify the uncertainty of embedding-based approximations, but specialized
uncertainty modeling exists for diferent methods (e.g., for embedding-based first-order logic
queries [
          <xref ref-type="bibr" rid="ref30 ref31">30, 31</xref>
          ]). Conformal Prediction and Venn Predictors show two promising directions for
increasing the trust in ML-based inferences by ofering lightweight procedures for of-the-shelf
to any black box model [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. The first adapts the prediction to satisfy the user-defined error
rate (i.e., significance level), while the latter outputs probabilistic predictions. We believe such
methods can be adopted for graph ML tasks [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. The ORB Architecture</title>
      <p>Figure 3 depicts ORB, the visionary system architecture we designed based on the observations
and conclusions presented in the previous sections. Contrary to traversal-based graph systems,
ORB ofers inferred results and moves expensive operations from the non-euclidean space of
graph data to the latent spaces of learned embeddings. ORB’s operators are designed hybridly:
according to a user-defined uncertainty threshold, ORB chooses to execute operators either
explicitly, on raw data, or in latent spaces, using ML inference. We now provide a high-level
view of the key architecture components and weigh in on some essential design decisions.</p>
      <sec id="sec-5-1">
        <title>5.1. Graph Query Engine</title>
        <p>
          The graph query engine schedules, coordinates, and executes all graph queries. The design
approach resembles relational DBs, starting with a logical plan obtained through query parsing
and deriving an optimized physical plan with data/model access strategies. The graph engine
comprises three modules: the query parser, the query optimizer, and the query executor. The
query parser aims to extend existing graph DSLs. Similar to prior work [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], ORB annotates
queries with user-defined uncertainty bounds, as it can be noted in the Cypher query CQ
depicted in Figure 3. The bounds can assist the query optimizer in identifying the optimal
physical plan. The query executor is reminiscent of traditional database query execution. We
now dedicate the remainder of this section to the core of the ORB vision, the query optimizer.
Query Optimizer. In ORB, certain operators can be carried out in the latent spaces, resulting
in a mix of raw data operations and embedding-based ones. The Query Optimizer lists all the
candidate physical plans, considering that the same operator can be executed either on raw
data or by potentially diferent ML models. If the user allows for approximate results, ORB
        </p>
        <p>Candidate Physical Plans</p>
        <p>M Φ [REJECTED] M Φ [REJECTED]M Φ
must find the best trade-of between uncertainty and processing cost; running queries in the
embedding space can lead to erroneous predictions but at a lower execution time. Thus, two cost
models are required. A performance-based cost model assigns a score for each valid physical
query plan based on the estimated processing cost. While the cost of traditional operators
can be estimated using standard techniques, the execution of ML-based operators requires
model inference calls that use specialized hardware acceleration. Therefore, estimating the cost
of such operators boils down to a predefined number of matrix multiplications on respective
hardware. An uncertainty-based cost model computes statistically sound prediction errors
using lightweight instrumentation as suggested in Section 4.3. The uncertainty of a candidate
plan can be computed by combining the uncertainty of each model used (e.g., as chained prior
probabilities). If no plan can be identified to satisfy the required error threshold, ORB can choose
to execute the query entirely on explicitly stored data, therefore providing zero uncertainty.
Example:  in Figure 3 illustrates a two-hop Cypher query that searches for all the award
nominations for movies directed by Tom Hanks with a maximum allowed error of 5%. Figure 4
shows the corresponding logical plan. We define the operator (ℎ, ) as the set of nodes  such
that  is an answer to the path query (ℎ, , ?). We assume that a link prediction model is available
to the system. The optimizer can either choose to follow explicit links in the stored graph
(depicted as  ) or to rely on link prediction (shown as  ). The query optimizer identifies
four candidate physical plans. Plan A is the cheapest physical plan with only traditional, explicit
operators. Plans B and C contain a mix of embedding-based and raw data operators, while plan
D operates entirely in the latent space. The cost optimizer identifies plan C as optimal for the
given query since it shows the best performance cost at an acceptable uncertainty level.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Graph Storage</title>
        <p>
          The storage component manages the access to persistent data. Given that ORB relies on
embedding-based query executions, the system continuously maintains ML models. The I/O
Manager operates similarly to today’s systems, as its goal is to oversee the access at the raw
data level. Distinctively, ORB extends the storage engine design by introducing managers for
the models and embeddings, which constitute the main concern of this section.
Model Manager. One-of deployments of pre-trained ML models are unsuitable since the
models become obsolete when the underlying distribution of the data changes [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. The Model
Manager incorporates change into the models. Two alternatives exist nowadays for keeping
ML models up-to-date: (1) monitor the data and re-train the models when the drift is detected,
or (2) integrate change into the models incrementally [
          <xref ref-type="bibr" rid="ref36 ref37 ref38">36, 37, 38</xref>
          ]. The first requires lower
computational costs, at the expense of introducing non-determinism between answers of the
same query asked at diferent times. Whereas, the latter happens as a background process
that spends computation cycles continuously, but the embeddings can benefit from numerical
stability over time [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ].
        </p>
        <p>
          Embedding Manager. The embeddings manager is tightly connected to the model manager;
it is in charge of computing and updating the embeddings. The model type determines the
embeddings generation process. Both transductive and inductive embedding methods [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
could benefit from caching policies for the embeddings of heavy hitters to alleviate either the
I/O penalty of retrieving embeddings from persistent storage or the computational cost of
computing new embeddings.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Design Prospects</title>
      <p>We discuss ORB’s design decisions and highlight the trade-ofs, advantages, and limitations of
replacing explicit query executions with ML-based approximations.</p>
      <p>
        Benefits. Besides ”complete” results, the ORB architecture poses significant potential benefits:
1. Fast approximate answers. ORB’s envisioned design avoids expensive and unnecessary
computations that result in nonetheless incomplete results. We argue that, when it comes to
KGs, computationally intensive, exact answers are not desirable since the data is notoriously
incomplete. Thus, burning computation cycles for accurate answers is impractical.
2. Predictably low latency. ORB’s performance cost model reliably the query execution time.
The query plans will consist of a finite set of traditional database operators and ML model
inferences. The cost of inference boils down to the number of pipelined matrix multiplication
operations on respective hardware and is invariant to the input and intermediary results.
3. Seamless integration with modern hardware acceleration. Query answering is handled by ML
scoring that can easily make use of modern hardware acceleration techniques, as opposed to
traditional graph mining techniques that sufer from irregular access patterns [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ].
4. Model-locality, not Data-locality. Contrary to traversal-based systems, data locality becomes
less critical since there is no need to materialize all intermediate steps of a query, thus avoiding
altogether pointer-chasing or maintaining indices for this purpose.
5. Raw Graph State Management. The graph storage layer can potentially be substituted by
existing RDBMSs or graph DBs transparently to the rest of the system. The core requirements
of this layer are (1) fast point lookups, and (2) support for model dynamic, heterogeneous data.
Latency versus Completeness. Graph ML cannot handle schema evolution eficiently due to
models requiring prior knowledge of the schema. However, ORB executes query plans on a mix
of explicitly stored data and embeddings. To achieve full SE and DD, ORB falls back on following
raw data links for queries that operate on new data types while the models have time to be
updated to incorporate the new information. The anticipated outcome is, therefore, a slower
query execution with no new derived conclusions. As time passes and the models get updated,
the queries can rely on inference-based calls and obtain low-latency responses with high
incompleteness support. Under these conditions, we acknowledge the trade-of that emerges
between query latency and data incompleteness support, which can be noted in Figure 5.
Open Problems and Opportunities. Even though ORB empowers graph exploration with
promising benefits, we must acknowledge the open problems related to our visionary
architecture and highlight the interesting research directions required to bring this vision to fruition.
Challenge #1. Inductive graph ML models. Graph ML should advance toward fully inductive
models that can score unseen data points on-the-fly [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Full model inductiveness is set to ofer
desirable advantages in adaptability and throughput, leading to seamless integration in data
management systems.
      </p>
      <p>
        Challenge #2. Low-latency predictions. The (scarce) inductive graph ML methods that exist mostly
rely on extracting -hop node neighborhoods. Retrieving this information from the persistent
storage of machines storing diferent graph partitions can quickly become a bottleneck [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ].
Challenge #3. Inference and heterogeneity. Current graph ML methods ofer little to no support
for evolving schemas. Eforts such as one-shot or few-shot learning should be explored further
to avoid expensive model re-training and adapt to new data types [
        <xref ref-type="bibr" rid="ref42 ref43">42, 43</xref>
        ].
      </p>
      <p>Challenge #4. Uncertainty modeling. ORB requires error guarantees for ML-backed inferences
that can be estimated at low computational expenses before the inference is executed so that
the cost model can eficiently choose the appropriate physical plan.</p>
      <p>Discussion Finally, we discuss the applicability of ORB to use cases of critical nature. ORB
will not benefit situations and queries where precise answers or explicit graph traversals are
essential. For this reason, ORB allows the users to constrain the error of the result; if a query
does not allow for uncertain, predicted results, ORB will act as a traditional database. Therefore,
ORB could potentially be built as an extension of existing databases.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Related work</title>
      <p>
        Heavy research currently gravitates around ML algorithms, yet, an all-encompassing practical
system approach is still in need. For example, we observe works focusing on training embedding
models eficiently but failing to address continuous serving and monitoring [
        <xref ref-type="bibr" rid="ref44 ref45">44, 45</xref>
        ]. The time is
ripe for such systems to emerge and potentially solve the major challenges of graph exploration.
      </p>
      <p>
        Our work is a possible materialization of Neural Graph Databases, which was introduced in a
recent report to address the incompleteness assumption of large KGs using Graph Representation
Learning powered by ML inference [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ]. Furthermore, similar research directions attempt to
bring predictive capabilities to SPARQL via the usage of embeddings [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ]. ORB extends these
lines of work by aiming to incorporate error-guided query optimization techniques.
      </p>
      <p>
        Secondly, the research area of neural databases makes use of Natural Language Processing
to ingest and query unstructured data [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ]. The ORB vision complements this line of work
with ML-assisted query acceleration and result completion of common graph queries with no
modifications to the database user interface.
      </p>
      <p>
        A related area of work to ORB is approximate query processing (AQP) [
        <xref ref-type="bibr" rid="ref34 ref49">34, 49, 50, 51, 52</xref>
        ],
which has seen research advancements in the use of learned models and indexes to improve
performance. These new approaches can achieve faster and more accurate approximations of
queries using a smaller amount of memory compared to traditional sample-based AQP [53, 54].
ORB is aligned with this trend by utilizing embedding-based models to approximate graph
queries and also including mechanisms to estimate the approximation error.
      </p>
      <p>Materialized views also relate to the ORB vision as another common methodology of
“preparing” certain query results in advance [55, 56]. ORB’s envisioned storage layer can be enhanced
from view materialization for the storage of both raw data and produced embeddings.</p>
      <p>Finally, ORB’s objective is similar to that of knowledge graph systems like Vadalog [57],
which provide logical reasoning capabilities. However, ORB aims to achieve reasoning through
lightweight ML models with bounded state size, rather than computationally expensive
rulebased methods.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>We foresee that KGs will be the catalyst for the next-generation data-driven applications;
therefore, designing systems well-equipped to handle KGs is essential. In this paper, we introduced
ORB, our visionary system architecture that tackles data incompleteness. ORB opens new
research avenues in graph processing and optimization, with interesting challenges relating to
model-driven data management. We support that a proof-of-concept implementation of ORB
could empower new opportunities in data analysis, granting users richer insights and better
control over uncertainty with significantly lower overhead.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This work was supported by the Swedish Foundation for Strategic Research (BD15-0006),
Wallenberg Foundation (WASP), and Google Cloud for Education. We also thank Vasiliki Kalavri
for her valuable feedback.
processing, in: SIGMOD Conference, ACM, 2018, pp. 1461–1476.
[50] S. Chaudhuri, B. Ding, S. Kandula, Approximate query processing: No silver bullet, in:</p>
      <p>SIGMOD Conference, ACM, 2017, pp. 511–519.
[51] S. Agarwal, H. Milner, A. Kleiner, A. Talwalkar, M. I. Jordan, S. Madden, B. Mozafari,
I. Stoica, Knowing when you’re wrong: building fast and reliable approximate query
processing systems, in: SIGMOD Conference, ACM, 2014, pp. 481–492.
[52] S. Kandula, A. Shanbhag, A. Vitorovic, M. Olma, R. Grandl, S. Chaudhuri, B. Ding, Quickr:
Lazily approximating complex adhoc queries in bigdata clusters, in: SIGMOD Conference,
ACM, 2016, pp. 631–646.
[53] S. Thirumuruganathan, S. Hasan, N. Koudas, G. Das, Approximate query processing for
data exploration using deep generative models, in: ICDE, IEEE, 2020, pp. 1309–1320.
[54] Q. Ma, A. M. Shanghooshabad, M. Almasi, M. Kurmanji, P. Triantafillou, Learned
approximate query processing: Make it light, accurate and fast., in: CIDR, 2021.
[55] A. Gupta, I. S. Mumick, Maintenance of materialized views: Problems, techniques, and
applications, IEEE Data Eng. Bull. 18 (1995) 3–18.
[56] J. M. F. da Trindade, K. Karanasos, C. Curino, S. Madden, J. Shun, Kaskade: Graph views
for eficient graph analytics, in: ICDE, IEEE, 2020, pp. 193–204.
[57] L. Bellomarini, E. Sallinger, G. Gottlob, The vadalog system: Datalog-based reasoning for
knowledge graphs, Proc. VLDB Endow. 11 (2018) 975–987.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mottin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <article-title>Knowledge graph exploration systems: are we lost?, in: CIDR, www</article-title>
          .cidrdb.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>West</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Knowledge base completion via search-based question answering</article-title>
          , in: WWW, ACM,
          <year>2014</year>
          , pp.
          <fpage>515</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>29</volume>
          (
          <year>2017</year>
          )
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Graph neural networks: Taxonomy, advances, and trends</article-title>
          ,
          <source>ACM Trans. Intell. Syst. Technol</source>
          .
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <volume>15</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          :
          <fpage>54</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kalavri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Carbone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vlassov</surname>
          </string-name>
          ,
          <article-title>Streaming graph partitioning: an experimental study</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>1590</fpage>
          -
          <lpage>1603</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mhedhbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Salihoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Özsu</surname>
          </string-name>
          ,
          <article-title>The ubiquity of large graphs and surprising challenges of graph processing</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>11</volume>
          (
          <year>2017</year>
          )
          <fpage>420</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          , C. Liu,
          <string-name>
            <given-names>S.</given-names>
            <surname>Salihoğlu</surname>
          </string-name>
          ,
          <article-title>Kùzu graph database management system, in: CIDR, www</article-title>
          .cidrdb.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bandle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kemper</surname>
          </string-name>
          , T. Neumann,
          <article-title>Adopting worst-case optimal joins in relational database systems</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>1891</fpage>
          -
          <lpage>1904</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mhedhbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Salihoglu</surname>
          </string-name>
          ,
          <article-title>Optimizing subgraph queries by combining binary and worstcase optimal joins</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>12</volume>
          (
          <year>2019</year>
          )
          <fpage>1692</fpage>
          -
          <lpage>1704</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <fpage>Neo4j</fpage>
          , Inc.,
          <source>Neo4j</source>
          ,
          <year>2022</year>
          . URL: https://neo4j.com/,
          <source>[Online; accessed February 13</source>
          ,
          <year>2023</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brugnara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Velegrakis</surname>
          </string-name>
          ,
          <article-title>Beyond macrobenchmarks: microbenchmarkbased graph database evaluation</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>12</volume>
          (
          <year>2018</year>
          )
          <fpage>390</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Malewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Austern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J. C.</given-names>
            <surname>Bik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Dehnert</surname>
          </string-name>
          , I. Horn,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leiser</surname>
          </string-name>
          , G. Czajkowski,
          <article-title>Pregel: a system for large-scale graph processing</article-title>
          , in: SIGMOD Conference, ACM,
          <year>2010</year>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kyrola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bickson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hellerstein</surname>
          </string-name>
          ,
          <article-title>Distributed graphlab: A framework for machine learning and data mining in the cloud</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>5</volume>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bickson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>Powergraph: Distributed graphparallel computation on natural graphs</article-title>
          , in: OSDI, USENIX Association,
          <year>2012</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , node2vec:
          <article-title>Scalable feature learning for networks</article-title>
          ,
          <source>in: KDD, ACM</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Inductive representation learning on large graphs</article-title>
          ,
          <source>in: NIPS</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1024</fpage>
          -
          <lpage>1034</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Galkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Denis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <article-title>Nodepiece: Compositional and parametereficient representations of large knowledge graphs, in: ICLR, OpenReview</article-title>
          .net,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          ,
          <article-title>Estimating node importance in knowledge graphs using graph neural networks</article-title>
          ,
          <source>in: KDD, ACM</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>596</fpage>
          -
          <lpage>606</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Merchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          ,
          <article-title>Succinct graph representations as distance oracles: An experimental evaluation</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>2297</fpage>
          -
          <lpage>2306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grattarola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alippi</surname>
          </string-name>
          ,
          <article-title>Spectral clustering with graph neural networks for graph pooling</article-title>
          , in: ICML, volume
          <volume>119</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>874</fpage>
          -
          <lpage>883</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Leskovec,</surname>
          </string-name>
          <article-title>Query2box: Reasoning over knowledge graphs in vector space using box embeddings, in: ICLR, OpenReview</article-title>
          .net,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Canedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , Neural subgraph matching, CoRR abs/
          <year>2007</year>
          .03092 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zwolak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Horchidan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Carbone</surname>
          </string-name>
          , V. Kalavri,
          <article-title>GCNSplit: bounding the state of streaming graph partitioning</article-title>
          , in: aiDM@SIGMOD, ACM,
          <year>2022</year>
          , pp.
          <volume>3</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          :
          <fpage>12</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Monfardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scarselli</surname>
          </string-name>
          ,
          <article-title>A new model for learning in graph domains</article-title>
          ,
          <source>in: Proceedings. 2005 IEEE International Joint Conference on Neural Networks</source>
          ,
          <year>2005</year>
          ., volume
          <volume>2</volume>
          , IEEE,
          <year>2005</year>
          , pp.
          <fpage>729</fpage>
          -
          <lpage>734</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fereydounian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dadashkarimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karbasi</surname>
          </string-name>
          ,
          <article-title>The exact class of graph functions generated by graph neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:2202.08833</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dudzik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Velickovic</surname>
          </string-name>
          ,
          <article-title>Graph neural networks are dynamic programmers</article-title>
          ,
          <source>CoRR abs/2203</source>
          .15544 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawarabayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          ,
          <article-title>What can neural networks reason about?</article-title>
          , in: ICLR, OpenReview.net,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P.</given-names>
            <surname>Indyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          ,
          <article-title>Approximate nearest neighbors: Towards removing the curse of dimensionality</article-title>
          , in: STOC, ACM,
          <year>1998</year>
          , pp.
          <fpage>604</fpage>
          -
          <lpage>613</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Observed versus latent features for knowledge base and text inference</article-title>
          ,
          <source>in: CVSC, Association for Computational Linguistics</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Beta embeddings for multi-hop logical reasoning in knowledge graphs</article-title>
          ,
          <source>NeurIPS</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>19716</fpage>
          -
          <lpage>19726</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boratko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <article-title>Probabilistic box embeddings for uncertain knowledge graph reasoning, in: NAACL-HLT, Association for Computational Linguistics</article-title>
          ,
          <year>2021</year>
          , pp.
          <fpage>882</fpage>
          -
          <lpage>893</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vovk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gammerman</surname>
          </string-name>
          , G. Shafer,
          <source>Algorithmic learning in a random world, Springer Science &amp; Business Media</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Barber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Candes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramdas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>Conformal prediction beyond exchangeability</article-title>
          ,
          <source>arXiv preprint arXiv:2202.13415</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mozafari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Milner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Stoica</surname>
          </string-name>
          ,
          <article-title>Blinkdb: queries with bounded errors and bounded response times on very large data</article-title>
          , in: EuroSys, ACM,
          <year>2013</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsymbal</surname>
          </string-name>
          ,
          <article-title>The problem of concept drift: definitions and related work</article-title>
          , Computer Science Department, Trinity College Dublin 106 (
          <year>2004</year>
          )
          <fpage>58</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Streaming graph neural networks via continual learning</article-title>
          ,
          <source>in: CIKM, ACM</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1515</fpage>
          -
          <lpage>1524</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>L.</given-names>
            <surname>Galke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Franke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zielke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scherp</surname>
          </string-name>
          ,
          <article-title>Lifelong learning of graph neural networks for open-world node classification</article-title>
          , in: IJCNN, IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>M.</given-names>
            <surname>Perini</surname>
          </string-name>
          , G. Ramponi,
          <string-name>
            <given-names>P.</given-names>
            <surname>Carbone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kalavri</surname>
          </string-name>
          ,
          <article-title>Learning on streaming graphs with experience replay</article-title>
          ,
          <source>in: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>470</fpage>
          -
          <lpage>478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Understanding graph embedding methods and their applications</article-title>
          ,
          <source>SIAM Rev</source>
          .
          <volume>63</volume>
          (
          <year>2021</year>
          )
          <fpage>825</fpage>
          -
          <lpage>853</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>X.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.-S.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <article-title>Graph processing on GPUs: A survey, ACM Computing Surveys (CSUR) 50 (</article-title>
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kannan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Prasanna</surname>
          </string-name>
          ,
          <article-title>Accelerating large scale real-time GNN inference using channel pruning</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>1597</fpage>
          -
          <lpage>1605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>One-shot relational learning for knowledge graphs</article-title>
          , in: EMNLP, Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>1980</fpage>
          -
          <lpage>1990</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <article-title>Few-shot knowledge graph completion</article-title>
          , in: AAAI, AAAI Press,
          <year>2020</year>
          , pp.
          <fpage>3041</fpage>
          -
          <lpage>3048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <article-title>SMORE: knowledge graph completion and multi-hop reasoning in massive knowledge graphs</article-title>
          ,
          <source>in: KDD, ACM</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1472</fpage>
          -
          <lpage>1482</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sheikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Reinwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <article-title>Scaling knowledge graph embedding models for link prediction</article-title>
          , in: EuroMLSys@EuroSys, ACM,
          <year>2022</year>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Neural graph reasoning: Complex logical query answering meets graph databases</article-title>
          ,
          <source>arXiv preprint arXiv:2303.14617</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Park</surname>
          </string-name>
          , S. Kwon,
          <article-title>On integrating knowledge graph embedding into SPARQL query processing</article-title>
          , in: ICWS, IEEE,
          <year>2018</year>
          , pp.
          <fpage>371</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yazdani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Y. Levy</surname>
          </string-name>
          ,
          <article-title>From natural language processing to neural databases</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>1033</fpage>
          -
          <lpage>1039</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mozafari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sorenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          , Verdictdb: Universalizing approximate query
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>