<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Better Understanding of the Performance and Design of Datalog Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhiwei Fan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sunil Mallireddy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paraschos Koutris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Wisconsin-Madison</institution>
          ,
          <addr-line>WI</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>166</fpage>
      <lpage>180</lpage>
      <abstract>
        <p>Recent years have seen a resurgence of interest in the Datalog language and its syntactic extensions from both the industry and academia. Such interest has motivated a line of work to build eficient Datalog systems that support expressing data analytics of diferent types such as program and graph analyses in a concise and simple way. However, besides the performance improvement of diferent systems presented in these works over existing competitors, little understanding has been gained about the property of varying Datalog workloads (i.e., program and data), and the computation resource usage of diferent systems (i.e., memory and CPU utilization), which are crucial for understanding the essence behind the performance diference observed in diferent systems. When such knowledge is gained, clear guidance could be provided for users to choose between diferent Datalog systems depending on their use cases and system builders to improve the existing system or build more eficient new systems. In this paper, we propose the general profiling of the Datalog program evaluation and present the corresponding visualizations. We further discuss the insights gained from the produced visualizations and how these insights shed light on the pros and cons of existing Datalog systems, which further provides guidance on making improvements over the existing system and designing/building more eficient new systems.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Datalog Systems</kwd>
        <kwd>Recursive Computation</kwd>
        <kwd>Benchmarks</kwd>
        <kwd>Profiling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        no description has been provided for the profile of the recursive computation. Here, by profile,
we mean information such as the number of iterations, how many facts are produced in every
iteration, etc. As shown in RecStep [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the relative performance of diferent Datalog systems
may not translate across diferent workloads (i.e., a system that performs well on one Datalog
program and a particular dataset does not show comparable performance on the others). The
lack of a closer look at the recursive computation profiles makes it dificult to analyze and
explain the performance diference across diferent systems and workloads. In turn, this makes
choosing the best system for applications of interest (for users) and improving existing systems
(for system builders) challenging. For example, when attempting to build a new Datalog system,
we need to answer questions such as what techniques in existing systems can we leverage? what
are the limitations of existing systems? are there Datalog workloads of diferent characteristics that
need to be handled diferently? what could be done to improve existing techniques?
      </p>
      <p>To address the aforementioned issue, we argue that a general-purpose recursive computation
profiling framework that can provide insights across systems and workloads is needed. In
this work, we make the first step towards building such a profiling framework by presenting
four important profiling components: recursion profile, runtime, CPU utilization and memory
utilization. First, we describe these four components and briefly discuss their importance. Then
we present case studies based on the profiling visualizations of Datalog workloads from two
application domains: graph analytics and program analysis. As shown in Section 3, there is
no single system always being the winner across all Datalog workloads, even within the same
application domain. With the help of profiling visualizations, we analyze the causes behind
the ineficient executions, extracting insights regarding the proper use cases and limitations
of the studied systems (Section 3.3). By analyzing high-level causes (revealed by the profiling
components) of the ineficiency exposed by a system, one is able to connect these high-level
observations to the specific technical components in the system (e.g., data structures, algorithms),
understanding the system limitations better.</p>
      <p>
        In this paper, we focus on single-node systems, mainly looking at three recently published
well-documented Datalog engines that are publicly available: RecStep [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Soufle [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and
DDlog [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Soufle is a new recent high-performance Datalog system that is mainly designed
for program analysis and uses optimization techniques such as eficient program synthesis,
specialized parallel data structures for indexing and compression, and automatic index selection.
RecStep is a Datalog engine built on top of an eficient single-node in-memory database called
Quickstep[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], leveraging multiple years of eforts in the advancement of database techniques
such as query optimization and eficient parallel query execution. DDlog translates a set
of Datalog rules into the corresponding Diferential Dataflow [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] program that allows for
incremental computation. Other Datalog systems such as BigDatalog [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], BDDBDDB [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] have
been shown to be significantly outperformed by two of the systems (i.e., RecStep, Soufle)
studied in this paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with some notable system limitation and performance issues and
therefore are excluded from the study. We do not consider solvers designed for answer-set
programming such as clingo [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and dlv [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] as the recent study [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] shows that they are
not capable of handling Datalog workloads in our interested application domains (e.g., graph
analytics, program analysis) due to ineficient use of memory. Our study focuses on positive
Datalog programs (i.e., without negation) that involve recursion.
      </p>
      <p>
        We use the profiling functionalities embedded in RecStep [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which is able to provide general
profiling information of diferent systems on the Datalog workload evaluation that is not tied to
a specific system. Although the profiling components presented are fairly simple, they already
provide meaningful insights that can aid further system analysis and improvement, which is
not possible by looking solely at the performance numbers.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Recursive Computation Profiling</title>
      <p>
        Most existing systems evaluate Datalog programs using a type of bottom-up evaluation called
semi-naïve evaluation (SN) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] either explicitly (e.g., RecStep, Soufle) or implicitly (e.g., DDlog).
      </p>
      <p>At each iteration of the recursive computation, there are three types of facts that are important
to consider. Facts of the first type are generated from the evaluation of each recursive rule
(generated facts, GF). The generated facts can contain duplicates, so we also need to consider
the facts after deduplication, called unique generated facts, UGF. Finally, the set-diference
is performed between the unique generated facts and the existing facts to produce the new
facts, NF. Some Datalog systems perform the deduplication and set-diference separately (e.g.,
RecStep), and other systems (e.g., Soufle, DDlog) fuse these two steps into one, often through
the maintained indexes built on the IDB relations throughout the whole computation procedure.</p>
      <p>
        As we will see in Section 3, the sizes of these three diferent types of facts in diferent iterations
serve as the primary fingerprint of the Datalog workload and help better understand the behavior
of various systems along with other profiling information such as runtime and resource usage.
Next, we briefly discuss a few major components for recursive computation profiling. We note
that these components are not the only ones to look at when analyzing the system performance
on varying Datalog workloads. When being available, additional information such as the size of
input data, and hot code paths could be useful and provide additional insights.
Recursion Profile The sizes of facts of three
diferent types in each iteration of the recursive
computation characterize the Datalog workload, 1e8
wwtseaguirrzhesaioeteituiacistvonlhoewondlfcf,yafoDlG,ayencasatFstt,lsGihaasUoalFrtofsgvGgteeohFfrg,eG,uaaaEFUlpnesDGdpsbB)eFeN/catiiwnF≥nficdperD,ueeUaastnaGnptsraGdepFellceFoaNtctgiFiivoficpen≥drlsaoya)n.gt.NdadrNAeasFnUotemotetGeat((eFici.t..hehetIh..ain,,tet-- ftscFoa#0123451 2UNGneewinqeuFr3eaatcGeIttesdenrFae4atricaottnsed#5Facts 6 7iI()tttrsLoaaen7
indicates that many facts were produced
multiple times in the same iteration, while a large gap
between UGF and NF indicates that many Figure 1: Recursive Profile of TC-G 10k
facts have already been produced in previous
iterations. Note that NF also indicates the amount of work to be done during the next iteration
in SN. As an example, Figure 1 is the recursive profile showing the sizes of facts of three types
across seven iterations during SN of transitive closure evaluated on G10 dataset, which will
be analyzed in detail in Section 3.
Runtime Runtime is probably the most straightforward performance measure of diferent
systems on a given workload. The runtime of many existing Datalog systems can be divided
into compilation time and evaluation time. Systems such as Soufle and DDlog first generate
the code given the input program, followed by compilation-level optimizations and executable
binary generation. The overhead induced by code generation and compilation can be safely
ignored, assuming that the generated executable files will be used repetitively later with diferent
inputs. However, such an assumption might not always hold and may not be acceptable in
circumstances where the overhead far exceeds the evaluation time. Thus, it is crucial to have
access to a clear view of runtime breakdown when considering a specific application.
CPU Utilization Like other data-parallel compute engines, recent Datalog systems [
        <xref ref-type="bibr" rid="ref1 ref11 ref2 ref3">1, 2, 3, 11</xref>
        ]
exploit the parallelism packed inside modern servers to achieve high performance and scalability.
However, achieving consistent high CPU eficiency and utilization across diferent workloads is
challenging. Low performance could occur due to either low CPU eficiency (suggesting that
the system might handle more work than necessary), or low CPU utilization (meaning that the
system does not utilize multiple CPU cores well).
      </p>
      <p>
        Memory Consumption Many recent works focus on building in-memory Datalog systems [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1,
2, 3, 4</xref>
        ]. However, most of them either ignore the evaluation of memory utilization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or miss the
comparison with other existing Datalog systems [
        <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
        ]. Since most of the evaluations presented
in these works are standalone (i.e., a system only evaluates one workload at a time in a server
without interference), the lack of understanding of the memory footprint makes it hard to
choose the proper hardware (e.g., a server with small or large memory), estimate the scalability
of a Datalog program (e.g., the maximum dataset the system can handle), and the applicability
(e.g., whether concurrent evaluation is feasible or not).
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Case Studies</title>
      <p>We next present the Datalog programs that arise from graph analytics (TC, SG, REACH) and
program analysis (AA, CSPA, CSDA) followed by the case studies of the corresponding Datalog
workloads. These Datalog programs are supported by all three systems of interest in this paper.
We first look at the linear recursive Datalog programs (TC, SG, REACH, CSDA), in which each
program consists of one non-recursive rule and one linear recursive rule (i.e., the rule body
contains only one recursive IDB predicate). Then, we study the Datalog workloads of non-linear
recursive programs (AA, CSPA) each of which contains at least one recursive rule that has
more than one recursive IDB predicate in the rule body. As we will see from the recursive
computation profiles of these workloads, even when two Datalog programs look very similar,
the relative performance of diferent systems can be very diferent. This happens when two
programs are from the same domain (e.g., TC, REACH) or diferent application domains (e.g.,
REACH, CSDA).</p>
      <p>
        All experiments are conducted on a bare-metal server in Cloudlab [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], a large cloud
infrastructure. The server runs Ubuntu 18.04 LTS and has two Intel Xeon E5-2660 v3 2.60 GHz
(Haswell EP) processors. Each processor has 10 cores, and 20 hyper-threading hardware threads.
The server has 160GB memory and each NUMA node is directly attached to 80GB of memory.
We only consider the CPU and memory utilization of the systems during their actual execution
period and thus the time period used for code generation and compilation is excluded for CPU
and memory profiling. Information about the input and output is summarized in Table 1.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Simple Linear Recursion</title>
        <sec id="sec-3-1-1">
          <title>Transitive Closure (TC):</title>
          <p>Same Generation (SG):
tc(X, Y) :- arc(X, Y).</p>
          <p>tc(X, Y) :- tc(X, Z), arc(Z, Y).
sg(X, Y) :- arc(P, X), arc(P, Y), X ̸= Y.
small (&lt; 1MB), large intermediate results (i.e., three types of facts) as shown in Figure 2a
and Figure 3a are generated. We observe the following features across all recursion profiles
for the above tasks: () the number of iterations is relatively small, () there is a large gap
between GF and UGF , which suggests eficient deduplication is critical to the overall
good performance, and () the gap between UGF and NF is small or non-existent in
most cases.</p>
          <p>
            For TC and SG, while all three systems have relatively high CPU utilization throughout the
evaluation (Figures 2c, 3c), the runtime varies. Besides the relatively long compilation time
of DDlog (∼ 200), DDlog’s evaluation time is about 2 − 3 longer than that of the other
two competitors (i.e., Soufle and RecStep), using a large amount of memory (Figures 2c, 3c).
The performance number of DDlog is significantly worse than that of its runtime primitive
Diferential Dataflow [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] as shown in [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. The ineficiency of DDlog could be attributed to the
fact that its design heavily focuses on incremental computation (e.g., maintaining intermediate
states of large sizes, separate management of existing computation and monitoring new input,
etc), sacrificing the performance for batch processing.
          </p>
          <p>
            RecStep uses significantly more memory (Figures 2d, 3d) on the small input datasets (i.e.,
G10 and G5) compared to other workloads (Figures 4d-12d), the sizes of input datasets of
which vary from 22MB to 1.7GB. This is because RecStep performs deduplication as a separate
step, relying on its backend in-memory relational database QuickStep [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], which pre-allocates
the memory to the hash table for deduplication based on the size of the generated facts. Due
to the memory ineficiency observed in such cases, RecStep soon runs out of memory when
evaluating TC and SG on graphs with a large number of vertices. At the same time, the edge
inclusion probability remains the same. We run RecStep using its default interpretation mode
without the specialized parallel bit-matrix evaluation (PBME) designed for dense graphs of small
vertices. While PBME [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] is an eficient technique to address this issue, it is specifically designed
for graphs with a relatively small number of vertices, and its generality is limited.
          </p>
          <p>
            In contrast, Soufle has acceptable overhead from the code generation and compilation
(∼ 10), showing the overall best performance on TC-G10 and SG-G5 with a small memory
footprint mainly thanks to its specialized parallel data structure Brie [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] for relation storage
and indexing, which provides good compression capability for high-density relations of large
sizes and support of eficient parallel operations (e.g., insertion, lookup).
          </p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Reachability (REACH):</title>
          <p>Context-Sensitive Dataflow Analysis (CSDA):
reach(Y) :- id(Y).
reach(Y) :- reach(X), arc(X, Y).
null(X, Y) :- nullEdge(X, Y).
null(X, Y) :- null(X, W), arc(W, Y).
100
)%80
(
eg60
a
ten40
c
rPe20
00
0 recstep souffle ddlog</p>
          <p>Datalog Systems
(b) Runtime</p>
          <p>We run REACH on orkut, a relatively large real-world online social network dataset in which
the friendship of users is represented as edges. REACH finds the friends of a given set of user ids.
Figure 4 is the recursive computation profile of REACH- orkut and we observe that the relative
system performance looks quite diferent from what is observed on TC-G 10 and SG-G5.
RecStep significantly outperforms Soufle and DDlog and the reasons are two-fold: () GF is
relatively small across just a few number of total iterations, resulting in the negligible overhead
of the separate deduplication step and overall eficiency of RecStep () RecStep utilizes CPU
much more eficiently compared to Soufle and DDlog, which sufer from the long warm-up
phase (e.g., index building) due to the much larger EDB/input sizes. Systems such as Soufle
using a specialized data structure for indexing (i.e., Brie), may sufer from much bigger index
building overhead if the characteristic of indexed data is adversarial to the property of the
indexing data structure. For REACH on relatively sparse large real-world social graphs, we
observe larger indexing overhead in Soufle as the input graph size increases (e.g., livejounral
→ orkut → twitter in Table 1), which results in larger relative performance gap between
Soufle and RecStep when the total number of iterations of recursive computation remains
small (e.g., Figures 5a, 4a, 5c). The observation suggests that the input data itself serves as
an important profiling component sometimes and should be considered together with other
profiling components (e.g., recursion profile, CPU utilization).
1e5</p>
          <p>)
Generated Facts 787
Unique Generated Facts (
New Facts itoan
r
e
t
I
t
s
a</p>
          <p>L
100 200 300 400 500 600 700 800</p>
          <p>Iteration #
(a) Recursion Profile
0 recstep souffle ddlog</p>
          <p>Datalog Systems
(b) Runtime
100</p>
          <p>200Time30(s0)
(c) CPU Utilization</p>
          <p>100 200Time30(s0) 400
(d) Memory Consumption
500</p>
          <p>CSDA can be seen as a variant of transitive closure: first a set of null edges is given to initialize
the non-recursive rule, and then the linear-recursive rule looks the same as that of TC. However,
the recursive computation profiles of CSDA and TC workloads are very diferent. We observe
that RecStep performs significantly worse compared to Soufle and DDlog (Figures 6b, 7b, 8b)
while its CPU utilization over time is lower compared to Soufle and DDlog (Figures 6c, 7c, 8c).
Comparing with TC-G5 and REACH-orkut, the CSDA workloads on linux, postgresql and
httpd have a very long tail in their recursion profiles (Figures 6a, 7a, 8a): it takes a large number
of iterations for the evaluation to reach the fixpoint , and most of the work is performed during the
ifrst few iterations. After further digging, we have confirmed that RecStep’s poor performance is
mainly due to the lack of continuously maintained indexes throughout the program evaluation
that its competitors Soufle and DDlog have. This forces RecStep to reconstruct hash tables and
recstep
souffle
ddlog
400
500
)12
(%10
e
g8
a
t
n6
e
rc 4
e
P2
0
)
1
2
7
(
n
o
i
t</p>
          <p>UNGneewinqeuFreaatcGetesdnFearcattsed Facts IttrsLaae
80 160 240 320 400 480 560 640 720</p>
          <p>Iteration #
(a) Recursion Profile
0 recstep souffle ddlog</p>
          <p>Datalog Systems
(b) Runtime
repeatedly probe the input table (i.e., arc) for joins in every iteration. The resulting overhead
accumulates across iterations, leading to poor CPU utilization and eficiency when the Datalog
workloads have very long-tail recursion profiles. This observation shows the necessity of
continuously maintained indexes for consistent eficient recursive Datalog program execution
in these cases.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Non-linear Recursion</title>
        <sec id="sec-3-2-1">
          <title>Andersen’s Analysis (AA):</title>
          <p>pointsTo(Y, X) :- addressOf(Y, X).
pointsTo(Y, X) :- assign(Y, Z), pointsTo(Z, X).
pointsTo(Y, W) :- load(Y, X), pointsTo(X, Z), pointsTo(Z, W).</p>
          <p>pointsTo(Z, W) :- store(Y, X), pointsTo(Y, Z), pointsTo(X, W).</p>
          <p>Context-Sensitive Points-To Analysis (CSPA):
valueFlow(Y, X) :- assign(Y, X).
valueFlow(X, X) :- assign(X, Y).
valueFlow(X, X) :- assign(Y, X).
valueFlow(X, Y) :- assign(X, Z), memoryAlias(Z, Y).</p>
          <p>valueFlow(X, Y) :- valueFlow(X, Z), valueFlow(Z, Y).
valueAlias(X, Y) :- valueFlow(Z, X), valueFlow(Z, Y).</p>
          <p>valueAlias(X, Y) :- valueFlow(Z, X), memoryAlias(Z, W), valueFlow(W, Y).
memoryAlias(X, X) :- assign(Y, X).
memoryAlias(X, X) :- assign(X, Y).</p>
          <p>memoryAlias(X, W) :- dereference(Y, X), valueAlias(Y, Z), dereference(Z, W).</p>
          <p>
            Switching from linear recursion to non-linear recursion, we observe that RecStep significantly
outperforms Soufle and DDlog (Figure 9b) for Andersen’s Analysis evaluated on the largest
input dataset (∼ 1.2G) used in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. Figure 9c shows that both Soufle and RecStep have a long
warm-up time in which the CPU utilization is very low. The reason is similar to that observed
in the evaluation of REACH on large graphs - since there are four EDB/input relations in AA,
the total size of which is relatively large, more preprocessing work for Soufle and DDlog (e.g.,
index construction) is needed.
          </p>
          <p>Since the size of generated facts is relatively small across diferent iterations, RecStep evaluates
AA eficiently while using only a small amount of memory (Figure 9d). Additionally, AA
has nonlinear-recursive rules, and all rules in AA derive the facts for the same relation (i.e.,
pointsTo), in which case RecStep is able to fully utilize CPU and evaluate all rules in parallel.</p>
          <p>The Datalog program itself alone is insuficient to characterize the recursive computation. For
CSPA, the recursive computation profiles of three systems on linux dataset look very diferent
from the ones on postgresql and httpd datasets. Interestingly, we can see that similar
recursive profiles (Figure 11a and Figure 12a) come along with similar profiling information on
runtime (Figure 11b and Figure 12b), CPU (Figure 11c and Figure 12c) and memory (Figure 11d
and Figure 12d). DDlog’s performance seems to be very sensitive to the sizes of the intermediate
results: when there are large number of facts being generated in several iterations, DDlog
recstep
souffle
ddlog
)10
( 8
%
e
g
a6
t
n
ce4
r
e
P2
0</p>
          <p>Generated Facts )24
Unique Generated Facts (
New Facts itoan
r
e
t
I
t
s
a</p>
          <p>L
6 9 Ite1ra2tion15# 18 21 24
(a) Recursion Profile
turns out to require a great amount of memory to maintain the intermediate states (Figure 11d
and Figure 12d) and the corresponding overhead also afects the overall performance greatly
(i.e., DDlog is outperformed by RecStep and Soufle as shown in Figure 11b and Figure 12b).
Such inference can be additionally strengthened by looking at Figure 10, in which Figure 10a
shows fewer facts are generated during the iterative evaluation and DDlog show better relative
performance over RecStep in Figure 10b while using considerably less memory as shown in
Figure 10d.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Discussion</title>
        <p>The case studies above show how we can leverage the recursive computation profiling
components described in Section 2 to understand better the performance diference between diferent
Datalog systems on a given Datalog workload. This also allows us to identify the limitation of
existing systems. For example, DDlog is unsuitable for batch-processing on the workload that
generates intermediate results of large sizes but might be of proper use when GF is small.
Soufle performs well primarily because of the heavy optimization of indexes, but indexing
could become a bottleneck in some cases (Figure 4b and Figure 9b). The large performance gap
between RecStep and the other two Datalog systems observed on CSDA reveals the importance
of continuously maintained indexes to good overall performance. Finally, the large memory
consumption of RecStep observed on TC-G10 and SG-G5 suggests such indexes should also
have good compression capability for memory eficiency.</p>
        <p>Generated Facts
Unique Generated Facts</p>
        <p>New Facts
3
6 9 1It2era1t5ion1#8 21 24 27
(a) Recursion Profile
0 recstep souffle ddlog</p>
        <p>Datalog Systems
(b) Runtime
0 recstep souffle ddlog</p>
        <p>Datalog Systems
(b) Runtime</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>We have recently observed a renewed interest in Datalog. While recent work has significantly
advanced the state-of-the-art of Datalog evaluation techniques, we believe that a systematic
way to help gain a better understanding of these techniques is of great importance. Besides
the recursive computation profiling we propose in the paper, a standard benchmark covering
2.00
1.75
ts1.50
c
a1.25
fF1.00
o0.75
#0.50
0.25
0.00</p>
      <p>Generated Facts )31
Unique Generated Facts(
New Facts itoan
r
e
t
I
t
s
a</p>
      <p>L
8 12Iter1a6tion2#0 24 28 32
(a) Recursion Profile
0 recstep souffle ddlog</p>
      <p>
        Datalog Systems
(b) Runtime
diferent aspects of Datalog workloads is needed. This benchmark should be in analogy to
the online analytical processing benchmarks [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] that have been used for more than two
decades for performance validation of decision support systems. Without a benchmark suite
that covers Datalog workloads with diferent profiles, one can only gain a partial view of the
system performance, which could lead to the lack of essential factors needed during application
deployment (for users) and miss of system design decisions (for system builders).
      </p>
      <p>Our ongoing and future work includes improving the recursive computation profiling by
adding other possible profiling components, building the new high-performance Datalog system
based on the recursive computation profiling, and creating a comprehensive benchmark suite.
Once such a benchmark is available and we have a better understanding of the characteristics of
diferent recursive profiles, we wish to find ways to estimate the recursive profile of a Datalog
workload without actually running it, which we believe will be helpful to think of evaluation
strategies and techniques that provide consistent, eficient execution across Datalog workloads
of diferent characteristics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Albarghouthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koutris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Scaling-up in-memory datalog processing: Observations and techniques</article-title>
          , arXiv preprint arXiv:
          <year>1812</year>
          .
          <volume>03975</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ryzhyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Budiu</surname>
          </string-name>
          , Diferential datalog.,
          <source>Datalog</source>
          <volume>2</volume>
          (
          <year>2019</year>
          )
          <fpage>4</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Scholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Subotić</surname>
          </string-name>
          , T. Westmann,
          <article-title>On fast large-scale program analysis in datalog</article-title>
          ,
          <source>in: Proceedings of the 25th International Conference on Compiler Construction, CC</source>
          <year>2016</year>
          ,
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <year>2016</year>
          , p.
          <fpage>196</fpage>
          -
          <lpage>206</lpage>
          . URL: https://doi.org/10.1145/2892208.2892226. doi:
          <volume>10</volume>
          .1145/2892208.2892226.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shkapsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Interlandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Condie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          ,
          <article-title>Big data analytics with datalog queries on spark</article-title>
          ,
          <source>in: Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2016</year>
          , p.
          <fpage>1135</fpage>
          -
          <lpage>1149</lpage>
          . URL: https://doi.org/10.1145/2882903.2915229. doi:
          <volume>10</volume>
          .1145/ 2882903.2915229.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Acharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Loo</surname>
          </string-name>
          ,
          <article-title>Optimizing declarative graph queries at large scale</article-title>
          ,
          <source>in: Proceedings of the 2019 International Conference on Management of Data</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1411</fpage>
          -
          <lpage>1428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isaacs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          , Diferential dataflow.,
          <source>in: CIDR</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deshmukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Potti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spehlmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Memisoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saurabh</surname>
          </string-name>
          ,
          <article-title>Quickstep: A data platform based on the scaling-up approach</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>663</fpage>
          -
          <lpage>676</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Whaley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Avots</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carbin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <article-title>Using datalog with binary decision diagrams for program analysis</article-title>
          ,
          <source>in: Asian Symposium on Programming Languages and Systems</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gebser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kaminski</surname>
          </string-name>
          , B. Kaufmann, T. Schaub, Clingo= asp+ control:
          <source>Preliminary report, arXiv preprint arXiv:1405.3694</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alviano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Faber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perri</surname>
          </string-name>
          , G. Pfeifer,
          <string-name>
            <surname>G. Terracina,</surname>
          </string-name>
          <article-title>The disjunctive datalog system dlv</article-title>
          ,
          <source>in: International Datalog 2.0 Workshop</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>282</fpage>
          -
          <lpage>301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shkapsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          ,
          <article-title>Scaling up the performance of more powerful datalog systems on multicore machines</article-title>
          ,
          <source>The VLDB Journal</source>
          <volume>26</volume>
          (
          <year>2017</year>
          )
          <fpage>229</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vianu</surname>
          </string-name>
          , Foundations of databases, volume
          <volume>8</volume>
          ,
          <string-name>
            <surname>Addison-Wesley</surname>
            <given-names>Reading</given-names>
          </string-name>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] CloudLab, https://www.cloudlab.us/,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Madduri</surname>
          </string-name>
          ,
          <article-title>Gtgraph: A synthetic graph generator suite</article-title>
          , Atlanta,
          <string-name>
            <surname>GA</surname>
          </string-name>
          , February
          <volume>38</volume>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lattuada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schwarzkopf</surname>
          </string-name>
          , T. Roscoe,
          <article-title>Shared arrangements: practical inter-query sharing for streaming dataflows</article-title>
          , arXiv preprint arXiv:
          <year>1812</year>
          .
          <volume>02639</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Subotić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Scholz</surname>
          </string-name>
          ,
          <article-title>Brie: A specialized trie for concurrent datalog</article-title>
          ,
          <source>in: Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores</source>
          , PMAM'19,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          . URL: https://doi.org/10.1145/3303084.3309490. doi:
          <volume>10</volume>
          .1145/ 3303084.3309490.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bernardino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Furtado</surname>
          </string-name>
          ,
          <article-title>An overview of decision support benchmarks: Tpcds, tpc-h and ssb</article-title>
          ,
          <source>New Contributions in Information Systems and Technologies</source>
          (
          <year>2015</year>
          )
          <fpage>619</fpage>
          -
          <lpage>628</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>