1. Introduction

Towards Better Understanding of the Performance and Design of Datalog Systems

Zhiwei Fan

Sunil Mallireddy

Paraschos Koutris

0 0 University of Wisconsin-Madison , WI , USA

166 180

Recent years have seen a resurgence of interest in the Datalog language and its syntactic extensions from both the industry and academia. Such interest has motivated a line of work to build eficient Datalog systems that support expressing data analytics of diferent types such as program and graph analyses in a concise and simple way. However, besides the performance improvement of diferent systems presented in these works over existing competitors, little understanding has been gained about the property of varying Datalog workloads (i.e., program and data), and the computation resource usage of diferent systems (i.e., memory and CPU utilization), which are crucial for understanding the essence behind the performance diference observed in diferent systems. When such knowledge is gained, clear guidance could be provided for users to choose between diferent Datalog systems depending on their use cases and system builders to improve the existing system or build more eficient new systems. In this paper, we propose the general profiling of the Datalog program evaluation and present the corresponding visualizations. We further discuss the insights gained from the produced visualizations and how these insights shed light on the pros and cons of existing Datalog systems, which further provides guidance on making improvements over the existing system and designing/building more eficient new systems.

eol>Datalog Systems Recursive Computation Benchmarks Profiling

1. Introduction

no description has been provided for the profile of the recursive computation. Here, by profile, we mean information such as the number of iterations, how many facts are produced in every iteration, etc. As shown in RecStep [ 1 ], the relative performance of diferent Datalog systems may not translate across diferent workloads (i.e., a system that performs well on one Datalog program and a particular dataset does not show comparable performance on the others). The lack of a closer look at the recursive computation profiles makes it dificult to analyze and explain the performance diference across diferent systems and workloads. In turn, this makes choosing the best system for applications of interest (for users) and improving existing systems (for system builders) challenging. For example, when attempting to build a new Datalog system, we need to answer questions such as what techniques in existing systems can we leverage? what are the limitations of existing systems? are there Datalog workloads of diferent characteristics that need to be handled diferently? what could be done to improve existing techniques?

To address the aforementioned issue, we argue that a general-purpose recursive computation profiling framework that can provide insights across systems and workloads is needed. In this work, we make the first step towards building such a profiling framework by presenting four important profiling components: recursion profile, runtime, CPU utilization and memory utilization. First, we describe these four components and briefly discuss their importance. Then we present case studies based on the profiling visualizations of Datalog workloads from two application domains: graph analytics and program analysis. As shown in Section 3, there is no single system always being the winner across all Datalog workloads, even within the same application domain. With the help of profiling visualizations, we analyze the causes behind the ineficient executions, extracting insights regarding the proper use cases and limitations of the studied systems (Section 3.3). By analyzing high-level causes (revealed by the profiling components) of the ineficiency exposed by a system, one is able to connect these high-level observations to the specific technical components in the system (e.g., data structures, algorithms), understanding the system limitations better.

In this paper, we focus on single-node systems, mainly looking at three recently published well-documented Datalog engines that are publicly available: RecStep [ 1 ], Soufle [ 3 ], and DDlog [ 2 ]. Soufle is a new recent high-performance Datalog system that is mainly designed for program analysis and uses optimization techniques such as eficient program synthesis, specialized parallel data structures for indexing and compression, and automatic index selection. RecStep is a Datalog engine built on top of an eficient single-node in-memory database called Quickstep[ 7 ], leveraging multiple years of eforts in the advancement of database techniques such as query optimization and eficient parallel query execution. DDlog translates a set of Datalog rules into the corresponding Diferential Dataflow [ 6 ] program that allows for incremental computation. Other Datalog systems such as BigDatalog [ 4 ], BDDBDDB [ 8 ] have been shown to be significantly outperformed by two of the systems (i.e., RecStep, Soufle) studied in this paper [ 1 ] with some notable system limitation and performance issues and therefore are excluded from the study. We do not consider solvers designed for answer-set programming such as clingo [ 9 ] and dlv [ 10 ] as the recent study [ 11 ] shows that they are not capable of handling Datalog workloads in our interested application domains (e.g., graph analytics, program analysis) due to ineficient use of memory. Our study focuses on positive Datalog programs (i.e., without negation) that involve recursion.

We use the profiling functionalities embedded in RecStep [ 1 ], which is able to provide general profiling information of diferent systems on the Datalog workload evaluation that is not tied to a specific system. Although the profiling components presented are fairly simple, they already provide meaningful insights that can aid further system analysis and improvement, which is not possible by looking solely at the performance numbers.

2. Recursive Computation Profiling

Most existing systems evaluate Datalog programs using a type of bottom-up evaluation called semi-naïve evaluation (SN) [ 12 ] either explicitly (e.g., RecStep, Soufle) or implicitly (e.g., DDlog).

At each iteration of the recursive computation, there are three types of facts that are important to consider. Facts of the first type are generated from the evaluation of each recursive rule (generated facts, GF). The generated facts can contain duplicates, so we also need to consider the facts after deduplication, called unique generated facts, UGF. Finally, the set-diference is performed between the unique generated facts and the existing facts to produce the new facts, NF. Some Datalog systems perform the deduplication and set-diference separately (e.g., RecStep), and other systems (e.g., Soufle, DDlog) fuse these two steps into one, often through the maintained indexes built on the IDB relations throughout the whole computation procedure.

As we will see in Section 3, the sizes of these three diferent types of facts in diferent iterations serve as the primary fingerprint of the Datalog workload and help better understand the behavior of various systems along with other profiling information such as runtime and resource usage. Next, we briefly discuss a few major components for recursive computation profiling. We note that these components are not the only ones to look at when analyzing the system performance on varying Datalog workloads. When being available, additional information such as the size of input data, and hot code paths could be useful and provide additional insights. Recursion Profile The sizes of facts of three diferent types in each iteration of the recursive computation characterize the Datalog workload, 1e8 wwtseaguirrzhesaioeteituiacistvonlhoewondlfcf,yafoDlG,ayencasatFstt,lsGihaasUoalFrtofsgvGgteeohFfrg,eG,uaaaEFUlpnesDGdpsbB)eFeN/catiiwnF≥nficdperD,ueeUaastnaGnptsraGdepFellceFoaNtctgiFiivoficpen≥drlsaoya)n.gt.NdadrNAeasFnUotemotetGeat((eFici.t..hehetIh..ain,,tet-- ftscFoa#0123451 2UNGneewinqeuFr3eaatcGeIttesdenrFae4atricaottnsed#5Facts 6 7iI()tttrsLoaaen7 indicates that many facts were produced multiple times in the same iteration, while a large gap between UGF and NF indicates that many Figure 1: Recursive Profile of TC-G 10k facts have already been produced in previous iterations. Note that NF also indicates the amount of work to be done during the next iteration in SN. As an example, Figure 1 is the recursive profile showing the sizes of facts of three types across seven iterations during SN of transitive closure evaluated on G10 dataset, which will be analyzed in detail in Section 3. Runtime Runtime is probably the most straightforward performance measure of diferent systems on a given workload. The runtime of many existing Datalog systems can be divided into compilation time and evaluation time. Systems such as Soufle and DDlog first generate the code given the input program, followed by compilation-level optimizations and executable binary generation. The overhead induced by code generation and compilation can be safely ignored, assuming that the generated executable files will be used repetitively later with diferent inputs. However, such an assumption might not always hold and may not be acceptable in circumstances where the overhead far exceeds the evaluation time. Thus, it is crucial to have access to a clear view of runtime breakdown when considering a specific application. CPU Utilization Like other data-parallel compute engines, recent Datalog systems [ 1, 2, 3, 11 ] exploit the parallelism packed inside modern servers to achieve high performance and scalability. However, achieving consistent high CPU eficiency and utilization across diferent workloads is challenging. Low performance could occur due to either low CPU eficiency (suggesting that the system might handle more work than necessary), or low CPU utilization (meaning that the system does not utilize multiple CPU cores well).

Memory Consumption Many recent works focus on building in-memory Datalog systems [ 1, 2, 3, 4 ]. However, most of them either ignore the evaluation of memory utilization [ 4 ] or miss the comparison with other existing Datalog systems [ 3, 2 ]. Since most of the evaluations presented in these works are standalone (i.e., a system only evaluates one workload at a time in a server without interference), the lack of understanding of the memory footprint makes it hard to choose the proper hardware (e.g., a server with small or large memory), estimate the scalability of a Datalog program (e.g., the maximum dataset the system can handle), and the applicability (e.g., whether concurrent evaluation is feasible or not).

3. Case Studies

We next present the Datalog programs that arise from graph analytics (TC, SG, REACH) and program analysis (AA, CSPA, CSDA) followed by the case studies of the corresponding Datalog workloads. These Datalog programs are supported by all three systems of interest in this paper. We first look at the linear recursive Datalog programs (TC, SG, REACH, CSDA), in which each program consists of one non-recursive rule and one linear recursive rule (i.e., the rule body contains only one recursive IDB predicate). Then, we study the Datalog workloads of non-linear recursive programs (AA, CSPA) each of which contains at least one recursive rule that has more than one recursive IDB predicate in the rule body. As we will see from the recursive computation profiles of these workloads, even when two Datalog programs look very similar, the relative performance of diferent systems can be very diferent. This happens when two programs are from the same domain (e.g., TC, REACH) or diferent application domains (e.g., REACH, CSDA).

All experiments are conducted on a bare-metal server in Cloudlab [ 13 ], a large cloud infrastructure. The server runs Ubuntu 18.04 LTS and has two Intel Xeon E5-2660 v3 2.60 GHz (Haswell EP) processors. Each processor has 10 cores, and 20 hyper-threading hardware threads. The server has 160GB memory and each NUMA node is directly attached to 80GB of memory. We only consider the CPU and memory utilization of the systems during their actual execution period and thus the time period used for code generation and compilation is excluded for CPU and memory profiling. Information about the input and output is summarized in Table 1.

3.1. Simple Linear Recursion Transitive Closure (TC):

Same Generation (SG): tc(X, Y) :- arc(X, Y).

tc(X, Y) :- tc(X, Z), arc(Z, Y). sg(X, Y) :- arc(P, X), arc(P, Y), X ̸= Y. small (< 1MB), large intermediate results (i.e., three types of facts) as shown in Figure 2a and Figure 3a are generated. We observe the following features across all recursion profiles for the above tasks: () the number of iterations is relatively small, () there is a large gap between GF and UGF , which suggests eficient deduplication is critical to the overall good performance, and () the gap between UGF and NF is small or non-existent in most cases.

For TC and SG, while all three systems have relatively high CPU utilization throughout the evaluation (Figures 2c, 3c), the runtime varies. Besides the relatively long compilation time of DDlog (∼ 200), DDlog’s evaluation time is about 2 − 3 longer than that of the other two competitors (i.e., Soufle and RecStep), using a large amount of memory (Figures 2c, 3c). The performance number of DDlog is significantly worse than that of its runtime primitive Diferential Dataflow [ 6 ] as shown in [ 15 ]. The ineficiency of DDlog could be attributed to the fact that its design heavily focuses on incremental computation (e.g., maintaining intermediate states of large sizes, separate management of existing computation and monitoring new input, etc), sacrificing the performance for batch processing.

RecStep uses significantly more memory (Figures 2d, 3d) on the small input datasets (i.e., G10 and G5) compared to other workloads (Figures 4d-12d), the sizes of input datasets of which vary from 22MB to 1.7GB. This is because RecStep performs deduplication as a separate step, relying on its backend in-memory relational database QuickStep [ 7 ], which pre-allocates the memory to the hash table for deduplication based on the size of the generated facts. Due to the memory ineficiency observed in such cases, RecStep soon runs out of memory when evaluating TC and SG on graphs with a large number of vertices. At the same time, the edge inclusion probability remains the same. We run RecStep using its default interpretation mode without the specialized parallel bit-matrix evaluation (PBME) designed for dense graphs of small vertices. While PBME [ 1 ] is an eficient technique to address this issue, it is specifically designed for graphs with a relatively small number of vertices, and its generality is limited.

In contrast, Soufle has acceptable overhead from the code generation and compilation (∼ 10), showing the overall best performance on TC-G10 and SG-G5 with a small memory footprint mainly thanks to its specialized parallel data structure Brie [ 16 ] for relation storage and indexing, which provides good compression capability for high-density relations of large sizes and support of eficient parallel operations (e.g., insertion, lookup).

Reachability (REACH):

Context-Sensitive Dataflow Analysis (CSDA): reach(Y) :- id(Y). reach(Y) :- reach(X), arc(X, Y). null(X, Y) :- nullEdge(X, Y). null(X, Y) :- null(X, W), arc(W, Y). 100 )%80 ( eg60 a ten40 c rPe20 00 0 recstep souffle ddlog

Datalog Systems (b) Runtime

We run REACH on orkut, a relatively large real-world online social network dataset in which the friendship of users is represented as edges. REACH finds the friends of a given set of user ids. Figure 4 is the recursive computation profile of REACH- orkut and we observe that the relative system performance looks quite diferent from what is observed on TC-G 10 and SG-G5. RecStep significantly outperforms Soufle and DDlog and the reasons are two-fold: () GF is relatively small across just a few number of total iterations, resulting in the negligible overhead of the separate deduplication step and overall eficiency of RecStep () RecStep utilizes CPU much more eficiently compared to Soufle and DDlog, which sufer from the long warm-up phase (e.g., index building) due to the much larger EDB/input sizes. Systems such as Soufle using a specialized data structure for indexing (i.e., Brie), may sufer from much bigger index building overhead if the characteristic of indexed data is adversarial to the property of the indexing data structure. For REACH on relatively sparse large real-world social graphs, we observe larger indexing overhead in Soufle as the input graph size increases (e.g., livejounral → orkut → twitter in Table 1), which results in larger relative performance gap between Soufle and RecStep when the total number of iterations of recursive computation remains small (e.g., Figures 5a, 4a, 5c). The observation suggests that the input data itself serves as an important profiling component sometimes and should be considered together with other profiling components (e.g., recursion profile, CPU utilization). 1e5

) Generated Facts 787 Unique Generated Facts ( New Facts itoan r e t I t s a

L 100 200 300 400 500 600 700 800

Iteration # (a) Recursion Profile 0 recstep souffle ddlog

Datalog Systems (b) Runtime 100

200Time30(s0) (c) CPU Utilization

100 200Time30(s0) 400 (d) Memory Consumption 500

CSDA can be seen as a variant of transitive closure: first a set of null edges is given to initialize the non-recursive rule, and then the linear-recursive rule looks the same as that of TC. However, the recursive computation profiles of CSDA and TC workloads are very diferent. We observe that RecStep performs significantly worse compared to Soufle and DDlog (Figures 6b, 7b, 8b) while its CPU utilization over time is lower compared to Soufle and DDlog (Figures 6c, 7c, 8c). Comparing with TC-G5 and REACH-orkut, the CSDA workloads on linux, postgresql and httpd have a very long tail in their recursion profiles (Figures 6a, 7a, 8a): it takes a large number of iterations for the evaluation to reach the fixpoint , and most of the work is performed during the ifrst few iterations. After further digging, we have confirmed that RecStep’s poor performance is mainly due to the lack of continuously maintained indexes throughout the program evaluation that its competitors Soufle and DDlog have. This forces RecStep to reconstruct hash tables and recstep souffle ddlog 400 500 )12 (%10 e g8 a t n6 e rc 4 e P2 0 ) 1 2 7 ( n o i t

UNGneewinqeuFreaatcGetesdnFearcattsed Facts IttrsLaae 80 160 240 320 400 480 560 640 720

Iteration # (a) Recursion Profile 0 recstep souffle ddlog

Datalog Systems (b) Runtime repeatedly probe the input table (i.e., arc) for joins in every iteration. The resulting overhead accumulates across iterations, leading to poor CPU utilization and eficiency when the Datalog workloads have very long-tail recursion profiles. This observation shows the necessity of continuously maintained indexes for consistent eficient recursive Datalog program execution in these cases.

3.2. Non-linear Recursion Andersen’s Analysis (AA):

pointsTo(Y, X) :- addressOf(Y, X). pointsTo(Y, X) :- assign(Y, Z), pointsTo(Z, X). pointsTo(Y, W) :- load(Y, X), pointsTo(X, Z), pointsTo(Z, W).

pointsTo(Z, W) :- store(Y, X), pointsTo(Y, Z), pointsTo(X, W).

Context-Sensitive Points-To Analysis (CSPA): valueFlow(Y, X) :- assign(Y, X). valueFlow(X, X) :- assign(X, Y). valueFlow(X, X) :- assign(Y, X). valueFlow(X, Y) :- assign(X, Z), memoryAlias(Z, Y).

valueFlow(X, Y) :- valueFlow(X, Z), valueFlow(Z, Y). valueAlias(X, Y) :- valueFlow(Z, X), valueFlow(Z, Y).

valueAlias(X, Y) :- valueFlow(Z, X), memoryAlias(Z, W), valueFlow(W, Y). memoryAlias(X, X) :- assign(Y, X). memoryAlias(X, X) :- assign(X, Y).

memoryAlias(X, W) :- dereference(Y, X), valueAlias(Y, Z), dereference(Z, W).

Switching from linear recursion to non-linear recursion, we observe that RecStep significantly outperforms Soufle and DDlog (Figure 9b) for Andersen’s Analysis evaluated on the largest input dataset (∼ 1.2G) used in [ 1 ]. Figure 9c shows that both Soufle and RecStep have a long warm-up time in which the CPU utilization is very low. The reason is similar to that observed in the evaluation of REACH on large graphs - since there are four EDB/input relations in AA, the total size of which is relatively large, more preprocessing work for Soufle and DDlog (e.g., index construction) is needed.

Since the size of generated facts is relatively small across diferent iterations, RecStep evaluates AA eficiently while using only a small amount of memory (Figure 9d). Additionally, AA has nonlinear-recursive rules, and all rules in AA derive the facts for the same relation (i.e., pointsTo), in which case RecStep is able to fully utilize CPU and evaluate all rules in parallel.

The Datalog program itself alone is insuficient to characterize the recursive computation. For CSPA, the recursive computation profiles of three systems on linux dataset look very diferent from the ones on postgresql and httpd datasets. Interestingly, we can see that similar recursive profiles (Figure 11a and Figure 12a) come along with similar profiling information on runtime (Figure 11b and Figure 12b), CPU (Figure 11c and Figure 12c) and memory (Figure 11d and Figure 12d). DDlog’s performance seems to be very sensitive to the sizes of the intermediate results: when there are large number of facts being generated in several iterations, DDlog recstep souffle ddlog )10 ( 8 % e g a6 t n ce4 r e P2 0

Generated Facts )24 Unique Generated Facts ( New Facts itoan r e t I t s a

L 6 9 Ite1ra2tion15# 18 21 24 (a) Recursion Profile turns out to require a great amount of memory to maintain the intermediate states (Figure 11d and Figure 12d) and the corresponding overhead also afects the overall performance greatly (i.e., DDlog is outperformed by RecStep and Soufle as shown in Figure 11b and Figure 12b). Such inference can be additionally strengthened by looking at Figure 10, in which Figure 10a shows fewer facts are generated during the iterative evaluation and DDlog show better relative performance over RecStep in Figure 10b while using considerably less memory as shown in Figure 10d.

3.3. Discussion

The case studies above show how we can leverage the recursive computation profiling components described in Section 2 to understand better the performance diference between diferent Datalog systems on a given Datalog workload. This also allows us to identify the limitation of existing systems. For example, DDlog is unsuitable for batch-processing on the workload that generates intermediate results of large sizes but might be of proper use when GF is small. Soufle performs well primarily because of the heavy optimization of indexes, but indexing could become a bottleneck in some cases (Figure 4b and Figure 9b). The large performance gap between RecStep and the other two Datalog systems observed on CSDA reveals the importance of continuously maintained indexes to good overall performance. Finally, the large memory consumption of RecStep observed on TC-G10 and SG-G5 suggests such indexes should also have good compression capability for memory eficiency.

Generated Facts Unique Generated Facts

New Facts 3 6 9 1It2era1t5ion1#8 21 24 27 (a) Recursion Profile 0 recstep souffle ddlog

Datalog Systems (b) Runtime 0 recstep souffle ddlog

Datalog Systems (b) Runtime

4. Conclusion and Future Work

We have recently observed a renewed interest in Datalog. While recent work has significantly advanced the state-of-the-art of Datalog evaluation techniques, we believe that a systematic way to help gain a better understanding of these techniques is of great importance. Besides the recursive computation profiling we propose in the paper, a standard benchmark covering 2.00 1.75 ts1.50 c a1.25 fF1.00 o0.75 #0.50 0.25 0.00

Generated Facts )31 Unique Generated Facts( New Facts itoan r e t I t s a

L 8 12Iter1a6tion2#0 24 28 32 (a) Recursion Profile 0 recstep souffle ddlog

Datalog Systems (b) Runtime diferent aspects of Datalog workloads is needed. This benchmark should be in analogy to the online analytical processing benchmarks [ 17 ] that have been used for more than two decades for performance validation of decision support systems. Without a benchmark suite that covers Datalog workloads with diferent profiles, one can only gain a partial view of the system performance, which could lead to the lack of essential factors needed during application deployment (for users) and miss of system design decisions (for system builders).

Our ongoing and future work includes improving the recursive computation profiling by adding other possible profiling components, building the new high-performance Datalog system based on the recursive computation profiling, and creating a comprehensive benchmark suite. Once such a benchmark is available and we have a better understanding of the characteristics of diferent recursive profiles, we wish to find ways to estimate the recursive profile of a Datalog workload without actually running it, which we believe will be helpful to think of evaluation strategies and techniques that provide consistent, eficient execution across Datalog workloads of diferent characteristics.

[1]

Fan ,

Zhu ,

Zhang ,

Albarghouthi ,

Koutris ,

Patel , Scaling-up in-memory datalog processing: Observations and techniques , arXiv preprint arXiv: 1812 . 03975 ( 2018 ).

[2]

Ryzhyk ,

Budiu , Diferential datalog., Datalog 2 ( 2019 ) 4 - 5 .

[3]

Scholz ,

Jordan ,

Subotić , T. Westmann, On fast large-scale program analysis in datalog , in: Proceedings of the 25th International Conference on Compiler Construction, CC 2016 , Association for Computing Machinery , New York, NY, USA, 2016 , p. 196 - 206 . URL: https://doi.org/10.1145/2892208.2892226. doi: 10 .1145/2892208.2892226.

[4]

Shkapsky ,

Yang ,

Interlandi ,

Chiu ,

Condie ,

Zaniolo , Big data analytics with datalog queries on spark , in: Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16 , Association for Computing Machinery, New York, NY, USA, 2016 , p. 1135 - 1149 . URL: https://doi.org/10.1145/2882903.2915229. doi: 10 .1145/ 2882903.2915229.

[5]

Zhang ,

Acharya ,

Chen ,

Arora ,

Chen ,

Liu ,

B. T.

Loo , Optimizing declarative graph queries at large scale , in: Proceedings of the 2019 International Conference on Management of Data , 2019 , pp. 1411 - 1428 .

[6]

McSherry ,

D. G.

Murray ,

Isaacs ,

Isard , Diferential dataflow., in: CIDR , 2013 .

[7]

J. M.

Patel ,

Deshmukh ,

Zhu ,

Potti ,

Zhang ,

Spehlmann ,

Memisoglu ,

Saurabh , Quickstep: A data platform based on the scaling-up approach , Proceedings of the VLDB Endowment 11 ( 2018 ) 663 - 676 .

[8]

Whaley ,

Avots ,

Carbin ,

M. S.

Lam , Using datalog with binary decision diagrams for program analysis , in: Asian Symposium on Programming Languages and Systems , Springer, 2005 , pp. 97 - 118 .

[9]

Gebser ,

Kaminski , B. Kaufmann, T. Schaub, Clingo= asp+ control: Preliminary report, arXiv preprint arXiv:1405.3694 ( 2014 ).

[10]

Alviano ,

Faber ,

Leone ,

Perri , G. Pfeifer, G. Terracina, The disjunctive datalog system dlv , in: International Datalog 2.0 Workshop , Springer, 2010 , pp. 282 - 301 .

[11]

Yang ,

Shkapsky ,

Zaniolo , Scaling up the performance of more powerful datalog systems on multicore machines , The VLDB Journal 26 ( 2017 ) 229 - 248 .

[12]

Abiteboul ,

Hull ,

Vianu , Foundations of databases, volume 8 , Addison-Wesley

Reading

, 1995 .

[13] CloudLab, https://www.cloudlab.us/, 2018 .

[14]

D. A.

Bader ,

Madduri , Gtgraph: A synthetic graph generator suite , Atlanta, GA , February 38 ( 2006 ).

[15]

McSherry ,

Lattuada ,

Schwarzkopf , T. Roscoe, Shared arrangements: practical inter-query sharing for streaming dataflows , arXiv preprint arXiv: 1812 . 02639 ( 2018 ).

[16]

Jordan ,

Subotić ,

Zhao ,

Scholz , Brie: A specialized trie for concurrent datalog , in: Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores , PMAM'19, Association for Computing Machinery, New York, NY, USA, 2019 , p. 31 - 40 . URL: https://doi.org/10.1145/3303084.3309490. doi: 10 .1145/ 3303084.3309490.

[17]

Barata ,

Bernardino ,

Furtado , An overview of decision support benchmarks: Tpcds, tpc-h and ssb , New Contributions in Information Systems and Technologies ( 2015 ) 619 - 628 .