Efficient Model Construction for Horn Logic with VLog: Extended Abstract

Jacopo Urbani1, Markus Krötzsch2, Ceriel Jacobs1, Irina Dragoste2, and David Carral2

1 Vrije Universiteit Amsterdam, The Netherlands, firstname@cs.vu.nl
2 cfaed, TU Dresden, Germany, firstname.lastname@tu-dresden.de

The full version of this paper was published at IJCAR 2018 [12].

Horn ontologies consisting of existential rules are used in fields ranging from reasoning over knowledge graphs [7] and Description Logics (DL) ontologies [5,6] to data integration [4] and social network analysis [10]. To solve conjunctive query answering over such logical theories, we can apply the chase algorithm: a sound and complete (albeit potentially non-terminating) bottom-up materialisation procedure in which all relevant consequences are precomputed, so that queries can be evaluated directly over the materialised sets of facts.

As our main contribution, we extend the in-memory Datalog engine VLog [11] to support Horn existential rules without equality (a fragment that encompasses Horn-SRI in terms of expressivity). Specifically, we implement the skolem and the restricted variants of the chase on VLog's architecture. In the skolem chase, rules are replaced by their skolemisation. In the restricted chase, new terms are introduced during reasoning only if already derived terms and facts cannot be reused to satisfy the corresponding existential restriction. The latter terminates in many more cases than the former [2,3] and often produces smaller models, but its termination depends on the rule application order, and implementing it requires checks for the reusability of existing values.

We implement a slightly different version of the restricted chase, which terminates in even more cases [3], by prioritising the exhaustive application of Datalog rules (rules without existentially quantified variables); a small sketch of this strategy is given below. This enables facts derived from Datalog rules to satisfy some existential restrictions that would otherwise lead to non-termination.

In our implementation, we exploit the highly memory-efficient architecture of VLog, which is based on columnar storage: instead of storing a list of tuples (rows), the data is organised as a tuple of columns (value lists). The columns are ordered lexicographically, enabling fast merge joins and duplicate elimination, as well as data compression schemes for low memory usage. Because updates are slow in columnar tables, VLog operates in append-only mode, applying one rule per materialisation step and creating separate tables for the derived facts. To reduce redundant derivations, VLog uses semi-naive evaluation, which only considers rule body matches that were not found up to the previous application of the same rule. We adopted the 1-parallel-restricted chase optimisation [1], in which the facts derived in the ongoing chase step are not checked for value reusability.
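To make the Datalog-first restricted chase concrete, the following minimal Python sketch is our own illustration under simplifying assumptions (safe rules, single-threaded naive matching, no indexes); VLog itself is a C++ engine, and all identifiers below are ours. Facts are tuples ("pred", t1, ..., tn), variables are capitalised strings, and a rule is a triple (body, head, existential_vars).

    from itertools import count

    fresh = count()  # supply of fresh labelled nulls for existential variables

    def is_var(term):
        return isinstance(term, str) and term[:1].isupper()

    def matches(atoms, facts, subst):
        """Enumerate substitutions extending `subst` that map all `atoms` into `facts`."""
        if not atoms:
            yield subst
            return
        pred, *args = atoms[0]
        for fact in facts:
            if fact[0] != pred or len(fact) != len(atoms[0]):
                continue
            s = dict(subst)
            if all((s.setdefault(a, v) == v) if is_var(a) else (a == v)
                   for a, v in zip(args, fact[1:])):
                yield from matches(atoms[1:], facts, s)

    def instantiate(head, s):
        return {(a[0], *(s.get(t, t) for t in a[1:])) for a in head}

    def chase(facts, rules):
        facts = set(facts)
        datalog = [r for r in rules if not r[2]]     # no existential variables
        existential = [r for r in rules if r[2]]
        while True:
            # Phase 1: apply Datalog rules exhaustively before any existential rule.
            changed = True
            while changed:
                changed = False
                for body, head, _ in datalog:
                    derived = set()
                    for s in matches(body, facts, {}):
                        derived |= instantiate(head, s)
                    if derived - facts:
                        facts |= derived
                        changed = True
            # Phase 2: one parallel round of existential rules. The reusability
            # check only sees `facts`, not the facts derived in this same round,
            # mirroring the 1-parallel-restricted chase of [1].
            round_facts = set()
            for body, head, exvars in existential:
                for s in matches(body, facts, {}):
                    # Restricted chase: skip if some extension of s already
                    # satisfies the head; only otherwise invent labelled nulls.
                    if next(matches(head, facts, s), None) is not None:
                        continue
                    t = {**s, **{v: f"_:n{next(fresh)}" for v in exvars}}
                    round_facts |= instantiate(head, t)
            if round_facts <= facts:
                return facts                         # fixed point: chase terminated
            facts |= round_facts

On a toy example, the reusability check is what secures termination:

    rules = [
        # bicycle(X) -> exists Y. hasPart(X, Y), wheel(Y)
        ([("bicycle", "X")], [("hasPart", "X", "Y"), ("wheel", "Y")], ["Y"]),
        # wheel(Y) -> exists X. hasPart(X, Y), bicycle(X)
        ([("wheel", "Y")], [("hasPart", "X", "Y"), ("bicycle", "X")], ["X"]),
    ]
    print(chase({("bicycle", "b1")}, rules))
    # terminates with 3 facts: the second rule's head is satisfied by reusing
    # existing facts, whereas the skolem chase would keep inventing new terms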
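The columnar layout itself can be pictured with a second sketch (again ours, and heavily simplified: VLog's real tables add compression, caching, and multi-way joins). A binary relation is stored as parallel, lexicographically sorted columns, so equal join keys form contiguous blocks that a merge join can combine in a single pass over both tables:

    class ColumnTable:
        """A binary relation stored column-wise, sorted lexicographically.
        Sorting enables merge joins and makes duplicate elimination a
        by-product of construction."""
        def __init__(self, rows):
            rows = sorted(set(rows))
            self.col0 = [r[0] for r in rows]   # first attribute, one value list
            self.col1 = [r[1] for r in rows]   # second attribute, a parallel list

        def __len__(self):
            return len(self.col0)

    def merge_join(r, s):
        """Join r(x, y) with s(x, z) on x in one pass over both sorted tables."""
        out, i, j = [], 0, 0
        while i < len(r) and j < len(s):
            if r.col0[i] < s.col0[j]:
                i += 1
            elif r.col0[i] > s.col0[j]:
                j += 1
            else:
                key = r.col0[i]
                i_end = i
                while i_end < len(r) and r.col0[i_end] == key:
                    i_end += 1
                j_end = j
                while j_end < len(s) and s.col0[j_end] == key:
                    j_end += 1
                # emit the cross product of the two equal-key blocks
                for p in range(i, i_end):
                    for q in range(j, j_end):
                        out.append((key, r.col1[p], s.col1[q]))
                i, j = i_end, j_end
        return out

    hasPart = ColumnTable([("b1", "w1"), ("b1", "w2"), ("b2", "w3")])
    owner = ColumnTable([("b1", "alice"), ("b2", "bob")])
    print(merge_join(hasPart, owner))
    # [('b1', 'w1', 'alice'), ('b1', 'w2', 'alice'), ('b2', 'w3', 'bob')]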
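Finally, semi-naive evaluation restricts each round of rule application to body matches involving at least one fact from the previous round's delta. A textbook instance (illustrative only, not VLog's code) for the transitive-closure rule reach(x, z) <- edge(x, y), reach(y, z):

    def transitive_closure(edges):
        """Semi-naive evaluation of reach(x, z) <- edge(x, y), reach(y, z)."""
        reach = set(edges)      # reach initialised with the edge facts
        delta = set(edges)      # facts that are new as of the last round
        while delta:
            # Only join against the delta: matches built purely from old facts
            # were already found in an earlier round and would be duplicates.
            derived = {(x, z) for (x, y) in edges for (y2, z) in delta if y == y2}
            delta = derived - reach
            reach |= delta
        return reach

    print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
    # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

In VLog, the same idea operates per rule over the append-only columnar tables: each materialisation step only considers matches that involve tables created since the rule's previous application.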
We evaluate our implementation using existential rule programs from a recent (skolem and restricted) chase benchmark [1]. In addition, we use rules obtained by translating data-rich, real-world OWL ontologies (UOBM, Reactome, and Uniprot). The test data involves programs with millions of facts and thousands of rules, and predicates with relatively large arities (up to 11). We test increasing partitions of data for the same rule sets. In each case, we run the skolem and the restricted chase on a commodity laptop, and measure time and peak memory usage.

We compare the results against RDFox [8,9], the leading system in the chase benchmark. Like VLog, RDFox is a performance-oriented in-memory chase engine, but it is designed with the goal of enabling highly parallel processing rather than memory efficiency. RDFox does not prioritise Datalog rules and hence may terminate in fewer cases than VLog, but this does not affect our experiments.

Fig. 1. Memory usage (left) and materialisation time (right) for VLog and RDFox for the restricted chase

The results for the restricted chase are shown in Figure 1. Both VLog and RDFox require significantly less time and memory for the restricted chase than for the skolem chase, with only a few exceptions (deep-100, deep-200, and Ontology-256 require slightly less time and memory in the skolem chase). These findings further motivate the use of the restricted chase. On average, VLog uses only 40% of the memory required by RDFox, presumably due to its compressed data structures. In two cases, RDFox runs out of memory while VLog terminates (using OS swap space in only one case). Regarding time, VLog takes between 5.8% and 137.5% of the time needed by RDFox. RDFox outperforms VLog for only two ontologies, UOBM and doctors. This result is remarkable, since VLog ran on a single thread, while RDFox used maximal parallelism (8 cores), in some cases peaking at over 700% of CPU usage.

To summarise, we provide the first implementation of the chase on a columnar architecture, by extending VLog. Our tool is more memory-efficient than the state of the art while being comparatively fast, even on a single thread. VLog is free and open-source. It can be used in two ways: as a command-line client with an optional web interface (VLog core reasoner: https://github.com/karmaresearch/vlog), and through a Java API (VLog4j: https://github.com/knowsys/vlog4j, also available via Maven) with additional functionality (translating OWL files to rules, executing SPARQL queries against remote endpoints, integrating RDF data, ...). We hope our tool proves useful for the community and encourages a broader adoption of the expressive language of existential rules.

Acknowledgements. This work was supported by the DFG within the cfaed Cluster of Excellence, CRC 912 (HAEC), and Emmy Noether grant KR 4381/1-1.

References

1. Benedikt, M., Konstantinidis, G., Mecca, G., Motik, B., Papotti, P., Santoro, D., Tsamoura, E.: Benchmarking the chase. In: Proc. 36th Symposium on Principles of Database Systems (PODS’17). pp. 37–52. ACM (2017)
2. Carral, D., Dragoste, I., Krötzsch, M.: Restricted chase (non)termination for existential rules with disjunctions. In: Sierra, C. (ed.) Proc. 26th Int. Joint Conf. on Artificial Intelligence (IJCAI’17). pp. 922–928. IJCAI (2017)
3. Carral, D., Feier, C., Hitzler, P.: A practical acyclicity notion for query answering over Horn-SRIQ ontologies. In: Proc. 15th Int. Semantic Web Conf. (ISWC’16). LNCS, vol. 9981, pp. 70–85 (2016)
4. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theoretical Computer Science 336(1), 89–124 (2005)
5. Kazakov, Y.: Consequence-driven reasoning for Horn SHIQ ontologies. In: Proc. 21st Int. Joint Conf. on Artificial Intelligence (IJCAI’09). pp. 2040–2045. IJCAI (2009)
6. Kazakov, Y., Krötzsch, M., Simančík, F.: The incredible ELK: From polynomial procedures to efficient reasoning with EL ontologies. J. of Automated Reasoning 53, 1–61 (2013)
7. Marx, M., Krötzsch, M., Thost, V.: Logic on MARS: Ontologies for generalised property graphs. In: Sierra, C. (ed.) Proc. 26th Int. Joint Conf. on Artificial Intelligence (IJCAI’17). pp. 1188–1194. IJCAI (2017)
8. Motik, B., Nenov, Y., Piro, R., Horrocks, I.: Combining rewriting and incremental materialisation maintenance for Datalog programs with equality. In: Yang, Q., Wooldridge, M. (eds.) Proc. 24th Int. Joint Conf. on Artificial Intelligence (IJCAI’15). pp. 3127–3133. AAAI Press (2015)
9. Motik, B., Nenov, Y., Piro, R., Horrocks, I., Olteanu, D.: Parallel materialisation of Datalog programs in centralised, main-memory RDF systems. In: Proc. 28th AAAI Conf. on Artificial Intelligence (AAAI’14). pp. 129–137. AAAI Press (2014)
10. Seo, J., Guo, S., Lam, M.S.: SociaLite: an efficient graph query language based on Datalog. IEEE Trans. Knowl. Data Eng. 27(7), 1824–1837 (2015)
11. Urbani, J., Jacobs, C., Krötzsch, M.: Column-oriented Datalog materialization for large knowledge graphs. In: Proc. 30th AAAI Conf. on Artificial Intelligence (AAAI’16). pp. 258–264. AAAI Press (2016)
12. Urbani, J., Krötzsch, M., Jacobs, C.J.H., Dragoste, I., Carral, D.: Efficient model construction for Horn logic with VLog: system description. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) Proc. 9th Int. Joint Conf. on Automated Reasoning (IJCAR’18). LNCS, vol. 10900, pp. 680–688. Springer (2018)