Common Logic for an RDF Store

Robert MacGregor, Ph.D.
Franz Inc.
Oakland, CA
bob.macgregor@gmail.com

Abstract — The advent of commercial tools that support reasoning and management of RDF data stores provides a robust base for the growth of Semantic Web applications. There is as yet no analogous set of tools and products to support advanced logic-based applications. This article examines issues that arise when seeking to combine the expressive power of Common Logic with the scalability of an RDF store.

Index Terms — Common Logic, RDF, Semantic Web, higher-order logic.

I. INTRODUCTION

Franz Inc is researching the possibility of implementing a Common Logic [1] parser and query processor for our RDF [2] store, AllegroGraph. There is very wide scope for how large a subset of the language is implemented and, more significantly, for what kinds of reasoning are supported. Here we discuss several of the issues and possibilities.

A primary goal for marrying Common Logic (CL) to AllegroGraph is to achieve scalable logical inference over a dynamic fact base. We believe the AllegroGraph infrastructure is well suited for crafting a scalable reasoner; however, expressive logics are inherently non-scalable, and there are severe trade-offs that must be made to achieve reasoning over billions of statements.

As part of the presentation we will discuss our proposal for implementation and requirements for the intelligence community. We will also discuss a number of the trade-offs, as described below.

II. COMMON LOGIC IMPLEMENTATION

A full-fledged implementation would include the following features:

(i) A user-friendly query language, based on CL, that supports arbitrary boolean expressions in the "where" clause (here, we are imagining augmenting a SQL-like select-from-where syntax; a sketch appears at the end of this section).

(ii) Optional support for logic operators that make the closed-world assumption and the unique-identifier assumption. We include these because classical negation and universal quantification operators are inherently non-scalable.

(iii) A CL-based rule language.

(iv) CL definitions.

(v) Extensible operators.

Just having a CL-based query language would be a big improvement on the current state of RDF-based query tools. Unlike SPARQL (introduced below), it would have a "real" (model-theoretic) semantics, it would have a clean syntax that assumes a calculus-like rather than an algebraic formulation of clauses, it would be expressive, and it wouldn't break your fingers when you type it.

The inclusion of a definition language that subsumes OWL (easy) would allow for a calculus that spans the range of RDF-based languages with a single syntax.

There are several "sweet spots" that could be supported; a sweet spot being a language subset that supports sound and complete reasoning while being at least moderately scalable. We will note these as we examine various trade-offs.
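To make item (i) concrete, the following sketch shows the kind of select-from-where query we have in mind, written in the KIF-like notation used for the examples later in this article. The surface syntax and the predicates (works-on, based-at, located-in) are purely illustrative assumptions, not a committed design:

   (select (?person ?project)
     (where (and (works-on ?person ?project)
                 (or (located-in ?project Oakland)
                     (exists (?site)
                       (and (based-at ?person ?site)
                            (located-in ?site Oakland)))))))

The point of the sketch is that the "where" clause is an arbitrary boolean CL expression, freely mixing conjunction, disjunction, and explicit quantification.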
III. A BASELINE SYSTEM

The baseline is a simple query language operating over atomic ground assertions. Here, we explicitly exclude the possibility of rules or definitions. Such a language would allow arbitrary CL expressions to be evaluated against the fact base. However, both universal quantifiers and classical negation operators will always evaluate to false (or unknown) in this scheme; the former due to the absence of any kind of "closure" operator, and the latter due (notionally) to the absence of statements that can logically contradict one another.

The simplest way to "achieve closure" is to admit operators that define local closed-world assumptions. For example, a negation-as-failure operator assumes that the facts within the scope of a single query are situationally complete. A closed-world universal quantifier makes a similar assumption. Both of these operators are completely scalable. In other words, we can add them without reducing our expectations of how scalable our language is. The strategy of embedding the closure within a language operator, rather than, say, within a predicate, minimizes the scope of the closures, and allows both open- and closed-world reasoning to be applied to the same model.

Alternatively, one could permit assertion of OWL-like operators (e.g., max-cardinality and all-values-from) to achieve closure. Introducing these operators immediately eliminates the possibility of scalable reasoning, or complete reasoning, or both. Here, we are discounting databases where large numbers of asserted and derived facts have been laboriously loaded and then compiled, yielding a base that melts on the first update; a "dynamic" scalable application will support real-time updates as a matter of course.

Our baseline would not be complete without a transitivity operator. Transitive closure is the most important of all the classes of inference. Simplest would be to include the equivalent of an 'owl:TransitiveProperty' declaration, but practical experience has shown that the addition of a transitive closure operator to the query language syntax has important benefits. Specifically, it is useful to be able to define transitive closures over compound binary expressions, something that OWL cannot do. AllegroGraph includes specialized accelerators for computing transitive closures.
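To illustrate why an in-syntax closure operator is more useful than a per-predicate declaration, one could imagine a query along the following lines. The 'tclosure' operator, which closes an anonymous binary relation built from a compound expression, is hypothetical syntax offered only for illustration (it is not AllegroGraph's actual API), and the predicates are invented:

   ;; everyone reachable from Mary through any chain of
   ;; 'manages' or 'supervises' links
   (select ?x
     (where ((tclosure (?a ?b) (or (manages ?a ?b)
                                   (supervises ?a ?b)))
             Mary ?x)))

Because the relation being closed is itself a compound expression, a single owl:TransitiveProperty declaration on one predicate cannot express this query.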
IV. ADDING RULES

The addition of Horn-like rules significantly increases the utility of the language. Here we face a choice. If our rules are recursive, then syntactic constraints must be placed upon both the heads and bodies of our rules if we are to retain completeness. This results in a Prolog-like semantics, with a CL syntax. The accompanying reasoning can be made moderately scalable.

Alternatively, we can decree that our rules are non-recursive. Here, we restrict our rules to having atomic heads, but we can permit arbitrarily expressive tails. This scheme is both scalable and complete. We know this because these rules do not actually increase the expressive power of the query language; instead they are a convenience (a major convenience). There are relatively few examples of systems built using non-recursive rules; however, limited practical experience has revealed that most recursive Horn rules can be reformulated into equivalent non-recursive rules combined with an (expressive) transitive closure operator. The most serious drawback to this scheme (non-recursive rules) is that it is theoretically uninteresting. There is nothing semantically to write about, so there are no papers on the subject.

Finally, one could design a system that combines recursive and non-recursive rules. This is a quite viable option. The only caveat is that only highly disciplined users are likely to reformulate as many rules as possible into non-recursive equivalents. The benefit of doing so would be orders-of-magnitude increases in query performance, but the average user might not master the technique.
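As a sketch of the reformulation just described, and again using the hypothetical 'tclosure' operator from the previous section, the familiar recursive ancestor rules

   (<= (ancestor ?x ?y) (parent ?x ?y))
   (<= (ancestor ?x ?y)
       (and (parent ?x ?z) (ancestor ?z ?y)))

could be replaced by a single non-recursive rule whose tail invokes the closure operator:

   (<= (ancestor ?x ?y)
       ((tclosure (?a ?b) (parent ?a ?b)) ?x ?y))

The head remains atomic, the tail remains arbitrarily expressive, and the recursion has been pushed into an operator that the query engine can optimize directly.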
V. DEFINITIONS

We face another choice when we add definitions into the mix. If we interpret our definitions as if-and-only-if rules, then we have abandoned hope of scalable inference. Alternatively, we can apply an asymmetric (if but not only-if) interpretation to Horn-like definitions to achieve an expressivity equivalent to Horn rules. Such definitions are superior to one-directional rules, because the only-if portion can be reserved for constraint checking and data validation. A single syntax should suffice for either interpretation of a definition (asymmetric or bi-directional); one can envision using a single set of definitions for both scalable inference and small-scale but rich inference.
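A sketch of how one definition could serve both interpretations follows; the 'define' form is hypothetical syntax, not part of any existing implementation:

   (define (grandparent ?x ?y)
     (exists (?z) (and (parent ?x ?z) (parent ?z ?y))))

Under the asymmetric reading this behaves exactly like the Horn rule

   (<= (grandparent ?x ?y)
       (exists (?z) (and (parent ?x ?z) (parent ?z ?y))))

and stays within the scalable subset; under the bi-directional reading, the only-if half becomes available for constraint checking or for small-scale but richer inference, as described above.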
VI. INFERENCE

Tableaux-based reasoners appear to be inherently non-scalable over dynamic databases. Instead, we focus chiefly on rule-based reasoners. There are three basic classes of rules: (1) backward-chaining rules, (2) forward-chaining rules, and (3) rewrite rules. Backward-chaining rules are the best behaved. They are relatively insensitive to database updates (cache-busting will occur, but it is manageable) and they are moderately scalable.

Forward-chaining rules are more powerful (from a completeness standpoint) than backward rules. However, some form of truth maintenance is required to manage derived facts, and bitter experience has shown that truth maintenance does not scale. Hence, this option is not viable for large-scale applications.

Rewrite rules (also called "triggers") are essentially forward rules that don't bother to clean up after themselves when updates are made. Instead, they are interpreted as having a semantics external to the system. This makes them highly useful, but it is "buyer beware" when it comes to semantics. Rewrite systems have difficulty managing the trigger portion of very expressive rules. For the handling of expressive rewrite rules, we recommend the introduction of "trigger" clauses into the syntax. The assumption is that such rules will fire only when updates to the fact base are detectable by the trigger portion(s) of the rule; other (presumably more expensive) clauses in the rule will not be monitored. Most uses of rewrite rules (e.g., Jess rules) are applied only to modest-sized databases.
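To make the trigger-clause idea concrete, a rewrite rule might separate its monitored and unmonitored parts along the following lines; the 'rewrite-rule', 'trigger', and 'do' forms are hypothetical syntax, and the predicates are invented:

   (rewrite-rule
     ;; monitored pattern: the rule fires only when an update matches this clause
     (trigger (salary ?emp ?amount))
     ;; unmonitored clauses: evaluated only after the trigger has fired
     (where (and (department ?emp ?dept)
                 (> ?amount 200000)))
     ;; consequence written back into the store
     (do (assert (flagged-for-review ?emp ?dept))))

Only the trigger pattern is watched for updates, so the (presumably more expensive) body clauses impose no cost until a matching update actually arrives.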
The extensible operator feature allows arbitrarily complex operators to be added to the language. This allows exotic operators like "cut" or modals to be added. This is possible because the specialist mechanism includes hooks into the internals of the query executor. High-end inference can be achieved by adding additional operators to the rule engine that include their own logic interpreters.

AllegroGraph's implementation of CL will use the extension mechanism to provide access via CL to its built-in geospatial, temporal and social network analysis features.

VII. COMMON LOGIC AND RDF

Scalable logic-based applications will most likely be built on top of an RDF triple store. It makes sense to ask what contribution Common Logic can make in this context. In fact, a query language based on Common Logic would have a number of advantages. This is due in part to the fact that SPARQL, the de facto standard in the RDF world, has a number of serious deficiencies that discourage its use for higher-level logic applications.

SPARQL [3] is a W3C-recommended query language for RDF data. It has been designed to enable expression of common, everyday queries in a style that mimics a syntax used elsewhere to express atomic ground assertions. The majority of developers of RDF stores provide implementations of SPARQL; this has significantly spurred the growth of RDF-based tools and technology.

One serious drawback of SPARQL is that it takes the "kitchen sink" approach to syntax. SPARQL has two "and" operators, two "or" operators, and an awkward division between predicates evaluated against the store versus predicates evaluated by other means (e.g., equality, inequality, etc.). Rather than treating the context/graph dimension as simply one additional argument (to a triple), it adds orthogonal syntactic constructs that interleave with the already cumbersome triples and filters. While simple SPARQL queries are fairly readable, when complexities such as disjunctions are utilized, SPARQL queries become very difficult to compose and interpret.

In the logic world, a primary weapon to counter syntactic complexity is to base the semantics of a logic on a small number of primitive operators, and to define the remaining operators as compositions of the primitives. In this case, the bulk of the language syntax may be regarded as syntactic sugar; this makes the job of implementing the language much more manageable. This is how, for example, KIF [4] and Common Logic have been defined. SPARQL has taken the opposite approach; it has a large number of different semantic operators, and is defined in terms of a procedural semantics rather than a declarative semantics. That means that the traditional compositional-semantics approach cannot be applied to SPARQL.

The combination of bloated syntax and an essentially non-existent semantics means that SPARQL cannot readily serve as a foundation for the addition of rules, modal operators, and other higher-level constructs. This leaves the field open to competing languages such as Common Logic.

VIII. EXPRESSIVE POWER EXAMPLES

In this section, we look at some simple examples where the expressive power of Common Logic can be applied to treat representational problems that are difficult or impossible to solve using a SPARQL-like language. We will use a KIF-like syntax to express our rules.

A common claim made by many RDF advocates is that "the Semantic Web is open world". Practical experience indicates that this statement is a complete falsehood; in fact, not only are there "pockets" of assertions in most semantic networks best treated using closed-world semantics, but these "pockets" tend to be the locus of the highest-valued information. Therefore, a practical Semantic Web language will include constructs to treat closed-world models.

Consider the predicate "single", as in "not married". It is conventional to treat the definition of the "single" predicate as the closed-world negation of the predicate "married", e.g.,

   (<= (married ?p)
       (exists (?s) (spouse ?p ?s)))
   (<= (single ?p)
       (not (married ?p)))

In other words, if you don't know that a person is married, assume s/he is single. This isn't guaranteed to be true, but it's the way that personnel data is utilized a great deal of the time.

The semantics of closed-world negation can either be assumed to attach to the underlying domain model, or to be attributed to a logic operator. In the latter case, since 'not' denotes classical negation, we would replace 'not' by a specific negation-as-failure operator (variously called 'thnot', 'unsaid', etc.) to achieve the desired semantics, e.g.,

   (<= (single ?p)
       (unsaid (married ?p)))

Next, consider universal quantification. The rule below states that you are "off the hook" if all of your children have graduated from college:

   (<= (off-the-hook ?p)
       (forall (?c)
         (implies (child ?p ?c)
                  (graduated-from-college ?c))))

The trick to evaluating this predicate in a practical domain lies chiefly in determining whether the set of children known for an individual, Fred, constitutes the complete set of Fred's children. This kind of information is typically hard to locate. Instead of looking for a guaranteed answer, it is more typical to query for all of Fred's children, and ask if each of those retrieved has graduated. This answer can be trusted as far, and only as far, as the closed-world assumption holds.

The ability to make closed-world assumptions about sets of entities is critical to many real-world applications. Having a universal quantifier in the language enables this reasoning to be computed endogenously, rather than relegating it to the procedurally-evaluated portion of an application.

Query languages such as SQL and SPARQL do not allow for explicit universal quantifiers in their syntax. This has two consequences: (i) it limits the kinds of universal quantification expressible in these languages (SQL has various aggregate operators; SPARQL makes no provisions for universal quantification); (ii) it requires that scope rules for variables be implicit rather than explicit, which works well most of the time, but not always. Here is an example representing a simplification of an application that this author encountered, where the lack of an explicit existential quantifier (and accompanying scoping) made composing the query difficult. The (simplified) problem is to query for two degrees of distance from Kevin Bacon, based on a 'knows' relationship. Here is the query expressed without reference to existential quantification:

   (select ?x
     (where (or (?x = Kevin)
                (and (knows Kevin ?x1)
                     (or (?x = ?x1)
                         (and (knows ?x1 ?x2)
                              (?x = ?x2)))))))

The query succeeds only if the variable ?x1 denotes the same thing throughout the query. In many quantifier-free languages (e.g., SPARQL) variables in parallel disjuncts can have the same name but not be considered the same variable. This is done for a very good reason; however, it means that we can't be sure how the above query will be evaluated without a detailed inspection. The actual query found in the application was more complex than this, because the entities were related by more than one predicate. If you replace

   (knows ?x1 ?x2)

above by

   (or (knows ?x1 ?x2) (likes ?x1 ?x2))

then you will have a better approximation of the complexity of the query in the application. Doing so makes the scoping that much more tenuous. In fact, the query language used in the application turned out to have scoping rules that assumed that the variable ?x1 was not unique across the query. This made it necessary to rewrite the query, approximately doubling its size. On the other hand, if we have an explicit existential quantifier, none of this "guessing" is necessary:

   (select ?x
     (where (or (?x = Kevin)
                (exists (?x1)
                  (and (knows Kevin ?x1)
                       (or (?x = ?x1)
                           (and (knows ?x1 ?x2)
                                (?x = ?x2))))))))

One would also like aggregate entities to be treatable within a logic. Here is a (somewhat simplistic) definition of the term "family":

   (<= (family ?p ?fam)
       (?fam = (setof (?r) (or (spouse ?p ?r)
                               (child ?p ?r)))))

Lastly, a host of logic-based applications find it useful (and in a cognitive sense, "necessary") that the language support n-ary predicates and n-ary functions. The Franz product features a suite of geospatial, temporal, and semantic network reasoners that are best exploited using queries that employ n-ary predicates. The query below evaluates the following request: retrieve important people known to Bob who attended a meeting in or near Berkeley, CA in November, 2008.

   (select (?p)
     (and (ego-group bob knows ?group 2)
          (actor-centrality-members ?group knows ?p ?importance)
          (participant ?event ?p)
          (instance ?event Meeting)
          (interval-during ?event "2008-11-01" "2008-11-05")
          (contains (geo-box-around (location Berkeley) 5 miles)
                    (location ?event))))

Here the 'ego-group' predicate is a distance-2 Kevin Bacon computation (note how much simpler it is than the Kevin Bacon query given earlier).

Another comment on the "Kevin Bacon" query: when the relationship predicate is the same on all layers, a built-in version of the computation can be expected to execute significantly faster than the same computation phrased in logic. However, the original query referenced a different predicate at the first level than at the second, and referenced four different predicates at that second level, so a built-in operator was not available. The moral is that built-ins are not a universal panacea for expressiveness.

This section has surveyed a sample of Common Logic language constructs to suggest that users benefit both from (i) the ability to program a larger portion of their applications within the logic, rather than resorting to procedural manifestations, and (ii) the fact that more expressive constructs can reduce the complexity of the resulting rules and queries, making the language more usable by humans.

IX. SUMMARY

Adding a Common Logic interface and interpreter to an RDF store would provide a spectrum of possible benefits. At one end, a careful exploitation of CL features would provide "heightened" versions of semi-conventional query processing, over a dynamic, scalable platform. At the high end, one can contemplate experimenting with combinations of powerful reasoners operating over relatively small sets of data interacting with the large-scale query engine.

The implementation community for Common Logic needs to produce a target specification. There is a chicken-and-egg component here, since one needs to have an expressive language available to appreciate why and how one can use it.

REFERENCES

[1] International Standard ISO/IEC 24707, Information technology — Common Logic (CL): a framework for a family of logic-based languages. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c039175_ISO_IEC_24707_2007(E).zip.
[2] RDF/XML Syntax Specification (Revised), W3C Recommendation, 10 February 2004. URL: http://www.w3.org/TR/rdf-syntax-grammar/.
[3] SPARQL Query Language for RDF, W3C Recommendation, 15 January 2008. URL: http://www.w3.org/TR/rdf-sparql-query/.
[4] Knowledge Interchange Format, draft proposed American National Standard (dpANS), NCITS.T2/98-004. URL: http://logic.stanford.edu/kif/dpans.html.