Common Logic for an RDF Store

Robert MacGregor, Ph.D.
Franz Inc.
Oakland, CA
bob.macgregor@gmail.com

Abstract — The advent of commercial tools that support reasoning and management of RDF data stores provides a robust base for the growth of Semantic Web applications. There is as yet no analogous set of tools and products to support advanced logic-based applications. This article examines issues that arise when seeking to combine the expressive power of Common Logic with the scalability of an RDF store.

Index Terms — Common Logic, RDF, Semantic Web, higher-order logic.

I. INTRODUCTION

Franz Inc is researching the possibility of implementing a Common Logic [1] parser and query processor for our RDF [2] store, AllegroGraph. There is very wide scope for how large a subset of the language is implemented and, more significantly, for what kinds of reasoning are supported. Here we discuss several of the issues and possibilities.

A primary goal for marrying Common Logic (CL) to AllegroGraph is to achieve scalable logical inference over a dynamic fact base. We believe the AllegroGraph infrastructure is well suited for crafting a scalable reasoner; however, expressive logics are inherently non-scalable, and there are severe trade-offs that must be made to achieve reasoning over billions of statements.

As part of the presentation we will discuss our proposal for implementation and requirements for the intelligence community. We will also discuss a number of the trade-offs, as described below.

II. COMMON LOGIC IMPLEMENTATION

A full-fledged implementation would include the following features:

(i) A user-friendly query language, based on CL, that supports arbitrary boolean expressions in the "where" clause (here, we are imagining augmenting a SQL-like select-from-where syntax; a sketch appears at the end of this section).

(ii) Optional support for logic operators that make the closed-world assumption and the unique-identifier assumption. We include these because classical negation and universal quantification operators are inherently non-scalable.

(iii) A CL-based rule language.

(iv) CL definitions.

(v) Extensible operators.

Just having a CL-based query language would be a big improvement on the current state of RDF-based query tools. Unlike SPARQL (introduced below), it would have a "real" (model-theoretic) semantics, it would have a clean syntax that assumes a calculus-like rather than an algebraic formulation of clauses, it would be expressive, and it wouldn't break your fingers when you type it.

The inclusion of a definition language that subsumes OWL (easy) would allow for a calculus that spans the range of RDF-based languages with a single syntax.

There are several "sweet spots" that could be supported; a sweet spot being a language subset that supports sound and complete reasoning while being at least moderately scalable. We will note these as we examine various trade-offs.
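To make item (i) concrete, the following sketch shows the kind of select-from-where query we have in mind, written in the KIF-like notation used for the examples later in this article. The surface syntax and the predicates (works-on, based-at, located-in) are purely illustrative assumptions, not a committed design:

   (select (?person ?project)
     (where (and (works-on ?person ?project)
                 (or (located-in ?project Oakland)
                     (exists (?site)
                       (and (based-at ?person ?site)
                            (located-in ?site Oakland)))))))

The point of the sketch is that the "where" clause is an arbitrary boolean CL expression, freely mixing conjunction, disjunction, and explicit quantification.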
III. A BASELINE SYSTEM

The baseline is a simple query language operating over atomic ground assertions. Here, we explicitly exclude the possibility of rules or definitions. Such a language would allow arbitrary CL expressions to be evaluated against the fact base. However, both universal quantifiers and classical negation operators will always evaluate to false (or unknown) in this scheme; the former due to the absence of any kind of "closure" operator, and the latter due (notionally) to the absence of statements that can logically contradict one another.

The simplest way to "achieve closure" is to admit operators that define local closed-world assumptions. For example, a negation-as-failure operator assumes that the facts within the scope of a single query are situationally complete. A closed-world universal quantifier makes a similar assumption. Both of these operators are completely scalable. In other words, we can add them without reducing our expectations of how scalable our language is. The strategy of embedding the closure within a language operator, rather than, say, within a predicate, minimizes the scope of the closures, and allows both open- and closed-world reasoning to be applied to the same model.

Alternatively, one could permit assertion of OWL-like operators (e.g., max-cardinality and all-values-from) to achieve closure. Introducing these operators immediately eliminates the possibility of scalable reasoning, or complete reasoning, or both. Here, we are discounting databases where large numbers of asserted and derived facts have been laboriously loaded and then compiled, yielding a base that melts on the first update; a "dynamic" scalable application will support real-time updates as a matter of course.

Our baseline would not be complete without a transitivity operator. Transitive closure is the most important of all the classes of inference. Simplest would be to include the equivalent of an 'owl:TransitiveProperty' declaration, but practical experience has shown that the addition of a transitive closure operator to the query language syntax has important benefits. Specifically, it is useful to be able to define transitive closures over compound binary expressions, something that OWL cannot do. AllegroGraph includes specialized accelerators for computing transitive closures.
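To illustrate why an in-syntax closure operator is more useful than a per-predicate declaration, one could imagine a query along the following lines. The 'tclosure' operator, which closes an anonymous binary relation built from a compound expression, is hypothetical syntax offered only for illustration (it is not AllegroGraph's actual API), and the predicates are invented:

   ;; everyone reachable from Mary through any chain of
   ;; 'manages' or 'supervises' links
   (select ?x
     (where ((tclosure (?a ?b) (or (manages ?a ?b)
                                   (supervises ?a ?b)))
             Mary ?x)))

Because the relation being closed is itself a compound expression, a single owl:TransitiveProperty declaration on one predicate cannot express this query.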
IV. ADDING RULES

The addition of Horn-like rules significantly increases the utility of the language. Here we face a choice. If our rules are recursive, then syntactic constraints must be placed upon both the heads and bodies of our rules if we are to retain completeness. This results in a Prolog-like semantics, with a CL syntax. The accompanying reasoning can be made moderately scalable.

Alternatively, we can decree that our rules are non-recursive. Here, we restrict our rules to having atomic heads, but we can permit arbitrarily expressive tails. This scheme is both scalable and complete. We know this because these rules do not actually increase the expressive power of the query language; instead they are a convenience (a major convenience). There are relatively few examples of systems built using non-recursive rules; however, limited practical experience has revealed that most recursive Horn rules can be reformulated into equivalent non-recursive rules combined with an (expressive) transitive closure operator. The most serious drawback to this scheme (non-recursive rules) is that it is theoretically uninteresting. There is nothing semantically to write about, so there are no papers on the subject.

Finally, one could design a system that combines recursive and non-recursive rules. This is a quite viable option. The only caveat is that only highly disciplined users are likely to reformulate as many rules as possible into non-recursive equivalents. The benefit of doing so would be orders-of-magnitude increases in query performance, but the average user might not master the technique.
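As a sketch of the reformulation just described, and again using the hypothetical 'tclosure' operator from the previous section, the familiar recursive ancestor rules

   (<= (ancestor ?x ?y) (parent ?x ?y))
   (<= (ancestor ?x ?y)
       (and (parent ?x ?z) (ancestor ?z ?y)))

could be replaced by a single non-recursive rule whose tail invokes the closure operator:

   (<= (ancestor ?x ?y)
       ((tclosure (?a ?b) (parent ?a ?b)) ?x ?y))

The head remains atomic, the tail remains arbitrarily expressive, and the recursion has been pushed into an operator that the query engine can optimize directly.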
V. DEFINITIONS

We face another choice when we add definitions into the mix. If we interpret our definitions as if-and-only-if rules, then we have abandoned hope of scalable inference. Alternatively, we can apply an asymmetric (if but not only-if) interpretation to Horn-like definitions to achieve an expressivity equivalent to Horn rules. Such definitions are superior to one-directional rules, because the only-if portion can be reserved for constraint checking and data validation. A single syntax should suffice for either interpretation of a definition (asymmetric or bi-directional); one can envision using a single set of definitions for both scalable inference and small-scale but rich inference.
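A sketch of how one definition could serve both interpretations follows; the 'define' form is hypothetical syntax, not part of any existing implementation:

   (define (grandparent ?x ?y)
     (exists (?z) (and (parent ?x ?z) (parent ?z ?y))))

Under the asymmetric reading this behaves exactly like the Horn rule

   (<= (grandparent ?x ?y)
       (exists (?z) (and (parent ?x ?z) (parent ?z ?y))))

and stays within the scalable subset; under the bi-directional reading, the only-if half becomes available for constraint checking or for small-scale but richer inference, as described above.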
VI. INFERENCE

Tableaux-based reasoners appear to be inherently non-scalable over dynamic databases. Instead, we focus chiefly on rule-based reasoners. There are three basic classes of rules: (1) backward-chaining rules, (2) forward-chaining rules, and (3) rewrite rules. Backward-chaining rules are the best behaved. They are relatively insensitive to database updates (cache-busting will occur, but it is manageable) and they are moderately scalable.

Forward-chaining rules are more powerful (from a completeness standpoint) than backward rules. However, some form of truth maintenance is required to manage derived facts, and bitter experience has shown that truth maintenance does not scale. Hence, this option is not viable for large-scale applications.

Rewrite rules (also called "triggers") are essentially forward rules that don't bother to clean up after themselves when updates are made. Instead, they are interpreted as having a semantics external to the system. This makes them highly useful, but it is "buyer beware" when it comes to semantics. Rewrite systems have difficulty managing the trigger portion of very expressive rules. For the handling of expressive rewrite rules, we recommend the introduction of "trigger" clauses into the syntax. The assumption is that such rules will fire only when updates to the fact base are detectable by the trigger portion(s) of the rule; other (presumably more expensive) clauses in the rule will not be monitored. Most uses of rewrite rules (e.g., Jess rules) are applied only to modest-sized databases.
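To make the trigger-clause idea concrete, a rewrite rule might separate its monitored and unmonitored parts along the following lines; the 'rewrite-rule', 'trigger', and 'do' forms are hypothetical syntax, and the predicates are invented:

   (rewrite-rule
     ;; monitored pattern: the rule fires only when an update matches this clause
     (trigger (salary ?emp ?amount))
     ;; unmonitored clauses: evaluated only after the trigger has fired
     (where (and (department ?emp ?dept)
                 (> ?amount 200000)))
     ;; consequence written back into the store
     (do (assert (flagged-for-review ?emp ?dept))))

Only the trigger pattern is watched for updates, so the (presumably more expensive) body clauses impose no cost until a matching update actually arrives.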
The extensible operator feature allows arbitrarily complex operators to be added to the language. This allows exotic operators like "cut" or modals to be added. This is possible because the specialist mechanism includes hooks into the internals of the query executor. High-end inference can be achieved by adding additional operators to the rule engine that include their own logic interpreters.

AllegroGraph's implementation of CL will use the extension mechanism to provide access via CL to its built-in geospatial, temporal and social network analysis features.

VII. COMMON LOGIC AND RDF

Scalable logic-based applications will most likely be built on top of an RDF triple store. It makes sense to ask what contribution Common Logic can make in this context. In fact, a query language based on Common Logic would have a number of advantages. This is due in part to the fact that SPARQL, the de facto standard in the RDF world, has a number of serious deficiencies that discourage its use for higher-level logic applications.

SPARQL [3] is a W3C-recommended query language for RDF data. It has been designed to enable expression of common, everyday queries in a style that mimics a syntax used elsewhere to express atomic ground assertions. The majority of developers of RDF stores provide implementations of SPARQL; this has significantly spurred the growth of RDF-based tools and technology.

One serious drawback of SPARQL is that it takes the "kitchen sink" approach to syntax. SPARQL has two "and" operators, two "or" operators, and an awkward division between predicates evaluated against the store versus predicates evaluated by other means (e.g., equality, inequality, etc.). Rather than treating the context/graph dimension as simply one additional argument (to a triple), it adds orthogonal syntactic constructs that interleave with the already cumbersome triples and filters. While simple SPARQL queries are fairly readable, when complexities such as disjunctions are utilized, SPARQL queries become very difficult to compose and interpret.

In the logic world, a primary weapon to counter syntactic complexity is to base the semantics of a logic on a small number of primitive operators, and to define the remaining operators as compositions of the primitives. In this case, the bulk of the language syntax may be regarded as syntactic sugar; this makes the job of implementing the language much more manageable. This is how, for example, KIF [4] and Common Logic have been defined. SPARQL has taken the opposite approach; it has a large number of different semantic operators, and is defined in terms of a procedural semantics rather than a declarative semantics. That means that the traditional compositional-semantics approach cannot be applied to SPARQL.

The combination of bloated syntax and an essentially non-existent semantics means that SPARQL cannot readily serve as a foundation for the addition of rules, modal operators, and other higher-level constructs. This leaves the field open to competing languages such as Common Logic.

VIII. EXPRESSIVE POWER EXAMPLES

In this section, we look at some simple examples where the expressive power of Common Logic can be applied to treat representational problems that are difficult or impossible to solve using a SPARQL-like language. We will use a KIF-like syntax to express our rules.

A common claim made by many RDF advocates is that "the Semantic Web is open world". Practical experience indicates that this statement is a complete falsehood; in fact, not only are there "pockets" of assertions in most semantic networks best treated using closed-world semantics, but these "pockets" tend to be the locus of the highest-valued information. Therefore, a practical Semantic Web language will include constructs to treat closed-world models.

Consider the predicate "single", as in "not married". It is conventional to treat the definition of the "single" predicate as the closed-world negation of the predicate "married", e.g.,

   (<= (married ?p)
       (exists (?s) (spouse ?p ?s)))
   (<= (single ?p)
       (not (married ?p)))

In other words, if you don't know that a person is married, assume s/he is single. This isn't guaranteed to be true, but it's the way that personnel data is utilized a great deal of the time.

The semantics of closed-world negation can either be assumed to attach to the underlying domain model, or to be attributed to a logic operator. In the latter case, since 'not' denotes classical negation, we would replace 'not' by a specific negation-as-failure operator (variously called 'thnot', 'unsaid', etc.) to achieve the desired semantics, e.g.,

   (<= (single ?p)
       (unsaid (married ?p)))

Next, consider universal quantification. The rule below states that you are "off the hook" if all of your children have graduated from college:

   (<= (off-the-hook ?p)
       (forall (?c)
         (implies (child ?p ?c)
                  (graduated-from-college ?c))))

The trick to evaluating this predicate in a practical domain lies chiefly in determining whether the set of children known for an individual, Fred, constitutes the complete set of Fred's children. This kind of information is typically hard to locate. Instead of looking for a guaranteed answer, it is more typical to query for all of Fred's children, and ask if each of those retrieved has graduated. This answer can be trusted as far, and only as far, as the closed-world assumption holds.

The ability to make closed-world assumptions about sets of entities is critical to many real-world applications. Having a universal quantifier in the language enables this reasoning to be computed endogenously, rather than relegating it to the procedurally-evaluated portion of an application.

Query languages such as SQL and SPARQL do not allow for explicit universal quantifiers in their syntax. This has two consequences: (i) it limits the kinds of universal quantification expressible in these languages (SQL has various aggregate operators; SPARQL makes no provisions for universal quantification); (ii) it requires that scope rules for variables be implicit rather than explicit, which works well most of the time, but not always. Here is an example representing a simplification of an application that this author encountered, where the lack of an explicit existential quantifier (and accompanying scoping) made composing the query difficult. The (simplified) problem is to query for two degrees of distance from Kevin Bacon, based on a 'knows' relationship. Here is the query expressed without reference to existential quantification:

   (select ?x
     (where (or (?x = Kevin)
                (and (knows Kevin ?x1)
                     (or (?x = ?x1)
                         (and (knows ?x1 ?x2)
                              (?x = ?x2)))))))

The query succeeds only if the variable ?x1 denotes the same thing throughout the query. In many quantifier-free languages (e.g., SPARQL) variables in parallel disjuncts can have the same name but not be considered the same variable. This is done for a very good reason; however, it means that we can't be sure how the above query will be evaluated without a detailed inspection. The actual query found in the application was more complex than this, because the entities were related by more than one predicate. If you replace

   (knows ?x1 ?x2)

above by

   (or (knows ?x1 ?x2) (likes ?x1 ?x2))

then you will have a better approximation of the complexity of the query in the application. Doing so makes the scoping that much more tenuous. In fact, the query language used in the application turned out to have scoping rules that assumed that the variable ?x1 was not unique across the query. This made it necessary to rewrite the query, approximately doubling its size. On the other hand, if we have an explicit existential quantifier, none of this "guessing" is necessary:

   (select ?x
     (where (or (?x = Kevin)
                (exists (?x1)
                  (and (knows Kevin ?x1)
                       (or (?x = ?x1)
                           (and (knows ?x1 ?x2)
                                (?x = ?x2))))))))

One would also like aggregate entities to be treatable within a logic. Here is a (somewhat simplistic) definition of the term "family":

   (<= (family ?p ?fam)
       (?fam = (setof (?r) (or (spouse ?p ?r)
                               (child ?p ?r)))))

Lastly, a host of logic-based applications find it useful (and in a cognitive sense, "necessary") that the language support n-ary predicates and n-ary functions. The Franz product features a suite of geospatial, temporal, and semantic network reasoners that are best exploited using queries that employ n-ary predicates. The query below evaluates the following request: retrieve important people known to Bob who attended a meeting in or near Berkeley, CA in November, 2008.

   (select (?p)
     (and (ego-group bob knows ?group 2)
          (actor-centrality-members ?group knows ?p ?importance)
          (participant ?event ?p)
          (instance ?event Meeting)
          (interval-during ?event "2008-11-01" "2008-11-05")
          (contains (geo-box-around (location Berkeley) 5 miles)
                    (location ?event))))

Here the 'ego-group' predicate is a distance-2 Kevin Bacon computation (note how much simpler it is than the Kevin Bacon query given earlier).

Another comment on the "Kevin Bacon" query: when the relationship predicate is the same on all layers, a built-in version of the computation can be expected to execute significantly faster than the same computation phrased in logic. However, the original query referenced a different predicate at the first level than at the second, and referenced four different predicates at that second level, so a built-in operator was not available. The moral is that built-ins are not a universal panacea for expressiveness.

This section has surveyed a sample of Common Logic language constructs to suggest that users benefit both from (i) the ability to program a larger portion of their applications within the logic, rather than resorting to procedural manifestations, and (ii) the fact that more expressive constructs can reduce the complexity of the resulting rules and queries, making the language more usable by humans.

IX. SUMMARY

Adding a Common Logic interface and interpreter to an RDF store would provide a spectrum of possible benefits. At one end, a careful exploitation of CL features would provide "heightened" versions of semi-conventional query processing, over a dynamic, scalable platform. At the high end, one can contemplate experimenting with combinations of powerful reasoners operating over relatively small sets of data interacting with the large-scale query engine.

The implementation community for Common Logic needs to produce a target specification. There is a chicken-and-egg component here, since one needs to have an expressive language available to appreciate why and how one can use it.

REFERENCES

[1] International Standard ISO/IEC 24707, Information technology — Common Logic (CL): a framework for a family of logic-based languages. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c039175_ISO_IEC_24707_2007(E).zip.
[2] RDF/XML Syntax Specification (Revised), W3C Recommendation, 10 February 2004. URL: http://www.w3.org/TR/rdf-syntax-grammar/.
[3] SPARQL Query Language for RDF, W3C Recommendation, 15 January 2008. URL: http://www.w3.org/TR/rdf-sparql-query/.
[4] Knowledge Interchange Format, draft proposed American National Standard (dpANS), NCITS.T2/98-004. URL: http://logic.stanford.edu/kif/dpans.html.