Expressing No-Value Information in RDF

               Fariz Darari, Radityo Eko Prasojo, and Werner Nutt

         Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
              {fariz.darari,radityoeko.prasojo}@stud-inf.unibz.it
                                nutt@inf.unibz.it


        Abstract. RDF is a data model to represent positive information. Con-
        sequently, it is not clear how to represent the non-existence of information
        in RDF. We present a technique to express such information in RDF and
        incorporate it into SPARQL query answering. Given an empty query an-
        swer, our technique can distinguish whether it is empty due to possibly
        incomplete information, or non-existent information.

Keywords: RDF, negative knowledge, nulls, SPARQL
Reason to visit: To discover how no-value information can (1) be represented
in RDF and (2) be used to infer SPARQL query emptiness.


1     Introduction
RDF is mainly used to express positive information. However, representing neg-
ative information is often of interest in practice. For instance, on Wikidata [1],
we have the following information about Elizabeth I not having any children.


        Fig. 1. No-value information on Wikidata (http://j.mp/elizabethI)
    In the above figure, Wikidata explicitly states that Elizabeth I had no
children since the property child has “no value”.¹ This is different from not
recording anything at all, which would imply possibly incomplete information
about the children of Elizabeth I. To express this in RDF, one may be tempted
to assign a special datatype constant noValue to represent the no-value
information about the children of Elizabeth I, creating the triple
(elizabethI, child, noValue). However, this creates a problem: executing the
SPARQL ASK query Q = ({}, {(elizabethI, child, ?y)}), which asks whether
Elizabeth I has a child, would give the answer ‘yes’. Indeed, without a formal
definition, it is not clear how noValue should be used properly.
¹ For further information about no values on Wikidata, refer to
  https://www.wikidata.org/wiki/Wikidata:Glossary.
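
The problem with the naive noValue constant can be illustrated with a minimal Python sketch. The encoding is an assumption made only for illustration and is not part of the original proposal: a triple is a tuple of three strings, a term starting with "?" is a variable, and the hypothetical helper ask answers an ASK query consisting of a single triple pattern.

```python
# Toy encoding (an assumption for illustration): a triple is a tuple of three
# strings, and a term starting with "?" is a variable.

def matches(pattern, triple):
    """Return True if the triple pattern matches the triple."""
    binding = {}
    for p_term, g_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must be bound consistently within the pattern.
            if binding.setdefault(p_term, g_term) != g_term:
                return False
        elif p_term != g_term:
            return False
    return True

def ask(pattern, graph):
    """ASK query with a single triple pattern: is there any match in the graph?"""
    return any(matches(pattern, triple) for triple in graph)

# Naive encoding: noValue is just another constant in the object position.
naive_graph = {("elizabethI", "child", "noValue")}
query = ("elizabethI", "child", "?y")

print(ask(query, naive_graph))  # True: noValue is matched like any other child
print(ask(query, set()))        # False: nothing is recorded at all
```

Under this sketch, the pattern matcher has no way of knowing that noValue is meant to deny the existence of a child, which is why a dedicated semantics is needed.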

    The notion of no-value information was first introduced in relational
databases [2]. There, the term ‘null value’ was used, which may have different
meanings: there exists no value (i.e., non-existence); there exists a value but
it is unknown; or it is unknown whether a value exists. For the second case,
we can leverage RDF blank nodes, while for the third case, the Open World
Assumption (OWA) of RDF simply permits it. However, RDF cannot represent
the first case, the no-value null, even though no-value information is useful
for telling non-existent information apart from incomplete information. In
particular, with no-value information, an empty query answer can be given one
of two meanings: it is empty because of possibly incomplete information, or it
is truly empty because the information does not exist in the real world.
    In this poster, we present a technique for representing no-value information
in RDF and incorporating such information into query answering. This introduc-
tion is followed by a formalization of no-value information and query answering
in the presence of such information, a concrete RDF representation of no-value
information, and a discussion.

2   Formalization
Preliminaries. Assume there are three pairwise disjoint infinite sets I (IRIs), L
(literals) and V (variables). A tuple (s, p, o) ∈ I × I × (I ∪ L) is called a triple.
An RDF graph G consists of a finite set of triples.
    SPARQL is the standard query language for RDF [3]. The basic building
blocks of a SPARQL query are triple patterns, which look like RDF triples,
except that in each position, variables are also allowed. In this work, we focus
on the conjunctive fragment of SPARQL where queries are represented as basic
graph patterns (BGPs), that is, sets of triple patterns. The evaluation of a BGP
P over G is defined as ⟦P⟧_G = { µ | µP ⊆ G and dom(µ) = var(P) }, where µ is a
mapping from variables to terms and µP is the result of applying µ to P. Given a
query Q = (W, P), where P is a BGP and W ⊆ var(P) is the set of distinguished
variables, the evaluation ⟦Q⟧_G is the restriction of ⟦P⟧_G to W. For Q, we define
the prototypical graph P̃ as the graph resulting from mapping each variable
in P to a fresh IRI. The prototypical graph encodes any possible graph that
can satisfy the query. Furthermore, a CONSTRUCT query has the abstract form
(CONSTRUCT P_1 P_2), where both P_1 and P_2 are BGPs. Evaluating a CONSTRUCT
query over G yields a graph in which P_1 is instantiated with all the mappings
in ⟦P_2⟧_G.
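
The notions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation of SPARQL: triples are tuples of strings, a term starting with "?" is a variable, a BGP is a list of triple patterns, and fresh IRIs are written with the made-up prefix "fresh:". All function names are hypothetical.

```python
# Toy encoding (an assumption): triples are tuples of strings, a term starting
# with "?" is a variable, and a BGP is a list of triple patterns.

def is_var(term):
    return term.startswith("?")

def apply_mapping(pattern, mu):
    """Apply the mapping mu to a triple pattern."""
    return tuple(mu.get(term, term) for term in pattern)

def eval_bgp(bgp, graph):
    """All mappings mu with mu(P) contained in G and dom(mu) = var(P)."""
    solutions = [{}]
    for pattern in bgp:
        extended = []
        for mu in solutions:
            for triple in graph:
                nu = dict(mu)  # extend a copy; discarded if the match fails
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(pattern, triple)
                ):
                    extended.append(nu)
        solutions = extended
    return solutions

def eval_query(distinguished, bgp, graph):
    """Restriction of the BGP evaluation to the distinguished variables."""
    return {tuple(mu[v] for v in distinguished) for mu in eval_bgp(bgp, graph)}

def prototypical_graph(bgp):
    """Map every variable of the BGP to a fresh IRI (here: the 'fresh:' prefix)."""
    return {
        tuple("fresh:" + t[1:] if is_var(t) else t for t in pattern)
        for pattern in bgp
    }

def construct(p1, p2, graph):
    """(CONSTRUCT P1 P2): instantiate P1 with all mappings of P2 over the graph."""
    return {apply_mapping(t, mu) for mu in eval_bgp(p2, graph) for t in p1}

# Small illustrative graph and query.
graph = {
    ("obama", "child", "malia"),
    ("obama", "child", "sasha"),
    ("malia", "gender", "female"),
}
bgp = [("obama", "child", "?c"), ("?c", "gender", "?g")]

print(sorted(eval_query(["?c"], bgp, graph)))
print(sorted(prototypical_graph(bgp)))
print(sorted(construct([("?c", "parent", "obama")], [("obama", "child", "?c")], graph)))
```

The evaluator extends partial mappings one triple pattern at a time, which is sufficient for the conjunctive fragment considered here.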

  Let us now formalize no-value information. We first define no-value state-
ments to capture which information is non-existent.

Definition 1 (No-Value Statement). A no-value statement N is defined as
No(P), where P is a BGP. To N, we associate the CONSTRUCT query Q_N =
(CONSTRUCT P P).

We use BGPs so that complex no-value information requiring more than one
triple pattern can be expressed. We then define an incomplete data source to
model the OWA of RDF graphs. As in [4], an incomplete data source G = (G_a, G_i)
is a pair of an available graph G_a and an ideal graph G_i such that G_a ⊆ G_i.
Here, the available graph is the graph that we have, whereas the ideal graph is
a possible extension of the available graph that represents a version of ideal,
complete information.
    Having no-value statements restricts the possible ideal graphs, since they
must not contain any instantiation of the information denoted by the
statements. For a set 𝒩 of no-value statements and a graph G, we define the
transfer operator T_𝒩(G) = ⋃_{N ∈ 𝒩} ⟦Q_N⟧_G. We define the semantics of
no-value statements as follows.

Definition 2 (Satisfaction of No-Value Statements). An incomplete data
source G = (G_a, G_i) satisfies a set 𝒩 of no-value statements, written as
G |= 𝒩, if and only if T_𝒩(G_i) = ∅.

Note that since G_a ⊆ G_i holds by the definition of an incomplete data source
and BGP evaluation is monotone, T_𝒩(G_i) = ∅ implies T_𝒩(G_a) = ∅. Next, we
define the emptiness of a query over an incomplete data source.
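
The transfer operator and the satisfaction check can be sketched as follows. The sketch assumes the same toy encoding as before (tuples of strings, "?"-prefixed variables, BGPs as lists) and hypothetical function names. It also assumes that an ideal graph is given explicitly, which is only possible in an illustration, since in practice the ideal graph is unknown.

```python
# Toy encoding (an assumption): triples are tuples of strings, a term starting
# with "?" is a variable, and a no-value statement No(P) is given by its BGP P.

def is_var(term):
    return term.startswith("?")

def eval_bgp(bgp, graph):
    """All mappings mu with mu(P) contained in G and dom(mu) = var(P)."""
    solutions = [{}]
    for pattern in bgp:
        extended = []
        for mu in solutions:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(pattern, triple)
                ):
                    extended.append(nu)
        solutions = extended
    return solutions

def transfer(statements, graph):
    """Union of the results of (CONSTRUCT P P) over all statements No(P)."""
    result = set()
    for bgp in statements:
        for mu in eval_bgp(bgp, graph):
            result |= {tuple(mu.get(t, t) for t in pattern) for pattern in bgp}
    return result

def satisfies(available, ideal, statements):
    """Check whether the incomplete data source satisfies the statements."""
    if not available <= ideal:
        raise ValueError("the available graph must be a subset of the ideal graph")
    return not transfer(statements, ideal)

statements = [[("elizabethI", "child", "?y")]]  # Elizabeth I has no children
available = {("elizabethI", "father", "henryVIII")}
ideal_ok = available | {("elizabethI", "mother", "anneBoleyn")}
ideal_bad = available | {("elizabethI", "child", "x")}  # "x" is a placeholder

print(satisfies(available, ideal_ok, statements))   # True
print(satisfies(available, ideal_bad, statements))  # False
print(transfer(statements, ideal_bad))              # {('elizabethI', 'child', 'x')}
```

The second ideal graph is ruled out by the statement because it contains an instantiation of the pattern that is declared to have no value.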

Definition 3 (Query Emptiness). Let G = (G_a, G_i) be an incomplete data
source and Q a query. To express that Q is empty, we write Empty(Q). It is
the case that G |= Empty(Q) if and only if ⟦Q⟧_{G_i} = ∅.

    Query emptiness over one incomplete data source does not necessarily carry
over to other incomplete data sources. For this reason, we say that a set 𝒩 of
no-value statements entails the query emptiness Empty(Q), written as
𝒩 |= Empty(Q), if for every incomplete data source G with G |= 𝒩, we have that
G |= Empty(Q). If the entailment holds, we can guarantee that the query always
returns an empty answer no matter which possible extension of a graph is
considered. The following theorem shows how to check whether a set of no-value
statements guarantees the emptiness of a query.

Theorem 1 (Query Emptiness Entailment from No-Value Statements).
Let 𝒩 be a set of no-value statements, Q be a query, and P̃ be the prototypical
graph of Q. It is the case that 𝒩 |= Empty(Q) if and only if T_𝒩(P̃) ≠ ∅.

Example 1. Let N = No({ (obama, child, ?c), (?c, gender, male) }) be a no-value
statement about Obama having no sons. Consider the query Q = ({?c, ?s},
{ (obama, child, ?c), (?c, gender, male), (?c, school, ?s) }) asking for the schools of
Obama’s sons. Let 𝒩 = {N}. We have that T_𝒩(P̃) ≠ ∅, since the BGP of N matches
the prototypical graph P̃ when ?c is mapped to the fresh IRI that replaces ?c
in P̃. Thus, from Theorem 1, it holds that 𝒩 |= Empty(Q). This means that Q
returns an empty answer because the information asked for does not exist, not
because the data source is incomplete. In contrast, suppose the constant male
in the query Q were a variable ?g. If Q returns an empty answer over the data
source, that may be due to the incompleteness of the data source.
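
The check of Theorem 1 on Example 1 can be reproduced with a short Python sketch. As before, the encoding (tuples of strings, "?"-prefixed variables, the made-up "fresh:" prefix for fresh IRIs) and the function names are assumptions made for illustration only.

```python
# Toy encoding (an assumption): triples are tuples of strings, a term starting
# with "?" is a variable, and a no-value statement No(P) is given by its BGP P.

def is_var(term):
    return term.startswith("?")

def eval_bgp(bgp, graph):
    """All mappings mu with mu(P) contained in G and dom(mu) = var(P)."""
    solutions = [{}]
    for pattern in bgp:
        extended = []
        for mu in solutions:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(pattern, triple)
                ):
                    extended.append(nu)
        solutions = extended
    return solutions

def prototypical_graph(bgp):
    """Map every variable of the query's BGP to a fresh IRI."""
    return {
        tuple("fresh:" + t[1:] if is_var(t) else t for t in pattern)
        for pattern in bgp
    }

def entails_empty(statements, query_bgp):
    """Theorem 1: emptiness is entailed iff the transfer operator, applied to
    the prototypical graph, is non-empty. Since (CONSTRUCT P P) returns
    something exactly when P has a match, it suffices to test for a match."""
    proto = prototypical_graph(query_bgp)
    return any(eval_bgp(bgp, proto) for bgp in statements)

no_sons = [("obama", "child", "?c"), ("?c", "gender", "male")]

schools_of_sons = [
    ("obama", "child", "?c"), ("?c", "gender", "male"), ("?c", "school", "?s"),
]
schools_of_children = [
    ("obama", "child", "?c"), ("?c", "gender", "?g"), ("?c", "school", "?s"),
]

print(entails_empty([no_sons], schools_of_sons))      # True
print(entails_empty([no_sons], schools_of_children))  # False
```

In the second query, the fresh IRI standing for ?g differs from the constant male, so the statement does not match the prototypical graph and emptiness is not entailed.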

    As seen in the above example, if there is some part of the query that cannot
return any answer due to no-value information, then the whole query does not
return any answer. Now, we can distinguish between empty query answers from
possibly incomplete information, and empty query answers from non-existent
information.

3    RDF Representation of No-Value Statements
To concretely represent no-value statements in RDF, we use the reification tech-
nique. Given a no-value statement No({ (s1 , p1 , o1 ), . . . , (sn , pn , on ) }), we repre-
sent the statement as a resource of the class NoValStatement, while for each
triple pattern, we use a blank node with the properties subject, predicate,
and object. Variables are also represented via blank nodes with the property
varName. Each of the patterns’ blank nodes is linked to the statement’s resource
via the property hasPattern. For instance, we represent the no-value statement
“Obama has no sons” as follows:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

ex:stmtObama a ex:NoValStatement ; rdfs:label "Obama has no sons" ;
  ex:hasPattern [ ex:subject ex:Obama ; ex:predicate ex:child ;
                  ex:object [ex:varName "c"]] ;
  ex:hasPattern [ ex:subject [ex:varName "c"] ; ex:predicate ex:gender ;
                  ex:object ex:male ] .
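
As a rough sketch of how such a representation could be generated, the following Python function serializes a no-value statement into Turtle following the reification scheme described above. The function names are hypothetical, and the sketch assumes that all constants live in the ex: namespace, that the BGP is non-empty, and that the label contains no characters that need escaping.

```python
# Toy encoding (an assumption): a triple pattern is a tuple of three strings,
# and a term starting with "?" is a variable.

def term_to_turtle(term):
    """Variables become blank nodes with ex:varName; constants become ex: names."""
    if term.startswith("?"):
        return '[ ex:varName "%s" ]' % term[1:]
    return "ex:" + term

def no_value_to_turtle(name, label, bgp):
    """Serialize No(bgp) as a resource of class ex:NoValStatement."""
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix ex:   <http://example.org/> .",
        "",
        'ex:%s a ex:NoValStatement ; rdfs:label "%s" ;' % (name, label),
    ]
    patterns = [
        "  ex:hasPattern [ ex:subject %s ; ex:predicate %s ; ex:object %s ]"
        % (term_to_turtle(s), term_to_turtle(p), term_to_turtle(o))
        for s, p, o in bgp
    ]
    # Patterns are separated by ";" and the statement is closed with ".".
    lines.append(" ;\n".join(patterns) + " .")
    return "\n".join(lines)

turtle = no_value_to_turtle(
    "stmtObama",
    "Obama has no sons",
    [("Obama", "child", "?c"), ("?c", "gender", "male")],
)
print(turtle)
```

Note that each occurrence of a variable yields its own blank node, so two occurrences of ?c are related only through the shared value of ex:varName, as in the representation above.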

4    Discussion
In this paper, we present a technique for representing no-value information in
RDF and checking the emptiness of queries based on such information. No-value
information restricts the possible extensions of an RDF graph under the OWA. As
a consequence, a query always returns an empty answer if it asks for information
that is stated to be non-existent. No-value information can also be seen as a stronger version of
completeness information, since it also enforces that all possible extensions must
contain no corresponding information. Furthermore, when a query is ensured to
always return an empty answer, it is obvious that the query is also complete.
Hence, this work can complement the completeness reasoning framework for
RDF data sources as described in [4]. Another use of no-value information is for
data cleaning. If we assume that no-value statements are correct, then we can
detect dirty data sources by checking if they contain information that has been
stated to be non-existent by the statements. For future work, we will study the
relation of our approach to OWL and to more expressive queries.
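
The data cleaning use case mentioned above can be sketched as follows. The sketch assumes that the no-value statements are correct and reports, for each statement, the triples of the available graph that instantiate it. The encoding, the function names and the sample data (including the placeholder child "x") are assumptions made for illustration.

```python
# Toy encoding (an assumption): triples are tuples of strings, a term starting
# with "?" is a variable, and a no-value statement No(P) is given by its BGP P.

def is_var(term):
    return term.startswith("?")

def eval_bgp(bgp, graph):
    """All mappings mu with mu(P) contained in G and dom(mu) = var(P)."""
    solutions = [{}]
    for pattern in bgp:
        extended = []
        for mu in solutions:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(pattern, triple)
                ):
                    extended.append(nu)
        solutions = extended
    return solutions

def violations(statements, available):
    """Map the label of each violated statement to the offending triples."""
    report = {}
    for label, bgp in statements.items():
        offending = {
            tuple(mu.get(t, t) for t in pattern)
            for mu in eval_bgp(bgp, available)
            for pattern in bgp
        }
        if offending:
            report[label] = offending
    return report

statements = {
    "Elizabeth I has no children": [("elizabethI", "child", "?y")],
    "Obama has no sons": [("obama", "child", "?c"), ("?c", "gender", "male")],
}
available = {
    ("elizabethI", "child", "x"),  # deliberately dirty: contradicts a statement
    ("obama", "child", "malia"),
    ("malia", "gender", "female"),
}

print(violations(statements, available))
# {'Elizabeth I has no children': {('elizabethI', 'child', 'x')}}
```

A non-empty report indicates that the data source contains information that the statements declare to be non-existent.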
Acknowledgments The research was supported by the projects “MAGIC:
Managing Completeness of Data” funded by the Bolzano province and “CANDy:
Completeness-Aware Querying and Navigation on the Web of Data” by UniBZ.

References
1. Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny
   Vrandecic. Introducing Wikidata to the Linked Data Web. In ISWC, 2014.
2. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases.
   Addison-Wesley, 1995.
3. Steve Harris and Andy Seaborne, editors. SPARQL 1.1 Query Language. W3C
   Recommendation, 21 March 2013.
4. Fariz Darari, Werner Nutt, Giuseppe Pirrò, and Simon Razniewski. Completeness
   Statements about RDF Data Sources and Their Use for Query Answering. In ISWC,
   2013.