<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>pSPARQL: A Querying Language for Probabilistic RDF (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hong Fang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Arts and Sciences, Shanghai Polytechnic University</institution>
          ,
          <addr-line>Shanghai 201209</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education</institution>
          ,
          <addr-line>Nanjing 211189</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Computer Science and Technology, Tianjin University</institution>
          ,
          <addr-line>Tianjin 300350</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Tianjin Key Laboratory of Cognitive Computing and Application</institution>
          ,
          <addr-line>Tianjin 300350</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a querying language for probabilistic RDF databases, where each triple has a probability, called pSRARQL, built on SPARQL, recommended by W3C as a querying language for RDF databases. Firstly, we present the syntax and semantics of pSPARQL. Secondly, we define the query problem of pSPARQL corresponding to probabilities of solutions. Finally, we show that the query evaluation of general pSPARQL patterns is PSPACEcomplete.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Resource Description Framework (RDF)5 is the standard data model in the
Semantic Web. In our real world, RDF data possibly contains some uncertainty
data due to the diversity of data sources such as YAGO6. For instance, some
RDF data is generated from raw data via knowledge extraction and machine
learning. However, RDF model itself provides little support for uncertain data
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. There are some approaches to querying over probabilistic RDF [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1,2,4,3</xref>
        ].
Though those approaches can query probabilistic RDF, they cannot support
SPARQL 7 which, recommended by W3C, has become the standard language
for querying RDF data since 2008. Indeed, SPARQL has been applied to query
probabilistic ontologies [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>In this paper, we present an extended querying language (called pSPARQL:
probabilistic SPARQL) for probabilistic RDF databases with supporting
SPARQL. As an important result of this paper, we show that the query evaluation of
general pSPARQL patterns is PSPACE-complete, which has the same
complexity of SPARQL.
? Corresponding author: xiaowangzhang@tju.edu.cn
5 https://www.w3.org/TR/rdf11-primer/
6 http://www.mpi-inf.mpg.de/yago./
7 https://www.w3.org/TR/rdf-sparql-query/</p>
    </sec>
    <sec id="sec-2">
      <title>Probabilistic RDF</title>
      <p>Let I, B, and L be infinite sets of IRIs, blank nodes and literals, respectively.
These three sets are pairwise disjoint. We denote the union I [ B [ L by U , and
elements of I [ L will be referred to as constants.</p>
      <p>A triple (s; p; o) 2 (I [ B) I (I [ B [ L) is called an RDF triple. An
RDF graph is a finite set of RDF triples.</p>
      <p>A probabilistic RDF R is a pair (G; ) where G is an RDF graph and is a
total function from G ! [0; 1]. Intuitively speaking, is a probability function
mapping each triple to a probability.</p>
      <p>For instance, let R = (G; ) be a probabilistic RDF with G = ft1; t2; t3g
and is a function from G ! [0; 1] defined in the following table.</p>
      <p>No Triple
t1 (John, sufferedFrom, Schizophrenia) 0.32
t2 (John, sufferedFrom, MentalDisorder) 0.84
t3 (John, Treatedby, Psychiatrist) 0.95
3</p>
      <p>
        pSPARQL
In this section, we introduce a probabilistic SPARQL (for short, pSPARQL).
Patterns The syntax of pSPARQL is slightly different from the syntax of
SPARQL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Assume furthermore an infinite countable set V of variables, disjoint from
U . It is a SPARQL convention to prefix each variable with a question mark.</p>
      <p>Patterns are now inductively defined as follows.
– Any triple from (I [ L [ V ) (I [ V ) (I [ L [ V ) is a pattern (called a
triple pattern.
– If P1 and P2 are patterns, then so are the following: P1 UNION P2, P1 AND</p>
      <p>P2, and P1 DIFF P2.
– If P is a pattern and C is a constraint, then P FILTER C is a pattern. Here,
a constraint is a boolean combination (C1 ^ C2, C1 _ C2, or :C) of atomic
constraints with one of the three following forms: bound(?x), ?x =?y, and
?x = c with ?x; ?y 2 V and c 2 U .</p>
      <p>Note that, in pSRARQL, we leave out the OPTIONAL operator while we
add the DIFF operator in SPARQL 1.1. Indeed, the treatment is allowed since
OPTIONAL can be expressed by AND, UNION, and DIFF as follows:
P OPT Q = (P AND Q) UNION (P DIFF Q):
(1)
Semantics The semantics of pSPARQL patterns is defined in terms of sets of
pairs of the form ( ; p) (called a solution) where is simply a total function
: S ! U on some finite set S of variables and p 2 [0; 1]. We denote the
domain S of by dom( ). Note that ( ; p) is meaningless if p = 0. For
simplification, we mainly consider p 6= 0 in the following.</p>
      <p>Now given an RDF graph R = (G; ) and a pattern P , we define the
semantics of P on R, denoted by JP KR, as a set of mappings, in the following
manner.</p>
      <p>– If P is a triple pattern (v1; v2; v3), then</p>
      <p>P R := f( ; p) : fv1; v2; v3g \ V ! U j
J K</p>
      <p>( (v1); (v2); (v3)) 2 G and
p = max
( (v1); (v2); (v3))2G
f (( (v1); (v2); (v3)))gg:
Here, for any mapping and any constant c 2 I [ L, we agree that (c)
equals c itself. In other words, mappings are extended to constants according
to the identity mapping.
– If P is of the form P1 UNION P2, then</p>
      <p>P R := f( ; p) j ( ; p1) 2 JP1KR or ( ; p2) 2 JP2KR
J K
and p = maxfp1; p2gg:
– If P is of the form P1 AND P2, then JP KR := JP1KR on JP2KR, where, for
any two sets of solutions 1 and 2, we define
1 on
= f( 1 [ 2; p) j ( 1; p1) 2
1; ( 2; p2) 2</p>
      <p>2; 1 2
and p = p1 p2g:
Here, two mappings 1 and 2 are called compatible, denoted by 1 2,
if they agree on the intersection of their domains, i.e., if for every variable
?x 2 dom( 1) \ dom( 2), we have 1(?x) = 2(?x). And p is the product
of p1 and p2.
– If P is of the form P1 DIFF P2, then JP KR := JP1KR r JP2KR, where, for
any two sets of mappings 1 and 2, we define
1 r
2 = f( 1; p) 2
1 j :9( 2; p2) 2
2 s.t.</p>
      <p>
        1
2g:
– Finally, if P is of the form P1 FILTER C, then JP KG := f( ; p) 2 JP1KG j
(C) = trueg. Here, for any mapping and constraint C, the evaluation of
C on , denoted by (C), is defined as normal [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The following proposition shows that the semantics of pSPARQL is
welldefined.</p>
      <p>Proposition 1. For any pSPARQL pattern P , for any probabilistic RDF R, for
any solutions ( ; p) 2 J K</p>
      <p>P R, we have p 2 [0; 1].
4</p>
    </sec>
    <sec id="sec-3">
      <title>Querying evaluation</title>
      <p>A basic SPARQL query is an expression of the form SELECTS (P ) where S is
a finite set of variables and P is a pattern. Semantically, given an RDF graph G,
we define JSELECTS (P )KG = f( jdom( )\S ; p) j ( ; p) 2 JP KGg, where we
use the common notation f jX for the restriction of a function f to a subset X
of its domain.</p>
      <p>Given a probabilistic RDF R, a pSPARQL pattern P , a mapping , a
probability p 2 [0; 1], the query evaluation problem is to determine whether there
exists some probability p0 2 [0; 1] with p0
p such that ( ; p0) 2 J K</p>
      <p>P R.</p>
      <p>Proposition 2. The query evaluation of general pSPARQL patterns is
PSPACEcomplete.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work is supported by the program of Applied Mathematics Discipline of
Shanghai Polytechnic University (XXKPY1604) and the open funding project
of Key Laboratory of Computer Network and Information Integration
(Southeast University), Ministry of Education.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dalvi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Suciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Efficient query evaluation on probabilistic databases</article-title>
          .
          <source>In: Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB</source>
          <year>2004</year>
          ), pp.
          <fpage>864</fpage>
          -
          <lpage>875</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Query evaluation on probabilistic RDF databases</article-title>
          .
          <source>In: Proceedings of the 10th International Conference on Web Information Systems Engineering (WISE</source>
          <year>2009</year>
          ), pp.
          <fpage>307</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kementsietsidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pema</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Query answering over incomplete and uncertain RDF</article-title>
          .
          <source>In: Proceedings of the 18th International Workshop on Web and Databases (WebDB</source>
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lian</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Efficient query answering in probabilistic RDF graphs</article-title>
          .
          <source>In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Date (SIGMOD</source>
          <year>2011</year>
          ), pp.
          <fpage>157</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Pe´rez, J.,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Semantics and complexity of SPARQL</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):article 16,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Schoenfisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Querying probabilistic ontologies with SPARQL</article-title>
          .
          <source>In: Proceedings of the 44th Jahrestagung der Gesellschaft fr Informatik-Komplexita¨t meistern (GIJahrestagung</source>
          <year>2014</year>
          ), pp.
          <fpage>2245</fpage>
          -
          <lpage>2256</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>