Analyzing Real-World SPARQL Queries in the Light of
                Probabilistic Data

                      Joerg Schoenfisch and Heiner Stuckenschmidt

                            Data and Web Science Group
                               University of Mannheim
                          B6 26, 68159 Mannheim, Germany
                 {joerg,heiner}@informatik.uni-mannheim.de



       Abstract. Handling uncertain knowledge – like information extracted from un-
       structured text, with some probability of being correct – is crucial for modeling
       many real world domains. Ontologies and ontology-based data access (OBDA)
       have proven to be versatile methods to capture this knowledge. Multiple systems
       for OBDA have been developed and there is theoretical work towards probabilis-
       tic OBDA, namely identifying efficiently processable (safe) queries. However,
       there is no analysis of the safeness of probabilistic queries in real-world applica-
       tions, or, in other words, of the feasibility of fulfilling users’ information needs over
       probabilistic data. In this paper we investigate queries collected from several pub-
       lic SPARQL endpoints and determine the distribution of safe and unsafe queries.
       This analysis shows that many queries in practice are safe, making probabilistic
       OBDA feasible and practical to fulfill real-world users’ information needs.

       Keywords: safeness of probabilistic queries, SPARQL, probabilistic data, linked
       data


1   Motivation
Ontology-based Data Access (OBDA) has received a lot of attention in the Semantic
Web Community. In particular, results on light-weight description logics that allow ef-
ficient reasoning and query answering provide new possibilities for using ontologies in
data access. One approach for ontology-based data access is to rewrite a given query
based on the background ontology in such a way that the resulting – more complex –
query can directly be executed on a relational database. This is possible for different
light-weight ontology languages, in particular the DL-Lite family [1].
     At the same time, many applications in particular on the (Semantic) Web have to
deal with uncertainty in the data. Examples are large-scale information extraction from
text or the integration of heterogeneous information sources. To cope with uncertainty,
the database community has investigated probabilistic databases where each tuple in
the database is associated with a probability indicating the belief in the truth of the
respective statement. Querying a probabilistic database requires not only to retrieve
tuples that match the query, but also to compute a correct probability for each answer.
Research in probabilistic databases has shown that there is a strict dichotomy of safe
(data complexity in PTIME) and unsafe (#P-hard) queries [15].
    The goal of our work is to develop data access methods that can use background
knowledge in terms of a light-weight ontology and also deal with uncertainty in the
data. A promising idea for efficiently computing probabilistic query answers is to use
existing approaches for OBDA based on query rewriting and pose the resulting query
against a probabilistic database that computes answers with associated probabilities.
    Jung et al. have shown that query rewriting for OBDA can directly be lifted to
the probabilistic case [10]. Furthermore, they prove that the complexity results and the
dichotomy of safe and unsafe queries also carry over. As we have shown in [14], this
approach is applicable in the real world and scales to datasets the size of
NELL [5], with millions of triples.
    In this paper we address the question of how real-world queries, and consequently
users’ information needs, are distributed concerning query safeness. Therefore, we an-
alyze the queries contained in the Linked SPARQL Queries Dataset (LSQ) [13], which
consists of real-world queries collected from several public SPARQL endpoints.
    The paper is structured as follows: In Section 2 we define extensional query pro-
cessing and query safeness for probabilistic queries and their application
to OBDA. The dataset, our analysis, and its results are described in Section 3. In Sec-
tion 4 we briefly discuss some related work analyzing SPARQL queries. We conclude
and give directions for future work in Section 5.


2     Preliminaries
In this section we briefly introduce DL-Lite, the description logic underlying the OWL
2 QL profile. Next, we detail how the distribution semantics for probabilistic description
logics [12] is applied to DL-LiteR. Then, we explain extensional query processing –
the key to efficient query processing in probabilistic databases – and give the definition
of query safeness. This lays the foundation for tractable OBDA on top of probabilistic
data.

2.1   DL-LiteR and the Distribution Semantics
In DL-LiteR concepts and roles are formed in the following syntax [4]:

                    B → A | ∃R                         C → B | ¬B
                    R → P | P⁻                         E → R | ¬R


where A denotes an atomic concept, P an atomic role, and P⁻ the inverse of the atomic
role P. B denotes a basic concept, i.e. either an atomic concept or a concept of the form
∃R, where R denotes a basic role, that is, either an atomic role or the inverse of an
atomic role. C denotes a general concept, which can be a basic concept or its negation,
and E denotes a general role, which can be a basic role or its negation.
    A DL-LiteR knowledge base (KB) K = ⟨T, A⟩ models a domain in terms of a
TBox T and an ABox A. A TBox is formed by a finite set of inclusion assertions of the
form B ⊑ C or R ⊑ E. An ABox is formed by a finite set of membership assertions on
atomic concepts and on atomic roles, of the form A(a) or P(a, b), stating respectively
that the object denoted by the constant a is an instance of A and that the pair of objects
denoted by the pair of constants (a, b) is an instance of the role P.
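    For illustration (the concept and role names here are our own and not taken from the
datasets discussed later), consider K = ⟨T, A⟩ with T = { ∃teaches ⊑ Professor } and
A = { teaches(ann, db) }. Although Professor(ann) is not asserted, the inclusion assertion
applies because ann teaches something, so K logically implies Professor(ann).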
    The semantics of a DL is given in terms of interpretations I = (Δ^I, ·^I), consisting of a nonempty
interpretation domain Δ^I and an interpretation function ·^I that assigns to each concept
C a subset C^I of Δ^I, and to each role R a binary relation R^I over Δ^I:

                              A^I ⊆ Δ^I
                              P^I ⊆ Δ^I × Δ^I
                            (P⁻)^I = {(o₂, o₁) | (o₁, o₂) ∈ P^I}
                            (∃R)^I = {o | ∃o′. (o, o′) ∈ R^I}
                            (¬B)^I = Δ^I \ B^I
                            (¬R)^I = (Δ^I × Δ^I) \ R^I

    An interpretation I is a model of C₁ ⊑ C₂, where C₁, C₂ are general concepts, if
C₁^I ⊆ C₂^I. Similarly, I is a model of E₁ ⊑ E₂, where E₁, E₂ are general roles, if
E₁^I ⊆ E₂^I.
    To specify the semantics of membership assertions, the interpretation function is
extended to constants, by assigning to each constant a a distinct object a^I ∈ Δ^I. This
enforces the unique name assumption on constants. An interpretation I is a model of a
membership assertion A(a) (resp., P(a, b)) if a^I ∈ A^I (resp., (a^I, b^I) ∈ P^I).
    Given an assertion α and an interpretation I, I ⊨ α denotes the fact that I is a
model of α. Given a (finite) set of assertions λ, I ⊨ λ denotes the fact that I is a
model of every assertion in λ. A model of a KB K = ⟨T, A⟩ is an interpretation I
such that I ⊨ T and I ⊨ A. A KB is satisfiable if it has at least one model. A KB K
logically implies an assertion α, written K ⊨ α, if all models of K are also models of
α. Similarly, a TBox T logically implies an assertion α, written T ⊨ α, if all models of
T are also models of α.
    A TIP-OWL (tuple-independent OWL) knowledge base TKB = ⟨T, A, P⟩ consists
of a DL-LiteR TBox T, an ABox A, and a probability distribution P : A → [0, 1].
Abusing terminology, we say that KB ⊆ TKB if they have the same TBox and the ABox
of KB is a subset of the ABox of TKB. We adopt the independent tuple semantics
in the spirit of the probabilistic semantics for logic programs proposed in [7] and define
the probabilistic semantics of a TIP-OWL knowledge base in terms of a distribution
over possible knowledge bases as follows:

Definition 1. Let TKB = ⟨T, A, P⟩ be a TIP-OWL knowledge base. Then the proba-
bility of a DL-LiteR knowledge base KB = ⟨T, A′⟩ ⊆ TKB is given by:

                 P(KB | TKB) = ∏_{a′ ∈ A′} P(a′) · ∏_{a ∈ A \ A′} (1 − P(a))

   Based on this semantics, we can now define the probability of existential queries
over TIP-OWL knowledge bases as follows. First, the probability of a query over a
possible knowledge base is one if the query follows from the knowledge base and zero
otherwise:

                             P(Q | KB) = 1   if KB ⊨ Q
                             P(Q | KB) = 0   otherwise
     Taking the probability of possible knowledge bases into account, the probability of
an existential query over a possible knowledge base becomes the product of the proba-
bility of that possible knowledge base and the probability of the query given that knowl-
edge base. By summing up these probabilities over all possible knowledge bases, we
get the following probability for existential queries over TIP-OWL knowledge bases:
                    P(Q | TKB) = ∑_{KB ⊆ TKB} P(Q | KB) · P(KB | TKB)

   This defines a complete probabilistic semantics for TIP-OWL knowledge bases and
queries.
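
    To make the distribution semantics concrete, the following Python sketch computes
P(Q | TKB) by brute-force enumeration of all possible knowledge bases, exactly as in the
two definitions above. It is our own illustrative code, not part of any implementation
described in this paper; the entailment test is passed in as a function and the ABox
encoding is a deliberately simple choice.

from itertools import combinations

def query_probability(abox, entails, query):
    """P(Q | TKB) = sum over all KB ⊆ TKB of P(Q | KB) * P(KB | TKB).

    abox    : dict mapping ABox assertions to their probabilities P(a)
    entails : function (set of assertions, query) -> bool, i.e. KB |= Q
    query   : an opaque query object understood by `entails`
    """
    assertions = list(abox)
    total = 0.0
    for r in range(len(assertions) + 1):
        for chosen in combinations(assertions, r):       # one possible ABox A'
            p_kb = 1.0
            for a in assertions:                          # P(KB | TKB)
                p_kb *= abox[a] if a in chosen else (1.0 - abox[a])
            if entails(set(chosen), query):               # P(Q | KB) is 0 or 1
                total += p_kb
    return total

# Toy example: T = {∃teaches ⊑ Professor}; Q asks whether ann is a Professor.
abox = {("Professor", "ann"): 0.7, ("teaches", "ann", "db"): 0.4}
def entails(kb, query):   # hand-coded entailment for this single axiom
    return ("Professor", "ann") in kb or ("teaches", "ann", "db") in kb
print(query_probability(abox, entails, "Professor(ann)"))   # 0.82 = 1 - 0.3*0.6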


2.2   Implementing Reasoning on Top of Probabilistic Databases

In this section, we first briefly recall the idea of first-order rewritability of queries in
DL-Lite and then show that the query rewriting approach proposed by [4] can be used
on top of tuple independent probabilistic databases for answering queries in TIP-OWL
without changing the semantics of answers.


Query Rewriting Query processing in DL-LiteR is based on the idea of first-order
reducibility. This means that for every conjunctive query q we can find a query q′ that
produces the same answers as q by just looking at the ABox. Calvanese et al. [4] also
define a rewriting algorithm that computes a q′ for every q by applying transformations
that depend on the TBox axioms.
     Given a consistent TBox T, the algorithm takes a conjunctive query q₀ and expands
it into a union of conjunctive queries U, starting with U = {q₀}. The algorithm succes-
sively extends U by applying the following rule:

                                 U = U ∪ {q[l/r(l, I)]}
where q is a query in U, l is a literal in q, I is an inclusion axiom from T, and r is a
replacement function that is defined as follows:

            l        I             r(l, I)        l        I             r(l, I)
            A(x)     A′ ⊑ A        A′(x)          P(_, x)  A ⊑ ∃P⁻       A(x)
            A(x)     ∃P ⊑ A        P(x, _)        P(_, x)  ∃P′ ⊑ ∃P⁻     P′(x, _)
            A(x)     ∃P⁻ ⊑ A       P(_, x)        P(_, x)  ∃P′⁻ ⊑ ∃P⁻    P′(_, x)
            P(x, _)  A ⊑ ∃P        A(x)           P(x, y)  P′ ⊑ P        P′(x, y)
            P(x, _)  ∃P′ ⊑ ∃P      P′(x, _)       P(x, y)  P′⁻ ⊑ P⁻      P′(x, y)
            P(x, _)  ∃P′⁻ ⊑ ∃P     P′(_, x)       P(x, y)  P′ ⊑ P⁻       P′(y, x)
                                                  P(x, y)  P′⁻ ⊑ P       P′(y, x)
Here ’_’ denotes an unbound variable, i.e., a variable that does not occur in any other
literal of any of the queries.
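
    To make the replacement rule concrete, the following Python sketch implements r(l, I)
for the three concept-atom cases of the table above (the role-atom cases follow the same
pattern) and uses it to expand a set of conjunctive queries. The encodings of atoms and
axioms are our own simplified choices, not the data structures of Ontop or of the authors’
implementation.

UNBOUND = "_"

def rewrite_atom(atom, axiom):
    """Return the rewritten atom r(l, I), or None if the axiom does not apply.

    atom  : ("A", ("x",)) for A(x), or ("P", ("x", "_")) for P(x, _)
    axiom : ("A1", "A") for A1 ⊑ A, ("exists P", "A") for ∃P ⊑ A,
            ("exists_inv P", "A") for ∃P⁻ ⊑ A
    """
    lhs, rhs = axiom
    pred, args = atom
    if len(args) == 1 and rhs == pred:             # l is a concept atom A(x)
        x = args[0]
        if lhs.startswith("exists_inv "):          # ∃P⁻ ⊑ A  =>  P(_, x)
            return (lhs[len("exists_inv "):], (UNBOUND, x))
        if lhs.startswith("exists "):              # ∃P ⊑ A   =>  P(x, _)
            return (lhs[len("exists "):], (x, UNBOUND))
        return (lhs, (x,))                         # A1 ⊑ A   =>  A1(x)
    return None                                    # role-atom cases omitted here

def expand(queries, tbox):
    """One application round of the rule U = U ∪ {q[l/r(l, I)]}."""
    union = set(queries)
    for q in queries:                              # q is a tuple of atoms
        for i, atom in enumerate(q):
            for axiom in tbox:
                new_atom = rewrite_atom(atom, axiom)
                if new_atom is not None:
                    union.add(q[:i] + (new_atom,) + q[i + 1:])
    return union

# T = {∃teaches ⊑ Professor}, q0(x) = Professor(x)
tbox = [("exists teaches", "Professor")]
q0 = (("Professor", ("x",)),)
print(expand({q0}, tbox))   # contains q0 and the derived query teaches(x, _)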

Definition 2 (Derived Query). Let Q = L₁ ∧ · · · ∧ Lₘ be a conjunctive query over a
DL-Lite terminology. We write Q →ᵣ Q′ if Q′ = Q ∨ Q[Lᵢ/r(Lᵢ, I)] for some literal Lᵢ
from Q. Let →ᵣ* denote the transitive closure of →ᵣ; then we call every Q′ with Q →ᵣ* Q′
a derived query of Q. Q′ is called maximal if there is no Q″ such that Q →ᵣ* Q″ and
Q′ →ᵣ* Q″.

    Using the notion of a maximal derived query, we can establish the FOL-reducibility
of DL-Lite by rephrasing the corresponding theorem from [4].

Theorem 1 (FOL-Reducibility of DL-Lite (from [4])). Query answering in DL-LiteR
is FOL-reducible. In particular, for every query Q with maximal derived query Q′ and
every DL-LiteR TBox T and non-empty ABox A we have T ∪ A ⊨ Q if and only if
A ⊨ Q′.


Correctness of Query Processing Implementing TIP-OWL on top of probabilistic
databases can now be done in the following way. The A-Box is stored in the proba-
bilistic database. It is easy to see that the probabilistic semantics of TIP-OWL A-Boxes
and tuple independent databases coincide. What remains to be done is to show that
the idea of FOL-reducibility carries over to our probabilistic model. In particular, we
have to show that a rewritten query has the same probability given a knowledge base
with empty T-Box as the original query given a complete knowledge base. This result
is established in the following corollary.

Corollary 1. Let TKB = ⟨T, A, P⟩ be a TIP-OWL knowledge base and TKB′ =
⟨∅, A, P⟩ the same knowledge base, but with an empty TBox. Let further Q be a con-
junctive query and Q′ a union of conjunctive queries obtained by rewriting Q on the
basis of T. Then

                             P(Q | TKB) = P(Q′ | TKB′)

    We can easily see that the semantics of queries over a TIP-OWL ABox with empty
TBox directly corresponds to the tuple-independence semantics used in probabilistic
databases [15]; thus queries posed to a correctly constructed probabilistic database have
the same probability as a TIP-OWL query with empty TBox. From Theorem 1 we get
that P(Q | KB) does not change, as T ∪ A ⊨ Q if and only if A ⊨ Q′. Further, as
P(KB | TKB) only depends on A, this part also stays unchanged.
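    As a small worked example (the concept and role names are our own, continuing the
illustration from Section 2.1), let T = { ∃teaches ⊑ Professor } and let A contain
Professor(ann) with probability p₁ and teaches(ann, db) with probability p₂. For the
query Q = Professor(ann), rewriting yields Q′ = Professor(ann) ∨ ∃y teaches(ann, y).
The two disjuncts use disjoint relation symbols, so extensional evaluation over the
tuple-independent ABox gives

                 P(Q | TKB) = P(Q′ | TKB′) = 1 − (1 − p₁)(1 − p₂),

which is exactly the value obtained by summing P(KB | TKB) over all possible knowledge
bases that entail Q.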
    Over the past decade, the database community has developed efficient methods for
querying uncertain information in probabilistic databases. Important results are the in-
troduction of the independent tuple model for probabilistic data as well as a complete
characterization of queries that can be computed in polynomial time. We briefly review
these results in this section.
2.3   Extensional Query Processing
It has been shown that the key to efficient query processing in probabilistic databases
is to avoid the computation of complex event descriptions and to directly compute the
probability of a complex query from the probabilities of subqueries. This approach,
referred to as extensional query processing, has been shown to correctly compute the
probability of queries for the class of tractable queries using the following recursive
algorithm:


Algorithm 1 Extensionally compute P(Q) (from [6])
Require: A conjunctive query Q in CNF, a tuple-independent probabilistic database
 1: Q is a conjunctive query in CNF: compute the symbol components Q = Q₁ ∧ · · · ∧ Qₘ
 2: if m ≥ 2 then
 3:     return ∏_{i=1,...,m} P(Qᵢ)
 4: end if
 5:
 6: Q = Q₁ ∧ · · · ∧ Q_k is a symbol-connected query in CNF:
 7: if k ≥ 2 then
 8:     return −∑_{s ⊆ [k], s ≠ ∅} (−1)^|s| · P(⋁_{i ∈ s} Qᵢ)
 9: end if
10:
11: Q is a disjunctive query: compute the symbol components Q = Q₁ ∨ · · · ∨ Qₘ
12: if m ≥ 2 then
13:     return 1 − ∏_{i=1,...,m} (1 − P(Qᵢ))
14: end if
15:
16: Q = Q₁ ∨ · · · ∨ Q_k is a symbol-connected disjunctive query:
17: if Q = t has no variables then
18:     return P(t)
19: end if
20:
21: if Q has a separator variable z then
22:     return 1 − ∏_{a ∈ ADom} (1 − P(Q[a/z]))
23: else
24:     return false (Q is unsafe)
25: end if



    Symbol components are defined as follows [15]: Let Q = d₁ ∧ . . . ∧ d_k be a UCQ in
CNF, and K₁, . . . , Kₘ the connected components for the set of queries {d₁, . . . , d_k}.
The symbol components of Q are Q₁ = ⋀_{i ∈ K₁} dᵢ, . . . , Qₘ = ⋀_{i ∈ Kₘ} dᵢ. Then the
probability P(Q) = P(Q₁) · . . . · P(Qₘ). If m = 1, then Q is symbol-connected.
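    The grouping into symbol components can be sketched as follows (our own illustrative
Python, not the authors’ implementation): clauses are represented by the sets of relation
symbols they mention, and two clauses end up in the same component whenever they are
connected through shared symbols.

from itertools import combinations

def symbol_components(clauses):
    """Group CNF clauses (given as frozensets of relation symbols) into symbol components."""
    components = [[c] for c in clauses]
    merged = True
    while merged:                       # repeatedly merge components sharing a symbol
        merged = False
        for i, j in combinations(range(len(components)), 2):
            if set().union(*components[i]) & set().union(*components[j]):
                components[i].extend(components[j])
                del components[j]
                merged = True
                break
    return components

# Q = (R ∨ S-clause) ∧ (S ∨ T-clause) ∧ (U-clause): the first two clauses share S,
# so there are two symbol components and P(Q) = P(Q1) · P(Q2).
clauses = [frozenset({"R", "S"}), frozenset({"S", "T"}), frozenset({"U"})]
print(symbol_components(clauses))   # two components: the first two clauses, and the third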
    Each recursion of the algorithm processes a simpler subexpression, until the prob-
ability of an individual tuple can be read directly from the database. Queries that can be
completely processed using the steps above are called safe. It has been shown that safe
queries exactly correspond to queries that can be computed in PTIME, whereas queries
that cannot be completely processed using these steps – in particular queries for which
we cannot find a separator variable in line 21 – are #P-hard. This means that we
can use the notion of safeness and the processing steps above to analyze the general
complexity of certain classes of queries.
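    As an illustration of lines 21–22 of Algorithm 1, the sketch below extensionally
evaluates the safe query ∃x∃y R(x) ∧ S(x, y) over a toy tuple-independent database: x is a
separator variable, so the probability is a product over the active domain. The table
contents and the code are our own illustrative choices, not taken from the paper.

from math import prod

# Toy tuple-independent tables: each tuple carries its marginal probability.
R = {("a",): 0.5, ("c",): 0.5}
S = {("a", "b1"): 0.5, ("a", "b2"): 0.5, ("c", "d"): 0.5}

def prob_exists_S(a):
    """P(∃y S(a, y)) = 1 − ∏_b (1 − P(S(a, b))), the independent-union rule."""
    return 1 - prod(1 - p for (s, _), p in S.items() if s == a)

def prob_query():
    """P(∃x∃y R(x) ∧ S(x, y)); x is a separator variable (it occurs in every atom),
    so the subqueries Q[a/x] for different constants a are independent."""
    active_domain = {t[0] for t in R} | {t[0] for t in S}
    return 1 - prod(1 - R.get((a,), 0.0) * prob_exists_S(a) for a in active_domain)

print(round(prob_query(), 5))   # 0.53125 for the toy tables above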
Definition 3 (Safeness). A query Q is called safe if and only if it can be computed by
iteratively applying Algorithm 1.
Theorem 2 (Dichotomy Theorem (adapted from [15])). The probability of safe queries
can be computed in PTIME. If a query is not safe, computing its probability is #P-
hard.
     Jung et al. proved that Theorem 2 also holds for probabilistic OBDA in OWL 2
QL [10]. This essentially means that the same dichotomy of safe and unsafe queries
still applies in the presence of reasoning through query rewriting.

3    Analysis of the SPARQL Dataset and Query Safeness
For our analysis we used the queries collected in the Linked SPARQL Queries Dataset
(LSQ) [13]. LSQ contains queries which were posed against the public SPARQL end-
points of DBpedia1 , Linked Geo Data (LGD)2 , Semantic Web Dog Food (SWDF)3 , and
the British Museum (BM)4 . Although the queries in the dataset are not posed against
probabilistic data, we argue that the large number of queries gives a good picture of the
general information needs users have. In the case of DBpedia, there is actually some
amount of uncertainty in its generation, as the data is automatically extracted from
Wikipedia, which itself can contain uncertain or wrong information.


                       Table 1. Overview of queries in the LSQ dataset

                         SELECT      ASK  DESCRIBE  CONSTRUCT      Total
               SWDF       58 741      157    26 533         52     85 483
               DBpedia   736 726   37 174       777      7 613    782 290
               LGD       265 410   25 140        24      7 004    297 578
               BM         29 073        0         0          0     29 073
               Total   1 089 950   62 471    27 334     14 669  1 194 424


    Table 1 shows the different query types and their distribution in the different datasets.
Those numbers contain only queries that parse successfully; LSQ also lists queries
with parse errors. They clearly show that the vast majority of queries are SELECT
queries. Interestingly, the dataset for the British Museum’s endpoint consists only of
SELECT queries, all of them uniformly structured. This implies that those are not actual
user queries but queries automatically created by some system. However, the queries from
the British Museum make up only a small part of the whole LSQ dataset.
 1
   http://dbpedia.org/
 2
   http://linkedgeodata.org/
 3
   http://data.semanticweb.org/
 4
   http://bm.rkbexplorer.com
                             Table 2. Used SPARQL features

                          UNION FILTER DISTINCT GROUP BY EXISTS
                BM             0      0    29073        0     0
                DBpedia    36127 183882   144245        3     3
                LGD        28827 92558     66206       11     0
                SWDF       27982    818    39000      445   123
                Total      92936 277258   278524      459   126




   Looking at the used SPARQL features, FILTER and DISTINCT are the most promi-
nent, with UNION coming in second. GROUP BY and EXISTS play only a minor role.
However, note that DISTINCT is technically a GROUP BY over all variables in the
SELECT clause.
    The focus of our analysis is on SELECT and ASK queries. For DESCRIBE queries
there is no formal definition of how a result is produced, making it impossible to
determine if the process is safe; CONSTRUCT queries do not directly translate to
query answering, as they generate a new graph and not a result set. Furthermore,
our current implementation cannot analyze queries containing features like GROUP
BY, DISTINCT, EXISTS, or OPTIONAL, or endpoint-specific functions that are not part of the
SPARQL recommendation (e.g. Oracle’s version of COUNT). After removing those we
are left with 507 241 queries for which we can determine query safeness.
    We used parts of Ontop [3], a system for deterministic OBDA, to translate the
SPARQL queries into (unions of) conjunctive queries. Those queries were then processed
by Algorithm 1 to determine their safeness. The implementation is written in Java and
the experiments were run on an Intel Core i5 @ 2.9 GHz with 4 GB of RAM for the Java
VM.
    The distribution of safe and unsafe queries is shown in Table 3. With 99.68%, almost
all of the analyzed queries are safe. Regarding the total number of 1 194 424 queries,
42.33% are definitely safe. Note that this does not imply that 57.67% of the queries
are unsafe. Those queries need a more thorough analysis because of their usage of
DISTINCT/GROUP BY, FILTER, HAVING, EXISTS, or OPTIONAL.
    Looking at the numbers for the different data sources, the percentage of unsafe queries
sent to the Semantic Web Dog Food endpoint is significantly higher than for the other
two. Analyzing those queries, we found the majority to be of the form SELECT
?targetType WHERE { ?obj a <...> . ?obj <...> ?targetObj .
?targetObj a ?targetType . }. This suggests that those queries
are generated by some tool and not by a user. Overall, this shows that the users’ general
information needs can also be satisfied over probabilistic data in a tractable way.
    Our implementation of the algorithm is still quite simple and we are not able to
simplify all queries; thus, determining their safeness sometimes becomes infeasible.
This results in a timeout for 69 queries after 10 minutes. Overall, the average processing
time for a query is 86 ms.
                   Table 3. Analysis results for queries in the LSQ dataset

                          DBpedia            LGD            SWDF            Total
                          #      %          #      %       #      %        #      %
              Safe   287 487  99.72   199 636  99.79  18 240  97.93  505 563  99.68
              Unsafe     809   0.28       414   0.21     386   2.07    1 609   0.32
              Total  288 296          200 050          18 626         507 172



4   Related Work

There is some other work analyzing the characteristics of deterministic SPARQL queries
from different sources. To the best of our knowledge, there is no analysis of the safeness
of real-world SPARQL queries for probabilistic data.
    In [11] Picalausa et al. analyzed 3 million queries from DBpedia query logs. They
come to the conclusion that the majority of the queries therein are tractable for the
deterministic case.
    Arias et al. [8] observe that most real-world SPARQL queries (in the USEWOD2011
dataset [2], containing queries from DBpedia and Semantic Web Dog Food) are star-
shaped and do not contain property chains. As Jung et al. [10] have shown that tractable
queries are generally star-shaped, their findings also coincide with our results.
    Han et al. [9] also analyzed queries in LSQ. Similarly to Picalausa et al., they come
to the conclusion that most queries are in PTIME. They also show that most queries have
a simple structure consisting of three or fewer triple patterns, making it less likely that
they have a structure that is unsafe. An unsafe query must consist of at least either
two triple patterns with a self-join (i.e., the same predicate) or three triple patterns (cf. the
unsafe queries given in [15]).
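    For instance, the canonical hard query from [15],

                        H₀ = ∃x ∃y R(x) ∧ S(x, y) ∧ T(y),

uses three atoms over three distinct relations; neither x nor y occurs in all atoms, so no
separator variable exists and computing its probability over a tuple-independent database
is #P-hard.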


5   Conclusion and Future Work

In this paper we analyzed half a million queries from the LSQ dataset, namely those which
can directly be translated to (unions of) conjunctive queries. Checking for probabilistic
query safeness, we found nearly all queries to be safe and thus tractable when processed
over probabilistic data. Relative to the whole set of queries, about half of them are safe;
for the other half our analysis gives no information. This shows that the general
information needs of users of the public SPARQL endpoints serving as sources for the
LSQ dataset can also feasibly be fulfilled on uncertain data. Note that there is
no uncertain version of those datasets. However, we argue that the users’ information
needs would not change over uncertain data, and that it is possible to create such a
version of DBpedia, e.g. by incorporating information about trustworthiness in general
or knowledge about how commonly a certain property is accurately extracted. To the
best of our knowledge this is the first work that analyzes real-world SPARQL queries
in the light of probabilistic data.
    For future work, we have two different directions. First, we want to further study
queries using constructs like DISTINCT/GROUP BY or OPTIONAL and how they
influence query safeness. Second, considering unsafe queries, we want to analyze their
structure and investigate ways of handling those effectively, e.g. through approximation
or simplification, similar to approaches used in probabilistic databases.


References
 1. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite Family and
    Relations. Journal of Artificial Intelligence Research 36, 1–69 (2009)
 2. Berendt, B., Hollink, L., Hollink, V., Luczak-Rösch, M., Möller, K., Vallet, D.: USE-
    WOD2011 - 1st International Workshop on Usage Analysis and the Web of Data.
    Proceedings of the 20th International Conference Companion on World Wide Web,
    WWW 2011 pp. 305–306 (2011), http://www.scopus.com/inward/record.url?eid=2-s2.0-
    79955127036&partnerID=40&md5=6ec8c19cff2e223d891c92e7c6e236e7
 3. Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R., Lanti, D.: Ontop: Answer-
    ing SPARQL Queries over Relational Databases. Semantic Web journal (2015),
    http://www.semantic-web-journal.net/system/files/swj1004.pdf
 4. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable Reasoning and
    Efficient Query Answering in Description Logics: The DL-Lite Family. Journal of Auto-
    mated Reasoning 39(3), 385–429 (2007), http://link.springer.com/10.1007/s10817-007-
    9078-x
 5. Carlson, A., Betteridge, J., Kisiel, B.: Toward an Architecture for Never-Ending Language
    Learning. In: Proceedings of the Conference on Artificial Intelligence (AAAI). pp. 1306–1313
    (2010), http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/download/1879/2201
 6. Dalvi, N., Suciu, D.: The Dichotomy of Probabilistic Inference for
    Unions of Conjunctive Queries. Journal of the ACM 59(6), 1–87 (2012),
    http://dl.acm.org/citation.cfm?doid=2395116.2395119
 7. De Raedt, L., Kimmig, A., Toivonen, H.: ProbLog: A Probabilistic Prolog and its Application
    in Link Discovery. In: IJCAI International Joint Conference on Artificial Intelligence. pp.
    2468–2473 (2007)
 8. Arias Gallego, M., Fernández, J.D., Martı́nez-Prieto, M.A., de la Fuente, P.: An Em-
    pirical Study of Real-World SPARQL Queries. In: 1st International Workshop on Usage
    Analysis and the Web of Data (USEWOD2011) at the 20th International World Wide Web
    Conference (WWW 2011). pp. 10–13 (2011), http://arxiv.org/abs/1103.5043
 9. Han, X., Feng, Z., Zhang, X., Wang, X., Rao, G., Jiang, S.: On the Statistical
    Analysis of Practical SPARQL Queries. arXiv preprint arXiv:1603.06729 p. 6 (2016),
    http://arxiv.org/abs/1603.06729
10. Jung, J.C., Lutz, C.: Ontology-Based Access to Probabilistic Data with OWL QL. In: The
    Semantic Web – ISWC 2012. pp. 182–197. Springer (2012)
11. Picalausa, F., Vansummeren, S.: What are Real SPARQL Queries Like? Proceedings of
    the International Workshop on Semantic Web Information Management pp. 1–6 (2011),
    http://portal.acm.org/citation.cfm?doid=1999299.1999306
12. Riguzzi, F., Bellodi, E., Lamma, E., Zese, R.: Probabilistic Description Logics un-
    der the Distribution Semantics. Semantic Web journal (2014), http://www.semantic-web-
    journal.net/system/files/swj651.pdf
13. Saleem, M., Ali, M.I., Hogan, A., Mehmood, Q., Ngonga Ngomo, A.C.: LSQ: The Linked
    SPARQL Queries Dataset. Lecture Notes in Computer Science (including subseries Lecture
    Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9367, 261–269 (2015)
14. Schoenfisch, J., Stuckenschmidt, H.: Towards Large-Scale Probabilistic OBDA. In: Beierle,
    C., Dekhtyar, A. (eds.) Scalable Uncertainty Management: 9th International Conference,
    SUM 2015, Québec City, QC, Canada, September 16-18, 2015. Proceedings, pp. 106–
    120. Springer International Publishing, Cham (2015), http://dx.doi.org/10.1007/978-3-319-
    23540-0_8
15. Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic Databases. Morgan & Claypool Pub-
    lishers (2011)