=Paper= {{Paper |id=Vol-1741/jist2016pd_paper5 |storemode=property |title=PIWD: A Plugin-based Framework for Well-Designed SPARQL (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-1741/jist2016pd_paper5.pdf |volume=Vol-1741 |authors=Xiaowang Zhang,Zhenyu Song,Zhiyong Feng,Xin Wang |dblpUrl=https://dblp.org/rec/conf/jist/ZhangSFW16a }} ==PIWD: A Plugin-based Framework for Well-Designed SPARQL (Extended Abstract)== https://ceur-ws.org/Vol-1741/jist2016pd_paper5.pdf
          PIWD: A Plugin-based Framework for
        Well-Designed SPARQL (Extended Abstract)

        Xiaowang Zhang1,3,4 , Zhenyu Song1,3 , Zhiyong Feng2,3 , and Xin Wang1,3
    1
   School of Computer Science and Technology, Tianjin University, Tianjin, China
    2
      School of Computer Software, Tianjin University, Tianjin 300350, China
3
  Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
      4
         Key Laboratory of Computer Network and Information Integration
      (Southeast University), Ministry of Education, Nanjing 211189, China



           Abstract. In the real world datasets (e.g.,DBpedia query log), queries
           built on well-designed patterns containing only AND and OPT opera-
           tors (for short, WDAO-patterns) account for a large proportion among
           all SPARQL queries. In this paper, we present a plugin-based frame-
           work for all SELECT queries built on WDAO-patterns, named PIWD.
           Theoretically, we can reduce the query evaluation of WDAO-patterns to
           subgraph homomorphism as well BGP since the query evaluation of BGP
           is equivalent to subgraph homomorphism. Furthermore, our preliminary
           experiments on gStore and RDF-3X show that PIWD can answer all
           queries built on WDAO-patterns effectively and efficiently.


1        Introduction
Resource Description Framework (RDF) is the standard data model in the se-
mantic web. The standard query language for RDF graphs is SPARQL. Though
SPARQL is powerful to expressing queries over RDF graphs, generally, the query
evaluation of the full SPARQL is PSPACE-complete [5].
     Currently, there are some popular query engines for supporting the full
SPARQL such as Jena [1]. However, they become not highly efficient when they
handle some large RDF datasets[7]. Currently, gStore[7] and RDF-3X[3] can
highly efficiently query large datasets. But gStore and RDF-3X merely provide
querying services of BGP. Therefore, it is necessary to develop a query engine
with supporting more expressive queries for large datasets.
     Since the OPT operator is the least conventional operator among SPARQL
operators [6], it is interesting to investigate those patterns extending BGP with
the OPT operator. In fact, we investigate that queries bulit on well-designed
patterns are very popular in a real world. For example, in LSQ, a Linked Dataset
describing SPARQL queries extracted from the logs of four prominent public
SPARQL endpoints containing more than one million available queries, queries
built on well-designed patterns are over 70% [2].
     Furthermore, queries with well-designed AND-OPT patterns (for short, WDAO-
patterns) are over 99% among those queries with well-designed patterns in LSQ
[2]. In a short, the fragment of WDAO-patterns are a natural extension of BGP.
    In this paper, we present a plugin-based framework for all SELECT queries
built on WDAO-patterns, named PIWD. Within this framework, we can employ
any query engine evaluating BGP for evaluate queries bulit on WDAO-patterns
in a convenient way.


2     Preliminaries
Well-Designed Patterns A UNION-free pattern P is well-designed if the fol-
lowings hold:
  – P is safe, that is, each subpattern of the form Q FILTER C of P holds the
    condition: var (C) ⊆ var (Q).
  – for every subpattern P 0 = (P1 OPT P2 ) of P and for every variable ?x
    occurring in P , the following condition hold: If ?x occurs both inside P2 and
    outside P 0 , then it also occurs in P1 .
    For instance, the pattern Q is a well-designed pattern.
    Note that the OPT operation provides really optional left-outer join due to
the weak monotonicity [5].


3     Well-Designed And-Opt Tree
Definition 1 (WDAO-tree). Let P be a well-designed pattern in OPT normal
form. A well-designed tree T based on P is a redesigned parse tree, which can be
defined as follows:
  – All inner nodes in T are labeled by OPT operators and leaf nodes are labeled
    by BGP.
  – For each subpattern (P1 OPT P2 ) of P , the well-designed tree T1 of P1 and
    the well-designed tree T2 of P2 have the same parent node.

    For instance, consider a WDAO-pattern P 5 = (((p1 AND p3 ) OPT2
p2 ) OPT1 ((p4 OPT4 p5 ) OPT5 (p6 OPT6 p7 ))).
    The WDAO-tree T is shown in Figure 1(a). As shown in this example, BGP
- (p1 AND p3 ) is the exact matching in P , which corresponds to the non-optional
pattern. Besides, in WDAO-tree, it is the leftmost leaf in T .


4     PIWD Demonstration
PIWD is written in Java in a 2-tier design shown in Figure 1(b). The bottom
layer consists of any BGP query framework which is used as a black box for
evaluating BGPs. Before answering SPARQL queries, the second layer provides
the rewriting process and left-outer join evaluation, which lead to the solutions.
    BGP query framework supports both query and RDF data management, such
as gStore, RDF-3X and so on, which solve the problem of subgraph isomorphism.
5
    We give each OPT operator a subscript to differentiate them so that readers under-
    stand clearly.
                   OPT1

         OPT2               OPT3

    p1 AND p3 p2     OPT4          OPT5

                    p4    p5    p6    p7
              (a) WDAO-tree                     (b) PIWD architecture

                               Fig. 1. PIWD Overview


PIWD provides the left-outer join between the BGPs. That is, the problem of
answering well-designed SPARQL has been transformed into the problem of
subgraph isomorphism and left-outer join between triple patterns.


5     Experiments and Evaluations

Experiments All experiments were carried out on a machine running Linux,
which has one CPU with four cores of 2.40GHz, 32GB memory and 500GB
disk storage. All of the algorithms were implemented in JAVA. gStore[7] and
RDF-3X[3] are used as the underlying query engines to handle BGPs. In our
experiments, there is no optimization in our OPT operation. In our experiments,
we used LUBM1, LUBM50, LUBM100, LUBM150 and LUBM200 as our query
datasets. The queries over LUBM were designed as four different forms, which
corresponds to different WDAO-trees. OPT nesting in Q2 is the most complex
among four forms. Furthermore, we built AND operators in each query.
    Evaluation on PIWD The variation tendencies of query response time are
shown in Figure 2. Query efficiency is decreased with higher response time when
OPT nesting becomes more complex. Furthermore, there has been a significant
increase in query response time when the dataset scale grows up. For instance,
we observe Q2 , which corresponds to the most complex pattern in our four
experimental SPARQL patterns. When the dataset is ranging from LUBM100
to LUBM200, its query response time extends more than five times even though
the dataset scale extends two times.


6     Conclusion

In this paper, we have presented PIWD, which is a plugin adaptable for any BGP
query framework to handle WDAO-patterns. Theoretically, PIWD rebuilds the
query plan based on WDAO-tree. After employing BGP query framework on
WDAO-tree, PIWD supports left-outer join operation between triple patterns.
Our experiments show that PIWD can deal with complex and multi-level nested
                              ·107                                                  ·107

                         1           Q1                                                    Q1
                                     Q2                                        1           Q2
                                     Q3                                                    Q3
                        0.8
                                     Q4                                       0.8          Q4




            Time[ms]




                                                                   Time[ms]
                        0.6
                                                                              0.6

                        0.4                                                   0.4

                        0.2                                                   0.2

                         0                                                     0

                              LUBM1 LUBM50 LUBM100LUBM150LUBM200                    LUBM1 LUBM50 LUBM100LUBM150LUBM200
                                          Dataset scale                                         Dataset scale

                       (a) Performance on gStore                     (b) Performance on RDF-3X

                                      Fig. 2. Query Response Time over LUBM


WDAO-patterns. In the future, we will further handle other non-well-designed
patterns and deal with more operations such as UNION. Besides, we will consider
OPT operation optimization to improve efficiency of PIWD and implement our
framework on distributed RDF graphs by applying the distributed gStore [4].


Acknowledgments
This work is supported by the program of the program of the National Key
Research and Development Program of China (2016YFB1000603), the National
Natural Science Foundation of China (NSFC) (61502336), and the open funding
project of Key Laboratory of Computer Network and Information Integration
(Southeast University), Ministry of Education (K93-9-2016-05). Xiaowang Zhang
is supported by Tianjin Thousand Young Talents Program and the project-
sponsored by School of Computer Science and Technology in Tianjin University.


References
 1. Carroll J.J., Dickinson I., Dollin C., Reynolds D., Seaborne A., and Wilkinson
    K. (2004). Jena: Implementing the semantic web recommendations. In: Proc. of
    WWW 2004, pages 74–83.
 2. Han X., Feng Z., Zhang X., Wang X., and Rao G. (2016). On the statistical analysis
    of practical SPARQL queries. In: Proc. of WebDB 2016, article 2.
 3. Neumann T. and Weikum G. (2010). The RDF3X engine for scalable management
    of RDF data. VLDB Journal, 19(1):91–113.
 4. Peng P., Zou L., Özsu M.T., Chen L., and Zhao D. (2016). Processing SPARQL
    queries over distributed RDF graphs. VLDB J., 25(2): 243–268.
 5. Pérez J., Arenas M., and Gutierrez C. (2009). Semantics and complexity of
    SPARQL. ACM Trans. Database Syst., 34(3):30–43.
 6. Zhang X. and Van den Bussche J. (2014). On the primitivity of operators in
    SPARQL. Inf. Process. Lett., 114(9):480-485.
 7. Zou L., Özsu M.T., Chen L., Shen X., Huang R., and Zhao D. (2014). gStore: A
    graph-based sparql query engine. VLDB J., 23(4):565–590.