=Paper= {{Paper |id=Vol-2456/paper7 |storemode=property |title=A CONSTRUCT-Based Query for Weighted RDF Graph Analytics |pdfUrl=https://ceur-ws.org/Vol-2456/paper7.pdf |volume=Vol-2456 |authors=Xin Song,Shizhan Chen,Xiaowang Zhang,Zhiyong Feng |dblpUrl=https://dblp.org/rec/conf/semweb/SongCZF19 }} ==A CONSTRUCT-Based Query for Weighted RDF Graph Analytics== https://ceur-ws.org/Vol-2456/paper7.pdf
      A CONSTRUCT-based Query for Weighted
             RDF Graph Analytics

          Xin Song, Shizhan Chen, Xiaowang Zhang? , and Zhiyong Feng

       College of Intelligence and Computing, Tianjin University, Tianjin, China
    Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
                  ?
                    Corresponding Author: xiaowangzhang@tju.edu.cn



        Abstract. In this paper, based on SPARQL CONSTRUCT query, we
        present a new query form called SPARQL Analyze Query, which inte-
        grates graph analytics with graph querying. SPARQL Analyze Query ex-
        tends SPARQL by performing graph algorithms via WITH and PRODUCE
        operators, where WITH is used for invoking a graph algorithm and
        PRODUCE for designating results of variables. SPARQL Analyze Query
        supports two classes of graph algorithms over the weighted RDF graph
        w.r.t. nodes and shortest paths. Besides, we illustrate the feasibility of
        the SPARQL Analyze Query in expressing PageRank.


1     Introduction

In graph databases, graph analytics and graph query are often discussed in
two separate fields, but we often perform graph analytics and graph query at
the same time on one task. Although there are some graph analytical engines
that support graph analytics by means of graph query languages, they cannot
combine graph analytics with graph querying well. One of them is Neo4j and its
query language Cypher [1] which lacks the formalization of graph analytics. In
terms of SPARQL, the standard graph query language of RDF, there are some
works on extending SPARQL with graph analytics [2–4], but they also lack the
formalization and cannot perform graph analytics in a compact way.
    As graph analytics should be performed over a graph, we build SPARQL
Analyze Query on SPARQL CONSTRUCT query, which is one of the four query
forms of SPARQL and also the only form whose result is a graph. We follow the
formalization of the CONSTRUCT query introduced in [5] and briefly recall
some notions. Given a SPARQL graph pattern P , a CONSTRUCT query Qc =
CONSTRUCT H WHERE P . H is a set of triple patterns from (I ∪ L ∪ V ) ×
(I ∪ V ) × (I ∪ L ∪ V ), where I, L, V are the sets of IRIs, literals and variables.
Besides, we do not consider blank nodes in our work.
    An RDF graph G = (N, E), where N and E denote the sets of nodes and
edges of G respectively. We follow the definitions of weighted RDF graph S =
*
    Copyright c 2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
2      X. Song et al.

(G, ω) where ω denotes the weight function, and shortest path π introduced
in [6]. Especially, we only consider the weight function on edges ω : E → R≥0 .
    Inspired by Neo4j, SPARQL Analyze Query supports two classes of graph
algorithms w.r.t. nodes and shortest paths, which are PageRank (PR), Between-
ness Centrality (BC), Closeness Centrality (CC), Shortest Path (SP), Single
Source Shortest Path (SSSP) and All Pair Shortest Path (ALSP).


2   Motivating Example
We provide an example of expressing PageRank over a social network to direct-
ly illustrate how SPARQL Analyze Query performs. For clarity, the following
example is based on our semantics introduced in Section 4 while the SPARQL
CONSTRUCT query introduced in [5] cannot output a weighted RDF graph.
    The following CONSTRUCT query Qc generates a social network which con-
sists of all people that ?p1 knows. The generated weighted RDF graph is shown
in Figure 2 where A, B, . . ., H are nicknames of each person respectively.
                            CONSTRUCT { ?p1 knows ?p2.}
                            WHERE {
                              ?m knows ?n.  ?m nickName ?p1.
                              ?n nickName ?p2.
                            }

                        Fig. 1. An example of CONSTRUCT query

   Figure 2 shows a weighted RDF graph where each knows edge is acquired
from WHERE clause along with its original weight. In this example, we assume
the weights denote the interpersonal influence.



            G                 8
                                      D            5
                                                               A

                        9         7
                                               4       3


                                          2
              7               F                    C       1


                    6
                                              10       2



                             13
            H                         E                        B   :
                                                                         *
                                                                       knows




                  Fig. 2. A social network generated by Qc in Figure 1


   Based on Figure 1, we provide a SPARQL Analyze Query with PageRank
algorithm in Figure 3 that reveals the person whose nickname is E has the
greatest interpersonal influence in the social network.
          A CONSTRUCT-based Query for Weighted RDF Graph Analytics                    3


     Qc WITH PR({?p1,?p2},{knows},10,0.85)                ?node  ?pr
     PRODUCE ?node ?pr                                      E 0.776535
     ORDER BY DESC (?pr)                                    G 0.604325
     LIMIT 3                                                H 0.579038

            Fig. 3. A SPARQL Analyze Query with PageRank algorithm


3    Syntax of SPARQL Analyze Query

Given a CONSTRUCT query Qc , a SPARQL Analyze query QA is defined as
follows:
          Qc WITH C PRODUCE v sq [ORDER BY] [LIMIT]
where
 – Qc is a SPARQL CONSTRUCT query;
 – C is an analyzing clause of one of the following six forms:
   Given a weighted RDF graph S, let NH and EH be sets of nodes and edges
   occurring in H respectively, where nodes and edges can be variables or con-
   stants.
     • PR : PR (NH , EH , n, dF ), where n ∈ N+ is the number of iterations in
        PageRank and dF ∈ (0, 1) is the damping factor;
     • BC : BC (NH , EH );
     • CC : CC (NH , EH );
     • SP : SP (NH , EH , u, v) where u, v ∈ I ∪ L;
     • SSSP : SSSP (NH , EH , u) where u ∈ I ∪ L;
     • ALSP : ALSP (NH , EH );
 – PRODUCE v sq designates the output of a SPARQL Analyze Query of
   variables, where v sq is defined as a sequence of two variables (v1 , v2 ), v1 , v2 ∈
   V , and the parentheses () denote an ordered sequence of variables;
 – [ORDER BY] and [LIMIT] are standard SPARQL fragments. They are op-
   tional to use for controlling the order and the number of solutions.


4    Semantics of SPARQL Analyze Query

To be compatible with weighted RDF graphs, we extend the semantics intro-
duced in [5] by means of a weight w ∈ R≥0 . Given a weighted RDF graph S, the
evaluation of a SPARQL graph pattern P over S is denoted by JP KS .
    We introduce some new terminology for SPARQL Analyze Query. Let VN ∈
V and VE ∈ V be the sets of disjoint node and edge variables respectively.
Given a node u, a shortest path π, and let A be the set of real numbers. We
have two mappings λ : VN → N , τ : VE → E and two functions f : u → a
and g : π → a, where f ∈ F, g ∈ G and a ∈ A. F = {P R, BC, CC} and
G = {SP, SSSP, ALSP } are two function sets where each function represents
the computation of a graph algorithm. For example, the betweenness centrality
of u is obtained by BC(u).
4       X. Song et al.

    For a SPARQL Analyze Query QA , we say that Q = Qc WITH C is the
analytical part of QA . Given a graph pattern P , a weighted RDF graph S, a
CONSTRUCT query Qc and an analyzing clause C. The semantics of an an-
alytical part Q of QA returns a set of pairs φ, denoted by JQKS , defined as
follows:
              
              {(u, f (u)) | λ ∈ JP KS , f (u) ∈ A}
                                                     C = PR, BC, CC
      JQKS =
              
                {(π, g(π)) | λ, τ ∈ JP KS , g(π) ∈ A} C = SP, SSSP, ALSP
              

    On the above formalization and given an ordered sequence of variables v sq.
The semantics of a SPARQL Analyze Query QA , denoted by JQA KS , is defined
as follows:
          JQA KS = JQ PRODUCE v sqKS = {φ |(v1 ,v2 ) | φ ∈ JQKS }
where φ |v1 ,v2 is the restriction of φ to (V, V ) correspondingly of its domain.

5   Conclusion
In this paper, we present SPARQL Analyze Query which integrates graph query-
ing with graph analytics. Our approach provides a logical foundation for inte-
grating querying with analytics, which makes it possible to study graph querying
and analytics in a unified way theoretically. In future work, we will study some
fundamental problems of SPARQL Analyze Query such as expressivity and com-
plexity as well as its generalization of more graph algorithms.

Acknowledgments
This work is supported by the National Key Research and Development Pro-
gram of China (2017YFC0908401) and the National Natural Science Foundation
of China (61972455, 61672377). Xiaowang Zhang is supported by the Peiyang
Young Scholars in Tianjin University (2019XRX-0032).

References
 1. Francis, N., Green, A., Guagliardo, P., Libkin, L., et al.: Cypher: An evolving
    query language for property graphs. In: Proc. of SIGMOD, pp. 1433–1445 (2018).
 2. Erétéo, G., Gandon, F., Buffa, M.: Semantic social network analysis. In: Proc. of
    WebSci, pp. 1–5 (2009).
 3. Techentin, R.W., Gilbert, B.K., Lugowski, A., Deweese, K., et al.: Implementing
    iterative algorithms with SPARQL. In: Proc. of ICDT, pp. 216–223 (2014).
 4. Mizell, D., Maschhoff K.J., Reinhardt, S.P.: Extending SPARQL with graph func-
    tions. In: Proc. of BigData, pp. 46–53 (2014).
 5. Kostylev, E.V., Reutter, J.L., Ugarte, M.: CONSTRUCT queries in SPARQL. In:
    Proc. of ICDT, pp. 212–229 (2015).
 6. Tartari, G., Hogan, A.: WiSP: Weighted shortest paths for RDF graphs. In: Proc.
    of ISWC, pp. 37–52 (2018).