=Paper=
{{Paper
|id=Vol-2456/paper7
|storemode=property
|title=A CONSTRUCT-Based Query for Weighted RDF Graph Analytics
|pdfUrl=https://ceur-ws.org/Vol-2456/paper7.pdf
|volume=Vol-2456
|authors=Xin Song,Shizhan Chen,Xiaowang Zhang,Zhiyong Feng
|dblpUrl=https://dblp.org/rec/conf/semweb/SongCZF19
}}
==A CONSTRUCT-Based Query for Weighted RDF Graph Analytics==
A CONSTRUCT-based Query for Weighted RDF Graph Analytics Xin Song, Shizhan Chen, Xiaowang Zhang? , and Zhiyong Feng College of Intelligence and Computing, Tianjin University, Tianjin, China Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China ? Corresponding Author: xiaowangzhang@tju.edu.cn Abstract. In this paper, based on SPARQL CONSTRUCT query, we present a new query form called SPARQL Analyze Query, which inte- grates graph analytics with graph querying. SPARQL Analyze Query ex- tends SPARQL by performing graph algorithms via WITH and PRODUCE operators, where WITH is used for invoking a graph algorithm and PRODUCE for designating results of variables. SPARQL Analyze Query supports two classes of graph algorithms over the weighted RDF graph w.r.t. nodes and shortest paths. Besides, we illustrate the feasibility of the SPARQL Analyze Query in expressing PageRank. 1 Introduction In graph databases, graph analytics and graph query are often discussed in two separate fields, but we often perform graph analytics and graph query at the same time on one task. Although there are some graph analytical engines that support graph analytics by means of graph query languages, they cannot combine graph analytics with graph querying well. One of them is Neo4j and its query language Cypher [1] which lacks the formalization of graph analytics. In terms of SPARQL, the standard graph query language of RDF, there are some works on extending SPARQL with graph analytics [2–4], but they also lack the formalization and cannot perform graph analytics in a compact way. As graph analytics should be performed over a graph, we build SPARQL Analyze Query on SPARQL CONSTRUCT query, which is one of the four query forms of SPARQL and also the only form whose result is a graph. We follow the formalization of the CONSTRUCT query introduced in [5] and briefly recall some notions. Given a SPARQL graph pattern P , a CONSTRUCT query Qc = CONSTRUCT H WHERE P . H is a set of triple patterns from (I ∪ L ∪ V ) × (I ∪ V ) × (I ∪ L ∪ V ), where I, L, V are the sets of IRIs, literals and variables. Besides, we do not consider blank nodes in our work. An RDF graph G = (N, E), where N and E denote the sets of nodes and edges of G respectively. We follow the definitions of weighted RDF graph S = * Copyright c 2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 X. Song et al. (G, ω) where ω denotes the weight function, and shortest path π introduced in [6]. Especially, we only consider the weight function on edges ω : E → R≥0 . Inspired by Neo4j, SPARQL Analyze Query supports two classes of graph algorithms w.r.t. nodes and shortest paths, which are PageRank (PR), Between- ness Centrality (BC), Closeness Centrality (CC), Shortest Path (SP), Single Source Shortest Path (SSSP) and All Pair Shortest Path (ALSP). 2 Motivating Example We provide an example of expressing PageRank over a social network to direct- ly illustrate how SPARQL Analyze Query performs. For clarity, the following example is based on our semantics introduced in Section 4 while the SPARQL CONSTRUCT query introduced in [5] cannot output a weighted RDF graph. The following CONSTRUCT query Qc generates a social network which con- sists of all people that ?p1 knows. The generated weighted RDF graph is shown in Figure 2 where A, B, . . ., H are nicknames of each person respectively. CONSTRUCT { ?p1 knows ?p2.} WHERE { ?m knows ?n. ?m nickName ?p1. ?n nickName ?p2. } Fig. 1. An example of CONSTRUCT query Figure 2 shows a weighted RDF graph where each knows edge is acquired from WHERE clause along with its original weight. In this example, we assume the weights denote the interpersonal influence. G 8 D 5 A 9 7 4 3 2 7 F C 1 6 10 2 13 H E B : * knows Fig. 2. A social network generated by Qc in Figure 1 Based on Figure 1, we provide a SPARQL Analyze Query with PageRank algorithm in Figure 3 that reveals the person whose nickname is E has the greatest interpersonal influence in the social network. A CONSTRUCT-based Query for Weighted RDF Graph Analytics 3 Qc WITH PR({?p1,?p2},{knows},10,0.85) ?node ?pr PRODUCE ?node ?pr E 0.776535 ORDER BY DESC (?pr) G 0.604325 LIMIT 3 H 0.579038 Fig. 3. A SPARQL Analyze Query with PageRank algorithm 3 Syntax of SPARQL Analyze Query Given a CONSTRUCT query Qc , a SPARQL Analyze query QA is defined as follows: Qc WITH C PRODUCE v sq [ORDER BY] [LIMIT] where – Qc is a SPARQL CONSTRUCT query; – C is an analyzing clause of one of the following six forms: Given a weighted RDF graph S, let NH and EH be sets of nodes and edges occurring in H respectively, where nodes and edges can be variables or con- stants. • PR : PR (NH , EH , n, dF ), where n ∈ N+ is the number of iterations in PageRank and dF ∈ (0, 1) is the damping factor; • BC : BC (NH , EH ); • CC : CC (NH , EH ); • SP : SP (NH , EH , u, v) where u, v ∈ I ∪ L; • SSSP : SSSP (NH , EH , u) where u ∈ I ∪ L; • ALSP : ALSP (NH , EH ); – PRODUCE v sq designates the output of a SPARQL Analyze Query of variables, where v sq is defined as a sequence of two variables (v1 , v2 ), v1 , v2 ∈ V , and the parentheses () denote an ordered sequence of variables; – [ORDER BY] and [LIMIT] are standard SPARQL fragments. They are op- tional to use for controlling the order and the number of solutions. 4 Semantics of SPARQL Analyze Query To be compatible with weighted RDF graphs, we extend the semantics intro- duced in [5] by means of a weight w ∈ R≥0 . Given a weighted RDF graph S, the evaluation of a SPARQL graph pattern P over S is denoted by JP KS . We introduce some new terminology for SPARQL Analyze Query. Let VN ∈ V and VE ∈ V be the sets of disjoint node and edge variables respectively. Given a node u, a shortest path π, and let A be the set of real numbers. We have two mappings λ : VN → N , τ : VE → E and two functions f : u → a and g : π → a, where f ∈ F, g ∈ G and a ∈ A. F = {P R, BC, CC} and G = {SP, SSSP, ALSP } are two function sets where each function represents the computation of a graph algorithm. For example, the betweenness centrality of u is obtained by BC(u). 4 X. Song et al. For a SPARQL Analyze Query QA , we say that Q = Qc WITH C is the analytical part of QA . Given a graph pattern P , a weighted RDF graph S, a CONSTRUCT query Qc and an analyzing clause C. The semantics of an an- alytical part Q of QA returns a set of pairs φ, denoted by JQKS , defined as follows: {(u, f (u)) | λ ∈ JP KS , f (u) ∈ A} C = PR, BC, CC JQKS = {(π, g(π)) | λ, τ ∈ JP KS , g(π) ∈ A} C = SP, SSSP, ALSP On the above formalization and given an ordered sequence of variables v sq. The semantics of a SPARQL Analyze Query QA , denoted by JQA KS , is defined as follows: JQA KS = JQ PRODUCE v sqKS = {φ |(v1 ,v2 ) | φ ∈ JQKS } where φ |v1 ,v2 is the restriction of φ to (V, V ) correspondingly of its domain. 5 Conclusion In this paper, we present SPARQL Analyze Query which integrates graph query- ing with graph analytics. Our approach provides a logical foundation for inte- grating querying with analytics, which makes it possible to study graph querying and analytics in a unified way theoretically. In future work, we will study some fundamental problems of SPARQL Analyze Query such as expressivity and com- plexity as well as its generalization of more graph algorithms. Acknowledgments This work is supported by the National Key Research and Development Pro- gram of China (2017YFC0908401) and the National Natural Science Foundation of China (61972455, 61672377). Xiaowang Zhang is supported by the Peiyang Young Scholars in Tianjin University (2019XRX-0032). References 1. Francis, N., Green, A., Guagliardo, P., Libkin, L., et al.: Cypher: An evolving query language for property graphs. In: Proc. of SIGMOD, pp. 1433–1445 (2018). 2. Erétéo, G., Gandon, F., Buffa, M.: Semantic social network analysis. In: Proc. of WebSci, pp. 1–5 (2009). 3. Techentin, R.W., Gilbert, B.K., Lugowski, A., Deweese, K., et al.: Implementing iterative algorithms with SPARQL. In: Proc. of ICDT, pp. 216–223 (2014). 4. Mizell, D., Maschhoff K.J., Reinhardt, S.P.: Extending SPARQL with graph func- tions. In: Proc. of BigData, pp. 46–53 (2014). 5. Kostylev, E.V., Reutter, J.L., Ugarte, M.: CONSTRUCT queries in SPARQL. In: Proc. of ICDT, pp. 212–229 (2015). 6. Tartari, G., Hogan, A.: WiSP: Weighted shortest paths for RDF graphs. In: Proc. of ISWC, pp. 37–52 (2018).