A General-Purpose Visual Query Language for Knowledge Graphs with Bidirectional Transformations Qiang Fu1 , Xin Wang1 , and Yuan-Fang Li2 1 College of Intelligence and Computing, Tianjin University, Tianjin, China {tomqcust,wangx}@tju.edu.cn 2 Monash University, Melbourne, Australia yuanfang.li@monash.edu. Abstract. In this paper, we present a general-purpose interactive visual query language, GPVQL, to improve the efficiency of end-users’ under- standing and querying of knowledge graphs. Furthermore, GPVQL real- izes the novel capability of flexible bidirectional transformations between query patterns and graph results, therefore significantly assists end-users in formulating queries over large and unfamiliar knowledge graphs in an incremental way. We present the syntax and semantics of GPVQL, dis- cuss our design rationale behind this interactive visual query language, and evaluate the effectiveness of a visual query system based on GPVQL against a number of textual and visual query environments over a large knowledge graph, DBpedia. Our evaluation demonstrates the GPVQL’s superiority in effectiveness and accurateness. Keywords: Knowledge graphs · Visual query language · Interactive · Bidirectional transformation. 1 Introduction Artificial intelligence has become a powerful tool to meet practical requirements in various domains. Knowledge graphs have been identified as a critical com- ponent in diverse AI-based applications. Thus, designing query languages to support the effective and efficient exploration and query answering over knowl- edge graphs has become a key research problem under insentive investigation. However, a number of key challenges still remain on the generalizability and ease of use of these query languages. An illustrating example of the process from a question to the corresponding query results is shown in Fig. 1. As can be seen from the textual query (in SPARQL) at the top middle in Fig. 1, it may be difficult for an end-users to quickly learn and use such a query language, or to be familiar with a knowledge graph. Visual query languages can help make it easier for users to construct query Copyright c 2020 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). Tabular result Textual language SELECT * WHERE { Bob Black influencedBy ?x. ?y influencedBy ?x.} Unidirectional Question query Find philosophers who Edit were influenced by people that influenced Bob Black. Draw ?z ?z Bob Black ?m ?x influencedBy birthYear ?x ?z ?y influencedBy influenced Bidirectional ?n ?x ?y influencedBy deathYear transformation ?m ?n Visual language Graphical result Fig. 1. The process from question to result. patterns, as shown in the bottom middle of Fig. 1. However, end-users may not understand the correspondence between a query pattern and its results (such as those nodes labeled by the same variable name at the bottom middle/right of Fig. 1) when they are exploring a knowledge graph. Such correspondences allow a user to easily modify and expand the current query to build more complex ones and are supported by the bidirectional transformation functionality provided by our GPVQL visual language. In this paper, we introduce the visual syntax and semantics of GPVQL, discuss our design rationale, and demonstrate its accurateness and effectiveness in a user study. 2 Our Approach In GPVQL, (1) single circles with dotted and solid lines indicate variables and constants, respectively; (2) two single circles connected by a directed edge denote a basic triple pattern; and (3) double circles and rectangles represent operators and parameters in a query pattern, respectively. We illustrate the syntax and semantics of GPVQL with five examples in Table 1 and the corresponding queries in three textual query languages: SPARQL, Cypher and Gremlin. In GPVQL, end-users do not need to learn a specific textual query language to write queries. What end-users need to learn are just query examples that can then be used in an interactive query-by-example (QBE [2]) way. Based on our proposed GPVQL language, we have developed a visual inter- face that addresses these challenges to achieve our goal of enabling end-users to query and explore knowledge graphs. The interface 3 of GPVQL is composed of six main components that are displayed in four panes in Fig. 2. Fig. 2 shows three components: an interactive visual query editor (left pane), keyword and type search panel (right top pane), and query code display panel (right bottom pane). In the top left are buttons for additional functionality: adding new nodes, showing user manual, user study documents, using force-directed layout, etc. In GPVQL, the construction of a query pattern is mainly done in a drag-and-drop manner. The operations (i.e., Expand, Collapse, Filter, Lock, Optional, and U- Table 1. The examples of visual syntax and semantics in GPVQL. ID Query pattern of GPVQL SPARQL Cypher Gremlin MATCH ?city Texas SELECT ?city WHERE { (x:City)-[:locatedIn]->(y:State) g.V(‘Texas’) P1 located in ?city locatedIn Texas.} WHERE y.name = "Texas" .in(‘locatedIn’) RETURN x ?city MATCH metro population?metro SELECT ?city WHERE { g.V() P2 (x:City)-[:metro]->(y:Population) ?city metro ?metro.} .in(‘metro’) RETURN x, y Texas MATCH P1 ?city located in SELECT ?city WHERE { (x:City)-[:locatedIn]->(y:State) g.V(‘Texas’) AND ?metro ?city locatedIn Texas. WHERE y.name = "Texas" AND .in(‘locatedIn’) P2 metro po ?city metro ?metro.} (x:City)-[:metro]->(z:Population) .and(out(‘metro’)) pulation RETURN x AND Texas MATCH P1 ?city located in SELECT ?city WHERE { (x1:CITY)-[:locatedIn]->(y:State) g.V(‘Texas’) ?metro ?city locatedIn Texas.} WHERE y.name = "Texas" UNION .in(‘locatedIn’) metro po UNION RETURN x1 UNION MATCH P2 pulation .or(out(‘metro’)) {?city metro ?metro.} (x2:CITY)-[:metro]->(z:Population) UNION RETURN x2 Texas MATCH P1 ?city located in SELECT * WHERE { (x:CITY)-[:locatedIn]->(y:State) g.V(‘Texas’) ?metro ?city locatedIn Texas. OPTIONAL MATCH OPT .in(‘locatedIn’) metro po OPTIONAL{ (x:CITY)-[:metroPopul]->(m:Population) P2 pula tion .or(out(‘metro’)) ?city metro ?metro.}} WHERE y.name = "Texas" OPT RETURN x, m nion) are in the right-click context menu. As shown in Fig. 2, GPVQL not only supports keyword queries, but also type-based queries as the starting point. The process starts with a blank visual query editor. A user can choose keyword or type as the starting point for queries. For example, as shown in Fig. 2, after the user adds a new node 1 , and input “Bob” as a keyword 2 , the related entities will be automatically displayed. Once “Bob Black” is chosen, the thumbnail 3 will be displayed automatically. Further, the user can find people who have in- Fig. 2. Example visual query finding people who influenced both Bob Black and people influenced by Bob Black. fluenced both Bob Black and people influenced by Bob Black (the corresponding query pattern is marked with red arrows). Users can select the Expand opera- tion in the context menu to expand a result. The expanded results are pointed to with blue arrows. When users are interested in some query result, deep ex- ploration to query additional information is supported. For example, users can find the birth year and death year of Peter Kropotkin, or people who influenced Marshall Sahlins (pointed to with green arrows). 3 User Study and Evaluation We conducted a user study to assess how accurate and effective our new visual query language GPVQL 3 is in comparison to the current semantic query lan- guage SPARQL, a visual query language QueryVowl [1], and RDF Explorer [3]. We used the same SPARQL endpoint 4 of the DBpedia knowledge graph to en- sure the fairness of the study. We created five tasks, shown in Table 2, based on an informal survey of interesting patterns that people formed when exploring prior graph query research works. We recruited 20 participants in total and di- vided them into four groups evenly. Each group uses a different tool, e.g., Group A + GPVQL, Group B + SPARQL, Group C + QueryVOWL, and Group D + RDF Explorer. Table 2. Tasks of the user study. Task 1 Task 2 Task 3 Find people who influenced both Find philosophers influenced Find people influenced Bob Description Bob Black and people influenced by Bob Black. Black and Karl Marx. by Bob Black. Bob Black Bob Black ?philosopher Bob Black influencedBy ?x influencedBy ?y Query influencedBy pattern Karl Marx influenced influencedBy ?x influencedBy Task 4 Task 5 Find year of birth and death of people who Find people whose birth date is after 1750 Description influenced both Bob Black and people influenced and influenced philosophers that influenced both by Bob Black. Bob Black and people influenced by Bob Black. Bob Black ?m Bob Black influencedBy influencedBy ?y ?z ?y birthYear influencedBy Query pattern influenced influenced ?n influencedBy deathYear ?x influencedBy FILTER: ?x ?birthDate>1750 We ranked the difficulty of each task based on the number of nodes, edges, and constraints needed. As can be seen in Table 2, the tasks are related to each other, and that later task can be built incrementally from previous tasks. This design is intended to simulate the actual workflow in a real-world setting, where an end-user typically writes related and incrementally more complex queries, but not unrelated random ones. From Fig. 3, we can obtain the following conclusions: (1) Among the five tasks, the query completion times of GPVQL are the shortest; (2) As the tasks become more difficult, the query completion times (resp. accuracy) of SPARQL are gradually increasing (resp. reducing). However, GPVQL outperforms all the other systems: SPARQL, QueryVOWL, and RDF Explorer, and the accuracy of GPVQL remains above 60%. As shown in Fig. 3(b), GPVQL moderately out- performs QueryVOWL on accuracy for all tasks, and significantly outperforms QueryVOWL for the more complex ones, i.e., Task 4 and Task 5. Compared 3 http://gpvql.gq/ 4 http://dbpedia.org/sparql/ 160 GPVQL 100 GPVQL SPARQL SPARQL 140 QueryVOWL QueryVOWL RDF Explorer RDF Explorer 80 120 Mean Accuracy(%) 100 60 Time(in s) 80 60 40 40 20 20 0 0 Task1 Task2 Task3 Task4 Task5 Task1 Task2 Task3 Task4 Task5 (a) Mean completion time for each tool (b) Mean accuracy for each tool per per task. task. Fig. 3. Experiment results on completion time and accuracy. with RDF Explorer, GPVQL reduces the completion time by approx. 15%, and increases the accuracy by about 10%. In summary, our evaluation demonstrates the superiority of GPVQL in both aspects of completion time and accuracy. We believe that the main reasons of GPVQL’s superiority are bidirectional transfor- mation and the deep exploration it facilitates. For changes of the query tasks, traditional query languages requires users to construct new queries from scratch, which reduces the speed, usability, and user-friendliness. With the bidirectional transformation in GPVQL, users can use query results of the previous query pattern as the input of the next query pattern, which reduces the difficulty and time used of creating queries. 4 Conclusion In this paper, we propose a general-purpose visual query language. The main advantage of GPVQL is the flexible bidirectional transformation between query patterns and graph results, which is a useful method for end-users to gain insights of large-scale knowledge graphs, and eliminates the boundary between query patterns and graph results. We experimentally validated the accurateness and effectiveness of GPVQL and its associated interface, showing its superiority over other visual query languages. References 1. Haag, F., Lohmann, S., Siek, S., Ertl, T.: Queryvowl: A visual query notation for linked data. In: International Semantic Web Conference. pp. 387–402. Springer (2015) 2. Jayaram, N., Khan, A., Li, C., Yan, X., Elmasri, R.: Querying knowledge graphs by example entity tuples. IEEE Transactions on Knowledge and Data Engineering 27(10), 2797–2811 (2015) 3. Vargas, H., Buil-Aranda, C., Hogan, A., López, C.: Rdf explorer: A visual sparql query builder. In: International Semantic Web Conference. pp. 647–663. Springer (2019)