Detecting and Diagnosing Syntactic and Semantic Errors in SPARQL Queries

Jesús M. Almendros-Jiménez
Dept. of Informatics, University of Almería, Spain
jalmen@ual.es

Antonio Becerra-Terón
Dept. of Informatics, University of Almería, Spain
abecerra@ual.es

Alfredo Cuzzocrea
DIA Department, University of Trieste and ICAR-CNR, Italy
alfredo.cuzzocrea@dia.units.it

2017, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2017 Joint Conference (March 21, 2017, Venice, Italy) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

ABSTRACT
In this paper we present a tool to syntactically and semantically validate SPARQL queries. With this aim, we extract triple patterns and filter conditions from SPARQL queries and we use the OWL API and an OWL ontology reasoner in order to detect wrong expressions. Given an ontology and a query, the tool reports different kinds of programming errors: wrong use of vocabulary, wrong use of resources and literals, wrong filter conditions and wrong use of variables in triple patterns and filter conditions. When the OWL ontology reasoner is used, the tool reports a diagnosis.

Keywords
SPARQL; RDF; OWL; Debugging

1. INTRODUCTION
The Semantic Web has adopted SPARQL [6] as its query language. While SPARQL queries are usually simple, and the SPARQL (SQL-like) syntax can be easily learned, this does not mean that SPARQL programmers make no mistakes. Mistakes can happen for several reasons.

Firstly, from the database perspective, SPARQL data have (in most cases) a sophisticated schema. The schema defines an ontology of concepts and relationships modeled in RDF and OWL. RDF data have a simpler schema, but the RDF model is a graph, and the programmer needs to work with paths and nodes from this graph. However, programmers are more used to handling table-like data structures, typical of relational databases and SQL data. OWL data (RDF-like enriched data) have a very rich data schema, and OWL data handle concepts and properties, representing relationships. Concept definitions in OWL can be sophisticated, involving complex subconcept and equivalence relationships, and property definitions can be equipped with a rich semantics, enabling inverse, (a)symmetric, (ir)reflexive, (non) functional and transitive relationships. Secondly, the SPARQL programmer may not know the (complete) structure of the data or, even knowing this structure, can be frustrated when the answer is empty or incomplete, mainly due to filter conditions that make the query unsatisfiable or too restrictive. Query unsatisfiability can also be a consequence of the rich mechanisms for defining concepts and properties, with which the programmer can write inconsistent queries. A query is called inconsistent when the matching of variables is incompatible with ontology consistency. In general, the missing-answer problem is a consequence of query unsatisfiability and, in particular, of query inconsistency. Finally, analyzing the main SPARQL implementations, we have found that debugging mechanisms are missing. SPARQL interpreters are usually equipped with syntactic analysis limited to grammar checking.

SPARQL programming errors can be due to several reasons. Firstly, queries can include wrongly typed expressions, which mainly occur in triple patterns. The most typical case is when triple patterns are incorrectly instantiated:

    ?x :age "Alice"

Here age has range xsd:integer, and thus "Alice" is not a suitable value. When triple patterns are combined, more typing errors can be found. For instance:

    ?x :father ?y . ?x :born ?y

in which, assuming that the range of father is person and the range of born is country, ?y cannot be bound, and thus the answer will be empty. This also happens when variables are bound to literals and resources at the same time:

    ?x rdf:type :human . ?y :age ?x

However, there are cases not forbidden (in general) by ontologies. The same resource can be both an individual and a concept. Thus, the following pattern is well-typed:

    ?x rdf:type ?x

Not only variables can be wrongly typed; concepts, properties and individuals can also occur in wrong positions. For instance, with the RDF and OWL vocabularies we can express queries like the following:

    ?x rdf:type rdf:type . ?x owl:SymmetricProperty ?y

Unfortunately, existing SPARQL implementations do not check types for triple patterns. In fact, they do not check the vocabulary of concepts, properties and individuals either, and wrong spelling (for instance "fahter" instead of "father") leads to empty answers without any warning:

    ?x :fahter ?y

Some cases correspond to inconsistent queries, in which the matching of variables is incompatible with ontology consistency. Inconsistent queries are well-typed, but variable binding is not possible according to the ontology definition. For instance, assuming the father relation is irreflexive, a wrong triple pattern is:

    ?x :father ?x

Even though answering this query against a consistent ontology is not possible, and thus the answer will be empty, existing SPARQL implementations are not able to detect it. In some cases, the inconsistency is not trivial and ontology reasoning has to be used. For instance, let us consider the following triples:

    ?x :father ?y . ?y rdf:type ?z . ?z rdfs:subClassOf :motorcycle

Here, in order to detect the inconsistency of the query, we have to reason that father has range person and person is not a subclass of motorcycle. This is also the case of the following triples:

    ?x rdf:type :mammal . ?x rdf:type :motorcycle

in which mammal and motorcycle are disjoint and, therefore, the user should be warned. Some cases depend on cardinality, for instance:

    ?x :father ?y . ?x :father ?z . ?y :age 30 . ?z :age 45

Here the father and age properties are functional, and thus none of the variables can be bound. More complex cases involve filter conditions, for instance:

    SELECT ?x
    WHERE { ?x :father ?y . ?x :father ?z
            FILTER (?y != ?z) }

in which the Boolean expression ?y != ?z contradicts the functional property of father.

In this paper, we present a tool to validate SPARQL queries. With this aim, we extract triple patterns and filter conditions from SPARQL queries and we use the OWL API and an OWL ontology reasoner in order to detect wrong expressions. Given an ontology and a query, the tool reports different kinds of programming errors: wrong use of vocabulary, wrong use of resources and literals, wrong filter conditions and wrong use of variables in triple patterns and filter conditions. When the OWL ontology reasoner is used, the tool reports a diagnosis. The tool has been implemented in XQuery, and it uses the SPARQL to SPIN (http://spinrdf.org/) transformation to get the SPARQL code in XML format. Once the code is transformed, an XQuery function traverses the SPARQL code in order to extract triple patterns and filters. Next, the validation process is carried out by calling the OWL API as well as the OWL reasoner HermiT from XQuery, which reports a diagnosis of inconsistent queries. The current implementation covers triple patterns and filters in SELECT, ASK, CONSTRUCT and DESCRIBE queries, as well as OPTIONAL triple patterns.

While programming errors have been studied in SQL [1, 4, 2], as far as we know, the same topic has not been studied for SPARQL yet (except in a recent work [8] in which the authors study satisfiability of FILTER conditions). We have tested the best state-of-the-art SPARQL implementations (see Table 1) and we have found that they are not equipped with type checking and debugging facilities. Only benchmarking datasets [7, 3] are publicly available to analyze the performance of SPARQL implementations. There are also works about analysis of data [5], but less attention has been paid to SPARQL code. Thus, our work opens up a promising line of research, and it can be seen as a first approximation to the solution.

Table 1: SPARQL implementations

    Name                                               URL
    ARQ SPARQL Jena                                    https://jena.apache.org/documentation/query/
    Protégé SPARQL Tab                                 http://protegewiki.stanford.edu/wiki/SPARQL Query
    Twinkle: SPARQL Tools                              http://www.ldodds.com/projects/twinkle/
    Virtuoso SPARQL Query Editor                       http://dbpedia.org/sparql
    SPARQLer - General purpose processor               http://www.sparql.org/sparql.html
    Redland Rasqal RDF Query Demonstration             http://librdf.org/query
    OpenUpLabs                                         http://openuplabs.tso.co.uk/sparql
    wordnet.rkbexplorer.com                            http://wordnet.rkbexplorer.com/sparql/
    DBPedia SNORQL                                     http://dbpedia.org/snorql/
    SPARQL editor | The British National Bibliography  http://bnb.data.bl.uk/flint-sparql
    YASGUI SPARQL Editor                               http://cliopatria.swi-prolog.org/yasgui/index.html
    SPARQL Carsten Editor                              http://sparql.carsten.io/

2. VALIDATION OF SPARQL QUERIES
For the validation process, we distinguish between syntactic and semantic validation, which use the OWL API and the OWL Reasoner, respectively. Type checking is carried out by both the OWL API and the OWL Reasoner, and thus it can be considered as both syntactic and semantic checking. Inconsistency of queries is only detected by the OWL Reasoner, and thus it is considered as semantic checking.

2.1 Syntactic Validation of SPARQL Queries
The syntactic validation uses the OWL API in order to carry out the following kinds of validation: (a) Wrong use of vocabulary, (b) Wrong use of resources and literals and (c) Wrong filter conditions. We analyze triple patterns s p o, where s is a subject, p is a property, and o is an object. In addition, we analyze filter conditions l op r, where op can be one of <, >, >=, <=, =, !=, and l and r are the left and right hand sides of the operation, respectively. We assume that individuals, properties and concepts can share the same name, and also that object and data properties can have the same name. (In general, ontology profiles do not consider object and data properties disjoint, while this happens in, for instance, OWL DL.) In order to carry out the syntactic validation, we propose the following rules, expressing the cases in which a syntactic error (i.e., (a), (b) and (c)) is found. We also assume an input ontology with namespace ns. Finally, unless otherwise specified, s, p and o (and l, r) can be variables or ontology items (i.e., individuals, properties and concepts).

Syn-1(a) s p ns:k where k does not belong to the ns vocabulary. ns can also be rdf/rdfs/owl.
Syn-2(a) s ns:k o where k is not a property of ns. ns can also be rdf/rdfs/owl.
Syn-3(a) ns:k p o where k does not belong to the ns vocabulary. ns can also be rdf/rdfs/owl.
Syn-4(b) s ns:k o where o is a resource, and k is a data property and not an object property of ns.
Syn-5(b) s ns:k o where o is a literal, and k is an object property and not a data property of ns.
Syn-6(b) s p o where p is a literal.
Syn-7(b) s p o where s is a literal.
Syn-8(c) l op r, where l and r are literals of different types.

The syntactic rules can be checked by using the ontology signature: the full vocabulary (rules Syn-1, Syn-3), the property vocabulary (rules Syn-2, Syn-4, Syn-5), as well as the syntactic form of the triple pattern components (rules Syn-6, Syn-7 and Syn-8). Let us remark that Syn-6 and Syn-7 are already wrong according to the SPARQL grammar (and thus they are usually checked by SPARQL implementations), but we have included them for completeness.

2.2 Semantic Validation of SPARQL Queries
The semantic validation uses an OWL ontology reasoner in order to detect wrongly typed and inconsistent queries. With this aim, the main idea is to consider the variables occurring in triples and filter conditions as individuals of the ontology to be queried. In other words, each variable ?x becomes ns:x, where ns is the namespace of the ontology. Additionally, triple patterns in which ?x occurs become property and concept assertions about ns:x, and filter conditions in which ?x occurs become owl:sameAs and owl:differentFrom assertions. A new ontology is built from the original one by adding these property and concept assertions, extracted from triple patterns and filter conditions. Assuming the original ontology is consistent, the ontology reasoner is used to check the consistency of this new ontology. If the new ontology is inconsistent, the SPARQL query is wrongly typed or inconsistent.

Next, we will give rules for constructing these property and concept assertions from triple patterns and filter conditions. In order to use the ontology reasoner, some additional modifications are made to the original ontology. (1) Firstly, concepts whose intersection is known to be empty have to be explicitly defined as disjoint. (2) Secondly, two additional concepts are included in the ontology: DR and DT. They represent ontology resources and literals (i.e., datatypes), respectively. They are declared as disjoint. DT also has to be disjoint with the rest of the concepts. (3) Thirdly, additional concepts DTinteger, DTstring, etc., are included in the ontology, one for each datatype. They are declared as disjoint (except for compatible datatypes). They are defined as subclasses of DT.

Concept disjointness is extensively used by type checking, and thus it is crucial to declare it. In practice, it is enough to declare primitive sibling concepts as disjoint. Datatypes have to be converted into concepts (disjoint with the rest of the concepts) in order to use the OWL Reasoner and detect wrongly typed expressions. Let us also remark that (2) and (3) do not depend on the ontology to be queried.

The semantic validation covers the following cases: (a) Wrong use of variables in triple patterns and (b) Wrong use of variables in filter conditions. We have the following (non overlapping) rules that express which concept and property assertions are added to the original ontology. We assume that triples are syntactically correct according to the previous syntactic rules, and variables ?x are always added as ns:x to the ontology. Unless otherwise specified, s, p and o (and l, r) can be variables or ontology items (i.e., individuals, properties and concepts).

Sem-1(a) s ns:k o, where ns is not rdf/rdfs/owl and k is an object property of ns (thus o is not a literal). In this case s ns:k o is added to the ontology.
Sem-2(a) s ns:k ?w, where ns is not rdf/rdfs/owl, and k is a data property and not an object property of ns. In this case ?w rdf:type DTt is added to the ontology, for each range t of ns:k. Additionally, s rdf:type D is added for each domain D of ns:k.
Sem-3(a) s ns:k l, where ns is not rdf/rdfs/owl, and l is a literal (thus ns:k is a data property). In this case s ns:k l is added to the ontology.
Sem-4(a) s ns:k o, where ns is not rdf/rdfs/owl, and k is both an object and a data property of ns (o is not a literal, otherwise this is the previous case). In this case s rdf:type D is added for each domain D of ns:k.
Sem-5(a) s ns:k l, where ns is rdf/rdfs/owl and l is a literal. In this case s rdf:type DR is added to the ontology.
Sem-6(a) s ns:k ?w, where ns is rdf/rdfs/owl and k is a data property of ns. In this case s rdf:type DR and ?w rdf:type DT are added to the ontology.
Sem-7(a) s ns:k o, where ns is rdf/rdfs/owl, and k is an object property of ns. In this case s rdf:type DR and o rdf:type DR are added to the ontology.
Sem-8(a) s ?v o. In this case ?v rdf:type DR is added to the ontology.
Sem-9(b) ?l op v where v is a literal of type t. In this case ?l rdf:type DTt is added to the ontology.
Sem-10(b) v op ?r where v is a literal of type t. In this case ?r rdf:type DTt is added to the ontology.
Sem-11(b) ?l = ?r. In this case ?l owl:sameAs ?r is added to the ontology.
Sem-12(b) ?l != ?r. In this case ?l owl:differentFrom ?r is added to the ontology.
Sem-13(b) ?l op ?r, where op is one of >, >=, <, <=. In this case ?l rdf:type DT and ?r rdf:type DT are added to the ontology.

Sem-1 covers the most used triple pattern. In this case o is either a variable ?w or an ontology item ns:i. In the first case, ?w is added as a resource and, in the second one, ns:i is (again) added as a resource. In the case of Sem-2, given that ?w will be bound to a literal, the type of ?w as well as the type of s are added: each range of k is added as a type of ?w, and each domain of k as a type of s. Sem-3 is similar to Sem-1, adding the literal as the value of the property k. When k is both a data and an object property (i.e., case Sem-4), the type of the object cannot be fixed, and thus only the domains of k are added as types of s. The special cases of rdf/rdfs/owl are handled by rules Sem-5, Sem-6 and Sem-7. In these cases terminological axioms are not added to the ontology; otherwise, the original semantics of the ontology would be lost. Here membership of DR and DT is used instead. In the case of Sem-8, o can be a resource or a literal, but neither gives additional information about ?v, since even when o is a literal, ?v can be an object or a data property, because object and data properties are not necessarily disjoint. In the case of filter conditions in which one of the sides is a literal (Sem-9 and Sem-10), the type of this literal is added to the ontology for the other side. In the case of equalities (Sem-11), owl:sameAs is used to identify elements in the ontology. In the case of inequalities (Sem-12), owl:differentFrom is used. Finally, in the case of numeric operators (Sem-13), DT is added as the type of both sides.

3. EXAMPLES
The following examples are cases of syntactically wrong triple patterns and filter conditions for our example ontology.

    (1) ?x :agex ?y
    (2) ?x :age :jesus
    (3) ?x :father 10
    (4) FILTER (10 = "Alice")

The tool reports the following answers when the queries are validated:

    (1) The property 'agex' does not exists
    (2) The property 'age' is not an object property
    (3) The property 'father' is not a data property
    (4) Wrong types in filter: string and integer

Case (1) illustrates rule Syn-2 (agex occurs in property position and is not a property of the ontology), while (2) illustrates Syn-4. Case (3) illustrates Syn-5, and case (4) illustrates rule Syn-8. Cases (1), (2) and (3) use the OWL API, while (4) can be checked from the SPIN representation.

With regard to semantic errors, we illustrate rule Sem-1 with the following example of a SPARQL query:

    SELECT ?x
    WHERE { ?x :father ?x }

Here father is an object property and ?x is a variable. Therefore the triple ?x :father ?x is added to the ontology. Since father is irreflexive, this causes an inconsistency, and the tool (via the OWL reasoner) answers as follows:

    Inconsistent query:
    IrreflexiveObjectProperty(#father)

Rule Sem-1 is also applied to the following query:

    SELECT ?x
    WHERE { ?x :father ?y . ?x :born ?y }

in which ?y should be a person and a country at the same time. The tool answers (via the OWL reasoner) as follows:

    Inconsistent query:
    DisjointClasses(#Person #DT #Country)
    ObjectPropertyDomain(#father #Person)
    ObjectPropertyRange(#born #Country)

Let us now consider the following example:

    SELECT ?x
    WHERE { ?x :age ?y . FILTER (?y = 'Alice') }

which illustrates rules Sem-2 and Sem-9. Here ?y is a variable in the range of a data property (which is not an object property), and each range of the data property is used for typing ?y, by rule Sem-2. In this case, ?y rdf:type :DTinteger is added to the ontology (and ?x rdf:type :Person is also added). Now, by rule Sem-9, from ?y = 'Alice', ?y rdf:type :DTstring is also added to the ontology. Since DTinteger and DTstring are disjoint, the tool answers as follows:

    Inconsistent query:
    DisjointClasses(#DTinteger #DTstring)

Let us now consider the following ASK query in order to illustrate rule Sem-3:

    ASK { :james :age 20 }

Here, we would like to know whether james's age is 20. Let us suppose that james's age has been set to 45 in the ontology. Since age is a functional property, the tool will answer as follows, because :james :age 20 has been added to the ontology by rule Sem-3:

    Inconsistent query:
    FunctionalDataProperty(#age)

Let us now consider the following example, in which rules Sem-7 and Sem-9 are applied. Since ?y is used in the place of a resource, it cannot be used in a filter condition with an integer.

    SELECT ?x
    WHERE { ?x rdf:type ?y . FILTER (?y = 10) }

The tool works as follows. From the condition ?y = 10, rule Sem-9 adds ?y rdf:type :DTinteger to the ontology; and from ?x rdf:type ?y, rule Sem-7 adds ?y rdf:type :DR to the ontology. Now, the answer of the tool is as follows:

    Inconsistent query:
    DisjointClasses(#DR #DT)
    SubClassOf(#DTinteger #DT)

With regard to wrong use of variables in property positions, rules Sem-2 and Sem-8 allow us to detect the following inconsistent query:

    SELECT ?x
    WHERE { ?x ?z ?y . ?t :age ?z }

since ?z rdf:type :DTinteger and ?z rdf:type :DR are added to the ontology. Here, the answer of the tool is as follows:

    Inconsistent query:
    DisjointClasses(#DR #DT)
    SubClassOf(#DTinteger #DT)

With regard to rule Sem-11, we can consider the following example of a SPARQL query:

    SELECT ?x
    WHERE { ?y :father ?z . FILTER (?y = ?z) . FILTER (?y != ?z) }

Here, the filter conditions are incompatible. In this case, ?y owl:sameAs ?z and ?y owl:differentFrom ?z are added to the ontology by rules Sem-11 and Sem-12, respectively. The answer of the tool is as follows:

    Inconsistent query:
    SameIndividual(#y #z)
    DifferentIndividuals(#y #z)

The following example also illustrates rule Sem-12, in which ?y owl:differentFrom ?z is added for the following query:

    SELECT ?x
    WHERE { ?x :father ?y . ?x :father ?z . FILTER (?y != ?z) }

The answer of the tool is as follows:

    Inconsistent query:
    DifferentIndividuals(#y #z)
    FunctionalObjectProperty(#father)

Finally, rule Sem-13 is illustrated by the following query, in which ?y and ?z are compared by > and thus both have type DT, which is incompatible with Person (i.e., the range of father):

    SELECT ?x
    WHERE { ?x :age ?y . ?u :father ?z .
            FILTER (?y > ?z) }

    Inconsistent query:
    DisjointClasses(#Person #DT)
    ObjectPropertyRange(#father #Person)

There are some cases in which inconsistency cannot be detected. For instance, let us suppose the following query:

    SELECT ?x
    WHERE { ?x :age ?y . ?x :age ?z . FILTER (?y != ?z) }

Here, even knowing that age is a functional property, we cannot detect from the inequality ?y != ?z that the query cannot be answered. This is because owl:differentFrom cannot be used for literals in the OWL reasoner.

4. CONCLUSIONS AND FUTURE WORK
We have designed a tool for detecting and diagnosing wrong SPARQL queries. A set of rules has been defined in order to use the OWL API and an OWL reasoner to check wrongly typed and inconsistent queries, reporting the details of the diagnosis. The first question that arises is the completeness of the method. The defined rules cover a wide range of cases, but a deeper study of completeness is required. In particular, the last example of the previous section shows a limitation of the approach. This limitation is imposed by the ontology reasoner. There are also other limitations in filter conditions. For instance, let us suppose that the adult class is defined as a subclass of person (i.e., older than 18), and a filter condition of a SPARQL query requests adult users whose age is smaller than 10. In this case, the query is inconsistent with the ontology, but this still cannot be detected. We have only considered the case of equalities and inequalities in FILTER conditions, using sameAs/differentFrom to check them, but more cases of filter conditions (see https://www.w3.org/TR/rdf-sparql-query/) can be considered. We would like to implement a Java version of our tool, integrated with the Jena ARQ SPARQL engine. We also plan to implement a Web tool for validating SPARQL code in which the ontology URI or SPARQL endpoint can be specified.

Acknowledgements
This work was supported by the EU (FEDER) and the Spanish MINECO Ministry (Ministerio de Economía y Competitividad) under grant CAVI-TEXTUAL TIN2013-44742-C4-4-R.

5. REFERENCES
[1] Stefan Brass and Christian Goldberg. Semantic errors in SQL queries: A quite complete list. Journal of Systems and Software, 79(5):630–644, 2006.
[2] Benjamin Dietrich and Torsten Grust. A SQL Debugger Built from Spare Parts: Turning a SQL:1999 Database System into Its Own Debugger. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 865–870. ACM, 2015.
[3] Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2):158–182, 2005.
[4] Muhammad Akhter Javid and Suzanne M Embury. Diagnosing faults in embedded queries in database applications. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, pages 239–244. ACM, 2012.
[5] Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. Test-driven evaluation of linked data quality. In Proceedings of the 23rd International Conference on World Wide Web, pages 747–758. ACM, 2014.
[6] Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. Semantics and complexity of SPARQL. ACM Transactions on Database Systems (TODS), 34(3):16, 2009.
[7] Michael Schmidt, Thomas Hornung, Georg Lausen, and Christoph Pinkel. SP2Bench: a SPARQL performance benchmark. In 2009 IEEE 25th International Conference on Data Engineering, pages 222–233. IEEE, 2009.
[8] Xiaowang Zhang, Jan Van den Bussche, and François Picalausa. On the satisfiability problem for SPARQL patterns. J. Artif. Int. Res., 56(1):403–428, 2016.
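
As an illustration of the syntactic rules Syn-1 to Syn-8 of Section 2.1, the following minimal sketch checks triple patterns and filter conditions against a toy ontology signature. It is not the tool described in this paper, which is implemented in XQuery on top of the OWL API. The names Signature, Term, check_triple and check_filter, as well as the toy vocabulary, are hypothetical and introduced only for this example. The sketch also assumes a single namespace (the rdf/rdfs/owl vocabularies are not handled) and treats any two literals of different Python types as incompatible, so compatible datatypes are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical stand-in for the ontology signature of namespace ns."""
    vocabulary: frozenset    # every name declared in ns
    object_props: frozenset  # names declared as object properties
    data_props: frozenset    # names declared as data properties


@dataclass(frozen=True)
class Term:
    kind: str      # "var" (?x), "res" (ns:k) or "lit" (a literal)
    value: object


def var(name):
    return Term("var", name)


def res(name):
    return Term("res", name)


def lit(value):
    return Term("lit", value)


def check_triple(sig, s, p, o):
    """Return the identifiers of the syntactic rules violated by s p o."""
    errors = []
    if o.kind == "res" and o.value not in sig.vocabulary:
        errors.append("Syn-1")   # object is not in the vocabulary
    if s.kind == "res" and s.value not in sig.vocabulary:
        errors.append("Syn-3")   # subject is not in the vocabulary
    if p.kind == "lit":
        errors.append("Syn-6")   # literal in property position
    if s.kind == "lit":
        errors.append("Syn-7")   # literal in subject position
    if p.kind == "res":
        is_obj = p.value in sig.object_props
        is_data = p.value in sig.data_props
        if not (is_obj or is_data):
            errors.append("Syn-2")   # not a property at all
        elif o.kind == "res" and is_data and not is_obj:
            errors.append("Syn-4")   # resource as value of a data property
        elif o.kind == "lit" and is_obj and not is_data:
            errors.append("Syn-5")   # literal as value of an object property
    return errors


def check_filter(left, right):
    """Syn-8: both sides are literals of different (Python) types."""
    if (left.kind == "lit" and right.kind == "lit"
            and type(left.value) is not type(right.value)):
        return ["Syn-8"]
    return []


# Toy signature (an assumption, chosen to mirror the examples of Section 3).
sig = Signature(
    vocabulary=frozenset({"age", "father", "jesus", "person"}),
    object_props=frozenset({"father"}),
    data_props=frozenset({"age"}),
)

print(check_triple(sig, var("x"), res("agex"), var("y")))   # ['Syn-2']
print(check_triple(sig, var("x"), res("age"), res("jesus")))  # ['Syn-4']
print(check_triple(sig, var("x"), res("father"), lit(10)))  # ['Syn-5']
print(check_filter(lit(10), lit("Alice")))                  # ['Syn-8']
```

Returning rule identifiers rather than messages keeps the sketch independent of the exact wording reported by the tool. A property that is declared as both an object and a data property triggers neither Syn-4 nor Syn-5, in line with the assumption that the two kinds of properties are not necessarily disjoint.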