An On-Line Learning to Query System Jedrzej Potoniec Faculty of Computing, Poznan University of Technology ul. Piotrowo 3, 60-965 Poznan, Poland Jedrzej.Potoniec@cs.put.poznan.pl Abstract. We present an on-line system which learns a SPARQL query from a set of wanted and a set of unwanted results of the query. The sets are extended during a dialog with the user guided by recall and F1 measure. The system leverages SPARQL 1.1 and does not depend on any particular RDF graph. 1 Introduction A common problem with querying a Linked Data dataset is that the user must have prior knowledge about the vocabulary used by the dataset and know a querying language, e.g. SPARQL [3]. A typical approach to remedy the prob- lem is to use some tool helping the user to formulate a SPARQL query, using e.g. faceted browsing [2], natural-language interfaces [4], visual interfaces [7] or recommendations [1]. In all these system the user specifies the query by vari- ous means. We propose a different approach, where the user only specifies what should and what should not be in the results of the query, and the system takes care of formulating the query. The user does not need to know SPARQL, it is enough for her to be able to distinguish wanted and unwanted results. If the user is already familiar with the data (e.g. knows its representation in some other for- mat), it is quite an easy task for her. The system operates by conducting a dialog with the user. In each part of the dialog, the user is asked about a small set of URIs and for every URI she must decide if it should or should not be present in the results of the final query. A similar approach was already presented in [5], where the system used mostly computational power of the computer system of the user. We leverage new features of SPARQL 1.1 and move most of the computation to a SPARQL endpoint, especially the process of query refinement guided by recall and deciding on termination depending on F1 measure. Thought this work, we use the following prefixes: dbr: for http://dbpedia. org/resource/, dbo: for http://dbpedia.org/ontology/, dbp: for http: //dbpedia.org/property/, dct: for http://purl.org/dc/terms/, xsd: for http://www.w3.org/2001/XMLSchema#. 2 System Description A screenshot of the on-line system is presented in Figure 1. The source code of the system is available in a Git 1 repository available at https:// bitbucket.org/jpotoniec/kretr/. An instance of the system is available at https://semantic.cs.put.poznan.pl/ltq/. It uses a SPARQL endpoint set up on Blazegraph2 2.1.1 and loaded with DBpedia 2015-04. Note that the sys- tem itself does not depend neither on Blazegraph nor DBpedia, as it can use any SPARQL 1.1 endpoint. 1 2 4 5 3a 3b Fig. 1. A screenshot of the system while the user is supposed to assign new examples to correct sets. The view is divided into six areas: new examples, which are to be assigned (1), the current query (2), the URIs already assigned by the user: wanted (3a) and unwanted (3b), a form to add a new URI by hand (4) and demo scenarios (5). The aim of the system is to build a SPARQL SELECT query with a single variable in the head. The variable has a modifier DISTINCT. The query contains only a WHERE clause (i.e. there is no GROUP BY, ORDER BY etc.), and in the clause there is only a basic graph pattern (BGP, i.e. triple patterns and filter expressions). The undirected graph corresponding to the BGP is a connected 1 https://git-scm.com/ 2 https://www.blazegraph.com/ graph and the filter expressions are of a form variable >=/<= literal. An example of such a query is SELECT DISTINCT ?uri WHERE { ?uri dct:subject dbr:Category:City_counties_of_Poland . ?uri dbo:populationTotal ?anon1. FILTER(?anon1 >= "205934"^^xsd:nonNegativeInteger). } Fig. 2. A typical workflow with the system. First the user specifies a small set of examples, then the system works in a loop: asses a hypothesis, refines it, finds new examples and asks the user to decide about them. A typical workflow with the system is presented in Figure 2. First, the user specifies a small set of URIs that should be present in the results of the final query, and a small set of URIs that should not be present in the results. Then, the system refines the query by adding to it a new triple pattern (or a triple pattern and a filter expression) while maintaining recall of at least 0.99, i.e. covering at least 99% of the positive examples. In a typical case of multiple possible refinements, they are sorted by F1 measure and precision and the one with the highest values is chosen. Next, the system generates a few new positive and negative examples. The positive examples are simply selected from the results of the refined query, while the negative examples are computed by subtracting from the results of the original query the results of the refined query. In each case we require the found examples to be new, i.e. they must not be already labeled by the user. If finding the new examples is impossible, the refinement is retracted and the next in order is tried. The user is then asked to decide about each example if it should or not be present in the results of the final query (i.e. the user extends the sets defined in the beginning). After the user decides, the system checks if the decisions agrees with the query. If they do not, the cycle repeats: the system refines the query, asks the user and assess the query. Otherwise, the system assumes that the correct query was found. The query is displayed to the user along with the results of the query. If the user decides that the results are not satisfactory, she must add at least one new URI to one of the sets. More details about the algorithm, along with the templates of queries used, is presented in [6]. 3 Proposed Demo During the demo, we will present to the participants how to use the system, what types of queries are possible to learn and what are the limitations. The system has embedded two demo scenarios: the first one to find a query to select all capitals of the member states of European Union and the second one to find a query to select all Polish cities having more than 2000 000 citizens. For the presentation, we will use the on-line instance available at https://semantic. cs.put.poznan.pl/ltq/. 4 Conclusion In this paper we presented a system which is able to learn a SPARQL query from two sets of URIs obtained from the user in a dialog. The system leverages new features of SPARQL 1.1 and does not depend on any particular RDF graph. It also does not require any precomputation before it is ready to use. The source code is publicly available. Acknowledgement. Jedrzej Potoniec acknowledges the support from the Pol- ish National Science Center (Grant No 2013/11/N/ST6/03065). References 1. Campinas, S.: Live SPARQL auto-completion. In: Horridge, M., Rospocher, M., van Ossenbruggen, J. (eds.) Proc. of the ISWC 2014 Posters & Demonstrations Track. CEUR Workshop Proceedings, vol. 1272, pp. 477–480 (2014) 2. Ferré, S.: SPARKLIS: a SPARQL endpoint explorer for expressive question answer- ing. In: Horridge, M., Rospocher, M., van Ossenbruggen, J. (eds.) Proc. of the ISWC 2014 Posters & Demonstrations Track. CEUR Workshop Proc., vol. 1272, pp. 45–48 (2014) 3. Harris, S., Seaborne, A.: SPARQL 1.1 query language. W3C recommendation, W3C (Mar 2013), http://www.w3.org/TR/2013/REC-sparql11-query-20130321/ 4. Höffner, K., Walter, S., Marx, E., Usbeck, R., Lehmann, J., Ngomo, A.C.N.: Survey on challenges of question answering in the semantic web. Semantic Web Journal (accepted for publication) (2016) 5. Lehmann, J., Bühmann, L.: AutoSPARQL: Let users query your knowledge base. In: Antoniou, G., Grobelnik, M., et al. (eds.) The Semantic Web: Research and Applications. LNCS, vol. 6643, pp. 63–79. Springer (2011) 6. Potoniec, J.: Learning to Query: from Concepts in Mind to SPARQL Queries. Tech. Rep. RA-9/16, Institute of Computing Science, Faculty of Computing, Poznan Uni- versity of Technology (aug 2016), http://goo.gl/B7J008 7. e Zainab, S.S., Saleem, M., Mehmood, Q., et al.: Fedviz: A visual interface for SPARQL queries formulation and execution. In: Ivanova, V., Lambrix, P., et al. (eds.) Proc. of the International Workshop on Visualizations and User Interfaces for Ontologies and Linked Data. CEUR Workshop Proc., vol. 1456, p. 49 (2015)