SWISH: An Integrated Semantic Web Notebook

Wouter Beek and Jan Wielemaker
{w.g.j.beek,j.wielemaker}@vu.nl
Dept. of Computer Science, VU University Amsterdam, NL

1 Introduction

SPARQL editors like Yasgui [6] make it easier to write SPARQL queries and to inspect their results. Notebooks like Jupyter/IPython [5] already support computer and data scientists in domains like statistics and machine learning. There is currently no integrated notebook solution for Semantic Web programming that combines the strengths of SPARQL editors with the benefits of notebooks. The challenge is that Semantic Web formalisms are mostly logic-based and declarative, which does not always align naturally with the imperative programming paradigm. SWISH takes a different approach: it presents the Semantic Web programmer with an integrated notebook experience that uses a declarative programming paradigm (logic programming in SWI-Prolog) as its integration layer.

2 Requirements

An integrated Semantic Web notebook must meet the following requirements:

1. Be able to write queries in a modular way.
2. Be able to share these modules with others.
3. Be able to collaborate online with others on building, altering and combining query modules.
4. Be able to interleave SPARQL patterns and filters with functions from other programming paradigms (e.g., NLP, statistics, ML).
5. Be able to calculate query results under standardized and user-defined entailment regimes.

One of the main problems with existing SPARQL editors is that they do not allow queries to be written in a modular way. This issue is only partially solved by recent innovations like grlc [4], which allow complete queries to be shared with others. The problem is that SPARQL queries cannot easily be reused as self-contained building blocks, as is possible in programming languages where self-contained functions can be reused by other functions. The main challenge is that SPARQL, like most other Semantic Web formalisms, follows a declarative paradigm.
What is needed is a programming paradigm in which subqueries can be naturally encapsulated in functions or predicates and in modules (Requirement 1). Once queries can be written as modular code snippets, the online notebook environment must allow these snippets to be shared with other users (Requirement 2). Existing technologies like ShareJS¹ make it easy for users to work collaboratively on the same code; this functionality must be integrated into a Semantic Web notebook as well (Requirement 3).

In existing notebook systems one is not restricted to a single standardized query syntax. In fact, it is very important for data scientists to be able to mix code from different programming and query languages (Requirement 4). Use cases are evident in many areas, for instance using Natural Language Processing (NLP) tools for fuzzy string matching (which the SPARQL query language does not provide) or performing a statistical test in R².

The user must be able to perform entailment under arbitrary regimes. SPARQL editors are tied to the entailment functionality exposed by contemporary triple stores. Support for standardized entailment regimes (RDF(S), OWL) is often partial, and it is not always possible to specify alternative entailment regimes or domain-specific custom rules. A Semantic Web notebook should allow a user to specify her own deduction rules in addition to standardized entailment regimes (Requirement 5).

3 Implementation

SWISH is implemented as a JavaScript (browser) client that runs in combination with the Prolog-based ClioPatria triple store [7]. Client/server communication is implemented with Pengines [3]. A Pengine is a Prolog engine that can be controlled through (remote) HTTP requests, which allows Prolog queries to be run from within JavaScript.
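Because a Pengine runs Prolog goals that arrive over HTTP, untrusted code has to be vetted before it is executed. The sketch below assumes that SWISH's server-side checks resemble those of SWI-Prolog's library(sandbox), which Pengines uses to vet remote goals; it shows how safe_goal/1 analyses a goal's call graph without running it. The helpers is_safe/1 and run_if_safe/1 and the example predicates double/2 and list_server_files/0 are hypothetical names introduced for illustration only, not part of SWISH.

```prolog
:- use_module(library(sandbox)).

% Hypothetical user code, as it might be submitted from a notebook cell.
double(X, Y) :-
    Y is 2 * X.

list_server_files :-
    shell(ls).                 % runs an OS command: must be rejected

% is_safe(+Goal) is semidet.
% Hypothetical helper: succeeds if library(sandbox) can prove that Goal
% only reaches white-listed predicates. safe_goal/1 analyses the call
% graph without running Goal and raises an error (e.g. a permission
% error) when it finds an unsafe call; this helper turns that into failure.
is_safe(Goal) :-
    catch(safe_goal(Goal), _Error, fail).

% run_if_safe(+Goal): hypothetical helper that calls Goal only after
% it has passed the static check.
run_if_safe(Goal) :-
    is_safe(Goal),
    call(Goal).
```

For example, run_if_safe(double(3, Y)) binds Y to 6, whereas is_safe(list_server_files) fails because the analysis reaches shell/1. Since the check happens before execution, a rejected goal never runs at all.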
Since arbitrary programs can be executed, SWISH is not limited to the functionality provided by standardized Semantic Web query languages like SPARQL. For instance, the user can choose to perform SQL and Datalog queries in addition to SPARQL queries, and she can perform entailment under a domain-specific or otherwise non-standard regime in addition to RDF(S) and OWL. On the server side, code is executed within a sandboxed environment for security and sustainability reasons. If full, unrestricted functionality is needed on the server side, a user can deploy a remote or local SWISH instance herself by cloning the SWISH repository³.

¹ See https://github.com/share/ShareJS
² See https://www.r-project.org/
³ See https://github.com/SWI-Prolog/swish

[Figure 1. Screenshot of the SWISH interface.]

4 Illustration

As an example we take the following SPARQL query, which enumerates labor strikes that took place in Haarlem in 1903:

SELECT ?strike ?days ?workers ?place ?date
WHERE {
  ?strike ex:numberOfDays ?days .
  ?strike ex:numberOfWorkers ?workers .
  ?strike ex:place ?place .
  FILTER (langMatches(lang(?place), "nl"))
  FILTER (lcase(str(?place)) = "haarlem")
  ?strike ex:date ?date .
  FILTER (year(?date) = 1903)
}
LIMIT 10

In SWISH we can write the same graph patterns with the rdf/3 predicate, which matches a single triple pattern; a conjunction of rdf/3 goals corresponds to a SPARQL Basic Graph Pattern. SPARQL FILTER expressions are written in a Domain-Specific Language (DSL) extension whose goals appear between curly braces, as lang_matches/2 shows. sounds/2 performs 'sounds like' string matching using the Metaphone phonetic algorithm from NLP. This illustrates how custom functions can be applied as filters within the query.⁴

⁴ Using lcase/2 would have replicated the SPARQL query.

    :- use_module(library(semweb/rdf11)).

    % Enumerate strikes in a place that sounds like PlaceMatch, in year Year.
    % Assumes the ex prefix has been registered (rdf_register_prefix/2) and
    % that lang_matches/2 and sounds/2 are available as filter goals.
    strike_by_place_and_year(Strike, PlaceMatch, Year) :-
        rdf(Strike, ex:numberOfDays, _NumDays),
        rdf(Strike, ex:numberOfWorkers, _NumWorkers),
        rdf(Strike, ex:place, Place),
        {lang_matches(Place, nl)},
        {sounds(Place, PlaceMatch)},
        rdf(Strike, ex:date, Date),
        {Date = date(Year,_,_)}.   % use the Year argument, not a fixed year
    ?- strike_by_place_and_year(Strike, "Haarlem", 1903).

The predicate strike_by_place_and_year/3 has several advantages over the SPARQL version. It can be used to enumerate the labor strikes in any city and in any year, and it can be reused in other queries. Since SWISH programs can be shared online, the predicate can also be reused in someone else's query. This allows developers to incrementally build more sophisticated queries on top of existing, proven and tested building blocks, which is an effective way to avoid the large and complex SPARQL queries often found in existing Semantic Web applications.

5 Use cases & Conclusion

SWISH is able to support a variety of use cases. Recently TRILL-on-SWISH [2] was released: a probabilistic OWL reasoner built on top of SWISH. It illustrates that SWISH can be used to provide functionality that no existing SPARQL editor or Semantic Web-compatible notebook can provide: reasoning under a non-standard entailment regime⁵. SWISH development is still ongoing. The LOD Laundromat team is currently using SWISH to expose the next version of LOD Laundromat [1] for others to query online.

References

1. Beek, W., Rietveld, L., Bazoobandi, H.R., Wielemaker, J., Schlobach, S.: LOD Laundromat: A uniform way of publishing other people's dirty data. In: ISWC 2014, pp. 213–228. Springer (2014)
2. Bellodi, E., Lamma, E., Riguzzi, F., Zese, R., Cota, G.: A web system for reasoning with probabilistic OWL. Software: Practice and Experience (2016)
3. Lager, T., Wielemaker, J.: Pengines: Web logic programming made easy. Theory and Practice of Logic Programming 14(4-5), 539–552 (2014)
4. Meroño-Peñuela, A., Hoekstra, R.: grlc makes GitHub taste like Linked Data APIs. In: Proceedings of the Services and Applications over Linked APIs and Data workshop, ESWC (2016)
5. Ragan-Kelley, M., Perez, F., Granger, B., Kluyver, T., Ivanov, P., Frederic, J., Bussonnier, M.: The Jupyter/IPython architecture: a unified view of computational research, from interactive exploration to communication and publication. In: AGU Fall Meeting Abstracts. vol. 1, p. 07 (2014)
6. Rietveld, L., Hoekstra, R.: Yasgui: Not just another SPARQL client. In: Extended Semantic Web Conference, pp. 78–86. Springer (2013)
7. Wielemaker, J., Beek, W., Hildebrand, M., van Ossenbruggen, J.: ClioPatria: A SWI-Prolog infrastructure for the Semantic Web. Semantic Web Journal 7(5), 529–541 (2016)

⁵ See http://trill.lamping.unife.it/