SPARQL Assist Language-Neutral Query Composer

                 Luke McCarthy1, Ben Vandervalk1, Mark Wilkinson1
                  1
                    Providence Heart + Lung Institute at St. Paul's Hospital,
                 University of British Columbia, Vancouver, BC, Canada
                                   markw@illuminae.com


      Abstract. SPARQL query composition is difficult for the lay-person or even the
      experienced bioinformatician in cases where the data model is unfamiliar.
      Established best-practices and internationalization concerns dictate that
      semantic web ontologies should use terms with opaque identifiers, further
      complicating the task. We present SPARQL Assist: a web application that
      addresses these issues by providing context-sensitive type-ahead completion to
      existing web forms. Ontological terms are suggested using their labels and
      descriptions, leveraging existing XML support for internationalization and
      language-neutrality.

      Keywords: SPARQL, RDF, OWL, Semantic Web, Semantic Web Services,
      i18n


1 Introduction

The health care and life science sectors have been some of the most enthusiastic
adopters of semantic web technologies. The benefits of the RDF/OWL data model
are well-understood by bioinformaticians who have too long had to deal with the
problem of integrating data from multiple sources with wildly different underlying
schema. These benefits are less obvious, however, to clinicians and researchers who
merely see one mysterious query language (SQL) exchanged for another (SPARQL).
Even a semantic web-savvy informatician can be daunted when faced with the
challenge of querying an unfamiliar data source whose particular RDF vocabulary is
initially unknown.
   The issue is compounded by the growing use of opaque, semantic-free URIs for
ontological classes and properties (OBO [1], SIO [2], CWA [3]). Where the meaning
of rdf:type or dc:title in a SPARQL query is relatively clear, the meaning of, for
example, sio:SIO_000253 is considerably harder to glean without looking up its
ontological definition. Nevertheless, there are many valid reasons for designing
ontologies this way, not the least of which is language neutrality.
   RDF/XML provides built-in language neutrality by way of the xml:lang attribute;
an ontology can easily be internationalized by providing multiple rdfs:label or
rdfs:comment properties with appropriate xml:lang attributes. However, even those
projects who have, in principle, adopted language neutrality for their classes (e.g.
OBO), have not done so for their properties (OBO Relationship Ontology [4]). This
is no-doubt due, at least in part, to the difficulty of composing SPARQL queries in
which predicates have opaque identifiers. Nevertheless, it is crucial that we do not
allow convenience to direct the development of a core global resource - the Semantic
Web - and thus the problem should be solved at the level of the tools provided, rather
than the resources themselves.


2 SPARQL Assist

In this demonstration, we present SPARQL Assist: a web application that facilitates
the construction of SPARQL queries by providing context-sensitive type-ahead
completion. In addition to assistance with basic syntax, ontological terms are indexed
by their labels, allowing a query to be composed in a user's preferred language,
assuming appropriate labels are present in the ontology. Terms are read on-the-fly
from any ontology specified in a FROM clause, but SPARQL Assist can also be
configured to pre-load terms from particular ontologies or SPARQL endpoints.
   The entire query, as it is being constructed, is used to provide context for the type-
ahead suggestions. Previously declared variables or known individuals are suggested
in the subject or object position of a clause and known properties are suggested in the
predicate position. If a clause specifies an individual, properties that individual is
known to have are displayed preferentially. Similarly, if a clause contains a variable
that can ultimately be connected to a known individual in another part of the query,
that connection is used to find the most likely properties in the current clause.
   Terms are cached on the client side to speed up repeated look-ups, but most of the
processing is done on the server side in Java to take advantage of the mature OWL
toolkit on that platform. In the future, as much computation as possible will be
transferred to the client side to improve both performance and flexibility of
deployment.
   For this demonstration, SPARQL Assist has been implemented in the context of
creating queries that will be resolved by the Semantic Health and Research
Environment (SHARE [5]).


3 SHARE

SHARE is an advanced SPARQL query client built on top of the SADI Framework
[6] for Semantic Web Services. In SADI, services attach properties to input OWL
instances and are indexed in a central registry based on the properties they attach.
SHARE maps the triple patterns of a SPARQL query onto these indexed properties,
allowing a user to query the entire virtual graph of registered SADI services. The
RDF data required to answer a given query is thus dynamically generated in response
to that query.
   In the context of this demonstration, this infrastructure makes SPARQL query
composition even more difficult, since there is no pre-existing database to inspect for
candidate properties and individuals. The specialized SPARQL Assist provider for
SHARE, therefore, uses the SADI registry, in addition to any loaded ontologies, to
suggest properties to be used in a query. As in the generic case, if a clause contains a
named individual or a variable previously connected to an individual, that information
is used to further refine the suggestions; in this case by filtering services (and the
resulting suggested properties) that cannot accept a particular individual.


4 Conclusion

SPARQL Assist provides prototype solutions for two important problems. First, to
hasten the uptake of Semantic Web technologies, it is important to improve access to,
and usability of, Semantic Web resources for the lay-end-user while still maintaining
best-practices in the way these resources are modeled. Opaque identifiers for both
classes and properties are important, as they allow us to avoid “churn” as an ontology
evolves over time. We must therefore support the end-user in constructing queries
over resources formatted in this way. Second, the Semantic Web is intended to be a
global resource, of use to all. As such, a respect for internationalization is also
critical, even at these early stages in Semantic Web evolution. We believe that
SPARQL Assist provides motivation to more widely adopt what are clearly best-
practices in Semantic Web data provision.

Acknowledgments. This work has been supported by the Heart + Stroke Foundation
of BC and Yukon, Microsoft Research, The Canadian Institutes for Health Research,
The Natural Sciences and Engineering Research Council of Canada, and CANARIE.


References

1. Smith, B., et. al.: (The OBO Foundry: coordinated evolution of ontologies to support
   biomedical data integration. Nature Biotechnology 25, 1251--1255 (2007)
2. Semantic Science — Scientific Knowledge Discovery, http://semanticscience.org/
3. NBIC Declaration Concept Web Alliance, http://www.nbic.nl/about-nbic/affiliated-
   organisations/cwa/declaration/
4. OBO          Relationship        Ontology,     http://www.berkeleybop.org/ontologies/obo-
   all/relationship/relationship.owl
5. Vandervalk, B.P., McCarthy, E.L., and Wilkinson, M.D.: SHARE: A Semantic Web Query
   Engine for Bioinformatics. In: LNCS, vol. 5926, pp. 367-369. Springer, Heidelberg (2009)
6. Wilkinson, M.D., Vandervalk, B.P., McCarthy, L.: SADI Semantic Web Services — 'cause
   you can't always GET what you want! In: 2009 IEEE Asia-Pacific Services Computing
   Conference, pp. 13-18. IEEE Asia-Pacific (2010)