Monitoring the Status of SPARQL Endpoints

Pierre-Yves Vandenbussche¹, Carlos Buil Aranda², Aidan Hogan³, and Jürgen Umbrich¹

¹ Fujitsu (Ireland) Limited, Swords, Co. Dublin, Ireland
² Department of Computer Science, Pontificia Universidad Católica de Chile
³ Digital Enterprise Research Institute, National University of Ireland, Galway

Abstract. We demo an online system that tracks the availability of over four-hundred public SPARQL endpoints and makes up-to-date results available to the public. Our demo currently focuses on how often an endpoint is online/offline, but we plan to extend the system to collect metrics about available meta-data descriptions, SPARQL features supported, and performance for generic queries.

1 Motivation

In previous work [2], we presented an analysis of the landscape of public SPARQL endpoints and asked the question: are these endpoints ready for action?⁴ Taking the full list of 427 public endpoints from the CKAN/DataHub catalogue (as available at the time of writing), we conducted a number of experiments on each endpoint to gauge the following four main aspects:

Discoverability: What kinds of meta-data descriptions are available about the endpoints and their content? How easy are these descriptions to find?
Interoperability: Which SPARQL (1.1) features does each endpoint support? Which features (or combinations of features) lead to exceptions?
Efficiency: How do the endpoints perform for answering generic forms of queries? How does cold-cache performance compare with warm-cache performance? What is the latency like over HTTP?
Availability: What are the average uptimes of the endpoints? How many endpoints are dying/have died? How many endpoints have high reliability?
Our results showed that about half of the endpoints listed on CKAN/DataHub are now offline, that only a few endpoints make meta-data descriptions available about their content (VoID) or features supported (SPARQL 1.1 Service Descriptions) in easy-to-find locations, that there was mixed adoption of SPARQL and (recently standardised) SPARQL 1.1 features, that the performance of different endpoints over HTTP for generic queries could vary by orders of magnitude, and that less than one third of the endpoints had an average availability in the interval 99–100% (i.e., at least two-nines availability). We concluded that the usability of different public endpoints varies greatly.

We thus propose a system that tracks and collects metrics about public endpoints over time. Currently, our service tracks the hourly availability of endpoints, and we plan to extend it to collect weekly metrics about the available meta-data, supported features and performance of these endpoints, as well as other metrics that the community may wish to suggest.

In Section 2, we first discuss our current “SPARQL Endpoint Status” system, available online at http://labs.mondeca.com/sparqlEndpointsStatus/. Thereafter, in Section 3, we discuss our proposed extensions.

This work was supported by Fujitsu (Ireland) Ltd. & by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2). Carlos Buil-Aranda was supported by CONICYT/FONDECYT project No. 3130617.
⁴ This work is accepted for the Experiments track of ISWC 2013 [2]. This demo paper rather focuses on our tool for making results available to the community.

2 SPARQL Endpoint Status

Monitoring Availability. The system automatically collects and updates a list of public SPARQL endpoints from the CKAN/DataHub catalogue. These endpoints are queried on an hourly basis using two alternative SPARQL queries:

    ASK WHERE{ ?s ?p ?o . }        SELECT ?s WHERE{ ?s ?p ?o . } LIMIT 1

The ASK query on the left is issued first.
If this query fails (from previous experience, we note that some endpoints do not support ASK [2, § 3]), we try the SELECT query on the right. Both queries were selected because they should be as cheap as possible for the endpoint to run: our goal is simply to check whether or not the endpoint is available for answering queries. If the endpoint returns a valid SPARQL response for either query, we say that the endpoint is available at that timepoint. We also record the time taken for the query to execute. At the time of writing, we have collected more than two million hourly pings across hundreds of endpoints over a period of more than two years. Detailed analysis of these availability results is available in [2, § 5].

User Interface. We provide a user interface to browse and visualise the hourly results. The user interface supports two primary views.

The first view, exemplified in Figure 1, provides a full list of all the monitored endpoints, their availability in the past 24 hours (ratio of successful hourly queries in that period), and their availability in the past seven days. A green/yellow/red/gray icon indicates, respectively, that the endpoint is operating normally, is available but had problems in the past 24 hours, is not available currently, or was not available even once in the past 24 hours. As per the icons listed on the right of the screenshot, each endpoint is also associated with (1) an RSS feed to provide updates on availability information, (2) a link to the endpoint itself and (3) a link to the relevant CKAN/DataHub page for the dataset it relates to.

Fig. 1. Screenshot of current SPARQL Endpoint Status list

The second view provides details for a given endpoint. Figure 2 shows an example screenshot for the DBpedia endpoint. The graph on the left shows the response times for the last 24 hourly pings to that endpoint. The graph on the right plots the 24-hour availability for each of the last seven days.

Fig. 2.
Screenshot of current SPARQL Endpoint Status detail view for DBpedia

RDF Meta-data. Results of the hourly pings are exported as RDF. Figure 3 presents an example description. We reuse existing vocabularies as much as possible (VoID, dcterms, etc.) to describe each dataset, its related SPARQL endpoint, title, identifier, etc. To capture availability information, we designed a new vocabulary, since no existing one handled this feature. The “endpoint status” vocabulary⁵ (ends) describes a status observation in terms of its date and description (for which we reuse the dcterms vocabulary), its availability status and its response time. All RDF data are then published through a SPARQL endpoint available at: http://labs.mondeca.com/endpoint/ends.

⁵ http://labs.mondeca.com/vocab/endpointStatus/

Fig. 3. Schema used to express SPARQL endpoint availability in RDF

3 Future Extensions

Our system currently captures endpoint availability and query latency. In line with the discussion of Section 1 and the methods of our experimental paper [2], we wish to extend our system to track more metrics about public endpoints. These would include: (1) what meta-data descriptions about each endpoint/dataset are available and where (e.g., VoID, SPARQL 1.1 SD), (2) what query features each endpoint supports (e.g., SPARQL 1.1, full-text), and (3) what performance can be expected for generic queries (atomic lookups, dump queries, controlled joins). Since these queries are more expensive to run than the hourly availability checks, we propose running them on a weekly basis so as not to overburden endpoints. We would then extend our UI and RDF vocabulary to make these metrics available.

We are very much open to suggestions/use-cases from the community for collecting further metrics. Furthermore, we are considering making a locally deployable version for clients to monitor endpoints of relevance to them.
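As a rough illustration of the hourly check and the list-view summary described in Section 2, the following Python sketch shows one way the ASK-then-SELECT fallback, the 24-hour availability ratio and the icon colour could be computed. It is not the code of the deployed system: the names `Ping`, `probe`, `availability` and `status_icon` are hypothetical, the caller-supplied `run_query` function stands in for the actual HTTP request to an endpoint, and the icon rules encode our reading of the colour legend rather than a documented specification.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# The two probe queries quoted in Section 2.
ASK_QUERY = "ASK WHERE { ?s ?p ?o . }"
SELECT_QUERY = "SELECT ?s WHERE { ?s ?p ?o . } LIMIT 1"


@dataclass
class Ping:
    """One hourly observation (hypothetical record layout)."""
    available: bool
    query_used: Optional[str]  # query that obtained a valid response, if any
    elapsed_s: float           # duration of the last query attempted


def probe(run_query: Callable[[str], bool]) -> Ping:
    """Try ASK first, then fall back to SELECT.

    `run_query` is an assumed callback: it sends one query to the endpoint
    and returns True for a valid SPARQL response; it may also raise.
    """
    elapsed = 0.0
    for query in (ASK_QUERY, SELECT_QUERY):
        start = time.perf_counter()
        try:
            ok = run_query(query)
        except Exception:
            ok = False  # any error counts as a failed attempt
        elapsed = time.perf_counter() - start
        if ok:
            return Ping(True, query, elapsed)
    return Ping(False, None, elapsed)


def availability(pings: Sequence[bool]) -> float:
    """Ratio of successful pings in a window (0.0 for an empty window)."""
    return sum(pings) / len(pings) if pings else 0.0


def status_icon(last_24h: Sequence[bool]) -> str:
    """Map the last 24 hourly results (oldest first) to an icon colour."""
    if not any(last_24h):
        return "gray"    # never available in the window
    if not last_24h[-1]:
        return "red"     # unavailable at the latest ping
    if all(last_24h):
        return "green"   # no failures in the window
    return "yellow"      # available now, but with earlier failures
```

Passing the transport in as a callback keeps the fallback logic independent of any particular HTTP or SPARQL client library, so the same function can be exercised with a stub in tests.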

Acknowledgements: This paper was supported by Fujitsu (Ireland) Limited, and funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2). Carlos Buil-Aranda was supported by the CONICYT/FONDECYT project No. 3130617.

References

1. K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao. Describing linked datasets. In LDOW, 2009.
2. C. B. Aranda, A. Hogan, J. Umbrich, and P.-Y. Vandenbussche. SPARQL Web-Querying Infrastructure: Ready for Action? In ISWC. Springer (LNCS), 2013. (Accepted; to appear.)
3. G. T. Williams. SPARQL 1.1 Service Description. W3C Recommendation, March 2013.