The YummyData Initiative: How SPARQL-y is your biomedical endpoint?

Ivar Andrea Splendiani (1,2), Johan Nystrom-Persson (3), Michel Dumontier (4), and Yasunori Yamamoty (5)

(1) DERI, Galway, IE, andrea.splendiani@deri.org
(2) intelliLeaf ltd, Cambridge, UK, andrea.splendiani@intellilaf.com
(3) The National Institute of Biomedical Innovation, Osaka, Japan, johan@nibio.go.jp
(4) Carleton University, Ottawa, Canada, michel dumontier@carleton.ca
(5) Database Center for Life Science, JP, yy@dbcls.jp

Abstract. Although increasing amounts of biomedical data are being provided as structured content on the Semantic Web, there is currently no standardized way to monitor SPARQL endpoints for their availability, reliability, or content flux. Additional issues arise when version-sensitive data is republished by third parties or made available as part of a one-off research project. All of these aspects have important consequences for users who rely on federated queries across distributed SPARQL endpoints.

1 Results

We describe the YummyData initiative, which monitors biomedical SPARQL endpoints on a variety of factors including availability, reliability, content summarization, and content evolution. Our prototype website, yummydata.org, provides simple metrics such as the availability and response status of a selected set of endpoints and the size of each data set. In addition to these fundamental metrics, our long-term goal is to compute a SPARKLE score, a composite metric combining measures such as the number of triples, the size and frequency of updates, the number of links to other datasets, and the capabilities of the endpoint server. Although the SPARKLE score is not a measure of the quality of a dataset, it helps indicate whether the dataset changes over time, and whether these changes are likely to be negative or positive in nature.
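
As a rough illustration of how a composite metric of this kind could be assembled, the following Python sketch combines the four kinds of measures named above into a single number. It is not the SPARKLE score itself, whose formula is not specified here: the `EndpointSnapshot` fields, the weights, the normalization scales, and the use of SPARQL 1.1 support as a stand-in for server capabilities are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class EndpointSnapshot:
    """One observation of an endpoint. All field names are hypothetical."""
    triples: int             # e.g. from SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
    updates_per_year: float  # observed frequency of content changes
    outbound_links: int      # triples pointing into other datasets
    supports_sparql11: bool  # assumed proxy for "capabilities of the endpoint server"


def _saturating(value: float, scale: float) -> float:
    """Map a non-negative value into [0, 1] so that no single large count dominates."""
    return 1.0 - math.exp(-max(value, 0.0) / scale)


# Illustrative weights only; the source does not define any.
WEIGHTS = {"size": 0.3, "freshness": 0.3, "links": 0.2, "capabilities": 0.2}


def composite_score(s: EndpointSnapshot) -> float:
    """Weighted sum of normalized measures, between 0 and 1."""
    parts = {
        "size": _saturating(s.triples, 1e7),
        "freshness": _saturating(s.updates_per_year, 4.0),
        "links": _saturating(s.outbound_links, 1e5),
        "capabilities": 1.0 if s.supports_sparql11 else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


if __name__ == "__main__":
    # Two made-up snapshots of the same endpoint at different times.
    earlier = EndpointSnapshot(1_000_000, 1.0, 5_000, False)
    later = EndpointSnapshot(5_000_000, 4.0, 50_000, True)
    print(composite_score(earlier), composite_score(later))
```

Comparing the value for successive snapshots of one endpoint, rather than ranking different endpoints against each other, matches the stated purpose of indicating change over time instead of measuring quality.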
The statistics collected for an endpoint may also be correlated with the declared update frequency of the datasets published by a given provider, thus providing an additional input to our score. Finally, yummydata.org also supports custom queries that track endpoint and data metrics, allowing new metrics to be computed and shared amongst participants.

Although devising an objective measure of quality is controversial, we believe that the YummyData initiative will help users better understand the content that is currently available while also helping providers understand what kinds of metrics are important to users. We also believe that the emergence of more third-party SPARQL endpoint rating services like YummyData will foster an environment in which endpoints can be evaluated more objectively, and in which the overall quality of RDF-based biomedical data and services improves as a result. As yummydata.org is an early prototype, we welcome suggestions to clarify existing metrics and to help develop additional metrics to tease out information of interest.

Fig. 1. A screenshot from YummyData.org, showing the evolution of parameters and score for a given endpoint

2 Acknowledgments

We wish to thank the organizers of the BioHackathon 2012, during which this work originated.