<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">FAIR Service Descriptions: enriching life science SPARQL endpoints</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jerven</forename><surname>Bolleman</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">SIB Swiss Institute of Bioinformatics</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alan</forename><surname>Bridge</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">SIB Swiss Institute of Bioinformatics</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nicole</forename><surname>Redaschi</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">SIB Swiss Institute of Bioinformatics</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">FAIR Service Descriptions: enriching life science SPARQL endpoints</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">AE673A9D6D89E6A610029578BBF8E215</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>SPARQL</term>
					<term>RDF</term>
					<term>Information schema</term>
					<term>Query rewriting</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>SPARQL service descriptions allow for rich information schemas describing the data inside SPARQL endpoints. Rewriting information schema (re)discovery queries into queries over an existing information schema can give major performance benefits. Rich service descriptions have many use cases beyond query rewriting.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A significant challenge for users of SPARQL endpoints is discovering the shape and quantity of the data exposed inside them. The W3C standards for SPARQL allow for a Service Description (SD), enumerating the capabilities and capacities of a SPARQL endpoint. The Swiss-Prot group provides extensive service descriptions for its SPARQL endpoints (https://hamap.expasy.org/sparql, https://beta.swisslipids.org/sparql, https://sparql.rhea-db.org/sparql and https://sparql.uniprot.org/sparql).</p><p>An SD contains metadata about a SPARQL endpoint, such as when it was last updated and which ontologies it uses. Such an SD can be seen as an information schema for a SPARQL endpoint. Ours are expressed using the Service Description <ref type="bibr" target="#b0">[1]</ref>, VoID <ref type="bibr" target="#b1">[2]</ref> and VoID-Ext <ref type="bibr" target="#b2">[3]</ref> vocabularies. We store them in independent named graphs, which we always name as the address of the SPARQL endpoint followed by /.well-known/void, e.g. https://sparql.rhea-db.org/.well-known/void. FAIR SDs have many use cases, such as:</p><p>• Query optimization and dataset visualizations. The tool SPEX, which generates entity relationship diagrams, uses these in part if they are available.</p><p>• Generating SHACL files describing the shape of the data in a SPARQL endpoint.</p><p>• Generating APIs in languages such as R or Python to access the data in the SPARQL endpoint. To be demonstrated in the CHIST-ERA: Open Research Data - TRIPLE project.</p><p>• License and last updated information for FAIR data monitors.</p><p>As an example: a common SPARQL query that people are taught to use counts how many distinct classes there are in a SPARQL endpoint (Listing 1). For large datasets like UniProt this is non-trivial. Imagine running it as a classical Unix pipeline like Listing 2. 
You may then be surprised that this takes a few days to run, if you have enough disk space and memory, that is. This is because there are more than 140 billion distinct triples in UniProt. Of course, having such an SD is not enough, as people who are used to such queries will not switch to a different query over an "information schema" by default. This means we need to rewrite the query (Listing 1) into a query of the form shown in Listing 3. Query rewriting needs to take into account variations in prefixes, white-space and variable naming. We solve this by using a SPARQL parser from the RDF4J project and using the abstract SPARQL algebra for the query matching and rewriting. The original request is then redirected to a new location with the new query (HTTP 301).</p><p>SWAT4HCLS 2024: Bridging Life Sciences and Technology, February 26-29, Leiden, The Netherlands. * Corresponding author. Email: jerven.bolleman@sib.swiss (J. Bolleman); alan.bridge@sib.swiss (A. Bridge); nicole.redaschi@sib.swiss (N. Redaschi). ORCID: 0000-0002-7449-1266 (J. Bolleman); 0000-0003-2148-9135 (A. Bridge); 0000-0001-8890-2268 (N. Redaschi).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Listing 1:</head><label>1</label><figDesc>"Count distinct classes used in a SPARQL endpoint." SELECT (COUNT(DISTINCT ?class) AS ?classes) WHERE { ?subject a ?class . }
Listing 2: "Simple pipeline to count the unique classes in an ntriples file." sort -u all_triples_in_uniprot.nt | grep rdf:type | sort -u | wc -l
Listing 3: "Rewritten SPARQL query to retrieve the count of the distinct classes in the endpoint." SELECT (COUNT(DISTINCT ?classesRaw) AS ?classes) FROM &lt;http://sparql.uniprot.org/.well-known/void&gt; WHERE { [] &lt;http://rdfs.org/ns/void#class&gt; ?classesRaw . }</figDesc></figure>
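<div xmlns="http://www.tei-c.org/ns/1.0"><p>The matching and redirect step described above can be illustrated with a small sketch. It is not the RDF4J algebra-based implementation: as a simplifying assumption it swaps in a much cruder token-level normalisation in Python, which is only enough to absorb differences in prefixes, white-space, keyword case and variable names for the single query of Listing 1. The function names (canonical_tokens, rewrite) and the void_graph parameter are hypothetical and chosen for illustration only.</p>
<figure type="listing"><figDesc><![CDATA[
```python
import re
from urllib.parse import urlencode

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z_][\w-]*):\s*<([^>]*)>", re.IGNORECASE)
# IRIs, variables, prefixed names, bare words, then any single punctuation mark.
TOKEN = re.compile(r"<[^>]*>|[?$]\w+|[A-Za-z_][\w-]*:[\w-]+|\w+|[^\s\w]")


def canonical_tokens(query):
    """Tokens with prefixes expanded, keywords upper-cased, variables renumbered."""
    prefixes = {m.group(1): m.group(2) for m in PREFIX_DECL.finditer(query)}
    variables = {}
    out = []
    for tok in TOKEN.findall(PREFIX_DECL.sub(" ", query)):
        if tok[0] in "?$":
            # Rename variables by order of first appearance.
            tok = variables.setdefault(tok[1:], f"?v{len(variables)}")
        elif ":" in tok and not tok.startswith("<"):
            prefix, local = tok.split(":", 1)
            if prefix in prefixes:
                tok = f"<{prefixes[prefix]}{local}>"
        elif tok.isalpha():
            tok = tok.upper()
        if tok in ("A", RDF_TYPE):
            tok = "a"  # 'a' is SPARQL shorthand for rdf:type
        out.append(tok)
    # A '.' directly before '}' is optional in SPARQL, so drop it.
    return [t for i, t in enumerate(out) if not (t == "." and out[i + 1:i + 2] == ["}"])]


# Canonical form of the schema-discovery query from Listing 1.
COUNT_CLASSES = canonical_tokens(
    "SELECT (COUNT(DISTINCT ?class) AS ?classes) WHERE { ?subject a ?class . }"
)


def rewrite(query, endpoint, void_graph):
    """Return (301, location) if the query matches Listing 1, otherwise None."""
    if canonical_tokens(query) != COUNT_CLASSES:
        return None
    # Keep the caller's result variable so existing clients find their binding.
    alias = re.search(r"\bAS\s+[?$](\w+)", query, re.IGNORECASE).group(1)
    inner = "classesRaw" if alias != "classesRaw" else "classesRaw0"
    rewritten = (
        f"SELECT (COUNT(DISTINCT ?{inner}) AS ?{alias}) "
        f"FROM <{void_graph}> "
        f"WHERE {{ [] <http://rdfs.org/ns/void#class> ?{inner} . }}"
    )
    return 301, f"{endpoint}?{urlencode({'query': rewritten})}"


if __name__ == "__main__":
    user_query = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    select (count(distinct ?t) as ?n) where { ?s rdf:type ?t }"""
    print(rewrite(user_query,
                  "https://sparql.uniprot.org/sparql",
                  "http://sparql.uniprot.org/.well-known/void"))
```
]]></figDesc></figure>
<p>Matching on a parsed algebra, as the text describes, is more robust than this token comparison: it also handles comments, string literals and reordered but equivalent patterns, none of which the sketch attempts.</p></div>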
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The Swiss-Prot group is part of the SIB Swiss Institute of Bioinformatics and of the UniProt Consortium. Swiss-Prot group activities are supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI and UniProt is supported by the National Eye Institute (NEI), National Human Genome Research Institute (NHGRI), National Heart, Lung, and Blood Institute (NHLBI), National Institute on Aging (NIA), National Institute of Allergy and Infectious Diseases (NIAID), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of General Medical Sciences (NIGMS), National Institute of Mental Health (NIMH), and National Cancer Institute (NCI) of the National Institutes of Health (NIH) under grant U24HG007822.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://www.w3.org/TR/sparql11-service-description/" />
		<title level="m">SPARQL 1.1 Service Description</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Describing Linked Datasets with the VoID Vocabulary</title>
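		<author>
			<persName><forename type="first">Keith</forename><surname>Alexander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Hausenblas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun</forename><surname>Zhao</surname></persName>
		</author>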
		<ptr target="https://www.w3.org/TR/void/" />
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Aether - Generating and Viewing Extended VoID Statistical Descriptions of RDF Datasets</title>
		<author>
			<persName><forename type="first">E</forename><surname>Mäkelä</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: ESWC 2014 Satellite Events</title>
				<editor>
			<persName><forename type="first">V</forename><surname>Presutti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Blomqvist</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Sack</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Papadakis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Tordai</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<pubPlace>Cham</pubPlace>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="429" to="433" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
