<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SWAT4HCLS 2024: Bridging Life Sciences and Technology</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>FAIR Service Descriptions: enriching life science SPARQL endpoints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jerven Bolleman</string-name>
          <email>jerven.bolleman@sib.swiss</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Bridge</string-name>
          <email>alan.bridge@sib.swiss</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicole Redaschi</string-name>
          <email>nicole.redaschi@sib.swiss</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SIB Swiss Institute of Bioinformatics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SPARQL, RDF, Information schema</institution>
          ,
          <addr-line>Query rewriting</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SWAT4HCLS 2024: Bridging Life Sciences and Technology</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>SPARQL service descriptions allow for rich information schemas describing the data inside SPARQL endpoints. Rewriting information schema (re)discovery queries into queries against an existing information schema can give major performance benefits. Rich service descriptions have many use cases beyond query rewriting. A significant challenge for users of SPARQL endpoints is discovering the shape and quantity of the data exposed inside them.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Using the Service Description [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], VoID [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and VoID-Ext [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] vocabularies. We store these in
independent named graphs, which we always name as the address of the SPARQL endpoint +
is. This is because there are more than 140 billion distinct triples in UniProt. Of course, having
such an SD is not enough, as people who are used to writing such queries will not switch to
a different query on an "information schema" by default. This means we need to rewrite the
query (Listing 1) into a query of the form shown in Listing 3. Query rewriting needs to take into
account variations in prefixes, white-space and variable naming. We solve this by using the SPARQL
parser from the RDF4J project and matching and rewriting queries on the abstract SPARQL algebra.
The original query is then redirected to a new location with the new query (HTTP 301).
      </p>
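      <p>As a rough illustration of the matching step, the following Python sketch normalises a query at the token level and, when it matches Listing 1, answers with a redirect to the query of Listing 3. It is not the implementation described above: a simple regular-expression tokeniser stands in for the RDF4J parser and the SPARQL algebra, and the names <monospace>normalise</monospace> and <monospace>rewrite</monospace>, the placeholder endpoint address and the choice to keep the result variable name of the incoming query are all assumptions made for this example.</p>
      <preformat><![CDATA[
```python
import re
from urllib.parse import urlencode

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Crude tokeniser (an assumption of this sketch, not a full SPARQL grammar):
# IRIs, variables, prefixed names, bare words, single punctuation characters.
TOKEN = re.compile(r"<[^<>\s]*>|[?$]\w+|\w*:\w*|\w+|[^\s\w]")

LISTING_1 = "SELECT (COUNT(DISTINCT ?class) AS ?classes) WHERE { ?subject a ?class . }"
# Listing 3, with a placeholder for the result variable of the incoming query.
LISTING_3 = ("SELECT (COUNT(DISTINCT ?classesRaw) AS ?{out}) "
             "FROM <http://sparql.uniprot.org/.well-known/void> "
             "WHERE {{ [] <http://rdfs.org/ns/void#class> ?classesRaw . }}")


def normalise(query):
    """Return (tokens, variable names), ignoring white-space, prefixes and variable naming."""
    tokens = TOKEN.findall(query)
    prefixes, i = {}, 0
    # Collect leading declarations of the form: PREFIX label: <iri>
    while i + 2 < len(tokens) and tokens[i].upper() == "PREFIX":
        prefixes[tokens[i + 1][:-1]] = tokens[i + 2][1:-1]
        i += 3
    names, body = [], []
    for tok in tokens[i:]:
        if tok[0] in "?$":
            # Rename variables by order of first appearance.
            if tok[1:] not in names:
                names.append(tok[1:])
            tok = "?v%d" % names.index(tok[1:])
        elif tok == "a":
            tok = RDF_TYPE
        elif ":" in tok and not tok.startswith("<"):
            # Expand prefixed names that use a declared prefix.
            label, local = tok.split(":", 1)
            if label in prefixes:
                tok = "<" + prefixes[label] + local + ">"
        elif tok.isalpha():
            tok = tok.upper()  # keywords are case-insensitive
        body.append(tok)
    # A "." directly before "}" is optional in SPARQL, so drop it.
    body = [t for k, t in enumerate(body)
            if not (t == "." and body[k + 1:k + 2] == ["}"])]
    return body, names


def rewrite(query, endpoint):
    """Return (301, location) if the query matches Listing 1, otherwise None."""
    body, names = normalise(query)
    if body != normalise(LISTING_1)[0]:
        return None
    # names[1] is the variable after AS, because variables are numbered in order.
    rewritten = LISTING_3.format(out=names[1])
    return 301, endpoint + "?" + urlencode({"query": rewritten})
```
]]></preformat>
      <p>Under these assumptions, a call such as <monospace>rewrite(query, "https://example.org/sparql")</monospace> returns the status 301 together with the new location for any spelling of Listing 1 that differs only in prefix use, white-space, keyword case or variable names, and <monospace>None</monospace> for every other query, which would then be evaluated as usual.</p>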
    </sec>
    <sec id="sec-2">
      <title>Listing 1: "Count distinct classes used in a SPARQL endpoint."</title>
      <preformat>SELECT (COUNT(DISTINCT ?class) AS ?classes)
WHERE { ?subject a ?class . }</preformat>
    </sec>
    <sec id="sec-3">
      <title>Listing 2: "Simple pipeline to count the unique classes in an N-Triples file."</title>
      <preformat>sort -u all_triples_in_uniprot.nt | grep rdf:type | sort -u | wc -l</preformat>
    </sec>
    <sec id="sec-4">
      <title>Listing 3: "Rewritten SPARQL query to retrieve the count of the distinct classes in the endpoint."</title>
      <preformat>SELECT (COUNT(DISTINCT ?classesRaw) AS ?classes)
FROM &lt;http://sparql.uniprot.org/.well-known/void&gt;
WHERE { [] &lt;http://rdfs.org/ns/void#class&gt; ?classesRaw . }</preformat>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The Swiss-Prot group is part of the SIB Swiss Institute of Bioinformatics and of the UniProt
Consortium. Swiss-Prot group activities are supported by the Swiss Federal Government through
the State Secretariat for Education, Research and Innovation (SERI), and UniProt is supported by
the National Eye Institute (NEI), National Human Genome Research Institute (NHGRI), National
Heart, Lung, and Blood Institute (NHLBI), National Institute on Aging (NIA), National Institute
of Allergy and Infectious Diseases (NIAID), National Institute of Diabetes and Digestive and
Kidney Diseases (NIDDK), National Institute of General Medical Sciences (NIGMS), National
Institute of Mental Health (NIMH), and National Cancer Institute (NCI) of the National Institutes
of Health (NIH) under grant U24HG007822.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <source>SPARQL 1.1 Service Description</source>
          ,
          <year>2013</year>
          . URL: https://www.w3.org/TR/sparql11-service-description/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
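          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Describing linked datasets with the VoID vocabulary</article-title>
          ,
          <year>2011</year>
          . URL: https://www.w3.org/TR/void/.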
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          ,
          <article-title>Aether - generating and viewing extended VoID statistical descriptions of RDF datasets</article-title>
          , in:
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Papadakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tordai</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web: ESWC 2014 Satellite Events</source>
          , Springer International Publishing, Cham,
          <year>2014</year>
          , pp.
          <fpage>429</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>