<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Formatting SPARQL 1.1 via Parsing Expression Gram mar</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hirokazu Chiba</string-name>
          <email>chiba@dbcls.rois.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>and Systems</institution>
          ,
          <addr-line>178-4-4 Wakashiba, Kashiwa, Chiba, 277-0871</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The core programming language of the Semantic Web is SPARQL. To increase the productivity of Semantic Web programming, reusability of the SPARQL queries is key. Here, we aim to implement a formatter that fully supports SPARQL 1.1 to enhance the reusability of SPARQL. First, SPARQL 1.1 grammar defined in EBNF was transformed into Parsing Expression Grammar (PEG) to generate a SPARQL 1.1 parser. The parser accepts a valid SPARQL query and constructs the abstract syntax tree (AST) of the input SPARQL query. Next, a formatter was implemented to output a SPARQL query from this AST. The SPARQL formatter is available on the command line and also available as a JavaScript library for developing websites. SPARQL ASTs are represented internally in JSON, and can be reused for purposes other than formatting. Furthermore, the PEG expression developed here can be readily modified for similar subgraph matching problems on knowledge graphs. The source code and a website of the SPARQL formatter are available at https://github.com/sparqling/sparql-formatter.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL</kwd>
        <kwd>parser</kwd>
        <kwd>formatter</kwd>
        <kwd>PEG</kwd>
        <kwd>grammar</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The core of Semantic Web programming is SPARQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To increase the productivity of
Semantic Web programming, enhancing the reusability of SPARQL queries is key. Especially,
the readability of code is essential for reusability. However, reformatting the SPARQL query is
not a trivial issue. We need to parse the input SPARQL and output it appropriately to reformat
it. Parsing a SPARQL query is a complex task because the SPARQL specification has a complex
grammar, which is comprised of 173 rules in the format of EBNF. Thus, it is dificult to implement
a parser in a conventional sequential programming language.
      </p>
      <p>
        Several programming languages including JavaScript support Parsing Expression Grammar
(PEG) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for describing definitions of a grammar in a declarative manner to generate a parser.
Here, we aim to implement a formatter that fully supports SPARQL 1.1 by using PEG for
command-line use and website development.
CEUR
Workshop
Proceedings
      </p>
      <p>
        © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
// [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] Query ::= Prologue ( SelectQuery | ConstructQuery | DescribeQuery | AskQuery ) ValuesClause
Query = p:Prologue WS* q:( SelectQuery / ConstructQuery / DescribeQuery / AskQuery ) v:ValuesClause
{
let ast = { type: 'Query' };
if (p.length) {
      </p>
      <p>ast.prologue = p;
}
}
ast.queryBody = q;
if (v) {</p>
      <p>ast.values = v;
}
return ast;</p>
    </sec>
    <sec id="sec-3">
      <title>2. Implementation</title>
      <p>
        We used PEG.js (https://pegjs.org/) to generate a SPARQL 1.1 parser. 173 rules expressed in EBNF
were extracted from SPARQL 1.1 Query Language specification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and translated into PEG
rules for constructing abstract syntax trees (ASTs). An excerpt from the EBNF rules of SPARQL
and the corresponding PEG rules for constructing the AST is shown in Figure 1. Notably, white
spaces and comments in SPARQL are not included in EBNF rules. As comments are often
inserted in SPARQL queries in practice, comments besides white spaces are also expressed
in our PEG expressions. Comments are kept separate from the AST. The position from the
beginning of the SPARQL input was obtained and included in the returned values.
      </p>
      <p>The formatter was implemented in JavaScript, which accepts an AST of SPARQL and outputs
each element of the SPARQL uniformly. The position of each element is compared with the
positions of comments, so that the comments are added at the appropriate positions.</p>
      <p>
        91 test queries were extracted from the SPARQL 1.1 Query Language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] specifications, and
16 from SPARQL 1.1 Update [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. All those queries were formatted and then manually curated
for use as test cases.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Use Cases</title>
      <p>The SPARQL formatter is published as an npm library, so that it is easy to install and use in a
Node.js environment as the sparql-formatter command. The Docker version is also available
and published on Docker Hub.</p>
      <p>The JavaScript code of the parser and formatter has been bundled and made available on
Content Delivery Network (CDN), so that users can incorporate the SPARQL formatter into
their websites. Figure 2 shows a website using the SPARQL formatter. All the queries extracted
from the SPARQL 1.1 specifications are available as example queries on the website.</p>
      <p>The AST of a SPARQL query is represented internally in JSON and can be reused for purposes
other than formatting. Figure 3 shows the AST obtained for an example SPARQL query, “SELECT
* WHERE { ?s ?p ?o } LIMIT 10 ”.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Discussion</title>
      <p>
        By generating a SPARQL parser via PEG.js, the SPARQL formatter was implemented. This
work contributes to the increased reusability of SPARQL queries. The queries can be
formatted in a uniform way, thus the readability will be increased. While other eforts to
implement SPARQL parsers have been made in projects such as RSFStore-js [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and SPARQL.js
(https://github.com/RubenVerborgh/SPARQL.js), they could not format queries as intended.
Notably, by adding a JSON-LD context to the obtained AST for a SPARQL, the query can be
viewed as RDF, thus the SPARQL queries themselves can be treated as RDF resources. It may
contribute to the FAIRification [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] of the Semantic Web queries. The PEG.js code developed
for SPARQL 1.1 can be modified for extended specifications of SPARQL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or other subgraph
matching problems on knowledge graphs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
"type": "Query",
"queryBody": {
"select": [
      </p>
      <p>"*"
],
"where": [
{
"type": "TriplesBlock",
"triplePattern": [
{
"subject": {</p>
      <p>"variable": "s"
},
"properties": [
{
"predicate": {</p>
      <p>"variable": "p"
},
"objects": [
{</p>
      <p>"variable": "o"
}
}
],
"limitOffset": [
{</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[1] SPARQL 1.1 Query Language</source>
          ,
          <year>2013</year>
          . https://www.w3.org/TR/sparql11-query/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ford</surname>
          </string-name>
          , Parsing Expression Grammars:
          <article-title>a recognition-based syntactic foundation</article-title>
          ,
          <source>in: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[3] SPARQL 1.1 Update</source>
          ,
          <year>2013</year>
          . https://www.w3.org/TR/sparql11-update/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <article-title>A JavaScript RDF store and application library for linked data client applications, in: Devtracks of the WWW2012 conference</article-title>
          . Lyon, France,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chiba</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Uchiyama</surname>
          </string-name>
          ,
          <article-title>SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases</article-title>
          ,
          <source>BMC Bioinformatics 18</source>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chiba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yamanaka</surname>
          </string-name>
          , S. Matsumoto, G2GML:
          <article-title>Graph to graph mapping language for bridging RDF and property graphs</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2020</year>
          , pp.
          <fpage>160</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>