<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>rudof: A Rust Library for handling RDF data models and Shapes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jose-Emilio Labra-Gayo</string-name>
          <email>labra@uniovi.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angel Iglesias-Préstamo</string-name>
          <email>angel.iglesias.prestamo@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Martín-Fernández</string-name>
          <email>diegomartinfnz@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc-Antoine Arnaud</string-name>
          <email>marc-antoine.arnaud@luminvent.com</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>WESO Lab, University of Oviedo</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present rudof, a Rust library for RDF and RDF data shapes. The library can be used as a command line tool, but it can also be invoked from different systems. The flexibility of Rust enables the creation of binaries for Windows, Linux and Mac, as well as offering Python bindings and WebAssembly components. The library supports ShEx and SHACL, as well as other RDF data modeling languages like DCTAP, offering conversion mechanisms between them. It can also be used to generate UML-like visualizations of those data models.</p>
      </abstract>
      <kwd-group>
        <kwd>SHACL</kwd>
        <kwd>Data Quality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        With the increasing adoption of RDF data shapes, there is a need for efficient libraries and
tools that can be used to validate and process RDF. ShEx [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was introduced in 2014 as a
human-readable and concise language for RDF validation and was later adopted by Wikidata
in 2019, while SHACL was accepted as a W3C recommendation in 2017<sup>1</sup>. Both ShEx and SHACL
have been increasingly adopted to improve the quality of RDF data. At the same time, other
technologies have appeared to help domain experts declare their expectations on RDF-based
knowledge graphs, such as DCTAP<sup>2</sup>, a tabular template that can be used to define data shapes.
      </p>
      <p>The Rust programming language<sup>3</sup> offers features intended to
increase safety through static typing while keeping performance competitive with other low-level
languages. Another advantage of Rust is that it allows the development of Python bindings,
which let Python programmers invoke Rust libraries with better performance, and even
compilation to WebAssembly<sup>4</sup>, which can interoperate with JavaScript code.</p>
      <p>In this paper, we present a new library implemented in Rust, called rudof<sup>5</sup>, which offers
support for both ShEx and SHACL, conversion between different data models like DCTAP,
generation of UML-like visualizations and RDF data validation. The different components
of the library are published at crates.io<sup>6</sup>, the Rust crate registry, and the library publishes
binary releases for Windows, Linux, and macOS, as well as Debian packages, Docker images,
and Python bindings<sup>7</sup>.</p>
    </sec>
    <sec id="sec-3">
      <title>2. rudof modules and features</title>
      <p>The library consists of different modules, which are also published as Rust crates. In order
to minimize external dependencies on other libraries, we created a simple RDF trait called
SRDF, which offers the basic RDF functionality required for validation (mainly accessing the
neighborhood of a node). We provide two implementations of SRDF, one based on RDF files
like Turtle, and another one based on SPARQL, which allows the library to validate RDF graphs
obtained either from files or through SPARQL endpoints.</p>
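      <p>The idea of validating against a small trait rather than a concrete RDF library can be illustrated with the following minimal sketch. It is not the actual SRDF API: the names Neighbourhood, MemoryGraph and count_outgoing are hypothetical and chosen only for this example, and terms are represented as plain strings instead of typed RDF terms.</p>
      <preformat preformat-type="code"><![CDATA[
```rust
/// Hypothetical stand-in for a neighbourhood-oriented RDF trait.
/// Terms are plain strings here; a real library would use typed RDF terms.
trait Neighbourhood {
    /// Outgoing arcs of `node` as (predicate, object) pairs.
    fn outgoing(&self, node: &str) -> Vec<(String, String)>;
    /// Incoming arcs of `node` as (subject, predicate) pairs.
    fn incoming(&self, node: &str) -> Vec<(String, String)>;
}

/// In-memory backend, standing in for a graph parsed from a file such as Turtle.
struct MemoryGraph {
    triples: Vec<(String, String, String)>,
}

impl MemoryGraph {
    /// Builds the graph from (subject, predicate, object) string triples.
    fn new(triples: &[(&str, &str, &str)]) -> Self {
        let triples = triples
            .iter()
            .map(|&(s, p, o)| (s.to_string(), p.to_string(), o.to_string()))
            .collect();
        MemoryGraph { triples }
    }
}

impl Neighbourhood for MemoryGraph {
    fn outgoing(&self, node: &str) -> Vec<(String, String)> {
        self.triples
            .iter()
            .filter(|(s, _, _)| s.as_str() == node)
            .map(|(_, p, o)| (p.clone(), o.clone()))
            .collect()
    }

    fn incoming(&self, node: &str) -> Vec<(String, String)> {
        self.triples
            .iter()
            .filter(|(_, _, o)| o.as_str() == node)
            .map(|(s, p, _)| (s.clone(), p.clone()))
            .collect()
    }
}

/// Backend-independent code: it only needs the trait, not a concrete graph type.
fn count_outgoing<G: Neighbourhood>(graph: &G, node: &str) -> usize {
    graph.outgoing(node).len()
}
```
]]></preformat>
      <p>Under the same assumptions, a second implementation of the trait could answer outgoing and incoming by querying a SPARQL endpoint, and code such as count_outgoing would work unchanged for both sources.</p>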
      <p>The library defines other crates that contain the abstract syntax tree representations of
ShEx and SHACL, called shex_ast and shacl_ast, as well as their corresponding parsers and
validators. A module called shapes_converter contains converters between
different data models, such as DCTAP to ShEx, ShEx to UML visualizations, etc. A special module is
rudof_cli, which implements the command line tool that is published as a binary called
rudof for different platforms like Linux, Mac and Windows.</p>
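      <p>The kind of mapping involved in a DCTAP to ShEx conversion can be sketched as follows, using the rows of the example in Figure 1. This is a simplified illustration and not the code of shapes_converter: the TapRow struct and the functions cardinality, to_shex and figure_rows are hypothetical, every row is assumed to repeat its shape identifier, the mapping of the mandatory and repeatable columns to ShEx cardinalities is an assumption of this example, and prefix declarations are omitted.</p>
      <preformat preformat-type="code"><![CDATA[
```rust
/// One DCTAP row; field names follow the columns of the DCTAP example.
/// Assumption: every row repeats its shape identifier.
struct TapRow {
    shape_id: &'static str,
    property_id: &'static str,
    mandatory: bool,
    repeatable: bool,
    value_datatype: Option<&'static str>,
    value_shape: Option<&'static str>,
}

/// Maps the (mandatory, repeatable) pair to a ShEx cardinality suffix.
fn cardinality(mandatory: bool, repeatable: bool) -> &'static str {
    match (mandatory, repeatable) {
        (true, false) => "",   // exactly one
        (false, false) => " ?", // zero or one
        (true, true) => " +",  // one or more
        (false, true) => " *", // zero or more
    }
}

/// Emits one ShEx shape per distinct shape identifier, in order of appearance.
fn to_shex(rows: &[TapRow]) -> String {
    let mut shape_ids: Vec<&str> = Vec::new();
    for row in rows {
        if !shape_ids.contains(&row.shape_id) {
            shape_ids.push(row.shape_id);
        }
    }
    let mut out = String::new();
    for id in shape_ids {
        let constraints: Vec<String> = rows
            .iter()
            .filter(|r| r.shape_id == id)
            .map(|r| {
                // A datatype wins over a shape reference; "." means any value.
                let value = match (r.value_datatype, r.value_shape) {
                    (Some(datatype), _) => datatype.to_string(),
                    (None, Some(shape)) => format!("@:{}", shape),
                    (None, None) => ".".to_string(),
                };
                let card = cardinality(r.mandatory, r.repeatable);
                format!(":{} {}{}", r.property_id, value, card)
            })
            .collect();
        out.push_str(&format!(":{} {{ {} }}\n", id, constraints.join(" ; ")));
    }
    out
}

/// The four rows of the DCTAP example (Person and Course).
fn figure_rows() -> Vec<TapRow> {
    vec![
        TapRow { shape_id: "Person", property_id: "name", mandatory: true,
                 repeatable: false, value_datatype: Some("xsd:string"), value_shape: None },
        TapRow { shape_id: "Person", property_id: "birthdate", mandatory: false,
                 repeatable: false, value_datatype: Some("xsd:date"), value_shape: None },
        TapRow { shape_id: "Person", property_id: "enrolledIn", mandatory: false,
                 repeatable: true, value_datatype: None, value_shape: Some("Course") },
        TapRow { shape_id: "Course", property_id: "name", mandatory: true,
                 repeatable: false, value_datatype: Some("xsd:string"), value_shape: None },
    ]
}
```
]]></preformat>
      <p>Calling to_shex on the rows returned by figure_rows yields the two shapes :Person and :Course with the same triple constraints as the ShEx schema of the example, each written on a single line.</p>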
      <p>The library already supports the following features<sup>8</sup>:</p>
      <list list-type="bullet">
        <list-item>
          <p>Showing information about RDF data and converting between different formats like Turtle, N-Triples, RDF/XML, etc.</p>
        </list-item>
        <list-item>
          <p>Support for data shapes languages like ShEx and SHACL: showing information about shapes and schemas, and validating RDF data to check conformance.</p>
        </list-item>
        <list-item>
          <p>Parsing DCTAP data models and converting them to shapes schemas. As an example, Figure 1a contains a DCTAP file, which can be obtained from a spreadsheet in CSV, and Figure 1b shows the result of converting it to ShEx.</p>
        </list-item>
        <list-item>
          <p>Generating UML visualizations of shapes data models. As an example, Figure 1c shows the UML generated from Figure 1b.</p>
        </list-item>
        <list-item>
          <p>Generating HTML representations of those schemas, which can be useful when the schemas contain a large number of shapes. In these cases, the UML visualizations can be too big and become unusable, while representing each shape in its own web page makes it possible to browse the shapes in the schema.</p>
        </list-item>
        <list-item>
          <p>Obtaining information about the neighborhood of a node in an RDF graph (either incoming or outgoing arcs), which can be useful to create a schema or to debug the validation results.</p>
        </list-item>
        <list-item>
          <p>Other conversions: we are exploring the conversion between ShEx/SHACL and SPARQL, as it can be useful for creating queries based on shapes.</p>
        </list-item>
      </list>
      <fn-group>
        <fn id="fn4">
          <label>4</label>
          <p>https://webassembly.org/</p>
        </fn>
        <fn id="fn5">
          <label>5</label>
          <p>https://rudof-project.github.io/rudof/</p>
        </fn>
        <fn id="fn6">
          <label>6</label>
          <p>https://crates.io/</p>
        </fn>
        <fn id="fn7">
          <label>7</label>
          <p>https://github.com/rudof-project/rudof/releases</p>
        </fn>
        <fn id="fn8">
          <label>8</label>
          <p>The project Wiki page contains instructions and how-to guides.</p>
        </fn>
      </fn-group>
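      <fig-group id="fig1">
        <label>Figure 1</label>
        <fig id="fig1a">
          <label>(a)</label>
          <caption>
            <p>DCTAP example</p>
          </caption>
          <table-wrap>
            <table>
              <thead>
                <tr>
                  <th>ShapeId</th>
                  <th>PropertyId</th>
                  <th>Mandatory</th>
                  <th>Repeatable</th>
                  <th>valueDatatype</th>
                  <th>valueShape</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td>Person</td>
                  <td>name</td>
                  <td>true</td>
                  <td>false</td>
                  <td>xsd:string</td>
                  <td/>
                </tr>
                <tr>
                  <td/>
                  <td>birthdate</td>
                  <td>false</td>
                  <td>false</td>
                  <td>xsd:date</td>
                  <td/>
                </tr>
                <tr>
                  <td/>
                  <td>enrolledIn</td>
                  <td>false</td>
                  <td>true</td>
                  <td/>
                  <td>Course</td>
                </tr>
                <tr>
                  <td>Course</td>
                  <td>name</td>
                  <td>true</td>
                  <td>false</td>
                  <td>xsd:string</td>
                  <td/>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </fig>
        <fig id="fig1b">
          <label>(b)</label>
          <caption>
            <p>ShEx obtained from DCTAP example conversion</p>
          </caption>
          <preformat>prefix : &lt;http://example.org/&gt;
prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;

:Person { :name xsd:string ;
          :birthdate xsd:date ? ;
          :enrolledIn @:Course * }
:Course { :name xsd:string }</preformat>
        </fig>
        <fig id="fig1c">
          <label>(c)</label>
          <caption>
            <p>ShEx visualization</p>
          </caption>
          <p>UML-like class diagram: the class :Person, with attributes :name : xsd:string and :birthdate : xsd:date ?, is linked by an association :enrolledIn with cardinality * to the class :Course, which has the attribute :name : xsd:string.</p>
        </fig>
      </fig-group>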
    </sec>
    <sec id="sec-4">
      <title>3. Performance benchmarks</title>
      <p>
        Using Rust as an implementation language can improve performance compared to Java
or Python. Some preliminary benchmarks have been conducted<sup>9</sup> to compare the tool against
several state-of-the-art SHACL implementations. As of September 2, 2024, the tool
is faster in SHACL validation than Apache Jena<sup>10</sup>
and TopQuadrant<sup>11</sup>, although it exhibits longer execution times than rdf4j<sup>12</sup>. In addition to
these SHACL validation benchmarks, the Python bindings provided by rudof can also be used
to compare performance with other Python libraries for SHACL like pySHACL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this
case, we measure the loading, parsing and validation time for the same graph. A summary of the
performance results is available in Table 1<sup>13</sup>.
      </p>
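      <p>The relative differences can be derived directly from the figures reported in Table 1. The following sketch computes unit-free ratios between the reported times for the 10-LUBM dataset; the function names ratio, table1_ratios and report are hypothetical and only serve this illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```rust
/// Ratio between two reported times; values above 1.0 mean `slower` took longer.
fn ratio(slower: f64, faster: f64) -> f64 {
    slower / faster
}

/// Ratios derived from the 10-LUBM row of Table 1.
fn table1_ratios() -> Vec<(&'static str, f64)> {
    // Times as reported in Table 1; the unit cancels out in the ratios.
    let rudof = 7.8971;
    let rdf4j = 1.6447;
    let jena = 60.3583;
    let topquadrant = 85.7421;
    let pyrudof = 39_364.2842;
    let pyshacl = 72_227.2940;
    vec![
        ("Apache Jena vs rudof", ratio(jena, rudof)),
        ("TopQuadrant vs rudof", ratio(topquadrant, rudof)),
        ("rudof vs rdf4j", ratio(rudof, rdf4j)),
        ("pySHACL vs pyrudof", ratio(pyshacl, pyrudof)),
    ]
}

/// Formats the ratios one per line, rounded to two decimals.
fn report() -> String {
    table1_ratios()
        .iter()
        .map(|(label, value)| format!("{}: {:.2}x", label, value))
        .collect::<Vec<String>>()
        .join("\n")
}
```
]]></preformat>
      <p>With the values of Table 1, rudof takes roughly 7.6 times less time than Apache Jena and 10.9 times less than TopQuadrant, about 4.8 times more than rdf4j, and pyrudof takes about 1.8 times less than pySHACL.</p>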
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Summary of the performance results.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Dataset</th>
              <th>rudof</th>
              <th>rdf4j</th>
              <th>Apache Jena</th>
              <th>TopQuadrant</th>
              <th>pyrudof</th>
              <th>pySHACL</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>10-LUBM</td>
              <td>7.8971</td>
              <td>1.6447</td>
              <td>60.3583</td>
              <td>85.7421</td>
              <td>39,364.2842</td>
              <td>72,227.2940</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <fn-group>
        <fn id="fn9">
          <label>9</label>
          <p>https://github.com/weso/shacl-validation-benchmarks</p>
        </fn>
        <fn id="fn10">
          <label>10</label>
          <p>https://jena.apache.org/</p>
        </fn>
        <fn id="fn11">
          <label>11</label>
          <p>https://github.com/TopQuadrant/shacl</p>
        </fn>
        <fn id="fn12">
          <label>12</label>
          <p>https://rdf4j.org/</p>
        </fn>
        <fn id="fn13">
          <label>13</label>
          <p>More details about performance comparisons will be available at the rudof GitHub repository.</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-related-work">
      <title>4. Related work</title>
      <p>Although there are several libraries for ShEx, SHACL or DCTAP, most of them focus
on one of those technologies and do not offer a common mechanism for conversion between
shapes-based data models. The main exception could be the SHaclEX library<sup>14</sup>, written in Scala by the
first author of this paper.</p>
      <p>
        In the Rust ecosystem, Sophia [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a toolkit for RDF and linked data which contains several
traits, although it does not support shapes yet. Oxigraph<sup>15</sup> is a graph database written in Rust that supports
SPARQL and publishes several crates related to RDF, but it does not support
shapes yet either. Recently, a W3C RDF Rust Common Crates community group<sup>16</sup> has been created,
and we are planning to align our dependencies with the expected results of that group.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and future work</title>
      <p>
        Although rudof is still a work in progress, we consider that it can fill a need in the Rust
ecosystem of tools for RDF data shapes. The Rust programming language offers some advantages in terms
of performance and memory safety. It also offers the possibility to generate binaries for different
operating systems like Windows, Linux and Mac, as well as Python bindings. We are exploring
the use of the library in WebAssembly<sup>17</sup>. Our goal is to gradually migrate the code of our
RDFShape playground [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which was implemented in React and Scala, to a new version based
on WebAssembly and Rust.
      </p>
      <fn-group>
        <fn id="fn14">
          <label>14</label>
          <p>https://www.weso.es/shaclex/</p>
        </fn>
        <fn id="fn15">
          <label>15</label>
          <p>https://github.com/oxigraph/oxigraph</p>
        </fn>
        <fn id="fn16">
          <label>16</label>
          <p>https://www.w3.org/community/r2c2/</p>
        </fn>
        <fn id="fn17">
          <label>17</label>
          <p>A prototype based on WebAssembly is available at https://uo271080.github.io/TFG_UO271080/</p>
        </fn>
      </fn-group>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Solbrig</surname>
          </string-name>
          ,
          <article-title>Shape Expressions: An RDF Validation and Transformation Language</article-title>
          , in: H. Sack, A. Filipowska, J. Lehmann, S. Hellmann (Eds.),
          <source>Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014</source>
          , Leipzig, Germany, September 4-5, 2014, ACM Press,
          <year>2014</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          . doi:10.1145/2660517.2660523.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sommer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Car</surname>
          </string-name>
          , pyshacl,
          <year>2024</year>
          . URL: https://doi.org/10.5281/zenodo.10958008. doi:10.5281/zenodo.10958008.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Champin</surname>
          </string-name>
          ,
          <article-title>Sophia: A Linked Data and Semantic Web toolkit for Rust</article-title>
          , in: E. Wilde, M. Amundsen (Eds.),
          <source>The Web Conference 2020: Developers Track</source>
          , Taipei, TW,
          <year>2020</year>
          . URL: https://www2020devtrack.github.io/site/schedule.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fernández Álvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>García-González</surname>
          </string-name>
          ,
          <article-title>RDFShape: An RDF Playground Based on Shapes</article-title>
          , in:
          <source>Proceedings of the ISWC 2018 Posters and Demonstrations, Industry and Blue Sky Ideas Tracks, co-located with 17th International Semantic Web Conference</source>
          , volume
          <volume>2180</volume>
          of CEUR Workshop Proceedings,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>