=Paper= {{Paper |id=Vol-3828/ISWC2024_paper_32 |storemode=property |title=rudof: A Rust Library for handling RDF data models and shapes |pdfUrl=https://ceur-ws.org/Vol-3828/paper32.pdf |volume=Vol-3828 |authors=Jose Emilio Labra Gayo,Ángel Iglesias Préstamo,Diego Martín Fernández,Marc Antoine Arnaud |dblpUrl=https://dblp.org/rec/conf/semweb/GayoPFA24 }} ==rudof: A Rust Library for handling RDF data models and shapes== https://ceur-ws.org/Vol-3828/paper32.pdf
                                rudof: A Rust Library for handling RDF data models
                                and Shapes
                                Jose-Emilio Labra-Gayo1 , Angel Iglesias-Préstamo1 , Diego Martín-Fernández1 and
                                Marc-Antoine Arnaud2,∗,†
                                1
                                    WESO Lab, University of Oviedo, Spain
                                2
                                    Lum::invent, France


                                              Abstract
                                              In this paper we present rudof, a Rust library for RDF and RDF Data shapes. The library can be used as a
                                              command line tool but it can also be invoked from different systems. The flexibility of Rust enables the
                                              creation of binaries in Windows, Linux and Mac, as well as offering Python bindings and WebAssembly
                                              components. The library supports ShEx and SHACL, as well as other RDF data modeling languages
                                              like DCTAP, offering conversion mechanisms between them. It can also be used to generate UML-like
                                              visualizations of those data models.

                                              Keywords
                                              RDF, ShEx, SHACL, Data Quality, Rust




                                1. Introduction
                                With the increasing adoption of RDF data shapes, there is a need for efficient libraries and
                                tools which can be used to validate and process RDF. ShEx[1] was introduced in 2014 as a
                                human-readable and concise language for RDF validation which was later adopted by Wikidata
                                in 2019, and SHACL was accepted as a W3C recommendation in 20171 . Both ShEx and SHACL
                                have been increasingly adopted to increase the quality of RDF data. At the same time, other
                                technologies have appeared to help domain experts declare their expectations on RDF-based
                                knowledge graphs like DCTAP2 , a tabular template that can be used to define data shapes.
                                   The Rust programming language3 offers some interesting features which are intended to
                                increase safety with static typing while keeping performance competing with other low-level
                                languages. Another advantage of Rust is the possibility of developing Python bindings that
                                allow Python programmers to invoke Rust libraries that have better performance and even the


                                Posters, Demos, and Industry Tracks at ISWC 2024, November 13–15, 2024, Baltimore, USA
                                Envelope-Open labra@uniovi.es (J. Labra-Gayo); angel.iglesias.prestamo@gmail.com (A. Iglesias-Préstamo);
                                diegomartinfnz@gmail.com (D. Martín-Fernández); marc-antoine.arnaud@luminvent.com (M. Arnaud)
                                GLOBE http://labra.weso.es/ (J. Labra-Gayo); https://angelip2303.github.io/ (A. Iglesias-Préstamo);
                                https://luminvent.com/ (M. Arnaud)
                                Orcid 0000-0001-8907-5348 (J. Labra-Gayo); 0009-0004-0686-4341 (A. Iglesias-Préstamo); 0009-0003-6640-9474
                                (D. Martín-Fernández); 0009-0004-2130-3366 (M. Arnaud)
                                            © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                1
                                  https://www.w3.org/TR/shacl/
                                2
                                  https://www.dublincore.org/specifications/dctap/
                                3
                                  https://www.rust-lang.org/




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
compilation to Web Assembly4 , that can interoperate with Javascript code.
   In this paper, we present a new library implemented in Rust, called rudof5 which offers
support for both ShEx and SHACL, conversion between different data models like DCTAP,
generation of UML-like visualizations and RDF data validation. The different components
of the library are published at crates.io6 , the Rust module registry, and the library publishes
binary releases in Windows, Linux, and MacOS, as well as Debian packages, Docker images,
and Python bindings7 .


2. rudof modules and features
The library consists of different modules which are also published as Rust crates. In order
to minimize external dependencies to other libraries, we created a simple RDF trait called
SRDF which offers the basic RDF functionalities required for validation (mainly accessing the
neighborhood of a node). We provide two implementations of SRDF, one based on RDF files
like Turtle, and another one based on SPARQL, which allows the library to validate RDF graphs
obtained either from files or through SPARQL endpoints.
   The library defines other crates that contain the abstract syntax tree representation of both
ShEx and SHACL called shex_ast and shacl_ast, as well as their corresponding parsers and
validators. There is a module called shapes_converter which contains converters between
different data models like DCTAP to ShEx, ShEx to UML visualizations, etc. A special module is
rudof_cli, which implements the command line tool which is later published as a binary called
rudof in different platforms like Linux, Mac and Windows.
   The library already supports the following features8 :
    • Show information about RDF data and convert between different formats like Turtle,
      NTriples, RDF/XML, etc.
    • Support for data shapes languages like ShEx and SHACL: show information about shapes
      and schemas, and validate RDF data to check conformance.
    • Parsing DCTAP data models and conversion to shapes schemas. As an example, Figure 1a
      contains a DCTAP file which can be obtained from an spreadsheet in CSV and figure 1b
      shows the result of converting it to ShEx.
    • Generating UML visualizations of shapes data models. As an example, Figure 1c shows
      the UML generated from 1b.
    • Generating HTML representations of those schemas, which can be useful when the
      schemas contain a large number of shapes. In these cases, the UML visualizations can be
      too big and become unusable, while representing each shape in its own web page makes
      it possible to browse the shapes in the schema.
    • Obtaining information about the neighborhood of a node in an RDF graph (either incoming
      or outgoing arcs), which can be useful to create a schema or to debug the validation
      results.
4
  https://webassembly.org/
5
  https://rudof-project.github.io/rudof/
6
  https://crates.io/
7
  https://github.com/rudof-project/rudof/releases
8
  The project Wiki page contains instructions and how-to guides
       • Other conversions: we are exploring the conversion between ShEx/SHACL and SPARQL
         as it is a feature that can be useful to create queries based on some shapes.


     ShapeId   PropertyId    Mandatory    Repeatable     valueDatatype     valueShape
     Person    name          true         false          xsd:string
               birthdate     false        false          xsd:date                                   :Person
               enrolledIn    false        true                             Course
     Course    name          true         false          xsd:string                      :name : xsd:string
                                                                                         :birthdate : xsd:date ?
                                 (a) DCTAP example


1 prefix :                                                                       :enrolledIn
2 prefix xsd:                                                     *
3 :Person { :name       xsd:string ;
4           :birthdate xsd:date ? ;                                                                 :Course
5           :enrolledIn @:Course * }
                                                                                             :name : xsd:string
6 :Course { :name       xsd:string }

                                                                                         (c) ShEx visualization
                (b) ShEx obtained from DCTAP example conversion



3. Performance benchmarks.
Using Rust as an implementation language can improve the performance compared to Java
or Python. Some preliminary benchmarks have been conducted9 to compare the tool against
several state-of-the-art SHACL implementations. As of September 2, 2024, the tool demon-
strates performance improvements in SHACL validation when compared to Apache Jena10
and TopQuadrant11 , although it exhibits longer execution times than rdf4j12 . In addition to
these SHACL validation benchmarks, the Python bindings provided by rudof can also be used
to compare performance with other Python libraries for SHACL like pySHACL [2]. In this
case, measuring loading, parsing and validating time for the same graph. A summary of the
performance results is available in Table 113 .

       Dataset      rudof     rdf4j      Apache Jena       TopQuadrant          pyrudof          pySHACL
       10-LUBM      7.8971    1.6447       60.3583            85.7421          39,364.2842       72,227.2940
Table 1
Performance comparison of SHACL validation across state-of-the-art implementations against the rudof.
Execution times are reported in milliseconds, using the LUBM dataset with the same SHACL shape
across all comparisons.


9
 https://github.com/weso/shacl-validation-benchmarks
10
   https://jena.apache.org/
11
   https://github.com/TopQuadrant/shacl
12
   https://rdf4j.org/
13
   More details about performance comparisons will be available at the rudof github repository
4. Related work.
Although there are several libraries for ShEx, SHACL or DCTAP, most of them are focused
on one of those technologies, and do not offer a common mechanism for conversion between
shapes-based data models. The main exception could be the SHaclEX library14 written by the
first author of this paper in Scala.
   In the Rust ecosystem, Sophia [3] is a toolkit for RDF and linked data which contains several
traits although it does not support for shapes yet. Oxigraph15 is an graph database that supports
SPARQL written in Rust that also publishes several crates related with RDF and doesn’t support
shapes yet. Recently, an W3C RDF Rust Common Crates community group16 has been created
and we are planning to align the dependencies with the expected results of that group.


5. Conclusions and future work
Although rudof is still work-in-progress, we consider that it can fill a need in the RDF data
shapes tools Rust ecosystem. The Rust programming language offers some advantages in terms
of performance and memory safety. It also offers the possibility to generate binaries for different
operating systems like Windows, Linux and Max, as well as Python bindings. We are exploring
the use of the library in WebAssembly17 . Our goal is to gradually migrate the code of our
RDFShape playground [4] which was implemented in React and Scala to a new version based
on Web Assembly and Rust.


References
[1] E. Prud’hommeaux, J. E. Labra Gayo, H. Solbrig, Shape Expressions: An RDF Validation
    and Transformation Language, in: H. Sack, A. Filipowska, J. Lehmann, S. Hellmann (Eds.),
    Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014,
    Leipzig, Germany, September 4-5, 2014, ACM Press, 2014, pp. 32–40. doi:10.1145/2660517.
    2660523 .
[2] A. Sommer, N. Car, pyshacl, 2024. URL: https://doi.org/10.5281/zenodo.10958008. doi:10.
    5281/zenodo.10958008 .
[3] P.-A. Champin, Sophia: A Linked Data and Semantic Web toolkit for Rust, in: E. Wilde,
    M. Amundsen (Eds.), The Web Conference 2020: Developers Track, Taipei, TW, 2020. URL:
    https://www2020devtrack.github.io/site/schedule.
[4] J. E. Labra Gayo, D. Fernández Álvarez, H. García-González, RDFShape: An RDF Playground
    Based on Shapes, in: Proceedings of the ISWC 2018 Posters and Demonstrations, Industry
    and Blue Sky Ideas Tracks, co-located with 17th International Semantic Web Conference,
    volume 2180 of CEUR Workshop Proceedings, 2018.


14
   https://www.weso.es/shaclex/
15
   https://github.com/oxigraph/oxigraph
16
   https://www.w3.org/community/r2c2/
17
   A prototype based on WebAssembly is available at https://uo271080.github.io/TFG_UO271080/