rudof: A Rust Library for handling RDF data models and Shapes

Jose-Emilio Labra-Gayo¹, Angel Iglesias-Préstamo¹, Diego Martín-Fernández¹ and Marc-Antoine Arnaud²

¹ WESO Lab, University of Oviedo, Spain
² Lum::invent, France

Posters, Demos, and Industry Tracks at ISWC 2024, November 13–15, 2024, Baltimore, USA
Email: labra@uniovi.es (J. Labra-Gayo); angel.iglesias.prestamo@gmail.com (A. Iglesias-Préstamo); diegomartinfnz@gmail.com (D. Martín-Fernández); marc-antoine.arnaud@luminvent.com (M. Arnaud)
Web: http://labra.weso.es/ (J. Labra-Gayo); https://angelip2303.github.io/ (A. Iglesias-Préstamo); https://luminvent.com/ (M. Arnaud)
ORCID: 0000-0001-8907-5348 (J. Labra-Gayo); 0009-0004-0686-4341 (A. Iglesias-Préstamo); 0009-0003-6640-9474 (D. Martín-Fernández); 0009-0004-2130-3366 (M. Arnaud)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract
In this paper we present rudof, a Rust library for RDF and RDF data shapes. The library can be used as a command-line tool, but it can also be invoked from different systems. The flexibility of Rust enables the creation of binaries for Windows, Linux and Mac, as well as Python bindings and WebAssembly components. The library supports ShEx and SHACL, as well as other RDF data modeling languages like DCTAP, and offers conversion mechanisms between them. It can also be used to generate UML-like visualizations of those data models.

Keywords
RDF, ShEx, SHACL, Data Quality, Rust

1. Introduction

With the increasing adoption of RDF data shapes, there is a need for efficient libraries and tools which can be used to validate and process RDF. ShEx [1] was introduced in 2014 as a human-readable and concise language for RDF validation and was adopted by Wikidata in 2019, while SHACL was accepted as a W3C recommendation in 2017¹. Both ShEx and SHACL have been increasingly adopted to improve the quality of RDF data. At the same time, other technologies have appeared to help domain experts declare their expectations on RDF-based knowledge graphs, like DCTAP², a tabular template that can be used to define data shapes.

The Rust programming language³ offers some interesting features which are intended to increase safety through static typing while keeping performance competitive with low-level languages. Another advantage of Rust is the possibility of developing Python bindings, which allow Python programmers to invoke better-performing Rust libraries, and even of compiling to WebAssembly⁴, which can interoperate with JavaScript code.

In this paper, we present a new library implemented in Rust, called rudof⁵, which offers support for both ShEx and SHACL, conversion between different data models like DCTAP, generation of UML-like visualizations and RDF data validation. The different components of the library are published at crates.io⁶, the Rust package registry, and the library publishes binary releases for Windows, Linux, and macOS, as well as Debian packages, Docker images, and Python bindings⁷.

¹ https://www.w3.org/TR/shacl/
² https://www.dublincore.org/specifications/dctap/
³ https://www.rust-lang.org/

2. rudof modules and features

The library consists of different modules which are also published as Rust crates. To minimize dependencies on external libraries, we created a simple RDF trait called SRDF which offers the basic RDF functionality required for validation (mainly accessing the neighborhood of a node). We provide two implementations of SRDF, one based on RDF files like Turtle and another one based on SPARQL, which allows the library to validate RDF graphs obtained either from files or through SPARQL endpoints.
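The idea of validating against a small neighborhood-access trait with interchangeable backends can be sketched as follows. This is only a minimal illustration under stated assumptions and is not rudof's SRDF API: the trait name Neighbourhood, the types Triple and MemGraph, and the functions sample_graph and cardinality_ok are all hypothetical, and RDF terms are simplified to plain strings.

```rust
// Hypothetical sketch: every name below is illustrative and is NOT taken
// from rudof's actual SRDF trait.

/// An RDF triple with terms kept as plain strings for brevity.
#[derive(Debug, Clone, PartialEq)]
pub struct Triple {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// Minimal read-only view of an RDF graph: the neighbourhood of a node.
pub trait Neighbourhood {
    /// Arcs leaving `node`, as (predicate, object) pairs.
    fn outgoing(&self, node: &str) -> Vec<(String, String)>;
    /// Arcs arriving at `node`, as (subject, predicate) pairs.
    fn incoming(&self, node: &str) -> Vec<(String, String)>;
}

/// In-memory backend, standing in for a graph parsed from a file.
#[derive(Debug, Default)]
pub struct MemGraph {
    triples: Vec<Triple>,
}

impl MemGraph {
    /// Adds one triple to the graph.
    pub fn add(&mut self, s: &str, p: &str, o: &str) {
        self.triples.push(Triple {
            subject: s.to_string(),
            predicate: p.to_string(),
            object: o.to_string(),
        });
    }
}

impl Neighbourhood for MemGraph {
    fn outgoing(&self, node: &str) -> Vec<(String, String)> {
        self.triples
            .iter()
            .filter(|t| t.subject == node)
            .map(|t| (t.predicate.clone(), t.object.clone()))
            .collect()
    }

    fn incoming(&self, node: &str) -> Vec<(String, String)> {
        self.triples
            .iter()
            .filter(|t| t.object == node)
            .map(|t| (t.subject.clone(), t.predicate.clone()))
            .collect()
    }
}

/// Checks a cardinality constraint on `predicate` for `node`, using only
/// the trait, so any backend implementing it can be validated.
pub fn cardinality_ok<G: Neighbourhood>(
    graph: &G,
    node: &str,
    predicate: &str,
    min: usize,
    max: Option<usize>,
) -> bool {
    let count = graph
        .outgoing(node)
        .iter()
        .filter(|(p, _)| p.as_str() == predicate)
        .count();
    count >= min && max.map_or(true, |m| count <= m)
}

/// Builds a tiny example graph about a person enrolled in two courses.
pub fn sample_graph() -> MemGraph {
    let mut g = MemGraph::default();
    g.add(":alice", ":name", "\"Alice\"");
    g.add(":alice", ":enrolledIn", ":course1");
    g.add(":alice", ":enrolledIn", ":course2");
    g.add(":course1", ":name", "\"Algebra\"");
    g
}
```

Because cardinality_ok depends only on the trait, a second backend that answers outgoing and incoming by querying a SPARQL endpoint could be swapped in without changing the check itself.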
The library defines further crates, shex_ast and shacl_ast, that contain the abstract syntax tree representations of ShEx and SHACL, as well as their corresponding parsers and validators. A module called shapes_converter contains converters between different data models, such as DCTAP to ShEx or ShEx to UML visualizations. A special module is rudof_cli, which implements the command-line tool that is published as a binary called rudof for platforms like Linux, Mac and Windows.

The library already supports the following features⁸:

• Show information about RDF data and convert between different formats like Turtle, NTriples, RDF/XML, etc.
• Support for data shapes languages like ShEx and SHACL: show information about shapes and schemas, and validate RDF data to check conformance.
• Parsing DCTAP data models and conversion to shapes schemas. As an example, Figure 1a contains a DCTAP file which can be obtained from a spreadsheet in CSV, and Figure 1b shows the result of converting it to ShEx.
• Generating UML visualizations of shapes data models. As an example, Figure 1c shows the UML generated from Figure 1b.
• Generating HTML representations of those schemas, which can be useful when the schemas contain a large number of shapes. In these cases, the UML visualizations can become too big to be usable, while representing each shape in its own web page makes it possible to browse the shapes in the schema.
• Obtaining information about the neighborhood of a node in an RDF graph (either incoming or outgoing arcs), which can be useful to create a schema or to debug the validation results.
• Other conversions: we are exploring the conversion between ShEx/SHACL and SPARQL, as it can be useful to create queries based on some shapes.

⁴ https://webassembly.org/
⁵ https://rudof-project.github.io/rudof/
⁶ https://crates.io/
⁷ https://github.com/rudof-project/rudof/releases
⁸ The project Wiki page contains instructions and how-to guides

ShapeId   PropertyId   Mandatory   Repeatable   valueDatatype   valueShape
Person    name         true        false        xsd:string
          birthdate    false       false        xsd:date
          enrolledIn   false       true                         Course
Course    name         true        false        xsd:string

Figure 1a: DCTAP example

    prefix :
    prefix xsd:
    :Person { :name xsd:string ;
              :birthdate xsd:date ? ;
              :enrolledIn @:Course * }
    :Course { :name xsd:string }

Figure 1b: ShEx obtained from DCTAP example conversion

[UML-like diagram: a :Person class with attributes ":name : xsd:string" and ":birthdate : xsd:date ?", linked by an ":enrolledIn *" association to a :Course class with attribute ":name : xsd:string"]

Figure 1c: ShEx visualization

3. Performance benchmarks

Using Rust as an implementation language can improve performance compared to Java or Python. Some preliminary benchmarks have been conducted⁹ to compare the tool against several state-of-the-art SHACL implementations. As of September 2, 2024, the tool demonstrates performance improvements in SHACL validation when compared to Apache Jena¹⁰ and TopQuadrant¹¹, although it exhibits longer execution times than rdf4j¹². In addition to these SHACL validation benchmarks, the Python bindings provided by rudof can also be used to compare performance with other Python libraries for SHACL like pySHACL [2]. In that case, the measurement covers the time to load, parse and validate the same graph. A summary of the performance results is available in Table 1¹³.

Dataset   rudof    rdf4j    Apache Jena   TopQuadrant   pyrudof       pySHACL
10-LUBM   7.8971   1.6447   60.3583       85.7421       39,364.2842   72,227.2940

Table 1: Performance comparison of SHACL validation between rudof and state-of-the-art implementations.
Execution times are reported in milliseconds, using the LUBM dataset with the same SHACL shape across all comparisons.

⁹ https://github.com/weso/shacl-validation-benchmarks
¹⁰ https://jena.apache.org/
¹¹ https://github.com/TopQuadrant/shacl
¹² https://rdf4j.org/
¹³ More details about performance comparisons will be available at the rudof GitHub repository

4. Related work

Although there are several libraries for ShEx, SHACL or DCTAP, most of them focus on one of those technologies and do not offer a common mechanism for conversion between shapes-based data models. The main exception could be the SHaclEX library¹⁴, written in Scala by the first author of this paper. In the Rust ecosystem, Sophia [3] is a toolkit for RDF and linked data which contains several traits, although it does not support shapes yet. Oxigraph¹⁵ is a graph database written in Rust that supports SPARQL and publishes several crates related to RDF, but it does not support shapes yet. Recently, a W3C RDF Rust Common Crates community group¹⁶ has been created, and we are planning to align our dependencies with the expected results of that group.

5. Conclusions and future work

Although rudof is still a work in progress, we consider that it can fill a need for RDF data shapes tools in the Rust ecosystem. The Rust programming language offers some advantages in terms of performance and memory safety. It also offers the possibility to generate binaries for different operating systems like Windows, Linux and Mac, as well as Python bindings. We are exploring the use of the library in WebAssembly¹⁷. Our goal is to gradually migrate the code of our RDFShape playground [4], which was implemented in React and Scala, to a new version based on WebAssembly and Rust.

¹⁴ https://www.weso.es/shaclex/
¹⁵ https://github.com/oxigraph/oxigraph
¹⁶ https://www.w3.org/community/r2c2/
¹⁷ A prototype based on WebAssembly is available at https://uo271080.github.io/TFG_UO271080/

References

[1] E. Prud’hommeaux, J. E. Labra Gayo, H. Solbrig, Shape Expressions: An RDF Validation and Transformation Language, in: H. Sack, A. Filipowska, J. Lehmann, S. Hellmann (Eds.), Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014, Leipzig, Germany, September 4-5, 2014, ACM Press, 2014, pp. 32–40. doi:10.1145/2660517.2660523.
[2] A. Sommer, N. Car, pyshacl, 2024. URL: https://doi.org/10.5281/zenodo.10958008. doi:10.5281/zenodo.10958008.
[3] P.-A. Champin, Sophia: A Linked Data and Semantic Web toolkit for Rust, in: E. Wilde, M. Amundsen (Eds.), The Web Conference 2020: Developers Track, Taipei, TW, 2020. URL: https://www2020devtrack.github.io/site/schedule.
[4] J. E. Labra Gayo, D. Fernández Álvarez, H. García-González, RDFShape: An RDF Playground Based on Shapes, in: Proceedings of the ISWC 2018 Posters and Demonstrations, Industry and Blue Sky Ideas Tracks, co-located with 17th International Semantic Web Conference, volume 2180 of CEUR Workshop Proceedings, 2018.