<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Nara, Japan
* Corresponding author.
Email: piotr@neverblink.eu (P. Sowiński); kacper.grzymkowski@neverblink.eu (K. Grzymkowski); anastasiya@neverblink.eu (A. Danilenka)
URL: https://ostrzyciel.eu/ (P. Sowiński)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Jelly-Patch: a Fast Format for Recording Changes in RDF Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Piotr Sowiński</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kacper Grzymkowski</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiya Danilenka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>NeverBlink</institution>
          ,
          <addr-line>ul. Wspólna 56, 00-684 Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Warsaw University of Technology</institution>
          ,
          <addr-line>Pl. Politechniki 1, 00-661 Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Recording data changes in RDF systems is a crucial capability, needed to support auditing, incremental backups, database replication, and event-driven workflows. In large-scale and low-latency RDF applications, the high volume and frequency of updates can cause performance bottlenecks in the serialization and transmission of changes. To alleviate this, we propose Jelly-Patch, a high-performance, compressed binary serialization format for changes in RDF datasets. To evaluate its performance, we benchmark Jelly-Patch against existing RDF Patch formats, using two datasets representing different use cases (change data capture and IoT streams). Jelly-Patch is shown to achieve 3.5–8.9x better compression, and up to 2.5x and 4.6x higher throughput in serialization and parsing, respectively. These significant advancements in throughput and compression are expected to improve the performance of large-scale and low-latency RDF systems.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF</kwd>
        <kwd>Change data capture</kwd>
        <kwd>Diffs</kwd>
        <kwd>RDF Patch</kwd>
        <kwd>Databases</kwd>
        <kwd>Serialization format</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Large-scale RDF databases and low-latency streaming applications require the highest levels of
performance from serialization formats, so as not to bottleneck the system with slow serialization.
This includes OLTP databases working with very frequent or large transactions, which must record
the changes efficiently to quickly close each transaction. In streaming applications, small incremental
changes to RDF datasets are common, such as the sensor value update in the example above. In this
case, reducing the serialization/deserialization time and the size of the representation will result in
lower latency and decreased usage of computing resources.</p>
      <p>
        To answer these needs, in this work we introduce Jelly-Patch, an efficient binary serialization format
for RDF Patch. It is based on Jelly-RDF, a high-performance RDF serialization format [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Jelly-Patch
was designed as a faster and more compressed alternative to the RDF Patch text format, aiming to
improve the scalability and responsiveness of RDF systems. Our contributions include: (1) an open
specification of the Jelly-Patch format; (2) an open-source implementation in Java, integrated with Apache
Jena and RDF4J; and (3) a performance evaluation comparing Jelly-Patch to alternative formats.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Jelly-Patch</title>
      <p>
        Jelly-Patch uses Protocol Buffers, a fast and widely used binary serialization framework [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Jelly-Patch
reuses the base message types for triples, quads, IRIs, literals, etc. from Jelly-RDF. On top of these, RDF
Patch-specific messages were added for transactions, patch headers, and support for adding/deleting
prefixes. Jelly-Patch is defined in an open specification,<sup>1</sup> accompanied by an interface definition file
that can be used to generate serialization/deserialization code in any popular programming language.
The format covers the entirety of RDF 1.1, RDF-star, and generalized RDF.
      </p>
      <p>
        The base compression scheme in Jelly-Patch is the same as in Jelly-RDF [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ], making it possible to
reuse much of the same code between the two formats. The key difference between the two formats
is that while Jelly-RDF expresses a stream of statements, Jelly-Patch describes a stream of changes,
which can be thought of as a derivative of a stream. A single Jelly-Patch stream can contain many
patches. Compression works in a fully streaming manner over the entire stream, which is especially
advantageous for CDC workloads. For example, if a given IRI is present in a patch earlier in the stream,
later patches can refer to the same IRI through a streaming lookup table, reducing file size.
      </p>
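      <p>As an illustration of the streaming lookup idea described above, the following minimal Python sketch assigns each IRI a numeric ID the first time it appears and emits only the ID on later occurrences, with one table shared by all patches in the stream. The class name StreamingLookup, the function encode_stream, and the tuple-based output are hypothetical and chosen for illustration only; the actual Jelly-Patch wire encoding is defined in the specification and differs from this sketch (for example, real lookup tables have a bounded size).</p>
      <preformat><![CDATA[
```python
class StreamingLookup:
    """Toy lookup table shared by every patch in one stream (illustrative only).

    Assumption: unbounded table, no eviction. This is NOT the Jelly wire format.
    """

    def __init__(self):
        self.ids = {}  # IRI string -> numeric ID assigned on first sight

    def encode(self, iri):
        """Return ("def", id, iri) on first sight, ("ref", id) afterwards."""
        if iri in self.ids:
            return ("ref", self.ids[iri])
        new_id = len(self.ids) + 1
        self.ids[iri] = new_id
        return ("def", new_id, iri)


def encode_stream(patches):
    """Encode a list of patches; each patch is a list of IRI strings."""
    table = StreamingLookup()  # one table for the whole stream, not per patch
    return [[table.encode(iri) for iri in patch] for patch in patches]


patches = [
    ["http://example.org/sensor1", "http://example.org/value"],
    ["http://example.org/sensor1", "http://example.org/value"],
]
encoded = encode_stream(patches)
# The second patch carries only short references, no IRI strings.
print(encoded[1])
```
]]></preformat>
      <p>In this toy setting, the second patch repeats no IRI text at all, which is the effect that makes long streams of similar patches compress well.</p>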
      <p>We implemented Jelly-Patch in Java, as part of the Jelly-JVM library,<sup>2</sup> licensed under Apache 2.0.
The core serialization code is generic and can be integrated with any RDF library for the Java Virtual
Machine. We integrated it fully with Apache Jena, including high-level APIs on par with Jena’s internal
serialization formats. We also implemented a low-level integration with RDF4J, which is limited due to
RDF4J not supporting RDF Patch natively.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>We evaluated Jelly-Patch in terms of its serialized representation size, serialization throughput, and
deserialization throughput. The benchmarks were performed with two datasets, representing very
different use cases: change data capture of an RDF database, and streaming IoT sensor data.</p>
      <sec id="sec-3-1">
        <title>3.1. Datasets</title>
        <p>
          The change data capture dataset (bsbm-cdc) was created using the Berlin SPARQL Benchmark
(BSBM) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] test driver, running the update workload against the RDF Delta server 1.1.2 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The BSBM
data generator was configured with a scale factor of 100,000, and a transaction count of 100,000. The
BSBM test driver executed 90,000 query mixes, each consisting of 2 insert queries and 3 delete queries. A
single-node RDF Delta setup was used to capture the changes to RDF Patch files. The resulting patches
were then combined into a single dataset.
        </p>
        <p>
<sup>1</sup> https://w3id.org/jelly/dev/specification/patch
<sup>2</sup> https://w3id.org/jelly/jelly-jvm/
        </p>
        <p>
          The streaming IoT sensor dataset (assist-iot-weather) is based on a RiverBench [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] dataset,<sup>3</sup>
which consists of sensor readings from an IoT weather station. It was processed by calculating the
rolling difference between consecutive timestamped graphs and writing the differences as patches.
        </p>
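        <p>The rolling-difference step can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used for the dataset: graphs are modeled as Python sets of already-serialized term triples, the function names graph_diff, to_patch_rows, and rolling_patches are hypothetical, and patch header rows are omitted. The output rows follow the RDF Patch text conventions (TX, D, A, TC).</p>
        <preformat><![CDATA[
```python
def graph_diff(previous, current):
    """Return (deleted, added) triples between two graphs given as sets."""
    return previous - current, current - previous


def to_patch_rows(deleted, added):
    """Render one transaction as RDF Patch text rows (TX, D, A, TC)."""
    rows = ["TX ."]
    rows += ["D " + " ".join(t) + " ." for t in sorted(deleted)]
    rows += ["A " + " ".join(t) + " ." for t in sorted(added)]
    rows.append("TC .")
    return rows


def rolling_patches(graphs):
    """Yield one patch per graph, diffed against the preceding graph."""
    previous = set()  # the first graph is diffed against an empty graph
    for current in graphs:
        deleted, added = graph_diff(previous, current)
        yield to_patch_rows(deleted, added)
        previous = current


# Hypothetical sensor example: one reading changes between two graphs.
S = "<http://example.org/sensor1>"
P = "<http://example.org/value>"
graphs = [{(S, P, '"21.5"')}, {(S, P, '"21.7"')}]
patches = list(rolling_patches(graphs))
for row in patches[1]:
    print(row)
```
]]></preformat>
        <p>Only the changed reading appears in the second patch (one delete and one add), while any triples that stay the same between graphs would produce no rows.</p>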
        <p>
          Table 1 presents key statistics about both datasets. In this case, we consider a patch to correspond to
exactly one transaction. An operation is a single row in an RDF Patch file (e.g., transaction begin, triple
add/delete, header). The datasets are publicly available on Zenodo under the CC BY 4.0 license [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
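        <p>To make the two units concrete, the sketch below counts patches and operations in RDF Patch text under the definitions given above: every non-empty row is one operation, and every committed transaction (a TC row) closes one patch. The function name patch_stats and the sample rows are hypothetical, and whether this matches the exact counting procedure behind Table 1 is an assumption.</p>
        <preformat><![CDATA[
```python
def patch_stats(lines):
    """Count (patches, operations) in an iterable of RDF Patch text rows."""
    operations = 0
    patches = 0
    for line in lines:
        row = line.strip()
        if not row:
            continue  # blank lines are not operations
        operations += 1
        if row.split()[0] == "TC":
            patches += 1  # one committed transaction = one patch
    return patches, operations


sample = """TX .
A <http://example.org/s> <http://example.org/p> "1" .
TC .
TX .
D <http://example.org/s> <http://example.org/p> "1" .
TC .
""".splitlines()

stats = patch_stats(sample)
print(stats)  # (2, 6)
```
]]></preformat>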
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Benchmark Setup</title>
        <p>
          Benchmarks were implemented on top of Apache Jena 5.3.0, using Jelly-JVM 3.4.0. We used the following
Jelly-Patch settings: name table size 4000, prefix table size 1024, frame size 512. We compared it against
formats built into Jena: RDF Patch text (same as described in the RDF Patch specification), RDF Patch
binary (based on Jena’s binary Thrift format [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]), and SPARQL Update (INSERT/DELETE DATA). All
tested methods are based on the same underlying Jena APIs, programming language, and execution
environment, to make the comparison as fair as possible. Similarly to Jelly-Patch, the RDF Patch formats
built into Jena are based on the already heavily optimized implementations of N-Quads and Jena Binary
formats that are used in production environments.
        </p>
        <p>For serialization, an in-memory list of changes was sent to the serializer writing to a blackhole output
stream, discarding the written bytes. For deserialization, the parser was reading from an in-memory
byte array, containing pre-serialized data. The results were sent to a blackhole change handler.</p>
        <p>
          The benchmark was implemented using the Java Microbenchmark Harness (JMH) 1.37, an
industry-standard benchmarking tool that accounts for JVM warmup and dead-code elimination using JVM
blackholes, ensuring fair results [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. JMH was run with parameters: -f7 -wi 10 -i 20 -gc true
in single-shot mode. Test bench used: AMD Threadripper 7960X (24-core, 4.2 GHz); 256 GB RAM (DDR5
4800 MHz); Linux 6.8.0; Oracle GraalVM 24.0.2+11.1. The disk was not used during the benchmarks.
Benchmarks were single-threaded, but the JVM was allowed to use the remaining threads for garbage
collection and JIT compilation. Full benchmark code and scripts can be found on GitHub.<sup>4</sup>
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results</title>
        <p>Figure 1 compares the serialized representation size of the four formats, with RDF Patch text being used
as the baseline. RDF Patch binary does not employ any compression, and results in files ∼20% larger
than with the text format. Jena’s SPARQL Update implementation also does not use any compression,
with files 21–33% larger than the text format. Jelly-Patch is the only format employing any compression,
reducing the file size by 3.5x for bsbm-cdc, and by 8.9x for assist-iot-weather.</p>
        <p>The observed difference in compression ratios is due to bsbm-cdc using long literals and a very large
pool of quickly changing IRIs (e.g., 100,000 products). Jelly-Patch does not apply binary compression to
literals; to alleviate this, the file would have to be additionally compressed with, for example, gzip.
Conversely, in assist-iot-weather the literals are small and IRIs repeat often, making them well suited
to Jelly’s compression mechanisms.</p>
        <p>[Figure 1: serialized representation size by dataset (bsbm-cdc, assist-iot-weather); surviving value labels: 11.2%, 28.9%.]</p>
        <p><sup>3</sup> https://w3id.org/riverbench/datasets/assist-iot-weather-graphs/1.0.3
<sup>4</sup> https://github.com/Jelly-RDF/jvm-benchmarks</p>
        <p>The compressed weather dataset when saved as a change stream in Jelly-Patch takes up only 279.6
MB, as compared to 1513.8 MB of the original RiverBench dataset saved in Jelly-RDF, while not losing
any information. This is a size reduction of 5.4x, made possible by applying differential compression
(stream derivative) on top of an already well-compressed file. This highlights the great potential of
diff-based formats in streaming use cases, which we will investigate further in future research.</p>
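        <p>The 5.4x figure for the weather dataset follows directly from the two file sizes quoted in the text. The short Python check below reproduces the arithmetic; the variable names are illustrative, and the sizes are the only inputs taken from the paper.</p>
        <preformat><![CDATA[
```python
jelly_patch_mb = 279.6   # weather dataset as a Jelly-Patch change stream (from the text)
jelly_rdf_mb = 1513.8    # original RiverBench dataset saved in Jelly-RDF (from the text)

reduction = jelly_rdf_mb / jelly_patch_mb      # how many times smaller
relative_size = jelly_patch_mb / jelly_rdf_mb  # fraction of the original size

print(f"size reduction: {reduction:.1f}x")     # size reduction: 5.4x
print(f"relative size: {relative_size:.1%}")   # relative size: 18.5%
```
]]></preformat>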
        <p>Figure 2 presents the serialization and deserialization throughput results. SPARQL Update does
not have an RDF Patch-compatible parser, so only its serialization speed was tested, with it being by
far the slowest format. In bsbm-cdc, Jelly-Patch is only ∼5% faster at serializing than Jena’s binary
format. This is due to the dataset being a pessimistic case for Jelly-Patch, with a lot of incompressible
data (long strings) and few repeating structures. Nonetheless, Jelly-Patch delivers over 4x better
compression at nearly the same serialization throughput, making it much more advantageous. In
assist-iot-weather, Jelly-Patch has 2.5x faster serialization than Jena’s binary format.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>In this work, we introduce Jelly-Patch, a fast binary format for recording changes in RDF datasets. In the
conducted experiments, it achieved 3.5–8.9x better compression than other RDF Patch formats, while
being up to 2.5x faster to serialize, and up to 4.6x faster to parse. This is a significant difference that is
expected to greatly benefit large-scale RDF systems, especially in networked or distributed settings. The
format is useful not only in CDC scenarios, but also for representing streams of sensor data, achieving
5.4x better compression than the already well-compressed Jelly-RDF.</p>
      <p>
        We plan to continue optimizing both Jelly-RDF and Jelly-Patch through new compression schemes
and improvements in implementation efficiency. Also planned are: implementations for Python and
Rust, conformance test cases, and extending the jelly-cli with commands for Jelly-Patch. Extensive
benchmarks should be performed, using more datasets, which can be contributed to RiverBench [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>This work was financially supported from the European Funds under the Sector Agnostic path of the
Huge Thing Startup Booster program (project no. 0021/2025, program FENG.02.28-IP.02-0006/23).</p>
      <p>Declaration on Generative AI. During the preparation of this work, the authors used ChatGPT in
order to draft content. After using this tool, the authors reviewed and edited the content as needed and
take full responsibility for the publication’s content.</p>
      <p>
        Jelly-Patch specification: https://w3id.org/jelly/dev/specification/patch
Jelly-JVM documentation: https://w3id.org/jelly/jelly-jvm
Datasets and full results: https://doi.org/10.5281/zenodo.16498682 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
Community contribution guide: https://w3id.org/jelly/dev/contributing
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kleppmann</surname>
          </string-name>
          ,
          <article-title>Designing data-intensive applications: The big ideas behind reliable, scalable, and maintainable systems,</article-title>
          O'Reilly Media, Inc.,
          <year>2017</year>
          , p.
          <fpage>454</fpage>
          . Chapter 11:
          <article-title>Change Data Capture</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mynarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sotona</surname>
          </string-name>
          ,
          <article-title>Change data capture of large-scale RDF data</article-title>
          .,
          <source>in: SEMANTiCS (Posters &amp; Demos)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Connolly</surname>
          </string-name>
          ,
          <article-title>Delta: an ontology for the distribution of differences between RDF graphs</article-title>
          , World Wide Web, http://www.w3.org/DesignIssues/Diff 4 (
          <year>2004</year>
          )
          <fpage>4</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sambra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bertails</surname>
          </string-name>
          , P.-A. Champin,
          <source>Linked Data Patch Format</source>
          ,
          <year>2015</year>
          . https://www.w3.org/TR/2015/NOTE-ldpatch-20150728/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          , SparqlPatch,
          <year>2014</year>
          . https://www.w3.org/2001/sw/wiki/SparqlPatch, accessed on 28 July 2025.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          , RDF Delta,
          <year>2025</year>
          . https://afs.github.io/rdf-delta/, accessed on 28 July 2025.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sowiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bogacka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Danilenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kozlov</surname>
          </string-name>
          ,
          <article-title>Jelly: a fast and convenient RDF serialization format</article-title>
          ,
          <source>arXiv preprint arXiv:2506.11298</source>
          , SEMANTiCS 2025 Developers Workshop, September 03, 2025, Vienna, Austria (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Google</surname>
            <given-names>LLC</given-names>
          </string-name>
          , Protocol Buffers Contributors, Protocol Buffers,
          <year>2025</year>
          . https://protobuf.dev/, accessed on 31 July 2025.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sowiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wasielewska-Michniewska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ganzha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paprzycki</surname>
          </string-name>
          , et al.,
          <article-title>Efficient RDF streaming for the edge-cloud continuum</article-title>
          ,
          <source>in: 2022 IEEE 8th World Forum on Internet of Things (WF-IoT)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi:10.1109/WF-IoT54382.2022.10152225.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultz</surname>
          </string-name>
          , The Berlin SPARQL benchmark,
          <source>International Journal on Semantic Web and Information Systems</source>
          <volume>5</volume>
          (
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          . URL: http://dx.doi.org/10.4018/jswis.2009040101. doi:10.4018/jswis.2009040101.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sowiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ganzha</surname>
          </string-name>
          ,
          <article-title>Realizing a collaborative RDF benchmark suite in practice</article-title>
          ,
          <source>arXiv preprint arXiv:2410.12965</source>
          , 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2024), 26-28 November 2024, Amsterdam, Netherlands (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sowiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Grzymkowski</surname>
          </string-name>
          ,
          <article-title>Datasets and results for Jelly-Patch benchmarks</article-title>
          ,
          <year>2025</year>
          . URL: https://doi.org/10.5281/zenodo.16498682. doi:10.5281/zenodo.16498682.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Apache Software Foundation</surname>
          </string-name>
          ,
          <source>RDF binary using Apache Thrift</source>
          ,
          <year>2025</year>
          . URL: https://jena.apache.org/documentation/io/rdf-binary.html, accessed on 12 June 2025.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>OpenJDK</given-names>
            <surname>Contributors</surname>
          </string-name>
          ,
          <source>Java Microbenchmark Harness (JMH)</source>
          ,
          <year>2025</year>
          . https://github.com/openjdk/jmh, accessed on 31 July 2025.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>