<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Realizing a Collaborative RDF Benchmark Suite in Practice</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Piotr Sowiński</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Ganzha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Warsaw University of Technology</institution>
          ,
          <addr-line>Pl. Politechniki 1, 00-661 Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <author-notes>
        <corresp>Corresponding author: piotr.sowinski.dokt@pw.edu.pl (P. Sowiński); maria.ganzha@pw.edu.pl (M. Ganzha)</corresp>
      </author-notes>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Collaborative mechanisms allow benchmarks to be updated continuously and to adjust to changing requirements and new use cases. This paradigm is employed, for example, in the field of machine learning, but until now there have been no examples of truly open and collaborative benchmarks for RDF systems. In this demo paper, we present the collaboration functionalities of RiverBench, an open, multi-task RDF benchmark suite. Owing to its fully open and community-driven design, RiverBench allows any researcher or practitioner to submit a new dataset or benchmark task, report performed benchmark runs, and edit any resource in the suite. RiverBench's collaboration system is itself based on RDF and Linked Data mechanisms, and every resource in the suite has machine-readable RDF metadata. The showcased functionalities together make up a first-of-its-kind fully open and collaborative RDF benchmark suite. These features are meant to encourage other researchers to contribute to RiverBench and make it a long-term project sustained by the community.</p>
      </abstract>
      <kwd-group>
        <kwd>benchmark</kwd>
        <kwd>Resource Description Framework</kwd>
        <kwd>RiverBench</kwd>
        <kwd>collaboration</kwd>
        <kwd>streaming</kwd>
        <kwd>demonstration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Benchmarking Resource Description Framework (RDF) systems is a popular topic in the Semantic
Web and Knowledge Graph research communities. Benchmarks not only allow one to compare how
different systems perform in certain scenarios, but they also help drive innovation by identifying gaps
in performance and functionalities. Most often, a benchmark task is defined in a research paper, with
the dataset(s) and benchmark code attached as supplementary resources. However, it is very hard for
such benchmarks to evolve over time, for example, by adding new datasets or metrics contributed by
the community. Furthermore, the dispersed nature of these benchmarks makes it difficult to follow
consistent standards in benchmark specification, dataset distribution, and result reporting. This in turn
reflects negatively upon benchmark reproducibility and slows down progress in the field. We postulate
that these issues could be at least partially overcome with an open, collaborative benchmark suite
which could naturally evolve over time to suit the changing needs of the community, while maintaining
consistent standards.</p>
      <p>
        Although many benchmarks and benchmark suites for RDF systems were proposed, few of them
incorporate any collaborative aspects. The most prominent example here is the Linked Data Benchmark
Council (LDBC), which maintains several graph benchmarks, including RDF-based workloads (e.g., the
Social Network Benchmark [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). Organizations and individuals can contribute to LDBC with a paid
membership and participation in its working groups. LDBC also gathers audited benchmark results, but
this process is also paid and generally reserved for large industry actors. Although LDBC benchmarks
are technically collaborative, their main target is production-ready commercial systems, and the
contribution process is complex. The only other example that we could find in the RDF community is
the now-defunct Liquid Benchmarks platform [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It was a collaborative cloud application for running
and reporting benchmark results in a variety of tasks, including SPARQL query processing. However, as
the platform is no longer available, it is difficult to assess the exact extent of its collaborative
capabilities.
      </p>
      <p>
        The idea of collaborative benchmarks is more popular in other areas of computer science, especially
in machine learning (ML) and natural language processing (NLP). For example, the GLUE benchmark
for natural language understanding [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] allows researchers to submit their benchmark results via a
form on the benchmark website. However, this only works for a fixed set of tasks defined in GLUE.
The NLP-Progress website [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a more open-ended project, where researchers are invited to report
their NLP benchmark results via pull requests on GitHub. The site then displays these results in the
form of HTML tables grouped by benchmark task and dataset, with no machine-readable content. The
BenchOpt benchmark suite for ML and optimization [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] goes a step further, by being tightly integrated
with a dedicated benchmark runner. The runner can semi-automatically report benchmark results (via
a GitHub pull request) to be displayed on the website. Other contributions (e.g., to the benchmark code)
are also welcome and can be submitted using pull requests.
      </p>
      <p>
        Contribution. In this demo paper, we showcase the recently developed collaborative features of
RiverBench [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], an open, multi-task benchmark suite for RDF systems. RiverBench is designed to be fully
open, transparent, and community-driven from the start. It allows submitting new benchmark tasks
and datasets, reporting benchmark results, and editing any other aspect of the suite. The collaboration
architecture is based on RDF metadata and Linked Data mechanisms, making every resource in
RiverBench machine-accessible. The presented approach goes beyond the current state of the art by being
(to the best of our knowledge) the first RDF benchmark suite with such extensive open collaboration
functionalities. It is also a major technical advancement over the aforementioned benchmark projects
in other fields of computer science, which do not fully employ the FAIR principles, machine-readable
RDF metadata, or the Linked Data mechanisms.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Collaborative Architecture of RiverBench</title>
      <p>RiverBench collects both benchmark tasks and datasets. Although RiverBench focuses on streaming
use cases (e.g., streaming RDF graphs over the network), it also includes non-streaming tasks, such
as loading RDF data into a triple store, or serializing an RDF graph to bytes. Benchmark profiles
group datasets with shared technical characteristics (e.g., datasets with RDF triples only, no quads), to
make dataset selection easier. Every resource in RiverBench has an HTML documentation page and
machine-readable RDF metadata, available via the content negotiation mechanism. The suite makes
extensive use of Continuous Integration / Continuous Deployment (CI/CD) scripts and GitHub’s free
infrastructure to automate most tasks.</p>
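      <p>As an illustration of the content negotiation mechanism, the following Python sketch (a minimal, unofficial example assuming the requests and rdflib libraries; the exact RDF media types served by the suite are an assumption here) retrieves the machine-readable metadata of RiverBench from its permanent URL:</p>
      <preformat>
# Minimal sketch: fetch RiverBench metadata via HTTP content negotiation.
# Assumes the PURL returns Turtle when an RDF media type is requested;
# the supported media types are an assumption, not taken from the paper.
import requests
from rdflib import Graph

PURL = "https://w3id.org/riverbench"  # permanent URL of the suite

# Ask for Turtle instead of the default HTML documentation page
response = requests.get(PURL, headers={"Accept": "text/turtle"}, timeout=30)
response.raise_for_status()

graph = Graph()
graph.parse(data=response.text, format="turtle")
print(f"Retrieved {len(graph)} RDF triples describing the suite")
</preformat>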
      <p>Benchmark datasets are added to the suite through a public community process, visualized in Figure 1.
Any practitioner or researcher is invited to submit their dataset, as long as it meets basic technical
requirements and can be published under a permissive license. Firstly, the contributor fills out the
dataset proposal form on GitHub, giving details about the dataset’s source, license, original use case,
etc. The proposal is then reviewed by RiverBench curators, who check if the dataset meets the
publicly available criteria for new datasets (e.g., open license, properly specified authorship, sufficient size, clear
use case) and if it would be a valuable addition. The curators are members of the RDF benchmarking
community who, besides reviewing dataset proposals, also have a deciding voice in major decisions
shaping RiverBench. As of November 2024, there are four registered curators from three different
institutions (https://w3id.org/riverbench/v/dev/documentation/maintainers). After the proposal is accepted by the curators, a technical administrator creates a new
GitHub repository for the dataset, and instructs the contributor on how to proceed. The contributor
then uploads the dataset for processing and fills out a metadata file in the Turtle language, describing
the dataset in a structured manner. The dataset and the metadata are then automatically processed
by CI/CD: validating its contents, re-packaging the dataset in several formats, adding more metadata,
generating documentation pages, and publishing it on the website under a permanent URL (PURL).</p>
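      <p>Before submitting, a contributor can check locally that the metadata file is at least syntactically valid RDF. The sketch below is a minimal, unofficial example using rdflib; the file name is illustrative, and the authoritative validation is the one performed by RiverBench's CI/CD.</p>
      <preformat>
# Minimal sketch: locally check that a dataset metadata file parses as Turtle.
# The file name is illustrative; the authoritative checks run in RiverBench's CI/CD.
import sys
from rdflib import Graph

def check_metadata(path: str) -> None:
    graph = Graph()
    try:
        graph.parse(path, format="turtle")
    except Exception as error:
        sys.exit(f"Metadata file does not parse as Turtle: {error}")
    if len(graph) == 0:
        sys.exit("Metadata file parsed, but contains no triples")
    print(f"OK: {len(graph)} triples found in {path}")

if __name__ == "__main__":
    check_metadata("metadata.ttl")
</preformat>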
      <p>Proposing new benchmark tasks follows a very similar public community process. The contributor
also fills out a proposal form on GitHub, which is reviewed by the curators, who check if the task
meets the pre-defined criteria. Afterwards, the contributor is asked to write down the description and
metadata of the task in a Turtle file. The new task is then automatically processed by CI/CD (validation,
packaging, documentation generation, publishing).</p>
      <sec id="sec-2-1">
        <title>1https://w3id.org/riverbench/v/dev/documentation/maintainers</title>
      <p>[Figure 1: The public community process for adding a new dataset to RiverBench. The contributor submits a public dataset proposal; the curators review the proposal and give feedback; a technical admin creates a git repository; the contributor uploads the dataset source files and metadata (Turtle); CI/CD validates, packages, adds metadata, generates documentation, and publishes the dataset (HTML and RDF) under a PURL, after which the dataset is ready for reuse.]</p>
      <p>The source files of tasks, datasets, profiles (groups of datasets), documentation pages, and other
resources of RiverBench are hosted on GitHub as Markdown or Turtle files in one of several
repositories. To make editing these resources as easy as possible, every page on the RiverBench website
has an “Edit this page” button (visible at the top of Figure 2). The button immediately redirects the
user to the correct file on GitHub, where they can edit it and submit a pull request with the changes.
After the pull request is validated by a CI job and accepted by a curator or technical administrator, the
changes automatically appear on the website via a CI/CD pipeline.</p>
        <p>
          RiverBench also collects benchmark run reports and displays them on the website (Figures 2 and 3).
Each benchmark run report is a nanopublication [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] – a small unit of scientific knowledge encoded as
an RDF dataset. These nanopublications use the RiverBench ontology and the Informatics Research
Artifact Ontology (IRAO) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] to structure the information on the used RiverBench task, profile (group
of datasets), benchmark code, evaluated systems, and more.
        </p>
        <p>
          Nanopublications with benchmark run reports can be submitted by any researcher with an
ORCID account to the decentralized nanopublication network. A nanopublication template for the web
application Nanodash [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] was prepared, which simplifies the process, requiring the user to only fill
out a web form (Figure 4). After the nanopublication is submitted, it will be automatically discovered
by RiverBench’s CI/CD. It is then turned into human-readable documentation on the suite’s website
(Figure 3), and republished in metadata dumps for easier reuse. This workflow is similar to a mechanism
proposed by the RDF Stream Taxonomy [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], where nanopublications are used to create a “living
literature review” of RDF streaming.
        </p>
    </sec>
    <sec id="sec-3">
      <title>3. Demonstration</title>
      <p>In the live demonstration, we will use the public RiverBench website (https://w3id.org/riverbench) to briefly showcase the main
collaboration functionalities of RiverBench. The documentation pages linked in this section contain
more details about each functionality.</p>
      <p>
        Reporting benchmark results. We will show how to report the results of benchmarks conducted
using RiverBench (https://w3id.org/riverbench/v/dev/documentation/reporting-results). First, we will
fill out the benchmark result form in Nanodash [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] with example
data and submit it. Then, we will show how it appears on the website and in the semantic metadata of
RiverBench.
      </p>
      <sec id="sec-3-1">
        <title>2https://w3id.org/riverbench 3https://w3id.org/riverbench/v/dev/documentation/reporting-results</title>
        <p>Editing documentation and metadata. We will briefly explain the mechanism of editing
documentation and the locations of edit buttons on the RiverBench website
(https://w3id.org/riverbench/v/dev/documentation/editing-docs). Then, we will demonstrate
editing a documentation page and the metadata of a resource (e.g., a task or dataset). We will then show
participants the changes as they appear on the RiverBench website.</p>
        <p>Proposing new datasets. We will briefly explain the dataset proposal process
(https://w3id.org/riverbench/v/dev/documentation/creating-new-dataset), and show how to
fill out a dataset proposal form and the metadata file.</p>
        <p>Proposing new benchmark tasks. Following a similar structure as with dataset proposals, we will
explain the task proposal process (https://w3id.org/riverbench/v/dev/documentation/creating-new-task).
We will then show how to fill out the form and the metadata file.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this demo paper, we showcased the collaborative features of RiverBench, an open RDF benchmark
suite. The presented four functionalities (reporting benchmark results, editing existing resources,
submitting datasets, and proposing benchmark tasks) are supported by comprehensive automation (CI/CD).
Additionally, RiverBench itself is a use case of knowledge management techniques, as it extensively
uses RDF metadata and Linked Data mechanisms, making this contribution doubly interesting and
relevant to the Semantic Web community. The features were designed to be easy to use, open to any
contributor, and to rely on public and transparent community processes.</p>
      <p>To the best of our knowledge, RiverBench is the first RDF benchmark suite to implement collaborative
features so widely. We hope that these easy-to-use functionalities will encourage the Semantic Web and
Knowledge Graph communities to contribute to RiverBench and join the project as curators, ensuring
its long-term relevance and usefulness. The interest of the community ultimately depends on whether
the project fulfills an important need or not – in this case, whether it is useful for benchmarking RDF systems.
Currently, RiverBench can be used with nine diverse, scientifically relevant benchmark tasks, and this
set is expected to expand in the future. Based on early feedback from other researchers, we
plan to expand the suite’s infrastructure to support more use cases and tasks. We will also continue
improving the suite’s documentation and automation system, to match the changing expectations of
the community.</p>
      <p>
        In contrast to the now-defunct Liquid Benchmarks platform [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], RiverBench is hosted on permanently
free infrastructure (GitHub, w3id.org, and the nanopublication network), which should ensure its
longevity. While w3id.org and the nanopublication network are maintained by multiple independent
actors and therefore should be stable, GitHub could conceivably cease its operations in the future, or
stop providing free services. RiverBench is prepared for that eventuality, as all its resources are hosted
under permanent URLs, which can be easily redirected to a different service provider. Because the
core features of the suite only require a static file hosting server, such a migration would not be
problematic. The only executable code runs as part of CI/CD pipelines, which could be ported to alternative
services (e.g., GitLab).
      </p>
      <p>Online Resources</p>
      <list list-type="bullet">
        <list-item>
          <p>RiverBench website – the subject of the demo: https://w3id.org/riverbench</p>
        </list-item>
        <list-item>
          <p>RiverBench documentation: https://w3id.org/riverbench/v/dev/documentation</p>
        </list-item>
        <list-item>
          <p>Source code on GitHub: https://github.com/RiverBench</p>
        </list-item>
      </list>
      <sec id="sec-4-1">
        <title>4https://w3id.org/riverbench/v/dev/documentation/editing-docs 5https://w3id.org/riverbench/v/dev/documentation/creating-new-dataset 6https://w3id.org/riverbench/v/dev/documentation/creating-new-task</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chafi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prat</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-D. Pham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Boncz</surname>
          </string-name>
          ,
          <article-title>The LDBC social network benchmark: Interactive workload</article-title>
          ,
          <source>in: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>619</fpage>
          -
          <lpage>630</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Casati</surname>
          </string-name>
          , Liquid Benchmarks:
          <article-title>towards an online platform for collaborative assessment of computer science research results, in: Performance Evaluation, Measurement and Characterization of Complex Systems: Second TPC Technology Conference</article-title>
          ,
          <string-name>
            <surname>TPCTC</surname>
          </string-name>
          <year>2010</year>
          , Singapore,
          <source>September 13-17</source>
          ,
          <year>2010</year>
          .
          <source>Revised Selected Papers 2</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <surname>GLUE:</surname>
          </string-name>
          <article-title>A multi-task benchmark and analysis platform for natural language understanding</article-title>
          ,
          <source>EMNLP</source>
          <year>2018</year>
          (
          <year>2018</year>
          )
          <fpage>353</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          , NLP-progress,
          <year>2024</year>
          . https://nlpprogress.com/,
          <source>accessed on 4 September</source>
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Moreau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Massias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ablin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Bannier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Charlier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dagréou</surname>
          </string-name>
          , T. Dupre la Tour, G. Durif,
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Dantas</surname>
          </string-name>
          , et al.,
          <article-title>Benchopt: Reproducible, eficient and collaborative optimization benchmarks</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>25404</fpage>
          -
          <lpage>25421</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sowiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ganzha</surname>
          </string-name>
          , M. Paprzycki,
          <article-title>RiverBench: an open RDF streaming benchmark suite</article-title>
          ,
          <source>arXiv preprint arXiv:2305.06226</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Barbano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Nagy</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Krauthammer, Broadening the scope of nanopublications, in: The Semantic Web: Semantics and Big Data: 10th International Conference</article-title>
          , ESWC 2013, Montpellier, France, May
          <volume>26</volume>
          -30,
          <year>2013</year>
          . Proceedings 10, Springer,
          <year>2013</year>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V. B.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Svátek</surname>
          </string-name>
          ,
          <article-title>Ontology for informatics research artifacts</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>126</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Emonet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Antonatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soiland-Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <article-title>Semantic microcontributions with decentralized nanopublication services</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <article-title>e387</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sowiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szmeja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ganzha</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Paprzycki, RDF Stream Taxonomy: Systematizing RDF stream types in research and practice</article-title>
          ,
          <source>Electronics</source>
          <volume>13</volume>
          (
          <year>2024</year>
          )
          <fpage>2558</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>