<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking SPARQL Engines on Wikidata Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter F. Patel-Schneider</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Four open-source SPARQL engines are evaluated on three existing benchmarks and one new benchmark for queries against Wikidata, a large community-built knowledge graph with wide usage. Of the engines benchmarked (Blazegraph, MillenniumDB, QLever, and Virtuoso), QLever is the fastest. Blazegraph, which is the SPARQL engine used in the official Wikidata Query Service, is significantly slower than some other engines. All of the engines have deviations from the SPARQL standard.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>shows that other modern SPARQL query engines are also much faster than Blazegraph on some Wikidata
queries. There are some first-party benchmarks showing that modern SPARQL engines, including
MillenniumDB [11] and QLever [8, 12], are faster than Blazegraph on the Wikidata RDF dump, but there is no
large third-party comparison of the engines.</p>
      <p>To better test the performance of modern SPARQL engines relative to Blazegraph, an effort to benchmark
the query performance of several open-source SPARQL engines on the entire Wikidata RDF dump was
undertaken. Query performance is only a part of what is needed in a replacement engine for Wikidata, but it is an important
part. The analysis of the benchmark results here was designed to be more useful in determining the overall
performance of a service than in determining the expected performance as seen by
users of the service.</p>
      <p>Three open-source systems that were known to be able to reasonably load Wikidata RDF dumps
and run SPARQL queries on them were selected. These systems are MillenniumDB [11], QLever, and
Virtuoso Open Source [13]. Three existing Wikidata benchmarks were selected and a new benchmark
based on Scholia [14] was created. An October 2024 RDF dump of Wikidata was loaded into each of the
modern engines and Blazegraph. The benchmarks were run on all four engines and their performance
is reported and analyzed here. More information about the benchmarking, including the benchmarks
and all code used, is available at https://github.com/wikius/benchmark-wikidata.</p>
      <p>The closest third-party study of SPARQL engines on Wikidata was performed by Lam et al. [15]. They
tested the query performance of several SPARQL engines, including an earlier version of QLever, on
Wikidata, using 328 sample queries. This early version of QLever performed poorly in their testing.
They did not test any of the other performant open-source SPARQL engines and QLever has undergone
major improvements since their study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Wikidata in RDF</title>
      <p>There is an encoding of Wikidata into RDF, and RDF dumps of Wikidata are made weekly. There are
two different kinds of dumps. One kind includes only truthy statements (triples), that is, statements
without their qualifiers and other information, no deprecated statements, and normal rank statements
only if there is no preferred rank statement for the same subject and predicate. The other kind of dump
is a full dump that has both truthy statements and a complex encoding of all statements that includes
the rank, qualifiers, and other information about each statement. As of October 2024, the full dumps of
Wikidata had about 20 billion triples. The full dumps in Turtle [16] were over 100GB compressed and
over 850GB uncompressed.</p>
      <p>There are public services that evaluate SPARQL queries against full dumps of Wikidata for all four
of the SPARQL engines selected. The official service, which uses Blazegraph, has the most up-to-date
information, generally lagging by only a few seconds as updates to Wikidata are processed and then
incorporated into its RDF graph. The QLever service uses similar information and also lags only slightly.
The QLever service lags by around a week, as it can process the weekly dumps in well under a day. The
MillenniumDB service uses the weekly dumps and thus lags by somewhat over a week. The data used
by the Virtuoso service is only updated irregularly and can lag by months.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The Benchmarks</title>
      <p>Three existing benchmarks were selected. These were chosen to provide a varied set of queries with
different selection criteria and difficulty.</p>
      <p>WGPB [17] consists of 50 instantiations of 17 simple<sup>1</sup> query patterns. A pattern is, in essence, a
small graph whose nodes are shared variables in a set of SPARQL BGPs. Each pattern is instantiated by
picking Wikidata properties for each edge and constructing the BGPs, which are then expanded into a
full SPARQL query. Finally, a LIMIT 1000 is added, resulting in 850 SPARQL queries.</p>
      <p><sup>1</sup> A simple query here is one with only one SPARQL construct or a small number of similar SPARQL constructs. A complex
query has several different SPARQL constructs.</p>
      <p>Figure 1: Part of Scholia Page for Richard Feynman (Q39246)</p>
      <p>WDBench [18] consists of query fragments from the anonymized Wikidata SPARQL Logs<sup>2</sup> evaluated
by the Wikidata Query Service in 2017 and 2018 [19]. The queries used were chosen from those that
had timed out. The BGPs, property paths, and some other portions of the queries were extracted and
categorized into those with a single BGP (280 queries), those with multiple BGPs (681 queries), those
with OPTIONAL clauses (498 queries), those with property paths (660 queries), and those that did not
fit into any of the above categories (539 queries). Each of these five sets of query fragments is treated
as a single benchmark here.</p>
      <p>These query fragments have to be expanded into full queries by adding the SELECT portion. The
query fragments do not retain FILTER or other limits on the size of the answer set and can return
very large answer sets. The original benchmarking thus added a LIMIT 100000 to limit the number
of answers. To stress modern SPARQL engines, this benchmarking instead uses LIMIT 10000000, an
arbitrarily chosen larger value.</p>
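      <p>The expansion of a fragment into a full query can be sketched as follows. This is only an illustration and not the code used in the benchmarking: the function name expand_fragment is hypothetical, projecting every variable with SELECT * is an assumption, and prefix declarations are omitted.</p>
      <preformat><![CDATA[
```python
def expand_fragment(fragment: str, limit: int = 10_000_000) -> str:
    """Wrap a WDBench-style query fragment (the body of a group graph
    pattern) in a full SELECT query with a LIMIT clause.

    Assumption: projecting every variable with SELECT * is acceptable.
    """
    body = fragment.strip()
    return f"SELECT * WHERE {{\n  {body}\n}}\nLIMIT {limit}"


if __name__ == "__main__":
    # Hypothetical fragment, for illustration only.
    print(expand_fragment("?item wdt:P31 wd:Q5 ."))
```
]]></preformat>
      <p>Passing limit=100000 to this sketch gives the form used in the original WDBench benchmarking.</p>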
      <p>WDQS [12] consists of a set of 298 queries extracted from Wikidata Query Service logs. This
benchmark was used to evaluate the comparative performance of several SPARQL engines. Several
of the queries return hundreds of millions of answers. For these a LIMIT 10000000 is added to the
query here.</p>
      <p>A new benchmark was created from the queries used by the Scholia [14] interface to Wikidata.
This interface is designed to show information related to scholarly articles. A request for Scholia
information is in the form of one, usually, but sometimes more, Wikidata identifiers. The class(es) of
these identifier(s) in Wikidata are determined and a template HTML document is selected based on the
class(es). There is a default template if there is no template specifically for the type(s). The template
document has sections that are replaced by information constructed from the results of SPARQL queries
constructed by inserting the identifier(s) in a query template.</p>
      <p>For example, a Scholia request for Wikidata identifier Q39246, the item for Richard Feynman, would
query Wikidata to find that the item with this identifier is a human and use the author document
template to determine what queries to construct and how to create the HTML document partly shown
in Figure 1.</p>
      <p>Some of the Scholia queries are difficult for the Wikidata Query Service to evaluate and these queries
time out, resulting in documents with errors in them. Further, running these difficult queries puts a
significant load on the Wikidata Query Service. The group maintaining Scholia is thus interested in
determining whether a different SPARQL engine would do better.</p>
      <p>The advantage of using Scholia to construct a benchmark is that many queries can be constructed
from the templates. However, there are only about 375 query templates, and some of the templates are
similar to each other, so there is not a wide variety of different queries. Another problem with using
Scholia query templates for benchmarking is that they use extensions to SPARQL that are specific to
Blazegraph.</p>
      <p>The Scholia benchmark was constructed by determining the query templates for 33 different classes.
The query templates were then turned into standard SPARQL by expanding named queries, replacing the
Wikidata Label Service with query fragments to determine English-language labels, and making a few
other, minor modifications. For each of these classes, five items belonging to the class were determined.
In a few cases these items were selected by hand, but in most cases the items are the first five answers
to a query that returned instances of the class that had values for properties used in one or more of the
query templates.</p>
      <p>Some of the templates are complex. For example, here is a query template for the author document
after conversion to standard SPARQL, edited to present better. The target: prefix is instantiated with
the URL for the Wikidata identifier being used.</p>
      <preformat>SELECT ?year (count(?work) AS ?numb_of_publs) ?role WHERE {
  { SELECT (str(?year_) AS ?year) (0 AS ?pp) ("_" AS ?role) WHERE {
      ?year_item wdt:P31 wd:Q577 .</preformat>
      <p><sup>2</sup> The logs of the Wikidata Query Service are considered to be private as they might contain personally-identifying information,
so constructing public benchmarks from them is not easy.</p>
      <p>A few queries returned large answer sets, which is not useful when constructing the final document,
so LIMIT clauses were added. A few queries had errors, which caused them to return incorrect answer
sets, and were fixed. All these changes were sent to the Scholia repository and have been incorporated
into it.</p>
      <p>A query run then consists of instantiating each query template with each item and evaluating the
resultant query.</p>
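      <p>A query run for one class can be sketched as below. This is a minimal illustration under stated assumptions, not the code used for the benchmark: the function names instantiate and query_run are hypothetical, and binding the target: prefix by prepending a PREFIX declaration is one plausible reading of how the prefix is instantiated. The entity namespace is the standard Wikidata one.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Wikidata entity namespace; items such as Q39246 live under it.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def instantiate(template: str, qid: str) -> str:
    """Bind the target: prefix to one Wikidata item, so that the bare
    prefixed name target: in the template denotes that item.
    Assumption: the template does not already declare target:."""
    return f"PREFIX target: <{WIKIDATA_ENTITY}{qid}>\n{template}"


def query_run(templates: dict[str, str], items: list[str]):
    """Yield (template name, item, query) for every template/item pair
    of a single class."""
    for (name, template), qid in product(templates.items(), items):
        yield name, qid, instantiate(template, qid)
```
]]></preformat>
      <p>With the templates of one class and its five items, query_run yields five queries per template, in template order.</p>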
    </sec>
    <sec id="sec-4">
      <title>4. Running the Benchmarks</title>
      <p>The benchmarks were all run on a machine with a Ryzen 9950X CPU, 192GB of main memory, and
fast NVMe SSD drives running the Fedora Linux distribution. MillenniumDB, QLever, and Virtuoso
were downloaded from their open-source repositories, using the version current as of 05 March 2025
for MillenniumDB, 22 March 2025 for QLever, and 19 March 2025 for Virtuoso.</p>
      <p>They were compiled using scripts from the repositories. Blazegraph is run from a Docker image for
the current version of Blazegraph because of issues with Java. This may slow down Blazegraph by up
to 10%, but probably only slows Blazegraph down a few percent. This possible penalty does not affect
the main conclusions of the evaluation.</p>
      <p>Wikidata RDF dumps from late October 2024 were loaded into all four engines using settings
determined in consultation with developers where possible. Loading was relatively easy for MillenniumDB,
QLever, and Virtuoso and took less than a day for each, with QLever being fastest at about 4.5 hours.
Loading the dumps into Blazegraph took over 10 days and the first try failed, probably due to a bug
related to concurrent access to some data. As loading into Blazegraph was difficult, no attempt was
made to use newer dumps of Wikidata.</p>
      <p>Settings for the engines during benchmarking were determined in consultation with developers
where possible and set up so that about 3/4 of main memory was used by the engine. This is more
memory than is commonly allowed in the public Wikidata services but was chosen to better reflect
expected memory growth in the near future. The engines are allowed to use multiple threads, but all
except Blazegraph are only lightly threaded when querying.</p>
      <p>Each query is run with a 10-minute timeout. This is larger than most public Wikidata services, which
generally use a 1-minute timeout, and was chosen to see behavior of the engines on a longer timeframe
and to provide some indication about behavior in future with faster computers.</p>
      <p>Each benchmark run is performed from a cold start, with system caches emptied, and timed after
any startup done by the engine. This means that any engine that defers startup until the first query is
evaluated will be slightly penalized. No engine spends more than a few seconds on startup and almost
all runs took multiple minutes or even hours so the penalty is insignificant. This also means that any
adaptation by the engine to the data in Wikidata or normal queries is considered to be part of the
benchmark timing.</p>
      <p>Then the multiple queries in each benchmark run are evaluated in succession, with no attempt to
clear any cached information between queries. The input and output formats were the same for each
engine. The benchmark runs, with the exception of the Scholia benchmark, had hundreds of queries.
This much better simulates the situation with a query service than attempting to remove caches.</p>
      <p>The controlling program is run on the same computer as the engine. It generally took minimal
resources, except when the queries return very large answer sets and receiving the answer set takes some
resources on the computer. The processing power required for this does not impact the benchmarking
as there are always many threads unused. The memory taken to store the result does have some impact,
competing for main memory with the system disk cache. Running the controlling program on the same
computer as the engines, however, eliminates the overhead in both time and memory to send the results
to a different computer. This overhead can be considerable, even when both computers are in the same
local network, so running the controlling program on the same computer was deemed better.</p>
      <p>The controlling program records the elapsed time between sending the query to the engine and
receiving the answers from the engine. This includes any time to transmit the information between the
controlling program and the engine, but not all engines provide internal timing information. If this time
is longer than the maximum time, the query evaluation is determined to have timed out. The output
from the engine is checked for any reported errors. For each successful query one piece of information
about the answer set is recorded. For queries with multiple or no answers the number of answers is
recorded. For queries with one answer the value of the first variable in the query is recorded.</p>
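      <p>The timing and recording rules above can be sketched as follows. This is an illustration only: the send parameter stands for a hypothetical function that submits a query to an engine and returns the variables and answer rows, raising an exception on a reported error, and the result layout is an assumption rather than the format used by the actual controlling program.</p>
      <preformat><![CDATA[
```python
import time
from typing import Callable, Optional

TIMEOUT_SECONDS = 600  # the 10-minute limit used in the benchmarking


def summarize(variables: list[str], rows: list[dict]) -> object:
    """One piece of information per answer set: the value of the first
    variable for a single answer, otherwise the number of answers."""
    if len(rows) == 1 and variables:
        return rows[0].get(variables[0])
    return len(rows)


def run_query(query: str,
              send: Callable[[str], tuple[list[str], list[dict]]],
              clock: Callable[[], float] = time.monotonic) -> dict:
    """Time one query from sending it to receiving all of its answers.
    `send` is hypothetical: it returns (variables, rows) or raises."""
    start = clock()
    error: Optional[str] = None
    info = None
    try:
        variables, rows = send(query)
        info = summarize(variables, rows)
    except Exception as exc:  # an error reported by the engine
        error = str(exc)
    elapsed = clock() - start
    timed_out = elapsed > TIMEOUT_SECONDS
    return {"elapsed": elapsed, "timeout": timed_out,
            "error": None if timed_out else error,
            "info": None if (timed_out or error) else info}
```
]]></preformat>
      <p>The clock parameter exists only so that the sketch can be exercised with a fake clock; a real run would use the default monotonic clock.</p>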
      <p>The benchmarking process lasted from late October 2024 to late March 2025. Benchmarks were
run multiple times to remove problems in the early runs and as new versions of some of the engines
were made available. Initial results of the benchmarks were made public at
https://www.wikidata.org/wiki/Wikidata:Scaling_Wikidata/Benchmarking, and newly discovered bugs
and anomalies were communicated to the teams responsible for the engine involved, resulting in new
versions of both QLever and MillenniumDB being available. The results here are for the latest runs for
each engine.</p>
      <p>Each set of queries for the existing benchmarks was run three times: once as described above, once
with the query modified to only return the count of the number of answers, and once with the query
modified to return only distinct answers. The second run was performed to eliminate the overhead
of transmitting large answer sets. The third run was performed to help see how many times Virtuoso
returned incorrect answer sets for transitive path queries. The Scholia benchmark queries were only
run unmodified, after the changes described above, as most of them only returned a few answers with
no duplicates. In a few cases the engine terminated when evaluating a query. These cases are marked
and the engine restarted with the next query.</p>
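      <p>The two modified forms of a query can be sketched as below. How the queries were actually modified is not described here, so this is only one plausible approach: the function names are hypothetical, the count form wraps the original query as a SPARQL 1.1 subquery, and both functions assume a SELECT query with no PREFIX prologue.</p>
      <preformat><![CDATA[
```python
import re

# Outermost SELECT that is not already DISTINCT or REDUCED.
_SELECT = re.compile(r"^\s*SELECT\b(?!\s+(?:DISTINCT|REDUCED)\b)", re.IGNORECASE)


def count_variant(query: str) -> str:
    """Return a query that reports only the number of answers by wrapping
    the original SELECT query, including any LIMIT, as a subquery."""
    return f"SELECT (COUNT(*) AS ?count) WHERE {{\n  {{ {query.strip()} }}\n}}"


def distinct_variant(query: str) -> str:
    """Return the query with DISTINCT added to its outermost SELECT.
    Queries that already use DISTINCT or REDUCED are left unchanged."""
    return _SELECT.sub(lambda m: m.group(0) + " DISTINCT", query, count=1)
```
]]></preformat>
      <p>Because the LIMIT stays inside the subquery, the count form still never counts more answers than the unmodified query could return.</p>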
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>For each engine the results of each set of queries were analyzed to compute the minimum and each
quartile elapsed times, the mean elapsed time, the number of timeouts, the number of errors encountered,
and the number of times the retained answer information diverges from a single mode for the four
engines. The arithmetic mean is used to show how the queries would consume time on servers as
opposed to showing expectations by users, where geometric means are normally used.</p>
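      <p>The timing statistics can be computed along the following lines. This is a sketch using the Python standard library; the function names are hypothetical and the quartile method (inclusive interpolation) is an assumption, as the method used for the reported quartiles is not stated.</p>
      <preformat><![CDATA[
```python
import statistics


def timing_summary(times_ms: list[float]) -> dict:
    """Minimum, quartiles, maximum and arithmetic mean of elapsed times."""
    q1, q2, q3 = statistics.quantiles(times_ms, n=4, method="inclusive")
    return {"count": len(times_ms), "min": min(times_ms), "q1": q1,
            "q2": q2, "q3": q3, "max": max(times_ms),
            "mean": statistics.fmean(times_ms)}


def means(times_ms: list[float]) -> tuple[float, float]:
    """Arithmetic and geometric mean; the latter needs positive times."""
    return statistics.fmean(times_ms), statistics.geometric_mean(times_ms)
```
]]></preformat>
      <p>For three hypothetical queries taking 1, 100, and 10000 milliseconds, the arithmetic mean is 3367 milliseconds while the geometric mean is about 100 milliseconds. The arithmetic mean is dominated by the slowest query, which is why it better reflects the time consumed on a server.</p>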
      <p>As well, adjusted statistics were computed, where elapsed time is capped at 60 seconds, with times
at least this long counting as a timeout, and any error counted as 60 seconds. This adjusted time is
computed mostly to penalize engines that had many errors, but also to more closely mirror times in
current public services.</p>
      <table-wrap>
        <label>Table 1</label>
        <caption>
          <title>WDQS Benchmark Statistics (times in milliseconds)</title>
        </caption>
        <table>
          <thead>
            <tr><th>Timings</th><th>Engine</th><th>Count</th><th>min</th><th>q1</th><th>q2</th><th>q3</th><th>max</th><th>Mean</th><th>Error</th><th>Timeout</th><th>Diverge</th></tr>
          </thead>
          <tbody>
            <tr><td>Unadjusted</td><td>Blazegraph</td><td>298</td><td>11</td><td>88</td><td>511</td><td>6155</td><td>600018</td><td>12560</td><td>31</td><td>1</td><td>14</td></tr>
            <tr><td>Unadjusted</td><td>MillenniumDB</td><td>298</td><td>1</td><td>31</td><td>588</td><td>24176</td><td>602338</td><td>103271</td><td>0</td><td>43</td><td>5</td></tr>
            <tr><td>Unadjusted</td><td>QLever</td><td>298</td><td>2</td><td>26</td><td>103</td><td>559</td><td>301655</td><td>4583</td><td>3</td><td>0</td><td>12</td></tr>
            <tr><td>Unadjusted</td><td>Virtuoso</td><td>298</td><td>1</td><td>68</td><td>461</td><td>3264</td><td>600754</td><td>14645</td><td>13</td><td>2</td><td>30</td></tr>
            <tr><td>Adjusted</td><td>Blazegraph</td><td>298</td><td>17</td><td>89</td><td>520</td><td>6403</td><td>60000</td><td>10236</td><td>21</td><td>16</td><td>13</td></tr>
            <tr><td>Adjusted</td><td>MillenniumDB</td><td>298</td><td>1</td><td>31</td><td>588</td><td>24176</td><td>60000</td><td>15482</td><td>0</td><td>64</td><td>3</td></tr>
            <tr><td>Adjusted</td><td>QLever</td><td>298</td><td>2</td><td>26</td><td>103</td><td>559</td><td>60000</td><td>2290</td><td>1</td><td>6</td><td>12</td></tr>
            <tr><td>Adjusted</td><td>Virtuoso</td><td>298</td><td>2</td><td>80</td><td>577</td><td>4503</td><td>60000</td><td>8506</td><td>12</td><td>13</td><td>28</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-14">
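        <p>The adjustment of timings can be sketched as follows. This is a minimal illustration of the rules for adjusted statistics; the record layout with the keys elapsed_ms and error is hypothetical, and treating an error as an error rather than as a timeout in the adjusted counts is an assumption.</p>
        <preformat><![CDATA[
```python
CAP_MS = 60_000  # the 60-second cap, in milliseconds


def adjust(results: list[dict]) -> list[dict]:
    """Apply the adjusted-timing rules to a list of per-query results.

    Each result is assumed to be a dict with the keys "elapsed_ms" and
    "error"; this layout is hypothetical.
    """
    adjusted = []
    for r in results:
        if r["error"]:
            # Any error is charged the full 60 seconds.
            adjusted.append({"elapsed_ms": CAP_MS, "error": True,
                             "timeout": False})
        else:
            # Times of at least 60 seconds are capped and count as timeouts.
            adjusted.append({"elapsed_ms": min(r["elapsed_ms"], CAP_MS),
                             "error": False,
                             "timeout": r["elapsed_ms"] >= CAP_MS})
    return adjusted
```
]]></preformat>
        <p>The adjusted mean is then the arithmetic mean of the adjusted elapsed times, so an engine with many errors is penalized even if its successful queries are fast.</p>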
        <p>The statistics for all three variations of the WDQS benchmark with both unadjusted and adjusted
timings are shown in Table 1. On this benchmark QLever is significantly the fastest for all three
variations, no matter whether the timings are adjusted or not. The relative difference in speed between
QLever and MillenniumDB, the slowest engine, is about 25 times for unadjusted timings and about 7
times for adjusted timings. QLever never takes the full 600 seconds for any query and only takes more
than 60 seconds for a few, whereas MillenniumDB times out on about 1 in 7 queries.</p>
        <p>Blazegraph has quite a few errors on this benchmark, mostly due to running out of memory. Virtuoso
has a few errors, mostly due to refusal to evaluate the query because of high estimated times, incorrect
syntax processing, or issues with transitive paths. The QLever errors are due to running out of memory.
Each engine diverges from a common mode in a few cases. Virtuoso diverges the most, mostly due to
invalid duplicates from transitive paths. Most of the divergences for MillenniumDB appear to be from
a bug in embedded query processing. The divergences for Blazegraph appear to be mostly from the
Blazegraph loading process removing some triples related to Wikidata labels. Some divergences, and
most of the QLever divergences, are due to extra processing of numeric and GeoSPARQL values.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Summarization and Analysis</title>
      <p>These statistics were further processed to produce summaries, removing some of the statistical
information to show the combined performance of each engine on the benchmarks, with the five components
of WDBench shown separately. This allows the timings and issues for each engine to be shown in a
smaller format. For the existing benchmarks six blocks of information are generated: adjusted and
unadjusted for each way the benchmarks have been run. This information is shown in Tables 2 and 3.
For the Scholia benchmark, both unadjusted and adjusted information is again shown in Table 4, but
only for some of the query classes.</p>
      <p>Existing benchmarks. The summaries for the existing benchmarks show that all the engines have
divergences from a single mode, likely indicating deviations from the SPARQL standard. Penalizing for
divergences was not done, because it was not always certain that divergences are incorrect answers. The
detailed results were examined and some queries run outside the benchmarking process to determine
some reasons for these divergences.</p>
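      <p>Counting divergences from a single mode can be sketched as below. The function name is hypothetical and the handling of ties is an assumption, since the treatment of queries without a unique most common value is not described here.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def divergences(recorded: dict[str, object]) -> set[str]:
    """Given the recorded answer information for one query, keyed by
    engine, return the engines whose value differs from the mode.

    Assumption: when there is no unique most common value, no engine is
    counted as diverging.
    """
    counts = Counter(recorded.values()).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return set()
    mode = counts[0][0]
    return {engine for engine, value in recorded.items() if value != mode}
```
]]></preformat>
      <p>For example, if three engines record 10 answers for a query and Virtuoso records 12, only Virtuoso is counted as diverging. A divergence identifies the odd one out, not necessarily an incorrect answer.</p>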
      <p>The large number of divergences for Virtuoso is mostly due to two known issues. Virtuoso returns
duplicates from transitive path matching where the standard requires no duplicates. Virtuoso also
silently only produces at most 1048576 answers for any query. In the WDBench benchmarks this
produces over one thousand divergences, with the second cause producing over 60% of the divergences,
as shown by the statistics for when only counts are returned. This large number of divergences should
be taken into account when considering Virtuoso.</p>
      <p>The divergences for MillenniumDB appear to be mostly due to not returning duplicates for alternatives
in property paths. Other divergences for MillenniumDB appear to be from a bug in embedded query
processing.</p>
      <p>Many divergences for Blazegraph are from the Blazegraph loading process removing some triples
related to Wikidata labels, and are thus not a problem with Blazegraph itself. Other divergences for
Blazegraph come from an incorrect ordering of DISTINCT and LIMIT processing.</p>
      <p>QLever and possibly other engines transform numeric RDF literals into internal data, which does not
conform to the RDF and SPARQL standards. For example, "1"^^xsd:integer and "01"^^xsd:integer
incorrectly become the same RDF node. This causes the majority of the divergences for QLever.
GeoSPARQL datatypes were also a source of divergences.</p>
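      <p>The difference between the two literals can be illustrated with a small sketch. Literals are modelled here as pairs of lexical form and datatype IRI, which is a simplification that ignores language tags; the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

a = ("1", XSD_INTEGER)   # "1"^^xsd:integer as (lexical form, datatype IRI)
b = ("01", XSD_INTEGER)  # "01"^^xsd:integer


def same_term(x: tuple[str, str], y: tuple[str, str]) -> bool:
    """RDF term equality: lexical form and datatype must both match."""
    return x == y


def same_value(x: tuple[str, str], y: tuple[str, str]) -> bool:
    """Value equality for two xsd:integer literals."""
    return x[1] == y[1] == XSD_INTEGER and int(x[0]) == int(y[0])
```
]]></preformat>
      <p>Here same_value(a, b) holds but same_term(a, b) does not. An engine that stores only the numeric value can no longer tell the two terms apart, which can change the number of answers it reports.</p>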
      <p>The summaries for the existing benchmarks also show a considerable number of errors, so penalizing
the engines for errors is appropriate.</p>
      <p>Most of the errors for QLever result from running out of memory. QLever query processing appears
to trade off space for time, and QLever can request large amounts of memory for queries, thus running
out of space. Optional clauses and requiring results to be distinct appear to affect this tradeoff, so much
that QLever runs out of memory very often when either of these constructs is present and the resultant
penalty in the adjusted timings is significant.</p>
      <p>Blazegraph also often runs out of memory, with a significant resultant penalty. Blazegraph also
regularly reports errors in access to its internal data structures.</p>
      <p>Virtuoso first estimates the time it would take to evaluate a query and refuses to run the query if this
estimate is out of bounds. Unfortunately, the query estimator regularly produces unbelievable estimates,
resulting in Virtuoso frequently refusing to run a query. The penalty for these errors is significant.</p>
      <p>MillenniumDB and QLever are not complete implementations of SPARQL and a few queries contain
constructs that they do not handle. Virtuoso also reports a few queries that it cannot handle, mostly
relating to transitive property paths. MillenniumDB has very few errors overall. This does need to be
balanced against the large number of timeouts for MillenniumDB.</p>
      <p>The timings show QLever as the fastest engine for the existing benchmarks, except for the versions
with distinct results where Virtuoso is fastest. Otherwise Virtuoso is the second-fastest, but this needs
to be balanced with the large number of divergences for Virtuoso.</p>
      <p>In unadjusted timings, MillenniumDB is the slowest overall. MillenniumDB is fast when there are
simple queries or limited answers but is very slow when there are complex queries (WDBench others
and WDQS). It thus appears that MillenniumDB is speedy on atomic operations but does not do a good
job of producing good query plans for complex queries. When timings are adjusted to account for
errors, Blazegraph is the slowest by a large margin over QLever and Virtuoso.</p>
      <p>[Tables 2 and 3: summaries for the existing benchmarks, with unadjusted and adjusted timings for each way the benchmarks were run.]</p>
      <p>[Table 4: summary for the Scholia benchmark, with unadjusted and adjusted timings.]</p>
      <p>Scholia benchmark. The Scholia benchmark also shows the need to adjust timings to account for
errors. In the unadjusted timings Virtuoso is the fastest, but it has the most errors. When timings are
adjusted, Virtuoso sinks to third and QLever is fastest by a ratio of about two-thirds over Blazegraph.
MillenniumDB is the slowest on this benchmark, timing out on many of the queries, and is about 2.7
times slower than QLever. The slowness of MillenniumDB is likely due to the complex queries in the
benchmark.</p>
      <p>Almost all of the 145 errors for Blazegraph in the Scholia benchmark are due to running out of
memory. MillenniumDB produces no output for its 45 errors, so the cause cannot be determined with
certainty, but most of them are likely due to unrecognized answers from service calls. Close to half of
the 93 errors for QLever are due to running out of memory, with most of the rest due to unrecognized
answers from service calls and a few due to unimplemented syntax. Of the 221 errors for Virtuoso, over
half are due to unimplemented syntax and most of the rest are due to high estimated execution times,
most of which are excessive or abnormal.</p>
      <p>There are some divergences in the answers from the engines. As before, Virtuoso has the most
divergences, with Blazegraph having the fewest. The reason for most of these divergences is unknown
due to the complex nature of the queries. Some divergences appear to be due to the reasons identified
above.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Summary and Recommendation</title>
      <p>QLever is the fastest engine overall, but is slower for distinct answers. Virtuoso is fast but diverges
the most by far, mostly due to several known causes. MillenniumDB and Blazegraph are the slowest.
MillenniumDB is fast on simple queries, but slow on complex queries.</p>
      <p>None of the engines are free of errors or divergences, even Blazegraph. That Blazegraph has
divergences is a bit surprising because Blazegraph was in use for the official Wikidata Query Service while
it was still being maintained. Both QLever and MillenniumDB are under active development, which
should improve their performance and reduce their errors and divergences.</p>
      <p>From the results in these benchmarks, a Wikidata Query Service based on QLever would be
significantly faster and produce more answers and fewer errors than one based on Blazegraph. QLever
now appears to be a viable replacement for Blazegraph in the official Wikidata Query Service as it has
recently been extended to allow its RDF graph to be updated while it is running.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author has not employed any Generative AI tools in the work reported on in this paper nor in the
preparation of this paper.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This work was partly supported by a grant from Wikimedia Switzerland.</p>
      <p>[4] SPARQL, SPARQL 1.1 query language, W3C Recommendation, https://www.w3.org/TR/sparql11-query/, 2013.</p>
      <p>[5] R. Cyganiak, D. Wood, M. Lanthaler, RDF 1.1 concepts and abstract syntax, W3C Recommendation, https://www.w3.org/TR/rdf11-concepts/, 2014.</p>
      <p>[6] Wikidata:RDF, Wikidata:RDF, https://www.wikidata.org/wiki/Wikidata:RDF, 2025. Accessed 30 April 2025.</p>
      <p>[7] Blazegraph, Welcome to Blazegraph, blazegraph.com, 2020. Accessed 23 July 2024.</p>
      <p>[8] H. Bast, B. Buchhold, QLever: A query engine for efficient SPARQL+text search, in: CIKM ’17: ACM Conference on Information and Knowledge Management, 2017.</p>
      <p>[9] G. Lederrey, L. Pintscher, D. Causse, Wikidata query service: Where are we? Where is it going?, Data Reuse Days 2025, https://docs.google.com/presentation/d/1DHxnjkZKwly9AKONOJtvfTk6ls6DBw1Ab6gHdODM5XA, 2025.</p>
      <p>[10] Wikidata SPARQL Query Service Backend Update, Wikidata SPARQL query service backend update, https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update, 2025. Accessed 30 April 2025.</p>
      <p>[11] D. Vrgoč, C. Rojas, R. Angles, M. Arenas, D. Arroyuelo, C. Buil-Aranda, A. Hogan, G. Navarro, C. Riveros, J. Romero, MillenniumDB: An open-source graph database system, Data Intelligence 5 (2023).</p>
      <p>[12] H. Bast, QLever performance evaluation and comparison to other SPARQL engines, https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines, 2025. Accessed 8 May 2025.</p>
      <p>[13] Virtuoso, Virtuoso open-source edition, https://vos.openlinksw.com/owiki/wiki/VOS, 2024. Accessed 30 April 2025.</p>
      <p>[14] F. Å. Nielsen, D. Mietchen, E. Willighagen, Scholia and scientometrics with Wikidata, in: Scientometrics 2017, 2017, pp. 237–259. URL: https://arxiv.org/pdf/1703.04222.</p>
      <p>[15] A. N. Lam, B. Elvesaeter, F. Martin-Recuerda, in: The Semantic Web: 20th International Conference, ESWC 2023, 2023, pp. 679–696. doi:10.1007/978-3-031-33455-9_40.</p>
      <p>[16] RDF 1.1 Turtle, RDF 1.1 Turtle, W3C Recommendation, https://www.w3.org/TR/turtle/, 2014.</p>
      <p>[17] A. Hogan, C. Riveros, C. Rojas, A. Soto, A worst-case optimal join algorithm for SPARQL, in: Proceedings of the 18th International Semantic Web Conference (ISWC), 2019.</p>
      <p>[18] R. Angles, C. B. Aranda, A. Hogan, C. Rojas, D. Vrgoč, WDBench: A Wikidata graph query benchmark, in: U. Sattler, A. Hogan, M. Keet, V. Presutti, J. P. A. Almeida, H. Takeda, P. Monnin, G. Pirrò, C. d’Amato (Eds.), The Semantic Web – ISWC 2022, Springer, 2022, pp. 714–731.</p>
      <p>[19] S. Malyshev, M. Krötzsch, L. González, J. Gonsior, A. Bielefeldt, Getting the most out of Wikidata: Semantic technology usage in Wikipedia’s knowledge graph, in: D. Vrandečić, K. Bontcheva, M. C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou, L.-A. Kaffee, E. Simperl (Eds.), Proceedings of the 17th International Semantic Web Conference (ISWC’18), Springer, 2018, pp. 376–394.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Wikidata</surname>
          </string-name>
          , Wikidata main page, https://www.wikidata.org/wiki/Wikidata:Main_Page,
          <year>2025</year>
          . Accessed 30 April 2025.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Wikimedia</given-names>
            <surname>Deutschland</surname>
          </string-name>
          , Wikibase, https://wikiba.se/,
          <year>2025</year>
          . Accessed 30 April 2025.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>