Getting and hosting your own copy of Wikidata
Wolfgang Fahl1 , Tim Holzheim1 , Andrea Westerinen2 , Christoph Lange1,3 and
Stefan Decker1,3
1 RWTH Aachen University, Computer Science i5, Aachen, Germany
2 OntoInsights LLC, Elkton, MD, United States
3 Fraunhofer FIT, Sankt Augustin, Germany


Abstract
Wikidata is a very large, crowd-sourced, general knowledge graph that is backed by a worldwide
community. Its original purpose was to link different versions of Wikipedia articles across multiple
languages. Access to Wikidata is provided by the non-profit Wikimedia Foundation and recently also by
Wikimedia Enterprise as a commercial service. The query access via the public Wikidata Query Service
(WDQS) has limits that make larger queries with millions of results next to impossible, due to a
one-minute timeout restriction. Beyond addressing the timeout restriction, hosting a copy of Wikidata may
be desirable in order to have a more reliable service, quicker response times, less user load, and better
control over the infrastructure.
   It is not easy, but it is possible to get and host your own copy of Wikidata. The data and software
needed to run a complete Wikidata instance are available as open source or accessible via free licenses.
   In this paper, we report on both successful and failed attempts to get and host your own copy of
Wikidata, using different triple store servers.
   We share recommendations for the needed hardware and software, provide documented scripts to
semi-automate the procedures, and document things to avoid.

Keywords
Wikidata, RDF Bulk Import, SPARQL




1. Introduction
In recent years Wikidata [1] has gained increasing visibility and value as an open knowledge
graph. Accessing its information is constrained, however, since the resources behind the
Wikidata Query Service (WDQS)1 are limited and used by many people and services (such as
bots that scan or update Wikidata automatically). In addition, due to the size of Wikidata (∼17B
triples), even a simple query can hit the WDQS query timeout limit2 .
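   As an illustration, the following minimal sketch (Python with the requests library) sends an
aggregation over all instance-of (P31) statements to the public WDQS endpoint; such a query can
easily exceed the one-minute server-side timeout. The endpoint URL is the public service, the wdt:
prefix is predefined by WDQS, and the User-Agent string is a placeholder.

import requests

WDQS = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?class (COUNT(?item) AS ?count) WHERE {
  ?item wdt:P31 ?class .            # billions of instance-of statements
}
GROUP BY ?class
ORDER BY DESC(?count)
"""

try:
    response = requests.get(
        WDQS,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wikidata-copy-experiments/0.1 (example)"},
        timeout=120,  # client-side limit; WDQS itself cuts queries off after 60 s
    )
    response.raise_for_status()
    print(len(response.json()["results"]["bindings"]), "classes returned")
except requests.exceptions.RequestException as error:
    # Long-running queries are typically answered with an HTTP error whose body
    # mentions a timeout; on a self-hosted copy the query can simply run on.
    print("query rejected or timed out:", error)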
   Public alternatives to WDQS are (to our knowledge) rare. We only know of the three endpoints
shown in Table 1, and these alternatives have their own limitations. The QLever3 Wikidata
endpoint still has limitations in the SPARQL language features that it accepts. The Virtuoso4

Wikidata’22: Wikidata workshop at ISWC 2022
ORCID: 0000-0002-0821-6995 (W. Fahl); 0000-0003-2533-6363 (T. Holzheim); 0000-0002-8589-5573 (A. Westerinen);
0000-0001-9879-3827 (C. Lange); 0000-0001-6324-7164 (S. Decker)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (http://ceur-ws.org, ISSN 1613-0073)




                  1
                    https://query.wikidata.org/
                  2
                    https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_limits
                  3
                    https://qlever.cs.uni-freiburg.de/wikidata
                  4
                    https://wikidata.demo.openlinksw.com/sparql
Table 1
Public Wikidata query endpoints
           Provider                    Triple Store    URL
           University of Freiburg      QLever          https://qlever.cs.uni-freiburg.de/wikidata
           OpenLink Software           Virtuoso        https://wikidata.demo.openlinksw.com/sparql
           Wikimedia Foundation        Blazegraph      https://query.wikidata.org/


endpoint provided by OpenLink Software has data that is many months old.
   A key factor in deciding whether to get and host your own copy of Wikidata is the cost-benefit
relation. The cost includes the infrastructure plus the effort to get, host and update the data.
The benefit depends on your use cases and on what Wikidata content they need; see
Section 1.1 below.
   The current public Wikidata Query Service infrastructure uses Blazegraph [2], open-source
software that is no longer in development and that is approaching its scaling limits. As stated in a
Wikimedia Foundation blog entry, "Blazegraph maxing out in size poses the greatest risk for
catastrophic failure, as it would effectively prevent WDQS from being updated further [...]" [3].
The Wikimedia Foundation is therefore currently investigating various stop-gap and replacement
strategies [4]. Having second-source options to host a complete copy of Wikidata in a reproducible
way is an important alternative, especially for the transition phase of the upcoming years.

1.1. Content of Wikidata
Wikidata originally started as a knowledge graph for the international Wikipedia encyclopedias.
From there it evolved into a folksonomy-style [5] general knowledge graph.
   The Wikidata statistics web page5 shows a pie chart of the entity types with the largest
number of instances, but at the time of this writing that data is already two years old. Khatun [6]
provided a table of the top 50 "subgraphs" of Wikidata - which are essentially the entity classes -
and corresponding charts. A more current statistic, based on our Stardog6 import of 2022-06, is
shown in Figure 1.
   If you are interested in a large number of Wikidata items, there is a higher probability that
having your own copy of Wikidata will be beneficial. Our use case in the ConfIDent project [7]
revolves around the entity classes academic conference, scientific event series, scholarly article,
proceedings and author, which constitute a substantial portion of the Wikidata knowledge
graph. Since essential queries for this project were not possible via the publicly available
endpoints, we decided to get and host our own copies of Wikidata.
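   As a concrete example, the sketch below counts the scholarly articles (wd:Q13442814) on a
self-hosted endpoint. The endpoint URL is a placeholder (the default namespace path of a local
Blazegraph/WDQS installation is assumed here), and the prefixes are declared explicitly because
not every triple store predefines them.

import requests

# Assumption: a local Blazegraph/WDQS instance on its default namespace path.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT (COUNT(?article) AS ?count) WHERE {
  ?article wdt:P31 wd:Q13442814 .   # instance of: scholarly article
}
"""

result = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=3600,  # your own copy has no 60 s cap, so allow long-running queries
).json()
print(result["results"]["bindings"][0]["count"]["value"], "scholarly articles")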

1.2. Size of Wikidata as a moving target
Figure 2 shows how the size of Wikidata dumps has grown substantially over the 2018-2022
time frame. The “Triples” column in Table 3 indicates the difference in import size. From the

    5
        https://www.wikidata.org/wiki/Wikidata:Statistics
    6
        https://www.stardog.com/
Figure 1: Wikidata instanceof/P31 statistics 2022-07


first reported success in 2017 to the most recent one in 2022-07, Wikidata has grown by more than
a factor of 5. Therefore the reproducibility and comparability of the results are limited.

[Chart: dump file size in GB over the years 2017-2022 for the nt, ttl and json dumps in bz2 and gz
compression; annotated triple counts grow from 3.0 billion to 9.5, 11.3 and finally 17.2 billion.]
Figure 2: Evolution of the Wikidata dump file size by format and compression
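The numbers behind such a chart can be refreshed at any time from the sizes that the dump server
reports for the "latest" entity dumps; a minimal sketch, assuming the documented URL layout on
dumps.wikimedia.org:

import requests

BASE = "https://dumps.wikimedia.org/wikidatawiki/entities/"
FILES = ["latest-all.ttl.bz2", "latest-all.ttl.gz",
         "latest-all.nt.bz2", "latest-all.nt.gz",
         "latest-all.json.bz2", "latest-all.json.gz"]

for name in FILES:
    # A HEAD request is enough: the Content-Length header carries the file size.
    head = requests.head(BASE + name, allow_redirects=True, timeout=30)
    size_gb = int(head.headers.get("Content-Length", 0)) / 1e9
    print(f"{name:24s} {size_gb:7.1f} GB")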



1.3. Getting and hosting your own copy of Wikidata
If you intend to get and host your own copy of Wikidata, the following aspects need to be
considered:

Reliability of the import - Will you actually get a running system in the end or run into
     some kind of software or hardware limit you didn’t foresee?
Needed Resources - What kind of computer will you need - e.g. how much RAM and disk
    space are needed? What kind and version of operating system are needed? Can the
    machine be hosted virtually, and if so how much will it cost? How much time will the
    import take?

Usefulness of results - How useful will the result be for your use case? Will you get better
     performing queries? Do you need “always current” data? Will your own Wikidata server
     be compatible with the original one? Will the infrastructure be more reliable than the
     public endpoints?

Consistency - How do you keep your copy in sync with Wikidata? How do you integrate the
     Wikidata update stream7 into the knowledge graph, and how do you handle update
     interrupts? Is it possible to publish updates from the copy back to Wikidata? (A minimal
     consumer sketch follows this list.)
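For the consistency aspect, a minimal consumer sketch is shown below. It assumes the Wikimedia
EventStreams service as the update source and merely filters the recent-changes stream for
Wikidata edits; re-fetching the changed items and applying them to the triple store is left out.

import json
import requests

STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

with requests.get(STREAM, stream=True, timeout=60,
                  headers={"Accept": "text/event-stream"}) as events:
    for line in events.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip SSE comments, event/id fields and keep-alives
        change = json.loads(line[len(b"data: "):])
        if change.get("wiki") == "wikidatawiki":
            # here one would re-fetch the item's RDF and update the local store
            print(change["title"], change["type"])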


2. Materials and Methods
2.1. Wikidata copy bulk import procedure
Höfler’s [8] StackExchange question and Malyshev’s [9] "getting started" guide were the basis
of our first attempt to get and host a complete copy of Wikidata in January 2018. The motivation
was to run Apache TinkerPop Gremlin8 queries on Wikidata.9
   Table 2 shows the different triple stores we considered for testing the import and setup
procedure.

General steps for getting a complete Wikidata copy

    • Procure the hardware and software for the indexing and hosting environment (which
      might be two different computers)
    • Download the current Wikidata dump (a download sketch follows this list)
    • Install the triple store software
    • Configure the triple store
    • Optionally preprocess the triples
    • Run the bulk import / indexing procedure
    • Optionally copy the results from the indexing machine to a target machine
    • Start the server
    • Optionally start a separate GUI webserver
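
For the download step, a minimal sketch that streams the compressed dump to disk in chunks, so
that the file (on the order of 100 GB, see Figure 2) never has to fit into RAM; the URL is the "latest"
TTL dump on dumps.wikimedia.org:

import requests

DUMP_URL = ("https://dumps.wikimedia.org/wikidatawiki/entities/"
            "latest-all.ttl.bz2")

with requests.get(DUMP_URL, stream=True, timeout=60) as response:
    response.raise_for_status()
    with open("latest-all.ttl.bz2", "wb") as dump_file:
        for chunk in response.iter_content(chunk_size=16 * 1024 * 1024):
            dump_file.write(chunk)  # 16 MiB chunks keep memory usage flat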




    7
       https://www.wikidata.org/wiki/Wikidata:Data_access#Recent_Changes_stream
    8
       https://github.com/blazegraph/tinkerpop3
     9
       Unfortunately we never got this working since we couldn’t connect the Blazegraph endpoint to the gremlin
infrastructure.
Table 2
Candidate RDF Triple stores for hosting a full copy of Wikidata
           Name                Homepage                                             Version    First Attempt
           Apache Jena         https://jena.apache.org/                              4.3.1          2020
           Allegro Graph       https://allegrograph.com/                              7.3         planned
           Blazegraph          https://blazegraph.com/                               2.1.5          2018
           QLever              https://qlever.cs.uni-freiburg.de/wikidata           2022-01       2022-01
           RDF4J               https://rdf4j.org/                                                 planned
           Stardog             https://www.stardog.com                               8.0          2022-07
           Virtuoso            https://virtuoso.openlinksw.com/                     0.7.20        planned

Table 3
Success Reports
          Date      Source                            Target        Triples          Load Time         Reference
          2017-12   latest-truthy.nt.gz               Apache Jena   ?                8 h               [11]
          2018-01   wikidata-20180101-all-BETA.ttl    Blazegraph    3 billion        4d                [10]
          2018-05   latest-all.json.gz                dgraph        ?                4d                [12]
          2019-02   latest-all.ttl.gz                 Apache Jena   ?                2d                [13]
          2019-05   wikidata-20190513-all-BETA.ttl    Blazegraph    ?                10.2 d            [14]
          2019-05   wikidata-20190513-all-BETA.ttl    Virtuoso      ?                43 h              -
          2019-09   latest-all.ttl (2019-09)          Virtuoso      9.5 billion      9.1 h             [15]
          2019-10                                     Blazegraph    ~10 billion      5.5 d             [16]
          2020-03   latest-all.nt.bz2 (2020-03-01)    Virtuoso      ~11.8 billion    10 h + 1 d prep   [17]
          2020-06   latest-all.ttl (2020-04-28)       Apache Jena   12.9 billion     6 d 16 h          [18]
          2020-07   latest-truthy.nt (2020-07-15)     Apache Jena   5.2 billion      4 d 14 h          [19]
          2020-08   latest-all.nt (2020-08-15)        Apache Jena   13.8 billion     9 d 21 h          [20]
          2022-02   latest-all.nt (2022-01-29)        QLever        16.9 billion     4 d 2 h           [21]
          2022-02   latest-all.nt (2022-02)           Stardog       16.7 billion     9h                [22]
          2022-05   latest-all.ttl.bz2 (2022-05-29)   QLever        ~17 billion      14 h              [23]
          2022-06   latest-all.nt (2022-06-25)        QLever        17.2 billion     1 d 2 h           [24]
          2022-07   latest-all.ttl (2022-07-12)       Stardog       17.2 billion     1 d 19 h          [25]



3. Results
3.1. Wikidata Imports
Table 3 collects the reports of successful Wikidata imports that we are aware of. See also the
"Success Reports" section of our wiki [10], which describes our import attempts. Below, we describe
our experiences with the candidate RDF triple stores shown in Table 2, cross-referenced with the
success reports in chronological order.

Blazegraph Blazegraph as a software package is quite simple to install. See the setting up
     Blazegraph10 section in the SPARQL tutorial by the main author of this work. The
     Wikimedia Foundation also provides details on setting up Blazegraph11 .
         The "getting started" procedure by Malyshev [9] was followed by us in 2018 on an Ubuntu
         18.04 LTS machine (later upgraded to 20.04 LTS) and wasn’t successful on the first attempt.
   10
        https://wiki.bitplan.com/index.php/SPARQL
   11
        https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service
         We had to use an SSD instead of a rotating disk to improve the preparation speed of the
         munge script that preprocesses the triples. After the successful import, the endpoint has
         been running reliably since 2018 and only needs an occasional server restart (usually after
         the software crashes on a query that pushes the hardware limit of 64 GB RAM).
         The automation of the procedure is quite poor: several manual configuration steps are
         necessary. Note that we did not bother to document an attempt that took 4 days!
         The hardware cost was some 100 EUR for a used server and 140 EUR for a 480 GB SSD.
Apache Jena/Fuseki Based on Andy Seaborne's [11] success report of 2017, we performed
     seven attempts from 2020-04 until 2020-08, when we could document a success [20].
     The problems encountered are documented on our wiki and include references to the
     Apache Jena tdbloader software issues that needed to be fixed. The wikidata2jena script
     on the success report wiki page provides a fully automated procedure that worked at the
     time of the success.
         Figure 3 shows the non-linear indexing speed of the tdbloader2 bulk import. The non-
         linearity is especially bad when using a rotating disk. It is caused by the B-tree structure
         used by the Jena triple store: the balancing operations need to move data, which causes
         head movements that are notoriously slow on rotating media. The few milliseconds needed
         for each move would have added up to almost half a year of indexing time. The remedy
         was a 4 TB SSD, on which the non-linearity still slowed the processing speed down from
         100 k triples/second to a reasonable 13 k triples/second. The final import statistic, sanity-
         checked in the sketch below, was 5,253,753,313 triples in 395,522 seconds, i.e. 13,283
         triples/second.
         The 4 TB SSD cost some 700 EUR. The indexing server was a 1000 EUR used Apple Mac
         Pro 2012 with 12 cores and 64 GB of RAM.
         The 4 TB SSD was then moved to a publicly visible but access-protected endpoint of the
         RWTH Aachen i5 chair, which has been operating reliably. Restarts are necessary when
         the 6 GB RAM allocation limit of the server is hit.
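
The reported figures can be checked with a few lines of arithmetic; the sketch below only restates
the numbers from the import statistic above.

# Sanity check of the reported tdbloader2 statistic above
# (5,253,753,313 triples loaded in 395,522 seconds).
TRIPLES = 5_253_753_313
SECONDS = 395_522.378

rate = TRIPLES / SECONDS                   # ~13,283 triples/s, as reported
actual_days = SECONDS / 86_400             # ~4.6 days of pure indexing time
linear_days = TRIPLES / 100_000 / 86_400   # ~0.6 days at a constant 100 k triples/s
print(f"{rate:,.0f} triples/s, {actual_days:.1f} d actual, {linear_days:.1f} d if linear")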
QLever A series of import attempts was performed for QLever between 2022-02 and 2022-07.
     Hannah Bast's [23] import results are publicly available. Our import attempts were mostly
     performed on a used server with 128 GB RAM running Ubuntu 20.04 LTS. We also tried
     using the Apple Mac Pro 2012 machine. Our results varied depending on whether we
     were working from a Docker image or using a native build. Two successful indexing
     attempts are documented on our wiki.
         The procedure is fully automated by the QLever script12 and, alternatively, by the "official"
         qlever-control13, which was inspired by the need to make the import procedure repeatable.
         The import took 9 h in Bast's attempt and 22 h for the fastest import on our own machine.
         The public server has been quite reliable, while our own machine has had a sequence
         of problems that we documented on our wiki and in the issues of the QLever GitHub
  12
       https://wiki.bitplan.com/index.php/QLever/script
  13
       https://github.com/ad-freiburg/qlever-control
           Figure 3: Non-linear indexer performance of Apache Jena


           repository14. The cost for the used server was 400 EUR, and the cost for another 4 TB SSD
           had fallen to 350 EUR by this time.

Stardog Based on Evren Sirin's [22] blog entry, we performed an import attempt in 2022-07.
     The import was run on an AWS EC2 instance with 253 GB RAM and a 2 TB SSD using
     Stardog 8.0. The ttl.bz2 dump from 2022-07-12 was used; the import took one day and 19
     hours to complete, at a loading speed of 109.5 k triples/second [25]. This is slow compared
     to Sirin's result of 9 h using Stardog 7.9. The difference in time might be due to some
     preprocessing15 that was performed, which took several additional hours, and to the selected
     dump format: the results suggest that the N-Triples format allows faster loading due to better
     parallelization. After the configuration of the server instance, the import was started with a
     single command.
         At a cost of around 50 EUR per day for the AWS instance, we switched to a self-hosted
         server with 130 GB RAM. That server has been running reliably for three weeks as of
         2022-08.

Virtuoso The result of Hugh Williams's [17] Wikidata import of 2020 is publicly available as
     the Virtuoso SPARQL endpoint listed in Table 1. Our own import attempt has not been
     finished yet, since the needed hardware was not procured in time.

Allegro Graph Based on Craig Norvell's [26] report, we intend to attempt our own import as
     soon as the needed hardware is procured.

3.2. Usefulness
The Wikidata Graph Pattern Benchmark [27] has been used to check the basic functionality and
performance of the successful imports in comparison with public endpoints. The benchmark
consists of 850 queries and covers different aspects of graph patterns. Figure 4 shows the median
   14
        https://github.com/ad-freiburg/qlever
   15
        starting with Stardog 7.9 this preprocessing step of partitioning the dump file is no longer required
execution time of the 850 benchmark queries after 10 iterations on each endpoint. This is
only anecdotal evidence for the performance, given the differences in the number of triples
and in the hardware. All queries except two ran successfully at least once on each endpoint16 .
   The compatibility of the servers with Wikidata and the kinds of queries that are possible
differ widely from endpoint to endpoint.
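
The measurement procedure can be reproduced with a small driver like the sketch below; the
wgpb-queries directory, the endpoint URL and the file naming are assumptions, while the 850
WGPB queries themselves are available from [27].

import statistics
import time
from pathlib import Path

import requests

ENDPOINT = "http://localhost:7001/sparql"   # assumption: one of the self-hosted copies
ITERATIONS = 10

def run(query: str) -> float:
    """Execute one SPARQL query and return its wall-clock time in seconds."""
    start = time.perf_counter()
    requests.get(ENDPOINT, params={"query": query},
                 headers={"Accept": "application/sparql-results+json"},
                 timeout=300).raise_for_status()
    return time.perf_counter() - start

medians = {}
for query_file in sorted(Path("wgpb-queries").glob("*.rq")):  # assumed layout
    times = [run(query_file.read_text()) for _ in range(ITERATIONS)]
    medians[query_file.stem] = statistics.median(times)

print(f"median over {len(medians)} queries:",
      f"{statistics.median(medians.values()):.3f} s")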




Figure 4: Wikidata Graph Pattern Benchmark (WGPB) results for different query engines.




4. Related Work
In addition to the success reports in Table 3, there has been much other Wikidata-related work
performed across the industry. The following list describes this work.

Wikidata imports Höfler [8] asked on StackExchange in 2013 what the procedure for importing
     a complete copy of Wikidata is. Malyshev [9] provided an official "Getting started" guide
     for hosting your own copy of Wikidata in 2015. The Wikimedia Foundation [28] provides
     a web page describing how to create a comparable Blazegraph Wikidata implementation.
         Table 3 references the successful Wikidata imports of which we are aware.
          Hernández et al. [29] compare the import of a 2016 Wikidata dump with 18 million
          entities into Blazegraph 2.1.0, Virtuoso 7.2.3, PostgreSQL 9.1.20 and Neo4J-community-
          2.3.1 regarding the performance of queries.
          Kovács et al. [30] used the same 2016 Wikidata dump as [29] and reported on an import
          into Neo4J 3.3.3, Blazegraph 2.1.4 and JanusGraph 0.2.0.
   16
        WGPB query J4 in line 38 has a syntax error and thus can not run successfully on any endpoint
Proposals to avoid using your own copy of Wikidata Minier et al. [31] propose mitigat-
     ing the SPARQL query quota problem of service providers by splitting queries and running
     the individual sub-queries within the quotas. This workaround will still not create reliable
     and timely results if a service completely fails or is under heavy load. Also, splitting the
     queries requires complex analysis and processing of intermediate result sets for the heavy
     load use cases that are the motivation for our work.
          Henselmann and Harth [32] propose an algorithm for constructing a subset of the Wikidata
          knowledge graph on demand. The implementation likely requires a copy of Wikidata to
          be able to run, which defeats the purpose of the idea unless the implementation is provided
          as a service.
         Aimonier-Davat et al. [33] propose using the HyperLogLog++ algorithm to estimate cardi-
         nalities for COUNT-DISTINCT queries. This solves only part of the general problem and
         requires installing and operating a separate infrastructure.
         Chalupsky [34] proposes using the Knowledge Graph Toolkit (KGTK)17 to import the
         Wikidata dump and to query the result locally on a machine with low resources, such as
         a laptop.18

Data Quality Färber et al. define 34 data quality metrics and analyse the five knowledge graphs
     Freebase, DBpedia, OpenCyc, Wikidata and YAGO against these metrics [35]. The
     reproducibility of the results is limited, since the work was done in 2015, when Wikidata
     had fewer than 20 million items.


5. Conclusion
The procedure for getting and hosting your own copy of Wikidata is a moving target and is not
well defined, automated or repeatable. Comparison of key features such as reliability, needed
resources and usefulness is based on anecdotal evidence at this time.
  The download and indexing time for a round-trip update is currently a full day, even with a
robust hardware environment. For personal use, this is already prohibitive.

5.1. Future work
Improving the technical exchange related to successfully hosting a copy of Wikidata would be
valuable to remedy the current lack of definition, automation and repeatability of the procedures.
Having sound scientific performance and usefulness analyses would benefit all parties needing
access to a reliable and performant Wikidata knowledge graph (that is not as limited as the
current public offerings). Specifically, implementing the metric analysis according to [35] in a
dashboard would be a valuable contribution.
  The search for a Blazegraph alternative by the Wikimedia Foundation [4] has already provided
valuable analyses that could become the basis for a general benchmark for Wikidata content

   17
        https://github.com/usc-isi-i2/kgtk
   18
        see documentation at https://kgtk.readthedocs.io/en/latest/import/import_wikidata/
platforms. However, the Foundation is limiting its exploration of alternatives to only open
source solutions.
   Hernández et al. [29] pointed out that "testing all of these combinations of features in a systematic
way would require extensive experiments outside the current scope", and indeed such extensive
experiments would be valuable.
   There are also other new graph databases and approaches that should be considered for their
viability, especially regarding scalability and distributability. RDF4J Release 4 [36], with a new
embedded triple store, should also be investigated. Azure CosmosDB, HyperGraphDB and GraKn
have already been mentioned in [30] as planned analysis targets. Dgraph was already targeted
in 2018 by the success report [12].
   Lastly, the problem of keeping a Wikidata implementation "current" must be addressed. This
is difficult since data changes are frequent, while Wikidata RDF dumps are released only weekly.
Both a Kafka-based and an HTTP-API-based solution are documented in "Wikidata query service
updater evolution"19. These need to be further refined and provided as a service. It is worth noting
that the current updater used by the Wikimedia Foundation is not publicly available.


6. Acknowledgements
We thank all reporters of successful Wikidata imports for their valuable contributions. We thank
Federico Bonelli for sponsoring the AWS instance used for the Stardog import.
  This research has been partly funded by a grant of the Deutsche Forschungsgemeinschaft
(DFG). 20


References
 [1] D. Vrandečić, M. Krötzsch, Wikidata, Communications of the ACM 57 (2014) 78–85.
     doi:10.1145/2629489.
 [2] B. Bebee, Blazegraph™ db - ultra high-performance graph database supporting blueprints
     and rdf/sparql api, 2015. URL: https://blazegraph.com/.
 [3] W. Search, WMDE, Wikidata:sparql query service/wdqs backend update/august 2021
     scaling update, 2021. URL: https://m.wikidata.org/wiki/Wikidata:SPARQL_query_service/
     WDQS_backend_update/August_2021_scaling_update.
 [4] M. Pham, G. Lederrey, A. Westerinen, et al., Wikidata query service backend update, 2022.
     URL: https://www.wikidata.org/wiki/Category:WDQS_backend_update.
 [5] T. V. Wal, Folksonomy coinage and definition, 2007. URL: https://www.vanderwal.net/
     folksonomy.html.
 [6] A. Khatun, Table of top 50 subgraph information, 2021. URL: https://wikitech.wikimedia.
     org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis#Table_of_top_50_subgraph_
     information.
 [7] Technische Informationsbibliothek (TIB) Hannover, The ConfIDent project, 2020. URL:
     https://projects.tib.eu/en/confident/.

   19
        https://addshore.com/2022/04/wikidata-query-service-updater-evolution/
   20
        ConfIDent project; see https://gepris.dfg.de/gepris/projekt/426477583
 [8] P. Höfler,       How can i download the complete wikidata database,
     2013.               URL:             https://opendata.stackexchange.com/questions/107/
     how-can-i-download-the-complete-wikidata-database.
 [9] S. Malyshev, Getting started, 2015. URL: https://github.com/wikimedia/wikidata-query-rdf/
     blob/master/docs/getting-started.md.
[10] W. Fahl, T. Holzheim, Get your own copy of wikidata, 2020. URL: https://wiki.bitplan.com/
     index.php/Get_your_own_copy_of_WikiData.
[11] A. Seaborne, Report on loading wikidata, 2017. URL: https://lists.apache.org/thread/
     m8jjmbckm4c31gcl73dl30z6m6jpzj5o.
[12] topicseed, Importing wikidata dumps — the easy part, 2018. URL: https://topicseed.com/
     blog/importing-wikidata-dumps/.
[13] C. Capol, Wikidata import in apache jena, 2019. URL: https://muncca.com/2019/02/14/
     wikidata-import-in-apache-jena/.
[14] A. Sanchez, [wikidata] minimal hardware requirements for loading wikidata dump in blaze-
     graph, 2019. URL: https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.
     org/message/3FIVDPUGNPGDUURWSDPYQG4W6DN2I2RR/.
[15] A. Sanchez,         Virtuoso hosted wikidata instance,            2019. URL: https://
     lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/
     CKN3QPV2NJ5TAKHORMYDTTXG7Y65UXAF/.
[16] A. Shoreland, Your own wikidata query service, with no limits, 2019. URL: https://addshore.
     com/2019/10/your-own-wikidata-query-service-with-no-limits/.
[17] H. Williams,         Loading wikidata into virtuoso (open source or en-
     terprise       edition),      2020.      URL:        https://community.openlinksw.com/t/
     loading-wikidata-into-virtuoso-open-source-or-enterprise-edition/2717.
[18] J. Sourlier, Jena issue 1909 - report of a wikidata import with jena, 2020. URL: https:
     //issues.apache.org/jira/browse/JENA-1909.
[19] W. Fahl, Wikidata import 2020-07-15, 2020. URL: https://wiki.bitplan.com/index.php/
     WikiData_Import_2020-07-15.
[20] W. Fahl, Wikidata import 2020-08-15, 2020. URL: https://wiki.bitplan.com/index.php/
     WikiData_Import_2020-08-15.
[21] W. Fahl, Wikidata import 2022-01-29, 2022. URL: https://wiki.bitplan.com/index.php/
     WikiData_Import_2022-01-29.
[22] P. K. Evren Sirin, Wikidata in stardog, 2022. URL: https://www.stardog.com/labs/blog/
     wikidata-in-stardog.
[23] H. Bast, Using qlever for wikidata, 2022. URL: https://github.com/ad-freiburg/qlever/wiki/
     Using-QLever-for-Wikidata.
[24] W. Fahl, Wikidata import 2022-06-25, 2022. URL: https://wiki.bitplan.com/index.php/
     WikiData_Import_2022-06-25.
[25] T. Holzheim, W. Fahl, Wikidata import 2022-07-12, 2022. URL: https://wiki.bitplan.com/
     index.php/WikiData_Import_2022-07-12.
[26] C. Norvell, Wikidata on allegrograph, 2022. URL: https://wiki.bitplan.com/index.php/
     Wikidata_on_Allegrograph.
[27] A. Hogan, C. Riveros, C. Rojas, A. Soto, Wikidata Graph Pattern Benchmark (WGPB) for
     RDF/SPARQL, 2019. URL: https://doi.org/10.5281/zenodo.4035223. doi:10.5281/zenodo.
     4035223.
[28] Wikimedia, Wikidata query service, 2017. URL: https://wikitech.wikimedia.org/wiki/
     Wikidata_Query_Service.
[29] D. Hernández, A. Hogan, C. Riveros, C. Rojas, E. Zerega, Querying wikidata: Comparing
     sparql, relational and graph databases, in: P. Groth, E. Simperl, A. J. G. Gray, M. Sabou,
     M. Krötzsch, F. Lécué, F. Flöck, Y. Gil (Eds.), The Semantic Web - ISWC 2016 - 15th
     International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings,
     Part II, volume 9982 of Lecture Notes in Computer Science, 2016, pp. 88–103. URL: https:
      //doi.org/10.1007/978-3-319-46547-0_10. doi:10.1007/978-3-319-46547-0_10.
[30] T. Kovács, G. Simon, G. Mezei, Benchmarking graph database backends - what works well
     with wikidata?, Acta Cybern. 24 (2019) 43–60. URL: https://doi.org/10.14232/actacyb.24.1.
     2019.5. doi:10.14232/actacyb.24.1.2019.5.
[31] T. Minier, H. Skaf-Molli, P. Molli, SaGe: Web preemption for public SPARQL query services,
      in: The World Wide Web Conference (WWW '19), ACM Press, 2019, pp. 1268–1278.
     doi:10.1145/3308558.3313652.
[32] D. Henselmann, A. Harth, Constructing demand-driven wikidata subsets, in: L. Kaffee,
     S. Razniewski, A. Hogan (Eds.), Proceedings of the 2nd Wikidata Workshop (Wikidata 2021)
     co-located with the 20th International Semantic Web Conference (ISWC 2021), Virtual
     Conference, October 24, 2021, volume 2982 of CEUR Workshop Proceedings, CEUR-WS.org,
     2021. URL: http://ceur-ws.org/Vol-2982/paper-10.pdf.
[33] J. Aimonier-Davat, H. Skaf-Molli, P. Molli, A. Grall, T. Minier, Online approximative
     SPARQL query processing for COUNT-DISTINCT queries with web preemption, Semantic
     Web 13 (2022) 735–755. doi:10.3233/sw-222842.
[34] H. Chalupsky, P. Szekely, F. Ilievski, D. Garijo, K. Shenoy, Creating and querying personal-
     ized versions of wikidata on a laptop, 2021. doi:10.48550/ARXIV.2108.07119.
[35] M. Färber, F. Bartscherer, C. Menne, A. Rettinger, Linked data quality of DBpedia, freebase,
     OpenCyc, wikidata, and YAGO, Semantic Web 9 (2017) 77–129. doi:10.3233/sw-170275.
[36] Eclipse, Rdf4j 4.0.0 released, 2022. URL: https://rdf4j.org/news/2022/04/24/rdf4j-4.0.
     0-released/.