<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Wikidata workshop at ISWC</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Getting and hosting your own copy of Wikidata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wolfgang Fahl</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Holzheim</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Westerinen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Lange</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Decker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer FIT</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>OntoInsights LLC</institution>
          ,
          <addr-line>Elkton, MD</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>RWTH Aachen University</institution>
          ,
          <addr-line>Computer Science i5, Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>2022</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Wikidata is a very large, crowd-sourced, general knowledge graph that is backed by a worldwide community. Its original purpose was to link different versions of Wikipedia articles across multiple languages. Access to Wikidata is provided by the non-profit Wikimedia Foundation and recently also by Wikimedia Enterprise as a commercial service. The query access via the public Wikidata Query Service (WDQS) has limits that make larger queries with millions of results next to impossible, due to a one-minute timeout restriction. Beyond addressing the timeout restriction, hosting a copy of Wikidata may be desirable in order to have a more reliable service, quicker response times, less user load, and better control over the infrastructure. It is not easy, but it is possible to get and host your own copy of Wikidata. The data and software needed to run a complete Wikidata instance are available as open source or accessible via free licenses. In this paper, we report on both successful and failed attempts to get and host your own copy of Wikidata, using different triple store servers. We share recommendations for the needed hardware and software, provide documented scripts to semi-automate the procedures, and document things to avoid.</p>
      </abstract>
      <kwd-group>
        <kwd>Wikidata</kwd>
        <kwd>RDF Bulk Import</kwd>
        <kwd>SPARQL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>endpoint provided by OpenLink Software has data that is many months old.</p>
      <p>A key decision factor for getting and hosting your own copy of Wikidata is the cost-benefit
ratio. The cost includes the infrastructure plus the effort to get, host and update the data.
The benefit depends on the use cases and on what Wikidata content they need; see
Section 1.1 below.</p>
      <p>
        The current public Wikidata Query Service infrastructure uses Blazegraph [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. That
implementation is based on open-source software that is no longer in development and is approaching
its execution limits. As stated in a Wikimedia Foundation blog entry, "Blazegraph maxing out
in size poses the greatest risk for catastrophic failure, as it would effectively prevent WDQS
from being updated further [...]" [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and therefore, the Wikimedia Foundation is currently
investigating various stop-gap and replacement strategies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Having second-source options to
host a complete copy of Wikidata in a reproducible way is an important alternative, especially
for the transition phase of the upcoming years.
      </p>
      <sec id="sec-1-1">
        <title>1.1. Content of Wikidata</title>
        <p>
          Wikidata originally started as a knowledge graph for the international Wikipedia encyclopedias.
From there it evolved to a folksonomy-style [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] general knowledge graph.
        </p>
        <p>
          The Wikidata statistics web page<sup>5</sup> shows a pie chart of the entity types having the largest
number of instances, but the data is already 2 years old at the time of this writing. Khatun [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
provided a table of the top 50 "subgraphs" - which are essentially the entity classes - of Wikidata
and corresponding charts. A more current statistic based on our Stardog<sup>6</sup> import of 2022-06 is
shown in Figure 1.
        </p>
        <p>
          If you are interested in a large number of Wikidata items, then there is a higher probability that
having your own copy of Wikidata will be beneficial. Our use case in the ConfIDent project [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
revolves around the entity types academic conference, scientific event series, scholarly article,
proceedings and author, which constitute a substantial portion of the Wikidata knowledge
graph. Since essential queries for this project were not possible via the publicly available
endpoints, we decided to get and host our own copies of Wikidata.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Size of Wikidata as a moving target</title>
        <p><sup>5</sup> https://www.wikidata.org/wiki/Wikidata:Statistics</p>
        <p><sup>6</sup> https://www.stardog.com/</p>
        <p>[Figure: Wikidata dump file size in GB by date (2017 to 2022) for the formats nt-bz2, nt-gz, ttl-bz2, ttl-gz, json-bz2 and json-gz, annotated with triple counts of 3.0, 9.5, 11.3 and 17.2 billion triples.]</p>
        <p>From the first reported success in 2017 to the most recent in 2022-07, Wikidata has grown by more than a
factor of 5. Therefore, the reproducibility and comparability of the results are limited.</p>
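        <p>As a rough plausibility check of the growth factor, the triple counts annotated in the figure can be compared. The sketch below assumes that the smallest annotated count (3.0 billion triples) belongs to the 2017 report and the largest (17.2 billion triples) to 2022-07; this pairing is an assumption and is not stated explicitly in the text.</p>
        <preformat>
```python
# Triple counts (in billions) annotated in the figure of Section 1.2.
# ASSUMPTION: the smallest value belongs to the 2017 import and the
# largest to the 2022-07 import; the text does not state this pairing.
annotated_billion_triples = [3.0, 9.5, 11.3, 17.2]

first = min(annotated_billion_triples)
latest = max(annotated_billion_triples)
growth_factor = latest / first

print(f"growth factor: {growth_factor:.1f}")  # prints "growth factor: 5.7"
```
        </preformat>
        <p>Under this assumption the ratio is consistent with the statement that Wikidata has grown by more than a factor of 5.</p>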
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Getting and hosting your own copy of Wikidata</title>
        <p>If you intend to get and host your own copy of Wikidata, the following aspects need to be
considered:</p>
        <p><bold>Reliability of the import</bold> - Will you actually get a running system in the end, or will you run into
some kind of software or hardware limit you didn’t foresee?</p>
        <p><bold>Needed resources</bold> - What kind of computer will you need, e.g. how much RAM and disk
space? Which operating system and version are needed? Can the
machine be hosted virtually, and if so, how much will it cost? How much time will the
import take?</p>
        <p><bold>Usefulness of results</bold> - How useful will the result be for your use case? Will you get better
performing queries? Do you need “always current” data? Will your own Wikidata server
be compatible with the original one? Will the infrastructure be more reliable than the
public endpoints?</p>
        <p><bold>Consistency</bold> - How can you keep your copy in sync with Wikidata? How can the Wikidata update
stream<sup>7</sup> be integrated into the knowledge graph, and how should update interruptions be handled? Is it possible to
publish updates from the copy to Wikidata?</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Wikidata copy bulk import procedure</title>
        <p>Höfler’s [8] StackExchange question and Malyshev’s [9] "getting started" guide were the basis
of our first attempt to get and host a complete copy of Wikidata in January 2018. The motivation
was to run Apache Gremlin<sup>8</sup> queries on Wikidata.<sup>9</sup></p>
        <p>Table 2 shows the different triple stores we considered for testing the import and setup
procedure.</p>
        <p>General steps for getting a complete Wikidata copy:</p>
        <list list-type="bullet">
          <list-item><p>Procure the hardware and software for the indexing and hosting environment (which
might be two different computers)</p></list-item>
          <list-item><p>Download the current Wikidata dump</p></list-item>
          <list-item><p>Install the triple store software</p></list-item>
          <list-item><p>Configure the triple store</p></list-item>
          <list-item><p>Optionally preprocess the triples</p></list-item>
          <list-item><p>Run the bulk import / indexing procedure</p></list-item>
          <list-item><p>Optionally copy the results from the indexing machine to a target machine</p></list-item>
          <list-item><p>Start the server</p></list-item>
          <list-item><p>Optionally start a separate GUI webserver</p></list-item>
        </list>
        <p><sup>7</sup> https://www.wikidata.org/wiki/Wikidata:Data_access#Recent_Changes_stream</p>
        <p><sup>8</sup> https://github.com/blazegraph/tinkerpop3</p>
        <p><sup>9</sup> Unfortunately, we never got this working since we couldn’t connect the Blazegraph endpoint to the Gremlin
infrastructure.</p>
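        <p>The general steps of this section can be sketched as a small, store-independent plan. This is a minimal illustration and not one of the scripts used for the imports reported here: the names Step, STEPS, plan and download_command are hypothetical, the dump URL is the public location of the complete Turtle dump at the time of writing, and the use of wget for a resumable download is an assumption.</p>
        <preformat>
```python
from dataclasses import dataclass

# Public location of the complete RDF dump (Turtle, bzip2-compressed).
DUMP_URL = "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2"


@dataclass(frozen=True)
class Step:
    """One step of the procedure (hypothetical helper type)."""
    name: str
    optional: bool = False


# The general steps of Section 2.1, in order.
STEPS = [
    Step("procure hardware and software"),
    Step("download the current Wikidata dump"),
    Step("install the triple store software"),
    Step("configure the triple store"),
    Step("preprocess the triples", optional=True),
    Step("run the bulk import / indexing procedure"),
    Step("copy the results to the target machine", optional=True),
    Step("start the server"),
    Step("start a separate GUI webserver", optional=True),
]


def plan(skip_optional: bool) -> list[str]:
    """Return the names of the steps to run, keeping their order."""
    return [s.name for s in STEPS if not (skip_optional and s.optional)]


def download_command(url: str = DUMP_URL, target_dir: str = ".") -> list[str]:
    """Build (but do not run) a resumable wget call for the dump.

    ASSUMPTION: wget is installed; -c resumes a partial download and
    -P sets the directory the file is saved to.
    """
    return ["wget", "-c", "-P", target_dir, url]


if __name__ == "__main__":
    for number, name in enumerate(plan(skip_optional=True), start=1):
        print(f"{number}. {name}")
    print(" ".join(download_command(target_dir="/data/wikidata")))
```
        </preformat>
        <p>Keeping the optional steps explicit makes it easier to compare procedures across triple stores, since the preprocessing and copy steps are only needed for some of them.</p>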
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Wikidata Imports</title>
        <p><sup>10</sup> https://wiki.bitplan.com/index.php/SPARQL</p>
        <p><sup>11</sup> https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service</p>
        <p>We had to use an SSD disk instead of a rotating disk to improve the preparation speed
of the munge script that does the preprocessing of the triples. After successful import,
the endpoint has been running reliably since 2018 and only needs an occasional server
restart (usually after the software crashes on a query that pushes the hardware limits of
64 GB RAM).</p>
        <p>The automation of the procedure is quite poor: several manual configuration
steps are necessary. Note that we did not bother to document an attempt that took 4 days!
The hardware cost was about 100 EUR for a used server and 140 EUR for a 480 GB SSD.</p>
        <p><bold>Apache Jena/Fuseki</bold> Based on Andy Seaborne’s [11] success report of 2017, we performed
seven attempts from 2020-04 until 2020-08, when we could document a success [20].
The problems encountered have been documented on our wiki and include references
to the Apache Jena tdbloader software issues that had to be fixed. The script
wikidata2jena on the Success Report web site provides a fully automated procedure that worked
at the time of the success.
The 4 TB SSD cost about 700 EUR. The indexing server was a used 12-core Apple Mac
Pro (2012) with 64 GB of RAM that cost 1000 EUR.</p>
        <p>The 4 TB SSD was moved to a publicly visible endpoint of the RWTH Aachen i5 that is
access protected and has been operating reliably. Restarts are necessary when the 6 GB
RAM allocation limit of the server has been hit.</p>
        <p><bold>QLever</bold> A series of import attempts was performed for QLever between 2022-02 and 2022-07.</p>
        <p>Hannah Bast’s [23] import results are publicly available. Our import attempts were
mostly performed on a used 128 GB RAM Server using Ubuntu 20.04 LTS. We also tried
using the Apple Mac Pro 2012 machine. Our results varied depending on whether we
were working from a docker image or using a native build. Two successful indexing
attempts are documented on our wiki.</p>
        <p>The procedure is fully automated using the QLever script<sup>12</sup> and alternatively the “official”
qlever-control<sup>13</sup>, which was inspired by the need to make the import procedure repeatable.
The import took 9 h in Bast’s attempt and 22 h for the fastest import on our own machine.
The public server has been quite reliable, while our own machine has had a sequence
of problems that we documented on our wiki and in the issues of the QLever GitHub
repository<sup>14</sup>. The cost for the used server was 400 EUR and the cost for another 4 TB SSD
had fallen to 350 EUR by this time.</p>
        <p><sup>12</sup> https://wiki.bitplan.com/index.php/QLever/script</p>
        <p><sup>13</sup> https://github.com/ad-freiburg/qlever-control</p>
        <p><bold>Stardog</bold> Based on Evren Sirin’s [22] blog entry, we performed an import attempt in 2022-07.</p>
        <p>The import was run on an AWS EC2 instance with 253 GB RAM and a 2 TB SSD using Stardog
8.0. For the import, the ttl.bz2 dump from 2022-07-12 was used; it took one day and 19
hours to complete, with a loading speed of 109.5K triples/sec [25]. This slow loading time
can be compared to Sirin’s result of 9 h using Stardog 7.9. The difference in time might
be due to some preprocessing<sup>15</sup> that was performed, taking multiple additional hours,
and to the selected dump format. The results suggest that the N-Triples format allows faster
loading times due to better parallelization. After the configuration of the server instance,
the import was started with a single command.</p>
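        <p>The reported duration and loading speed of the Stardog import can be cross-checked with simple arithmetic. The sketch below assumes that 109.5K triples/sec is the average rate over the whole run of one day and 19 hours; the variable names are illustrative only.</p>
        <preformat>
```python
# Figures reported for the Stardog import in this section.
duration_hours = 1 * 24 + 19      # "one day and 19 hours"
triples_per_second = 109_500      # "109.5K triples/sec"

# ASSUMPTION: the rate is an average over the complete run.
duration_seconds = duration_hours * 3600
loaded_triples = duration_seconds * triples_per_second

print(f"{duration_hours} h = {duration_seconds} s")
print(f"about {loaded_triples / 1e9:.2f} billion triples loaded")

# Ratio to the 9 h reported by Sirin (different version and hardware).
print(f"slowdown versus 9 h: {duration_hours / 9:.1f}x")
```
        </preformat>
        <p>Under this assumption, the run corresponds to roughly 17 billion triples loaded in 43 hours, which is almost five times the 9 h reported by Sirin.</p>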
        <p>With a cost of around 50 EUR per day, we switched from an AWS instance to a self-hosted
server with 130 GB RAM. The server has been running reliably for three weeks as of 2022-08.</p>
        <p><bold>Virtuoso</bold> The result of Hugh Williams’s [17] Wikidata import of 2020 is publicly available as
a Virtuoso SPARQL endpoint described in Table 1. Our own import attempt has not
been completed yet since the needed hardware was not procured in time.</p>
        <p><bold>AllegroGraph</bold> Based on Craig Norvell’s [26] report, we intend to attempt our own import as
soon as the needed hardware is procured.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Usefulness</title>
        <p>The Wikidata Graph Pattern Benchmark [27] has been used to check the basic functionality and
performance of the successful imports in comparison with public endpoints. The benchmark
consists of 850 queries and covers different aspects of graph patterns. Figure 4 shows the median
execution time of the 850 queries of the benchmark after 10 iterations on each endpoint. This is
only anecdotal evidence for the performance, given the differences in the number of triples
and the hardware. All queries except 2 ran successfully at least once on each endpoint<sup>16</sup>.</p>
        <p><sup>14</sup> https://github.com/ad-freiburg/qlever</p>
        <p><sup>15</sup> Starting with Stardog 7.9, this preprocessing step of partitioning the dump file is no longer required.</p>
        <p>The compatibility of the servers with Wikidata and the kind of queries that are possible
differ widely from endpoint to endpoint.</p>
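        <p>The kind of measurement described in this section, a median execution time per query over 10 iterations, can be sketched as follows. This is an illustration and not the benchmark harness used for Figure 4: run_query stands for any hypothetical callable that sends one SPARQL query to the endpoint under test, and skipping failed runs and aggregating by the median of the per-query medians are assumptions.</p>
        <preformat>
```python
import statistics
import time
from typing import Callable, Iterable


def median_runtimes(
    run_query: Callable[[str], object],
    queries: Iterable[str],
    iterations: int = 10,
    clock: Callable[[], float] = time.perf_counter,
) -> dict[str, float]:
    """Median wall-clock time per query over `iterations` runs.

    `run_query` is a hypothetical callable that sends one SPARQL query to
    the endpoint under test. Runs that raise an exception are skipped; a
    query that never succeeds gets no entry in the result.
    """
    medians: dict[str, float] = {}
    for query in queries:
        timings = []
        for _ in range(iterations):
            start = clock()
            try:
                run_query(query)
            except Exception:
                continue  # failed run: do not count its time
            timings.append(clock() - start)
        if timings:
            medians[query] = statistics.median(timings)
    return medians


def overall_median(medians: dict[str, float]) -> float:
    """Median of the per-query medians for one endpoint."""
    return statistics.median(medians.values())
```
        </preformat>
        <p>Passing the clock as a parameter keeps the function testable without a live endpoint; with a real endpoint, the number of entries in the result also shows how many queries ran successfully at least once.</p>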
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Related Work</title>
      <p>In addition to the success reports in Table 3, there has been much other Wikidata-related work
performed across the industry. The following list describes this work.</p>
      <p><bold>Wikidata imports</bold> Höfler [8] asked on StackExchange in 2013 what the procedure for importing a complete copy of
Wikidata was. Malyshev [9] provided an official "Getting
started" guide to host your own copy of Wikidata in 2015. The Wikimedia
Foundation [28] provides a web page defining how to create a comparable Blazegraph Wikidata
implementation.</p>
      <p>Hernández et al. [29] compare the import of a 2016 Wikidata dump with 18 million
entities into Blazegraph 2.1.0, Virtuoso 7.2.3, PostgreSQL 9.1.20 and
Neo4J-community2.3.1 regarding the performance of queries.</p>
      <p>Kovács et al. [30] used the same 2016 Wikidata dump as [29] and reported on an import
into Neo4J 3.3.3, Blazegraph 2.1.4 and JanusGraph 0.2.0.</p>
      <p><sup>16</sup> WGPB query J4 in line 38 has a syntax error and thus cannot run successfully on any endpoint.</p>
      <sec id="sec-4-1">
        <title>Proposals to avoid using your own copy of Wikidata</title>
        <p>Minier et al. [31] propose mitigating the SPARQL query quota problem of service providers by splitting queries and running
the individual sub-queries within the quotas. This workaround will still not create reliable
and timely results if a service completely fails or is under heavy load. Also, splitting the
queries requires complex analysis and processing of intermediate result sets for the heavy
load use cases that are the motivation for our work.</p>
        <p>Henselmann and Harth [32] propose an algorithm for constructing a subset of the Wikidata
knowledge graph on demand. The implementation likely requires a copy of Wikidata to
be able to run. This defeats the purpose of the idea unless the implementation is
provided as a service.</p>
        <p>Aimonier-Davat et al. [33] propose using the HyperLogLog++ algorithm to estimate
cardinalities for COUNT-DISTINCT queries. This only solves part of the general problem and
requires the installation and execution of a separate infrastructure.</p>
        <p>Chalupsky et al. [34] propose using the Knowledge Graph Toolkit KGTK<sup>17</sup> to import the
Wikidata dump and be able to query the result locally on a machine with low resources
(laptop)<sup>18</sup>.</p>
        <p><bold>Data Quality</bold> Färber et al. [35] define 34 data quality metrics and analyse five knowledge graphs,
Freebase, DBpedia, OpenCyc, Wikidata and YAGO, against these metrics. The
reproducibility of the results is limited since the work was done in 2015 when Wikidata
had fewer than 20 million items.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The procedure for getting and hosting your own copy of Wikidata is a moving target and is not
well defined, automated or repeatable. Comparison of key features such as reliability, needed
resources and usefulness is based on anecdotal evidence at this time.</p>
      <p>The download and indexing time for a round trip update is currently a full day, even with a
robust hardware environment. For personal use, this is already prohibitive.</p>
      <sec id="sec-5-1">
        <title>5.1. Future work</title>
        <p>Improving the technical exchange related to successfully hosting a copy of Wikidata would be
valuable to remedy the current lack of definition, automation and repeatability of the procedures.
Having sound scientific performance and usefulness analyses would benefit all parties needing
access to a reliable and performant Wikidata knowledge graph (that is not as limited as the
current public offerings). Specifically, implementing the metric analysis according to [35] in a
dashboard would be a valuable contribution.</p>
        <p>
          The search for a Blazegraph alternative by the Wikimedia Foundation [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] has already provided
valuable analyses that could become the basis for a general benchmark for Wikidata content
platforms. However, the Foundation is limiting its exploration of alternatives to open-source
solutions only.
        </p>
        <p><sup>17</sup> https://github.com/usc-isi-i2/kgtk</p>
        <p><sup>18</sup> See the documentation at https://kgtk.readthedocs.io/en/latest/import/import_wikidata/</p>
        <p>Hernández et al. [29] pointed out that "testing all of these combinations of features in a systematic
way would require extensive experiments outside the current scope" and indeed such extensive
experiments would be valuable.</p>
        <p>There are also other new graph databases and other approaches that should be considered for
their viability especially regarding scalability and distributability. RDF4J Release 4 [36] with a
new embedded triple store should also be investigated. Azure CosmosDB, HyperGraphDB and
GraKn have already been mentioned in [30] as planned analysis targets. Dgraph was already
targeted in 2018 by the success story [12].</p>
        <p>Lastly, the problem of keeping a Wikidata implementation “current” must be addressed. This
is difficult since data changes are frequent, while Wikidata RDF dumps are released weekly.
Both a Kafka-based and an HTTP API-based solution are documented in the blog post "Wikidata query service updater
evolution"<sup>19</sup>. This needs to be further refined and provided as a service. It is worth noting
that the current updater used by the Wikimedia Foundation is not publicly available.</p>
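        <p>If a copy is refreshed only from the weekly dumps, the age of its data can be bounded with simple arithmetic. The sketch below assumes that every dump is imported as soon as it is released and that download and indexing take one full day, as stated in Section 5; it ignores the time between the dump snapshot and its publication.</p>
        <preformat>
```python
DUMP_INTERVAL_DAYS = 7   # Wikidata RDF dumps are released weekly
ROUND_TRIP_DAYS = 1      # download plus indexing, "a full day" (Section 5)

# ASSUMPTION: each dump is imported immediately after its release.
# Right after an import finishes, the data is as old as the round trip.
freshest_days = ROUND_TRIP_DAYS
# Just before the next import finishes, one more dump interval has passed.
stalest_days = ROUND_TRIP_DAYS + DUMP_INTERVAL_DAYS
average_days = (freshest_days + stalest_days) / 2

print(f"data age between {freshest_days} and {stalest_days} days")
print(f"average data age: {average_days} days")
```
        </preformat>
        <p>Under these assumptions the data is between 1 and 8 days old, which illustrates why a dump-only refresh cannot provide “always current” data and an updater based on the change stream is needed.</p>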
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgements</title>
      <p>We thank all reporters of successful Wikidata imports for their valuable contribution. We thank
Federico Bonelli for sponsoring the AWS instance used for the Stardog import.</p>
      <p>This research has been partly funded by a grant of the Deutsche Forschungsgemeinschaft
(DFG)<sup>20</sup>.</p>
      <p><sup>19</sup> https://addshore.com/2022/04/wikidata-query-service-updater-evolution/</p>
      <p><sup>20</sup> ConfIDent project; see https://gepris.dfg.de/gepris/projekt/426477583</p>
      <p>[8] P. Höfler, How can i download the complete wikidata database, 2013. URL: https://opendata.stackexchange.com/questions/107/how-can-i-download-the-complete-wikidata-database.</p>
      <p>[9] S. Malyshev, Getting started, 2015. URL: https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md.</p>
      <p>[10] W. Fahl, T. Holzheim, Get your own copy of wikidata, 2020. URL: https://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData.</p>
      <p>[11] A. Seaborne, Report on loading wikidata, 2017. URL: https://lists.apache.org/thread/m8jjmbckm4c31gcl73dl30z6m6jpzj5o.</p>
      <p>[12] topicseed, Importing wikidata dumps — the easy part, 2018. URL: https://topicseed.com/blog/importing-wikidata-dumps/.</p>
      <p>[13] C. Capol, Wikidata import in apache jena, 2019. URL: https://muncca.com/2019/02/14/wikidata-import-in-apache-jena/.</p>
      <p>[14] A. Sanchez, [wikidata] minimal hardware requirements for loading wikidata dump in blazegraph, 2019. URL: https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/3FIVDPUGNPGDUURWSDPYQG4W6DN2I2RR/.</p>
      <p>[15] A. Sanchez, Virtuoso hosted wikidata instance, 2019. URL: https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/CKN3QPV2NJ5TAKHORMYDTTXG7Y65UXAF/.</p>
      <p>[16] A. Shoreland, Your own wikidata query service, with no limits, 2019. URL: https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits/.</p>
      <p>[17] H. Williams, Loading wikidata into virtuoso (open source or enterprise edition), 2020. URL: https://community.openlinksw.com/t/loading-wikidata-into-virtuoso-open-source-or-enterprise-edition/2717.</p>
      <p>[18] J. Sourlier, Jena issue 1909 - report of a wikidata import with jena, 2020. URL: https://issues.apache.org/jira/browse/JENA-1909.</p>
      <p>[19] W. Fahl, Wikidata import 2020-07-15, 2020. URL: https://wiki.bitplan.com/index.php/WikiData_Import_2020-07-15.</p>
      <p>[20] W. Fahl, Wikidata import 2020-08-15, 2020. URL: https://wiki.bitplan.com/index.php/WikiData_Import_2020-08-15.</p>
      <p>[21] W. Fahl, Wikidata import 2022-01-29, 2022. URL: https://wiki.bitplan.com/index.php/WikiData_Import_2022-01-29.</p>
      <p>[22] P. K. Evren Sirin, Wikidata in stardog, 2022. URL: https://www.stardog.com/labs/blog/wikidata-in-stardog.</p>
      <p>[23] H. Bast, Using qlever for wikidata, 2022. URL: https://github.com/ad-freiburg/qlever/wiki/Using-QLever-for-Wikidata.</p>
      <p>[24] W. Fahl, Wikidata import 2022-06-25, 2022. URL: https://wiki.bitplan.com/index.php/WikiData_Import_2022-06-25.</p>
      <p>[25] T. Holzheim, W. Fahl, Wikidata import 2022-07-12, 2022. URL: https://wiki.bitplan.com/index.php/WikiData_Import_2022-07-12.</p>
      <p>[26] C. Norvell, Wikidata on allegrograph, 2022. URL: https://wiki.bitplan.com/index.php/Wikidata_on_Allegrograph.</p>
      <p>[27] A. Hogan, C. Riveros, C. Rojas, A. Soto, Wikidata Graph Pattern Benchmark (WGPB) for RDF/SPARQL, 2019. URL: https://doi.org/10.5281/zenodo.4035223. doi:10.5281/zenodo.4035223.</p>
      <p>[28] Wikimedia, Wikidata query service, 2017. URL: https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service.</p>
      <p>[29] D. Hernández, A. Hogan, C. Riveros, C. Rojas, E. Zerega, Querying wikidata: Comparing sparql, relational and graph databases, in: P. Groth, E. Simperl, A. J. G. Gray, M. Sabou, M. Krötzsch, F. Lécué, F. Flöck, Y. Gil (Eds.), The Semantic Web - ISWC 2016 - 15th International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings, Part II, volume 9982 of Lecture Notes in Computer Science, 2016, pp. 88–103. URL: https://doi.org/10.1007/978-3-319-46547-0_10. doi:10.1007/978-3-319-46547-0_10.</p>
      <p>[30] T. Kovács, G. Simon, G. Mezei, Benchmarking graph database backends - what works well with wikidata?, Acta Cybern. 24 (2019) 43–60. URL: https://doi.org/10.14232/actacyb.24.1.2019.5. doi:10.14232/actacyb.24.1.2019.5.</p>
      <p>[31] T. Minier, H. Skaf-Molli, P. Molli, SaGe: Web preemption for public SPARQL query services, in: The World Wide Web Conference on - WWW '19, ACM Press, 2019, pp. 1268–1278. doi:10.1145/3308558.3313652.</p>
      <p>[32] D. Henselmann, A. Harth, Constructing demand-driven wikidata subsets, in: L. Kaffee, S. Razniewski, A. Hogan (Eds.), Proceedings of the 2nd Wikidata Workshop (Wikidata 2021) co-located with the 20th International Semantic Web Conference (ISWC 2021), Virtual Conference, October 24, 2021, volume 2982 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2982/paper-10.pdf.</p>
      <p>[33] J. Aimonier-Davat, H. Skaf-Molli, P. Molli, A. Grall, T. Minier, Online approximative SPARQL query processing for COUNT-DISTINCT queries with web preemption, Semantic Web 13 (2022) 735–755. doi:10.3233/sw-222842.</p>
      <p>[34] H. Chalupsky, P. Szekely, F. Ilievski, D. Garijo, K. Shenoy, Creating and querying personalized versions of wikidata on a laptop, 2021. doi:10.48550/ARXIV.2108.07119.</p>
      <p>[35] M. Färber, F. Bartscherer, C. Menne, A. Rettinger, Linked data quality of DBpedia, freebase, OpenCyc, wikidata, and YAGO, Semantic Web 9 (2017) 77–129. doi:10.3233/sw-170275.</p>
      <p>[36] Eclipse, Rdf4j 4.0.0 released, 2022. URL: https://rdf4j.org/news/2022/04/24/rdf4j-4.0.0-released/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          , Wikidata,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . doi:10.1145/2629489.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bebee</surname>
          </string-name>
          ,
          <article-title>Blazegraph™ db - ultra high-performance graph database supporting blueprints and rdf/sparql api</article-title>
          ,
          <year>2015</year>
          . URL: https://blazegraph.com/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Search</surname>
          </string-name>
          ,
          <string-name>
            <surname>WMDE</surname>
          </string-name>
          ,
          <article-title>Wikidata:sparql query service/wdqs backend update/august 2021 scaling update</article-title>
          ,
          <year>2021</year>
          . URL: https://m.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update/August_2021_scaling_update.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lederrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Westerinen</surname>
          </string-name>
          , et al.,
          <source>Wikidata query service backend update</source>
          ,
          <year>2022</year>
          . URL: https://www.wikidata.org/wiki/Category:WDQS_backend_update.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T. V.</given-names>
            <surname>Wal</surname>
          </string-name>
          ,
          <article-title>Folksonomy coinage and definition</article-title>
          ,
          <year>2007</year>
          . URL: https://www.vanderwal.net/folksonomy.html.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khatun</surname>
          </string-name>
          ,
          <source>Table of top 50 subgraph information</source>
          ,
          <year>2021</year>
          . URL: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis#Table_of_top_50_subgraph_information.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T. I. T.</given-names>
            <surname>Hannover</surname>
          </string-name>
          ,
          <year>2020</year>
          . URL: https://projects.tib.eu/en/confident/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>