<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>The Wikidata Query Service Split and its Impact on the Scholarly Graph</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tiago Lubiana</string-name>
          <email>tiagolubiana@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lane Rasberry</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Mietchen</string-name>
          <email>daniel.mietchen@fiz-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe - Leibniz Institute for Information Infrastructure</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Data Science, University of Virginia</institution>
          ,
          <addr-line>Charlottesville, VA</addr-line>
          ,
          <country country="US">United States of America</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Wikimedia Brasil</institution>
          ,
          <addr-line>São Paulo, SP</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <fpage>3</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>Wikidata, the open knowledge graph sister to Wikipedia, is undergoing major changes in 2025 as the Wikimedia Foundation splits its data into two graphs. One of the split pieces is essentially the data of the WikiCite project, which is an initiative expanding the use of Wikidata as a platform for scholarly information. Due to WikiCite's success, it accounts for over 50% of the triples on the Wikidata graph. This split was motivated by challenges in scaling up the Wikidata SPARQL Query Service, which relies on Blazegraph, a technology unsuited for Wikidata's current scale and growth rate. While the split has been prepared since 2021, this large infrastructure change has wide implications both for community tools which use WikiCite data, such as the Scholia platform, as well as for core functionalities of Wikidata, such as systems that detect duplicate items and constraint violations. In this paper, we present an overview of the infrastructure available on Wikidata for querying scholarly information and how it serves the community endeavours related to the WikiCite initiative. In particular, we focus on the Wikidata Query Service split, its motivations, and its impacts for those intending to use Wikidata as a source of semantic scholarly data. We present the alternatives for rewriting or redirecting broken queries, making explicit the rules of the graph split, and when federated queries are now required, guiding stakeholders on how to adapt to the new infrastructure.</p>
      </abstract>
      <kwd-group>
        <kwd>Wikidata</kwd>
        <kwd>Scholia</kwd>
        <kwd>WikiCite</kwd>
        <kwd>scholarly knowledge graph</kwd>
        <kwd>SPARQL endpoint</kwd>
        <kwd>Blazegraph</kwd>
        <kwd>scalability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Wikidata has matured as a core resource in the Linked Open Data ecosystem for academic information.
As of 2025, the platform has become part of workflows in biocuration and scholarmetrics, collecting
identifiers, disambiguating names, and providing a Query Service to gather information from a single
access point.</p>
      <p>
        The WikiCite project has scaled the capacity of Wikidata to host metadata about publications,
bringing semantics to the handling of citations in the Wikimedia ecosystem [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The CiteQ template
on Wikipedia, for example, streamlined structured generation of references and is now evaluated as part
of the Altmetric credit pipeline [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Even inside Wikidata, the scholarly information curated through
WikiCite supports provenance for statements [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The statements on Wikidata are retrievable in different ways. Data is openly available under a CC0
dedication, and access options range from complete dumps [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to on-demand requests via the MediaWiki
Action API and the recently developed Wikibase REST API. Arguably one of the most powerful access routes is
SPARQL, which enables precise and complex requests for information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Wikidata hosts an official endpoint with a graphical interface, known as the "Wikidata Query Service"
(WDQS). The web interface of the query service, available at https://query.wikidata.org, facilitates the
writing of queries by providing autocompletion, syntax highlighting, and example queries. The
WDQS also provides a rich set of default visualizations, including image grids, interactive maps, bar
charts, and timelines, which can be embedded directly into web pages using &lt;iframe&gt; tags. The WDQS
is updated nearly in real time, with query results usually reflecting changes within minutes or even
seconds of an edit being made on the platform [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Given its capabilities, the Wikidata Query Service is widely used by the community, and embedding its results
is common practice. Of note, the Scholia platform for presenting and curating scholarly information
relies heavily on WDQS and its standardized, plug-and-play visualizations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The Scholia-style display
of rich bibliometric information via the Query Service was also adopted by other projects, such as
Vitrine NeuroMat [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and the SARS-CoV-2 Query Book [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In 2015, the query service was launched by the Discovery Department at the Wikimedia Foundation
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] using Blazegraph as its database, chosen from among several competing technologies [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The Blazegraph
software, however, was soon left unmaintained and became increasingly unable to keep up with the demands of
a growing Wikidata, including the tens of millions of articles added in the context of WikiCite (Figure
1). By 2022, the Wikimedia Foundation had already conducted several rounds of research towards a
replacement, but no clear winner emerged [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The Foundation also analyzed the scholarly content
of Wikidata, which at the time amounted to around 6.4 billion triples, half of the database [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. As an emergency
measure to avoid a catastrophic failure of the service, the Search Platform Team (which replaced the
Discovery Department at the Foundation in 2017) decided to divide the SPARQL Endpoint into two:
one for just the scholarly articles and another one for the rest of Wikidata.
      </p>
      <p>
        Even if carefully planned, such a change brings a number of challenges to Wikidata. Here, we describe
the graph split from a WikiCite perspective, documenting it with a focus on the academic community
relying on the Wikidata infrastructure. A more complete version of the discussion is documented on a
living Meta-Wiki page [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. There are also other perspectives published by the Wikidata community,
including a 2024 op-ed on Wikipedia’s newspaper, The Signpost [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and a 2025 benchmark report of
alternative backends [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Handling The Split</title>
      <sec id="sec-2-1">
        <title>2.1. Which items are included in each part of the split?</title>
        <p>
          As of August 2025, the technical rules for the split [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] determine which items are considered scholarly
articles or not. Then, the processing pipeline separates the triples where scholarly articles are subjects
into a separate graph, which is loaded into the scholarly SPARQL endpoint. Every time a change
happens in Wikidata, the pipeline updates the triples served by each endpoint.
        </p>
        <p>
          Included in the scholarly graph are all triples for items that either (1) contain a (non-deprecated)
statement with the property "publication type of scholarly work" (P13046) or (2) have a (non-deprecated)
value for "instance of" (P31) that matches any one out of 49 scholarly QIDs. This list includes "scholarly
article" (Q13442814) and 24 subclasses of it, "thesis" (Q1266946) and 15 subclasses of it, plus "erratum"
(Q1348305), "dissertation" (Q1385450), "comment" (Q58897583), "research report" (Q59387148), "field
study report" (Q1402850), "conference poster" (Q54670950), "scientific note" (Q114613919) and, perhaps
surprisingly, "Bachelor of Literature" (Q112585758). These QIDs were selected by the Search Platform
team from a list containing over 4000 values [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The triples that do not match the rules are directed to
the main graph, and there is no duplication of content.
        </p>
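        <p>The two inclusion rules above can be illustrated with a minimal Python sketch. It is not the code of the actual processing pipeline. It assumes an entity in the standard Wikidata JSON layout, lists only the QIDs named in the text rather than all 49, and treats any non-deprecated P13046 statement as a match regardless of its value. All function names are hypothetical.</p>
        <preformat>
```python
# Illustrative sketch of the split rules; not the production pipeline.
# Only the QIDs named in the text are listed (the full list has 49 entries).
SCHOLARLY_QIDS = {
    "Q13442814",   # scholarly article
    "Q1266946",    # thesis
    "Q1348305",    # erratum
    "Q1385450",    # dissertation
    "Q58897583",   # comment
    "Q59387148",   # research report
    "Q1402850",    # field study report
    "Q54670950",   # conference poster
    "Q114613919",  # scientific note
    "Q112585758",  # Bachelor of Literature
}


def make_statement(qid, rank="normal"):
    """Build a minimal item-valued statement in Wikidata JSON form."""
    return {
        "rank": rank,
        "mainsnak": {
            "snaktype": "value",
            "datavalue": {"value": {"entity-type": "item", "id": qid}},
        },
    }


def active_statements(entity, property_id):
    """Return the non-deprecated statements of one property."""
    statements = entity.get("claims", {}).get(property_id, [])
    return [s for s in statements if s.get("rank") != "deprecated"]


def statement_qid(statement):
    """Return the item ID a statement points to, or None if it has no value."""
    snak = statement.get("mainsnak", {})
    if snak.get("snaktype") != "value":
        return None
    value = snak.get("datavalue", {}).get("value")
    return value.get("id") if isinstance(value, dict) else None


def belongs_to_scholarly_graph(entity):
    """Apply rule (1), then rule (2), as described in the text."""
    # Rule 1: any non-deprecated "publication type of scholarly work" statement.
    if active_statements(entity, "P13046"):
        return True
    # Rule 2: any non-deprecated "instance of" value in the scholarly list.
    return any(
        statement_qid(s) in SCHOLARLY_QIDS
        for s in active_statements(entity, "P31")
    )
```
        </preformat>
        <p>Under these assumptions, an item whose only scholarly "instance of" statement is deprecated falls back to the main graph, which mirrors the "non-deprecated" qualifier in both rules.</p>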
        <p>There are around 77 million items on the main graph and 45 million items on the scholarly graph.
The information on these scholarly works is particularly rich, corresponding to 8.7 billion triples, while
the main endpoint hosts slightly less, with around 8.4 billion.</p>
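        <p>The proportions implied by these figures can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic on the rounded counts quoted above; the variable and function names are ours.</p>
        <preformat>
```python
# Rounded counts quoted in the text (approximate).
ITEMS = {"main": 77e6, "scholarly": 45e6}
TRIPLES = {"main": 8.4e9, "scholarly": 8.7e9}


def share(counts, graph):
    """Fraction of the total that falls into one graph."""
    return counts[graph] / sum(counts.values())


def triples_per_item(graph):
    """Average number of triples per item in one graph."""
    return TRIPLES[graph] / ITEMS[graph]


scholarly_item_share = share(ITEMS, "scholarly")
scholarly_triple_share = share(TRIPLES, "scholarly")
density = {graph: triples_per_item(graph) for graph in ITEMS}
```
        </preformat>
        <p>With these inputs, the scholarly graph holds about 37% of the items but about 51% of the triples, or roughly 193 triples per item against roughly 109 on the main graph.</p>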
      </sec>
      <sec id="sec-2-2">
        <title>2.2. How to query for scholarly information</title>
        <p>
          It is often not trivial to know in which endpoint a particular query should be run, as query results may
be split across different graphs. To aid users in rewriting queries, we developed a simple online checker
for verifying queries across the Wikimedia-hosted endpoints [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. We also designed a simple decision
graph to help SPARQL query writers adapt to the change (Figure 2). This workflow
is being used to support the rewriting of queries in the Scholia platform, many of which depend on
information from both endpoints [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          For SPARQL queries needing both graphs, internal federation is necessary. Its usage is being
documented collaboratively in the Internal Federation Guide [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. An example of basic syntax for retrieving
items from one graph and labels from another is shown in Query 1. Note that while the article type for
Q41799194 is retrieved from the scholarly graph, the labels for the types are only retrieved if the
wikibase:label service is called from the main endpoint.
        </p>
        <sec id="sec-2-2-1">
          <title>Query 1: Wikidata Query Service internal federation example</title>
          <p>
            <preformat>SELECT ?articleType ?articleTypeLabel WHERE {
  SERVICE wdsubgraph:scholarly_articles {
    wd:Q41799194 wdt:P31 ?articleType }
  SERVICE wikibase:label { bd:serviceParam
    wikibase:language "en" . } }</preformat>
          </p>
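          <p>For tools that assemble such queries programmatically, the pattern of Query 1 can be generated with a small helper. The following Python sketch only builds the query text and a request URL and does not contact any server. The endpoint URLs are assumptions not stated in this paper and should be checked against the current documentation, and the helper names are hypothetical.</p>
          <preformat>
```python
from urllib.parse import urlencode

# Assumed endpoint URLs (not given in the text); verify before use.
ENDPOINTS = {
    "main": "https://query-main.wikidata.org/sparql",
    "scholarly": "https://query-scholarly.wikidata.org/sparql",
}


def federate(subgraph, pattern):
    """Wrap a graph pattern in a SERVICE clause for another subgraph."""
    return "SERVICE wdsubgraph:" + subgraph + " { " + pattern + " }"


def build_query_1(qid="Q41799194", language="en"):
    """Rebuild Query 1: types from the scholarly graph, labels from main."""
    remote = federate("scholarly_articles", "wd:" + qid + " wdt:P31 ?articleType")
    labels = (
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "'
        + language
        + '" . }'
    )
    return (
        "SELECT ?articleType ?articleTypeLabel WHERE {\n"
        "  " + remote + "\n"
        "  " + labels + "\n"
        "}"
    )


def request_url(home, query):
    """Build a GET URL for the endpoint on which the query is run."""
    return ENDPOINTS[home] + "?" + urlencode({"query": query, "format": "json"})


# Query 1 is meant to be run on the main endpoint.
url = request_url("main", build_query_1())
```
          </preformat>
          <p>Keeping the federated part in its own function makes it easier to move a pattern between endpoints when rewriting a query.</p>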
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Does the split affect anything other than the Wikidata Query Service? For example, the Action API?</title>
        <p>The split does not affect functionalities of Wikidata that are independent of the Wikidata Query Service.
For example, requests via the REST API or the Action API should not need to change.</p>
        <p>
          There are assessments of growth on multiple parts of Wikidata, including discussions on the limits
of the platform and on extra regulation of mass edits [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. While moving scholarly information to a
separate Wikibase instance may happen in the future, to the best of our knowledge, as of 2025, there
are no plans to split any infrastructure other than the SPARQL query service.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
      <p>
        This paper discusses the Wikidata Query Service graph split from a WikiCite perspective, providing the
community with resources to navigate the changes. A longer, multilingual version of the presented FAQ
is available on the Meta page for WikiCite [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. With that, we hope to help semantic web researchers
use the scholarly graph on Wikidata and join the conversation towards a sustainable future for this
major Linked Open Data infrastructure.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Acknowledgments</title>
      <p>This work was supported in part by Wikimedia CH and Wikimedia Deutschland.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly and ChatGPT for grammar and
spelling checks and for formatting assistance. After using these tools, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Taraborelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dugan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pintscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mietchen</surname>
          </string-name>
          , C. Neylon,
          <source>WikiCite 2016 Report</source>
          ,
          <year>2016</year>
          . doi:10.6084/m9.figshare.4042530.v2.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lemus-Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Odell</surname>
          </string-name>
          ,
          <article-title>Creating Structured Linked Data to Generate Scholarly Profiles: A Pilot Project using Wikidata and Scholia</article-title>
          <volume>6</volume>
          (<year>2018</year>). URL: https://jlsc-pub.org/articles/10.7710/2162-3309.2272/galley/201/download/. doi:10.7710/2162-3309.2272.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Altmetric</surname>
          </string-name>
          ,
          <article-title>Wikipedia tracking enhancement: supporting the Cite Q template</article-title>
          ,
          <year>2025</year>
          . URL: https://updates.altmetric.com/announcements/wikipedia-tracking-enhancement-supporting-the-cite-q-template.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Shafee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mietchen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lubiana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jemielniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Waagmeester</surname>
          </string-name>
          ,
          <article-title>Ten quick tips for editing Wikidata</article-title>
          <volume>19</volume>
          (<year>2023</year>) e1011235. URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011235. doi:10.1371/journal.pcbi.1011235.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          Wikidata:Database download,
          <year>2025</year>
          . URL: https://www.wikidata.org/wiki/Wikidata:Database_download.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shorland</surname>
          </string-name>
          , Wikidata query service updater evolution,
          <year>2022</year>
          . URL: https://addshore.com/2022/04/wikidata-query-service-updater-evolution/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mietchen</surname>
          </string-name>
          , E. Willighagen, Scholia, Scientometrics and Wikidata,
          <source>in: The Semantic Web: ESWC 2017 Satellite Events, Lecture Notes in Computer Science</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>259</lpage>
          . doi:10.1007/978-3-319-70407-4_36.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] NeuroMat Research Group, Vitrine NeuroMat,
          <year>2025</year>
          . URL: https://vitrine.numec.prp.usp.br/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Addshore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mietchen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Willighagen</surname>
          </string-name>
          ,
          <article-title>Wikidata Queries around the SARS-CoV-2 virus and pandemic</article-title>
          , Zenodo,
          <year>2020</year>
          . URL: https://egonw.github.io/SARS-CoV-2-Queries/. doi:10.5281/zenodo.3977414.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Garry</surname>
          </string-name>
          ,
          <source>Announcing the release of the Wikidata Query Service</source>
          ,
          <year>2015</year>
          . URL: https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/thread/N2HPRCYIWGLM2IDTNCHQLNY574H5ZEQR/, mailing list message.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          Wikimedia Foundation,
          <article-title>Wikidata query pick backend (comparison spreadsheet)</article-title>
          ,
          <year>2016</year>
          . URL: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Benchmark_for_the_backend.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Westerinen</surname>
          </string-name>
          , WDQS Search Team,
          <source>WDQS Backend Alternatives: The Process, Details and Results - Version 1.1</source>
          , https://commons.wikimedia.org/wiki/File:WDQS_Backend_Alternatives_working_paper.pdf,
          <year>2022</year>
          . Version 1.1, March 29, 2022.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khatun</surname>
          </string-name>
          , Wikidata Scholarly Articles Subgraph Analysis, https://wikitech.wikimedia.org/wiki/User:Aisha_Khatun/Wikidata_Scholarly_Articles_Subgraph_Analysis,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] WikiCite/WDQS graph split - FAQ,
          <year>2025</year>
          . URL: https://meta.wikimedia.org/wiki/WikiCite/WDQS_graph_split.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rasberry</surname>
          </string-name>
          ,
          <article-title>Wikidata to split as sheer volume of information overloads infrastructure</article-title>
          , https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2024-05-16/Op-Ed,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P. F. P.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , Scaling Wikidata/Benchmarking/Final Report, https://www.wikidata.org/wiki/Wikidata:Scaling_Wikidata/Benchmarking/Final_Report,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Wikidata</surname>
          </string-name>
          , Wikidata:SPARQL query service/WDQS graph split/Rules,
          <year>2025</year>
          . URL: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Rules.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lubiana</surname>
          </string-name>
          ,
          <article-title>Snapshot - WikiCite subclass hierarchy for the Wikidata scholarly graph split</article-title>
          ,
          <year>2025</year>
          . doi:10.5281/zenodo.16884396.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lubiana</surname>
          </string-name>
          ,
          <source>Wikidata Graph Split Simple SPARQL Benchmark</source>
          ,
          <year>2025</year>
          . URL: http://tiago.bio.br/query-split-tester/.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <source>Wikidata SPARQL query service - internal federation guide</source>
          ,
          <year>2025</year>
          . URL: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pintscher</surname>
          </string-name>
          , Wikidata:Requests for comment/Mass-editing policy, https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Mass-editing_policy,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>