<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>L. Feddoul); frank.loeffler@uni-jena.de (F. Löffler); sirko.schindler@dlr.de
(S. Schindler)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analysis of Consistency between Wikidata and Wikipedia Categories</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leila Feddoul</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Löffler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sirko Schindler</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Competence Center for Digital Research, Michael Stifel Center</institution>
          ,
          <addr-line>Jena</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena</institution>
          ,
          <addr-line>Jena</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Data Science, German Aerospace Center DLR</institution>
          ,
          <addr-line>Jena</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Wikipedia categories play a significant role in organizing articles by topic. They form a hierarchy that groups related articles into larger collections. Wikidata provides a corresponding item for each category and allows the membership of other items in that category to be defined by a SPARQL query or by specifying classes and properties. This yields multiple, redundant sources of category membership, which may deviate substantially. In this paper, we investigate inconsistencies between Wikipedia and Wikidata category members and analyze possible reasons. We propose a candidate category generation and evaluation workflow that traverses the category hierarchy of Wikipedia in all available languages and compares the results with information obtained from Wikidata. This workflow can be executed either online using the publicly available endpoints or offline based on the provided dumps. Furthermore, we formulate concrete suggestions to harmonize category membership definitions between Wikipedia and Wikidata.</p>
      </abstract>
      <kwd-group>
        <kwd>Wikidata</kwd>
        <kwd>Wikipedia</kwd>
        <kwd>Wikipedia Category</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Wikipedia has grown to be a valuable source of semi-structured information, written and
maintained by a large community and provided for everyone to use. As of 2022, it contains over
6.5 million articles in its English section and is available in 329 other languages. It has a
community of about 280,000 active editors and more than 100 million registered users. The
basic building blocks of Wikipedia are articles that are interlinked with each other.</p>
      <p>
        Wikidata [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is, like Wikipedia, free and open, but instead of a collection of articles that are
intended primarily to be read by humans, it is a knowledge base that is intended to be read and
edited by humans and machines. Wikidata is a source of open data that other projects, including
Wikipedia, can use to enrich their services. The basic building block of Wikidata is an item,
which represents any kind of real-world topic, concept, or entity that is uniquely identified.
      </p>
      <sec id="sec-1-2">
        <title>SPARQL</title>
        <p>
          Wikipedia established ways to structure its building blocks (articles): Categories, i.e., sets
of articles or subcategories, are among them. They play an important role since they support
finding sets of articles having the same characteristics without knowing individual articles
beforehand. The Wikipedia category structure has also been exploited for other tasks like entity
retrieval [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] or document classification [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Wikipedia’s categories<sup>3</sup> group articles with similar
topics. E.g., Category:Former countries groups a set of articles related to the concept of a former
country. This does not only include articles about the respective former countries like Inca
Empire, but also subcategory pages, e.g., Category:Former countries in fiction. Categories can
contain subcategories, but the resulting data structure is not a tree but a more general graph,
because articles and subcategories can be members of multiple parent categories and, while
discouraged, even loops can exist<sup>4</sup>.
        </p>
        <p>
          There is a quite close connection between categories in Wikipedia and Wikidata. In general,
for each page (article, category, or otherwise) in Wikipedia, there exists a corresponding
Wikidata item that is unique for all languages [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Wikidata category items are instances of
Wikimedia category (Q4167836). This type of Wikidata item has specific properties, some of
which describe a criterion for membership of a given Wikidata item in the considered category:
(i) Category contains (P4224) is described as “category contains elements that are instances of this
item” and consists of a value together with qualifiers<sup>5</sup> if available. The property value refers to
the type of items contained and is referred to as target in this paper. (ii) Wikidata SPARQL query
equivalent (P3921) is described as “SPARQL code that returns a set of entities that correspond with
this category or list”. Figure 1 shows the Wikidata item for the category Association football video
games (Q13199045), with video game (Q7889) as a target, genre (P136) together with sport (P641)
values as qualifiers, and a corresponding SPARQL query. In addition, a list of corresponding
Wikipedia articles is linked. This exemplifies the multiple sources for category membership used
in Wikipedia and Wikidata. If more than one source is given, the resulting category members
could in theory differ. As we will show later, this is often the case in practice and poses a
possible consistency problem.
          <fn id="fn3"><label>3</label><p>https://en.wikipedia.org/wiki/Wikipedia:Categorization</p></fn>
          <fn id="fn4"><label>4</label><p>https://en.wikipedia.org/wiki/Wikipedia:FAQ/Categorization</p></fn>
          <fn id="fn5"><label>5</label><p>Qualifiers provide additional information about a specific statement that may not be represented in a single
triple statement. For more details, refer to https://www.wikidata.org/wiki/Help:Qualifiers.</p></fn>
        </p>
        <p>To the best of our knowledge, no previous work has analyzed the (in)consistencies between
Wikipedia category members and items retrieved using the SPARQL queries or targets attached
to the respective Wikidata categories and proposed a solution on how to reduce inconsistencies.
In this paper, we analyze them by comparing their content, elaborating on possible reasons
for and against making all sources consistent, and suggesting some potential future research
directions. The key contributions of this paper are: (i) A workflow for the automatic generation of
candidate categories together with their SPARQL Wikidata and Wikipedia members (mapped to
Wikidata) derived from traversing the Wikipedia category hierarchy in all available languages.
(ii) An analysis of inconsistencies within Wikidata categories and between Wikipedia and
Wikidata. (iii) An automatic investigation of possible reasons for inconsistency.</p>
        <p>
          The source code for the dataset generation is publicly available [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ] under an MIT License
and works on both online Wikipedia/Wikidata public endpoints and offline SQL/JSON dumps.
All generated data [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], cache files [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] containing data retrieved from dumps as well as experiment
results [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] are published on Zenodo. This makes the whole analysis fully reproducible (on
dumps of historic versions of the sources) as well as reusable (assuming the underlying items
and articles in Wikipedia and Wikidata do not change too much).
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Various works have investigated different aspects of leveraging Wikidata and Wikipedia
content. H. Turki et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] focus on explaining how Wikipedia and Wikidata can be processed
using existing techniques for data parsing and querying. Furthermore, they raise awareness
about the usefulness of the integration of Wikipedia and Wikidata categories for different
semantic applications and provide some ideas to enhance the quality of both sources (e.g.,
removing non-transitive relations from the Wikipedia category graph through the analysis of
Wikidata statements). Driven by the observation that a large number of Wikidata entities lack
corresponding Wikipedia articles in some languages (orphans), N. Ostapuk et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] propose
a pipeline to map Wikidata orphan entities to Wikipedia articles’ sections. Their goal is to
enrich orphans with additional facts and properties that are derived from their corresponding
textual description in Wikipedia. As a result they provide a dataset consisting of a collection
of Wikidata entities together with their potential links to related Wikipedia pages in different
languages. I. Johnson [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] analyzed how Wikidata content is referenced within the English
Wikipedia and proposed a taxonomy that categorizes Wikidata transclusions based on the reader
impact. In the context of Wikidata enrichment from external sources, A. Boschin et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
proposed a method based on knowledge graph embeddings to predict new facts (e.g., triple
completion) using the hyperlinks between Wikipedia articles. P. Curotto et al. [13] proposed a
Wikipedia-based approach for automatic suggestion of authoritative references for Wikidata
statements. The goal is to support editors while referencing Wikidata claims. To evaluate the
accuracy of the automatic recommendations, they also provide a gold standard dataset of sample
claims and their corresponding external references in the English Wikipedia.
      </p>
      <p>[Figure 2: Pipeline overview. Labels: Endpoints, Cache Population, Candidate Generation, Candidate Cleaning, Evaluation.]</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>As the basis of our analysis, we need to retrieve categories and their members from Wikipedia
and Wikidata, respectively. The corresponding pipeline is outlined in Figure 2. It can be executed
either using the public APIs offered by Wikipedia<sup>6</sup> and Wikidata<sup>7</sup> or the regularly provided
SQL/JSON dumps from both sites<sup>8</sup>. We employ a cache to hold all information needed. When
using the public APIs, this cache is filled successively after each request. If the data dumps
are used, a preprocessing step extracts all relevant data from the provided files and populates
the cache accordingly. In both cases, the cache prevents redundant requests and speeds
up the processing considerably.</p>
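      <p>The role of the cache can be illustrated with a minimal sketch. It is not the published implementation: the class name RequestCache, the stand-in function fake_fetch, and the request counter are hypothetical and only show how repeated lookups avoid repeated requests, whether the cache is filled on demand (API mode) or up front (dump mode).</p>
      <preformat>
```python
from typing import Callable, Dict


class RequestCache:
    """In-memory cache that answers repeated lookups without a new request."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch          # hypothetical function performing the real request
        self._store: Dict[str, dict] = {}
        self.requests = 0            # how often the underlying source was queried

    def preload(self, entries: Dict[str, dict]) -> None:
        """Fill the cache up front, e.g. from a preprocessed dump."""
        self._store.update(entries)

    def get(self, key: str) -> dict:
        """Return the cached entry, fetching it only on the first lookup."""
        if key not in self._store:
            self.requests += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def fake_fetch(key: str) -> dict:
    """Stand-in for an API call; returns a dummy record."""
    return {"id": key}


cache = RequestCache(fake_fetch)
cache.get("Q4167836")
cache.get("Q4167836")       # served from the cache, no second request
print(cache.requests)       # 1
```
</preformat>
      <p>In dump mode, preload would be called once with the extracted data, so that subsequent calls to get never trigger a request.</p>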
      <p>We start the candidate generation by retrieving all items from Wikidata that correspond
to a Wikipedia category, i.e., instances of Wikimedia category (Q4167836). For each of those,
we further store: (i) the Wikidata identifier, (ii) the target given by the value of category
contains (P4224) including a list of subclasses, (iii) further qualifiers attached to the target,
(iv) the corresponding SPARQL query via Wikidata SPARQL query equivalent (P3921), if existing,
including the results after running the query, and (v) the corresponding Wikipedia category
pages in all languages. For each Wikipedia member, we also retrieve the direct types and all
their properties and their corresponding values<sup>9</sup>. Some categories are removed from further
consideration as the corresponding SPARQL queries do not adhere to the structure of a single
target and associated qualifiers. Among the deviations are the use of multiple targets, the
lack of an instance of (P31) relation, and queries involving property paths.</p>
      <p>Next, we turn to Wikipedia. For each of the previously identified categories, we fetch the
members and traverse the hierarchy of subcategories if necessary. Wikipedia versions in
different languages are maintained independently of each other<sup>10</sup>. Membership in categories
is maintained manually and, hence, also differs across languages. Therefore, we have to traverse
categories for each language independently. For each member, we store the corresponding
Wikidata identifier. While traversing the hierarchy of subcategories using a breadth-first search,
we apply type checks using the target of the initial category: If fewer than 50% of member
articles (excluding any subcategories) are instances of the target or any of its subclasses<sup>11</sup>, the
traversal in this branch will end. After this step, we have acquired not only member items from
Wikidata through the provided SPARQL query but also the manually curated list of members
from Wikipedia.
        <fn id="fn6"><label>6</label><p>https://en.wikipedia.org/w/api.php</p></fn>
        <fn id="fn7"><label>7</label><p>https://query.wikidata.org/</p></fn>
        <fn id="fn8"><label>8</label><p>https://dumps.wikimedia.org/backup-index.html and
https://dumps.wikimedia.org/wikidatawiki/entities/</p></fn>
        <fn id="fn9"><label>9</label><p>We consider only object properties with unique values; they are used to automate the comparison during
the evaluation.</p></fn>
        <fn id="fn10"><label>10</label><p>An exception are links between articles with similar topics across languages.</p></fn>
      </p>
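      <p>The traversal with its type check can be sketched as follows. This is a simplified illustration on in-memory dictionaries, not the published code. The function name collect_members and its parameters are hypothetical, and two details are assumptions that the text does not fix: a category without member articles passes the check, and all member articles of a passing category are collected.</p>
      <preformat>
```python
from collections import deque
from typing import Dict, List, Set


def collect_members(
    root: str,
    subcategories: Dict[str, List[str]],
    articles: Dict[str, List[str]],
    target_instances: Set[str],
    threshold: float = 0.5,
) -> Set[str]:
    """Breadth-first traversal that ends a branch when the type check fails."""
    members: Set[str] = set()
    visited: Set[str] = {root}      # guards against loops in the category graph
    queue = deque([root])
    while queue:
        category = queue.popleft()
        items = articles.get(category, [])
        hits = sum(1 for item in items if item in target_instances)
        # Assumption: a category without member articles passes the check.
        passes = not items or hits / len(items) >= threshold
        if not passes:
            continue                # end the traversal in this branch
        members.update(items)
        for child in subcategories.get(category, []):
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return members


# Toy data: B links back to A (a loop); C fails the type check (1 of 3 matches).
subcategories = {"A": ["B", "C"], "B": ["A"], "C": []}
articles = {"A": ["Q1", "Q2"], "B": ["Q3", "Q4"], "C": ["Q5", "Q6", "Q7"]}
target_instances = {"Q1", "Q2", "Q3", "Q5"}

result = collect_members("A", subcategories, articles, target_instances)
print(sorted(result))   # ['Q1', 'Q2', 'Q3', 'Q4']
```
</preformat>
      <p>Category B passes with exactly 50% matching articles, since only a share below the threshold ends a branch. The set target_instances stands in for the result of the subclass-aware type lookup.</p>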
      <p>Finally, we apply a cleaning step that removes some categories from consideration. Categories
will be omitted if one of the following criteria applies: (i) The category has more than one target.
(ii) The category has no corresponding Wikipedia members. This may be due to, e.g., the type
check already failing for the members of the initial Wikipedia category. (iii) The corresponding
SPARQL query yielded no results. (iv) Multiple SPARQL queries were supplied.</p>
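      <p>The four cleaning criteria translate directly into a filter. The following sketch assumes a hypothetical record type Candidate whose field names are chosen for illustration only; the identifiers in the example data are made up.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Candidate:
    """Hypothetical record for one candidate category."""
    qid: str
    targets: List[str]
    sparql_queries: List[str]
    sparql_results: Set[str] = field(default_factory=set)
    wikipedia_members: Set[str] = field(default_factory=set)


def is_kept(c: Candidate) -> bool:
    """Apply cleaning criteria (i) to (iv); True means the category stays."""
    if len(c.targets) >= 2:            # (i) more than one target
        return False
    if not c.wikipedia_members:        # (ii) no Wikipedia members
        return False
    if not c.sparql_results:           # (iii) query yielded no results
        return False
    if len(c.sparql_queries) >= 2:     # (iv) multiple SPARQL queries
        return False
    return True


candidates = [
    Candidate("C1", ["T1"], ["q1"], {"Q1"}, {"Q1", "Q2"}),
    Candidate("C2", ["T1", "T2"], ["q1"], {"Q1"}, {"Q1"}),
    Candidate("C3", ["T1"], ["q1"], set(), {"Q3"}),
]
kept = [c.qid for c in candidates if is_kept(c)]
print(kept)   # ['C1']
```
</preformat>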
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>The pipeline was executed using the Wikidata JSON dump of 2022-05-02 and the Wikipedia
SQL dumps of 2022-05-01. At that time, Wikidata contained roughly 4.99 million categories<sup>12</sup>.
Out of these, only 2,280 have a corresponding SPARQL query (P3921), 749,385 have a target
(category contains, P4224), and only 516 have both of them. Applying the restrictions outlined in
Section 3 leaves us with 206 categories used for evaluation.</p>
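      <p>Our goal is to perform an analysis of the consistency between Wikipedia and Wikidata
with respect to the categories’ content and an automatic investigation of possible reasons.
For this purpose, we compare the two member sets by calculating the precision and recall of
the items corresponding to Wikipedia articles, <inline-formula><tex-math>W</tex-math></inline-formula>, with respect to the SPARQL query results, <inline-formula><tex-math>S</tex-math></inline-formula>:
        <disp-formula id="eq1"><label>(1)</label><tex-math>\mathit{precision} = \frac{|W \cap S|}{|W|}</tex-math></disp-formula>
        <disp-formula id="eq2"><label>(2)</label><tex-math>\mathit{recall} = \frac{|W \cap S|}{|S|}</tex-math></disp-formula>
      </p>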
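      <p>As a small worked example, both metrics can be computed on sets of Wikidata identifiers. The sketch reads precision as the share of Wikipedia items that are also returned by SPARQL and recall as the share of SPARQL results that also appear as Wikipedia members. The function name precision_recall and the toy identifiers are illustrative assumptions.</p>
      <preformat>
```python
from typing import Set, Tuple


def precision_recall(wikipedia_items: Set[str], sparql_items: Set[str]) -> Tuple[float, float]:
    """Precision and recall of Wikipedia members relative to SPARQL results."""
    if not wikipedia_items or not sparql_items:
        raise ValueError("both member sets must be non-empty")
    shared = wikipedia_items.intersection(sparql_items)
    precision = len(shared) / len(wikipedia_items)
    recall = len(shared) / len(sparql_items)
    return precision, recall


precision, recall = precision_recall({"Q1", "Q2", "Q3", "Q4"}, {"Q1", "Q2", "Q5"})
print(precision, round(recall, 2))   # 0.5 0.67
```
</preformat>
      <p>Here two of the four Wikipedia items are confirmed by SPARQL (precision 0.5), and two of the three SPARQL results are found in Wikipedia (recall of about 0.67).</p>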
      <p>Results reveal an average precision of ∼0.65 and an average recall of
∼0.75. Figure 3a and Figure 3b show the distribution of both metrics for the 206
candidate categories. Based on Figure 3a, we observe that for 136 out of 206 categories, at least 80% of
items retrieved using SPARQL also appear as Wikipedia members, 88 categories share more than
90% of the items, and 19 categories have a low recall of at most 30%. Figure 3b
shows a rather uniform distribution of the precision, except for the range above 90%
precision, which contains 61 out of 206 categories. Overall, a rather high recall can be observed. Items
retrieved by SPARQL but not found via Wikipedia (causing lower recall) can be attributed to
one of two reasons: Either the entity was not added to the category by any Wikipedia editor,
or the traversal was stopped too early and the respective subcategory was not visited.</p>
      <p>Since the overall precision provides a rather mixed picture, we conducted a more detailed
investigation into possible reasons. Precision gives insights about items that were found in
Wikipedia but not by SPARQL. We distinguish three possible reasons for an item not being
found via SPARQL queries, referred to as missingProp, diffPropValue, and otherPropUsage in the following.
        <fn id="fn11"><label>11</label><p>SPARQL: ?entity wdt:P31/wdt:P279* ?target</p></fn>
        <fn id="fn12"><label>12</label><p>Retrieved via the following query: SELECT (COUNT(DISTINCT ?cat) AS ?count) WHERE { ?cat wdt:P31
wd:Q4167836 . }</p></fn>
      </p>
      <p>[Figure 3: Distribution of (a) recall and (b) precision over the 206 candidate categories, in bins of width 0.1 from 0.0 to 1.0.]</p>
      <p>We then analyzed the distribution of issues over all items that were not found by SPARQL (items
appearing only as Wikipedia members) for all categories. Based on Figure 4a, we notice that
the most widespread issue type is diffPropValue with ∼87% of not found items, followed by
missingProp with ∼15%, and otherPropUsage with ∼0.41%. Note that the fractions of items for
the individual issues do not sum up to 100% because the same item may be counted multiple times if
the SPARQL query contains multiple properties. E.g., it is possible that one property is missing
while the other one has a different value.</p>
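      <p>Why the fractions can exceed 100% in total becomes clear from a small counting sketch. The function name issue_fractions and the four toy items are illustrative assumptions; the sketch only demonstrates the counting scheme in which one item may carry several issue labels.</p>
      <preformat>
```python
from collections import Counter
from typing import Dict, Set


def issue_fractions(issues_per_item: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of not found items affected by each issue type."""
    total = len(issues_per_item)
    if total == 0:
        return {}
    counts = Counter(issue for issues in issues_per_item.values() for issue in issues)
    return {issue: count / total for issue, count in counts.items()}


# Q2 has one missing property and one property with a different value,
# so it is counted for both issue types.
not_found = {
    "Q1": {"diffPropValue"},
    "Q2": {"diffPropValue", "missingProp"},
    "Q3": {"diffPropValue"},
    "Q4": {"missingProp"},
}
fractions = issue_fractions(not_found)
print(fractions["diffPropValue"], fractions["missingProp"])   # 0.75 0.5
print(sum(fractions.values()))                                # 1.25
```
</preformat>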
      <p>We also analyzed the consistency of a SPARQL query with the target and qualifier information
available in the category contains (P4224) property of the Wikidata category. Here, the same
issue classes as before apply as well. Figure 4b shows the distribution of issues over all categories
in this case. We notice that the most widespread issue type is missingProp with 52 categories,
followed by otherPropUsage with 19 categories, and diffPropValue with 4 categories; the
remaining categories (131) show no issues. An example category with the otherPropUsage
issue is Category:Uruguayan beach volleyball players (Q22136982) with category contains (P4224)
consisting of the following qualifiers: &lt; occupation (P106): beach volleyball player (Q17361156),
country for sport (P1532): Uruguay (Q77)&gt;, and with a SPARQL query: ?item wdt:P31 wd:Q5;
wdt:P27 wd:Q77; wdt:P106 wd:Q17361156. In this case, the property used within the target’s
qualifiers, country for sport (P1532), has been replaced by a similar albeit not equal property,
country of citizenship (P27).</p>
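      <p>A comparison of this kind can be sketched as follows. The otherPropUsage rule mirrors the example above: a qualifier value reappears in the query under a different property. The rules for diffPropValue (same property, different value) and missingProp (neither property nor value present) are assumptions inferred from the issue names, and the function name compare_with_query is hypothetical. The sketch further assumes one value per property and leaves out the target statement.</p>
      <preformat>
```python
from typing import Dict


def compare_with_query(qualifiers: Dict[str, str], query_props: Dict[str, str]) -> Dict[str, str]:
    """Label each qualifier of category contains (P4224) that the query does not reproduce."""
    issues: Dict[str, str] = {}
    for prop, value in qualifiers.items():
        if prop in query_props:
            if query_props[prop] != value:
                issues[prop] = "diffPropValue"    # assumed: same property, other value
        elif value in query_props.values():
            issues[prop] = "otherPropUsage"       # value appears under another property
        else:
            issues[prop] = "missingProp"          # assumed: qualifier absent from query
    return issues


# Category:Uruguayan beach volleyball players (Q22136982)
qualifiers = {"P106": "Q17361156", "P1532": "Q77"}
query_props = {"P27": "Q77", "P106": "Q17361156"}   # wdt:P31 wd:Q5 is the target, left out
print(compare_with_query(qualifiers, query_props))  # {'P1532': 'otherPropUsage'}
```
</preformat>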
      <p>For all categories, we further considered the correlation between the fraction of items with a
specific issue type and the number of Wikipedia items not found by SPARQL. Based on Figure 5a
and Figure 5b, we notice that categories with no issues mostly have a small number of Wikipedia
items not found by SPARQL, with some outliers that have a very low fraction of items with
the issue but more than 1,000 not found entities. Furthermore, categories in which all items
have the issue tend to have a rather small number of items not found.</p>
      <p>The remaining categories do not follow any specific trend since for categories with a similar
number of items not found, we observe a great variety in the fraction of items affected by
the issues. Figure 5c shows that most of the categories affected by the respective issue have a
number of Wikipedia items not found by SPARQL ranging from 100 to 10,000 with a rather
low percentage of items affected since it is the least widespread issue. In general, we do not see
a clear trend for a correlation between the number of Wikipedia items not found and the items
affected by the issue.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>We analyzed the consistency between Wikidata and Wikipedia categories and investigated
possible reasons. We also compared the available information within a Wikidata category
(SPARQL query with the target and qualifiers). For this, we proposed a workflow for automatic
generation of candidate categories. It traverses Wikipedia’s category hierarchy in all available
languages and retrieves corresponding members as long as certain conditions hold. Results
reveal differences of various degrees between all sources and show three possible causes.</p>
      <p>The discovered inconsistencies are rooted in the manual curation of
three separate sources answering in essence the same question: Which items/articles should be
members of a given category? To increase consistency, we suggest treating Wikidata’s category
contains (P4224) as the main source of truth. From an automation standpoint, this provides the
most structured information. From this, SPARQL queries could be automatically generated.
Finally, using these queries, the members of Wikipedia’s categories can be derived. As Wikidata,
albeit growing, remains incomplete, we may further use the current category membership in
Wikipedia together with Wikidata’s category contains (P4224) to complete the information of
items in Wikidata.</p>
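      <p>How a SPARQL query could be generated from category contains (P4224) is sketched below. This is one possible design, not an implemented component of our pipeline: the function name build_query is hypothetical, the type pattern wdt:P31/wdt:P279* is one way to include subclasses of the target, and the example assumes human (Q5) as target together with two qualifiers for a category of Uruguayan beach volleyball players.</p>
      <preformat>
```python
from typing import Dict


def build_query(target: str, qualifiers: Dict[str, str]) -> str:
    """Build a SELECT query from a target and its property/value qualifiers."""
    patterns = [f"?item wdt:P31/wdt:P279* wd:{target} ."]
    for prop, value in sorted(qualifiers.items()):
        patterns.append(f"?item wdt:{prop} wd:{value} .")
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT ?item WHERE {{\n  {body}\n}}"


query = build_query("Q5", {"P106": "Q17361156", "P1532": "Q77"})
print(query)
```
</preformat>
      <p>Because the query is derived mechanically from the qualifiers, it would use country for sport (P1532) and could not silently drift to a similar property such as country of citizenship (P27).</p>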
      <p>Two approaches are possible to improve this situation: First, new changes to any source are
verified against the information contained in the other two. Editors may get a warning if they
seemingly violate these constraints. The cause might not be their current action but a mismatch
with another source. So, editors may still overrule the warning and commit their change.
Second, we can create an interactive interface to review the changes proposed previously. As
we cannot be certain which information is wrong or incomplete, human editors may
verify the assumptions of an automated system and only verified changes will be propagated to
Wikipedia and Wikidata respectively. Both approaches will over time increase the consistency
and quality of both Wikipedia and Wikidata and as a consequence improve their usefulness in
other applications.</p>
      <p>We base our work on the assumption that the definitions of categories, i.e. their semantics,
are consistent across Wikidata and all languages within Wikipedia. This might fail, though.
Categories like Category:American singers (Q7063228) can be seen in at least two ways, both
of which are legitimate interpretations: based on the country of citizenship (P27) as Wikidata
currently does or based on the place of birth (P19). Although we do not have evidence of similar
divergences in existence, they would require a community process to converge on a common
interpretation before applying our suggestions.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been partially funded by the German Aerospace Center (DLR). We thank Prof.
Dr. Birgitta König-Ries for the guidance and feedback.</p>
      <p>[13] P. Curotto, A. Hogan, Suggesting Citations for Wikidata Claims based on Wikipedia’s
External References, in: Proceedings of the 1st Wikidata Workshop (Wikidata 2020)
co-located with 19th International Semantic Web Conference (OPub 2020), Virtual Conference,
volume 2773 of CEUR Workshop Proceedings, 2020.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A Free Collaborative Knowledgebase</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . doi:10.1145/2629489.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kaptein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Exploiting the category structure of Wikipedia for entity ranking</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>194</volume>
          (
          <year>2013</year>
          )
          <fpage>111</fpage>
          -
          <lpage>129</lpage>
          . doi:10.1016/j.artint.2012.06.003. Artificial Intelligence, Wikipedia and Semi-Structured Resources.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Medeiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W. M.</given-names>
            <surname>Siqueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A. P. P.</given-names>
            <surname>Leme</surname>
          </string-name>
          ,
          <article-title>TagTheWeb: Using Wikipedia Categories to Automatically Categorize Resources on the Web</article-title>
          ,
          <source>in: Lecture Notes in Computer Science</source>
          , Springer International Publishing,
          <year>2018</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>157</lpage>
          . doi:10.1007/978-3-319-98192-5_29.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ostapuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Difallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          ,
          <article-title>SectionLinks: Mapping Orphan Wikidata Entities onto Wikipedia Sections</article-title>
          ,
          <source>in: Proceedings of the 1st Wikidata Workshop (Wikidata</source>
          <year>2020</year>
          )
          <article-title>co-located with 19th International Semantic Web Conference</article-title>
          , Virtual Conference, volume
          <volume>2773</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Feddoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <source>fusion-jena/wiki-category-consistency</source>
          ,
          <year>2022</year>
          . URL: https://github.com/fusion-jena/wiki-category-consistency.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Feddoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <source>fusion-jena/wiki-category-consistency v1.0.2</source>
          ,
          <year>2022</year>
          . doi:10.5281/zenodo.6963599.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Feddoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Löffler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <source>wiki-category-consistency-dataset</source>
          ,
          <year>2022</year>
          . doi:10.5281/zenodo.6913282.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Feddoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Löffler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <source>wiki-category-consistency-cache</source>
          ,
          <year>2022</year>
          . doi:10.5281/zenodo.6913134.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Feddoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Löffler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <source>wiki-category-consistency-eval</source>
          ,
          <year>2022</year>
          . doi:10.5281/zenodo.6913332.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Turki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. H.</given-names>
            <surname>Taieb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Aouicha</surname>
          </string-name>
          ,
          <article-title>Coupling Wikipedia Categories with Wikidata Statements for Better Semantics</article-title>
          ,
          <source>in: Proceedings of the 2nd Wikidata Workshop (Wikidata</source>
          <year>2021</year>
          )
          <article-title>co-located with the 20th International Semantic Web Conference (ISWC</article-title>
          <year>2021</year>
          ), Virtual Conference, volume
          <volume>2982</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <article-title>Analyzing Wikidata Transclusion on English Wikipedia</article-title>
          ,
          <source>in: Proceedings of the 1st Wikidata Workshop (Wikidata</source>
          <year>2020</year>
          )
          <article-title>co-located with 19th International Semantic Web Conference</article-title>
          , Virtual Conference, volume
          <volume>2773</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Boschin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bonald</surname>
          </string-name>
          ,
          <article-title>Enriching Wikidata with Semantified Wikipedia Hyperlinks</article-title>
          ,
          <source>in: Proceedings of the 2nd Wikidata Workshop (Wikidata</source>
          <year>2021</year>
          )
          <article-title>co-located with the 20th International Semantic Web Conference (ISWC</article-title>
          <year>2021</year>
          ), Virtual Conference, volume
          <volume>2982</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>