<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Reference Statistics in Wikidata Topical Subsets</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Seyed</forename><forename type="middle">Amir</forename><surname>Hosseini Beghaeiraveri</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Mathematical and Computer Sciences</orgName>
								<orgName type="institution">Heriot-Watt University</orgName>
								<address>
									<settlement>Edinburgh</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alasdair</forename><forename type="middle">J G</forename><surname>Gray</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Mathematical and Computer Sciences</orgName>
								<orgName type="institution">Heriot-Watt University</orgName>
								<address>
									<settlement>Edinburgh</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fiona</forename><forename type="middle">J</forename><surname>McNeill</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">School of Informatics</orgName>
								<orgName type="institution">The University of Edinburgh</orgName>
								<address>
									<settlement>Edinburgh</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Reference Statistics in Wikidata Topical Subsets</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">A51F85939311D697BF8A7AB91AD875F3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T11:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Reference quality</term>
					<term>Wikidata</term>
					<term>Data quality</term>
					<term>Topical subset</term>
					<term>WikiProject</term>
					<term>Gene Wiki</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Wikidata is the only general-purpose open knowledge graph with the capability of specifying references for every single statement. Currently, about 68% of Wikidata statements have at least one reference but the quality of these references is rarely covered in data quality studies. There is also a lack of a comprehensive framework for evaluating references. In this paper, we investigate the statistics of Wikidata references in 6 topical subsets of Wikidata. We compare these statistics over two Wikidata dumps; one from 2016 and one from 2021.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Wikidata <ref type="bibr" target="#b30">[31]</ref> is a knowledge graph that started in 2012 and is now the most active Wikimedia project. It contains knowledge on a broad range of topics with statements (data asserting a fact) being created and edited through crowdsourcing. A distinguishing characteristic of Wikidata is its ability to capture additional information about statements, such as providing references for each piece of data. According to the Wikidata project, "Wikidata is not a database that stores facts about the world, but a secondary knowledge base that collects and links to references to such knowledge" <ref type="bibr" target="#b6">[7]</ref>.</p><p>Our focus in this paper is on the references of statements. Having good evidence of where the data came from improves the trust and reusability of the data as errors can be traced, and data can be categorized according to where they came from <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b25">26]</ref>. According to Wikidata policy <ref type="bibr" target="#b6">[7]</ref>, all statements need to be referenced except statements about common human knowledge, statements that refer to an external source, or statements of items that are a source for other statements. Wikidata recommends that references be relevant and authoritative, but these terms are not explicitly defined. Providing appropriate references is the responsibility of the person who adds the statement. Assessment of references is the responsibility of the Wikidata user community. Currently, about 68% of Wikidata statements already have at least one reference <ref type="bibr" target="#b3">[4]</ref>. While there has been some initial work to look at reference quality <ref type="bibr" target="#b26">[27]</ref>, there is no systematic way to assess the quality of a reference.</p><p>Wikidata aims to cover a wide range of topics via user collaborations. 
Users interested in a particular topic form communities called WikiProjects <ref type="bibr" target="#b15">[16]</ref>. Besides human users, WikiProjects may use bots to collect and edit large volumes of data, including references. Wikidata enforces strict rules for accepting edits by bots <ref type="bibr" target="#b4">[5]</ref>. WikiProjects reflect the activity of contributors in the topics they cover. Investigating WikiProjects therefore provides a topical basis for comparing how humans and bots perform on different quality metrics across Wikidata topics.</p><p>In this paper, we perform a statistical analysis on the reference statements of different WikiProjects to provide insight into their quality. Our contributions in this paper are:</p><p>1. Creating a topical comparison platform for investigating the quality of references. This is done by extracting 6 topical subsets from Wikidata corresponding to 6 different WikiProjects. We also publish the subsets for further community experiments.</p><p>2. Providing a statistical report of references in the 6 subsets.</p><p>In Section 2 we discuss related work on reference quality. Section 3 explains reference nodes in the Wikidata RDF model. Section 4 details the process of subsetting Wikidata to build topical subsets for the topical comparison platform. In Section 5, we present the statistics of references in the extracted subsets. Section 6 outlines our position on the importance of studying the quality of references and the initial ideas of a reference quality checking framework. The conclusion of the paper is presented in Section 7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Provenance of data in knowledge graphs and its quality is one of the criteria of trustworthiness, which is one of the main dimensions of data quality <ref type="bibr" target="#b21">[22]</ref>. The analysis by Farber et al. <ref type="bibr" target="#b21">[22]</ref> gives Wikidata the full score for trustworthiness at the statement level, as Wikidata can provide references for every single statement. They do not give an analysis of how Wikidata uses this feature. Farber et al. reported that the coverage of references over the statements in October 2015 was 1.3%, while Wikidata Stats reports that more than 50% of statements had a reference at that date <ref type="bibr" target="#b3">[4]</ref>. The reason for this difference is that Farber et al. counted the number of distinct reference nodes, while a reference node might be shared between more than one statement. We call such nodes shared references.</p><p>Accuracy and trustworthiness have not been covered in Wikidata as much as other data quality dimensions <ref type="bibr" target="#b27">[28]</ref>. Piscopo et al. <ref type="bibr" target="#b26">[27]</ref> proposed an approach to evaluate the authoritativeness and the relevance of Wikidata external sources based on the quality definitions set by the Wikidata community. The approach consists of two main steps. First, a set of sample references is evaluated through microtask crowdsourcing. Then, this data is fed to a machine-learning algorithm to apply a large-scale evaluation over the whole Wikidata dump (from October 2016). They evaluated only English language sources, mainly because of the limits of performing crowdsourcing for non-English sources. They show that Wikidata external sources are of good quality as 70% are relevant and 80% are authoritative.</p><p>Comparing between Wikidata and Wikipedia external references, Piscopo et al. 
<ref type="bibr" target="#b28">[29]</ref> showed that Wikidata has a more diverse pool of sources, in terms of country of provenance, and employs a larger percentage of external databases and reference sources, such as library catalogues, compared to the online encyclopedia. More recently, Shenoy et al. <ref type="bibr" target="#b29">[30]</ref> developed a framework to detect and analyze low-quality statements in Wikidata. Their work does not consider the quality of references as a metric. Curotto and Hogan <ref type="bibr" target="#b20">[21]</ref> proposed a method of searching and indexing English Wikipedia references to create references for Wikidata facts. This proposal, like any other reference-suggesting tool, needs to be evaluated in terms of the quality of the suggested references, which indicates the need for a comprehensive reference quality checking framework.</p><p>The few prior works on reference quality <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b28">29]</ref> were applied to the 2016 and 2017 dumps of Wikidata. Given the exponential growth of Wikidata in recent years, there is a need for a comprehensive investigation of the diversity of current Wikidata references, the extent to which bots and humans participate in references, and comparisons between bots and humans regarding the quality of references. Also, no prior work studied the reference quality across different topics in Wikidata. In this paper, by investigating reference statistics we start a path to a comprehensive review of Wikidata references. We aim to develop a broader framework by precisely defining other data quality criteria for references.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Wikidata Data Model</head><p>The Wikidata knowledge graph consists of items (entities from the real world) and properties (relationships between items or between items and values). An (item, property, value) triple in Wikidata is called a claim. A distinguishing characteristic of Wikidata is its ability to capture the provenance of each claim. This is achieved by enriching the claim with qualifiers (contextual information) and/or references (the source of the claim) to create a statement.</p><p>The Wikidata RDF model uses reification <ref type="bibr" target="#b0">[1]</ref> for adding references to statements, as shown in Figure <ref type="figure" target="#fig_0">1</ref>. Every statement in Wikidata has a statement node (identified by a unique ID in the wds: namespace) to which all references, qualifiers, ranks, and values are attached. References are linked through prov:wasDerivedFrom edges to reference nodes (identified by a unique ID in the wdref: namespace). Reference nodes provide the provenance of the fact by one or more properties like retrieved date (P813), stated in (P248), and reference URL (P854). If a statement has multiple references, there will be a separate reference node for each reference. If two statement nodes share the same provenance, then they link to the same reference node.</p></div>
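<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reification described above can be illustrated with a minimal sketch that stores a few triples as plain Python tuples. This is not code from the paper: all item, property, statement, and reference identifiers (Q_abc, P_opq, S1, S2, R1, Q_source) are placeholders, and the p:, ps:, and pr: prefixes are taken from the Wikidata RDF dump format rather than from the text above.</p>

```python
# Illustrative only: every identifier below (Q_abc, P_opq, S1, S2, R1, Q_source)
# is a made-up placeholder, not a real Wikidata entity.
PROV_DERIVED = "prov:wasDerivedFrom"

triples = [
    ("wd:Q_abc", "p:P_opq", "wds:S1"),        # item to statement node
    ("wds:S1", "ps:P_opq", "some value"),     # statement node to claim value
    ("wds:S1", PROV_DERIVED, "wdref:R1"),     # statement node to reference node
    ("wdref:R1", "pr:P248", "wd:Q_source"),   # stated in
    ("wdref:R1", "pr:P813", "2021-06-30"),    # retrieved
    ("wds:S2", PROV_DERIVED, "wdref:R1"),     # a second statement sharing R1
]


def references_of(statement, graph):
    """Return the reference nodes linked from a statement node."""
    return [o for s, p, o in graph if s == statement and p == PROV_DERIVED]


def reference_details(reference, graph):
    """Return the (property, value) pairs stored on a reference node."""
    return [(p, o) for s, p, o in graph if s == reference]


refs = references_of("wds:S1", triples)
details = reference_details(refs[0], triples)
```

<p>In this toy graph, S1 and S2 link to the same reference node R1, which is the situation the paper refers to as a shared reference.</p></div>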
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Topical Subsetting of Wikidata</head><p>Investigating reference quality requires access to the reference nodes. This can be achieved by querying for the reference nodes by type, e.g. using the basic graph pattern ?item rdf:type wikibase:Reference. The Wikidata Query Service has blocked these queries for performance reasons <ref type="bibr" target="#b0">[1]</ref>. Due to the enormous size of Wikidata dumps, locally indexing a complete Wikidata dump is time-consuming, costly, and requires hardware beyond a standard desktop computer. Therefore, we use topical subsets of Wikidata <ref type="bibr" target="#b18">[19]</ref>, which give us substantially smaller datasets over which we can research references locally. They also provide a basis for comparing the richness and quality of referencing across Wikidata topics, thus reflecting the work of different communities. These smaller, focused datasets are more likely to be reused <ref type="bibr" target="#b23">[24]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Topic Selection Desiderata</head><p>A WikiProject <ref type="bibr" target="#b15">[16]</ref> is a team of Wikidata contributors who aim to improve Wikidata by working on a specific topic or doing a specific task. A simple query shows that there are 243 WikiProjects in Wikidata, many of which have been created to enrich Wikidata (both A-Boxes and T-Boxes) on a particular topic, such as music, scientific disciplines, or politics. WikiProject contributors typically list classes and properties for their topic so data instances that match these definitions can be added to Wikidata. We can use such definitions to determine the boundaries and the scope of the topic and extract their subset. These subsets are representative of their relevant WikiProject in different experiments (e.g. in reference statistics as we present in this paper).</p><p>WikiProjects vary in purpose, scope, activity, and progress. Extracting subsets for each of the projects is not feasible due to their number, nor are all of them suitable. A candidate project must meet the following desiderata:</p><p>- It should be topical in nature. Task-based projects such as disambiguation pages <ref type="bibr" target="#b8">[9]</ref> are not suitable for topical subsetting.</p><p>- Contributors should provide information about items, classes, and properties that are added to Wikidata through the project. This information is presented as lists, tables, entity schemas, or UML class diagrams. Using this information, we can determine the boundaries of the covered topic.</p><p>- The topic of the project should not be too limited or too broad. For example, in the Scholia project <ref type="bibr" target="#b12">[13]</ref>, scholarly articles alone make up 30% of Wikidata items <ref type="bibr" target="#b5">[6]</ref>, which is very broad. We would like our candidates to have the same level of independence <ref type="bibr" target="#b11">[12]</ref> from the whole Wikidata.</p><p>- We would like our experiment to be a good approximation of the whole Wikidata so we need candidates from a wide range of subject areas.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Selected Projects</head><p>Based on our topic selection desiderata we identified the following projects for topical subsetting to enable us to investigate reference quality. We have selected some closely related projects to allow direct comparison, and then some less related ones for contrast. We have selected a combination of scientific and nonscientific topics. The projects are of similar size and scope.</p><p>Gene Wiki <ref type="bibr" target="#b19">[20]</ref>: Gene Wiki aims to make and maintain Wikidata as a central hub of linked knowledge on Genes, Proteins, Diseases, Drugs, and related items. It is one of the most active WikiProjects. It has five active bots and has specified 24 classes of data to be added to Wikidata, pictured in a UML class diagram. We include all instances of these 24 main classes and their subclasses in the subset.</p><p>Taxonomy <ref type="bibr" target="#b14">[15]</ref>: The goal of this project is to populate Wikidata with taxonomic names and their classifications. This project consists of the class of Taxon (Q16521) and its hierarchy plus 47 other related classes that are specified in the wiki page of the project. The Taxon (Q16521) class and its subclasses are also considered in the Gene Wiki project. Considering it as a separate use case allows investigating the references in this focused part of Gene Wiki as compared to the rest.</p><p>Astronomy <ref type="bibr" target="#b7">[8]</ref>: The main goal of this project is to define classes and properties for items related to Astronomy. Accurate referencing is one of the main goals of the project. Besides that, an active community, a well-structured ontology definition, and the usefulness of the project motivate us to consider this project. This subset consists of all instances of the astronomical object (Q6999) class and its subclasses.</p><p>Law <ref type="bibr" target="#b9">[10]</ref>: This project aims to cover anything that touches the law, e.g. economic laws, evidence, and legal proceedings. The provided data would be particularly useful for judicial systems. The project intends to be broad in scope, but it has a detailed ontology definition. Law (Q7748), public order (Q294199), and evidence (Q176763) are some of the included classes.</p><p>Music <ref type="bibr" target="#b10">[11]</ref>: This project aims to map and import all music-related data from diverse sources to feed Wikipedia music infoboxes. Referencing is also important in this project. Musician (Q639669), musical ensemble (Q2088357), and musical work (Q2188189) are some of the main classes.</p><p>Ships <ref type="bibr" target="#b13">[14]</ref>: This project aims to establish an ideal structure for ship data, and to create and update claims for all ship items on Wikidata. The project has a well-structured class hierarchy. Based on the items and classes mentioned on the project's web page, all instances of all subclasses of watercraft (Q1229765) and ship class (Q559026) are in the subset.</p><p>Full programmatic definitions of the subsets can be found in the supplementary material for this paper <ref type="bibr" target="#b16">[17]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Subset Extraction Setup</head><p>We use the Wikidata WDumper <ref type="bibr" target="#b22">[23]</ref> tool to extract subsets corresponding to each project. For each project, the main classes are identified according to the project's wiki page. Identified classes are then used to write WDumper specification files. The specification files are then enriched with subclasses via a Python script <ref type="bibr" target="#b16">[17]</ref>. Finally, the related A-Boxes are extracted via WDumper. Subsets include all statements for A-Boxes along with references, qualifiers, and rank data. T-Boxes have been ignored as referencing does not apply to them. The WDumper specification files for each project are in <ref type="bibr" target="#b16">[17]</ref>.</p><p>For each project, two separate subsets are extracted: one from the 2016 dump (3 October 2016) <ref type="bibr" target="#b2">[3]</ref> and one from the 2021 dump (30 June 2021) <ref type="bibr" target="#b1">[2]</ref>. The 2021 dump was downloaded from the Wikimedia dump store <ref type="foot" target="#foot_0">4</ref> . We chose the 2016 dump as it is used in prior work on Wikidata reference quality. Thus, it allows us to compare statistics between the two snapshots and to draw some comparisons with that earlier work. The extracted subsets in N-Triples are in <ref type="bibr" target="#b17">[18]</ref>. For this paper, the subsets were indexed and queried using the Blazegraph<ref type="foot" target="#foot_1">5</ref> 2.1.6 triplestore.</p></div>
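<div xmlns="http://www.tei-c.org/ns/1.0"><p>The step of enriching a list of main classes with their subclasses can be sketched as a transitive closure over subclass-of edges. The following is a hypothetical illustration, not the script published with the paper: the function name, the edge list, and the class identifiers are assumptions, and how the edges are obtained (for example from subclass of (P279) statements) is left open.</p>

```python
from collections import defaultdict, deque


def subclass_closure(main_classes, subclass_of_edges):
    """Return the main classes plus all their direct and indirect subclasses.

    subclass_of_edges is an iterable of (child, parent) pairs.
    Hypothetical helper for illustration; not the paper's actual script.
    """
    children = defaultdict(set)
    for child, parent in subclass_of_edges:
        children[parent].add(child)

    seen = set(main_classes)
    queue = deque(main_classes)
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:  # also guards against cycles in the hierarchy
                seen.add(child)
                queue.append(child)
    return seen


# Made-up class hierarchy: Q_b and Q_c sit below Q_a, Q_x is unrelated.
edges = [
    ("Q_b", "Q_a"),
    ("Q_c", "Q_b"),
    ("Q_a", "Q_c"),  # deliberate cycle to show the traversal terminates
    ("Q_x", "Q_y"),
]
classes_for_spec = subclass_closure(["Q_a"], edges)
```

<p>The resulting set of classes could then be written into an extraction specification; the format of that specification is not shown here because it is not described in the text.</p></div>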
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Reference Statistics</head><p>We consider four experiments in which we perform a set of SPARQL queries over each extracted subset to obtain a statistical overview of references in Wikidata. The SPARQL queries for each experiment along with results can be found at the GitHub repository of the paper <ref type="bibr" target="#b16">[17]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Basic Statistics</head><p>Table <ref type="table" target="#tab_0">1</ref> shows the initial statistics for references in each project. The first two columns are the number of items and the number of statements. The third column is the number of reference nodes (i.e. nodes with the type wikibase:Reference). The fourth column shows the number and percentage of statements with at least one reference. The difference between the number of referenced statements and the number of reference nodes is significant. This is because a number of reference nodes are common between statements. In other words, a number of statements have exactly the same references. We call these shared references, which is shown in Figure <ref type="figure">2</ref>. The fifth column shows the number and percentage of those reference nodes that are shared between more than two statements.</p><p>Fig. <ref type="figure">2</ref>. An example of a reference node that is shared between three statement nodes.</p><p>As we can see from the table, in all projects the number of items, statement nodes, and reference nodes has substantially increased from 2016 to 2021. After the extraction, we recognized that the Taxonomy project makes up about 30% of Gene Wiki. The percentage of referenced statements has increased in all cases except Music and Ships. In the case of Ships, the percentage of referenced statements has dramatically decreased. Given that the number of statements increased in both, the decrease in the share of referenced statements may indicate that human users are more active than bots in Ships and Music (if we accept the intuition that bots provide more and better references than humans). The percentage of shared references for Gene Wiki, Taxonomy, Law, and Ships has increased from 2016 to 2021, while for Astronomy and Music this amount has decreased. 
Among the 2021 datasets, the highest number of referenced statements belongs to the Astronomy project and the lowest to the Ships project. The increase in shared references in the Gene Wiki and Taxonomy subsets is likely due to the use of bots to populate Wikidata. Considering the 2021 datasets, the highest number of shared references is allocated to the Law project and the lowest to the Taxonomy project.</p></div>
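<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relationship between reference nodes, referenced statements, and shared reference nodes can be made concrete with a small sketch. It is not the SPARQL used for Table 1; it assumes the statement-to-reference links have already been exported as (statement, reference) pairs, and the threshold for calling a reference node shared (min_statements) is a parameter chosen here for illustration.</p>

```python
from collections import defaultdict


def basic_reference_stats(statements, links, min_statements=2):
    """Summarise referencing for one subset.

    statements: set of all statement node IDs in the subset.
    links: iterable of (statement, reference) pairs, one per
           prov:wasDerivedFrom edge.
    min_statements: assumed threshold for treating a reference node as shared.
    """
    statements_per_ref = defaultdict(set)
    referenced = set()
    for statement, reference in links:
        statements_per_ref[reference].add(statement)
        referenced.add(statement)

    shared = [
        ref for ref, users in statements_per_ref.items()
        if len(users) >= min_statements
    ]
    n_refs = len(statements_per_ref)
    return {
        "statements": len(statements),
        "reference_nodes": n_refs,
        "referenced_statements": len(referenced),
        "referenced_pct": round(100 * len(referenced) / len(statements)) if statements else 0,
        "shared_reference_nodes": len(shared),
        "shared_pct": round(100 * len(shared) / n_refs) if n_refs else 0,
    }


# Toy data: four statements, three of them referenced, R1 used by two statements.
toy_statements = {"S1", "S2", "S3", "S4"}
toy_links = [("S1", "R1"), ("S2", "R1"), ("S3", "R2")]
stats = basic_reference_stats(toy_statements, toy_links)
```

<p>In the toy data, two reference nodes cover three of the four statements, which shows why counting distinct reference nodes understates how many statements are referenced.</p></div>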
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Usage of Reference-specific Properties</head><p>Wikidata offers a set of properties such as stated in (P248) and reference URL (P854) to be used in references. In addition, different projects may offer properties for their references, e.g. the Gene Wiki and Taxonomy projects use properties such as IUCN taxon ID (P627) even though they are identifier properties. Figure <ref type="figure" target="#fig_1">3</ref> shows the frequency of reference-specific properties used in references in each use case for 2021 subsets. Note that Figure <ref type="figure" target="#fig_1">3</ref> illustrates only the most used properties; more properties are in use, but the remaining ones together account for less than 3% overall. For details, see the CSV file at the GitHub repository of the paper <ref type="foot" target="#foot_2">6</ref> .</p><p>In Gene Wiki, Taxonomy, and Law, the most frequently used properties are stated in (P248), retrieved (P813), and reference URL (P854), while Music makes most use of the first two. This indicates that most of the references in these subsets rely on external sources that were likely populated by bots. For Gene Wiki and Taxonomy, the next most frequently used properties correspond to identifier properties for well-known data sources in the life sciences. It is likely that these are used to indicate these data sources as the provider of the claim. The use of external sources accounts for about 60% (Music) to 100% (Taxonomy) of the references. In Astronomy (58%) and Ships (56%), the most frequently used properties are imported from Wikimedia project (P143) and Wikimedia import URL (P4656). These properties indicate that the source of the statement is one of the internal Wikimedia projects, e.g. Wikipedia. Mentioning the Wikipedia article as a source for the corresponding Wikidata item is not recommended <ref type="bibr" target="#b6">[7]</ref>, so the extent of these should be carefully considered in future studies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Distribution of Triples per Nodes</head><p>Via the reference-specific properties, each reference node uses one or more triples to point to the provenance of the claim. Figure <ref type="figure" target="#fig_2">4</ref> shows that the most frequent properties are probably used together. Having more triples in a reference node provides more details about the source, which is likely to increase the accuracy. Figure <ref type="figure" target="#fig_2">4</ref> shows the distribution of the number of triples over the total reference nodes in each project in the 2016 and 2021 dumps. In all projects except Law and Ships, the average number of triples in references has decreased from 2016 to 2021. The best average belongs to Gene Wiki. The similarity of Gene Wiki statistics in both 2016 and 2021 dumps is interesting and is probably related to the steady activities of the project bots. The uniform distribution of triples in Taxonomy might be due to the steady activity of the bots in a specific field (as opposed to Gene Wiki, which consists of several fields such as biology, chemistry, and pharmacology). In 2021, Astronomy has the lowest average number of triples in reference nodes, despite having the highest percentage of referenced statements. In the Music project, there are reference nodes with 22 and 35 triples; these outliers are omitted from the figure for presentation purposes. The average number of triples ranges between 1.2 (Ships 2016) and 3.5 (Gene Wiki 2021).</p></div>
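<div xmlns="http://www.tei-c.org/ns/1.0"><p>A distribution of triples per reference node, as summarised in this section, could be computed along the following lines. This is a sketch under the assumption that all triples whose subject is a reference node are available as (subject, property, value) tuples; the identifiers and the example URL are placeholders, and the paper's own figures were obtained with SPARQL queries rather than with this code.</p>

```python
from collections import Counter
from statistics import mean, median


def triples_per_reference(reference_triples):
    """Count how many triples each reference node carries."""
    return Counter(subject for subject, _prop, _value in reference_triples)


# Placeholder data: R1 carries three triples, R2 carries one.
reference_triples = [
    ("wdref:R1", "pr:P248", "wd:Q_source"),                 # stated in
    ("wdref:R1", "pr:P813", "2021-06-30"),                  # retrieved
    ("wdref:R1", "pr:P854", "https://example.org/record"),  # reference URL
    ("wdref:R2", "pr:P143", "wd:Q_wiki"),                   # imported from Wikimedia project
]

counts = triples_per_reference(reference_triples)
sizes = sorted(counts.values())
summary = {"mean": mean(sizes), "median": median(sizes), "max": max(sizes)}
```

<p>The same per-node counts could also be checked against a minimum number of triples, should such a threshold be adopted as a quality criterion.</p></div>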
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Distribution of Reference Sharing</head><p>Shared reference nodes can affect the quality of references. Having shared references is not necessarily negative as they can reduce redundancy. For example, in Gene Wiki multiple statements about a protein might be taken simultaneously from the UniProt dataset via a bot, so the reference node of all these statements will be the same. Figure <ref type="figure">5</ref> shows the distribution of reference sharing of each project in the 2016 and 2021 dumps. In all projects except Astronomy, the reference sharing rate has decreased from 2016 to 2021. Although Figure <ref type="figure">5</ref> shows the normal distribution rate in shared reference nodes, there are exceptional reference nodes in each project that are shared between a very large number of statements. Table <ref type="table" target="#tab_1">2</ref> shows the mean and maximum of reference sharing in each project in the 2021 dump. In Astronomy, there are about 43 million statements connected to just one reference node; however, only one reference node is in this situation. In all projects, there are reference nodes that provide the source of more than 50,000 statements. This amount of sharing might challenge the relevancy condition <ref type="bibr" target="#b6">[7]</ref> and should be carefully examined.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">From Statistics to Quality</head><p>To the best of our knowledge, the Piscopo et al. study <ref type="bibr" target="#b28">[29]</ref> is the only research on the quality of references in Wikidata, but this work has considerable limitations. They started with the Wikidata edit history in October 2016. They extracted all statements containing external references. Then they excluded statements that do not require a reference according to Wikidata policy, which left them with 1,629,102 references. At this step, 89% of references pointed to two specific sources, which were excluded from the evaluation as around 98% of these were added by one bot. Of the remaining 11%, they evaluated only English-language sources, which make up about 46%. In the end, only 83,215 references were reviewed. The number of statements and references has changed completely since then. We can see from [4] that the number of statements has increased tenfold and the percentage of statements referring to external sources has also increased from 25% to 68%. Figure <ref type="figure" target="#fig_1">3</ref> confirms that currently there is a diversity in the most used properties in references. All of these mean that there is a possibility of greater diversity in references and a need for a comprehensive evaluation. The impact of bots on the quality of references should also be examined. Although Wikidata has strict policies for using bots, the effect of bots on references has not been studied. The challenge here is that tracking bot activities requires processing Wikidata edit history, which is ten times larger than the current Wikidata dump. Shared references can also be a potential factor in reference quality because they can at least challenge the relevancy condition.</p><p>Currently, the most important shortcoming is the lack of a framework to examine the quality of referencing in Wikidata and other knowledge graphs. 
Our idea is a scoring system that can evaluate different criteria on references and quantify the result. For this scoring system, different criteria should be defined according to the references. Relevancy and authoritativeness have been suggested by the Wikidata community for references. There are also data quality criteria such as Accuracy, Accessibility, Consistency, and Completeness that need to be accurately defined according to the context of the references and reference-specific properties. For example, accessibility can be defined as the availability of the links mentioned in the references.</p><p>The above criteria apply to single references, but criteria such as shared references should be considered on the whole of Wikidata (or its subsets). Furthermore, some criteria can be measured by the machine, while some others such as relevancy and authoritativeness are subjective, and evaluating them requires machine training with human intervention.</p><p>Statistical information can be effective in defining some criteria. For example, using the information of Section 5.3, we can determine a minimum number of triples that reference nodes should have. Section 5.2 also tells us what properties are most commonly used in references so we will be able to define necessary criteria appropriate to those properties. Our plan for this scoring system is to identify the necessary criteria, provide a precise definition of them, and measure them on the six subsets as well as a random sample from Wikidata. We also plan to separate human references and bot references to compare the quality between them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion</head><p>In this paper, we performed a statistical review of the references in Wikidata. We extracted six independent Wikidata subsets corresponding to six different WikiProjects and reviewed reference statistics in them. These statistics can be used by project contributors to improve Wikidata, e.g. correcting the properties used in their project, reviewing shared references, and trying to provide a sufficient number of triples. The subsetting method used can be replicated for other Wikidata projects and other fields of study.</p><p>Our statistics show the importance of a more in-depth study of Wikidata references. We stated our position on the need for a reference quality scoring system based on data quality dimensions and provided basic ideas for the system. Such an assessment system can provide precise and detailed suggestions to Wikidatians/WikiProject holders. Our future work is to complete the definition and development of the reference quality scoring system. We aim to perform a comprehensive evaluation of Wikidata references, using WikiProjects along with randomly selected subsets. The challenges for this future work are the large volume of data, tracing bot/human edits, and the subjective nature of the concepts.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. The reification used in Wikidata for adding references to statements, derived from <ref type="bibr" target="#b0">[1]</ref>. abc is an arbitrary QID. opq, rst, and xyz are arbitrary PIDs.</figDesc><graphic coords="4,169.35,115.83,276.67,116.23" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. The frequency of reference-specific properties used in references in each project (2021 subsets).</figDesc><graphic coords="9,134.77,115.83,345.83,235.17" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. The distribution of triples per reference node. The red lines are the medians. Triangles are mean points. Outliers are omitted for presentation purposes.</figDesc></figure>
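<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Basic statistics of references in each subset.</figDesc><table><row><cell>Project</cell><cell>Dump</cell><cell>Items</cell><cell>Statement Nodes</cell><cell>Reference Nodes</cell><cell>Referenced Statements</cell><cell>Shared Reference Nodes</cell></row><row><cell>Gene Wiki</cell><cell>2016</cell><cell>2,647,174</cell><cell>17,656,669</cell><cell>169,493</cell><cell>8,789,246 (50%)</cell><cell>42,902 (25%)</cell></row><row><cell>Gene Wiki</cell><cell>2021</cell><cell>8,801,623</cell><cell>92,729,475</cell><cell>9,559,517</cell><cell>65,780,005 (71%)</cell><cell>4,700,610 (49%)</cell></row><row><cell>Taxonomy</cell><cell>2016</cell><cell>2,214,088</cell><cell>16,056,914</cell><cell>95,714</cell><cell>8,146,218 (51%)</cell><cell>5,971 (6%)</cell></row><row><cell>Taxonomy</cell><cell>2021</cell><cell>3,225,102</cell><cell>32,536,083</cell><cell>498,535</cell><cell>19,423,938 (60%)</cell><cell>204,602 (41%)</cell></row><row><cell>Astronomy</cell><cell>2016</cell><cell>141,843</cell><cell>888,717</cell><cell>13,260</cell><cell>751,158 (85%)</cell><cell>12,198 (92%)</cell></row><row><cell>Astronomy</cell><cell>2021</cell><cell>8,416,958</cell><cell>144,637,511</cell><cell>157,558</cell><cell>128,394,763 (89%)</cell><cell>112,365 (71%)</cell></row><row><cell>Law</cell><cell>2016</cell><cell>67,763</cell><cell>174,252</cell><cell>380</cell><cell>48,225 (27%)</cell><cell>152 (40%)</cell></row><row><cell>Law</cell><cell>2021</cell><cell>433,440</cell><cell>4,236,657</cell><cell>407,409</cell><cell>2,266,462 (53%)</cell><cell>317,975 (78%)</cell></row><row><cell>Music</cell><cell>2016</cell><cell>598,074</cell><cell>3,742,474</cell><cell>80,857</cell><cell>2,298,330 (61%)</cell><cell>35,574 (44%)</cell></row><row><cell>Music</cell><cell>2021</cell><cell>948,266</cell><cell>11,702,021</cell><cell>1,329,746</cell><cell>6,342,019 (54%)</cell><cell>374,440 (28%)</cell></row><row><cell>Ships</cell><cell>2016</cell><cell>42,873</cell><cell>183,240</cell><cell>857</cell><cell>114,528 (62%)</cell><cell>227 (26%)</cell></row><row><cell>Ships</cell><cell>2021</cell><cell>126,896</cell><cell>1,101,802</cell><cell>59,282</cell><cell>315,381 (29%)</cell><cell>16,396 (28%)</cell></row></table></figure>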
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Rounded mean and maximum of reference sharing in the 2021 dump for each project.</figDesc><table><row><cell></cell><cell>Gene Wiki</cell><cell>Taxonomy</cell><cell>Astronomy</cell><cell>Law</cell><cell>Music</cell><cell>Ships</cell></row><row><cell>Mean</cell><cell>13</cell><cell>93</cell><cell>1142</cell><cell>7</cell><cell>14</cell><cell>17</cell></row><row><cell>Max</cell><cell>1,281,307</cell><cell>408,522</cell><cell>42,876,186</cell><cell>155,508</cell><cell>1,385,109</cell><cell>96,659</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Fig. 5.</head><label>5</label><figDesc>Fig. 5. The distribution of reference sharing (for reference nodes that are connected to ≥ 2 statements). The red lines are the medians. Outliers and means are omitted for presentation purposes. [Plot; x-axis: subset (Gene Wiki, Taxonomy, Astronomy, Music, Law, Ships); y-axis: number of statements pointing to each reference node (&gt;1), ticks from 0 to 40; series: year 2016 and 2021.]</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">https://dumps.wikimedia.org/other/wikibase/wikidatawiki/ - accessed 2 July 2021</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">https://github.com/blazegraph/database/releases/tag/BLAZEGRAPH_2_1_6_RC</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_2">https://github.com/seyedahbr/Wikidata_Reference_Statistics/blob/main/QueryResults/UsageofReference-specificProperties/PropertyUsage.xlsx</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_3">http://shex.io/ - accessed July 2021</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We would like to acknowledge the useful guidance of, and fruitful discussions with, the ShEx Community Group<ref type="foot" target="#foot_3">8</ref>; Kat Thornton, Andra Waagmeester, Dan Brickley, and Eric Prud'hommeaux.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format" />
		<title level="m">Wikibase/Indexing/RDF Dump Format - MediaWiki</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="https://archive.org/details/wikidata-20210630.json" />
		<title level="m">Wikidata entity dump (JSON) generated on June 30, 2021 : Free Download, Borrow, and Streaming</title>
				<imprint>
			<date type="published" when="2021-09-27">2021-09-27</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Wikidata entity dumps (JSON and TTL) of all Wikibase entries for Wikidata generated on October 03, 2016</title>
		<ptr target="https://archive.org/details/wikibase-wikidatawiki-20161003" />
	</analytic>
	<monogr>
		<title level="m">Wikidata editors : Free Download, Borrow, and Streaming</title>
				<imprint>
			<date type="published" when="2016-10-03">October 03, 2016</date>
		</imprint>
	</monogr>
	<note>accessed 2021-06-30</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="https://wikidata-todo.toolforge.org/stats.php" />
		<title level="m">Wikidata Stats</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:Bots" />
		<title level="m">Bots - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:Statistics/Wikipedia/Type_of_content" />
		<title level="m">Statistics/Wikipedia/Type of content - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:Verifiability" />
		<title level="m">Verifiability - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Astronomy" />
		<title level="m">WikiProject Astronomy - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Disambiguation_pages" />
		<title level="m">WikiProject Disambiguation pages - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Law#Participants" />
		<title level="m">WikiProject Law - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Music#Overview" />
		<title level="m">WikiProject Music - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas/Subsetting" />
		<title level="m">WikiProject Schemas/Subsetting - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Scholia" />
		<title level="m">WikiProject Scholia - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Ships" />
		<title level="m">WikiProject Ships - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy" />
		<title level="m">WikiProject Taxonomy - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<ptr target="https://www.wikidata.org/wiki/Wikidata:WikiProjects" />
		<title level="m">WikiProjects - Wikidata</title>
				<imprint>
			<date type="published" when="2021-06-30">2021-06-30</date>
		</imprint>
	</monogr>
	<note>Wikidata</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A H</forename><surname>Beghaeiraveri</surname></persName>
		</author>
		<ptr target="https://github.com/seyedahbr/Wikidata_Reference_Statistics" />
		<title level="m">Wikidata reference statistics</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Wikidata Subsets of 6 Wikiproject (Gene Wiki, Taxonomy, Astronomy, Music, Law, Ships)</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A H</forename><surname>Beghaeiraveri</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.5117928</idno>
		<ptr target="https://doi.org/10.5281/zenodo.5117928" />
		<imprint>
			<date type="published" when="2021-07">Jul 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Experiences of Using WDumper to Create Topical Subsets from Wikidata</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A H</forename><surname>Beghaeiraveri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J G</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>McNeill</surname></persName>
		</author>
		<ptr target="https://researchportal.hw.ac.uk/en/publications/experiences-of-using-wdumper-to-create-topical-subsets-from-wikid" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<idno type="ISSN">1613-0073</idno>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021-06">Jun 2021</date>
			<biblScope unit="volume">2873</biblScope>
			<biblScope unit="page">13</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Wikidata as a semantic framework for the Gene Wiki initiative</title>
		<author>
			<persName><forename type="first">S</forename><surname>Burgstaller-Muehlbacher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Waagmeester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mitraka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Turner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Putman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Naik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pavlidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schriml</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Good</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Su</surname></persName>
		</author>
		<idno type="DOI">10.1093/database/baw015</idno>
		<ptr target="https://academic.oup.com/database/article-lookup/doi/10.1093/database/baw015" />
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="volume">2016</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Suggesting citations for wikidata claims based on wikipedia&apos;s external references</title>
		<author>
			<persName><forename type="first">P</forename><surname>Curotto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Wikidata@ISWC</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO</title>
		<author>
			<persName><forename type="first">M</forename><surname>Färber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bartscherer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Menne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rettinger</surname></persName>
		</author>
		<idno type="DOI">10.3233/SW-170275</idno>
		<ptr target="https://www.medra.org/servlet/aliasResolver?alias=iospress&amp;doi=10.3233/SW-170275" />
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="77" to="129" />
			<date type="published" when="2017-11">Nov 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Fünfstück</surname></persName>
		</author>
		<ptr target="https://github.com/bennofs/wdumper" />
		<title level="m">Wdumper</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Dataset Reuse: Toward Translating Principles to Practice</title>
		<author>
			<persName><forename type="first">L</forename><surname>Koesten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vougiouklis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Simperl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Groth</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.patter.2020.100136</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S2666389920301847" />
	</analytic>
	<monogr>
		<title level="j">Patterns</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page">100136</biblScope>
			<date type="published" when="2020-11">Nov 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">DeFacto - Deep Fact Validation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gerber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Ngonga Ngomo</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-35176-1_20</idno>
		<ptr target="https://doi.org/10.1007/978-3-642-35176-1_20" />
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2012</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="312" to="327" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Trust in wikipedia: how users trust information from an unknown source</title>
		<author>
			<persName><forename type="first">T</forename><surname>Lucassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Schraagen</surname></persName>
		</author>
		<idno type="DOI">10.1145/1772938.1772944</idno>
		<ptr target="https://doi.org/10.1145/1772938.1772944" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th workshop on Information credibility</title>
				<meeting>the 4th workshop on Information credibility<address><addrLine>Raleigh, North Carolina, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2010-04">Apr 2010</date>
			<biblScope unit="page" from="19" to="26" />
		</imprint>
	</monogr>
	<note>WICOW &apos;10</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Provenance Information in a Collaborative Knowledge Graph: An Evaluation of Wikidata External References</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piscopo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Kaffee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Phethean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Simperl</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-68288-4_32</idno>
		<ptr target="http://link.springer.com/10.1007/978-3-319-68288-4_32" />
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2017</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<pubPlace>Cham</pubPlace>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">10587</biblScope>
			<biblScope unit="page" from="542" to="558" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">What we talk about when we talk about wikidata quality: a literature survey</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piscopo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Simperl</surname></persName>
		</author>
		<idno type="DOI">10.1145/3306446.3340822</idno>
		<ptr target="https://dl.acm.org/doi/10.1145/3306446.3340822" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Symposium on Open Collaboration</title>
				<meeting>the 15th International Symposium on Open Collaboration<address><addrLine>Skövde, Sweden</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019-08">Aug 2019</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piscopo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vougiouklis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Kaffee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Phethean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hare</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Simperl</surname></persName>
		</author>
		<idno type="DOI">10.1145/3125433.3125445</idno>
		<ptr target="http://dl.acm.org/citation.cfm?doid=3125433.3125445" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Symposium on Open Collaboration - OpenSym &apos;17</title>
				<meeting>the 13th International Symposium on Open Collaboration - OpenSym &apos;17<address><addrLine>Galway, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Shenoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ilievski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garijo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schwabe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2107.00156</idno>
		<ptr target="http://arxiv.org/abs/2107.00156" />
		<title level="m">A Study of the Quality of Wikidata</title>
				<imprint>
			<date type="published" when="2021-06">Jun 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Wikidata: a free collaborative knowledgebase</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
		<idno type="DOI">10.1145/2629489</idno>
		<ptr target="https://dl.acm.org/doi/10.1145/2629489" />
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014-09">Sep 2014</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
