<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Is SHACL Suitable for Data Quality Assessment?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carolina Cortés</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisa Ehrlinger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorena Etcheverry</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Naumann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hasso Plattner Institute</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de la República</institution>
          ,
          <addr-line>Montevideo</addr-line>
          ,
          <country country="UY">Uruguay</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Knowledge graphs have been widely adopted both in enterprises, such as the Google Knowledge Graph, and on open platforms such as Wikidata, to represent domain knowledge and support artificial intelligence applications. They model real-world information as nodes and edges. To remain flexible, knowledge graphs often lack enforced schemas (i.e., ontologies), which can lead to data quality issues such as semantically overlapping nodes. Yet ensuring their quality is essential, as issues in the data can affect the applications that rely on them. To assess the quality of knowledge graphs, existing works either propose high-level frameworks comprising various data quality dimensions without concrete implementations, define tools that measure data quality with ad-hoc SPARQL queries, or promote constraint languages, such as the Shapes Constraint Language (SHACL), to assess and improve the quality of the graph. Although the latter approaches claim to address data quality assessment, none of them comprehensively attempts to cover all data quality dimensions. In this paper, we explore this gap by investigating the extent to which SHACL core can be used to assess data quality in knowledge graphs. Specifically, we defined SHACL shapes for the 69 data quality metrics proposed by Zaveri et al. [1] and implemented a prototype that automatically instantiates these shapes and computes the corresponding data quality measures from their validation results. All resources are provided for repeatability.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Data Quality Assessment</kwd>
        <kwd>RDF Validation</kwd>
        <kwd>SHACL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge graphs (KGs) have been increasingly used to represent domain knowledge and support
artificial intelligence applications, such as information retrieval and question answering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As a result,
ensuring the quality of KGs is crucial for applications that rely on their input [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Major technology
companies, such as Google (Google Knowledge Graph [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) and Amazon (Alexa Knowledge Graph [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]),
have developed proprietary KGs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], while collaborative KGs, such as Wikidata [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], DBpedia [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
YAGO [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], have emerged as open-source alternatives. This growing adoption of KGs is reflected in the
growth of the Linked Open Data (LOD) cloud, which has enabled the publication of numerous datasets
across domains, with 1 656 resources as of Nov. 2024 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Knowledge graphs represent information about the world as nodes and edges [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. They are typically
constructed and enriched using diverse sources and (semi-)automated techniques, with some also
supporting human curation [
        <xref ref-type="bibr" rid="ref10 ref3">3, 10</xref>
        ]. While several graph data models are available [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], this paper
focuses on the Resource Description Framework (RDF) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which structures data as triples consisting
of a subject, a predicate, and an object. RDF-based graphs can include a semantic schema, such as
a vocabulary or ontology, that defines their expected structure. However, to ensure flexibility and
support the evolution of KGs over time, these schemata are generally not enforced [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which can
introduce data quality issues. Furthermore, the quality of ontologies directly affects the quality of the
data, as poorly defined classes or properties can lead to misclassified or ambiguous data in the graph,
and inconsistent ontology axioms can propagate errors in reasoning over the graph [14]. Consider the
case of Wikidata, which contains several classes that are difficult to distinguish, such as “geographical
location”, “location”, and “geographic region”, and in which classes and instances appear mixed, such as
“scientist”, which is both a subclass of “researcher” and an instance of “profession” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We focus on the
quality assessment of the data graph, but the proposed methods can also be applied to ontologies.
      </p>
      <p>Data quality (DQ) is generally defined as “fitness for use” [15], which highlights the importance
of considering the context in which the data is utilized when assessing its quality. DQ is typically
evaluated across various dimensions, such as Completeness, Consistency, and Understandability [16].
These dimensions can be quantified using specific metrics, known as DQ metrics, which are designed
to measure different aspects of each dimension [16]. The process of data quality assessment (DQA)
involves obtaining numerical values, referred to as DQ measures, that characterize various aspects of DQ.
While many classifications exist for both DQ dimensions and metrics, there is no universally accepted
standard [17]. As a result, DQA remains a challenging task in practice.</p>
      <p>
        Several works address the quality of KGs from a high-level perspective, presenting definitions and
metrics [
        <xref ref-type="bibr" rid="ref1">1, 18</xref>
        ] or conceptual frameworks to assess the quality of KGs [19, 20] without a concrete
implementation. Other works take a bottom-up approach to KG quality, developing
tools to assess DQ (e.g., [21, 22, 23, 24]). Most of these works define ad-hoc SPARQL (SPARQL Protocol
and RDF Query Language) queries to obtain measures for DQ dimensions.
      </p>
      <p>
        In recent years, constraint languages, such as the Shapes Constraint Language (SHACL) [25] and
Shapes Expression Language (ShEx) [26], have emerged to enable RDF graph validation by specifying
constraints as shapes [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Despite syntactic differences, both languages enable the definition of
constraints on nodes and their value nodes (i.e., values reachable via properties or paths), allowing for
the detection of data violations [27]. These languages enable a low-level perspective on DQ, focusing
on concrete error detection through constraint checks. Unlike ad-hoc SPARQL-based approaches,
constraint languages like SHACL offer a formal way to express validation rules.
      </p>
      <p>
        Research gap and contributions. Since constraint languages for RDF were first introduced, different
works have focused on the generation of shapes, either from the data or its metadata (e.g., ontologies) [
        <xref ref-type="bibr" rid="ref14 ref15 ref16 ref17 ref18">28,
29, 30, 31, 32, 33, 34, 35, 36</xref>
        ]. Only some works attempt to bridge the gap between the generation of
shapes and their potential to assess and improve the quality of the data [
        <xref ref-type="bibr" rid="ref17 ref18">28, 29, 35, 36</xref>
        ]. In particular,
Luthfi et al. [
        <xref ref-type="bibr" rid="ref18">36</xref>
        ] approach DQ from the dimensions perspective, defining shapes for Completeness.
      </p>
      <p>
        This paper explores how SHACL core can be leveraged for DQA. In particular, we want to investigate
the extent to which we can connect the high-level view on DQ dimensions with the low-level view on
DQ, which entails identifying constraint violations using SHACL shapes. Thus, the contributions of
this paper are as follows:
1. Evaluation of the suitability of SHACL core for DQA by defining shapes for the set of 69 DQ
metrics defined by Zaveri et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
2. A prototype that (i) automatically instantiates the defined SHACL shapes, and (ii) computes DQ
metrics based on the shape validation results.
      </p>
      <p>
        Outline. Section 2 summarizes related work. Section 3 presents the definition of SHACL shapes for a
single dimension and metric of each group defined by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The complete list of all defined SHACL shapes
for all dimensions is provided in the appendix of the extended version of this paper [
        <xref ref-type="bibr" rid="ref19">37</xref>
        ]. Section 4
describes the implemented prototype for SHACL-based DQA, followed by a discussion of the suitability
of SHACL for DQA in Section 5. Finally, Section 6 concludes the paper with an outlook on future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Data quality in knowledge graphs. Several works address the quality of KGs from a high-level
perspective. In particular, Zaveri et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] identified a set of DQ dimensions and metrics through a
systematic literature review and compared different DQA tools. Issa et al. [18] defined a set of DQ metrics
for the Completeness dimension, and also analyzed different tools capable of assessing KG Completeness.
Nayak et al. [19] analyzed different tools for assessment, profiling, and improvement of linked data
and proposed a DQ refinement lifecycle. Chen et al. [20] proposed a DQA framework by mapping KG
application requirements to DQ dimensions from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and extending them with two new dimensions:
Robustness and Diversity. However, none of these works present a concrete implementation.
      </p>
      <p>
        In addition, several tools have been proposed over the years to assess the quality of KGs and linked
data (which we consider a type of KG, though not all KGs follow Linked Data principles). Some of
these tools focus on specific DQ dimensions [
        <xref ref-type="bibr" rid="ref20 ref21">38, 39</xref>
        ], while others try to cover a wider range of them
[
        <xref ref-type="bibr" rid="ref22 ref23 ref24 ref25 ref26">21, 23, 40, 41, 24, 42, 43, 44</xref>
        ]. Moreover, some tools focus on assessing the quality of SPARQL endpoints
[
        <xref ref-type="bibr" rid="ref20 ref21 ref24">42, 39, 38</xref>
        ], while others focus on the quality of the data itself [
        <xref ref-type="bibr" rid="ref22 ref23 ref25 ref26">21, 23, 40, 41, 24, 43, 44</xref>
        ]. Assessment
is primarily done via ad-hoc SPARQL queries [
        <xref ref-type="bibr" rid="ref20 ref22 ref24 ref26">23, 40, 38, 24, 42, 22, 44</xref>
        ], while some introduce more
complex techniques [
        <xref ref-type="bibr" rid="ref25">21, 43</xref>
        ]. Only [24] considers the usage of a constraint language. Additionally, some
of these tools have not been maintained for some time or are not available (see Appendix A of [
        <xref ref-type="bibr" rid="ref19">37</xref>
        ] for
more details).
      </p>
      <p>Constraint languages. Constraint languages validate a set of conditions over RDF graphs. The
Shapes Constraint Language (SHACL), a W3C recommendation [25], enables this via shapes that specify
constraints on the graph. The shapes graph holds these shapes, and the data graph is the RDF graph
being validated. When a shape is evaluated on a node (the focus node), SHACL uses node shapes to
constrain the node itself, and property shapes to constrain value nodes reached from the focus node via
a property or path in the graph. Constraint components define conditions to validate focus and value
nodes. For example, MinCountConstraintComponent defines the minCount property, which can be used
to specify a minimum number of values for a given property. The validation process takes a data graph
and a shapes graph as input and produces a validation report, which provides insights on how to fix the
errors causing the violations.</p>
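      <p>As a minimal sketch of this input/output contract (using the PySHACL library that our prototype also relies on, see Section 4; the file names are hypothetical):</p>
      <p>from pyshacl import validate
from rdflib import Graph

# Hypothetical files: shapes.ttl holds the shapes graph, data.ttl the data graph.
shapes_graph = Graph().parse("shapes.ttl", format="turtle")
data_graph = Graph().parse("data.ttl", format="turtle")

# validate() returns conformance, the report as an RDF graph, and a text summary.
conforms, report_graph, report_text = validate(data_graph, shacl_graph=shapes_graph)
print(conforms)
print(report_text)  # one sh:ValidationResult entry per violation</p>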
      <p>
        While constraint languages provide a formal way to define and validate constraints, writing them
is time-consuming and requires domain expertise [27]. To address this, several works aim to
automatically [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">28, 29, 30, 31, 32, 33, 34</xref>
        ] or semi-automatically [
        <xref ref-type="bibr" rid="ref17 ref18">35, 36</xref>
        ] generate shapes from existing
data [
        <xref ref-type="bibr" rid="ref16">29, 30, 28, 34</xref>
        ] or related artifacts [
        <xref ref-type="bibr" rid="ref14 ref15">32, 33, 31</xref>
        ], such as ontologies. Data-driven approaches usually
cover basic constraints (e.g., required properties, ranges, cardinality) [
        <xref ref-type="bibr" rid="ref15">33</xref>
        ] but often produce many
unreliable shapes, which are filtered using support/confidence [29] or trustworthiness scores [
        <xref ref-type="bibr" rid="ref16">34</xref>
        ].
Artifact-based methods, on the other hand, can generate richer constraints by leveraging formal
restrictions like OWL axioms.
      </p>
      <p>
        Few of these works try to bridge the gap between the generation of shapes and how these can help
assess and improve DQ. Spahiu et al. [28] present an approach to generate SHACL shapes from semantic
profiles created with a profiling tool, which are then used to assess the quality of different versions of a
dataset over time. Rabbani et al. [29] present the tool SHACTOR, which not only generates shapes from
a KG, but also allows the user to generate SPARQL queries that retrieve the triples that produce low
support and confidence shapes. These triples are considered to be erroneous and can be removed from
the graph to improve its quality. Luthfi et al. [
        <xref ref-type="bibr" rid="ref18">36</xref>
        ] consider the survey [18] and define shape patterns for
different aspects of Completeness, testing them on Wikidata and DBpedia. To instantiate the shape
patterns, they consider specific information provided by Wikidata, such as property constraints. Yang
et al. [
        <xref ref-type="bibr" rid="ref17">35</xref>
        ] propose a SHACL-based DQ validation process for Completeness, Accuracy, and Consistency,
using shapes tailored to a health ontology. While constraint examples are given, full shape definitions
are not publicly accessible, and the method is specific to a particular ontology and dataset.
      </p>
      <p>Although the approaches discussed above claim to address DQ in some way, none of them
comprehensively attempts to cover all DQ dimensions.</p>
    </sec>
    <sec id="sec-3">
      <title>3. SHACL shapes for DQ dimensions</title>
      <p>
        This paper builds on the comprehensive DQ metrics survey by Zaveri et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which groups dimensions
into four categories. We follow their structure, defining SHACL core shapes for each metric or explaining
SHACL core’s limitations when shapes cannot be defined. We present the results of this study in Tables
1–4, which illustrate the feasibility of implementing metrics from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with SHACL core. Symbols ✓,
p, and x denote the degree to which metric implementation was possible (full, partial, or not at all,
respectively). A discussion of SHACL extensions and the justification for the focus on SHACL core can
be found in Section 5.3; throughout the paper, we use SHACL interchangeably with SHACL core, unless
stated otherwise. For the shape definitions, we made some realistic assumptions (A1–A3):
(A1) All entities are explicitly typed: for each entity e representing a real-world object, the triple
⟨e, rdf:type, c⟩ exists in the graph.
(A2) The ontology used is sufficiently defined: each class c and property p has triples
⟨c, rdf:type, rdfs:Class⟩ and ⟨p, rdf:type, rdf:Property⟩, respectively. Each property p
includes domain and range triples ⟨p, rdfs:domain, d⟩ and ⟨p, rdfs:range, r⟩. Other property
characteristics (e.g., irreflexivity, asymmetry) are also defined. For every instance i of a class c
defined in the ontology, ⟨i, rdf:type, owl:NamedIndividual⟩ exists.
(A3) Relevant domain knowledge, typically provided by domain experts, is available for shape
instantiation, e.g., expected number of values for certain properties, gold standard value sets, or
definitions of when data is considered to be up to date.
      </p>
      <p>We acknowledge that these assumptions may not always hold in real-world KGs. If (A1) is violated,
untyped instances are excluded from validation. Likewise, when (A2) is not satisfied, schema
characteristics such as range, domain, or property constraints (e.g., symmetry, irreflexivity) cannot be checked. One
possible direction to relax these requirements is to mine such characteristics automatically or leverage
profiling statistics to approximate range and domain information, thereby supporting the construction
of an “initial” ontology. Regarding (A3), this is inherent to DQA, as some quality dimensions rely on
domain knowledge, limiting their assessment. Tables 1–4 mark the use of assumptions with *X, where
X is the assumption number. In the following subsections, we summarize SHACL core coverage per
group, discuss a representative DQ metric for a single dimension of each group, and, if applicable,
provide the corresponding shape in Turtle syntax. For the implemented shapes, we also indicate the DQ
measure type: binary measure (1 if no violations, 0 otherwise), ratio measure (violation-based ratios), or
composite measure (aggregated score across shape instances for specific properties or classes).</p>
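      <p>The following sketch (our own illustration, not part of the prototype’s public API) shows how the three measure types could be computed from violation counts:</p>
      <p>def binary_measure(violations: int) -> int:
    # Binary measure: 1 if the shape produced no violations, 0 otherwise.
    return 1 if violations == 0 else 0

def ratio_measure(violations: int, population: int) -> float:
    # Ratio measure: violation-based ratio, e.g., #violations / #entities.
    return violations / population if population else 0.0

def composite_measure(per_instance_violations: list[int]) -> float:
    # Composite measure: aggregate the binary scores of all instantiations
    # of a shape template (e.g., one per targeted property or class).
    scores = [binary_measure(v) for v in per_instance_violations]
    return sum(scores) / len(scores) if scores else 1.0</p>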
      <sec id="sec-3-1">
        <title>3.1. Accessibility</title>
        <p>
          The Accessibility group includes dimensions related to accessing, retrieving, and verifying the
authenticity of data: Availability, Licensing, Interlinking, Security, and Performance [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In the following, we
exemplarily present the shape definition for a metric of the dimension Performance. Table 1 summarizes
which metrics from the entire Accessibility group can be implemented using SHACL core. The remaining
shapes’ definitions can be found in Appendix B of [
          <xref ref-type="bibr" rid="ref19">37</xref>
          ].
        </p>
        <p>
          Performance refers to how efficiently a system that hosts a large dataset can process data. For this
dimension, Zaveri et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] defined four metrics. We showcase the assessment of Performance with
metric P1 in SH1. P1 was defined by [
          <xref ref-type="bibr" rid="ref27">45</xref>
          ] and checks for slash-URIs in datasets with over 500 000 triples.
According to W3C recommendations, slash-URIs are preferred to identify entities in large graphs
because hash (“#”) URIs can cause performance issues [
          <xref ref-type="bibr" rid="ref28">46</xref>
          ]. We follow the W3C Best Practices [
          <xref ref-type="bibr" rid="ref29">47</xref>
          ] and
apply this rule to entities’ URIs. Here, SH1 targets subjects of triples with predicate rdf:type, applying
a regex pattern that does not allow # to appear in URIs. Note that the regex pattern identifies any hash
occurrence (not just at the end of the URI) to cover cases such as https://www.example.org#Example1/,
which still loads all entities with https://www.example.org# as base namespace. Therefore, the validation
result will output nodes whose URI contains a hash in any part of the URI, not just at the end.
        </p>
        <p>Shape 1: Performance - Use of Hash URIs in Entities
ex:UsageHashURIsShape a sh:NodeShape ;
    sh:targetSubjectsOf rdf:type ;
    sh:or (
        [ sh:path rdf:type ; sh:hasValue rdfs:Class ; ]
        [ sh:path rdf:type ; sh:hasValue rdf:Property ; ]
        [ sh:path rdf:type ; sh:hasValue owl:NamedIndividual ; ]
        [ sh:pattern "^[^#]*$" ; ]
    ) .</p>
        <p>For this metric, assumption (A1) is required; without it, URIs of untyped entities cannot be checked.
Assumption (A2) is also needed to restrict the constraint to entities, excluding classes, properties, and
named individuals. Instances of owl:NamedIndividual are excluded to avoid mixing ontology-level
individuals with graph entities, since validation runs over a graph containing both instance data and
schema definitions. This “filtering” approach is used across the whole study when the metric verifies a
constraint across all entities in the graph. The DQ measure derived from the validation result is a ratio
measure, calculated as #violations / #entities.</p>
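        <p>As an illustration (a sketch under our assumptions), this ratio can be derived from a PySHACL validation report; here the entity count is approximated from rdf:type triples (A1), whereas the prototype takes it from its profiling step (Section 4):</p>
        <p>from pyshacl import validate
from rdflib import Graph, RDF
from rdflib.namespace import SH

def p1_ratio_measure(data: Graph, shape: Graph) -> float:
    # Validate and count the sh:ValidationResult nodes in the report graph.
    _, report, _ = validate(data, shacl_graph=shape)
    violations = len(set(report.subjects(RDF.type, SH.ValidationResult)))
    # Approximate #entities as distinct subjects of rdf:type triples (A1).
    entities = len(set(data.subjects(RDF.type, None)))
    return violations / entities if entities else 0.0</p>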
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption><p>Metrics of the Accessibility group [1].</p></caption>
          <table>
            <thead>
              <tr><th>Dimension</th><th>Metric Id</th><th>Metric</th></tr>
            </thead>
            <tbody>
              <tr><td>Availability</td><td>A1</td><td>Accessibility of the SPARQL endpoint and the server</td></tr>
              <tr><td>Availability</td><td>A2</td><td>Accessibility of the RDF dumps</td></tr>
              <tr><td>Availability</td><td>A3</td><td>Dereferenceability of the URI</td></tr>
              <tr><td>Availability</td><td>A4</td><td>No misreported content types</td></tr>
              <tr><td>Availability</td><td>A5</td><td>Dereferenced forward-links</td></tr>
              <tr><td>Licensing</td><td>L1</td><td>Machine-readable indication of a license in the VoID description</td></tr>
              <tr><td>Licensing</td><td>L2</td><td>Human-readable indication of a license in the documentation</td></tr>
              <tr><td>Licensing</td><td>L3</td><td>Specifying the correct license</td></tr>
              <tr><td>Interlinking</td><td>I1</td><td>Detection of good quality interlinks</td></tr>
              <tr><td>Interlinking</td><td>I2</td><td>Existence of links to external data providers</td></tr>
              <tr><td>Interlinking</td><td>I3</td><td>Dereferenced back-links</td></tr>
              <tr><td>Security</td><td>S1</td><td>Usage of digital signatures</td></tr>
              <tr><td>Security</td><td>S2</td><td>Authenticity of the dataset</td></tr>
              <tr><td>Performance</td><td>P1</td><td>Usage of slash-URIs</td></tr>
              <tr><td>Performance</td><td>P2</td><td>Low latency</td></tr>
              <tr><td>Performance</td><td>P3</td><td>High throughput</td></tr>
              <tr><td>Performance</td><td>P4</td><td>Scalability of a data source</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Intrinsic</title>
        <p>
          The Intrinsic category groups dimensions that are independent of the user’s context, and assess whether
data accurately (syntactically and semantically), compactly, and completely represents the real world,
and whether it is logically consistent. The dimensions in this category are Syntactic Validity, Semantic
Accuracy, Consistency, Conciseness, and Completeness [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In the following, we exemplarily present
the shape definition for a metric of the dimension Consistency. Table 2 summarizes which metrics from
the Intrinsic group can be implemented using SHACL core. The remaining shapes’ definitions can be
found in Appendix B of [
          <xref ref-type="bibr" rid="ref19">37</xref>
          ].
        </p>
        <p>
          Consistency means that a knowledge base contains no (logical or formal) contradictions according
to its knowledge representation and inference mechanisms [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. For this dimension, Zaveri et al.
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] identified 10 metrics. We showcase the assessment of Consistency with metric CN5, which checks
the correct use of inverse-functional properties. For this metric, Zaveri et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] discuss two ways
to check inverse-functional properties: (i) verifying the uniqueness of their values and (ii) defining
a SPARQL constraint for such properties. While SHACL core cannot cover (ii), we defined SH2 to
address (i), which ensures that no two different subjects share the same value, by verifying that each
object has only one incoming link.
        </p>
        <p>Shape 2: Consistency - Uniqueness of inverse functional properties
ex:InverseFunctionalPropertyShape a sh:NodeShape ;
    sh:targetObjectsOf PROPERTY_URI ;
    sh:property [ sh:path [ sh:inversePath PROPERTY_URI ] ; sh:maxCount 1 ; ] .</p>
        <p>Shape SH2 is meant to be instantiated by replacing the placeholder PROPERTY_URI with specific
inverse-functional properties defined in the ontology/vocabulary. Shape instantiation is needed in cases
where metrics apply to particular classes or properties: instead of defining a separate shape for each
one, we define a generic shape with a placeholder and instantiate it with the relevant URI(s) before
validation. Moreover, in some cases, shapes may need to be instantiated with domain knowledge (e.g.,
SH3 in Section 3.3).</p>
        <p>The validation report outputs a violation for each property value that is used more than once.
Additionally, the DQ measure is a composite measure, so we compute an individual score for each
property as: 1 if no violations are found, 0 otherwise. The final metric score is then aggregated with
the formula: (# inverse-functional properties correctly used) / (# inverse-functional properties used to
instantiate the shape). We consider an inverse-functional property correctly used if the value of its
individual score is 1.</p>
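        <p>A sketch of this instantiate-validate-aggregate loop for CN5 (file names and the placeholder convention are illustrative; the prototype discovers inverse-functional properties during profiling, see Section 4):</p>
        <p>from pyshacl import validate
from rdflib import Graph
from rdflib.namespace import OWL, RDF

# sh2_template.ttl is a hypothetical file holding Shape 2 with the
# PROPERTY_URI placeholder written as an absolute URI.
template = open("sh2_template.ttl").read()
ontology = Graph().parse("ontology.ttl")
data = Graph().parse("data.ttl")

props = list(ontology.subjects(RDF.type, OWL.InverseFunctionalProperty))
scores = []
for prop in props:
    shape = Graph().parse(data=template.replace("PROPERTY_URI", str(prop)),
                          format="turtle")
    conforms, _, _ = validate(data, shacl_graph=shape)
    scores.append(1 if conforms else 0)  # individual score per property

# Composite measure: fraction of inverse-functional properties correctly used.
measure = sum(scores) / len(scores) if scores else 1.0</p>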
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Metric identifiers of the Intrinsic group [1] per dimension.</p></caption>
          <table>
            <thead>
              <tr><th>Dimension</th><th>Metric Ids</th></tr>
            </thead>
            <tbody>
              <tr><td>Syntactic validity</td><td>SV1–SV3</td></tr>
              <tr><td>Semantic accuracy</td><td>SA1–SA5</td></tr>
              <tr><td>Consistency</td><td>CN1–CN10</td></tr>
              <tr><td>Conciseness</td><td>CS1–CS3</td></tr>
              <tr><td>Completeness</td><td>CP1–CP4</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Contextual</title>
        <p>
          Contextual dimensions are those that depend on the specific task at hand or on the context. This
group includes four dimensions: Relevancy, Trustworthiness, Understandability, and Timeliness. We
exemplarily present the shape definition for a metric of the dimension Timeliness. Table 3 summarizes
which metrics from the entire Contextual group can be implemented using SHACL core. The remaining
shapes’ definitions can be found in Appendix B of [
          <xref ref-type="bibr" rid="ref19">37</xref>
          ].
        </p>
        <p>
          Timeliness measures how current (or up-to-date) data is in relation to a specific task. For this
dimension, Zaveri et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] present two metrics. We illustrate the assessment of Timeliness with T1,
which verifies the freshness of the dataset based on currency and volatility. This metric uses the
formula max{0, 1 − currency/volatility}, where volatility refers to the length of time the data remains valid, and
currency describes the age of the data at the time it is delivered to the user. In this case, we are not
able to calculate the formula, but we can use SHACL core to identify outdated nodes. Therefore, for
the definition of SH3, we assume there is some temporal annotation in the data, for example, using
properties like dcterms:date or dcterms:temporal from the Dublin Core vocabulary. The defined
shape identifies outdated entities by constraining the value of the temporal property to be after a certain
point in time, indicating that the node is up-to-date. For this shape, we need to consider all assumptions,
given that we are targeting entities. Additionally, we require a vocabulary or ontology to determine
which properties are used to annotate entities with temporal facts, and domain knowledge indicating
when entities are considered up-to-date. The validation report for this shape outputs a violation for
each of the entities whose dcterms:date value is older than DATE_RANGE_MIN_BOUND. The DQ measure
in this case is a ratio measure, calculated as #violations / #entities.
        </p>
        <p>Shape 3: Timeliness - Outdated entities
ex:TimelinessEntitiesShape a sh:NodeShape ;
    sh:targetSubjectsOf rdf:type ;
    sh:or (
        [ sh:path rdf:type ; sh:hasValue rdfs:Class ; ]
        [ sh:path rdf:type ; sh:hasValue rdf:Property ; ]
        [ sh:path rdf:type ; sh:hasValue owl:NamedIndividual ; ]
        [ sh:path dcterms:date ; sh:minInclusive "DATE_RANGE_MIN_BOUND" ; ]
    ) .</p>
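        <p>For instance, SH3 could be instantiated as follows (the template file name and the cut-off date are made up; note that comparing dates via sh:minInclusive presumes a typed literal, e.g., xsd:date, rather than a plain string):</p>
        <p>from rdflib import Graph

# sh3_template.ttl is a hypothetical file holding Shape 3; "2024-01-01" is
# made-up domain knowledge (A3): the oldest date still considered up to date.
template = open("sh3_template.ttl").read()
shape = Graph().parse(data=template.replace("DATE_RANGE_MIN_BOUND", "2024-01-01"),
                      format="turtle")</p>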
        <table-wrap id="tab3">
          <label>Table 3</label>
          <caption><p>Metrics of the Contextual group [1] and their coverage with SHACL core (✓ full, p partial, x none; *X marks the use of assumption AX).</p></caption>
          <table>
            <thead>
              <tr><th>Dimension</th><th>Metric Id</th><th>Metric</th><th>Implemented with SHACL core</th></tr>
            </thead>
            <tbody>
              <tr><td>Relevancy</td><td>R1</td><td>Relevant terms within meta-information attributes</td><td>x</td></tr>
              <tr><td>Relevancy</td><td>R2</td><td>Coverage</td><td>p *3</td></tr>
              <tr><td>Understandability</td><td>U1</td><td>Human-readable labelling of classes, properties and entities as well as presence of metadata</td><td>p *1 *2</td></tr>
              <tr><td>Understandability</td><td>U2</td><td>Indication of one or more exemplary URIs</td><td>✓</td></tr>
              <tr><td>Understandability</td><td>U3</td><td>Indication of a regular expression that matches the URIs of a dataset</td><td>✓ *1 *2</td></tr>
              <tr><td>Understandability</td><td>U4</td><td>Indication of an exemplary SPARQL query</td><td>x</td></tr>
              <tr><td>Understandability</td><td>U5</td><td>Indication of the vocabularies used in the dataset</td><td>✓</td></tr>
              <tr><td>Understandability</td><td>U6</td><td>Provision of message boards and mailing lists</td><td>x</td></tr>
              <tr><td>Trustworthiness</td><td>TW1</td><td>Trustworthiness of statements</td><td>x</td></tr>
              <tr><td>Trustworthiness</td><td>TW2</td><td>Trustworthiness through reasoning</td><td>p *1 *2</td></tr>
              <tr><td>Trustworthiness</td><td>TW3</td><td>Trustworthiness of statements, datasets and rules</td><td>p *1 *2</td></tr>
              <tr><td>Trustworthiness</td><td>TW4</td><td>Trustworthiness of a resource</td><td>x</td></tr>
              <tr><td>Trustworthiness</td><td>TW5</td><td>Trustworthiness of the information provider</td><td>p *2 *3</td></tr>
              <tr><td>Trustworthiness</td><td>TW6</td><td>Trustworthiness of information provided (content trust)</td><td>✓ *1 *2 *3</td></tr>
              <tr><td>Trustworthiness</td><td>TW7</td><td>Reputation of the dataset</td><td>x</td></tr>
              <tr><td>Timeliness</td><td>T1</td><td>Freshness of datasets based on currency and volatility</td><td>p *1 *2 *3</td></tr>
              <tr><td>Timeliness</td><td>T2</td><td>Freshness of datasets based on their data source</td><td>p *3</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Representational</title>
        <p>
          The Representational group addresses design aspects of the data. The dimensions in this category are
Representational Conciseness, Interoperability, Versatility, and Interpretability [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We exemplarily
present the shape definition for a metric of the dimension Versatility. Table 4 summarizes which metrics
from the entire Representational group can be implemented using SHACL core. The remaining shapes’
definitions can be found in Appendix B of [
          <xref ref-type="bibr" rid="ref19">37</xref>
          ].
        </p>
        <p>
          Versatility refers to the availability of data in multiple representations and its support for
internationalization [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. For this dimension, Zaveri et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] present two metrics. We showcase the assessment
of Versatility with V2, defined in [
          <xref ref-type="bibr" rid="ref27">45</xref>
          ], which checks whether data is available in multiple languages
by verifying the use of language tags in literals used for entity labels and descriptions. This metric
can be partially covered by SHACL core, as it allows us to check for the presence of language tags on
labels and descriptions. However, it does not verify whether the literal values are actually written in
the specified language, which would require semantic analysis beyond SHACL’s capabilities. Therefore,
for this metric, we defined shapes SH4 and SH5, which check that labels and descriptions in entities
have language tags. For both shapes, we consider assumptions (A1) and (A2), as we verify the use of
language tags in entity labels and descriptions. Moreover, constraints are only checked for entities that
have a label or description.
        </p>
        <p>Shape 4: Versatility - Languages in entities labels
ex:DifferentLanguagesLabelsShape a sh:NodeShape ;
    sh:targetSubjectsOf rdfs:label ;
    sh:or (
        [ sh:path rdf:type ; sh:hasValue rdfs:Class ; ]
        [ sh:path rdf:type ; sh:hasValue rdf:Property ; ]
        [ sh:path rdf:type ; sh:hasValue owl:NamedIndividual ; ]
        [ sh:path rdfs:label ; sh:datatype rdf:langString ; ]
    ) .</p>
        <p>Shape 5: Versatility - Languages in entities descriptions
ex:DifferentLanguagesDescriptionsShape a sh:NodeShape ;
    sh:targetSubjectsOf rdfs:comment ;
    sh:or (
        [ sh:path rdf:type ; sh:hasValue rdfs:Class ; ]
        [ sh:path rdf:type ; sh:hasValue rdf:Property ; ]
        [ sh:path rdf:type ; sh:hasValue owl:NamedIndividual ; ]
        [ sh:path rdfs:comment ; sh:datatype rdf:langString ; ]
    ) .</p>
        <p>The validation report for SH4 and SH5 outputs a violation for each entity that has a label or description
without a language tag, respectively. In both cases, the DQ measure is a ratio measure, calculated using
the formulas #violations / #entities with labels and #violations / #entities with descriptions, respectively.</p>
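        <p>The denominators can be read directly off the data graph, e.g. (a sketch; note that these counts also include labeled classes and properties unless they are filtered out as in the shapes above):</p>
        <p>from rdflib import Graph
from rdflib.namespace import RDFS

data = Graph().parse("data.ttl")  # hypothetical file name

# Subjects with at least one rdfs:label or rdfs:comment, respectively.
with_labels = len(set(data.subjects(RDFS.label, None)))
with_descriptions = len(set(data.subjects(RDFS.comment, None)))</p>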
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Prototype</title>
      <p>
        This section presents our prototype for DQA using SHACL core. It is built with the Python libraries
rdflib and PySHACL and provides a Streamlit dashboard to visualize results. The code is available on
GitHub at [
        <xref ref-type="bibr" rid="ref30">48</xref>
        ].
      </p>
      <p>Overview of the DQ assessment process. Our DQA process takes as input the data graph to be
evaluated, optionally a metadata file (either the VoID or DCAT description of the graph), and an ontology,
along with a set of vocabularies used in the data graph. Without an ontology, vocabularies, or a metadata
file, fewer shapes are instantiated, and some, such as those targeting void:Dataset or those checking
for class/property labels, are not validated, as they rely on these artifacts. A configuration file allows
users to customize preferred properties needed for instantiating SHACL shapes.</p>
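      <p>For illustration, such a configuration might map placeholders to preferred properties as follows (a purely hypothetical layout; see the repository [48] for the actual file format):</p>
      <p># Hypothetical configuration; keys and values are illustrative only.
CONFIG = {
    "type_property": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "label_property": "http://www.w3.org/2000/01/rdf-schema#label",
    "description_property": "http://www.w3.org/2000/01/rdf-schema#comment",
    "date_property": "http://purl.org/dc/terms/date",
    "date_range_min_bound": "2024-01-01",  # domain knowledge (A3)
}</p>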
      <p>Figure 1 shows the architecture of the prototype. The DQA process begins with Step (1),
profiling the graph, ontology, and vocabularies to obtain the necessary information for shape
instantiation and metric calculation. For example, SH2 is instantiated with properties marked as
owl:InverseFunctionalProperty in the ontology, while to compute the measure associated with
SH1, we retrieve the total number of entities.</p>
      <p>
        After obtaining this information, we (2) instantiate the shapes’ templates and generate the shapes
graph. Our prototype stores SHACL shapes as reusable templates with variables that are replaced with
actual values during instantiation – a process where the template is populated with specific properties
from either the graph profiling results or the configuration file of the dataset. For example, shape SH1
uses the type property specified in the configuration file to create a concrete shape from its template
form. Moreover, shape SH2 is instantiated with inverse-functional properties used in the dataset,
obtained from the graph profiling results. Before shape validation, we perform a pre-processing step,
based on assumption (A2), to enrich the data graph with necessary triples (described in detail in Section
4 of [
        <xref ref-type="bibr" rid="ref19">37</xref>
        ]). Then, we use the validator (3) provided by the PySHACL library to validate the shapes against
the data graph, ontology, vocabularies, or metadata file, depending on the shape. Once the validation is
completed, we (4) calculate the DQ measures from the validation result, which are later stored in a CSV
file. Finally, the results can be visualized in a dashboard (5).
      </p>
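      <p>Steps (1)–(5) can be summarized in a sketch like the following (function and file names are illustrative, not the prototype’s actual module structure):</p>
      <p>import csv
from pyshacl import validate
from rdflib import Graph

def run_dqa(data_path: str, shapes: dict, out_csv: str) -> None:
    # shapes maps a metric id to its instantiated shapes graph (steps 1-2).
    data = Graph().parse(data_path)
    rows = []
    for metric_id, shapes_graph in shapes.items():
        # (3) validate each instantiated shapes graph against the data graph
        conforms, _, _ = validate(data, shacl_graph=shapes_graph)
        # (4) derive a DQ measure (binary here; ratio/composite analogous)
        rows.append({"metric": metric_id, "measure": 1 if conforms else 0})
    with open(out_csv, "w", newline="") as f:  # (5) input for the dashboard
        writer = csv.DictWriter(f, fieldnames=["metric", "measure"])
        writer.writeheader()
        writer.writerows(rows)</p>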
      <p>
        Shape instantiation details. For the 69 metrics defined in Zaveri et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we defined 64 shapes. The
mapping is not one-to-one: some metrics have no shape, others (e.g., V2 in Section 3.4) have multiple.
We identified 38 shapes that could be instantiated without domain expert input, and excluded five
more: one requiring merged graphs (SH11) and four using non-standard vocabularies (SH13, SH14, SH53, SH55)
(see Appendix B of [
        <xref ref-type="bibr" rid="ref19">37</xref>
        ]). Of the 38, 11 rely on vocabulary or ontology terms, and are instantiated only
with those found in the data graph (e.g., SH2 uses only inverse-functional properties present in the
data). This avoids generating unnecessary shapes, as many data graphs may partially reuse vocabularies.
The exceptions are the shapes for metrics CN2 and CP1: CN2 checks for property/class misuse by
instantiating with all available classes and properties since misused ones do not appear in profiling
results, while CP1 uses all defined classes in the vocabulary to check if they are used. See Appendix C
of [
        <xref ref-type="bibr" rid="ref19">37</xref>
        ] for additional aspects of shape instantiation aimed at improving runtime efficiency.
Evaluation. We evaluated the prototype with three datasets from the LOD Cloud [
        <xref ref-type="bibr" rid="ref31">49</xref>
        ]: Temples of the
Classical World (15 326 triples and 1 363 entities), DBTunes - John Peel Sessions (271 369 triples and 76 056
entities), and Drugbank (3 646 181 triples and 316 555 entities). Validation time ranged from 40 seconds
to 3 hours for 500–740 instantiated shapes, depending on the dataset. Longer times are likely due to
both more triples and more shapes. As the prototype aimed to test the SHACL-based DQA approach,
performance aspects were left outside its scope, and no runtime experiments were conducted since
validation relied on an external library. Section 4 of [
        <xref ref-type="bibr" rid="ref19">37</xref>
        ] presents detailed results for the Temples of the
Classical World dataset, while results for other datasets are available on GitHub at [
        <xref ref-type="bibr" rid="ref30">48</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Suitability of SHACL core to assess RDF data quality</title>
      <p>
        We now discuss the suitability of SHACL core to assess RDF DQ. Section 5.1 highlights where SHACL
core falls short in covering certain DQ dimensions – metrics not mentioned are considered covered
(see Section 3 and Appendix B of [
        <xref ref-type="bibr" rid="ref19">37</xref>
        ]). We then examine SHACL core’s strengths and limitations
(Section 5.2), and discuss SHACL core’s extensions (Section 5.3).
      </p>
      <sec id="sec-5-1">
        <title>5.1. Coverage of DQ dimensions by SHACL core</title>
        <p>
          For each of the 18 DQ dimensions discussed in Zaveri et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we present SHACL core coverage as
the normalized sum of coverage values assigned to each DQ metric: 1 for full coverage, 0.5 for partial
coverage, and 0 for no coverage. The resulting sum is divided by the total number of metrics to obtain
an overall coverage percentage. Figure 2 shows these results, where two dimensions with 100% coverage
stand out: Representational Conciseness and Security. In these cases, the implemented metrics check
for the presence or use of specific properties or classes.
        </p>
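        <p>As a worked example (our own arithmetic, matching Table 3): Timeliness has two metrics, T1 and T2, both partially covered, yielding (0.5 + 0.5) / 2 = 50%. In code:</p>
        <p>COVERAGE = {"full": 1.0, "partial": 0.5, "none": 0.0}

def dimension_coverage(levels: list) -> float:
    # Normalized sum of per-metric coverage values.
    return sum(COVERAGE[level] for level in levels) / len(levels)

print(dimension_coverage(["partial", "partial"]))  # Timeliness: 0.5</p>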
        <p>Eight dimensions present a coverage between 50% and 90%: Timeliness, Understandability, Consistency,
Completeness, Syntactic Validity, Interoperability, Interpretability, and Versatility. Regarding
Timeliness, SHACL core partially covers both of the proposed metrics. While it cannot directly compute the
formulas these metrics rely on, like T1’s formula or T2’s time-distance calculation, it can still check
related aspects. For example, for T1, we defined a shape that detects outdated entities via temporal
annotations, and for T2, the shape verifies if the dataset is up to date, even though we cannot compute
distances with SHACL core.</p>
        <p>In the context of Understandability, Versatility, and Interpretability, SHACL only partially covers
metrics U1, V2, and ITP2 due to its limited ability to perform “semantic” checks. It can confirm the presence
of labels (U1) and language tags (V2, ITP2), but it cannot assess the readability of labels or whether
literals align with their specified language. Additionally, for Understandability, SHACL core cannot
cover metrics U4 and U6. U4 is not covered because there is no standard method to declare example
SPARQL queries. While datasets might use properties like rdfs:comment to include such queries, this is
not standard and would require language processing to verify. U6 involves checking external resources
(e.g., mailing lists), which SHACL core cannot handle since it only works on RDF graphs.</p>
        <p>In terms of Consistency, several metrics rely on SPARQL queries or reasoning, which are beyond the
capabilities of SHACL core (specifically metrics CN5, CN6, and CN10). Additionally, SHACL core does
not cover CN8, as it requires specialized knowledge related to the representation of spatial data.</p>
        <p>In the case of Completeness, all metrics are partially covered. CP1 measures schema completeness,
but SHACL core can only check class usage, not property usage, since SHACL core cannot target triples
in the graph to check whether a property is used or not. CP2 defines two aspects for measuring property
completeness, where the second one uses property distribution statistics to assess completeness, which
SHACL core does not support. CP3 measures population completeness; while SHACL core can check for
cardinality and allowed values, it cannot leverage semantic constructs explicitly stated in the graph (e.g.,
recognizing equivalent entities defined using owl:sameAs). Finally, CP4 (interlinking completeness)
is partially covered: SHACL core cannot verify linkset completeness because it involves checking the
existence of links between instances of equivalent classes, a check that verifies the existence of certain
triples in the graph without being associated with any particular node, whereas SHACL node shapes
require a specific target.</p>
        <p>When it comes to Syntactic Validity, SHACL core cannot cover SV1 since it requires checking syntax
errors of RDF documents, while SHACL works after parsing the RDF document. SV2 is only partially
covered because some aspects of this metric involve complex techniques like clustering.</p>
        <p>For Interoperability, SHACL core does not cover all metrics due to its limited target capabilities. In
particular, ITO2 requires checking vocabulary usage. While SHACL can check the usage of classes,
it cannot check property usage, as this would entail checking if a property is used in any triple, but
SHACL cannot target the predicate position in triples.</p>
        <p>Finally, we turn to dimensions that are covered below 50% (Availability, Interlinking, Licensing,
Performance, Relevancy, Trustworthiness, Conciseness, and Semantic Accuracy).</p>
        <p>For Availability, most of the metrics cannot be covered with SHACL core, as most require accessing
resources on the web, such as checking the dereferenceability of URIs or detecting broken links.</p>
        <p>In the case of Interlinking, SHACL core cannot cover I1, as it requires computing graph-based
measures (e.g., interlinking degree), which cannot be expressed as constraints over nodes and their
properties. I3 also cannot be covered: SHACL cannot dereference URIs (task 1), nor verify whether
there is any triple with the resource as the object (in-links) (task 2). For task 2, SHACL core also falls
short because, while it can check for the existence of values for specific properties using sh:path, it
cannot verify the existence of any triple where the resource is the object, regardless of the predicate.</p>
        <p>As for Licensing, the metric L2 cannot be covered with SHACL core since it entails verifying the
existence of a human-readable license in the documentation of the dataset, which usually is an HTML
document. Metric L3 also cannot be covered with SHACL core, as it requires checking license
clauses and comparing licenses between datasets, which involves natural language processing.</p>
        <p>Regarding Performance, SHACL core cannot cover metrics P2–P4, as they describe system-level
behaviors (e.g., low latency or high throughput), rather than constraints on nodes and properties.</p>
        <p>For Relevancy, R1 is not covered, as it requires identifying relevant data via ranking or crowd-sourcing,
which SHACL, as a constraint language, does not support. R2 is only partially covered, since it entails
measuring the coverage (i.e., number of entities) and level of detail of entities (i.e., number of properties)
in the dataset, to ensure that the data is appropriate for the task at hand. While SHACL core does
not provide a way of counting entities and properties, we were able to define a shape that states the
expected properties (level of detail) for instances of a certain class. In the validation results, we obtain
entities lacking the expected level of detail.</p>
        <p>Regarding Trustworthiness, SHACL core fully covers one metric, partially covers three, and cannot
cover the remaining three. Metrics TW2 and TW3 require annotating the data with trust values, for
example, using a trust ontology. While SHACL core cannot annotate the data, it can check the presence
of these annotations, so these are partially covered. TW5 can only be partially covered since one of the
aspects of this metric involves checking the trustworthiness of the information provider using decision
networks, which SHACL core cannot handle. To conclude, the metrics that cannot be covered for this
dimension require trust value computations (TW1, TW4, and TW7) or human input (TW7), both beyond
SHACL’s capabilities.</p>
        <p>For Conciseness, SHACL core is not able to cover the metric CS1, as it requires identifying semantically
equivalent properties or classes. CS2 is only partially covered, as one of its approaches requires
identifying duplicated entities (i.e., entities with different URIs but similar or identical property values),
which involves cross-entity comparisons and similarity measures, both unsupported by SHACL core.
Finally, CS3 cannot be covered by SHACL core, as detecting ambiguous labels and annotations requires
semantic interpretation beyond SHACL’s capabilities.</p>
        <p>
          Finally, for Semantic Accuracy, three metrics (SA1, SA4, and SA5) are not covered, since they rely
on outlier detection (SA1), profiling (SA4), and association rule generation via induction and analogy
methods (SA5) - all of which are unsupported by SHACL core. The other two (SA2 and SA3) are only
partially covered. SA2 specifies three checks to detect inaccurate values; the third (i.e., validating
functional dependencies) is not supported by SHACL core, as it requires comparing property values for
triples that do not share the same subject. Metric SA3 checks for inaccurate labels and classifications.
We defined a SHACL shape to verify labels and types against a list of allowed values, but we cannot
compute the original metric defined in Zaveri et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] since it requires similarity measures, and SHACL
core supports only exact value matching.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Strengths and Limitations of SHACL core</title>
        <p>
          SHACL core has many strengths for specific aspects of DQA. For example, it can perform syntactic
validation, including pattern validation, datatype checks, and verifying whether values fall within
specified ranges or lists. It can validate consistency between the data and vocabulary or ontology
definitions, such as ensuring correct use of a property’s domain or range. It can also support best
practices for publishing Linked Data, such as the use of hash URIs for entities, labels in resources
(i.e., entities, classes, and properties), and some characteristics of URI designs (i.e., short URIs and
no parameters). Moreover, SHACL can verify the correct application of property characteristics such
as irreflexive, functional, asymmetric, and inverse-functional. Finally, it is well-suited for defining
the expected structure of class instances, specifying required properties, their types, and other basic
constraints [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>However, its applicability to DQA is still limited in several ways:</p>
        <p>Lack of External Access. SHACL core operates solely within the RDF graph and cannot access
external web resources or perform system-level checks. This restricts its applicability to metrics related
to Availability and Performance, which often require testing endpoints, dereferencing URIs, or measuring
response times.</p>
        <p>Node-Centric Scope. Constraints in SHACL core are evaluated for individual focus nodes and their
immediate property values. This node-centric approach limits the ability to compute network-level
measures, which are essential for evaluating the Interlinking dimension (covered by SHACL-SPARQL).</p>
        <p>No Cross-Entity Comparison. SHACL core lacks mechanisms to compare property values across
different entities in the graph. As a result, it cannot assess functional dependencies or detect duplicated
entities, limiting its applicability for Semantic Accuracy and Conciseness (covered by SHACL-SPARQL).</p>
        <p>No Arithmetic or Dynamic Expressions. SHACL core lacks support for arithmetic operations and
dynamic expressions. For instance, it cannot evaluate conditions like age = now() - birthDate. This
limits the applicability of SHACL core for metrics that depend on computed values, such as Timeliness
and Trustworthiness (covered by SHACL-SPARQL).</p>
        <p>Limited Semantic Awareness. SHACL core lacks mechanisms to verify semantic aspects of the
data, which are crucial for dimensions like Semantic Accuracy, Understandability, and Versatility.
Additionally, it cannot take advantage of semantic declarations present in the graph, such as entity
alignments defined through properties like owl:sameAs.</p>
        <p>Restricted Targeting Mechanism. SHACL provides only a small set of predefined target types (e.g.,
sh:targetClass, sh:targetNode). This makes it difficult to define metrics that need to assess patterns
over arbitrary triples or general predicate-based constraints (covered by SHACL-SPARQL).</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. SHACL extensions</title>
        <p>
          The SHACL language consists of two main components: SHACL core and SHACL-SPARQL. Additionally,
there exist non-standard SHACL extensions such as the DASH Data Shapes Vocabulary [
          <xref ref-type="bibr" rid="ref32">50</xref>
          ] and SHACL
Advanced Features [
          <xref ref-type="bibr" rid="ref33">51</xref>
          ], all of which rely heavily on SPARQL. This paper focuses on SHACL core for
several reasons. First, studies that extract constraints from RDF graphs primarily use SHACL core
components. Notably, research by Spahiu et al. [28] and Rabbani et al. [29] indicates that SHACL can
enhance DQ. Thus, we aim to determine whether SHACL core alone is sufficient for this purpose. Second, while
SHACL can be implemented in various programming languages (e.g., jena-shacl in Java and PySHACL
in Python), each with its own optimizations, once SPARQL-based constraints are used, performance
depends on the underlying query engine. Furthermore, while the SHACL recommendation details
its extension via SPARQL, other languages could be used. Lastly, we choose to focus on SHACL core
because it is typically easier to understand and use than writing complete SPARQL queries, allowing for
more intuitive formulation of constraints accessible to domain experts [52]. Appendix D of [
          <xref ref-type="bibr" rid="ref19">37</xref>
          ] provides
more details on SHACL extensions.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>
        In this paper, we assess the suitability of SHACL for DQA by defining shapes for the 69 data quality (DQ)
metrics identified by Zaveri et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], whenever possible. We also developed a prototype to automatically
instantiate and validate the defined shapes, and compute DQ measures from the validation results.
      </p>
      <p>Our findings indicate that SHACL is well-suited for syntactic validation (pattern validation, datatype
checks, and value range enforcement), structural validation of class instances, ensuring correct use of
properties and classes, and enforcing linked data best practices. This makes SHACL particularly useful for
assessing dimensions such as Syntactic Validity, Interpretability, Security, Representational Conciseness,
and specific aspects of Consistency, Versatility, and Understandability. However, SHACL core has
several limitations: it cannot access external resources, perform cross-entity comparisons, support
network-based measures, or assess data semantics. This limits its use for Availability, Conciseness,
Interlinking, and Semantic Accuracy. Its lack of dynamic calculations and limited targeting also limit
its applicability for Interoperability, Trustworthiness, and Timeliness.</p>
      <p>
        The DQ metrics defined by Zaveri et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] may benefit from refinement, as some are overly specific
(e.g., those that require spatial data representations) and difficult to generalize, limiting their practical
use in KG DQA. Moreover, new metrics (e.g., bias [53]) are not covered in the survey; we suggest
extending this work to test whether SHACL core can cover them.
      </p>
      <p>Assessing DQ goes beyond checking constraints, encompassing the data source, the system processing
the data, downstream tasks, and human factors [54]. SHACL core focuses on data graphs, limiting
its applicability for comprehensive DQA. Future work should explore hybrid approaches combining
SHACL’s constraint checking with tools that access external resources and LLMs to overcome its limited
semantic awareness. It would also be useful to identify which limitations are inherent to the design of
SHACL core and which can be addressed through extensions.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, Grammarly was used for spell checking.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[14] S. M. Gurk, C. Abela, J. Debattista, Towards ontology quality assessment, in: Joint proceedings of
the Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW) and the
Workshop on Linked Data Quality (LDQ) co-located with European Semantic Web Conference
(ESWC), volume 1824 of CEUR Workshop Proceedings, CEUR-WS.org, 2017, pp. 94–106.
[15] R. Y. Wang, D. M. Strong, Beyond Accuracy: What Data Quality Means to Data Consumers,</p>
      <p>Journal of Management Information Systems 12 (1996) 5–33.
[16] L. L. Pipino, Y. W. Lee, R. Y. Wang, Data quality assessment, Communications of the ACM 45
(2002) 211–218. doi:10.1145/505248.506010.
[17] C. Cichy, S. Rass, An Overview of Data Quality Frameworks, IEEE Access 7 (2019) 24634–24648.
[18] S. Issa, O. Adekunle, F. Hamdi, S. S. Cherfi, M. Dumontier, A. Zaveri, Knowledge Graph
Completeness: A Systematic Literature Review, IEEE Access 9 (2021) 31322–31339. doi:10.1109/ACCESS.2021.3056622.
[19] A. Nayak, B. Bozic, L. Longo, Linked Data Quality Assessment: A Survey, in: Web Services
- ICWS - International Conference, Held as Part of the Services Conference Federation, SCF,
volume 12994 of Lecture Notes in Computer Science, Springer, 2021, pp. 63–76. doi:10.1007/978-3-030-96140-4_5.
[20] H. Chen, G. Cao, J. Chen, J. Ding, A practical framework for evaluating the quality of knowledge
graph, Communications in Computer and Information Science 1134 CCIS (2019) 111–122. doi:10.1007/978-981-15-1956-7_10.
[21] J. Debattista, S. Auer, C. Lange, Luzzu—A Methodology and Framework for Linked Data Quality</p>
      <p>Assessment, Journal on Data and Information Quality (JDIQ) 8 (2016). doi:10.1145/2992786.
[22] D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, Databugger: a
testdriven framework for debugging the web of data, in: International World Wide Web Conference,
ACM, 2014, pp. 115–118. doi:10.1145/2567948.2577017.
[23] M. A. Pellegrino, A. Rula, G. Tuozzo, Kgheartbeat: An open source tool for periodically evaluating
the quality of knowledge graphs, in: The Semantic Web - International Semantic Web Conference,
volume 15233 of Lecture Notes in Computer Science, Springer, 2024, pp. 40–58. doi:10.1007/
978-3-031-77847-6\_3.
[24] C. Fürber, M. Hepp, SWIQA - A Semantic Web Information Quality Assessment Framework, in:</p>
      <p>European Conference on Information Systems, 2011, p. 76.
[25] W3C, Shapes Constraint Language (SHACL), https://www.w3.org/TR/shacl/, 2017. W3C
Recommendation.
[26] E. Prud’hommeaux, J. E. L. Gayo, H. R. Solbrig, Shape expressions: an RDF validation and
transformation language, in: Proceedings of the International Conference on Semantic Systems,
SEMANTiCS, ACM, 2014, pp. 32–40. doi:10.1145/2660517.2660523.
[27] K. Rabbani, M. Lissandrini, K. Hose, SHACL and shex in the wild: A community survey on
validating shapes generation and adoption, in: Companion of The Web Conference 2022, Virtual Event /
Lyon, France, April 25 - 29, 2022, ACM, 2022, pp. 260–263. doi:10.1145/3487553.3524253.
[28] B. Spahiu, A. Maurino, M. Palmonari, Towards Improving the Quality of Knowledge Graphs with
Data-driven Ontology Patterns and SHACL, in: Proceedings of the Workshop on Ontology Design
and Patterns (WOP) co-located with International Semantic Web Conference (ISWC), volume 2195
of CEUR Workshop Proceedings, CEUR-WS.org, 2018, pp. 52–66.
[29] K. Rabbani, M. Lissandrini, K. Hose, SHACTOR: Improving the Quality of Large-Scale Knowledge
Graphs with Validating Shapes, in: Companion of the International Conference on Management
of Data, SIGMOD/PODS, ACM, 2023, pp. 151–154. doi:10.1145/3555041.3589723.
[30] I. Boneva, J. Dusart, D. Fernández-Álvarez, J. E. L. Gayo, Shape Designer for ShEx and SHACL
constraints, in: Proceedings of the ISWC Satellite Tracks (Posters &amp; Demonstrations, Industry,
and Outrageous Ideas) co-located with International Semantic Web Conference (ISWC), volume
2456 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 269–272.
[31] A. Cimmino, A. Fernández-Izquierdo, R. García-Castro, Astrea: Automatic generation of SHACL
shapes from ontologies, in: The Semantic Web - International Conference, ESWC, Proceedings,
volume 12123 of Lecture Notes in Computer Science, Springer, 2020, pp. 497–513. doi:10.1007/
2023. Accessed: 2025-07-29.
[52] T. Hartmann, B. Zapilko, J. Wackerow, K. Eckert, Validating RDF Data Quality Using Constraints
to Direct the Development of Constraint Languages, Proceedings - IEEE International Conference
on Semantic Computing (ICSC) (2016) 116–123. doi:10.1109/ICSC.2016.43.
[53] A. Kraft, R. Usbeck, The lifecycle of "facts": A survey of social bias in knowledge graphs, in:
Proceedings of the Conference of the Asia-Pacific Chapter of the Association for Computational
Linguistics and the International Joint Conference on Natural Language Processing, 2022, pp.
639–652. doi:10.18653/V1/2022.AACL-MAIN.49.
[54] S. Mohammed, L. Ehrlinger, H. Harmouch, F. Naumann, D. Srivastava, The Five Facets of Data
Quality Assessment, SIGMOD Record 54 (2025) 18–27.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maurino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pietrobon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>Quality assessment for Linked Data: A Survey, Semantic Web 7 (</article-title>
          <year>2012</year>
          )
          <fpage>63</fpage>
          -
          <lpage>93</lpage>
          . doi:
          <volume>10</volume>
          .3233/SW-150175.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          , P. H. Paris, J. Soria,
          <article-title>Yago 4.5: A large and clean knowledge base with a rich taxonomy</article-title>
          ,
          <source>Proceedings of the International Conference on Information retrieval (SIGIR)</source>
          (
          <year>2024</year>
          )
          <fpage>131</fpage>
          -
          <lpage>140</lpage>
          . doi:
          <volume>10</volume>
          .1145/3626772.3657876.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>[3] Linked data quality of DBpedia, Freebase</article-title>
          , OpenCyc, Wikidata, and
          <string-name>
            <surname>YAGO</surname>
          </string-name>
          ,
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <year>2018</year>
          )
          <fpage>77</fpage>
          -
          <lpage>129</lpage>
          . doi:
          <volume>10</volume>
          .3233/SW-170275.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singhal</surname>
          </string-name>
          ,
          <article-title>Introducing the knowledge graph: things, not strings</article-title>
          , https://blog.google/products/ search/introducing
          <article-title>-knowledge-graph-things-</article-title>
          <string-name>
            <surname>not</surname>
            <given-names>/</given-names>
          </string-name>
          ,
          <year>2012</year>
          . Accessed:
          <fpage>2025</fpage>
          -07-29.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Developer</surname>
          </string-name>
          , Alexa entities reference, https://developer.amazon.com/en-US/docs/alexa/ custom-skills/alexa-entities-reference.html,
          <year>2023</year>
          . Accessed:
          <fpage>2025</fpage>
          -07-29.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          ,
          <source>in: The Semantic Web</source>
          , Springer,
          <year>2007</year>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , G. Kasneci, G. Weikum,
          <article-title>Yago: A core of semantic knowledge</article-title>
          ,
          <source>in: Proceedings of the 16th international conference on World Wide Web, ACM</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Tuozzo</surname>
          </string-name>
          ,
          <article-title>Navigating the lod subclouds: Assessing linked open data quality by domain</article-title>
          ,
          <source>in: Companion Proceedings of the ACM on Web Conference</source>
          , ACM,
          <year>2025</year>
          , pp.
          <fpage>2141</fpage>
          -
          <lpage>2148</lpage>
          . doi:
          <volume>10</volume>
          . 1145/3701716.3717569.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , G. de Melo,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmelzeisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          , C. d'Amato,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          , Knowledge
          <string-name>
            <surname>Graphs</surname>
          </string-name>
          (
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -01918-0.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <article-title>The Property Graph Database Model</article-title>
          ,
          <source>in: Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanthaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <source>RDF 1.1 Concepts</source>
          and
          <string-name>
            <given-names>Abstract</given-names>
            <surname>Syntax</surname>
          </string-name>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pernisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dell'Aglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dobriy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumbrava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Etcheverry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferranti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          , et al.,
          <article-title>How does knowledge evolve in open knowledge graphs?</article-title>
          ,
          <source>Transactions on Graph Data and Knowledge</source>
          <volume>1</volume>
          (
          <year>2023</year>
          )
          <fpage>11</fpage>
          -
          <lpage>1</lpage>
          .
          <fpage>978</fpage>
          -3-
          <fpage>030</fpage>
          -49461-2\_
          <fpage>29</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Pandit</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. O'Sullivan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Lewis</surname>
          </string-name>
          ,
          <article-title>Using Ontology Design Patterns to Define SHACL Shapes</article-title>
          ,
          <source>in: Proceedings of the Workshop on Ontology Design Patterns</source>
          (
          <article-title>WOP) co-located with the International Semantic Web Conference</article-title>
          (ISWC),
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>X.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Derom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>SCOOP All the Constraints' Flavours for Your Knowledge Graph</article-title>
          ,
          <source>in: Proceedings of the Extended Semantic Web Conference (ESWC)</source>
          , volume
          <volume>14665</volume>
          LNCS, Springer,
          <year>2024</year>
          , pp.
          <fpage>217</fpage>
          -
          <lpage>234</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -60635-9_
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fernandez-Álvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra-Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gayo-Avello</surname>
          </string-name>
          ,
          <article-title>Automatic extraction of shapes using sheXer, Knowledge-Based Systems 238 (</article-title>
          <year>2022</year>
          )
          <article-title>107975</article-title>
          . doi:
          <volume>10</volume>
          .1016/J.KNOSYS.
          <year>2021</year>
          .
          <volume>107975</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A Method for Data Quality Validation Based on Shapes Constraint Language</article-title>
          , Proceedings - International
          <source>Conference on Big Data, Information and Computer Network</source>
          , BDICN (
          <year>2023</year>
          )
          <fpage>83</fpage>
          -
          <lpage>87</lpage>
          . doi:
          <volume>10</volume>
          .1109/BDICN58493.
          <year>2023</year>
          .
          <volume>00024</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [36]
          <string-name>
            <surname>M. J. Luthfi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Darari</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          <string-name>
            <surname>Ashardian</surname>
          </string-name>
          ,
          <source>SoCK: SHACL on Completeness Knowledge, in: Proceedings of the Workshop on Ontology Design</source>
          and
          <article-title>Patterns (WOP) co-located with the International Semantic Web Conference (ISWC), CEUR-WS</article-title>
          .org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ehrlinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Etcheverry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <source>Is SHACL Suitable for Data Quality Assessment?</source>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2507.22305. arXiv:
          <volume>2507</volume>
          .
          <fpage>22305</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>P. Y.</given-names>
            <surname>Vandenbussche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Matteis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Buil-Aranda, SPARQLES: Monitoring Public SPARQL Endpoints, Semantic Web 8 (</article-title>
          <year>2017</year>
          )
          <fpage>1049</fpage>
          -
          <lpage>1065</lpage>
          . doi:
          <volume>10</volume>
          .3233/SW-170254.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>García-Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <article-title>LD Snifer: A Quality Assessment Tool for Measuring the Accessibility of Linked Data, in: Knowledge Engineering and Knowledge Management - EKAW Satellite Events, EKM and Drift-an-</article-title>
          <string-name>
            <surname>LOD</surname>
          </string-name>
          , volume
          <volume>10180</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2016</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>152</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -58694-6\_
          <fpage>20</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>A.</given-names>
            <surname>Langer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Siegert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Göpfert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaedke</surname>
          </string-name>
          ,
          <article-title>Semquire - assessing the data quality of linked open data sources based on DQV</article-title>
          , in: Current Trends in Web Engineering - ICWE International Workshops, MATWEP, EnWot, KD-WEB, WEOD, TourismKG, volume
          <volume>11153</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2018</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>175</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -03056-8\_
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Lehmann,</surname>
          </string-name>
          <article-title>TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data, in: Knowledge Engineering and</article-title>
          the Semantic Web - International Conference, volume
          <volume>394</volume>
          of Communications in Computer and Information Science, Springer,
          <year>2013</year>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>272</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -41360-5\_
          <fpage>22</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Splendiani,
          <article-title>YummyData: providing high-quality open life science data</article-title>
          ,
          <source>Database J. Biol. Databases Curation</source>
          <year>2018</year>
          (
          <year>2018</year>
          ). doi:
          <volume>10</volume>
          .1093/DATABASE/BAY022.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Burguillos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Baldizan</surname>
          </string-name>
          ,
          <article-title>Analyzing Linked Data Quality with LiQuate, in: The Semantic Web: ESWC Satellite Events</article-title>
          , volume
          <volume>8798</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2014</year>
          , pp.
          <fpage>488</fpage>
          -
          <lpage>493</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -11955-7\_
          <fpage>72</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pizhuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ehrlinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Denk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Geist</surname>
          </string-name>
          ,
          <article-title>A data quality dashboard for (security) knowledge graphs, in: Datenbanksysteme für Business, Technologie und Web (BTW), Fachtagung des GIFachbereichs, Datenbanken und Informationssysteme" (DBIS), Proceedings</article-title>
          , volume P-361
          <string-name>
            <surname>of</surname>
            <given-names>LNI</given-names>
          </string-name>
          , Gesellschaft für Informatik e.V.,
          <year>2025</year>
          , pp.
          <fpage>803</fpage>
          -
          <lpage>810</lpage>
          . doi:
          <volume>10</volume>
          .18420/BTW2025-45.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>A.</given-names>
            <surname>Flemming</surname>
          </string-name>
          ,
          <article-title>Qualitätsmerkmale von Linked Data-veröfentlichenden Datenquellen, Master's thesis</article-title>
          , Universität Leipzig,
          <year>2011</year>
          .
          <article-title>Diplomarbeit (Quality Criteria for Linked Data Sources)</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>D.</given-names>
            <surname>Berrueta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Phipps</surname>
          </string-name>
          ,
          <article-title>Cool URIs for the Semantic Web</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2008</year>
          . Accessed:
          <fpage>2025</fpage>
          -07-24.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Bernadette Hyland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Villazón-Terrazas</surname>
          </string-name>
          ,
          <article-title>Best Practices for Publishing Linked Data</article-title>
          , W3C Working Group Note,
          <year>W3C</year>
          ,
          <year>2014</year>
          . Accessed:
          <fpage>2025</fpage>
          -07-24.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortés</surname>
          </string-name>
          ,
          <article-title>SHACL-DQA-prototype</article-title>
          , https://github.com/HPI-Information-Systems/SHACL-DQA,
          <year>2025</year>
          . GitHub repository.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <article-title>Datasets used for SHACL-based Data Quality Assessment prototype</article-title>
          , https://doi.org/10. 5281/zenodo.16644385,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [50] TopQuadrant, DASH - Data
          <string-name>
            <surname>Shapes</surname>
          </string-name>
          , https://datashapes.org/dash,
          <year>2024</year>
          . Accessed:
          <fpage>2025</fpage>
          -07-29.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [51]
          <string-name>
            <surname>W3C SHACL Community</surname>
            <given-names>Group</given-names>
          </string-name>
          , SHACL Advanced Features, https://www.w3.org/TR/shacl-af/,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>