<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jindřich Mynarz</string-name>
          <email>mynarzjindrich@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kateřina Haniková</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vojtěch Svátek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Knowledge Engineering, Prague University of Economics and Business</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KGCW'23: 4th International Workshop on Knowledge Graph Construction</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Nám. W. Churchilla 4</institution>
          ,
          <addr-line>130 67, Praha 3</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a method for constructing knowledge graphs based on user requirements formulated as competency questions. The method suggests formalizing the competency questions as SPARQL queries wrapped as SHACL constraints, which allows them to be used as automated tests. It defines a process that guides knowledge graph construction by feedback from tests. It aims to reduce the engineering effort required to construct a knowledge graph that meets the requirements, while assessing the quality of the semantic artifacts produced in this effort. We intend the method to provide a solid engineering basis for knowledge graph construction that uses the lessons learnt from software development and is built on open standards. We demonstrate the method by constructing a knowledge graph about antigen covid tests and reflect on the proposed method.</p>
      </abstract>
      <kwd-group>
        <kwd>testing</kwd>
        <kwd>knowledge graphs</kwd>
        <kwd>competency questions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge graphs use graph-based data models to capture knowledge commonly combined from
large and diverse data sources [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Constructing knowledge graphs can be an intricate and
open-ended task that is in many ways an art rather than a craft.<sup>1</sup> While there are resources covering
how to build knowledge graphs, there is little on what to build: how to elicit, formulate, and
validate requirements for a knowledge graph. Overall, there is a shortage of solid engineering
practices to guide this task.
      </p>
      <p>
        We propose how to construct knowledge graphs using a standards-based method founded
upon user requirements formulated as competency questions (CQs). It suggests formalizing
the CQs as SPARQL queries [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and wrapping them as SHACL constraints [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] so that
they can be executed as automated tests. The method defines a process guiding knowledge
graph construction that is based on feedback from tests. It aims to reduce the effort required
to construct a knowledge graph that meets its requirements and passes quality assessment of
semantic artifacts, such as ontologies, that are made in the process.
CEUR Workshop Proceedings
<sup>1</sup> The same argument was made about designing ontologies by Soldatova et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>Following a method for knowledge graph construction offers several benefits. In general,
a method helps decide what to do next. In particular, it helps to overcome the blank canvas
paralysis at the beginning and jump-start the work. Moreover, a method breaks down the
effort required for knowledge graph construction into sub-tasks, which can help organize and
coordinate a team working on them.</p>
      <p>The direction provided by a method anchored in user requirements can reduce unfocused
upfront effort and help avoid over-engineering. For example, it discourages needless effort
spent on achieving a lossless transformation that preserves all source data in the produced
knowledge graph even though it might not be needed. It can help avoid premature abstraction
and premature optimization, such as for performance or readability. Thanks to the tests checking
if the user requirements are satisfied, we get an early proof of value instead of speculating
about it. Additionally, the explicit links between requirements and tests make it possible to assess
requirements traceability and coverage.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        When outlining this method, we can learn from ontology engineering, as it has a head start of
several decades on knowledge graphs. There is a long tradition of using competency questions
as requirements for ontologies [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Much of this experience can be reused, since ontologies
serve as essential building blocks of knowledge graphs imbuing them with explicit semantics.
Knowledge graphs in turn can be considered as ontologies populated with data. Creating tests
for ontologies out of CQs is also nothing new [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. More recently, it was adopted for knowledge
graphs too [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Broader still, we can adopt the practices established in software engineering. The method
presented here borrows from the test-driven development cycle [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which is commonly
characterized by the following steps:
• Add a test
• Run all tests, expecting the new test to fail
• Write the simplest code that passes the new test
• All tests should pass
• Refactor as needed
      </p>
      <p>
        A fundamental feature of this cycle is that it intertwines development with testing, so that tests
exercise the developed artifacts and provide feedback informing further work. Tests provide a
controlled way to evolve the artifacts under construction in response to changing requirements,
such as when their scope is extended or when they are refined iteratively. The field of ontology
engineering is already adopting agile development approaches using tests. Ontology-specific
methods, such as SAMOD [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or Linked Open Terms [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], are being proposed.
      </p>
      <p>In summary, the method described here borrows approaches from both ontology
engineering and software development in general and proposes a novel way to combine them for
knowledge graph construction. The following text characterizes the method and demonstrates
its use for building a knowledge graph about antigen covid tests.</p>
      <p>[Figure 1: Overview of the relations between the key artifacts used by the method. Nodes: user requirements, competency questions, SPARQL queries, SHACL constraints, knowledge graph, ontology terms, source data, transformation to RDF. Edge labels: formulated as, translate to, executed by, induced from, extracted from, validate, uses, combine, expressed by, is input to, produces.]</p>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <p>
        We present a method for test-driven knowledge graph construction. The method proposes
a sequence of steps and feedback loops between them. Each step produces or consumes one or
more artifacts, such as ontologies or SPARQL queries. An overview of the relations between
the key artifacts used by the method is depicted in Figure 1. The method involves the following
steps:
1. Start by using knowledge elicitation techniques [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to gather user requirements. User
requirements can be gathered from subject-matter experts or prospective users of the
knowledge graph. Formulate the user requirements as CQs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. CQs are analytical
questions that the target knowledge graph is expected to answer. In order to avoid
misinterpretation, they should be reviewed by those who expressed the requirements
the CQs capture.
2. Analyze the CQs to extract terms and relations. CQs can be analyzed using the linguistic
notion of presupposition, which can be defined as “a condition that must be met for
a linguistic expression to have a denotation” [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Viewed this way, CQs presuppose the
terms with which they can be expressed. The terms to extract from CQs are typically
nouns or noun phrases that refer to entities from the domain of the knowledge graph under
construction. They can be considered lexical surface forms of these entities. The terms
can be identified manually or with the help of tools, such as part-of-speech tagging.
The relations to extract might be represented by verbs connecting the terms co-occurring
in a CQ. Less direct mappings between entities and surface forms are possible too, such
as when a surface form provides the context disambiguating an entity.
3. Formalize the terms and relations in a minimum viable ontology using RDF Schema [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        While terms typically map to classes (i.e., instances of rdfs:Class), relations map either
to properties (i.e., instances of rdf:Property) or to n-ary relations represented by classes.
Document the ontology with definitions sourced from and validated by subject-matter
experts to create a shared understanding. The ontology should make minimal ontological
commitment to aid its evolution, so limit upfront axiomatization, such as the use of
OWL restrictions. The resulting ontology shall serve as an explicit and machine-readable
conceptualization of the knowledge graph’s domain. Formalizing the ontology may in turn
reveal ambiguity in the CQs. Ambiguous CQs cannot be formalized reliably since they
may be misinterpreted. In such cases, the sources of ambiguity should be reviewed
with subject-matter experts while aiming to reformulate the CQs in an unambiguous way.
4. Translate the CQs to SPARQL queries expressed using the terms of the ontology. SPARQL queries make
the CQs executable. Here, we expect a manual translation of the CQs to SPARQL queries.
Automated translation was attempted by others, such as in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Start with syntactically
valid SPARQL queries over hypothetical data expressed by the ontology. A CQ may translate
to a SPARQL graph pattern composed of triple patterns joining the terms co-occurring in
the question via specific relations. The way CQs should be translated to SPARQL queries
largely depends on their expected answers. Ren et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] suggest that we might be “more
interested in checking if a CQ can be meaningfully answered, instead of directly answering
a CQ.” When using such existential quantification of the desired answers, a SPARQL
query formalizing a CQ can be expected to return non-empty results. This means
that the query can answer a CQ using a given knowledge graph. It does not verify if
the answer is correct. Conversely, some CQs may describe universal invariants of their
domain. Queries encoding these invariants are akin to property-based testing [15]. In
case stricter guarantees are needed, the expected correct answers can be included in the
queries and validation constraints in SHACL implementing the CQs, corresponding to
the usual example-based testing.
5. Wrap the queries as SPARQL-based SHACL constraints to automate their execution.
      </p>
      <p>SHACL defines a target<sup>2</sup> of each constraint. In our context, a target sets the scope of a
CQ. The scope of the data graph validated by a given constraint may either encompass
the entire knowledge graph or cover its subset. For instance, existentially quantified
queries checking the complete data graph can target it by using sh:targetNode []. It is
also possible to rewrite a part of a SPARQL query translating a CQ as target selection in
SHACL. Alternatively, the prerequisite part of the query can be represented as a
SPARQL-based target [16]. An entity-centric partitioning of a given knowledge graph can be
implemented by extracting concise bounded descriptions [17] of the targeted entities.
Some data sources of knowledge graphs may natively partition data into independent
subsets, such as API responses or messages from a queue, that can be effectively validated
by specific subsets of SHACL constraints. SHACL can also define the passing criteria of
SPARQL-based constraints. An sh:ask query is expected to return the boolean true, while</p>
      <sec id="sec-4-1">
        <title><sup>2</sup> https://www.w3.org/TR/shacl/#targets</title>
        <p>any results produced by an sh:select query are treated as violations. Specific expected
results can be either hard-coded into the queries or specified via sh:hasValue in case
a single value is expected. Note, however, that when including the expected results in
SPARQL-based constraints, we may run into the limitations of SHACL due to pre-binding
of variables.<sup>3</sup> For example, the VALUES clause, which could represent the expected results
of SELECT queries, is not allowed.
6. Develop a transformation of the source data to the target knowledge graph that is expected
to meet the constraints. This can be implemented in many ways, largely dependent on
the format of the source data, so we will not cover it here. For an overview of these
approaches, see, e.g., Fensel et al. [18].
7. Validate the knowledge graph under development with the SHACL constraints. If the
validation fails, resolve the reported violations by fixing the artifacts created in the
previous steps. Fix the most upstream artifact causing the violations, i.e., their root cause.
For instance, the data transformation should not work around an insufficiently expressive
ontology. Similarly, imposing a more expressive data model on the knowledge graph
might reveal errors in its source data. In this way, the finer structure of its ontology
makes the previously hidden data quality issues visible. Some feedback may even indicate
ill-formed CQs in need of revision.
8. Refactor the artifacts. The SHACL specification of the knowledge graph is a key artifact
to refactor. It can serve as a formal specification of the anticipated use of the ontology.
Moreover, SHACL can transcend a single ontology and specify how to combine terms
reused from multiple ontologies. During the refactoring, SHACL core constraints can be
induced from SPARQL-based constraints to represent the frequently queried patterns in
data. For instance, aided by the knowledge about the domain, you can infer universally
quantified axioms represented as SHACL constraints from a sample of existentially
quantified axioms represented as SPARQL queries. In particular, SHACL can prescribe
the expected relations in data. More complex or distant relations might even require the
expressivity of SHACL to be represented, using property paths or SPARQL graph patterns.
For example, even though SHACL does not support conditional constraints directly, they
can be rewritten according to the inference rule of material implication: p → q ⇔ ¬p ∨ q.</p>
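        <p>The pass criteria from step 5 and the material implication rewrite from step 8 can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the toy triples, the ex: identifiers, and all function names are assumptions made for illustration, and set comprehensions stand in for a SPARQL engine.</p>
        <preformat>
```python
from itertools import product

# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:t1", "rdf:type", "act:AntigenCovidTest"),
    ("ex:t1", "schema:manufacturer", "ex:m1"),
    ("ex:t2", "rdf:type", "act:AntigenCovidTest"),
}


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def antigen_tests():
    """Return all nodes typed as antigen covid tests."""
    return {s for (s, p, o) in TRIPLES
            if p == "rdf:type" and o == "act:AntigenCovidTest"}


def ask_some_test_has_manufacturer():
    """Existential check (sh:ask style): passes if at least one match exists."""
    return any(objects(t, "schema:manufacturer") for t in antigen_tests())


def select_tests_without_manufacturer():
    """Universal check (sh:select style): every returned row is a violation."""
    return sorted(t for t in antigen_tests()
                  if not objects(t, "schema:manufacturer"))


def implies(p, q):
    """Material implication rewritten as a disjunction: (not p) or q."""
    return (not p) or q


# Truth table of the implication, computed via the rewritten form.
TRUTH_TABLE = {(p, q): implies(p, q)
               for p, q in product([True, False], repeat=2)}

print(ask_some_test_has_manufacturer())     # the existential CQ is satisfied
print(select_tests_without_manufacturer())  # rows violating the invariant
```
        </preformat>
        <p>In this toy graph the existential check passes because one test has a manufacturer, while the universal check reports the test lacking one. This mirrors why an ASK-style CQ alone does not guarantee that every entity is described as expected.</p>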
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Non-functional Requirements</title>
      <p>Note that the above-described method tests if a knowledge graph meets the given user
requirements. It does not evaluate how well these requirements are satisfied. Absence of errors does
not imply that the knowledge graph is sound. Therefore, the functional user requirements
should be complemented by non-functional requirements describing the desired qualities of the
solution. These requirements can be checked and refactored by using alternative sources of
feedback, such as:</p>
      <p>Code review</p>
      <p>Elicit expert insight from ontology and data engineers.</p>
      <sec id="sec-5-1">
        <title><sup>3</sup> https://www.w3.org/TR/shacl/#pre-binding</title>
        <p>User feedback</p>
        <p>The results produced by SPARQL queries formalizing the CQs can be judged to be incorrect
by users. User feedback can identify wrong interpretation of CQs or false assumptions.
It might prompt revisiting any of the previously described steps or lead to adding more
CQs.</p>
        <p>Usability testing</p>
        <p>Usability may manifest in developer experience of writing SPARQL queries on a
knowledge graph. In particular, complexity and verbosity of SPARQL queries may be considered.
In this way, usability of the queries indirectly reflects the usability of the ontology. For
example, verbosity may be caused by duplicate data that can be abstracted to the ontology,
while complexity can be reduced by introducing ontological shortcuts. Verbose SPARQL
queries may also suggest a need for introducing abstraction or additional axioms into the
ontology. In fact, some CQs can be answered directly using the knowledge formalized in
the ontology, since SHACL expects the validated data graph<sup>4</sup> to include the ontologies it
populates.</p>
        <p>Performance testing</p>
        <p>Performance can be evaluated indirectly using the execution time of SHACL validation.
This time is proportional to the execution times of the SPARQL queries the constraints
include.</p>
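        <p>A minimal sketch of such an indirect measurement follows, assuming each SPARQL-based constraint is exposed as a zero-argument Python callable that returns a boolean. The names timed and run_suite and the time budget are hypothetical; a real setup would time the SHACL validator itself.</p>
        <preformat>
```python
import time


def timed(check):
    """Run a zero-argument check and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = check()
    return result, time.perf_counter() - start


def run_suite(checks, budget_seconds):
    """Time each named check and flag those exceeding the assumed budget."""
    report = {}
    for name, check in checks.items():
        passed, elapsed = timed(check)
        report[name] = {
            "passed": passed,
            "seconds": elapsed,
            "over_budget": elapsed > budget_seconds,
        }
    return report


if __name__ == "__main__":
    # Hypothetical stand-ins for two competency questions.
    suite = {"cq1": lambda: True, "cq2": lambda: False}
    for name, row in run_suite(suite, budget_seconds=1.0).items():
        print(name, row["passed"], row["over_budget"])
```
        </preformat>
        <p>Reporting the time per constraint rather than only the total makes it easier to see which CQ dominates the validation time.</p>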
        <p>The non-functional requirements can also be covered by tests, such as query performance
tests. These tests can provide additional feedback informing the test-driven development. Using
the above-mentioned feedback is a balancing act of multi-objective optimization. For example,
performance feedback may be incompatible with the feedback about verbosity or CQs may pose
mutually conflicting requirements. Addressing the feedback thus requires making informed
trade-offs.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations</title>
      <p>We are aware of several limitations of the proposed method. Here we recount some of them and
suggest how to remedy them. The method may cause the developed artifacts outlined in
Figure 1 to overfit the user requirements in scope, which would hinder the reuse of the artifacts.
This shortcoming can be remedied by including more CQs or focusing on reusability and
abstraction during refactoring. User requirements for the knowledge graph under construction
can be more complex than what SPARQL can express. One way around it is formalizing the
requirements in a more expressive programming language that extends SHACL, such as in [19].
Ultimately, in order to ameliorate the limitations of this method, it is best combined with other
methods, such as those for ontology design (e.g., Kendall and McGuinness [20]).</p>
      <sec id="sec-6-1">
        <title><sup>4</sup> https://www.w3.org/TR/shacl/#data-graph</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Case Study: Antigen Covid Tests Knowledge Graph</title>
      <p>We used the described method to create a knowledge graph about antigen tests for SARS-CoV-2.</p>
      <p>Its source data was originally gathered to create an overview of different evaluations of
selected rapid antigen tests available on the Czech market. Data analyses were carried out [21]
in order to validate the assumption that the claims on test quality provided by the manufacturers
differ significantly from what is verified by the laboratories of independent organizations.</p>
      <p>The data was collected manually from several regulatory bodies, such as SÚKL<sup>5</sup> and EU
HSC<sup>6</sup>, and was combined with performance evaluations of the tests, such as their sensitivity and
specificity, that came from independent studies from various countries. The data was collected
into an XML file. The data originally had two destinations. One was the
portal that provided a graphical visualization of various properties of the tests.<sup>7</sup> The other was
a dashboard for performing analyses online. While the XML data was already structured,
it lacked any ontological grounding, and some of its features were tuned towards presentation
aspects within the website.</p>
      <p>We set out to create a knowledge graph from this data to open it to wider reuse and to allow
retrospective analyses. All artifacts we developed in this effort, such as CQs, are
available as open source<sup>8</sup> and the knowledge graph is released.<sup>9</sup></p>
      <p>Understanding the source data and its domain was a key prerequisite of our work. The data
covers 158 antigen SARS-CoV-2 tests and their evaluations from several sources. Each source
had its way of describing the domain, which we needed to comprehend first to represent the
source’s data faithfully and make it commensurable. We also needed to become aware of the
purpose for which the data was collected and how it was represented to serve this purpose.
The data was originally structured in XML for display on a web page, so we needed to map
its document-oriented structure and visual encoding into semantics. Knowing the source data
well, we could begin with the knowledge graph construction.</p>
      <p>Step 1: The first step of the proposed method is gathering user requirements. We started
with capturing user requirements formulated as CQs, such as “What is the sensitivity of given
tests according to their manufacturers?” We came up with 19 CQs in total, covering both basic
information look-up and more complex analytical questions. We removed one of the CQs later
upon finding that it was subsumed by another CQ. The complete list of the CQs is available
online.<sup>10</sup></p>
      <p>Step 2: We continued with analysis of the CQs and extraction of terms and relations, such as
“sensitivity”, “test”, or “has manufacturer” from the given example question. Since we did not
have many CQs, we extracted the terms and relations manually.</p>
      <p>Step 3: The third step was to create a minimum viable ontology from the extracted terms and
relations, which we did mostly by reusing existing ontology terms and using RDF Schema; see
Listing 1. Where we did not find suitable terms to reuse, we formalised terms in the simple
Antigen Covid Test Ontology. This ontology aimed to allow expressing the CQs in SPARQL.</p>
      <p><sup>5</sup> https://www.sukl.cz/prehled-testu-k-diagnostice-onemocneni-covid-19
<sup>6</sup> https://health.ec.europa.eu/health-security-and-infectious-diseases/crisis-management/covid-19-diagnostic-tests_en
<sup>7</sup> https://covidtesty.vse.cz/english/test-evaluation-older-data/
<sup>8</sup> https://github.com/KIZI/antigen-covid-tests-knowledge-graph
<sup>9</sup> https://github.com/KIZI/antigen-covid-tests-knowledge-graph/releases/tag/v1.0.0
<sup>10</sup> https://github.com/KIZI/antigen-covid-tests-knowledge-graph/wiki/Competency-questions</p>
      <p>Listing 1: Example ontology terms</p>
      <preformat>:DiagnosticSensitivity a rdfs:Class ;
  rdfs:label "Diagnostic sensitivity"@en .

:AntigenCovidTest a rdfs:Class ;
  rdfs:label "Antigen covid test"@en .

:hasManufacturer a rdf:Property ;
  rdfs:label "Has manufacturer"@en .</preformat>
      <p>Step 4: Having a minimum viable ontology, we were able to translate the CQs into SPARQL
queries, such as in the example Listing 2. Each question was translated manually.</p>
      <sec id="sec-7-1">
        <title>Listing 2: Example competency question in SPARQL</title>
        <preformat>PREFIX act: &lt;https://covidtesty.vse.cz/vocabulary#&gt;
PREFIX dcterms: &lt;http://purl.org/dc/terms/&gt;
PREFIX ncit: &lt;http://purl.obolibrary.org/obo/NCIT_&gt;
PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX schema: &lt;http://schema.org/&gt;

ASK {
  ?test a act:AntigenCovidTest ;
    schema:manufacturer ?manufacturer .
}</preformat>
        <p>Step 5: We wrapped the SPARQL queries as SHACL constraints, such as in the example
Listing 3 that defines a SPARQL-based constraint component<sup>11</sup> applied to the entire data graph.
The sh:message captures the CQ in natural language, which helps to identify which question
failed to validate in the validation reports. The sh:ask contains the CQ translated to a SPARQL
query prepared in the previous step.</p>
        <p>Listing 3: Example competency question in SHACL</p>
        <preformat>:DatasetShape a sh:NodeShape ;
  rdfs:label "Competency questions applicable to the entire dataset"@en ;
  sh:targetNode [] ;
  :cq12 [] .

:CQ12 a sh:ConstraintComponent ;
  rdfs:label "Manufacturer-declared test sensitivity"@en ;
  sh:parameter [
    sh:path :cq12
  ] ;
  sh:nodeValidator [
    a sh:SPARQLAskValidator ;
    sh:message """What is the sensitivity of given tests
according to their manufacturers?"""@en ;
    sh:prefixes :prefixes ;
    sh:ask """
      ASK {
        ?test a act:AntigenCovidTest ;
          schema:manufacturer ?manufacturer .
      }
    """
  ] .</preformat>
        <p><sup>11</sup> https://www.w3.org/TR/shacl/#sparql-constraint-components</p>
        <p>Step 6: We started with implementing a transformation of the source data to RDF. We
converted the input XML data into RDF/XML via an XSL transformation followed by SPARQL
Update operations for post-processing. For example, we used post-processing for normalization
of code-list values and manufacturers’ names. We also implemented a few traditional unit tests
of XSL functions within the transformation.</p>
        <p>Step 7: The resulting data was tested by the CQs implemented in SHACL. We automated the
data processing and test execution by a shell script based on Jena command-line tools.<sup>12</sup> Given
that this is a small knowledge graph of around 10 thousand RDF triples, we validated it as a
whole.</p>
        <p>Step 8: The final step of the proposed method is about refactoring the developed artifacts.
Our development work proceeded in several iterations guided by continuous feedback from the
script for data transformation and validation. When the knowledge graph satisfied the CQs, we
continued with refactoring the semantic artifacts we created. For example, we wanted to avoid
blank nodes to make the data transformation results easier to compare, so we improved the
XSL transformation to generate IRIs instead. We also abstracted a parent class for evaluations
in the ontology and adjusted the SHACL shapes accordingly.</p>
        <p><sup>12</sup> https://jena.apache.org/documentation/tools</p>
        <p>Some errors were not detected by the tests implementing the CQs. For example, when
querying the knowledge graph, we found that the data transformation to RDF mistakenly
created IRIs of manufacturers based on their antigen covid test identifiers. Consequently, the
resulting data related each manufacturer to one antigen covid test, so that it was not possible to
group multiple tests by the same manufacturer. A common response to discovering errors not
covered by tests is to improve the test coverage to enable detecting the errors found and avoid
future regressions. We followed this practice and added a CQ comparing a given manufacturer’s
antigen covid tests. The CQ presupposed that there are manufacturers offering more than one
test and would therefore fail in case of the above-described error. Apart from fixing the way
we generated manufacturers’ IRIs, we spent further effort on de-duplicating manufacturers via
post-processing by SPARQL Update operations.</p>
        <p>While we had CQs about manufacturers’ tests, the implementations of these CQs generally
asked if manufacturers’ tests matching certain criteria exist. Since they were encoded as SPARQL ASK
queries, any non-empty query results satisfied these CQs. They did not detect when the results
were not the expected ones, such as when not all of a manufacturer’s tests were returned in the example
of duplicate manufacturers above. We addressed this limitation by asking a more specific CQ
presupposing more specific assertions. The more specific assertions we make about our domain,
the better we can detect invalid data describing it. Yet when the domain changes or when
our understanding of the domain is flawed, such assertions are more likely to become invalid.
Consequently, there is a trade-off to be made between an assertion’s specificity and its durability
in the face of change.</p>
        <p>We encountered several other challenges during the development of the knowledge graph,
such as handling the structure of the source data originally designed for a web presentation
or frequent changes in relevant legislation and evaluation studies. Since the source data
was collected manually, it was inconsistent and required fixes. Some of these inconsistencies
manifested as duplicates and were detected by our test suite. Structuring the data as a knowledge
graph and more rigorous testing thus helped to make the errors visible. Future work on this
knowledge graph can be aimed at automatic updates of the data and expanding its coverage
beyond the Czech market for antigen covid tests.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <p>The proposed method guides practitioners through the open-ended process of knowledge graph
construction. It breaks the process down into a well-defined sequence of steps, feedback loops
between them, and semantic artifacts that are produced or consumed in the process. The central
contribution of the method is making it possible to test if the produced knowledge graph satisfies the
requirements for its construction. This is done via competency questions formulated as SPARQL
queries embedded in SHACL shapes for test automation. As such, one of its key advantages is
being based on established semantic web standards.</p>
      <p>We shared the experience of applying the method to build a knowledge graph about antigen
covid tests. The method aided in collaborative development of this knowledge graph, allowing
its incremental refinement without regressions. Since constructing knowledge graphs from
the same user requirements using multiple methods, while comparing their outcomes, is
prohibitively expensive, we expect future improvements of the proposed method to come from its
applications on knowledge graphs with different requirements.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This research was supported by CHIST-ERA within the CIMPLE project
(CHIST-ERA-19-XAI003).</p>
      <p>Ekaputra (Eds.), Knowledge Engineering and Knowledge Management, Springer, Cham,
2022, pp. 19–35.
[15] G. Fink, M. Bishop, Property-based testing: A new approach to testing for assurance,
SIGSOFT Software Engineering Notes 22 (1997) 74–80. URL: https://doi.org/10.1145/263244.263267.
doi:10.1145/263244.263267.
[16] SHACL-AF, SHACL advanced features, 2017. URL: https://www.w3.org/TR/shacl-af.
[17] P. Stickler, CBD, 2005. URL: https://www.w3.org/Submission/CBD.
[18] D. Fensel, U. Şimşek, K. Angele, E. Huaman, E. Kärle, O. Panasiuk, I. Toma, J. Umbrich,
A. Wahler, How to Build a Knowledge Graph, Springer, Cham, 2020, pp. 11–68. URL:
https://doi.org/10.1007/978-3-030-37439-6_2. doi:10.1007/978-3-030-37439-6_2.
[19] SHACL-js, SHACL JavaScript extensions, 2017. URL: https://www.w3.org/TR/shacl-js.
[20] E. F. Kendall, D. L. McGuinness, Ontology engineering, volume 18 of Synthesis lecture on
the semantic web: theory and technology, Morgan and Claypool, 2019.
[21] T. Kliegr, J. Jarkovský, H. Jiřincová, J. Kuchař, T. Karel, R. Tachezy, Role of population and
test characteristics in antigen-based SARS-CoV-2 diagnosis, Czechia, August to November
2021, Euro Surveillance 27 (2022). URL: https://doi.org/10.2807/1560-7917.ES.2022.27.33.2200070.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>d'Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmelzeisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.1145/3447772. doi:10.1145/3447772.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldatova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Panov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Džeroski</surname>
          </string-name>
          ,
          <article-title>Ontology engineering: From an art to a craft</article-title>
          , in:
          <string-name>
            <given-names>V.</given-names>
            <surname>Tamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ławrynowicz</surname>
          </string-name>
          (Eds.), Ontology Engineering, Springer, Cham,
          <year>2016</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] SPARQL,
          <article-title>SPARQL 1.1 query language</article-title>
          ,
          <year>2013</year>
          . URL: http://www.w3.org/TR/sparql11-query.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] SHACL,
          <article-title>Shapes constraint language (SHACL)</article-title>
          ,
          <year>2017</year>
          . URL: https://www.w3.org/TR/shacl.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gruninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <article-title>The design and evaluation of ontologies for enterprise engineering</article-title>
          , in: Workshop on Implemented Ontologies,
          <source>European Conference on Artificial Intelligence (ECAI)</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parvizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mellish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>van Deemter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>Towards competency question-driven ontology authoring</article-title>
          , in:
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>d'Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gandon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tordai</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web: Trends and Challenges</source>
          , Springer, Cham,
          <year>2014</year>
          , pp.
          <fpage>752</fpage>
          -
          <lpage>767</lpage>
          . URL: https://link.springer.com/content/pdf/10.1007/978-3-319-07443-6_50.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zemmouchi-Ghomari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Ghomari</surname>
          </string-name>
          ,
          <article-title>Translating natural language competency questions into SPARQL queries: a case study</article-title>
          ,
          in:
          <source>The First International Conference on Building and Exploring Web Based Environments</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Matentzoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <source>Understanding Author Intentions: Test Driven Knowledge Graph Construction</source>
          , Springer, Cham,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          . URL: https://doi.org/10.1007/978-3-319-49493-7_1. doi:10.1007/978-3-319-49493-7_1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Beck</surname>
          </string-name>
          ,
          <article-title>Test-driven development: by example</article-title>
          ,
          <source>Addison-Wesley</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Peroni</surname>
          </string-name>
          ,
          <article-title>A simplified agile methodology for ontology development</article-title>
          ,
          in:
          <source>OWL: Experiences and directions - reasoner evaluation</source>
          , Springer, Cham,
          <year>2016</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fernández-Izquierdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernández-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>García-Castro</surname>
          </string-name>
          ,
          <article-title>LOT: An industrial oriented ontology engineering framework</article-title>
          ,
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>111</volume>
          (
          <year>2022</year>
          ). URL: https://www.sciencedirect.com/science/article/pii/S0952197622000525. doi:10.1016/j.engappai.2022.104755.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Shadbolt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Smart</surname>
          </string-name>
          , Knowledge elicitation, 4 ed., CRC Press, Boca Raton,
          <year>2015</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] RDFS,
          <source>RDF Schema 1.1</source>
          ,
          <year>2014</year>
          . URL: https://www.w3.org/TR/rdf-schema.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Espinoza-Arias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>Extending ontology engineering practices to facilitate application development</article-title>
          , in:
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Troquard</surname>
          </string-name>
          , F. J.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>