<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Portorož, SLO
C.Debruyne@uliege.be (C. Debruyne)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Protocol for KG Construction Tasks Involving Users</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ademar Crotti Junior</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christophe Debruyne</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Montefiore Institute, University of Liège</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Knowledge graph construction (KGC) from (semi-)structured data is challenging, and facilitating user involvement is an issue frequently brought up within this community. We cannot deny the progress we have made with respect to (declarative) knowledge graph construction languages and tools to help build such mappings. However, it is surprising that no two studies report on similar protocols. This heterogeneity does not allow for comparing KGC languages, techniques, and tools. This paper first analyses studies involving users to identify the points of comparison and the gaps between them. These gaps include a lack of consistency in task design, participant selection, and evaluation metrics. Moreover, a systematic way of analyzing the data and reporting the findings is also lacking. We thus propose a user protocol for KGC designed to address this challenge. Where possible, we take elements from the literature that we deem fit for such a protocol. The protocol, as such, allows for the comparison of languages and techniques for the RDF Mapping Language (RML) core functionality, which is covered by most of the other state-of-the-art techniques and tools. We also propose how the protocol can be amended to compare extensions (of RML). This protocol provides an important step towards a more comparable evaluation of KGC user studies.</p>
      </abstract>
      <kwd-group>
        <kwd>KG Construction</kwd>
        <kwd>User studies</kwd>
        <kwd>Research Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>We preach to the choir that knowledge graphs are essential for meaningfully organizing and
representing information in various domains. However, as knowledge graphs grow in complexity, efficient
methods for their construction (or generation) are crucial. When dealing with the challenges of
(semi-)structured data sources, such as the lack of explicit semantics, which need to be aligned with ontologies
or vocabularies, creating such mappings becomes a knowledge engineering task where user involvement
is crucial. Users bring the necessary domain expertise to ensure the mappings are appropriate.</p>
      <p>Scholars have systematically analyzed the functionalities of Knowledge Graph Construction (KGC)
tools and proposed benchmarks to analyze their behavior in different settings and their memory and
CPU usage. It is thus surprising that user involvement and users’ perception of the
languages, tools, etc., have yet to be studied in such detail. Conducting such a study for all languages and
tools is an infeasible undertaking for one group, but what is feasible is putting forward a protocol that
scholars in the domain can adopt to report on user studies. This paper aims to achieve this goal by
proposing a user study protocol for KGC.</p>
      <p>This will enable researchers to compare different knowledge graph construction languages and
techniques, leading to a better understanding of their strengths and weaknesses and, ultimately, to more
effective tools for KGC.</p>
      <p>This paper’s contributions are twofold. First, we review user studies in the KGC domain, which
indicate the abovementioned challenges. The second contribution is the protocol. Section 2 provides
an overview of the related work on (declarative) approaches to KGC by mapping data sources onto
RDF datasets, focusing on those that explicitly report on user studies. The goal is to show that no two
papers adopt the same protocol, which makes comparing studies impossible. Section 3 presents the
protocol, which we have made available under a CC-BY-SA 4.0 license. The protocol provides detailed guidelines
for recruiting participants and disclosing potential biases, as well as process guidelines for informed
consent, pre-questionnaires, familiarization activities, task execution, and post-questionnaires. The protocol
comprises five tasks, of which, when comparing two groups, the last two can be changed while the first three ensure a
common base for comparison. The related work will show that reporting is often limited to simple
metrics and averages. Still, we deem it important to analyze the relationships between task
execution, perceived usefulness, and perceived cognitive load. To this end, Section 4 proposes the statistical
means to use when adopting this protocol. In Section 5, we discuss the resources from various aspects,
such as the scientific and technical, to elaborate on the soundness of our approach. This section also
discusses some of the limitations. Section 6 then concludes the paper and proposes future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the authors presented an excellent survey on declarative KGC tools to help the community and
practitioners choose which languages, tools, or techniques fit their needs. However, the article looks
at those from a technical perspective: the authors examine the functionalities offered by the different options.
In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors proposed a benchmark to compare KGC tools and applied it to some well-known
implementations such as RMLMapper<sup>1</sup>, Morph-KGC [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and SDM-RDFizer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It is surprising to see
that the perceptions of users and practitioners have yet to be examined in a systematic manner.
      </p>
      <p>
        From a broader perspective, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] describes three “personas” that engage with KGs: KG builders, KG
analysts, and KG consumers, which were distilled from interviews with practitioners. As the name
intuitively implies, the KG builder persona would be responsible for generating the KG from
heterogeneous sources, but the persona is also in charge of ontology engineering. The authors state that
builders could benefit from tools that help them ensure that the schema is respected (what the authors
call an “enforcer”) as well as adequate visualization tools. While the paper does not explicitly mention
KG construction and mappings as tasks, they fall under the “data integration” umbrella. The interviews
indicate that there are challenges impeding uptake.
      </p>
      <p>
        It seems that practitioners’ or users’ roles are sometimes neglected. This is certainly the case for
KG construction, as we will demonstrate via our literature review. Our review looked at the following
papers reporting on users, their experiences, and/or perceived usability: [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref6 ref7 ref8 ref9">6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16</xref>
        ]. Most of these studies looked at the creation of mappings. Exceptions are [13], reporting on
studies on mapping understanding, and [16], reporting on a complex data flow that included mapping
creation.<sup>2</sup> We compare the various aspects of these user studies in Tables 1, 2, and 3. From these tables,
we can observe a couple of important points:
• Some report on comparing mapping languages and/or tools (e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]), and others report
on comparing mapping languages (e.g., [14] and [15]). Quite a few papers merely report on the
perceived usability of their tool without any comparison. We argue that reporting on user studies
only makes sense if there is a basis for comparison. Without a reference point (e.g., a comparable
tool evaluated under a shared protocol) or even a common protocol to establish such points, it
becomes difficult to interpret the significance of usability results. For instance, reporting that
users found a tool “easy to use” or completed a task within a certain time frame lacks context
unless these outcomes can be meaningfully compared. Comparative user studies are therefore
essential to provide this community with insights and to guide the development of more effective
knowledge graph construction languages and tools.
• Looking at the procedure, we see many recurring elements (some (training) resources are being
shared, pre-assessment surveys, introduction of tasks, surveys, etc.). No two procedures are
the same, which limits our ability to compare the studies. Some studies reported asking about
information such as gender and age but did not report on those in the data analysis.
• Most studies involved participants with expertise in IT, databases and/or semantic web
technologies. Many studies also report inviting MSc students in computer science or related fields.
      </p>
      <sec id="sec-2-1">
        <title><sup>1</sup>https://github.com/RMLio/rmlmapper-java</title>
        <p><sup>2</sup>Excluded from this survey are publications that do not report on users. For example, in [17], the authors reported involving
participants, but no report on the participants’ experience was made. Other examples of papers mentioning users,
participants, etc. without any detailed reporting include [18], [19], and [20].</p>
        <p>Self-reported prior knowledge and competencies are a recurring theme, but no two studies tackle
this aspect comparably.
• The same heterogeneity can be observed for the tasks and datasets, where we do notice that most
studies adopt datasets that do not require specific domain expertise (people, movies, places, etc.).
• Recurring themes in data being analyzed are time (efficiency), accuracy, and perceived usability.</p>
        <p>Most rely on the System Usability Scale (SUS) [21] for perceived usability. A few studies rely
on the Post-Study System Usability Questionnaire (PSSUQ) [22] to obtain information on perceived
information quality, interface quality, and system usefulness. Such standard instruments do provide a means to
compare results. Few studies have reported on qualitative feedback from users and the mental
workload of tools and mapping languages.
• Most studies merely report on averages, which can arguably make sense when authors only
report on one group and tools or languages are not compared. Few studies employed techniques to
analyze whether groups are (significantly) different or whether certain aspects had a (statistically
significant) impact on efficiency, accuracy, etc.</p>
        <p>From this survey, we can conclude that there is a critical need for homogeneous protocols, including
tasks, for comparing advances in KG construction (KGC) approaches (mapping languages and tools
alike). In the next section, we propose a protocol to address these issues and explain how it can be used.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The KG Construction User Study Protocol</title>
      <p>This section presents the protocol for comparing KGC tools and languages. The protocol’s<sup>3</sup> structure
foresees placeholders for text to be easily adapted for research ethics applications.</p>
      <p>The protocol can be used to analyze the perceived usability and cognitive load of a mapping language
or tool, as well as the accuracy achieved by users and the task execution time. When scholars use
this protocol on only one group, the hypotheses are limited to comparing participants with different
backgrounds or comparing the results with other experiments using that same protocol. Scholars
adopting this protocol can easily extend the protocol to compare two tools, techniques, or languages.
This will be explained in Section 3.5.</p>
      <p>The focus of the protocol is on facilitating the discovery of problems (i.e., formative testing) rather than on
task-level measurements; [23] describes the difference between the two. The protocol nonetheless proposes
task-level measurements and techniques for analyzing the data. When sample sizes are small, these can
merely give insights.</p>
      <sec id="sec-3-1">
        <title>3.1. Participant Selection</title>
        <p>Adopters of the protocol should indicate how participants are recruited and from where. Adopters
should disclose potential biases by providing details about factors influencing the study or
participant behavior. Examples include the hierarchical relationship between research group leaders and
researchers, as well as students recruited from classes. There is also a difference between voluntary
participation and mandatory representation (e.g., in the context of a teaching activity), which may lead
to non-probability samples. The absence of certain participants may lead to response bias, that is,
observations might have differed had those participants taken part in the experiment. In [24], the
authors describe how disclosing participant selection is important to recognize potential biases.</p>
        <p>Practitioners involved with UI and UX often state that five users are sufficient to discover most of the
usability problems. That is more likely the case for problem discovery than task-level measurements,
which require larger sample sizes [23]. As [25] observed in an experiment, “increasing the number
from 5 to 10 can result in a dramatic improvement in data confidence.” They also found that increasing
the number to 20 practically guarantees all problems to be seen, but we recognize that recruiting
participants is difficult. As such, we (strongly) recommend 10 participants per group.</p>
        <sec id="sec-3-1-1">
          <title><sup>3</sup>https://github.com/chrdebru/kgc-user-study-protocol</title>
          <p>[Residue of a table comparing the procedures of the surveyed user studies; the row and column layout was lost in extraction, so only the recoverable entries are listed.]</p>
          <p>Entries without a recoverable row label:
• 1-hour study
• tutorial and material before the experiment (days before)</p>
          <p>Comparison of YARRRML and SPARQL Anything:
• self-assessment on Semantic Web and mapping languages used, and demographics
• task
• nonstandard evaluation questionnaire</p>
          <p>De Brouwer et al., 2024 (RMLEditor, Matey; RML, YARRRML). Developers use Matey, non-experts use RMLEditor:
• self-assessment on Semantic Web
• briefing about technologies and tools
• task
• usability assessment and some specific questions</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Process</title>
        <p>Participants begin by reviewing informed consent materials and completing a pre-questionnaire
assessing their backgrounds, prior knowledge, and expectations. Next, they attend a presentation
introducing the technology, review relevant documentation, and engage in a familiarization activity. The
core of the study involves participants executing a defined task using the technology. Finally,
participants complete a post-questionnaire evaluating their experience, including usability, efficiency, and
perceived cognitive load. This structured process aims to gather comprehensive data on user
interaction and perception of the technology.</p>
        <p>Next to presenting and demonstrating the tool or mapping language, we also request participants
to handle the environment. We deem this familiarization activity novel compared to the related work.
This activity ensures participants are comfortable executing mappings within the provided (tool’s)
environment. We guide participants through the practical aspects of using the tool’s interface,
such as utilizing the command line in the terminal or identifying the correct buttons to click. This
focused familiarization will prevent the environment from becoming an obstacle, allowing us to assess
the tool or language’s usability and gather unbiased feedback on its functionalities.</p>
        <p>[Residue of the tables comparing the datasets, tasks, measures, and data analysis of the surveyed user studies. The row and column layout was lost in extraction, so only the recoverable entries are listed.]</p>
        <p>Rows whose author label survived: Heyvaert et al., 2016 (datasets: employees and projects; movies (DBpedia) and directors; tasks: 2 [illegible], one schema based (start from ontology) and one data based (start from data); measures: accuracy, usability (SUS), qualitative feedback). Crotti Junior et al., 2017 (dataset: Northwind database; task: 1 mapping in 3 parts (2 classes and linking them)).</p>
        <p>Task entries without a recoverable row label: 3 tasks of low, medium and high complexity as described by the authors; 10 mappings, 2 classes/3 object properties/5 datatype properties (maybe this means 1 mapping task in several steps, but this is not clear); understanding the mapping, i.e., what is being mapped; 2 use cases, one per dataset; 1 mapping in 3 parts (2 classes and linking them); 1 mapping on each representation relating people and places; 2 tasks; fill gaps and partial solutions were provided; task was on the overall workflow and not directly on mapping creation.</p>
        <p>Measure entries: time; accuracy; accuracy (manually); accuracy (11 questions on what is being mapped); completion rate (number of people who completed tasks); usability (SUS); usability (PSSUQ); a modified usability questionnaire; mental workload (MWL and NASA-TLX); users’ preference and confidence questions; mouse and keyboard activity; qualitative feedback.</p>
        <p>Data analysis entries: averages (listed on its own for six studies); averages on accuracy; averages on accuracy and time; average, standard deviation, min and max values; number of people who completed tasks; subjective evaluation when analyzing users’ results; normality tests; ANOVA, Anderson-Darling, Welch and Wilcoxon tests; one-way ANOVA; Kruskal-Wallis test; Welch two-sample t-test and Friedman non-parametric test for PSSUQ; correlations with Pearson and Spearman; reliability.</p>
        <p>Furthermore, we ask authors to report on how responses were submitted (e.g., email, paper, form)
and the anticipated duration of the experiment. While in-class experiments often have time constraints,
other environments may be more flexible. Our protocol foresees 1 hour for the five tasks. If, for
example, all steps are conducted in a classroom setting, the experiment would require 2 hours and 30 minutes in total (i.e.,
including the questionnaires and consulting the training material). Finally, authors should clarify whether
participants can ask questions during the experiment. Help should be limited to aspects not core to the KG
construction process and experiment. For example, helping participants navigate to the correct
directory in a terminal or diagnosing a network issue is permitted, but providing help to execute a
mapping is not. Ideally, studies should report on such requests for help (and their number of occurrences).
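        </p>
        <p>[Figure 1: Entity-Relationship Diagram of the Universe of Discourse. Recoverable content: Project (Project_ID, name, start_date, end_date); Employee (Employee_ID, first_name, last_name); Task (Task_ID, description_en, description_fr); relationship labels “managed by” and “part of”.]</p>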
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Pre-questionnaire</title>
        <p>Studies often inquire about participants and group them based on self-reported information on their
background and proficiency with specific techniques. We aim to homogenize this by proposing an
exhaustive list of current roles, formal training, and self-perceived competency levels in certain Semantic
Web technologies. We also included three questions related to intrinsic motivation (enjoyment,
curiosity, and value). In voluntary settings, these questions allow us to explore potential self-selection
bias by examining correlations between motivation levels, task performance, and perceived usability.
Mandatory settings, in contrast, introduce the risk of low engagement among less
intrinsically motivated participants, and the same questions enable us to assess
how motivation levels influence task performance and perceived usability in that case.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Mapping tasks</title>
        <p>Participants will be requested to complete five mapping tasks, some of which are interdependent. We
can observe that different studies adopt different domains, some of which are specific (e.g., health care).
We argue that participants should not have to question the domain used in the experiment. Therefore, we propose a
domain that is sufficiently generic and accessible for participants to understand. Our protocol proposes
mapping data about projects, project tasks, and employees who manage projects and are assigned
tasks. Figure 1 depicts the Universe of Discourse (UoD) of the data to be transformed into RDF. We
use an Entity-Relationship Diagram (ERD), but JSON and XML files can easily represent the data.<sup>4</sup> To
avoid confusing participants, we ensured all attribute names are unambiguous. In this
simple UoD, all relations are many-to-one, though this can be easily extended to many-to-many when
transforming documents.</p>
        <p>The tasks can be summarized as follows:
1. Generate instances of ex:Employee with their first and last names. The IRIs of employees are
based on the name.
2. Generate instances of ex:Project with their name, start date, and end date. Both dates are of the
type xsd:date, allowing us to assess the creation of typed literals. The IRIs of projects are based
on the project’s ID.
3. Generate ex:managedBy properties from projects to employees.
4. Generate instances of ex:Task with their descriptions (in two languages). The IRIs of tasks are
based on the task’s ID. The descriptions allow us to assess the creation of language tags.
5. Generate ex:of and ex:assignedTo properties from tasks to, resp., projects and employees.</p>
        <p>We point out that the mappings mainly focus on RML-core [20] functionality. RML-core also includes
multi-valued expression maps, which are irrelevant for CSV files. Adopters can easily adapt the JSON
files to provide one or more task descriptions in different languages to test more complex language
maps, for instance.</p>
        <sec id="sec-3-4-1">
          <title><sup>4</sup>Examples are provided in the GitHub repository under the “assets” directory.</title>
          <p>We draw attention to the fact that employees’ IRIs are based on their names, whereas the project
data sources refer to employees via their IDs. This requires users to join the data in the two sources.
In the case of RML, this requires users to “join” the two sources either at the level of the logical source
or by using a referencing object map.</p>
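          <p>The join described above can be sketched in plain Python (not in RML). This is only an illustration under stated assumptions: the sample rows, the <monospace>http://example.org/</monospace> namespace, the IRI patterns, and the <monospace>manager_id</monospace> foreign-key column are hypothetical and are not taken from the protocol’s repository.</p>
          <preformat>
```python
from urllib.parse import quote

EX = "http://example.org/"  # hypothetical namespace

# Hypothetical sample rows. Column names follow the ERD, except
# manager_id, which is an assumed foreign-key column.
employees = [
    {"Employee_ID": "1", "first_name": "Ada", "last_name": "Lovelace"},
    {"Employee_ID": "2", "first_name": "Alan", "last_name": "Turing"},
]
projects = [
    {"Project_ID": "10", "name": "Engine", "manager_id": "1"},
    {"Project_ID": "11", "name": "Enigma", "manager_id": "2"},
]


def employee_iri(row):
    """Build an employee IRI from the name (assumed IRI pattern)."""
    return EX + "employee/" + quote(row["first_name"] + "_" + row["last_name"])


def project_iri(row):
    """Build a project IRI from the project's ID (assumed IRI pattern)."""
    return EX + "project/" + quote(row["Project_ID"])


def managed_by_triples(projects, employees):
    """Join projects to employees on the ID, then emit ex:managedBy triples."""
    by_id = {e["Employee_ID"]: e for e in employees}
    triples = []
    for p in projects:
        manager = by_id.get(p["manager_id"])
        if manager is None:
            continue  # no matching employee, so no triple is generated
        triples.append((project_iri(p), EX + "managedBy", employee_iri(manager)))
    return triples


triples = managed_by_triples(projects, employees)
```
          </preformat>
          <p>The lookup table keyed on the employee ID plays the role of the join condition: the project source only knows the ID, while the object IRI needs the name.</p>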
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Variants</title>
        <p>We stated that the tasks focus on RML-core functionality, which is also covered by other KGC languages
and techniques such as ShExML [14] and SPARQL Anything [26]. While one can argue that the set
of desired functionalities is limited, [20] covered the requirements of all RML extensions. Formulating
tasks that cover practically all possible features and scenarios, not only in time but also in data source
complexity, is not feasible. However, when a particular feature or scenario needs to be investigated,
the protocol can be amended. This section describes how this protocol can be adapted and used to
compare different tools, features, or aspects.</p>
        <p>• One can provide the same tasks to two different groups when comparing languages, techniques,
or tools. One can compare different mapping languages (e.g., ShExML vs. RML), compare editors
vs. “bare bones” languages (e.g., RMLEditor vs. RML), compare languages and abstractions of
languages (e.g., RML vs. YARRRML), and even different editors and languages.
• When the aim is to compare a tool or language’s support for an advanced KGC requirement
such as named graphs, collections and containers, or RDF-star (among others), then one can
take this protocol as is for one group and, for the second group, change only the last two tasks so that
they cover those requirements. The first three tasks, which are shared,
provide a comparable foundation for the two groups, allowing us to isolate and evaluate the impact of the new
features introduced in the final tasks.
• We have proposed a generic and accessible domain for the protocol to ensure broad applicability.
However, the protocol can be adapted to include similar tasks within a different (and
domain-specific) context. This adaptation would enable researchers to assess whether a language or tool
designed for a particular application domain performs better in its setting. In such cases, it is
important to compare the tool or language across both the original (generic) domain and the
adapted (domain-specific) version.</p>
        <p>Participants are assumed to have access to prepared “resources” or “environments” to focus on the
tasks. In the case of RML, for instance, the logical sources would be provided in the tool or for them to
copy and paste. This allows researchers to assess the languages and tools with respect to these aspects
by giving one group the prepared artifacts and requesting the other to formulate the logical sources
themselves. We deem this a special case of comparing a baseline with an extension as described above,
but where the five tasks remain unchanged.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Post-questionnaire</title>
        <p>
          Both SUS and PSSUQ are used to measure usability. SUS is adequate for a rapid and general measure
of a system’s usability. Still, PSSUQ offers more advantages because it assesses three aspects of a
system: system usefulness, information quality, and interface quality. Furthermore, there is a question
about the system as a whole, which allows one to dampen the perception of individual aspects. The
original PSSUQ survey uses 19 questions (as adopted by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], for instance), but recent iterations have
removed three redundant questions.
        </p>
        <p>As for the perceived (mental) workload, we adopt both the Workload Profile (WP) [27] and the NASA
Task Load Index (NASA-TLX) [28].</p>
        <p>• WP adopts a theory in which participants have different capacities (dimensions) related to the
stage, mode, input, and output of information processing. The eight dimensions are each
quantified through subjective ratings: after task completion, participants rate the proportion of attentional resources
used for performing a given task with a value from 0 to 100. A rating of 0
means that the task placed no demand, while 100 indicates that it required maximum attention.
The WP of a participant is calculated as follows, where <inline-formula><tex-math>d_i</tex-math></inline-formula> denotes the rating given to dimension <inline-formula><tex-math>i</tex-math></inline-formula>:
<disp-formula><tex-math>\mathrm{WP} = \frac{1}{8} \sum_{i=1}^{8} d_i</tex-math></disp-formula></p>
        <p>• NASA-TLX has been validated in several domains [28] and combines six factors believed to
influence the mental workload. Each factor is quantified with a subjective judgment and a weight
computed via a paired comparison procedure. For each possible pair of the six factors,
participants must decide which factor contributed the most to the mental workload during the task. The
weights <inline-formula><tex-math>w_i</tex-math></inline-formula> are the number of times each dimension was selected. The possible weights range from
0 (irrelevant) to 5 (most important). The final score is computed as a weighted average,
considering the subjective rating of each attribute <inline-formula><tex-math>d_i</tex-math></inline-formula> and the corresponding weights <inline-formula><tex-math>w_i</tex-math></inline-formula>:
<disp-formula><tex-math>\mathrm{TLX} = \frac{1}{15} \sum_{i=1}^{6} d_i \cdot w_i</tex-math></disp-formula>
It is possible to calculate the scores by eliminating the weighted procedure, which yields the
so-called Raw TLX.</p>
        <p>
          Both instruments are used in industry and research. The two instruments use
different rating systems, which may confuse participants in a paper survey. Erroneous inputs can be
prevented by adopting electronic forms. We choose not to harmonize the scales, which has been done
in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], for instance, to obtain results that can be compared to other studies faithfully adopting those
instruments.
        </p>
        <p>Studies should explicitly report the method used to calculate performance measures, such as
accuracy. For instance:
• Accuracy should be determined by (1) graph isomorphism (did they generate the expected graph,
which is true or false), and, for more nuanced numbers, (2) precision (the proportion of triples
that are generated that are in the expected graph), (3) recall (the proportion of expected triples
that are in the generated graph), and (4) F-measure combining precision and recall.
This approach accounts for situations where a participant generates additional triples, for instance.
• Efficiency is a measure of how well resources are used, which is not only time. Most studies measured the
time it took for tasks to be completed. We recommend that studies report on task execution time
and the method used to measure it. One can manually record time or use software to record user
interactions to time the tasks. Another approach is to request users to report on the time or use
electronic forms that keep track of time.</p>
        <p>While we strongly encourage placing a time limit on the tasks for the experiment to obtain
comparable results, there are two important cases to track: “did not finish” and “did not start”. The former
may indicate insufficient time left to finish a task or that the task was too difficult. The latter merely
indicates that the user never started the task.</p>
        <p>We propose to limit the protocol to these five metrics (four on accuracy and one related to task
execution time). Studies are free to include other metrics, such as the number of times a mapping was
executed (i.e., trial and error), but that would indirectly impact the task execution time.</p>
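        <p>The four accuracy measures can be sketched as follows. This is a minimal illustration, not a reference implementation. It assumes graphs of ground triples without blank nodes, so that the isomorphism check reduces to set equality, and it assumes the balanced F-measure (F1). The example triples, including the <monospace>ex:firstName</monospace> property, are hypothetical.</p>
        <preformat>
```python
def accuracy_metrics(generated, expected):
    """Compare two collections of ground triples (assumed: no blank nodes)."""
    generated, expected = set(generated), set(expected)
    correct = generated.intersection(expected)
    precision = len(correct) / len(generated) if generated else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    total = precision + recall
    # Balanced F-measure (F1), an assumption: the harmonic mean of both.
    f_measure = 2 * precision * recall / total if total else 0.0
    return {
        "isomorphic": generated == expected,  # valid only without blank nodes
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical expected graph for one participant's task.
expected = {
    ("ex:p1", "rdf:type", "ex:Project"),
    ("ex:p1", "ex:managedBy", "ex:e1"),
    ("ex:e1", "rdf:type", "ex:Employee"),
    ("ex:e1", "ex:firstName", "Ada"),
}
# The participant misses one expected triple and adds an unexpected one.
generated = {
    ("ex:p1", "rdf:type", "ex:Project"),
    ("ex:p1", "ex:managedBy", "ex:e1"),
    ("ex:e1", "rdf:type", "ex:Employee"),
    ("ex:e1", "ex:firstName", "ada"),
}

result = accuracy_metrics(generated, expected)
```
        </preformat>
        <p>In this example, three of the four generated triples are expected and three of the four expected triples are generated, so precision and recall are both 0.75 while the isomorphism check fails. Graphs with blank nodes would need a proper isomorphism test from an RDF library instead of set equality.</p>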
        <p>As the experiment should not be too time-consuming, we avoided interviews to obtain qualitative
feedback. We also avoid “think aloud” experiments as they can impact the cognitive load. We recommend
that studies disclose any additional instruments they used, whether or not they report the results obtained with them.
There is, however, a qualitative dimension to our protocol, as the PSSUQ does leave room for comments
on each of the 16 questions.</p>
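        <p>The two workload scores defined earlier in this section can be computed as in the sketch below. The factor names are the six NASA-TLX subscales; the example ratings, the 0 to 100 rating range used for both instruments, and the choice of the plain mean for the Raw TLX are assumptions made for illustration.</p>
        <preformat>
```python
from itertools import combinations

TLX_FACTORS = ["mental", "physical", "temporal", "performance", "effort", "frustration"]


def workload_profile(ratings):
    """WP score: mean of the eight dimension ratings (each 0 to 100)."""
    if len(ratings) != 8:
        raise ValueError("WP expects eight dimension ratings")
    return sum(ratings) / 8


def tlx_weights(choices):
    """Count how often each factor was chosen across the 15 pairs."""
    pairs = list(combinations(TLX_FACTORS, 2))
    if len(choices) != len(pairs):
        raise ValueError("one choice is needed for each of the 15 pairs")
    weights = {factor: 0 for factor in TLX_FACTORS}
    for pair, chosen in zip(pairs, choices):
        if chosen not in pair:
            raise ValueError("choice must be one of the two factors in the pair")
        weights[chosen] += 1
    return weights


def nasa_tlx(ratings, weights):
    """Weighted score: sum of rating times weight, divided by 15."""
    return sum(ratings[f] * weights[f] for f in TLX_FACTORS) / 15


def raw_tlx(ratings):
    """Raw TLX, assumed here to be the unweighted mean of the six ratings."""
    return sum(ratings.values()) / 6


# Hypothetical participant data.
wp_score = workload_profile([50, 60, 20, 10, 70, 30, 40, 40])
ratings = {"mental": 80, "physical": 10, "temporal": 60,
           "performance": 40, "effort": 70, "frustration": 30}
# Hypothetical choices: the first factor of every pair is always picked.
choices = [pair[0] for pair in combinations(TLX_FACTORS, 2)]
weights = tlx_weights(choices)
weighted_score = nasa_tlx(ratings, weights)
unweighted_score = raw_tlx(ratings)
```
        </preformat>
        <p>Because every participant answers all 15 pairwise comparisons, the weights always sum to 15, which is why the weighted sum is divided by 15.</p>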
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Analysis</title>
      <p>As part of our protocol, we recommend a structured approach for reporting collected data. As
mentioned, while many user studies focus primarily on presenting averages and standard deviations, we
emphasize the importance of extending these reports to include statistical tests. This ensures robust
comparisons between groups and tools, enhancing the general reliability and interpretability of the
experiments. The following describes the recommended aspects and tests to be considered when
reporting results and analysis.</p>
      <p>Reliability and Internal Consistency. Reliability refers to the degree to which the items within a
test or survey consistently measure the same construct. High internal consistency strengthens
the statistical reliability of metrics, thereby enhancing the validity of group comparisons. We
recommend using Cronbach’s Alpha to evaluate internal consistency. Higher alpha coefficients
indicate greater shared covariance among items, suggesting they assess the same underlying
concept. A Cronbach’s Alpha value of ≥ 0.7 is generally considered acceptable [29].</p>
      <p>Data Normality. Data normality refers to how closely the distribution of the data aligns with a normal curve.
While normality is not always required, t-tests and ANOVA assume that data follows such a
distribution. ANOVA is relatively robust to non-normal data when sample
sizes are large, but large samples are difficult to obtain in user studies. We, therefore, require studies
to test for normality and report the outcome. Researchers may, if they wish, use other statistical
measures. As the sample size of a group will likely not exceed 50, we propose the Shapiro-Wilk
test to assess normality. In this test, the sample is compared to a theoretical normal distribution.</p>
      <p>Homogeneity. Some statistical tests, such as ANOVA, assume that the variances across groups are
equal. This is known as the homogeneity of variances. Again, this assumption should be verified
as part of the analysis. Levene’s Test is a standard method for evaluating this assumption.</p>
      <p>Group Comparisons. To determine whether differences between groups are statistically significant,
researchers should employ well-established statistical tests. The choice of test depends on the
assumptions about the data. The recommended tests when the data is normally distributed (also
known as parametric tests) are Welch’s t-test when comparing two groups and ANOVA when
comparing more than two groups simultaneously. Both tests assume normality. ANOVA additionally
assumes homogeneity of variance, whereas Welch’s t-test does not require equal variances.
The recommended tests when the data is not normally distributed (also known as
non-parametric tests) are the Wilcoxon rank-sum test for comparing two groups and the Kruskal-Wallis test
for more than two groups.</p>
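      <p>The checks above can be illustrated with a small Python sketch. It is an illustration under stated assumptions rather than a reference implementation: the function names cronbach_alpha and recommend_test are hypothetical, the significance level of 0.05 is an assumed default, and the Shapiro-Wilk p-values are expected to come from a statistics package. The first function computes Cronbach’s Alpha from per-participant item scores using sample variances; the second encodes the choice of comparison test described in the text. Homogeneity of variances (Levene’s Test) still has to be checked separately before relying on ANOVA.</p>
      <preformat>
```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for questionnaire data.

    responses[i][j] is the score participant i gave to item j.
    At least two participants and two items are required.
    """
    k = len(responses[0])  # number of items
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def recommend_test(normality_p_values, alpha=0.05):
    """Return the name of the comparison test suggested in the text.

    normality_p_values holds one Shapiro-Wilk p-value per group. A group is
    treated as normally distributed when its p-value exceeds alpha.
    """
    n_groups = len(normality_p_values)
    if 2 > n_groups:
        raise ValueError("at least two groups are needed for a comparison")
    all_normal = all(p > alpha for p in normality_p_values)
    if n_groups == 2:
        return "Welch's t-test" if all_normal else "Wilcoxon rank-sum test"
    return "ANOVA" if all_normal else "Kruskal-Wallis test"
```
      </preformat>
      <p>For example, recommend_test([0.20, 0.01]) selects the non-parametric test for two groups because one group fails the normality check.</p>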
      <p>Correlation Analysis. Correlation methods assess the strength and direction of the relationship
between variables, which can provide deeper insights into study outcomes. For instance,
examining correlations between usability, accuracy, and mental workload can reveal relationships in
user behavior. When data is normally distributed, we recommend Pearson’s correlation to
measure the strength of a relationship, for example, between accuracy and usability. Otherwise,
one should use Spearman’s correlation. Researchers must report on correlations between all
relevant variables (e.g., usability and mental workload, usability and accuracy, etc.) to provide a
comprehensive analysis.</p>
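      <p>To make the difference between the two coefficients concrete, the following Python sketch computes both without external libraries. It is a minimal illustration: the function names pearson, ranks, and spearman are hypothetical, ties receive the average of their ranks, and no p-values are computed, which in practice one would obtain from a statistics package.</p>
      <preformat>
```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            result[order[position]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))
```
      </preformat>
      <p>For a monotonic but non-linear relationship, such as x = [1, 2, 3, 4, 5] and y = [1, 4, 9, 16, 25], Spearman’s coefficient reaches its maximum while Pearson’s stays below it, which is why the choice of coefficient should follow the normality check.</p>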
      <p>Transparency and Accessibility. To promote transparency and reproducibility, the data and the
statistical tests performed should be publicly available online, provided that all personally
identifiable information is removed and participant anonymity is preserved in accordance with ethical
research standards.</p>
      <p>Access to data, analysis scripts, and detailed methodology facilitates validation and enhances
the study’s credibility. Moreover, sharing such data would allow one to compare results across
studies more easily, provided the conditions are similar.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>We presented a comparison of user studies in the KGC domain and found that no two studies
followed the same protocol. This makes it impossible to compare KGC languages, tools, and software. To this end, we
analyzed the related work, distilled elements we appreciated, and proposed others to establish a common
protocol. As such, we proposed a resource, a user study protocol, that provides the KGC community
with a better way to present, compare, and scrutinize contributions.</p>
      <p>
        When designing the protocol, we selected and refined elements from the state-of-the-art that we
appreciated. Examples include the use of accuracy and task execution time as simple measures, the use
of PSSUQ over SUS to obtain more fine-grained information on usability, usefulness, and information
quality, and measuring the mental workload right after the task. Reporting on the correlations between
the perceived usability, task execution, and mental workload could yield interesting insights [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We
provided guidelines on which statistical techniques to use when reporting on user studies with this
protocol, as most prior studies merely reported averages.
      </p>
      <p>Relatively novel compared to the related work is our informed decision to adopt an accessible
domain, focus on RML-core functionality as a basis, and formulate five tasks. In Section 3, we explained
that, when comparing two groups, the last two tasks can be replaced for one group so that
extensions or variants within the same language or tool can be compared. As such, the resource is
sufficiently general for use in the KGC community, and the approach in designing this protocol may
inspire others within the Semantic Web community.</p>
      <p>The resources have not only been made available with a DOI5 on a long-term preservation platform,
but they are also available on a GitHub repository. The latter allows peers to contribute to the project.
The directory structure we use allows for variants of the protocol to be made available.</p>
      <p>The protocol has not yet been used at this stage; its first use is planned for the spring of 2025.
However, many of its separate elements are drawn from existing studies. As such, those parts have
already been validated in the community. We also aim to engage with the wider KGC community via
the W3C working group on adopting this protocol across different institutions.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Prior KGC user studies used different protocols, which makes it difficult to compare their
results. We present a new protocol to address these inconsistencies, focusing on
participant selection, task design, and evaluation metrics. The protocol provides detailed guidelines for
recruiting participants and disclosing potential biases. It also provides process guidelines for informed consent,
pre-questionnaires, familiarization activities, task execution, and post-questionnaires. Five specific
mapping tasks are proposed, which can be solved with the equivalent of the RML-Core specification.
The protocol recommends using the Post-Study System Usability Questionnaire (PSSUQ) for usability
and the NASA Task Load Index (NASA-TLX) for mental workload, among others. We designed the
protocol in such a way that one can analyze one group or compare groups. To this end, we provided
guidelines on which statistical instruments to use.</p>
      <p>This protocol aims to provide a more comparable evaluation of KGC user studies, ultimately leading
to more effective tools for knowledge graph construction. As such, the protocol is an essential artifact
for future longitudinal and comparative studies. We recognize that the proposal has been constructed
in a bottom-up fashion for this community, and future work should look into aligning our proposal
with methods for comparing the usability of different (information) systems, such as [30]. Finally,
future work also involves encouraging the adoption of this protocol by various KGC scholars. While
ambitious, it is hoped that this protocol will form the basis of a new, open repository of KGC user
studies.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank the reviewers for their many (many) thoughtful comments, which greatly improved the paper.
Their feedback was invaluable.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly to improve grammar, check spelling,
and reword. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
      <p>[12] A. Crotti Junior, C. Debruyne, L. Longo, D. O'Sullivan, On the Mental Workload Assessment of Uplift Mapping Representations in Linked Data, in: Human Mental Workload: Models and Applications - Second International Symposium, H-WORKLOAD 2018, Amsterdam, The Netherlands, September 20-21, 2018, Revised Selected Papers, volume 1012 of Communications in Computer and Information Science, Springer, 2018, pp. 160–179.</p>
      <p>[13] A. Crotti Junior, A Jigsaw Puzzle Metaphor for Representing Linked Data Mappings, Ph.D. thesis, Trinity College Dublin, 2019.</p>
      <p>[14] H. García-González, I. Boneva, S. Staworko, J. E. L. Gayo, J. M. C. Lovelle, ShExML: improving the usability of heterogeneous data mapping languages for first-time users, PeerJ Comput. Sci. 6 (2020) e318.</p>
      <p>[15] P. Warren, P. Mulholland, E. Daga, L. Asprino, Path-based and triplification approaches to mapping data into RDF: User behaviours and recommendations, Semantic Web (2023) 1–27.</p>
      <p>[16] M. De Brouwer, P. Bonte, D. Arndt, M. Vander Sande, A. Dimou, R. Verborgh, F. De Turck, F. Ongenae, Optimized continuous homecare provisioning through distributed data-driven semantic services and cross-organizational workflows, J. Biomed. Semant. 15 (2024) 9.</p>
      <p>[17] P. Heyvaert, A. Dimou, R. Verborgh, E. Mannens, R. Van de Walle, Semantically annotating CEUR-WS workshop proceedings with RML, in: Semantic Web Evaluation Challenges - Second SemWebEval Challenge at ESWC 2015, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers, volume 548 of Communications in Computer and Information Science, Springer, 2015, pp. 165–176.</p>
      <p>[18] I. A. Ibrahim, T. Choudhury, J. Sargeant, R. Shah, M. J. Hossain, S. M. Islam, CEREI: an open-source tool for cost-effective renewable energy investments, SoftwareX 26 (2024) 101708.</p>
      <p>[19] P. Heyvaert, B. De Meester, A. Dimou, R. Verborgh, Declarative Rules for Linked Data Generation at Your Fingertips!, in: The Semantic Web: ESWC 2018 Satellite Events - ESWC 2018 Satellite Events, Heraklion, Crete, Greece, June 3-7, 2018, Revised Selected Papers, volume 11155 of Lecture Notes in Computer Science, Springer, 2018, pp. 213–217.</p>
      <p>[20] A. Iglesias-Molina, D. Chaves-Fraga, I. Dasoulas, A. Dimou, Human-Friendly RDF Graph Construction: Which One Do You Chose?, in: Web Engineering - 23rd International Conference, ICWE 2023, Alicante, Spain, June 6-9, 2023, Proceedings, volume 13893 of Lecture Notes in Computer Science, Springer, 2023, pp. 262–277.</p>
      <p>[21] J. Brooke, SUS: a retrospective, J. Usability Studies 8 (2013) 29–40.</p>
      <p>[22] J. R. Lewis, Psychometric evaluation of the PSSUQ using data from five years of usability studies, International Journal of Human-Computer Interaction 14 (2002) 463–488.</p>
      <p>[23] J. R. Lewis, Sample sizes for usability tests: mostly math, not magic, Interactions 13 (2006) 29–33.</p>
      <p>[24] J. W. Creswell, J. D. Creswell, Research design: Qualitative, quantitative, and mixed methods approaches, Sage publications, 2017.</p>
      <p>[25] L. Faulkner, Beyond the five-user assumption: Benefits of increased sample sizes in usability testing, Behavior Research Methods, Instruments, &amp; Computers 35 (2003) 379–383.</p>
      <p>[26] E. Daga, L. Asprino, P. Mulholland, A. Gangemi, Facade-X: an opinionated approach to SPARQL anything, in: Further with Knowledge Graphs, volume 53 of Studies on the Semantic Web, IOS Press, 2021, pp. 58–73.</p>
      <p>[27] P. S. Tsang, V. L. Velazquez, Diagnosticity and multidimensional subjective workload ratings, Ergonomics 39 (1996) 358–381.</p>
      <p>[28] S. G. Hart, NASA-task load index (NASA-TLX); 20 years later, in: Proceedings of the human factors and ergonomics society annual meeting, volume 50, 2006, pp. 904–908.</p>
      <p>[29] R. A. Peterson, A meta-analysis of Cronbach’s coefficient alpha, Journal of Consumer Research 21 (1994) 381–391.</p>
      <p>[30] R. Kruger, J. Brosens, M. Hattingh, A methodology to compare the usability of information systems, in: Responsible Design, Implementation and Use of Information and Communication Technology: 19th IFIP WG 6.11 Conference on e-Business, e-Services, and e-Society, I3E 2020, Skukuza, South Africa, April 6–8, 2020, Proceedings, Part II, Springer-Verlag, Berlin, Heidelberg, 2020, pp. 452–463.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Delva</surname>
          </string-name>
          , G. Haesendonck,
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>De Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>Declarative RDF graph generation from heterogeneous (semi-)structured data: A systematic literature review</article-title>
          ,
          <source>J. Web Semant</source>
          .
          <volume>75</volume>
          (
          <year>2023</year>
          )
          <fpage>100753</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>KROWN: A benchmark for RDF graph materialisation</article-title>
          , in: G. Demartini,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Acosta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          , G. Cheng,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skaf-Molli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferranti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web - ISWC 2024 - 23rd International Semantic Web Conference</source>
          , Baltimore, MD, USA, November 11-15,
          <year>2024</year>
          , Proceedings, Part III
          , volume
          <volume>15233</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Arenas-Guerrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Toledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Pérez</surname>
          </string-name>
          , Ó. Corcho,
          <article-title>Morph-KGC: Scalable knowledge graph materialization with mapping partitions</article-title>
          ,
          <source>Semantic Web</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Collarana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          , E. Curry,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          (Eds.),
          <source>CIKM '20: The 29th ACM International Conference on Information and Knowledge Management</source>
          , Virtual Event, Ireland, October 19-23
          ,
          <year>2020</year>
          , ACM,
          <year>2020</year>
          , pp.
          <fpage>3039</fpage>
          -
          <lpage>3046</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Brumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Suh</surname>
          </string-name>
          ,
          <article-title>Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities</article-title>
          ,
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          <volume>30</volume>
          (
          <year>2023</year>
          )
          <fpage>584</fpage>
          -
          <lpage>594</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pinkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Binnig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sengupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trame</surname>
          </string-name>
          ,
          <article-title>How to Best Find a Partner? An Evaluation of Editing Approaches to Construct R2RML Mappings</article-title>
          , in:
          <source>The Semantic Web: Trends and Challenges - 11th International Conference, ESWC 2014</source>
          , Anissaras, Crete, Greece, May 25-29,
          <year>2014</year>
          . Proceedings, volume
          <volume>8465</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2014</year>
          , pp.
          <fpage>675</fpage>
          -
          <lpage>690</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herregodts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurman</surname>
          </string-name>
          , E. Mannens, R. Van de Walle,
          <article-title>RMLEditor: A Graph-Based Mapping Editor for Linked Data Mappings</article-title>
          , in:
          <source>The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016</source>
          , Heraklion, Crete, Greece, May 29 - June 2,
          <year>2016</year>
          , Proceedings, volume
          <volume>9678</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2016</year>
          , pp.
          <fpage>709</fpage>
          -
          <lpage>723</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Á.</given-names>
            <surname>Sicilia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nemirovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nolle</surname>
          </string-name>
          ,
          <article-title>Map-On: A web-based editor for visual ontology mapping</article-title>
          ,
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>969</fpage>
          -
          <lpage>980</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blinkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lawrynowicz</surname>
          </string-name>
          ,
          <article-title>User-friendly Visual Creation of R2RML Mappings in SQuaRE</article-title>
          , in:
          <source>Proceedings of the Third International Workshop on Visualization and Interaction for Ontologies and Linked Data co-located with the 16th International Semantic Web Conference (ISWC 2017)</source>
          , Vienna, Austria, October 22, 2017, volume
          <volume>1947</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org
          ,
          <year>2017</year>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Crotti Junior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Sullivan</surname>
          </string-name>
          ,
          <article-title>Juma: An Editor that Uses a Block Metaphor to Facilitate the Creation and Editing of R2RML Mappings</article-title>
          , in:
          <source>The Semantic Web: ESWC 2017 Satellite Events - ESWC 2017 Satellite Events</source>
          , Portorož, Slovenia, May 28 - June 1,
          <year>2017</year>
          , Revised Selected Papers, volume
          <volume>10577</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2017</year>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          , B. De Meester,
          <string-name>
            <given-names>T.</given-names>
            <surname>Seymoens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herregodts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurman</surname>
          </string-name>
          , E. Mannens,
          <article-title>Specification and implementation of mapping rule visualization and editing: MapVOWL and the RMLEditor</article-title>
          ,
          <source>J. Web Semant</source>
          .
          <volume>49</volume>
          (
          <year>2018</year>
          )
          <fpage>31</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Crotti Junior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Longo</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>O'Sullivan, On the Mental Workload Assessment of Uplift Mapping Representations in Linked Data, in: Human Mental Workload: Models and Appli-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>