<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>On the Perception of Difficulty: Differences between Humans and AI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philipp Spitzer</string-name>
          <email>Philipp.Spitzer@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joshua Holstein</string-name>
          <email>Joshua.Holstein@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Vössing</string-name>
          <email>Michael.Voessing@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niklas Kühl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Artificial Intelligence, Human-AI Interaction, Confidence Estimation, Instance Dificulty</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Kaiserstraße 89-93, Karlsruhe, 76133</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bayreuth</institution>
          ,
          <addr-line>Wittelsbacherring 10, Bayreuth, 95444</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ing Automation Experiences</institution>
          ,
          <addr-line>CHI '23</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>With the increased adoption of artificial intelligence (AI) in industry and society, effective human-AI interaction systems are becoming increasingly important. A central challenge in the interaction of humans with AI is the estimation of difficulty for human and AI agents for single task instances. These estimations are crucial to evaluate each agent's capabilities and, thus, are required to facilitate effective collaboration. So far, research in the field of human-AI interaction has estimated the perceived difficulty of humans and AI independently from each other. However, the effective interaction of human and AI agents depends on metrics that accurately reflect each agent's perceived difficulty in achieving valuable outcomes. Research to date has not yet adequately examined the differences in the perceived difficulty of humans and AI. Thus, this work reviews recent research on perceived difficulty in human-AI interaction and the factors required to compare each agent's perceived difficulty consistently, e.g., by creating the same prerequisites for both agents. Furthermore, we present an experimental design to thoroughly examine the perceived difficulty of both agents and contribute to a better understanding of the design of such systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Human-AI Interaction</kwd>
        <kwd>Confidence Estimation</kwd>
        <kwd>Instance Difficulty</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
In recent decades, technological advances have led to
artificial intelligence (AI) applications becoming part of
our everyday lives, e.g., when learning a new language
[1] or driving autonomous cars [2]. Like many other
methods and terms, most prominently uncertainty,
conifdence , performance (e.g., in [19]), for measuring the
dificulty of human and AI agents, which is why we aim
to delimit our research in the following and create a
shared understanding of the relevant terms. Before
diving into the frequently used methods, we elaborate on the
examples of human-AI interaction, it comes down to ap- commonly used terms to describe the dificulty.
Perforpropriately assessing the dificulty of diferent situations
for each agent (human and AI). The consequences for
incorrect estimates can range from rejecting such sys- stance [10, 19].
mance represents the aggregated accuracy over multiple
instances for a task or over multiple agents for an
intems, e.g., when the human learner is given too dificult
words or grammar without being ready, to potentially
severe consequences, e.g., autonomously driving cars on
a foggy night. Consequently, it is necessary to estimate
each agent’s dificulty for an instance adequately.</p>
    </sec>
    <sec id="sec-2">
      <title>Further examples of human-AI interaction that draw from an estimation of instance dificulty are</title>
      <sec id="sec-2-1">
        <title>AI complementarity [3–11], curriculum learning [12–14],</title>
        <p>
          and machine teaching [
          <xref ref-type="bibr" rid="ref8">12, 13, 15–18</xref>
          ]. Accurately
assessing the dificulty of single instances for both human and
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>AI interaction to fully exploit their complementary capa</title>
      <p>bilities while creating pleasant automation experiences.</p>
    </sec>
    <sec id="sec-4">
      <title>By reviewing related literature, we observe diferent</title>
      <p>∗Corresponding author.
†These authors contributed equally.
nEvelop-O
of human and AI agents is assessed diferently. First, a
potential issue arises from an existing gap in access to
relevant information. Usually, the AI agent is trained and
having information on the label distribution. However,
AI agents is central to developing these forms of human- of a given task [21]. For the perceived dificulty , one must
this is often not the case for humans (e.g., in [ 10, 22, 23]). advancing [27, 28]. Hereby, various forms of human-AI
Therefore, it remains unclear whether and how, amongst interaction rely on estimating an instance’s dificulty for
others, this afects humans’ perception of dificulty. Sec- efective collaboration. Following, we outline the three
ond, the dificulty of single instances is assessed difer- forms of human-AI interaction most relevant to our
reently. For AI agents, the distribution of the softmax out- search: human-AI complementarity, curriculum learning,
puts is often used to determine its uncertainty [ 10, 22]. and machine teaching.</p>
      <p>
        Contrarily, the human’s perceived instance dificulty is In the field of human-AI complementarity, recent
often measured by observing the distribution of predic- research studies complementary team performance—
tions over groups of humans for single instances or by exceeding the performance each agent (human or AI)
their average performance for an instance [19, 23]. Con- can achieve on their own [
        <xref ref-type="bibr" rid="ref4">3, 5</xref>
        ]. In this collaboration, it
sequently, individual skills and capabilities of humans are is crucial to properly delegate tasks to each agent to
exneglected, potentially resulting in poor experiences in ploit their complementary capabilities [9]. Steyvers et al.
human-AI interaction settings [24]. As related literature [10] establish a framework to facilitate both human and
shows, humans have distinct cognitive styles which can AI agents’ confidence scores to investigate factors that
afect their perceived dificulty [ 25]. Hence, neglecting influence complementary capabilities of human-AI
coltheir individual traits and generalizing their predictions laborations. Lai et al. [11] suggest using uncertainty as a
to determine the perceived dificulty can result in poor measure to delegate tasks between human and AI agents.
estimation for individuals. In the work of Fügener et al. [29], the authors evaluate
      </p>
      <p>As we observe inconsistencies in the measurement of diferent delegation strategies based on the performance
the perceived dificulty between human and AI agents, of both agents for single instances. They find that
huwe outline existing metrics to measure their perceived dif- mans’ perception of task dificulty difers from the actual
ifculty as a first step. Moreover, we scrutinize methods to task dificulty. Lubars and Tan [6] investigate, amongst
compare both agents adequately. Based upon this, we are others, the efect of the dificulty of single instances to
interested in adequately examining the diference in the delegate tasks.
perceived dificulty between humans and AI. Therefore, Curriculum learning denotes another form of
humanwe state the following research question: AI interaction in which the perceived dificulty is relevant
to the overall process. This form of learning is based on
RQ: What are the diferences in the perceived dificulty of human learning and incorporates the idea that the order
humans and AI for single instances? is crucial in which training instances are presented to
a learner [12]. A central aspect of curriculum learning</p>
      <p>To answer this research question, we conduct a liter- is the assertion of dificulty levels of single instances.
ature review to evaluate existing research fields relying Wei et al. [13] use the annotator agreement in an image
on an accurate measurement of the perceived instance classification task to determine the dificulty of instances.
dificulty. Furthermore, we present an experimental de- In the field of machine teaching, a human or an AI agent
sign that avoids the previously mentioned inconsisten- is trained by selecting samples to achieve high learning
cies. Through our experiment, we want to analyze the outcomes [15]. The selection of training instances can be
perceived dificulty of human and AI agents for single grounded on dificulty estimation. For example, Zhang
instances, using established metrics like confidence [ 10] et al. [16] presents an interactive learning procedure in
and PVI [26] adequately. We support our endeavor to which crowd workers are trained based on an
approxestablish adequate methods to consistently measure the imated dificulty for instances. Similarly, Singla et al.
perceived instance dificulty of human and AI agents with [17] select training instances for learners based on an
ifrst empirical results based on an existing, public dataset. expected uncertainty measured by an AI agent.
Overall, with our experiment, we aim to contribute to
a better and more integrated understanding of how to
adequately compare human and AI agents’ perceived dif- 2.2. Measuring Perceived Dificulty of
ifculty leading to a thorough understanding of the design Humans and AI
of human-AI interaction systems.</p>
    </sec>
    <sec id="sec-5">
      <title>AI’s perceived dificulty. In Ståhl et al. [30], the authors</title>
      <p>evaluate diferent metrics to compare the uncertainty of
2. Related Work deep learning models. One of these metrics is a Bayesian
network-based approach using dropout [31]. Further,
2.1. Human-AI Interaction and Instance Xu et al. [32] present a metric that builds on Shannon
Dificulty entropy [33] to compare the dificulty of diferent datasets.</p>
      <p>Moreover, Ethayarajh et al. [26] extend this metric, called
With the latest ascent in research on human-AI inter-  -usable information, to apply it to single instances. This
action, the deployment of AI in automated systems is metric, the pointwise  -information (PVI), is used to
compare the dificulty of single instances with respect with the confidence of human and AI agents available,
to a model family  . According to the authors, PVI, in we compare their performance and confidence for single
contrast to related metrics, quantifies the dificulty of instances.
single instances accounting for how much information Figure 1 illustrates and compares instance performance
can be extracted beyond the label distribution. and confidence for ten randomly sampled instances of</p>
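        <p>For reference, and following the definitions in Ethayarajh et al. [26], PVI contrasts a model g′ from the family 𝒱 that is fine-tuned on input-label pairs with a null model g∅ fine-tuned on the label distribution alone. The pointwise 𝒱-information of an instance (x, y) is then</p>
        <disp-formula id="eq-pvi">
          <tex-math><![CDATA[\mathrm{PVI}(x \rightarrow y) = -\log_2 g_{\varnothing}[y \mid \varnothing] + \log_2 g'[y \mid x],]]></tex-math>
        </disp-formula>
        <p>so that an instance is easy for the model family exactly when its label is much more predictable from the input than from the label distribution alone, i.e., when its PVI is high.</p>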
        <p><bold>Human's perceived difficulty.</bold> Most works focus on estimating the perceived difficulty of humans by aggregating over multiple humans. For example, Peterson et al. [23] assess the disagreement of two decision-makers. In their work, the authors define the difficulty of a single instance by using the disagreement of crowdsourcing annotators. To measure the individual perceived difficulty of instances, Steyvers et al. [10] use a different approach. The authors use the ordinal responses of humans to determine their confidence. Similarly, Bıyık et al. [34] determine human difficulty by asking participants about their perceived task difficulty.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Empirical Validation Using Public Datasets</title>
      <p>Before our experiment, we examine reports of other studies to investigate the differences in the perceived difficulty of single instances. Therefore, we utilize publicly available datasets, e.g., CIFAR10-H [23], modelvshuman [22], or ImageNet-16H [10]. However, the first two datasets, CIFAR10-H and modelvshuman, do not contain individual human confidence or uncertainty measurements. Instead, the authors of the datasets [22, 23] estimate the instance difficulty by aggregating the performance of multiple human annotators for instances. ImageNet-16H is the only dataset containing human difficulty measurements in the form of self-reported confidence levels, e.g., low, medium, and high. To compare these reported confidence levels with the commonly used technique of average instance performance, we transformed the confidence levels to 0 (low), 0.5 (medium), and 1 (high).</p>
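      <p>As a minimal sketch of this aggregation, the following Python snippet maps the self-reported levels to numeric scores and contrasts average instance performance with the confidence spread; the table layout and column names are illustrative assumptions, not the schema of the released dataset.</p>
      <preformat><![CDATA[
import pandas as pd

# Illustrative annotation table: one row per (participant, instance) pair.
# Column names are assumptions, not the released ImageNet-16H schema.
df = pd.DataFrame({
    "instance_id": [1, 1, 2, 2, 3, 3],
    "confidence":  ["low", "high", "medium", "high", "low", "low"],
    "correct":     [0, 1, 1, 1, 0, 1],  # 1 if the annotator was right
})

# Transform the reported confidence levels to 0 (low), 0.5 (medium), 1 (high).
df["conf_score"] = df["confidence"].map({"low": 0.0, "medium": 0.5, "high": 1.0})

# Per instance: average performance vs. mean and spread of human confidence.
per_instance = df.groupby("instance_id").agg(
    mean_performance=("correct", "mean"),
    mean_confidence=("conf_score", "mean"),
    confidence_std=("conf_score", "std"),
)
print(per_instance)
]]></preformat>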
      <p>Further, we fine-tune an EfficientNet model with the dataset for two epochs and use Monte-Carlo Dropout to receive the perceived confidence of the AI agent. Finally, with the confidence of human and AI agents available, we compare their performance and confidence for single instances.</p>
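      <p>A sketch of how such Monte-Carlo Dropout confidence estimates can be obtained is shown below; it assumes a PyTorch classifier with dropout layers (e.g., a fine-tuned EfficientNet) and illustrates the general technique [31], not the authors' exact training setup.</p>
      <preformat><![CDATA[
import torch

def mc_dropout_confidence(model, x, n_samples=30):
    """Estimate predictive confidence via Monte-Carlo Dropout [31]."""
    model.eval()
    # Re-activate dropout layers only, keeping e.g. batch norm in eval mode.
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()
    with torch.no_grad():
        # Average the softmax outputs over stochastic forward passes.
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    mean_probs = probs.mean(dim=0)               # predictive distribution
    confidence = mean_probs.max(dim=-1).values   # confidence in the top class
    uncertainty = probs.std(dim=0)               # spread across passes
    return confidence, uncertainty
]]></preformat>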
      <p>Figure 1 illustrates and compares instance performance and confidence for ten randomly sampled instances of the ImageNet-16H dataset. The left part represents the AI agent's output, while the right part shows the human's self-reported confidence. Based on this, we can make several observations. First, task performance is not necessarily a reliable factor to determine the perceived difficulty of an instance. For example, instances seven to nine have the same performance but differ greatly in their reported confidence. Second, human and AI agents can perceive different instances as easy, e.g., the AI agent has low confidence for instances seven and eight, while the humans have medium to high confidence. Third, the human self-reported confidence scores differ among participants, as can be seen from the standard deviation of confidence. We argue that these observations represent first evidence in the direction of our hypotheses. More specifically, we can see that the average performance of an instance cannot be used to determine the perceived difficulty of an instance for individual humans. Instead, other metrics need to be considered.</p>
      <p>Moreover, the high standard deviation of human confidence for almost all instances indicates that humans differ in their perceived difficulty. Consequently, the diversity of humans must be taken into consideration when designing human-AI interaction systems.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Design</title>
      <p>Our experiment is based on a mixed-effects model that combines a between-subject and a within-subject design [35]. We follow the notion of existing works and understand confidence as a proxy for difficulty [36]. More precisely, we measure the difficulty of the human and the AI agent by two metrics: the commonly used confidence [10] and the PVI score [26] as a novel metric that considers the label distribution. We measure the confidence of AI agents by Monte-Carlo Dropout [31] and for humans via probabilities, e.g., using a scale between 0% and 100%. We use a binary classification task to avoid participants having to assign multiple probabilities. The binary classification allows us to observe one probability, e.g., an image showing a cat with a probability of 80%, and calculate the complementary probability, e.g., the complementary probability that the image does not represent a cat is 20%.</p>
      <p>[Figure 2: Preliminary experiment design. Part I: start &amp; consent, attention check. Part II: Task 1, image classification; Task 2, classification on tabular data; the difficulty is measured for each instance.]</p>
      <p>The preliminary experiment design is illustrated in Figure 2. The experiment is composed of three parts. Part I includes consent, instructions, and a demographics questionnaire. Next, Part II comprises two binary classification tasks, one visual and one textual, and, finally, Part III is a questionnaire on cognitive styles. In both tasks, we measure the perceived difficulty of participants and AI for single instances.</p>
      <sec id="sec-5-1">
        <title>Hypothesis 2. There are instances for which human and AI agents make the same prediction but difer in their perceived dificulty.</title>
        <p>Within our experiment, we leverage two datasets for
the tasks of Part II to compare the perceived dificulty
of human and AI agents. Both conditions comprise the
same tasks. We chose two diferent tasks: one visual
classification task and one based on tabular data. Research
shows the impact of diferent cognitive styles on
participants’ task performance (i.e., [25, 37, 38]). By choosing a
visual and a text-based task, we account for participants’
diferent cognitive styles and individual perceptions of
dificulty. Accordingly, participants will be asked to
conduct a questionnaire in which we determine their
cognitive styles.We assess these styles by using the validated
items of Kirby et al. [37] (initially presented by
Richardson [39]). The items of the cognitive style questionnaire
are randomly arranged as suggested by Kirby et al. [37].
All items are measured on a five-point Likert scale. We
hypothesize:</p>
      </sec>
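      <p>To make this hypothesis operational, instances on which both agents agree in their prediction but clearly diverge in confidence could be flagged as follows; the per-instance table and the divergence threshold are illustrative assumptions, not part of our protocol.</p>
      <preformat><![CDATA[
import pandas as pd

# Illustrative per-instance summary of both agents' outputs.
df = pd.DataFrame({
    "instance_id": [1, 2, 3],
    "human_pred":  ["cat", "no cat", "cat"],
    "ai_pred":     ["cat", "no cat", "no cat"],
    "human_conf":  [0.90, 0.55, 0.80],
    "ai_conf":     [0.60, 0.52, 0.95],
})

same_prediction = df["human_pred"] == df["ai_pred"]
confidence_gap = (df["human_conf"] - df["ai_conf"]).abs() > 0.25  # assumed threshold

# Candidate instances for Hypothesis 2: same prediction, diverging difficulty.
print(df[same_prediction & confidence_gap])
]]></preformat>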
      <sec id="sec-5-2">
        <title>Hypothesis 3. Humans with distinct cognitive styles perceive the dificulty of single instances diferently.</title>
      </sec>
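      <p>As a sketch of how the resulting data could be analyzed under the stated mixed-effects design [35], one option is a random-intercept model per participant, e.g., via statsmodels; the formula and column names below are assumptions for illustration, not our final model specification.</p>
      <preformat><![CDATA[
import pandas as pd
import statsmodels.formula.api as smf

# Long-format results assumed: one row per participant-instance response,
# with the between-subject treatment (label distribution shown or not),
# the within-subject task type, and a cognitive style score.
data = pd.read_csv("responses.csv")  # hypothetical file

# Confidence as a proxy for perceived difficulty; a random intercept per
# participant captures individual differences (cf. Hypotheses 1-3).
model = smf.mixedlm(
    "confidence ~ treatment + task_type + cognitive_style",
    data=data,
    groups=data["participant_id"],
)
print(model.fit().summary())
]]></preformat>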
    </sec>
    <sec id="sec-6">
      <title>The preliminary experiment design is illustrated in</title>
      <p>Figure 2. The experiment is composed of three parts.</p>
      <p>Part I includes consent, instructions, and a demographics
questionnaire. Next, Part II comprises two binary classi- 5. Discussion
ifcation tasks—one visual and one textual—and, finally,
Part III is a questionnaire on cognitive styles. In both In this work, we propose an experimental design to
intasks, we measure the perceived dificulty of participants vestigate the diference in perceived dificulty between
and AI for single instances. human and AI agents for single instances. To build a</p>
      <p>In our experiment, we have two treatments. First, foundation, we assess related work and common
metas we want a consistent comparison of the perceived rics to estimate instance dificulty. Yet, these studies
dificulty between humans and AI, we must ensure they insuficiently scrutinize consistent dificulty estimations
have access to the relevant information. However, in between humans and AI. By first examining a related
contrast to humans, the AI agent has access to the label dataset, we show the discrepancies in dificulty
estimadistribution through its training prior to the task. As we tion by applying conventional approaches. Thus, we
want to examine this efect, we show humans the label propose an experiment design that paves the way for a
distribution before conducting the task in one condition. broad main study in which we: (I) Develop a consistent
Thus, we hypothesize: way to measure the perceived dificulty of instances, (II)
Examine the diferences in the perceived dificulty of
huHypothesis 1. Access to the information on label distri- man and AI agents, (III) Investigate a potential cause in
bution has an impact on humans’ perceived dificulty of varying perceived dificulty of humans.
single instances. Through our main study, we expect to contribute to</p>
      <p>After providing a consistent way to measure the the ongoing discussion on developing automated and
confidence—as a proxy for the perceived dificulty—of the reliable AI agents interacting with humans with diverse
human and the AI agent, we want to examine the difer- skills and capabilities. Moreover, our results will provide
ences in their perceived dificulty of instances. Previous guidance not only in research but also in practice on
research identified subsets of data on which either human designing human-AI interaction systems. A promising
or AI agent has a better performance, e.g., [22]. As the field of research lies ahead.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref26">
        <label>26</label>
        <mixed-citation>K. Ethayarajh, Y. Choi, S. Swayamdipta, Understanding dataset difficulty with 𝒱-usable information, in: International Conference on Machine Learning, PMLR, 2022, pp. 5988–6008.</mixed-citation>
      </ref>
      <ref id="ref27">
        <label>27</label>
        <mixed-citation>E. Barboni, J.-F. Ladry, D. Navarre, P. Palanque, […] systems models, in: Proceedings of the 2nd ACM SIGCHI Symposium on Engineering Interactive Computing Systems, 2010, pp. 165–174.</mixed-citation>
      </ref>
      <ref id="ref28">
        <label>28</label>
        <mixed-citation>V. Roto, P. Palanque, H. Karvonen, Engaging automation: 5th IFIP WG 13.6 Working Conference, HWID 2018, Espoo, Finland, August 20–21, 2018, Revised Selected Papers 5, Springer, 2019, pp. 158–172.</mixed-citation>
      </ref>
      <ref id="ref29">
        <label>29</label>
        <mixed-citation>A. Fügener, J. Grahl, A. Gupta, W. Ketter, Collabo[…], 2019.</mixed-citation>
      </ref>
      <ref id="ref30">
        <label>30</label>
        <mixed-citation>N. Ståhl, G. Falkman, A. Karlsson, G. Mathiason, […], in: Information Processing and Management of Uncertainty in Knowledge-Based Systems: 18th International Conference, IPMU 2020, Lisbon, Portugal, June 15–19, 2020, Proceedings, Part I 18, Springer, 2020, pp. 556–568.</mixed-citation>
      </ref>
      <ref id="ref31">
        <label>31</label>
        <mixed-citation>Y. Gal, Z. Ghahramani, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, in: International Conference on Machine Learning, PMLR, 2016, pp. 1050–1059.</mixed-citation>
      </ref>
      <ref id="ref32">
        <label>32</label>
        <mixed-citation>Y. Xu, S. Zhao, J. Song, R. Stewart, S. Ermon, A theory of usable information under computational constraints, arXiv preprint arXiv:2002.10689 (2020).</mixed-citation>
      </ref>
      <ref id="ref33">
        <label>33</label>
        <mixed-citation>C. E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mobile Computing and Communications Review 5 (2001) 3–55.</mixed-citation>
      </ref>
      <ref id="ref34">
        <label>34</label>
        <mixed-citation>E. Bıyık, M. Palan, N. C. Landolfi, D. P. Losey, D. Sadigh, Asking easy questions: A user-friendly approach to active reward learning, arXiv preprint arXiv:1910.04365 (2019).</mixed-citation>
      </ref>
      <ref id="ref35">
        <label>35</label>
        <mixed-citation>L. Riefle, C. Benz, T. Tomar, "May I help you?": Ex[…] use of conversational agents, ICIS 2022 Proceedings (2022).</mixed-citation>
      </ref>
      <ref id="ref36">
        <label>36</label>
        <mixed-citation>B. Kompa, J. Snoek, A. L. Beam, Second opinion needed: communicating uncertainty in medical machine learning, NPJ Digital Medicine 4 (2021) 4.</mixed-citation>
      </ref>
      <ref id="ref37">
        <label>37</label>
        <mixed-citation>J. R. Kirby, P. J. Moore, N. J. Schofield, Verbal and visual learning styles, Contemporary Educational Psychology 13 (1988) 169–184.</mixed-citation>
      </ref>
      <ref id="ref38">
        <label>38</label>
        <mixed-citation>L. Riefle, P. Hemmer, C. Benz, M. Vössing, J. Pries, […] understanding of explanations, ICIS 2022 Proceedings (2022).</mixed-citation>
      </ref>
      <ref id="ref39">
        <label>39</label>
        <mixed-citation>A. Richardson, Verbalizer-visualizer: A cognitive style dimension, Journal of Mental Imagery (1977).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>