<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta />
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        e.g., pertaining to semantic visual interpretation, natural / multimodal human-machine interaction, high-level
data analytics (e.g., for post hoc diagnostics, dispute settlement) [
        <xref ref-type="bibr" rid="ref34 ref35">34, 35</xref>
        ]. This will necessitate, amongst other
things, human-centred qualitative benchmarks relevant to different facets of machine (visual) intelligence, and the
incorporation of multifaceted hybrid AI solutions to fulfil such requirements. We claim that what appears as a
spectrum of complex challenges (in autonomous driving) is actually rooted in one fundamental methodological
consideration that needs to be prioritised, namely: the design and implementation of human-centred technology
based on a confluence of techniques and perspectives from AI+ML, Cognitive Science &amp; Psychology,
Human-Machine Interaction, and Design Science. As in many applications of AI, such an integrative approach has so
far not been mainstream within autonomous driving research.
      </p>
      <p>
        Multimodal Interaction: The Case of Visuospatial Complexity. The central focus of the research presented
in this paper is to develop a systematic methodology for the development of human-centred benchmarks for
visual sensemaking and multimodal interaction in the domain of autonomous vehicles. Our work is driven by a
bottom-up interdisciplinary approach, combining techniques in AI, Psychology, HCI, and design, for the study
of embodied multimodal interaction in diverse ecologically valid conditions, with a particular emphasis on
low-speed driving and complex urban environments (possibly also with unstructured traffic and people dynamics).
We emphasise driver behaviour, as well as the behaviour of other road users such as pedestrians, cyclists, and
motorcyclists, whilst focussing on natural interactions (e.g., gestures, joint attention) amongst the involved stakeholders.
Key Contribution. We present a cognitive model of visuospatial complexity in everyday driving situations that
may be used as a basis to design, evaluate, standardise, and test &amp; validate visuospatial sensemaking capabilities
of computational models of visual intelligence (for autonomous vehicles). We posit that our methodology can
be used as a basis for developing human-centred benchmark datasets for visual sensemaking in autonomous
driving encapsulating key cognitive principles founded on empirically established (context-specific) embodied
multimodal interaction patterns under naturalistic driving conditions in everyday life [
        <xref ref-type="bibr" rid="ref1 ref24">1, 24</xref>
        ]. The proposed model
of visuospatial complexity is based on quantitative, structural, and dynamic roadside attributes identifiable
from the stimuli under consideration. As an example application of our model, we report work-in-progress
concerning the development of one instance of a dataset where the central emphasis is on the evaluation of
visuospatial complexity of driving stimuli. Both our proposed methodology and its human evaluation are driven
by behavioural research in visual and spatial cognition methods as pursued within cognitive and environmental
psychology. Such bottom-up interdisciplinary studies combining AI, psychology, HCI, and design are needed to
better appreciate the complexity and spectrum of varied human-centred challenges in autonomous driving.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Naturalistic Human Factors in Everyday Driving</title>
      <p>
        We approach cognitively motivated human-factors in autonomous driving from the viewpoint of visual attention
(e.g. visual search, change detection), spatial cognition (e.g. cognitive mapping), and multimodal communication
(e.g. joint attention, gesturing). We conduct systematic behavioural studies with a representative set of scenarios
that encompasses challenging, hazardous, or cognitively demanding events in the urban set-up. The scenarios
involve a range of visuospatial complexity levels in the streetscape, reflecting different areas of the world.
Multimodal Interactions (in the Streetscape). Multimodal interactions are highly varied, and they can convey
very different meanings depending on the task, the environmental setting, or other social and dynamic factors.
Even though they may seem mundane everyday events, they pose complex problems for today’s autonomous systems.
Embodied interactions (e.g., on the street) use a combination of communication modalities such as gestures
and speech to pass a message or solve a conflict (Fig. 1a), or head rotations and eye contact to establish joint
attention (Fig. 1b) (Table 1).
Table 1. Modalities and examples:
SPEECH: Ask - Warn - Shout - Scold - Give directions
HEAD MOVEMENTS: Turn towards the street - Tilt to a direction - Nod for disapproval - Slide for notice - Protrusion for warning
FACIAL EXPRESSIONS: Smiles - Frowns - Wrinkles - Eye rolling - Cut eye - Eyebrow raising - Lip movement - Mouth movement
GESTURES: Emblematic (hitchhiking, stop) - Iconic (direction of movement) - Deictic (pointing) - Beat (irritation, gratitude)
BODY POSTURES: Crossing arms - Idle - Standing with the back to the street - Leaning towards - Standing beside a car/bike
GAZE: Eye contact - Seek attention - Follow another’s gaze - Follow a moving object - Aversion - Point towards a direction
AUDITORY CUES: Honking - Car engine - Traffic-light sound - Brakes - Siren - Voice
PRACTICAL ACTIONS (select sample): A cyclist standing beside the bike while putting on the helmet indicates he will start cycling soon; a pedestrian in a wheelchair beside the street with hands on the wheels indicates his intention to cross; a pedestrian pushing the button for the traffic light, not monitoring the traffic and relying on auditory cues; a driver switching the lights on and off to indicate that the cyclist can cross, combined with a nod in the direction of movement; a driver slowing down or accelerating indicates an intention to give or take priority.
A practical action when observed by an agent can be considered both as a physical
action as well as a message (e.g., a pedestrian stepping on the street indicating willingness to get priority; Fig.
1c). Consequently, street users receive and comprehend different implicit cues that may alter their behaviour
based on their observations and their expectations of being observed (e.g. a pedestrian is moving to the side of
the street based on the auditory cue of an approaching car and an estimation of the street’s width; Fig. 1d).
Visuospatial Complexity Visual complexity and its effect on visual attention and cognition have been
investigated across different disciplines, including cognitive science, psychology, computer science, and marketing [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Several
definitions have been proposed, as well as several methodologies to measure it or specific attributes of it
(e.g. visual saliency, spatial frequency, subband entropy) [
        <xref ref-type="bibr" rid="ref16 ref19 ref26">16, 19, 26</xref>
        ]. Visual complexity has been broadly defined
as the level of detail and intricacy contained within an image or a scene [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. With the term visuospatial
complexity we refer to the combination of visual and spatial characteristics that coexist in dynamic naturalistic
scenes in which a person acts. In addition to visual features such as colour, contrast, the number of objects,
etc., we also study the size and the structure of the space, the connection of its parts, and so on. Visual attention is
controlled involuntarily by external stimulus features (exogenous) such as luminance and colour, as well as
voluntarily by internal, cognitively relevant features of the world (endogenous) such as people and objects [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Visual
search refers to an intentional scan of the scene; the visuospatial complexity of the scene is highly correlated
with visual search performance and with visual attention patterns as they evolve over time [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, it is
challenging to separate the bottom-up from the top-down cognitive processes (such as visual search, attentional
priming, foraging) throughout a naturalistic visual attention task in driving, and therefore to establish which
dimension of complexity (low-level scene properties, or high-level semantic properties) affects the visual attention
patterns [
        <xref ref-type="bibr" rid="ref17 ref18 ref29 ref3 ref40">3, 17, 18, 29, 40</xref>
        ]. For this reason, it is necessary to establish a taxonomy of attributes and parameters
(pertaining to visuospatial complexity) quantifying the levels of complexity reliably based on their effects on
people’s performance and visual attention patterns.
      </p>
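      <p>
        As a rough illustration, such a taxonomy can be encoded as a simple data structure; the groupings below mirror the quantitative, structural, and dynamic attribute classes developed in this paper, but the individual attribute names and fields are illustrative assumptions only:
```python
from dataclasses import dataclass, field

# Illustrative sketch only: the attribute names are assumptions, not the
# paper's definitive taxonomy; the groupings mirror the quantitative (A1),
# structural (A2), and dynamic (A3) attribute classes.
@dataclass
class ComplexityTaxonomy:
    quantitative: dict = field(default_factory=lambda: {
        "clutter": 0.0,        # density and variety of viewed entities
        "luminance": 0.0,      # light emitted / reflected from the scene
        "colour_count": 0.0,   # number of distinct colours
    })
    structural: dict = field(default_factory=lambda: {
        "symmetry": 0.0,       # reflectional / rotational regularity
        "order": 0.0,          # organisation of elements
        "openness": 0.0,       # ratio of empty to full space
    })
    dynamic: dict = field(default_factory=lambda: {
        "motion": 0.0,         # people or objects moving in the scene
        "flicker": 0.0,        # abrupt luminance changes over time
    })

    def attributes(self):
        """Flatten all attribute scores into a single dict."""
        return {**self.quantitative, **self.structural, **self.dynamic}
```
      </p>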
    </sec>
    <sec id="sec-3">
      <title>Visuospatial Complexity in Everyday Driving: A Human-Centred Model</title>
      <p>
        Developing a taxonomy and model of visuospatial complexity necessitates the identification of objective physical
attributes that affect visuospatial perception and cognitive functions, such as visual search in everyday activities
(e.g., driving, walking, cycling). However, the majority of existing empirical evidence is based on studies focusing
on static real-world scenes, or abstract shapes and symbols [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ]. For our taxonomy of attributes, we focus
on naturalistic dynamic scenes, and we take into consideration recent work about attentional synchrony on
dynamic real-world scenes demonstrating the effect of the dynamic attributes on visual attention [
        <xref ref-type="bibr" rid="ref20 ref31">20, 31</xref>
        ]. We
categorise the attributes into three groups (A1–A3):
A1. Quantitative Attributes. Objective environmental factors, referring to low-level (lines, edges, contours) and
middle-level (corners, orientation) features of the scene, and their relation to visual perception and cognition
have been a topic of research in computational image processing [
        <xref ref-type="bibr" rid="ref30 ref7">7, 30</xref>
        ]. Psychophysical and neurophysiological
research has shown that these attributes are part of early human visual processing [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. Moreover, the physical
space and its functional clutter are general properties of all scenes that are immediately accessible to humans
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Clutter refers to an overabundance of information, including the number of components of the scene, their
variety, as well as the density of the viewed entities from the current perspective. Additionally, luminance and
colour, and their respective retinotopic gradients (contrast), are typically employed in computational models
of saliency.
      </p>
      <p>
        Table 2. Attributes of visuospatial complexity: the dimensions of the physical space (width / depth / height), i.e., the area covered by the visual stimulus; the number of components (objects, people, shapes, etc.); the number of colours; the number of shapes / objects; the number of objects in a defined area; the number of edges of objects in a scene / visual area; the amount of light emitted / reflected from the scene; particularly prominent objects based on characteristics of colour, luminance, and contrast; compared similarity in luminance, contrast, structure, or spatial and orientation information; recurrence of the same element or group of elements or characteristics on a line, a grid, or a pattern in space; resilience to transformation and movement (types: reflectional, rotational, translational, helical, fractal); elements organised based on a recognised structure, varying from poorly organised to highly organised; the state of being all of the same kind or diverse, varying from a single shape repeated to multiple distinct shapes; variations in a placement rule across a surface or line, varying from simple polygons to abstract shapes; the ratio between empty and full space; the number of elements that are part of a group; the number of people or objects moving in the scene; abrupt changes over time (in luminance, colours, etc.); and the rate of change of position with respect to time, moving or facing towards.
      </p>
      <p>
        Similarities between a target and the background in various low-level features or in the spatial
arrangement also correlate with visuospatial complexity [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
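      <p>
        As a minimal illustration of how a quantitative attribute such as clutter might be operationalised, the sketch below uses edge density, a common proxy in the clutter literature, not the measure adopted here:
```python
import numpy as np

def edge_density(gray, thresh=0.1):
    """Approximate clutter as the fraction of pixels whose gradient
    magnitude exceeds a threshold (a simple edge-density proxy)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    return float((mag > thresh).mean())

# A flat image has zero edge density; a checkerboard is far more cluttered.
flat = np.zeros((32, 32))
checker = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)
print(edge_density(flat))                         # 0.0
print(edge_density(checker) > edge_density(flat))
```
      </p>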
      <p>
        A2. Structural Attributes. Structural attributes refer to the relations that the elements form due to positioning
in space, or to their overall distribution in the viewed scene. Structural attributes of the scene can often
mitigate the increase in visuospatial complexity caused by low-level quantitative attributes. Highly regular
arrangements of elements in space or in the scene are related to low visuospatial complexity, whereas more randomised
arrangements contribute to higher complexity levels [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Repetition, symmetry, order, and grouping are related
to a decrease in spatial complexity [
        <xref ref-type="bibr" rid="ref27 ref38">27, 38</xref>
        ]. Likewise, heterogeneity (e.g., a single shape repeated vs. multiple
distinct shapes), regularity of shapes and objects (e.g., simple polygons vs. more abstract shapes), openness (i.e.,
relation between full and empty space) and their relationship to complexity have been investigated in the fields
of urban design and architecture [
        <xref ref-type="bibr" rid="ref27 ref5">5, 27</xref>
        ].
      </p>
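      <p>
        A structural attribute such as reflectional symmetry can likewise be given a simple numerical proxy; the sketch below, an illustrative assumption rather than our adopted measure, correlates an image with its left-right mirror:
```python
import numpy as np

def reflection_symmetry(gray):
    """Score in [-1, 1]: correlation between an image and its left-right
    mirror; values near 1.0 indicate strong reflectional symmetry."""
    a = gray.astype(float).ravel()
    b = np.fliplr(gray.astype(float)).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 1.0

symmetric = np.array([[1, 2, 1], [3, 4, 3]])
asymmetric = np.array([[1, 2, 9], [3, 4, 7]])
print(reflection_symmetry(symmetric))   # close to 1.0
print(reflection_symmetry(asymmetric))  # well below 1.0
```
      </p>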
      <p>
        A3. Dynamic Attributes. Studies of visual attention on videos reveal a significant impact of dynamic features
of the scene such as motion and flicker on top-down as well as bottom-up cognitive processes. Even the same
environment can be perceived as more complex due to an increase in the observer’s velocity. Taking into
consideration dynamic aspects in scene analysis improves gaze-allocation prediction, as previous studies suggest
that dynamic cues are predominant and are reliable predictors of gaze for a greater number of observers [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Concerning
top-down processes, clusters of attention often coincide with semantically rich objects such as eyes, hands, vehicles,
etc. On the other hand, cortical analysis shows a selective response to moving elements in the scene, meaning
that we are able to notice moving objects even if we are not looking for them [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Flicker, i.e., abrupt changes
in luminance over time, has been shown to pop out independently of the observer’s attention. Moreover, speed and
direction are both related to motion, but they differ in the way they are encoded during a
cognitive task [
        <xref ref-type="bibr" rid="ref20 ref8">8, 20</xref>
        ].
      </p>
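      <p>
        The notion of flicker described above, i.e., abrupt luminance change over time, admits a straightforward computational sketch (an illustrative proxy only):
```python
import numpy as np

def flicker_map(frames):
    """Per-pixel flicker: mean absolute luminance change between
    consecutive frames of a (T, H, W) grayscale sequence."""
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).mean(axis=0)

# A static pixel yields zero flicker; a blinking pixel yields high flicker.
seq = np.zeros((4, 2, 2))
seq[1::2, 0, 0] = 1.0          # pixel (0, 0) blinks on and off
fm = flicker_map(seq)
print(fm[0, 0], fm[1, 1])      # 1.0 0.0
```
      </p>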
      <p>
        The role of visuospatial complexity in visual attention depends on the kind of stimuli employed and on the
way in which visuospatial complexity is defined, manipulated, and measured. Consequently, there are further
significant factors influencing the role of visuospatial complexity in human behaviour that we do not consider
in this paper but cannot ignore. The nature of the task, aspects of familiarity, working memory, as well as
previous knowledge of the context, are some of them. For instance, semantic grouping of the scene based on previous
knowledge during driving indicates where targets are expected to appear. Additionally, studying dynamic scenes
reveals the significance of time on visual attention, and a continuous interplay between low-level and high-level
cognitive functions that affect the visual search performance. Analysing visual attention over time shows that
first fixations are more influenced by environmental properties, but after the first 200 ms fixation locations are
determined predominantly by top-down processes [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
Towards a model of visuospatial complexity. Visuospatial complexity is a function of not only individual factors
but also of the interactions between them. For instance, clutter can increase complexity; however, high complexity may
also arise from a combination of low clutter and a low contribution of structural attributes (e.g. order,
heterogeneity). A systematic analysis of different combinations of attributes can provide a better understanding
of the aggravation or counterbalance dynamics between the attributes and their effect on human behaviour.
To empirically define a model of visuospatial complexity for dynamic naturalistic scenes, we use the taxonomy
introduced in Table 2 to develop a number of scenes in virtual reality (VR) that differ in the combination
of attributes involved, as well as in the degree of each attribute the scene contains (Fig. 3a).1 This way
we create a matrix of possible scenes (Fig. 2), and use a number of them as the dynamic stimulus for our
empirical study. The matrix provides an indication for a scale of visuospatial complexity based on the current
knowledge from the literature on the effect of individual attributes on human behaviour. However, the empirical
results of our study will be used to confirm or refute this indication, and to further investigate the weight of
each attribute to the overall effect on human behaviour (Fig. 3b). To represent these interactions between the
attributes and their contribution to the overall complexity level we are developing a dynamic bubble diagram
(Fig. 3c) that demonstrates the correlation coefficients of visuospatial complexity attributes and the behavioural
metrics. We expect a stronger correlation between visual search inefficiency and clutter factors than with
size or structural attributes. Additionally, we expect the quantitative attributes to have a negative correlation with
visual search performance, and the structural ones a positive correlation, although these effects can be
counterbalanced.
      </p>
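      <p>
        The expected pattern of correlations can be sketched numerically as follows; the data are synthetic placeholders chosen purely to illustrate the computation, not empirical results:
```python
import numpy as np

# Synthetic placeholder data: rows are scenes; the values merely encode
# the hypothesised pattern (clutter hurts, order helps visual search).
clutter = np.array([0.1, 0.4, 0.6, 0.9])
order = np.array([0.9, 0.7, 0.4, 0.1])
search_performance = np.array([0.95, 0.80, 0.55, 0.30])  # e.g. accuracy

# Pearson correlation of each attribute with the behavioural metric.
r_clutter = np.corrcoef(clutter, search_performance)[0, 1]
r_order = np.corrcoef(order, search_performance)[0, 1]
print(r_clutter < 0, r_order > 0)   # True True
```
      </p>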
    </sec>
    <sec id="sec-4">
      <title>Visual Search in the Streetscape: An Ongoing Empirical Study</title>
      <p>
        A visual search task in driving. Driving performance has been shown to depend on the visuospatial complexity of,
in this case, the streetscape, but also on the complexity of the driving task and on other factors that affect cognitive
resources, such as individual differences, fatigue, age, dual-task demands, etc. [
        <xref ref-type="bibr" rid="ref11 ref39">11, 39</xref>
        ]. We are primarily concerned
with the extent to which environmental factors of the dynamic scene affect driving performance, such as
locating and responding to traffic, direction or warning signs, hazard detection, and vehicle control.1 These
actions are closely connected to visual perception and visual search processes.
1An important aspect not considered in this paper is high-level event perception [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]; presently, work is also in progress to include
high-level event segmentation primitives complementing the developed visuospatial complexity model.
Visual search in driving primarily
refers to the detection of signs, obstacles, or road users, as well as situation awareness, including hazard perception.
Four scenarios in four levels of visuospatial complexity. We collect and analyse a list of scenarios with
dynamic streetscape scenes that differ in the events and the interactions between the street users involved.
The scenarios are collected from online video footage, following the indications about hazard situations published
in the Compact Accident Research report by the German Insurance Association [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and are categorised based on the nature
of the incident (Table 1). For the behavioural study in progress we choose four scenarios that differ in the
multimodal interaction event and the practical actions used (Table 1), and we replicate them in a VR environment
at four different levels of visuospatial complexity. One example scenario of inattentive crossing by a pedestrian
is presented in Fig. 4. It is a typical scenario of a pedestrian on a low-traffic street with no official indication
for crossing, such as a zebra crossing or a traffic light. The pedestrian, who is crossing more than one lane, first
deviates from his path on the sidewalk and then performs a slow-speed crossing, while alternating his visual
attention states multiple times between monitoring traffic (t1), crossing the street inattentively (t2), sharing
attention with a driver (t3), and continuing to cross the next lane while visually inattentive (t4). The visual search
task for the driver is to detect the pedestrian and interpret his behaviour. The pedestrian, considered from
the driver’s point of view as the dynamic visual target, passes gradually from the driver’s peripheral view
to the foveal area, while changing orientation and speed.
      </p>
      <p>As the visual search process can be divided into different stages (an early phase of search guidance, a verification
component), no single metric can describe the effect of visuospatial complexity on a search process entirely;
rather, it is a combination of several performance metrics and physiological measurements, such as eye-tracking,
that can provide a more precise assessment of the effect of visuospatial complexity on human behaviour. We
collect eye-tracking data from the participants, as well as reaction time and accuracy with respect to the target (Table 3)2.
For the analysis we define areas of interest and distractors according to the nature of the scene and the interactions
involved. We further combine the results from the physiological and performance measurements and use them
to trace the effect of the visuospatial complexity levels. We expect the combination of measurements to provide
insights about the attentional cost and the underlying cognitive processes affected by visuospatial complexity.</p>
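      <p>
        A minimal sketch of the kind of area-of-interest analysis described, assuming gaze samples arrive as (x, y) points at a fixed sampling rate and rectangular areas of interest (an illustrative set-up, not the study’s actual pipeline):
```python
def aoi_dwell_time(gaze, aoi, sample_dt=1 / 120):
    """Total time (s) that gaze samples fall inside a rectangular AOI.

    gaze: iterable of (x, y) samples at a fixed rate (here 120 Hz).
    aoi:  (x_min, y_min, x_max, y_max) rectangle, e.g. around a pedestrian.
    """
    x0, y0, x1, y1 = aoi
    hits = sum(1 for x, y in gaze if x0 <= x <= x1 and y0 <= y <= y1)
    return hits * sample_dt

# Example: 120 samples on a (hypothetical) pedestrian AOI and 60 elsewhere
# yield about one second of dwell time inside the AOI.
gaze = [(0.5, 0.5)] * 120 + [(0.9, 0.9)] * 60
print(aoi_dwell_time(gaze, (0.4, 0.4, 0.6, 0.6)))   # about 1.0
```
      </p>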
    </sec>
    <sec id="sec-5">
      <title>Outlook</title>
      <p>
        The road ahead (in autonomous driving) presents several opportunities: next-generation developments to achieve
(legal) deployment and user / ethical acceptability require attention to a wider spectrum of autonomous driving
challenges from the viewpoint of human-centred technology design, encompassing natural human-machine
interaction, industry-wide benchmarking, standardisation, and statutory validation of technology components.
2The study is conducted using an HTC Vive Pro Eye headset with an embedded eye-tracking device. The Unity game engine was used for
the VR simulation of the scenes. VR is now firmly established as an experimental tool; it combines a high degree of control
with a high level of ecological validity, and has shown important benefits for basic psychology and neuroscience research, especially on
aspects of visual perception and spatial cognition [
        <xref ref-type="bibr" rid="ref22 ref28 ref41 ref6">6, 22, 28, 41</xref>
        ].
The
work reported in this paper is being conducted in synergy with research in computational cognitive vision [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
particularly in relation with the development of integrated vision and semantics solutions for active, explainable
visual sensemaking for autonomous vehicles [
        <xref ref-type="bibr" rid="ref34 ref35">34, 35</xref>
        ]. The empirically founded model developed in this paper
also constitutes the basis for computational assessment and analysis of visuospatial complexity. Towards this,
we use the empirical results to develop a declaratively grounded computational model integrating KR and vision
([
        <xref ref-type="bibr" rid="ref33 ref36">33, 36</xref>
        ]) towards: a) learning behavioural models from empirical data, i.e., using (statistical) relational learning
to induce weighted dependencies and constraints within the model, and b) utilising the computational model
to examine and construct benchmarking datasets ensuring variety in driving conditions, naturalistic multimodal
human interaction events, challenging and diverse environments, etc. Our goal is that a computational model
of visuospatial complexity may provide a measure for the complexity of benchmarking datasets/scenarios and
thus serve as a guideline to develop realistic human-centred evaluation criteria. We posit that such
interdisciplinary studies are needed to better appreciate the complexity and spectrum of varied human-centred challenges
in autonomous driving.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.V.</given-names>
            <surname>Angrosino</surname>
          </string-name>
          , Naturalistic Observation,
          <article-title>Qualitative essentials</article-title>
          , Left Coast Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>GDV: The German Insurance Association</surname>
          </string-name>
          , Compact Accident Research,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Awh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.V.</given-names>
            <surname>Belopolsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Theeuwes</surname>
          </string-name>
          , '
          <article-title>Top-down versus bottom-up attentional control: a failed theoretical dichotomy</article-title>
          .',
          <source>Trends Cogn Science</source>
          ,
          <volume>16</volume>
          (
          <issue>8</issue>
          ),
          <fpage>437</fpage>
          -
          <lpage>443</lpage>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Suchan</surname>
          </string-name>
          , '
          <article-title>Cognitive Vision and Perception: Deep Semantics Integrating AI and Vision for Reasoning about Space, Motion, and Interaction'</article-title>
          ,
          <source>in ECAI</source>
          <year>2020</year>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Boeing</surname>
          </string-name>
          , '
          <article-title>Measuring the complexity of urban form and design'</article-title>
          , Urban Design International,
          <volume>23</volume>
          (
          <issue>4</issue>
          ),
          <fpage>281</fpage>
          -
          <lpage>292</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Bohil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Alicea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.A.</given-names>
            <surname>Biocca</surname>
          </string-name>
          , '
          <article-title>Virtual reality in neuroscience research and therapy'</article-title>
          ,
          <source>Nature Rev Neurosc</source>
          ,
          <volume>12</volume>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Bravo</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Farid</surname>
          </string-name>
          , '
          <article-title>A scale invariant measure of clutter'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ), (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Carrasco</surname>
          </string-name>
          , '
          <article-title>Visual attention: The past 25 years'</article-title>
          ,
          <source>Vision Research</source>
          ,
          <volume>51</volume>
          (
          <issue>13</issue>
          ),
          <fpage>1484</fpage>
          -
          <lpage>525</lpage>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Doyon-Poulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ouellette</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Robert</surname>
          </string-name>
          , '
          <article-title>Review of visual clutter and its effects on pilot performance: New look at past research'</article-title>
          ,
          <source>in Digital Avionics Systems Conference (DASC)</source>
          .
          <source>IEEE/AIAA 31st</source>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.P.</given-names>
            <surname>Eckstein</surname>
          </string-name>
          , '
          <article-title>Visual search: A retrospective'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Edquist</surname>
          </string-name>
          ,
          <source>The Effects of Visual Clutter on Driving Performance, Ph.D. dissertation</source>
          , Monash University,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cavalcante</surname>
          </string-name>
          et al., '
          <article-title>Measuring streetscape complexity based on the statistics of local contrast and spatial frequency'</article-title>
          ,
          <source>PLoS ONE</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ), (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Forsythe</surname>
          </string-name>
          , '
          <article-title>Visual complexity: Is that all there is?'</article-title>
          ,
          <source>Engineering Psychology and Cognitive Ergonomics</source>
          , HCII,
          <fpage>158</fpage>
          -
          <lpage>166</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gartus</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Leder</surname>
          </string-name>
          , '
          <article-title>Predicting perceived visual complexity of abstract patterns using computational measures: The influence of mirror symmetry on complexity perception'</article-title>
          ,
          <source>PLoS ONE</source>
          ,
          <volume>12</volume>
          (
          <issue>11</issue>
          ), (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Michailidou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          , '
          <article-title>Toward a definition of visual complexity as an implicit measure of cognitive load'</article-title>
          ,
          <source>ACM Transactions on Applied Perception</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chanceaux</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Smith</surname>
          </string-name>
          , '
          <article-title>The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          )(32),
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Á.</given-names>
            <surname>Kristjánsson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Á.G.</given-names>
            <surname>Ásgeirsson</surname>
          </string-name>
          , '
          <article-title>Attentional priming: recent insights and current controversies'</article-title>
          ,
          <source>Current Opinion in Psychology</source>
          ,
          <volume>29</volume>
          ,
          <fpage>71</fpage>
          -
          <lpage>75</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kristjánsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.M.</given-names>
            <surname>Thornton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chetverikov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Á.</given-names>
            <surname>Kristjánsson</surname>
          </string-name>
          , '
          <article-title>Dynamics of visual attention revealed in foraging tasks'</article-title>
          ,
          <source>Cognition</source>
          ,
          <volume>194</volume>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Madan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lonsdorf</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sommer</surname>
          </string-name>
          , '
          <article-title>Visual complexity and affect: Ratings reflect more than meets the eye'</article-title>
          ,
          <source>Frontiers in Psychology</source>
          ,
          <volume>8</volume>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.K.</given-names>
            <surname>Mital</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.L.</given-names>
            <surname>Hill</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Henderson</surname>
          </string-name>
          , '
          <article-title>Clustering of gaze during dynamic scene viewing is predicted by motion'</article-title>
          ,
          <source>Cognitive Computation</source>
          ,
          <volume>3</volume>
          ,
          <fpage>5</fpage>
          -
          <lpage>24</lpage>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <source>BMVI: Federal Ministry of Transport &amp; Digital Infrastructure, Report by the Ethics Commission on Automated and Connected Driving, Germany</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Olk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Zielinski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kopper</surname>
          </string-name>
          , '
          <article-title>Measuring visual search and distraction in immersive virtual reality'</article-title>
          ,
          <source>Royal Society Open Science</source>
          ,
          <volume>5</volume>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Konkle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          , '
          <article-title>Parametric coding of the size and clutter of natural scenes in the human brain'</article-title>
          ,
          <source>Cerebral Cortex</source>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.T.</given-names>
            <surname>Reader</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.P.</given-names>
            <surname>Holmes</surname>
          </string-name>
          , '
          <article-title>Examining ecological validity in social interaction: problems of visual fidelity, gaze, and social potential'</article-title>
          ,
          <source>Culture and Brain</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ),
          <fpage>134</fpage>
          -
          <lpage>146</lpage>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosenholtz</surname>
          </string-name>
          , '
          <article-title>A simple saliency model predicts a number of motion popout phenomena'</article-title>
          ,
          <source>Vision Research</source>
          ,
          <volume>39</volume>
          (
          <issue>19</issue>
          ), (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosenholtz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Nakano</surname>
          </string-name>
          , '
          <article-title>Measuring visual clutter'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>7</volume>
          , (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Salingaros</surname>
          </string-name>
          , '
          <article-title>Complexity in architecture and design'</article-title>
          ,
          <source>Oz Journal</source>
          ,
          <volume>36</volume>
          ,
          <fpage>18</fpage>
          -
          <lpage>25</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scarfe</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Glennerster</surname>
          </string-name>
          , '
          <article-title>Using high-fidelity virtual reality to study perception in freely moving observers'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>15</volume>
          (
          <issue>9</issue>
          ), (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rothkegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Trukenbrod</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Engbert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wichmann</surname>
          </string-name>
          , '
          <article-title>Disentangling bottom-up versus top-down and low-level versus high-level influences on eye movements over time'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>19</volume>
          (
          <issue>3</issue>
          ), (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Semizer</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.M.</given-names>
            <surname>Michel</surname>
          </string-name>
          , '
          <article-title>Natural image clutter degrades overt search performance independently of set size'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>19</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>T.</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Mital</surname>
          </string-name>
          , '
          <article-title>Attentional synchrony and the influence of viewing task on gaze behavior in static and dynamic scenes'</article-title>
          ,
          <source>Journal of Vision</source>
          ,
          <volume>13</volume>
          (
          <issue>8</issue>
          ),
          <fpage>16</fpage>
          -
          <lpage>16</lpage>
          , (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.G.</given-names>
            <surname>Snodgrass</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vanderwart</surname>
          </string-name>
          , '
          <article-title>A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity'</article-title>
          ,
          <source>Journal of Experimental Psychology: Human Perception and Performance</source>
          ,
          <volume>6</volume>
          ,
          <fpage>174</fpage>
          -
          <lpage>215</lpage>
          , (
          <year>1980</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Suchan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schultz</surname>
          </string-name>
          , '
          <article-title>Deeply semantic inductive spatio-temporal learning'</article-title>
          ,
          <source>in 26th International Conference on Inductive Logic Programming</source>
          , eds., James Cussens and Alessandra Russo, volume
          <volume>1865</volume>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Suchan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Varadarajan</surname>
          </string-name>
          , '
          <article-title>Out of sight but not out of mind: An answer set programming based online abduction framework for visual sensemaking in autonomous driving'</article-title>
          ,
          <source>in IJCAI 2019</source>
          , pp.
          <fpage>1879</fpage>
          -
          <lpage>1885</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>J.</given-names>
            <surname>Suchan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Varadarajan</surname>
          </string-name>
          , '
          <article-title>Driven by commonsense: On the role of human-centred visual explainability for autonomous vehicles'</article-title>
          ,
          <source>in ECAI</source>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>J.</given-names>
            <surname>Suchan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Varadarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Seyed</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          , '
          <article-title>Semantic Analysis of (Reflectional) Visual Symmetry: A Human-Centred Computational Model for Declarative Explainability'</article-title>
          ,
          <source>Advances in Cognitive Systems</source>
          ,
          <volume>6</volume>
          ,
          <fpage>65</fpage>
          -
          <lpage>84</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>B.</given-names>
            <surname>Tversky</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Zacks</surname>
          </string-name>
          ,
          <article-title>Event perception</article-title>
          , chapter in
          <source>Oxford Handbook of Cognitive Psychology</source>
          ,
          <fpage>83</fpage>
          -
          <lpage>94</lpage>
          , Oxford: Oxford University Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>P.A.</given-names>
            <surname>van der Helm</surname>
          </string-name>
          , '
          <article-title>Simplicity versus likelihood in visual perception: From surprisals to precisals'</article-title>
          ,
          <source>Psychological Bulletin</source>
          ,
          <volume>126</volume>
          ,
          <fpage>770</fpage>
          -
          <lpage>800</lpage>
          , (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wierda</surname>
          </string-name>
          , '
          <article-title>Beyond the eye: cognitive factors in drivers' visual perception'</article-title>
          , chapter in
          <source>Vision in Vehicles - V</source>
          ,
          <fpage>97</fpage>
          -
          <lpage>105</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>C.C.</given-names>
            <surname>Williams</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Castelhano</surname>
          </string-name>
          , '
          <article-title>The changing landscape: High-level influences on eye movement guidance in scenes'</article-title>
          ,
          <source>Vision</source>
          ,
          <volume>3</volume>
          (
          <issue>33</issue>
          ), (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Wilson</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Soranzo</surname>
          </string-name>
          , '
          <article-title>The use of virtual reality in psychology: A case study in visual perception'</article-title>
          ,
          <source>Advances in Computational Psychometrics</source>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>