Machine Learning Analysis of Pedestrians’ Hazard Anticipation
from Eye Tracking Data
Andreas Gregoriades¹, Loukas Dimitriou², Maria Pampaka³, Harris Michail¹, and Michael Georgiades⁴

¹ Cyprus University of Technology, Limassol, Cyprus
² University of Cyprus, Nicosia, Cyprus
³ The University of Manchester, Manchester, UK
⁴ Neapolis University of Pafos, Pafos, Cyprus


                     Abstract
                     Pedestrian tourists are considered the most vulnerable road users in urban mobility
                     environments. Tourists are a special category of pedestrians, exhibiting different visual
                     behaviour from residents due to their enthusiasm and unfamiliarity with the environment. These
                     characteristics of pedestrian tourists influence their hazard perception. Eye tracking technology
                     has become popular for investigating pedestrian safety problems following findings that eye-gaze
                     behaviour is linked to human attention and hazard anticipation. Most eye-tracking
                     studies to date use stationary technology that may miss important properties relating to
                     environmental dynamics that cannot be accurately simulated. This study employs a novel
                     method that uses mobile eye-tracking technology in naturalistic settings to investigate the
                     application of machine learning to identifying differences between tourist and resident
                     pedestrians' visual behaviour. Eye tracking metrics are used to train an Extreme Gradient Boosting
                     (XGBoost) model to examine whether tourists have lower hazard perception than residents when
                     visiting destinations with driving conventions opposite to their own. Preliminary results with
                     a small group of tourist and resident pedestrians demonstrate how such machine learning
                     models could be used in real time by agent-based systems that use wearable augmented
                     reality displays to support the hazard perception of tourist pedestrians.

                     Keywords 1
                     Pedestrian safety, Mobile eye tracking, XGBoost classification, Wearable Augmented Reality
                     Displays.


1 ATT 2022: 12th International Workshop on Agents in Traffic and Transportation held in conjunction with IJCAI-ECAI 2022
EMAIL: andreas.gregoriades@cut.ac.cy (A. 1); lucdimit@ucy.ac.cy (A. 2); maria.pampaka@manchester.ac.uk (A. 3);
harris.michail@cut.ac.cy (A. 4); michael.georgiades@gmail.com (A. 5)
ORCID: 0000-0002-7422-1514 (A. 1); 0000-0002-8427-058X (A. 2); 0000-0001-5481-1560 (A. 3); 0000-0002-8299-8737 (A. 4);
0000-0002-5930-8814 (A. 5)

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction
   Pedestrians constitute 22% of all road traffic fatalities worldwide [1]. Tourists represent a vulnerable
category of road users due to their unfamiliarity with the environment and traffic rules at the destinations
they visit [2][3]. These factors, combined with their curiosity and enthusiasm for exploration, reduce their
hazard perception and make tourists more vulnerable to accidents.
   Hazard perception, the anticipation of traffic hazards, is a critical component of road safety
and is directly linked to pedestrians' and drivers' visual behaviour [4]. Novice road users are less effective
than experienced road users at anticipating safety-relevant traffic events [5][6]; novice users therefore
need assistance based on experts' knowledge [4]. Tourists visiting countries whose road conventions differ
from those of their country of origin can be considered novice road users, since they are familiar with
neither the environment nor the driving rules. Research indicates that
54% of tourists experience problems crossing the road as pedestrians when they are unfamiliar with a
country's road convention [7]. A recent driver safety study attributed many accidents involving tourist
drivers to attentional factors, such as increased cognitive workload and reduced hazard perception while
adapting to new traffic environments (e.g., working out where to look and from which direction to expect
oncoming cars at intersections) [8]. Like drivers, pedestrians crossing roads must process varied
information from the environment to stay safe, such as road infrastructure characteristics, traffic density,
the direction of oncoming vehicles, sounds, the movements of other pedestrians, and visual distractions such
as illuminated advertisements [9]. Despite this variety of factors influencing pedestrian safety, most studies
draw conclusions from either surveys [10] or simulated experiments in unrealistic synthetic environments.
These approaches, however, might miss important information [11] and lead to inaccurate conclusions.
Recently, eye tracking technology has emerged as a promising method for analysing safety by examining
humans' visual behaviour and head movements. It can capture richer information relevant to pedestrian
safety, given evidence linking eye movements with attention: what we are looking at often corresponds to
what we are attending to cognitively. Attention is a limited cognitive resource that is consumed when
processing visual information. Changes in cognitive effort are manifested through changes in the
physiology of the human eye, such as changes in pupil diameter, which also regulates the amount of light
that enters the eye. Pupil dilations have also been linked to cognitive factors such as workload, surprise,
attention, emotional arousal [12] and hazard anticipation [13]. Therefore, eye tracking data can reveal
information about mental processes that is not easily accessible through behavioural performance
measures alone. Most eye tracking studies, however, use stationary eye trackers in a lab, with a computer
screen as the visual scene to be analysed. Such artificial settings suffer from low realism.
    In this study, we use mobile eye tracking equipment (Tobii glasses) to investigate differences
between two groups of participants (residents and tourists) during a road crossing scenario, and use
eye tracking metrics to train a machine learning (ML) model to analyse hazard perception. This is a
continuation of our previous work in naturalistic eye tracking [14]. The main aim of the study is to
investigate how the visual behaviour of resident and tourist pedestrians differs with respect to hazards,
and how tourists' hazard anticipation can be enhanced in real time using wearable technology and tacit
knowledge (visual behaviour associated with better hazard perception) from expert road users. Given the
large volume of data generated by eye tracking equipment, ML is used in this study to help automate
hazard perception support for pedestrians. A popular classification technique, XGBoost, is used to train
a model whose learned patterns are evaluated to identify differences between tourists and residents.
Such models could be used by autonomous agents in transportation networks to evaluate the state of
pedestrians' hazard anticipation in real time.
    The study proposes the use of Wearable Augmented Reality Displays (WARD) equipped with eye
tracking capability to assist pedestrians' hazard anticipation. WARD can overlay the physical world
with digital content and can thus provide users with important safety information. They have
recently been used to enhance the situation awareness of highway workers [15] and in other applications
such as healthcare [16]. The analysis highlights ways to enhance tourist safety by analysing eye gaze,
pupil and head movement data with ML techniques, and by using such models in prospective multi-agent
systems that would use WARD and Fog/Edge computing to process and store data [17] and make
inferences about pedestrians' hazard anticipation.
    The paper is organized as follows. The next section reviews the literature on workload, hazard
anticipation and the use of eye tracking techniques for their assessment. This is followed by a section
describing the methodology, including data preprocessing and the training of an XGBoost model. The
paper concludes with preliminary results, a discussion and conclusions.

2. Literature Review
   Pedestrian safety countermeasures usually include infrastructural changes or policies [2]. These,
however, are expensive and time-consuming. Alternatively, technological developments such as
Intelligent Transportation Systems that integrate information and communication technologies within
the transportation infrastructure can provide real time information to road users through software agents
running on the Fog/Edge of a computer network, offering time-sensitive and location-based services
for autonomous vehicles or WARD agents as in our case [31].
    In-vehicle information systems are becoming a popular means of improving driver hazard
perception, and the same principle could be applied to pedestrians. Such systems are designed based
on knowledge from the safety literature showing that visual and auditory clutter overloads drivers and
pedestrians and, combined with distractions (e.g., advertisements, cell phones, in-vehicle
conversations, billboards), can interfere with visual search strategies [19], such as deciding where to focus
attention to acquire the information needed to maintain safety and anticipate risk. Combined with
unfamiliarity with traffic rules and the environment, these factors can increase the risk of accidents [20].
This risk becomes critical when pedestrians cross roads or encounter new road infrastructure, where
hazard perception is key [21]. Tourists are more vulnerable to overload because of their unfamiliarity
with the environment and the driving conventions, which decreases their hazard perception [3].
    Cognitive workload and hazard perception of pedestrians can be measured in different ways.
Methods are usually categorized as subjective, performance-based, or physiological. Subjective
techniques, including surveys, are commonly used in safety research, with the NASA Task Load Index
(NASA-TLX) being a popular option. Performance-based measures are usually classified in terms of primary
and secondary task performance: users perform a secondary task alongside the main task, and the level of
workload is inferred from secondary-task performance, since it reflects the user's spare cognitive capacity.
Examples of such measures include vehicle lane departures, lateral deviations, task completion time,
reaction time, accuracy, and error rates, with poor performance indicating that a driver is overloaded.
Physiological measures encompass audiology, cardiovascular, respiratory, neurophysiological, and
ophthalmic physiology [22]. The latter includes pupil dilation metrics, the method used in
this study. Physiological methods are advantageous because they can assess workload and hazard
perception in real time: increased workload or hazard anticipation evokes small, involuntary
fluctuations in pupil dilation due to the attentional demands imposed by a cognitive task [23].
    Pupil dilation is considered a reliable and valid psychophysiological measure of the amount of
cognitive effort devoted to a given visual stimulus and a suitable indicator of hazard perception [24].
Techniques for assessing workload using pupillometry include the task-evoked pupillary response [25]
and the index of cognitive activity (ICA) [12]. Both refer to variations in dilation as reactions to
cognitive processing. In addition to pupillometry, hazard perception can be assessed using metrics such
as fixation count in specific regions of the visual scene (areas of interest, AOIs), fixation duration, time
to first fixation, fixation heat maps, and scan paths.
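The fixation-based AOI metrics and a baseline-corrected pupil response can be illustrated with a minimal Python sketch. It assumes fixations have already been detected and tagged as inside or outside an AOI, and it computes the task-evoked response with simple subtractive baseline correction, which is one common variant rather than the method of [25]. The `Fixation` class and both function names are hypothetical and not part of any eye tracking toolkit.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fixation:
    start_ms: float     # fixation onset relative to the start of the trial
    duration_ms: float  # fixation duration
    in_aoi: bool        # whether the fixation falls inside the AOI of interest


def aoi_metrics(fixations):
    """Fixation count, total fixation duration and time to first fixation for one AOI."""
    hits = [f for f in fixations if f.in_aoi]
    return {
        "fixation_count": len(hits),
        "total_fixation_duration_ms": sum(f.duration_ms for f in hits),
        # None when the AOI was never fixated
        "time_to_first_fixation_ms": min((f.start_ms for f in hits), default=None),
    }


def tepr(baseline_mm, task_mm):
    """Task-evoked pupillary response: mean task diameter minus mean baseline diameter."""
    return mean(task_mm) - mean(baseline_mm)
```

For example, fixations starting at 0, 250 and 600 ms, of which the last two are AOI hits of 300 and 150 ms, give a fixation count of 2, a total AOI dwell of 450 ms and a time to first fixation of 250 ms.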
    When it comes to designing solutions to tackle safety, eye tracking insights can be used for
specifying and validating requirements of prospective systems, with methods such as the one
presented in [18] demonstrating the optimization of an information system's user interface based on
knowledge extracted from eye tracking experiments.

    2.1.         Eye tracking and Visual behaviour
   Visual attention refers to the cognitive processes that guide the selection of relevant information
from visual scenes and the filtering out of irrelevant information. Numerous studies agree that where
we direct our eyes, and for how long, often correlates with attentional selection and information
processing [26]. Eye-tracking technology has gained popularity in different safety and consumer-related
disciplines due to its ability to provide accurate information on the visual attention of participants in
experiments. Attention is expressed in the form of fixations (i.e., periods when the eyes are relatively still
and the visual system absorbs information about what is being looked at) and in visual scanning patterns
formed by saccades, which are fast eye movements between stimuli.
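Fixations and saccades are usually separated algorithmically from the raw gaze samples. The sketch below shows a velocity-threshold (I-VT) classifier, one common approach. It assumes gaze direction is already expressed in degrees of visual angle, approximates angular distance by the Euclidean distance between consecutive samples, and uses an assumed threshold of 30 deg/s. The function names are hypothetical, and this is not a description of the fixation filter used by the Tobii software.

```python
import math


def ivt_classify(t_ms, x_deg, y_deg, threshold_deg_s=30.0):
    """Label each gaze sample 'fixation' or 'saccade' with a velocity-threshold (I-VT) rule.

    t_ms: strictly increasing timestamps (ms); x_deg, y_deg: gaze direction in degrees.
    The velocity between samples i-1 and i is assigned to sample i; the first sample
    reuses the label of the second one.
    """
    labels = []
    for i in range(1, len(t_ms)):
        dt_s = (t_ms[i] - t_ms[i - 1]) / 1000.0
        # approximate angular velocity (deg/s) between consecutive samples
        velocity = math.hypot(x_deg[i] - x_deg[i - 1], y_deg[i] - y_deg[i - 1]) / dt_s
        labels.append("saccade" if velocity > threshold_deg_s else "fixation")
    return labels[:1] + labels if labels else ["fixation"] * len(t_ms)


def group_fixations(t_ms, labels):
    """Merge runs of consecutive 'fixation' samples into (start_ms, end_ms) events."""
    events, start, last = [], None, None
    for t, label in zip(t_ms, labels):
        if label == "fixation":
            if start is None:
                start = t
            last = t
        elif start is not None:
            events.append((start, last))
            start = None
    if start is not None:
        events.append((start, last))
    return events
```

At a 50 Hz sampling rate, a jump of several degrees between two 20 ms samples far exceeds the threshold and is labelled a saccade, while sub-degree drift stays within a fixation. Real pipelines typically add noise filtering and merge or discard very short fixations.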
   Most eye-tracking studies assume that humans process the visual information they focus on [27],
an assumption known as the eye-mind hypothesis [28]; Hyönä [29] reports that this holds when the visual
environment in front of our eyes is relevant to the task we are about to perform.
   Eye tracking technology can be used alone or in combination with other physiological measures
such as electrooculography. On its own, it can measure participants' attention in a visual environment
through 1) eye fixations in Areas of Interest (AOI) that denote important areas in a visual scene (e.g.,
hazard perception indicators in this study), 2) saccadic behaviour that indicates visual exploration or
confusion, 3) goal-seeking behaviour through scan-path analysis of fixation transitions between
AOIs, and 4) head movements and acceleration in three axes, as used in this study.
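Scan-path analysis of transitions between AOIs (point 3 above) is often summarised as a transition matrix. The following sketch counts transitions between consecutively fixated AOIs and row-normalises them into conditional probabilities. The AOI names and function names are hypothetical, and ignoring refixations within the same AOI is a design assumption.

```python
from collections import Counter


def transition_counts(aoi_sequence):
    """Count transitions between consecutive fixated AOIs, ignoring self-transitions."""
    return Counter(
        (a, b) for a, b in zip(aoi_sequence, aoi_sequence[1:]) if a != b
    )


def transition_probabilities(aoi_sequence):
    """Row-normalised transition matrix as a nested dict: P(next AOI | current AOI)."""
    counts = transition_counts(aoi_sequence)
    totals = Counter()
    for (a, _), n in counts.items():
        totals[a] += n
    return {
        a: {b: n / totals[a] for (a2, b), n in counts.items() if a2 == a}
        for a in totals
    }
```

For a sequence such as left, left, right, left, hazard, left, right, two of the three transitions leaving "left" go to "right", so P(right | left) = 2/3.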
    Eye tracking data are timestamped at millisecond resolution, so millions of observations can
accumulate, which makes the analysis of such raw data difficult. Eye tracking software can support
this process, but only with generic features that might not fit the needs of the researcher/user.
Therefore, researchers use ML to analyse raw eye tracking data and generate new features from it.
Example ML applications on eye tracking data include classification problems from different domains,
including safety, psychology and systems design. However, no work has been reported on using mobile
eye tracking with ML for pedestrian hazard perception.
    In contrast, applications of eye tracking in safety-related studies have mainly used stationary
apparatus with static imagery [30]. Mobile eye tracking studies are limited due to the complexity of
analysing the data and the difficulty of controlling confounding variables, but they can collect richer
data and can explain pedestrian safety issues in a more holistic manner. Alternative methods to eye
tracking include observing pedestrians with cameras, or pedestrian simulators that allow limited
pedestrian movement [11]. The latter, though, suffer from a limited level of realism that could lead to
biased conclusions.
    Recent efforts to improve pedestrian safety focus on increasing pedestrians' situation
awareness/hazard perception through assistive wearable technologies and sensors. Such systems
identify potential dangers and provide warnings. Eye tracking results from this study can be used to
infer the information needs of pedestrians and thus assist in designing new pedestrian safety
technologies, such as wearable augmented reality glasses, by narrowing the design space
of potential technological solutions. However, a more in-depth understanding of such designs can be
achieved when simulations and naturalistic visual behaviour analyses are combined, as in this
study. The latter can explore the problem in its natural setting to identify the main requirements, which
can then be refined and evaluated through simulations.

3. Methodology
    This study proposes a novel analytical framework for pedestrian safety analysis based on explainable
ML models applied to mobile eye tracking data collected in a road crossing experiment with six
participants in naturalistic settings. The technique used to assess hazard perception is based on pupillary
variations when fixating on AOIs linked to hazardous areas in the visual scene. The method consists of
six steps: (1) designing the experiment, selecting the road section and specifying the hazard AOIs; (2)
selecting participants, familiarizing them with the equipment and procedure, conducting the experiment
and collecting the data; (3) selecting the set of eye movement variables from the raw data that are relevant
to hazard perception and cognitive workload; (4) pre-processing data associated with hazard AOIs; (5)
training an XGBoost classifier to identify patterns that explain the link between the selected variables and
the target variable (tourist or resident pedestrian); and (6) interpreting the learned XGBoost model
using an explainable ML method to highlight differences between the two groups.

    3.1.         Participants and Procedure
   The experiment used a targeted sample of participants. Six healthy participants (3 male and 3 female)
with an average age of 35 years (standard deviation of 7.19 years) participated in this study. Three
participants were tourists from a country with opposite driving rules to Cyprus and the other three were
residents of Cyprus familiar with the rules and area where the experiment was conducted. The purpose
and procedure of the experiment were explained to participants in advance, and it was made clear that
they could abort the experiment at any time. Participants were given sufficient time to familiarize
themselves with the Tobii glasses prior to undertaking the experiment. The eye tracker was calibrated for
each participant prior to the experiment. The experiment's location was selected based on tourist
visitation data obtained from local authorities. The road section (Figure 1) includes a one-way road
enclosed by buildings (occluded), with no visibility of vehicles approaching from side roads until
pedestrians reached the intersection points (kerb). This avoided giving participants any advance
information about imminent vehicles and hence tested their vigilance level and visual behaviour at the
kerb (K). The road section had no clear indication of the direction of oncoming vehicles, which helped
reveal visual scanning behaviours driven by the habitual knowledge of participants from different driving
conventions, who might expect oncoming vehicles from a specific direction. To minimise the effect of
confounding variables such as differing traffic flow for each participant, the experiment was conducted
while the road section under study was closed for two days of planned road works; the closure was not at
the points of interest and was not visible to the participants. Neither the residents nor the tourists were
aware of this closure, so as not to influence their visual behaviour. Participants followed the
pre-specified route of Figure 1 in a free-walk scenario (observing stimuli as they normally would),
crossing the road at the point shown in the figure. The flow of vehicles in this road section is indicated
with the blue arrows.



   Figure 1. Left: Road section and the path that participants had to follow (red arrows; permitted
 vehicle flow shown with blue arrows; kerb indicated by the letter K). Right: Wide-angle view of the
   infrastructure at the kerb, with the hazard AOI as an overlay polygon and the path participants
       followed shown as dotted arrows. The name of the shop has been blurred for anonymity (gray rectangle).

    3.2.         Extracting patterns from XGBoost models
    Data from the experiment were used to train an ML model that allowed the automated identification
of patterns from the large amount of eye tracking data. ML techniques are classified into supervised
and unsupervised techniques, the former requiring labelled data and the latter not, and they are applied
to two broad types of problems: classification (e.g., predicting the probability that an input set of data is
associated with a category of a categorical output variable) and regression (e.g., predicting a continuous
value rather than a category). ML techniques have been applied to eye movement data from
cognitive science and other domains to classify human performance on cognitive tasks. In ML, features
are the measurable properties that best characterize the problem under study.
    The classification technique employed in this study classifies participants, based on eye
tracking data, as tourists or residents, and in this way identifies patterns that characterize each group.
For this task, a binary Extreme Gradient Boosting (XGBoost) [32] model was developed in Python. XGBoost
is an optimized, regularized implementation of gradient boosted decision trees and has been used
extensively in various problems, including human performance, owing to its strong predictive
performance, its use of regularization to limit overfitting, and its computational efficiency. XGBoost is an
ensemble method, since it combines multiple classification and regression trees, each composed of nodes
that split on variables in the dataset. During XGBoost training, trees are built sequentially, with each new
tree fitted to correct the errors of the ensemble of previously built trees.
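To make the sequential idea concrete, here is a deliberately simplified gradient boosting sketch in NumPy for a binary target. It is not XGBoost: each weak learner is a depth-1 regression stump fitted by least squares to the negative gradient of the log-loss (y - p), and it omits XGBoost's second-order leaf weights, regularization terms and subsampling. The function names and the values of `n_trees` and `lr` are arbitrary assumptions.

```python
import numpy as np


def fit_stump(X, target):
    """Least-squares regression stump: best (feature, threshold) split with mean leaf values."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # thresholds that keep both sides non-empty
            left = X[:, j] <= thr
            lv, rv = target[left].mean(), target[~left].mean()
            sse = ((target[left] - lv) ** 2).sum() + ((target[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # no valid split (e.g. constant features): predict the mean
        c = target.mean()
        return lambda Z: np.full(len(Z), c)
    _, j, thr, lv, rv = best
    return lambda Z, j=j, thr=thr, lv=lv, rv=rv: np.where(Z[:, j] <= thr, lv, rv)


def boost(X, y, n_trees=50, lr=0.3):
    """First-order gradient boosting for binary log-loss; returns a probability predictor."""
    base = np.log(y.mean() / (1 - y.mean()))  # initial log-odds of the positive class
    f = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        p = 1 / (1 + np.exp(-f))
        residual = y - p                # negative gradient of log-loss w.r.t. log-odds
        tree = fit_stump(X, residual)   # each new tree corrects the current ensemble
        f = f + lr * tree(X)
        trees.append(tree)

    def predict_proba(Z):
        score = base + lr * sum(t(Z) for t in trees)
        return 1 / (1 + np.exp(-score))

    return predict_proba
```

The learning rate shrinks each tree's contribution, so many small corrections accumulate rather than one tree dominating; XGBoost applies the same shrinkage idea through its `eta`/`learning_rate` parameter.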
     The binary XGBoost model was trained on the visual behaviour data of participants in the road
crossing experiment, with the aim of characterizing differences in hazard anticipation between the two
groups. For each participant, the eye tracker recorded numerous observations of eye and head movements
at short, regular intervals (approximately every 20 ms; see Section 3.4). These were set as initial features
of the classifier and were refined afterwards through feature selection while developing the model.
    Although ML algorithms have proven effective in prediction problems, in terms of the
interpretability of their results they can be classified into white-box and black-box techniques [33].
Black-box techniques, such as the XGBoost ensemble method used in this work and deep neural networks,
can produce better results in terms of performance but provide little insight into how they arrive at
an outcome; they therefore suffer from low interpretability. Interpretability is important in situations such
as the one presented in this study, where understanding the causes of low hazard anticipation is key to
specifying the requirements for future solutions to this problem.
    A popular black-box explanation method, used in this study, is SHAP (SHapley Additive
exPlanations), which is based on cooperative game theory [34] and explains the impact of input variables
on a black-box model's output. SHAP is used to extract the patterns of the trained XGBoost model.
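The Shapley values that SHAP approximates can be computed exactly for a model with only a few features, which helps clarify what a SHAP value means: a feature's average marginal contribution over all coalitions of the other features. The sketch below enumerates every coalition and, as a simplifying assumption, represents an "absent" feature by a single baseline value; practical tools such as the shap library's `TreeExplainer` use faster tree-specific algorithms and background data instead. The function name is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for instance x; features outside a coalition take baseline values.

    model: callable taking a list of feature values and returning a number.
    Cost grows exponentially with the number of features, so keep n small.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi
```

For a linear model 2·z0 + 3·z1 with a zero baseline and x = (1, 1), the values are exactly 2 and 3, and in general the values sum to model(x) − model(baseline), the additivity property that SHAP plots rely on.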

    3.3.         Outcome variable and data labelling
    In this study, an XGBoost classifier is trained with the tourist/resident status of participants as the
outcome class variable and their eye tracking observations as predictors. The class variable was annotated
using the Tobii Pro Lab software based on participants' tourist/resident status.
    The independent variables (features) used to train the model included eye movement parameters
relevant to cognitive effort and hazard anticipation identified in the literature. The initial features were
derived from statistical properties of raw eye tracking variables, such as the mean and standard deviation
of fixation coordinates (in 2D space), pupillary data and head movement data. Pupillary data were
selected for fixation data points only, to account for points in the visual scene that participants attended
to cognitively. Additional features were the average fixation duration and normalized duration metrics,
obtained by dividing the total duration of each parameter by the total time the participant took to engage
with the intersection. Other pupillary features were the standardised pupillary score and the pupillary
moving standard deviation. Head movement data were collected from the glasses' gyroscope readings
(degrees/s) for lateral and vertical head movements (yaw, pitch).
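A minimal sketch of how such summary features could be assembled for one participant's engagement with the intersection follows. The input layout, the feature names and the use of the sample standard deviation are assumptions for illustration, not the exact feature set used in the study.

```python
from statistics import mean, stdev


def window_features(fix_x, fix_y, fix_durations_ms, pupil_mm, total_time_ms):
    """Summary features for one participant's engagement with the intersection.

    fix_x, fix_y: fixation coordinates in the 2D scene; fix_durations_ms: fixation durations;
    pupil_mm: pupil diameters sampled during fixations only; total_time_ms: time spent
    engaging with the intersection. Each list needs at least two values for stdev.
    """
    return {
        "fix_x_mean": mean(fix_x),
        "fix_x_sd": stdev(fix_x),
        "fix_y_mean": mean(fix_y),
        "fix_y_sd": stdev(fix_y),
        "fix_dur_mean_ms": mean(fix_durations_ms),
        # normalized metric: total fixation time divided by total engagement time
        "fix_dur_norm": sum(fix_durations_ms) / total_time_ms,
        "pupil_mean_mm": mean(pupil_mm),
        "pupil_sd_mm": stdev(pupil_mm),
    }
```

Normalizing by engagement time makes participants who spent different amounts of time at the kerb comparable, which matters when one person lingers and another crosses quickly.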
     As a hazard perception feature, we considered participants' successful fixations on areas of
the road from which potential hazards could emerge [35]; these areas were prespecified by the
researchers based on the direction of incoming vehicles, bicycles and mopeds. Fixating on the area where a
hazard may occur does not always indicate that the hazard has been anticipated; however, studies [36]
have shown that most glances in that area occur because the road user is anticipating the hazard. Thus,
the hazard anticipation variable was determined from pedestrians' fixations on specific intersection
areas and was coded from the recorded eye tracker videos and the raw data collected by the
device. Specifically, to code the hazard AOIs, a predetermined hazard zone was defined in the visual
scene coordinates, corresponding to the position of incoming vehicles (Figure 1). Anticipatory glances were
labelled according to participants' fixations in these hazard AOIs. Tobii Pro Lab software
allows eye gaze/fixation data to be mapped onto still (2D) images, such as snapshots of the
environment under study (the road crossing in our case), so hazard hits could be assessed using AOI
coordinates on this image.
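Deciding whether a fixation is a hazard-AOI hit amounts to a point-in-polygon test in the snapshot's 2D coordinates. The sketch below uses the standard ray-casting (even-odd) rule; the polygon vertices and function names are hypothetical, and it is not a description of how Tobii Pro Lab computes AOI hits.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), (x1, y1), ...].

    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_hazard_hits(fixations, hazard_polygon):
    """Label each (x, y) fixation as a hazard-AOI hit (1) or not (0)."""
    return [int(point_in_polygon(x, y, hazard_polygon)) for x, y in fixations]
```

The even-odd rule works for any simple polygon, including the irregular overlay shapes typically drawn around a road opening, not only rectangles.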

    3.4.         Data pre-processing
    Raw data from the eye tracker consisted of multidimensional time series of all variables of
interest (262K rows of data tuples for all participants during their interaction with the road crossing
scenario), collected approximately every 20 milliseconds (50 Hz sampling rate). The data contained
information about horizontal and vertical head movements, head accelerometer data, fixation coordinates
on the x and y axes, fixation duration, eye gaze direction on the x and y axes, eye gaze coordinates,
saccades, AOI (hazard area) hits, pupil diameter, and pupil assessment confidence (an indicator of the eye
tracker's confidence that it is measuring the pupil correctly). For the road crossing scenario, only tuples
within the time interval in which pedestrians engaged with the scenario were selected, resulting in 62K
observations from the initial dataset and eliminating irrelevant data recorded before participants
entered the scenario zone. The time intervals were specified in the Tobii Pro Lab software prior to
extracting the raw data, and the tuples' timestamps were used to filter the eye tracking data.
    Additional filtering was performed to ensure sufficient eye tracking quality: all data points with
low pupil confidence (as estimated by the eye tracker) were removed. To normalize the data, z-scores,
means and medians were computed and used as additional features during ML training, to find the best
fit for the problem. Raw data pre-processing was performed in Python.
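The filtering steps described above can be sketched as follows. Each sample is assumed to be a dictionary with hypothetical keys `timestamp_ms`, `pupil_mm` and `pupil_confidence`, and the confidence cut-off of 0.5 is an assumption because the study does not report its threshold. The z-score uses the population standard deviation, which is also an illustrative choice.

```python
from statistics import mean, median, pstdev


def preprocess(rows, start_ms, end_ms, min_confidence=0.5):
    """Keep samples inside [start_ms, end_ms] with adequate pupil confidence, then add z-scores.

    rows: list of dicts with hypothetical keys 'timestamp_ms', 'pupil_mm', 'pupil_confidence'.
    Returns (filtered copies of the rows, summary statistics). Raises if nothing survives.
    """
    kept = [
        dict(r)  # copy so the caller's raw data are left untouched
        for r in rows
        if start_ms <= r["timestamp_ms"] <= end_ms and r["pupil_confidence"] >= min_confidence
    ]
    values = [r["pupil_mm"] for r in kept]
    mu, sd = mean(values), pstdev(values)
    for r in kept:
        r["pupil_z"] = (r["pupil_mm"] - mu) / sd if sd > 0 else 0.0
    summary = {"pupil_mean": mu, "pupil_median": median(values)}
    return kept, summary
```

Filtering on the scenario interval before computing z-scores matters: statistics computed over the whole recording would mix in pupil behaviour from outside the road crossing.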
   To identify differences in hazard perception and workload level between the two groups, the number
of variations in pupil diameter in the time series was used, similarly to [12]. The points of interest in
the pupil time series are temporal sections with increases/decreases in pupil diameter, which indicate
participants' cognitive processing due to hazard anticipation. We hypothesize that, since the pupillary
data selected for processing are those associated with hazard AOIs, the pupillary variations are a
response to hazard perception for those AOIs. To capture these variations, a moving standard deviation
with a sliding window was applied to the pupillary time series (Figure 2). This was essential to minimise
noise and highlight pupillary variations relevant to hazard perception. Under this hypothesis, the higher
the pupillary variation, the higher the hazard anticipation. The size of the sliding window was determined
by trying a range of window sizes until the points of high variability became apparent and the trained
ML model achieved its best predictive performance.
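A trailing moving standard deviation of the pupil series can be computed as below. The window length is a free parameter (the study tuned it empirically), the population standard deviation is used, and the first `window - 1` positions are left undefined; all of these are illustrative choices. Note that pandas' `Series.rolling(window).std()` computes the sample standard deviation (ddof=1) by default, so its values would differ slightly.

```python
from statistics import pstdev


def moving_std(values, window):
    """Trailing moving (population) standard deviation; the first window-1 entries are None."""
    if window < 2:
        raise ValueError("window must be at least 2")
    out = [None] * min(window - 1, len(values))
    for end in range(window, len(values) + 1):
        out.append(pstdev(values[end - window:end]))
    return out
```

Flat stretches of the pupil signal map to values near zero, while a step change in diameter produces a peak for as long as the window straddles it; this is the behaviour illustrated in Figure 2.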
   [Figure 2 plot: pupil size (mm) against time (ms); raw data converted into moving standard deviations]
   Figure 2. Example use of the moving standard deviation on a portion of the pupil dilation data of one
participant. The top series (blue) shows the raw data and the bottom series the moving standard deviations,
                          used later as one of the features in the XGBoost model training.

    Head movement data were also collected from the gyroscope: yaw (turning the head sideways),
pitch (nodding the head up/down) and roll (tilting the head to the side). Positive yaw values denote high
lateral head variability, e.g., when participants were turning their heads at the kerb, possibly to scan for
hazards. A positive slope in the yaw data denotes turning the head to the right and a negative slope
turning to the left. The accelerometer measures the acceleration of the head unit and indicates how
abruptly the head moves, for example due to surprise. Acceleration is measured along three axes in the
head unit coordinate system in m/s². Head movement data were also essential in assessing participants'
hazard perception.
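Head-scanning behaviour can be summarised from the gyroscope's yaw rate, for instance by integrating angular velocity into a head angle and counting reversals of turning direction. The sketch assumes that a positive yaw rate means turning right, which must be checked against the device's axis conventions; the function and feature names are hypothetical.

```python
def yaw_features(t_ms, yaw_rate_deg_s):
    """Illustrative head-scanning features from gyroscope yaw rate (deg/s).

    Assumes positive yaw rate = turning right (sign convention must be verified per device).
    """
    # integrate angular velocity with the trapezoidal rule to get head angle (degrees)
    angle = [0.0]
    for i in range(1, len(t_ms)):
        dt_s = (t_ms[i] - t_ms[i - 1]) / 1000.0
        angle.append(angle[-1] + 0.5 * (yaw_rate_deg_s[i] + yaw_rate_deg_s[i - 1]) * dt_s)
    # each sign change of the (non-zero) yaw rate marks a reversal of scan direction
    signs = [r > 0 for r in yaw_rate_deg_s if r != 0]
    reversals = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return {"angle_range_deg": max(angle) - min(angle), "direction_reversals": reversals}
```

A left-right-left check at the kerb would show up as at least two reversals and a wide angle range, whereas a pedestrian looking only in one direction produces few reversals.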

    3.5.        Training the XGBoost model
    The XGBoost model was trained using the data described in the previous section. During XGBoost
training, we first specified the classifier performance metric to optimize. Since the output variable is
binary (tourist/resident) and we wanted to maximise the model's performance in predicting both states of
the class variable while also addressing class imbalance, we used the Area Under the ROC Curve (AUC).
For model learning, the data were split into training and test sets using a stratified 70/30 split: 70% of
the data were used to train the model and 30% to evaluate its performance after training. Stratification
means that both the training and test sets preserved approximately the same proportion of each class
(tourists, residents) as the full dataset. The trained XGBoost model achieved an AUC of 0.89, which
indicates good predictive performance.
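The stratified split and the AUC metric can both be written in a few lines of plain Python. The sketch below is illustrative: `stratified_split` and `roc_auc` are hypothetical helpers, and in practice scikit-learn's `train_test_split(..., stratify=y)` and `roc_auc_score` provide the same functionality. The AUC is computed through its rank interpretation, the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling the test share separately within each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def roc_auc(y_true, scores):
    """AUC = P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples and is meant only to show the definition. Note also that a split over individual eye tracking samples keeps each participant's data in both sets; a split by participant would be the stricter test of generalisation.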
    During model optimization, hyperparameter tuning was performed using an exhaustive search
approach (grid search). The best performing model was selected based on the hyperparameters and
features that maximised AUC. Feature selection is an important step in training an ML model: it
evaluates the importance of each feature (eye tracking variable) for the classifier's performance. During
feature selection, several variables were eliminated using the ANOVA F-test and scikit-learn's
SelectKBest class.
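The ANOVA F-test used for feature selection scores each feature by how well its class means separate relative to the within-class spread, and an exhaustive grid search simply enumerates every hyperparameter combination. The sketch below reimplements both ideas in plain Python for illustration; it mirrors the score that scikit-learn's `f_classif` computes but is not that code. The feature names and grid values are hypothetical, although `max_depth` and `learning_rate` are genuine XGBoost hyperparameters.

```python
from itertools import product
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across classes (assumes within-class variance > 0)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(features, labels, k):
    """Names of the k features with the largest F statistic (the idea behind SelectKBest)."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def grid(param_grid):
    """Every hyperparameter combination of an exhaustive grid search, as a list of dicts."""
    keys = sorted(param_grid)
    return [dict(zip(keys, combo)) for combo in product(*(param_grid[key] for key in keys))]
```

In a full pipeline each dictionary from `grid` would configure one model, whose cross-validated AUC is compared against the others; a grid of two depths and two learning rates therefore means four fits per fold.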

4. Results
    Since XGBoost is not considered an interpretable model, it was imperative to use the SHAP
technique [34], which quantifies and visualizes the contribution of each input feature. For each
observation, a SHAP value is assigned to each of the model's features based on its marginal contribution
to the model's output.
    The SHAP summary plot in Figure 3 ranks the features by their importance in classifying tourists and residents. Red indicates a high feature value and blue a low one. The horizontal axis shows the SHAP value of each data point. A positive SHAP value indicates an increase in the log odds that the observation belongs to a tourist rather than a resident participant, while a negative value shifts the prediction towards resident. The plot therefore shows how feature values relate to the output variable (resident/tourist); each feature's values appear as red/blue/purple dots along its horizontal line, with the colour representing the feature value. The impact of each feature on the output variable can be examined further using the dependency plots in Figure 4. In these plots, the feature's value is plotted on the x-axis and its SHAP value on the y-axis. Dependency plots enable analysts to drill down into each feature and examine how it relates to the output variable and how it interacts with other model variables.
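Because the SHAP values in Figure 3 are expressed in log odds, a single prediction can be reconstructed by adding them to the model's base value and applying the logistic function. The sketch below uses invented numbers for one observation, with tourist as the positive class as in the text. The numbers are not taken from the study's model, and the feature names are only illustrative.

```python
import math

def to_probability(base_value, shap_row):
    """Turn a base value plus per-feature SHAP values (log-odds units) into P(tourist)."""
    log_odds = base_value + sum(shap_row.values())
    return 1.0 / (1.0 + math.exp(-log_odds))   # logistic function

# Hypothetical values for one observation (not read from Figure 3)
base_value = -0.4                                # average model output in log odds
shap_row = {"Pupil": -1.1, "Yaw": -0.3, "Fixation X": 0.2}

p = to_probability(base_value, shap_row)
# Same sum with the Pupil term set aside, only to show the direction of its push
# (this is not the prediction of a model refitted without the feature)
p_without_pupil = to_probability(base_value, {k: v for k, v in shap_row.items() if k != "Pupil"})
print(round(p, 3), round(p_without_pupil, 3))
```

The negative Pupil contribution lowers the probability of the tourist class, which is how a blue-to-red spread on the negative side of Figure 3 should be read.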


    Figure 3. SHAP summary plot with features on the y-axis and the SHAP value (log odds of belonging to either state of the target variable; Resident to the left, Tourist to the right) on the x-axis

    The summary plot shows that pupil variability (the Pupil feature in Figure 3) is the most influential feature in the model for the data associated with the hazard AOI under investigation. Increased pupil variability (red observations) is associated more with residents. This suggests that residents are more aware of the road hazards at the intersection and anticipate them. In contrast, tourists show low pupil variability on the hazard AOI. Cyprus follows the left-hand driving convention, whereas all tourist participants were accustomed to the right-hand driving convention. This difference in prior experience appears to have led tourists to neglect the safety-relevant areas when crossing the road. This highlights the need for information support, such as warnings, particularly when approaching vehicles or mopeds are electric and therefore silent.
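The text does not define how pupil variability was computed. One plausible reading is implemented by the hypothetical helper below. It takes the standard deviation of pupil diameter over the gaze samples that fall inside a rectangular hazard AOI, and ignores samples with missing pupil data (for example, blinks). The AOI format, units and function name are assumptions for illustration only.

```python
import numpy as np

def pupil_variability_in_aoi(gaze_x, gaze_y, pupil_mm, aoi):
    """Sample standard deviation of pupil diameter for gaze samples inside a rectangular AOI.

    Hypothetical helper: aoi = (x_min, y_min, x_max, y_max) in scene-camera pixels.
    Returns NaN if fewer than two valid samples fall inside the AOI.
    """
    x_min, y_min, x_max, y_max = aoi
    inside = (gaze_x >= x_min) & (gaze_x <= x_max) & (gaze_y >= y_min) & (gaze_y <= y_max)
    valid = inside & ~np.isnan(pupil_mm)            # drop blinks / tracking loss
    if valid.sum() < 2:
        return float("nan")
    return float(np.std(pupil_mm[valid], ddof=1))

# Invented samples: the third lies outside the AOI, the fourth has no pupil reading
gx = np.array([100.0, 150.0, 900.0, 120.0])
gy = np.array([200.0, 220.0, 50.0, 210.0])
pupil = np.array([3.0, 3.4, 5.0, np.nan])
v = pupil_variability_in_aoi(gx, gy, pupil, aoi=(50, 150, 300, 300))
print(v)
```

Other definitions (coefficient of variation, or variability relative to a per-participant baseline) would fit the same slot.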
    Regarding fixation positions on the x-axis, (0, 0) is the top-left corner of the coordinate system and x is expressed as its deviation from the mean value of x. Figure 3 shows that high x values are associated more with residents. This indicates that residents fixated more on the right side of the intersection, which corresponded to the path they had to follow. High lateral head movement (Yaw) and high head acceleration (Accelerometer X) were also associated more with residents than with tourists. This suggests that residents were more vigilant and scanned the area for hazards with rapid lateral head movements, whereas tourists scanned the area less. Similarly, high-intensity head pitch movements were associated more with residents. Tourists, by contrast, pitched their heads with less intensity but more often, possibly to check the surface they were walking on.
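The centring of fixation coordinates described above can be sketched as follows. The text does not state whether the mean is taken per participant or over the whole dataset, so this hypothetical helper simply centres whatever array it receives. The coordinates are invented.

```python
import numpy as np

def centre_fixation_x(fix_x):
    """Express horizontal fixation positions as deviations from their mean.

    With the image origin (0, 0) at the top-left corner, x grows to the right,
    so a positive centred value marks a fixation to the right of the mean position.
    """
    fix_x = np.asarray(fix_x, dtype=float)
    return fix_x - fix_x.mean()

fix_x = [420, 610, 980, 1150, 300]        # hypothetical fixation x-coordinates in pixels
centred = centre_fixation_x(fix_x)        # mean is 692, so values are -272, -82, 288, 458, -392
right_share = (centred > 0).mean()        # share of fixations right of the mean
print(centred, right_share)
```

Under this convention, a feature such as "high centred x" in Figure 3 corresponds directly to "more fixations on the right side of the scene".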
    The dependency plots in Figure 4 provide a drill-down analysis of the summary plot. The bottom chart shows the link between pupil variability and the class variable (resident/tourist on the y-axis). As pupil variability increases, the odds of being a resident pedestrian also increase. This suggests that resident participants have better awareness of the hazard because they attend to and process the hazard AOI more. The top chart shows the interaction between yaw movement and pupil variability, which supports the view that residents are more vigilant and make more lateral head movements. In that chart, high pupil variability (bright red dots, as opposed to blue/purple) co-occurs with lateral head movements more for resident pedestrians than for tourists.

   Figure 4. Dependency plots of lateral head movement (top), coloured by pupil variability, and of pupil variability (bottom), with the class variable's log odds on the y-axis (Tourist above, Resident below). The dotted line separates resident from tourist scores.

5. Discussion and Implications
   Using the SHAP method, these findings provide novel insights into the heterogeneous effects of pupil and head movement on hazard perception. The results suggest that tourists' hazard anticipation needs to be supported. Techniques for alerting pedestrians to imminent threats include auditory, vibration and visual cues. The first two cannot accurately orient pedestrians' attention towards the direction of the risk; visual cues delivered through WARD are therefore recommended.
   Traditional approaches to pedestrian safety, such as infrastructural changes, are expensive and sometimes less desirable. For instance, “look left/right” road markings at pedestrian crossings can warn pedestrians. However, such markings do not exist on all road sections, and they are effective only if pedestrians attend to them. WARD can address these issues through dynamic virtual signs that attract attention, and it could form part of a multi-agent system. Such intelligent multi-agent transportation systems can integrate information and communication technologies with the transportation infrastructure. They provide real-time information to road users through software agents running on the fog/edge of a computer network, offering time-sensitive and location-based services to autonomous vehicles or, in our case, WARD agents [31]. Figure 5 depicts a conceptual model of a multi-agent system that could be used in the scenario presented in this study. WARD agents communicate with intersection agents on fog/edge servers, which store and process real-time data from the WARD agents. The intersection agents use trained ML models to predict the likelihood that a pedestrian will miss a hazardous event, and they issue warnings accordingly. WARD agents provide data collection and visualization functions. The ML models could be trained on the edge/fog using several data sources: the eye-tracking and head-movement observations of all pedestrians, their geolocation, and road-infrastructure scene properties captured by WARD using object detection capabilities, as proposed by [16]. The data can be labeled according to pedestrians' status (expert residents or novice tourists).
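The interaction in Figure 5 can be sketched with two hypothetical classes. The first is a WARD observation message. The second is an intersection agent that stores observations on the edge/fog, runs an inference engine on each one and decides whether to warn the pedestrian. All names, the feature key `pupil_var`, the threshold and the toy model are assumptions. In practice `predict_miss_prob` would wrap a trained classifier rather than the hand-written function used here.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class WardObservation:
    """Hypothetical message from a WARD agent: one window of eye/head features."""
    pedestrian_id: str
    features: Dict[str, float]
    label: str = ""            # "expert" (resident) or "novice" (tourist) when known

@dataclass
class IntersectionAgent:
    """Hypothetical edge/fog agent: stores data, predicts hazard misses, decides on warnings."""
    predict_miss_prob: Callable[[Dict[str, float]], float]   # inference engine (trained model)
    threshold: float = 0.5
    history: List[WardObservation] = field(default_factory=list)

    def receive(self, obs: WardObservation) -> bool:
        self.history.append(obs)                        # kept for later (re)training on the edge/fog
        p_miss = self.predict_miss_prob(obs.features)   # likelihood of missing the hazard
        return p_miss >= self.threshold                 # True means "send a visual warning"

    def training_set(self) -> List[Tuple[Dict[str, float], str]]:
        """Labelled observations that could be used to retrain the ML model."""
        return [(o.features, o.label) for o in self.history if o.label]

def toy_miss_model(features: Dict[str, float]) -> float:
    # Hypothetical: low pupil variability in the hazard AOI -> higher probability of missing it
    return 1.0 / (1.0 + math.exp(4.0 * (features["pupil_var"] - 0.3)))

agent = IntersectionAgent(predict_miss_prob=toy_miss_model, threshold=0.5)
warn_tourist = agent.receive(WardObservation("ped-1", {"pupil_var": 0.1}))
warn_resident = agent.receive(WardObservation("ped-2", {"pupil_var": 0.6}, label="expert"))
print(warn_tourist, warn_resident, agent.training_set())
```

Keeping inference and storage in the same edge/fog agent mirrors the figure. It avoids a round trip to a central server for each decision, which matters for the response-time concern raised in the conclusions.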

      Figure 5. Conceptual model of a multi-agent system for pedestrian hazard perception support: WARD agents worn by expert and novice pedestrians communicate with an Intersection Agent on the Edge/Fog, which comprises data pre-processing, ML model training and an inference engine

6. Conclusions
   This work proposes an analytical method for evaluating pedestrians' hazard perception using mobile eye tracking in naturalistic settings. It is one of the few studies to apply mobile eye tracking in such settings to pedestrian hazard perception. The results support the hypothesis that resident pedestrians attend to more safety-relevant information than tourists. They also highlight the information needs that tourists, compared with residents, must have met to maintain adequate hazard perception.
   The results also provide evidence of several issues regarding tourist pedestrian safety and point to the need for hazard anticipation support using WARD and trained ML models that utilise eye-gaze and head-movement observations. The study proposes assessing pedestrians' hazard perception in naturalistic settings through a multi-agent system and fog/edge computing. Future work includes simulating the proposed multi-agent WARD system architecture in a pedestrian scenario to verify its utility and response-time performance.


References
[1]    WHO, Pedestrian safety: a road safety manual for decision-makers and practitioners. 2013.
[2]    J. Wilks, B. Watson, and I. J. Faulks, “International tourists and road safety in Australia:
       Developing a national research and management programme,” Tour. Manag., vol. 20, no. 5, pp.
       645–654, 1999.
[3]    L. (Don) A. N. Dioko and R. Harrill, “Killed while traveling – Trends in tourism-related
       mortality, injuries, and leading causes of tourist deaths from published English news reports,
       2000–2017 (1H),” Tour. Manag., 2019.
[4]    K. Shimazaki, T. Ito, A. Fujii, and T. Ishida, “Improving drivers’ eye fixation using accident
       scenes of the HazardTouch driver-training tool,” Transp. Res. Part F Traffic Psychol. Behav.,
       vol. 51, pp. 81–87, 2017.
[5]    M. J. Lazaro, M. H. Yun, and S. Kim, “Stress-level and attentional functions of experienced and
       novice young adult drivers in intersection-related hazard situations,” Int. J. Ind. Ergon., vol. 90,
       p. 103315, 2022.
[6]    A. Meir and T. Oron-Gilad, “Understanding complex traffic road scenes: The case of child-
       pedestrians’ hazard perception,” J. Safety Res., 2020.
[7]    FCO, “UK citizens and road accidents abroad,” 2008.
[8]    C. Thompson and M. Sabik, “Allocation of attention in familiar and unfamiliar traffic
       scenarios,” Transp. Res. Part F Traffic Psychol. Behav., 2018.
[9]    H. Tapiro, T. Oron-Gilad, and Y. Parmet, “The effect of environmental distractions on child
       pedestrian’s crossing behavior,” Saf. Sci., 2018.
[10]   E. Papadimitriou, S. Lassarre, and G. Yannis, “Human factors of pedestrian walking and
       crossing behaviour,” Transp. Res. Procedia, vol. 25, pp. 2002–2015, 2017.
[11]   S. Kalantarov, R. Riemer, and T. Oron-Gilad, “Pedestrians’ road crossing decisions and body
       parts’ movements,” Transp. Res. Part F Traffic Psychol. Behav., vol. 53, pp. 155–171, 2018.
[12]   R. Buettner, S. Sauer, C. Maier, and A. Eckhardt, “Real-time Prediction of User Performance
       based on Pupillary Assessment via Eye-Tracking,” AIS Trans. Human-Computer Interact., vol.
       10, no. 2, pp. 26–56, 2018.
[13]   N. Kim, J. Kim, and C. R. Ahn, “Predicting workers’ inattentiveness to struck-by hazards by
       monitoring biosignals during a construction task: A virtual reality experiment,” Adv. Eng.
       Informatics, vol. 49, p. 101359, 2021.
[14]   A. Gregoriades and L. Dimitriou, “Naturalistic analysis of tourist pedestrians’ spatial cognition,”
       in Smart Innovation, Systems and Technologies, 2020, pp. 3–13.
[15]   S. Sabeti, O. Shoghli, M. Baharani, and H. Tabkhi, “Toward AI-enabled augmented reality to
       enhance the safety of highway work zones: Feasibility, requirements, and challenges,” Adv. Eng.
       Informatics, vol. 50, p. 101429, 2021.
[16]   Y. Ghasemi, H. Jeong, S. H. Choi, K.-B. Park, and J. Y. Lee, “Deep learning-based object
       detection in augmented reality: A systematic review,” Comput. Ind., vol. 139, p. 103661, 2022.
[17]   T. S. J. Darwish and K. Abu Bakar, “Fog Based Intelligent Transportation Big Data Analytics
       in The Internet of Vehicles Environment: Motivations, Architecture, Challenges, and Critical
       Issues,” IEEE Access, vol. 6, pp. 15679–15701, 2018.
[18]   J. A. Diego-Mas, D. Garzon-Leal, R. Poveda-Bautista, and J. A. Alcaide-Marzala, “User-
       interfaces layout optimization using eye-tracking, mouse movements and genetic algorithms,”
       Appl. Ergon., vol. 78, pp. 197–209, 2019.
[19]   A. Gregoriades, C. Florides, V. P. Lesta, and M. Pampaka, “Driver Behaviour Analysis through
       Simulation,” in 2013 IEEE International Conference on Systems, Man, and Cybernetics, 2013,
       pp. 3681–3686.
[20]   B. Wallace, “Driver distraction by advertising: genuine risk or urban myth?,” Munic. Eng., vol.
       156, no. 3, pp. 185–190, 2003.
[21]   T. Rosenbloom, R. Mandel, Y. Rosner, and E. Eldror, “Hazard perception test for pedestrians,”
       Accid. Anal. Prev., vol. 79, pp. 160–169, 2015.
[22]   R. L. Charles and J. Nixon, “Measuring mental workload using physiological measures: A
       systematic review,” Appl. Ergon., vol. 74, pp. 221–232, 2019.
[23]   S. Goldinger and M. Papesh, “Pupil dilation reflects the creation and retrieval of memories,”
       Curr. Dir. Psychol. Sci., vol. 21, pp. 90–95, 2012.
[24]   K. Kitazawa and T. Fujiyama, “Pedestrian vision and collision avoidance behaviour:
       Investigation of the Information Process Space of pedestrians using an eye tracker,” in
       Pedestrian and Evacuation Dynamics, 2009, pp. 95–108.
[25]   G. Marquart, C. Cabrall, and J. de Winter, “Review of Eye-related Measures of Drivers’ Mental
       Workload,” Procedia Manuf., vol. 3, pp. 2854–2861, 2015.
[26]   J. Theeuwes, A. Belopolsky, and C. N. L. Olivers, “Interactions between working memory,
       attention and eye movements,” Acta Psychol. (Amst)., 2009.
[27]   B. Strobel, M. A. Lindner, S. Saß, and O. Köller, “Task-irrelevant data impair processing of
       graph reading tasks: An eye tracking study,” Learn. Instr., vol. 55, pp. 139–147, 2018.
[28]   M. A. Just and P. A. Carpenter, “A Theory of Reading: From Eye Fixations to Comprehension,”
       Psychol. Rev., vol. 87, pp. 329–354, 1980.
[29]   J. Hyönä, “The use of eye movements in the study of multimedia learning,” Learn. Instr., vol.
       20, no. 2, pp. 172–176, Apr. 2010.
[30]   F. Muñoz-Leiva, J. Hernández-Méndez, and D. Gómez-Carmona, “Measuring advertising
       effectiveness in Travel 2.0 websites through eye-tracking technology,” Physiol. Behav., 2019.
[31]   R. Mahmud, K. Ramamohanarao, and R. Buyya, “Application Management in Fog Computing
       Environments: A Taxonomy, Review and Future Directions,” ACM Comput. Surv., vol. 53, no.
       4, Jul. 2020.
[32]   T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the
       22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
       2016, pp. 785–794.
[33]   A. Barredo Arrieta et al., “Explainable Artificial Intelligence (XAI): Concepts, taxonomies,
       opportunities and challenges toward responsible AI,” Inf. Fusion, vol. 58, pp. 82–115, 2020.
[34]   S. M. Lundberg et al., “From local explanations to global understanding with explainable AI for
       trees,” Nat. Mach. Intell., vol. 2, no. 1, pp. 56–67, 2020.
[35]   D. Zhang et al., “Research on drivers’ hazard perception in plateau environment based on visual
       characteristics,” Accid. Anal. Prev., vol. 166, p. 106540, 2022.
[36]   L. Ābele, S. Haustein, L. M. Martinussen, and M. Møller, “Improving drivers’ hazard perception
       in pedestrian-related situations based on a short simulator-based intervention,” Transp. Res. Part
       F Traffic Psychol. Behav., vol. 62, pp. 1–10, Apr. 2019.