<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>A Gaze Guidance Method in Autonomous Vehicles through Gamification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mingsong Guo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chun Xie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Itaru Kitahara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Computational Sciences, University of Tsukuba</institution>
          ,
          <addr-line>1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577</addr-line>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Doctoral Program in Empowerment Informatics, University of Tsukuba</institution>
          ,
          <addr-line>1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573</addr-line>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Directing attention to critical areas of the road plays a key role in facilitating drivers' comprehension of their surrounding environment. Prior research has shown that explicit gaze guidance using visual cues can effectively enhance visual focus. This study explores whether incorporating gamification into gaze guidance can further improve attentional direction when observing real-world driving scenarios. We also examine the impact of such guidance on situational awareness. To this end, we developed gamified gaze guidance content tailored for passengers in autonomous vehicles. An experimental study was conducted with thirty participants who viewed recorded driving videos through a head-mounted display while their eye movements were tracked. The results demonstrate that gamified gaze guidance more effectively directs visual attention than conventional methods, and participants who experienced it also demonstrated an improved understanding of the driving environment.</p>
      </abstract>
      <kwd-group>
<kwd>Gaze guidance</kwd>
        <kwd>Autonomous vehicles</kwd>
<kwd>Gamification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Displaying guiding objects has been shown to produce a gaze guidance effect.
Renner et al. found that using Augmented Reality (AR) to show the location of a real object can
reduce the time needed to find it [
        <xref ref-type="bibr" rid="ref3">7</xref>
        ]. Reif et al. reported improvements in accuracy and a reduction
in human errors during warehouse work when significant information, such as item names and
storage locations, was displayed using AR through a Head-Mounted Display (HMD) [
        <xref ref-type="bibr" rid="ref4">8</xref>
        ].
      </p>
      <p>
        However, there are challenges associated with explicit gaze guidance. Studies have shown
that while it can enhance guidance, it may also increase cognitive load by adding extra objects to the
field of view. Moreover, merely displaying gaze guidance objects can lead to boredom and makes it
difficult to sustain a high gaze guidance rate. Furthermore, if target objects are obscured or blocked by the
guiding objects, this can cause distraction and make the guiding information difficult to understand [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">7,
8, 9</xref>
        ]. To solve these problems, we used a gamification approach in this study.
      </p>
      <p>Gamification involves adding game-like elements, such as earning points or leveling up,
to tasks in order to encourage participation and achieve goals more quickly. By integrating a gamified narrative
into the task, we believe a significantly higher gaze guidance rate can be achieved
than by merely displaying gaze guidance objects, as it helps keep participants’ attention focused
longer on the road. In addition, the expected increase in engagement may also enhance
participants’ understanding of their surrounding environment.</p>
      <p>
        Furthermore, Muguro et al. measured the reaction time of the driver to traffic hazards in a driving
simulator [
        <xref ref-type="bibr" rid="ref6">10</xref>
        ]. They found that although reaction time was slightly longer than when
not playing a game, the difference was small and not statistically significant. Interestingly,
the game condition resulted in more stable reaction times, suggesting that players maintained a steady
level of engagement. This indicates that gamification allows the driver to remain engaged during
driving without significantly increasing the reaction time for decision-making.
      </p>
      <p>As illustrated in Figure 1, the purpose of this research is to improve drivers’ comprehension of specific
areas of the surrounding environment by providing gaze guidance to an appropriate area.
To achieve this objective, two experiments were conducted: one to assess the effect of gaze guidance
through gamification in real-world driving videos and another to evaluate the comprehension of
gaze-guided information.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        There are two main ways to realize gaze guidance: implicit and explicit. Implicit gaze guidance
is achieved by altering the color and resolution of the video being viewed. A key
advantage of this method is that it facilitates natural gaze guidance, as users may not realize the
intention behind it. Miyajima et al. utilized this approach in their research on reducing motion
sickness by controlling gaze direction. They found that gaze could be guided without
participants becoming aware of the intention behind the guidance [
        <xref ref-type="bibr" rid="ref7">11</xref>
        ]. In our research, however,
users need to recognize the intention behind the guidance so that they can grasp information at the next
guidance point more quickly. Therefore, we employ explicit rather than implicit gaze
guidance.
      </p>
      <p>
        In contrast to implicit gaze guidance, explicit gaze guidance employs visual cues such as arrows
or pointers. The advantage of this method is that it not only has a strong effect on gaze guidance but
also allows for intentional gaze guidance, where attention is directly guided. Sasamoto et al. applied
this approach in caregiving training. By showcasing the gaze of expert physiotherapists in training
videos, they observed that learners’ intentional gaze increased. Moreover, the learners gained a
deeper understanding of the essential points in the training video [
        <xref ref-type="bibr" rid="ref8">12</xref>
        ].
      </p>
      <p>
        Furthermore, McCay-Peet et al. studied how visual catchiness (saliency) affects user
engagement. Their experiment tested how the visual prominence of important details affects user
engagement, focused attention, and emotional response by comparing two conditions: a
high-salience condition, where text was presented with a large font size, bold, or italicized styling, and a
low-salience condition, where text was displayed in a standard format without bold or italicized
styling. They found that a visually noticeable location helps users process
information faster and more efficiently [
        <xref ref-type="bibr" rid="ref9">13</xref>
        ]. This finding is significant for our research, as the goal
is for participants to quickly notice and comprehend the gaze-guided areas. Therefore, explicit gaze
guidance is used in our research.
      </p>
      <sec id="sec-2-1">
        <title>2.2. Effect of gaze guidance for driving</title>
        <p>
          Some research on gaze guidance has been conducted in the context of autonomous vehicles. Han et
al. investigated the effect of gaze guidance on detecting hazardous situations faster during takeovers
by comparing high- and low-saliency levels. The experiments were conducted in a driving
simulator. They found that gaze guidance with a high saliency level, where
a flashing red bounding box appeared around the side mirror to help with blind spot detection, was
more effective in reducing crashes during takeovers. In contrast, gaze guidance with a low saliency
level, where a static red bounding box was used around the side mirror to assist with blind spot
awareness, was less effective [
          <xref ref-type="bibr" rid="ref10">14</xref>
          ].
        </p>
        <p>
          Similarly, Pomarjanschi et al. used a driving simulator to examine whether gaze guidance can help drivers
avoid pedestrian collisions by measuring reaction times and accident rates. As a result, they found
that gaze guidance not only decreases crashes involving pedestrians but also promotes safer driving
behavior overall [
          <xref ref-type="bibr" rid="ref11">15</xref>
          ]. Therefore, gaze guidance has a positive effect on driving and can help prevent
car accidents.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Gaze guidance with gamification</title>
        <p>
          Several studies have integrated gaze guidance with gamification. Bishop et al. developed a virtual
reality (VR)-based gamification system designed to train young cyclists aged 11 to 14. Participants
rode a virtual bicycle and earned points for looking at specific areas, such as intersections or
approaching cars. The results indicated that participants developed better safety habits [
          <xref ref-type="bibr" rid="ref12">16</xref>
          ]. This
suggests that gamification helps participants stay engaged and is effective for
learning.
        </p>
        <p>
          Muguro et al. investigated how VR/AR-based gamification in autonomous vehicles can maintain
user engagement while ensuring road awareness. In the experiment, participants controlled a paddle
using a joystick to intercept objects while also responding to pop-up traffic hazards, allowing
researchers to measure reaction time, gaze behavior, and cognitive load. The findings show that
gamification helps direct drivers’ visual attention to relevant areas and maintains their engagement and
road awareness [
          <xref ref-type="bibr" rid="ref6">10</xref>
          ].
        </p>
        <p>
          Steinberger et al. sought to reduce driver boredom and enhance focus through gamification. In their system,
points are awarded when the driver’s gaze is focused on road hazards, pedestrians, and other important objects, and
deducted when the driver looks away for too long. This approach proved effective in
decreasing driver boredom and increasing engagement during extended periods of driving [
          <xref ref-type="bibr" rid="ref13">17</xref>
          ]. This
shows that gamification is effective for gaze guidance and can help the driver stay focused on the
road.
        </p>
        <p>
          However, it is worth noting that most of these studies have been conducted in simulated
environments rather than real-world settings [
          <xref ref-type="bibr" rid="ref13 ref14 ref15 ref6">10, 17, 18, 19</xref>
          ]. Additionally, there are no direct studies on how
gaze guidance combined with gamification affects drivers’ understanding of their surrounding
environment. Furthermore, some gamification designs are complicated and present too much information
to drivers, which makes it difficult to track the surrounding environment [
          <xref ref-type="bibr" rid="ref13 ref15 ref16">17, 19, 20</xref>
          ].
        </p>
        <p>Therefore, to address the limitations of simulated environments and excessive complexity in
gamification, our research utilizes real-world recorded driving videos and introduces a
straightforward gamification system. This system is designed to help drivers easily understand their
environment while still providing strong gaze guidance effects.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Gaze guidance using gamification</title>
      <p>As shown in Figure 2, gaze guidance processing with gamification is applied to 360-degree driving
videos on a frame-by-frame basis to guide the passenger’s eye to the intended location. The overall
method is divided into two units: the image processing unit and the human interface unit. The
image processing unit is responsible for analyzing the recorded driving scenes, extracting frames,
and detecting important objects through segmentation. Once a target object, such as a pedestrian,
vehicle, or signboard, is identified, the system places a sphere at the object’s coordinates so that the
intended area of focus is clearly highlighted within the immersive environment.</p>
      <p>The human interface unit builds on this foundation by introducing elements of gamification.
Instead of passively displaying the sphere, the system actively responds to the user’s gaze: when the
gaze aligns with the sphere, it enlarges in size, creating an immediate sense of interaction and
feedback. If the gaze is held long enough, the object disappears, and a new one is generated shortly
after, maintaining continuous engagement. This gamified cycle not only directs visual attention to
critical areas of the road but also prevents fatigue or boredom that may occur with static cues. In the
following subsections, each unit will be explained in detail, highlighting how they work together to
achieve effective and engaging gaze guidance.</p>
      <p>As shown in Figure 3, the image processing unit serves as the initial stage of the gaze guidance
method, managing the preprocessing of recorded 360-degree driving videos. It takes these videos as
input and extracts individual frames. Each frame represents a snapshot of the driving environment,
providing a structured basis for subsequent analysis and object tracking.</p>
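      <p>To make the preprocessing concrete, the following minimal Python sketch illustrates the frame-extraction step with OpenCV; the file names and the sampling interval are illustrative assumptions rather than values from the actual implementation.</p>
      <preformat>
# Minimal sketch of the frame-extraction step (illustrative file names and sampling rate).
import os
import cv2

def extract_frames(video_path, step=1):
    """Yield (frame_index, BGR image) pairs from an equirectangular driving video."""
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:              # end of video
            break
        if index % step == 0:   # optionally subsample frames
            yield index, frame
        index += 1
    capture.release()

os.makedirs("frames", exist_ok=True)
for i, frame in extract_frames("driving_360.mp4", step=30):
    cv2.imwrite(f"frames/frame_{i:05d}.png", frame)
      </preformat>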
      <p>Once the frames are extracted, the next step involves selecting an object for the user to focus on
in each frame. Object selection, such as for pedestrians and road signs, is performed using image
segmentation, which divides the visual data into distinct regions based on shared characteristics
such as color, texture, or shape at the pixel level. This process will be discussed in more detail in the
implementation section. The segmentation process isolates the target object from its surrounding
environment, ensuring that the gaze guidance system can accurately track the object’s location
across consecutive frames.</p>
      <p>After the target object is identified and segmented, the system proceeds to extract the object’s
coordinates from each frame. These coordinates serve as spatial reference points necessary for
placing the gaze guidance object within the video. Initially, the coordinates are extracted in
two-dimensional Cartesian form. Once all the object coordinates are obtained, the system converts them into
polar coordinates to match the 360-degree display environment used in the HMD.</p>
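      <p>The conversion itself is the standard equirectangular mapping. The sketch below shows one way it could be written in Python; the zero-azimuth direction and the sign conventions are assumptions that would have to match the capture rig and the orientation of the video sphere in Unity.</p>
      <preformat>
# Sketch of mapping equirectangular pixel coordinates to polar (spherical) coordinates.
import math

def pixel_to_polar(x, y, width, height):
    """Map a pixel in an equirectangular frame to (azimuth, elevation) in radians.

    Assumes the frame spans 360 degrees horizontally and 180 degrees vertically;
    the reference direction and signs depend on how the video sphere is oriented in Unity.
    """
    azimuth = (x / width - 0.5) * 2.0 * math.pi    # range -pi .. +pi
    elevation = (0.5 - y / height) * math.pi       # range -pi/2 .. +pi/2
    return azimuth, elevation

def polar_to_unit_vector(azimuth, elevation):
    """Direction on the unit sphere at which the guidance sphere can be placed."""
    cx = math.cos(elevation) * math.sin(azimuth)
    cy = math.sin(elevation)
    cz = math.cos(elevation) * math.cos(azimuth)
    return cx, cy, cz
      </preformat>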
      <p>To guide the user’s gaze, the system places a visually distinct sphere at these polar coordinates,
ensuring that the object is easily noticeable within the immersive 360-degree environment. The
sphere’s placement is carefully calibrated to match the object’s position across frames, maintaining
consistency as the video plays. By the end of the image processing unit, 360-degree video with a gaze
guidance object is generated as the output.</p>
      <p>As shown in Figure 4, the human interface unit is responsible for interacting with the user’s gaze
behavior. The user’s gaze information is obtained from the HMD in polar coordinates. When the
user’s gaze aligns with the coordinates of the gaze guidance object, the object undergoes a visual
transformation by enlarging. The scale of this enlargement increases linearly according to the
following equation:</p>
      <p>Scale(t) = 1 + (t / T) × (S_max − 1),   (1)
where t represents the duration for which the user has maintained their gaze on the sphere, T is the required
gaze duration for the sphere to disappear, and S_max is the maximum scale of the sphere. In this
experiment, the values are set to T = 0.3 second and S_max = 2. When the user continuously gazes at
the sphere for one second, the sphere disappears. If the user’s gaze moves away from the sphere’s
coordinates, the unit resets and starts counting from the beginning again. A few seconds after the
sphere disappears, a new sphere is generated at a different location. This approach effectively directs
the user’s attention by strategically placing gaze guidance objects within the driving videos.</p>
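      <p>A compact sketch of this interaction loop is given below. The constants follow Equation (1) and the dwell times stated above, but the exact relationship between the 0.3-second scaling parameter and the one-second dwell before disappearance is our reading of the text, so the values should be treated as illustrative.</p>
      <preformat>
# Sketch of the gamified gaze interaction, updated once per video frame.
T_SCALE = 0.3        # gaze duration over which the sphere grows, per Equation (1) [s]
S_MAX = 2.0          # maximum scale of the sphere
DWELL_TO_HIDE = 1.0  # continuous gaze needed before the sphere disappears [s]

def sphere_scale(t):
    """Equation (1): Scale(t) = 1 + (t / T) * (S_max - 1), clamped at the maximum scale."""
    t = min(t, T_SCALE)
    return 1.0 + (t / T_SCALE) * (S_MAX - 1.0)

def update(gaze_on_sphere, dwell, dt):
    """One per-frame update: returns the new dwell time, sphere scale, and a disappear flag."""
    dwell = dwell + dt if gaze_on_sphere else 0.0   # reset when the gaze leaves the sphere
    scale = sphere_scale(dwell)
    disappear = dwell >= DWELL_TO_HIDE              # a new sphere is spawned a few seconds later
    return dwell, scale, disappear
      </preformat>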
    </sec>
    <sec id="sec-4">
      <title>4. Implementation</title>
      <sec id="sec-4-1">
        <title>4.1. Generating gaze guidance videos</title>
        <p>
          This section provides a detailed explanation of how gaze guidance videos are generated. The Segment
Anything Model 2 (SAM2) [
          <xref ref-type="bibr" rid="ref17">21</xref>
          ], developed by Meta for image segmentation, is utilized to extract the
coordinates of selected objects from driving videos. The chosen objects—such as pedestrians, cars, or
signboards—were manually identified. Once the coordinates are converted from two-dimensional
Cartesian form to polar coordinates, a red sphere is placed at the corresponding location in
Unity. When the driving videos are played in Unity, the red sphere moves along with the video,
following the gaze guidance location to direct the user’s focus. When the user’s gaze coordinates,
extracted from the HMD, match this object, the sphere begins to grow. After one second of staring
at the object, it disappears.
        </p>
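        <p>The sketch below outlines how per-frame object coordinates could be derived from the segmentation masks. The segment_object() helper is hypothetical and stands in for SAM2’s video predictor, whose actual API calls are not reproduced here; pixel_to_polar() refers to the equirectangular mapping sketched in Section 3.</p>
        <preformat>
# Sketch of per-frame coordinate extraction from segmentation masks.
# segment_object(frame) is a hypothetical wrapper around SAM2 that is assumed to return
# a boolean mask for the manually selected object (pedestrian, car, or signboard).
import numpy as np

def mask_centroid(mask):
    """Return the (x, y) pixel centroid of a boolean mask, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

def guidance_track(frames, segment_object, width, height):
    """Per-frame polar coordinates of the target, used to place the red sphere in Unity."""
    track = []
    for frame in frames:
        center = mask_centroid(segment_object(frame))
        if center is not None:
            # pixel_to_polar() is the equirectangular mapping sketched in Section 3.
            track.append(pixel_to_polar(center[0], center[1], width, height))
    return track
        </preformat>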
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Video data for the experiment</title>
        <p>For this research experiment, three different types of videos are used, which are outlined below:</p>
        <list list-type="bullet">
          <list-item><p>Original 360-degree driving videos</p></list-item>
          <list-item><p>360-degree driving videos + gaze guidance object</p></list-item>
          <list-item><p>360-degree driving videos + gaze guidance object + gamification feature</p></list-item>
        </list>
        <p>
          The first video is the original recorded driving scene, specifically filmed in New Orleans [
          <xref ref-type="bibr" rid="ref18">22</xref>
          ]. The
second video incorporates the gaze guidance object—a semitransparent red sphere—added to the
original video. In this video, the red sphere moves to track specific objects from the footage. The
third video builds upon the second by adding a gamification feature; when the viewer’s gaze aligns
with the red sphere, it enlarges and eventually disappears. These are the three videos that are used
in the experiments.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiment</title>
      <sec id="sec-5-1">
        <title>5.1. Gaze guidance by gamification</title>
        <p>The objective of the first experiment was to evaluate the effect of gaze guidance through gamification
in real-world driving videos. Three types of videos are used, as described in the previous
section: original videos, videos with gaze guidance objects (no gamification features), and videos
with gaze guidance objects plus gamification features. In addition, three driving scenes were
prepared: turning right, stopping at a red light, and driving straight. There are therefore nine videos
in total (3 original, 3 without gamification, and 3 with gamification). Table 1
provides a quick layout of the videos. Each participant views three different types of videos, each
depicting different driving scenes. Each video lasts approximately 20 to 30 seconds, with a resolution
of 3840 pixels × 2160 pixels at 30 frames per second (fps). The videos are presented in the following
order: original, without gamification, and then with gamification. Each participant’s eye gaze
information is recorded while watching the video.</p>
        <p>Two experiments are conducted in this research, both with 30 participants.
All participants had normal eyesight and normal color vision. A Meta Quest Pro was used in
these experiments, which allows the participants’ gaze to be tracked. Figure 5 shows
the experiment scene with the HMD.</p>
        <p>Table 2 and Figure 6 show the average gaze guidance rate for each video type and
scene. The gaze guidance rate is the proportion of video frames in which the participant focused on the
gaze-guided object. The results indicate that the original videos have the lowest gaze
guidance rate, while videos featuring red spheres demonstrate a significantly higher gaze guidance
rate. Furthermore, the combination of red spheres and gamification results in an even higher gaze
guidance rate, suggesting that gamification has a substantial impact on guiding participants’ gaze. A
Kruskal-Wallis test was conducted to determine if there were significant differences among the
groups. The analysis revealed a significant difference (p &lt; 0.001) among the different types of videos
within the same scene.</p>
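        <p>For reference, the sketch below shows one way such a gaze guidance rate and the Kruskal-Wallis test could be computed with NumPy and SciPy; the angular hit threshold and the sample values are purely illustrative assumptions, not data from the experiment.</p>
        <preformat>
# Illustrative computation of a gaze guidance rate and a Kruskal-Wallis test.
import numpy as np
from scipy import stats

def gaze_guidance_rate(gaze_angles, target_angles, threshold_rad=0.1):
    """Fraction of frames in which the gaze lies within an angular threshold of the target.

    gaze_angles and target_angles are per-frame (azimuth, elevation) pairs in radians;
    azimuth wrap-around is ignored for brevity, and the threshold is an assumed value.
    """
    diff = np.abs(np.asarray(gaze_angles) - np.asarray(target_angles))
    hits = np.all(diff &lt;= threshold_rad, axis=1)
    return float(hits.mean())

# Hypothetical per-participant rates for the three conditions in one scene.
original = [0.05, 0.08, 0.04, 0.06]
sphere_only = [0.41, 0.38, 0.45, 0.40]
gamified = [0.63, 0.70, 0.66, 0.68]

statistic, p_value = stats.kruskal(original, sphere_only, gamified)
print(f"Kruskal-Wallis H = {statistic:.2f}, p = {p_value:.4f}")
        </preformat>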
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Comprehension of gaze guidance information</title>
        <p>The second experiment aimed to investigate whether users can enhance their understanding of their
surrounding environment through gaze guidance. In this experiment, participants were asked to watch
only two videos: an original video and a gamified video. Both videos show straight-driving scenes. Each video lasts
approximately 10 to 15 seconds, with a resolution of 3840 pixels × 2160 pixels at 30 fps. The order in
which the videos were presented was randomized (either the original video first or the gamified
video first). Figure 7 shows scenes from the gamification videos. After watching each video,
participants answer three questions about the video content. Examples of these questions are as
follows:</p>
        <list list-type="bullet">
          <list-item><p>On the road to the left, there is a signboard with a single letter written on it. What is the letter?
Multiple choice: F, G, H, J</p></list-item>
          <list-item><p>On the left, there are two people: one person is dancing. What is the other person doing?
Multiple choice: sleeping, playing guitar, painting a picture, watching a phone.</p></list-item>
        </list>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Results and analysis of the comprehension experiment</title>
        <p>Table 3 shows the average accuracy rate and standard deviation for each video. The results
indicate that the gamified videos achieve a higher average accuracy rate compared to the original
videos. Table 4 and Table 5 illustrate the varying order in which participants viewed the videos. Table
4 shows results when the original videos are viewed first, while Table 5 shows results when the
gamified videos are viewed first. From these two tables, it can be observed that videos watched later
tend to have a higher accuracy rate. This phenomenon is likely due to a learning effect; after the first
video, participants understood the question format and knew what details to attend to in the second
video. Nevertheless, even after accounting for this learning effect, the gamified videos consistently
show a higher accuracy rate. This finding suggests that gamification effectively aids participants in
deepening their understanding of their surrounding environment.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This research aims to improve the understanding of the surrounding environment for drivers by
providing gaze guidance to an appropriate area and specific location. Two experiments were
conducted to achieve this goal using a video with gamification features for intentional gaze guidance.
The first experiment explored the impact of gaze guidance through gamification in real-world
driving scenarios. The second experiment examined whether this gaze guidance helps users gain a
deeper comprehension of their surroundings. The results show that gamification significantly improves gaze
guidance compared to conventional explicit gaze guidance, even in real-world videos.
Additionally, gamification was shown to enhance drivers’ awareness of their environment by
guiding their gaze to a specific location. In conclusion, the findings indicate that gamification enables
intentional gaze guidance to a specific location and effectively improves drivers’
understanding of their surroundings.</p>
      <p>One limitation of this study is that in Experiment 1, the condition order was fixed (original,
without gamification, with gamification), which may have introduced a learning effect as participants
became increasingly familiar with the driving scenes. Although Experiment 2 randomized the order
and confirmed the superiority of the gamified condition regardless of sequence, future work should
adopt a fully counterbalanced design to eliminate potential order effects and further strengthen the
validity of the results.</p>
      <sec id="sec-6-1">
        <title>Acknowledgements</title>
        <p>Special thanks are extended to Hitesh Pandya, Tiago Rodrigues, and Bruno Coelho (Capgemini Japan
K.K., Engineering and R&amp;D Services) and to Chokiu Leung (University of Tsukuba, Institute of Systems
and Information Engineering) for their valuable advice and insightful feedback throughout the
research. This research received partial funding from JSPS Grants-in-Aid for Scientific Research
(Grant Numbers 24K02978 and 25K03146).</p>
      </sec>
      <sec id="sec-6-2">
        <title>Declaration on Generative AI</title>
        <p>During the preparation of this work, the author(s) used Grammarly in order to check grammar and
spelling. After using this tool, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>California Department of Motor Vehicles</string-name>
          ,
          <article-title>"California DMV Approves Mercedes-Benz Automated Driving System for Certain Highways and Conditions," 8 June 2023</article-title>
          . [Online]. Available: https://www.dmv.ca.gov/portal/news-and
          <article-title>-media/california-dmv-approves-mercedes-benzautomated-driving-system-for-certain-highways-and-conditions/</article-title>
          .
          <source>[Accessed 20 January</source>
          <year>2025</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref-b3">
        <mixed-citation>[3] P. Lindemann, T.-Y. Lee and G. Rigoll, "Supporting Driver Situation Awareness for Autonomous Urban Driving with an Augmented-Reality Windshield Display," in IEEE International Conference on Intelligent Transportation Systems (ITSC), Hawaii, 2018.</mixed-citation>
      </ref>
      <ref id="ref-b4">
        <mixed-citation>[4] R. Lin, L. Ma and W. Zhang, "An interview study exploring Tesla drivers' behavioral adaptation," Applied Ergonomics (Elsevier), vol. 72, pp. 37-47, 2018.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [5]
          <string-name>National Transportation Safety Board</string-name>
          ,
          <article-title>"</article-title>
          <source>Preliminary Report Highway HWY18MH010," NTSB</source>
          , Washington,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref-b6">
        <mixed-citation>[6] H. Somerville and D. Shepardson, "Uber car's 'safety' driver streamed TV show before fatal crash -police," Reuters, 23 June 2018. [Online]. Available: https://www.reuters.com/article/world/ubercar-s-safety-driver-streamed-tv-show-before-fatal-crash-police-idUSKBN1JI0LB/. [Accessed 20 August 2024].</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Renner</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <article-title>"AR-Glasses-Based Attention Guiding for Complex Environments: Requirements, Classification, and Evaluation," in 13th ACM International Conference on Pervasive Technologies Related to Assistive Environments (PETRA</article-title>
          <year>2020</year>
          ), Corfu,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Reif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Günthner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schwerdtfeger</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Klinker</surname>
          </string-name>
          ,
          <article-title>"Pick-by-Vision comes on Age: Evaluation of an Augmented Reality Supported Picking System in a Real Storage Environment,"</article-title>
          <source>in 6th ACM International Conference on Computer Graphics</source>
          , Virtual Reality,
          <source>Visualisation and Interaction in Africa (Afrigraph</source>
          <year>2009</year>
          ), Afrigraph,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Volmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baumeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Itzstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bornkessel-Schlesewsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schlesewsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Billinghurst</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <article-title>"A Comparison of Predictive Spatial Augmented Reality Cues for Procedural Tasks,"</article-title>
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>2846</fpage>
          -
          <lpage>2856</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Muguro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Laksono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sasatake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Matsushita</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sasaki</surname>
          </string-name>
          ,
          <article-title>"User Monitoring in Autonomous Driving System Using Gamified Task: A Case for VR/AR In-Car Gaming," Multimodel Technologies and Interaction (MDPI)</article-title>
          , vol.
          <volume>5</volume>
          , no.
          <issue>40</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miyajima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xie</surname>
          </string-name>
          and
          <string-name>
            <surname>I. Kitahara</surname>
          </string-name>
          ,
          <article-title>"Video Generation Method Unconsciously Gaze-Guiding for a Passenger on Autonomous Vehicle with Controlling Color and Resolution,"</article-title>
          <source>in AHFE International Conference</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sasamoto</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kanai</surname>
          </string-name>
          ,
          <article-title>"Development of Short-Term Transfer Training Program for Patients and Caregiver and Its Effect -The Intervention Based on the Teaching Video and Practical Guidance Designed by a Physiotherapist's Gaze Measurement-,"</article-title>
          <source>Japanese Journal of Occupational Medicine and Traumatology</source>
          , vol.
          <volume>68</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>McCay-Peet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lalmas</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Navalpakkam</surname>
          </string-name>
          ,
          <article-title>"On Saliency, Affect and Focused Attention,"</article-title>
          <source>in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI
          <year>2012</year>
          ), Austin,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Romo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Horrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tilbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Robert</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <article-title>"Supporting Driver Attention Toward Potential Hazards During Takeover: A Preliminary Result,"</article-title>
          <source>in Proceedings of the Human Factors and Ergonomics Society Annual Meeting (HFES2024)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pomarjanschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorr</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Barth</surname>
          </string-name>
          ,
          <article-title>"Gaze Guidance Reduces the Number of Collisions with Pedestrians in a Driving Simulator,"</article-title>
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS)</source>
          , vol.
          <volume>1</volume>
          , no.
          <issue>2</issue>
          , p.
          <source>Article</source>
          <volume>8</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Daylamani-Zad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dkaidek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fukaya</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Broadbent</surname>
          </string-name>
          ,
          <article-title>"A Brief Gamified Immersive Intervention to Improve 11-14-Year-Olds' Cycling-Related Looking Behaviour and Situation Awareness: A School-Based Pilot Study,"</article-title>
          <source>Transportation Research Part F: Psychology and Behaviour</source>
          , vol.
          <volume>97</volume>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schroeter</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. N.</given-names>
            <surname>Watling</surname>
          </string-name>
          ,
          <article-title>"From Road Distraction to Safe Driving: Evaluating the Effects of Boredom and Gamification on Driving Behaviour, Physiological Arousal, and Subjective Experience," Computers in Human Behavior (Elsevier)</article-title>
          , vol.
          <volume>75</volume>
          , pp.
          <fpage>714</fpage>
          -
          <lpage>726</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S. D. E.</given-names>
            <surname>Maurer</surname>
          </string-name>
          ,
          <article-title>"Guardian Angel: A Driver-Vehicle Interaction for Oversteering the Driver in a Highly Automated Vehicle</article-title>
          ," Ulm University, Ulm,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukhopadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Tatyarao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M. C.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Subin</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <article-title>"A Comparison Study Between XR Interfaces for Driver Assistance in Take Over Request," Transportation Engineering (Elsevier)</article-title>
          , vol.
          <volume>11</volume>
          , p.
          <fpage>100159</fpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>"Dangerous Slime: A Game for Improving Situation Awareness in Automated Driving,"</article-title>
          <source>in Proceedings of the 15th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI</source>
          <year>2023</year>
          ), Ingolstadt,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gabeur</surname>
          </string-name>
          , Y.-T. Hu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ryali</surname>
          </string-name>
          , T. Ma, H. Khedr,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rädle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rolland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gustafson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mintun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. V.</given-names>
            <surname>Alwala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollar</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Feichtenhofer</surname>
          </string-name>
          ,
          <article-title>"SAM 2: Segment Anything in Images and Videos,"</article-title>
          <source>Meta, 28 October</source>
          <year>2024</year>
          . [Online]. Available: https://arxiv.org/abs/2408.00714.
          <source>[Accessed 23 October</source>
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [22]
          <string-name>
            <surname>U. J</surname>
          </string-name>
          ,
          <article-title>"</article-title>
          <source>New Orleans 8K - VR 360° Drive - 60FPS," YouTube</source>
          , 14 May
          <year>2019</year>
          . [Online]. Available: https://www.youtube.com/watch?v=bSV8qc2
          <source>_qFs. [Accessed 11 July</source>
          <year>2024</year>
          ].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>