
Prediction of visual memorability with EEG signals: A comparative study. Sensors

dataset: A P300

Rachelle Hamelink

Radboud University, Netherlands

CEUR Workshop Proceedings, CEUR-WS.org

2023


In modern media, and social media in particular, most advertisements now consist of very short videos lasting only a few seconds. This has increased interest in how recognizable and memorable visual media are. The present study analyzed event-related potentials (ERPs) across 1000 trials of the Memento10k dataset to explore neurophysiological differences between videos that are later remembered and videos that are not remembered when shown again to the same subject. The posterior brain region was analyzed across three channels and the right temporal cortex was analyzed in one channel. In the visual cortex, a significant difference in amplitude (p < .05) between Remembered and Not Remembered videos was found in the 340-408 ms window after onset, as well as around the 476 ms timepoint, across all trials, channels and participants. In the right temporal cortex, a significant difference in amplitude (p < .05) was observed in the 306-816 ms window. These results suggest, in line with previous literature, that a stronger P300 component can be found for Remembered videos than for Not Remembered videos in the right temporal lobe. In the visual cortex, the opposite effect was found, as a higher positivity was observed for Not Remembered videos.

INTRODUCTION

RELATED WORK

The posterior brain regions correspond to the visual cortex and lie in close proximity to the hippocampus. These structures are important for the processing of visual input and for initial consolidation into memory systems, respectively. The right temporal lobe of the brain is involved in learning and remembering non-verbal information, such as music and videos.

Polich [11] characterizes the P300 component by its amplitude and latency. Amplitude is defined as the difference between the mean baseline voltage and the largest positive peak of the ERP waveform within a time window, which in this study is 300-1000 ms after stimulus onset, based on [5]. Latency is defined as the time in ms from stimulus onset to the point of maximum positive amplitude within the same 300-1000 ms window. Analyzing ERP amplitudes is an interesting method for studying declarative memory, as it has shown that differences during memory formation can already be observed that later predict memorability [5]. Classic declarative memory studies would have to rely on a consolidation period and possible sleep factors in order to measure successful memorization of the stimulus input.
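The amplitude and latency definitions above can be illustrated with a minimal sketch. It assumes a single averaged ERP waveform stored as parallel lists of sample times (ms, with stimulus onset at 0) and voltages, and it assumes that the baseline is the mean of all pre-stimulus samples; the function name `p300_peak` and its parameters are hypothetical and are not taken from the task pipeline.

```python
from statistics import mean


def p300_peak(times_ms, amplitudes, baseline_end_ms=0.0, window_ms=(300.0, 1000.0)):
    """Return (peak amplitude relative to baseline, peak latency in ms).

    Assumption: samples with time < baseline_end_ms form the baseline.
    """
    baseline = [a for t, a in zip(times_ms, amplitudes) if t < baseline_end_ms]
    if not baseline:
        raise ValueError("no pre-stimulus samples available for the baseline")
    baseline_mean = mean(baseline)

    # Keep only samples inside the analysis window (inclusive bounds).
    in_window = [
        (a, t) for t, a in zip(times_ms, amplitudes)
        if window_ms[0] <= t <= window_ms[1]
    ]
    if not in_window:
        raise ValueError("no samples fall inside the analysis window")

    # Largest positive deflection in the window; the first one wins on ties.
    peak_amp, peak_time = max(in_window, key=lambda pair: pair[0])
    return peak_amp - baseline_mean, peak_time


# Toy waveform for illustration only (not data from the study).
times = [-100, -50, 0, 100, 200, 300, 400, 500, 600]
volts = [0.5, 1.5, 2.0, 3.0, 1.0, 4.0, 7.0, 5.0, 2.0]
amplitude, latency = p300_peak(times, volts)
```

In this toy example the early deflection at 100 ms is ignored because it lies outside the 300-1000 ms window, which is the point of restricting the peak search to that window.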

This study aims to investigate whether this P300 effect can be observed for videos of the Memento10k dataset. The aim is to determine whether a significant P300 difference can be found between Remembered and Not Remembered videos in the encoding phase of the old/new paradigm in this experiment. These data are then used to analyze whether a P300 effect in the encoding phase of an old/new paradigm can predict successful video memorability. ERPs rely on the onset of a stimulus to observe a difference in brain function. EEG recordings were collected for the first second of the videos [2]. Therefore, as it can be argued that a one-second video does not differ substantially from a still image, the main hypothesis in this paper is that a significant P300 component can be observed within the 300-1000 ms time window in the data.

APPROACH

Participants, materials and procedure

The participants, materials and procedure of the Predicting Video Memorability task data collection can be found in the overview paper [2]. For this analysis, the posterior brain region or visual cortex was analyzed by means of the EEG cap channels Oz, O1 and O2. In addition, the right temporal lobe was analyzed by means of the EEG cap channel P8. Both regions were selected based on previous results using MEG by Osipova et al. [5]. The posterior brain regions correspond to the visual cortex, which is mostly activated when processing multimedia input and lies in close proximity to the hippocampus, a key area for memorization. The right temporal lobe is considered to be involved in learning and remembering nonverbal stimuli, such as music and videos. The two areas were analyzed separately to observe a difference between Remembered and Not Remembered videos. Participants had to indicate on second viewing whether they remembered the presented video or not. Only trials that included a Memorability Score for that video were included in the final analysis.

Data cleaning and statistical analysis

Preprocessing steps were undertaken to create the final dataset [2]. In addition to the preprocessing steps described in the overview paper of the task, outlier amplitudes across all participants, the four included channels and all trials were removed if they fell more than 1.5 standard deviations below the first quartile or above the third quartile of the amplitude data. A linear mixed effects model with Memorability Score, Channel and Timepoint as fixed effects and Subject as random effect was used to analyze the dataset in the visual cortex. An additional linear mixed effects model with Memorability Score and Timepoint as fixed effects and Subject as random effect was used to analyze the dataset in the right temporal cortex. All data processing, plotting and analysis were performed using RStudio [12].
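The outlier criterion described above can be sketched as follows. This is an illustrative Python reading of the rule, not the original analysis code (which was written in R). The function name `remove_outliers`, the `spread` switch and the use of the default quartile method of `statistics.quantiles` are assumptions. The `"iqr"` option is included only for comparison with the more common Tukey fence, which uses the interquartile range instead of the standard deviation.

```python
from statistics import quantiles, stdev


def remove_outliers(amplitudes, k=1.5, spread="sd"):
    """Keep values within [Q1 - k * width, Q3 + k * width].

    spread="sd":  width is the sample standard deviation (as described in the text).
    spread="iqr": width is Q3 - Q1 (Tukey's fence, shown for comparison only).
    """
    q1, _median, q3 = quantiles(amplitudes, n=4)
    if spread == "sd":
        width = stdev(amplitudes)
    elif spread == "iqr":
        width = q3 - q1
    else:
        raise ValueError("spread must be 'sd' or 'iqr'")

    lower, upper = q1 - k * width, q3 + k * width
    return [a for a in amplitudes if lower <= a <= upper]


# Toy amplitudes for illustration only; 100 is an obvious artifact.
toy = [1, 2, 3, 4, 5, 6, 7, 8, 100]
cleaned = remove_outliers(toy)
```

Because a single extreme value inflates the standard deviation, the `"sd"` fence is generally wider than the `"iqr"` fence, so it tends to remove fewer points.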

RESULTS AND ANALYSIS

For the visual cortex (channels Oz, O1 and O2), a linear mixed effects model with fixed effects Timepoint, Channel and Memorability Score and Subject as random factor was fitted to the ERP amplitude data. Table 1 presents the significant Timepoint and Memorability Score interactions found in the data, indicating that the time window of ~340 ms to ~408 ms and the Timepoint at ~476 ms showed a significant difference in ERPs between Remembered and Not Remembered videos across all trials and participants. These differences in ERP amplitudes can be observed in Figure 1, with the largest differences at Timepoint 10 (~374 ms) and Timepoint 13 (~476 ms). All other Timepoints showed no significant difference between Remembered and Not Remembered videos (all p > 0.05).
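One way to write down a model of this kind is given below. The notation is an assumption introduced here for illustration: $A_{s,c,t,i}$ is the amplitude for subject $s$, channel $c$, Timepoint $t$ and trial $i$, and $M_{s,i}$ is taken to be an indicator that is 1 for Remembered and 0 for Not Remembered videos. The source specifies only the fixed effects and Subject as random effect, so the random-intercept structure and the exact set of interaction terms shown are a plausible reading rather than the fitted specification.

```latex
A_{s,c,t,i} = \beta_0
            + \beta^{M} M_{s,i}
            + \beta^{C}_{c}
            + \beta^{T}_{t}
            + \beta^{MT}_{t} M_{s,i}
            + u_s
            + \varepsilon_{s,c,t,i},
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{s,c,t,i} \sim \mathcal{N}(0, \sigma^2)
```

Under this reading, each reported Memorability Score × Timepoint interaction corresponds to a coefficient $\beta^{MT}_{t}$ that differs significantly from zero, and the right temporal model is the same expression without the channel term $\beta^{C}_{c}$.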

For the right temporal cortex (channel P8), a linear mixed effects model with fixed effects Timepoint and Memorability Score and Subject as random factor was fitted to the ERP amplitude data. Table 2 presents the significant Timepoint and Memorability Score interactions found in the data, indicating that the time window of ~306 ms to ~816 ms showed a significant difference in ERPs between Remembered and Not Remembered videos across all trials and participants. These differences in ERP amplitudes can be observed in Figure 2, with the largest difference at Timepoint 13 (~476 ms). All other Timepoints showed no significant difference between Remembered and Not Remembered videos (all p > 0.05).

Table 2: Significant Memorability Score × Timepoint interactions in the right temporal cortex

Memorability Score × Timepoint 8 (~306 ms)
Memorability Score × Timepoint 9 (~340 ms)
Memorability Score × Timepoint 10 (~374 ms)
Memorability Score × Timepoint 11 (~408 ms)
Memorability Score × Timepoint 12 (~442 ms)
Memorability Score × Timepoint 13 (~476 ms)
Memorability Score × Timepoint 14 (~510 ms)
Memorability Score × Timepoint 15 (~544 ms)
Memorability Score × Timepoint 16 (~578 ms)
Memorability Score × Timepoint 17 (~612 ms)
Memorability Score × Timepoint 18 (~646 ms)
Memorability Score × Timepoint 19 (~680 ms)
Memorability Score × Timepoint 20 (~714 ms)
Memorability Score × Timepoint 21 (~748 ms)
Memorability Score × Timepoint 22 (~782 ms)
Memorability Score × Timepoint 23 (~816 ms)
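The approximate latencies quoted for the Timepoint indices (for example Timepoint 8 at ~306 ms and Timepoint 23 at ~816 ms) are consistent with a constant spacing of about 34 ms. The small helper below makes that mapping explicit as a reading aid. The 34 ms step and the offset of one are inferred from the reported values, not taken from the recording specification, and the names `SAMPLE_STEP_MS`, `timepoint_to_ms` and `window_ms` are hypothetical.

```python
# Spacing between consecutive Timepoints in ms; inferred from the reported
# latencies (306, 340, 374, ...), not from the recording specification.
SAMPLE_STEP_MS = 34


def timepoint_to_ms(timepoint):
    """Approximate latency after onset for a Timepoint index (assumed mapping)."""
    return SAMPLE_STEP_MS * (timepoint + 1)


def window_ms(first_timepoint, last_timepoint):
    """Approximate (start, end) latency in ms for a range of Timepoint indices."""
    return timepoint_to_ms(first_timepoint), timepoint_to_ms(last_timepoint)


visual_window = window_ms(9, 11)     # significant visual cortex window
temporal_window = window_ms(8, 23)   # significant right temporal window
```

Since the source marks all latencies with "~", the values produced by this mapping should be read as approximate.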

CONCLUSIONS

The aim of this study was to investigate whether a difference in ERP amplitude could be observed within the first second after onset of 1000 three-second videos in the Memento10k dataset, between videos that participants would later indicate as Remembered versus Not Remembered. This was done as part of the MediaEval 2022 Predicting Video Memorability task [2]. A significant difference in amplitude was found at four Timepoints (340 ms, 374 ms, 408 ms and 476 ms) in the posterior brain region. This difference was in the opposite direction to the hypothesis based on [5], which predicted a greater positivity for Remembered videos than for Not Remembered videos: the ERP amplitudes were less positive for Remembered videos than for Not Remembered videos. In addition, a significant difference in amplitude was observed in the right temporal lobe in the 306-816 ms time window. The right temporal results are in line with earlier studies [5][7][8][9] that have observed a greater positive difference in ERP amplitude from around 300 ms up to one second after stimulus onset. The results from this project add evidence suggesting that successful later remembrance of a video can be predicted by looking for a P300 component after the onset of the video in the right temporal lobe. In the visual cortex, a positive peak was observed for the Not Remembered videos. Future studies could explore the interaction between this positivity and negativity around the 300 ms time window after onset of a video.

This study adds to the body of literature by providing evidence for the importance of a P300 component for memorability beyond the scope of image research. In addition, the paradigm in which the videos were presented to the participants can be argued to mimic how a person would normally perceive visual input via, e.g., social media. Most media platforms nowadays consist of an endless stream of very short videos, and only certain videos grab and hold the attention of the viewer beyond the point of retrieval. This is in line with the increase in interest described in the introduction with regard to predicting video memorability in modern media.

However, further studies are needed to investigate how video memorability relates to the image literature, which is beyond the scope of this paper. As only neurophysiological data of the first second of the videos was recorded, a limitation of this study is that it cannot draw bridging conclusions between the literature on image memorability and video memorability. Even though the dataset consists of moving pictures, it can be argued that one second of viewing is not extensive enough to compare video and image media. Recordings of the entire video could provide more insight into where the spark of activation occurs over time and potentially over the biological structure of the brain. Future studies would have to indicate whether other brain regions such as those described in [7][8][9] (e.g., anterior prefrontal cortex, parietal cortex and the medial-frontal areas) show importance during a video viewing task and thus extend this body of knowledge beyond the scope of image research. It could be interesting to combine the behavioral data and EEG data in this study with diffusion magnetic resonance imaging (dMRI), to deepen the understanding of the interplay between the brain regions that might be a factor in the later remembrance of a short video.

In addition, future research is necessary to investigate the features that play a role in memorability. This study only included the first second of the video in the final analysis, and it was assumed that, since the videos were only three seconds long, the main element of the subject of the video would already present itself in the first second. Feature extraction could, however, examine this assumption in more depth to investigate whether there are common subject features that can be extracted from the Memento10k dataset that ensure memorability. This has the potential to answer the questions from the introduction section of this paper, namely what, if any, features of a short video can predict memorability. If a common benchmark of easily remembered features in videos can be established, this has a wide range of practical applications for advertisers, influencers and other stakeholders that might have an interest in getting short videos remembered.

To conclude, the present paper provides evidence for P300 differences in the posterior brain regions and right temporal lobe that relate to whether videos from the Memento10k dataset [3] are remembered at retrieval. More research is needed to understand how these data relate to past image studies, and feature extraction could potentially identify common features that correlate with an increase in memorability.

ACKNOWLEDGMENTS

I want to thank dr. Martha Larson for her guidance and critical eye in this project, as well as for allowing me to explore a topic that suited my personal interests. I would also like to thank dr. Alba García Seco de Herrera and the entire research team that collected this ERP and covariate data. Lastly, I would like to specifically thank Yana van de Sande, Jasper de Meijer and Floris Cos for their help with the data cleaning and analysis steps, as well as their ongoing support during the task.