Overview of the EEG Pilot Subtask at MediaEval 2021: Predicting Media Memorability

Lorin Sweeney¹, Ana Matran-Fernandez², Sebastian Halder², Alba G. Seco de Herrera², Alan Smeaton¹, Graham Healy¹
¹ School of Computing, Dublin City University
² School of Computer Science and Electronic Engineering, University of Essex

lorin.sweeney8@mail.dcu.ie, {amatra,s.halder,alba.garcia}@essex.ac.uk, {alan.smeaton,graham.healy}@dcu.ie

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’21, December 13-15 2021, Online

ABSTRACT
The aim of the Memorability-EEG pilot subtask at MediaEval’2021 is to promote interest in the use of neural signals (either alone or in combination with other data sources) in the context of predicting video memorability by highlighting the utility of EEG data. The dataset created consists of pre-extracted features from EEG recordings of subjects while watching a subset of videos from Predicting Media Memorability subtask 1. This demonstration pilot gives interested researchers a sense of how neural signals can be used without any prior domain knowledge, and enables them to do so in a future memorability task. The dataset can be used to support the exploration of novel machine learning and processing strategies for predicting video memorability, while potentially increasing interdisciplinary interest in the subject of memorability, and opening the door to new combined EEG-computer vision approaches.

1 INTRODUCTION AND RELATED WORK
Even though the nature and constitution of people’s memories remain elusive, and our understanding of what makes one thing more or less memorable than another is still nascent, combining computational (e.g., machine learning) and neurophysiological (e.g., electroencephalography; EEG) tools to investigate the mechanisms (formation and recall) of memory may offer insights that would be otherwise unobtainable. While EEG is not a tool that can directly explain what makes a video more or less memorable, it can help us trim the umbral undergrowth surrounding the subject, shedding light and offering a potential leap forward in our understanding of the interplay between the mechanisms of memory and memorability.

The purpose of this pilot study at MediaEval’2021 [8] was to collect enough EEG data for proof-of-concept and demonstration purposes, showcasing what could be done in subsequent work on predicting media memorability. The study involved the collection, filtering, and interpretation of neurophysiological data, and the use and evaluation of machine learning methods to enable the assessment of EEG data as a predictor of video memorability. The study has culminated in a demonstration of the utility of EEG in the context of video memorability, along with the public release of processed EEG features for others to explore¹. This study has the potential to not only broaden the research horizons of computing researchers, allowing them to explore and leverage EEG features without any of the requisite domain knowledge, but also to increase interdisciplinary interest in the subject of memorability more broadly.

Applying EEG to the question of whether an experience will be subsequently remembered or forgotten is a well-researched area [7, 9, 12, 15]. Memorability, however, has been shown to be distinct from subsequent memory effects [1, 14], and has received little interdisciplinary attention. Additionally, even though the application of machine learning to EEG is an active area of interest, allowing for the automation or augmentation of neurological diagnostics [3-5, 10] and the classification of emotional states [16], mental tasks [11], and sleep stages [2], the use of EEG to predict visual memorability has yet to be firmly established, and was previously limited to static content [6]. To the best of our knowledge, this paper outlines the first application of EEG to video memorability.

¹ Dataset and examples of use, as well as the code to replicate the results in this paper, are available at https://osf.io/zt6n9/

2 EXPERIMENT DESIGN AND STRUCTURE
The stimuli used in the study are a subset of the subtask 1 data (i.e., the short-term video memorability prediction task) in MediaEval’2021 [8], and consist of 450 videos: 96 were designated as targets, selected to reflect the bottom and top 50 memorable videos from the TRECVid dataset; 200 were selected to reflect the next top and bottom 100; and 100 were selected to reflect the middle 100 memorable videos (95 selected + 5 duplicates) from the set of subtask videos. EEG data was collected from 11 subjects while they completed a short-term memory experiment, which was used to annotate the videos for memorability. EEG data acquisition² was carried out in two separate locations using a shared experimental procedure, and each location annotated the same set of videos. Rather than being split into separate encoding and recognition phases, the experiment was continuous in nature.

² Data collection for participants 1-5 was carried out at Dublin City University (DCU) with approval from the university’s Research Ethics Committee (DCUREC/2021/171), and for participants 6-11 at the University of Essex (UoE) with approval from the Ethics Committee (ETH2122-0001). Data at DCU was collected using a 32-channel ANT Neuro eego system with a sampling rate of 1000 Hz. Data at UoE was collected using a 64-channel BioSemi ActiveTwo system at a sampling rate of 2048 Hz.

Before the experiment was carried out, participants were given a verbal description of the experiment procedure, presented with a set of written instructions, and taken through a practice run of 3 videos to familiarise them with the experiment. The experiment used a total of 450 videos, 192 of which were the target videos (96 targets, shown twice), and the remaining 258 videos were the fillers. The experiment was broken into 9 blocks of 50 videos, where a
fixation cross was displayed for 3–4.5s, followed by the video presentation for its ~6 second duration, followed by a “get ready to answer” prompt of 1–3 seconds, followed by a 3s period for recognition response (repeated video or not). The time per block was approximately 700 seconds (~12 minutes) without accounting for 30-second closed/open eye baselines and breaks, which occurred between blocks. In order to account for recency effects, the first 50 videos presented did not include targets, but had 5 filler repeats, and the presentation positions of targets between each of the participants were pseudo-randomised, with the distances between target and repeat videos roughly fitting a uniform distribution, and the position of each block aside from block 1 being rotated by 1 for each participant.
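As a quick arithmetic check of the stated block duration, the following sketch takes the midpoint of each jittered interval described above (an approximation; the exact jitter distributions are not specified beyond their ranges):

    # Rough per-block duration from the trial structure described above,
    # taking the midpoint of each jittered interval (an approximation).
    fixation = (3.0 + 4.5) / 2    # fixation cross: 3-4.5 s
    video = 6.0                   # clip presentation: ~6 s
    prompt = (1.0 + 3.0) / 2      # "get ready to answer" prompt: 1-3 s
    response = 3.0                # recognition response window: 3 s

    per_trial = fixation + video + prompt + response   # ~14.75 s per trial
    per_block = 50 * per_trial                         # ~737 s, i.e. ~12 minutes
    print(f"~{per_block:.0f} s per block (~{per_block / 60:.1f} min)")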
3 ANALYSIS AND RESULTS

Table 1: Mean AUC values obtained for each participant across all folds, separately for ERP and ERSP features.

    Participant   ERP-based classification   ERSP-based classification
         1              0.564 ± 0.09               0.522 ± 0.09
         2              0.585 ± 0.11               0.558 ± 0.07
         3              0.520 ± 0.07               0.532 ± 0.07
         4              0.666 ± 0.07               0.626 ± 0.09
         5              0.714 ± 0.06               0.649 ± 0.08
         6              0.555 ± 0.11               0.522 ± 0.10
         7              0.601 ± 0.10               0.525 ± 0.08
         8              0.590 ± 0.08               0.674 ± 0.08
         9              0.609 ± 0.09               0.489 ± 0.06
        10              0.628 ± 0.06               0.618 ± 0.09
        11              0.477 ± 0.08               0.611 ± 0.12
      Mean              0.591 ± 0.06               0.575 ± 0.06

EEG data from both locations were processed in the same way for the 30 channels that were common across the two setups: data were first referenced using a common average and band-pass filtered between 0.1–30 Hz using a symmetric linear-phase FIR filter. Independent Component Analysis (ICA) was used to remove artifacts, and trial rejection using subject-specific thresholds was applied.
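A minimal sketch of this preprocessing pipeline using MNE-Python (the file name, component count, and excluded ICA components below are placeholders; the paper's released code at the OSF link above is the authoritative reference):

    import mne

    # Load one participant's recording (file name is a placeholder).
    raw = mne.io.read_raw_fif("participant_01_raw.fif", preload=True)

    # Re-reference to the common average, then band-pass filter 0.1-30 Hz
    # with a zero-phase (symmetric linear-phase) FIR filter.
    raw.set_eeg_reference("average")
    raw.filter(l_freq=0.1, h_freq=30.0, method="fir", phase="zero")

    # ICA-based artifact removal; which components to exclude is a
    # subject-specific judgement call, hard-coded here for illustration.
    ica = mne.preprocessing.ICA(n_components=20, random_state=0)
    ica.fit(raw)
    ica.exclude = [0, 1]  # e.g., ocular components identified by inspection
    ica.apply(raw)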
   To establish a baseline using features extracted from the time
domain, the EEG was low-pass filtered with a cutoff frequency of
15 Hz and downsampled to 30 Hz. We applied baseline correction
to the average of the 250-ms pre-stimulus interval and extracted
the data corresponding to the first second of each repeated clip,
from each of the 30 channels, and concatenated it to form a feature
vector. We term these the Event-Related Potential (ERP) features.
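A sketch of the ERP feature extraction, again with MNE-Python; here `raw` is the preprocessed recording from the previous step, and the `events` array marking clip onsets is assumed to have been read from the stimulus triggers:

    import mne

    # Low-pass at 15 Hz on top of the earlier band-pass filtering.
    raw.filter(l_freq=None, h_freq=15.0)

    # Epoch from -250 ms to +1 s around clip onset, baseline-correcting
    # to the mean of the 250 ms pre-stimulus interval.
    epochs = mne.Epochs(raw, events, tmin=-0.25, tmax=1.0,
                        baseline=(-0.25, 0.0), preload=True)
    epochs.resample(30.0)  # downsample to 30 Hz

    # Keep the first second post-onset and flatten channels x time
    # into one feature vector per trial (30 common channels).
    data = epochs.copy().crop(tmin=0.0, tmax=1.0).get_data()
    erp_features = data.reshape(len(epochs), -1)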
A second set of features was extracted from the EEG, this time from the time-frequency domain, which we refer to as ERSP (Event-Related Spectral Perturbation) features. For this, we extracted 4-second long epochs and computed a trial-by-trial time-frequency representation using Morlet wavelets for frequencies between 2–30 Hz. For this set of features, we used data from only 4 channels, namely Fz, Cz, Pz, and O1.
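A sketch of the ERSP features via Morlet wavelets in MNE-Python (the epoch window placement and the number of wavelet cycles are assumptions not stated in the paper):

    import numpy as np
    import mne
    from mne.time_frequency import tfr_morlet

    # 4-second epochs around clip onset (window placement is an assumption).
    epochs = mne.Epochs(raw, events, tmin=0.0, tmax=4.0,
                        baseline=None, preload=True)
    epochs.pick(["Fz", "Cz", "Pz", "O1"])  # only these 4 channels are kept

    # Trial-by-trial (average=False) time-frequency decomposition, 2-30 Hz.
    freqs = np.arange(2.0, 31.0, 1.0)
    power = tfr_morlet(epochs, freqs=freqs, n_cycles=freqs / 2.0,
                       return_itc=False, average=False)

    # One flattened feature vector per trial: channels x freqs x times.
    ersp_features = power.data.reshape(len(epochs), -1)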
Since there were very few forgotten clips, in this task we differentiate between the first and the second viewing of clips that were successfully remembered, based only on EEG data. To establish a baseline, we standardised the data to have zero mean and unit standard deviation, and used scikit-learn’s Bayesian Ridge regressor with default parameters. Results were obtained through 20-fold cross-validation with a 20% train-test split, separately for ERP and ERSP features. The individual classification results for each participant are shown in Table 1, measured using the Area Under the Receiver Operating Characteristic Curve (AUC) [13].

Figure 1: Grand-averaged butterfly plot showing differences in EEG activity for the second minus the first presentation of videos for the first second (top, A). Averaged time-frequency differences in power for the second presentation minus that for the first presentation of videos for the first 3 seconds for channels Fz and Pz (bottom, B: left and right, respectively).
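A sketch of this baseline with scikit-learn; the feature matrix `X` and the first/second-viewing labels `y` are assumed to come from the steps above, and we read the "20-fold cross-validation with a 20% train-test split" as 20 random shuffle-splits (an interpretation, not confirmed by the paper):

    import numpy as np
    from sklearn.linear_model import BayesianRidge
    from sklearn.model_selection import ShuffleSplit
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import roc_auc_score

    # X: (n_trials, n_features) ERP or ERSP features;
    # y: 0 = first viewing, 1 = second viewing of a remembered clip.
    aucs = []
    cv = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
    for train, test in cv.split(X):
        scaler = StandardScaler().fit(X[train])          # zero mean, unit SD
        model = BayesianRidge().fit(scaler.transform(X[train]), y[train])
        scores = model.predict(scaler.transform(X[test]))
        aucs.append(roc_auc_score(y[test], scores))      # continuous scores suffice for AUC

    print(f"AUC: {np.mean(aucs):.3f} ± {np.std(aucs):.2f}")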

4 DISCUSSION AND OUTLOOK
This was an exploratory pilot task to guide the development of a future experimental protocol for capturing EEG signatures relating to successful memory encoding and retrieval, to be used in predicting video memorability. While our experimental protocol resulted in too little data to examine differences between successful and unsuccessful encoding, we show that EEG-related differences exist between the encoding and recognition phases of previously seen videos. These results indicate that EEG signatures relating to memory processes for video are present, and thus suitable to be collected with a revised experimental protocol and more participants to support a future fully-fledged task for predicting video memorability. The preprocessed EEG data captured is released to the research community.

ACKNOWLEDGMENTS
This work was part-funded by NIST Award No. 60NANB19D155 and by Science Foundation Ireland under grant number SFI/12/RC/2289_P2.


REFERENCES
[1] Wilma A. Bainbridge, Daniel D. Dilks, and Aude Oliva. 2017. Memorability: A stimulus-driven perceptual neural signature distinctive from memory. NeuroImage 149 (2017), 141–152.
[2] Farideh Ebrahimi, Mohammad Mikaeili, Edson Estrada, and Homer Nazeran. 2008. Automatic sleep stage classification based on EEG signals by using neural networks and wavelet packet coefficients. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 1151–1154.
[3] Denis A. Engemann, Federico Raimondo, Jean-Rémi King, Benjamin Rohaut, Gilles Louppe, Frédéric Faugeras, Jitka Annen, Helena Cassol, Olivia Gosseries, Diego Fernandez-Slezak, and others. 2018. Robust EEG-based cross-site and cross-protocol classification of states of consciousness. Brain 141, 11 (2018), 3179–3192.
[4] Behshad Hosseinifard, Mohammad Hassan Moradi, and Reza Rostami. 2013. Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal. Computer Methods and Programs in Biomedicine 109, 3 (2013), 339–345.
[5] Cosimo Ieracitano, Nadia Mammone, Amir Hussain, and Francesco C. Morabito. 2020. A novel multi-modal machine learning based approach for automatic classification of EEG recordings in dementia. Neural Networks 123 (2020), 176–190.
[6] Sang-Yeong Jo and Jin-Woo Jeong. 2020. Prediction of visual memorability with EEG signals: A comparative study. Sensors 20, 9 (2020), 2694.
[7] Demetrios Karis, Monica Fabiani, and Emanuel Donchin. 1984. “P300” and memory: Individual differences in the von Restorff effect. Cognitive Psychology 16, 2 (1984), 177–216.
[8] Rukiye Savran Kiziltepe, Mihai Gabriel Constantin, Claire-Hélène Demarty, Graham Healy, Camilo Fosco, Alba García Seco de Herrera, Sebastian Halder, Bogdan Ionescu, Ana Matran-Fernandez, Alan F. Smeaton, and Lorin Sweeney. 2021. Overview of The MediaEval 2021 Predicting Media Memorability Task. In Working Notes Proceedings of the MediaEval 2021 Workshop.
[9] Wolfgang Klimesch. 1999. EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis. Brain Research Reviews 29, 2-3 (1999), 169–195.
[10] Christoph Lehmann, Thomas Koenig, Vesna Jelic, Leslie Prichep, Roy E. John, Lars-Olof Wahlund, Yadolah Dodge, and Thomas Dierks. 2007. Application and comparison of classification algorithms for recognition of Alzheimer’s disease in electrical brain activity (EEG). Journal of Neuroscience Methods 161, 2 (2007), 342–350.
[11] Nan-Ying Liang, Paramasivan Saratchandran, Guang-Bin Huang, and Narasimhan Sundararajan. 2006. Classification of mental tasks from EEG signals using extreme learning machine. International Journal of Neural Systems 16, 01 (2006), 29–38.
[12] Eunho Noh, Grit Herzmann, Tim Curran, and Virginia R. de Sa. 2014. Using single-trial EEG to predict and analyze subsequent memory. NeuroImage 84 (2014), 712–723.
[13] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and others. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[14] Michael D. Rugg and Tim Curran. 2007. Event-related potentials and recognition memory. Trends in Cognitive Sciences 11, 6 (2007), 251–257.
[15] Thomas F. Sanquist, John W. Rohrbaugh, Karl Syndulko, and Donald B. Lindsley. 1980. Electrocortical signs of levels of processing: Perceptual analysis and recognition memory. Psychophysiology 17, 6 (1980), 568–576.
[16] Xiao-Wei Wang, Dan Nie, and Bao-Liang Lu. 2014. Emotional state classification from EEG data using machine learning approach. Neurocomputing 129 (2014), 94–106.