<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of The MediaEval 2021 Predicting Media Memorability Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rukiye Savran Kiziltepe</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Gabriel Constantin</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claire-Hélène Demarty</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Graham Healy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camilo Fosco</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alba G. Seco de Herrera</string-name>
          <email>alba.garcia@essex.ac.uk</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Halder</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Ionescu</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Matran-Fernandez</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan F. Smeaton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorin Sweeney</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>InterDigital</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Massachusetts Institute of Technology</institution>
          ,
          <addr-line>Cambridge, Massachusetts</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University Politehnica of Bucharest</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Essex</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper describes the MediaEval 2021 Predicting Media Memorability task, which is in its 4th edition this year, as the prediction of short-term and long-term video memorability remains a challenging task. In 2021, two datasets of videos are used: first, a subset of the TRECVid 2019 Video-to-Text dataset; second, the Memento10K dataset, in order to provide opportunities to explore cross-dataset generalisation. In addition, an Electroencephalography (EEG)-based prediction pilot subtask is introduced. In this paper, we outline the main aspects of the task and describe the datasets, evaluation metrics, and requirements for participants' submissions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Information retrieval and recommendation systems must deal with
the exponential growth of media platforms such as social networks and
media marketing. New methods of organising and retrieving digital
material are needed in order to increase the usefulness of
multimedia events in our daily lives. Memorability, like other
important video properties such as aesthetics or interestingness, can be
viewed as useful in the selection of competing videos,
especially when developing advertising or instructional material.
In advertising, predicting the memorability of a video is important
since multimedia materials have varying effects on human memory.
In addition to advertising, this task may have an impact on other
fields such as film making, education, and content retrieval.</p>
      <p>
        The Predicting Media Memorability task addresses this problem.
The task is part of the MediaEval benchmark and, following the
success of previous editions [
        <xref ref-type="bibr" rid="ref15 ref2 ref4 ref6">2, 4, 6, 15</xref>
        ], creates a common
benchmarking protocol and provides a ground truth dataset for short-term
and long-term memorability using common definitions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        The computational study of video memorability is a natural
extension of research into image memorability prediction, which has
gained increasing attention in the years after Isola et al.’s work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Models have reached remarkable predictive accuracy for image
memorability [
        <xref ref-type="bibr" rid="ref12 ref19">12, 19</xref>
        ], and we have just begun to see the application
of approaches such as style transfer to enhance image
memorability [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], demonstrating that we have progressed from simply
measuring memorability to using it as an evaluation aspect.
      </p>
      <p>
        In contrast, computer science research on video memorability
(VM) is still in its early stages. Recent studies on video memorability
have focused on the short term [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], but the lack of studies on VM
can be explained by a number of factors. To begin with, there are
currently not enough publicly available data sets for training and
testing models. The second problem is the absence of a standardised
definition for VM. In terms of modelling, previous attempts at VM
prediction [
        <xref ref-type="bibr" rid="ref16 ref3">3, 16</xref>
        ] have identified several features that contribute to
VM prediction, including semantic, saliency, and colour features.
However, the work is far from complete, and our ability to propose
effective computational models will aid in meeting the challenge of
VM prediction.
      </p>
      <p>The purpose of this task is to contribute to the harmonisation and
advancement of this rapidly growing multimedia field. Additionally,
in contrast to prior work on image memorability prediction in which
memorability was tested only a few minutes after memorisation, we
present a dataset containing long-term memorability annotations.
We expect that models trained on this will produce predictions that
are more indicative of long-term memorability, which is preferred
in a wide variety of applications. This year we also distribute an
external dataset for generalisation purposes and propose a new
pilot subtask on EEG-based video memorability prediction.</p>
    </sec>
    <sec id="sec-3">
      <title>TASK DESCRIPTION</title>
      <p>The Predicting Media Memorability task asks participants to develop
automatic systems that predict short-term and long-term
memorability scores from short videos. Participants were given a dataset of
short videos with short-term and long-term memorability scores,
raw annotations, and extracted features. Participants were assigned
three sub-tasks, described below:</p>
      <p>• Video-based prediction: Participants are required to
develop automatic systems that predict the short-term and
long-term memorability scores of new videos based on the given
video dataset and its memorability scores (a minimal illustrative
baseline is sketched at the end of this section).</p>
      <p>
        • Generalisation (optional): Participants will train their
systems on one of the two sources of data we provide and
test them on the other source of data.
      </p>
      <p>
        • Electroencephalography (EEG)-based prediction
(pilot): Participants are required to develop automatic
systems that predict the short-term memorability scores of new
videos based on the given EEG data. This is a pilot sub-task
and details can be found in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
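      <p>As a concrete illustration of the video-based prediction sub-task, the sketch below fits a simple regressor on per-video features and predicts scores for unseen videos. It is a minimal example under assumed inputs: the array names, feature dimensionality, and random data are hypothetical stand-ins for the released features and scores, not an official baseline.</p>
      <preformat>
# Illustrative baseline for the video-based prediction sub-task (not an
# official implementation). The arrays below are hypothetical stand-ins
# for the released per-video feature vectors and memorability scores.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(588, 128))   # one feature vector per training video
train_short = rng.uniform(size=588)         # short-term memorability scores
test_feats = rng.normal(size=(500, 128))    # feature vectors for the test videos

# One regressor per score type; a second SVR would be fitted on the
# long-term scores in exactly the same way.
model = SVR(kernel="rbf", C=1.0)
model.fit(train_feats, train_short)
pred_short = model.predict(test_feats)      # one predicted score per test video
      </preformat>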
    </sec>
    <sec id="sec-4">
      <title>COLLECTION</title>
      <p>
        This task utilises a subset of the TRECVID 2019 Video-to-Text video
dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The dataset contains Twitter Vine videos where various
actions are performed. This year, the dataset has been expanded, and
normalised short-term memorability scores are provided together with
memory alpha decay values. Additionally, we open the Memento10K [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
dataset to participants. Apart from traditional video information
like metadata and extracted visual features, part of the data is
accompanied by Electroencephalography (EEG) recordings that
allow the physical reactions of users to be explored.
      </p>
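      <p>As a hedged illustration of how a per-video memory alpha decay value can be used, the sketch below assumes a log-linear decay model; this model is our assumption for illustration only, and the exact normalisation applied to the released scores is defined in the task's data description.</p>
      <preformat>
# Illustrative use of a per-video decay coefficient alpha, ASSUMING a
# log-linear decay model m(t) = m(t0) + alpha * log10(t / t0). The exact
# normalisation used for the released scores is defined in the task's
# data description.
import math

def score_at_lag(score_t0: float, alpha: float, t0: float, t: float) -> float:
    """Project a memorability score measured at lag t0 to lag t."""
    return score_t0 + alpha * math.log10(t / t0)

# e.g. a video scored 0.85 at a lag of 40 videos with alpha = -0.05,
# projected to a lag of 80 videos:
print(score_at_lag(0.85, -0.05, 40, 80))  # slightly below 0.85
      </preformat>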
      <p>
        A set of pre-extracted features is also distributed, as follows:
• image-level features: AlexNetFC7 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], HOG [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], HSVHist, RGBHist, LBP [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], VGGFC7 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], DenseNet121 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], ResNet50 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], EfficientNet b3 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ];
• video-level feature: C3D [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ];
• audio-level feature: VGGish [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Three frames from each video were used to extract image-level
features: the first, the middle, and the last frame. Additionally, each
TRECVid video includes at least two textual captions summarising
the action, whereas Memento10K includes five. The annotations
acquired from participants included the first and second appearance
positions of each target video, as well as participants’ response times
and the keys pressed while watching each video.
</p>
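      <p>A minimal sketch of that three-frame sampling scheme follows, assuming OpenCV and scikit-image are available; the HOG descriptor is used as a stand-in for whichever of the listed image-level features is required.</p>
      <preformat>
# Sketch of the three-frame sampling used for the image-level features:
# the first, middle and last frame of each video. OpenCV and scikit-image
# are assumed; HOG stands in for any of the listed image-level features.
import cv2
from skimage.feature import hog

def frame_features(video_path: str):
    cap = cv2.VideoCapture(video_path)
    n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    feats = []
    for idx in (0, n // 2, n - 1):          # first, middle, last frame
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        feats.append(hog(cv2.resize(gray, (128, 128))))  # one descriptor per frame
    cap.release()
    return feats
      </preformat>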
    </sec>
    <sec id="sec-5">
      <title>TRECVid 2019 Video-to-Text dataset</title>
      <p>
        The TRECVid 2019 Video-to-Text dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] contains 6,000 videos.
In 2021, three subsets were distributed as part of the MediaEval
Predicting Media Memorability task. The training set contained
588 videos, the development set 1,116 videos and the test set 500
videos. Each video has two associated memorability scores
indicating its likelihood of being remembered after two distinct periods of
memory retention. Similar to previous editions of the task [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ],
memorability was measured twice using recognition tests: a few
minutes after the videos were shown (short-term) and 24-72 hours
later (long-term). The videos are released under Creative Commons
licences that allow their redistribution.
      </p>
      <p>
        The ground truth dataset was generated using a video
memorability game protocol proposed by Cohendet et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
memorability game was developed in two versions. One was made available on
Amazon Mechanical Turk (AMT), and another was made available
for general use in three languages: English, Spanish, and Turkish.
      </p>
      <p>In the video memorability game protocol, participants were
expected to watch 180 and 120 videos in the short-term and long-term
memorisation steps, respectively. Participants were asked to press
the space bar whenever they recognised a previously seen
video, which allows for the determination of which videos they do
and do not recognise. The game begins with the repetition of 40
target videos after a few minutes to accumulate short-term
memorability labels. Regarding the first step’s filler videos, 60 non-vigilance
filler videos are shown once, while 20 vigilance filler videos are repeated
after a few seconds to ensure that participants are paying attention
to the task. After 24 to 72 hours, the same individuals are
expected to return for the second step, which involves collecting labels
for long-term memorability. This time, 40 target videos chosen at
random from the non-vigilance fillers in the first stage and 80 fillers
chosen at random from new videos are displayed to determine the
target videos’ long-term memorability scores. The percentage of
correct recognition for each video is used to calculate the short-term
and long-term memorability scores.</p>
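      <p>The scoring rule described above reduces to a per-video recognition rate. A minimal sketch, with hypothetical response records, follows:</p>
      <preformat>
# Minimal sketch of the scoring rule: a video's memorability score is the
# percentage of correct recognitions on repeat viewings. The response
# records below are hypothetical.
from collections import defaultdict

responses = [
    # (video_id, recognised) pairs collected on second viewings
    ("vid_001", True), ("vid_001", True), ("vid_001", False),
    ("vid_002", True), ("vid_002", False), ("vid_002", False),
]

hits = defaultdict(int)
shows = defaultdict(int)
for video_id, recognised in responses:
    shows[video_id] += 1
    hits[video_id] += int(recognised)

scores = {v: hits[v] / shows[v] for v in shows}  # recognition rate per video
print(scores)  # {'vid_001': 0.667, 'vid_002': 0.333} (approximately)
      </preformat>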
    </sec>
    <sec id="sec-5b">
      <title>Memento10K dataset</title>
      <p>
        The Memento10K dataset [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] contains 10,000 three-second videos
depicting in-the-wild scenes, with their associated short-term
memorability scores, memorability decay values, action labels, and five
accompanying captions. The scores were computed with 90
annotations per video on average, and the videos were shown to
participants without sound. 7,000 videos were released as a training
set, 1,500 were provided for validation, and the last 1,500 videos
were used as the test set for scoring submissions.
      </p>
    </sec>
    <sec id="sec-5c">
      <title>SUBMISSION AND EVALUATION</title>
      <p>As in previous editions of the task, each team is expected to submit both
short-term and long-term memorability predictions. A total of ten
runs, five for each, can be submitted for video-based prediction.
In addition, participants can submit five runs per optional
subtask (generalisation and EEG-based prediction). All information,
including the given features, ground truth data, video sample titles,
features extracted from the visual material, and even external data, may
be used to build the systems. Short-term and long-term
memorability runs must be submitted separately and must not
include each other’s predictions.</p>
      <p>Classic evaluation metrics (including Spearman’s rank
correlation) are used to compare the predicted memorability scores for the
videos with the ground truth memorability scores.
</p>
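      <p>A minimal sketch of this comparison, using scipy’s implementation of Spearman’s rank correlation on made-up score vectors:</p>
      <preformat>
# Minimal sketch of the evaluation: Spearman's rank correlation between
# ground-truth and predicted memorability scores (made-up values).
from scipy.stats import spearmanr

ground_truth = [0.91, 0.75, 0.83, 0.60, 0.88]
predicted = [0.80, 0.70, 0.85, 0.55, 0.90]

rho, p_value = spearmanr(ground_truth, predicted)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
      </preformat>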
    </sec>
    <sec id="sec-6">
      <title>DISCUSSION AND OUTLOOK</title>
      <p>In this paper we introduced the 4th edition of the Predicting
Media Memorability task at the MediaEval 2021 benchmarking initiative.
With this task, a comparative assessment of current state-of-the-art
machine learning techniques to predict short- and long-term
memorability can be conducted. A dataset containing short videos is
distributed with memorability annotations, and external data is
provided for generalisation purposes. Moreover, EEG annotations are
also provided for a pilot study. Related information has also been
made available to participants so they can refine their strategies.
The MediaEval 2021 workshop proceedings present details of the
participants’ approaches to the task, including the methodologies used
and their findings.</p>
    </sec>
    <sec id="sec-7">
      <title>ACKNOWLEDGMENTS</title>
      <p>MGC and BI’s contribution is supported under project AI4Media,
a European Excellence Centre for Media, Society and Democracy,
H2020 ICT-48-2020, grant #951911. The work of RSK is partially
funded by the Turkish Ministry of National Education. This work
was part-funded by NIST Award No. 60NANB19D155 and by Science
Foundation Ireland under grant number SFI/12/RC/2289_P2.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>George</given-names>
            <surname>Awad</surname>
          </string-name>
          , Asad A Butt, Keith Curtis,
          <string-name>
            <given-names>Yooyoung</given-names>
            <surname>Lee</surname>
          </string-name>
          , Jonathan Fiscus, Afzal Godil, Andrew Delgado, Jesse Zhang, Eliot Godard, Lukas Diduch, and others.
          <year>2019</year>
          .
          <article-title>TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search &amp; Retrieval</article-title>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Romain</given-names>
            <surname>Cohendet</surname>
          </string-name>
          ,
          <string-name>
            <surname>Claire-Hélène</surname>
            <given-names>Demarty</given-names>
          </string-name>
          , Ngoc Duong, Mats Sjöberg, Bogdan Ionescu, and
          <string-name>
            <surname>Thanh-Toan Do</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>MediaEval 2018: Predicting media memorability task</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2018 Workshop</source>
          . Sophia Antipolis, France.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Romain</given-names>
            <surname>Cohendet</surname>
          </string-name>
          ,
          <string-name>
            <surname>Claire-Hélène</surname>
            <given-names>Demarty</given-names>
          </string-name>
          , Ngoc QK Duong, and Martin Engilberge
          .
          <year>2019</year>
          .
          <article-title>VideoMem: Constructing, Analyzing, Predicting Short-term and Long-term Video Memorability</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          .
          <fpage>2531</fpage>
          -
          <lpage>2540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mihai Gabriel</given-names>
            <surname>Constantin</surname>
          </string-name>
          , Bogdan Ionescu, Claire-Hélène Demarty, Ngoc QK Duong, Xavier Alameda-Pineda, and
          <string-name>
            <given-names>Mats</given-names>
            <surname>Sjöberg</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Predicting Media Memorability Task at MediaEval 2019</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2019 Workshop</source>
          . Sophia Antipolis, France.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Navneet</given-names>
            <surname>Dalal</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bill</given-names>
            <surname>Triggs</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Histograms of oriented gradients for human detection</article-title>
          .
          <source>In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)</source>
          , Vol.
          <volume>1</volume>
          . IEEE,
          <fpage>886</fpage>
          -
          <lpage>893</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Alba</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , Rukiye Savran Kiziltepe, Jon Chamberlain, Mihai Gabriel Constantin,
          <string-name>
            <surname>Claire-Hélène</surname>
            <given-names>Demarty</given-names>
          </string-name>
          , Faiyaz Doctor, Bogdan Ionescu,
          <string-name>
            <given-names>and Alan F.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2020 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Dong-Chen He</surname>
            and
            <given-names>Li</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>Texture unit, texture spectrum, and texture analysis</article-title>
          .
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          <volume>28</volume>
          ,
          <issue>4</issue>
          (
          <year>1990</year>
          ),
          <fpage>509</fpage>
          -
          <lpage>512</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Shawn</given-names>
            <surname>Hershey</surname>
          </string-name>
          , Sourish Chaudhuri, Daniel PW Ellis, Jort F Gemmeke,
          <string-name>
            <surname>Aren</surname>
            <given-names>Jansen</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R Channing</given-names>
            <surname>Moore</surname>
          </string-name>
          , Manoj Plakal, Devin Platt, Rif A Saurous, Bryan Seybold, and others.
          <year>2017</year>
          .
          <article-title>CNN architectures for large-scale audio classification</article-title>
          .
          <source>In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          . IEEE,
          <fpage>131</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Gao</surname>
            <given-names>Huang</given-names>
          </string-name>
          , Zhuang Liu,
          <string-name>
            <surname>Laurens Van Der Maaten</surname>
          </string-name>
          , and
          <string-name>
            <surname>Kilian Q Weinberger</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Densely connected convolutional networks</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <fpage>4700</fpage>
          -
          <lpage>4708</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Phillip</surname>
            <given-names>Isola</given-names>
          </string-name>
          , Jianxiong Xiao, Devi Parikh, Antonio Torralba, and
          <string-name>
            <given-names>Aude</given-names>
            <surname>Oliva</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>What makes a photograph memorable?</article-title>
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>36</volume>
          ,
          <issue>7</issue>
          (
          <year>2013</year>
          ),
          <fpage>1469</fpage>
          -
          <lpage>1482</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Aditya</surname>
            <given-names>Khosla</given-names>
          </string-name>
          , Akhil S Raju, Antonio Torralba, and
          <string-name>
            <given-names>Aude</given-names>
            <surname>Oliva</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Understanding and predicting image memorability at a large scale</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          .
          <fpage>2390</fpage>
          -
          <lpage>2398</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Alex</surname>
            <given-names>Krizhevsky</given-names>
          </string-name>
          , Ilya Sutskever, and
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Anelise</surname>
            <given-names>Newman</given-names>
          </string-name>
          , Camilo Fosco, Vincent Casser,
          <string-name>
            <given-names>Allen</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <surname>Barry McNamara</surname>
            ,
            <given-names>and Aude</given-names>
          </string-name>
          <string-name>
            <surname>Oliva</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability</article-title>
          . In Computer Vision - ECCV
          <year>2020</year>
          ,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          , Horst Bischof, Thomas Brox, and
          <string-name>
            <surname>Jan-Michael Frahm</surname>
          </string-name>
          (Eds.). Springer International Publishing, Cham,
          <fpage>223</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Rukiye</given-names>
            <surname>Savran Kiziltepe</surname>
          </string-name>
          , Lorin Sweeney, Mihai Gabriel Constantin, Faiyaz Doctor, Alba García Seco de Herrera, Claire-Hélène Demarty, Graham Healy, Bogdan Ionescu, and
          <string-name>
            <given-names>Alan F.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>An Annotated Video Dataset for Computing Video Memorability</article-title>
          .
          <source>Data in Brief</source>
          (
          <year>2021</year>
          ), 107671. https://doi.org/10.1016/j.dib.2021.107671
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Sumit</surname>
            <given-names>Shekhar</given-names>
          </string-name>
          , Dhruv Singal, Harvineet Singh,
          <string-name>
            <given-names>Manav</given-names>
            <surname>Kedia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Akhil</given-names>
            <surname>Shetty</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Show and recall: Learning what makes videos memorable</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision Workshops</source>
          .
          <fpage>2730</fpage>
          -
          <lpage>2739</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Aliaksandr</surname>
            <given-names>Siarohin</given-names>
          </string-name>
          , Gloria Zen, Cveta Majtanovic,
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Alameda-Pineda</surname>
          </string-name>
          , Elisa Ricci, and
          <string-name>
            <given-names>Nicu</given-names>
            <surname>Sebe</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Increasing Image Memorability with Neural Style Transfer</article-title>
          .
          <source>ACM Trans. Multimedia Comput. Commun. Appl</source>
          .
          <volume>15</volume>
          ,
          <issue>2</issue>
          , Article 42 (
          <year>June 2019</year>
          ), 22 pages. https://doi.org/10.1145/3311781
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>In International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Hammad</given-names>
            <surname>Squalli-Houssaini</surname>
          </string-name>
          , Ngoc QK Duong, Gwenaëlle Marquant, and
          <string-name>
            <surname>Claire-Hélène Demarty</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep learning for predicting image memorability</article-title>
          .
          <source>In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          . IEEE,
          <fpage>2371</fpage>
          -
          <lpage>2375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Lorin</surname>
            <given-names>Sweeney</given-names>
          </string-name>
          , Ana Matran-Fernandez,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Halder</surname>
          </string-name>
          , Alba García Seco de Herrera, Alan
          <string-name>
            <given-names>F.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Graham</given-names>
            <surname>Healy</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Overview of the EEG Pilot Subtask at MediaEval 2021: Predicting Media Memorability</article-title>
          .
          <source>Working Notes Proceedings of the MediaEval 2021 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Mingxing</given-names>
            <surname>Tan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Quoc</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>EfficientNet: Rethinking model scaling for convolutional neural networks</article-title>
          .
          <source>In International Conference on Machine Learning. PMLR</source>
          ,
          <fpage>6105</fpage>
          -
          <lpage>6114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Du</surname>
            <given-names>Tran</given-names>
          </string-name>
          , Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and
          <string-name>
            <given-names>Manohar</given-names>
            <surname>Paluri</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Learning spatiotemporal features with 3d convolutional networks</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          .
          <fpage>4489</fpage>
          -
          <lpage>4497</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>