<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The MediaEval 2018 Emotional Impact of Movies Task</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Emmanuel</forename><surname>Dellandréa</surname></persName>
							<email>emmanuel.dellandrea@ec-lyon.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">Ecole Centrale de Lyon</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Martijn</forename><surname>Huigsloot</surname></persName>
							<email>huigsloot@nicam.nl</email>
							<affiliation key="aff1">
								<orgName type="institution">NICAM</orgName>
								<address>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Liming</forename><surname>Chen</surname></persName>
							<email>liming.chen@ec-lyon.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">Ecole Centrale de Lyon</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yoann</forename><surname>Baveye</surname></persName>
							<email>yoann.baveye@capacites.fr</email>
							<affiliation key="aff2">
								<orgName type="institution">Capacités</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Zhongzhe</forename><surname>Xiao</surname></persName>
							<email>xiaozhongzhe@suda.edu.cn</email>
							<affiliation key="aff3">
								<orgName type="institution">Soochow University</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mats</forename><surname>Sjöberg</surname></persName>
							<affiliation key="aff4">
								<orgName type="institution">Aalto University</orgName>
								<address>
									<country key="FI">Finland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The MediaEval 2018 Emotional Impact of Movies Task</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">53E59DEBD2284036BB683DB2A50A076E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper provides a description of the MediaEval 2018 "Emotional Impact of Movies" task. It continues to build on last year's edition, integrating the feedback of previous participants. The goal is to create systems that automatically predict the emotional impact that video content will have on viewers, in terms of valence, arousal and fear. Here we provide a description of the use case, task challenges, dataset and ground truth, task run requirements and evaluation metrics.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Affective video content analysis aims at the automatic recognition of emotions elicited by videos. It has a large number of applications, including mood-based personalized content recommendation <ref type="bibr" target="#b2">[3]</ref> or video indexing <ref type="bibr" target="#b11">[12]</ref>, and efficient movie visualization and browsing <ref type="bibr" target="#b12">[13]</ref>. Beyond the analysis of existing video material, affective computing techniques can also be used to generate new content, e.g., movie summarization <ref type="bibr" target="#b7">[8]</ref>, or personalized soundtrack recommendation to make user-generated videos more attractive <ref type="bibr" target="#b9">[10]</ref>. Affective techniques can also be used to enhance user engagement with advertising content by optimizing the way ads are inserted inside videos <ref type="bibr" target="#b10">[11]</ref>.</p><p>While major progress has been achieved in computer vision for visual object detection, scene understanding and high-level concept recognition, a natural further step is the modeling and recognition of affective concepts. This has recently received increasing interest from research communities such as computer vision and machine learning, with an overall goal of endowing computers with human-like perception capabilities. Thus, this task is proposed to offer researchers a place to compare their approaches for the prediction of the emotional impact of movies. It continues to build on last year's edition <ref type="bibr" target="#b4">[5]</ref>, integrating the feedback of participants. The task consists of two subtasks: the first addresses valence and arousal prediction, and the second fear detection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">TASK DESCRIPTION</head><p>The task requires participants to deploy multimedia features and models to automatically predict the emotional impact of movies. Emotional impact is understood here as the expected emotion, that is, the emotion that the majority of the audience feels in response to the same content. In other words, the expected emotion is the expected value of experienced (i.e. induced) emotion in a population. While the induced emotion is subjective and context dependent, the expected emotion can be considered objective, as it reflects the more-or-less unanimous response of a general audience to a given stimulus <ref type="bibr" target="#b6">[7]</ref>.</p><p>This year, two scenarios are proposed as subtasks. In both cases, long movies are considered.</p><p>(1) Valence and Arousal prediction: participants' systems have to predict a score of expected valence and arousal continuously (every second) along movies. Valence is defined on a continuous scale from most negative to most positive emotions, while arousal is defined continuously from calmest to most active emotions <ref type="bibr" target="#b8">[9]</ref>; (2) Fear detection: the purpose here is to predict the beginning and ending times of sequences inducing fear in movies. The targeted use case is the detection of frightening scenes to help systems protect children from potentially harmful video content.</p></div>
<note xmlns="http://www.tei-c.org/ns/1.0" type="copyright">Copyright held by the owner/author(s).</note>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">DATA DESCRIPTION</head><p>The dataset used in this task is the LIRIS-ACCEDE dataset<ref type="foot" target="#foot_0">1</ref>. It contains videos from a set of 160 professionally made and amateur movies, shared under Creative Commons licenses that allow redistribution <ref type="bibr" target="#b1">[2]</ref>. Several movie genres are represented in this collection, such as horror, comedy, drama and action. Languages are mainly English, with a small set of Italian, Spanish, French and other movies subtitled in English. A total of 44 movies (total duration of 15 hours and 20 minutes) selected from the set of 160 movies are provided as the development set for both subtasks, with annotations for fear, valence and arousal. A complementary set of 10 movies (11 hours and 29 minutes) is available for the first subtask, with valence and arousal annotations.</p><p>The test set consists of 12 other movies selected from the set of 160 movies, for a total duration of 8 hours and 56 minutes.</p><p>In addition to the video data, participants are also provided with general-purpose audio and visual content features. To compute audio features, movies were first processed to extract consecutive 5-second segments sliding over the whole movie with a shift of 1 second. Audio features were then extracted from these segments using the openSMILE toolbox <ref type="bibr" target="#b5">[6]</ref>. The default configuration named "emobase2010.conf" was used. It allows the computation of 1,582 features, which result from a base of 34 low-level descriptors (LLD) with 34 corresponding delta coefficients appended, and 21 functionals applied to each of these 68 LLD contours (1,428 features). In addition, 19 functionals are applied to the 4 pitch-based LLD and their 4 delta coefficient contours (152 features). Finally, the number of pitch onsets (pseudo syllables) and the total duration of the input are appended (2 features).</p><p>Beyond audio features, for each movie, image frames were extracted every second. For each of these images, several general-purpose visual features are provided. They were computed using the LIRE library<ref type="foot" target="#foot_1">3</ref>, except the CNN features (VGG16 fc6 layer), which were extracted using the Matlab Neural Networks toolbox<ref type="foot" target="#foot_2">4</ref>.</p></div>
<note xmlns="http://www.tei-c.org/ns/1.0" place="headnote">MediaEval'18, 29-31 October 2018, Sophia Antipolis, France. E. Dellandréa et al.</note>
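<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch, the 5-second sliding segmentation and the breakdown of the 1,582 audio features described in the data description can be checked with a few lines of Python. The function names are illustrative and are not part of openSMILE or of the task material. Keeping only the windows that fit entirely inside the movie is an assumption, since the handling of the last partial windows is not specified.</p>
<p><code lang="python"><![CDATA[
```python
# Sketch only: names and edge-case handling are illustrative assumptions.

def segment_bounds(duration_s, window_s=5, shift_s=1):
    """Return (start, end) pairs, in seconds, of consecutive windows.

    Windows are window_s seconds long and start every shift_s seconds.
    Only windows that fit entirely inside the movie are kept (assumption).
    """
    last_start = duration_s - window_s
    return [(s, s + window_s) for s in range(0, last_start + 1, shift_s)]


def emobase2010_feature_count():
    """Recompute the feature count from the breakdown given in the text."""
    lld, lld_functionals = 34, 21
    pitch_lld, pitch_functionals = 4, 19
    contours = 2 * lld               # 34 LLD plus their 34 delta contours
    pitch_contours = 2 * pitch_lld   # 4 pitch LLD plus their 4 delta contours
    extra = 2                        # pitch-onset count and input duration
    return (contours * lld_functionals
            + pitch_contours * pitch_functionals
            + extra)


if __name__ == "__main__":
    print(segment_bounds(8))            # [(0, 5), (1, 6), (2, 7), (3, 8)]
    print(emobase2010_feature_count())  # 1582
```
]]></code></p>
<p>For an 8-second clip the sketch yields four overlapping segments, and the recomputed total (68 x 21 + 8 x 19 + 2) matches the 1,582 features stated above.</p></div>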
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">GROUND TRUTH</head><p>As mentioned in the previous section, the development set contains a part that is common to both subtasks with valence, arousal and fear annotations (44 movies), and an additional part only concerning the first subtask, with valence and arousal annotations (10 movies).</p><p>For each movie from the development set for the first subtask, a file is provided containing valence and arousal values for each second of the movie.</p><p>Moreover, for all movies from the development set for the second subtask, a file is provided containing the beginning and ending times of each sequence in the movie inducing fear.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Ground Truth for the first subtask</head><p>In order to collect continuous valence and arousal annotations, a total of 28 French participants continuously indicated their level of valence and arousal while watching the movies, using a modified version of the GTrace annotation tool <ref type="bibr" target="#b3">[4]</ref> and a joystick. Each annotator continuously annotated one subset of the movies considering the induced valence and another subset considering the induced arousal, for a total duration of around 8 hours over 2 days. Thus, each movie was continuously annotated by three to five different annotators.</p><p>The continuous valence and arousal annotations from the participants were then down-sampled by averaging them over windows of 10 seconds with a shift of 1 second (i.e., 1 value per second), in order to remove the noise due to unintended movements of the joystick. Finally, these post-processed continuous annotations were averaged in order to create a continuous mean signal of the valence and arousal self-assessments, ranging from -1 (most negative for valence, most passive for arousal) to +1 (most positive for valence, most active for arousal). The details of this processing are given in <ref type="bibr" target="#b0">[1]</ref>.</p></div>
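<div xmlns="http://www.tei-c.org/ns/1.0"><p>The post-processing described above (averaging over 10-second windows shifted by 1 second, then averaging the resulting traces) can be sketched as follows. This is an illustration under stated assumptions, not the organizers' code: the raw sampling rate of the joystick traces is not given in the text, so it is a parameter here, only full windows are kept, and the final average is assumed to be taken across annotators.</p>
<p><code lang="python"><![CDATA[
```python
# Sketch only: sampling rate, edge handling and names are assumptions.

def smooth(trace, rate_hz, window_s=10, shift_s=1):
    """Average a raw trace over sliding windows, one output per shift.

    trace is a list of samples recorded at rate_hz samples per second.
    Only windows that fit entirely inside the trace are kept.
    """
    win = window_s * rate_hz
    step = shift_s * rate_hz
    out = []
    for start in range(0, len(trace) - win + 1, step):
        chunk = trace[start:start + win]
        out.append(sum(chunk) / win)
    return out


def mean_signal(smoothed_traces):
    """Average the smoothed traces of several annotators, value by value."""
    n = min(len(t) for t in smoothed_traces)  # truncate to the shortest trace
    count = len(smoothed_traces)
    return [sum(t[i] for t in smoothed_traces) / count for i in range(n)]


if __name__ == "__main__":
    # Two hypothetical annotators, 12 seconds each, sampled at 1 Hz.
    a = smooth([0.0] * 12, rate_hz=1)
    b = smooth([1.0] * 12, rate_hz=1)
    print(mean_signal([a, b]))  # [0.5, 0.5, 0.5]
```
]]></code></p>
<p>With 12 seconds of input and a 10-second window, only three full windows exist, so each smoothed trace has three values. Averaging over such a long window is what suppresses short, unintended joystick movements.</p></div>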
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Ground Truth for the second subtask</head><p>Fear annotations for the second subtask were generated using a tool specifically designed for the classification of audio-visual media, which allows annotation to be performed while watching the movie. The annotations were made by two experienced team members of NICAM, both trained in the classification of media. Each movie was annotated by one annotator, who reported the start and stop times of each sequence in the movie expected to induce fear.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">RUN DESCRIPTION</head><p>Participants can submit up to 5 runs for each of the two subtasks, so 10 runs in total. Models can rely on the features provided by the organizers or any other external data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">EVALUATION CRITERIA</head><p>Standard evaluation metrics are used to assess system performance. The first subtask can be considered a regression problem (estimation of expected valence and arousal scores), while the second subtask can be seen as a binary classification problem (the video segment is supposed to induce or not induce fear).</p><p>For the first subtask, the official metric is the Mean Square Error (MSE), the measure most commonly used to evaluate regression models. However, to allow a deeper understanding of system performance, we also consider Pearson's Correlation Coefficient. Indeed, MSE alone is not always sufficient to analyze a model's quality. As an example, if a large portion of the data is neutral (i.e., its valence score is close to 0.5) or is distributed around the neutral score, a uniform model that always outputs 0.5 will achieve a low MSE. In this case, the model's lack of accuracy is revealed by the correlation between the predicted values and the ground truth, which will also be very low.</p><p>For the second subtask, as the goal is to detect time sequences inducing fear, the official metric is the Intersection over Union of time intervals.</p></div>
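<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three metrics named above can be sketched in plain Python as follows. This is not the official scoring script. In particular, the text does not say how the Intersection over Union is aggregated, so the sketch assumes whole-second resolution and one score per movie. All function names are illustrative.</p>
<p><code lang="python"><![CDATA[
```python
# Sketch only: not the official evaluation code; names are assumptions.
import math


def mse(truth, pred):
    """Mean Square Error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def pearson(truth, pred):
    """Pearson's correlation; None when either sequence is constant."""
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_p = sum((p - mp) ** 2 for p in pred)
    if var_t == 0 or var_p == 0:
        return None  # the coefficient is undefined without variance
    return cov / math.sqrt(var_t * var_p)


def to_seconds(intervals):
    """Expand (start, end) intervals into the set of whole seconds covered."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def interval_iou(truth, pred):
    """Intersection over Union of two lists of (start, end) intervals."""
    t, p = to_seconds(truth), to_seconds(pred)
    union = t.union(p)
    if not union:
        return 1.0  # nothing annotated and nothing predicted (assumption)
    return len(t.intersection(p)) / len(union)


if __name__ == "__main__":
    truth = [0.5, 0.6, 0.4, 0.5]
    constant = [0.5] * 4
    print(mse(truth, constant))       # small error, about 0.005
    print(pearson(truth, constant))   # None
    print(interval_iou([(10, 20)], [(15, 25)]))  # 5 of 15 seconds overlap
```
]]></code></p>
<p>The constant predictor illustrates the point made in the text: its MSE is small although it carries no information about the ground truth. Strictly speaking, Pearson's coefficient is undefined for a perfectly constant output, which the sketch reports as None. A nearly constant predictor would instead give a correlation that can be computed but is uninformative.</p></div>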
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSION</head><p>The Emotional Impact of Movies Task provides participants with a comparative and collaborative evaluation framework for emotion detection in movies, in terms of valence, arousal and fear. The LIRIS-ACCEDE dataset has been used for the development and test sets. Details on the methods and results of each individual team can be found in the papers of the participating teams in the MediaEval 2018 workshop proceedings.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>The visual features are the following: Auto Color Correlogram, Color and Edge Directivity Descriptor, Color Layout, Edge Histogram, Fuzzy Color and Texture Histogram, Gabor, Joint descriptor joining CEDD and FCTH in one histogram, Scalable Color, Tamura, Local Binary Patterns, VGG16 fc6 layer.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://liris-accede.ec-lyon.fr</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">http://www.lire-project.net/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://www.mathworks.com/products/neural-network.html</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENTS</head><p>This task is supported by the CHIST-ERA Visen project ANR-12-CHRI-0002-04.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Baveye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dellandréa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chamaret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">LIRIS-ACCEDE: A Video Database for Affective Content Analysis</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Baveye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dellandréa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chamaret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Affective Computing</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="43" to="55" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Affective recommendation of movies based on selected connotative features</title>
		<author>
			<persName><forename type="first">L</forename><surname>Canini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Benini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Leonardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Circuits and Systems for Video Technology</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="636" to="647" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Gtrace: General trace program compatible with emotionml</title>
		<author>
			<persName><forename type="first">R</forename><surname>Cowie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sawey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Doherty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jaimovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fyans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Stapleton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The MediaEval 2017 Emotional Impact of Movies Task</title>
		<author>
			<persName><forename type="first">E</forename><surname>Dellandréa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Huigsloot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Baveye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sjöberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MediaEval 2017 Workshop</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor</title>
		<author>
			<persName><forename type="first">F</forename><surname>Eyben</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Weninger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schuller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM Multimedia (MM)</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Extracting moods from pictures and sounds: Towards truly personalized TV</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hanjalic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Signal Processing Magazine</title>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Affective video summarization and story board generation using pupillary dilation and eye gaze</title>
		<author>
			<persName><forename type="first">H</forename><surname>Katti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yadati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kankanhalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tatseng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Symposium on Multimedia (ISM)</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Core affect and the psychological construction of emotion</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Russell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychological Review</title>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Advisor: Personalized video soundtrack recommendation by late fusion with heuristic rankings</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zimmermann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM International Conference on Multimedia</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Cavva: Computational affective video-in-video advertising</title>
		<author>
			<persName><forename type="first">K</forename><surname>Yadati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Katti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kankanhalli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Multimedia</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="15" to="23" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Affective visualization and retrieval for music video</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Multimedia</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="510" to="522" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Flexible presentation of videos based on affective content analysis</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Multimedia Modeling</title>
		<imprint>
			<biblScope unit="volume">7732</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
