<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LIA @ MediaEval 2013 MusiClef Task: A Combined Thematic and Acoustic Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohamed Morchid</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Dufour</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Bouallegue</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georges Linarès</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Driss Matrouf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIA - University of Avignon</institution>
          , Avignon,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>In this paper, we describe the LIA system proposed for the MediaEval 2013 Soundtrack task. The aim is to predict the most suitable soundtrack for a given TV commercial from a list of candidate songs. The organizers provide a development dataset including multimedia features. The initial assumption of the proposed system is that commercials which sell the same type of product also share the same music rhythm. A two-step system is proposed to select music for a commercial: first, find commercials with close subjects in order to determine the mean rhythm of this subset; then, extract from the candidate songs the music that best corresponds to this mean rhythm.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The success of a product or a service depends largely on
the way it is presented. Companies therefore pay close attention
to choosing the most appropriate advertisement, one that will make
a difference in the customer's choice. Advertisers have
different media at their disposal, such as newspapers, radio,
TV or the Internet. In this context, they can exploit the audio
medium by using a song related to the commercial that attracts
listeners. The choice of an appropriate song is therefore
crucial and can determine the success of a product [
        <xref ref-type="bibr" rid="ref2 ref5">5, 2</xref>
        ].
      </p>
      <p>
        For these reasons, the MediaEval 2013 Soundtrack task
for commercials is a challenging and useful task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Indeed, the MusiClef task seeks to automate this process
by taking into account both context- and
content-based information about the video, the brand, and the
music. The main difficulty of this task is to find the set of
relevant features that best describes the most appropriate
song for a video. We propose a hybrid approach that uses a
set of features from textual and audio media.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. PROPOSED APPROACH</title>
      <p>The proposed hybrid system is composed of two processes.
The first one projects a TV commercial into a topic space to
find a set of other commercials sharing close topics. A TV
commercial from the test set is thus linked to the TV
commercial from the development set sharing the closest topics.
As a result, each TV commercial from the test set is
associated with a song extracted from the development data.</p>
      <p>This work was funded by the SUMACC project supported
by the French National Research Agency (ANR) under
contract ANR-10-CORD-007.</p>
      <p>The second step uses audio features to find, in a list of
candidate songs, the songs most similar to the one associated during
the first step (see Figure 1).</p>
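      <p>[Figure 1. Labels: INPUT, a TV commercial C1 from the test set and the candidate soundtracks {St}, t ∈ T; mapping into the topic space; topic vector V1; topic vectors {Vd}, d ∈ D; cosine similarity α; the l commercials {Cl, Sl} with the highest similarity α1,l with C1; mean of {Sd}, d ∈ l; cosine similarity; OUTPUT, the 5 nearest soundtracks {St}, t = 1,...,5 of the commercial C1.]</p>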
      <p>In detail, the development set D is composed of TV
commercials C<sup>d</sup>, each with a soundtrack S<sup>d</sup> and a vector
representation V<sup>d</sup> related to the d-th TV commercial. In
the same manner, the test set T is composed of TV
commercials C<sup>t</sup>, with, for the t-th one, a vector representation
V<sup>t</sup> and a soundtrack S<sup>t</sup> to predict. Then a similarity score
α<sub>d,t</sub> (d = 1, ..., D; t = 1, ..., T) is computed for each commercial C<sup>d</sup> of the
development set given one from the test set C<sup>t</sup>:
        <disp-formula><label>(1)</label><tex-math><![CDATA[
\begin{aligned}
D &= \{C^d, V^d, S^d\}_{d=1,\dots,D} \\
T &= \{C^t, V^t, S^t_k\}_{t=1,\dots,T}^{k=1,\dots,5000}
\end{aligned}
]]></tex-math></disp-formula>
      </p>
      <p>In the next sections, the topic space representation and
the mapping of a commercial in this topic representation
are described. Then, the computed similarity score is
detailed. Finally, the soundtrack prediction process from a
TV commercial is explained.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Topic representation of a TV Commercial</title>
      <p>
        Let us consider a corpus D from the development set of TV
commercials with a word vocabulary V = {w<sub>1</sub>, ..., w<sub>N</sub>} of
size N. This corpus contains 10,724 Web pages related to the
brands of the commercials contained in D. It is
composed of 44,229,747 words for a vocabulary of 4,476,153
unique words. The topic representation is obtained using
a Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] approach. At the end of the
LDA analysis, a topic space m of n topics is obtained
with, for each topic z, the probability P(w|z) of each word w of
V given z and, for the entire model m, the probability of
each topic z given the model m. Each TV commercial
from both the development and test sets is mapped into the topic
space (see Figure 2).
      </p>
      <p>[Figure 2. Labels: TV commercial; topics z1, ..., zn, each a list of words w1, ..., w|V| with weights P(w|z); topic vector entries Vd[1], Vd[2], Vd[3], Vd[4], ..., Vd[n].]</p>
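      <p>The mapping of a commercial into the topic space can be illustrated with a minimal Python sketch. The paper does not give the exact projection formula, so the scoring rule used here (topic prior multiplied by the summed word probabilities, then normalized) is only an assumption for illustration, as are the function name topic_vector and the toy probabilities.</p>
      <preformat preformat-type="code">
```python
def topic_vector(words, word_given_topic, topic_prior):
    """Map a bag of words to a normalized vector with one weight per topic.

    Assumption (not stated in the paper): the weight of topic z is
    P(z|m) times the sum of P(w|z) over the words of the commercial.
    """
    weights = []
    for p_w_given_z, p_z in zip(word_given_topic, topic_prior):
        # Words outside the topic's vocabulary contribute nothing.
        weights.append(p_z * sum(p_w_given_z.get(w, 0.0) for w in words))
    total = sum(weights)
    if total == 0.0:
        # No known word: return the all-zero vector unchanged.
        return weights
    return [x / total for x in weights]


# Toy model with n = 2 topics; the probabilities are made up for illustration.
word_given_topic = [
    {"car": 0.6, "engine": 0.4},
    {"shampoo": 0.7, "hair": 0.3},
]
topic_prior = [0.5, 0.5]
vector = topic_vector(["car", "engine", "hair"], word_given_topic, topic_prior)
print(vector)
```
      </preformat>
      <p>In this toy example the first topic receives the larger weight, because two of the three words belong to it.</p>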
    </sec>
    <sec id="sec-4">
      <title>2.2 Similarity measure</title>
      <p>Each commercial has been mapped into the topic space
to produce its vector representation. Then, commercials
from the test set T that deal with the same subjects as
commercials from the development set D are grouped together. The
cosine is used as the similarity measure:
        <disp-formula><label>(2)</label><tex-math><![CDATA[
\mathrm{cosine}(V^d, V^t) = \alpha_{d,t} = \frac{\sum_{i=1}^{n} V^d[i] \times V^t[i]}{\sqrt{\sum_{i=1}^{n} V^d[i]^2} \times \sqrt{\sum_{i=1}^{n} V^t[i]^2}}
]]></tex-math></disp-formula>
      </p>
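      <p>As a minimal sketch of the cosine measure, the following Python code computes the similarity between topic vectors and keeps the l development commercials closest to a test commercial. The names cosine and nearest_commercials, the toy vectors and the convention of returning 0 for an all-zero vector are illustrative assumptions, not part of the submitted system.</p>
      <preformat preformat-type="code">
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_commercials(test_vector, dev_vectors, l):
    """Return the indices of the l development vectors most similar to test_vector."""
    scores = [cosine(d, test_vector) for d in dev_vectors]
    ranked = sorted(range(len(dev_vectors)), key=lambda d: scores[d], reverse=True)
    return ranked[:l]


# Toy topic vectors with n = 3 topics (made-up values).
dev = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
test = [0.9, 0.05, 0.05]
print(nearest_commercials(test, dev, l=2))
```
      </preformat>
      <p>Sorting the full score list is sufficient at the scale of this task, where the development set contains a few hundred commercials.</p>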
    </sec>
    <sec id="sec-5">
      <title>2.3 Rhythm pattern</title>
      <p>The cosine measure, presented in the previous section, is
also used to evaluate the similarity between a mean rhythm
pattern vector S̄ and all the candidate songs S<sub>k</sub><sup>t</sup>
of the test set.</p>
      <p>In detail, each commercial from D is associated with a
soundtrack that is represented by a rhythm pattern
vector. In our experiments, the 10 rhythm features of the song
are used (speed, percussion, periodicity, rhythm pattern, ...).
As a result, each commercial is represented by a rhythm
pattern vector of size 58. From the subset of soundtracks of
the l nearest commercials from D, a mean rhythm vector S̄
is computed as:
        <disp-formula><tex-math><![CDATA[
\bar{S} = \frac{1}{l} \sum_{d \in l} S^d
]]></tex-math></disp-formula>
      </p>
      <p>Finally, the cosine measure between this mean rhythm S̄
of the l nearest commercials from D and each candidate soundtrack,
cosine(S̄, S<sup>t</sup>) for t ∈ T, is used to find, among the soundtracks S<sup>t</sup>
of the test set T, the 5 candidate songs having
the closest rhythm pattern.</p>
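      <p>The two operations described above, averaging the rhythm vectors of the l nearest commercials and ranking the candidate songs against this mean, can be sketched in Python as follows. The helper names mean_vector and top_candidates and the 2-dimensional toy vectors are assumptions made for illustration; the system described here uses vectors of size 58 and 5,000 candidates.</p>
      <preformat preformat-type="code">
```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of the rhythm vectors of the l nearest commercials."""
    l = len(vectors)
    return [sum(column) / l for column in zip(*vectors)]


def top_candidates(mean_rhythm, candidates, k=5):
    """Indices of the k candidate songs whose rhythm vector is closest to the mean."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: cosine(mean_rhythm, candidates[i]),
        reverse=True,
    )
    return order[:k]


# Toy data: 2-dimensional rhythm vectors with made-up values.
nearest_soundtracks = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 1.0], [1.0, 0.0], [-1.0, -1.0], [2.0, 1.0]]
mean_rhythm = mean_vector(nearest_soundtracks)
best = top_candidates(mean_rhythm, candidates, k=2)
print(mean_rhythm, best)
```
      </preformat>
      <p>Because the cosine ignores vector length, candidates are ranked by the direction of their rhythm vector only, not by its magnitude.</p>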
    </sec>
    <sec id="sec-6">
      <title>3. EXPERIMENTS AND RESULTS</title>
      <p>
        The proposed system is evaluated in the MediaEval 2013
MusiClef benchmark [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The aim of this task is to predict,
for each video in the test set, the most suitable soundtrack
from 5,000 candidate songs. The dataset is split into 3 sets.
The development set contains multimodal information on
392 commercials (various metadata, YouTube uploader
comments, various audio features, video features, web pages and
text features). The test set is a set of 55 videos, each of which
should be associated with a song from the recommendation set of 5,000
soundtracks (30-second excerpts).
      </p>
      <p>
        For each video in the test set, a ranked list of 5 candidate
songs is proposed. The song prediction evaluation is
manually performed using the Amazon Mechanical Turk
platform. Three scores have been computed from our system
output [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
      </p>
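      <list list-type="bullet">
        <list-item><p>First rank average score: 2.16</p></list-item>
        <list-item><p>Top 5 average score (arithmetic mean): 2.24</p></list-item>
        <list-item><p>Top 5 average score (harmonic mean, taking rank into account): 2.22</p></list-item>
      </list>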
      <p>Considering that human judges rate the predicted songs
from 1 (very poor) to 4 (very good), our
system scores slightly above the mean evaluation score (2)
whatever the metric considered.</p>
    </sec>
    <sec id="sec-7">
      <title>4. CONCLUSION</title>
      <p>In this paper, an automatic system to assign a
soundtrack to a TV commercial has been proposed. This system
combines two media: textual commercial content and audio
rhythm pattern.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>993</fpage>
          –
          <lpage>1022</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bullerjahn</surname>
          </string-name>
          .
          <article-title>The effectiveness of music in television commercials</article-title>
          .
          <source>Food Preferences and Taste: Continuity and Change</source>
          ,
          <volume>2</volume>
          :
          <fpage>207</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hoeberichts</surname>
          </string-name>
          .
          <article-title>Music and advertising: The effect of music in television commercials on consumer attitudes</article-title>
          .
          <source>Bachelor Thesis</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C. C. S.</given-names>
            <surname>Liem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Orio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Peeters</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          .
          <article-title>MusiClef 2013: Soundtrack Selection for Commercials</article-title>
          . In
          <source>MediaEval</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Park</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Young</surname>
          </string-name>
          .
          <article-title>Consumer response to television commercials: The impact of involvement and background music on brand attitude formation</article-title>
          .
          <source>Journal of Marketing Research</source>
          , pages
          <fpage>11</fpage>
          –
          <lpage>24</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>