=Paper= {{Paper |id=None |storemode=property |title=LIA @ MediaEval 2013 MusiClef Task: A Combined Thematic and Acoustic Approach |pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_75.pdf |volume=Vol-1043 |dblpUrl=https://dblp.org/rec/conf/mediaeval/MorchidDBLM13a }} ==LIA @ MediaEval 2013 MusiClef Task: A Combined Thematic and Acoustic Approach== https://ceur-ws.org/Vol-1043/mediaeval2013_submission_75.pdf
LIA @ MediaEval 2013 MusiClef Task:
A Combined Thematic and Acoustic Approach

Mohamed Morchid, Richard Dufour, Mohamed Bouallegue,
Georges Linarès and Driss Matrouf∗
LIA - University of Avignon
Avignon, France
{firstname.lastname}@univ-avignon.fr

∗ This work was funded by the SUMACC project supported by the French National Research Agency (ANR) under contract ANR-10-CORD-007.

ABSTRACT
In this paper, we describe the LIA system proposed for the MediaEval 2013 Soundtrack task. The aim is to predict the most suitable soundtrack from a list of candidate songs, given a TV commercial. The organizers provide a development dataset including multimedia features. The initial assumption of the proposed system is that commercials which sell the same type of product also share the same music rhythm. A two-step system is proposed to select music for a commercial: find commercials with close subjects in order to determine the mean rhythm of this subset, and then extract from the candidate songs the music which best corresponds to this mean rhythm.

1.   INTRODUCTION
The success of a product or a service essentially depends on the way it is presented. Thus, companies pay close attention to choosing the most appropriate advertisement, one that will make a difference in the customer's choice. Advertisers have several media at their disposal, such as newspapers, radio, TV or the Internet. In this context, they can exploit the audio medium by using a song related to the commercial which attracts listeners. Therefore, the choice of an appropriate song is crucial and can determine the success of a product [5, 2].

For these reasons, the MediaEval 2013 Soundtrack task for commercials is a challenging and helpful task [3]. Indeed, the MusiClef task seeks to automate this process by taking into account both context- and content-based information about the video, the brand, and the music. The main difficulty of this task is to find the set of relevant features that best describes the most appropriate song for a video. We propose a hybrid approach that uses a set of features from textual and audio media.

2.   PROPOSED APPROACH
The proposed hybrid system is composed of two processes. The first one projects a TV commercial into a topic space to find a set of other commercials sharing close topics. A TV commercial from the test set is thus linked to the TV commercial from the development set sharing the closest topics. As a result, each TV commercial from the test set is associated with a song extracted from the development data.

The second step finds, using audio features, the songs most similar to the one associated during the first step, from a list of candidate songs (see Figure 1).

Figure 1: Global architecture of the proposed system. (Diagram: LDA maps the DEV set $\{C^d, S^d\}_{d \in D}$ into a topic space, giving topic vectors $\{V^d\}_{d \in D}$; the input TV commercial $C^1$ from the TEST set is mapped to a topic vector $V^1$; cosine similarity $\alpha$ selects the $l$ commercials $\{C^l, S^l\}_l$ with the highest similarity $\alpha_{1,l}$ with $C^1$; the mean $S$ of $\{S^d\}_{d \in l}$ is compared by cosine similarity with the candidate soundtracks $\{S^t\}_{t \in T}$; the output is the 5 nearest soundtracks $\{S^t\}_{t=1,\dots,5}$ of the commercial $C^1$.)

In detail, the development set $D$ is composed of TV commercials $C^d$, each with a soundtrack $S^d$ and a vector representation $V^d$ related to the $d$-th TV commercial. In the same manner, the test set $T$ is composed of TV commercials $C^t$, each with a vector representation $V^t$ and a soundtrack $S^t$ to predict. Then a similarity score $\{\alpha_{d,t}\}^{t=1,\dots,T}_{d=1,\dots,D}$ is computed for each commercial $C^d$ of the development set given one from the test set $C^t$. The two sets are defined as:

$$\begin{aligned}
D &= \{C^d, V^d, S^d\}_{d=1,\dots,D} \\
T &= \{C^t, V^t, S^t_k\}^{k=1,\dots,5000}_{t=1,\dots,T}
\end{aligned} \tag{1}$$

In the next sections, the topic space representation and the mapping of a commercial into this topic representation are described. Then, the computed similarity score is detailed. Finally, the soundtrack prediction process for a TV commercial is explained.
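The two steps outlined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cosine` and `predict_soundtracks`, the toy vectors, and the value `l=2` are hypothetical (the number l of nearest commercials is not given in the text), and plain Python lists stand in for the topic vectors and rhythm pattern vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def predict_soundtracks(test_topic, dev_topics, dev_rhythms, candidates, l=2, top=5):
    """Return the indices of the `top` candidate songs for one test commercial.

    test_topic  -- topic vector of the test commercial
    dev_topics  -- topic vectors of the development commercials
    dev_rhythms -- rhythm vectors of the development soundtracks (same order)
    candidates  -- rhythm vectors of the candidate songs
    l           -- number of nearest development commercials (assumed value)
    """
    # Step 1: the l development commercials closest to the test
    # commercial in the topic space.
    by_topic = sorted(
        range(len(dev_topics)),
        key=lambda d: cosine(dev_topics[d], test_topic),
        reverse=True,
    )
    nearest = by_topic[:l]

    # Mean rhythm vector of the soundtracks of these l commercials.
    dim = len(dev_rhythms[0])
    mean_rhythm = [
        sum(dev_rhythms[d][i] for d in nearest) / len(nearest)
        for i in range(dim)
    ]

    # Step 2: rank the candidate songs by similarity to the mean rhythm.
    by_rhythm = sorted(
        range(len(candidates)),
        key=lambda k: cosine(mean_rhythm, candidates[k]),
        reverse=True,
    )
    return by_rhythm[:top]


# Toy data (hypothetical): 3 development commercials, 3 candidate songs.
dev_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
dev_rhythms = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 1.0]]
candidates = [[0.0, 0.0, 1.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = predict_soundtracks([0.85, 0.15], dev_topics, dev_rhythms,
                              candidates, l=2, top=2)
print(ranking)  # [1, 2]
```

On this toy data the two development commercials closest in the topic space have similar rhythm vectors, so the candidate nearest to their mean is ranked first. In the task itself the ranking would run over the 5,000 candidate songs with `top=5`.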

2.1  Topic representation of a TV Commercial
Let us consider a corpus $D$ from the development set of TV commercials with a word vocabulary $V = \{w_1, \dots, w_N\}$ of size $N$. This corpus contains 10,724 Web pages related to the brands of the commercials contained in $D$. It is composed of 44,229,747 words for a vocabulary of 4,476,153 unique words. The topic representation is obtained using a Latent Dirichlet Allocation (LDA) [1] approach. At the end of the LDA analysis, a topic space $m$ of $n$ topics is obtained with, for each topic $z$, the probability $P(w|z)$ of each word $w$ of $V$ given $z$ and, for the entire model $m$, the probability of each topic $z$ given the model $m$. Each TV commercial from both the development and test sets is mapped into the topic space (see Figure 2).

Figure 2: Mapping of a TV commercial in the topic space. (Diagram: each topic $z_1, \dots, z_n$ is a table of words $w_1, \dots, w_{|V|}$ with weights $P(w_i|z_j)$; the TV commercial is linked to each topic $z_j$ through the component $V_d[j]$ of its topic vector.)

2.2  Similarity measure
Each commercial has been mapped into the topic space to produce its vector representation. Then, commercials from the test set $T$ that deal with the same subjects as commercials from the development set $D$ are grouped together. The cosine is used as a similarity measure:

$$\alpha_{d,t} = \mathrm{cosine}(V^d, V^t) = \frac{\sum_{i=1}^{n} V^d[i] \times V^t[i]}{\sqrt{\sum_{i=1}^{n} V^d[i]^2}\ \sqrt{\sum_{i=1}^{n} V^t[i]^2}} \tag{2}$$

2.3  Rhythm pattern
The cosine measure presented in the previous section is also used to evaluate the similarity between a mean rhythm pattern vector $S^t$ of a song and all the candidate songs $S^t_k$ of the test set.

In detail, each commercial from $D$ is associated with a soundtrack that is represented by a rhythm pattern vector. In our experiments, the 10 rhythm features of the song are used (speed, percussion, periodicity, rhythm pattern, ...). As a result, each commercial is represented by a rhythm pattern vector of size 58. From the subset of soundtracks of the $l$ nearest commercials from $D$, a mean rhythm vector $S$ is computed as:

$$S = \frac{1}{l} \sum_{d \in l} S^d .$$

Finally, the cosine measure between this mean rhythm $S$ of the $l$ nearest commercials from $D$ and each soundtrack $S^t$ of the test set $T$, $(\mathrm{cosine}(S, S^t))_{t \in T}$, is used to find, among all the candidates, the 5 songs having the closest rhythm pattern.

3.   EXPERIMENTS AND RESULTS
The proposed system is evaluated in the MediaEval 2013 MusiClef benchmark [4]. The aim of this task is to predict, for each video in the test set, the most suitable soundtrack from 5,000 candidate songs. The dataset is split into 3 sets. The development set contains multimodal information on 392 commercials (various metadata, YouTube uploader comments, various audio features, video features, web pages and text features). The test set is a set of 55 videos, each of which should be associated with a song from the recommendation set of 5,000 soundtracks (30-second excerpts).

For each video in the test set, a ranked list of 5 candidate songs is proposed. The song prediction evaluation is performed manually using the Amazon Mechanical Turk platform. Three scores have been computed from our system output [4]:

• First rank average score: 2.16

• Top 5 average score (arithmetic mean): 2.24

• Top 5 average score (harmonic mean, taking rank into account): 2.22

Considering that human judges rate the predicted songs from 1 (very poor) to 4 (very well), our system is slightly better than the mean evaluation score (2) no matter the metric considered.

4.   CONCLUSION
In this paper, an automatic system to assign a soundtrack to a TV commercial has been proposed. This system combines two media: the textual content of the commercial and the audio rhythm pattern.

5.   REFERENCES
[1] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003.
[2] C. Bullerjahn. The effectiveness of music in television commercials. Food Preferences and Taste: Continuity and Change, 2:207, 1997.
[3] N. Hoeberichts. Music and advertising: The effect of music in television commercials on consumer attitudes. Bachelor Thesis, 2012.
[4] C. C. S. Liem, N. Orio, G. Peeters, and M. Schedl. MusiClef 2013: Soundtrack Selection for Commercials. In MediaEval, 2013.
[5] C. W. Park and S. M. Young. Consumer response to television commercials: The impact of involvement and background music on brand attitude formation. Journal of Marketing Research, pages 11–24, 1986.

Copyright is held by the author/owner(s).
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain