<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detection of Musical Event Drop from Crowdsourced Annotations Using a Noisy Channel Model</article-title>
      </title-group>
      <contrib-group>
<contrib contrib-type="author">
  <string-name>Naveen Kumar</string-name>
</contrib>
<contrib contrib-type="author">
  <string-name>Shrikanth S. Narayanan</string-name>
</contrib>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper describes the algorithm for our submission to the MediaEval 2014 crowdsourcing challenge. We perform a Maximum Likelihood (ML) estimation of the true label, using only the multiple noisy labels. Each annotator's decision is modeled by a die-toss based on which the annotator changes the true label. We learn parameters of this noisy channel model using the Expectation-Maximization algorithm. We also show that using a smaller number of annotators in the model than the actual number can give better accuracy because there is more data per annotator to estimate the parameters reliably.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Mediaeval 2014 crowdsourcing challenge [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] involves
multiple noisy annotations for the presence of the musical event
drop in 15-second clips of Electronic Dance Music (EDM).
Each annotator assigns one of 3 class labels depending on the
extent to which the event is present in the clip. For each such
clip, at least 3 unique annotations are available from different
annotators. The total number of unique annotators is 30;
however, the bulk of the annotations are done by a handful
of them (Fig.1).
      </p>
      <p>
        The typical approach to modeling multiple noisy
annotations is to model each of the $M$ annotators as a noisy
channel that distorts the true label $Y$ into a noisy
annotation $\tilde{Y}_k$ for each of the $K$ annotations, $k = 1, \ldots, K$, per
song. This can be done either in a data-independent [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
or a data-dependent fashion [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, these methods are not readily applicable
to the problem at hand because of our limited understanding
of good features for the task. The 2014 crowdsourcing
challenge dataset comprises only noisy annotations, and
without any ground truth the process of feature design is difficult.
      </p>
      <p>
        Hence, we use a much simpler model instead, based on [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
which only uses the multiple noisy annotations and models
each annotator as a noisy channel that corrupts the "true
label" ($Y$) by tossing a $B$-faced die, where $B$ is the number
of classes and the die is chosen depending on $Y$.
      </p>
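<p>As an illustration, this die-toss channel can be sketched as follows; the 3-class confusion matrix below is a hypothetical example, not one learned from the challenge data.</p>

```python
import numpy as np

# Hypothetical confusion matrix ("dice") for one annotator:
# row b holds the die faces Pr(noisy label = a | true label = b).
theta_m = np.array([
    [0.8, 0.1, 0.1],   # true class 0
    [0.2, 0.7, 0.1],   # true class 1
    [0.1, 0.2, 0.7],   # true class 2
])

rng = np.random.default_rng(0)

def annotate(true_label: int) -> int:
    """Corrupt a true label by tossing the B-faced die chosen by that label."""
    return int(rng.choice(len(theta_m), p=theta_m[true_label]))

# A reasonably reliable annotator mostly reproduces the true label.
labels = [annotate(1) for _ in range(1000)]
frac_correct = labels.count(1) / len(labels)
```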
    </sec>
    <sec id="sec-2">
<title>2. NOISY CHANNEL MODEL</title>
      <p>Figure 1: Number of annotations per annotator.
Note that most of the annotations are from a few
annotators.</p>
      <p>For each song $i$, the number of annotations $K$ may
vary. In addition, we denote the annotator
id for each annotation $\tilde{Y}_{ik}$ by $A_{ik}$. This information is
provided in our dataset. For each annotator $m$ we denote
the parameters of her noisy channel model by $\theta_m$. As an
example, if we denote $p_{ik} = \Pr(Y_i = k)$ and $q_{ij} = \Pr(\tilde{Y}_i = j)$,
then the $m$-th annotator distorts her label as $q_i = \theta_m p_i$.</p>
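<p>The distortion $q_i = \theta_m p_i$ amounts to marginalizing the true class out of the joint distribution; a small numerical check with a hypothetical confusion matrix:</p>

```python
import numpy as np

# Hypothetical confusion matrix: entry [b, a] = Pr(noisy = a | true = b).
theta_m = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])

# True-label distribution p_i for one song.
p = np.array([0.5, 0.3, 0.2])

# Distorted label distribution: q_a = sum_b p_b * Pr(noisy = a | true = b),
# i.e. the matrix-vector product theta_m^T p.
q = theta_m.T @ p
```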
      <p>We treat the true label $Y_i$ as a hidden variable and
perform Expectation-Maximization to estimate it for each
song, learning the model parameters $\theta_m$ at the same
time. Since the annotator ids for each annotation are known
to us, it is straightforward to compute the total data
likelihood. For a dataset $D = \{Y, \tilde{Y}, A\}$ it is shown in Eqn. (1).</p>
      <p>$$\Pr(D; \theta_1 \ldots \theta_M) = \prod_{i=1}^{N} \Pr(Y_i) \prod_{k} \Pr(\tilde{Y}_{ik}, A_{ik} \mid Y_i, \theta_1 \ldots \theta_M) \quad (1)$$</p>
      <p>$$= \prod_{i=1}^{N} \Pr(Y_i) \prod_{k} \theta^{m}_{ab}, \quad \text{if } A_{ik} = m,\ \tilde{Y}_{ik} = a,\ Y_i = b \quad (2)$$
Note that since $Y_i$ is a latent variable, in practice we shall
maximize a lower bound of this likelihood function by
taking an expectation w.r.t. the posterior distribution of the latent
variable. This amounts to replacing the hard assignment $Y_i = b$ by a soft label $p_{ib}$,</p>
      <p>$$E[\Pr(D)] = \prod_{i=1}^{N} \Pr(Y_i) \prod_{k} \prod_{b} \left(\theta^{m}_{ab}\right)^{p_{ib}}, \quad \text{if } A_{ik} = m,\ \tilde{Y}_{ik} = a \quad (3)$$
where $p_{ib}$ is defined as earlier. We compute $p_{ib}$ formally in
the Expectation Step by estimating the posterior distribution of
the true labels given the noisy ones.</p>
      <p>$$\pi_b = \frac{1}{N} \sum_{i=1}^{N} p_{ib} \quad (4)$$</p>
      <p>$$\theta^{m}_{ab} = \Pr(\tilde{Y}_{ik} = a \mid Y_i = b, A_{ik} = m) = \frac{\sum_{i=1}^{N} \sum_{k} p_{ib}\, \mathbf{1}(\tilde{Y}_{ik} = a, A_{ik} = m)}{\sum_{i=1}^{N} \sum_{k} p_{ib}\, \mathbf{1}(A_{ik} = m)} \quad (5)$$
To estimate the parameter $\theta^{m}_{ab}$ we count all the probability mass
for true class $b$ from annotator $m$ when the noisy label
annotated was $a$. This is divided by the probability mass for
annotator $m$ for true class $b$, irrespective of the
annotator's label.</p>
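<p>The count-and-divide update can be sketched as follows; the posteriors and annotations are toy values, and the smoothing constant is an assumed choice rather than one stated in the text.</p>

```python
import numpy as np

# Toy setup (hypothetical data): N songs, B classes, M annotators.
N, B, M = 4, 3, 2
# Soft posteriors p_ib over true labels (rows sum to 1).
p = np.array([
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.7, 0.2, 0.1],
])
# Noisy annotations as (song i, annotator m, label a) triples.
annotations = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 2), (2, 0, 2), (3, 1, 0)]

# M-step: count-and-divide with Laplace smoothing (eps) to avoid
# dividing by zero for sparsely observed annotators.
eps = 1e-6
num = np.full((M, B, B), eps)      # num[m, b, a]: mass for (true b, noisy a)
for i, m, a in annotations:
    num[m, :, a] += p[i]           # add song i's posterior mass to column a
theta = num / num.sum(axis=2, keepdims=True)

# Class priors: average posterior mass per class.
pi = p.sum(axis=0) / N
```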
      <p>We repeat these steps until the change in log-likelihood
falls below a certain threshold.</p>
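<p>Putting the steps together, a minimal EM loop with a log-likelihood stopping criterion might look as follows; the toy annotations and the tolerance are assumptions, not the challenge data.</p>

```python
import numpy as np

B, M = 3, 3
# Hypothetical annotations: annotations[i] lists (annotator m, label a) pairs.
annotations = [
    [(0, 0), (1, 0), (2, 1)],   # song 0: majority class 0
    [(0, 1), (1, 1), (2, 1)],   # song 1: unanimous class 1
    [(0, 2), (1, 0), (2, 2)],   # song 2: majority class 2
    [(0, 0), (1, 0), (2, 0)],   # song 3: unanimous class 0
]
N = len(annotations)
eps = 1e-6

def e_step(pi, theta):
    """Posterior p_ib of the true label given the noisy annotations."""
    p = np.tile(pi, (N, 1))
    for i, anns in enumerate(annotations):
        for m, a in anns:
            p[i] *= theta[m, :, a]
    return p / p.sum(axis=1, keepdims=True)

def m_step(p):
    """Count-and-divide updates with Laplace smoothing."""
    pi = p.sum(axis=0) / N
    num = np.full((M, B, B), eps)
    for i, anns in enumerate(annotations):
        for m, a in anns:
            num[m, :, a] += p[i]
    return pi, num / num.sum(axis=2, keepdims=True)

def log_likelihood(pi, theta):
    ll = 0.0
    for anns in annotations:
        per_class = pi.copy()
        for m, a in anns:
            per_class = per_class * theta[m, :, a]
        ll += np.log(per_class.sum())
    return ll

# Initialize the posteriors with a soft majority vote over annotations.
p = np.full((N, B), eps)
for i, anns in enumerate(annotations):
    for m, a in anns:
        p[i, a] += 1.0
p /= p.sum(axis=1, keepdims=True)

# Iterate until the log-likelihood improvement drops below a tolerance.
prev = -np.inf
for _ in range(100):
    pi, theta = m_step(p)
    p = e_step(pi, theta)
    ll = log_likelihood(pi, theta)
    if ll - prev > 1e-8:
        prev = ll
    else:
        break

estimates = p.argmax(axis=1)   # hard estimates of the true labels
```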
    </sec>
    <sec id="sec-3">
<title>2.3 Uniqueness and Initialization</title>
      <p>The EM algorithm can be shown to be a gradient ascent
on the log-likelihood and hence is prone to getting stuck in
local optima. Moreover, for this specific model there is an
inherent non-uniqueness resulting from the assignment of class
labels: by changing the order of columns in
the parameters $\theta_m$ we can obtain a different permutation of the
true class labels. Each such permutation still yields the
same value of the log-likelihood and is hence an equally optimal
solution. This makes a good initialization of the EM
algorithm important in this case. We use labels obtained by
a majority vote as the initial estimates for $p_{ib}$.</p>
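<p>The majority-vote initialization can be sketched as follows; the tie-breaking rule (lowest label wins) is an assumed choice, not one specified in the text.</p>

```python
from collections import Counter
import numpy as np

# Hypothetical annotations per clip; a majority vote gives the initial
# hard estimates of the posteriors p_ib.
B = 3
annotations = [
    [0, 0, 1],
    [2, 2, 2],
    [1, 2, 2, 1],   # tie between classes 1 and 2
]

def majority_init(anns, B):
    """One-hot initial posterior from a majority vote (lowest label wins ties)."""
    counts = Counter(anns)
    top = max(sorted(counts), key=counts.get)  # sorted keys: ties -> lowest label
    p = np.zeros(B)
    p[top] = 1.0
    return p

p0 = np.array([majority_init(a, B) for a in annotations])
```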
      <p>Additionally, as pointed out earlier, most of the annotations
are from a handful of annotators. This can lead to poor
parameter estimates for annotators with few annotations.
Thus, we choose the number of annotators $M$ within the
model to be smaller than the actual number of annotators.
We use $M = 8$ for the submitted runs based on a rough
estimate from Fig.1. The annotator ids for the top $M-1$
annotators by number of annotations are retained; the rest
are grouped together under the $M$-th annotator id. The effect
of varying the parameter $M$ is shown in Fig.2. Finally, to
deal with numerical instabilities resulting from dividing by
small numbers in the M-step, we use Laplace smoothing.</p>
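<p>The grouping of annotators can be sketched as follows, with hypothetical annotator ids and a smaller $M$ than the $M = 8$ used for the submitted runs.</p>

```python
from collections import Counter

# Hypothetical raw annotator ids, heavily skewed toward a few annotators.
raw_ids = [3, 3, 3, 3, 7, 7, 7, 12, 12, 25, 25, 9, 14, 30]
M = 4  # number of annotators kept inside the model

# Keep the top M-1 annotators by annotation count; merge the rest under a
# single catch-all id so their parameters are estimated from pooled data.
counts = Counter(raw_ids)
top = [a for a, _ in counts.most_common(M - 1)]
remap = {a: i for i, a in enumerate(top)}           # ids 0 .. M-2
model_ids = [remap.get(a, M - 1) for a in raw_ids]  # all others -> id M-1
```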
    </sec>
    <sec id="sec-4">
<title>3. RESULTS AND CONCLUSIONS</title>
      <p>We submitted two systems using the proposed method:
one using a random initialization and the other using
majority-voted labels. The results are shown in Table 1. We compare
the results against a high-fidelity annotation that is assumed
to be the ground truth for the purposes of this challenge.</p>
      <p>The accuracies of the submitted systems indicate that a
proper initialization of the EM algorithm is important, as
anticipated. Initializing with labels obtained through majority voting
of the multiple noisy annotations allows us to obtain better
results than simple sample-level majority voting.
Results are also sensitive to the number of annotators selected,
and in the future we would like to learn the
parameter $M$ automatically.</p>
      <p>Table 1: WF and UWF scores for the two submitted systems.</p>
    </sec>
    <sec id="sec-5">
<title>2.1 Expectation Step</title>
      <p>In this step, we estimate the posterior probability of the
latent variable $Y_i$ given the noisy annotations $\tilde{Y}_{ik}$, the model
parameters $\theta_{1:M}$, and the class priors $\pi_b = \Pr(Y_{ib} = 1)$. This
is done as follows:</p>
      <p>$$\Pr(Y_{ib} = 1 \mid \tilde{Y}_{i,1:K}) = \frac{\Pr(\tilde{Y}_{i,1:K} \mid Y_{ib} = 1)\Pr(Y_{ib} = 1)}{\sum_{j} \Pr(\tilde{Y}_{i,1:K} \mid Y_{ij} = 1)\Pr(Y_{ij} = 1)} \quad (6)$$
We denote this posterior as $p_{ib}$ and note that it can be computed from
the parameters $\theta_{1:M}$ and $\Pr(Y_{ib} = 1)$, which we
refer to as $\pi_b$.</p>
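<p>For a single clip, the posterior above reduces to Bayes' rule over the per-annotation likelihoods; a sketch with hypothetical parameters:</p>

```python
import numpy as np

# Hypothetical parameters: class priors pi_b and per-annotator confusion
# matrices theta[m, b, a] = Pr(noisy = a | true = b, annotator = m).
B = 3
pi = np.array([0.5, 0.3, 0.2])
theta = np.array([
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.6, 0.2, 0.2], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6]],
])
anns = [(0, 1), (1, 1)]   # (annotator m, noisy label a) for one clip

# Bayes rule: posterior proportional to prior times the annotation likelihoods.
post = pi.copy()
for m, a in anns:
    post *= theta[m, :, a]
post /= post.sum()
```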
    </sec>
    <sec id="sec-6">
<title>2.2 Maximization Step</title>
      <p>This step optimizes the alternate likelihood
function. The M-step in this case can be performed by a
simple count-and-divide over the posterior probabilities.</p>
      <p>Figure 2: The effect of the assumed
number of annotators $M$ on system F1-score.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Audhkhasi</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          .
          <article-title>Data-dependent evaluator modeling and its application to emotional valence classification from speech</article-title>
          .
          <source>In INTERSPEECH</source>
          , pages
          <volume>2366</volume>
          –
          <fpage>2369</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Dawid</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Skene</surname>
          </string-name>
          .
          <article-title>Maximum likelihood estimation of observer error-rates using the EM algorithm</article-title>
          . Applied Statistics, pages
          <volume>20</volume>
          –
          <fpage>28</fpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yadati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S. N. C.</given-names>
            <surname>Ayyanathan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Crowdsorting timed comments about music: Foundations for a new crowdsourcing task</article-title>
          .
          <source>In MediaEval 2014 Workshop</source>
          , Barcelona, Spain, October
          <volume>16</volume>
          –17,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V. C.</given-names>
            <surname>Raykar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jerebko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Florin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Valadez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bogoni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Moy</surname>
          </string-name>
          .
          <article-title>Supervised learning from multiple experts: whom to trust when everyone lies a bit</article-title>
          .
          <source>In Proceedings of the 26th Annual international conference on machine learning</source>
          , pages
          <volume>889</volume>
          –
          <fpage>896</fpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>