<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Aggregating Crowdsourced Image Segmentations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Doris Jung-Lin Lee</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akash Das Sarma</string-name>
          <email>akashds@fb.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aditya Parameswaran</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Facebook Inc.</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Illinois</institution>
          ,
          <addr-line>Urbana-Champaign</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Instance-level image segmentation provides rich information crucial for scene understanding in a variety of real-world applications. In this paper, we evaluate multiple crowdsourced algorithms for the image segmentation problem, including novel worker-aggregation-based methods and retrieval-based methods from prior work. We characterize the different types of worker errors observed in crowdsourced segmentation, and present a clustering algorithm as a preprocessing step that is able to capture and eliminate errors arising due to workers having different semantic perspectives. We demonstrate that aggregation-based algorithms attain higher accuracies than existing retrieval-based approaches, while scaling better with increasing numbers of worker segmentations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Precise, instance-level object segmentation is crucial for
identifying and tracking objects in a variety of real-world
emergent applications of autonomy, including robotics
        <xref ref-type="bibr" rid="ref12">(Natonek 1998)</xref>
        , image organization and retrieval
        <xref ref-type="bibr" rid="ref20">(Yamaguchi
2012)</xref>
        , and medicine
        <xref ref-type="bibr" rid="ref1 ref11 ref7 ref8">(Irshad and et. al. 2014)</xref>
        . To this end,
there has been a lot of work on employing crowdsourcing
to generate training data for segmentation, including
PascalVOC
        <xref ref-type="bibr" rid="ref14 ref15 ref3 ref4 ref5 ref8">(Everingham et al. 2015)</xref>
        , LabelMe
        <xref ref-type="bibr" rid="ref17 ref19">(Torralba et al.
2010)</xref>
        , OpenSurfaces
        <xref ref-type="bibr" rid="ref14 ref15 ref3 ref4 ref5 ref8">(Bell et al. 2015)</xref>
        , and MS-COCO
        <xref ref-type="bibr" rid="ref10">(Lin
et al. 2012)</xref>
        . Unfortunately, raw data collected from the
crowd is known to be noisy due to varying degrees of
worker skills, attention, and motivation
        <xref ref-type="bibr" rid="ref1 ref11 ref17 ref19 ref7 ref8">(Bell et al. 2014;
Welinder et al. 2010)</xref>
        .
      </p>
      <p>
        To deal with these challenges, many have employed
heuristics indicative of crowdsourced segmentation quality to pick
the best worker-provided segmentation
        <xref ref-type="bibr" rid="ref16 ref18">(Sorokin and Forsyth
2008; Vittayakorn and Hays 2011)</xref>
        . However, this approach
ends up discarding the majority of the worker segmentations
and is limited by what the best worker can do. In this paper,
we make two contributions: First, we introduce a novel class
of aggregation-based methods that incorporates portions of
segmentations from multiple workers into a combined segmentation,
described in Section 4. To our surprise, despite its intuitive
simplicity, we have not seen this class of algorithms described
or evaluated in prior work. We evaluate this class of
algorithms against existing methods in Section 6. Second, our
analysis of common worker errors in crowdsourced
segmentation shows that workers often segment the wrong objects or
erroneously include or exclude large semantically-ambiguous
portions of an object in the resulting segmentation. We
discuss such errors in Section 3 and propose a clustering-based
preprocessing technique that resolves them in Section 5.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        As shown in Figure 1, quality evaluation methods for
crowdsourced segmentation can be classified into two categories:
Retrieval-based methods pick the “best” worker
segmentation based on some scoring criteria that evaluates the
quality of each segmentation, including vision information
        <xref ref-type="bibr" rid="ref14 ref15 ref18 ref3 ref4 ref5 ref8">(Vittayakorn and Hays 2011; Russakovsky et al. 2015)</xref>
        , and
clickstream behavior
        <xref ref-type="bibr" rid="ref14 ref14 ref15 ref15 ref16 ref3 ref3 ref4 ref4 ref5 ref5 ref8 ref8">(Cabezas et al. 2015; Sameki et al. 2015;
Sorokin and Forsyth 2008)</xref>
        .
      </p>
      <p>Aggregation-based methods combine multiple worker
segmentations to produce a final segmentation that is not
restricted to any single worker segmentation. An
aggregation-based majority vote approach was employed in Sameki et
al. (2015) to create an expert-established gold standard for
characterizing their dataset and algorithmic accuracies, rather
than for segmentation quality evaluation as described here.
</p>
    </sec>
    <sec id="sec-3">
      <title>3 Error Analysis</title>
      <p>On collecting and analyzing a number of crowdsourced
segmentations (described in Section 6), we found that common
worker segmentation errors can be classified into three types:
(1) Semantic Ambiguity: workers have differing opinions
on whether particular regions belong to an object (Figure 2
left: annotations around ‘flower and vase’ when ‘vase’ is
requested); (2) Semantic Error: workers annotate the wrong
object entirely (Figure 2 right: annotations around ‘turtle’ and
‘monitor’ when ‘computer’ is requested); and (3) Boundary
Imperfection: workers make unintentional mistakes while
drawing the boundaries, either due to low image resolution,
small area of the object, or lack of drawing skills (Figure 3
left: imprecision around the ‘dog’ object).</p>
      <p>Quality evaluation methods in prior work have largely
focused on minimizing boundary imperfection issues. So,
we first describe our novel aggregation-based algorithms
designed to reduce boundary imperfections in Section 4. Next,
in Section 5, we discuss a preprocessing method that
eliminates semantic ambiguities and errors. We present our
experimental evaluation in Section 6.</p>
      <p>At the heart of our aggregation techniques is the tile data
representation. A tile is the smallest non-overlapping discrete
unit created by overlaying all of the workers’ segmentations
on top of each other. The tile representation allows us to
aggregate segmentations from multiple workers, rather than
being restricted to a single worker’s segmentation, allowing
us to fix one worker’s errors with help from another. In Figure
3 (left), we display three worker segmentations for a toy
example with 6 resulting tiles. Any subset of these tiles can
contribute towards the final segmentation.</p>
      <p>This simple but powerful idea of tiles also allows us to
reformulate our problem from one of “generating a
segmentation” to a setting that is much more familiar to crowdsourcing
researchers. Since tiles are the lowest granularity units
created by overlaying all workers’ segmentations on top of each
other, each tile is either completely contained within or
outside a given worker segmentation. Specifically, we can regard
a worker segmentation as multiple boolean responses where
the worker has voted ‘yes’ or ‘no’ to every tile independently.
Intuitively, a worker votes ‘yes’ for every tile that is contained
in their segmentation, and ‘no’ for every tile that is not. As
shown in Figure 3 (right), tile t2 is voted ‘yes’ by workers 1,
2, and 3; tile t3 is voted ‘yes’ by workers 2 and 3. The goal
of our aggregation algorithms is to pick an appropriate set of
tiles that effectively trades off precision versus recall.</p>
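      <p>To make the tile construction concrete, the following is a minimal sketch of how tiles and the per-tile vote matrix could be computed from binary worker masks (the NumPy representation and all names here, such as tile_votes, are our illustration, not an implementation from the paper):</p>
      <preformat><![CDATA[
import numpy as np

def tile_votes(worker_masks):
    """Overlay binary worker masks (each H x W) into tiles.

    Pixels sharing the same worker-vote signature are grouped into one
    tile; for vote aggregation this grouping is equivalent to the
    paper's tiles, since all such pixels receive identical votes.
    Returns (tile_map, votes, areas): per-pixel tile ids, an
    (n_workers x n_tiles) boolean vote matrix, and tile pixel areas.
    """
    masks = np.stack([np.asarray(m, dtype=bool) for m in worker_masks])
    n = masks.shape[0]
    weights = 2 ** np.arange(n, dtype=np.int64)  # one bit per worker (fine for tens of workers)
    signature = np.tensordot(weights, masks.astype(np.int64), axes=1)
    sig_values, inverse = np.unique(signature.ravel(), return_inverse=True)
    tile_map = inverse.reshape(signature.shape)
    votes = (sig_values[None, :] // weights[:, None]) % 2  # decode bits
    areas = np.bincount(tile_map.ravel())
    return tile_map, votes.astype(bool), areas
]]></preformat>
      <p>The background region, voted for by no worker, also forms a tile under this construction; the aggregation algorithms below simply never select it.</p>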
      <p>Now that we have modeled segmentation as a collection of
worker votes for tiles, we can develop familiar variants
of standard quality evaluation algorithms for this setting.</p>
      <sec id="sec-3-1">
        <title>Aggregation: Majority Vote Aggregation (MV)</title>
        <p>This simple algorithm includes a tile in the output
segmentation if and only if the tile has ‘yes’ votes from at least 50%
of all workers.</p>
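        <p>A minimal sketch of majority vote over the tile representation (assuming the vote matrix and tile map produced by our earlier tile_votes sketch):</p>
        <preformat><![CDATA[
import numpy as np

def majority_vote(votes, tile_map):
    """Select tiles with 'yes' votes from at least 50% of workers.

    votes: (n_workers x n_tiles) boolean matrix; tile_map: H x W tile ids.
    Returns the aggregated segmentation as a binary H x W mask.
    """
    n_workers = votes.shape[0]
    keep = 2 * votes.sum(axis=0) >= n_workers  # at least half vote 'yes'
    return keep[tile_map]
]]></preformat>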
      </sec>
      <sec id="sec-3-2">
        <title>Aggregation: Expectation-Maximization (EM)</title>
        <p>
          Unlike MV, which assumes that all workers perform
uniformly, EM approaches infer the likelihood that a tile is part
of the ground truth segmentation, while simultaneously
estimating hidden worker qualities. In Section 6 we evaluate
an EM variant which assumes that each worker has a
(different) fixed probability for a correct vote. Details of this,
and more fine-grained variants can be found in our technical
report
          <xref ref-type="bibr" rid="ref9">(Lee et al. 2018)</xref>
          .
        </p>
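        <p>As a rough sketch of this EM variant (one accuracy parameter per worker, with per-tile votes treated as independent; the initialization, prior, and iteration count here are our assumptions, and the full details differ in the technical report):</p>
        <preformat><![CDATA[
import numpy as np

def em_aggregate(votes, n_iter=20, prior=0.5):
    """Jointly estimate worker accuracies and per-tile inclusion.

    votes: (n_workers x n_tiles) boolean matrix.
    Returns the posterior probability that each tile is in the object.
    """
    v = votes.astype(float)
    quality = np.full(v.shape[0], 0.7)               # initial accuracies
    log_prior_odds = np.log(prior / (1.0 - prior))
    for _ in range(n_iter):
        # E-step: posterior that each tile belongs to the object.
        log_q, log_1q = np.log(quality), np.log(1.0 - quality)
        ll_in = v.T @ log_q + (1.0 - v).T @ log_1q   # votes if tile is in
        ll_out = v.T @ log_1q + (1.0 - v).T @ log_q  # votes if tile is out
        p = 1.0 / (1.0 + np.exp(-(ll_in - ll_out + log_prior_odds)))
        # M-step: accuracy = expected fraction of correct tile votes.
        quality = (v @ p + (1.0 - v) @ (1.0 - p)) / p.size
        quality = np.clip(quality, 1e-3, 1.0 - 1e-3)
    return p
]]></preformat>
        <p>Thresholding the returned posterior at 0.5 and indexing it by the tile map (p[tile_map] >= 0.5) yields the output mask.</p>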
      </sec>
      <sec id="sec-3-3">
        <title>Aggregation: Greedy Tile Picking (greedy)</title>
        <p>The greedy algorithm picks tiles in descending order of the
tiles’ ratios of (estimated) overlap area with the ground truth
to (estimated) non-overlap area with ground truth, for as long
as the (estimated) Jaccard similarity of the resulting
segmentation continues to increase. Intuitively, tiles that have a high
overlap area and low non-overlap area contribute to high
recall with limited loss of precision. Since tile overlap and
non-overlap areas, and Jaccard similarity of segmentations
with ground truth are unknown, we use different heuristics
to estimate these values. We discuss details of this algorithm
and its theoretical guarantees in our technical report.</p>
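        <p>A sketch of the full-information variant of greedy tile picking (the heuristics for estimating these quantities when ground truth is unknown are in the technical report; here the per-tile overlap and non-overlap areas are given directly, and the names are ours):</p>
        <preformat><![CDATA[
import numpy as np

def greedy_tiles(overlap, nonoverlap, gt_area):
    """Pick tiles in descending overlap:non-overlap ratio while the
    Jaccard score of the running segmentation keeps improving.

    overlap[t], nonoverlap[t]: tile t's area inside / outside ground
    truth; gt_area: total ground truth area. Returns picked tile ids.
    """
    order = np.argsort(-overlap / np.maximum(nonoverlap, 1e-9))
    picked, inter, union, best = [], 0.0, float(gt_area), 0.0
    for t in order:
        jaccard = (inter + overlap[t]) / (union + nonoverlap[t])
        if jaccard <= best:          # adding t no longer helps; stop
            break
        picked.append(t)
        inter += overlap[t]          # tile area inside ground truth
        union += nonoverlap[t]       # tile area added outside it
        best = jaccard
    return picked
]]></preformat>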
      </sec>
      <sec id="sec-3-4">
        <title>Retrieval: Number of Control Points (num pts)</title>
        <p>
          This algorithm picks the worker segmentation with the largest
number of control points around the segmentation boundary
(i.e., the most precise drawing) as the output segmentation
          <xref ref-type="bibr" rid="ref16 ref18">(Vittayakorn and Hays 2011; Sorokin and Forsyth 2008)</xref>
          .
        </p>
        <p>[Figure 3: three worker segmentations (workers 1–3) overlaid with the object boundary yield tiles t1–t6 (left); tile-based inference for a 5-worker example (right).]</p>
      </sec>
      <sec id="sec-3-10">
        <title>Clustering-Based Preprocessing</title>
        <p>
As discussed in Section 3, disagreements often arise in
segmentation due to differing worker perspectives on large tile
regions. We developed a clustering-based preprocessing
approach to resolve this issue. Based on the intuition that
workers with similar perspectives will have segmentations that
are close to each other, we compute the Jaccard similarity
between each pair of segmentations and perform spectral
clustering to separate the segmentations into clusters.
Figure 2 (bottom) illustrates how spectral clustering divides the
worker segmentations into clusters with meaningful semantic
associations, reflecting the diversity of perspectives for the
same task. Clustering results can be used as a preprocessing
step for any quality evaluation algorithm by keeping only
the segmentations that belong to the largest cluster, which is
typically free of semantic errors.</p>
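        <p>A minimal sketch of this preprocessing step, using scikit-learn's SpectralClustering on a precomputed Jaccard affinity matrix (the number of clusters is an assumed parameter, not prescribed above):</p>
        <preformat><![CDATA[
import numpy as np
from sklearn.cluster import SpectralClustering

def largest_perspective_cluster(worker_masks, n_clusters=2):
    """Cluster workers by pairwise Jaccard similarity of their masks
    and keep only the segmentations in the largest cluster."""
    masks = np.stack([np.asarray(m, dtype=bool) for m in worker_masks])
    flat = masks.reshape(len(worker_masks), -1).astype(np.int64)
    inter = flat @ flat.T                       # pairwise intersections
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    similarity = inter / np.maximum(union, 1)   # pairwise Jaccard matrix
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed",
                                random_state=0).fit_predict(similarity)
    keep = labels == np.bincount(labels).argmax()
    return [m for m, k in zip(worker_masks, keep) if k]
]]></preformat>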
        <p>In addition, clustering offers the additional benefit of
preserving a worker’s semantic intentions. For example, while
the green cluster in Figure 2 (bottom right) would be
considered bad segmentations for the particular task (‘computer’),
this cluster can provide more data for another segmentation
task corresponding to ‘monitor’. A potential future work
direction would be to crowdsource the semantic labels for the
computed clusters to enable the reuse of segmentations across
multiple objects to lower costs.
[Figure 2: colors denote clusters with different worker perspectives.]</p>
      </sec>
      <sec id="sec-3-5">
        <title>Dataset Description</title>
        <p>
          We collected crowdsourced segmentations from Amazon
Mechanical Turk; each HIT consisted of one segmentation task
for a specific pre-labeled object in an image. Workers were
compensated $0.05 per task. There were a total of 46 objects
in 9 images from the MS-COCO dataset
          <xref ref-type="bibr" rid="ref11">(Lin et al. 2014)</xref>
          segmented by 40 different workers each, resulting in a total
of 1840 segmentations. Each task contained a keyword for
the object and a pointer indicating the object to be segmented.
Two of the authors generated the ground truth segmentations
by carefully segmenting the objects using the same interface.
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>Evaluation Metrics</title>
        <p>Evaluation metrics used in our experiments measure how
well the final segmentation (S) produced by these algorithms
compares against the ground truth (GT). We use the Jaccard score
J = IA(S) / UA(S), which accounts for the intersection
area, IA = area(S ∩ GT), and the union area, UA = area(S ∪ GT),
between the worker and ground truth segmentations.</p>
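        <p>Equivalently, for binary masks (a small helper; the names are ours):</p>
        <preformat><![CDATA[
import numpy as np

def jaccard(seg, gt):
    """Jaccard score J = IA / UA between binary segmentation masks."""
    seg, gt = np.asarray(seg, dtype=bool), np.asarray(gt, dtype=bool)
    ia = np.logical_and(seg, gt).sum()   # intersection area
    ua = np.logical_or(seg, gt).sum()    # union area
    return ia / ua if ua else 1.0
]]></preformat>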
      </sec>
      <sec id="sec-3-7">
        <title>Experiment 1: Aggregation-based methods perform significantly better than retrieval-based methods</title>
        <p>In Figure 5, we vary the number of worker segmentations
along the x-axis and plot on the y-axis the Jaccard score,
averaged over worker samples of a given size, for each
algorithm. Figure 5 (left) shows that the
performance of aggregation-based algorithms (greedy, EM)
exceeds the best achievable through existing retrieval-based
methods (Retrieval). Then, in Figure 5 (right), we estimate
the upper-bound performance of each algorithm by
assuming that ‘full information’ based on ground truth is given
to the algorithm. For greedy, the algorithm is aware of all
the actual tile overlap and non-overlap areas against ground
truth. For EM, the true worker quality parameter values
(under our worker quality model) are known. For retrieval, the
full information version directly picks the worker with the
highest Jaccard similarity with respect to the ground truth.
By making use of ground truth information (Figure 5 right),
the best aggregation-based algorithm can achieve a
close-to-perfect average Jaccard score of 0.98 as an upper bound, far
exceeding the results achievable by any single ‘best’ worker
(J=0.91). This result demonstrates that aggregation-based
methods are able to achieve better performance by
performing inference at the tile granularity, which is guaranteed to
be finer grained than any individual worker segmentation.</p>
      </sec>
      <sec id="sec-3-8">
        <title>The performance of aggregation-based methods scales well as more worker segmentations are added.</title>
        <p>Intuitively, larger numbers of worker segmentations result in
finer granularity tiles for the aggregation-based methods. The
first row in Table 1 shows the average percentage change in
performance between 5-worker and 30-worker samples. We
observe that aggregation based methods typically improve in
performance with an increase in number of workers, while
this is not generally true for retrieval-based methods.</p>
      </sec>
      <sec id="sec-3-9">
        <title>Experiment 2: Clustering as preprocessing improves algorithmic performance.</title>
        <p>The second row in Table 1 shows the average percentage
Jaccard change when clustering preprocessing is used. While
clustering generally results in an accuracy increase, since the
‘full information’ variants are already free of semantic errors,
we do not see further improvement for these variants.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Average percentage Jaccard change from scaling workers from 5 to 30 (Worker Scaling) and from applying clustering preprocessing (Clustering Effect).</p>
          </caption>
          <table>
            <thead>
              <tr><th rowspan="2">Algorithm</th><th colspan="2">Retrieval-based</th><th colspan="4">Aggregation-based</th></tr>
              <tr><th>num pts</th><th>worker*</th><th>MV</th><th>EM</th><th>greedy</th><th>greedy*</th></tr>
            </thead>
            <tbody>
              <tr><td>Worker Scaling</td><td>-6.30</td><td>2.58</td><td>2.12</td><td>1.78</td><td>2.07</td><td>5.38</td></tr>
              <tr><td>Clustering Effect</td><td>5.92</td><td>-0.02</td><td>2.05</td><td>0.03</td><td>5.73</td><td>0.283</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>* denotes the ‘full information’ variants.</p>
          </table-wrap-foot>
        </table-wrap>
      </sec>
      <sec id="sec-3-11">
        <title>Conclusion</title>
        <p>
We identified three different types of errors for crowdsourced
image segmentation, developed a clustering-based method
to capture the semantic diversity caused by differing worker
perspectives, and introduced novel aggregation-based
methods that produce more accurate segmentations than existing
retrieval-based methods.</p>
        <p>Our preliminary studies show that our worker quality
models are good indicators of the actual accuracy of worker
segmentations. We also observe that the greedy algorithm is
capable of achieving close-to-perfect segmentation accuracy with
ground truth information. Given the success of
aggregation-based methods, including the simple majority vote algorithm,
we plan to use our worker quality insights to improve our
EM and greedy algorithms. We are also working on using
computer vision signals to further improve our algorithms.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bell et al. 2014]
          <string-name>
            <given-names>Sean</given-names>
            <surname>Bell</surname>
          </string-name>
          , Kavita Bala, and Noah Snavely.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Intrinsic images in the wild</article-title>
          .
          <source>ACM Trans. on Graphics (SIGGRAPH)</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Bell et al. 2015]
          <string-name>
            <given-names>Sean</given-names>
            <surname>Bell</surname>
          </string-name>
          , Paul Upchurch, Noah Snavely, and Kavita Bala.
          <article-title>Material recognition in the wild with the materials in context database</article-title>
          .
          <source>Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Cabezas et al. 2015]
          <string-name>
            <given-names>Ferran</given-names>
            <surname>Cabezas</surname>
          </string-name>
          , Axel Carlier, Vincent Charvillat, Amaia Salvador, and
          <article-title>Xavier Giro-I-Nieto</article-title>
          .
          <article-title>Quality control in crowdsourced object segmentation</article-title>
          .
          <source>Proceedings of International Conference on Image Processing</source>
          ,
          <string-name>
            <surname>ICIP</surname>
          </string-name>
          ,
          <fpage>2015</fpage>
          - Decem:
          <fpage>4243</fpage>
          -
          <lpage>4247</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Everingham et al. 2015
          <string-name>
            <surname>] M. Everingham</surname>
            ,
            <given-names>S. M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Eslami</surname>
            ,
            <given-names>L. Van</given-names>
          </string-name>
          <string-name>
            <surname>Gool</surname>
            ,
            <given-names>C. K. I.</given-names>
          </string-name>
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Winn</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>International Journal of Computer Vision</source>
          ,
          <volume>111</volume>
          (
          <issue>1</issue>
          ):
          <fpage>98</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>January 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>[Irshad and et</article-title>
          . al.
          <year>2014</year>
          ]
          <string-name>
            <given-names>H</given-names>
            <surname>Irshad</surname>
          </string-name>
          and
          <article-title>Montaser-Kouhsari et</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          al.
          <article-title>Crowdsourcing Image Annotation for Nucleus Detection and Segmentation in Computational Pathology: Evaluating Experts</article-title>
          ,
          <source>Automated Methods, and the Crowd. Biocomputing</source>
          <year>2015</year>
          , pages
          <fpage>294</fpage>
          -
          <lpage>305</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>[Lee</surname>
            et al. 2018]
            <given-names>Doris</given-names>
          </string-name>
          <string-name>
            <surname>Jung-Lin</surname>
            <given-names>Lee</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akash Das Sarma</surname>
            , and
            <given-names>Aditya</given-names>
          </string-name>
          <string-name>
            <surname>Parameswaran</surname>
          </string-name>
          .
          <article-title>Aggregating crowdsourced image segmentations</article-title>
          .
          <source>Technical report</source>
          , Stanford InfoLab (ilpubs.stanford.edu:
          <volume>8090</volume>
          /1161/),
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>[Lin</surname>
          </string-name>
          et al. 2012]
          <string-name>
            <surname>Christopher H Lin</surname>
          </string-name>
          , Mausam, and Daniel S Weld.
          <article-title>Crowdsourcing control : Moving beyond multiple choice</article-title>
          .
          <source>AAAI Conference on Human Computation and Crowdsourcing (HCOMP)</source>
          , pages
          <fpage>491</fpage>
          -
          <lpage>500</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>[Lin</surname>
            et al. 2014]
            <given-names>Tsung</given-names>
          </string-name>
          <string-name>
            <surname>Yi Lin</surname>
            ,
            <given-names>Michael</given-names>
          </string-name>
          <string-name>
            <surname>Maire</surname>
            , Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and
            <given-names>C. Lawrence</given-names>
          </string-name>
          <string-name>
            <surname>Zitnick. Microsoft</surname>
            <given-names>COCO</given-names>
          </string-name>
          :
          <article-title>Common objects in context</article-title>
          .
          <source>European Conference on Computer Vision (ECCV)</source>
          ,
          <source>8693 LNCS(PART 5)</source>
          :
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>[Natonek</source>
          <year>1998</year>
          ]
          <string-name>
            <given-names>E.</given-names>
            <surname>Natonek</surname>
          </string-name>
          .
          <article-title>Fast range image segmentation for servicing robots</article-title>
          .
          <source>In Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>No.98CH36146)</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>406</fpage>
          -
          <lpage>411</lpage>
          vol.
          <volume>1</volume>
          , May
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Russakovsky et al. 2015]
          <string-name>
            <given-names>Olga</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          ,
          <string-name>
            <surname>Li-Jia Li</surname>
          </string-name>
          , and
          <string-name>
            <surname>Li</surname>
          </string-name>
          Fei-Fei.
          <article-title>Best of Both Worlds: Human-Machine Collaboration for Object Annotation</article-title>
          . pages
          <fpage>2121</fpage>
          -
          <lpage>2131</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Sameki et al. 2015]
          <string-name>
            <given-names>Mehrnoosh</given-names>
            <surname>Sameki</surname>
          </string-name>
          , Danna Gurari, and
          <string-name>
            <given-names>Margrit</given-names>
            <surname>Betke</surname>
          </string-name>
          .
          <article-title>Characterizing Image Segmentation Behavior of the Crowd</article-title>
          . pages
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>[Sorokin and Forsyth</source>
          <year>2008</year>
          ]
          <article-title>Alexander Sorokin</article-title>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Forsyth</surname>
          </string-name>
          .
          <article-title>Utility data annotaton with Amazon Mechanical Turk</article-title>
          .
          <source>Proceedings of the 1st IEEE Workshop on Internet Vision at CVPR 08</source>
          , (c):
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Torralba et al.
          <year>2010</year>
          ] Antonio Torralba, Bryan C. Russell, and Jenny Yuen.
          <article-title>LabelMe: Online image annotation and applications</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>98</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1467</fpage>
          -
          <lpage>1484</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>[Vittayakorn and Hays</source>
          <year>2011</year>
          ]
          <article-title>Sirion Vittayakorn and James Hays. Quality Assessment for Crowdsourced Object Annotations</article-title>
          .
          <source>Procedings of the British Machine Vision Conference</source>
          , pages
          <fpage>109</fpage>
          .
          <fpage>1</fpage>
          -
          <lpage>109</lpage>
          .11,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Welinder et al. 2010]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Welinder</surname>
          </string-name>
          , Steve Branson, Serge Belongie, and
          <string-name>
            <given-names>Pietro</given-names>
            <surname>Perona</surname>
          </string-name>
          .
          <source>The Multidimensional Wisdom of Crowds. NIPS (Conference on Neural Information Processing Systems)</source>
          ,
          <volume>6</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Yamaguchi 2012]
          <string-name>
            <given-names>Kota</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          .
          <article-title>Parsing clothing in fashion photographs</article-title>
          .
          <source>Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12)</source>
          , pages
          <fpage>3570</fpage>
          -
          <lpage>3577</lpage>
          , Washington, DC, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>