LIA @ MediaEval 2013 Crowdsourcing Task: Metadata or not Metadata? That is a Fashion Question

Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès and Driss Matrouf
LIA - University of Avignon (France)
{firstname.lastname}@univ-avignon.fr

ABSTRACT
In this paper, we describe the LIA system proposed for the MediaEval 2013 Crowdsourcing for Social Multimedia task. The aim is to associate an accurate label with an image from multiple noisy labels collected on a crowdsourcing platform. Participants have to predict two types of binary labels for each image: the first indicates whether the image is truly fashion-related or not, while the second indicates whether the fashion tag assigned to the image matches its content. The proposed system combines noisy crowdsourcing labels, image metadata and external resources.

1. INTRODUCTION
Since the advent of Web 2.0 [6], Internet users actively participate in the construction and propagation of information. Many companies therefore rely on users to give their opinion on movies or music, to annotate specific data, and so on. In June 2006, Jeff Howe [1] coined the term crowdsourcing to describe this new behavior.

Crowdsourcing makes it possible to collect supervised labeled data rapidly and cheaply. Nonetheless, annotation quality is uneven, since it depends on the annotator, their involvement and their level of expertise. Because part of the collected data can be very noisy and inaccurate, solutions are needed to evaluate the relevance of the labeled data at minimum time cost.

For these reasons, crowdsourcing has become an increasingly popular and helpful research topic [3, 2]. In this paper, we describe the LIA system presented at the MediaEval 2013 Crowdsourcing task. The proposed system uses different parts of the image metadata: the annotator contribution (the self-reported confidence in the image category, and the label), the image context, its geographic coordinates, and text descriptors (title, tags). This image metadata is used in our system to decide whether an image is truly fashion-related or not (Label1), while the relevance of the fashion category (Label2) is decided using the annotator contribution only.

2. PROPOSED APPROACH
The proposed approach focuses on the textual part of the images: the image metadata, its geo-localization, and the metadata of images occurring in the same context. Five systems (i.e., runs) are submitted. Each run is divided into two subtasks: first, a score \tilde{x}_i^j (with i = 1, ..., 5) is evaluated for each label j (j = 1 for the label Yes and j = 0 for No); then, the label l_i with the highest score is selected as:

    l_i = \begin{cases} \text{Yes} & \text{if } \tilde{x}_i^0 \le \tilde{x}_i^1 \\ \text{No} & \text{otherwise} \end{cases}    (1)

• RUN 1: Crowdsourcing Annotation
  The crowdsourcing annotation contributed by the general crowd is used to decide whether an image is fashion-related or not, and whether it has been correctly tagged. The best label for both questions, given a set of workers W, is estimated with a Naive Bayes method (a toy sketch of this aggregation is given after the run descriptions). This Yes/No classification is expressed as:

    \tilde{x}_1^j = \operatorname*{argmax}_x P(X^j = x \mid Y_w^j) \propto P(Y_w^j \mid X^j) P(X^j) = \prod_{w \in W} P(Y_w^j \mid X^j) \, P(X^j)    (2)

  where we assume that the labels of the workers are conditionally independent. Y_w^j is a label given by worker w, X^j is a label variable, and x is the true label. Each image is then labeled with l_1 as explained in the previous section.

• RUN 2: Context
  Each image is potentially related to other images: images have been grouped into sets or pools according to whether they were annotated by the same person. The annotator labels of these images are used to estimate a context score \tilde{x}_2^j for fashion-relatedness:

    \tilde{x}_2^j = \tilde{x}_1^j \times \sum_{i=1}^{c} \sum_{k=0}^{d} P(X^j \mid d_{ik})    (3)

  where d_{ik} is the k-th image of context (set or pool) i, X^j is a label (Yes or No), and \tilde{x}_1^j is the crowdsourcing annotation score.

• RUN 3: Geo-Localization
  An image may be located close to one or several other images, so a geo-localization score \tilde{x}_3^j is defined. It is computed over the images at distance zero from the current image. Furthermore, only images with a label probability greater than 2/3 (threshold estimated on the development set) are considered close and relevant for the fashion-relatedness question:

    \tilde{x}_3^j = \tilde{x}_1^j \times \sum_{i=1}^{s} P(X^j \mid d_i)    (4)

  where d_i is an image from the set s of images at distance 0, and \tilde{x}_1^j is the crowdsourcing annotation score.

• RUN 4: External Resources
  Each image comes with metadata (title and tags; experiments on the development set showed that descriptions and personal notes do not improve the results) and a search rank on the hosting service. The first step is to collect a set of relevant fashion-related pictures: the Flickr API is used to compose a new set of 4,328 images returned for the query "fashion", and the metadata of these images are extracted. The second step computes a probability score for each word based on its frequency in this metadata. A metadata score \tilde{x}_4^j is then calculated for an image d using this probabilistic model and the rank r of the image, for question 1 (fashion-related or not):

    \tilde{x}_4^j = \tilde{x}_1^j + \frac{(-1)^{j+1}}{r} \sum_{i=1}^{|d|} P(w_i \mid d)    (5)

  where P(w_i | d) is the probability of word w_i from the image metadata of d given the model m of term frequencies. An image is labeled with l_4 using the annotator score \tilde{x}_1^j and the term frequency model.

• RUN 5: Combination
  A final run combining the four scores \tilde{x}_i^j described above is submitted. A label is assigned to a picture if it agrees with the different aspects of the image metadata:

    \tilde{x}_5^j = \tilde{x}_1^j \times x_2^j \times x_3^j \times x_4^j    (6)

  This score associates an image with the general label l_5. Note that x_i^j is the run score normalized by the crowdsourcing annotation score:

    x_i^j = \frac{\tilde{x}_i^j}{\tilde{x}_1^j} \quad \text{with } i \neq 1
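To make Equations (1), (2) and (6) concrete, here is a minimal sketch of the scoring pipeline on toy data. It is an illustration, not the authors' implementation: the uniform prior, the single worker-reliability value shared by all workers, and the placeholder run scores are all simplifying assumptions introduced here.

```python
from math import prod

# Toy labels given by four workers for one image and one question j
# (1 = Yes, 0 = No); worker_labels[w] plays the role of Y_w^j.
worker_labels = [1, 1, 0, 1]

# Simplifying assumptions (not from the paper): a uniform prior P(X^j)
# and one reliability value P(Y_w^j = X^j) shared by all workers.
PRIOR = {0: 0.5, 1: 0.5}
RELIABILITY = 0.7  # probability that a worker reports the true label


def run1_score(labels, x):
    """Eq. (2): unnormalized P(X^j = x | labels) under Naive Bayes,
    i.e. workers are conditionally independent given the true label."""
    likelihood = prod(
        RELIABILITY if y == x else 1.0 - RELIABILITY for y in labels
    )
    return likelihood * PRIOR[x]


def decide(score_no, score_yes):
    """Eq. (1): keep the label with the highest score (ties go to Yes)."""
    return "Yes" if score_no <= score_yes else "No"


# RUN 1 scores x~_1^j for j = 0 (No) and j = 1 (Yes).
x1 = {j: run1_score(worker_labels, j) for j in (0, 1)}
print(decide(x1[0], x1[1]))  # -> Yes

# Eq. (6): multiply the normalized run scores x_i^j = x~_i^j / x~_1^j
# (i != 1) with the crowdsourcing score. The context, geo-localization
# and metadata scores below are made-up placeholders.
x_runs = {2: {0: 0.2, 1: 0.5}, 3: {0: 0.3, 1: 0.6}, 4: {0: 0.25, 1: 0.55}}
x5 = {j: x1[j] * prod(x_runs[i][j] / x1[j] for i in (2, 3, 4))
      for j in (0, 1)}
print(decide(x5[0], x5[1]))
```

Because each normalized run score rescales the crowdsourcing score, a run that strongly favors the opposite label can overturn the RUN 1 decision in the combination, which is the intent of Eq. (6).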
3. EXPERIMENTS
The proposed system is evaluated on the MediaEval 2013 benchmark [4], using the fashion-focused social dataset [5]. Figure 1 presents three images extracted from the train dataset with different label configurations: (a) is not fashion-related (Culottes); (b) is not fashion-related but well categorized (Androgyny); and (c) is fashion-related and well categorized (Cowboy hat).

[Figure 1: Sample of pictures from the train dataset: panels (a), (b) and (c).]

Table 1 presents the results obtained for the fashion-relatedness classification of images (Label1) in terms of F-measure. The best results are obtained using the crowdsourcing annotation only. Although slight gains were observed during the development phase of our systems, the metadata information sources used in addition to the crowdsourcing annotation do not improve classification performance on the test set compared to the crowdsourcing annotation alone. This demonstrates that the crowdsourcing annotation is at least as reliable as the metadata provided with the images, or that these other information sources need to be handled better (other approaches or data selection) before classification gains can be expected. On the fashion tag classification (Label2), an F-measure of 0.7175 was obtained (no image metadata used).

Table 1: Classification results for fashion-relatedness classification (Label1).

  Run Id  Submission                F1 Label1
  1       Crowdsourcing Annotation  0.7239
  2       Context                   0.7171
  3       Geo-Localization          0.7236
  4       External Resources        0.7176
  5       Combination               0.7183
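For reference, the F-measure reported in Table 1 is the standard harmonic mean of precision and recall over the binary Yes/No decisions. The small check below uses hypothetical predictions, not task output:

```python
def f_measure(gold, predicted, positive="Yes"):
    """F1 = 2PR / (P + R) over binary Yes/No labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == p == positive for g, p in pairs)   # true positives
    fp = sum(p == positive and g != positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["Yes", "Yes", "No", "Yes", "No"]
pred = ["Yes", "No", "No", "Yes", "Yes"]
print(round(f_measure(gold, pred), 4))  # -> 0.6667
```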
4. CONCLUSIONS
In this paper, an automatic image classification system has been proposed. This system combines different aspects of the metadata content. The main observation is that the best results are obtained with the crowdsourcing annotation alone (Label1). No gains were observed using metadata on the test set, but new approaches and metadata selection should still be investigated to improve classification performance. Finally, the use of the image metadata will be explored for the fashion tag task (Label2).

5. ACKNOWLEDGMENTS
This work was funded by the SUMACC project supported by the French National Research Agency (ANR) under contract ANR-10-CORD-007.

6. REFERENCES
[1] J. Howe. The rise of crowdsourcing. Wired Magazine, 14(6):1-4, 2006.
[2] M. Lease and O. Alonso. Crowdsourcing for search evaluation and social-algorithmic search. In ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1180-1180. ACM, 2012.
[3] M. Lease, V. Carvalho, and E. Yilmaz, editors. Workshop on Crowdsourcing for Search and Data Mining (CSDM), February 2011.
[4] B. Loni, M. Larson, A. Bozzon, and L. Gottlieb. Crowdsourcing for Social Multimedia at MediaEval 2013: Challenges, data set, and evaluation. In MediaEval 2013 Workshop, Barcelona, Spain, 2013.
[5] B. Loni, M. Menendez, M. Georgescu, L. Galli, C. Massari, I. S. Altingovde, D. Martinenghi, M. Melenhorst, R. Vliegendhart, and M. Larson. Fashion-focused creative commons social dataset. In ACM Multimedia Systems Conference, pages 72-77, 2013.
[6] T. O'Reilly. What is Web 2.0, 2005.