LIA @ MediaEval 2013 Crowdsourcing Task: Metadata or not Metadata? That is a Fashion Question

Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès and Driss Matrouf
LIA - University of Avignon (France)
{firstname.lastname}@univ-avignon.fr

ABSTRACT
In this paper, we describe the LIA system proposed for the MediaEval 2013 Crowdsourcing for Social Multimedia task. The aim is to associate an accurate label with an image from multiple noisy labels collected on a crowdsourcing platform. Participants have to predict two types of binary labels for each image: the first indicates whether the image is truly fashion-related or not, while the second indicates whether the fashion tag assigned to the image matches its content. The proposed system combines noisy crowdsourcing labels, image metadata and external resources.

1. INTRODUCTION
Since the advent of Web 2.0 [6], Internet users actively participate in the construction and propagation of information. Many companies therefore rely on users to give their opinion on movies or music, to annotate specific data, and so on. In June 2006, Jeff Howe [1] coined the term crowdsourcing to describe this new behavior.

Crowdsourcing makes it possible to collect supervised labeled data rapidly and cheaply. Nonetheless, annotation quality is uneven, since it depends on the annotator, their involvement and their level of expertise. Because part of the collected data can be very noisy and inaccurate, solutions are needed to evaluate the relevance of the labeled data at minimum time cost.

For these reasons, crowdsourcing has become an increasingly popular and helpful research topic [3, 2]. In this paper, we describe the LIA system presented at the MediaEval 2013 Crowdsourcing task. The proposed system uses different parts of the image metadata: the annotator contribution (the self-reported confidence in the image category, and the label), the image context, its geographic coordinates, and text descriptors (title, tags). This image metadata is used in our system to decide whether an image is truly fashion-related or not (Label1), while the relevance of the fashion category (Label2) is decided using the annotator contribution only.

2. PROPOSED APPROACH
The proposed approach focuses on the textual part of the images: the image metadata, its geo-localization, and the metadata of images occurring in the same context. Five systems (i.e., runs) are submitted. Each run is divided into two subtasks: first, a score \tilde{x}_i^j (with i = 1, ..., 5) is evaluated for each label j (j = 1 for the label Yes and j = 0 for No); then, the label l_i with the highest score is selected as:

    l_i = \begin{cases} \text{Yes} & \text{if } \tilde{x}_i^0 \le \tilde{x}_i^1 \\ \text{No} & \text{otherwise} \end{cases}    (1)

• RUN 1: Crowdsourcing Annotation
  The crowdsourcing annotation contributed by the general crowd is used to decide whether an image is fashion-related or not, and whether it has been correctly tagged. The best label for both questions, given a set of workers W, is estimated with a Naive Bayes method (a toy sketch of this aggregation is given after the run descriptions). This Yes/No classification is expressed as:

    \tilde{x}_1^j = \operatorname*{argmax}_x P(X^j = x \mid Y_w^j) \propto P(Y_w^j \mid X^j) P(X^j) = \prod_{w \in W} P(Y_w^j \mid X^j) \, P(X^j)    (2)

  where we assume that the labels of the workers are conditionally independent. Y_w^j is a label given by worker w, X^j is a label variable, and x is the true label. Each image is then labeled with l_1 as explained in the previous section.

• RUN 2: Context
  Each image is potentially related to other images: images have been grouped into sets or pools according to whether they were annotated by the same person. The annotator labels of these images are used to estimate a context score \tilde{x}_2^j for fashion-relatedness:

    \tilde{x}_2^j = \tilde{x}_1^j \times \sum_{i=1}^{c} \sum_{k=0}^{d} P(X^j \mid d_{ik})    (3)

  where d_{ik} is the k-th image of context (set or pool) i, X^j is a label (Yes or No), and \tilde{x}_1^j is the crowdsourcing annotation score.

• RUN 3: Geo-Localization
  An image may be located close to one or several other images, so a geo-localization score \tilde{x}_3^j is defined. It is computed over the images at distance zero from the current image. Furthermore, only images with a label probability greater than 2/3 (threshold estimated on the development set) are considered close and relevant for the fashion-relatedness question:

    \tilde{x}_3^j = \tilde{x}_1^j \times \sum_{i=1}^{s} P(X^j \mid d_i)    (4)

  where d_i is an image from the set s of images at distance 0, and \tilde{x}_1^j is the crowdsourcing annotation score.

• RUN 4: External Resources
  Each image comes with metadata (title and tags; experiments on the development set showed that descriptions and personal notes do not improve the results) and a search rank on the hosting service. The first step is to collect a set of relevant fashion-related pictures: the Flickr API is used to compose a new set of 4,328 images returned for the query "fashion", and the metadata of these images are extracted. The second step computes a probability score for each word based on its frequency in this metadata. A metadata score \tilde{x}_4^j is then calculated for an image d using this probabilistic model and the rank r of the image, for question 1 (fashion-related or not):

    \tilde{x}_4^j = \tilde{x}_1^j + \frac{(-1)^{j+1}}{r} \sum_{i=1}^{|d|} P(w_i \mid d)    (5)

  where P(w_i | d) is the probability of word w_i from the image metadata of d given the model m of term frequencies. An image is labeled with l_4 using the annotator score \tilde{x}_1^j and the term frequency model.

• RUN 5: Combination
  A final run combining the four scores \tilde{x}_i^j described above is submitted. A label is assigned to a picture if it agrees with the different aspects of the image metadata:

    \tilde{x}_5^j = \tilde{x}_1^j \times x_2^j \times x_3^j \times x_4^j    (6)

  This score associates an image with the general label l_5. Note that x_i^j is the run score normalized by the crowdsourcing annotation score:

    x_i^j = \frac{\tilde{x}_i^j}{\tilde{x}_1^j} \quad \text{with } i \neq 1
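To make Equations (1), (2) and (6) concrete, here is a minimal sketch of the scoring pipeline on toy data. It is an illustration, not the authors' implementation: the uniform prior, the single worker-reliability value shared by all workers, and the placeholder run scores are all simplifying assumptions introduced here.

```python
from math import prod

# Toy labels given by four workers for one image and one question j
# (1 = Yes, 0 = No); worker_labels[w] plays the role of Y_w^j.
worker_labels = [1, 1, 0, 1]

# Simplifying assumptions (not from the paper): a uniform prior P(X^j)
# and one reliability value P(Y_w^j = X^j) shared by all workers.
PRIOR = {0: 0.5, 1: 0.5}
RELIABILITY = 0.7  # probability that a worker reports the true label


def run1_score(labels, x):
    """Eq. (2): unnormalized P(X^j = x | labels) under Naive Bayes,
    i.e. workers are conditionally independent given the true label."""
    likelihood = prod(
        RELIABILITY if y == x else 1.0 - RELIABILITY for y in labels
    )
    return likelihood * PRIOR[x]


def decide(score_no, score_yes):
    """Eq. (1): keep the label with the highest score (ties go to Yes)."""
    return "Yes" if score_no <= score_yes else "No"


# RUN 1 scores x~_1^j for j = 0 (No) and j = 1 (Yes).
x1 = {j: run1_score(worker_labels, j) for j in (0, 1)}
print(decide(x1[0], x1[1]))  # -> Yes

# Eq. (6): multiply the normalized run scores x_i^j = x~_i^j / x~_1^j
# (i != 1) with the crowdsourcing score. The context, geo-localization
# and metadata scores below are made-up placeholders.
x_runs = {2: {0: 0.2, 1: 0.5}, 3: {0: 0.3, 1: 0.6}, 4: {0: 0.25, 1: 0.55}}
x5 = {j: x1[j] * prod(x_runs[i][j] / x1[j] for i in (2, 3, 4))
      for j in (0, 1)}
print(decide(x5[0], x5[1]))
```

Because each normalized run score rescales the crowdsourcing score, a run that strongly favors the opposite label can overturn the RUN 1 decision in the combination, which is the intent of Eq. (6).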
3. EXPERIMENTS
The proposed system is evaluated on the MediaEval 2013 benchmark [4], using the fashion-focused social dataset [5]. Figure 1 presents three images extracted from the train dataset with different label configurations: (a) is not fashion-related (Culottes); (b) is not fashion-related but well categorized (Androgyny); and (c) is fashion-related and well categorized (Cowboy hat).

[Figure 1: Sample of pictures from the train dataset: panels (a), (b) and (c).]

Table 1 presents the results obtained for the fashion-relatedness classification of images (Label1) in terms of F-measure. The best results are obtained using the crowdsourcing annotation only. Although slight gains were observed during the development phase of our systems, the metadata information sources used in addition to the crowdsourcing annotation do not improve classification performance on the test set compared to the crowdsourcing annotation alone. This demonstrates that the crowdsourcing annotation is at least as reliable as the metadata provided with the images, or that these other information sources need to be handled better (other approaches or data selection) before classification gains can be expected. On the fashion tag classification (Label2), an F-measure of 0.7175 was obtained (no image metadata used).

Table 1: Classification results for fashion-relatedness classification (Label1).

  Run Id  Submission                F1 Label1
  1       Crowdsourcing Annotation  0.7239
  2       Context                   0.7171
  3       Geo-Localization          0.7236
  4       External Resources        0.7176
  5       Combination               0.7183
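For reference, the F-measure reported in Table 1 is the standard harmonic mean of precision and recall over the binary Yes/No decisions. The small check below uses hypothetical predictions, not task output:

```python
def f_measure(gold, predicted, positive="Yes"):
    """F1 = 2PR / (P + R) over binary Yes/No labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == p == positive for g, p in pairs)   # true positives
    fp = sum(p == positive and g != positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["Yes", "Yes", "No", "Yes", "No"]
pred = ["Yes", "No", "No", "Yes", "Yes"]
print(round(f_measure(gold, pred), 4))  # -> 0.6667
```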
4. CONCLUSIONS
In this paper, an automatic image classification system has been proposed. This system combines different aspects of the metadata content. The main observation is that the best results are obtained with the crowdsourcing annotation alone (Label1). No gains were observed using metadata on the test set, but new approaches and metadata selection should still be investigated to improve classification performance. Finally, the use of the image metadata will be explored for the fashion tag task (Label2).

5. ACKNOWLEDGMENTS
This work was funded by the SUMACC project supported by the French National Research Agency (ANR) under contract ANR-10-CORD-007.

6. REFERENCES
[1] J. Howe. The rise of crowdsourcing. Wired Magazine, 14(6):1-4, 2006.
[2] M. Lease and O. Alonso. Crowdsourcing for search evaluation and social-algorithmic search. In ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1180-1180. ACM, 2012.
[3] M. Lease, V. Carvalho, and E. Yilmaz, editors. Workshop on Crowdsourcing for Search and Data Mining (CSDM), February 2011.
[4] B. Loni, M. Larson, A. Bozzon, and L. Gottlieb. Crowdsourcing for Social Multimedia at MediaEval 2013: Challenges, data set, and evaluation. In MediaEval 2013 Workshop, Barcelona, Spain, 2013.
[5] B. Loni, M. Menendez, M. Georgescu, L. Galli, C. Massari, I. S. Altingovde, D. Martinenghi, M. Melenhorst, R. Vliegendhart, and M. Larson. Fashion-focused creative commons social dataset. In ACM Multimedia Systems Conference, pages 72-77, 2013.
[6] T. O'Reilly. What is Web 2.0, 2005.