                   RECOD @ Placing Task of MediaEval 2015

Lin Tzy Li1, Javier A. V. Muñoz1, Jurandy Almeida1,2, Rodrigo T. Calumby1,3, Otávio A. B. Penatti1,4,
Ícaro C. Dourado1, Keiller Nogueira6, Pedro R. Mendes Júnior1, Luís A. M. Pereira1,
Daniel C. G. Pedronette1,5, Jefersson A. dos Santos6, Marcos A. Gonçalves6, Ricardo da S. Torres1

1 RECOD Lab, Institute of Computing, University of Campinas (UNICAMP)
2 GIBIS Lab, Institute of Science and Technology, Federal University of São Paulo (UNIFESP)
3 University of Feira de Santana
4 SAMSUNG Research Institute Brazil
5 Universidade Estadual Paulista (UNESP)
6 Department of Computer Science, Universidade Federal de Minas Gerais (UFMG) – Brazil
     {lintzyli, pedro.mendes, luis.pereira, rtorres}@ic.unicamp.br, jurandy.almeida@unifesp.br, rtcalumby@ecomp.uefs.br, o.penatti@samsung.com,
        jalvarm.acm@gmail.com, icaro.dourado@students.ic.unicamp.br, daniel@rc.unesp.br, {keiller.nogueira, jefersson, mgoncalv}@dcc.ufmg.br


ABSTRACT
In this work, we describe the approach proposed by the RECOD team for the Placing Task, Locale-based sub-task, at MediaEval 2015. Our approach is based on the use of as much evidence as possible (textual, visual, and/or audio descriptors) to automatically assign geographical locations to images and videos.

1. INTRODUCTION
Geocoding multimedia has gained attention in recent years, given its importance for providing richer services to users, such as placing information on maps and supporting geographic searches. Since 2011, the Placing Task [2] at MediaEval has challenged participants to automatically assign geographical locations to images and videos.

Here we present our approach for the Locale-based sub-task. It combines textual, audio, and/or visual descriptors by applying rank aggregation and ranked-list density analysis to combine the multimodal information encoded in ranked lists. This year, we evaluated new features and a Genetic Programming (GP) [5] approach to multimodal geocoding. GP provides a good framework for modeling optimization problems even when the variables are functions.

Besides combining ranked lists, we also applied combinations of rank aggregation methods by using GP. The idea is to automatically select a set of suitable features and rank aggregation functions that yield the best result according to a given fitness function. Previous works [7, 16] have shown that combining rank-aggregated lists and rank aggregation functions [15] yields very effective results.

2. PROPOSED APPROACH
Our approach estimates location based on the rank aggregation of a multitude of ranked lists and on their density analysis [7]. We extracted a large set of features from the data, derived ranked lists, and combined them using rank aggregation methods, which in turn are selected and fused by the GP-based framework proposed in [15].

For evaluation purposes in the training phase (as in 2014 [7]), we split the whole training set into two parts: (i) a validation set; and (ii) a sub-training set. The validation set has 4,677 images and 905 videos, while the sub-training set has 12,944 videos and 4,226,559 images.

2.1 Features
Textual. The title, description, and tags of photos/videos were concatenated into a single field (here named fusion). The versions that use only the title, description, or tags were also used for the rank aggregation method. The text was stemmed and stopwords were removed. We used BM25, TF-IDF (cosine), information-based similarity (IBSimilarity), and language-modeling similarity (LMDirichletSimilarity), which are similarity measures implemented in the Lucene package [8].

Audio/Visual. For visual place recognition of images, we used the provided features: edgehistogram (EHD), scalablecolor (SCD), and tamura. We also extracted BIC [13]. Due to time constraints and feature dimensionality, other planned visual features could not be applied this year. For videos, we used the provided features (all from LIRE [9] and MFCC20 [3]), besides extracting histograms of motion patterns (HMP) [1].

2.2 Rank aggregation, density analysis & geocoding
We used the full training set as geo-profiles, and each test item was compared to the whole training set for each feature independently. For a given test item, a ranked list was thus generated for each feature. Given the ranked lists, we explored two distinct strategies: (i) a rank aggregation approach based on genetic programming (GP-Agg); and (ii) a ranked-list density analysis (RLDA). In addition, we also explored the combination of both strategies.

1. The GP-Agg method uses genetic programming to combine a set of rank aggregation methods in an agglomerative way, in order to improve on the results of the isolated methods [15]. We used this method to combine the textual and visual ranked lists generated for the various descriptors.

This method was chosen because the authors of [15] showed that GP-Agg produced results better than or equal to those of the best supervised technique among a wide range of rank aggregation techniques (supervised and unsupervised). Moreover, it required a reasonable time for training (a couple of hours), and it was relatively fast to apply the best individual (discovered function) to the test set.

The GP-Agg method was trained using 400 queries from the validation set (randomly chosen) and their ranked lists.

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany
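For illustration, the per-feature ranked lists described in Section 2.2 can be fused by classical combination functions such as CombSUM, one of the building blocks GP-Agg selects from (Table 1). The sketch below is not the code used in the submissions; the toy data and the min-max score normalization are assumptions made only for this example.

```python
def combsum(ranked_lists):
    """CombSUM: sum each item's min-max-normalized similarity scores
    across the per-feature ranked lists; items absent from a list
    simply contribute nothing for that feature."""
    fused = {}
    for rl in ranked_lists:
        scores = dict(rl)  # item -> raw similarity score for one feature
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # guard against constant score lists
        for item, s in scores.items():
            fused[item] = fused.get(item, 0.0) + (s - lo) / span
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)

# Toy per-feature ranked lists (item, similarity) for one test photo
textual = [("img7", 12.3), ("img2", 9.1), ("img5", 3.0)]
visual = [("img2", 0.95), ("img7", 0.90), ("img9", 0.10)]
print(combsum([textual, visual]))  # img7 and img2 rise to the top
```

GP-Agg's role is precisely to decide, via evolution, which of these combination functions to apply to which lists and how to compose them.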
We stopped the evolution process at the 30th generation. We used the fitness function, genetic operators, and rank aggregation techniques that yielded the best results in [15]. The GP-Agg parameters are shown in Table 1.

For the training phase of GP-Agg, an element of a ranked list was considered relevant if it was located no farther than 1 km from the ground-truth location of the query element. The best individuals discovered in the training phase were applied to the test set.

Table 1: GP-Agg parameters [15].
Parameter             | Value
Number of generations | 30
Genetic operators     | Reproduction, Mutation, Crossover
Fitness functions     | FFP1, WAS*, MAP, NDCG
Rank Agg. methods     | CombMAX, CombMIN, CombSUM, CombMED, CombANZ, CombMNZ, RLSim, BordaCount, RRF, MRA
* WAS (Weighted Average Score) as defined in [6].

Among the different fitness functions tested, the best (most precise) results were achieved with FFP1, as defined in [4]:

    F_{FFP1} = \sum_{i=1}^{|N|} r(\hat{l}_i) \times k_1 \times \ln^{-1}(i + k_2)    (1)

where i is the element position after retrieval and \hat{l}_i is the element at position i; r(\hat{l}_i) \in \{0, 1\} is the relevance score assigned to an element, being 1 if the element is relevant and 0 otherwise; |N| is the total number of retrieved elements; and k_1, k_2 are scaling factors. Based on [15], we chose k_1 = 6 and k_2 = 1.2 in our experiments.

2. The ranked-list density analysis (RLDA)¹ explores the idea of finding the maximum point of a probability density function (PDF). First, we induce a k-nearest-neighbor graph (with k = 3), whose nodes are the top-n items of the ranked lists. For each node, we estimate its probability density value using a Parzen-window Gaussian kernel. This procedure is the same one used to find root nodes (nodes with maximum density in a PDF) in the Optimum-Path Forest (OPF) clustering algorithm [10]. Finally, to assign a lat/long to a test item, we simply take the lat/long of the graph node (a ranked-list item) with the highest density value.

3. OUR SUBMISSIONS & RESULTS
None of our submissions used extra crawled material or gazetteers. Based on the parameters of our best results in the evaluation phase, our submissions were configured as shown in Table 2. Runs 1 and 4 were based solely on textual descriptors, while Run 2 was visual-only and Run 3 was a multimodal submission.

Table 2: Run configurations.
Run | Images: textual           | Images: visual           | Images: geocoding           | Videos: textual           | Videos: visual/audio     | Videos: geocoding
1   | BM25 + TF-IDF + IBS + LMD | -                        | GP-Agg                      | BM25 + TF-IDF + IBS + LMD | -                        | GP-Agg
2   | -                         | BIC                      | RLDA (top 100)              | -                         | HMP + all LIRE           | GP-Agg + RLDA (top 100)
3   | BM25 + TF-IDF + IBS + LMD | BIC + EHD + SCD + tamura | GP-Agg                      | BM25 + TF-IDF + IBS + LMD | HMP + all LIRE + MFCC20  | GP-Agg
4   | TF-IDF + BM25 + IBS + LMD | -                        | RLDA (top 5 from each list) | BM25                      | -                        | RLDA (top 5)

From Table 3, our best submission was Run 4, in which we applied RLDA over the top-5 items of each textual ranked list. We had observed on the validation set that applying RLDA to the top-n items of the aggregated (visual and textual) ranked list seems to improve the results over simply taking the first item of a multimodal aggregated ranked list. However, due to the delay in generating the ranked lists for the visual features, we did not apply RLDA to the top-n items of the aggregated lists in Run 3 as initially planned. The second-best submission was Run 1. Runs 2 and 3 showed that there is room for improvement in our GP-based method.

Table 3: Overall test results: % correct at each precision level.
Km     | Run 1     | Run 2     | Run 3     | Run 4
0.001  | 0.15      | 0         | 0.14      | 0.12
0.01   | 0.54      | 0.01      | 0.53      | 0.62
0.1    | 5.49      | 0.09      | 5.35      | 6.44
1      | 19.75     | 0.44      | 19.11     | 21.74
10     | 36.60     | 1.99      | 35.31     | 38.38
100    | 44.89     | 3.57      | 43.26     | 46.91
1000   | 58.97     | 20.38     | 57.67     | 63.22
10000  | 91.36     | 88.13     | 91.53     | 94.01
Distance in each quartile (km)
Q1     | 1.8967    | 1239.6011 | 2.1244    | 1.4824
Q2     | 309.8649  | 5882.7206 | 394.889   | 196.0081
Q3     | 5573.9304 | 8636.9465 | 5766.8941 | 3798.6223

In most of the runs, we handled images and videos with different settings. For instance, in Run 2 we applied RLDA to the top-100 of the BIC ranked list for images, while for videos we combined the other descriptors using GP-Agg followed by RLDA on the GP-aggregated list. Thus, in Table 4, we show only the results for the videos in the test set. In the validation phase, the geocoding results for videos were relatively better than those for images, but this tendency was not preserved on the test set. For example, in Run 4 (Table 4), the rate of correctly geocoded videos at each precision level is lower than the overall results for Run 4 (Table 3).

Table 4: Video test results (% correct).
Km    | 0.001 | 0.01 | 0.1  | 1     | 10    | 100   | 1000  | 10000
Run 1 | 0.08  | 0.40 | 5.46 | 17.62 | 32.44 | 40.27 | 54.13 | 90.57
Run 2 | 0.00  | 0.00 | 0.01 | 0.02  | 0.11  | 3.80  | 20.39 | 91.97
Run 3 | 0.08  | 0.37 | 5.13 | 16.74 | 32.10 | 39.69 | 53.67 | 91.50
Run 4 | 0.06  | 0.41 | 5.79 | 17.89 | 32.24 | 39.92 | 55.68 | 93.16

4. FUTURE WORK
We plan to evaluate more textual and visual descriptors and give them as input to GP-Agg to select descriptors and rank aggregation methods, for example: (a) a textual descriptor that combines a graph representation [11] with a framework for graph-to-vector synthesis [12]; and (b) applying results from works that tackle the problem of visual place recognition [14]. Additionally, since RLDA improves geocoding results most of the time, we plan to devise a GP fitness function that takes advantage of RLDA for geocoding, besides exploring clustering analysis of the top-n items.
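With these choices of k_1 and k_2, Eq. (1) can be checked with a short sketch; the binary relevance vector below is toy data for illustration only, not part of the actual GP framework.

```python
import math

def ffp1(relevances, k1=6.0, k2=1.2):
    """FFP1 fitness [4]: sum of r(l_i) * k1 / ln(i + k2) over ranked
    positions i = 1..|N|; relevant items near the top of the list
    contribute more, so precise rankings score higher."""
    return sum(r * k1 / math.log(i + k2)
               for i, r in enumerate(relevances, start=1))

# A ranking that places its relevant items (within 1 km of the ground
# truth) earlier receives a strictly higher fitness value
assert ffp1([1, 1, 0]) > ffp1([0, 1, 1])
```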
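A compact sketch of this RLDA step, under simplifying assumptions: candidates are plain (lat, lon) pairs, distances are Euclidean rather than geodesic, and the kernel bandwidth is an illustrative constant. It is not the submitted implementation, which builds on the OPF machinery of [10].

```python
import math

def rlda(candidates, k=3, bandwidth=1.0):
    """Ranked-list density analysis: for each top-n candidate, estimate a
    Parzen-window (Gaussian) density over its k nearest neighbours and
    return the lat/long of the candidate with the highest density."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    best, best_density = None, -1.0
    for c in candidates:
        # distances to the k nearest other candidates (the kNN graph edges)
        neighbours = sorted(dist(c, o) for o in candidates if o is not c)[:k]
        density = sum(math.exp(-(d * d) / (2 * bandwidth ** 2))
                      for d in neighbours)
        if density > best_density:
            best, best_density = c, density
    return best

# Three candidates cluster near (10, 10); the isolated outlier gets a
# near-zero density, so the cluster centre is chosen as the location
top_items = [(10.0, 10.0), (10.1, 9.9), (9.9, 10.1), (45.0, 3.0)]
print(rlda(top_items))  # -> (10.0, 10.0)
```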
¹ Last year [7], we called RLDA "OPF"; however, this year we renamed it, since we only use the OPF step that finds the densest point.

Acknowledgments
We thank FAPESP, CNPq, CAPES, and Samsung.
5. REFERENCES
[1] J. Almeida, N. J. Leite, and R. da S. Torres. Comparison of video sequences with histograms of motion patterns. In ICIP, pages 3673–3676, 2011.
[2] J. Choi, C. Hauff, O. V. Laere, and B. Thomee. The Placing Task at MediaEval 2015. In Working Notes Proc. MediaEval Workshop, Wurzen, Germany, Sept. 2015.
[3] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4):357–366, Aug. 1980.
[4] W. Fan, E. A. Fox, P. Pathak, and H. Wu. The effects of fitness functions on genetic programming-based ranking discovery for web search. Journal of the American Society for Information Science and Technology, 55(7):628–636, 2004.
[5] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
[6] L. T. Li, D. C. G. Pedronette, J. Almeida, O. A. B. Penatti, R. T. Calumby, and R. da S. Torres. A rank aggregation framework for video multimodal geocoding. Multimedia Tools and Applications, pages 1–37, 2013. http://dx.doi.org/10.1007/s11042-013-1588-4.
[7] L. T. Li, O. A. B. Penatti, J. Almeida, G. Chiachia, R. T. Calumby, P. R. Mendes Júnior, D. C. G. Pedronette, and R. da S. Torres. Multimedia geocoding: The RECOD 2014 approach. In Working Notes Proc. MediaEval Workshop, volume 1263, page 2, 2014.
[8] Apache Lucene Core. http://lucene.apache.org/core/. As of Sept. 2015.
[9] M. Lux and S. A. Chatzichristofis. LIRE: Lucene Image Retrieval: An extensible Java CBIR library. In Proceedings of the 16th ACM International Conference on Multimedia, MM '08, pages 1085–1088, New York, NY, USA, 2008.
[10] L. M. Rocha, F. A. M. Cappabianco, and A. X. Falcão. Data clustering as an optimum-path forest problem with applications in image analysis. International Journal of Imaging Systems and Technology, 19(2):50–68, 2009.
[11] A. Schenker, H. Bunke, M. Last, and A. Kandel. Graph-Theoretic Techniques for Web Content Mining. World Scientific Publishing Co., Inc., NJ, USA, 2005.
[12] F. B. Silva, S. Tabbone, and R. da S. Torres. BoG: A new approach for graph matching. In ICPR, pages 82–87. IEEE, Aug. 2014.
[13] R. d. O. Stehling, M. Nascimento, and A. Falcão. A compact and efficient image retrieval approach based on border/interior pixel classification. In Proceedings of the 11th International Conference on Information and Knowledge Management, CIKM '02, pages 102–109, 2002.
[14] A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla. 24/7 place recognition by view synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR '15, pages 1808–1817, 2015.
[15] J. A. Vargas, R. da S. Torres, and M. A. Gonçalves. A soft computing approach for learning to aggregate rankings. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM '15, 2015. (in press).
[16] M. N. Volkovs and R. S. Zemel. CRF framework for supervised preference aggregation. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM '13, pages 89–98, New York, NY, USA, 2013.