<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recod @ MediaEval 2014: Diverse Social Images Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rodrigo T. Calumby</string-name>
          <email>rtcalumby@ecomp.uefs.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vinícius P. Santana</string-name>
          <email>vpsantana@ecomp.uefs.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felipe S. Cordeiro</string-name>
          <email>fscordeiro@ecomp.uefs.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Otávio A. B. Penatti</string-name>
          <email>o.penatti@samsung.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lin T. Li</string-name>
          <email>lintzyli@ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovani Chiachia</string-name>
          <email>chiachia@ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo da S. Torres</string-name>
          <email>rtorres@ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Exact Sciences, University of Feira de Santana (UEFS)</institution>
          ,
          <addr-line>Feira de Santana, BA</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>RECOD Lab, Institute of Computing, University of Campinas (UNICAMP)</institution>
          ,
          <addr-line>Campinas, SP</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SAMSUNG Research Institute Brazil</institution>
          ,
          <addr-line>Campinas, SP</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper presents the results of the first participation of our multi-institutional team in the Retrieving Diverse Social Images Task at MediaEval 2014. In this task we were required to develop a summarization and diversification approach for social photo retrieval. Our approach is based on irrelevant-image filtering, image re-ranking, and diversity promotion by clustering. We have used visual and textual features, including image metadata and user credibility information.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Promoting diversity is an effective approach for improving
retrieval results and the user search experience. For instance, it
has been applied to tackle ambiguous or underspecified
queries and to produce summaries. The Retrieving Diverse
Social Images Task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] combines such problems into a
challenge on visual summarization for social photo retrieval in a
tourism-related context. This paper presents our first efforts
on relevance improvement and diversity promotion using
image visual features, metadata, and user credibility
information.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. PROPOSED APPROACH</title>
      <p>The proposed approach follows the general pipeline presented in Figure 1. First, two filtering steps are conducted to reduce the number of irrelevant images. Afterwards, re-ranking steps are applied to improve image rank positions according to two different relevance aspects. Finally, clustering is performed, followed by the selection of representative and diverse images. Specific combinations of the proposed steps were set for each submitted run (Section 3).</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Filtering</title>
      <p>To reduce the number of non-relevant images, we adapted two filtering strategies: geographic filtering and face filtering. Eliminating non-relevant images allows higher effectiveness in terms of final relevance and boosts the diversification procedure, since fewer non-relevant items remain as candidates for the final diversified list.</p>
      <p>The geographic filtering (GeoFilter) takes the reference lat/long of each location and eliminates all images located farther than a given range; only geo-tagged images were assessed. According to the results on the development set, a 10 km range limit from the reference point was a good choice.</p>
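      <p>The GeoFilter step can be sketched as follows. This is a minimal illustration, not the code used for the runs: the great-circle (haversine) distance, the dict keys ('lat'/'lon'), and the choice to keep non-geo-tagged images (which were not assessed) are assumptions.</p>

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def geo_filter(images, ref_lat, ref_lon, max_km=10.0):
    """Keep geo-tagged images within max_km of the reference point;
    images without a geo tag are not assessed and pass through."""
    kept = []
    for img in images:
        if img.get("lat") is None or img.get("lon") is None:
            kept.append(img)  # no geo tag: not assessed by the filter
        elif haversine_km(img["lat"], img["lon"], ref_lat, ref_lon) <= max_km:
            kept.append(img)
    return kept
```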
      <p>Since images containing a person or crowd in the foreground are considered non-relevant, we used the face detection module of Face++ (http://www.faceplusplus.com, last accessed on Sept 20, 2014) for filtering. For all images, we computed the following features: a) number of faces; b) biggest face size; c) smallest face size; d) average face size; e) total face size. The size values were computed as a fraction of the image spatial domain.</p>
      <p>Our first face-based filtering approach (NumFacesFilter) eliminates all images whose number of faces exceeds a threshold. According to the experiments on the development set, we eliminated all images with more than one face. The second approach (FaceClassifierFilter) used a 1-NN classifier based on the described features, with all development images as training instances. All images classified as non-relevant were eliminated.</p>
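      <p>Both face-based filters can be sketched as below, assuming each image carries the five face features computed by the detector; the key names, Euclidean distance, and data layout are illustrative, not the task code.</p>

```python
def num_faces_filter(images, max_faces=1):
    """Drop images whose detected face count exceeds the threshold
    (more than one face, per the development-set experiments)."""
    return [img for img in images if img.get("num_faces", 0) <= max_faces]

def nn1_classify(face_features, training):
    """1-NN relevance decision over the five face features;
    `training` holds (feature_vector, is_relevant) pairs taken
    from the development set."""
    def euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    _, is_relevant = min(training,
                         key=lambda pair: euclidean(pair[0], face_features))
    return is_relevant

def face_classifier_filter(images, training):
    """Keep only images the 1-NN classifier labels as relevant."""
    return [img for img in images
            if nn1_classify(img["face_features"], training)]
```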
    </sec>
    <sec id="sec-4">
      <title>2.2 Features</title>
      <p>
        For the textual and multimodal approaches, we evaluated
the TF-IDF, BM25, and Cosine measures, all computed using
the provided TF, DF, and TF-IDF values. On the
development set, the best results were achieved with the
Cosine measure. To enable the combination with other distance
measures, the Cosine similarity values were converted into
distances by subtracting them from 1.0. For the visual approaches,
besides the provided features, we also extracted two global
descriptors (BIC and LAS) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and two bag-of-visual-words (BoVW)
descriptors, based on dense (6 pixels) or sparse (Harris-Laplace
detector) SIFT, with 1000 visual words (randomly selected),
soft assignment (σ = 150), and max pooling.
      </p>
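      <p>The Cosine-to-distance conversion can be illustrated as follows; the sparse term-to-weight dict representation is an assumption, while the subtraction from 1.0 follows the text.</p>

```python
import math

def cosine_distance(tfidf_a, tfidf_b):
    """Cosine similarity over sparse TF-IDF vectors (term -> weight),
    converted to a distance by subtracting it from 1.0 so it can be
    combined with other distance measures."""
    dot = sum(w * tfidf_b.get(t, 0.0) for t, w in tfidf_a.items())
    na = math.sqrt(sum(w * w for w in tfidf_a.values()))
    nb = math.sqrt(sum(w * w for w in tfidf_b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0  # empty vector: treated as maximally distant
    return 1.0 - dot / (na * nb)
```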
    </sec>
    <sec id="sec-3b">
      <title>2.3 Re-ranking</title>
      <p>Since the original lists may contain redundant and non-relevant items, their positions may not be optimal with respect to relevance. Even after the filtering procedures, some non-relevant images may remain; we therefore proposed two re-ranking strategies: visual-based and user credibility-based.</p>
      <p>The visual re-ranking used each location's representative images, obtained from Wikipedia, as queries. The original lists were re-ranked according to their similarity to the representative sets. The visual distance from each image in a list to the corresponding representative set was computed as the minimum distance between the image and each representative image. For multiple-feature fusion, we used a smoothed version of the Borda Count algorithm, in which the vote (relevance score) for the nth image in the rank was computed as 1/√(4n+1).</p>
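      <p>A sketch of the smoothed Borda Count fusion, under the assumptions that n is the 1-based rank position and that votes from the per-feature rankings are summed per image:</p>

```python
import math

def borda_vote(n):
    """Smoothed Borda vote for the image at 1-based rank position n."""
    return 1.0 / math.sqrt(4 * n + 1)

def fuse_rankings(rankings):
    """Fuse several per-feature rankings (each a list of image ids,
    best first) by summing each image's smoothed votes; returns the
    ids sorted by total vote, best first."""
    scores = {}
    for ranking in rankings:
        for pos, img_id in enumerate(ranking, start=1):
            scores[img_id] = scores.get(img_id, 0.0) + borda_vote(pos)
    return sorted(scores, key=scores.get, reverse=True)
```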
      <p>As a different re-ranking strategy, we also exploited the user-credibility descriptors provided with the data. We combined a relevance-based score (relScore) with a score based on credibility (credScore). The relScore of each image was computed from its position in the list, as described for the visual re-ranking. The credScore was computed as the product of three credibility features: visualScore, faceProportion, and tagSpecificity. The final re-ranking score was computed as relScore × credScore.</p>
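      <p>The credibility re-ranking can be sketched as follows. The dict keys mirror the credibility descriptor names from the text; reusing the smoothed Borda vote as relScore and combining the two scores by multiplication are assumptions about details the text leaves implicit.</p>

```python
def credibility_rerank(ranked_images):
    """Re-rank by relScore * credScore. relScore of the image at
    1-based position n reuses the smoothed Borda vote 1/sqrt(4n + 1);
    credScore is the product of the three credibility features."""
    scored = []
    for pos, img in enumerate(ranked_images, start=1):
        rel_score = 1.0 / (4 * pos + 1) ** 0.5
        cred_score = (img["visualScore"] * img["faceProportion"]
                      * img["tagSpecificity"])
        scored.append((rel_score * cred_score, pos, img))
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties keep original order
    return [img for _, _, img in scored]
```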
    </sec>
    <sec id="sec-5">
      <title>2.4 Diversification Method</title>
      <p>
        After the filtering and re-ranking procedures, the next
step consists of the actual summarization and diversification.
We evaluated two diversification methods: MMR [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and a clustering technique based on kMedoids. Given the
superiority of kMedoids over MMR on the development set,
we used the kMedoids clustering for the test set runs.
      </p>
      <p>The kMedoids clustering technique is divided into two main steps: medoid definition and cluster construction. Since we were supposed to return 50 representative images, the algorithm was set to create 50 clusters. The initial centroids were defined in an offset fashion: the offset value was computed by dividing the list size by 50, and the centroids were defined as the images at positions i × offset, with 0 ≤ i &lt; 50. Hence, the initial medoids were picked throughout the list from top to bottom. After the clusters are constructed, the process iterates until there is no further transition between clusters. At each iteration, the new medoids are defined as the best-connected images (smallest average distance to all images in the cluster). The distance between two images is computed as the average of their distances for each feature. Finally, the images in each cluster are ranked according to their positions in the original non-clustered list, and the final output list is composed of the most relevant item from each cluster.</p>
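      <p>The clustering procedure above can be sketched as follows. The offset initialisation, convergence test, and per-cluster representative selection follow the description; the function signature, the iteration cap, and the 1-D usage in which items are identified by their rank positions are illustrative assumptions.</p>

```python
def kmedoids(dist, n_items, k=50, max_iter=100):
    """Offset-initialised kMedoids over a ranked list of items.

    `dist(i, j)` is the averaged multi-feature distance between the
    items at (0-based) rank positions i and j. Initial medoids sit at
    positions i * offset, offset = list size // k; iteration stops
    when no item changes cluster. One representative per non-empty
    cluster is returned: its best-ranked (lowest-position) member."""
    offset = n_items // k
    medoids = [i * offset for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # assign every item to its closest current medoid
        new_assignment = [min(range(k), key=lambda c: dist(i, medoids[c]))
                          for i in range(n_items)]
        if new_assignment == assignment:
            break  # no transition between clusters: converged
        assignment = new_assignment
        # new medoid: best-connected member (lowest average distance)
        for c in range(k):
            members = [i for i in range(n_items) if assignment[i] == c]
            if members:
                medoids[c] = min(members, key=lambda m: sum(
                    dist(m, j) for j in members) / len(members))
    # most relevant (best original position) item of each cluster
    return sorted(min(i for i in range(n_items) if assignment[i] == c)
                  for c in range(k)
                  if any(assignment[i] == c for i in range(n_items)))
```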
    </sec>
    <sec id="sec-6">
      <title>3. RUN SETUP</title>
      <p>We submitted five runs; their descriptions are presented in Table 1. The features used in each run and each step were selected according to the best results on the development set.</p>
    </sec>
    <sec id="sec-7">
      <title>4. RESULTS AND DISCUSSION</title>
      <p>Table 2 presents the official evaluation measures for the five runs. The best results (for all measures) were achieved when the proposed full pipeline was applied (Runs 4 and 5): GeoFilter plus face-based filtering, visual re-ranking (CM3x3 + HOG + BIC) combined with credibility re-ranking, and kMedoids clustering (CN3x3). Run 2 (purely textual) slightly outperformed Run 1 (purely visual) in terms of diversity. The multimodal combination (Run 3) slightly outperformed Runs 1 and 2 on CR@20 and F1@20. However, when the credibility re-ranking was applied (Run 4), the best results were achieved by the visual approach, with a reasonable improvement on all effectiveness measures. Notice that when the face-based filtering used the classifier (Run 5), the results were lower than with the face-number threshold (Run 4), but still superior to Runs 1 to 3 on F1@20.</p>
    </sec>
    <sec id="sec-8">
      <title>5. CONCLUSIONS</title>
      <p>We proposed a multimodal approach that uses filtering and re-ranking in conjunction with a clustering technique for diversification. Our best results were achieved with image re-ranking that combines the relevance score with user credibility information. As future work, we intend to evaluate the use of additional information in the re-ranking and diversification steps, as well as more elaborate fusion approaches.</p>
    </sec>
    <sec id="sec-9">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>We thank the support of UEFS/PROBIC, Samsung Research Institute Brazil, and FAPESP (2013/11359-0).</p>
      <p>[Table 1 residue: Runs 1–3 used GeoFilter and NumFacesFilter for filtering; visual re-ranking (CM3x3 + HOG + BIC); kMedoids diversification with BoVW (sparse, max pooling) + HOG + Cosine, or CN3x3.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          .
          <article-title>The use of MMR, diversity-based reranking for reordering documents and producing summaries</article-title>
          . In
          <source>SIGIR</source>
          , pages
          <fpage>335</fpage>
          –
          <lpage>336</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gînscă</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation</article-title>
          . In
          <source>MediaEval 2014 Workshop</source>
          , Barcelona,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O. A. B.</given-names>
            <surname>Penatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Valle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. da S.</given-names>
            <surname>Torres</surname>
          </string-name>
          .
          <article-title>Comparative study of global color and texture descriptors for web image retrieval</article-title>
          .
          <source>J. Vis. Commun. Image Repr.</source>
          ,
          <volume>23</volume>
          (
          <issue>2</issue>
          ):
          <fpage>359</fpage>
          –
          <lpage>380</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>