<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Retrieval of Diverse Images by Pre-filtering and Hierarchical Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D.-T. Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>L. Piras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>G. Giacinto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>G. Boato</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. G. B. De Natale</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIEE - University of Cagliari</institution>
          ,
          <addr-line>Piazza D'armi, 09123 Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DISI - University of Trento</institution>
          ,
          <addr-line>Via Sommarive 9, I-38123 Povo, Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>In this paper, we describe our approach and its results for the MediaEval 2014 Retrieving Diverse Social Images task. The basic idea of the proposed method is to filter out non-relevant images at the beginning of the process and then construct a hierarchical tree, which allows the images to be clustered with different criteria on visual and textual features. Experimental results show that the method is stable, with little fluctuation as the number of topics varies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        In the MediaEval 2014 Retrieving Diverse Social Images task
        [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], participants are provided with sets of images retrieved
        from Flickr, where each set is related to a location. These
        sets are normally noisy and redundant, so the goal of the
        task is to refine the initial results by choosing a subset
        of images that are relevant to the queried location while
        showing different views of the location, various perspectives,
        different times of day (e.g., night and day), etc.
      </p>
      <p>
        The basic idea of the proposed method is to filter out
        the non-relevant images at the beginning, based on the rules
        of the task, and then to cluster the remaining images with
        the BIRCH algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which builds a hierarchical structure where nodes
        are the images, and edges represent the similarity between
        the linked nodes. This structure allows different clusters
        to be created according to the criteria used, and can also
        be used to remove outliers, i.e., non-relevant images that
        were not filtered out during the first step.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHODOLOGY</title>
      <p>The proposed method consists of four steps (see Fig. 1):</p>
      <p>Step 1. Pre-filtering: The goal of this step is to filter
        out outliers by removing images that are considered
        non-relevant. We define an image as non-relevant
        according to the following rules: (i) it contains people
        as the main subject; (ii) it was shot far away from the
        queried location; (iii) it received very few
        views on Flickr; and (iv) it is out of focus or blurred.
        Condition (i) can be detected from the proportion of the
        human face size with respect to the size of the image.</p>
      <p>
        In our method, Luxand FaceSDK is used as the face detector.
      </p>
      <p>
        Step 2. Hierarchical Clustering: In this step, we apply
        the BIRCH clustering algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to the provided
        visual and textual features. BIRCH yields an initial
        clustering of large datasets at very low computational
        cost. After this step, images that are similar to
        each other in terms of global visual features and textual
        information are grouped into the same
        cluster or the same branch of the hierarchical tree.
      </p>
      <p>
        Step 3. Tree Refining: Thanks to the initial tree
        constructed in the previous step, isolated clusters can
        easily be removed or merged into other branches by
        updating the tree, without modifying the clusters.
      </p>
      <p>
        Step 4. Result Re-ranking: The clusters are sorted
        by the number of images they contain, i.e., clusters
        containing more images are ranked higher. In each cluster, the
        image uploaded by the user with the highest visual score
        is selected as the first image. If there is more than
        one image from that user, the image closest to the
        centroid is selected. The second image is the one with
        the largest distance to the first image. The third
        image is the one with the largest distance
        to both of the first two images, and so on.
      </p>
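      <p>The selection order in Step 4 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the <monospace>candidates</monospace> parameter are hypothetical, and "largest distance to the images already chosen" is assumed here to mean the largest minimum distance to them (a farthest-first traversal).</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_cluster(vectors, candidates=None):
    """Order the images of one cluster for diversity.

    vectors:    one feature vector per image in the cluster.
    candidates: indices allowed as the first image (e.g. the images of
                the user with the highest visual score); defaults to all.
    Returns a list of indices into `vectors`.
    """
    if not vectors:
        return []
    if candidates is None:
        candidates = range(len(vectors))
    center = centroid(vectors)
    # First image: the eligible image closest to the cluster centroid.
    first = min(candidates, key=lambda i: euclidean(vectors[i], center))
    order = [first]
    remaining = set(range(len(vectors))) - {first}
    while remaining:
        # Assumption: pick the image whose nearest already-selected
        # image is farthest away (max-min distance).
        nxt = max(
            sorted(remaining),
            key=lambda i: min(euclidean(vectors[i], vectors[j]) for j in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


def rank_clusters(clusters):
    """Sort clusters so that those with more images come first."""
    return sorted(clusters, key=len, reverse=True)
```
]]></preformat>
      <p>For example, for the one-dimensional points 0, 1, 10 and 4, this sketch starts from 4 (the point closest to the centroid 3.75) and then selects 10, 0 and 1.</p>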
      <p>Several similarity measures and metrics are used: for the
        provided visual information we use the Euclidean distance, while
        for textual information we use cosine similarity. For
        geo-tagged images, the Haversine formula is used to compute
        the geographical distance between two locations.</p>
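      <p>As a reference for the three measures named above, the following sketch implements them in plain Python. The function names, the mean Earth radius of 6371 km and the convention of returning 0 for zero-length vectors are assumptions made for illustration, not details given in the paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def euclidean_distance(a, b):
    """Euclidean distance between two visual feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity between two textual (e.g. TF-IDF) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```
]]></preformat>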
    </sec>
    <sec id="sec-3">
      <title>RESULTS AND DISCUSSION</title>
      <p>In order to find the best combination of features and
        parameters (the number of clusters, the internal parameters of
        BIRCH, and the thresholds used to determine the outliers), we
        ran our model on all the provided features together with our
        own features. Based on the results, we chose the best
        features and parameters for each run and applied them to the test
        set as follows:</p>
      <p>Run 1 (Visual): Color naming (CNM), color descriptor
        (GCD), histogram of oriented gradients (HOG) and
        local binary pattern (GLBP) features are used. In Step 4, since
        we cannot exploit user credibility information, the
        centroid of each cluster is selected as the first image.</p>
      <p>Run 2 (Text): The parameters are chosen as in
        Run 1, but we use only TF-IDF information and
        measure distances by cosine similarity.</p>
      <p>Run 3 (Visual+Text): The method is applied to the
        combined features from Run 1 and Run 2:
        TF-IDF is used first, and the visual features with
        Euclidean distance are applied afterwards.</p>
      <p>
        Run 4 (User credibility): Note that this run
        is allowed to use only the user credibility information,
        so the proposed method is not applied. In this run,
        we cluster the images by user. The clusters are
        ranked by visual score (i.e., the
        cluster belonging to the user with the highest visual score
        is selected first), then by face proportion, and so
        on through all the user credibility information. Within each
        cluster, images are selected by number of
        views, i.e., the image with the highest number of views is
        selected as the first image.
      </p>
      <p>
        Run 5 (All features): All steps of the proposed method
        are applied in this run. In Step 1, outliers are detected
        as follows: (i) images in which the face size is larger than 10% of
        the size of the image, (ii) images that were
        shot farther than 15 km away, (iii) images that have fewer
        than 25 views, and (iv) images whose f-score (focus
        measure) is smaller than 20. In Step 2, a clustering
        similar to that of Run 3 is applied. For the visual features,
        we replace the provided HOG features with HOG2x2 as
        presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
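      <p>The Run 5 thresholds for Step 1 can be written as a small filter, sketched below. The <monospace>Photo</monospace> record and its field names are hypothetical, as is the Haversine helper with a 6371 km Earth radius. The sketch also assumes that matching any one rule is enough to discard an image, and that images without GPS coordinates skip the distance rule; the paper does not state either point.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from dataclasses import dataclass
from typing import Optional

MAX_FACE_RATIO = 0.10   # rule (i): face larger than 10% of the image
MAX_DISTANCE_KM = 15.0  # rule (ii): shot farther than 15 km away
MIN_VIEWS = 25          # rule (iii): fewer than 25 views
MIN_FOCUS_SCORE = 20.0  # rule (iv): f-score (focus measure) below 20


@dataclass
class Photo:
    """Hypothetical per-image record; field names are illustrative."""
    face_ratio: float          # face area / image area, in [0, 1]
    views: int                 # number of views on Flickr
    focus_score: float         # focus measure (f-score)
    lat: Optional[float] = None
    lon: Optional[float] = None


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def is_outlier(photo, query_lat, query_lon):
    """True if the photo matches at least one of rules (i)-(iv)."""
    if photo.face_ratio > MAX_FACE_RATIO:
        return True
    # Assumption: photos without GPS data are not tested against rule (ii).
    if photo.lat is not None and photo.lon is not None:
        dist = haversine_km(photo.lat, photo.lon, query_lat, query_lon)
        if dist > MAX_DISTANCE_KM:
            return True
    if photo.views < MIN_VIEWS:
        return True
    return photo.focus_score < MIN_FOCUS_SCORE


def prefilter(photos, query_lat, query_lon):
    """Keep only the photos that are not flagged as outliers."""
    return [p for p in photos if not is_outlier(p, query_lat, query_lon)]
```
]]></preformat>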
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.-T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-M.</given-names>
            <surname>Phoong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Robust measure of image focus in the wavelet domain</article-title>
          .
          <source>In Intelligent Signal Processing and Communication Systems</source>
          , pages
          <fpage>157</fpage>
          –
          <lpage>160</lpage>
          , Dec.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Ginsca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation</article-title>
          . In MediaEval 2014 Workshop, Barcelona, Spain,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hays</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Ehinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          .
          <article-title>Sun database: Large-scale scene recognition from abbey to zoo</article-title>
          .
          <source>In IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>3485</fpage>
          –
          <lpage>3492</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Livny</surname>
          </string-name>
          .
          <article-title>BIRCH: An Efficient Data Clustering Method for Very Large Databases</article-title>
          .
          <source>In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data</source>
          , pages
          <fpage>103</fpage>
          –
          <lpage>114</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>