<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CFM@MediaEval 2017 Retrieving Diverse Social Images Task via Re-ranking and Hierarchical Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Liang Peng</string-name>
          <email>pliang951125@outlook.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yi Bin</string-name>
          <email>yi.bin@hotmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiyao Fu</string-name>
          <email>fu.xiyao.gm@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Zhou</string-name>
          <email>jiezhou0714@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Yang</string-name>
          <email>dlyyang@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heng Tao Shen</string-name>
          <email>shenhengtao@hotmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Future Media and School of Computer Science and Engineering University of Electronic Science and Technology of China</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper presents an approach based on re-ranking and hierarchical clustering (HC) for the MediaEval 2017 Retrieving Diverse Social Images Task. Experimental results on the development and test sets demonstrate that the proposed approach can significantly improve the relevance and visual diversity of the query results. Our approach achieves a good tradeoff between relevance and diversity, attaining an F1@20 of 0.6533 on the employed test data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Modern image retrieval systems should be able to give both
relevant and visually diverse results. In other words, the presented
images should not only be relevant to the query, but also cover
diferent visual facets. The MediaEval 2017 Retrieving Diverse Social
Images Task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] fosters the development of such image retrieval
systems, which aim at the improvement of both the relevance and
the diversity of image search results. More specifically, given
images retrieved from Flickr using text queries, the goal of the task is
to refine the results by ranking images according to their relevance
to the query and ofering visually diversity. More details about the
task and data description can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Previous participants, such as [
        <xref ref-type="bibr" rid="ref3 ref7">3, 7</xref>
        ], employed re-ranking methods
to improve relevance and showed promising performance.
Clustering algorithms have been widely used in this task to group
similar images, e.g., k-Medoids [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], hierarchical clustering [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ],
and random forests [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In this work, we propose to re-rank images using the available
social metadata in order to refine the original relevance rank list
of images provided by Flickr. Then, a hierarchical clustering (HC)
algorithm is employed to diversify the top k images. Unlike
Tollari [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], who merges different features based on similarities, we
merge features using weighted distances. To balance
relevance and diversity well, we apply a diversification strategy similar
to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
      <p>As shown in Figure 1, our approach is composed of three
components: re-ranking, clustering, and diversification. Specifically, to
improve the semantic relevance between images and their
corresponding queries, we first re-rank the images retrieved for each
topic using the available textual metadata. Then we apply a
hierarchical clustering algorithm to diversify the refined image rank
list and balance relevance and diversity of the final results using a
diversification approach.</p>
      <p>[Figure 1: Overview of the pipeline: textual feature fusion and re-ranking, HC clustering, and top-k representative selection, yielding the final 50 images.]</p>
    </sec>
    <sec id="sec-3">
      <title>Re-ranking</title>
      <p>The given dataset consists of images that are retrieved from Flickr
using text-based, multi-topic queries. For each topic, images are
arranged according to the initial rank list produced by the Flickr retrieval
system. This rank list, however, commonly mismatches the probability of
relevance to the query topic. To improve the relevance of the initial
rank list, we propose to re-rank the images of each topic by
exploiting their social metadata (e.g., titles, geo-tags, and usernames).
Concretely, we compute the cosine similarity according to the
TF-IDF weights of the combined social metadata (i.e., concatenating the
corresponding feature vectors). Note that we do not utilize the
descriptions in the metadata, because we find that they contain much
redundant or irrelevant information, which may notably decrease the
relevance of the underlying images and the final performance.
Since the queries are represented by text, our re-ranking
narrows the semantic gap between the queries and the
relevance of the corresponding image results.</p>
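      <p>As a purely illustrative sketch (not the authors' released code), the re-ranking step can be approximated with a stdlib-only TF-IDF weighting and cosine similarity over each image's concatenated metadata tokens; the tokenization and the exact IDF formula here are assumptions.</p>

```python
# Hypothetical sketch of text-based re-ranking: score each image by cosine
# similarity between TF-IDF vectors of the query and of its social metadata.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (dicts term -> weight) for tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank(query_tokens, image_metadata_tokens):
    """Return image indices sorted by similarity to the query, descending."""
    vecs = tfidf_vectors([query_tokens] + image_metadata_tokens)
    q, imgs = vecs[0], vecs[1:]
    scores = [cosine(q, v) for v in imgs]
    return sorted(range(len(imgs)), key=lambda i: -scores[i])
```

For example, for the query tokens ["eiffel", "tower"], an image whose title and tags share both terms would be ranked ahead of images sharing one or none.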
    </sec>
    <sec id="sec-4">
      <title>Clustering</title>
      <p>After re-ranking, we obtain a more relevant rank list of the initial
N images for each topic (N = 300 in the task). For the purpose of
reducing irrelevant images, we only select the top k images from
the refined rank list for the hierarchical clustering (see Section 4
for more details). In other words, the images with low ranks, which
are assumed to be the most irrelevant to the topic, are discarded.
The experimental results indicate that the employed performance
scores and F1@20 benefit from this selective operation.</p>
      <p>At the clustering step, one or more visual features, textual
features, and credibility descriptors are utilized. In order to take
advantage of several features, we merge them by summing the weighted
distances:</p>
      <p>dist<sub>f1, f2, ..., fn</sub>(I1, I2) = Σ<sub>i=1</sub><sup>n</sup> w<sub>i</sub> · dist<sub>fi</sub>(I1, I2),   (1)</p>
      <p>where f<sub>i</sub> denotes the i-th feature, dist<sub>fi</sub>(I1, I2) is the distance
between image I1 and image I2 based on the i-th feature, and w<sub>i</sub> is a
manually set weight for the distance of the i-th feature.</p>
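      <p>Eq. (1) can be sketched in a few lines; the feature names and vectors below are illustrative, while the weighting (textual 1.0, visual 0.02) mirrors the run3 setting reported later.</p>

```python
# Minimal sketch of Eq. (1): fuse per-feature distances between two images
# by a manually weighted sum. Feature names/vectors here are made up.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fused_distance(img1_feats, img2_feats, weights):
    """img*_feats: dict feature_name -> vector; weights: feature_name -> w_i."""
    return sum(w * euclidean(img1_feats[f], img2_feats[f])
               for f, w in weights.items())

# Example with run3-style weights (textual 1.0, visual sc 0.02):
i1 = {"text": [1.0, 0.0], "sc": [10.0, 0.0]}
i2 = {"text": [0.0, 1.0], "sc": [0.0, 10.0]}
d = fused_distance(i1, i2, {"text": 1.0, "sc": 0.02})
```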
      <p>
        We employ HC with single, complete, ward, and average
linkage methods [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], respectively, based on the Euclidean distance, and
produce 50 clusters for each topic.
      </p>
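      <p>As a hedged, stdlib-only illustration of the clustering step (the paper relies on standard HC implementations [5]), agglomerative clustering with single linkage repeatedly merges the closest pair of clusters until the target number of clusters remains:</p>

```python
# Illustrative single-linkage agglomerative clustering (O(n^3) naive form),
# stopping when n_clusters remain; not an optimized implementation.
import math

def single_linkage_clusters(points, n_clusters):
    """points: list of coordinate tuples; returns a list of index clusters."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        return math.dist(points[a], points[b])

    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: minimum pairwise distance between clusters
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters
```

Complete, ward, and average linkage differ only in how the inter-cluster distance is computed at the merge step.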
    </sec>
    <sec id="sec-5">
      <title>Diversification</title>
      <p>HC distributes the selected top k images into several sets, each of
which potentially covers one scene or situation with related
semantic meaning. Intuitively, to improve the diversity of a retrieval
system, the results should cover as many clusters as possible.
Similarly, higher ranks indicate higher relevance. To
balance diversity and relevance, we directly select the images with the
highest relevance ranks from each cluster and sort them according
to their relevance as the final result (see re-ranking in Section 2.1).</p>
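      <p>The diversification step described above amounts to a short selection routine; this sketch assumes one representative per cluster and a rank function where position 0 is the most relevant.</p>

```python
# Illustrative diversification: from each cluster keep the image with the
# best (lowest) re-ranked position, then order representatives by that rank.
def diversify(clusters, rank_of):
    """clusters: list of lists of image ids; rank_of: id -> rank (0 is best)."""
    reps = [min(cluster, key=rank_of) for cluster in clusters]
    return sorted(reps, key=rank_of)
```

Because each cluster contributes its most relevant member, the final list covers many clusters (diversity) while still being ordered by relevance.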
    </sec>
    <sec id="sec-6">
      <title>RUN SETUP</title>
      <p>
        We submitted 5 runs with different experimental settings.
• Run1: visual-only. No re-ranking. HC uses a complete
linkage method and the provided visual feature auto color
correlogram (acc) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
• Run2: text-only. Text-based re-ranking. HC uses a complete
linkage method and the provided textual feature [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
• Run3: text-visual. Text-based re-ranking. HC uses a ward
linkage method and the provided visual feature scalable
color (sc) and textual feature [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
• Run4: text-cred. Text-based re-ranking. HC uses a single
linkage method and the provided credibility descriptors [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
• Run5: text-cred. Text-based re-ranking. HC uses a single
linkage method and the provided credibility descriptors
and textual feature [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>EXPERIMENT AND RESULTS</title>
      <p>Table 1 and Table 2 present the results for the five runs evaluated
on Devset and Testset, respectively. Concretely, Devset is the
development set and Testset is the official test set. Results are assessed
with Precision at X images (P@X), Cluster Recall at X (CR@X), and
F1-measure at X (F1@X). F1@20 is the official ranking metric. Best
results are depicted in bold. Flickr is the baseline result using the
top 50 images based on the initial Flickr ranking.</p>
      <p>In run1, we do not use the top k strategy before clustering,
because no re-ranking is applied. Our preliminary experiments show that
acc achieves better performance than the other provided visual
features (e.g., CNN, sc) in run1. The performance of run1 in
Table 1 shows that clustering with the visual feature acc can improve
the diversity to a certain degree. In run2, we only employ textual
features (title, tags, username) for both re-ranking and clustering.
Before clustering, we select the top k (100, 150, 200, 250, 300)
images. Preliminary experiments indicate that k=150 achieves the
best performance in terms of F1@20. The results achieved on the
development and test data show that re-ranking and hierarchical
clustering using textual features improve the relevance and the
diversity to a large degree in comparison to run1 and Flickr.</p>
      <p>The core difference between run2, run3, run4, and run5 is that
they employ different features for clustering. Run3 is a combined
visual-textual run and employs a ward linkage method for
clustering. The reason we do not use the same visual feature as run1 is
that better performance is achieved using sc rather than acc or any
other provided visual feature in run3. The weights of the textual and
visual features are 1 and 0.02, respectively. In contrast to the results of
run2 on Devset and Testset, the value of every metric is increased,
especially CR@20. The results on Devset and Testset indicate that
the combination of visual and textual features outperforms a
single feature. Note that run4, which employs only the credibility feature
in clustering, achieves a better CR@20 and F1@20 than run2 and
run3 on both Devset and Testset. In our experiments, the credibility
feature is the best feature for clustering. Run5 combines credibility
and textual features in clustering and achieves the best performance
in P@20, CR@20, and F1@20 on both Devset (0.6605, 0.4888, 0.5402)
and Testset (0.6881, 0.6671, 0.6533). We do not use any visual feature,
because our experimental results on the development set show that
none of the provided visual features helps attain a higher
F1@20 in run5. The weights of the textual and credibility features are
0.2 and 1, respectively.</p>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>We presented an approach that employs re-ranking and
hierarchical clustering for retrieving diverse social images. Experimental
results demonstrated the effectiveness of our approach in improving
relevance and diversity. Credibility descriptors provide a measure
of the quality of users' tag-image content relationships and
were very helpful for clustering similar images to improve diversity.
Merging textual and credibility features takes advantage of
different features and achieved satisfactory performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Boteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          .
          <article-title>Lapi@2016 retrieving diverse social images task: A pseudo-relevance feedback diversification perspective</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Haokun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhi-Hong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yunlun</surname>
          </string-name>
          .
          <article-title>Retrieving relevant and diverse image from social media images</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Calumby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. B.</given-names>
            <surname>do C. Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Í. C.</given-names>
            <surname>Dourado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A. V.</given-names>
            <surname>Munoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A. B.</given-names>
            <surname>Penatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Almeida</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>da Silva Torres</surname>
          </string-name>
          .
          <article-title>Recod@MediaEval 2016: Diverse social images retrieval</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gînsca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          .
          <article-title>Cea list's participation at the mediaeval 2014 retrieving diverse social images task</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Müllner</surname>
          </string-name>
          .
          <article-title>Modern hierarchical, agglomerative clustering algorithms</article-title>
          .
          <source>arXiv preprint arXiv:1109.2378</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Spampinato</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Palazzo</surname>
          </string-name>
          .
          <article-title>Perceive lab@unict at mediaeval 2014 diverse images: Random forests for diversity-based clustering</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tollari</surname>
          </string-name>
          .
          <article-title>Upmc at mediaeval 2016 retrieving diverse social images task</article-title>
          .
          <source>In MediaEval 2016 Workshop</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gînsca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L. T.</given-names>
            <surname>Santos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Retrieving diverse social images at mediaeval 2017: Challenge, dataset and evaluation</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>