<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Retrieving Relevant and Diverse Image from Social Media Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haokun Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xi Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yunlun Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhihong Deng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Peking University</institution>
          , Beijing,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We describe our approach and its result for MediaEval 2015 Retrieving Diverse Social Images Task. The basic idea is removing the irrelevant images and then obtaining the diverse image using a greedy strategy. Experiment results show our method can retrieve diverse images with a moderate relevance to the topic.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Retrieving social media information is currently a hot research
topic. The Retrieving Diverse Social Images task poses the problem of
generating, from Flickr photos, a brief summary of up to 50 images
that is both relevant and diverse [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The problem is challenging. On the one hand, we need to filter
out the irrelevant images in Flickr to guarantee the relevance. On
the other hand, we need to reserve different images in order to
enhance the diversity.</p>
      <p>
        Much work has been done on this problem previously.
Concetto and Simone proposed a method including cluster-wise
filtering which attained high relevance but lowered diversity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Dang-Nguyen, Piras and Giacinto introduced a method that removes
irrelevant images before clustering, yielding high scores on both criteria
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but this method may fail when irrelevant images make up the
majority of the collection. There are also methods that try to exploit
the given ground truth with a neural network [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but these are not very
effective in improving diversity.
      </p>
      <p>We propose a method that considers both relevance and
diversity. Briefly, it extracts images from the given set through
clustering and sorting, with the help of images from Wikipedia. The
basic idea is to remove the irrelevant images and then obtain a
diverse subset using a greedy strategy.</p>
    </sec>
    <sec id="sec-2">
      <title>2. APPROACH</title>
      <p>Our approach contains six steps (Figure 1): pre-filtering,
clustering, cluster-filtering, sorting, distributing, and extracting.
Each of them is described in the following subsections.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Pre-Filtering</title>
      <p>In this step, we filter out part of the irrelevant images based on the
following rules:
1. The distance between the spot of photography and the query location is
over 30 km.
2. The proportion of human faces in the image is over 0.05.</p>
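      <p>As an illustration, the two rules can be sketched as follows; the image fields, coordinates, and the haversine helper are hypothetical stand-ins, not our actual implementation:</p>

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def keep_image(img, query_lat, query_lon):
    # Rule 1: drop images taken more than 30 km from the query location.
    if haversine_km(img["lat"], img["lon"], query_lat, query_lon) > 30.0:
        return False
    # Rule 2: drop images whose detected face area exceeds 5% of the image.
    if img["face_area_ratio"] > 0.05:
        return False
    return True
```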
      <p>Because the face detector we used is not perfectly accurate,
and some images containing people are still relevant to the query
topic, the filtered-out images are not all truly irrelevant, but the
filtering achieves a considerable accuracy.</p>
      <p>Figure 1. The six steps of our approach: Pre-Filtering (filter out
part of the irrelevant images based on distance and a face detector);
Clustering (based on general CNN, adapted CNN, and TF-IDF similarity
features); Cluster-Filtering (select clusters based on average user
visual score); Sorting (sort the clusters by CNN feature similarity to
Wikipedia images, or by user credibility); Distributing (choose images
from each cluster evenly); Extracting (use a greedy strategy to extract
images not too close to each other).</p>
    </sec>
    <sec id="sec-3-2">
      <title>2.2 Clustering</title>
      <p>
        We use a hierarchical clustering algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to cluster the
images into 30 clusters.
      </p>
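      <p>A minimal sketch of this step with scikit-learn [5], using a random matrix as a stand-in for the real CNN and TF-IDF feature vectors:</p>

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical feature matrix: one row per image, columns standing in for
# the concatenated CNN and TF-IDF descriptors.
rng = np.random.default_rng(0)
features = rng.random((300, 64))

# Hierarchical (agglomerative) clustering into 30 clusters.
labels = AgglomerativeClustering(n_clusters=30).fit_predict(features)
```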
    </sec>
    <sec id="sec-4">
      <title>2.3 Cluster-Filtering</title>
      <p>In the previous step, similar images are clustered into the same
cluster. As a result, in some clusters most images are relevant, while
in other clusters most images are irrelevant. However, relevant
images of different classes may not be directly separated into
different clusters; instead, similar classes are likely to be grouped
into one cluster.</p>
      <p>We expect to remove the irrelevant clusters first and to deal with
the diversity problem later. Using user information, we calculate
the average user score of each cluster, keep only the clusters
whose score is higher than the average score of the whole image set,
and filter out the others.</p>
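      <p>A sketch of this selection rule; the per-image cluster labels and user scores are assumed inputs:</p>

```python
import numpy as np

def filter_clusters(labels, user_scores):
    # Keep clusters whose mean user score exceeds the mean score of the
    # whole image set; everything else is filtered out.
    labels = np.asarray(labels)
    user_scores = np.asarray(user_scores, dtype=float)
    global_mean = user_scores.mean()
    return [c for c in np.unique(labels)
            if user_scores[labels == c].mean() > global_mean]
```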
    </sec>
    <sec id="sec-5">
      <title>2.4 Sorting</title>
      <p>To sort the clusters, we calculate the distance between each
image in a cluster and its nearest image in
Wikipedia, and sort the clusters by the average of these distances.</p>
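      <p>Assuming per-image feature vectors for each cluster and for the Wikipedia images, the sorting can be sketched as:</p>

```python
import numpy as np

def sort_clusters(cluster_feats, wiki_feats):
    # Rank clusters by the mean distance from each image to its nearest
    # Wikipedia image, closest clusters first.
    def avg_nearest(feats):
        # Pairwise Euclidean distances to all Wikipedia images, then the
        # nearest one per cluster image, averaged over the cluster.
        d = np.linalg.norm(feats[:, None, :] - wiki_feats[None, :, :], axis=2)
        return d.min(axis=1).mean()

    return sorted(range(len(cluster_feats)),
                  key=lambda i: avg_nearest(cluster_feats[i]))
```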
    </sec>
    <sec id="sec-6">
      <title>2.5 Distributing</title>
      <p>After sorting, we need to determine the number of images to
be extracted from each cluster, i.e., to distribute the number of
images among clusters. Here we use a uniform distribution, i.e., we
choose the same number of images from each cluster. If the images
cannot be distributed evenly, the remainder is taken from the
former (higher-ranked) clusters.</p>
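      <p>For example, distributing 50 slots over 7 remaining clusters gives each of the former clusters one extra image:</p>

```python
def distribute(total, n_clusters):
    # Split `total` image slots evenly over the sorted clusters; any
    # remainder goes to the former (higher-ranked) clusters.
    base, extra = divmod(total, n_clusters)
    return [base + (1 if i < extra else 0) for i in range(n_clusters)]

distribute(50, 7)  # [8, 7, 7, 7, 7, 7, 7]
```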
    </sec>
    <sec id="sec-7">
      <title>2.6 Extracting</title>
      <p>After the distributing step, we apply a greedy strategy
to extract images from each cluster. In the extraction process, we
choose the image with the highest visual score in the first cluster as
the first image. After that, in each step, the image with the highest
score that is not among the top 15 nearest neighbors of any previously
chosen image is selected.</p>
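      <p>The greedy selection within one cluster can be sketched as follows; the candidate scores and the precomputed nearest-neighbor lists are assumed inputs:</p>

```python
def greedy_extract(candidates, quota, nearest, k=15):
    # Pick up to `quota` images in descending visual score, skipping any
    # image that lies among the k nearest neighbours of an already chosen
    # one. `candidates` is a list of (image_id, score); `nearest` maps an
    # image_id to its neighbour ids sorted by distance.
    chosen = []
    for img, _score in sorted(candidates, key=lambda t: -t[1]):
        if any(img in nearest[c][:k] for c in chosen):
            continue  # too close to a previously selected image
        chosen.append(img)
        if len(chosen) == quota:
            break
    return chosen
```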
    </sec>
    <sec id="sec-8">
      <title>3. RESULTS AND DISCUSSION</title>
    </sec>
    <sec id="sec-9">
      <title>3.1 Running Options</title>
      <p>We submitted 5 runs as follows:
Run 1. We use only image features in clustering and extracting, and
clusters are sorted by size without being filtered. We distribute
evenly and extract using a greedy strategy.</p>
      <p>Run 2. We use only text features in clustering and extracting, and
clusters are sorted by size without being filtered. We distribute
evenly and extract using a greedy strategy.</p>
      <p>Run 3. We use both image and text features. Clusters are sorted by
size without being filtered. We distribute evenly and extract using
a greedy strategy. Features were weighted differently in clustering
and extracting.</p>
      <p>Run 4. We simply select the first fifty pictures sorted by user score.</p>
      <p>Run 5. We use both image and text features. The clusters with low
average visual score are filtered out before being sorted by distance to the
Wikipedia images (or by average visual score if Wikipedia images are not
available). We distribute evenly and extract using a greedy strategy.
Features were weighted differently in clustering and extracting.</p>
      <p>Different features used are shown in Table 1 above.</p>
    </sec>
    <sec id="sec-10">
      <title>3.2 Results</title>
      <p>This section presents the experimental results achieved on the
development set (153 queries, 45,375 photos) and the test set (139
queries, 41,394 photos). CR@20 is the cluster recall at 20 (the
proportion of ground-truth classes covered among the top 20 results),
P@20 is the precision at 20 (the proportion of relevant photos
among the top 20 results), and F1@20 is the harmonic mean of the
previous two.</p>
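      <p>Under these definitions, the metrics can be computed as in the following sketch; the relevance flags and per-result class labels are hypothetical inputs:</p>

```python
def p_at_20(relevant_flags):
    # Precision: fraction of relevant photos among the top 20 results.
    top = relevant_flags[:20]
    return sum(top) / len(top)

def cr_at_20(result_classes, n_ground_truth_classes):
    # Cluster recall: fraction of ground-truth classes covered in the top 20.
    return len(set(result_classes[:20])) / n_ground_truth_classes

def f1_at_20(p, cr):
    # Harmonic mean of P@20 and CR@20.
    return 2 * p * cr / (p + cr) if p + cr else 0.0
```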
      <p>We obtained the best result with Run 5 on the development set, with
an F1@20 value of 0.5866, a P@20 value of 0.7995, and a CR@20
value of 0.4634. In the single-topic test, Run 5 achieves the
highest score on both relevance and diversity, while in the
multi-topic test, though P@20 in Run 5 is not the highest, it is
close to the highest.</p>
    </sec>
    <sec id="sec-11">
      <title>4. Discussion</title>
      <p>On the development set, our approach achieved good results,
while on the test set, P@20 is relatively poor. We believe this
is due to the method we used and two problems we observed in the
experiments.</p>
      <p>The first problem is that
images of different classes may be clustered into the same cluster.
We therefore utilize a greedy strategy, and set different image/text weights
in the clustering and extraction processes, which makes a
significant improvement in the experiments.</p>
      <p>The second issue is a common scenario in which
users take multiple shots of one scene, so that irrelevant images
may be clustered into several clusters containing few relevant images.
Because of the difficulty of removing irrelevant clusters, we use the
user visual score to decide whether a cluster is irrelevant or not. This
method relies on the accuracy of the user information. In the
experiments it generally brings improvements, but it reduces the
score at some test points; moreover, it makes the approach more
sensitive to the quality of user information. Thus this problem has
not yet reached a satisfactory solution.</p>
    </sec>
    <sec id="sec-12">
      <title>5. CONCLUSION</title>
      <p>We proposed an approach for generating from Flickr a brief image
summary that considers both relevance and diversity.
This approach performs well, as shown by our experiments. We
also pointed out two problems which restrict the performance in
the experiments.</p>
      <p>Further research will mainly focus on how to identify whether a
similar picture is related to the topic or not.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGMENT</title>
      <p>This work is supported by the Peking University Education
Foundation (URTP2015PKU003).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gînscă</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boteanu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popescu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <source>Retrieving Diverse Social Images at MediaEval</source>
          <year>2015</year>
          :
          <article-title>Challenge, Dataset and Evaluation</article-title>
          .
          <source>Working Notes Proceedings of the MediaEval 2015 Workshop</source>
          , Wurzen, Germany, September 14-15, CEURWS.org,
          <year>2015</year>
          ;
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palazzo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PeRCeiVe Lab@UNICT at MediaEval 2014 Diverse Images: Random Forests for Diversity-based Clustering</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.-T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacinto</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boato</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Natale</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Retrieval of Diverse Images by Pre-filtering and Hierarchical Clustering</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Karamti</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tmar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gargouri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>MIRACL's participation at MediaEval 2014 Retrieving Diverse Social Images Task</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>[5] Scikit-learn: Machine Learning in Python</article-title>
          , Pedregosa et al.,
          <source>JMLR 12</source>
          , pp.
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>