<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UPC-UB-STP @ MediaEval 2015 Diversity Task: Iterative Reranking of Relevant Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aniol Lidon</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Bolaños</string-name>
          <email>marc.bolanos@ub.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Seidl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Giró-i-Nieto</string-name>
          <email>xavier.giro@upc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petia Radeva</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Zeppelzauer</string-name>
          <email>m.zeppelzauer@fhstp.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>St. Pölten University of Applied Sciences</institution>
          ,
          <addr-line>St. Pölten</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universitat de Barcelona</institution>
          ,
          <addr-line>Barcelona, Catalonia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universitat Politècnica de Catalunya</institution>
          ,
          <addr-line>Barcelona, Catalonia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper presents the results of the UPC-UB-STP team in the 2015 MediaEval Retrieving Diverse Images Task. The goal of the challenge is to provide a ranked list of Flickr photos for a predefined set of queries. Our approach first generates a ranking of images based on a query-independent estimation of their relevance. Only the top results are kept and iteratively re-ranked based on their intra-similarity to introduce diversity.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The diversification of search results is an important factor
in improving the usability of visual retrieval engines. This
motivates the 2015 MediaEval Retrieving Diverse Images
Task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which defines the scientific benchmark targeted in
this paper. The proposed methodology addresses the trade-off
between relevance and diversity by first filtering results
with a learned relevance classifier, and then building
a diverse reranked list with an iterative scheme.
      </p>
      <p>
        The first challenge in our system is filtering irrelevant
images, as suggested in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Relevance is an abstract and highly subjective
concept. Similar problems
have been addressed in the visual domain, such as
memorability [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or interestingness [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In both cases, a crowdsourced
task was organised to collect a large number of human
annotations, which were used to train a classifier based on visual features.
      </p>
      <p>
        The second challenge is the diversity of the
ranked list. A seminal work from 1998 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] introduced
diversity in addition to relevance for text retrieval, a concept that
was later ported to image [
        <xref ref-type="bibr" rid="ref17 ref19 ref4">17, 4, 19</xref>
        ] and video retrieval [
        <xref ref-type="bibr" rid="ref6 ref7">7, 6</xref>
        ]. Different features have been used for this purpose:
textual (e.g. tags [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]), visual (e.g. convolutional neural
networks [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]), and multimodal fusion [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. METHODOLOGY</title>
      <p>A generic and easily extensible methodology of four steps
has been applied in all our submitted runs. While steps 2
and 4 apply to all runs, steps 1 and 3 contain particularities
for visual and textual processing.</p>
      <p>1) Ranking by relevance: A relevance score for each
image is estimated using either visual or textual
information (see details in Sections 2.1 and 2.2, respectively).</p>
      <p>2) Filtering of irrelevant images: Only a percentage
of the top-ranked images by relevance is considered in later
steps. In the multimodal runs, the relevance scores for the
visual and textual modalities are linearly normalized and
fused by averaging.</p>
      <p>3) Feature and distance computation: Visual and/or
textual features are extracted for each image, and the
similarity between each pair of images is computed.</p>
      <p>4) Reranking by diversity: An iterative algorithm
selects the image that differs most from all previously
selected ones. The similarity is always assessed by averaging
the considered visual and textual features. Iterations start
by adding the most relevant image as the first element of
the reranked list.</p>
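      <p>Steps 2 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the submitted runs: the function names (<monospace>minmax</monospace>, <monospace>fuse_relevance</monospace>, <monospace>filter_top</monospace>, <monospace>rerank_by_diversity</monospace>), the use of min-max scaling as the linear normalization, and the choice of the lowest mean similarity to the already selected images as the selection criterion are all assumptions made for the example, as are the toy scores and the similarity matrix.</p>
      <preformat><![CDATA[
```python
import numpy as np

def minmax(scores):
    """Linearly rescale scores to [0, 1] (assumed form of the normalization)."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def fuse_relevance(visual, textual):
    """Average the normalized visual and textual relevance scores."""
    return (minmax(visual) + minmax(textual)) / 2.0

def filter_top(relevance, keep_fraction):
    """Indices of the top `keep_fraction` of images, most relevant first."""
    relevance = np.asarray(relevance, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(relevance))))
    order = np.argsort(-relevance, kind="stable")
    return order[:n_keep]

def rerank_by_diversity(candidates, similarity):
    """Greedy reranking: start with the most relevant candidate, then
    repeatedly add the candidate with the lowest mean similarity to the
    images selected so far (assumed selection criterion)."""
    remaining = list(candidates)
    selected = [remaining.pop(0)]
    while remaining:
        mean_sim = [similarity[c, selected].mean() for c in remaining]
        selected.append(remaining.pop(int(np.argmin(mean_sim))))
    return selected

# Toy data for five images (hypothetical values).
visual = [0.9, 0.2, 0.7, 0.6, 0.1]
textual = [0.8, 0.1, 0.9, 0.5, 0.3]
similarity = np.array([
    [1.0, 0.3, 0.9, 0.2, 0.4],
    [0.3, 1.0, 0.3, 0.5, 0.6],
    [0.9, 0.3, 1.0, 0.4, 0.2],
    [0.2, 0.5, 0.4, 1.0, 0.3],
    [0.4, 0.6, 0.2, 0.3, 1.0],
])

relevance = fuse_relevance(visual, textual)
kept = filter_top(relevance, keep_fraction=0.6)
ranking = rerank_by_diversity(kept, similarity)
```
]]></preformat>
      <p>In this toy example, images 0 and 2 are near duplicates, so the reranking places image 3 between them even though image 2 has the higher relevance score.</p>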
    </sec>
    <sec id="sec-3">
      <title>2.1 Visual data</title>
      <p>
        The visual information was analyzed with Convolutional
Neural Networks (CNNs) [
        <xref ref-type="bibr" rid="ref12 ref13">13, 12</xref>
        ] in two different ways:
      </p>
      <p>
        1) Ranking by relevance: A Relevance CNN was
created based on HybridNet [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], a CNN trained with objects
from the ImageNet dataset [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and locations from the Places
dataset [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. HybridNet was fine-tuned for two classes,
relevant and irrelevant, as labeled by human annotators.
      </p>
      <p>
        3) Feature and distance computation: The fully
connected layer fc7 from a CNN trained on ImageNet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and
the fully connected layer fc8 from HybridNet [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] were used
as feature vectors [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3-2">
      <title>2.2 Textual data</title>
      <p>
1) Ranking by relevance: For each query, we generate
a textual term model in an unsupervised manner from all
images returned for this query. We first remove stopwords,
words with numeric and special characters and words of
length 4. Next, we select the most representative terms by
retaining only those terms whose term frequency (TF<sub>q</sub>)
is higher than their document frequency (DF<sub>q</sub>) for the query
q. For each term in the model we store TF<sub>q</sub> as a weight.
Once this model has been established, we map the textual
descriptions of the images to the model of the query. For
each image, only terms that also appear in the query model
are retained. For each remaining term we retrieve TF<sub>i</sub>
for the corresponding i-th image and build a feature vector.
To compute a relevance score s<sub>i</sub> for an image, we compute
the cosine similarity sim<sub>i</sub> between the query model and the
image feature vector. Additionally, we add the inverse of the
original Flickr rank r<sub>i</sub> of the image to the score, yielding a
final textual relevance score of s<sub>i</sub> = sim<sub>i</sub> + (1/r<sub>i</sub>) for
image i. This computation is inspired by that of [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], with the
difference that we use TF instead of TFIDF in the scoring
function, which proved to be more expressive in our
experiments.
      </p>
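      <p>The textual relevance score can be illustrated with a short Python sketch. It is a simplified reading of the description above rather than the code used in the experiments. The following points are assumptions made for the example: whitespace tokenization, the tiny stopword list, the minimum word length parameter <monospace>min_len</monospace>, TF<sub>q</sub> counted as the total number of occurrences over all images of the query, DF<sub>q</sub> counted as the number of images containing the term, and a 1-based Flickr rank. All function names and the toy descriptions are hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and", "at"}  # illustrative list only

def tokenize(text, min_len=4):
    """Lowercase, split on whitespace, and drop stopwords, words with
    non-alphabetic characters and words shorter than `min_len`
    (the length threshold is an assumption)."""
    return [w for w in text.lower().split()
            if w.isalpha() and w not in STOPWORDS and len(w) >= min_len]

def build_query_model(descriptions):
    """Keep terms whose term frequency over all images of the query
    exceeds their document frequency; weight each kept term by its TF."""
    tf, df = Counter(), Counter()
    for text in descriptions:
        tokens = tokenize(text)
        tf.update(tokens)
        df.update(set(tokens))
    return {t: tf[t] for t in tf if tf[t] > df[t]}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def textual_relevance(description, flickr_rank, model):
    """s_i = sim_i + 1/r_i, with r_i the 1-based original Flickr rank."""
    counts = Counter(tokenize(description))
    image_vec = {t: counts[t] for t in model if counts[t] > 0}
    return cosine(model, image_vec) + 1.0 / flickr_rank

# Toy descriptions, listed in their original Flickr order (hypothetical).
descriptions = [
    "eiffel tower paris eiffel tower",
    "paris tower night",
    "cat sleeping sofa",
]
model = build_query_model(descriptions)
scores = [textual_relevance(text, rank, model)
          for rank, text in enumerate(descriptions, start=1)]
```
]]></preformat>
      <p>With these toy descriptions, only terms that are repeated within at least one description survive the TF<sub>q</sub> &gt; DF<sub>q</sub> test, and an image sharing no term with the model is scored by its inverse rank alone.</p>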
      <p>
        3) Feature and distance computation: Diversity
reranking requires the similarity comparison of all relevant
images for a query. For a given image, we first align its
terms to the query model. Next, we compute their TFIDF
weights (TF<sub>i</sub>/DF<sub>i</sub>) [
        <xref ref-type="bibr" rid="ref15 ref23">15, 23</xref>
        ]. Terms from the query model
that do not occur in the image's descriptions get a weight of
zero. The resulting feature vectors are compared with the
cosine metric in diversity re-ranking.
      </p>
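      <p>The textual features used for diversity reranking can be sketched as follows. This is an illustrative example with NumPy, not the original code: it assumes that the document frequency of a term is its number of occurrences as a distinct term among the images of the query, and the helper names (<monospace>tfidf_vector</monospace>, <monospace>cosine_similarity_matrix</monospace>) and all values are hypothetical.</p>
      <preformat><![CDATA[
```python
import numpy as np

def tfidf_vector(image_tf, query_terms, doc_freq):
    """Align an image's term counts to the query model and weight each
    term by TF/DF; model terms absent from the image get weight zero."""
    return np.array([image_tf.get(t, 0) / doc_freq[t] for t in query_terms],
                    dtype=float)

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows yield similarity 0 everywhere
    unit = vectors / norms
    return unit @ unit.T

# Toy query model and per-image term counts (hypothetical values).
query_terms = ["eiffel", "tower", "night"]
doc_freq = {"eiffel": 2, "tower": 3, "night": 1}
images = [
    {"eiffel": 2, "tower": 1},
    {"eiffel": 1, "tower": 2},
    {"night": 1},
]

features = np.vstack([tfidf_vector(tf, query_terms, doc_freq) for tf in images])
similarity = cosine_similarity_matrix(features)
```
]]></preformat>
      <p>The resulting matrix has the form expected by a greedy diversity reranking step: images with no terms in common have a similarity of zero.</p>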
    </sec>
    <sec id="sec-4">
      <title>3. EXPERIMENTAL SETUP</title>
      <p>
        The experimental setup is mostly defined by the 2015
MediaEval Retrieving Diverse Images Task, which provides a
dataset partitioned into development (devset) and test
(testset) sets, two types of queries (single-topic and multi-topic), and
standardized and complementary evaluation metrics: Precision
at 20 (P@20), Cluster Recall at 20 (CR@20) and F1-score
at 20 (F1@20). The reader is referred to the task overview
paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to learn the details of the problem.
      </p>
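      <p>For reference, the three metrics can be written down compactly. The sketch below follows their usual definitions: P@N is the fraction of relevant images among the top N results, CR@N is the fraction of ground-truth clusters represented among the top N results, and F1@N is their harmonic mean. The data layout (a mapping from relevant image identifiers to cluster identifiers, with irrelevant images absent from the mapping) and the function names are assumptions made for the example; the official evaluation tool of the task should be used for reported figures.</p>
      <preformat><![CDATA[
```python
def precision_at(ranking, cluster_of, n=20):
    """Fraction of the top-n results that are relevant.
    An image is relevant if it appears in `cluster_of`."""
    return sum(1 for img in ranking[:n] if img in cluster_of) / n

def cluster_recall_at(ranking, cluster_of, n_clusters, n=20):
    """Fraction of ground-truth clusters represented in the top-n results."""
    found = {cluster_of[img] for img in ranking[:n] if img in cluster_of}
    return len(found) / n_clusters

def f1_at(ranking, cluster_of, n_clusters, n=20):
    """Harmonic mean of precision and cluster recall at n."""
    p = precision_at(ranking, cluster_of, n)
    cr = cluster_recall_at(ranking, cluster_of, n_clusters, n)
    return 2 * p * cr / (p + cr) if p + cr else 0.0

# Toy example with a cutoff of 4 instead of 20 (hypothetical data).
ranking = ["a", "b", "c", "d", "e"]
cluster_of = {"a": 0, "b": 0, "d": 1, "e": 2}  # "c" is irrelevant
p4 = precision_at(ranking, cluster_of, n=4)
cr4 = cluster_recall_at(ranking, cluster_of, n_clusters=3, n=4)
f4 = f1_at(ranking, cluster_of, n_clusters=3, n=4)
```
]]></preformat>
      <p>In the toy example, three of the top four images are relevant and they cover two of the three clusters, which shows how a list can be precise and still miss part of the diversity.</p>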
      <p>The Relevance CNN described in Section 2.1 was trained
with 2-fold cross-validation, each split containing one half
of the devset queries. For both splits, training was stopped after 2,000
iterations, when the validation accuracy was highest
(76% and 75%, respectively). When applying the best-performing
parameters to the testset, we fine-tuned the network on the
whole devset and stopped after 4,500 iterations, when
the training loss reached its minimum.</p>
      <p>The portion of images to keep in Step 2 was determined
by measuring the evolution of the final F1-score for different
percentages. For Runs 1 to 3 the best results were
obtained by keeping the top 20% of images, while for Run 5
the best value was 15%.</p>
    </sec>
    <sec id="sec-5">
      <title>4. RESULTS</title>
    </sec>
    <sec id="sec-6">
      <title>5. CONCLUSIONS</title>
      <p>
        The trade-off between relevance and diversity has been
addressed in this work with relevance-based filtering followed by an
iterative process that introduces diversity. The final
results, presented in Table 1, are comparable to the state of
the art on the devset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and achieve an F1@20 of up to 0.508
on the testset.
      </p>
      <p>Multi-topic queries seem to be more difficult to diversify
than single-topic queries. A reason may be that multi-topic
queries are more general and contain more heterogeneous
content. Considering that our method was trained
on single-topic queries only, the results for the multi-topic
queries are nevertheless promising.</p>
      <p>It is remarkable that increasing the number N of
retrieved images increases both recall and precision (and not
only recall, as one would expect in a typical retrieval
scenario), as shown in Figure 1. This indicates that the
relevance ranking obtained by our method is accurate (at least
for N 50).</p>
      <p>There is no clear winner between textual and visual
information (Runs 1 and 2). The multimodal combination,
however, clearly improves performance (Runs 3 and 5).
Additionally, the results indicate that using multimodal processing
at all stages (Run 3) is better than using multimodal
processing only during the relevance ranking (Run 5).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          .
          <article-title>The use of MMR, diversity-based reranking for reordering documents and producing summaries</article-title>
          .
          <source>In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>335</fpage>
          –
          <lpage>336</lpage>
          . ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.-T.</given-names>
            <surname>Dang-Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Piras</surname>
          </string-name>
          , G. Giacinto, G. Boato, and
          <string-name>
            <given-names>F. G.</given-names>
            <surname>De Natale</surname>
          </string-name>
          .
          <article-title>A hybrid approach for retrieving diverse social images of landmarks</article-title>
          .
          <source>In Multimedia and Expo (ICME), 2015 IEEE International Conference on</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <article-title>Imagenet: A large-scale hierarchical image database</article-title>
          .
          <source>In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on</source>
          , pages
          <fpage>248</fpage>
          –
          <lpage>255</lpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dreuw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
          <article-title>Jointly optimising relevance and diversity in image retrieval</article-title>
          .
          <source>In Proceedings of the ACM international conference on image and video retrieval, page 39. ACM</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-J.</given-names>
            <surname>Zha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Visual-textual joint relevance learning for tag-based social image search</article-title>
          .
          <source>Image Processing, IEEE Transactions on</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>363</fpage>
          –
          <lpage>376</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Giró-i-Nieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alfaro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Marques</surname>
          </string-name>
          .
          <article-title>Diversity ranking for video retrieval from a broadcaster archive</article-title>
          .
          <source>In Proceedings of the 1st ACM International Conference on Multimedia Retrieval, page 56. ACM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Halvey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Punitha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hannah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Villa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goyal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Jose</surname>
          </string-name>
          .
          <article-title>Diversity, assortment, dissimilarity, variety: A study of diversity measures using low level features for video retrieval</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          , pages
          <fpage>126</fpage>
          –
          <lpage>137</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gînsca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Retrieving diverse social images at MediaEval 2015: Challenge, dataset and evaluation</article-title>
          . In MediaEval 2015 Workshop, Wurzen, Germany,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gînsca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boteanu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Div150Cred: A social image retrieval result diversification with user tagging credibility dataset</article-title>
          .
          <source>ACM Multimedia Systems-MMSys</source>
          , Portland, Oregon, USA,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          .
          <article-title>What makes a photograph memorable?</article-title>
          <source>Pattern Analysis and Machine Intelligence, IEEE Transactions on</source>
          ,
          <volume>36</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1469</fpage>
          –
          <lpage>1482</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shelhamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Donahue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karayev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guadarrama</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <article-title>Caffe: Convolutional architecture for fast feature embedding</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Multimedia</source>
          , pages
          <fpage>675</fpage>
          –
          <lpage>678</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>ImageNet classification with deep convolutional neural networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>1097</fpage>
          –
          <lpage>1105</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , L. Bottou,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Haffner</surname>
          </string-name>
          .
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>86</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2278</fpage>
          –
          <lpage>2324</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Razavian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azizpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Carlsson</surname>
          </string-name>
          .
          <article-title>CNN features off-the-shelf: an astounding baseline for recognition</article-title>
          .
          <source>In Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on</source>
          , pages
          <fpage>512</fpage>
          –
          <lpage>519</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          .
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Information processing &amp; management</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ):
          <fpage>513</fpage>
          –
          <lpage>523</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Soleymani</surname>
          </string-name>
          .
          <article-title>The quest for visual interest</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Multimedia. ACM</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Diversifying the image retrieval results</article-title>
          .
          <source>In Proceedings of the 14th annual ACM international conference on Multimedia</source>
          , pages
          <fpage>707</fpage>
          –
          <lpage>710</lpage>
          . ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E.</given-names>
            <surname>Spyromitros-Xioufis</surname>
          </string-name>
          , S. Papadopoulos,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Ginsca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Vlahavas</surname>
          </string-name>
          .
          <article-title>Improving diversity in image search via supervised relevance scoring</article-title>
          .
          <source>In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval</source>
          , pages
          <fpage>323</fpage>
          –
          <lpage>330</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R. H.</given-names>
            <surname>van Leuken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Olivares</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>van Zwol</surname>
          </string-name>
          .
          <article-title>Visual diversification of image search results</article-title>
          .
          <source>In Proceedings of the 18th international conference on World wide web</source>
          , pages
          <fpage>341</fpage>
          –
          <lpage>350</lpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Van Zwol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murdock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Garcia</given-names>
            <surname>Pueyo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Ramirez</surname>
          </string-name>
          .
          <article-title>Diversifying image search with user generated content</article-title>
          .
          <source>In Proceedings of the 1st ACM international conference on Multimedia information retrieval</source>
          , pages
          <fpage>67</fpage>
          –
          <lpage>74</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vandersmissen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Godin</surname>
          </string-name>
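          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>De Neve</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Van de Walle</surname>
          </string-name>
          .
          <article-title>Ghent University-iMinds at MediaEval 2014 diverse images: Adaptive clustering with deep features</article-title>
          .
          <source>In MediaEval 2014 Workshop</source>
          ,
          <year>2014</year>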
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lapedriza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          .
          <article-title>Learning deep features for scene recognition using places database</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>487</fpage>
          –
          <lpage>495</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          .
          <article-title>Exploring the Similarity Space</article-title>
          .
          <source>ACM SIGIR Forum</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>18</fpage>
          –
          <lpage>34</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>