<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TUW @ MediaEval 2015 Retrieving Diverse Social Images Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author"><string-name>Serwah Sabetghadam</string-name></contrib>
        <contrib contrib-type="author"><string-name>João Palotti</string-name></contrib>
        <contrib contrib-type="author"><string-name>Navid Rekabsaz</string-name></contrib>
        <contrib contrib-type="author"><string-name>Mihai Lupu</string-name></contrib>
        <contrib contrib-type="author"><string-name>Allan Hanbury</string-name></contrib>
        <aff id="aff0">
          <institution>Vienna University of Technology</institution>
          , Favoritenstrasse 9-11/188, Vienna,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper describes the contributions of Vienna University of Technology (TUW) to the MediaEval 2015 Retrieving Diverse Social Images challenge. Our approach consists of three phases: (1) a precision-oriented phase, in which we focus only on the relevance of the documents; (2) a recall-oriented phase, in which we focus only on the diversity aspect; and (3) a merging phase, in which we explore ways to find a balance between the relevance and diversity factors. We use two fusion methods for this last part. Our best run reached an F1@20 of 0.582.</p>
      </abstract>
      <kwd-group>
        <kwd>Fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Result diversification has recently attracted much attention in the IR community. Often, the information need expressed by a user cannot be satisfied by displaying items related to only one facet of the query topic. Ideally, an IR system displays pieces of information covering diverse subtopics of the query. The same idea has been used in the Recommender Systems area, where diversification techniques have been shown to increase user satisfaction [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>
        This paper describes the second participation of our team in the MediaEval Retrieving Diverse Social Images task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We build our solution upon our previous participation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Last year, we obtained good results in both precision and recall, but in separate runs. Therefore, we decided to explore different strategies for better fusing our individual runs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. METHODS</title>
      <p>We leveraged a distinct set of methods for each run. Table 1 shows the combination of methods used in each run.</p>
      <p>
        Based on our experience from the previous year [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we use only textual features (i.e., title, tags, and description) for finding the relevant documents. We extend the usual term-frequency-based methods with a more semantics-based approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We create 400-dimensional word embeddings from the Wikipedia corpus using the Word2Vec method [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We calculate the similarity between the query and the text documents (the concatenation of title, tags, and description) using the SimGreedy method [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
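<p>To make the relevance step concrete, the following is a minimal sketch of a SimGreedy-style similarity. The tiny 3-dimensional embeddings are invented stand-ins for the 400-dimensional Wikipedia Word2Vec vectors, and the greedy one-to-one token matching is a simplification of the method in [8], not the authors' exact implementation.</p>

```python
import math

# Toy 3-d embeddings standing in for the 400-d Wikipedia Word2Vec
# vectors used in the paper; the vectors below are made up.
EMB = {
    "church":    [0.9, 0.1, 0.0],
    "cathedral": [0.8, 0.2, 0.1],
    "tower":     [0.2, 0.9, 0.1],
    "old":       [0.1, 0.2, 0.9],
    "historic":  [0.2, 0.1, 0.8],
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def sim_greedy_dir(src, dst):
    """Greedily match each source token to its best unused target token
    and average the match scores (one direction of SimGreedy)."""
    free = [t for t in dst if t in EMB]
    scores = []
    for t in (t for t in src if t in EMB):
        if not free:
            break
        best = max(free, key=lambda u: cos(EMB[t], EMB[u]))
        scores.append(cos(EMB[t], EMB[best]))
        free.remove(best)
    return sum(scores) / len(scores) if scores else 0.0

def sim_greedy(query, doc):
    # Symmetric SimGreedy: average of the two directions.
    return 0.5 * (sim_greedy_dir(query, doc) + sim_greedy_dir(doc, query))

q = ["old", "church"]
d = ["historic", "cathedral", "tower"]
print(round(sim_greedy(q, d), 3))
```

In the real system the document text is the concatenation of title, tags, and description, and the scores rank the Flickr result list by relevance.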
      <p>
        To find diverse images, we experiment with different clustering methods. From [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] we learned that an approach based on an ensemble of clusterings can perform better than a single clustering method. We also learned that a pre-filtering step can remove irrelevant images that would otherwise harm the clustering process. Here we briefly comment on these two aspects:
      </p>
      <p>
        Pre-Filtering: We use hand-coded rules previously shown
to perform well in this task, to exclude probably irrelevant
pictures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We exclude pictures based on three rules:
without any views, geo-tagged 8km away from the POI, or with
description length greater than 2000 characters.
      </p>
      <p>Clustering solution: The basic idea is that, given a clustering algorithm A, a feature set F that describes an image, and a distance measure Di, we can create a cluster set C = (A, F, Di). For example, C1 can be the result of applying K-Means (A) to the color histograms of the images (F), based on the cosine distance (Di): C1 = (KMeans, ColorHistogram, Cosine).</p>
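<p>A cluster set C = (A, F, Di) is naturally represented as a configuration tuple. The sketch below shows this representation; the trivial "clustering" function is a toy stand-in for a real library call, not the actual K-Means used in the runs.</p>

```python
from collections import namedtuple

# A cluster set C = (A, F, Di): algorithm, feature set, distance measure.
ClusterConfig = namedtuple("ClusterConfig", ["algorithm", "features", "distance"])

def run_config(cfg, feature_vectors):
    """Toy 'clustering': bucket items by whether their first feature
    value lies above the mean. A real system would dispatch on
    cfg.algorithm / cfg.distance to an actual clustering implementation."""
    mean = sum(v[0] for v in feature_vectors) / len(feature_vectors)
    return [0 if v[0] <= mean else 1 for v in feature_vectors]

c1 = ClusterConfig("KMeans", "ColorHistogram", "Cosine")
vectors = [[0.1], [0.2], [0.8], [0.9]]  # pretend color-histogram features
labels = run_config(c1, vectors)
print(labels)  # → [0, 0, 1, 1]
```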
      <p>
        A common strategy used by a number of teams in 2013 was to go through the clusters in C1 one by one and pick the "best" image from each cluster to form the final ranked list. We noticed that small differences, for example using C2 = (KMeans, NeuralNetworkFeatures, Cosine) instead, could have a large impact on the clusters formed, consequently strongly influencing the final ranked list. As described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], our solution is to use the development set to learn which clustering algorithms, feature sets, and distance measures work best. After that, we combine the results of different Cs and count the frequency with which any two images end up in the same cluster. Based on this simple frequency, we rerank the initial Flickr list (Run 1) or the list generated by the relevance method described above (Run 3).
      </p>
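<p>The ensemble step above can be sketched as follows: count how often each image pair co-occurs in the same cluster across several cluster sets, then greedily rerank so that each picked image has low co-occurrence with those already picked. The toy labels and the greedy tie-breaking rule are illustrative assumptions, not the exact procedure of [7].</p>

```python
from collections import Counter
from itertools import combinations

def cooccurrence(cluster_sets):
    """Count, over all cluster sets, how often each image pair (i, j)
    lands in the same cluster."""
    pair_count = Counter()
    for labels in cluster_sets:
        for i, j in combinations(range(len(labels)), 2):
            if labels[i] == labels[j]:
                pair_count[(i, j)] += 1
    return pair_count

def rerank(initial_order, cluster_sets):
    """Greedy rerank: repeatedly pick the image with the lowest maximum
    co-occurrence count against everything picked so far; ties keep the
    initial (e.g. Flickr) order."""
    pairs = cooccurrence(cluster_sets)
    sim = lambda a, b: pairs[(min(a, b), max(a, b))]
    remaining = list(initial_order)
    ranked = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda x: (max(sim(x, y) for y in ranked),
                                            initial_order.index(x)))
        ranked.append(nxt)
        remaining.remove(nxt)
    return ranked

# Three cluster sets over four images; images 0 and 1 always co-occur,
# so image 1 is pushed to the end of the reranked list.
sets_ = [[0, 0, 1, 2], [0, 0, 1, 1], [0, 0, 2, 1]]
print(rerank([0, 1, 2, 3], sets_))  # → [0, 2, 3, 1]
```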
      <p>
        Atrey et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] surveyed methods for fusing multiple modalities. In their view, there are three categories of fusion methods: rule-based methods, classification-based methods, and estimation-based methods. Our approach to combining relevance and diversity results is inspired by these fusion methods. We leverage the weighted linear method from the first category and Bayesian inference from the second.
      </p>
      <p>[Table 2: per-run results, including the linear and Bayesian fusion runs, on the 2015 development and test sets.]</p>
      <p>
        Weighted Linear: We use the optimization technique proposed by Deselaers et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] based on weighted linear fusion. Having the relevance of each document to the query (R) and the diversification measure for each set of documents (D), we formulate diversification as an optimization problem in which one tries to maximize the linear combination of these two values.
      </p>
      <p>U(S|q) = w · R(S|q) + (1 − w) · D(S)   (1)
where U denotes the score of the selected set S with respect to the query q, and w is a parameter that controls the relative importance of relevance and diversity. The parameter w is tuned on the development set.</p>
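<p>Equation (1) is a one-liner in code. The sketch below picks the better of two candidate sets under the development-set weight w = 0.2 reported in our experiments; the R and D scores themselves are toy values.</p>

```python
# Sketch of Eq. (1): U(S|q) = w * R(S|q) + (1 - w) * D(S).
# w = 0.2 mirrors the weighting tuned on the development set;
# the candidate (R, D) pairs below are made up for illustration.
def utility(relevance, diversity, w=0.2):
    return w * relevance + (1 - w) * diversity

candidates = {"setA": (0.9, 0.3), "setB": (0.6, 0.7)}  # (R, D) per set
best = max(candidates, key=lambda s: utility(*candidates[s]))
print(best)  # setB: 0.2*0.6 + 0.8*0.7 = 0.68 beats setA's 0.42
```

Note how the low relevance weight lets the more diverse set win even against a much more relevant one.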
      <p>
        Bayesian Inference: In this method, the information is combined based on the rules of probability theory [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The probability of a diversification hypothesis H is:
      </p>
      <p>P(H|R, D) = 1/2 · P(D|H)^wd · P(R|H)^wr   (2)
where wd and wr are the weights given to the diversity and relevance results.</p>
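<p>Equation (2) can be sketched as a weighted product of the two likelihoods. The weights below (wd = 0.8, wr = 0.2) mirror the 0.2 R + 0.8 D weighting from our development tests; the probabilities are toy values, not outputs of the actual runs.</p>

```python
# Sketch of Eq. (2): P(H|R,D) = 1/2 * P(D|H)^wd * P(R|H)^wr.
# The constant 1/2 is the normalization from the equation; the input
# likelihoods below are illustrative, not the paper's values.
def bayes_fuse(p_d, p_r, wd=0.8, wr=0.2):
    return 0.5 * (p_d ** wd) * (p_r ** wr)

# Two candidate images: one strong on diversity, one strong on relevance.
scores = [bayes_fuse(d, r) for d, r in [(0.9, 0.4), (0.5, 0.9)]]
ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranked)  # the diversity-heavy weighting favors the first candidate
```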
    </sec>
    <sec id="sec-3">
      <title>3. EXPERIMENTS</title>
      <p>
        We submitted 5 runs, varying the use of relevance results, pre-filtering, clustering algorithms, and fusion methods. Details of the run configurations are shown in Table 1. Run 1 is based on pure diversity results using image features. Run 2 uses only text information, applying Word2Vec-based semantic similarity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In Run 3, the input to the diversity algorithm is the ranked result list of Run 2; in this run we leverage both text and image similarity when clustering the images. In Runs 4 and 5 we apply the two fusion methods, weighted linear and Bayesian inference, to the results of Run 1 (diversity) and Run 2 (relevance).
      </p>
      <p>Based on our development tests, we expected Runs 1 and 5 to achieve the best results according to the F1 measure (Table 2). However, on the test set, Run 3 obtains the best values, with an F1@10 of 0.43 and an F1@20 of 0.57. One reason could be the multi-concept queries in the test set. This shows that using the semantic text similarity results (Run 2) as input to the clustering algorithms (Run 3) improved the F1 measure by 4%. We obtain the best precision (0.82) with Run 2, which is purely based on text similarity.</p>
      <p>
        In this year's experiments, we added two runs based on the fusion of relevance and diversity results. In the development tests we reached the optimum weighting of 0.2·R + 0.8·D for both methods. Although the Bayesian inference approach obtained better results in the development tests, on the test data weighted linear fusion ranks second on the F1@20 measure. This confirms the score-combination approach used by Deselaers et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, Bayesian inference is usually applied to classification results, which may explain why in our case the linear combination performed better on the test data.
      </p>
      <p>In Table 3 we show separate results for single-concept and multi-concept topics. We observe the same ordering of results here: Run 3 keeps the best F1@20 value and Run 2 the highest P@20.</p>
    </sec>
    <sec id="sec-4">
      <title>4. CONCLUSION</title>
      <p>Our experiments show that the cluster ensemble fed with relevance results (Run 3) provides robust results for this task. The input to this run was our relevance ranking based on semantic text similarity, which demonstrates that combining text similarity with the diversity approach leads to a higher F1@20 value. This year we added two fusion methods, weighted linear and Bayesian inference. Their results were indistinguishable on the development set, but weighted linear fusion outperformed the Bayesian approach on the test set.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Atrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Saddik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kankanhalli</surname>
          </string-name>
          .
          <article-title>Multimodal fusion for multimedia analysis: a survey</article-title>
          .
          <source>Multimedia systems</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dreuw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
          <article-title>Jointly optimising relevance and diversity in image retrieval</article-title>
          .
          <source>In Proceedings of the ACM international conference on image and video retrieval</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ginsca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and H. Muller.
          <source>Retrieving Diverse Social Images at MediaEval</source>
          <year>2015</year>
          :
          <article-title>Challenge, Dataset and</article-title>
          <string-name>
            <surname>Evaluation.</surname>
          </string-name>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Preston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          .
          <article-title>Experiments in diversifying flickr result sets</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Yih</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <article-title>Multisensor fusion and integration: approaches, applications, and future research directions</article-title>
          .
          <source>Sensors Journal, IEEE</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>TUW @ retrieving diverse social images task 2014</article-title>
          . In MediaEval,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bierig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          .
          <article-title>On the use of statistical semantics for metadata-based social image retrieval</article-title>
          .
          <source>In CBMI</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Vee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shanmugasundaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhat</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Yahia</surname>
          </string-name>
          .
          <article-title>Efficient computation of diverse query results</article-title>
          .
          <source>In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.-N.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          .
          <article-title>Improving recommendation lists through topic diversification</article-title>
          .
          <source>In Proceedings of the 14th international conference on World Wide Web</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>