<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The University of Amsterdam's Concept Detection System at ImageCLEF 2010</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Koen E. A. van de Sande</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Theo Gevers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Intelligent Systems Lab Amsterdam (ISLA), University of Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Our group within the University of Amsterdam participated in the large-scale visual concept detection task of ImageCLEF 2010. The submissions from our visual concept detection system have resulted in the best visual-only run in the per-concept evaluation. In the per-image evaluation, it achieves the highest score in terms of example-based F-measure across all types of runs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Our group within the University of Amsterdam participated in the large-scale
visual concept detection task of ImageCLEF 2010. The Large-Scale Visual
Concept Detection Task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] evaluates visual concept detectors. The concepts used
are from the personal photo album domain: beach holidays, snow, plants,
indoor, mountains, still-life, small group of people, portrait. For more information
on the dataset and concepts used, see the overview paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Our participation
last year, in ImageCLEF 2009, focused on increasing the robustness of the
individual concept detectors based on the bag-of-words approach, and less on
the per-image evaluation.
      </p>
      <p>Last year's experiments [<xref ref-type="bibr" rid="ref3">3</xref>–<xref ref-type="bibr" rid="ref8">8</xref>] emphasize in particular the role of visual
sampling, the value of color invariant features, the influence of codebook
construction, and the effectiveness of kernel-based learning parameters. This was
successful, resulting in the top ranking for the large-scale visual concept detection
task in terms of both EER and AUC. Both of these measures perform a per-concept
evaluation. The per-image evaluation based on the ontology score suggested that
the assignment of concept tags to images leaves room for improvement.
Therefore, for this year, we focus on the per-image evaluation. The primary evaluation
metric used in 2010 for the per-image evaluation is the average example-based
F-measure.</p>
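      <p>For concreteness, the following is a minimal sketch of the average example-based F-measure on hypothetical tag sets; it illustrates the metric only and is not the official evaluation code.</p>
      <preformat>
def example_based_f_measure(predicted, ground_truth):
    """Average per-image F-measure: precision and recall are computed per
    image over its predicted and true tag sets, combined into an F-measure,
    then averaged over all images."""
    scores = []
    for pred, truth in zip(predicted, ground_truth):
        overlap = len(pred.intersection(truth))
        precision = overlap / len(pred) if pred else 0.0
        recall = overlap / len(truth) if truth else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        scores.append(f)
    return sum(scores) / len(scores)

# Hypothetical annotations for two images:
predicted = [{"Indoor", "Portrait"}, {"Beach", "Plants"}]
ground_truth = [{"Indoor"}, {"Beach", "Snow", "Plants"}]
print(example_based_f_measure(predicted, ground_truth))  # 0.733...
      </preformat>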
    </sec>
    <sec id="sec-2">
      <title>Concept Detection System</title>
      <p>
        Our concept detection system is an improved version of last year's system [<xref ref-type="bibr" rid="ref6">6</xref>].
For the ImageCLEF book [<xref ref-type="bibr" rid="ref1">1</xref>], we have performed additional experiments [<xref ref-type="bibr" rid="ref5">5</xref>]
which give insight into the effect of different sampling methods, color descriptors
and spatial pyramid levels within the bag-of-words model. Two of our runs this
year correspond exactly to the runs "Harris-Laplace and dense sampling every 6 pixels
(multi-scale) with 4-SIFT" and "Harris-Laplace and dense sampling every pixel
(single-scale) with 4-SIFT" from this book chapter [<xref ref-type="bibr" rid="ref5">5</xref>]. These runs were also
submitted to ImageCLEF@ICPR 2010. Please refer to the cited papers (available from
http://www.colordescriptors.com) for implementation details of the system.
      </p>
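      <p>To make the representation concrete, the following is a minimal sketch of a bag-of-words histogram with hard assignment to a precomputed codebook; the array sizes and the quantize_to_histogram helper are illustrative assumptions, not our actual implementation.</p>
      <preformat>
import numpy as np

def quantize_to_histogram(descriptors, codebook):
    """Hard-assign each descriptor to its nearest visual word and return
    the L1-normalized word-count histogram for one image."""
    # Squared Euclidean distances via ||a-b||^2 = ||a||^2 - 2ab + ||b||^2.
    d2 = ((descriptors**2).sum(axis=1)[:, None]
          - 2.0 * descriptors @ codebook.T
          + (codebook**2).sum(axis=1)[None, :])
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4000, 128))     # e.g. 4000 visual words, 128-dim SIFT
descriptors = rng.normal(size=(1200, 128))  # densely sampled descriptors of one image
bow = quantize_to_histogram(descriptors, codebook)  # feature vector fed to the SVM
      </preformat>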
      <p>To achieve better results in the per-image evaluation, where we need to
perform a binary assignment of a tag to an image, we have modified the probabilistic
output of the SVM. We have disabled Platt's method for converting scores to
probabilities, and instead use the distance to the decision boundary. The decision boundary
lies at 0; positives are trained to lie at 1 and negatives are trained to lie at
-1. In a cross-validation experiment, we have found a threshold of -0.3 to be
good for most concepts: the default threshold of 0 would be too conservative
when evaluating with an example-based F-measure where precision and recall
are weighted equally. Optimizing the threshold on a per-concept basis instead
of using a single threshold was found to be less stable.</p>
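      <p>This tag-assignment rule can be sketched as follows; scikit-learn and the toy data are used purely for illustration, while the -0.3 threshold is the cross-validated value reported above.</p>
      <preformat>
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 10))
y_train = (X_train[:, 0] > 0).astype(int)  # toy per-concept labels
X_test = rng.normal(size=(5, 10))

# Train without Platt calibration (no probability=True) and use the raw
# signed distance to the decision boundary instead.
clf = SVC(kernel="rbf", C=10).fit(X_train, y_train)
scores = clf.decision_function(X_test)
assign_tag = scores > -0.3  # more liberal than the default threshold of 0
      </preformat>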
    </sec>
    <sec id="sec-3">
      <title>Submitted Runs</title>
      <p>We have submitted five different runs, listed below. All runs use both Harris-Laplace and
dense sampling with the SVM classifier. We do not use the EXIF metadata
provided for the photos, nor the provided text tags.</p>
      <list list-type="bullet">
        <list-item>
          <p><bold>Harris-Laplace and dense sampling every 6 pixels (multi-scale) with 4-SIFT</bold>: from ImageCLEF 2009 [<xref ref-type="bibr" rid="ref6">6</xref>], ImageCLEF@ICPR 2010 and the ImageCLEF book [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
        </list-item>
        <list-item>
          <p><bold>Harris-Laplace and dense sampling every pixel (single-scale) with 4-SIFT</bold>: from ImageCLEF@ICPR 2010 and the ImageCLEF book [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
        </list-item>
        <list-item>
          <p><bold>Harris-Laplace and dense sampling every 6 pixels (multi-scale) with 4-SIFT plus soft assignment and multiple kernel learning</bold>: attempts to optimize the soft assignment parameters from [<xref ref-type="bibr" rid="ref9">9</xref>] with multiple kernel learning; a sketch of soft assignment follows this list.</p>
        </list-item>
        <list-item>
          <p><bold>mkl-bothdenseallharris-4sift-plus</bold>: includes improved color descriptors which have not yet been published.</p>
        </list-item>
        <list-item>
          <p><bold>mkl-mixed-mixed</bold>: includes improved color descriptors which have not yet been published.</p>
        </list-item>
      </list>
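      <p>For reference, the following is a minimal sketch of soft assignment (a kernel codebook in the spirit of [<xref ref-type="bibr" rid="ref9">9</xref>]), where every descriptor votes for all visual words with a Gaussian weight; the smoothing parameter sigma stands for the kind of parameter the MKL run attempts to optimize, and the code is illustrative rather than our actual implementation.</p>
      <preformat>
import numpy as np

def soft_assign_histogram(descriptors, codebook, sigma=90.0):
    """Kernel-codebook histogram: in contrast to hard assignment, each
    descriptor contributes to every visual word with a Gaussian weight."""
    d2 = ((descriptors**2).sum(axis=1)[:, None]
          - 2.0 * descriptors @ codebook.T
          + (codebook**2).sum(axis=1)[None, :])
    weights = np.exp(-d2 / (2.0 * sigma**2))       # Gaussian kernel on distance
    weights /= weights.sum(axis=1, keepdims=True)  # each descriptor sums to 1
    hist = weights.sum(axis=0)
    return hist / hist.sum()
      </preformat>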
    </sec>
    <sec id="sec-4">
      <title>Evaluation Per Concept</title>
      <p>In Table 1, the overall scores for the evaluation of concept detectors are shown.
The features with sampling at every pixel instead of every 6 pixels perform
better (0.4026 versus 0.3963), which is similar to the result obtained at
ImageCLEF@ICPR 2010. Optimizing the parameters of soft assignment using
multiple kernel learning did not have the desired effect. A possible
explanation is that the slack parameter for MKL was set to 1, whereas the normal
SVM runs optimize this parameter and tend to select 10 as a good slack
setting. The two final runs perform better than the two 'baseline' runs from
ImageCLEF@ICPR 2010. However, the color descriptors present in these two
runs have not yet been documented.</p>
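      <p>To illustrate the role of the slack parameter, the sketch below cross-validates C in the way the normal SVM runs do; scikit-learn, the toy data and the candidate grid are illustrative assumptions.</p>
      <preformat>
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
y = (X[:, :2].sum(axis=1) > 0).astype(int)  # toy labels

# Cross-validate the slack parameter C; the MKL runs kept C fixed at 1,
# whereas a search like this tends to prefer larger values such as 10.
search = GridSearchCV(SVC(kernel="rbf"),
                      param_grid={"C": [0.1, 1, 10, 100]},
                      cv=5, scoring="average_precision")
search.fit(X, y)
print(search.best_params_)
      </preformat>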
      <p>Compared to other ImageCLEF participants, our runs are the best
visual-only submissions. However, combinations of text and visual methods do achieve
a higher overall AP. For concepts like Birthday and Party, an attached tag
with the words party or birthday implies the presence of that concept, whereas the
visual presence might be more ambiguous. The best visual+text method scores
0.4553, compared to 0.4073 for our best visual-only run and 0.2338 for the best
text-only run.</p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation Per Image</title>
      <p>For the per-image evaluation, overall results are shown in Table 2. Our emphasis
on optimizing the threshold for tag assignment has resulted in the best overall
run in terms of example-based F-measure, i.e., this visual-only run outperforms
the visual+text methods.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>The submissions from our visual concept detection system in the ImageCLEF
2010 large-scale visual concept detection task have resulted in the best
visual-only run in the per-concept evaluation. In the per-image evaluation, it achieves
the highest score in terms of example-based F-measure across all types of runs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Caputo</surname>
          </string-name>
          . ImageCLEF, volume
          <volume>32</volume>
          of Lecture Notes in Computer Science: The Information Retrieval Series. Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nowak</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Huiskes</surname>
          </string-name>
          .
          <article-title>New strategies for image annotation: Overview of the photo annotation task at ImageCLEF 2010</article-title>.
          <source>In Working Notes of CLEF 2010</source>, <year>2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>C. G. M.</given-names> <surname>Snoek</surname></string-name>,
          <string-name><given-names>K. E. A.</given-names> <surname>van de Sande</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>de Rooij</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Huurnink</surname></string-name>,
          <string-name><given-names>J. R. R.</given-names> <surname>Uijlings</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>van Liempt</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bugalho</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Trancoso</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Yan</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Tahir</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Mikolajczyk</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kittler</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>de Rijke</surname></string-name>,
          <string-name><given-names>J. M.</given-names> <surname>Geusebroek</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Gevers</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Worring</surname></string-name>,
          <string-name><given-names>D. C.</given-names> <surname>Koelma</surname></string-name>, and
          <string-name><given-names>A. W. M.</given-names> <surname>Smeulders</surname></string-name>.
          <article-title>The MediaMill TRECVID 2009 semantic video search engine</article-title>.
          <source>In Proceedings of the TRECVID Workshop</source>, <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. R. R.</given-names>
            <surname>Uijlings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W. M.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J. H.</given-names>
            <surname>Scha</surname>
          </string-name>
          .
          <article-title>Real-time bag of words, approximately</article-title>
          .
          <source>In ACM International Conference on Image and Video Retrieval</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>K. E. A.</given-names> <surname>van de Sande</surname></string-name> and
          <string-name><given-names>T.</given-names> <surname>Gevers</surname></string-name>.
          University of Amsterdam at the Visual Concept Detection and Annotation Tasks, chapter 18, pages
          <fpage>343</fpage>–<lpage>358</lpage>. Volume <volume>32</volume> of The Information Retrieval Series: ImageCLEF [1], <year>2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>K. E. A.</given-names> <surname>van de Sande</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Gevers</surname></string-name>, and
          <string-name><given-names>A. W. M.</given-names> <surname>Smeulders</surname></string-name>.
          <article-title>The University of Amsterdam's concept detection system at ImageCLEF 2009</article-title>.
          <source>In Multilingual Information Access Evaluation Vol. II Multimedia Experiments: Proceedings of the 10th Workshop of the Cross-Language Evaluation Forum (CLEF 2009), Revised Selected Papers, Lecture Notes in Computer Science</source>. Springer, <year>2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>K. E. A.</given-names> <surname>van de Sande</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Gevers</surname></string-name>, and
          <string-name><given-names>C. G. M.</given-names> <surname>Snoek</surname></string-name>.
          <article-title>Color descriptors for object category recognition</article-title>.
          <source>In European Conference on Color in Graphics, Imaging and Vision</source>, pages <fpage>378</fpage>–<lpage>381</lpage>, <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>K. E. A.</given-names> <surname>van de Sande</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Gevers</surname></string-name>, and
          <string-name><given-names>C. G. M.</given-names> <surname>Snoek</surname></string-name>.
          <article-title>Evaluating color descriptors for object and scene recognition</article-title>.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>32</volume>(<issue>9</issue>):<fpage>1582</fpage>–<lpage>1596</lpage>, <year>2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>J. C.</given-names> <surname>van Gemert</surname></string-name>,
          <string-name><given-names>C. J.</given-names> <surname>Veenman</surname></string-name>,
          <string-name><given-names>A. W. M.</given-names> <surname>Smeulders</surname></string-name>, and
          <string-name><given-names>J.-M.</given-names> <surname>Geusebroek</surname></string-name>.
          <article-title>Visual word ambiguity</article-title>.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>32</volume>(<issue>7</issue>):<fpage>1271</fpage>–<lpage>1283</lpage>, <year>2010</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>