<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Testing a Method for Statistical Image Classification in Image Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoph Rasche</string-name>
          <email>rasche15@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitatea Politehnica din Bucuresti, Bucuresti 061071, RO</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We continued to test our image classification methodology in the photo-annotation task of the ImageCLEF competition [Nowak et al., 2011] using a visual-only approach performing automated labeling, however with little algorithmic improvement compared to last year. Our labeling process consisted of three phases: 1) feature extraction using color and structural description; 2) classification using a Linear Discriminant (LD), which provided the (scalar) confidence values; 3) postprocessing by eliminating labels (setting binary values to 0) on the testing set, thereby exploiting the joint probabilities calculated for pairs of concepts from the training set. Our conclusions remain the same as last year: our approach provides reasonable, fast image classification. Since our approach this year is essentially the same as last year, we participated mainly to fine-tune some minor algorithmic details. We employ a massive structural-descriptor extraction process, which is based on partitioning contour segments and then relating the segments to form more complex descriptors. Although this structural description is very promising in essence, we have used it only in a statistical manner so far (because we have not yet developed the appropriate learning algorithm for structural classification). Even though our statistical classification performance ranks only mid-range, it has some advantages over the other, more successful classification approaches. The majority of the other (last year's) approaches use SIFT features for representation, and those features appear to be useful for texture identification (see also [Torralba et al., 2008] for arguments); matching is done feature-by-feature and is relatively costly. In our approach, in contrast, matching is done between image vectors and is thus reasonably fast; but our approach is costly at the feature-extraction stage.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The main novelty of the presented approach is the use of a decomposition
of structure as introduced in Rasche [Rasche, 2010]. The decomposition output
is particularly suited to represent the geometry of contours and the geometry
of their relations (pairs or clusters of contours), but it is applied here only in
a statistical form for reasons of simplicity, together with a color histogramming
approach as described in Vertan et al. [Vertan and Boujemaa, 2000b]. This
statistical classification has already been shown to be useful for video indexing
[Ionescu et al., 2010].</p>
      <p>Looking at the provided photo annotations we realized that the spatial size
of the annotated object or scene can vary substantially in reference to the image
size: an annotation can describe either the image content as a whole and is thus
suitable for (semantic) image classification, or it can describe a part of a scene
(e.g. isolated objects) and is thus better suited for object-detection systems. A
clear distinction between whole and part annotation is of course difficult. A typical
recognition system is specialized for one process, either image classification or
object detection. Our methodology is geared toward image classification and
is therefore of limited use for 'part' annotations.</p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <sec id="sec-2-1">
        <title>Feature Extraction</title>
        <p>Color and Texture Characterization. The classical histogram approach to image content
description was refined by classifying the image pixels into several classes
according to a local attribute (such as edge strength). One can easily imagine a
classification into three classes, consisting of pixels characterized by small,
medium and high edge strength. The number of classes is thus related to the number
of quantization levels of the pixel attribute. At the limit, since every pixel has
acquired a supplementary, highly relevant characteristic, one could imagine a
one-pixel-per-class approach, which would certainly provide a very accurate
description of the image but would require a very large representation.</p>
        <p>In order to keep the balance between the histogram size and the
discrimination between pixels, we propose to adaptively weight the contribution of each
pixel of the image to the color distribution [Vertan and Boujemaa, 2000b]. This
individual weighting allows a finer distinction between pixels having the same
color and the construction of a weighted histogram that accounts for both the color
distribution and statistical non-uniformity measures. Thus, we used a modified
histogram as explained in [Vertan and Boujemaa, 2000b, Vertan and Boujemaa, 2000a].
In short: colors are uniformly quantized with 6 bins per RGB color component,
yielding a 216-component feature vector per image.</p>
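        <p>As a minimal illustrative sketch (not the authors' implementation), the weighted histogram can be built by uniformly quantizing each RGB channel into 6 bins and accumulating a per-pixel weight instead of a raw count; the weight array here is a generic stand-in for the local-attribute weighting of [Vertan and Boujemaa, 2000b]:</p>

```python
import numpy as np

def weighted_color_histogram(rgb, weights=None, bins_per_channel=6):
    """Uniformly quantize 8-bit RGB colors (6 bins per channel gives a
    216-bin histogram) and accumulate per-pixel weights instead of raw
    counts. Uniform weights reduce to a plain color histogram."""
    h, w, _ = rgb.shape
    if weights is None:
        weights = np.ones((h, w))
    # Map each 8-bit channel value to a bin index in 0..bins_per_channel-1.
    q = (rgb.astype(np.int64) * bins_per_channel) // 256
    # Combine the three channel indices into a single bin index 0..215.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), weights=weights.ravel(),
                       minlength=bins_per_channel ** 3)
    return hist / hist.sum()  # normalize so images of different sizes compare
```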
        <p>
          Structure Characterization. Images were downsampled to a maximum side length
(width or height) of 300 pixels to decrease computation time.
The structural processing started with contour extraction
          <xref ref-type="bibr" rid="ref1">[Canny, 1986]</xref>
          at 4
different scales (sigma = 1, 2, 3 and 5). Contours were then partitioned and
represented as described in
          <xref ref-type="bibr" rid="ref4">[Rasche, 2010]</xref>
          , leading to 7 geometric and 5 appearance
parameters for each contour segment (arc, 'wiggliness', curvature, circularity,
edginess, symmetry, contrast, 'fuzziness'). Contour segments are then paired and
clustered, leading to another 58 parameters describing various distance
measurements (between segment end and center points) and structural biases (degree
of parallelism, T feature, L feature, ...); see [Rasche, 2010] for details. For each
parameter a 10-bin histogram is generated; the histograms are then
concatenated to form a single vector of 700 dimensions. The average processing time for
structural processing is ca. 40 seconds per image on a 2.6 GHz machine.</p>
        <p>Integration of Color and Structure. The color and structural parameters
are then concatenated into a single image vector with ca. 916 dimensions (ca. 700
structural and 216 color parameters).
        </p>
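        <p>The assembly of the final image vector can be sketched as follows; the per-segment parameter matrix and the assumption that each parameter is scaled to [0, 1] are illustrative stand-ins for the actual descriptor output of [Rasche, 2010]:</p>

```python
import numpy as np

def image_vector(segment_params, color_hist, n_bins=10):
    """Build the image descriptor: one 10-bin histogram per structural
    parameter, concatenated with the 216-bin color histogram.
    segment_params: (n_segments, n_params) array of per-segment parameter
    values, each assumed scaled to [0, 1] for illustration."""
    hists = []
    for p in segment_params.T:  # one histogram per parameter
        h, _ = np.histogram(p, bins=n_bins, range=(0.0, 1.0))
        hists.append(h / max(h.sum(), 1))  # normalize; guard empty images
    return np.concatenate(hists + [color_hist])
```

        <p>With the 70 structural parameters mentioned above (12 per segment plus 58 relational), this yields 70 x 10 + 216 = 916 dimensions, matching the stated vector size.</p>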
      </sec>
      <sec id="sec-2-2">
        <title>Classification</title>
        <p>LDA: A Linear Discriminant Analysis was applied to train a one-versus-all
classifier for each of the 93 concepts (on the 8000 training images). This
resulted in an average of 19.3 labels per photo (average number of labels
per training image: 11.9). The posterior values of the classifier are provided as
confidence values.</p>
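        <p>A minimal sketch of this one-versus-all scheme, using scikit-learn's LDA as a stand-in for the authors' classifier (the data shapes are illustrative):</p>

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_one_vs_all_lda(X, Y):
    """Train one binary LDA classifier per concept (one-versus-all).
    X: (n_images, n_dims) image vectors; Y: (n_images, n_concepts)
    binary label matrix. Returns the list of fitted classifiers."""
    return [LinearDiscriminantAnalysis().fit(X, Y[:, c])
            for c in range(Y.shape[1])]

def confidences(classifiers, X):
    """Posterior probability of the positive class for every concept,
    used directly as the confidence values."""
    return np.column_stack([clf.predict_proba(X)[:, 1]
                            for clf in classifiers])
```

        <p>Binary labels are then obtained by thresholding these posteriors (e.g. at 0.5).</p>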
        <p>Last year we also used a weighted average retrieval rank (ARR), but unfortunately
skipped it this year for lack of time, although it was the better-performing
classifier in our runs last year.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Postprocessing - Label Elimination</title>
        <p>Because the LDA method (see above) returned a much larger proportion of
labels for the testing set (19.3 labels/image) than for the training set (11.9), we
attempted to reduce the number of labels by eliminating unlikely labels based on
the joint probabilities observed in the training set. Within the training set, we
determined which pairs of concepts appeared as mutually exclusive (joint probability
equal to 0). If a testing image contained a pair of labels that are mutually
exclusive in the training set, then the label of the pair with the lower posterior
value (obtained from the LDA classifier, in reference to the entire distribution of
posterior values for its concept) was eliminated.</p>
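        <p>The elimination step can be sketched as below. For simplicity this sketch compares raw posteriors directly, which simplifies the comparison in reference to each concept's posterior distribution described above:</p>

```python
import numpy as np

def mutually_exclusive_pairs(Y_train):
    """Concept pairs that never co-occur in the training labels
    (joint probability equal to 0). Y_train: binary label matrix."""
    co = Y_train.T @ Y_train            # co-occurrence counts
    excl = (co == 0)
    np.fill_diagonal(excl, False)
    return excl

def eliminate_labels(labels, posteriors, excl):
    """For each test image, if two predicted labels are mutually
    exclusive in the training set, drop the one with the lower
    posterior. `labels` is modified in place and returned."""
    for img in range(labels.shape[0]):
        on = np.flatnonzero(labels[img])
        for i in on:
            for j in on:
                if i < j and excl[i, j] and labels[img, i] and labels[img, j]:
                    drop = i if posteriors[img, i] < posteriors[img, j] else j
                    labels[img, drop] = 0
    return labels
```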
      </sec>
      <sec id="sec-2-4">
        <title>Runs</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Both of our submitted runs had exactly the same preprocessing. The
classification differed only in its postprocessing: one run had no postprocessing, the other
run was with postprocessing (the label elimination explained above).
Surprisingly, the postprocessing did not produce different results (or are the
performance measures based on the [provided] scalar values only? That would
explain the lack of difference).</p>
      <p>Overall, our performance relative to the other approaches was worse than
last year, most likely because we did not exploit our other, better-performing
classifier (based on ranking).</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>Although we applied our structural decomposition in a statistical manner only
and on down-scaled image resolutions, it already achieved a performance
comparable to other approaches. In the future, however, we should rather participate in
the competition with a 'part-based' approach.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Canny, 1986]
          <string-name><surname>Canny</surname>, <given-names>J.</given-names></string-name>
          (<year>1986</year>).
          <article-title>A computational approach to edge detection</article-title>.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>,
          <volume>8</volume>(<issue>6</issue>):<fpage>679</fpage>-<lpage>698</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Ionescu et al., 2010]
          <string-name><surname>Ionescu</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Rasche</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Vertan</surname>, <given-names>C.</given-names></string-name>, and
          <string-name><surname>Lambert</surname>, <given-names>P.</given-names></string-name>
          (<year>2010</year>).
          <article-title>A contour-color-action approach to automatic classification of several common video genres</article-title>.
          In <source>AMR 8th International Workshop on Adaptive Multimedia Retrieval</source>, Linz, Austria.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Nowak et al., 2011]
          <string-name><surname>Nowak</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Nagel</surname>, <given-names>K.</given-names></string-name>, and
          <string-name><surname>Liebetrau</surname>, <given-names>J.</given-names></string-name>
          (<year>2011</year>).
          <article-title>The CLEF 2011 photo annotation and concept-based retrieval tasks</article-title>.
          In <source>CLEF 2011 Working Notes</source>, Amsterdam, NL.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Rasche, 2010]
          <string-name><surname>Rasche</surname>, <given-names>C.</given-names></string-name>
          (<year>2010</year>).
          <article-title>An approach to the parameterization of structure for fast categorization</article-title>.
          <source>International Journal of Computer Vision</source>,
          <volume>87</volume>:<fpage>337</fpage>-<lpage>356</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Torralba et al., 2008]
          <string-name><surname>Torralba</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Fergus</surname>, <given-names>R.</given-names></string-name>, and
          <string-name><surname>Freeman</surname>, <given-names>W. T.</given-names></string-name>
          (<year>2008</year>).
          <article-title>80 million tiny images: a large dataset for non-parametric object and scene recognition</article-title>.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>,
          <volume>30</volume>(<issue>11</issue>):<fpage>1958</fpage>-<lpage>1970</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Vertan and Boujemaa, 2000a]
          <string-name><surname>Vertan</surname>, <given-names>C.</given-names></string-name> and
          <string-name><surname>Boujemaa</surname>, <given-names>N.</given-names></string-name>
          (<year>2000a</year>).
          <article-title>Spatially constrained color distributions for image indexing</article-title>.
          In <source>Proc. of CGIP 2000</source>, pages <fpage>261</fpage>-<lpage>265</lpage>, Saint-Etienne, France.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Vertan and Boujemaa, 2000b]
          <string-name><surname>Vertan</surname>, <given-names>C.</given-names></string-name> and
          <string-name><surname>Boujemaa</surname>, <given-names>N.</given-names></string-name>
          (<year>2000b</year>).
          <article-title>Upgrading color distributions for image retrieval: Can we do better?</article-title>
          In Laurini, R., editor, <source>Advances in Visual Information Systems</source>, volume <volume>1929</volume> of Lecture Notes in Computer Science, pages <fpage>178</fpage>-<lpage>188</lpage>. Springer Verlag, Berlin, Germany.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>