Brainsignals Submission to Plant Identification Task at
                  ImageCLEF 2012

                                        Cristian Grozea

                               Fraunhofer FOKUS, Berlin, Germany
                                 cristian.grozea@brainsignals.de


         Abstract. This article describes our first participation in the plant identification
         task of ImageCLEF. We deliberately approached it with as few computer-vision
         techniques as possible (in particular, without costly image preprocessing and
         feature extraction) and relied as much as possible on machine learning techniques
         (random forests). Our method is purely visual and entirely automatic, using only
         the image information. The total time spent preparing this submission was only
         about one week. The results were accordingly fairly poor, but probably usable for
         mobile applications, as the whole process of preprocessing and classification is
         very fast. We propose a slight modification to the Otsu algorithm that makes it
         better suited to the scan-like images.


1     Introduction
Our intention in participating in the plant identification task of ImageCLEF was to
test our hypothesis that good machine learning techniques paired with rudimentary
computer vision preprocessing could lead to fair or good results.


2     Methods
2.1    Preprocessing
The images are converted to grayscale, resized to 30% and then binarized with a slightly
modified Otsu algorithm. Our algorithm gives unequal weights to the variances of the
two classes when looking for the best split in the gray-value space. We used this to
tune the split towards being more inclusive of the leaf-pixel class, which we found
empirically to work better than the standard Otsu algorithm on the scan-like images.
This choice seems to have paid off: in the final competition results, the performance
of our algorithm degrades less from the scan images to the scan-like images than that
of the algorithms of some other teams, which has the effect of pushing it to higher
ranks.
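
The weighted split can be sketched as follows. The text above does not give the exact weighting formula, so this is only one possible reading: the threshold minimizes a weighted sum of the two classes' squared deviations, and the parameter name `leaf_weight`, the helper `binarize`, and the assumption that leaf pixels are darker than the background are all illustrative choices, not the submitted implementation. With `leaf_weight=0.5` the search reduces to the standard Otsu criterion.

```python
def weighted_otsu_threshold(gray_pixels, leaf_weight=0.5):
    """Return a threshold in 1..255 minimizing a weighted within-class spread.

    gray_pixels: iterable of ints in 0..255.
    leaf_weight: weight of the dark (assumed leaf) class term; the bright
    class gets 1 - leaf_weight. 0.5 gives the same split as standard Otsu.
    """
    hist = [0] * 256
    for value in gray_pixels:
        hist[value] += 1
    total = sum(hist)

    best_threshold, best_cost = 0, float("inf")
    for t in range(1, 256):  # dark class: values < t, bright class: values >= t
        n_dark = sum(hist[:t])
        n_bright = total - n_dark
        if n_dark == 0 or n_bright == 0:
            continue  # one class is empty, so this is not a real split
        mean_dark = sum(v * hist[v] for v in range(t)) / n_dark
        mean_bright = sum(v * hist[v] for v in range(t, 256)) / n_bright
        # Sums of squared deviations (class size times class variance).
        spread_dark = sum(hist[v] * (v - mean_dark) ** 2 for v in range(t))
        spread_bright = sum(hist[v] * (v - mean_bright) ** 2 for v in range(t, 256))
        cost = leaf_weight * spread_dark + (1.0 - leaf_weight) * spread_bright
        if cost < best_cost:
            best_threshold, best_cost = t, cost
    return best_threshold


def binarize(gray_image, threshold):
    """Map a list of pixel rows to 0/1, with 1 marking assumed leaf (dark) pixels."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]
```

Under these assumptions, lowering `leaf_weight` below 0.5 makes spread inside the dark class cheaper, so the chosen threshold never moves towards darker values and the leaf class becomes more inclusive.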

Features Two types of features are extracted (for a total of 344 features):
    – Local features: histograms of content types of small rectangular patches of bits,
      with and without sub-sampling.
    – Global shape features from lateral projections: histograms of the number of b/w
      alternations on every pixel line/column.
The features we extract are designed to be extremely quick to compute and still give a
good description of the content of the binarized image.
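
A minimal sketch of the global shape features, assuming the binarized image is a list of rows of 0/1 values. The function names, the number of bins, the clipping of large counts into the last bin, and the normalization are assumptions made for illustration; the text only specifies histograms of the number of b/w alternations per line and per column.

```python
def alternation_counts(binary_image):
    """Count 0/1 alternations along every pixel row and every pixel column."""
    rows = [sum(a != b for a, b in zip(row, row[1:])) for row in binary_image]
    columns = [sum(a != b for a, b in zip(col, col[1:])) for col in zip(*binary_image)]
    return rows, columns


def normalized_histogram(counts, n_bins):
    """Histogram of counts; values of n_bins - 1 or more share the last bin."""
    hist = [0] * n_bins
    for count in counts:
        hist[min(count, n_bins - 1)] += 1
    return [h / len(counts) for h in hist]
```

For example, a row `[0, 1, 1, 0]` contains two alternations, and the feature vector would concatenate the row histogram and the column histogram.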
     While the second class of features is self-explanatory, the first class perhaps needs
some more detail. Consider a small window, say a square of 2 by 2 pixels. As this
window is moved over the image, it covers successive patches of 2x2 pixels. The
content of each patch can be linearized as a 4-bit vector and has only 16 possible
types of content, which are the binary representations of the numbers from 0 to 15.
It is then natural and easy to compute the distribution (histogram) of those numbers.
The same can be repeated for other window sizes, say 2x3, 3x2 or 3x3, and
even generalized to binary masks where the bits collected are farther away from each
other, arranged or not in a regular-step grid. Attempting to capture local and non-local
topologies, we have used masks with distances between pixels of up to 30 pixels, but
always with few possible values for the selected content (sparse masks).
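
The patch histogram described above can be sketched as follows, assuming a binarized image stored as a list of rows of 0/1 values and a mask given as a list of non-negative `(dy, dx)` offsets. The function name, the bit order, the normalization, and the two example masks are hypothetical; the actual masks used in the submission are not listed here.

```python
def patch_histogram(binary_image, offsets):
    """Distribution of the bit patterns read at `offsets` from every anchor pixel.

    offsets: list of (dy, dx) pairs with dy, dx >= 0; k offsets give 2**k bins.
    """
    height, width = len(binary_image), len(binary_image[0])
    max_dy = max(dy for dy, _ in offsets)
    max_dx = max(dx for _, dx in offsets)
    hist = [0] * (2 ** len(offsets))
    for y in range(height - max_dy):
        for x in range(width - max_dx):
            code = 0
            for dy, dx in offsets:
                # Linearize the selected bits into one integer, first offset first.
                code = (code << 1) | binary_image[y + dy][x + dx]
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Dense 2x2 window: 4 bits, 16 possible content types.
WINDOW_2X2 = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Hypothetical sparse mask: still 4 bits, but the pixels are 30 apart.
SPARSE_MASK_30 = [(0, 0), (0, 30), (30, 0), (30, 30)]
```

Because the number of bins grows as 2^k with the number k of selected pixels, keeping the masks sparse keeps each histogram short even when the pixels are far apart.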
     The discriminative power of those features, at least for some of the classes, can be
observed in Figure 1, which plots the data after dimensionality reduction by PCA.

2.2    Classification
A random forest classifier [1], with 100 trees and standard number of selected features
(1 + √m) is trained on the training data, where m is the total number of features. The
model obtained is then applied on the test data.
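
A hedged sketch of this setup. The text does not say which random forest implementation was used; scikit-learn is assumed here purely for illustration, and the function names, the fixed seed, and the truncation of 1 + √m to an integer are assumptions as well.

```python
import math


def features_per_split(n_features):
    """Features tried at each split: 1 + sqrt(m), truncated (an assumption)."""
    return min(n_features, int(1 + math.sqrt(n_features)))


def train_forest(X_train, y_train, n_trees=100, seed=0):
    """Fit a 100-tree forest on rows of feature values and their class labels."""
    # Imported here so that features_per_split works without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier

    n_features = len(X_train[0])
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=features_per_split(n_features),
        random_state=seed,
    )
    return forest.fit(X_train, y_train)
```

With the 344 features described earlier, `features_per_split(344)` evaluates to 19. The fitted model would then be applied to the test features with `forest.predict(X_test)`.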


3     Results
The 5-fold cross-validated classification accuracy on the training data set was 70%,
which is very good for so many classes (126). The performance could possibly be
increased further by using more trees in the random forest and/or by weighting or
balancing the samples in a way similar to the scoring function.
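
One simple balancing scheme is sketched below: inverse-frequency sample weights that give every class the same total weight. This is an illustrative assumption only; it was not part of the submission and does not reproduce the task's scoring function. The function name is hypothetical.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by n / (n_classes * class_count) so classes weigh equally."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in labels]
```

Such weights could be passed to a classifier that accepts per-sample weights during training, so that rare classes are not dominated by frequent ones.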
    Although the final test performance, as measured with the complicated balanced
error function, was certainly not up to the best (scans: 0.25, scan-like: 0.22,
photographs: 0.05, average: 0.17), the method seemed to perform fine on the training
data. It failed most of the time only on images for which, after resizing and
binarization, it was difficult even for a human to tell that the class of the image
differs from that of a random member of the class it was misclassified into, as shown
in Figure 2.

4     Conclusion
While our solution did not perform up to the best ones in this competition, it is fast,
accurate enough for non-critical applications, and has potential for improvement. We
intend to continue developing this solution and plan a mobile phone implementation
of it.

Figure 1. Class clustering in the space of the selected features


References
1. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)

Figure 2. Example of misclassification; humans also have a hard time distinguishing
these two.