=Paper= {{Paper |id=Vol-1458/E04_CRC17_Chudyk |storemode=property |title=Development of an Automatic Pollen Classification System Using Shape, Texture and Aperture Features |pdfUrl=https://ceur-ws.org/Vol-1458/E04_CRC17_Chudyk.pdf |volume=Vol-1458 |dblpUrl=https://dblp.org/rec/conf/lwa/ChudykCLYB15 }} ==Development of an Automatic Pollen Classification System Using Shape, Texture and Aperture Features== https://ceur-ws.org/Vol-1458/E04_CRC17_Chudyk.pdf
     Development of an Automatic Pollen
Classification System Using Shape, Texture and
                Aperture Features

 Celeste Chudyk1 , Hugo Castaneda2 , Romain Leger2 , Islem Yahiaoui2 , Frank
                                 Boochs1
              1
               i3mainz, University of Applied Sciences Mainz, Germany
                      {celeste.chudyk,boochs}@hs-mainz.de
            2
              Dijon Institute of Technology, Burgundy University, France
    {hugo.castaneda,romain.leger,islem.yahiaoui}@iut-dijon.u-bourgogne.fr



        Abstract. Automatic detection and classification of pollen species has
        value for palynologic allergen studies. Traditional labeling
        of different pollen species requires an expert biologist to classify particles
        by sight, and is therefore time-consuming and expensive. Here, an auto-
        matic process is developed which segments the particle contour and uses
        the extracted features for the classification process. We consider shape
        features, texture features and aperture features and analyze which are
        useful. The texture features analyzed include: Gabor Filters, Fast Fourier
        Transform, Local Binary Patterns, Histogram of Oriented Gradients, and
        Haralick features. We have streamlined the process into one code base,
        and developed multithreading functionality to decrease the processing
        time for large datasets.

        Keywords: Image processing, Machine learning, Pollen, Texture clas-
        sification


1     Introduction

Currently, pollen count information is usually limited to generalizing all pollen
types with no access to information regarding particular species. In order to
differentiate species, typically a trained palynologist would have to manually
count samples using a microscope. Advances in image processing and machine
learning enable the development of an automatic system that, given a digital
image from a bright-field microscope, can automatically detect and describe the
species of pollen particles present.
    We build upon previous work from within our lab which has planned the
structure for a complete personal pollen tracker [6]. For image classification,
    Copyright © 2015 by the paper’s authors. Copying permitted only for private and
    academic purposes. In: R. Bergmann, S. Görg, G. Müller (Eds.): Proceedings of
    the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. Trier, Germany, 7.-9.
    October 2015, published at http://ceur-ws.org




preliminary results have shown that extraction of both shape features and aper-
ture features lead to useful results [5]. To expand on this research, we have built
a software process that not only considers shape and aperture features, but also
adds multiple texture features. The range of tested image types has also been
greatly expanded in order to build a model capable of classifying a highly variable
dataset.


2   Overview

The steps for our process are as follows: 1. Image acquisition and particle seg-
mentation, 2. Feature extraction, and 3. Classification.
    Our process begins with scanning glass slides of the various pollen species
with a digital microscope, then segmenting these images to gather samples of
individual pollen particles. These images are then further segmented to iden-
tify the pollen boundary, and the area within this boundary is used for feature
extraction. 18 shape features, texture features including the Fast Fourier Trans-
form, Local Binary Patterns, Histogram of Oriented Gradients, and Haralick
features, as well as aperture features are used. These features are then trained
using supervised learning to build a model for the 5 pollen species sampled. The
model is then tested with ten-fold cross validation. The process is illustrated in
figure 1.




             Fig. 1. Pollen image acquisition and classification process




3    Image Acquisition and Particle Segmentation
Five different species (Alder, Birch, Hazel, Mugwort, and Sweet Grass) have
been stained and prepared on glass slides for use with a common digital bright-
field microscope. In order to build a robust model, all species had sample images
derived from three distinct laboratory slides (using a total of 600 sample images
obtained from 15 different slides).




Fig. 2. Diverse image types, all example data used for training the Alder class of pollen.
Access to the complete dataset can be found at http://dx.doi.org/10.5072/dans-zpr-rjm6.



    For particle segmentation, each digital image is processed in order to locate
and segment out a confining square surrounding a pollen particle. First, a me-
dian blur and a Gaussian blur are applied to a negative of the image in order to
remove smaller particles that are background noise (often dirt or imperfections
on the slide background). Next, a threshold is applied to the image, using the
Otsu algorithm to automatically select the threshold value from the histogram.
The returned image is an optimized binary image. A second set of filters is then
applied using morphological operators (iterations of erosions and dilations) to
fill in the particle area. Finally, the image is converted to have a white
background in preparation for further processing steps.
    A blob detection algorithm is now applied in order to extract a small image
surrounding each particle. The algorithm is based on four attributes – Area,
Circularity, Convexity, and Inertia Ratio – each with a “minimum” and a
“maximum” parameter. By setting these parameters to the expected characteristics
of pollen grains, the smaller images are found and extracted.
    The last filter used on the resulting particle images is depicted in Figure
3. Because the pollen grains settle into the slide adhesive at different depths,
some particles will be out of focus. These blurry images provide insufficient
data, especially concerning texture features, so we remove them from our
analysis. A blur detection algorithm was developed and applied to each image: a
Laplacian filter with a manually determined threshold value determines which
images are too blurry; these are removed from further processing steps.
    Lastly, the contour surrounding each pollen particle is identified using
OpenCV’s findContours() method.




                         Fig. 3. Blur detection example



4     Feature Extraction
4.1    Shape features
We have used 18 shape features already identified as useful through previous
iterations of our research [5]. The 18 selected features were based on research
developing an identification process for the Urticaceae family of pollen [11], as
well as research into universal shape descriptors [1].
    Shape features used:

Perimeter (P ) Length of contour given by OpenCV’s arcLength() function
Area (A) Number of pixels contained inside the contour
Roundness (R) 4πA/P²
Compactness (C) 1/R
Roundness/Circularity Ratio (RC) Another measure of roundness, see [9]:
    (P − √(P² − 4πA)) / (P + √(P² − 4πA))
Mean Distance (S̄) Average of the distance between the center of gravity and
   the contour
Minimum Distance (Smin ) Smallest distance between the center of gravity
   and the contour
Maximum Distance (Smax ) Longest distance between the center of gravity
   and the contour
Ratio1 (R1 ) Ratio of maximum distance to minimum distance Smax /Smin
Ratio2 (R2 ) Ratio of maximum distance to mean distance Smax /S̄
Ratio3 (R3 ) Ratio of minimum distance to mean distance Smin /S̄
Diameter (D) Longest distance between any two points along the contour
Radius Dispersion (RD) Standard deviation of the distances between the
   center of gravity and the contour
Holes (H) Sum of differences between the Maximum Distance and the distance
   between center of gravity and the contour
Euclidean Norm (EN2 ) Second Euclidean Norm
RMS Mean RMS mean size
Mean Distance to Boundary Average distance between every point within
   the area and the contour




Complexity (F ) Shape complexity measure based on the ratio of the area and
  the mean distance to boundary

4.2   Texture feature extraction
A variety of texture features were selected due to their performance in prior re-
search [11,10,7,8]. The texture features extracted included: Gabor Filters (GF),
the Fast Fourier Transform (FFT), the Local Binary Pattern (LBP), the His-
togram of Oriented Gradients (HOG), and Haralick features.

Gabor Filters Gabor filters have proven useful in image segmentation and
texture analysis [12]. The Gabor filter function applies masks of 5 different
sizes and 8 orientations (see Figure 4) in order to produce 40 output images.
For each of the 40 resulting images, we calculate the local energy over the
entire image (the sum of the squares of the gray-level pixel intensities) and
the mean amplitude (the sum of the amplitudes divided by the number of pixels).
In addition to these 80 values, we also store the total local energy for each
of the 8 directions as well as the direction where the local energy is at its
maximum.




             Fig. 4. The 8 directions of the mask for the Gabor Filters




Fourier Transform Fourier Transforms translate an image from the spatial
domain into the frequency domain, and are useful because lower frequencies rep-
resent an area of an image with consistent intensity (relatively featureless areas)
and higher frequencies represent areas of change [2]. Just as in spatial analysis,
we cannot compare images directly, but first need to extract features. In the
frequency domain, we likewise extract useful information through analysis of
frequency peaks. Here, we apply a Fast Fourier Transform to the image, apply




a logarithmic transformation, and create a graph of the resulting frequency do-
main. After taking the highest 10 frequency peaks, we compute the differences
between the peaks and store these values, as well as the mean of the differences
and the variance of the differences.
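The frequency-peak extraction can be sketched as below; ranking the log-magnitude spectrum values directly is a simplifying assumption standing in for the graph analysis described:

```python
import numpy as np

def fft_peak_features(image_gray, n_peaks=10):
    """Differences between the 10 highest log-magnitude frequency values,
    plus their mean and variance (11 values for n_peaks = 10)."""
    spectrum = np.fft.fftshift(np.fft.fft2(image_gray.astype(float)))
    log_mag = np.log1p(np.abs(spectrum))               # logarithmic transform
    peaks = np.sort(log_mag.ravel())[-n_peaks:][::-1]  # 10 highest peaks
    diffs = -np.diff(peaks)                            # gaps between peaks
    return np.concatenate([diffs, [diffs.mean(), diffs.var()]])
```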

Haralick Features Haralick features [3] are determined by computations over
the GLCM (Grey-Level Co-Occurrence Matrix). Here, we use: the angular second
moment, contrast, correlation, sum of squares: variance, inverse difference
moment, sum average, sum variance, sum entropy, entropy, difference variance,
difference entropy, measure of correlation 1, and measure of correlation 2.
These are 13 of the 14 original features developed by Haralick; the 14th is
typically left out of computations due to its numerical instability.
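As an illustration, four of the listed features can be computed from a horizontal GLCM with plain NumPy; the 16-level quantisation and single pixel offset are assumptions, not the configuration used in the paper:

```python
import numpy as np

def haralick_subset(image_gray, levels=16):
    """Angular second moment, contrast, inverse difference moment and
    entropy over a horizontal (dx=1) GLCM. Quantisation to 16 grey
    levels is an illustrative assumption.
    """
    q = (image_gray.astype(np.int64) * levels // 256)
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left, right), 1)       # count co-occurring pairs
    p = glcm / glcm.sum()                   # normalise to probabilities
    i, j = np.indices((levels, levels))
    asm = np.sum(p ** 2)                            # angular second moment
    contrast = np.sum((i - j) ** 2 * p)
    idm = np.sum(p / (1.0 + (i - j) ** 2))          # inverse difference moment
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([asm, contrast, idm, entropy])
```

A perfectly uniform image yields ASM = 1 and zero contrast and entropy, which is a useful sanity check.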

Histogram of Oriented Gradients (HOG) The Histogram of Oriented Gradients
is calculated by first determining gradient values over a 3 by 3 Sobel mask.
Next, bins are created for the cell histograms; here, 10 bins were used. The
gradient angle of each pixel selects a bin, and the gradient magnitude weights
the vote. After normalization, the values are flattened into one feature
vector.
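A single-cell sketch of the histogram computation; central differences are used here instead of the Sobel mask for brevity, and the full descriptor concatenates the normalised histograms of all cells:

```python
import numpy as np

def hog_cell_histogram(cell_gray, n_bins=10):
    """Orientation histogram for one cell: gradient angles select the
    bin, gradient magnitudes weight the votes."""
    g = cell_gray.astype(float)
    gx = np.zeros_like(g); gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]       # central differences
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned gradients
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(),
                       minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```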

Local Binary Pattern (LBP) To obtain local binary patterns, a 3 by 3 pixel
window is moved over the image, and the value of the central pixel is compared
to the value of each of its eight neighbors. A neighbor of lower value is
assigned a zero, and one of higher value a one. The resulting string of eight
bits (“00011101” for instance) is the local pattern. The frequency of the
occurrence of each pattern is used as the texture description.
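A minimal sketch of the pattern extraction; ties between neighbour and centre are counted as ones here, an assumption the text leaves open:

```python
import numpy as np

def lbp_histogram(image_gray):
    """Normalised histogram of 8-bit local binary patterns over a 3x3
    window: each neighbour >= centre contributes one bit."""
    g = image_gray.astype(int)
    c = g[1:-1, 1:-1]                    # centre pixels
    # Neighbour offsets in a fixed clockwise order starting top-left
    shifts = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        neighbour = g[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        code |= (neighbour >= c).astype(int) << bit
    # Frequency of each of the 256 possible patterns
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()
```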




                Fig. 5. Example output for Local Binary Patterns




Aperture Detection The number and type of apertures present on the pollen
surface is a typical feature used by palynologists in order to determine the
pollen species. Therefore, it is useful to also build an automatic aperture
detection function in order to identify and count apertures as an additional
feature set. Preliminary work identifying apertures [4] has shown potential for
this analysis. First, a moving window segments the pollen image into smaller
areas. Each




          Fig. 6. Local Binary Pattern function applied to pollen image




smaller image is manually labeled as an aperture or not an aperture. Texture
features are extracted from these smaller images, including those through a Fast
Fourier Transform (FFT), Gabor Filters (GF), Local Binary Pattern (LBP),
Histogram of Oriented Gradients (HOG), and Haralick features. A supervised
learning process (through the use of support vector machines) then creates a
model for each of the four species expected to include apertures on the surface.
Once an unlabeled pollen image is given to be classified, the system again uses a
moving window to break up the image into subsections. These smaller sections
are then evaluated by the generated model, and four values are returned for
each detected aperture, corresponding to the probability that the aperture is
of type Alder, Birch, Hazel, or Mugwort.
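The moving-window scoring can be sketched as follows; `extract_features`, `model`, and the window size and step are hypothetical stand-ins for the texture pipeline and the trained support vector machines:

```python
import numpy as np

def sliding_windows(image_gray, size=32, step=16):
    """Yield (y, x, patch) for every window position.
    Size and step are illustrative assumptions."""
    h, w = image_gray.shape
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            yield y, x, image_gray[y:y + size, x:x + size]

def detect_apertures(image_gray, extract_features, model, size=32, step=16):
    """Score every window with a trained probabilistic classifier.

    `extract_features` and `model` are hypothetical stand-ins;
    `model.predict_proba` is assumed to return one probability per
    aperture type (Alder, Birch, Hazel, Mugwort).
    """
    detections = []
    for y, x, patch in sliding_windows(image_gray, size, step):
        probs = model.predict_proba([extract_features(patch)])[0]
        detections.append(((y, x), probs))
    return detections
```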


5   Classification

Once the shape, texture, and aperture features have been calculated, they are
combined into a CSV file. A data set of 5 species with 40 sample pollen
images from each of 3 separate sample slides led to a total of 600 samples, each
with 252 extracted features. A supervised learning process used this data for
model creation, which was then tested using ten-fold cross validation. Both
support vector machines and a random forest classifier showed promising (and
very similar) results; for the results reported here, a random forest classifier
was used due to faster processing on the larger data sets. The n_estimators
parameter for this method was set to a typical size of 100 (increasing this
number did lead to slightly improved results, yet also dramatically increased
processing times).
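A sketch of this evaluation setup with scikit-learn, assuming that RandomForestClassifier with n_estimators=100 and ten-fold cross_val_score match the configuration used:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate(features, labels, n_estimators=100, folds=10):
    """Ten-fold cross-validation of a random forest on the combined
    feature matrix (600 samples x 252 features in the paper; any
    feature matrix of compatible shape works identically)."""
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    scores = cross_val_score(clf, features, labels, cv=folds)
    return scores.mean(), scores.std()
```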


6   Results

Using a random forest classifier on a total of 600 samples (120 each for each
species) and 252 features, a model was generated with an accuracy of 87% ± 2%.




                   Fig. 7. Moving window traversing the pollen image




           Fig. 8. Apertures detected by the program on a Birch pollen




Considering that the samples were intentionally selected for variability in their
appearance and background (See Figure 2), this is an indication of a robust,
reliable model that shows promise for expansion in the future to also include
datasets collected from an outdoor environment.
   The dataset was further modified into different versions in order to test the
results using only subsets of the features available.
   The table below shows the accuracies of the trained models. Using only the 18
shape features, an accuracy of 64% ± 3% was achieved, and adding texture infor-
mation either through Gabor Filters or Haralick features substantially improved
the results.




                           Features               Accuracy
                           Shape features         64% ± 3%
                           Shape and Gabor        76% ± 2%
                           Shape and FFT          65% ± 2%
                           Shape and LBP          65% ± 3%
                           Shape and HOG          67% ± 2%
                           Shape and Haralick     87% ± 3%
                           Shape and Aperture     67% ± 2%



7   Conclusion

Through this research, we have tested an expanded sample set of 5 species of
pollen particles and used shape, texture and aperture features for use in classi-
fication. Use of all features led to an accuracy of 87% ± 2%. Through testing of
individual texture features in combination with shape features, it was found that
using only the shape and Haralick features resulted in an accuracy of 87% ± 3%.
Gabor Filters also proved to be a useful feature as seen through the improved
accuracy compared to using just the shape features alone. Surprisingly, the other
texture features as well as the aperture features did not result in significant
accuracy gains. A next step of research would be to investigate under which
exact conditions certain texture features prove useful. In the case of the
aperture features, one known limitation is that the aperture types were trained
on a more limited dataset. Because the aperture detection technique developed
did show positive results in determining correct aperture positions, it would be
interesting to retrain the aperture types on a wider dataset and see if this
results in a more useful set of extracted features. Furthermore, extending the
dataset not
only beyond 600 images but especially to include more than three microscope
slides per species would test against possible overfitting to particular slide con-
ditions. Future research would also include application of this process to data
collected outside of a laboratory environment, as well as expansion to include
more pollen species.


References
 1. da Fontoura Costa, L., Cesar Jr., R.M.: Shape Classification and Analysis: Theory
    and Practice. CRC Press, Inc., Boca Raton, FL, USA, 2nd edn. (2009)
 2. Haas, N.Q.: Automated Pollen Image Classification. Master’s thesis, University of
    Tennessee (2011), http://trace.tennessee.edu/utk_gradthes/1113
 3. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classifica-
    tion. Systems, Man and Cybernetics, IEEE Transactions on SMC-3(6), 610–621
    (Nov 1973)
 4. Lozano-Vega, G., Benezeth, Y., Marzani, F., Boochs, F.: Classification of pollen
    apertures using bag of words. In: Petrosino, A. (ed.) Image Analysis and Processing
    – ICIAP 2013, Lecture Notes in Computer Science, vol. 8156, pp. 712–721. Springer
    Berlin Heidelberg (2013), http://dx.doi.org/10.1007/978-3-642-41181-6_72




 5. Lozano-Vega, G., Benezeth, Y., Marzani, F., Boochs, F.: Analysis of relevant fea-
    tures for pollen classification. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H.
    (eds.) Artificial Intelligence Applications and Innovations, IFIP Advances in In-
    formation and Communication Technology, vol. 436, pp. 395–404. Springer Berlin
    Heidelberg (2014), http://dx.doi.org/10.1007/978-3-662-44654-6_39
 6. Lozano Vega, G., Benezeth, Y., Uhler, M., Boochs, F., Marzani, F.: Sketch of an au-
    tomatic image based pollen detection system. In: 32. Wissenschaftlich-Technische
    Jahrestagung der DGPF. vol. 21, pp. 202–209. Potsdam, Germany (Mar 2012),
    https://hal.archives-ouvertes.fr/hal-00824014
 7. Maillard, P.: Comparing texture analysis methods through classification. Pho-
    togrammetric Engineering & Remote Sensing 69(4), 357–367 (2003), http://www.
    ingentaconnect.com/content/asprs/pers/2003/00000069/00000004/art00003
 8. Marcos, J.V., Nava, R., Cristóbal, G., Redondo, R., Escalante-Ramírez, B., Bueno,
    G., Déniz, O., González-Porto, A., Pardo, C., Chung, F., et al.: Automated pollen
    identification using microscopic imaging and texture analysis. Micron 68, 36–46
    (2015)
 9. O’Higgins, P.: Methodological issues in the description of forms. In: Lestrel, P.E.
    (ed.) Fourier Descriptors and their Applications in Biology, pp. 74–105. Cam-
    bridge University Press (1997), http://dx.doi.org/10.1017/CBO9780511529870.
    005, Cambridge Books Online
10. Redondo, R., Bueno, G., Chung, F., Nava, R., Marcos, J.V., Cristóbal, G.,
    Rodríguez, T., Gonzalez-Porto, A., Pardo, C., Déniz, O., Escalante-Ramírez, B.:
    Pollen segmentation and feature evaluation for automatic classification in bright-
    field microscopy. Computers and Electronics in Agriculture 110, 56–69 (2015),
    http://www.sciencedirect.com/science/article/pii/S0168169914002348
11. Rodriguez-Damian, M., Cernadas, E., Formella, A., Fernandez-Delgado, M., Sa-
    Otero, P.D.: Automatic detection and classification of grains of pollen based on
    shape and texture. IEEE Trans. Syst., Man, Cybern. C 36(4), 531–542 (Jul 2006),
    http://dx.doi.org/10.1109/TSMCC.2005.855426
12. Zheng, D., Zhao, Y., Wang, J.: Features extraction using a Gabor filter family.
    In: Hamza, M.H. (ed.) Proceedings of the 6th IASTED International Conference.
    pp. 139–144. Signal and Image Processing, Acta Press (2004), www.paper.edu.cn/
    scholar/downpaper/wangjiaxin-13



