A Mission-Oriented Citizen Science Platform for Efficient Flower Classification Based on Combination of Feature Descriptors

Andréa Britto Mattos, Ricardo Guimarães Herrmann, Kelly Kiyumi Shigeno
IBM Research - Brazil
{abritto,rhermann,kshigeno}@br.ibm.com

Rogerio Schmidt Feris
IBM T.J. Watson Research
rsferis@us.ibm.com

ABSTRACT

This paper describes a citizen science system for flora monitoring that employs a concept of missions, as well as an automatic approach for flower species classification. The proposed method is fast and suitable for use in mobile devices, as a means to achieve and maintain high user engagement. Besides providing a web-based interface for visualization, the system allows the volunteers to use their smartphones as powerful sensors for collecting biodiversity data in a fast and easy way.

The classification accuracy is increased by a preliminary segmentation step that requires simple user interaction, using a modified version of the GrabCut algorithm. The proposed classification method obtains good performance and accuracy by combining traditional color and texture features together with carefully designed features, including a robust shape descriptor that captures fine morphological structures of the objects to be classified. A novel weighting technique assigns different costs to each feature, taking into account the inter-class and intra-class variation between the considered species.

The method is tested on the popular Oxford Flower Dataset, containing 102 categories, and we achieve state-of-the-art accuracy while proposing a more efficient approach than previous methods described in the literature.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: Online Information Services; I.4 [Image Processing and Computer Vision]: Scene Analysis; I.5 [Pattern Recognition]: Applications

Keywords

Computer vision, fine-grained classification, flora classification, citizen science

1. INTRODUCTION

Citizen science is not a new concept: the idea of conducting research by citizens, gathering crowdsourced data that is analyzed for scientific purposes, has been active for generations, dating back to 1900, when the Christmas Bird Count (CBC, http://birds.audubon.org/christmas-bird-count) project started. It was a form of mass-collaboration citizen science project that used paper forms sent by post to the responsible society or research group of scientists who requested the data.

Nevertheless, it is noticeable that the power of crowdsourced science is growing intensely nowadays and such projects are becoming increasingly popular, even gaining the attention of major news media. For instance, projects such as eBird (http://www.ebird.org/) and Galaxy Zoo (http://www.galaxyzoo.org/) are able to engage hundreds of thousands of volunteers and are being broadly used for scientific and educational purposes.

Citizen science's boost in popularity is explained by the fact that data collection and analysis tasks became much easier to address. Even simple and low-cost smartphones are equipped with GPS and high-resolution cameras that allow huge amounts of data to be collected with small effort. Also, the Internet brings together a high number of volunteers to work on this data remotely and simultaneously. However, there are still difficulties that prevent the further growth of this methodology.

One challenge of citizen science projects is how to obtain high and constant engagement of the users. Although some volunteers are motivated by the scientific contribution on its own, it is possible to resort to "gamification", the use of game elements in non-game contexts [7], to get others to further engage with the community and contribute more enthusiastically [8].

Copyright (c) by the paper's authors. Copying permitted only for private and academic purposes. In: S. Vrochidis, K. Karatzas, A. Karpinnen, A. Joly (eds.): Proceedings of the International Workshop on Environmental Multimedia Retrieval (EMR 2014), Glasgow, UK, April 1, 2014, published at http://ceur-ws.org
Project Noah (http://www.projectnoah.org/), for instance, is a large-scale project in which every report is assigned to a specific mission, whose goal is to monitor certain types of flora or fauna classes in a specific location. There are also many aspects that can increase user motivation, as observed in [28]. For instance, the volunteers must have confidence that the collected data is being used: therefore, they should have easy access to the visualization of the collected data. Also, it is important to notice that some users are not willing to modify their daily activities to report data, and providing training and mentoring for volunteers to increase their skills can be fundamental for valid data registration.

This last point highlights an additional drawback of projects monitoring biodiversity: relying on the users having the knowledge to classify, without further assistance, the specimens being reported. This can be a difficult task when one considers the large number of species that share similar morphology. Misclassified inputs can compromise the environmental research, and the impossibility of assuring the quality of the analysis made by non-experts may be the bottleneck of such projects. In this regard, an algorithm to assist the user would be very useful. A system that could automatically recognize an input image with small effort, or at least retrieve a list of the most similar species, could make the user feel more confident, perhaps even engaging a larger number of volunteers, preventing errors and accelerating the identification process. If such a system could be deployed on a mobile device, the benefits would be even greater, since the classification could be executed at collection time.

To assist non-expert classification, automatic works in the flora field (which is the domain of focus of this paper) are getting good attention, and recent studies were able to achieve good results in leaf and flower classification [1, 3, 18, 19]. This popularity is evident as we see efforts such as the recent Plant Identification Task, promoted by the widely popular ImageCLEF challenge [9].

Besides robustness, automatic or semi-automatic methods can reduce data collection time. For instance, as indicated by Zou and Nagy [33], for a dataset of 102 flower species, the time for a semi-automatic classification is much lower than that needed by humans alone.

1.1 Related Work

1.1.1 Citizen Science in the Flora Domain

Table 1 shows a list of various citizen science projects in the botanical field. To the best of our knowledge, these are the most representative projects in this area. Their limitations highlight the previously discussed issues regarding the difficulty of species identification or the adaptation of the volunteers' routine, since the data upload requires a computer and/or Internet connection.

Table 1: Review of citizen science projects in the botanical field.

- Conker Tree Science. Goal: plague monitoring. Reported data: presence or absence of affected plants and severity of plague damage. Platform: Web, Android, iPhone. Scope: UK. Limitations: requires training to differentiate natural aging from plague effects.
- Plant Tracker. Goal: monitor invasive (non-native) plants. Reported data: presence of 14 types of invasive plants. Platform: Web, Android, iPhone. Scope: UK. Limitations: requires training to differentiate native and invasive plants.
- The Great Sunflower Project. Goal: monitor pollinators. Reported data: presence and quantity of pollinators (bees and hummingbirds), plant type. Platform: Web. Scope: US. Limitations: requires training to differentiate bees from wasps and flies.
- Melibee Project. Goal: monitor the effect of sweetclover (invasive) on the pollination of blueberry and cranberry (native). Reported data: phenology of blueberry, cranberry and sweetclover, monitoring 5 plants during a certain period. Platform: Web. Scope: Alaska. Limitations: requires repetitive work and training to differentiate the monitored species.
- Wildflowers Count. Goal: identify 99 species in a randomly chosen 1 km² area. Reported data: annual wildflowers count. Platform: Web. Scope: UK. Limitations: requires training to identify 99 types of flowers; location is chosen by the system.
- Viburnum Leaf Beetle. Goal: monitor plagues in Viburnum species. Reported data: presence of infected plants. Platform: Web. Scope: NY. Limitations: requires training to identify Viburnum species.
- BudBurst. Goal: study effects of climate change on plant phenology. Reported data: presence and quantity of flowers and fruits (single or regular reports). Platform: Web, mobile. Scope: US. Limitations: requires training to identify the monitored flowers.
- E-Flora BC. Goal: build flower catalog. Reported data: geo-located rare species. Platform: Web. Scope: British Columbia. Limitations: only monitors significant species.
- Project Noah. Goal: build fauna and flora catalog. Reported data: geo-located pictures within a mission. Platform: Web, Android, iPhone. Scope: worldwide. Limitations: not all species are identified.
- LeafSnap. Goal: build plant catalog. Reported data: geo-located leaf pictures. Platform: iPhone. Scope: NY / Washington DC. Limitations: requires a picture of a single leaf on a white background.
project5 knowledge to classify, without further assistance, the speci- offers a good differential with respect to other citizen science mens being reported. This can be a difficult task when one projects, since it runs a shape-based algorithm to classify considers the large number of species that share similar mor- plant species from their leaves automatically [15]. However, phology. Misclassified inputs can compromise the environ- the user needs to extract (i.e., cut) the leaf and place it on a mental research, and the impossibility to assure the quality white background for the segmentation algorithm to work. of the analysis made by non-experts may be the bottleneck The segmentation method proposed by LeafSnap might of such projects. In this regard, an algorithm to assist the be considered harmful in an environmental aspect, once it user would be very useful. A system that could automati- requires the extraction of the leaves from their natural sur- cally recognize an input image with small effort, or at least roundings. Also, lighting variations can affect the back- retrieve a list of the most similar species, could make the user ground color and interfere in the segmentation result. Fi- feel more confident, perhaps even engaging a larger number nally, it is hard to extend LeafSnap for additional domains of volunteers, preventing errors and accelerating the iden- other than leaf classification: besides taking shape infor- tification process. If such a system could be deployed in a mation as the only feature for classification, a segmenta- mobile device, the benefits would be even greater, once the tion method based on a fixed background is only feasible classification could be executed at collection time. for static species, once it is clear that, when photographing To assist non-expert classification, automatic works in the specimens that can move, there is no guarantee that they flora field – which is the domain of focus of this paper – will remain in place. are getting good attention, and recent studies were able to Table 1 shows a list of various citizen science projects in achieve good results in leaf and flower classification [1, 3, the botanical field. To the best of our knowledge, these are 18, 19]. This popularity is evident as we see efforts such as the most representative projects in this area. Their limita- the recent Plant Identification Task, promoted by the widely tions highlights the previously discussed issues regarding the popular ImageCLEF challenge [9]. difficulty of species identification or adaptation of the vol- Besides robustness, automatic or semi-automatic methods unteers routine, once the data upload requires a computer can reduce data collection time. For instance, as indicated and/or Internet connection. by Zou and Nagy [33], for a dataset of 102 flower species, Our goal is to develop a system similar to LeafSnap, but the time for a semi-automatic classification is much lower focusing on flowers, and replacing the user effort for seg- than that made by humans alone. mentation by a simpler and less intrusive action taken in his own device. Since our study is part of the citizen science context, user interaction is accepted, once we assume there 1.1 Related Work is always a volunteer operating the system in his smartphone or tablet. 
Our platform is developed with the concern of being attractive to users so it can reach the largest possible number of volunteers. Therefore, our flower classification algorithm must have high accuracy and must also be fast enough to run in the web browser and on mobile devices. Besides, we use missions to engage and entertain the volunteers, following gamification principles. In order to compare our classification results with a benchmark, we use the challenging Oxford Flower Dataset (http://www.robots.ox.ac.uk/~vgg/data/flowers/).

1.1.2 Flower Classification

In the flower classification field, previous studies worked on small datasets of 10 to 17 flower species [10, 14, 20, 25], achieving accuracy rates of 81%-96%. Most of these works rely on contour analysis and SIFT features [17], which is known to be a robust descriptor, but computationally inefficient.

Using larger datasets, consisting of 30 to 79 flower species, accuracy rates vary from 63% to 94%, combining a variety of approaches such as histograms in HSV color space, contour features, co-occurrence matrices (GLCM) for texture features, and RBF and probabilistic classification [6, 29, 30]. Qi [26] focuses on a fast environmental classification for flowers and leaves, using spatial co-occurrence features and a linear SVM, but accuracy is not reported.

Works on the Oxford Flower Dataset, which contains 102 flower species, reported accuracy rates no greater than 80%. The dataset was introduced by Nilsback and Zisserman [21] and is used by many authors. It is considered an extremely challenging dataset, because it contains pictures with severe variations in illumination and viewpoint, and also due to small inter-class variations and large intra-class variations between the considered flower species. See some examples of such cases in Fig. 1.

Using the Oxford Dataset, studies working with unsegmented images demonstrate lower classification accuracy and require powerful classifiers that are computationally expensive [2, 11]. Khan's work [13] relies on color and shape, using a SIFT-based approach. Kanan [12] applies salience maps to extract a high-dimensional image representation that is used in a probabilistic classifier. Both works require high computation time due to their feature extraction methods.

Using segmented images, the authors are able to achieve a more robust classification for this dataset, though none of the considered works is suitable for real time. Nilsback [21, 22] applies a multi-kernel SVM classifier using four different features (color in HSV space, HOG, and SIFT in foreground and boundary regions). Chai [5] proposes a robust approach for co-segmentation (segmenting images with similar background color distributions) and applies an SVM classifier based on color features in the Lab space and SIFT in the foreground region. Angelova [1, 16] proposes a segmentation approach, followed by the extraction of HOG features at four different levels, encoded using LLC; the encoded features are subject to max pooling and an SVM classifier is used. They mention future efforts toward a real time approach.

1.2 Contributions and Organization

Our main goal is to develop a citizen science application for flower monitoring whose collection is oriented by missions. Also, to assist the volunteers, we propose an algorithm for classification of flower species that meets the following requirements:

(i) Has high accuracy: We propose a novel approach that relies on efficient histogram-based feature descriptors that capture both global properties and fine shape information of objects. This is done while leveraging a learning-based distance measure to properly weight the feature contributions, which is a critical step for increasing accuracy, as demonstrated in previous studies [4, 31, 32].
(ii) Is fast: Mobile-based applications have a challenge regarding connectivity: users can only submit requests and receive answers if their devices have access to the Internet. There are several locations in which this kind of service is not affordable or has poor quality, making image transfer prohibitive. Our algorithm is fast enough to run directly on the devices (without transferring information to a server), so the user does not have to rely upon a network connection and wait too long for a response, considering the additional latency of uploading the image and retrieving the classification results.

This paper is structured as follows. In Section 2, we describe the overall structure of our citizen science platform and, in Section 3, we describe the core approach for fine-grained classification. In Section 4, we show and discuss the obtained classification results on the considered dataset, and our conclusions and future work are described in Section 5.

[Figure 1: Samples of the Oxford Flower Dataset. Top row, samples of different classes with small inter-class variations: (a) Spear Thistle (left) and Artichoke (right); (b) English Marigold (left) and Barbeton Daisy (right). Bottom row, samples of the same class with large variations in color, texture and shape: (c) Spring Crocus; (d) Bougainvillea.]

2. SYSTEM

The system is comprised of two main parts: a web portal, organized toward data visualization and community organization, and a mobile application, aimed at data collection. Both components arrange user-contributed data around the concept of missions, which aggregate users with the common goal of collecting structured and unstructured data about a certain class of observations.

The collected data is structured in a set of attributes, whose pre-established values are filled in by the user and refer to the domain being registered. This structure allows us to provide query-by-example features, which helps to avoid common input errors and establishes a common vocabulary for user-contributed data. The unstructured data, on the other hand, is comprised of images only, since we are dealing with plants.

[Figure 2: System mobile interface for classification: (a) image upload; (b) free-hand user marker (in white); (c) classification of the segmented input; (d) other results; (e) screen for geo-analysis. The user uploads an image, inserts a marker for semi-automatic segmentation of the input image and retrieves a scrollable screen with the top similar classes for the uploaded image, sorted by highest probability. The observation is inserted in a map using the device's location.]

Missions are created by users and can be set as public or private, as determined by the mission's owner, since some of them may contain sensitive geo-spatial data about observations belonging to endangered species. As the number of missions in the system may be rather large, the portal ranks them according to their activity, to make it easier for users to engage with the rest of the community.

It is currently possible to collect geo-referenced photos with consumer-grade mobile devices, using their cameras and GPS receivers, as mentioned in Section 1. The steps the user has to go through in order to submit a contribution are highlighted in Fig. 2. For the purpose of submitting a report, the user first needs to join one mission. After the picture is taken, he must loosely delineate the region of interest in the image. This is used for a segmentation process that will be described in the next section. This step can require user confirmation before proceeding with the next task, which is the automatic classification of the segmented input. The system retrieves the top 5 matches that are most similar to the uploaded image and the user can select the correct one or browse within a list of all the registered classes.

In situations where there is no connectivity, the algorithms can be executed directly on the devices, as a significant range of current mobile processors have multiple cores and some of them have vector processing instructions, which are already supported by popular libraries for mobile platforms, such as OpenCV. After the report is submitted (and synchronized with the server, in situations of intermittent or no connectivity), other users can view and bookmark this report. The feedback is provided to the users as bookmarked reports. Additionally, the user may fill in a form containing values for the observable raw morphological attributes of the sample. The set of supported attributes is defined by the mission's owner at the moment it is created, where he can define the required fields according to the mission's needs. For the botanic field, some examples of attributes are stem diameter, leaf thickness and so on.

In order to obtain more accurate classification results, we let the user choose the category of the image he is uploading. We currently only have trained classifiers for flowers, but the overall concept of the system is generic enough to accept data belonging to other categories. Since observations can be arbitrary, we need to load the corresponding classifier that was trained with the relevant data for the current category.
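To make the mission and observation structure described in this section concrete, the sketch below shows one plausible representation in Python. It is a minimal illustration under our own assumptions; the field names are hypothetical and do not reflect the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Mission:
    """A data-collection mission (illustrative fields, not the real schema)."""
    name: str
    owner: str
    public: bool = True              # private missions protect sensitive locations
    category: str = "flower"         # selects which trained classifier to load
    attributes: list = field(default_factory=list)  # e.g. ["stem diameter (mm)"]

@dataclass
class Observation:
    """A single user report submitted within a mission."""
    mission: str
    photo_path: str                  # the unstructured part: the image itself
    latitude: float                  # read from the device's GPS receiver
    longitude: float
    species_guess: Optional[str] = None  # filled in by the automatic classifier
    attribute_values: dict = field(default_factory=dict)  # structured attributes
```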
3. ALGORITHM

This section describes the proposed algorithm for flower recognition, which is divided in segmentation and classification steps. Both tasks are designed to be fast and precise and are generic enough to be used for classifying categories other than flowers.

3.1 Segmentation

We use the GrabCut algorithm [27] for a semi-manual segmentation with little user interaction. This method is also used in Qi's work [26] due to its good performance. In our method, instead of defining a bounding box and control points, the user draws a free-hand marker, which is more intuitive for a general user. The marker replaces the control points for a more refined boundary in a faster interaction. It must enclose the whole object but does not have to be precise.

A Gaussian Mixture Model is used to learn the pixel distribution in the background (outside the marker) and the possible foreground regions, and a graph is built from this pixel distribution. Edge weights are given according to pixel similarity (a large difference in pixel color generates a low weight). Finally, graph and image are segmented by a min-cut algorithm. See an example of a user marker in Fig. 2(b) and the corresponding segmented output in Fig. 2(c).

The advantages of this approach are its simplicity from a user perspective, the generation of good segmentations, and the fact that it is faster than the segmentation approaches discussed in Section 1.1.2, requiring an average time of 1.5 seconds.
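A minimal sketch of this step, using OpenCV's mask-initialized GrabCut, is shown below. The mapping from the free-hand marker to the GrabCut labels is our assumption about one reasonable implementation, not the authors' published code.

```python
import cv2
import numpy as np

def segment_with_marker(image_bgr, marker_points, iterations=5):
    """Segment the object enclosed by a closed free-hand marker.

    image_bgr must be an 8-bit 3-channel image; marker_points is a list of
    (x, y) vertices of the user's stroke. Pixels outside the marker are
    labeled sure background, pixels inside probable foreground, and
    GrabCut's GMM/min-cut iterations refine the boundary.
    """
    mask = np.full(image_bgr.shape[:2], cv2.GC_BGD, dtype=np.uint8)
    contour = np.asarray(marker_points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.fillPoly(mask, [contour], int(cv2.GC_PR_FGD))

    bgd_model = np.zeros((1, 65), np.float64)  # GMM parameters, background
    fgd_model = np.zeros((1, 65), np.float64)  # GMM parameters, foreground
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)

    # Keep pixels labeled as sure or probable foreground.
    fg = ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)
    return image_bgr * fg[:, :, None], fg
```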
3.2 Classification

We compare multiple features (i.e., color, texture and shape-based features) with histogram matching, which is fast and invariant to rotation, scale and partial occlusion. For each class, a weight is assigned to each feature and the segmented images are matched by comparing their histograms, using a kNN classifier. The difference between two histograms is computed by a metric based on the Bhattacharyya distance, described as follows.

Consider histograms H1 and H2, with n bins. Let H1(i) denote the i-th bin element of H1, i ∈ 1...n; H2(i) is defined analogously. The distance between H1 and H2 is measured as:

d(H_1, H_2) = \sqrt{1 - \frac{\sum_{i=1}^{n} \sqrt{H_1(i) \times H_2(i)}}{\sqrt{\sum_{i=1}^{n} H_1(i) \times \sum_{i=1}^{n} H_2(i)}}}

Note that low scores indicate best matches. Our algorithm compares histograms built on three feature cues.

3.2.1 Features

Color. We build a histogram of the segmented image's colors in the HSV space, which is known to be less sensitive to lighting variations, using 30 bins for hue and 32 bins for saturation. Image quantization is applied before computation and the histograms are normalized in order to be comparable with the proposed distance metric.

Texture. The texture operator applied to the segmented image is LBP (Local Binary Patterns) [23], which is commonly used in real time applications due to its computational simplicity. In LBP, each pixel is represented as a binary number produced by thresholding its neighbors against it. We use the extended version of LBP [24], which considers a circular neighborhood with variable radius and is able to detect details in images at different scales. Our method computes a single histogram with 255 bins, since the considered textures are mainly uniform and spatial texture information did not improve the classification in our tests.

Shape. We propose the use of two simple and fast descriptors that, when combined, are able to represent different shape characteristics. The shape contour is partitioned into 72 bins using fixed angles, resulting in m contour points. For each point p_i, i ∈ 1...m, we compute modular and angular descriptors, and both vectors are later represented as separate histograms.

The modular descriptor is obtained by computing each point's distance from the shape centroid, normalized by the largest distance d_max computed in this process. It measures the contour's relation to the shape's center of mass, so elongated shapes are easily distinguished from round shapes. The angular descriptor computes the angles between p_i and its neighbors p_{i-1} and p_{i+1}, measuring the smoothness of the contour. The proposed descriptor becomes powerful by merging two simple features, being able to represent, for instance, petal length, density and symmetry.
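The sketch below illustrates the distance metric and the feature cues in Python with OpenCV, scikit-image and NumPy. The joint binning of hue and saturation, the LBP parameters, and the contour handling are our assumptions; in particular, the paper partitions the contour into 72 fixed-angle sectors, which we approximate here with 72-bin histograms of the descriptor values.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def histogram_distance(h1, h2):
    """Bhattacharyya-based distance from Section 3.2 (low = better match)."""
    num = np.sum(np.sqrt(h1 * h2))
    den = np.sqrt(np.sum(h1) * np.sum(h2)) + 1e-12
    return np.sqrt(max(0.0, 1.0 - num / den))

def color_histogram(segmented_bgr, mask):
    """HSV histogram over the segmented region: 30 hue x 32 saturation bins."""
    hsv = cv2.cvtColor(segmented_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], mask, [30, 32], [0, 180, 0, 256])
    cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)
    return hist.flatten()

def texture_histogram(gray, mask):
    """Single 255-bin histogram of circular-neighborhood LBP codes."""
    lbp = local_binary_pattern(gray, P=8, R=2)  # radius is a tunable assumption
    hist, _ = np.histogram(lbp[mask > 0], bins=255, range=(0, 255))
    return hist / (hist.sum() + 1e-12)

def shape_histograms(contour, n_bins=72):
    """Modular and angular descriptors of the segmentation contour."""
    pts = contour.reshape(-1, 2).astype(np.float64)
    centroid = pts.mean(axis=0)

    # Modular: distance of each contour point to the centroid,
    # normalized by the largest such distance (d_max).
    dist = np.linalg.norm(pts - centroid, axis=1)
    modular, _ = np.histogram(dist / (dist.max() + 1e-12),
                              bins=n_bins, range=(0, 1))

    # Angular: angle at each point p_i between its neighbors p_{i-1}
    # and p_{i+1}, measuring the smoothness of the contour.
    v_prev = pts - np.roll(pts, 1, axis=0)
    v_next = np.roll(pts, -1, axis=0) - pts
    cos = np.sum(v_prev * v_next, axis=1) / (
        np.linalg.norm(v_prev, axis=1) * np.linalg.norm(v_next, axis=1) + 1e-12)
    angular, _ = np.histogram(np.arccos(np.clip(cos, -1, 1)),
                              bins=n_bins, range=(0, np.pi))

    as_prob = lambda h: h / (h.sum() + 1e-12)
    return as_prob(modular.astype(float)), as_prob(angular.astype(float))
```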
3.2.2 Metric Learning

In order to achieve higher accuracy, we assign different weights to the features when matching each class, so as to contemplate the situations described in Fig. 3. Our idea is to find which features are more discriminative, taking into account each feature's variation inside a same class, but also its variation with respect to all classes globally. We learn, for each class, one weight for each of the four feature descriptors. We estimate few weights due to the small number of training samples per class.

[Figure 3: A few examples of intra-class and inter-class variation between features of different training samples of six of the considered classes: (a) two classes that have similar color for all samples, but variations in shape; (b) species that can have many different colors, while having a similar shape; (c) two classes with similar shape and color, but with large variation in texture.]

We consider N classes, each containing M training samples, evaluated according to P features (in our tests, N = 102, M = 20 and P = 4, since we consider two histograms for shape information).

Let H^p_{i,j} denote, for a class i, the histogram of a sample j regarding feature p. We compute the mean histogram of a feature p for a class i as:

\bar{H}_i^p = \frac{\sum_{j=1}^{M} H_{i,j}^p}{M}

Likewise, we define ε^p_i as the mean distance per class i, regarding feature p, computed from the distance of all the training samples to the mean histogram. This helps us to evaluate intra-class variations, since we are considering the difference between all samples and a histogram that estimates the overall structure of the class:

\varepsilon_i^p = \frac{\sum_{j=1}^{M} d(H_{i,j}^p, \bar{H}_i^p)}{M}

Also, we compute the mean distance per feature p (ε^p) as follows:

\varepsilon^p = \frac{\sum_{i=1}^{N} \varepsilon_i^p}{N}

and we use this information to estimate inter-class variations between all types of species regarding each considered feature. Finally, the weight λ attributed to each of the considered features p in class i is given by:

\lambda(i, p) = \frac{1 - x(i, p)}{\sum_{p=1}^{P} x(i, p)}, \quad \text{where} \quad x(i, p) = \varepsilon_i^p - \varepsilon^p + \min_p(\varepsilon^p)

In this way we are able to take into account intra-class and inter-class variations between all training samples, estimating how much each feature should be considered when evaluating each class.

After the weights computation, we employ a kNN classifier in the test phase, where the cost of matching an image I with an image I_C of class C is computed as:

C(I, I_C) = \sum_{p=1}^{P} \frac{\lambda(C, p) \times d(H_I^p, H_{I_C}^p)}{\max_i(\varepsilon_i^p)}

and we select the classes with the lowest costs. We choose the kNN classifier because it is robust and performs well with training sets whose dimension is similar to the ones we are dealing with.
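Putting the formulas above together, a compact sketch of the weight learning and matching cost might look as follows. The nested-list layout of the training histograms (histograms[i][p] holding the M samples of class i for feature p) and the reuse of histogram_distance from the previous sketch are our assumptions.

```python
import numpy as np

def learn_weights(histograms, dist):
    """Compute lambda(i, p) and eps_i^p for N classes and P features."""
    N, P = len(histograms), len(histograms[0])
    eps_i = np.zeros((N, P))
    for i in range(N):
        for p in range(P):
            samples = np.asarray(histograms[i][p])   # shape (M, n_bins_p)
            mean_hist = samples.mean(axis=0)          # H-bar_i^p
            eps_i[i, p] = np.mean([dist(h, mean_hist) for h in samples])

    eps = eps_i.mean(axis=0)                          # eps^p, shape (P,)
    x = eps_i - eps + eps.min()                       # x(i, p)
    lam = (1.0 - x) / x.sum(axis=1, keepdims=True)    # lambda(i, p)
    return lam, eps_i

def matching_cost(query_hists, train_hists, class_id, lam, eps_i, dist):
    """Weighted cost C(I, I_C) of matching a query to one training image."""
    return sum(lam[class_id, p] * dist(query_hists[p], train_hists[p])
               / eps_i[:, p].max()
               for p in range(len(query_hists)))
```

A top-5 prediction then evaluates this cost against every training image, as in a standard kNN, and returns the classes with the lowest costs.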
4. RESULTS

Our tests followed the specifications for using the Oxford Dataset as a benchmark, considering 20 training samples per class and computing the mean per-class accuracy. In this dataset, the number of images in each class varies from 40 to 258. Fig. 4 shows the top 5 results for 4 different inputs. Every image is segmented as described in Section 3.1 and the weights are computed as described in our metric learning method. Note how the weights affect the classification results.

[Figure 4: Results for 4 different inputs: (a) Bearded Iris (BI), a class with large variation in color; (b) Bishop of Llandaff (BoL), with small variation in color; (c) Spring Crocus (SC), with large variation in texture; (d) Blackberry Lily (BL), with small variation in texture. In each column, from left to right: input with user marker; segmented input and top 5 matches sorted by highest probability; respective weights λ(i, p) for the corresponding classes; 8 training samples of each class. In the graphics, columns indicate angular, modular, color and texture features.]

Table 2 describes the algorithm's accuracy for a variable number of top matches, with and without metric learning. Since our platform returns the top 5 most likely classes, we can see that the system is able to achieve a very high accuracy rate for a very challenging dataset.

Table 2: Algorithm's accuracy when considering the top n matches for the test inputs.

Metric Learning | n = 1   | n = 3   | n = 5   | n = 10
No              | 65.58%  | 83.13%  | 88.72%  | 94.11%
Yes             | 80.88%  | 92.45%  | 96.07%  | 98.33%

The results also show that weighting features can improve classification significantly (+15.3% for n = 1) and that we are able to achieve the best accuracy reported in the state of the art for this dataset, as seen in Table 3.

Table 3: Accuracy comparison with previous works using the Oxford Dataset.

Ito and Kubota [11]         | 53.9%
Nilsback and Zisserman [21] | 72.8%
Khan et al. [13]            | 73.3%
Kanan and Cottrell [12]     | 75.2%
Nilsback [22]               | 76.3%
Angelova et al. [2]         | 76.7%
Chai et al. [5]             | 80.0%
Angelova and Shenghuo [16]  | 80.6%
Ours                        | 80.8%

Efficiency cannot be compared precisely, because this information is not reported in all the previous works. However, our average time for extracting and matching all features proved to be 4 times faster than running SIFT on the images' foreground region. The baseline work of Angelova and Shenghuo [16] takes about 5 seconds for segmentation and 2 seconds for classification. Our classification runs in less than a second, and there is a tradeoff for segmentation: ours is semi-automatic, but requires 1.5 seconds on average.

Finally, we replicated the training instances, and Table 4 shows that our method is still efficient for much larger training sets. Approximate nearest-neighbor methods based on hashing or kd-trees could also be used to obtain even higher efficiency, as sketched below.

Table 4: Elapsed time (in seconds) for classifying a single input, with increasing number of training samples per class.

# samples    | 20   | 40   | 60   | 100  | 200  | 1000
Elapsed time | 0.07 | 0.09 | 0.13 | 0.20 | 0.35 | 1.3
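As a sketch of that idea (our extrapolation, not part of the paper): for L1-normalized histograms, the distance of Section 3.2 is a monotone function of the Euclidean distance between the element-wise square roots of the histograms (the Hellinger distance), so a kd-tree built over square-rooted histograms of a single feature retrieves the same nearest neighbors in sub-linear time; the class-specific weights could then re-rank the resulting shortlist.

```python
import numpy as np
from sklearn.neighbors import KDTree

def build_index(train_histograms):
    """Index sqrt-transformed, L1-normalized histograms, shape (n, n_bins)."""
    return KDTree(np.sqrt(np.asarray(train_histograms)))

def top_k(tree, query_histogram, k=5):
    """Indices of the k best matches under the Hellinger/Bhattacharyya order."""
    _, idx = tree.query(np.sqrt(np.asarray(query_histogram))[None, :], k=k)
    return idx[0]
```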
5. CONCLUSIONS

This paper proposes a citizen science platform based on a novel approach for flower recognition. We introduce a strategy for comparing feature histograms for fine-grained classification, a robust shape descriptor, and a metric learning approach that assigns different weights to each feature, which can improve classification accuracy significantly. Our algorithm is extremely fast, being suitable for offline mobile applications, and was able to outperform previous works using the popular Oxford Dataset.

Our system is organized around missions, which in general will help us acquire more data due to gamification aspects. Also, besides engaging users, missions provide additional data external to the image, which may be useful for aiding classification in the future.

Other future work includes testing our algorithm on other species that contain large variations in the considered features (color, texture and shape), such as butterflies, fishes, birds and so on, and evaluating more efficient classifiers. We will also evaluate our system with real users, analyzing how their behavior is affected by gamification and automatic classification techniques. Finally, we aim to use crowdsourcing for labeling training images, in order to build various datasets that represent the local flora and fauna of diverse locations.

6. REFERENCES

[1] A. Angelova and S. Zhu. Efficient object detection and segmentation for fine-grained recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR '13, pages 811-818, 2013.
[2] A. Angelova, S. Zhu, Y. Lin, J. Wong, and C. Shpecht. Development and deployment of a large-scale flower recognition mobile app. NEC Labs America Technical Report, 2012.
[3] E. Aptoula and B. Yanikoglu. Morphological features for leaf based plant recognition. In IEEE International Conference on Image Processing, ICIP '13, pages 1496-1499, 2013.
[4] H. Cai, F. Yan, and K. Mikolajczyk. Learning weights for codebook in image classification and retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR '10, pages 2320-2327, June 2010.
[5] Y. Chai, V. Lempitsky, and A. Zisserman. BiCoS: A bi-level co-segmentation method for image classification. In IEEE International Conference on Computer Vision, ICCV '11, pages 2579-2586, 2011.
[6] S.-Y. Cho. Content-based structural recognition for flower image classification. In IEEE Conference on Industrial Electronics and Applications, ICIEA '12, pages 541-546, 2012.
[7] S. Deterding, D. Dixon, R. Khaled, and L. Nacke. From game design elements to gamefulness: Defining "gamification". In Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, MindTrek '11, pages 9-15, New York, NY, USA, 2011. ACM.
[8] S. Deterding, M. Sicart, L. Nacke, K. O'Hara, and D. Dixon. Gamification: using game-design elements in non-gaming contexts. In CHI '11 Extended Abstracts on Human Factors in Computing Systems, CHI EA '11, pages 2425-2428, New York, NY, USA, 2011. ACM.
[9] H. Goëau, A. Joly, P. Bonnet, V. Bakic, D. Barthélémy, N. Boujemaa, and J.-F. Molino. The ImageCLEF plant identification task 2013. In Proceedings of the 2nd ACM International Workshop on Multimedia Analysis for Ecological Data, MAED '13, pages 23-28, New York, NY, USA, 2013. ACM.
[10] R.-G. Huang, S.-H. Jin, Y.-L. Han, and K.-S. Hong. Flower image recognition based on image rotation and DIE. In International Conference on Digital Content, Multimedia Technology and its Applications, IDC '10, pages 225-228, 2010.
[11] S. Ito and S. Kubota. Object classification using heterogeneous co-occurrence features. In European Conference on Computer Vision, ECCV '10, pages 27-30, 2010.
[12] C. Kanan and G. Cottrell. Robust classification of objects, faces, and flowers using natural image statistics. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR '10, pages 2472-2479, 2010.
[13] F. S. Khan, J. van de Weijer, A. D. Bagdanov, and M. Vanrell. Portmanteau vocabularies for multi-cue image representation. In International Conference on Neural Information Processing Systems, 2011.
[14] J.-H. Kim, R.-G. Huang, S.-H. Jin, and K.-S. Hong. Mobile-based flower recognition system. In International Conference on Intelligent Information Technology Application, IITA '09, pages 580-583, 2009.
[15] N. Kumar, P. N. Belhumeur, A. Biswas, D. W. Jacobs, W. J. Kress, I. C. Lopez, and J. V. B. Soares. Leafsnap: A computer vision system for automatic plant species identification. In European Conference on Computer Vision, ECCV '12, pages 502-516, 2012.
[16] Y. Lin, S. Zhu, and A. Angelova. Image segmentation for large-scale subcategory flower recognition. In IEEE Workshop on Applications of Computer Vision, WACV '13, pages 39-45, 2013.
[17] D. Lowe. Object recognition from local scale-invariant features. In International Conference on Computer Vision, ICCV '99, pages 1150-1157, 1999.
[18] S. Mouine, I. Yahiaoui, and A. Verroust-Blondet. Plant species recognition using spatial correlation between the leaf margin and the leaf salient points. In IEEE International Conference on Image Processing, ICIP '13, 2013.
[19] S. Mouine, I. Yahiaoui, and A. Verroust-Blondet. A shape-based approach for leaf classification using multiscale triangular representation. In ACM International Conference on Multimedia Retrieval, ICMR '13, pages 127-134, 2013.
[20] M.-E. Nilsback and A. Zisserman. A visual vocabulary for flower classification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR '06, pages 1447-1454, 2006.
[21] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics & Image Processing, ICVGIP '08, pages 722-729, 2008.
[22] M.-E. Nilsback. An automatic visual Flora: segmentation and classification of flower images. PhD thesis, University of Oxford, UK, 2009.
[23] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51-59, 1996.
[24] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971-987, 2002.
[25] W. Qi, X. Liu, and J. Zhao. Flower classification based on local and spatial visual cues. In IEEE International Conference on Computer Science and Automation Engineering, CSAE '12, pages 670-674, 2012.
[26] X. Qi, R. Xiao, L. Zhang, C.-G. Li, and J. Guo. A rapid flower/leaf recognition system. In ACM International Conference on Multimedia, MM '12, pages 1257-1258, 2012.
[27] C. Rother, V. Kolmogorov, and A. Blake. "GrabCut": Interactive foreground extraction using iterated graph cuts. In ACM SIGGRAPH, SIGGRAPH '04, pages 309-314, 2004.
[28] H. Roy, M. Pocock, C. Preston, D. Roy, J. Savage, J. Tweddle, and L. Robinson. Understanding citizen science and environmental monitoring: final report on behalf of UK Environmental Observation Framework. Technical report, NERC/Centre for Ecology & Hydrology, Wallingford, 2012.
[29] T. Saitoh, K. Aoki, and T. Kaneko. Automatic recognition of blooming flowers. In International Conference on Pattern Recognition, ICPR '04, pages 27-30, 2004.
[30] F. Siraj, M. Salahuddin, and S. Yusof. Digital image classification for Malaysian blooming flower. In International Conference on Computational Intelligence, Modelling and Simulation, pages 33-38, 2010.
[31] J. Wang, A. Kalousis, and A. Woznica. Parametric local metric learning for nearest neighbor classification. In Neural Information Processing Systems (NIPS), pages 1610-1618, 2012.
[32] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207-244, June 2009.
[33] J. Zou and G. Nagy. Evaluation of model-based interactive flower recognition. In International Conference on Pattern Recognition, ICPR '04, 2004.