Recognition of the Amazonian flora by Inception Networks with Test-time Class Prior Estimation

CMP submission to PlantCLEF 2019

Lukáš Picek (1), Milan Šulc (2), and Jiří Matas (2)

(1) Dept. of Cybernetics, Faculty of Applied Sciences, University of West Bohemia
    picekl@kky.zcu.cz
(2) Visual Recognition Group, Faculty of Electrical Engineering, Czech Technical University in Prague
    {sulcmila,matas}@cmp.felk.cvut.cz

Abstract. The paper describes an automatic system for the recognition of 10,000 plant species, with a focus on species from the Guiana Shield and the Amazon rain forest. The proposed system achieves the best results on the PlantCLEF 2019 test set with 31.9% accuracy. Compared against human experts in plant recognition, the system performed better than 3 of the 5 participating human experts and achieved 41.0% accuracy on the subset for expert evaluation. The proposed system is based on the Inception-v4 and Inception-ResNet-v2 Convolutional Neural Network (CNN) architectures. Performance improvements were achieved by: adjusting the CNN predictions according to the estimated change of the class prior probabilities, replacing network parameters with their running averages, test-time data augmentation, filtering the provided training set, and adding additional training images from GBIF.

Keywords: Plant Recognition, Computer Vision, Convolutional Neural Networks, Machine Learning, Class Prior Estimation, Fine-grained, Classification

1 Introduction

The paper describes an automatic system for visual recognition of plants among 10,000 species, developed for the PlantCLEF 2019 plant identification challenge [4] organized in connection with the LifeCLEF 2019 workshop [5] at the Conference and Labs of the Evaluation Forum. Compared to previous PlantCLEF challenges [1,2,3], which contained mainly species living in Europe and North America, the 2019 task is focused on the recognition of species from "data deficient regions", mainly the Guiana Shield and the Amazon rain forest.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland.

The proposed approach is based on CMP's winning submission to PlantCLEF 2018 [11]. Checkpoints of our models from PlantCLEF 2018 were shared with the other participants of PlantCLEF 2019 in order to provide a good starting point for all participants.

Fig. 1. Comparison of automatic plant recognition methods against human experts. The results of our method are shown in red as "Post Challenge" (our results submitted at the challenge deadline, shown in orange, were wrongly exported).

2 Methodology

2.1 Cleaning and extending the training dataset

The PlantCLEF 2019 training set covers 10,000 species and consists of:

• PlantCLEF 2019 EOL: 72,260 images covering 4,197 classes, from the Encyclopedia of Life (http://www.eol.org).
• PlantCLEF 2019 Google: 68,254 images covering 6,262 classes, automatically retrieved by web search engines.
• PlantCLEF 2019 Bing: 307,557 images covering 8,666 classes, automatically retrieved by web search engines.

The average number of images per species decreased dramatically compared to PlantCLEF 2018: one fifth of the species have fewer than 10 images, and some have only a single image.

Fig. 2. Randomly selected images from the LifeCLEF 2019 training set (top) and test set (bottom).
A brief manual inspection showed that the provided training set contains noisy samples: wrongly labeled images, including images of non-flora objects. Examples of noisy samples are in Figure 3. We therefore detected non-flora images with a pre-trained Darknet53 448×448 [8] classifier. Out of the 428,702 images in the official training set, we removed 6,181 images detected as non-flora. The training data still lacked images for approximately 2,000 classes, so we gathered additional training images to fill that gap. We created a new training set including external training data downloaded from GBIF (http://www.gbif.org/), described in Table 1. For full reproducibility, a list of removed samples as well as an archive with the additional training images are shared at http://cmp.felk.cvut.cz/~sulcmila/LifeCLEF2019/. Changes in the dataset statistics are visualized in Figure 4.

Fig. 3. Randomly selected noisy images from the LifeCLEF 2019 training set.

To make sure that none of the additional training images downloaded from GBIF (or their resized or cropped versions) appear in the test set, we used the image retrieval pipeline of Radenović et al. [7] with VGG-16 and whitening. The nearest neighbours of the test images among the downloaded images are visualized in Figure 5.

Fig. 4. Numbers of training images per class in the original dataset (blue), the cleaned dataset (orange), and the cleaned and extended dataset (green), sorted for each dataset separately.

Table 1. Training data (after cleaning and extending the provided training set) used in the experiments.

  Data Source         Classes   Non-EOL classes   Number of Images
  EOL                   4,197             0             58,548
  Noisy Google          6,262         3,800             64,863
  Noisy Bing            8,666         5,069            305,291
  GBIF (additional)     9,402         5,734            238,009
  All                   9,998         5,801            666,711

2.2 Convolutional Neural Networks

The proposed system is based on two CNN architectures, Inception ResNet v2 and Inception v4 [12]. The TensorFlow-Slim API was used to adjust and fine-tune the networks from the publicly available PlantCLEF 2018 winning checkpoints (http://cmp.felk.cvut.cz/~sulcmila/LifeCLEF2018/). All networks in our experiments shared the optimizer settings enumerated in Table 2. The networks and their input resolutions are listed in Table 3.

Fig. 5. Six nearest pairs of test set images (top) and GBIF images (bottom).

Table 2. Optimizer hyper-parameters, common to all networks in the experiments.

  Parameter                    Value
  Batch size                   32
  Optimizer                    RMSProp
  RMSProp momentum             0.9
  RMSProp decay                0.9
  Initial learning rate        0.0075
  Learning rate decay type     Exponential (staircase)
  Learning rate decay factor   0.975

The following image pre-processing techniques were used for training:

• Random image crop with aspect ratio in the range (0.75, 1.33), covering at least 80% of the original image.
• Random left-right flip.
• Brightness and saturation distortion.

Table 3. Networks and input resolutions used in the experiments.

  #  Network architecture          Input resolution
  1  Inception v4                  299 × 299
  2  Inception v4 (second)         299 × 299
  3  Inception v4                  598 × 598
  4  Inception ResNet v2           299 × 299
  5  Inception ResNet v2 (second)  299 × 299

2.3 Test-time data augmentation

At test time, 3 predictions per image are generated by using 3 crops:

• 1x the full image,
• 1x a central crop covering 80% of the original image,
• 1x a central crop covering 60% of the original image.

In submissions 4, 5, 6 and 7, the mirrored versions of all three crops were also evaluated, as illustrated in the sketch below.
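As an illustration of the augmentation scheme above, the following minimal NumPy sketch generates the three crops (plus optional mirrors) and averages the class posteriors over them. The function `predict_probs` is a hypothetical placeholder standing in for the resize-and-forward pass of one of our CNNs; it is not part of the released code.

```python
import numpy as np

def tta_crops(image, mirror=False):
    """Generate the test-time views from Section 2.3: the full image
    plus central crops covering 80% and 60% of it (and, optionally,
    the left-right mirrored version of each crop)."""
    h, w = image.shape[:2]
    crops = []
    for scale in (1.0, 0.8, 0.6):
        ch, cw = int(h * scale), int(w * scale)
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = image[top:top + ch, left:left + cw]
        crops.append(crop)
        if mirror:
            crops.append(crop[:, ::-1])  # left-right flip
    return crops

def predict_with_tta(image, predict_probs, mirror=False):
    """Average the class posteriors over all augmented views.
    `predict_probs` maps one image to a softmax output vector."""
    probs = [predict_probs(crop) for crop in tta_crops(image, mirror)]
    return np.mean(probs, axis=0)
```

The same averaging is applied over the networks of the ensemble (Section 3), so the final prediction for a test image is the mean softmax output over all networks and all crops.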
2.4 Adjusting Class Priors at Test Time

The training set data distribution is highly unbalanced, and we cannot guarantee that the test images were drawn from the same distribution: as described in Section 2.1, the training set comes from different sources, where the class frequencies may not correspond to the test-time priors.

Following the notation of [10], the predictions p(c_k|x_i) of a network trained on a dataset with class prior probabilities p(c_k) should be corrected in case of evaluation on a test set with different class priors p_e(c_k):

p_e(c_k \mid x_i) = \frac{\frac{p_e(c_k)}{p(c_k)}\, p(c_k \mid x_i)}{\sum_{j=1}^{K} \frac{p_e(c_j)}{p(c_j)}\, p(c_j \mid x_i)} \propto \frac{p_e(c_k)}{p(c_k)}\, p(c_k \mid x_i)    (1)

Since the test-time priors p_e(c_j) are unknown, we consider three different estimates of the test-time priors for adjusting the predictions:

UNIFORM: As the simplest option, we adjust the test predictions by assuming a uniform prior for all classes.

MLE: As the second option, we compute a Maximum Likelihood Estimate of the test-time prior p_e(c_k) using the EM algorithm of Saerens et al. [9], comprising the following two steps:

E: p_e^{(s)}(c_k \mid x_i) = \frac{\frac{p_e^{(s)}(c_k)}{p(c_k)}\, p(c_k \mid x_i)}{\sum_{j=1}^{K} \frac{p_e^{(s)}(c_j)}{p(c_j)}\, p(c_j \mid x_i)}    (2)

M: p_e^{(s+1)}(c_k) = \frac{1}{N} \sum_{i=1}^{N} p_e^{(s)}(c_k \mid x_i)    (3)

MAP: As the third option, we use the Maximum a Posteriori estimate proposed in [10]:

P_{\mathrm{MAP}} = \arg\max_{P} p(P \mid x_1, \ldots, x_N)
                 = \arg\max_{P} p(P) \prod_{i=1}^{N} p(x_i \mid P)
                 = \arg\max_{P} \left[ \log p(P) + \sum_{i=1}^{N} \log p(x_i \mid P) \right]
    \text{s.t.} \quad \sum_{k=1}^{K} P_k = 1; \quad \forall k: P_k \ge 0    (4)

We model the prior knowledge about the categorical distribution p_e(c_k) by the symmetric Dirichlet distribution:

p(P) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} P_k^{\alpha - 1}    (5)

where the normalization factor for the symmetric case is B(\alpha) = \frac{\Gamma(\alpha)^K}{\Gamma(\alpha K)}. As in [10], we use \alpha = 3.
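The three adjustment options can be sketched in a few lines of NumPy. The following is a minimal illustration of Eqs. (1)-(5), not the authors' released implementation; in particular, the MAP variant is realized here as an EM-style iteration that adds (α − 1) pseudo-counts per class in the M-step, one standard way of maximizing the Dirichlet-regularized likelihood in Eq. (4).

```python
import numpy as np

def adjust_priors(probs, new_priors, train_priors):
    """Re-weight the posteriors p(c_k|x_i) by the prior ratio and
    re-normalize, following Eq. (1). probs: (N, K) array of CNN
    softmax outputs; the priors are length-K vectors."""
    adjusted = probs * (new_priors / train_priors)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

def em_test_priors(probs, train_priors, n_iters=100):
    """MLE of the test-time priors via the EM algorithm of
    Saerens et al., Eqs. (2)-(3), initialized with uniform priors."""
    priors = np.full(probs.shape[1], 1.0 / probs.shape[1])
    for _ in range(n_iters):
        posteriors = adjust_priors(probs, priors, train_priors)  # E-step
        priors = posteriors.mean(axis=0)                          # M-step
    return priors

def map_test_priors(probs, train_priors, alpha=3.0, n_iters=100):
    """MAP estimate of the test-time priors under a symmetric
    Dirichlet hyperprior, Eqs. (4)-(5): the M-step adds (alpha - 1)
    pseudo-counts per class (a sketch, hedged as described above)."""
    n, k = probs.shape
    priors = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        posteriors = adjust_priors(probs, priors, train_priors)
        priors = (posteriors.sum(axis=0) + alpha - 1.0) / (n + k * (alpha - 1.0))
    return priors
```

With `probs` holding the CNN predictions for the whole test set and `train_priors` the class frequencies of the training set, passing a uniform vector as `new_priors` to `adjust_priors` corresponds to the UNIFORM option, while feeding the outputs of `em_test_priors` and `map_test_priors` back into `adjust_priors` corresponds to the MLE and MAP runs in Table 4.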
3 Results

Table 4 describes the eight final runs used for the evaluation. An ensemble of all five networks from Section 2.2 was used in all runs, and predictions were averaged over all networks and all test image augmentations from Section 2.3.

Table 4. Description of our (corrected/post-challenge) submissions.

  Name         Test-time augm.     Prior est.   Top1 All   Top1 Exp.   Top5 All   Top5 Exp.
  CMP Run 2    3×scale             (none)       0.244      0.325       0.356      0.410
  CMP Run 3    3×scale             uniform      0.247      0.316       0.360      0.419
  CMP Run 4    3×scale             MAP          0.301      0.402       0.453      0.573
  CMP Run 5    3×scale             MLE          0.307      0.402       0.451      0.573
  CMP Run 6    3×scale + mirrors   (none)       0.311      0.402       0.454      0.538
  CMP Run 7    3×scale + mirrors   uniform      0.311      0.410       0.461      0.564
  CMP Run 4*   3×scale + mirrors   MAP          0.319      0.402       0.468      0.581
  CMP Run 5*   3×scale + mirrors   MLE          0.319      0.410       0.470      0.581

The evaluation results are shown in Figures 1 and 6. Of the class prior estimation methods, MAP estimation with the Dirichlet hyperprior achieves the best results. This corresponds to the results of [10], where adding the hyperprior brought a noticeable improvement over the MLE estimate, which may have a tendency to overfit. Note that the results in Table 4 are the official post-challenge evaluation, not included in the challenge leaderboard, as our predictions were wrongly exported into the challenge run-files.

Fig. 6. Comparison of automatic plant recognition methods on the PlantCLEF 2019 test set. "Post Challenge" submissions are marked with a red border.

4 Conclusions

The proposed system achieves the best accuracy on the PlantCLEF 2019 test set: 31.9% on the full set and 41.0% on the test subset for plant identification experts. The results show that even for "data-deficient" plant species, automatic image recognition systems achieve human expert accuracy in visual recognition of plants: the proposed method performed better than 3 of the 5 participating experts in plant recognition.

Although the results are promising, there are many opportunities for further improvement of automatic plant recognition systems for data-deficient species, such as one-shot learning and open long-tailed recognition [6] methods.

The increasing precision of automated plant recognition methods should allow better assistance to both nature lovers and biological experts in the field. For example, showing a shortlist of potential species candidates can decrease the time needed for a decision and potentially increase the recognition rate.

Acknowledgements

LP was supported by the UWB project No. SGS-2019-027. MŠ and JM were supported by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics". We would like to thank Tomáš Jeníček for his assistance with the image retrieval pipeline in Section 2.1.

References

1. Goëau, H., Bonnet, P., Joly, A.: Plant identification in an open-world (LifeCLEF 2016). In: CLEF Working Notes 2016 (2016)
2. Goëau, H., Bonnet, P., Joly, A.: Plant identification based on noisy web data: the amazing performance of deep learning (LifeCLEF 2017). In: CEUR Workshop Proceedings (2017)
3. Goëau, H., Bonnet, P., Joly, A.: Overview of ExpertLifeCLEF 2018: how far automated identification systems are from the best experts? In: CLEF Working Notes 2018 (2018)
4. Goëau, H., Bonnet, P., Joly, A.: Overview of LifeCLEF plant identification task 2019: diving into data deficient tropical countries. In: CLEF Working Notes 2019 (2019)
5. Joly, A., Goëau, H., Botella, C., Kahl, S., Servajean, M., Glotin, H., Bonnet, P., Vellinga, W.P., Planqué, R., Stöter, F.R., Müller, H.: Overview of LifeCLEF 2019: Identification of Amazonian plants, South & North American birds, and niche prediction. In: Proceedings of CLEF 2019 (2019)
6. Liu, Z., Miao, Z., Zhan, X., Wang, J., Gong, B., Yu, S.X.: Large-scale long-tailed recognition in an open world. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2537–2546 (2019)
7. Radenović, F., Tolias, G., Chum, O.: Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018)
8. Redmon, J.: Darknet: Open source neural networks in C. http://pjreddie.com/darknet/ (2013–2016)
9. Saerens, M., Latinne, P., Decaestecker, C.: Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure. Neural Computation 14(1), 21–41 (2002)
10. Sulc, M., Matas, J.: Improving CNN classifiers by estimating test-time priors. arXiv preprint arXiv:1805.08235v2 (2019)
11. Sulc, M., Picek, L., Matas, J.: Plant recognition by Inception networks with test-time class prior estimation. In: CLEF Working Notes 2018 (2018)
12. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)