<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Non-local DenseNet for PlantCLEF 2019 Contest?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dat Nguyen Thanh</string-name>
          <email>datnt.hust59@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georges Quenot</string-name>
          <email>georges.quenot@imag.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorraine Goeuriot</string-name>
          <email>lorraine.goeuriot@imag.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Univ. Grenoble Alpes</institution>
          ,
          <addr-line>CNRS, Grenoble INP, LIG, F-38000 Grenoble</addr-line>
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Image-based plant identification is a promising tool supporting the automation of agriculture and environmental conservation. As an attempt to tackle the data-deficient challenge in PlantCLEF 2019, the DenseNet architecture, which offers competitive performance with a relatively low number of parameters, is augmented with a non-local block. A variety of data sampling schemes are also evaluated as part of this work. The evaluation of the model and the methods is detailed in the paper.</p>
      </abstract>
      <kwd-group>
<kwd>DenseNet</kwd>
        <kwd>Non-local block</kwd>
        <kwd>Plant Identification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Various types of plants grow all around us, yet few among us are plant experts. Indeed, knowing which plants are available and where they grow would be extremely helpful in pharmacy, from a production and academic perspective, and for environmental protection. With the rise of machine learning, artificial neural networks and convolutional neural networks are able to perform at near-human capability on image processing tasks; the popular use cases of such technologies are the automation of tasks at which humans already excel: face recognition, image classification, etc. Still, it would be highly beneficial if we could leverage these technologies for a task that most humans are yet to excel at: plant identification.</p>
<p>
        Image-based plant identification can be formulated as a classification problem, where the input is an image containing the plant and the output is the id of the plant pre-defined by the user. Formulated as an image-classification task, the problem in general has observed drastic improvement with deep-learning-based methods: the summary of PlantCLEF 2017 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] shows that the best competitors obtained over 90% accuracy using such methods. Notably, in the LifeCLEF 2018 contest [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], quite a number of systems achieved accuracy comparable to that of the top experts.
      </p>
<p>
        In this work, we present our proposed methods for PlantCLEF 2019 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], part of LifeCLEF 2019 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which focuses on 10,000 species from data-deficient regions. The rest of this paper is structured as follows: Section 2 gives an overview of related work on automatic plant identification with deep learning from previous contests, Section 3 describes the proposed architecture for prediction, Section 4 provides additional information on data augmentation and data sampling schemes, and finally we conclude our work in Section 5.
      </p>
<p>The source code and trained models are made available at: https://github.com/datvo06/PlantCLEF2019MRIM.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
<p>
        Ever since AlexNet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] won the ImageNet classification competition in 2012, Convolutional Neural Networks (CNNs) have always been at the center of image classification. Following AlexNet, there have been three lines of research focusing on CNNs: modifying the operations in the CNNs, dividing the networks into several sub-modules and improving each of them, and finally, altering the information flow by adding connections.
      </p>
<p>
        Fine-tuning modules and adding auxiliary loss The Inception models [
        <xref ref-type="bibr" rid="ref11">12,13,11</xref>
        ] follow the principle of repeating many carefully designed blocks of filters stacked horizontally (each receives the same input and the output feature maps are concatenated). With each new version of Inception Net, the authors typically optimize one of these blocks so that the number of computations, memory consumption, and number of parameters are reduced. Inception-v1 was used as the baseline of PlantCLEF 2017, achieving a Top-1 accuracy of 0.513.
      </p>
      <p>
        Adding residual connections One of the problems with plain deep neural networks is that the more layers are added, the more the model is prone to gradient vanishing. Various works have been proposed to amend this problem (e.g. LSTM [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for sequenced input, or the highway network [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which introduces a gated mechanism for ANNs); for convolutional neural networks, the residual additive connection proposed by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is one of them. The authors later carefully analyzed the effect of the ordering of operations inside each residual block, resulting in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a modified version of ResNet that was used in PlantCLEF 2017 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and achieved the best score among non-ensemble runs, with a Top-1 accuracy of 0.853.
      </p>
<p>
        Combining Inception and ResNet The Inception and ResNet designs were first merged in the Inception-ResNet design [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The network architecture is still based on the original principle of carefully designed blocks; the authors added residual connections to a few variants of the Inception blocks. Inception-ResNet-v2 achieved a score similar to the modified ResNet in PlantCLEF 2017 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], with MRR 0.847, Top-1 0.794 and Top-5 0.913, and was used by the majority of participants in PlantCLEF 2018.
      </p>
<p>
        Ensemble prediction The top performer of PlantCLEF 2017 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] utilized an ensemble of multiple predictions with bagged averaging; the models used were ResNet, ResNeXt [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and Inception-v1.
      </p>
<p>
        DenseNet As residual connections have been proven to allow better gradient flow and to boost the performance of convolutional neural networks, the DenseNet authors [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] tested the idea of densely connected layers. The model is capable of achieving state-of-the-art accuracy on classification tasks with a relatively low number of parameters, making it a potentially good baseline for the data-deficient context. For this reason we choose DenseNet as the baseline for our model.
      </p>
<p>Data Sampling Schemes To the best of our knowledge, there is little overlap between existing work on data sampling for training and the strategies proposed here.</p>
    </sec>
    <sec id="sec-3">
      <title>Model Architecture</title>
      <sec id="sec-3-1">
<title>Non-local Networks</title>
      </sec>
      <sec id="sec-3-2">
        <title>Adding Non-local operation to the DenseNet</title>
<p>The non-local operation is added between the output of the third dense block and the 1×1 channel-squashing convolution.</p>
<p>The non-local block was added after the third dense block for several reasons:
– First, in the original introduction of the non-local block [14], multiple non-local positions were tested, and the best position was after the third residual convolution block.
– Second, based on the mechanism of self-attention, the non-local block performs a pairwise dot product between transformations of every pair of pixels on the grid. That is why it is necessary to place a few convolution blocks before the non-local block, so that the operation may leverage information from local neighbors.</p>
<p>The non-local neural network [14] was proposed to address the limited information propagation of CNNs and LSTMs. The idea is to compute inter-pixel correlations between different positions in the feature maps, leading to more powerful pixel-wise representations. The non-local operation, according to [14], is defined as:</p>
        <p>y_i = (1 / C(x)) Σ_{∀j} f(x_i, x_j) g(x_j)    (1)</p>
        <p>where i is the index of a position in the output feature map (in space, time, or spacetime in the original case of video classification and annotation), j enumerates all positions of the input feature map x, f computes a scalar representing the pairwise relationship between the entities at positions i and j, g computes a representation of the input at position j, and C(x) is a normalization factor.</p>
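<p>To make the operation concrete, here is a minimal NumPy sketch of the non-local operation with a dot-product pairwise function and an identity embedding; this is our own illustration, not the paper's implementation, and the function and variable names are ours:</p>

```python
import numpy as np

def nonlocal_op(x):
    """Non-local operation on a flattened feature map.

    x: array of shape (n_positions, n_channels), one row per pixel
    position. The pairwise function is a dot product between the two
    positions' features, the per-position embedding is the identity,
    and the normalization factor C(x) is the number of positions.
    """
    n = x.shape[0]
    pairwise = x @ x.T        # (n, n): relationship of every pair (i, j)
    y = (pairwise @ x) / n    # y_i = (1/C(x)) * sum_j pairwise(i, j) * x_j
    return y

# Toy feature map: 4 positions, 3 channels
x = np.arange(12, dtype=float).reshape(4, 3)
y = nonlocal_op(x)
assert y.shape == x.shape    # output keeps the input shape
```

<p>Each output position thus aggregates information from all positions at once, rather than only from a local convolutional neighborhood.</p>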
<p>When the model is applied to produce the final predictions, each observed plant has multiple sample images and each image receives predictions from multiple trained model instances, so intermediate aggregation layers are needed to combine them. For this, a two-level pooling is leveraged: the first level of pooling ensembles the predictions of multiple independently trained model instances, and the second level aggregates the predictions over the multiple observations of the same plant.</p>
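<p>A minimal sketch of this two-level pooling (our own illustration; the array layout and names are assumptions):</p>

```python
import numpy as np

def two_level_pooling(scores, model_pool="mean", obs_pool="mean"):
    """Aggregate class scores of shape (n_models, n_observations, n_classes).

    Level 1 pools over the trained model instances; level 2 pools over
    the multiple observations (images) of the same plant.
    """
    pool = {"mean": np.mean, "max": np.max}
    per_observation = pool[model_pool](scores, axis=0)  # (n_obs, n_classes)
    return pool[obs_pool](per_observation, axis=0)      # (n_classes,)

# 4 trained instances, 3 observations of one plant, 5 candidate classes
rng = np.random.default_rng(0)
scores = rng.random((4, 3, 5))
final = two_level_pooling(scores, model_pool="mean", obs_pool="max")
assert final.shape == (5,)
```

<p>Mean and max pooling at either level correspond to the mean- and max-pooling variants compared in the final runs.</p>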
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <sec id="sec-4-1">
        <title>Data Augmentation</title>
<p>Several data augmentation strategies have been applied, such as randomly resizing the input images. The following notation is used in the data sampling schemes below:
– N: total number of samples
– n_i: number of samples of the i-th class
– o_i: oversampling factor for the i-th class
– w_i: sampling weight for the i-th class
– m: the median number of samples per class
– μ: the mean number of samples per class.</p>
<p>Minimum Threshold Resampling This strategy only focuses on augmenting the classes having fewer samples than the mean number of samples per class. Here, for each such class with n_i samples, the oversampling factor o_i is assigned the value μ/n_i.</p>
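<p>A sketch of this factor computation (our own illustration, not the authors' code):</p>

```python
import numpy as np

def min_threshold_factors(counts):
    """Oversampling factors: o_i = mu / n_i for classes below the mean,
    1.0 otherwise, where mu is the mean number of samples per class."""
    counts = np.asarray(counts, dtype=float)
    mu = counts.mean()
    return np.where(counts < mu, mu / counts, 1.0)

counts = [2, 5, 10, 100]               # samples per class; the mean is 29.25
factors = min_threshold_factors(counts)
# Small classes are oversampled up to the mean count; large ones are untouched
assert np.allclose(factors * counts, [29.25, 29.25, 29.25, 100.0])
```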
<p>The oversampling might make some samples appear many more times than others, making the model prone to overfitting; therefore, on each epoch, the samples of each class are reshuffled and resampled.</p>
<p>Another problem is that training times are prolonged due to the increased number of samples. For this, another strategy is also applied, described below.</p>
<p>Smoothed Re-sampling This strategy partly oversamples small classes while also subsampling classes with a large number of samples. All of the aforementioned parameters are constant during training. The total number of samples used on each training epoch is the original sample count N. On each epoch, each class is under-sampled or oversampled based on the weight w_i, and the total weight on one epoch is normalized so that the total number of samples always equals N: Σ_i w_i o_i n_i = N. We now turn to how the o_i and w_i factors are chosen. Taking m = 10 as an example, each class is first assigned the oversampling factor o_i; the oversampling ensures a minimal number of samples and diversity via data augmentation:
– o_i = 1 for n_i &gt; m (no oversampling beyond the median);
– o_i = (1 + m/n_i)/2 for n_i ≤ m (oversampling to an effective count linearly interpolated between about m/2 and m).</p>
<p>Oversampling reduces the imbalance from about 1000:1 to about 100:1. Weighting further ensures better balancing using a power law with exponent α:
w_i = (o_i n_i)^(α−1).</p>
<p>– With α = 1.0: no weighting, the original case (except for the oversampling effect).
– With α = 0.5: weighting further reduces the imbalance from about 100:1 to about 10:1.
– With α = 0.25: weighting further reduces the imbalance from about 100:1 to about 3:1.
– With α = 0.15: weighting further reduces the imbalance from about 100:1 to about 2:1.</p>
<p>In all cases, the weights are re-normalized (each w_i is divided by the same constant so that Σ_i w_i o_i n_i = N).</p>
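<p>The whole scheme (oversampling factors, power-law weights, and re-normalization so that the per-epoch total stays N) can be sketched as follows; this is our own illustration of the formulas above, with alpha the power-law exponent:</p>

```python
import numpy as np

def smoothed_resampling(counts, alpha=0.5):
    """Expected per-class sample counts w_i * o_i * n_i, normalized to N."""
    n = np.asarray(counts, dtype=float)
    N = n.sum()
    m = np.median(n)
    # o_i = 1 above the median, (1 + m/n_i)/2 at or below it
    o = np.where(n > m, 1.0, (1.0 + m / n) / 2.0)
    # Power-law weighting: w_i = (o_i * n_i) ** (alpha - 1)
    w = (o * n) ** (alpha - 1.0)
    expected = w * o * n                      # equals (o_i * n_i) ** alpha
    return expected * (N / expected.sum())    # re-normalize so the sum is N

counts = [1, 10, 1000]                        # heavily imbalanced classes
expected = smoothed_resampling(counts, alpha=0.5)
assert np.isclose(expected.sum(), sum(counts))
# With alpha = 0.5 the residual imbalance is roughly the square root
# of the post-oversampling imbalance (about 10:1 instead of 100:1)
assert expected.max() / expected.min() < 20
```

<p>Setting alpha = 1.0 reproduces the unweighted case, while smaller exponents flatten the distribution further.</p>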
      </sec>
      <sec id="sec-4-2">
        <title>Experiment Results on the PlantCLEF 2017</title>
<p>All the candidate models were first trained on the PlantCLEF 2017 data for preliminary testing before being used on PlantCLEF 2019. The models are trained on the EOL set and tested on the Web set with the data augmentation strategies mentioned in Subsection 4.1. The results are shown in Table 1.</p>
<p>It can easily be seen that the non-local addition brings an increase in accuracy for both DenseNet-121 and DenseNet-201, and that DenseNet slightly outperforms ResNet.</p>
        <p>Initial result The models are further tested on the PlantCLEF 2019 dataset. The initial results are shown in Table 2, where we observe a drastic performance drop. Further inspection of the dataset reveals some challenging properties:
1. The classes are imbalanced.
2. Repeated samples across classes make learning harder.
3. Noisy samples.</p>
      </sec>
      <sec id="sec-4-3">
<title>Experiment Results on the Class-Filtered PlantCLEF 2019</title>
        <p>We first test the effects of the following strategies:
– Temporarily removing all the classes with fewer than 5 samples;
– Further removing noisy or incorrectly formatted images.</p>
<p>The result is an 8500-class dataset that still contains over 400,000 samples. The evaluation of the model is shown in Table 3. The results do not show much difference.</p>
      </sec>
      <sec id="sec-4-4">
<title>Experiment Results on the Repetition-Filtered PlantCLEF 2019</title>
        <p>Further experiments are performed on the dataset with different thresholds for repetition. The following training/validation split strategy is applied: for each class, at least ⌈n_samples/5⌉ samples are taken as part of the validation set; if a class has only one sample, the training set for that class is empty. Here, the minimum threshold resampling is applied. The evaluation results are shown in Table 4. It can easily be seen that removing all repetitions arising from duplication creates empty classes, which heavily differentiates the training and validation sets, making it hard to validate the model.</p>
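<p>The per-class split rule can be sketched as follows (our own illustration; the function name is an assumption):</p>

```python
import math

def split_class(sample_ids):
    """Reserve at least ceil(n/5) samples of a class for validation;
    the remainder forms the training set."""
    n_val = math.ceil(len(sample_ids) / 5)
    return sample_ids[n_val:], sample_ids[:n_val]  # (train, val)

train, val = split_class(list(range(7)))
assert len(val) == 2 and len(train) == 5
# A single-sample class keeps its only sample for validation,
# leaving the training set for that class empty
train1, val1 = split_class([0])
assert train1 == [] and val1 == [0]
```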
      </sec>
      <sec id="sec-4-5">
<title>Experiment Results on the Conditionally Repetition-Filtered PlantCLEF 2019</title>
<p>The final attempt at filtering the dataset involves filtering out all repeated samples unless doing so would create an empty class. The statistics of the resulting dataset are stated in Table 5.</p>
<p>With all the repeated samples trimmed, the distribution is still quite imbalanced; Figure 5 shows the distributions obtained with the Smoothed Re-sampling strategy. Since this is the final try, the whole dataset has to be used for training; therefore, other external datasets have to be used for testing. Further inspection of the PlantCLEF 2017 dataset reveals that there are 551 categories common to the PlantCLEF 2017 and PlantCLEF 2019 datasets. Those samples are sorted by size and filtered to avoid having them in the training set. The statistics of the dataset are shown in Table 6.</p>
<p>(Figure panels: (a) α = 0.5, (b) α = 0.25.)</p>
<p>The final results obtained on these datasets before submission testing are described in Table 7.</p>
<p>We can see that, with the same model trained for the same number of epochs, the filtering strategies make a difference: the ensemble of 4 models trained with α = 0.5 gives the best performance. The model trained with all the data from PlantCLEF 2019 is also evaluated and compared.</p>
<p>Final Test Results The final results are given by the top-1 accuracy on the test dataset and on the subset hand-picked by experts. The detail of each run is given in Table 8. The best top-1 accuracy on the expert-chosen sample set is achieved with the mean of 4 instances trained with α = 0.25 and two mean poolings, and the best top-1 accuracy on all samples with α = 0.5 and two max poolings.</p>
<p>Plant identification is an important step in medical, agricultural and environmental resource planning. However, the problem currently remains challenging for both humans and computer-vision-based technologies, even with the development of deep learning. With the data-deficient challenge, the problem is even harder to conquer. This work aims to provide a decently performing model, validated with extensive experiments, along with a variety of data-handling strategies, yet it still cannot solve the whole problem. One remaining problem is avoiding bias between classes belonging to the same genus; this could perhaps be addressed by adding hierarchical classification, where the system first identifies the genus and then the species. The data-deficient challenge still needs to be tackled, either by leveraging unsupervised or semi-supervised learning methods. From the model-design perspective, the authors believe that the model can potentially be improved by adding inter-channel correlations in the non-local block.</p>
        <p>12. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going Deeper with Convolutions. pp. 1-12 (Sep 2014), http://arxiv.org/abs/1409.4842</p>
        <p>13. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception Architecture for Computer Vision (Dec 2015), http://arxiv.org/abs/1512.00567</p>
        <p>14. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local Neural Networks. Tech. rep. (2018)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
, Hervé Goëau,
          <string-name>
            <surname>C.B.S.K.M.S.H.G.P.B.W.P.V.R.P.F.R.S.H.M.</surname>
          </string-name>
<article-title>: Overview of LifeCLEF 2019: Identification of Amazonian plants, South &amp; North American birds, and niche prediction</article-title>
          .
          <source>In: Proceedings of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
<article-title>Plant identification based on noisy web data: the amazing performance of deep learning (LifeCLEF 2017)</article-title>
          .
          <source>CEUR Workshop Proceedings 1866(LifeCLEF)</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
<article-title>Overview of ExpertLifeCLEF 2018: how far automated identification systems are from the best experts</article-title>
          ?
          <source>CEUR Workshop Proceedings</source>
          <volume>2125</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
<article-title>Overview of LifeCLEF plant identification task 2019: diving into data-deficient tropical countries</article-title>
          .
          <source>In: CLEF working notes 2019</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          .
          <source>In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          . pp.
          <volume>770</volume>
          {
          <fpage>778</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (jun
          <year>2016</year>
). https://doi.org/10.1109/CVPR.2016.90, http://ieeexplore.ieee.org/document/7780459/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Identity mappings in deep residual networks</article-title>
          .
<source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9908 LNCS</source>
          ,
          <volume>630</volume>
          {
          <fpage>645</fpage>
          (
          <year>2016</year>
). https://doi.org/10.1007/978-3-319-46493-0_38
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
7.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long Short-Term Memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (Nov
          <year>1997</year>
          ). https://doi.org/10.1162/neco.1997.9.8.1735, http://www.mitpressjournals.org/doi/10.1162/neco.1997.9.8.1735
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Der Maaten</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Densely connected convolutional networks</article-title>
          .
<source>In: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017</source>
          (
          <year>2017</year>
          ). https://doi.org/10.1109/CVPR.2017.243
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
<article-title>ImageNet Classification with Deep Convolutional Neural Networks</article-title>
          .
          <source>Proceedings of the 25th International Conference on Neural Information Processing Systems</source>
<volume>1</volume>
          ,
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          (
          <year>2012</year>
          ), http://dl.acm.org/citation.cfm?id=2999134.2999257
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lasseck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
<article-title>Image-based plant species identification with deep Convolutional Neural Networks</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <year>1866</year>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
, Ioffe, S.,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
<article-title>Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning</article-title>
          (
          <year>2016</year>
          ), http://arxiv.org/abs/1602.07261
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>