=Paper=
{{Paper
|id=Vol-2936/paper-142
|storemode=property
|title=Weighted Pseudo Labeling Refinement for Plant Identification
|pdfUrl=https://ceur-ws.org/Vol-2936/paper-142.pdf
|volume=Vol-2936
|authors=Youshan Zhang,Brian Davison
|dblpUrl=https://dblp.org/rec/conf/clef/ZhangD21
}}
==Weighted Pseudo Labeling Refinement for Plant Identification==
Youshan Zhang, Brian D. Davison
Lehigh University, 113 Research Drive, Bethlehem, PA, 18015

Abstract

Unsupervised domain adaptation (UDA) focuses on transferring knowledge from a labeled source domain to an unlabeled target domain. However, existing domain adaptation methods struggle in scenarios with imbalanced labels or datasets with a large domain discrepancy. In this paper, we propose a weighted pseudo labeling refinement model (WPLR) that balances the dataset using a weighted cross-entropy loss. We also utilize the CORAL loss to further reduce the domain difference. To improve the generalizability of the model, we develop an easy-to-hard pseudo labeling refinement process based on probabilistic soft selection to suppress noisy predicted target labels. Experimental results demonstrate that our WPLR model yields promising results on the PlantCLEF 2021 Challenge.

Keywords: Unsupervised domain adaptation, Pseudo labeling refinement, Plant identification

1. Introduction

Automatic plant identification helps a general audience recognize plant species without the expertise of botanists. Deep neural networks can improve recognition performance when a large amount of labeled data is available for training, but they suffer significant performance degradation when deployed in a new domain because of domain shift. This domain shift (or domain mismatch) problem arises in the PlantCLEF plant identification task: due to the significant difference between herbarium sheets and real photos, classification models often do not generalize well to the novel field photo domain.

To circumvent the domain shift issue, unsupervised domain adaptation (UDA) methods have been proposed, which transfer a model trained on a labeled source domain to an unlabeled target domain. Existing deep learning methods can be categorized into two major tracks: discrepancy-based methods [1, 2, 3] and adversarial learning methods [4, 5, 6]. The former align the distributions of the source and target domains by directly minimizing a difference metric between the feature distributions of the two domains, such as Maximum Mean Discrepancy (MMD) [1], CORrelation ALignment [2], Kullback-Leibler divergence [3], Jensen-Shannon divergence [7], and Wasserstein distance [8]. The latter category is inspired by GANs [9], and adversarial learning has shown its power in learning domain-invariant representations. Such a model consists of a domain discriminator and a feature extractor: the domain discriminator aims to distinguish the source domain from the target domain, while the feature extractor aims to learn domain-invariant representations that fool the domain discriminator [4, 5, 6]. There has also been much exploration of adversarial learning methods, such as DANN [10], MCD [11], TADA [12], SymNets [13], and ACDA [14].
Although many methods have been proposed for domain adaptation, most of them are tested on datasets with small domain divergence and may transfer poorly to large-divergence datasets; moreover, the data imbalance problem is not well addressed. To address these challenges, we offer two contributions:

1. We propose a weighted cross-entropy loss to balance the categorical data. To minimize the domain divergence, we utilize the existing CORAL loss.
2. To remove noisy pseudo labels in the target domain, we employ an easy-to-hard pseudo labeling refinement process based on probabilistic soft selection. We then form a high-quality pseudo-labeled target domain to improve the generalizability of the model.

2. Dataset

PlantCLEF 2021 is a large-scale dataset for the PlantCLEF 2021 task [15, 16], organized in the context of the LifeCLEF 2021 challenge. Fig. 1 shows some challenging images in this dataset, and Tab. 1 lists its statistics. Due to the significant difference between herbarium sheets and real photos, it is extremely difficult to identify the correct class. All images are the same as in the PlantCLEF 2020 dataset [17], but the 2021 task also introduces five "traits" that exhaustively cover all species of the challenge.

Figure 1: Example images of the herbarium domain and photo domain. The large discrepancy between the two domains makes it difficult to improve the performance of the model.

Table 1: Statistics of the PlantCLEF 2021 dataset

Domain                               | Number of Samples | Number of Classes
Herbarium (H)                        | 320,750           | 997
Herbarium_photo_associations (A)    | 1,816             | 244
Photo (P)                            | 4,482             | 375
Test (T)                             | 3,186             | -

3. Methods

In this section, we first introduce the problem and notation for UDA, and then describe the different components of our Weighted Pseudo Labeling Refinement (WPLR) model.

3.1. Problem and notation

We consider the unsupervised domain adaptation (UDA) classification problem in the following setting. There exists a labeled source domain $\mathcal{D}_\mathcal{S} = \{\mathcal{X}_{\mathcal{S}_i}, \mathcal{Y}_{\mathcal{S}_i}\}_{i=1}^{\mathcal{N}_\mathcal{S}}$ of $\mathcal{N}_\mathcal{S}$ labeled samples in $C$ categories and a target domain $\mathcal{D}_\mathcal{T} = \{\mathcal{X}_{\mathcal{T}_j}\}_{j=1}^{\mathcal{N}_\mathcal{T}}$ of $\mathcal{N}_\mathcal{T}$ samples without any labels (i.e., $\mathcal{Y}_\mathcal{T}$ is unknown). The samples $\mathcal{X}_\mathcal{S}$ and $\mathcal{X}_\mathcal{T}$ obey the marginal distributions $P_\mathcal{S}$ and $P_\mathcal{T}$, and the conditional distributions of the two domains are denoted $Q_\mathcal{S}$ and $Q_\mathcal{T}$. Due to the discrepancy between the two domains, the distributions are assumed to be different, i.e., $P_\mathcal{S} \neq P_\mathcal{T}$ and $Q_\mathcal{S} \neq Q_\mathcal{T}$. Our ultimate goal is to learn a classifier $F$ on top of a feature extractor $G$ that reduces the domain discrepancy and improves the generalization ability of the classifier to the target domain.

Figure 2: The weight of each class.

Figure 3: Architecture of the WPLR model. We first utilize NASNetLarge as the feature extractor $G$ to extract features from the two domains ($G(\mathcal{X}_\mathcal{S})$ and $G(\mathcal{X}_\mathcal{T})$). The shared classifier $F$ is then trained using the extracted features. $\mathcal{L}_{\mathcal{WS}}$ is the weighted source classification loss, $\mathcal{L}_{CORAL}$ is the CORAL loss, and $\mathcal{L}_\mathcal{T}$ is the pseudo-labeled target domain classification loss. $\{Q(\mathcal{X}_{\mathcal{T}_j}), Q(\mathcal{Y}_{\mathcal{T}_j})\}_{j=1}^{n_{p_t}}$ is the pseudo-labeled target domain after $T$ rounds of the pseudo labeling refinement process. Best viewed in color.

3.2. Weighted source classifier

The task in the source domain is trained using the typical cross-entropy loss. However, the number of samples per category is imbalanced. To handle this issue, we develop a weighted source classifier that balances the weight of each category based on the source samples. We define the weight of each class as

$$W = \frac{\mathrm{median}\left(\{\mathcal{N}_{\mathcal{S}_c}/\mathcal{N}_\mathcal{S}\}_{c=1}^{C}\right)}{\{\mathcal{N}_{\mathcal{S}_c}/\mathcal{N}_\mathcal{S}\}_{c=1}^{C}}, \quad (1)$$

where $\mathcal{N}_{\mathcal{S}_c}$ is the number of samples in class $c$, $\{\mathcal{N}_{\mathcal{S}_c}/\mathcal{N}_\mathcal{S}\}_{c=1}^{C} \in \mathbb{R}^{997 \times 1}$ is the frequency of images in each class, and $\mathrm{median}(\cdot)$ takes the median of these frequencies. Because the frequency values vary widely, the median represents the middle frequency better than the mean would. Fig. 2 shows the weight of each class (997 classes in total). We therefore define the weighted cross-entropy loss for the labeled source domain as

$$\mathcal{L}_{\mathcal{WS}} = \frac{1}{\mathcal{N}_\mathcal{S}} \sum_{i=1}^{\mathcal{N}_\mathcal{S}} W_i \times \mathcal{L}_{ce}\left(F(G(\mathcal{X}_{\mathcal{S}_i})), \mathcal{Y}_{\mathcal{S}_i}\right), \quad (2)$$

where $\mathcal{L}_{ce}$ is the typical cross-entropy loss, $F$ is the classifier in Fig. 3, and $F(G(\mathcal{X}_{\mathcal{S}_i}))$ is the predicted label.
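As a concrete illustration, the class weighting of Eqs. 1-2 could be implemented in PyTorch roughly as follows. This is a minimal sketch of ours, not the paper's released code; the helper name `median_frequency_weights` and the random class counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

def median_frequency_weights(class_counts: torch.Tensor) -> torch.Tensor:
    """Eq. 1: weight each class by median(frequency) / frequency."""
    freq = class_counts.float() / class_counts.sum()  # per-class frequency
    return freq.median() / freq                       # rare classes get weight > 1

# Hypothetical setup: 997 source classes with imbalanced counts.
class_counts = torch.randint(10, 5000, (997,))
W = median_frequency_weights(class_counts)

# Eq. 2: weighted cross-entropy over source predictions F(G(x)).
# Note: PyTorch's default 'mean' reduction divides by the summed sample
# weights rather than by N_S, a slight difference from Eq. 2's 1/N_S factor.
criterion = nn.CrossEntropyLoss(weight=W)
logits = torch.randn(64, 997)           # F(G(X_S)) for a batch
labels = torch.randint(0, 997, (64,))   # Y_S
loss_ws = criterion(logits, labels)
```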
3.3. CORAL loss

The CORrelation ALignment (CORAL) loss [2] is a frequently used distance-based loss function for minimizing the difference between the source and target domains. We integrate the CORAL loss during training as follows:

$$\mathcal{L}_{CORAL} = \frac{1}{4d^2} \left\| \mathrm{COV}\left(F(G(\mathcal{X}_\mathcal{S}))\right) - \mathrm{COV}\left(F(G(\mathcal{X}_\mathcal{T}))\right) \right\|_F^2, \quad (3)$$

where $d$ is the feature dimensionality, $\mathrm{COV}(\cdot)$ denotes the covariance matrix of the source or target features, and $\|\cdot\|_F^2$ denotes the squared matrix Frobenius norm. Our model is therefore able to minimize the domain divergence between the source domain and the target domain during training.
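For concreteness, Eq. 3 can be sketched in PyTorch as below, following the standard Deep CORAL formulation [2]; this is our own illustration rather than the authors' implementation.

```python
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Eq. 3: squared Frobenius distance between feature covariances.

    source, target: (n_s, d) and (n_t, d) batches of classifier outputs.
    """
    d = source.size(1)

    def covariance(x: torch.Tensor) -> torch.Tensor:
        x = x - x.mean(dim=0, keepdim=True)   # center the features
        return (x.t() @ x) / (x.size(0) - 1)  # (d, d) covariance matrix

    diff = covariance(source) - covariance(target)
    return (diff * diff).sum() / (4 * d * d)  # squared Frobenius norm / 4d^2

# Hypothetical usage with 997-dimensional classifier outputs.
f_s = torch.randn(64, 997)   # F(G(X_S)) for a source batch
f_t = torch.randn(64, 997)   # F(G(X_T)) for a target batch
loss_coral = coral_loss(f_s, f_t)
```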
3.4. Pseudo labeling refinement

To further reduce the domain difference, we also generate pseudo labels for the target domain. However, bad pseudo-labels can still have significant detrimental effects. To mitigate this issue, we employ a $T$-round recurrent easy-to-hard pseudo-label refinement process that improves the quality of the pseudo-labels in the target domain by imposing a probabilistic soft selection [18, 19].

The initial shared classifier $F$ is optimized with $\mathcal{L}_{\mathcal{WS}}$. At inference time, we can directly obtain the prediction $F(G(\mathcal{X}_{\mathcal{T}_j}))$ for a target domain sample. Let $\mathrm{Softmax}(F(G(\mathcal{X}_{\mathcal{T}_j})))$ be the predicted probability for each class, and let $\mathcal{Y}_{\mathcal{PT}_j} = \max(\mathrm{Softmax}(F(G(\mathcal{X}_{\mathcal{T}_j}))))_{index}$ be its dominant class label, where $\max(\cdot)_{index}$ returns the index of the maximum probability value. For the probabilistic soft selection, a higher-quality pseudo label is one that satisfies $\max(\mathrm{Softmax}(F(G(\mathcal{X}_{\mathcal{T}_j})))) > p_t$, where $p_t$ is the threshold probability in the $t$-th training round. In the $T$-round recurrent easy-to-hard refinement, $p_t$ is higher for easy examples and lower for hard examples, hence $p_1 > p_2 > \cdots > p_T$. The refinement forms a robust new pseudo-labeled domain:

$$\{Q(\mathcal{X}_{\mathcal{T}_j}), Q(\mathcal{Y}_{\mathcal{T}_j})\}_{j=1}^{n_{p_t}} \quad \text{if and only if} \quad \max(\mathrm{Softmax}(F(G(\mathcal{X}_{\mathcal{T}_j})))) > p_t, \quad (4)$$

where $Q(\cdot)$ denotes the high-quality selection and $n_{p_t}$ is the number of higher-quality pseudo labels for the target domain. We can hence mitigate the detrimental effects of bad pseudo-labels using Eq. 4. Similar to Eq. 2, we define the pseudo-labeled target domain loss as

$$\mathcal{L}_\mathcal{T} = \frac{1}{n_{p_t}} \sum_{j=1}^{n_{p_t}} W_j \times \mathcal{L}_{ce}\left(F(G(Q(\mathcal{X}_{\mathcal{T}_j}))), Q(\mathcal{Y}_{\mathcal{T}_j})\right), \quad (5)$$

where $W$ is the weight of each class and $\mathcal{L}_{ce}$ is the cross-entropy loss.
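A minimal sketch of the probabilistic soft selection in Eq. 4 might look like the following; the function `select_pseudo_labels` and its signature are illustrative names of ours, while the decreasing threshold schedule matches the values reported in Sec. 4.1.

```python
import torch

def select_pseudo_labels(logits: torch.Tensor, p_t: float):
    """Eq. 4: keep target samples whose top softmax probability exceeds p_t.

    logits: (n_t, C) classifier outputs F(G(X_T)) on the target domain.
    Returns indices of the selected samples and their pseudo labels.
    """
    probs = torch.softmax(logits, dim=1)
    confidence, pseudo_labels = probs.max(dim=1)  # max(.) and max(.)_index
    keep = confidence > p_t                       # probabilistic soft selection
    return keep.nonzero(as_tuple=True)[0], pseudo_labels[keep]

# Easy-to-hard schedule: the threshold decreases over the T rounds,
# so later rounds admit harder (less confident) examples.
thresholds = [0.9, 0.8, 0.7, 0.6, 0.5]   # p_1 > p_2 > ... > p_T
target_logits = torch.randn(4482, 997)   # hypothetical photo-domain logits
for t, p_t in enumerate(thresholds, start=1):
    idx, labels = select_pseudo_labels(target_logits, p_t)
    # idx / labels define {Q(X_T), Q(Y_T)} used in the L_T loss (Eq. 5);
    # in the full model the classifier is retrained after each round.
```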
3.5. WPLR model

Fig. 3 depicts the overall framework of our proposed WPLR model. Taken together, our model minimizes the following objective function:

$$\arg\min \sum_{t=1}^{T} \left( \mathcal{L}_{\mathcal{WS}} + \mathcal{L}_{CORAL} + \mathcal{L}_\mathcal{T}^t \right), \quad (6)$$

where $\mathcal{L}_{\mathcal{WS}}$ is the weighted source classification loss, $\mathcal{L}_{CORAL}$ is the CORAL loss, and $\mathcal{L}_\mathcal{T}$ is the pseudo-labeled target domain classification loss.

4. Experiments

4.1. Implementation details

We first extract features from the last fully connected layer [20, 21, 22] of a retrained NASNetLarge [23] model, so each image is represented by a feature vector of size 1 × 1000. Therefore, the feature representation of the herbarium domain (H) has size 320,750 × 1000, the herbarium_photo_associations domain (A) has size 1,816 × 1000, the photo domain (P) has size 4,482 × 1000, and the test domain (T) has size 3,186 × 1000. Domain H + A has size 322,566 × 1000. In Tab. 2, H→P denotes learning knowledge from domain H and applying it to domain P [24].

We implement our approach using PyTorch. The outputs of the three Linear layers are 1000, 1000 and $|C|$, respectively. The recurrent pseudo labeling parameters are $T = 5$ and $\{p_t\}_{t=1}^{5} = [0.9, 0.8, 0.7, 0.6, 0.5]$. The learning rate (0.001), batch size (64), optimizer (Adam) and number of epochs ($\mathcal{N}_\mathcal{S}/64$) are determined by performance on the source domain. Experiments are performed on a GeForce 1080 Ti. We compare our results with four domain adaptation methods: DANN [10], ADDA [5], NASNetLarge-$ACL$ [24] and BA3US [25].
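To tie the pieces together, the following sketch shows one plausible way a training procedure could minimize Eq. 6 with the settings above ($T = 5$, decreasing thresholds, Adam, learning rate 0.001, batch size 64). The dummy feature tensors, the number of updates per round, and the loop structure are our own illustration under those stated assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative stand-ins for the precomputed 1 x 1000 NASNetLarge features.
Xs = torch.randn(2048, 1000)          # source features G(X_S)
Ys = torch.randint(0, 997, (2048,))   # source labels Y_S
Xt = torch.randn(1024, 1000)          # target features G(X_T)

# Shared classifier F: three Linear layers with outputs 1000, 1000, |C|.
F = nn.Sequential(nn.Linear(1000, 1000), nn.ReLU(),
                  nn.Linear(1000, 1000), nn.ReLU(),
                  nn.Linear(1000, 997))
opt = torch.optim.Adam(F.parameters(), lr=0.001)

W = torch.ones(997)   # class weights from Eq. 1 (uniform here for brevity)
ce = nn.CrossEntropyLoss(weight=W)

def coral(a, b):      # Eq. 3, as sketched in Sec. 3.3
    cov = lambda x: ((x - x.mean(0)).t() @ (x - x.mean(0))) / (x.size(0) - 1)
    return ((cov(a) - cov(b)) ** 2).sum() / (4 * a.size(1) ** 2)

for p_t in [0.9, 0.8, 0.7, 0.6, 0.5]:   # T = 5 easy-to-hard rounds
    for _ in range(32):                  # a few updates per round (illustrative)
        sb = torch.randint(0, Xs.size(0), (64,))   # source mini-batch indices
        tb = torch.randint(0, Xt.size(0), (64,))   # target mini-batch indices
        logits_s, logits_t = F(Xs[sb]), F(Xt[tb])
        loss = ce(logits_s, Ys[sb]) + coral(logits_s, logits_t)  # L_WS + L_CORAL

        # L_T (Eq. 5): weighted cross-entropy on pseudo labels passing Eq. 4.
        with torch.no_grad():
            conf, pseudo = torch.softmax(F(Xt), dim=1).max(dim=1)
            keep = conf > p_t
        if keep.any():
            loss = loss + ce(F(Xt[keep]), pseudo[keep])

        opt.zero_grad()
        loss.backward()
        opt.step()
```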
4.2. Results

Table 2: Accuracy (%) on the PlantCLEF 2021 dataset for the photo domain

Task                                              | A→P  | H→P   | H+A→P
DANN [10]                                         | 1.07 | 1.85  | 2.01
ADDA [5]                                          | 2.95 | 3.05  | 3.43
BA3US [25]                                        | 3.56 | 4.65  | 5.31
NASNetLarge-$ACL$ [24]                            | 5.98 | 8.64  | 9.67
WPLR $-\mathcal{L}_{CORAL}-\mathcal{L}_\mathcal{T}$ | 6.03 | 9.12  | 10.03
WPLR $-\mathcal{L}_\mathcal{T}$                   | 6.12 | 9.23  | 11.46
WPLR $-\mathcal{L}_{CORAL}$                       | 6.22 | 9.47  | 12.51
WPLR                                              | 6.38 | 9.645 | 13.44

Tab. 2 shows the results of our WPLR model on the photo domain. We report the accuracy over the whole photo domain,

$$Acc = \sum_{j=1}^{\mathcal{N}_\mathcal{T}} \left(\hat{\mathcal{Y}}_{\mathcal{T}_j} == \mathcal{Y}_{\mathcal{T}_j}\right) / \mathcal{N}_\mathcal{T} \times 100,$$

where $\hat{\mathcal{Y}}_\mathcal{T}$ is the predicted label for the target domain. Compared with the other four methods, our WPLR model achieves the highest accuracy on all three tasks, especially on the H+A→P task.

We also conduct a careful ablation study to demonstrate the effects of the different loss functions on the final classification accuracy. Note that the weighted source classification loss $\mathcal{L}_{\mathcal{WS}}$ is required for UDA. "WPLR $-\mathcal{L}_{CORAL}-\mathcal{L}_\mathcal{T}$" is implemented without $\mathcal{L}_{CORAL}$ and $\mathcal{L}_\mathcal{T}$; it is a simple model that only reduces the source risk using $\mathcal{L}_{\mathcal{WS}}$, without minimizing the domain discrepancy. "WPLR $-\mathcal{L}_{CORAL}$" reports results without the CORAL loss, and "WPLR $-\mathcal{L}_\mathcal{T}$" reports results without the $T$-round pseudo labeling refinement process. We find that as more loss functions are included, the accuracy of our model keeps improving.

Table 3: MRR on the PlantCLEF 2021 challenge for the test domain

Team                         | Full test set | Sub-set of the test set
Organizer's submission [15]  | 0.198         | 0.093
Neuon AI                     | 0.181         | 0.158
LU (ours)                    | 0.065         | 0.037
Domain_run                   | 0.065         | 0.037
To_be                        | 0.056         | 0.038

In terms of effect on classification accuracy, the loss functions are ordered as $\mathcal{L}_\mathcal{T} > \mathcal{L}_{CORAL}$. Therefore, the proposed weighted classification loss, the CORAL loss, and the easy-to-hard target domain pseudo labeling refinement are all effective in minimizing the target domain risk and improving accuracy. We also list the final performance of our model on the test domain in Tab. 3. Our model earned the second position in the PlantCLEF 2021 challenge. We provided a total of nine submissions; the MRR on the full test set ranged from 0.034 to 0.065 as a result of varying hyperparameters (number of iterations, $T$ and $p_t$).

5. Discussion

There are two compelling advantages of our WPLR model. First, we propose a weighted cross-entropy loss to mitigate the imbalanced data issue in the source domain. Secondly, we develop an easy-to-hard refinement process to improve the quality of pseudo labels in the target domain. This strategy relies on probabilistic soft selection and can hence push the shared classifier $F$ towards the target domain. Compared with the other baselines in Tab. 2, the $T$-round easy-to-hard refinement process is effective in improving the classification accuracy and further reduces the domain discrepancy.

However, our model only earned the second position in the challenge, and its results are somewhat lower than the organizer's submission. One underlying reason is that our model cannot extract very robust invariant features. In future work, we will therefore consider designing a better feature extraction method and distilling domain-invariant features across the two domains. In addition, we would like to include more external data during training (e.g., GBIF [26]).

6. Conclusion

In this paper, we propose a novel weighted pseudo labeling refinement (WPLR) method for domain adaptation to solve the plant identification problem. We develop a weighted cross-entropy loss to balance the categorical data and utilize the CORAL loss to minimize the domain divergence. We also employ an easy-to-hard pseudo labeling refinement process based on probabilistic soft selection, which improves the quality of pseudo labels and removes the detrimental effects of bad labels. Experimental results demonstrate that our proposed WPLR model outperforms several baselines.

References

[1] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, T. Darrell, Deep domain confusion: Maximizing for domain invariance, arXiv preprint arXiv:1412.3474 (2014).
[2] B. Sun, K. Saenko, Deep CORAL: Correlation alignment for deep domain adaptation, in: European Conference on Computer Vision, Springer, 2016, pp. 443–450.
[3] Z. Meng, J. Li, Y. Gong, B. Juang, Adversarial teacher-student learning for unsupervised domain adaptation, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 5949–5953.
[4] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, V. Lempitsky, Domain-adversarial training of neural networks, The Journal of Machine Learning Research 17 (2016) 2096–2030.
[5] E. Tzeng, J. Hoffman, K. Saenko, T. Darrell, Adversarial discriminative domain adaptation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167–7176.
[6] Y. Zhang, H. Ye, B. D. Davison, Adversarial reinforcement learning for unsupervised domain adaptation, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 635–644.
[7] J. Jiang, X. Wang, M. Long, J. Wang, Resource efficient domain adaptation, in: Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 2220–2228.
[8] B. Bhushan Damodaran, B. Kellenberger, R. Flamary, D. Tuia, N. Courty, DeepJDOT: Deep joint distribution optimal transport for unsupervised domain adaptation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 447–463.
[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[10] M. Ghifary, W. B. Kleijn, M. Zhang, Domain adaptive neural networks for object recognition, in: Pacific Rim International Conference on Artificial Intelligence, Springer, 2014, pp. 898–904.
[11] K. Saito, K. Watanabe, Y. Ushiku, T. Harada, Maximum classifier discrepancy for unsupervised domain adaptation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3723–3732.
[12] X. Wang, L. Li, W. Ye, M. Long, J. Wang, Transferable attention for domain adaptation, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 5345–5352.
[13] Y. Zhang, H. Tang, K. Jia, M. Tan, Domain-symmetric networks for adversarial domain adaptation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5031–5040.
[14] Y. Zhang, B. D. Davison, Adversarial continuous learning in unsupervised domain adaptation, in: ICPR Workshops (2), 2020, pp. 672–687.
[15] H. Goëau, P. Bonnet, A. Joly, Overview of PlantCLEF 2021: cross-domain plant identification, in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, 2021.
[16] A. Joly, H. Goëau, S. Kahl, L. Picek, T. Lorieul, E. Cole, B. Deneu, M. Servajean, R. Ruiz De Castañeda, I. Bolon, H. Glotin, R. Planqué, W.-P. Vellinga, A. Dorso, H. Klinck, T. Denton, I. Eggel, P. Bonnet, H. Müller, Overview of LifeCLEF 2021: a system-oriented evaluation of automated species identification and species distribution prediction, in: Proceedings of the Twelfth International Conference of the CLEF Association (CLEF 2021), 2021.
[17] H. Goëau, P. Bonnet, A. Joly, Overview of LifeCLEF plant identification task 2020, in: CLEF Working Notes 2020, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece, 2020.
[18] Y. Zhang, B. D. Davison, Deep spherical manifold Gaussian kernel for unsupervised domain adaptation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 4443–4452.
[19] Y. Zhang, B. D. Davison, Efficient pre-trained features and recurrent pseudo-labeling in unsupervised domain adaptation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 2719–2728.
[20] Y. Zhang, J. P. Allem, J. B. Unger, T. B. Cruz, Automated identification of hookahs (waterpipes) on Instagram: an application in feature extraction using convolutional neural network and support vector machine classification, Journal of Medical Internet Research 20 (2018) e10513.
[21] Y. Zhang, B. D. Davison, Modified distribution alignment for domain adaptation with pre-trained Inception ResNet, arXiv preprint arXiv:1904.02322 (2019).
[22] Y. Zhang, B. D. Davison, Impact of ImageNet model selection on domain adaptation, in: Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops, 2020, pp. 173–182.
[23] B. Zoph, V. Vasudevan, J. Shlens, Q. V. Le, Learning transferable architectures for scalable image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697–8710.
[24] Y. Zhang, B. D. Davison, Adversarial consistent learning on partial domain adaptation of PlantCLEF 2020 challenge, in: CLEF Working Notes 2020, CLEF: Conference and Labs of the Evaluation Forum, 2020.
[25] J. Liang, Y. Wang, D. Hu, R. He, J. Feng, A balanced and uncertainty-aware approach for partial domain adaptation, arXiv preprint arXiv:2003.02541 (2020).
[26] L. Picek, M. Sulc, J. Matas, Recognition of the Amazonian flora by Inception networks with test-time class prior estimation, in: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, 2019.