      Adversarial Consistent Learning on Partial
       Domain Adaptation of PlantCLEF 2020
                     Challenge

                       Youshan Zhang and Brian D. Davison

     Lehigh University, Computer Science and Engineering, Bethlehem, PA, USA
                            {yoz217,bdd3}@lehigh.edu



        Abstract. Domain adaptation is one of the most crucial techniques to
        mitigate the domain shift problem, which exists when transferring knowl-
edge from an abundant labeled source domain to a target domain with
        few or no labels. Partial domain adaptation addresses the scenario when
        target categories are only a subset of source categories. In this paper, to
        enable the efficient representation of cross-domain plant images, we first
        extract deep features from pre-trained models and then develop adver-
        sarial consistent learning (ACL) in a unified deep architecture for partial
        domain adaptation. It consists of source domain classification loss, adver-
        sarial learning loss, and feature consistency loss. Adversarial learning loss
        can maintain domain-invariant features between the source and target
        domains. Moreover, feature consistency loss can preserve the fine-grained
        feature transition between two domains. We also find the shared cate-
        gories of two domains via down-weighting the irrelevant categories in
the source domain. Experimental results demonstrate that training features
from the NASNetLarge model with the proposed ACL architecture yields
promising results on the PlantCLEF 2020 Challenge.

        Keywords: Adversarial learning · Partial domain adaptation · Plant
        identification.


1     Introduction

Automated plant identification supports the recognition of plant species at
scale. The availability of massive labeled training data is a prerequisite for
machine learning models. Unfortunately, this requirement cannot be met in the
plant identification problem, since labels for real-world plant images are
sparse. Therefore,
we propose to transfer knowledge from an existing auxiliary labeled herbarium
domain to the field photo domain with limited or no labels. However, due to
the phenomenon of data bias or domain shift [11], classification models do not
generalize well from an existing herbarium domain to a novel field photo domain.
    Copyright © 2020 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25
    September 2020, Thessaloniki, Greece.
     Domain adaptation (DA) has been proposed to leverage knowledge from an
abundant labeled source domain to learn an effective predictor for the target
domain with few or no labels, while mitigating the domain shift problem [16,17,
19,20]. In this paper, we focus on unsupervised domain adaptation (UDA), where
the target domain has no labels. Since the field photo domain has fewer classes,
and its classes are a subset of the classes of the source herbarium domain, we
investigate partial domain adaptation (PDA)
for the PlantCLEF 2020 Challenge.
     Recently, deep neural network methods have been widely used in the domain
adaptation problem. Notably, adversarial learning has shown its power when
embedded in deep neural networks to learn feature representations that minimize
the discrepancy between the source and target domains [9, 14]. Inspired by the
generative adversarial network (GAN) [6], adversarial learning also contains a
feature extractor and a domain discriminator. The domain discriminator learns
to distinguish the source domain from the target domain, while the feature
extractor learns domain-invariant representations to fool the domain
discriminator [9,10,18]. The target domain risk (the error on the target
domain) is expected to be minimized via minimax optimization. Cao et al.
presented adversarial learning for PDA, which alleviates negative transfer by
down-weighting outlier source classes when training the source classifier and
domain discriminator, while positive transfer is promoted by matching the
feature distributions in the shared label space [2]. Similarly, the example
transfer network jointly learns domain-invariant representations and a
progressive weighting scheme that quantifies the transferability of source
examples; it improves positive transfer through relevant examples and mitigates
negative transfer by identifying irrelevant examples [3].
     Although many methods have been proposed for partial domain adaptation,
they still suffer from two challenges: (1) the models are evaluated only on
small datasets and may transfer poorly to large-scale datasets, and (2) the
feature consistency of the two domains is inappropriately ignored.
     To address the aforementioned challenges, we aggregate three different loss
functions in one framework: source domain classification loss, adversarial learn-
ing loss, and feature consistency loss to reduce the discrepancy of the two do-
mains. Moreover, our model is evaluated on a large-scale plant identification
dataset to improve the estimate of the generalization ability of our model.
     Our contributions are three-fold:

1. We propose a novel adversarial consistent learning network (ACL) for PDA,
   to adversarially minimize the domain discrepancy of the source and target
   domains and maintain domain-invariant features;
2. The proposed adversarial learning loss and feature consistency loss can dis-
   tinguish the target domain from the source domain, and preserve the fine-
   grained feature transition between the two domains;
3. We impose shared category selection to filter out the irrelevant categories
   in the source domain. By down-weighting the irrelevant categories in the
   source domain, we can reduce negative transfer from the source domain to
   the target domain.
Fig. 1: Example images from the PlantCLEF 2020 dataset. The large discrepancy
between training and test data causes difficulty in PDA.

Experimental results show that ACL achieves higher classification accuracy than
several baseline methods and yields promising results on the PlantCLEF 2020
Challenge.


2     Dataset

PlantCLEF 2020 is a large-scale dataset of the PlantCLEF 2020 task [5], or-
ganized in the context of the LifeCLEF 2020 challenge [8]. Fig. 1 shows some
challenging images in this dataset. The herbarium domain contains 320,750 im-
ages in 997 species, and the number of images per species is unbalanced. The
training data consist of herbarium sheets, whereas the test set is composed of
field pictures. The validation set consists of two domains: herbarium photo
associations and photos. The herbarium photo associations domain includes
1,816 images from 244 species. This domain contains both herbarium sheets
and field pictures for a subset of species, which enables learning a mapping
between the herbarium sheet domain and the field picture domain. The photo
domain has 4,482 images from 375 species; its images are plant pictures taken
in the field, similar to the test dataset. The test dataset contains 3,186
unlabeled images. Due to the significant difference between herbarium sheets
and real photos, it is extremely difficult to identify the correct class.


    We exclude classId “108335” from the photo domain: the herbarium domain
does not contain this category, and the source classes must come from the
herbarium domain. Therefore, eight images are excluded from the photo domain.
The statistics of the PlantCLEF 2020 dataset are listed in Tab. 1.



                Table 1: Statistics of the PlantCLEF 2020 dataset

    Domain                              Number of Samples   Number of Classes
    Herbarium (H)                                 320,750                 997
    Herbarium photo associations (A)                1,816                 244
    Photo (P)                                       4,482                 375
    Test (T)                                        3,186                   -
Fig. 2: The architecture of our proposed ACL model. We first extract deep fea-
tures from a pre-trained model for both source and target domains via $\Phi$.
The shared layers are jointly trained with source and target features. Also,
the parameters in the shared layers are updated by the backward gradients
($\frac{\partial \mathcal{L}_S}{\partial \theta_S}$, $\frac{\partial \mathcal{L}_A}{\partial \theta_A}$, and $\frac{\partial \mathcal{L}_{Con}}{\partial \theta_{Con}}$)
from the class label classifier, the domain label predictor, and the feature
consistency regressor. The ACL model consists of three different loss functions
(source classification loss $\mathcal{L}_S$, adversarial domain loss
$\mathcal{L}_A$, and feature consistency loss $\mathcal{L}_{Con}$). The feature
extractor $G$ in the shared layers is used for both the classifier $f$ and the
domain discriminator $D$ (blue dashed lines are the backward gradients; GRL
stands for gradient reversal layer). A layer visualization of the architecture
is shown in Fig. 3.


3     Methods

3.1   Motivation

Previous partial domain adaptation methods [2, 3] evaluated their models on
small datasets (e.g., Office-31), so their generalizability to large-scale
datasets is limited. In addition, the feature consistency of the source and
target domains is not well addressed in PDA.
    In this paper, we present our approach: adversarial consistent learning (ACL)
on partial domain adaptation. It can align the feature distribution of the source
and target domains in the shared categories and guarantee feature consistency
across the two domains. Importantly, ACL identifies irrelevant source categories
via down-weighting class importance automatically. Evaluation on the large-scale
PlantCLEF 2020 challenge dataset shows a high generalizability of our model.


3.2   Problem and notation

For unsupervised domain adaptation, we are given a source domain
$\mathcal{D}_S = \{X_{S_i}, Y_{S_i}\}_{i=1}^{N_S}$ of $N_S$ labeled samples
across the set of categories $\mathcal{C}_S$ and a target domain
$\mathcal{D}_T = \{X_{T_j}\}_{j=1}^{N_T}$ of $N_T$ samples without any labels
($Y_T$ is unknown) across the set of categories $\mathcal{C}_T$. For partial
domain adaptation, the number of categories in $\mathcal{C}_T$ is less than
the number of categories in $\mathcal{C}_S$, and $\mathcal{C}_T \subsetneq \mathcal{C}_S$.
The samples $X_S$ and $X_T$ obey the marginal distributions $P_S$ and $P_T$,
and the conditional distributions of the two domains are denoted $Q_S$ and
$Q_T$. Due to the discrepancy between the two domains, the distributions are
assumed to be different, i.e., $P_S \neq P_T$ and $Q_S \neq Q_T$. Our ultimate
goal is to learn a classifier $f$ under a feature extractor $G$ that selects
the shared categories between the two domains and ensures a lower
generalization error in the target domain.

3.3   Deep feature extraction
To circumvent the large computational resources required to train on the large-
scale PlantCLEF 2020 challenge dataset, we instead work with deep features
from pre-trained models. Following Zhang and Davison [17], the deep features
are extracted from the last fully connected layer of the pre-trained model via
$\Phi$. Each feature vector has size 1 × 1000 and corresponds to one plant
image. Therefore, the source and target domains can be represented by
$\Phi(X_S) \in \mathbb{R}^{N_S \times 1000}$ and $\Phi(X_T) \in \mathbb{R}^{N_T \times 1000}$, respectively.
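    To make this step concrete, the following Keras sketch (the framework used
in our implementation, Sec. 4.1) extracts such 1 × 1000 vectors from
NASNetLarge; the function name and per-image I/O are illustrative rather than
our exact pipeline, and any of the pre-trained models in Sec. 4.1 could be
substituted.

# Sketch of the feature extraction step Phi (Sec. 3.3), assuming a
# TensorFlow/Keras environment; names and file handling are illustrative.
import numpy as np
import tensorflow as tf

def extract_deep_features(image_paths):
    # NASNetLarge's last fully connected (softmax) layer has 1000 units,
    # giving the 1 x 1000 feature vector per image described above.
    model = tf.keras.applications.NASNetLarge(weights="imagenet")
    features = []
    for path in image_paths:
        img = tf.keras.preprocessing.image.load_img(path, target_size=(331, 331))
        x = tf.keras.preprocessing.image.img_to_array(img)[None, ...]
        x = tf.keras.applications.nasnet.preprocess_input(x)
        features.append(model.predict(x, verbose=0)[0])
    return np.stack(features)  # shape (N, 1000), i.e., Phi(X)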

3.4   Source classifier
The task in the source domain is trained using the typical cross-entropy loss:
\[
\mathcal{L}_S(f(G(\Phi(X_S))), Y_S) = -\frac{1}{N_S}\sum_{i=1}^{N_S}\sum_{c=1}^{|\mathcal{C}_S|} Y_{S_{ic}} \log\big(f(G(\Phi(X_{S_i})))\big), \tag{1}
\]
where $Y_{S_i} \in [0, 1]^{|\mathcal{C}_S|}$ is the one-hot ground-truth
probability vector for the $i$th source sample, $f$ is the classifier in
Fig. 2, and $f(G(\Phi(X_{S_i})))$ is the predicted probability.
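    As a minimal sketch, Eq. 1 can be written in TensorFlow as follows,
assuming one-hot ground-truth labels; the clipping constant is our addition for
numerical stability.

import tensorflow as tf

def source_classification_loss(y_true, y_pred):
    # Eq. 1: categorical cross-entropy between one-hot labels Y_S and the
    # predicted probabilities f(G(Phi(X_S))).
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)  # avoid log(0)
    return -tf.reduce_mean(tf.reduce_sum(y_true * tf.math.log(y_pred), axis=1))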

3.5   Adversarial domain loss
In general adversarial learning, the system learns a mapping from the source
domain to the target domain. Given the feature representation from the feature
extractor $G$, we can learn a discriminator $D$ that distinguishes the two
domains using the following loss function:
\[
\mathcal{L}_A(G_{X_S X_T}, G(\Phi(X_S)), G(\Phi(X_T))) = -\frac{1}{N_S}\sum_{i=1}^{N_S} \log\big(1 - D(G(\Phi(X_{S_i})))\big) - \frac{1}{N_T}\sum_{j=1}^{N_T} \log\big(D(G(\Phi(X_{T_j})))\big). \tag{2}
\]
However, Eq. 2 only encourages the source domain data to be close to the target
data ($G_{X_S X_T}$); it does not ensure that the target data will be close to
the source data. We hence introduce another mapping from the target domain to
the source domain, $G_{X_T X_S}$, in Eq. 3 and train it with the same
adversarial loss as $G_{X_S X_T}$ in Eq. 2:
\[
\mathcal{L}_A(G_{X_T X_S}, G(\Phi(X_S)), G(\Phi(X_T))). \tag{3}
\]
For $G_{X_S X_T}$, the source domain has label 0 and the target domain has
label 1, corresponding to Domain Label 1 in Fig. 2. Meanwhile, for
$G_{X_T X_S}$, 1 is the new label for the source domain and 0 is the new label
for the target domain, corresponding to Domain Label 2 in Fig. 2. Therefore,
we define the adversarial learning loss as:
\[
\mathcal{L}_A(G(\Phi(X_S)), G(\Phi(X_T))) = \mathcal{L}_A(G_{X_S X_T}, G(\Phi(X_S)), G(\Phi(X_T))) + \mathcal{L}_A(G_{X_T X_S}, G(\Phi(X_S)), G(\Phi(X_T))). \tag{4}
\]
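    A compact sketch of Eqs. 2–4, assuming the discriminator outputs
probabilities in (0, 1); in the full model, the two directions use separate
domain predictors (Domain Labels 1 and 2 in Fig. 2), whereas this illustration
simply swaps the arguments.

import tensorflow as tf

def adversarial_loss(d_on_source, d_on_target):
    # Eq. 2: the source domain carries label 0 and the target label 1.
    eps = 1e-7  # numerical stability, our addition
    return (-tf.reduce_mean(tf.math.log(1.0 - d_on_source + eps))
            - tf.reduce_mean(tf.math.log(d_on_target + eps)))

def symmetric_adversarial_loss(d_on_source, d_on_target):
    # Eq. 4: the reverse mapping G_{X_T X_S} swaps the domain labels
    # (Domain Label 2 in Fig. 2), so the arguments are exchanged.
    return (adversarial_loss(d_on_source, d_on_target)
            + adversarial_loss(d_on_target, d_on_source))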

3.6   Feature consistency loss

To encourage the information of the source and target domains to be preserved
during adversarial learning, we propose a feature consistency loss in our
model. Details of the feature reconstruction layers are shown in Fig. 2; the
reconstruction layers sit directly behind the feature extractor $G$ in the
shared layers, and they aim to reconstruct the extracted features and maintain
the invariant features during the conversion process. The feature consistency
loss is defined as:
\[
\mathcal{L}_{Con}(G_{X_S X_T}, G_{X_T X_S}, G(\Phi(X_S)), G(\Phi(X_T))) = \mathbb{E}_{x_s \sim G(\Phi(X_S))}\big[\ell\big(G_{X_T X_S}(G_{X_S X_T}(x_s)) - x_s\big)\big] + \mathbb{E}_{x_t \sim G(\Phi(X_T))}\big[\ell\big(G_{X_S X_T}(G_{X_T X_S}(x_t)) - x_t\big)\big], \tag{5}
\]
where $\ell$ is the mean squared error loss, which measures the difference
between the true features and the reconstructed features.
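    A sketch of Eq. 5, with g_st and g_ts standing in for the mappings
$G_{X_S X_T}$ and $G_{X_T X_S}$ (names are ours); any Keras models mapping
feature vectors back to the same dimensionality would fit.

import tensorflow as tf

def feature_consistency_loss(g_st, g_ts, feats_src, feats_tgt):
    # Eq. 5: round-trip reconstruction through both mappings, penalized
    # by mean squared error (the loss l in the text).
    rec_src = g_ts(g_st(feats_src))  # source -> target -> source
    rec_tgt = g_st(g_ts(feats_tgt))  # target -> source -> target
    mse = tf.keras.losses.MeanSquaredError()
    return mse(feats_src, rec_src) + mse(feats_tgt, rec_tgt)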


3.7   Shared category selection

In PDA, the set of target domain labels is a subset of the source domain
labels, i.e., $\mathcal{C}_T \subsetneq \mathcal{C}_S$. In the PlantCLEF
challenge, the irrelevant label set ($\mathcal{C}_S - \mathcal{C}_T$) is far
larger than $\mathcal{C}_T$ ($|\mathcal{C}_S - \mathcal{C}_T| \gg |\mathcal{C}_T|$).
If we use all elements of the source domain distribution to match the target
domain distribution, negative transfer will occur, since the target domain
will also be forced to match the irrelevant labels ($\mathcal{C}_S - \mathcal{C}_T$).
Therefore, it is important to identify the shared categories between the
source and target domains.
     To address this challenge, we re-weight the source domain label set by
down-weighting the irrelevant labels. During training, we obtain the predicted
probabilities for the target domain, $\hat{Y}_{T_j} = f(G(\Phi(X_{T_j})))$,
which give a probability for each source label in $\mathcal{C}_S$. The set of
irrelevant source labels and the target label set are disjoint, and the target
data are significantly dissimilar to the source data in the irrelevant label
set. Therefore, the probabilities of the irrelevant categories should be
sufficiently small and can be ignored. We then define the weight vector as:
\[
\mathbf{W} = \frac{1}{N_T}\sum_{j=1}^{N_T} \hat{Y}_{T_j}, \tag{6}
\]
where $\mathbf{W}$ is a $|\mathcal{C}_S|$-dimensional weight vector. The
irrelevant categories ($\mathcal{C}_S - \mathcal{C}_T$) will have much smaller
weights than the shared categories. We then set a weight to 0 if its element
$W_c$ is less than a sufficiently small number (e.g., 10e−9). By reducing the
weight of the irrelevant categories, the shared categories are emphasized and
negative transfer is mitigated. The weight vector $\mathbf{W}$ is applied to
both the source classifier and the domain discriminator over the source domain
data, as shown in the following objective function.
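    In code, Eq. 6 reduces to averaging the classifier's soft predictions over
the target samples and thresholding, as in this NumPy sketch (the threshold
value is illustrative):

import numpy as np

def shared_category_weights(target_probs, threshold=1e-9):
    # target_probs: (N_T, |C_S|) matrix of predictions f(G(Phi(X_T_j))).
    w = target_probs.mean(axis=0)   # Eq. 6: |C_S|-dimensional weight vector W
    w[w < threshold] = 0.0          # zero out irrelevant categories C_S - C_T
    return w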

3.8   Overall objective
We combine the three aforementioned loss functions to form our objective
function:
\[
\mathcal{L}(X_S, X_T, Y_S, G_{X_S X_T}, G_{X_T X_S}) = \mathcal{L}_S(f(\mathbf{W}(G(\Phi(X_S)))), Y_S) + \gamma \mathcal{L}_A(\mathbf{W}(G(\Phi(X_S))), G(\Phi(X_T))) + \beta \mathcal{L}_{Con}(G_{X_S X_T}, G_{X_T X_S}, \mathbf{W}(G(\Phi(X_S))), G(\Phi(X_T))), \tag{7}
\]
where $\gamma$ and $\beta$ are tradeoff parameters between the loss functions.
Our model ultimately minimizes the difference during the transition from the
source domain to the target domain and from the target domain to the source
domain, while maximizing the ability to distinguish the two domains.
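    Putting the pieces together, Eq. 7 is a weighted sum of the three losses;
the sketch below uses the tradeoff values reported in Sec. 4.1 and assumes the
weight vector $\mathbf{W}$ has already been applied to the source-domain terms.

def acl_objective(loss_s, loss_a, loss_con, gamma=0.5, beta=0.5):
    # Eq. 7: source classification + adversarial + feature consistency.
    return loss_s + gamma * loss_a + beta * loss_con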

3.9   Gradients of shared layers
The shared layers consist of the feature extractor $G$ and the feature
reconstruction layers. $G$ contains two dense layers with 1000 and 997 units,
respectively, a ReLU activation layer, and a dropout layer with rate 0.5. The
feature reconstruction layers contain a ReLU activation layer, a dropout
layer, and a dense layer with 1000 units. The shared layers are jointly
optimized by the source classification loss, the adversarial domain loss, and
the feature consistency loss.
     Let $F_E(\cdot, \theta_E)$ be the output of the shared encoder with
parameters $\theta_E$. In addition, let $F_S(\cdot, \theta_S)$ be the output
of the class label classifier with parameters $\theta_S$, $F_A(\cdot, \theta_A)$
be the output of the domain label predictor with parameters $\theta_A$, and
$F_{Con}(\cdot, \theta_{Con})$ be the output of the feature consistency
regressor with parameters $\theta_{Con}$. The shared layers are therefore
optimized by these three gradients, and the parameters are updated as follows:
\[
\theta_S \leftarrow \theta_S - \eta \frac{\partial \mathcal{L}_S}{\partial \theta_S}, \quad \theta_A \leftarrow \theta_A - \eta\,\tau \frac{\partial \mathcal{L}_A}{\partial \theta_A}, \quad \theta_{Con} \leftarrow \theta_{Con} - \eta \frac{\partial \mathcal{L}_{Con}}{\partial \theta_{Con}}, \quad \theta_E \leftarrow \theta_E - \eta \left( \frac{\partial \mathcal{L}_S}{\partial \theta_S} + \tau \frac{\partial \mathcal{L}_A}{\partial \theta_A} + \frac{\partial \mathcal{L}_{Con}}{\partial \theta_{Con}} \right), \tag{8}
\]
where $\eta$ is the learning rate and $\tau$ is the adaptation factor from the
gradient reversal layer (GRL) in [4].
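    The GRL can be realized as a small custom Keras layer that is the identity
on the forward pass and scales the gradient by $-\tau$ on the backward pass;
this follows the standard construction and is a sketch, not our exact code.

import tensorflow as tf

class GradientReversal(tf.keras.layers.Layer):
    """Identity forward; multiplies the gradient by -tau backward (Eq. 8)."""
    def __init__(self, tau=0.31, **kwargs):  # tau = 0.31 as in Sec. 4.1
        super().__init__(**kwargs)
        self.tau = tau

    def call(self, inputs):
        @tf.custom_gradient
        def _reverse(x):
            def grad(dy):
                return -self.tau * dy  # reversed, scaled gradient
            return tf.identity(x), grad
        return _reverse(inputs)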


3.10    Theoretical Analysis

We now formalize the error bound of our model. The ACL model is trained with
both the labeled source domain and the unlabeled target domain. The error
bound over the source and target domains, $\epsilon(h)$, is formally written
as:
\[
\epsilon(h) = \epsilon_{X_S}(h, Y_S) + \epsilon_{X_T}(h, \hat{Y}_T), \tag{9}
\]
where $\hat{Y}_T$ is the predicted label of the target domain. The terms
$\epsilon_{X_S}(h, Y_S) = \mathbb{E}_{x \sim X_S}[|h(x) - Y_S|]$ and
$\epsilon_{X_T}(h, \hat{Y}_T) = \mathbb{E}_{x \sim X_T}[|h(x) - \hat{Y}_T|]$
denote the expected risk over the source domain and the target domain with
respect to the ground-truth labels and the predicted labels, respectively
(where $|\cdot|$ is the $L_1$ norm).
    During training, we expect the error $\epsilon_{X_T}(h, \hat{Y}_T)$ to be
close to $\epsilon_{X_T}(h, Y_T)$, which evaluates the classifier $f$ with the
true target domain labels. The smaller the difference between these two
errors, the better the model performs and the more the discrepancy between the
two domains is reduced. Existing domain adaptation theory shows that the risk
in the target domain can be minimized by bounding the source risk and the
discrepancy between the source and target domains [1]. Therefore, the
generalization error bound of our model is given in the following lemma.
    Lemma 1. Let $h$ be a hypothesis in a class $\mathcal{H}$. Then
\[
\epsilon(h) = \epsilon_{X_S}(h, Y_S) + \epsilon_{X_T}(h, \hat{Y}_T) \leq 2\,\epsilon_{X_S}(h) + d_{\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + C, \tag{10}
\]
where $d_{\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) = 2 \sup_{h, h' \in \mathcal{H}} |\epsilon_{X_S}(h, h') - \epsilon_{X_T}(h, h')|$
is the $\mathcal{H}$-divergence between the training and test data in the
hypothesis space $\mathcal{H}$, and $C = \epsilon_{X_S}(h^*, Y_S) + \epsilon_{X_T}(h^*, Y_T)$
is the adaptability, which quantifies the error of the ideal hypothesis $h^*$
on the training and test data and should be small; $h^*$ is the optimal
hypothesis minimizing the joint error in Eq. 11:
\[
h^* = \arg\min_{h} \; \epsilon_{X_S}(h, Y_S) + \epsilon_{X_T}(h, Y_T). \tag{11}
\]

    In Lemma 1, the generalization bound of our model consists of three terms:
the training data error, the data discrepancy $d_{\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)$,
which is estimated by the disagreement of hypotheses in the space
$\mathcal{H}$, and the adaptability $C$ of the ideal joint hypothesis. In the
ACL model, the first term is measured by Eq. 1, and the domain discrepancy is
addressed by the adversarial learning loss and the feature consistency loss.
Furthermore, ACL approaches the ideal hypothesis by reducing the training
error in each iteration. Hence, our model can find a small bound for the two
domains. In other words, ACL implicitly minimizes the target domain risk, the
domain discrepancy, and the adaptability of the hypothesis $h$ with respect to
the hypothesis space $\mathcal{H}$.
Fig. 3: Layer visualization of our proposed ACL model. The two input layers
take source domain data and target domain data, respectively. The intermediate
model comprises the shared layers of Fig. 2. The source classification layer
refers to the classifier $f$, and two reconstruction layers guarantee the
feature consistency of the two domains. Two subtract layers are used for the
domain discriminator. In addition, the gradient reversal layer is used for
backpropagation.


4     Experiments

4.1   Implementation details

As mentioned above, the deep features are extracted from the last fully
connected layer [15, 17]. Each feature vector has size 1 × 1000 and
corresponds to one plant image. Therefore, the feature representation of the
herbarium domain (H) has size 320,750 × 1000, the herbarium photo associations
domain (A) has size 1,816 × 1000, the photo domain (P) has size 4,482 × 1000,
and the test domain (T) has size 3,186 × 1000. In the experiments, our task is
to reduce the error in the target domain (real-world plant images), i.e., the
photo domain or the test domain, so our evaluation focuses on domains P and T.
Since the herbarium photo associations domain (A) is important for bridging
the mapping between the two domains, we include domain A in the training
procedure to form a new source domain consisting of domain H and domain A.
Domain H + A has size 322,566 × 1000. We then train the model on these
extracted feature vectors. In Tab. 2, H → P denotes learning knowledge from
domain H and applying it to domain P.
    The parameters of ACL are first tuned based on performance on domain P,
with the model trained on the H + A domain. We then apply these parameters to
domain T and submit the predictions to the challenge for evaluation. Our
implementation is based on Keras. The parameter settings are β = γ = 0.5,
τ = 0.31, learning rate η = 0.0001, batch size 128, 1000 iterations, and the
Adam optimizer. The details of the layers are shown in Fig. 3.
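    For reference, a sketch of these settings as a Keras configuration
(variable names are ours):

import tensorflow as tf

# Hyperparameters reported above.
BETA = GAMMA = 0.5       # loss tradeoffs in Eq. 7
TAU = 0.31               # GRL adaptation factor in Eq. 8
BATCH_SIZE = 128
NUM_ITERATIONS = 1000
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)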
    We also compare our results with two domain adaptation methods: DANN [4]
and ADDA [14]. In addition, we extract features from four well-trained models
      Table 2: Accuracy (%) on the PlantCLEF 2020 dataset for the photo domain

                     Task                 A → P    H → P    H+A → P
                     DANN [4]              1.07     1.85      2.01
                     ADDA [14]             2.95     3.05      3.43
                     ResNet50-ACL          2.96     4.83      6.97
                     InceptionV3-ACL       3.02     5.93      7.95
                     Inception-Resnet-V2-ACL 3.73   7.07      8.43
                     NASNetLarge-ACL −W    3.84     7.92      8.18
                     NASNetLarge-ACL       5.98     8.64      9.67


(ResNet50 [7], InceptionV3 [13], Inception-Resnet-V2 [12], NASNetLarge [21]),
which were trained on the large-scale ImageNet dataset. We then feed these
different extracted features into the shared layers and optimize the objective
function in Eq. 7.


4.2   Results

The performance on the photo domain is shown in Tab. 2. We report the accuracy
over the whole photo domain, $\mathrm{Acc} = \frac{1}{N_T}\sum_{j=1}^{N_T} \mathbb{1}(\hat{Y}_{T_j} = Y_{T_j}) \times 100$,
where $\hat{Y}_T$ is the predicted label for the target domain. We observe
that the features extracted from NASNetLarge with our ACL architecture achieve
the highest performance across all three tasks, while the two baseline domain
adaptation methods perform relatively poorly on all three tasks. One reason is
that these two methods have weak feature extractors and do not exclude the
irrelevant categories in the source domain, which can cause negative transfer.
Moreover, as the pre-trained ImageNet model becomes stronger, we can extract
better features from plant images, which leads to the high performance of the
NASNetLarge-ACL model. In addition, we conduct an ablation study in which we
train the best NASNetLarge-ACL model without the shared category selection
(NASNetLarge-ACL −W). The results on all three tasks are lower than those of
the full NASNetLarge-ACL model, indicating that shared category selection is
useful in our model. These experiments demonstrate the effectiveness of the
ACL model in finding the domain-invariant features of the two domains.
    In the final stage of the PlantCLEF 2020 Challenge, our solution was
evaluated by the organizers on the test domain data. As shown in Tab. 3, our
method achieved a mean reciprocal rank (MRR) of 0.032 on the whole test domain
and an MRR of 0.016 on a subset of the test domain, placing 4th in the contest.

5     Discussion

There are two compelling advantages of the ACL model. First, we adopt the
adversarial consistent learning paradigm, which maintains domain-invariant
features from the source domain to the target domain and vice versa. Second,
         Table 3: MRR on the PlantCLEF 2020 dataset for the test domain [5]

                     Team            Full test set    Subset of the test set
                     ITCR PlantNet       0.180               0.052
                     Neuon AI            0.121               0.107
                     UWB                 0.039               0.007
                     LU (ours)           0.032               0.016
                     Domain              0.031               0.015
                     To Be               0.028               0.016
                     SSN                 0.008               0.003

we reduce the weights of the irrelevant categories in the source domain, which
mitigates negative transfer during training. Although our model performs
better than several baseline methods, the highest accuracy on the photo domain
is below 10%, which shows that transfer to real-world images remains
difficult. One underlying reason is the difficulty of the PlantCLEF 2020
Challenge dataset: there are significant differences between the herbarium
domain and the photo domain, as shown in Fig. 1. Another reason is a weakness
of our model: since we train on deep features instead of raw images to reduce
computational requirements, some informative features might be lost during
training. The performance of the ACL model could be improved by training the
architecture on raw images.

6    Conclusion
In this paper, we propose an adversarial consistent learning network, termed
ACL, for partial domain adaptation, to overcome the limitations of prior work
in finding proper shared categories and guaranteeing the feature consistency
of two domains. Our model is optimized by minimizing a three-component loss
function. Through the components of our ACL model, explicit domain-invariant
features are maintained by this cross-domain training scheme. Experimental
results demonstrate that our proposed ACL model yields promising results on
the PlantCLEF 2020 Challenge.

References
 1. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.:
    A theory of learning from different domains. Machine Learning 79(1-2), 151–175
    (2010)
 2. Cao, Z., Ma, L., Long, M., Wang, J.: Partial adversarial domain adaptation. In:
    Proceedings of the European Conference on Computer Vision (ECCV). pp. 135–
    150 (2018)
 3. Chen, J., Wu, X., Duan, L., Gao, S.: Domain adversarial reinforcement learning
    for partial domain adaptation. arXiv preprint arXiv:1905.04094 (2019)
 4. Ghifary, M., Kleijn, W.B., Zhang, M.: Domain adaptive neural networks for object
    recognition. In: Pacific Rim international conference on artificial intelligence. pp.
    898–904. Springer (2014)
 5. Goëau, H., Bonnet, P., Joly, A.: Overview of lifeclef plant identification task 2020.
    In: CLEF working notes 2020, CLEF: Conference and Labs of the Evaluation
    Forum, Sep. 2020, Thessaloniki, Greece. (2020)
 6. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
    S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural
    information processing systems. pp. 2672–2680 (2014)
 7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
    (CVPR). pp. 770–778 (2016)
 8. Joly, A., Goëau, H., Kahl, S., Deneu, B., Servajean, M., Cole, E., Picek, L., Ruiz
    De Castañeda, R., Lorieul, T., Botella, C., Glotin, H., Champ, J., Vellinga, W.P.,
    Stöter, F.R., Dorso, A., Bonnet, P., Müller, H.: Overview of lifeclef 2020: a system-
    oriented evaluation of automated species identification and species distribution
    prediction. In: Proceedings of CLEF 2020, CLEF: Conference and Labs of the
    Evaluation Forum, Sep. 2020, Thessaloniki, Greece. (2020)
 9. Liu, H., Long, M., Wang, J., Jordan, M.: Transferable adversarial training: A gen-
    eral approach to adapting deep classifiers. In: International Conference on Machine
    Learning. pp. 4013–4022 (2019)
10. Long, M., Cao, Z., Wang, J., Jordan, M.I.: Conditional adversarial domain adap-
    tation. In: Advances in Neural Information Processing Systems. pp. 1647–1657
    (2018)
11. Pan, S.J., Yang, Q., et al.: A survey on transfer learning. IEEE Transactions on
    knowledge and data engineering 22(10), 1345–1359 (2010)
12. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet
    and the impact of residual connections on learning. In: Thirty-First AAAI Confer-
    ence on Artificial Intelligence (2017)
13. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the incep-
    tion architecture for computer vision. In: Proceedings of the IEEE conference on
    computer vision and pattern recognition. pp. 2818–2826 (2016)
14. Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain
    adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pat-
    tern Recognition. pp. 7167–7176 (2017)
15. Zhang, Y., Allem, J.P., Unger, J.B., Cruz, T.B.: Automated identification of
    hookahs (waterpipes) on instagram: an application in feature extraction using con-
    volutional neural network and support vector machine classification. Journal of
    Medical Internet Research 20(11), e10513 (2018)
16. Zhang, Y., Davison, B.D.: Modified distribution alignment for domain adaptation
    with pre-trained inception resnet. arXiv preprint arXiv:1904.02322 (2019)
17. Zhang, Y., Davison, B.D.: Impact of imagenet model selection on domain adapta-
    tion. In: Proceedings of the IEEE Winter Conference on Applications of Computer
    Vision Workshops. pp. 173–182 (2020)
18. Zhang, Y., Tang, H., Jia, K., Tan, M.: Domain-symmetric networks for adversarial
    domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision
    and Pattern Recognition. pp. 5031–5040 (2019)
19. Zhang, Y., Xie, S., Davison, B.D.: Transductive learning via improved geodesic
    sampling. In: Proceedings of the 30th British Machine Vision Conference (2019)
20. Zhang, Y., Davison, B.D.: Domain adaptation for object recognition using subspace
    sampling demons. Multimedia Tools and Applications (2020)
21. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures
    for scalable image recognition. In: Proceedings of the IEEE conference on computer
    vision and pattern recognition. pp. 8697–8710 (2018)