<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Unsupervised Domain Adaptation with Representative Selection Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>I-Ting Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hsuan-Tien Lin</string-name>
          <email>htlin@csie.ntu.edu.tw</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Information Engineering, National Taiwan University</institution>
          ,
          <addr-line>Taipei</addr-line>
          ,
          <country country="TW">Taiwan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Domain adaptation is a technique that tackles the dataset shift scenario, where the training (source) data and the test (target) data can come from different distributions. Current research works mainly focus on either the covariate shift or the label shift settings, each making a different assumption on how the source and target data are related. Nevertheless, we observe that neither of the settings can perfectly match the needs of a real-world bio-chemistry application. We carefully study the difficulties encountered by those settings on the application and propose a novel method that takes both settings into account to improve the performance on the application. The key idea of our proposed method is to select examples from the source data that are similar to the target distribution of interest. We further explore two selection schemes, the hard-selection scheme that plugs similarity into a nearest-neighbor style approach, and the soft-selection scheme that enforces similarity by soft constraints. Experiments demonstrate that our proposed method not only achieves better accuracy for the bio-chemistry application but also shows promising performance on other domain adaptation tasks when the similarity can be concretely defined.</p>
      </abstract>
      <kwd-group>
        <kwd>Domain Adaptation</kwd>
        <kwd>Dataset Shift</kwd>
        <kwd>Covariate Shift</kwd>
        <kwd>Label Shift</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Machine learning has been a high-profile topic and has succeeded in various kinds
of real-world tasks thanks to vast amounts of labeled data. However, collecting
well-labeled data from scratch is time- and labor-consuming. Therefore, in many
applications [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], we hope that a model trained on one task can generalize
to another related task. For example, consider an object recognition task that
tries to distinguish ten different products based on their images on e-commerce
websites. It is relatively easy to crawl and gather well-labeled data from the
websites to train a classifier. After training the classifier, we may encounter
another task where we hope that the users can easily recognize a product by
taking pictures with their smartphones. Given that it is harder to gather
well-labeled data from the users to train a classifier, we hope to reuse the data
and/or the classifier obtained in the former task to tackle the latter one. Owing
to the differences in brightness, angle, and picture quality between images
taken in the two tasks, the same-distribution assumption on the training and
test data may not hold. This scenario is called dataset shift [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], where the
training (source) and test (target) data can come from different distributions.
(© 2020 for this paper by its authors. Use permitted under CC BY 4.0.)
      </p>
      <p>A family of techniques that aim at tackling the dataset shift problem is
domain adaptation (DA). In this work, we try to solve the more challenging
unsupervised domain adaptation (UDA) problem, where we can only access the
labeled source data and unlabeled target data in the training phase. The goal of
UDA is to learn a model from these data and to achieve good performance on
the target domain. Intuitively, learning under UDA is not possible if the source
and target domains do not share any properties. Previous works on UDA thus
make assumptions about the properties shared by the two domains and design
algorithms based on the assumptions. Two major assumptions, covariate and
label shift, have been considered separately in previous research works.</p>
      <p>
        The assumption of covariate shift considers the mismatch of feature
distribution between the source and target domains. Furthermore, it is assumed that the
labels of both domains are drawn from the same conditional distribution given
the features. There are two main families of methods designed under this
assumption, namely, the re-weighting method [
        <xref ref-type="bibr" rid="ref20 ref25 ref8">8, 20, 25</xref>
        ], and the adversarial training
method [
        <xref ref-type="bibr" rid="ref12 ref13 ref19 ref3">3, 12, 13, 19</xref>
        ]. They solve the same problem from different perspectives:
re-weighting-based methods estimate the difference in feature distributions
between the source and target domains, whereas adversarial training methods
align those distributions directly. The label shift assumption refers to the change
of label distributions between the source and target domain while assuming that
the features of both domains are drawn from the same conditional distribution
given the label. Previous works focus on utilizing re-weighting [
        <xref ref-type="bibr" rid="ref1 ref11 ref26">1, 11, 26</xref>
        ] to solve
this task. They estimate the difference between source and target domain label
distributions.
      </p>
      <p>
        Most recent works extend from the two settings and demonstrate promising
performance. However, motivated by a real-world bio-chemistry application, we
find that current domain adaptation methods designed for only one of the two
assumptions cannot cope with all the application's needs. We carefully examine the
application and find that it comes with a shift of the label distribution, easily
observed from the polarity of the labels. However, the conditional feature
distribution given the label does not seem to stay the same, violating the
label shift assumption. Accordingly, we must use the covariate shift assumption to
model this application. Here comes the problem: if the application is tackled under
the covariate shift assumption using adversarial training, the label distribution
should be the same on the aligned data, violating the polarity property of the
dataset. Therefore, we conclude that this application requires considering both
covariate shift and label shift properly. In this paper, we study how to follow
the covariate shift assumption while taking the possible label shift into account
for the bio-chemistry application. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] also tries to tackle the same issue, using
adversarial training while imposing a constraint on the model so that
it does not perfectly align the source and target distributions.
      </p>
      <p>Inspired by some intuitive toy examples, we find that selecting
representative examples from the source data allows us to construct a similar-feature and
similar-label subset of the source data that resolves both covariate shift and label
shift. If the feature space implicitly encodes the distance between two features
with physical meaning, we can construct the subset through the nearest-neighbor
algorithm by considering the distance as the similarity measure. Based on this
finding, we propose two methods, Hard/Soft Distance-Based Selection, to handle
different situations. The hard selection directly uses the subset of the source data
we construct to train the model, whereas the soft selection enforces similarity
on the subset by adding a soft constraint.</p>
      <p>Experiments show that our methods successfully capture the structural
information and utilize the distance-based similarity and thus mitigate the impact
from the label shift in the application. To test the performance of our methods in
high-dimensional spaces (e.g., image space), we also run experiments on a
benchmark dataset (digits). Further, we extend our methods to tackle this scenario
with promising experimental results. Finally, we discuss the situations in which
our methods are best utilized, through a simple noisy-source-data experiment.</p>
      <p>Our contributions in this paper include:
1. We carefully study the difficulties encountered by current UDA methods
on a real-world application.
2. We propose two methods based on representative selection to overcome the
difficulties.
3. We study how the proposed methods can be extended in different scenarios.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>Notation and Problem Setup</title>
        <p>We consider a K-way classification task and let X and Y represent the random
variables for the feature and label respectively, where Y = {0, . . . , K − 1}. We
denote the joint distributions for the source and target domains as PS(X, Y)
and PT(X, Y). The marginal distributions of X and Y in the source domain
are denoted PS(X) and PS(Y); similarly, PT(X) and PT(Y) represent the
marginal distributions of X and Y in the target domain. The conditional label
distributions in the two domains are denoted by PS(Y|X) and PT(Y|X), and PS(X|Y)
and PT(X|Y) stand for the conditional feature distributions in the two domains.</p>
        <p>We consider the UDA setting in this paper. There exists a set of labeled data
DS = {(xi, yi)}_{i=1}^n in the source domain, where each instance (xi, yi) is drawn
i.i.d. from PS(X, Y). In the target domain, we have only a set of unlabeled data
DT = {x̃j}_{j=1}^m, where each instance x̃j is drawn i.i.d. from PT(X).</p>
        <p>
          Our goal is to train a classifier f : X → Y based on DS and DT and then
predict the corresponding labels of DT. Note that labels exist for the target
domain but are used only for testing.</p>
        <p>
          DA has been studied in various fields, such as natural language processing for
sentiment analysis [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], health care for disease diagnosis [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], and computer
vision [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] for object detection [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and semantic segmentation [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Also, there are
many types of DA for different scenarios. For instance, semi-supervised
domain adaptation, where a small amount of labeled target-domain data is
provided, is a common setting [
          <xref ref-type="bibr" rid="ref18 ref23">18, 23</xref>
          ]. In this paper, we focus on UDA [
          <xref ref-type="bibr" rid="ref16 ref9">9, 16</xref>
          ] and
make a comparison between two common settings.
        </p>
        <p>
          Most UDA researchers put emphasis on the covariate shift setting, which assumes
that PS(X) is different from PT(X). These methods can be roughly divided
into two main approaches. One is the re-weighting method, whose goal is to
estimate the importance weight PT(X)/PS(X) for each source example. After
obtaining the importance weights, one can perform importance-weighted empirical
risk minimization to adapt the model to the target domain. Different methods
estimate the importance weight differently. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]
utilizes the Kullback-Leibler divergence and some [
          <xref ref-type="bibr" rid="ref25 ref8">8, 25</xref>
          ] borrow the concept of
kernel mean matching [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to estimate the weights. The other approach dealing
with covariate shift is the adversarial training method [
          <xref ref-type="bibr" rid="ref12 ref13 ref19 ref3">3, 12, 13, 19</xref>
          ]. Inspired by
the Generative Adversarial Network (GAN) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], the adversarial training method
tries to learn a disentangled embedding by making use of a discriminator. With
these domain-invariant embedding features, such methods can reduce the
distribution difference between the source and target domains under the covariate
shift setting.
        </p>
        <p>
          Another setting, named label shift, assumes that PS(Y) ≠ PT(Y). In this
setting, previous works mostly utilize the re-weighting method to solve the
problem. Different from covariate shift, they estimate the importance
weight PT(Y)/PS(Y). The concept of kernel mean matching extends to the label
shift setting [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. However, the re-weighting-based
method is time-consuming because it requires computing the inverse of a kernel
matrix whose size depends on the amount of data; it is therefore hard to extend to
large-scale scenarios. Recently, [
          <xref ref-type="bibr" rid="ref1 ref11">1, 11</xref>
          ] propose methods that exploit an arbitrary
classifier to estimate the importance weights and thus can easily be applied to
large-scale scenarios.
        </p>
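        <p>To make the classifier-based estimation idea concrete, here is a minimal numeric sketch in the spirit of such estimators (not the exact algorithm of any cited work); the 90%-accurate classifier, the class priors, and the sample sizes are all made-up assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2  # binary labels, as in a task like C2H

# Simulated hard predictions of some fixed classifier f (assumed 90% accurate,
# with the same error behavior in both domains, as label shift requires).
y_src = rng.choice(K, size=5000, p=[1 / 3, 2 / 3])            # source labels
f_src = np.where(rng.random(5000) < 0.9, y_src, 1 - y_src)    # f on source
y_tgt = rng.choice(K, size=5000, p=[4 / 5, 1 / 5])            # target labels (hidden)
f_tgt = np.where(rng.random(5000) < 0.9, y_tgt, 1 - y_tgt)    # f on target

# Joint matrix C[i, j] ~ P_S(f(X) = i, Y = j) from labeled source data,
# and the prediction marginal mu[i] ~ P_T(f(X) = i) from unlabeled target data.
C = np.zeros((K, K))
np.add.at(C, (f_src, y_src), 1.0)
C /= len(y_src)
mu = np.bincount(f_tgt, minlength=K) / len(f_tgt)

# Under label shift, C @ w = mu, so solving this small linear system recovers
# w(y) = P_T(Y = y) / P_S(Y = y) without any target labels.
w_hat = np.linalg.solve(C, mu)
```

        <p>Here the true weights are (2.4, 0.3), and w_hat approaches them as the sample sizes grow; only a K-by-K system is solved, which is why such estimators scale to large datasets.</p>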
        <p>
          Motivated by a real-world application, we find that current methods cannot
successfully tackle an application that exhibits the properties of both covariate
and label shift. Therefore, promoting domain adaptation methods to handle
more general cases is essential. Recently, [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] raises the problem that the adversarial
training method can cause bad generalization to the target domain when
label shift exists simultaneously, and proposes a method to handle it.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Motivation</title>
      <p>
        Commissioned by the Industrial Technology Research Institute (ITRI), we
initiated a research project on predicting compound-protein interaction (CPI), which
is a vital topic in drug discovery [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Briefly speaking, given a pair of compound
and protein data, the CPI prediction task identifies whether the pair exhibits a
chemical interaction or not. That is, the task is a classic binary classification
problem. Our collaborators at ITRI provided us with the ChEMBL dataset, which
contains 645,461 (compound, protein) pairs, each with a binary label.
Note that each example was generated according to the earlier work [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] to
obtain a 300-dimensional feature, formed by concatenating a 200-dimensional
compound feature and a 100-dimensional protein feature. Additionally,
they indicated 3,916 data pairs related to Chinese medicine, named
Herb. They hope to obtain a model with good accuracy on Chinese medicine
data. The main difficulty they confront is that labeled Herb data is scarce
compared with ChEMBL data, and running the experiments needed to label more
data is time-consuming and expensive. How to take advantage of the abundant
labeled ChEMBL- (ChEMBL minus Herb) data becomes important in
this situation.
      </p>
      <p>
        We plot the scatter diagram through t-SNE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to analyze the dataset. From
Figure 1, we can see that the distribution of ChEMBL- differs from that of
Herb. This figure demonstrates a typical dataset shift scenario. Therefore, we
formulate the whole problem as UDA to meet the situation where gathering
labeled target data is difficult. As stated above, we have to assume that the
source domain and target domain share some properties. Thus, we consider the two
main assumptions below.</p>
      <p>
        The covariate shift setting assumes that the input distributions change between the
source and target domains (PS(X) ≠ PT(X)) while the conditional label distributions
remain invariant (PS(Y|X) = PT(Y|X)). Figure 1 shows that our dataset meets these
assumptions, so we run experiments under this setting first. Early works try to
estimate the difference between PS(X) and PT(X); we call this kind of method
re-weighting. Recently, domain adaptation researchers have used adversarial training,
utilizing the concept of GAN, to learn a shared transform function E that maps
source and target domain data into the same embedding space, reducing the
distribution difference. We run an experiment with the Domain Adversarial
Neural Network (DANN) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a classical adversarial training method. There are three main components in
the architecture: (i) an encoder, (ii) a classifier, and (iii) a discriminator. The
encoder E : X → Z maps the original data to the embedding space Z and tries to
fool the discriminator so that it cannot distinguish between the source and target
embeddings. The goal of the classifier C : Z → {0, 1}^K is to predict well on the
source embeddings. The discriminator D : Z → {0, 1} tries to correctly tell apart
the source and target embeddings generated by the encoder. The overall
optimization is
min_{E,C} max_D Lcls(C, E, DS) + Ladv(D, E, DS, DT)
= (1/n) Σ_{i=1}^n [yi^T log C(E(xi))] + (1/n) Σ_{i=1}^n log D(E(xi)) + (1/m) Σ_{j=1}^m log[1 − D(E(x̃j))],
where Lcls represents a cross-entropy loss on the source data and Ladv is the
adversarial objective for the encoder and discriminator.</p>
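      <p>As a minimal numeric sketch of the objective above (not the actual DANN implementation), the following evaluates Lcls and Ladv for fixed, randomly drawn linear stand-ins for E, C, and D; the dimensions and weights are made-up assumptions, and the cross-entropy is written in its usual negative log-likelihood form:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-ins for the three components (made-up weights):
W_E = 0.01 * rng.normal(size=(300, 16))   # encoder E : X -> Z
W_C = rng.normal(size=(16, 2))            # classifier C : Z -> {0,1}^K, K = 2
w_D = rng.normal(size=16)                 # discriminator D : Z -> {0,1}

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dann_losses(Xs, ys, Xt):
    """Evaluate the two terms of the min-max objective for fixed weights."""
    Zs, Zt = Xs @ W_E, Xt @ W_E
    # Lcls: cross-entropy of the classifier on labeled source embeddings.
    p = softmax(Zs @ W_C)
    Lcls = -np.log(p[np.arange(len(ys)), ys]).mean()
    # Ladv: discriminator log-likelihood of separating source from target.
    Ladv = np.log(sigmoid(Zs @ w_D)).mean() + np.log(1.0 - sigmoid(Zt @ w_D)).mean()
    return Lcls, Ladv

Xs = rng.normal(size=(64, 300))           # source features
ys = rng.integers(0, 2, size=64)          # source labels
Xt = rng.normal(size=(32, 300)) + 0.5     # shifted target features
Lcls, Ladv = dann_losses(Xs, ys, Xt)
```

      <p>In training, D ascends Ladv while E and C descend their objective; this sketch only evaluates the two terms once.</p>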
      <p>We also train a model on the source domain and directly test it on the target
domain; this baseline is called source-only. Similarly, target-only means that we train
the model on the target training data and then evaluate it on the target test data.
Note that we choose weighted accuracy as the evaluation criterion on the Herb
dataset because it is an imbalanced dataset.</p>
      <p>In Figure 2, we notice that the accuracy of DANN is worse than that of
source-only. Confounded by this result, we dig deeper to analyze the properties of
this dataset. One possible reason is that if we let PS(E(X)) = PT(E(X)), we can
derive PS(Y) = PT(Y) based on the covariate shift assumption. However, we find
that the ratio of positive to negative examples is 2:1 in the source domain, whereas
in the target domain the corresponding ratio is 1:4. This finding shows that the
label distribution of the source domain differs from that of the target domain, i.e.,
PS(Y) ≠ PT(Y). In this circumstance, if we insist on aligning the source and target
distributions, we may get bad accuracy. Based on this result, we argue that
current adversarial methods designed under the covariate shift assumption cannot
handle the situation where PS(Y) is also not equal to PT(Y).</p>
      <p>The label shift setting makes the following assumptions. First, the label
distribution changes from source to target (i.e., PS(Y) ≠ PT(Y)). It further assumes
that the conditional feature distributions stay the same (PS(X|Y) = PT(X|Y)).
Recent works deal with this problem through re-weighting and perform
importance-weighted empirical risk minimization after obtaining the weights.</p>
      <p>
E_{x,y∼PT(X,Y)} ℓ(y, h(x)) = E_{x,y∼PS(X,Y)} [PT(X,Y)/PS(X,Y)] ℓ(y, h(x))
= E_{x,y∼PS(X,Y)} [PT(Y)/PS(Y)] ℓ(y, h(x))
= E_{x,y∼PS(X,Y)} w(y) ℓ(y, h(x)),   (1)
where the second equality uses the label shift assumption PS(X|Y) = PT(X|Y),
h stands for a classifier X → {0, 1}^K, ℓ represents the loss function
Y × Y → [0, 1], and w(y) denotes the importance weight vector, which stands for
PT(Y)/PS(Y).
      </p>
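      <p>A minimal numeric sketch of Eq. (1), with made-up label marginals that mirror the 2:1 source and 1:4 target polarity observed above and a placeholder per-example loss:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up marginals: source P_S(Y) is 2:1 positive-to-negative, target 1:4.
P_S = np.array([1 / 3, 2 / 3])    # P_S(Y=0), P_S(Y=1)
P_T = np.array([4 / 5, 1 / 5])    # P_T(Y=0), P_T(Y=1)
w = P_T / P_S                     # importance weight vector w(y)

n = 10_000
y = rng.choice(2, size=n, p=P_S)          # labels drawn from the source domain
loss = rng.uniform(size=n) * (1 + y)      # placeholder per-example loss l(y, h(x))

# Importance-weighted empirical risk: an estimate of the target risk computed
# purely from source-distributed samples, as in Eq. (1).
weighted_risk = np.mean(w[y] * loss)
naive_risk = np.mean(loss)                # unweighted source risk, for contrast
```

      <p>Under this toy loss, the true target risk is 0.8 · 0.5 + 0.2 · 1.0 = 0.6, which weighted_risk approaches, while the unweighted source estimate stays biased toward the source label mix (about 0.83).</p>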
      <p>
        We take Regularized Learning under label Shifts (RLLS) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as our baseline.
The results are reported in Table 1. The table shows that RLLS could not estimate
the importance weights well. To analyze what causes this bad estimation, we plot
the source and target distributions separately. According to Figure 1, we can
confirm that the conditional feature distributions are quite different, i.e.,
PS(X|Y) ≠ PT(X|Y). This observation breaks the label shift assumption.
      </p>
      <sec id="sec-3-1">
        <title>C2H Dataset</title>
        <p>From the previous discussion, we found that current domain adaptation methods
cannot cover all dataset shift cases, e.g., our real-world dataset. We
formally construct the C2H dataset for this particular domain adaptation
task. It comprises ChEMBL- and Herb as the source and target domains
respectively. The source domain includes 641,545 examples, and the target domain
contains 3,916 examples. Specifically, both the input and label distributions vary
between source and target.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Proposed Method</title>
      <p>Based on the observations in Section 3.1, if we insist on using adversarial
training to align the source and target data distributions, we may end up with
unexpectedly bad performance on the target domain by neglecting that
label shift can exist at the same time. We illustrate this finding with the toy example
in Figure 4. In Figure 4(a), if we use adversarial training to align the two
distributions and do not take PS(Y) ≠ PT(Y) into consideration, we would
probably get bad accuracy on the target data. Figure 4(b) shows that when the
embedding space has strong physical meaning, directly selecting the source data
that is close to the target data can benefit classification. That is, we can regard
the distance between two examples as a similarity measure and then accomplish
domain adaptation through a selection technique. We use a toy example to
demonstrate our idea in the following section.</p>
      <sec id="sec-4-1">
        <title>Representatives Selection</title>
        <p>In this section, we demonstrate that a simple selection technique can
accomplish UDA. Figure 5(a) depicts the source-only classifier: directly applying it
to the target domain gives bad performance. In Figure 5(b), choosing the source
data that is close to the target data and training on it yields a good classifier on
the target domain, i.e., achieves domain adaptation. Therefore, when the distance
represents similarity, a simple selection technique can improve performance on the
UDA task.</p>
        <p>Furthermore, we implicitly make the continuity assumption, i.e., points
that are close to each other are more likely to share the same label. If
the assumption holds, we can achieve domain adaptation by selecting the
target-like source data that is close to the target data. We define such target-like
source data as representatives in this paper. Based on the continuity assumption,
we further propose two selection techniques to achieve domain adaptation.
(Figure 4: (a) adversarial training; (b) data selection.)</p>
        <p>
The first method is based on K-Nearest Neighbor (KNN), a classic lazy
algorithm. KNN takes Euclidean distance as the similarity measure and relies on
the assumption that any data point and its neighbors belong to the same class,
i.e., the continuity assumption. We feed the source data into KNN as training
data first and then query it with all the target data to get the corresponding
representatives. We let K = 1 for simplicity. The procedure can be formulated as
DS^rep = {sj}_{j=1}^m, where sj is the source example nearest to x̃j.</p>
        <p>Here DS^rep denotes the representatives we choose. After gathering the
representatives, we can use them as a new source dataset to train a model and apply it
to the target domain.
However, HS could suffer from bad performance when two problems exist.
First, the continuity assumption could be wrong. For example, in a
high-dimensional space like image space, directly taking distance as a similarity
measure to select the representatives may be a catastrophe: the data sparsity
problem arises naturally in such spaces, and the distance may not represent
any sort of similarity. Second, if the source data is noisy, choosing the
representatives by distance may introduce a lot of bias into the model and thus
hurt the performance. To overcome these two problems, we propose the second
method, called SS-β. The soft means that we do not drop the rest of the source data
after selecting the representatives. Instead, we add the following constraint into
the minimization objective. Suppose we train a neural network N as a classifier
with L layers; we add the following constraint on the k-th hidden layer:
min_N (1/m) Σ_{j=1}^m ||N^k(x̃j) − N^k(sj)||_2^2.</p>
        <p>Via this term, we enforce that a pair of points that is close in the original
space must also be close in the embedding space. The overall objective is
min_N (1/n) Σ_{i=1}^n ℓ(N(xi), yi) + β · (1/m) Σ_{j=1}^m ||N^k(x̃j) − N^k(sj)||_2^2,
where β is a hyperparameter that controls the importance of this constraint and ℓ
represents a cross-entropy loss.</p>
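        <p>The two schemes can be sketched as follows on made-up 2-D toy clouds where Euclidean distance is meaningful; the linear map standing in for the hidden layer N^k is purely illustrative:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy source/target clouds (assumed 2-D features with meaningful distances).
D_S = rng.normal(size=(500, 2))               # source features
D_T = rng.normal(loc=2.0, size=(40, 2))       # shifted target features

# Hard selection (HS): for each target point, take its 1-nearest source
# neighbor s_j as a representative.
dists = np.linalg.norm(D_S[None, :, :] - D_T[:, None, :], axis=2)  # (m, n)
rep_idx = dists.argmin(axis=1)
D_S_rep = D_S[rep_idx]                        # representatives {s_j}

# Soft selection (SS-beta) penalty at a stand-in "k-th hidden layer" N^k,
# here a made-up fixed linear map: mean squared embedding distance between
# each target point and its representative.
W_k = 0.1 * rng.normal(size=(2, 8))
penalty = np.mean(np.sum((D_T @ W_k - D_S_rep @ W_k) ** 2, axis=1))
```

        <p>HS would then train only on D_S_rep, while SS-β would add β times this penalty to the source cross-entropy loss.</p>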
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>In this section, we evaluate our proposed methods in three parts: (i) C2H, (ii)
Noisy C2H, and (iii) Digits. In part (i), we want to show that simple selection-based
methods can improve the performance on our C2H dataset. In part (ii), we test
our methods with a noisy source domain and discuss the circumstances under which
our methods are best used. To evaluate the scalability of our methods
to high-dimensional spaces, we run experiments on the digits datasets and show the
results in part (iii).</p>
      <p>We name our methods as follows: (1) HS: use the representatives selected by
Hard Distance-Based Selection to train the model and then directly apply it to the
target domain. (2) SS-β: train the model on the source domain and add the Soft
Distance-Based Selection constraint, whose influence is controlled by the
hyperparameter β. For each result, we repeat the trial 5 times with different random
seeds and report the average in the table. We also report the standard deviation
to demonstrate the stability of each method.</p>
      <sec id="sec-5-1">
        <title>C2H Dataset Evaluation</title>
        <p>
          We run the following methods as our competitors: (i) KMM-γ [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], the classic
re-weighting method, where γ represents the parameter of the Gaussian kernel,
(ii) DANN [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], (iii) fDANN-β and sDANN-β proposed by [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], which implicitly
deal with the same problem as ours; their β is a restrictive factor that keeps the
model from perfectly aligning the source and target data. source-only and target-only
are also included as baselines. Note that we subsample 20,000 data points from
ChEMBL- for efficient evaluation. We use the Adam optimizer with batch sizes of
512 and 64 for the source and target data respectively, and set the learning
rate to 0.0001. For DANN-like models, the encoder, discriminator,
and classifier each have their own optimizer with different weight decay (0.01, 0.001,
and 0, respectively). Figures 6 and 7 show the architectures of our methods
and the DANN-like methods respectively.
        </p>
        <p>Table 2 shows that HS improves upon source-only and the other methods
on this task. There is a big performance gap between DANN-like methods and
ours: the original dataset already has interpretable and discriminative features,
so aligning the distributions aggressively leads to declining performance, not to
mention that label shift deteriorates the performance further. fDANN and sDANN
are expected to somewhat ease the impact of label shift by keeping the model from
aligning the distributions perfectly, but they still perform badly because they
destroy the good feature embedding. Table 2 also shows that re-weighting methods
are competitive with HS. HS can basically be regarded as a re-weighting method
that assigns weight only to the representatives and weight 0 to the others. However,
HS is computationally efficient because we do not need to calculate the kernel
matrix that KMM requires; we just run the KNN algorithm. We can also see that
SS-β performs poorly because it suffers from difficult hyperparameter tuning.</p>
        <p>Furthermore, we run an experiment to see whether the accuracy changes under
different numbers of neighbors k. The results are plotted in Figure 8. We find
that the accuracy differs only slightly under different k. Therefore, we do
not have to worry about the parameter k when using our methods.</p>
        <p>We want to verify that SS-β can handle the situation where the representatives
could be disruptive due to noisy source data. Therefore, we create a noisy
C2H dataset and try to choose the better method in this scenario. We
add Gaussian noise with mean 0 and variance 0.01 to each feature dimension
independently for every ChEMBL- data point, while the Herb dataset remains the
same.</p>
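        <p>The noisy source construction is a one-liner; the array below is a made-up stand-in for the ChEMBL- feature matrix:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
chembl_X = rng.normal(size=(1000, 300))   # stand-in for ChEMBL- features

# Add independent Gaussian noise (mean 0, variance 0.01, i.e. std 0.1)
# to every feature dimension of every source point; Herb stays untouched.
noise = rng.normal(loc=0.0, scale=np.sqrt(0.01), size=chembl_X.shape)
noisy_X = chembl_X + noise
```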
        <p>The experimental results are listed in Table 3. From the table, we can see that
HS performs worse than SS-β. As expected, in the noisy source scenario, if we
over-rely on the close source examples selected by HS, we suffer from the
impact of the noisy data. In this circumstance, choosing SS-β can mitigate the
noisy-data effect with careful hyperparameter tuning.</p>
        <p>
          Briefly, if we know in advance that the data in the task has strong physical
meaning, using the hard version gains much more benefit without the effort
of tuning a parameter. On the contrary, in a task where the source data could
be noisy, choosing the soft version coupled with a careful parameter
search can avoid over-confidence in fake representatives.</p>
        <p>
          To extend to a more severe shift scenario, we follow the procedure of previous
work [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] to artificially generate the shifted datasets. In brief, the source domain
stays class-balanced and the shift comes from the target domain. To yield
the target label distribution shift, we subsample target data from half of the
classes in a uniform sampling manner. Following this procedure, we
obtain a covariate shift dataset with severe label shift. We consider the USPS and
MNIST datasets, so there are two tasks: (i) USPS → MNIST and (ii)
MNIST → USPS. For each task, we run the following experiments: (a) [0-4] shift:
target data is sampled only from classes 0-4; (b) [5-9] shift: target data is sampled
only from classes 5-9; (c) [0-9] no shift: data is sampled from all classes. Note that
we subsample 2,000 examples from MNIST and 1,800 examples from USPS according
to the given distribution (shift or no shift), resize all images to 28×28, convert
each pixel value into [0, 1], and apply channel-wise normalization with mean 0.5 and
standard deviation 0.5. Figures 9 and 10 depict the model architectures.
        </p>
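        <p>The label-shift subsampling and preprocessing described above can be
sketched as follows. This is a minimal NumPy illustration under our stated
settings; the dummy arrays stand in for the real MNIST/USPS images, and the
function name is ours:</p>

```python
import numpy as np

def subsample_shifted(images, labels, classes, n_samples, seed=0):
    """Uniformly subsample n_samples examples whose labels lie in `classes`,
    then scale pixels to [0, 1] and normalize with mean 0.5, std 0.5."""
    rng = np.random.default_rng(seed)
    mask = np.isin(labels, list(classes))
    idx = rng.choice(np.flatnonzero(mask), size=n_samples, replace=False)
    x = images[idx].astype(np.float32) / 255.0  # pixel values into [0, 1]
    x = (x - 0.5) / 0.5                         # channel-wise normalization
    return x, labels[idx]

# Example: a [0-4]-shift target set of 2000 28x28 examples
# (random dummy data stands in for the real MNIST images)
images = np.random.randint(0, 256, size=(10000, 28, 28))
labels = np.random.randint(0, 10, size=10000)
x_t, y_t = subsample_shifted(images, labels, classes=range(5), n_samples=2000)
```

The no-shift setting (c) is the same call with `classes=range(10)`; resizing to
28x28 is omitted since the dummy arrays are already that size.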
        <p>
          For task (i), the results are listed in Table 4. From the table, we can see
that fDANN outperforms the other methods in the severe shift settings (i.e.,
[0-4] Shift and [5-9] Shift). As expected, our distance-based methods perform
ordinarily or even worse because the features do not carry strong physical
meaning. However, we also find that fDANN and sDANN are unstable, with high
standard deviations. Therefore, it is not certain whether fDANN and sDANN are
suitable for a real-world application.
        </p>
        <p>
          For task (ii), Table 5 shows that fDANN still does well in the severe shift
settings. However, to our surprise, SS-1 shows a great improvement on [0-4]
Shift. We further investigate this phenomenon by plotting the source and target
distributions in Figure 11. We find that classes 0-4 from both MNIST and USPS
have great discriminability because they separate clearly. Additionally, the
source data with labels among classes 0-4 are relatively close to the
corresponding target data. Therefore, our method can achieve great performance
on [0-4] Shift.
        </p>
        <p>
          Even though our methods perform well on [0-4] Shift, their performance on
the other tasks is still worse than that of the other methods. Therefore,
obtaining a feature embedding with physical meaning is crucial before applying
our methods. We try three different ways to obtain an embedding: (1) PCA:
concatenate the source and target data and then run PCA to obtain the features;
(2) extractor: build a model on the source domain first, then use it as a
feature extractor on the source and target data; (3) ImageNet: use an
ImageNet pre-trained model as a feature extractor. After obtaining the feature
embeddings, we apply our methods on them.
        </p>
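        <p>Scheme (1), PCA fit on the concatenated source and target data, can be
sketched in plain NumPy as below; schemes (2) and (3) simply replace the
projection with a forward pass through a trained network. All names here are
illustrative:</p>

```python
import numpy as np

def pca_embed(source, target, k):
    """Fit PCA on the concatenation of source and target data (scheme 1),
    then project both domains into the same k-dimensional feature space."""
    x = np.vstack([source, target])
    x_centered = x - x.mean(axis=0)
    # principal directions come from the SVD of the centered data matrix
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    z = x_centered @ vt[:k].T          # (n_source + n_target, k)
    return z[:len(source)], z[len(source):]

src = np.random.randn(100, 784)  # flattened 28x28 source images (dummy data)
tgt = np.random.randn(80, 784)   # flattened 28x28 target images (dummy data)
z_s, z_t = pca_embed(src, tgt, k=50)
```

Both domains are projected with the same components, so distances between
source and target points can be measured in one shared feature space.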
        <p>Table 6 and Table 7 show the results. We can see that our method
generalizes well to the target domain under the feature embedding generated by
the extractor method. Running our methods on the features generated by PCA
performs badly on every task, which shows that PCA makes the target data lose
a lot of important information. The ImageNet method also performs poorly:
because the model was trained on a non-digits dataset, it cannot extract the
features that are important for digit classification.</p>
        <p>Motivated by the real-world bio-chemistry application, we point out the
problem that covariate shift and label shift can exist at the same time. We
propose HS and SS-β, which can handle this situation while other recent UDA
methods suffer from it. We also extend our methods to the image space, which is
high-dimensional. Since our methods are mainly based on similarity, obtaining a
feature space with strong physical meaning remains a big problem. A possible
extension of this work is to regard our methods as a complement to current
domain adaptation methods.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Azizzadenesheli</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anandkumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Regularized learning for domain adaptation under label shifts</article-title>
          .
          <source>In: International Conference on Learning Representations</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakaridis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gool</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          :
          <article-title>Domain adaptive faster R-CNN for object detection in the wild</article-title>
          . CoRR abs/1803.03243 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ganin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ustinova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ajakan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Germain</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laviolette</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lempitsky</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Domain-adversarial training of neural networks</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2096</fpage>
          -
          <lpage>2030</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Glorot</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Domain adaptation for large-scale sentiment classification: A deep learning approach</article-title>
          .
          <source>In: Proceedings of the 28th international conference on machine learning (ICML-11)</source>
          . pp.
          <fpage>513</fpage>
          -
          <lpage>520</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pouget-Abadie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warde-Farley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozair</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Generative adversarial nets</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>2672</fpage>
          -
          <lpage>2680</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gretton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgwardt</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rasch</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A kernel two-sample test</article-title>
          .
          <source>Journal of Machine Learning Research 13(Mar)</source>
          ,
          <fpage>723</fpage>
          -
          <lpage>773</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hoffman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzeng</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>CyCADA: Cycle-consistent adversarial domain adaptation</article-title>
          .
          <source>In: Proceedings of the 35th International Conference on Machine Learning</source>
          . vol.
          <volume>80</volume>
          , pp.
          <fpage>1989</fpage>
          -
          <lpage>1998</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gretton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgwardt</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          :
          <article-title>Correcting sample selection bias by unlabeled data</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>19</volume>
          , pp.
          <fpage>601</fpage>
          -
          <lpage>608</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauptmann</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Contrastive adaptation network for unsupervised domain adaptation</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>June 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Keiser</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Setola</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irwin</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laggner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abbas</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hufeisen</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuijer</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matos</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>T.B.</given-names>
          </string-name>
          , et al.:
          <article-title>Predicting new molecular targets for known drugs</article-title>
          .
          <source>Nature</source>
          <volume>462</volume>
          (
          <issue>7270</issue>
          ),
          <fpage>175</fpage>
          -
          <lpage>181</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Detecting and correcting for label shift with black box predictors</article-title>
          .
          <source>In: Proceedings of the 35th International Conference on Machine Learning</source>
          . vol.
          <volume>80</volume>
          , pp.
          <fpage>3122</fpage>
          -
          <lpage>3130</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Conditional adversarial domain adaptation</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>31</volume>
          , pp.
          <fpage>1640</fpage>
          -
          <lpage>1650</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Deep transfer learning with joint adaptation networks</article-title>
          .
          <source>In: Proceedings of the 34th International Conference on Machine Learning</source>
          . vol.
          <volume>70</volume>
          , pp.
          <fpage>2208</fpage>
          -
          <lpage>2217</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>van der Maaten</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Visualizing high-dimensional data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          ,
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Moradi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaser</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huttunen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tohka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>MRI-based dementia classification using semi-supervised learning and domain adaptation</article-title>
          .
          <source>In: MICCAI 2014 Workshop Proceedings, Challange on Computer-Aided Diagnosis of Dementia, based on Structural MRI Data</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pizzati</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Charette</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaccaria</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cerri</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Domain bridge for unpaired image-to-image translation and unsupervised domain adaptation</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</source>
          (WACV) (
          <year>March 2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Quiñonero-Candela</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwaighofer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          :
          <article-title>Dataset shift in machine learning</article-title>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Saito</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sclaroff</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised domain adaptation via minimax entropy</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV) (
          <year>October 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Sankaranarayanan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balaji</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castillo</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chellappa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Generate to adapt: Aligning domains using generative adversarial networks</article-title>
          .
          <source>In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>June 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzuki</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakajima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kashima</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>von Bünau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawanabe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Direct importance estimation for covariate shift adaptation</article-title>
          .
          <source>Annals of the Institute of Statistical Mathematics</source>
          <volume>60</volume>
          (
          <issue>4</issue>
          ),
          <fpage>699</fpage>
          -
          <lpage>746</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wachinger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reuter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Domain adaptation for alzheimer's disease diagnostics</article-title>
          .
          <source>Neuroimage</source>
          <volume>139</volume>
          ,
          <fpage>470</fpage>
          -
          <lpage>479</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Deep learning with feature embedding for compound-protein interaction prediction</article-title>
          .
          <source>bioRxiv</source>
          (
          <year>2016</year>
          ). https://doi.org/10.1101/086033
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised domain adaptation via fredholm integral based kernel methods</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>85</volume>
          ,
          <fpage>185</fpage>
          -
          <lpage>197</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1016/j.patcog.2018.07.035, http://www.sciencedirect.com/science/article/pii/S0031320318302747
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winston</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaushik</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Domain adaptation with asymmetrically-relaxed distribution alignment</article-title>
          .
          <source>In: Proceedings of the 36th International Conference on Machine Learning</source>
          . vol.
          <volume>97</volume>
          , pp.
          <fpage>6872</fpage>
          -
          <lpage>6881</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szepesvári</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Analysis of kernel mean matching under covariate shift</article-title>
          .
          <source>In: Proceedings of the 29th International Coference on International Conference on Machine Learning</source>
          . pp.
          <fpage>1147</fpage>
          -
          <lpage>1154</lpage>
          . ICML'
          <volume>12</volume>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muandet</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Domain adaptation under target and conditional shift</article-title>
          .
          <source>In: International Conference on Machine Learning</source>
          . pp.
          <fpage>819</fpage>
          -
          <lpage>827</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vijaya Kumar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Unsupervised domain adaptation for semantic segmentation via class-balanced self-training</article-title>
          .
          <source>In: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          . pp.
          <fpage>289</fpage>
          -
          <lpage>305</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>