<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analysis in Deep Transfer Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Osayande P. Omondiagbe</string-name>
          <email>omondiagbep@landcareresearch.co.nz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sherlock A. Licorish</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen G. MacDonell</string-name>
          <email>stephen.macdonell@aut.ac.nz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Landcare Research</institution>
          ,
          <addr-line>Lincoln, New Zealand</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information science, University of Otago</institution>
          ,
          <addr-line>Dunedin, New Zealand</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Software Engineering Research Lab, Auckland University of Technology</institution>
          ,
          <addr-line>Auckland</addr-line>
          ,
          <country country="NZ">New Zealand</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>2</volume>
      <fpage>17</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Data sparsity is a challenge facing most modern recommendation systems. With cross-domain recommendation techniques, one can overcome data sparsity by leveraging knowledge from relevant domains. This approach can be further enhanced by considering latent sentiment information. However, as this latent sentiment information is derived from both relevant and irrelevant sources, the performance of the recommendation system may decline. This is a negative transfer (NT) problem, where the knowledge that is derived from multiple sources affects the system. Also, these source domains are often imbalanced, which could further hurt the performance of the recommendation system. To this end, recent research has shown that NT is caused by domain divergence, source and target quality, and algorithms that are not carefully designed to utilise the target data to improve the domain transferability. While various research works have been proposed to prevent NT, these address only some of the factors that may lead to NT. In this paper, we propose a more systematic and comprehensive approach to overcoming NT in sentiment analysis by tackling the main causes of NT. Our approach combines the use of cost-weighting learning, an uncertainty-guided (aleatoric and epistemic) loss function over the target dataset, and the concept of importance sampling, to derive a robust model. Experimental results on a sentiment analysis task using Amazon review datasets validate the superiority of our proposed method when compared to three other state-of-the-art methods. To disentangle the contributions behind the success of both uncertainties, we conduct an ablation study exploring the effect of each module in our approach. Our findings reveal that we can improve a sentiment analysis task in a transfer learning setting from 4% to 10% when combining both uncertainties. Our outcomes show the importance of considering all factors that may lead to NT. These findings can help to build an effective recommendation system when including the latent sentiment information.</p>
      </abstract>
      <kwd-group>
        <kwd>Transfer learning</kwd>
        <kwd>neural networks</kwd>
        <kwd>BERT</kwd>
        <kwd>uncertainty</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>
          Generally, recommendation systems are used in commercial applications to help users discover the products or services they are looking for. In order to solve the lack of data and the cold-start<sup>1</sup> problem, researchers have increasingly introduced the concepts of source domain and target domain into cross-domain recommendation [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-1-2">
        <p>
          Through the use of transfer learning, cross-domain based recommendation is able to leverage the rich information from multiple domains, as against a single domain, and transfer knowledge effectively from one domain to another. For cross-domain recommendation to work, however, users’ interests or item features must be consistent or correlated across domains [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>DL4SR’22: Workshop on Deep Learning for Search and Recommendation, co-located with the 31st ACM International Conference on</p>
        <p>CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073</p>
        <p><sup>1</sup>A problem where the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information.</p>
      </sec>
      <sec id="sec-1-3">
        <p>
          Most existing cross-domain recommendation methods rely only on sharing text information, such as ratings, tags or reviews, and ignore latent sentiment information in the sentiment analysis domain [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Recently, methods that consider this latent sentiment information have been proven to be more effective when compared with existing recommendation algorithms that do not consider this information [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This is because user reviews are usually subjective, so they would not be able to reflect the user’s preferences and sentiments towards different attributes.
        </p>
        <p>
          As these sentiment data are derived from both relevant and irrelevant sources, and the datasets are often imbalanced, the performance of these cross-domain recommendation systems may decline due to learning a bias
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Also, these cross-domain models did not take into
account the bidirectional latent relations between users
and items [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. A better solution to this problem is to
introduce transfer learning (TL) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] into the cross-domain
recommendation system [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. TL systems utilise data and
knowledge from a related domain (known as the source
domain) to mitigate this learning bias, and can improve
the generalizability of models in the target domain [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
      <sec id="sec-1-4">
        <p>
          Regrettably, this approach is not always successful unless specific guidelines are adhered to [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]: 1) both tasks should be related; 2) the source and target domain should be similar; and 3) a model which can learn both domains should be applied to both the source and target datasets. When these guidelines are not followed, the performance of the target model is likely to degrade. This is known as negative transfer (NT) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. NT can be caused by four main issues [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. One: Domain divergence - when the divergence between the source and target domains is wide, NT will occur. Two: Transfer algorithm - when designing a transfer algorithm, it should have a theoretical guarantee that the performance in the target domain will be better when auxiliary data are used, or the transfer algorithm should be carefully designed to improve the transferability of auxiliary domains, else NT may occur. Three: Source data quality - the quality of the source data determines the quality of the transferred knowledge; if the source data are very noisy, then a model trained on them is unreliable. Four: Target data quality - the target domain data may be noisy, which may also lead to NT. Also, the amount of labelled target data has a direct impact on the learning process if not fully utilised by the learning algorithm [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ].
        </p>
        <p>
          Various research works have proposed the mitigation of NT, and these are seen in the following areas [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. One: by enhancing the data transferability [
          <xref ref-type="bibr" rid="ref11 ref7">11, 7</xref>
          ]. This is done by either addressing the domain divergence between the source and target [
          <xref ref-type="bibr" rid="ref11 ref12">12, 11</xref>
          ], or by a reweighing strategy that applies more weight to those source domains which are similar to the target dataset [
          <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
          ], or by learning a common latent feature space [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Two: by enhancing the model transferability through transferable normalisation [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], or by making the model robust to adversarial samples through the use of a robust optimisation loss function [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Three: by enhancing the target prediction through the use of pseudo-labelling [
          <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
          ].
        </p>
        <p>
          Previous research found that the use of a model that is robust to adversarial samples results in better transferability [
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ]. Such models tend to have better accuracy than a standard target model. Similarly, Liang et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] found a positive correlation between a model that is robust to an adversarial sample and the knowledge transferred. This work suggests such a model can benefit from the knowledge transfer between the source and target. By relying on such methods, these approaches can be limited to being robust to adversarial samples and fail to model uncertainty under data and label distribution, which could introduce further bias [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Recently, the work of Grauer and Pei [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] has shown that when model uncertainty is known and distributed evenly, the performance and reliability of the model are greatly improved.
        </p>
        <p>
          In this work, we introduce the use of an uncertainty-guided loss function to guide the training process when utilising the source and target datasets, and incorporate a cost weight to tackle the problem of imbalanced data that may further increase the domain divergence issue. Hence, this work uses the idea of model and data transferability enhancement to develop a more robust model aimed at preventing negative transfer. By using such a systematic approach, we are able to tackle the four main causes of NT mentioned above. Our main contributions are summarised as follows.
        </p>
        <p>
          • We propose using a combined uncertainty as a loss function. This combined uncertainty consists of both the aleatoric and epistemic uncertainties. The epistemic uncertainty captures the model certainty, while the aleatoric uncertainty captures the uncertainty concerning information that the data cannot explain, and is modelled over the target and source dataset to guide the learning process. By using the aleatoric uncertainty-guided loss function over the target and source data, we can derive more information and enhance the model’s transferability.
          • We propose combining an uncertainty-guided loss function, a cost-sensitive classification method that incorporates cost-weighting into the model, and an importance sampling strategy to enhance the data and model transferability. This method can be used when there is imbalanced data and/or dissimilarity between the source and target dataset.
          • Finally, we perform an ablation study to disentangle the contributions behind the success of each module introduced in our system.
        </p>
        <p>
          The remainder of this paper is organised as follows. We present related work in Section 2. Next, we introduce our proposed approach in Section 3. Section 4 presents our datasets, candidate models, and experimental setup. The results and discussion are presented in Sections 5 and 6, respectively, before considering threats in Section 7. Finally, we conclude the study in Section 8.
        </p>
      </sec>
    </sec>
    <sec id="sec-related">
      <title>2. Related Work</title>
      <p>
        Transfer Learning is a research strategy in machine learning (ML) that aims to use the knowledge gained while solving one problem and apply it to a different but related problem [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Early methods in this area have exploited techniques such as instance weighting [
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
        ], feature mapping [
        <xref ref-type="bibr" rid="ref26">26, 27</xref>
        ] and transferring relational knowledge [28]. Due to the increased processing power afforded by graphical processing units (GPUs), deep learning is now used more frequently in transfer learning tasks, and when compared to earlier approaches, such models have achieved better results in the discovery of domain-invariant features [29]. It was shown that when deep learning is used, the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features [29]. Some of these deep learning methods [30, 31, 32] have exploited the use of mismatch measurement, such as Maximum Mean Discrepancy (MMD), to transfer features, or have used generative adversarial networks (GANs) [33]. Although these methods have all achieved high performance in different domains, such as computer vision [34] and natural language processing [35], they were not designed to tackle the problem of negative transfer (NT).
      </p>
      <p>
        Other prominent lines of work in deep learning tackle the issue of NT. These works include the use of instance weighting (e.g., predictive distribution matching (PDM) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]), enhancing the feature transferability through the use of a latent feature (e.g., DTL [36]), and the use of a soft loss function based on soft pseudo-labels (e.g., Mutual Mean-Teaching (MMT) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]). These methods do not guarantee tackling NT, as they tackle some causes of NT, but not all (e.g., the PDM method tackles the transfer algorithm and source data quality issues, while MMT tackles the domain divergence, transfer algorithm and target data quality issues). Although a previous study exploring the benefits of modelling epistemic and aleatoric uncertainty in Bayesian deep learning models for vision tasks has demonstrated that when these uncertainties are integrated into the loss functions, the model is more robust to noisy data, how these can be used to tackle NT has not been looked at. Hence, our main objective in this paper is to derive a robust loss function for deep transfer learning that tackles the causes of NT mentioned in Section 1.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Method</title>
      <sec id="sec-2-1">
        <p>This section provides a formal definition of NT and the proposed methods to overcome it.</p>
        <sec id="sec-2-1-1">
          <title>3.1. Negative Transfer</title>
          <p>
            Notation: We use the notation P<sub>S</sub>(x<sub>S</sub>) ≠ P<sub>T</sub>(x<sub>T</sub>) and P<sub>S</sub>(y|x<sub>S</sub>) ≠ P<sub>T</sub>(y|x<sub>T</sub>) to denote the marginal and conditional distributions of the source and target sets, respectively. In this case, x<sub>S</sub> and x<sub>T</sub> represent the source and target, respectively. Zhang et al. [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] gave a mathematical definition of NT, and proposed a way to determine the degree of NT (NTD) when it happens.
          </p>
          <p>Definition: Let ε be the test error in the target domain, A(S, T) a TL algorithm between source (S) and target (T), and A(∅, T) the same algorithm which does not use the source domain information at all. Then, NT happens when ε(A(S, T)) &gt; ε(A(∅, T)), and the degree of NT can be evaluated by equation 1 below:
NTD = ε(A(S, T)) − ε(A(∅, T))
(1)</p>
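          <p>As a minimal illustration of equation 1 (the error values below are hypothetical), the NTD computation reduces to a difference of two test errors:</p>

```python
# Degree of negative transfer (NTD), equation 1.
# A positive NTD means the source domain hurt the target model.
def negative_transfer_degree(error_with_source, error_target_only):
    """NTD = eps(A(S, T)) - eps(A(empty, T))."""
    return error_with_source - error_target_only

# Hypothetical test errors, for illustration only.
ntd = negative_transfer_degree(0.22, 0.18)
print(ntd > 0)  # True: negative transfer occurred in this example
```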
        </sec>
      </sec>
      <sec id="sec-2-2">
        <p>When NTD is positive, negative transfer has occurred. Next, we propose a systematic way to avoid negative transfer.</p>
        <sec id="sec-2-2-1">
          <title>3.2. Proposed Methods</title>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <p>We explain the three concepts used in our method below.</p>
        <p>Cost-sensitive Classification: The idea of cost-sensitive classification is used when there is a higher cost of mislabelling one class than the other class [37]. Cost-sensitive learning tackles the class imbalance problem by changing the model cost function, giving more weight to the minority class and multiplying the loss of each training sample by a specific factor. The imbalanced data distribution is not modified directly during training [37]. Madabushi et al. [38] introduced a cost-weighting strategy in the Bert model, which increases the weight of incorrectly labelled sentences by altering the cost function of the final model layer. The cost function is changed by modifying the cross-entropy loss for a single prediction x and the model’s prediction for class k to accommodate an array of weights, as shown in equation 2:
loss(x, k) = weight[k] · φ, where φ = −x[k] + log(∑<sub>j</sub> exp(x[j]))
(2)</p>
        <p>Importance Sampling: The traditional way of training a deep learning model has one major drawback: it is not able to differentiate samples where it performs very well (i.e., low loss) from those samples where the performance is poor (i.e., high loss) [39]. Also, as not all source samples can provide useful knowledge [39], we introduce the idea of importance sampling to control which examples should be given more priority. Importance sampling [40] is a variance reduction technique and is done by taking a random sample of a set based on a probability distribution among the elements of the group. In our proposed method, we attach weights to the source training examples based on their similarity to the target dataset. The samples with more weight will have a higher chance of being selected. We sample the source from a probability density over the target data.</p>
        <p>Uncertainty Quantification: There are different types of uncertainties, and these could be present in the data or the model. When the uncertainty is derived from the model, it is referred to as “epistemic or model uncertainty” [41]. Epistemic uncertainty captures the ignorance about the model generated from the collected data and can be explained away when more data is given to the model [41]. It is a property of the model. When the uncertainty is related to the data, it is referred to as aleatoric uncertainty [41]. It captures the uncertainty concerning information that the data cannot explain. This can be further divided into two:
• Heteroscedastic uncertainty, which depends on the data input and is predicted as a model output [41].
• Homoscedastic uncertainty, which is not input-data dependent but assumes a constant for all input data and varies between the different tasks [42].</p>
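        <p>As a rough sketch (the logits and class weights below are illustrative, not the actual BERT layer), equation 2 can be written as:</p>

```python
import math

def weighted_cross_entropy(logits, target_class, class_weights):
    # phi = -x[class] + log(sum_j exp(x[j]))   (equation 2)
    log_sum_exp = math.log(sum(math.exp(v) for v in logits))
    phi = -logits[target_class] + log_sum_exp
    # Scale the loss by the weight of the true class, so mislabelling
    # the minority class costs more than mislabelling the majority class.
    return class_weights[target_class] * phi

# Hypothetical two-class example where class 1 is the minority class.
loss = weighted_cross_entropy([2.0, 0.5], target_class=1, class_weights=[1.0, 3.0])
```

<p>With all class weights set to 1.0 this reduces to the standard cross-entropy loss.</p>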
        <p>In this case, we are not interested in the homoscedastic uncertainty because we are assuming related tasks between the source and target. To learn the heteroscedastic uncertainty, the loss function can be replaced with the following [41]:
L = ||y − ŷ||² / (2σ²) + (1/2) log σ²
(3)
where the model predicts a mean ŷ and variance σ².</p>
        <p>Kendall and Gal [41] proposed a loss function to combine both the epistemic and aleatoric (heteroscedastic) uncertainty as follows:
L = (1/N) ∑<sub>i=1</sub><sup>N</sup> exp(−log σ<sub>i</sub>²) ||y<sub>i</sub> − ŷ<sub>i</sub>||² + (1/2) log σ<sub>i</sub>²
(4)
where N is the total number of outputs and σ² is the variance.</p>
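        <p>A minimal sketch of equations 3 and 4, assuming the network outputs a predicted mean and a log-variance per output (all values here are hypothetical):</p>

```python
import math

def heteroscedastic_loss(y, y_hat, sigma_sq):
    # Equation 3: ||y - y_hat||^2 / (2 sigma^2) + (1/2) log sigma^2
    return (y - y_hat) ** 2 / (2.0 * sigma_sq) + 0.5 * math.log(sigma_sq)

def combined_uncertainty_loss(ys, y_hats, log_vars):
    # Equation 4: (1/N) sum_i exp(-log sigma_i^2) ||y_i - y_hat_i||^2
    #                      + (1/2) log sigma_i^2
    # Predicting log sigma^2 rather than sigma^2 keeps the variance positive.
    total = 0.0
    for y, y_hat, s in zip(ys, y_hats, log_vars):
        total += math.exp(-s) * (y - y_hat) ** 2 + 0.5 * s
    return total / len(ys)

loss = combined_uncertainty_loss([1.0, 0.0], [0.8, 0.1], [0.0, 0.0])
```

<p>Large residuals are attenuated where the predicted variance is high, while the (1/2) log σ² term penalises the model for claiming high uncertainty everywhere.</p>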
      </sec>
      <sec id="sec-2-4">
        <title>Algorithm 1: Combined Uncertainty Loss Function and Cost-Weighting (CUCW)</title>
        <p>Input:
• Source model g(x)
• Source training set Str
• Target training set Ttr
• Target validation set Tv
• Target testing set Tts
Output: Degree of negative transfer (NTD) and model performance</p>
        <p>1. Estimate the similarity of each source sample against 1000 random target samples.
2. Estimate importance weights with importance sampling, based on the similarity.
3. Train a source model g using the importance weights, with a small target sample as the validation data Tv.
4. Compute the loss function using Equations 2 and 3, or Equations 2 and 4.
5. Compute the test error ε(g(S, T)) of model g using the target test set Tts.
6. Train a target model h with the target data Ttr only.
7. Compute the test error ε(h(∅, T)) of model h using the target test set Tts.
8. Calculate NTD = ε(g(S, T)) − ε(h(∅, T)).
9. Fine-tune model g using the target training set Ttr and the target validation set Tv to derive a new model g′.
10. Compute the test error of model g′ using the target test set Tts.
11. Return the degree of negative transfer (NTD) and the model performance.</p>
      </sec>
      <sec id="sec-2-5">
        <p>Based on the algorithm above, we can employ deep transfer learning using the proposed approach to find an optimal model with the least degree of negative transfer. This can be done by following the steps in sequential order. For each step, we can find the best model by training different hyperparameters in our model.</p>
        <p>Our Proposed Approach: To derive our proposed loss function, which can enhance the data and model transferability, we combine equations 2 and 3 when incorporating heteroscedastic uncertainty, and equations 2 and 4 when incorporating both epistemic and heteroscedastic uncertainty. To determine the similarity of the samples, we use the method proposed by Kilgarriff [43]. Then, the Wilcoxon signed-rank test [44] is used to compare the frequency counts from both datasets to determine if both datasets have a statistically similar distribution. To overcome the divergence problem, we use the importance sampling technique in our training process. The pseudocode for our proposed method is given in Algorithm 1.</p>
      </sec>
      <sec id="sec-2-5-exp">
        <title>4. Experiments</title>
        <p>All experiments were conducted 10 times, as in the work of Bennin et al. [45], to reduce the impact of bias, and the results were averaged across all independent runs. For our sentiment analysis task, we use the Amazon review dataset. We aim to build an accurate sentiment analysis model for low-resource domains by learning from high-resource but related domains. We used the smaller version of the datasets prepared by Lakkaraju et al. [46]. These datasets contain 22 domains, as shown in Table 1. It is worth noting that some domains in this dataset are imbalanced, as seen in Fig. 1. We ranked reviews with 1 to 3 stars as negative, while reviews with 4 or 5 stars were ranked as positive. For the pre-processing steps, we use standard techniques commonly used in NLP and Amazon sentiment analysis tasks [47, 48] in the following order: tokenisation, stop word/punctuation removal, and lemmatisation. Tokenisation is the process of separating a sentence into a sequence of words known as “tokens” [49]. These tokens are identifiable or separated from each other by a space character. Punctuation and stop words that frequently appear and do not significantly affect meaning (e.g., “the”, “is” and “and”) were also removed [49]. Our lemmatisation process uses the context from which the word is derived (e.g., studies becomes study). By lemmatising a word, we reduce its derivationally related forms to a common root form. By using the root form of a word, the model will be able to learn any inflectional form of that given word.</p>
        <sec id="sec-2-5-setup">
          <title>4.1. Experiment Setup</title>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <p>We selected only domains from the Amazon review datasets where class imbalance was evident. To determine the domains to select, the negative to positive ratio is presented in Table 1, where only domains with a ratio of less than 0.7 were selected for this experiment. From Table 1, six domains were selected, as shown in Figure 2 below.</p>
        <p>Figure 2: Amazon review dataset showing imbalanced domains</p>
        <p>We designed two groups of experiments by selecting domains where class imbalance is present, as shown in Figure 2. In the experiment, we excluded the ”Grocery” domain, as this domain is not related to the other six domains shown in Figure 2. The first group of domains consists of datasets from Beauty, Outdoor_living and Jewelry_&amp;_Watches, while the second domain group consists of datasets from Office_products, Cell_phones_&amp;_Service and Software. For each experiment, a single domain was used as the target dataset, while the remaining domains in that group were used as the source datasets.</p>
        <p>Text Similarity Measure: We use the Wilcoxon signed-rank test [44] to compare the frequency counts from both datasets to determine if both datasets have a statistically similar distribution. This was done by extracting all words, while retaining repeats, from each sample of our source training set and ignoring the stop words. From the target set, we sampled (with replacement) 1000 samples, as done by Madabushi et al. [38]. Then, we use the word frequency from each of the source training samples and the sampled target set to calculate the p-value using the Wilcoxon signed-rank test.</p>
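        <p>A rough sketch of this similarity measure: build paired word-frequency vectors over a shared vocabulary, then apply the Wilcoxon signed-rank test. The normal approximation below is a simplification (in practice a library routine such as scipy.stats.wilcoxon would be used, and the stop-word handling and sampling are omitted):</p>

```python
import math
from collections import Counter

def frequency_vectors(source_words, target_words):
    # Paired counts over the union vocabulary (stop words removed upstream).
    vocab = sorted(set(source_words) | set(target_words))
    cs, ct = Counter(source_words), Counter(target_words)
    return [cs[w] for w in vocab], [ct[w] for w in vocab]

def wilcoxon_signed_rank_p(xs, ys):
    # Drop zero differences, as in the standard signed-rank test.
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    # Normal approximation to the null distribution of W+.
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return math.erfc(abs(w_plus - mean) / sd / math.sqrt(2.0))  # two-sided p
```

<p>A high p-value indicates that the two frequency distributions are statistically similar, so the source sample is a good transfer candidate.</p>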
        <p>Model: We used the BERT uncased model for this task.</p>
        <p>It consists of a 768-dimension vector, 12 layers of the transformer block and 110 million parameters. We added a fully connected layer on top of the BERT self-attention layers to classify the review. For the parameters, we adopt similar hyperparameters to those used in the BERT uncased model for Amazon sentiment analysis [50]. These parameters include using the Adam optimiser with various learning rates and a 512 max sequence length with five epochs. The learning rate was 1e-05. The model was first built using the source dataset to derive a source model. Then, this source model was fine-tuned with the target datasets. The fine-tuning with the target datasets was done by using a commonly used split ratio (30:70) [51].</p>
        <p>The training sets of the target data were used to fine-tune the source model before being tested on the test sets. We ran 10 experiments to compute the estimated risk of the different methods, and the average was reported.</p>
        <p>Evaluation measures: All experiments were conducted
10 times as done in the work of Bennin et al. [45] to
reduce the impact of sampling bias, and the results were
averaged across the independent runs. To evaluate the
prediction accuracy of each modelling approach, the
following were computed:
• Balanced accuracy (BAUC): BAUC measures
model performance, taking into account class
imbalances and it also overcomes bias in binary
cases [52]. The balanced accuracy is computed
as the average of the proportion of correct
predictions for each class separately.
• F-measure: This is used for evaluating binary
classification models based on the predictions
made for the positive class [52].</p>
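        <p>For binary labels, the two measures can be sketched as follows (a minimal illustration, not the exact evaluation code used in the study):</p>

```python
def balanced_accuracy(y_true, y_pred):
    # Average of per-class recall, so the majority class cannot dominate.
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

def f_measure(y_true, y_pred, positive=1):
    # Harmonic mean of precision and recall for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```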
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Results</title>
      <sec id="sec-3-1">
        <p>
          Here, we compare our systematic approach against three different strategies proposed for tackling NT. These
strategies were:
• Predictive distribution matching (PDM) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>This is an instance-based weighting approach.</p>
        <p>
          This method works by first measuring the
difering predictive distributions of the target domain
and the related source domains. In this case, a
PDM regularised classifier is used to infer the
target pseudolabeled data, which will help to
identify the relevant source data, so as to correctly
align their predictive distributions [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We used
the support vector machines (SVM) variant of the
proposed PDM as used in the sentiment analysis
task in the work of Seah et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
• Mutual Mean-Teaching (MMT)[
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]: This is a
feature transferability approach which uses a soft
loss function based on soft pseudo-labels and is
carried out in two stages. In the first stage, the
Bert uncased model was trained using the source
domain to derive a source model. This source
model is trained to model a feature
transformation function that transforms each input sample
into a feature representation. For this experiment.
the source model is trained with a classification
loss and a triplet loss to separate features
belonging to diferent identity, as used in the original
paper [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Next, the source model trained in
stage 1 is optimised using the MMT framework,
which is based on the clustering method. The
details of this approach are explained in the original
paper [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
• Dual Transfer Learning (DTL) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]: This approach enhances feature transferability through the use of a latent feature. The method simultaneously learns the marginal and conditional distributions and exploits their duality. For this experiment, the training was done using the BERT uncased model by combining the source and target training data before testing on the target dataset.
        </p>
        <p>In Tables 2 to 3, we report the fine-tuned models’ performance (balanced accuracy and F-measure) on the target test set. In cases where NT has occurred (i.e., the degree of NT was calculated using Equation 1), we colour the accuracy value red. From Table 2, the results indicate that our proposed approach with fine-tuning, the other components, and both uncertainties (heteroscedastic aleatoric and epistemic uncertainty) in the loss function outperformed the other three models. To disentangle the contribution of each component in our proposed approach, we report the results obtained by removing each component in turn. When epistemic uncertainty or cost weighting was excluded from the loss function, we noticed three cases (i.e., when outdoor living, cell phones &amp; service, and office product were used as the target datasets) where the MMT method outperformed our approach. A similar outcome was noted in the F-measure, as shown in Table 3. To further disentangle the contribution of all components in our proposed approach without fine-tuning the BERT model, and to provide a fair comparison with the three methods we compared against, we combined the source and target training data to train our BERT model before testing on the target test data. This was done to remove the benefit of the fine-tuning component in our design. The results in Tables 4 to 5 show that, without the fine-tuning component, we were still able to improve performance when all other components are integrated in our deep transfer learning approach, but with less improvement (i.e., an improvement in BAUC and F-measure of 2% to 9%, as shown in Table 4 and Table 5).</p>
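        <p>To make the stage-1 objective of MMT concrete, the following is a minimal pure-Python sketch (an illustration only, not the implementation from [19]; the margin and the trade-off weight lam are assumed values) of a classification loss combined with a triplet loss that separates features belonging to different identities:</p>
        <preformat>
```python
import math

def cross_entropy(logits, label):
    # Softmax cross-entropy for a single example (numerically stable).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge loss on feature distances: the anchor should be closer
    # to the positive (same identity) than to the negative
    # (different identity) by at least `margin` (assumed value).
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def stage1_loss(logits, label, anchor, positive, negative, lam=1.0):
    # Combined stage-1 objective: classification loss plus a
    # triplet term; lam is an assumed trade-off weight.
    return cross_entropy(logits, label) + lam * triplet_loss(anchor, positive, negative)
```
        </preformat>
        <p>In MMT itself the feature representations come from the source model and the losses are minimised jointly by gradient descent; the sketch only shows how the two terms combine.</p>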
      </sec>
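        <p>For readers unfamiliar with how NT is flagged, the sketch below illustrates the general idea behind such a check (a hedged sketch only; Equation 1 in the paper may differ in detail): transfer is negative when the transferred model performs worse than a model trained on the target data alone.</p>
        <preformat>
```python
def negative_transfer_degree(acc_transfer, acc_target_only):
    # Degree of negative transfer: how much worse the transferred
    # model performs than a target-only baseline. Positive values
    # indicate NT occurred. (Illustrative form, not necessarily
    # the paper's Equation 1.)
    return acc_target_only - acc_transfer

def nt_occurred(acc_transfer, acc_target_only):
    return negative_transfer_degree(acc_transfer, acc_target_only) > 0
```
        </preformat>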
    </sec>
    <sec id="sec-5">
      <title>6. Discussion</title>
      <p>In our sentiment analysis experiment (see Tables 2 to 5), our proposed method, which incorporated both uncertainties, was able to improve the balanced accuracy of the BERT model by 5% to 14% and the F-measure by 5% to 10% compared to techniques that are instance based [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] or feature transferability based [
          <xref ref-type="bibr" rid="ref15 ref19">19, 15</xref>
          ]. Although instance-level transferability enhancement has been used in deep learning models to prevent NT [
          <xref ref-type="bibr" rid="ref11">11, 53</xref>
          ], these methods do not handle the target data quality. This factor is shown to be one of the causes of NT [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The PDM method that we compared against in this paper tackles the domain divergence issue by using predictive distribution matching to remove the irrelevant source. This method still failed to address the target data quality; hence, we noted a single case of NT in our NLP task result (when the outdoor living domain was used as the target dataset). Although the MMT method uses a softmax loss function based on soft pseudo-labels to tackle the target data quality, it cannot tackle the domain divergence issue, which may also lead to NT. A single case of NT (when the outdoor living domain was used as the target dataset) was also noted when using this method. On the other hand, our proposed method is more robust. It uses the uncertainty-guided function to tackle the target and source data quality issue, and importance sampling and cost weighting learning to tackle the domain divergence problem. For the fine-tuning process, we use a small target sample as the validation data in the source model to improve the transferability of the final model. Our results show that the final model is improved when we introduce an uncertainty-guided loss function to guide the training process when utilising the source and target datasets, and incorporate a cost weight to tackle the problem of imbalanced data. In the work of Grauer and Pei [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], it was also noted that when model uncertainty is known and distributed evenly, the performance and reliability of the model are greatly improved. Hence, this work uses the idea of model and data transferability enhancement to develop a more robust approach aimed at preventing negative transfer. The evidence from our results suggests that a systematic approach such as the one proposed in this paper can improve the quality of models in a deep transfer learning setting. Also, it is worth noting that two of the methods we compared against (MMT and DTL) in this study also use the BERT uncased model; hence, we are able to eliminate the interference of model complexity in the comparison results. From the ablation study, model fine-tuning improved the overall performance by 2% to 6% when integrating all components into our approach.</p>
    </sec>
    <sec id="sec-4">
      <title>7. Addressing threats to validity</title>
      <p>The experimental dataset was compiled by [46]. We acknowledge threats relating to errors in the review labels. These threats have been well minimised by experimenting with different projects in the datasets. Also, we concede that there are a few uncontrolled factors that may have impacted the experimental results in this study. For instance, there could have been unexpected faults in the implementation of the approaches we compare against in this paper [54]. We sought to reduce such threats by using the source code provided for these methods (e.g., PDM, MMT and DTL). While we recognize the threats above, we anticipate that our study still contributes novel findings to transfer-based modelling for recommendation systems in NLP domains relying on latent sentiment information.</p>
    </sec>
    <sec id="sec-7">
      <title>8. Conclusion</title>
      <p>In this work, we proposed a systematic approach to overcoming negative transfer by tackling domain divergence and taking account of the source and target data quality. Our approach involves using cost weighting learning, an uncertainty-guided loss function over the target dataset, and the concept of importance sampling to derive a robust model. This systematic approach improves the target domain’s performance. The results also reveal that when both aleatoric heteroscedastic and epistemic uncertainty are combined, we can further enhance the performance of the target model. We therefore assert that our systematic approach is a good approach for overcoming negative transfer and improving target model performance when performing sentiment analysis in a transfer learning setting. This approach can be used to build an effective recommendation system that includes latent sentiment information. A plausible next step is to use such an approach to design an effective recommendation system that takes into account the latent sentiment information. Although our experiments showed that our approach improves target model performance and prevents NT in sentiment analysis, it is still important to investigate this approach for other domains.</p>
    </sec>
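    <p>The uncertainty-guided, cost-weighted loss discussed above can be sketched as follows (a minimal pure-Python illustration in the spirit of Kendall and Gal’s loss attenuation; the exact weighting scheme and constants are assumptions, not the authors’ formulation):</p>
    <preformat>
```python
import math

def uncertainty_weighted_loss(ce_losses, log_vars, sample_class_weights):
    # Loss-attenuation sketch: each sample's cross-entropy is scaled
    # by exp(-log_var), so samples the model predicts as noisy (high
    # aleatoric uncertainty) contribute less, while the 0.5 * log_var
    # penalty stops the model from declaring every sample uncertain.
    # sample_class_weights applies cost weighting for imbalanced
    # classes. Illustrative only; the paper's loss may differ.
    total = 0.0
    for ce, log_var, w in zip(ce_losses, log_vars, sample_class_weights):
        total += w * (math.exp(-log_var) * ce + 0.5 * log_var)
    return total / len(ce_losses)
```
    </preformat>
    <p>With zero predicted uncertainty and unit class weights this reduces to the ordinary mean cross-entropy; raising a sample’s log-variance shrinks its contribution to the total loss.</p>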
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research was partly supported by an Internal Research fund from Manaaki Whenua — Landcare Research, New Zealand. Special thanks are given to the Department of Informatics at Landcare Research for their ongoing support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tripodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turrin</surname>
          </string-name>
          ,
          <article-title>Cross-domain recommender systems</article-title>
          ,
          <source>in: 2011 IEEE 11th International Conference on Data Mining Workshops</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>496</fpage>
          -
          <lpage>503</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , H. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A survey on cross-domain recommendation: taxonomies, methods, and future directions</article-title>
          ,
          <source>arXiv preprint arXiv:2108.03357</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Cross-domain recommendation based on sentiment analysis and latent feature mapping</article-title>
          ,
          <source>Entropy</source>
          <volume>22</volume>
          (
          <year>2020</year>
          )
          <fpage>473</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zadrozny</surname>
          </string-name>
          ,
          <article-title>Learning and evaluating classifiers under sample selection bias</article-title>
          ,
          <source>in: Proceedings of the twenty-first international conference on Machine learning</source>
          ,
          <year>2004</year>
          , p.
          <fpage>114</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          ,
          <article-title>DDTCDR: Deep dual transfer cross domain recommendation</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Web Search and Data Mining</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A survey on transfer learning</article-title>
          ,
          <source>IEEE Transactions on knowledge and data engineering 22</source>
          (
          <year>2009</year>
          )
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>A survey on negative transfer</article-title>
          ,
          <source>arXiv preprint arXiv:2009.00909</source>
          (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2009.00909
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaelbling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          ,
          <article-title>To transfer or not to transfer</article-title>
          ,
          <source>in: NIPS 2005 Workshop on Transfer Learning</source>
          ,
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Póczos</surname>
          </string-name>
          , J. Carbonell,
          <article-title>Characterizing and avoiding negative transfer</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>11293</fpage>
          -
          <lpage>11302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O. P.</given-names>
            <surname>Omondiagbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Licorish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>MacDonell</surname>
          </string-name>
          ,
          <article-title>Improving transfer learning for cross project defect prediction</article-title>
          ,
          <source>TechRxiv preprint techrxiv.19517029</source>
          (
          <year>2022</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          , J. Carbonell,
          <article-title>Towards more reliable transfer learning</article-title>
          ,
          <source>in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>794</fpage>
          -
          <lpage>810</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Eaton</surname>
          </string-name>
          , et al.,
          <article-title>Selective transfer between learning tasks using task-based boosting</article-title>
          ,
          <source>in: Twenty-Fifth AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.-W.</given-names>
            <surname>Seah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-S.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. W.</given-names>
            <surname>Tsang</surname>
          </string-name>
          ,
          <article-title>Combating negative transfer from predictive distribution differences</article-title>
          ,
          <source>IEEE transactions on cybernetics 43</source>
          (
          <year>2012</year>
          )
          <fpage>1153</fpage>
          -
          <lpage>1165</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Pool-based sequential active learning for regression</article-title>
          ,
          <source>IEEE transactions on neural networks and learning systems 30</source>
          (
          <year>2018</year>
          )
          <fpage>1348</fpage>
          -
          <lpage>1359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ding</surname>
          </string-name>
          , W. Cheng,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Wang,
          <article-title>Dual transfer learning</article-title>
          ,
          <source>in: Proceedings of the 2012 SIAM International Conference on Data Mining, SIAM</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>540</fpage>
          -
          <lpage>551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <article-title>Transferable normalization: Towards improving transferability of deep neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Madry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Makelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsipras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vladu</surname>
          </string-name>
          ,
          <article-title>Towards deep learning models resistant to adversarial attacks</article-title>
          ,
          <source>arXiv preprint arXiv:1706.06083</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Negative transfer detection in transductive transfer learning</article-title>
          ,
          <source>International Journal of Machine Learning and Cybernetics</source>
          <volume>9</volume>
          (
          <year>2018</year>
          )
          <fpage>185</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification</article-title>
          ,
          <source>arXiv preprint arXiv:2001.01526</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. O.</given-names>
            <surname>Koyejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Does adversarial transferability indicate knowledge transferability?</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vodrahalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Adversarial training helps transfer learning via better representations</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Grauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <article-title>Minimum-variance control allocation considering parametric model uncertainty</article-title>
          ,
          <source>in: AIAA SCITECH 2022 Forum</source>
          ,
          <year>2022</year>
          , p.
          <fpage>0749</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Caruana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baxter</surname>
          </string-name>
          , T. Mitchell, L. Pratt,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          ,
          <article-title>Learning to learn: knowledge consolidation and transfer in inductive systems</article-title>
          , in: Workshop held at NIPS-
          <volume>95</volume>
          , Vail, CO, see http://www.cs.cmu.edu/afs/user/caruana/pub/transfer.html,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sugiyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krauledat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-R.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Covariate shift adaptation by importance weighted cross validation</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>8</volume>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gretton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Borgwardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <article-title>Correcting sample selection bias by unlabeled data</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>19</volume>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jebara</surname>
          </string-name>
          ,
          <article-title>Multi-task feature and kernel selection for SVMs</article-title>
          ,
          <source>in: Proceedings of the twenty-first international conference on Machine learning</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>