<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Italian Symposium on Advanced Database Systems, June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Distributed Heterogeneous Transfer Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paolo Mignone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianvito Pio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michelangelo Ceci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Big Data Lab, National Interuniversity Consortium for Informatics (CINI)</institution>
          ,
          <addr-line>Via Ariosto, 25, 00185, Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science</institution>
          ,
          <addr-line>Via Orabona, 4, 70125</addr-line>
          ,
          <institution>University of Bari Aldo Moro</institution>
          ,
          <addr-line>Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>Transfer learning has proved to be effective for building predictive models for a target domain by exploiting the knowledge coming from a related source domain. However, most existing transfer learning methods assume that source and target domains have common feature spaces. Heterogeneous transfer learning methods aim to overcome this limitation, but they often make strong assumptions, e.g., on the number of features, or cannot distribute the workload when working in a big data environment. In this manuscript, we present a novel transfer learning method which: i) can work with heterogeneous feature spaces without imposing strong assumptions; ii) is fully implemented in Apache Spark following the MapReduce paradigm, enabling the distribution of the workload over multiple computational nodes; iii) is able to work also in the very challenging Positive-Unlabeled (PU) learning setting. We conducted our experiments in two relevant application domains for transfer learning: the prediction of the energy consumption in power grids and the reconstruction of gene regulatory networks. The results show that the proposed approach fruitfully exploits the knowledge coming from the source domain and outperforms three state-of-the-art heterogeneous transfer learning methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Heterogeneous transfer learning</kwd>
        <kwd>Positive-Unlabeled setting</kwd>
        <kwd>Distributed computation</kwd>
        <kwd>Apache Spark</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Machine learning algorithms aim to train a model function that describes and generalizes a set
of observed examples, called training data. The learned model function can be applied to unseen
data with the same feature space and data distribution as the training data. However, in several
real scenarios, it is difficult or expensive to obtain training data described through the same
feature space and following the same data distribution as the examples to which the prediction
functions will be applied. A possible solution to the challenges raised by these scenarios comes
from the design and application of transfer learning methods [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which aim to learn a predictive
function for a target domain by also exploiting an external but related source domain.
      </p>
      <p>
        A relevant example of these scenarios can be observed in the energy field, where predicting
the customer energy consumption is a typical task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. When new customers are connected to
the energy network in a new area/district, they may not be well represented by the available
training data, which prevents accurate predictions for them. In this case, transfer learning would enable
the exploitation of data related to other customers, possibly residing in different areas, leveraging
common characteristics in terms of type of customers or in terms of behavior.
      </p>
      <p>
        In several scenarios, data for the target and the source domains come from multiple data
sources, exhibiting different data representations that make the direct adoption of classical
transfer learning approaches unfeasible. A possible solution may consist in performing
time-consuming manual feature engineering steps, aiming to find commonalities among the data
sources and to somehow make the feature spaces homogeneous. However, such manual
operations can be subjective, error-prone, and even unfeasible when no detailed information is
available about the features at hand, or when the features in the target and source domains
are totally different, i.e., heterogeneous. To overcome this issue, several approaches have been
proposed in the literature to automatically identify the best match between features of different
domains, or to identify a novel shared feature space. Such methods cover different real-world
applications, such as biomedical analysis [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], object recognition from images [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], or multilingual
sentiment classification [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, to the best of our knowledge, existing methods exhibit
one or more of the following limitations: i) they make strict assumptions on the number of
features, i.e., even if they are able to work with heterogeneous feature spaces, they are required
to have the same dimensionality [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]; ii) they are not able to distribute the workload over
multiple computational nodes; iii) in the case of classification tasks, they require a fully labeled
training set, or at least a proper representation of each class, i.e., they are not able to work in
the Positive-Unlabeled (PU) setting, where the training set consists of only positive labeled
instances and unlabeled instances [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a setting that is very common, for example, in the biological
domain [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]; iv) they are only able to work by exploiting the background knowledge of specific
application domains [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ].
      </p>
      <p>To overcome these limitations, in this paper we propose a novel heterogeneous transfer
learning method, called STEAL, that simultaneously exhibits the following characteristics:
• It is able to work also in the very challenging Positive-Unlabeled (PU) learning setting;
• It is implemented using the Apache Spark framework following the MapReduce paradigm,
enabling the distribution of the workload over multiple computational nodes;
• It exploits the Kullback-Leibler (KL) divergence to align the descriptive variables of the
source and target domains and can, therefore, work with heterogeneous domains described
through feature spaces having different dimensionalities;
• It is not tailored to specific application domains, but can be considered a general-purpose
approach applicable to multiple real-world scenarios.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The proposed method STEAL</title>
      <p>In this section, we describe the proposed method STEAL, emphasizing its peculiarities. The
distribution of the workload over multiple computational nodes is achieved by designing the
algorithms using the MapReduce paradigm. The pseudo-code description of the algorithms
exploits the Resilient Distributed Dataset (RDD) data structure available in Apache Spark.</p>
      <p>The workflow followed by STEAL consists of three stages. The first two stages are dedicated to
aligning the feature spaces and to identifying a proper matching between target and source instances.
Finally, the third stage trains the distributed variant of Random Forests (RF) available
in Apache Spark on the resulting hybrid training set. In the following, we report further
details about the first two stages.</p>
      <p>
        Stage 1 - Feature Alignment. In the first stage, in the PU learning setting, we first
estimate the labels of the unlabeled instances by resorting to a clustering-based approach.
Specifically, we apply a prototype-based clustering algorithm to identify <italic>k</italic> representatives of
positive instances from the positive examples of the target domain, and exploit them to estimate
a score for each unlabeled instance. In particular, we adopt a distributed variant of the <italic>k</italic>-means
algorithm, since it is well established in the literature and has been effectively exploited in
previous works in the PU learning setting [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10, 11, 12, 13</xref>
        ]. The score we compute for each
unlabeled instance lies in the interval [0, 1], where a value close to 0.0 (resp., 1.0) means that the
unlabeled example is likely to be a negative (resp., positive) example.
      </p>
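      <p>As an illustration of the prototype extraction, the following is a minimal, single-machine sketch of Lloyd's k-means applied to the positive instances. It is not the distributed Spark variant used by STEAL; the function name <monospace>kmeans_prototypes</monospace>, the initialization with the first k points, and the fixed iteration budget are assumptions made to keep the example short and deterministic.</p>
      <preformat>
```python
import math


def _euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_prototypes(points: list[list[float]], k: int, iterations: int = 20) -> list[list[float]]:
    """Lloyd's k-means returning k prototypes (centroids) of the given points.

    Assumption: centroids start from the first k points, which keeps the sketch
    deterministic; the initialization used by STEAL is not specified here.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its closest centroid.
        clusters: list[list[list[float]]] = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda c: _euclidean(p, centroids[c]))
            clusters[closest].append(p)
        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid.
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break  # converged: centroids no longer change
        centroids = updated
    return centroids


positives = [[0.0, 0.0], [0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]  # toy positive instances
print(kmeans_prototypes(positives, k=2))
```
      </preformat>
      <p>On this toy input the two prototypes settle near (0.05, 0.0) and (0.95, 1.0), one per group of nearby positives.</p>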
      <p>
        Methodologically, the score is computed according to the similarity between the feature
vector associated with the unlabeled instance and the feature vectors of the cluster prototypes.
As similarity function, we consider
        <disp-formula>
          <tex-math>\mathrm{sim}(\mathbf{x}, \mathbf{y}) = 1 - \sqrt{\frac{1}{d}\sum_{i=1}^{d}(x_i - y_i)^2}</tex-math>
        </disp-formula>
which computes the similarity between the instance vectors <bold>x</bold> and <bold>y</bold> in a <italic>d</italic>-dimensional space, based on the Euclidean distance,
after applying a min-max normalization (in the range [0, 1]) to all the features. Since every normalized feature lies in [0, 1],
the averaged squared difference is at most 1, so the similarity also lies in [0, 1]. After this step,
we obtain a fully labeled dataset composed of true and estimated labels, where the computed
score represents the confidence that a given unlabeled instance is a positive instance.
      </p>
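      <p>The similarity and the resulting confidence score can be sketched as follows, assuming features are min-max normalized beforehand and that the squared differences are averaged over the d features, so that the similarity stays in [0, 1]. How the similarities to the k prototypes are aggregated into a single score is not specified above; taking the maximum similarity, as the hypothetical <monospace>pu_score</monospace> does, is only one plausible choice.</p>
      <preformat>
```python
import math


def min_max_normalize(rows: list[list[float]]) -> list[list[float]]:
    """Rescale every feature (column) to [0, 1]; constant columns are mapped to 0 (assumption)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def similarity(x: list[float], y: list[float]) -> float:
    """1 - sqrt(mean of squared differences): stays in [0, 1] for min-max normalized vectors."""
    d = len(x)
    return 1.0 - math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / d)


def pu_score(instance: list[float], prototypes: list[list[float]]) -> float:
    """Hypothetical aggregation: confidence of being positive = highest similarity to a prototype."""
    return max(similarity(instance, p) for p in prototypes)


norm = min_max_normalize([[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]])
print(norm)                             # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(similarity(norm[0], norm[1]))     # 0.0
print(pu_score(norm[2], [[1.0, 1.0]]))  # 0.5
```
      </preformat>
      <p>Note that the prototypes must live in the same normalized space as the unlabeled instances for the score to be meaningful.</p>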
      <p>Subsequently, if the dataset is high-dimensional<sup>1</sup>, STEAL applies a distributed variant
of PCA to identify a reduced set of (non-redundant) new features. We extract
√︀(, ) new features, where  and  are the initial dimensionalities of the source and
the target feature spaces, respectively.</p>
      <p>Finally, we focus on the main objective of this stage, namely, the identification of an alignment
between the descriptive variables of the source and target domains. This step is necessary
to make the instances of the source and target domains directly comparable (through a
distance/similarity measure), in order to carry out the following stage. Specifically, the goal is to find a
corresponding feature in the source domain for each feature of the target domain. To this aim,
we compute the asymmetric KL divergence for each target-source pair of descriptive variables,
which quantifies the loss of information caused by a given feature of the source domain when
used to approximate a given feature of the target domain. Formally, given the <italic>i</italic>-th feature <italic>t</italic><sub>i</sub> of
the target domain and the <italic>j</italic>-th feature <italic>s</italic><sub>j</sub> of the source domain, we compute the discrete
variant of the KL divergence between <italic>t</italic><sub>i</sub> and <italic>s</italic><sub>j</sub> as follows:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math>\mathrm{KL}(t_i, s_j, T, S) = \sum_{v \in \mathrm{values}(t_i, T) \cap \mathrm{values}(s_j, S)} P(v, t_i, T) \ln \frac{P(v, t_i, T)}{P(v, s_j, S)}</tex-math>
      </disp-formula>
      <p>where <italic>T</italic> and <italic>S</italic> are the training instances of the target and source domains, respectively;
values(<italic>f</italic>, <italic>D</italic>) represents the set of distinct values of the feature <italic>f</italic> in the dataset <italic>D</italic>; <italic>P</italic>(<italic>v</italic>, <italic>f</italic>, <italic>D</italic>)
represents the relative number of occurrences of the value <italic>v</italic> on the feature <italic>f</italic> in the dataset <italic>D</italic>,
after a proper discretization of the feature<sup>2</sup>.</p>
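      <p>A small sketch of Eq. (1) for a single target-source feature pair may help. It assumes, following footnote 2, that values are already min-max normalized and are discretized with equal-width bins of width 0.01, and it computes the relative frequencies over the whole dataset, as stated for Eq. (1); the function names are illustrative.</p>
      <preformat>
```python
import math
from collections import Counter


def discretize(values: list[float], bin_width: float = 0.01) -> list[int]:
    """Equal-width binning of min-max normalized values in [0, 1]."""
    n_bins = round(1 / bin_width)
    # min(...) keeps the value 1.0 in the last bin instead of opening an extra one.
    return [min(int(v / bin_width), n_bins - 1) for v in values]


def relative_frequencies(bins: list[int]) -> dict[int, float]:
    """P(v, f, D): relative number of occurrences of each discretized value v of a feature."""
    counts = Counter(bins)
    return {v: c / len(bins) for v, c in counts.items()}


def kl_divergence(target_feature: list[float], source_feature: list[float]) -> float:
    """Discrete KL of Eq. (1), summed only over the values shared by both features."""
    p_t = relative_frequencies(discretize(target_feature))
    p_s = relative_frequencies(discretize(source_feature))
    shared = set(p_t).intersection(p_s)
    return sum(p_t[v] * math.log(p_t[v] / p_s[v]) for v in shared)


print(kl_divergence([0.0, 0.5, 1.0, 1.0], [0.0, 0.5, 1.0, 1.0]))  # 0.0
print(kl_divergence([0.0, 0.0], [0.0, 0.5]))                      # 0.6931471805599453 (ln 2)
```
      </preformat>
      <p>Because the sum only runs over the values shared by the two features, the result is not guaranteed to be non-negative, unlike the full KL divergence.</p>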
      <p>In Figure 1 we graphically show this alignment step, while the pseudo-code is reported in
Algorithm 1. STEAL starts by using the values assumed by the attributes as the key of
a paired RDD. The result is then used to find common values assumed by target and source
attributes, through a join operation. The frequency of each distinct value is then computed for
each target and source attribute, and these frequencies are exploited to estimate the probabilities used in
the computation of the KL terms. Such probabilities are computed by dividing the frequencies
by the number of matches (in the join operation) that involved a given attribute. Finally, we
compute the KL divergence for each target-source pair of features, and these divergences are exploited to
find the best (i.e., minimum-divergence) source feature for each target feature.</p>
      <p><sup>1</sup>This can be considered an optional step, which mainly depends on the availability of computational resources.</p>
      <p><sup>2</sup>We discretize continuous features using the equal-width strategy (bin size of 0.01), after min-max normalization.</p>
      <p>Algorithm 1: alignSourceDomain(T, S)</p>
      <preformat>Data: T, S: RDD. Target and source domain instances (in the ⟨instance_id, feature, value⟩ form)
Result: S′: RDD. Source domain instances aligned to the feature space of the target domain
begin
  T_v ← T.map{case(id, t, v) → ⟨v, t⟩}
  S_v ← S.map{case(id, s, v) → ⟨v, s⟩}
  // Join the target and source structures according to the feature values
  J ← T_v.join(S_v)
  // Count the frequencies of values for the target features
  F_t ← J.map{case(v, ⟨t, s⟩) → ⟨⟨v, t⟩, 1⟩}.reduceByKey((a, b) → a + b)
  // Compute the number (cardinality) of value matches for each target feature
  M_t ← F_t.map{case(⟨v, t⟩, c) → ⟨t, c⟩}.reduceByKey((a, b) → a + b)
  // Compute the value probabilities for the target domain
  P_t ← F_t.map{case(⟨v, t⟩, c) → ⟨t, ⟨v, c⟩⟩}
           .join(M_t).map{case(t, ⟨⟨v, c⟩, m⟩) → ⟨v, t, c / m⟩}
  // Compute the value probabilities for the source domain in the same way
  P_s ← ...
  // Compute the KL divergences, pairing only entries that refer to the same value
  KL ← P_t.cartesian(P_s)
          .filter{case(⟨v_t, t, p_t⟩, ⟨v_s, s, p_s⟩) → v_t = v_s}
          .map{case(⟨v, t, p_t⟩, ⟨v, s, p_s⟩) → ⟨⟨t, s⟩, p_t · ln(p_t / p_s)⟩}
          .reduceByKey((a, b) → a + b)
  // Find the best target-source feature matching by minimizing the KL divergences
  B ← KL.map{case(⟨t, s⟩, kl) → ⟨t, ⟨s, kl⟩⟩}
        .reduceByKey{case(⟨s_1, kl_1⟩, ⟨s_2, kl_2⟩) →
          if (kl_1 &lt; kl_2) then ⟨s_1, kl_1⟩ else ⟨s_2, kl_2⟩}
  // Align source features to target features, according to the minimum divergences
  return alignFeatures(S, B)
end</preformat>
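      <p>To make the data flow of Algorithm 1 concrete, the following single-machine Python sketch emulates its map, join, and reduceByKey steps with dictionaries instead of paired RDDs. It assumes (instance_id, feature, value) triples whose values are already discretized; the explicit value-equality check restricts the KL terms to matching values, and the final re-indexing of source instances is a hypothetical reading of the last step, not the authors' Spark code.</p>
      <preformat>
```python
import math
from collections import defaultdict


def value_probabilities(joined, side):
    """Value probabilities for one domain (side=0: target, side=1: source).

    joined holds (v, (t, s)) pairs produced by the join on the value v; the result
    maps (v, feature) to the frequency of v divided by the number of matches of the feature.
    """
    freq = defaultdict(int)       # ((v, feature), count), as in reduceByKey(a + b)
    for v, pair in joined:
        freq[(v, pair[side])] += 1
    matches = defaultdict(int)    # (feature, number of value matches)
    for (v, f), c in freq.items():
        matches[f] += c
    return {(v, f): c / matches[f] for (v, f), c in freq.items()}


def align_source_domain(target, source):
    """Single-machine emulation of alignSourceDomain(T, S) on (instance_id, feature, value) triples."""
    # Key every triple by its (already discretized) value.
    t_by_value, s_by_value = defaultdict(list), defaultdict(list)
    for _, t, v in target:
        t_by_value[v].append(t)
    for _, s, v in source:
        s_by_value[v].append(s)
    # Inner join on the value, like RDD.join.
    joined = [(v, (t, s)) for v, ts in t_by_value.items() if v in s_by_value
              for t in ts for s in s_by_value[v]]
    p_t = value_probabilities(joined, 0)
    p_s = value_probabilities(joined, 1)
    # KL terms: only (target, source) entries that share the same value v contribute.
    kl = defaultdict(float)
    for (v, t), pt in p_t.items():
        for (v2, s), ps in p_s.items():
            if v == v2:
                kl[(t, s)] += pt * math.log(pt / ps)
    # Keep, for each target feature, the source feature with minimum divergence.
    best = {}
    for (t, s), d in kl.items():
        if t not in best or best[t][1] > d:
            best[t] = (s, d)
    mapping = {t: s for t, (s, _) in best.items()}
    # Hypothetical final step: re-express each source instance in the target feature space.
    rows = defaultdict(dict)
    for inst, s, v in source:
        rows[inst][s] = v
    aligned = [(inst, t, feats[s]) for inst, feats in rows.items()
               for t, s in mapping.items() if s in feats]
    return mapping, aligned


target = ([(f"a{i}", "t1", v) for i, v in enumerate([0, 0, 1, 1])]
          + [(f"a{i}", "t2", 7) for i in range(4)])
source = ([(f"b{i}", "s1", v) for i, v in enumerate([0, 0, 1, 1])]
          + [(f"b{i}", "s2", v) for i, v in enumerate([0, 1, 1, 1])]
          + [(f"b{i}", "s3", v) for i, v in enumerate([7, 7, 8, 8])])
mapping, aligned = align_source_domain(target, source)
print(mapping)  # {'t1': 's1', 't2': 's3'}
```
      </preformat>
      <p>In this toy input the source feature s2 is never selected, which illustrates the implicit feature selection on the source domain discussed below.</p>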
      <p>Note that the proposed approach implicitly performs a feature selection on the source domain,
since some of the source features may not be matched to any target feature. Moreover, the same
feature of the source domain may be selected multiple times for different features of the target
domain, if it appears strongly aligned to all of them from a statistical viewpoint.</p>
      <p>Stage 2 - Instance Matching. After the first stage, target and source instances are represented in
an aligned feature space, and are therefore directly comparable through similarity/distance measures.
A straightforward approach to exploit the instances of the source domain would be that of
appending them to the training set of the target domain. However, this approach may
introduce noisy instances, i.e., instances that do not properly represent the data distribution
of the target domain, increasing the chance of negative transfer.</p>
      <p>Instead, we aim to selectively exploit only a subset of source instances, by attaching them
to specific target instances through feature concatenation, according to their similarity. In
this way, STEAL augments the descriptive features of target instances with those coming from
aligned, best-matching source instances.</p>
      <p>Figure 1: Feature alignment. The Kullback-Leibler divergences between the target domain descriptive variables (Ft1, Ft2, Ft3, Ft4) and the source domain descriptive variables (Fs1, Fs2, Fs3) form a divergence matrix.</p>
      <p>To this aim, STEAL repeatedly samples random subsets (with replacement) of target and source instances,
computes the Cartesian product between each pair of sampled subsets, and reduces the result set by keeping,
for each target instance, the source instance with the maximum similarity. The adoption of multiple random subsets, instead
of working on the whole set of instances, alleviates the negative effects due to the possible
match of a target instance to an improper source instance, since the same target instance may
be sampled multiple times and associated with different source instances. At the end of the
instance matching step, we obtain a single target-source hybrid training set by merging the
target-source concatenated instances constructed from each sample. Due to space constraints, we
omit the pseudo-code description of this stage.</p>
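      <p>Since the pseudo-code of this stage is omitted, the following is only a hypothetical sketch of the instance matching. It assumes that each sample draws a fixed fraction of each domain (without repetitions inside a sample) and that samples are drawn independently of each other, so a target instance can be matched in several samples. Any similarity function can be plugged in, for instance the one used in Stage 1; the negated squared distance below is merely illustrative, and the defaults of 10 samples and 10% of the instances mirror the configuration reported in the experiments.</p>
      <preformat>
```python
import random


def hybrid_training_set(target, source, similarity, n_samples=10, fraction=0.1, seed=0):
    """Hypothetical sketch of Stage 2 (instance matching).

    target: list of (features, label) pairs; source: list of feature lists, both
    already expressed in the aligned feature space. In each of n_samples rounds,
    a random fraction of each domain is drawn; every sampled target instance is
    concatenated with its most similar sampled source instance.
    """
    rng = random.Random(seed)
    n_t = max(1, int(len(target) * fraction))
    n_s = max(1, int(len(source) * fraction))
    hybrid = []
    for _ in range(n_samples):
        # Independent draws per round: the same target instance may reappear later.
        t_sample = rng.sample(target, n_t)
        s_sample = rng.sample(source, n_s)
        # Cartesian product of the two subsets, reduced to the best match per target instance.
        for features, label in t_sample:
            best = max(s_sample, key=lambda s: similarity(features, s))
            hybrid.append((features + best, label))  # feature concatenation
    return hybrid


def neg_sq_distance(x, y):
    """Illustrative similarity: negated squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(x, y))


target = [([i / 20, 1.0], i % 2) for i in range(20)]
source = [[j / 30, 0.0] for j in range(30)]
data = hybrid_training_set(target, source, neg_sq_distance)
print(len(data))  # 20: 10 rounds x 2 sampled target instances
```
      </preformat>
      <p>The labels of the hybrid instances are taken from the target instances, since only the target domain defines the task to be learned.</p>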
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>
        We conducted our experimental evaluation in a relevant application domain where transfer
learning can be strongly beneficial, namely biology, and specifically the reconstruction of
the human gene regulatory network (target domain), also using the knowledge coming from
the mouse gene regulatory network (source domain). A gene regulatory network consists of a set of
genes (represented as nodes) and a set of known regulations (represented as edges), and its
reconstruction consists of identifying currently unknown gene regulatory activities. The reconstruction of such networks can be performed by
resorting to link prediction methods [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], working in the Positive-Unlabeled learning setting.
      </p>
      <p>
        We compared the results achieved by STEAL with some state-of-the-art heterogeneous
transfer learning competitors, namely TJM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], JGSA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and BDA [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Note that, although these methods can work with heterogeneous feature spaces,
they require such spaces to have the same dimensionality. Moreover, we also evaluated the results
obtained by some baseline approaches, namely: i) T (no transfer), a predictive model learned only
from the dataset of the target domain; ii) S (optimal feature alignment), a predictive model trained
only on the source domain dataset, assuming that the optimal feature alignment is known a priori; iii) T
+ S (optimal feature alignment), a predictive model trained on both the target and
the source instances, again assuming that the optimal feature alignment is known a priori.
      </p>
      <p>
        In our experiments, we used the dataset adopted in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which consists of two gene regulatory
networks, one for the human organism and one for the mouse organism. We considered both
available versions of the dataset, i.e., the heterogeneous version, where human and mouse genes
are described by 174 and 161 expression levels, respectively, and the homogeneous version,
obtained by averaging the expression levels per organ. Moreover, since the competitor systems
ran out of memory on our server equipped with 64 GB of RAM, we also ran the experiments on
a reduced version of the homogeneous dataset consisting of 2% of the instances, randomly selected.
Detailed quantitative information on the considered datasets is shown in Table 1.
      </p>
      <p>Table 1: Quantitative information on the considered datasets (positive interactions, unlabeled interactions, gene features, gene-pair features).</p>
      <p>The task at hand naturally falls in the PU learning setting, since unobserved links cannot be
considered negative examples. To estimate the label confidence (Stage 1), we
identified 3 clusters and 2 clusters for the mouse and the human positive instances, respectively.
Such values were selected through silhouette analysis. For the heterogeneous
version of the dataset, we ran STEAL with the distributed PCA, which led to 18 new
features. As regards the instance matching, 10 samples were considered,
each involving 10% of the training examples, randomly selected, from both the source and the target domains.</p>
      <p>All the experiments were performed in the 10-fold cross-validation setting, where each fold
uses 9/10 of the positive examples for training and 1/10 of the positive examples for testing, while all
the unlabeled examples are considered for both training and testing.</p>
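      <p>The fold construction described above can be sketched as follows; shuffling the positives with a fixed seed and tagging unlabeled examples with 0 (meaning "unlabeled", not "negative") are assumptions of this illustration.</p>
      <preformat>
```python
import random


def pu_folds(positives, unlabeled, n_folds=10, seed=0):
    """Yield (train, test) pairs for the PU cross-validation protocol.

    The positives are split into n_folds disjoint parts: each fold tests on one part
    and trains on the remaining ones, while all unlabeled examples appear in both.
    Label 1 marks a known positive; label 0 marks an unlabeled (not a negative) example.
    """
    shuffled = list(positives)
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        train_pos = [p for j, part in enumerate(parts) if j != i for p in part]
        train = [(p, 1) for p in train_pos] + [(u, 0) for u in unlabeled]
        test = [(p, 1) for p in parts[i]] + [(u, 0) for u in unlabeled]
        yield train, test


train, test = next(pu_folds(list(range(100)), ["u1", "u2"]))
print(len(train), len(test))  # 92 12
```
      </preformat>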
      <p>As evaluation measure, since the dataset does not contain any negative example, we
considered the recall@<italic>K</italic>, defined as <inline-formula><tex-math>\mathrm{recall@}K = \frac{TP_K}{TP + FN}</tex-math></inline-formula>, where <italic>TP</italic><sub>K</sub> is the number of true positive
interactions returned within the top-<italic>K</italic> interactions, and (<italic>TP</italic> + <italic>FN</italic>) corresponds to the number
of positive examples in the testing fold, and computed the area under the recall@<italic>K</italic> curve
(AUR@K). Note that other well-known measures, such as the Area Under the
ROC curve (AUROC) and the Area Under the Precision-Recall curve (AUPR), cannot be computed
in the positive-unlabeled setting without introducing wrong biases in the ground truth, i.e., without treating unlabeled examples as negatives.</p>
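      <p>The recall@K curve and its area can be computed from a ranked list of test interactions. The normalization of the area is not stated here, so this sketch assumes the mean of recall@K over K = 1, ..., N, which corresponds to rescaling the K axis to [0, 1]; the function names are hypothetical.</p>
      <preformat>
```python
from itertools import accumulate


def recall_at_k_curve(ranked_labels: list[int]) -> list[float]:
    """recall@K for K = 1..N, i.e., TP_K / (TP + FN).

    ranked_labels lists the test interactions sorted by decreasing predicted score,
    with 1 for a known positive of the test fold and 0 for an unlabeled interaction.
    """
    n_pos = sum(ranked_labels)  # TP + FN: all positives of the test fold
    return [tp_k / n_pos for tp_k in accumulate(ranked_labels)]


def aur_at_k(ranked_labels: list[int]) -> float:
    """Assumed normalization: mean of recall@K over K = 1..N (K axis rescaled to [0, 1])."""
    curve = recall_at_k_curve(ranked_labels)
    return sum(curve) / len(curve)


print(recall_at_k_curve([1, 0, 1, 0]))  # [0.5, 0.5, 1.0, 1.0]
print(aur_at_k([1, 0, 1, 0]))           # 0.75
```
      </preformat>
      <p>Only the positions of the known positives matter: ranking them earlier raises every subsequent recall@K value and therefore the area.</p>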
      <p>From the results shown in Table 2, we can observe that STEAL outperforms all the
considered baselines, in both the heterogeneous and homogeneous settings. In particular, the
results show that STEAL outperforms the counterpart that does not exploit the source domain,
i.e., T (no transfer), by 27% in the homogeneous setting and by 9.7% in the heterogeneous setting,
showing the benefits of exploiting the additional knowledge coming from the mouse organism.
On the other hand, using only the source domain (i.e., S (optimal feature alignment)) leads to
a lower AUR@K than using the target data only. This means that, although
source data can be beneficial for the target task, they should be exploited carefully.
Note that, as expected, appending source instances to the target dataset, even if the
optimal feature alignment is known a priori (see T+S (optimal feature alignment)), does not
provide as many benefits as the method adopted by STEAL, which outperforms this approach
by 23.4% and 10% in the homogeneous and heterogeneous settings, respectively.</p>
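      <p>Table 2: AUR@K of T (no transfer), S (optimal feature alignment), T+S (optimal feature alignment), and T+S (STEAL) in the homogeneous and heterogeneous settings.</p>
      <table-wrap>
        <caption>
          <p>AUR@K of the competitors and of STEAL on the reduced homogeneous dataset.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Homogeneous Reduced</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>JGSA</td><td>0.500</td></tr>
            <tr><td>TJM</td><td>0.554</td></tr>
            <tr><td>BDA</td><td>0.558</td></tr>
            <tr><td>STEAL</td><td>0.589</td></tr>
          </tbody>
        </table>
      </table-wrap>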
      <p>The comparison with competitors on the reduced dataset shows that STEAL is able to
outperform all of them. Specifically, STEAL achieves an improvement of 17.8% with respect to
JGSA, 6.3% with respect to TJM, and 5.5% with respect to BDA. Note that the superiority of
STEAL over these competitors is not limited to the observed AUR@K on the reduced dataset,
since JGSA, TJM and BDA were not able to complete the experiments on the full dataset.</p>
      <p>To assess the advantages of STEAL also from a computational viewpoint, we performed
a scalability analysis. We measured the running times and computed the speedup factor to
evaluate the ability of STEAL to exploit additional computational nodes. This analysis was
performed on a cluster with 1, 2, or 3 computational nodes, by resorting to a synthetic dataset
with 10 million instances. The results of this analysis, depicted in Figure 2, show that, although
the communication overhead increases in principle, STEAL is able to take advantage of
additional computational nodes. Specifically, the measured speedup factors are close to ideal
results, i.e., they quickly converge to 2 and 3, respectively, with 2 and 3 computational nodes.</p>
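      <p>The speedup factor is commonly defined as the ratio between the running time on one node and the running time on n nodes, with n as the ideal value. The sketch below applies this standard definition, together with the derived parallel efficiency, to hypothetical running times that are not the measurements reported in Figure 2.</p>
      <preformat>
```python
def speedup(t_1: float, t_n: float) -> float:
    """Speedup factor S_n = T_1 / T_n; the ideal value with n nodes is n."""
    return t_1 / t_n


def efficiency(t_1: float, t_n: float, n: int) -> float:
    """Parallel efficiency E_n = S_n / n; the ideal value is 1."""
    return speedup(t_1, t_n) / n


# Hypothetical running times in seconds (not the measurements of Figure 2).
times = {1: 600.0, 2: 310.0, 3: 215.0}
for n, t_n in times.items():
    print(n, round(speedup(times[1], t_n), 2), round(efficiency(times[1], t_n, n), 2))
```
      </preformat>
      <p>An efficiency close to 1 indicates that the additional nodes are almost fully exploited despite the communication overhead.</p>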
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In this discussion paper, we proposed a novel heterogeneous transfer learning method, called
STEAL, that can work in the PU learning setting, also with heterogeneous feature spaces, thanks
to a feature alignment step based on the KL divergence. The obtained results demonstrate the
effectiveness of the proposed method with respect to baseline and state-of-the-art competitors.
Moreover, it exhibits very promising scalability results, emphasizing its possible applicability
to large datasets. We evaluated the performance of our method in the reconstruction of
gene regulatory networks, a task that obtained significant benefits from the application of STEAL.
Currently, we are evaluating the effectiveness of STEAL in other application domains, including
the prediction of the energy consumption in power grids, and we are working on extending it
to exploit multiple source domains.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Dr. Paolo Mignone acknowledges the support of Apulia Region through the REFIN project
“Metodi per l’ottimizzazione delle reti di distribuzione di energia e per la pianificazione di
interventi manutentivi ed evolutivi” (Methods for the optimization of energy distribution networks and for the planning of maintenance and evolution interventions; CUP H94I20000410008, Grant n. 7EDD092A).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on transfer learning</article-title>
          ,
          <source>Proc. IEEE</source>
          <volume>109</volume>
          (
          <year>2021</year>
          )
          <fpage>43</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Amasyali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>El-Gohary</surname>
          </string-name>
          ,
          <article-title>A review of data-driven building energy consumption prediction studies</article-title>
          ,
          <source>Renewable and Sustainable Energy Reviews</source>
          <volume>81</volume>
          (
          <year>2018</year>
          )
          <fpage>1192</fpage>
          -
          <lpage>1205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain</article-title>
          ,
          <source>BMC Bioinform.</source>
          <volume>22-S</volume>
          (
          <year>2021</year>
          )
          <fpage>274</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. W.</given-names>
            <surname>Tsang</surname>
          </string-name>
          ,
          <article-title>Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          <volume>36</volume>
          (
          <year>2014</year>
          )
          <fpage>1134</fpage>
          -
          <lpage>1148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. W.</given-names>
            <surname>Tsang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Hybrid heterogeneous transfer learning through deep learning</article-title>
          ,
          <source>in: AAAI</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>2213</fpage>
          -
          <lpage>2220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Transfer joint matching for unsupervised domain adaptation</article-title>
          ,
          <source>in: CVPR</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1410</fpage>
          -
          <lpage>1417</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ogunbona</surname>
          </string-name>
          ,
          <article-title>Joint geometrical and statistical alignment for visual domain adaptation</article-title>
          ,
          <source>in: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5150</fpage>
          -
          <lpage>5158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Balanced distribution adaptation for transfer learning</article-title>
          ,
          <source>in: ICDM</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1129</fpage>
          -
          <lpage>1134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <article-title>Learning from positive and unlabeled examples: A survey</article-title>
          ,
          <source>in: ISIP 2008 / WMWA 2008, Moscow, Russia, 23-25 May 2008</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>650</fpage>
          -
          <lpage>654</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mignone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pio</surname>
          </string-name>
          ,
          <article-title>Positive unlabeled link prediction via transfer learning for gene network reconstruction</article-title>
          ,
          <source>in: ISMIS</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mignone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>D'Elia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceci</surname>
          </string-name>
          ,
          <article-title>Exploiting transfer learning for the reconstruction of the human gene regulatory network</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>1553</fpage>
          -
          <lpage>1561</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mignone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Džeroski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceci</surname>
          </string-name>
          ,
          <article-title>Multi-task learning for the simultaneous reconstruction of the human and mouse gene regulatory networks</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <fpage>22295</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mignone</surname>
          </string-name>
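          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Magazzù</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,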
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Angione</surname>
          </string-name>
          ,
          <article-title>Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>38</volume>
          (
          <year>2021</year>
          )
          <fpage>487</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Link prediction in complex networks: A survey</article-title>
          ,
          <source>Physica A: Statistical Mechanics and its Applications</source>
          <volume>390</volume>
          (
          <year>2011</year>
          )
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>