<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Metalearning for multiple-domain Transfer Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Catarina Felix</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Soares</string-name>
          <email>csoares@fe.up.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alípio Jorge</string-name>
          <email>amjorge@fc.up.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculdade de Ciências da Universidade do Porto</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculdade de Engenharia da Universidade do Porto</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>INESC TEC</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LIAAD-INESC TEC</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning processes consist of collecting data, obtaining a model and applying it to a given task. Given a new task, the standard approach is to restart the learning process and obtain a new model. However, previous learning experience can be exploited to assist the new learning process. The two most studied approaches for this are metalearning and transfer learning. Metalearning can be used to select the predictive model to use on a given dataset. Transfer learning allows the reuse of knowledge from previous tasks. Our aim is to use metalearning to support transfer learning and reduce the computational cost without loss in performance, as well as the user effort needed for algorithm selection. In this paper we propose some methods for mapping the transfer of weights between neural networks to improve the performance of the target network, and describe some experiments performed in order to test our hypothesis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Machine learning processes consist of 1) collecting training data for the new task;
2) obtaining a model; 3) applying the model to new data. This is done even when
the new task is related to one or more tasks previously solved, for example, when
there are relationships between variables or between the processes used to obtain
the models.</p>
      <p>There are two approaches to taking advantage of previous learning experience
in new tasks: metalearning and transfer learning. Both transfer learning and
metalearning use information about a domain to learn efficiently and effectively
in a new one. Metalearning focuses on the choice of a learning algorithm, and
transfer learning on experience obtained from previous tasks. This suggests that
transfer learning and metalearning may be used together.</p>
      <p>Our aim is to investigate if metalearning can be used to support transfer
learning in tasks consisting of very diverse subtasks, reducing computational
cost without loss in predictive performance and cutting down the time data
scientists need to perform their tasks.</p>
      <p>In this paper we describe some aspects of the state of the art in metalearning
and transfer learning. We propose some methods for mapping the transfer of
weights between neural networks and describe experiments performed to test
the hypothesis that the transfer of weights improves the results of the target
network.
</p>
    </sec>
    <sec id="sec-2">
      <title>Metalearning and Transfer Learning</title>
      <p>This section presents the basic concepts related with our work. First we describe
metalearning, some of its methods and examples of use. After that, we present
transfer learning, its motivation, operation mode and some techniques used.
Finally, we describe some examples of the combination of metalearning and
transfer learning.
</p>
      <sec id="sec-2-1">
        <title>Metalearning</title>
        <p>Metalearning aims to help in the process of selecting a predictive algorithm
to use on a given dataset. It also aims to take advantage of the repeated
use of a given method over a set of similar tasks.</p>
        <p>There are several applications for metalearning. It can be used for combining
base learners: using several learners together to create a composite model that
better predicts the result. Another application of metalearning is bias
management, mostly used for data streams (continuous flows of data, for example from
large and continuously growing databases) that require context adaptation because
the domain is not static. Metalearning can also be used to
transfer metaknowledge across tasks. It is mostly used for the Algorithm Selection
Problem, described next.</p>
        <p>
          Algorithm Recommendation Choosing the best algorithm for processing
a given dataset is a difficult process. Moreover, algorithms normally have
parameters that affect their performance, and tuning them can be a difficult and slow
task. This constitutes the motivation for the Algorithm Selection Problem [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ],
originally formulated by Rice [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>This problem consists in determining the best algorithm to use for a certain
dataset. The metalearning approach takes advantage of information previously
obtained on several datasets and also on several algorithms. This knowledge is
used to build a metamodel that, given a new dataset, gives the system the ability
to recommend the best-suited algorithm.</p>
        <p>
          Earlier applications of metalearning addressed the most common tasks -
classification [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], regression [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and time series [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. These approaches were then
extended to selecting parameter settings for a single algorithm [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the whole
data mining process [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and also to problems from domains other than machine
learning, e.g. different optimization problems [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ]. More recently, they were
also used to deal with new problems in data mining: data streams [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Transfer Learning</title>
        <p>
          A definition of transfer learning can be found in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]: given a source domain DS
and a learning task TS, a target domain DT and a learning task TT, transfer
learning aims to help improve the learning of the target predictive function fT(·)
in DT using the knowledge in DS and TS, where DS ≠ DT or TS ≠ TT.
        </p>
        <p>Transfer learning allows the tasks and distributions used in training and
testing to be different. Here, knowledge is transferred from one task, the
source task, to another, the target task. It is inspired by the way the
human brain works: the mechanisms that allow, for example, someone to recognize pears
based on previous knowledge of recognizing apples.</p>
        <p>Transfer learning allows algorithms to adapt to new tasks based on the
knowledge obtained in previous ones, and the three main research issues in this topic
are related to what, how and when to transfer.</p>
        <p>
          What to transfer? This question concerns the type of information transferred
between problems: instance-transfer, where instances from the source domain are
used together with the ones in the target domain to improve the performance
of the target model, as in the TrAdaBoost [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] algorithm;
feature-representation transfer, where a set of feature representations is extracted from the source
domain and transferred, obtaining a feature representation of the target domain
as in [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]; parameter-transfer, which is done by computing the source model,
extracting its parameters and, assuming that the models for related tasks share
some parameters, transferring them to build the target model, as in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]; and
relational-knowledge-transfer, which consists in trying to transfer knowledge
about the data between the domains, as is the case of the TAMAR [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] algorithm.
How to transfer? Once it is known what information should be transferred,
the focus shifts to how to transfer it, that is, to the development of learning
algorithms that perform the transfer. For example, the DBT (Discriminability-based
transfer) algorithm [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] consists in modifying the neural network weights
obtained in the source classification task in order to use them on a target
network. In [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], a "transfer-aware" naive Bayes classification algorithm is proposed.
In [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], first-order decision trees are used for reinforcement learning, and some
tree statistics are transferred from the source to the target problem. In [27],
graph-based transferability is determined: the method automatically determines the
parameters to transfer between biased logistic regression tasks. The Kolmogorov
complexity between tasks is used in [28] to transfer knowledge between Bayesian
decision trees. [29] introduces context-sensitive multi-task learning, which helps
improve performance in neural networks for classification. In [30] the authors
use clustering to perform a feature selection to be transferred, improving the
performance of a Bayesian algorithm.
        </p>
        <p>
          When to transfer? The last question concerns the situations in which
the transfer should be performed. Ultimately, the objective is to avoid negative
transfer: when the transfer can harm the learning process in the target task.
This issue is referred to in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], where the authors wish to identify when transfer
learning will hurt the performance of the algorithm instead of improving it.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Metalearning and Transfer Learning</title>
        <p>Some work has been done on using metalearning together with transfer
learning. We analyzed some literature related to classification tasks, which is
described next.</p>
        <p>Metafeatures are used in [31] for calculating similarities between datasets.
The algorithm used for this task is k-nearest neighbors. In [32, 33] there is
no use of metafeatures, since the transfers are made without choosing the best
source dataset to use with a certain target dataset. In [32], metalearning is used
to find matrix transformations capable of producing good kernel matrices for
the source tasks. The matrices are then transferred to the target tasks.</p>
        <p>The results are evaluated by performance measures such as accuracy in [33] and,
more precisely, by the area under the ROC curve in [31, 32].</p>
        <p>The transferred objects found on the studied papers are SVM parameter
settings in [31], the kernel matrices in [32] and the parameter function
(responsible for mapping statistics to parameters in "bag-of-words" text classification
problems) in [33].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Mapping of variables for transfer</title>
      <p>We now propose some methods for mapping variables for transfer and show the
results of applying the methods in some experiments, using neural networks with
three neurons in the hidden layer. The transfer is made from one variable in the
source dataset to another in the target dataset. In a neural network, each
input neuron corresponds to a variable in the dataset and has a connection to all the
neurons in the hidden layer.</p>
      <p>The methods proposed are described next:
1. Random: the weights are randomly ordered. We repeat this 100 times and
generate 100 sets of randomly ordered weights.
2. Direct: the weights are transferred directly between corresponding variables,
when the datasets have the same structure.
3. Mapped: the weights are ordered according to one of the following criteria:
(a) Kullback-Leibler divergence: we compute the KL divergence between
all the attributes of the source dataset and all the attributes of the
target dataset. The transfer is made between the attributes with the smallest
divergence.
(b) Pearson, Spearman and Kendall correlations: we compute the
correlation between every attribute in each dataset and its target. The transfer
is made between the attributes with the most similar correlation to the
respective target.</p>
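      <p>A minimal sketch of the KL-divergence mapping (3a) follows. This is a hypothetical illustration rather than the paper's implementation: the histogram-based divergence estimate, the number of bins and the function names are all assumptions.</p>
      <preformat>```python
import math

def kl_divergence(p, q, bins=10, eps=1e-9):
    # Histogram-based estimate of KL(p || q) for two attribute samples.
    lo, hi = min(min(p), min(q)), max(max(p), max(q))
    width = (hi - lo) / bins or 1.0  # guard against constant attributes
    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        total = sum(counts)
        probs = [(c / total) + eps for c in counts]
        norm = sum(probs)
        return [v / norm for v in probs]
    ph, qh = hist(p), hist(q)
    return sum(a * math.log(a / b) for a, b in zip(ph, qh))

def map_attributes(source_cols, target_cols):
    # Pair each target attribute with the source attribute of smallest divergence.
    mapping = []
    for tcol in target_cols:
        divs = [kl_divergence(scol, tcol) for scol in source_cols]
        mapping.append(divs.index(min(divs)))
    return mapping
```</preformat>
      <p>The correlation-based mappings (3b) follow the same pattern, replacing the divergence with the absolute difference between each attribute's correlation with its target variable.</p>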
    </sec>
    <sec id="sec-4">
      <title>Experiments performed</title>
      <p>Some experiments have been performed to study if the transfer of knowledge
improves the performance of an algorithm. The aim is to measure the success
of transferring the weights of a neural network learned on a source dataset to a
new neural network that will be trained on a target dataset. All the weights are
transferred, according to some mapping, and are used to initialize the network
in a non-random way. The resulting error is compared to the one obtained with
random initial weights, to assess in which cases an improvement occurs.</p>
      <p>In the experiments, the source and target datasets may be unrelated or
related (e.g. generated by the same process at different times, or generated by
processes with the same structure). Weight transfer is performed from one
variable in the source model to one variable in the target model. The datasets used
were retrieved from UCI [34] and different experiments have been performed.</p>
      <sec id="sec-4-1">
        <title>Experiment 1</title>
        <p>The objective of this experiment is to study the behavior of the transfer of
knowledge between tasks. We compared random transfer (made between
randomly chosen variables) with direct transfer (performed between corresponding
variables, in related datasets).</p>
        <p>Experiment Description In this experiment, source and target datasets may
be unrelated or related (e.g. generated by the same process at different times or
generated by processes with the same structure). Weight transfer is performed
from one variable in the source model to one variable in the target model. The
mapping of variables for the transfer can be random or between corresponding
variables.</p>
        <p>To perform this experiment the datasets used were:</p>
        <p>One of the datasets used, Communities and Crime Unnormalized, has 18
target variables. It was used to generate new datasets using the same original
independent variables. These datasets are, then, related to each other. The other
datasets are, in principle, independent among themselves. All the datasets were
normalized in a preprocessing phase.</p>
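        <p>The paper does not state which normalization was used; a minimal sketch, assuming min-max scaling of each attribute to [0, 1]:</p>
        <preformat>```python
def normalize(column):
    # Min-max scaling of one attribute to the [0, 1] interval.
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against constant attributes
    return [(x - lo) / span for x in column]

normalize([2.0, 4.0, 6.0])  # returns [0.0, 0.5, 1.0]
```</preformat>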
        <p>For this experiment we ran each dataset through a neural network with three
neurons in the hidden layer, using ten-fold cross-validation. First the networks
are trained with a random initial set of weights, and we measure the Mean
Squared Error, MSE = (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)². Then each network is trained with the
best set of weights found for the other networks and we also measure the MSE
for each network. For each network we compare the error obtained with random
initial weights (MSE_R) with the ones obtained with the weights transferred
from other networks (MSE_T). We consider that the transfer has improved the
result when MSE_R is bigger than MSE_T.</p>
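        <p>The protocol above can be sketched on a toy task. This is an illustration only: a linear model trained by gradient descent stands in for the three-hidden-neuron network, and every name and setting here is an assumption.</p>
        <preformat>```python
import random

def train(X, y, w, epochs, lr=0.01):
    # Plain stochastic gradient descent on squared error for a linear model.
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w

def mse(X, y, w):
    return sum((sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / len(y)

random.seed(1)
true_w = [2.0, -1.0]
X = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(100)]
y = [sum(w * x for w, x in zip(true_w, xi)) for xi in X]

# "Source" model: trained to convergence; its weights are what gets transferred.
w_source = train(X, y, [random.random() for _ in range(2)], epochs=50)
# MSE_R: short training from a random initialization.
mse_r = mse(X, y, train(X, y, [random.random() for _ in range(2)], epochs=5))
# MSE_T: the same short training, but starting from the transferred weights.
mse_t = mse(X, y, train(X, y, list(w_source), epochs=5))
# Here source and target tasks coincide, so mse_t should come out below mse_r.
```</preformat>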
        <p>For the unrelated datasets, the transfer was performed randomly. For the
related datasets the transfer was performed in two different ways: randomly and
also directly between corresponding variables.</p>
        <p>Results Figures 1 to 3 show the distribution of the improvements for the
experiments. The x and y axes show the source and target datasets,
respectively. We calculated the number of times transfer improves the MSE. In
these charts the color of the squares represents the number of times the
transfer between those datasets improved the performance on the target task: darker
squares represent a higher probability of reducing the error when using transfer
of weights.</p>
        <p>[Figure: Proportion of Improvements, experiment #1; source dataset on the x axis, target dataset on the y axis.]</p>
        <p>This chart shows that in the first variant
the improvement is lowest. Note that the datasets used in the second and third
variants of the experiment are related, unlike the ones used in the first.</p>
        <p>A plausible explanation for the last variant of the experiment being the one with
the most improvements is that not only are the datasets related, but the transfer
of weights is also made directly between corresponding variables from one dataset to
another, since the structure of the neural network is the same.</p>
        <p>The improvement obtained was near 50% for the random transfer between
unrelated datasets. This means that random transfer has the same probability
of improving the result as of deteriorating it. The random and direct
transfers between related datasets (with the same attributes but different target
variables) show, respectively, around 60% and 70% improvements. This means
that the transfer between related datasets increases the probability of
improving the result of a neural network. This probability increases even more when
the transfer is made directly between corresponding variables, showing that the
transfer between similar (in this case, the same) variables is advantageous.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experiment 2</title>
        <p>The objective of this experiment was to study the behavior of the transfer of
knowledge between similar variables, comparing it to the random transfer of
knowledge.</p>
        <p>[Figure: improvement percentages for the three variants: different-random, similar-random and similar-direct.]</p>
        <p>Experiment description In this experiment, source and target datasets are
in principle unrelated. The datasets used were the ones considered unrelated
in Experiment 1:
1. Concrete Compressive Strength
2. Wine Quality (red wine)
3. Challenger USA Space Shuttle O-Ring (erosion only)
4. Concrete Slump Test (5)
5. Airfoil Self-Noise
6. Energy efficiency (6)
7. Yacht Aerodynamics</p>
        <p>Before running the experiment, the datasets went through a preprocessing
phase, which included normalization. For this experiment we ran the datasets
through a neural network with three neurons in the hidden layer, using ten-fold
cross-validation.</p>
        <p>First, the initial set of weights fed to each neural network is composed of
values generated randomly between 0 and 1. To mitigate the randomness
of the weight generation, the whole process is repeated 100 times for each
dataset. The dataset and weights are fed to the neural network and, using
ten-fold cross-validation, we obtain the Mean Squared Error and the Aggregated
Weights (the mean of the ten sets of weights obtained from the network).</p>
        <p>5 This dataset has 3 target variables; for this experiment only one of them was used:
SLUMP (cm).
6 This dataset has 2 target variables; for this experiment only one of them was used:
Y1.</p>
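        <p>The aggregation step can be sketched as an element-wise mean over the ten fold-specific weight vectors (a minimal illustration; the function name is an assumption):</p>
        <preformat>```python
def aggregate_weights(fold_weights):
    # Element-wise mean of the weight vectors learned in each CV fold.
    n = len(fold_weights)
    return [sum(ws) / n for ws in zip(*fold_weights)]

aggregate_weights([[1.0, 2.0], [3.0, 4.0]])  # returns [2.0, 3.0]
```</preformat>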
        <p>These aggregated weights are transferred to other neural networks to try to
improve their performance. The transfer is performed in two ways, random and
mapped, and the weights are fed to the neural network together with the target
dataset. The learning process occurs and the resulting mean squared error is
saved.</p>
        <p>The errors obtained in the first learning process (with randomly generated
weights, MSE_R) are compared with the ones obtained in the second learning
process (with the weights transferred from the other datasets, MSE_T). For this,
we calculate (MSE_R − MSE_T) / MSE_R.</p>
        <p>For each pair of datasets, we repeat the transfer several times: 10000
(100 × 100) for random transfer and 100 for mapped transfer.</p>
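        <p>The improvement measure used throughout can be written as a small helper (assumed naming; positive values mean the transfer reduced the error):</p>
        <preformat>```python
def relative_improvement(mse_r, mse_t):
    # Relative error reduction achieved by the transferred initialization.
    return (mse_r - mse_t) / mse_r

relative_improvement(0.5, 0.4)  # about 0.2: a 20% error reduction
```</preformat>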
        <p>Results The chart in Figure 5 shows the probability of improving the
performance of the neural network by transferring the weights using the same dataset
as source and target.</p>
        <p>We can see in the chart that the transfer of the same set of weights generates
more improvements than using a new random set of weights. This is because the
former is equivalent to running the neural network for twice the iterations, leading
to a better fitting of the result.</p>
        <p>The charts in Figures 6 to 10 show the results for the different types of
mapping: random, Kullback-Leibler divergence, and Pearson, Spearman and
Kendall correlations, respectively.</p>
        <p>For the random mapping, the figure shows, on the left, the mean number of
times the transfer improves the predictions and, on the right, histograms of
the same information, where the colors match the ones in the images on the
left: gray tones for when the transfer increases the error and the other colors for
when there is an improvement.</p>
        <p>The same information is shown in the charts that refer to the other types of
mappings used. For these, we added a chart, in the middle, that shows the
difference, in terms of improvement, between the measured mapping methods and
the random mapping method. The colors also match the ones in the histogram.</p>
        <p>In all cases the proportion of improvements is below (but near) 50%. Our
aim is to find the proper features that allow this proportion to increase.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>We can use related variables to identify characteristics of the model that can be
transferred, with the advantage of reducing the computational cost and the user
effort in the process.</p>
      <p>In this paper we described methods for mapping the transfer of weights
between neural networks. We also show results of some experiments performed
to test the hypothesis that the transfer of weights will improve the results of the
neural network.</p>
      <p>In the first experiment we obtained an improvement rate near 50% for the random
transfer between unrelated datasets, and around 60% and 70% improvements for
random and direct transfers between related datasets, respectively. This shows
that the transfer between similar datasets is advantageous, and the advantage
increases even more when the transfer is performed between similar variables.</p>
      <p>In the second experiment, which was performed with unrelated datasets, we
obtained probabilities of improvement below (but near) 50% for all the mappings
considered. We aim to find the proper features that allow increasing this value.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is financed by the ERDF - European Regional Development Fund
through the COMPETE programme (operational programme for
competitiveness) within project GNOSIS, cf. FCOMP-01-0202-FEDER-038987.</p>
      <p>[Figure panels: KL-divergence mapping, proportion of improvements, 47.9% victories; Pearson correlation mapping, proportion of improvements, 48.1% victories.]</p>
      <p>27. Eaton, E., Desjardins, M., Lane, T.: Modeling transfer relationships between
learning tasks for improved inductive transfer.
28. Mahmud, M., Ray, S.: Transfer learning using Kolmogorov complexity: basic theory
and empirical evaluations. In: Advances in Neural Information Processing Systems.
(2007) 985-992
29. Silver, D.L., Poirier, R., Currie, D.: Inductive transfer with context-sensitive neural
networks. Machine Learning 73(3) (2008) 313-336
30. Mishra, M., Huan, J.: Multitask learning with feature selection for groups of
related tasks. In: Data Mining (ICDM), 2013 IEEE 13th International Conference
on, IEEE (2013) 1157-1162
31. Biondi, G., Prati, R.: Setting parameters for support vector machines using transfer
learning. Journal of Intelligent &amp; Robotic Systems (2015) 1-17
32. Aiolli, F.: Transfer learning by kernel meta-learning. In: ICML Unsupervised and
Transfer Learning. (2012) 81-95
33. Do, C., Ng, A.Y.: Transfer learning for text classification. In: NIPS. (2005)
34. Bache, K., Lichman, M.: UCI machine learning repository (2013)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giraud-Carrier</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilalta</surname>
          </string-name>
          , R.: Metalearning - Applications to Data Mining.
          <source>Cognitive Technologies</source>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rice</surname>
            ,
            <given-names>J.R.:</given-names>
          </string-name>
          <article-title>The algorithm selection problem</article-title>
          .
          <source>Advances in Computers</source>
          <volume>15</volume>
          (
          <year>1976</year>
          )
          <volume>65</volume>
          -
          <fpage>118</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>da Costa</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results</article-title>
          .
          <source>Machine Learning 50(3)</source>
          (
          <year>2003</year>
          )
          <volume>251</volume>
          -
          <fpage>277</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Characterization of classification algorithms</article-title>
          .
          <source>In: Progress in Artificial Intelligence, 7th Portuguese Conference on Artificial Intelligence</source>
          , EPIA '95,
          Funchal
          , Madeira Island, Portugal, October 3-6,
          <year>1995</year>
          , Proceedings. (
          <year>1995</year>
          )
          <volume>189</volume>
          -
          <fpage>200</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kalousis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hilario</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>On data and algorithms: Understanding inductive performance</article-title>
          .
          <source>Machine Learning 54(3)</source>
          (
          <year>2004</year>
          )
          <volume>275</volume>
          -
          <fpage>312</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bensusan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giraud-Carrier</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          :
          <article-title>Discovering task neighbourhoods through landmark learning performances</article-title>
          .
          <source>In: Principles of Data Mining and Knowledge Discovery, 4th European Conference, PKDD</source>
          <year>2000</year>
          , Lyon, France,
          <source>September 13- 16</source>
          ,
          <year>2000</year>
          , Proceedings. (
          <year>2000</year>
          )
          <volume>325</volume>
          -
          <fpage>330</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>UCI++: improved support for algorithm selection using datasetoids</article-title>
          .
          <source>In: Advances in Knowledge Discovery and Data Mining</source>
          , 13th Pacific-Asia Conference, PAKDD
          <year>2009</year>
          , Bangkok, Thailand,
          <source>April 27-30</source>
          ,
          <year>2009</year>
          , Proceedings. (
          <year>2009</year>
          )
          <fpage>499</fpage>
          -
          <lpage>506</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Macia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orriols-Puig</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernado-Mansilla</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Genetic-based synthetic data sets for the analysis of classifiers behavior</article-title>
          .
          <source>In: Hybrid Intelligent Systems</source>
          ,
          <year>2008</year>
          . HIS'08. Eighth International Conference on,
          <source>IEEE</source>
          (
          <year>2008</year>
          )
          <fpage>507</fpage>
          -
          <lpage>512</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Blockeel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanschoren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Experiment databases: Towards an improved experimental methodology in machine learning</article-title>
          .
          <source>In: Knowledge Discovery in Databases: PKDD</source>
          <year>2007</year>
          ,
          <source>11th European Conference on Principles and Practice of Knowledge Discovery in Databases</source>
          , Warsaw, Poland,
          <source>September 17-21</source>
          ,
          <year>2007</year>
          , Proceedings. (
          <year>2007</year>
          )
          <fpage>6</fpage>
          -
          <lpage>17</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Prudêncio</surname>
            ,
            <given-names>R.B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludermir</surname>
            ,
            <given-names>T.B.</given-names>
          </string-name>
          :
          <article-title>Uncertainty sampling-based active selection of datasetoids for meta-learning</article-title>
          .
          <source>In: Artificial Neural Networks and Machine Learning - ICANN 2011 - 21st International Conference on Artificial Neural Networks</source>
          , Espoo, Finland, June 14-17,
          <year>2011</year>
          , Proceedings, Part II
          . (
          <year>2011</year>
          )
          <fpage>454</fpage>
          -
          <lpage>461</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Prudêncio</surname>
            ,
            <given-names>R.B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludermir</surname>
            ,
            <given-names>T.B.</given-names>
          </string-name>
          :
          <article-title>Meta-learning approaches to selecting time series models</article-title>
          .
          <source>Neurocomputing</source>
          <volume>61</volume>
          (
          <year>2004</year>
          )
          <fpage>121</fpage>
          -
          <lpage>137</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gomes</surname>
            ,
            <given-names>T.A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prudêncio</surname>
            ,
            <given-names>R.B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>A.L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carvalho</surname>
            ,
            <given-names>A.C.P.L.F.</given-names>
          </string-name>
          :
          <article-title>Combining meta-learning and search techniques to select parameters for support vector machines</article-title>
          .
          <source>Neurocomputing</source>
          <volume>75</volume>
          (
          <issue>1</issue>
          ) (
          <year>2012</year>
          )
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Serban</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanschoren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kietz</surname>
            ,
            <given-names>J.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A survey of intelligent assistants for data analysis</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>45</volume>
          (
          <issue>3</issue>
          )
          (
          <year>July 2013</year>
          )
          <fpage>31:1</fpage>
          -
          <lpage>31:35</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Abreu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valente</surname>
            ,
            <given-names>J.M.S.</given-names>
          </string-name>
          :
          <article-title>Selection of heuristics for the job-shop scheduling problem based on the prediction of gaps in machines</article-title>
          .
          <source>In: Learning and Intelligent Optimization</source>
          , Third International Conference, LION 3, Trento, Italy, January 14-18,
          <year>2009</year>
          . Selected Papers. (
          <year>2009</year>
          )
          <fpage>134</fpage>
          -
          <lpage>147</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Smith-Miles</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Cross-disciplinary perspectives on meta-learning for algorithm selection</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>41</volume>
          (
          <issue>1</issue>
          ) (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kosina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Learning about the learning process</article-title>
          .
          <source>In: Advances in Intelligent Data Analysis X - 10th International Symposium, IDA</source>
          <year>2011</year>
          , Porto, Portugal,
          <source>October 29-31</source>
          ,
          <year>2011</year>
          . Proceedings. (
          <year>2011</year>
          )
          <fpage>162</fpage>
          -
          <lpage>172</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>A survey on transfer learning</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>22</volume>
          (
          <issue>10</issue>
          ) (
          <year>2010</year>
          )
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yoshida</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirao</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iwata</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsumoto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Transfer learning for multiple-domain sentiment analysis - identifying domain dependent/independent word polarity</article-title>
          .
          <source>In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2011</year>
          , San Francisco, California, USA, August 7-11,
          <year>2011</year>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Caruana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Multitask learning</article-title>
          .
          <source>Machine Learning</source>
          <volume>28</volume>
          (
          <issue>1</issue>
          ) (
          <year>1997</year>
          )
          <fpage>41</fpage>
          -
          <lpage>75</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Boosting for transfer learning</article-title>
          .
          <source>In: Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML</source>
          <year>2007</year>
          ), Corvallis, Oregon, USA, June 20-24,
          <year>2007</year>
          . (
          <year>2007</year>
          )
          <fpage>193</fpage>
          -
          <lpage>200</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Blitzer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Domain adaptation with structural correspondence learning</article-title>
          .
          <source>In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <source>Association for Computational Linguistics</source>
          (
          <year>2006</year>
          )
          <fpage>120</fpage>
          -
          <lpage>128</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Knowledge transfer via multiple model local structure mapping</article-title>
          .
          <source>In: International Conference on Knowledge Discovery and Data Mining</source>
          , Las Vegas, NV. (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Mihalkova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huynh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mooney</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Mapping and revising markov logic networks for transfer learning</article-title>
          .
          <source>In: Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI)</source>
          . (
          <year>2007</year>
          )
          <fpage>608</fpage>
          -
          <lpage>614</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Pratt</surname>
            ,
            <given-names>L.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanson</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cowan</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Discriminability-based transfer between neural networks</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>5</volume>
          , Morgan Kaufmann (
          <year>1993</year>
          )
          <fpage>204</fpage>
          -
          <lpage>211</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Rosenstein</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marx</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaelbling</surname>
            ,
            <given-names>L.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietterich</surname>
            ,
            <given-names>T.G.</given-names>
          </string-name>
          :
          <article-title>To transfer or not to transfer</article-title>
          .
          <source>In: NIPS'05 Workshop, Inductive Transfer: 10 Years Later</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Ramon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Driessens</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croonenborghs</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Transfer learning in reinforcement learning problems through partial policy recycling</article-title>
          .
          <source>In: Machine Learning: ECML 2007</source>
          . Springer (
          <year>2007</year>
          )
          <fpage>699</fpage>
          -
          <lpage>707</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>