<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer learning for time series anomaly detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vincent Vercruyssen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wannes Meert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesse Davis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science</institution>
          ,
          <addr-line>KU Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <fpage>27</fpage>
      <lpage>36</lpage>
      <abstract>
        <p>Currently, time series anomaly detection is attracting significant interest. This is especially true in industry, where companies continuously monitor all aspects of production processes using various sensors. In this context, methods that automatically detect anomalous behavior in the collected data could have a large impact. Unfortunately, for a variety of reasons, it is often difficult to collect large labeled data sets for anomaly detection problems. Typically, only a few data sets will contain labeled data, and each of these will only have a very small number of labeled examples. This makes it difficult to treat anomaly detection as a supervised learning problem. In this paper, we explore using transfer learning in a time-series anomaly detection setting. Our algorithm attempts to transfer labeled examples from a source domain to a target domain where no labels are available. The approach leverages the insight that anomalies are infrequent and unexpected to decide whether or not to transfer a labeled instance to the target domain. Once the transfer is complete, we construct a nearest-neighbor classifier in the target domain, with dynamic time warping as the similarity measure. An experimental evaluation on a number of real-world data sets shows that the overall approach is promising, and that it outperforms unsupervised anomaly detection in the target domain.</p>
      </abstract>
      <kwd-group>
        <kwd>transfer learning</kwd>
        <kwd>anomaly detection</kwd>
        <kwd>time series</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Time series data frequently arise in many different scientific and industrial
contexts. For instance, companies use a variety of sensors to continuously monitor
equipment and natural resources. One relevant use case is developing algorithms
that can automatically identify time series that show anomalous behavior.
Ideally, anomaly detection could be posed as a supervised learning problem.
However, these algorithms require large amounts of labeled training data.
Unfortunately, such data is often not available, as obtaining expert labels is
time-consuming and expensive. Typically, only a small number of labels are known
for a limited number of data sets. For example, if a company monitors several
similar machines, they may only label events (e.g., shutdown, maintenance...)
for a small subset of them.</p>
      <p>
        Transfer learning is an area of research focused on methods that are able to
extract information (e.g., labels, knowledge, etc.) from a data set and reapply
it in another, different data set. Specifically, the goal of transfer learning is
to improve performance on the target domain by leveraging information from
a related data set called the source domain [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In this paper, we adopt the
paradigm of transfer learning for anomaly detection. In our setting, we assume
that labeled examples are only available in the source domains, and that there
are no labeled examples in the target domain. In the earlier machine example, we can utilize the label
information available for machine A to help construct an anomaly detection
algorithm for machine B, for which no labeled points are available.
      </p>
      <p>
        In this paper we study transfer learning in the context of time-series anomaly
detection, which has received less attention in transfer learning [
        <xref ref-type="bibr" rid="ref1 ref10 ref6">1, 6, 10</xref>
        ]. Our
approach attempts to transfer instances from the source domain to the target
domain. It is based on two important and common insights about anomalous
data points, namely that they are infrequent and unexpected. We leverage these
insights to propose two different ways to identify which source instances should
be transferred to the target domain. Finally, we make predictions in the target
domain by using a 1-nearest-neighbor classifier where the transferred instances are
the only labeled data points in the target domain. We experimentally evaluate
our approach on data sets derived from a real-world data set and find
that it outperforms an unsupervised approach.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Problem statement</title>
      <p>We can formally define the task we address in this paper as follows:
Given: One or multiple source domains DS with source domain data {XS , YS },
and a target domain DT with target domain data {XT , YT }, where the
instances x ∈ X are time series and the labels y ∈ Y take values in {anomaly, normal}.
Additionally, only partial label information is available in the source
domains, and no label information in the target domain.</p>
      <p>Do: Learn a model for anomaly detection fT (·) in the target domain DT using
the knowledge in DS , where DS ≠ DT .</p>
      <p>Both the source and target domain instances are time series. Thus each instance
x = {(t1, v1), . . . , (tn, vn)}, where ti is a time stamp and vi is a single
measurement of the variable of interest v at time ti. The problem has the following
characteristics:
– The joint distributions of source and target domain data, denoted by pS (X, Y )
and pT (X, Y ), are not necessarily equal.
– No labels are known for the target domain, thus YT = ∅. In the source
domain, (partial) label information is available.
– The same variable v is monitored in the source and target domain, under
possibly different conditions (e.g., the same machine in different factories).
– The number of samples in DS and DT are denoted respectively by nS =
|XS | and nT = |XT |, and no restrictions are imposed on them.
– Each time series in DS or DT has the same length d.
– The source and target domain instances are randomly sampled from the true
underlying distribution.
</p>
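      <p>To make the task setup concrete, the following sketch (all variable names are ours, not the paper's) encodes the given data: each instance keeps only its measurements v1, ..., vd, since every series has the same length d and a shared time grid:</p>
      <p>
```python
# Hypothetical toy domains; labels exist only in the source domain.
d = 4  # every time series has the same length d

X_S = [[0.2, 0.4, 0.1, 0.3],    # source instances (values v1..vd)
       [0.9, 0.8, 0.9, 0.7]]
Y_S = ["normal", "anomaly"]     # (partial) source labels

X_T = [[0.3, 0.2, 0.2, 0.4]]    # target instances
Y_T = []                        # Y_T = ∅: no target labels

assert all(len(x) == d for x in X_S + X_T)
```
      </p>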
    </sec>
    <sec id="sec-3">
      <title>Context and related work</title>
      <p>
        Several flavors of transfer learning distinguish themselves in the way knowledge is
transferred between source and target domain. In this paper we employ
instance-based transfer learning. The idea is to transfer specific (labeled) instances from
the source domain to the target domain in order to improve learning a
target predictive function fT (·) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the case of anomaly detection, the target
function is a classifier that aims to distinguish normal instances from
anomalous instances. However, care needs to be taken when selecting which instances
to transfer, because transferring all instances could result in degraded
performance in the target domain (i.e., negative transfer) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A popular solution is
to define a weight for each transferred instance based on the similarity of the
source and target domain. The latter is characterized either by the similarity of
the marginal probability distributions pS (X) and pT (X), and/or the similarity
of conditional probability distributions pS (Y |X) and pT (Y |X). Various ways of
calculating these weights have been proposed [
        <xref ref-type="bibr" rid="ref10 ref3 ref6">3, 6, 10</xref>
        ]. However, the problem
outlined in this paper states that YT = ∅, which is a realistic assumption given
that in practice labeling is expensive. Hence, we cannot easily calculate pT (Y |X).
Furthermore, even if the marginal distributions are different, it can still be
beneficial to transfer specific instances. Consider the following: since the target task
is anomaly detection, one wants a classifier that robustly characterizes normal
behavior. By adding a diverse set of anomalies to the classifier's training data,
the learned decision surfaces will be more restricted, reducing
type II errors (missed anomalies) when detecting anomalies in new, unseen data.
      </p>
      <p>
        The subject of instance-based transfer learning for time series has received
little attention in the literature. Spiegel recently proposed a mechanism for learning
a target classifier using a set of unlabeled time series in various source domains,
without assuming that source and target domain follow the same generative
distribution or even have the same class labels [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, that approach requires a
limited set of labels in the target domain, whereas we have YT = ∅.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>
        In order to learn the model for anomaly detection fT (·) in the target domain, we
transfer labeled instances from different source domains. To avoid situations of
negative transfer (e.g., transferring an instance with the label anomaly that maps
to a normal instance in the target domain), a decision function decides whether
to transfer an instance or not. First, we outline the intuitions behind the decision
function based on two commonly known characteristics of anomalous instances
(Sec. 4.1). Then, we propose two distinct decision functions (Sec. 4.2 and 4.3).
Finally, we describe a method for supervised anomaly detection in the target
domain based on the transferred instances (Sec. 4.4).</p>
      <p>4.1 Instance-based transfer learning for anomaly detection</p>
      <p>The literature frequently makes two important observations about anomalous
data:
Observation 1 Anomalies occur infrequently [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Observation 2 If a model of normal behavior is learned, then anomalies
constitute all unexpected behavior that falls outside the boundaries of normal behavior.
This implies that it is impossible to predefine every type of anomaly.</p>
      <p>From the first observation we derive the following property:</p>
      <p>Property 1 Given a labeled instance (xS, yS) ∈ DS and yS = normal. If the
probability of the instance under the true target domain distribution pT (xS) is
high (i.e., the instance is likely to be sampled from the target domain), then the
probability that the true label of the instance in the target domain is normal,
pT (yS = normal|xS), is also high.</p>
      <p>The second observation allows us to derive the reverse property:</p>
      <p>Property 2 Given a labeled instance (xS, yS) ∈ DS and yS = anomaly. If the
probability of the instance under the true target domain distribution pT (xS) is
low, then the probability that the true label of the instance in the target domain
is anomaly, pT (yS = anomaly|xS), is high.</p>
      <p>Notice that in the latter property the time series xS can have any form, while this
is not true for the first property, where the form is restricted by the distribution
of the target domain data. Given a labeled instance (xS, yS) ∈ DS that we want
to transfer to the target domain, Property 1 and Property 2 allow us to make
a decision whether to transfer or not. We can formally define a weight associated
with xS which will be high when the transfer makes sense, and low when it will
likely cause negative transfer.</p>
      <p>wS = pT (xS) if yS = normal; wS = 1 − pT (xS) if yS = anomaly. (1)</p>
      <p>
        However, since each time series xS can be considered as a vector of length d in
Rd (i.e., it consists of a series of numeric values for continuous variable v), the
probability of observing exactly xS under the target domain distribution must
be 0. Instead, we calculate the probability of observing a small interval around
xS, such that:
pT (xS) = lim_{ΔI→0} ∫_{ΔI} fT (x) dx (2)
where ΔI is an infinitesimally small region around xS in the target domain. This
probability is equal to the true density function over the target domain fT (xS).
Given that the true target domain density is unknown, we need to estimate it
from the data XT . It is shown that this estimate fˆT (xS) can be calculated as
follows [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
fˆT (xS) = (1 / (nT (hnT )^d)) Σ_{i=1}^{nT} K((xS − xi) / hnT ) (3)
where K(x) is the window function or kernel in the d-dimensional space and
∫_{Rd} K(x) dx = 1. The parameter hnT &gt; 0 is the bandwidth corresponding to
the width of the kernel, and depends on the number of observations nT . The
estimate fˆT (xS) converges to the true density fT (xS) when there is an infinite
number of observations, nT → ∞, under the assumption that the data XT are
randomly sampled from the true underlying distribution.
      </p>
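      <p>The Parzen-window estimate in Eq. 3 can be sketched as follows. This is our own minimal illustration with a Gaussian kernel and a fixed bandwidth h (the paper ties the bandwidth to nT); the function name is ours:</p>
      <p>
```python
import math

def parzen_density(x_s, X_T, h):
    """Estimate f_T(x_s) via Eq. 3 with a Gaussian kernel.

    x_s: the query instance (list of d floats).
    X_T: target-domain instances (list of lists of d floats).
    h:   bandwidth (assumed fixed here for simplicity).
    """
    d = len(x_s)
    n_t = len(X_T)
    # normalizing constant of the d-dimensional Gaussian kernel
    norm = n_t * (h * math.sqrt(2 * math.pi)) ** d
    total = 0.0
    for x_i in X_T:
        # squared Euclidean distance, scaled by the bandwidth
        sq = sum((a - b) ** 2 for a, b in zip(x_s, x_i)) / h ** 2
        total += math.exp(-0.5 * sq)
    return total / norm
```
      </p>
      <p>In Sec. 4.2 this estimate is applied to subsequences rather than to whole series.</p>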
      <p>4.2 Density-based transfer decision function</p>
      <p>
To guarantee convergence of fˆT (xS) to the true density function, the
sample size must increase exponentially with the length d of the time series data.
The reasoning is clear: high-dimensional spaces are sparsely populated by the
available data, making accurate estimates hard to produce, and gathering
exponentially more data is often infeasible in practice. For longer time series,
d is automatically high if we treat each time series as a vector in Rd. As
a practical solution, we propose to reduce the length d of the time series xS by
dividing it into l equal-length subsequences, each with length m &lt; d. For every
subsequence s in xS, the density is estimated using Eq. 3 with a Gaussian kernel:
fˆT,m(s) = (1 / (nT (hnT √2π)^m)) Σ_{i=1}^{nT} exp(−(1/2) ((s − si) / hnT )²) (4)</p>
      <p>
where hnT is the standard deviation of the Gaussian, and si are the subsequences
of the instances in XT . The Gaussian kernel ensures that instead of simply
counting similar subsequences, the count is weighted for each subsequence si
based on the kernelized distance to s.</p>
      <p>
        Estimating the densities for the subsequences yields more accurate estimates
given the reduced dimensionality, but simultaneously results in l = d/m
estimates for each time series xS. Hence, we have to adjust Eq. 1 to reflect this new
situation. We only show the case in which the label yS = normal as the reverse
case is straightforward:
wS = (1 / (Zmax − Zmin)) (Σ_{i=1}^{l} fˆT,m(si) − Zmin) (5)
Zmax = max_{xT ∈ {XT ∪ xS}} Σ_{sj ∈ xT} fˆT,m(sj) (6)
The sum of the density estimates in the subsequences is normalized using
min-max normalization, such that wS ∈ [0, 1]. Zmin is calculated similarly as Zmax
in Eq. 6, but taking the minimum instead of maximum. By setting a threshold
on the final weights, we make a decision on whether to transfer or not.
      </p>
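      <p>The density-based decision can be sketched end to end by combining Eqs. 1, 4, and 5. The function and variable names below are ours; equal-length series, a fixed bandwidth h, and string labels are simplifying assumptions:</p>
      <p>
```python
import math

def transfer_weight(x_s, y_s, X_T, m, h):
    """Sketch of the density-based transfer weight (Eqs. 1, 4, 5)."""
    def subseqs(x):  # split a series into length-m subsequences
        return [tuple(x[i:i + m]) for i in range(0, len(x), m)]

    target_subs = [s for x in X_T for s in subseqs(x)]
    n_t = len(X_T)

    def density(s):  # Gaussian Parzen estimate over target subsequences (Eq. 4)
        norm = n_t * (h * math.sqrt(2 * math.pi)) ** m
        return sum(
            math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(s, t)) / h ** 2)
            for t in target_subs
        ) / norm

    def score(x):  # summed subsequence densities for one series
        return sum(density(s) for s in subseqs(x))

    scores = [score(x) for x in X_T] + [score(x_s)]  # over {XT ∪ xS}
    z_min, z_max = min(scores), max(scores)
    w = (score(x_s) - z_min) / (z_max - z_min)       # min-max normalize (Eq. 5)
    return w if y_s == "normal" else 1.0 - w         # flip for anomalies (Eq. 1)
```
      </p>
      <p>A source instance is then transferred when its weight exceeds the threshold (0.5 in the experiments of Sec. 5).</p>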
      <p>4.3 Cluster-based transfer decision function</p>
      <p>
Our second proposed decision function is also based on the intuitions outlined in
Sec. 4.1. First, the target domain data XT are clustered using k-means clustering.
Second, the resulting set of clusters C over XT is divided into a set of large
clusters, and a set of small clusters according to the following definition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]:
Definition 1. Given a dataset XT with nT instances, a set of ordered clusters
C = {C1, ..., Ck} such that |C1| ≥ |C2| ≥ ... ≥ |Ck|, and two numeric parameters
α and β, the boundary b between large and small clusters is defined such that
either of the following conditions holds:
      </p>
      <p>Σ_{i=1}^{b} |Ci| ≥ nT × α (7)</p>
      <p>|Cb| / |Cb+1| ≥ β (8)</p>
      <p>LC = {Ci | i ≤ b} and SC = {Ci | i &gt; b} are respectively the set of large and small
clusters, and LC ∪ SC = C.</p>
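      <p>Definition 1, together with the transfer rules described next in this section, can be sketched as follows. The names are ours; clusters are assumed to be given as (centroid, members) pairs sorted by decreasing size, e.g. obtained from k-means:</p>
      <p>
```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split_large_small(clusters, alpha, beta):
    """Boundary b of Definition 1 (Eqs. 7-8); returns (LC, SC)."""
    n = sum(len(members) for _, members in clusters)
    covered = 0
    for b, (_, members) in enumerate(clusters, start=1):
        covered += len(members)
        ratio_ok = (len(clusters) > b
                    and len(members) >= beta * len(clusters[b][1]))
        if covered >= alpha * n or ratio_ok:
            return clusters[:b], clusters[b:]
    return clusters, []

def should_transfer(x_s, y_s, clusters, alpha, beta):
    """Transfer decision for a labeled source instance (x_s, y_s)."""
    large, _small = split_large_small(clusters, alpha, beta)
    # assign x_s to the cluster with the nearest centroid
    centroid, members = min(clusters, key=lambda c: euclid(x_s, c[0]))
    in_large = any(centroid is c for c, _ in large)  # identity: same objects
    radius = max(euclid(p, centroid) for p in members)
    if y_s == "normal":
        # normal: nearest cluster is large and x_s lies within its radius
        return in_large and radius >= euclid(x_s, centroid)
    # anomaly: nearest cluster is small, or large with x_s outside the radius
    return (not in_large) or euclid(x_s, centroid) > radius
```
      </p>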
      <p>Furthermore, we define the radius of a cluster as ri = max_{xj ∈ Ci} ‖xj − ci‖2.
Lastly, a decision is made whether or not to transfer a labeled instance xS
from the source domain. Intuitively, and in line with Observation 1 and 2,
anomalies in XT should fall in small clusters, while large clusters contain the
normal instances. Transferred labeled instances from the source domain should
adhere to the same intuitions. Each transferred instance is assigned to a cluster
Ci ∈ C such that ‖xS − ci‖2 is minimized. An instance is only transferred in two
cases. First, if the instance has label normal and is assigned to a cluster Ci such
that Ci ∈ LC and the distance of the instance to the cluster center is less or
equal to the radius of the cluster. Second, if the instance has label anomaly and
fulfills either of two conditions: the instance is assigned to a cluster Ci such that
Ci ∈/ LC, or it is assigned to a cluster Ci such that Ci ∈ LC and the distance of
the instance to the cluster center is larger than the radius of the cluster. In all
other cases there is no transfer.</p>
      <p>4.4 Supervised anomaly detection in a set of time series</p>
      <p>
After transferring instances from one or multiple source domains to the target
domain using the decision functions in Sec. 4.2 and 4.3, we can construct a
classifier in the target domain to detect anomalies. Ignoring the unlabeled target
domain data, we only use the set of labeled data L = {(xi, yi)}_{i=1}^{nA}, nA being the
number of instances transferred. It has been shown that a one-nearest-neighbor
(1NN) classifier with dynamic time warping (DTW) or Euclidean distance is a
strong candidate for time series classification [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. To that end, we construct a
1NN-DTW classifier on top of L to predict the labels of unseen instances.
      </p>
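      <p>The 1NN-DTW prediction step can be sketched as follows (a textbook dynamic-programming DTW over univariate series; the function names are ours, not the authors' implementation):</p>
      <p>
```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two univariate series,
    via the standard O(n*m) dynamic program."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # insertion
                                    cost[i][j - 1],      # deletion
                                    cost[i - 1][j - 1])  # match
    return cost[n][m]

def predict_1nn_dtw(x, labeled):
    """Label x by its nearest neighbor in labeled = [(series, label), ...]."""
    _, label = min(labeled, key=lambda pair: dtw_distance(x, pair[0]))
    return label
```
      </p>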
    </sec>
    <sec id="sec-5">
      <title>Experimental evaluation</title>
      <p>In this section we aim to answer the following research question:
– Do the proposed decision functions for instance-based time series transfer
succeed in transferring useful knowledge between source and target domain?
First, we introduce the unsupervised baseline method to which we will compare
the 1NN-DTW method with instance transfer (Sec. 5.1). Then, we discuss the
data, the experimental setup, and the results (Sec. 5.2).
</p>
      <p>5.1 Unsupervised anomaly detection in a set of time series</p>
      <p>
Without instance transfer, the target domain consists of a set of unlabeled time
series data U = {xi}_{i=1}^{nT}. Based on the anomaly detection approach outlined in
Kha and Anh, we introduce a straightforward unsupervised algorithm for anomaly
detection that will serve as a baseline [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The algorithm calculates the cluster
based local outlier factor (CBLOF) for each series in U .
      </p>
      <p>Definition 2. Given a set of large LC and small clusters SC defined over U
(as per definition 1), the CBLOF of an instance xi ∈ U , belonging to cluster Ci,
is calculated as:</p>
      <p>CBLOF(xi) = |Ci| × D(xi, ci) if Ci ∈ LC; |Ci| × min_{cj ∈ LC} D(xi, cj) if Ci ∈ SC. (9)</p>
      <p>Then, anomalies are characterized by a high CBLOF.</p>
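      <p>Eq. 9 can be sketched as follows, assuming Euclidean distance for D and taking the large/small split of Definition 1 as given (the function and parameter names are ours):</p>
      <p>
```python
import math

def cblof(x, own_cluster, large_clusters):
    """CBLOF of instance x (Eq. 9).

    own_cluster and each entry of large_clusters are (centroid, members)
    pairs; the large/small split follows Definition 1, computed beforehand.
    """
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    centroid, members = own_cluster
    # identity comparison: clusters are the same pair objects
    if any(centroid is c for c, _ in large_clusters):
        return len(members) * dist(x, centroid)                       # Ci ∈ LC
    return len(members) * min(dist(x, c) for c, _ in large_clusters)  # Ci ∈ SC
```
      </p>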
      <p>5.2 Experiments</p>
      <p>
Data. Due to the lack of readily available benchmarks for the problem outlined
in Sec. 2, we experimentally evaluate on a real-world data set obtained from
a large company. The provided data detail resource usage continuously tracked
over a period of approximately two years. Since the usage is highly dependent
on the time of day, we can generate 24 (hourly) data sets by grouping the usage
data by hour. Each data set contains about 850 different time series. For a
limited number of these series in each set we possess expert labels indicating
either normal or anomaly.</p>
      <p>Experimental setup. In turn, we treat each of the 24 data sets as the target
domain and the remaining data sets as source domains. We consider transferring
from a single source or multiple sources. Any labeled examples in the target
domain are set aside and serve as the test set. First, the proposed decision
functions are used to transfer instances from either a single source domain or
multiple source domains combined to the target domain. Then, we train both
the unsupervised CBLOF (Sec. 5.1), and supervised 1NN-DTW anomaly
detection model that uses the labeled instances transferred to the target domain
(Sec. 4.4). Finally, both models predict the labels of the test set, and we report
classification accuracy. For the density-based approach, we set the threshold on
the final weights to 0.5. For the cluster-based approach, we selected α = 0.95,
β = 4, and 10 clusters.</p>
      <p>[Fig. 1: classification accuracy (y-axis) on each hourly data set used as target (x-axis, 00:00:00 to 22:00:00), comparing the cluster-based and density-based transfer approaches with the CBLOF baseline.]</p>
      <p>Evaluation. A limited excerpt of the experimental results is reported in Table
1. Figure 1 plots the full experimental results in a condensed manner. From the
results we derive the following observations. First, instance transfer with
1NN-DTW outperforms the unsupervised CBLOF algorithm in 21 of the 24 data sets.
Clearly, this indicates that the instances that are transferred by both decision
functions, are useful in detecting anomalies. Second, the transfer works both
between similar and dissimilar domains. To see this, one must know that in our
real-world data set resource usage during the night is very different from usage
during the day. As a result, the data sets at 00:00 and 01:00 are fairly similar
for example, while data sets at 21:00 and 15:00 are highly different. From Table
1 it is clear that this distinction has little impact on the performance of the
1NN-DTW model. Third, the cluster-based decision function performs at least
as well as the density-based variant. This is apparent from Figure 1.
</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper we introduced two decision functions to guide instance-based
transfer learning when the instances are time series and the task at hand is anomaly
detection. Both functions are based on two commonly known insights about
anomalies: they are infrequent and unexpected. We experimentally evaluated
the proposed decision functions in combination with a 1NN-DTW classifier by
comparing it to an unsupervised anomaly detection algorithm on a real-world
data set. The experiments showed that the transfer-based approach outperforms
the unsupervised approach in 21 of the 24 data sets. Additionally, both decision
functions lead to similar results.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanay</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morton</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Transfer representation-learning for anomaly detection</article-title>
          .
          <source>ICML</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chandola</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Anomaly detection: A survey</article-title>
          .
          <source>ACM computing surveys (CSUR) 41(3)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>72</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chattopadhyay</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
          </string-name>
          , J.:
          <article-title>Multisource domain adaptation and its application to early detection of fatigue</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data (TKDD) 6(4)</source>
          ,
          <volume>18</volume>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fukunaga</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Introduction to statistical pattern recognition</article-title>
          . Academic press (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kha</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anh</surname>
          </string-name>
          , D.T.:
          <article-title>From cluster-based outlier detection to time series discord discovery</article-title>
          .
          <source>In: Revised Selected Papers of the PAKDD</source>
          <year>2015</year>
          <article-title>Workshops on Trends and Applications in Knowledge Discovery and Data Mining</article-title>
          -Volume
          <volume>9441</volume>
          . pp.
          <fpage>16</fpage>
          -
          <lpage>28</lpage>
          . Springer-Verlag New York, Inc. (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>A survey on transfer learning</article-title>
          .
          <source>IEEE Transactions on knowledge and data engineering</source>
          <volume>22</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Spiegel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Transfer learning for time series classification in dissimilarity spaces</article-title>
          .
          <source>In: Proceedings of AALTD 2016: Second ECML/PKDD International Workshop on Advanced Analytics and Learning on Temporal Data</source>
          . p.
          <volume>78</volume>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Torrey</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shavlik</surname>
          </string-name>
          , J.:
          <article-title>Transfer learning</article-title>
          .
          <source>Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques 1</source>
          ,
          <issue>242</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keogh</surname>
          </string-name>
          , E.:
          <article-title>Semi-supervised time series classification</article-title>
          .
          <source>In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>748</fpage>
          -
          <lpage>753</lpage>
          . ACM (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khoshgoftaar</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A survey of transfer learning</article-title>
          .
          <source>Journal of Big Data</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <volume>9</volume>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>