<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimal Robust Simplifications for Explaining Time Series Classifications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Arne Telle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cèsar Ferri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brigt Håvardstun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, University of Bergen</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>VRAIN, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Given a Time Series Classifier (TSC) and three parameters that balance between error, simplicity and robustness, we define an optimization problem over all possible ways of simplifying a given time series t into straight-line segments. Robustness is defined as the fraction of perturbations that have the same classification as t under the TSC, and we introduce a novel method for generating a set of perturbations where this information is easy to visualize. We prove that under some mild conditions on the TSC and the three parameters, we can find the optimal solution in time polynomial in the length of t, by first doing dynamic programming to solve for error and simplicity, and then adding robustness. We test the resulting Optimal Robust Simplification (ORS)-Algorithm on binary TSCs for three datasets from UCR. We apply the ORS-Algorithm to prototypes of the classes, with varying parameters, to evaluate its power as an explanatory tool for the trained classifiers. We also provide a tool for visualizing the robustness information. We believe the resulting insights show the usefulness of the Optimal Robust Simplifications in explaining TSCs.</p>
      </abstract>
      <kwd-group>
        <kwd>Time Series</kwd>
        <kwd>XAI</kwd>
        <kwd>Simplification</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Temporal data is encountered in many real-world applications ranging from patient data in healthcare
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to the field of cyber security [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Deep learning methods have been successful [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">3, 1, 2</xref>
         ] for TSC
(Time Series Classification), but such methods are not easily interpretable and are often viewed as black
boxes, which limits their application when user trust in the decision process is crucial. To enable the
analysis of these black-box models we resort to post-hoc interpretability. Recent research has focused on
adapting existing methods to time series, both specific methods like SHAP-LIME [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Saliency Methods
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Counterfactuals [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and also combinations of these [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Since humans learn and reason by forming mental representations of concepts based on examples,
and any machine learning model has been trained on data, we believe that data, e.g. in the form
of prototypes and counterfactuals, is indeed the natural common language between the user and this
model. However, the basic problem is that, compared to images and text, time series data are not
intuitively understandable to humans [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This makes interpretability of time series extra demanding,
both when it comes to understanding how users will react to the provided explanations and to predicting
which explanatory tools are best. For example, in [
        <xref ref-type="bibr" rid="ref9">9</xref>
         ] a tool was given for explainability of a TSC that
allowed model inspection, so the user could form their own mental model of the classifier. However, a
user evaluation showed that non-expert end-users were not able to make use of this freedom, supporting
the notion that time-series are particularly non-intuitive for humans.
      </p>
      <p>
        Theissler et al [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] give an intriguing taxonomy of XAI for TSC, divided into those focused on (i)
Time-points (e.g. SHAP and LIME) or (ii) Subsequences (e.g. Shapelets) or (iii) Instances (Prototypes,
CF, Feature-based). Explaining a classifier by instances like prototypes has a definite appeal, but it
is a challenge to highlight the features important for the classification while at the same time
simplifying the instance to mitigate the non-intuitive aspects of this domain, as we attempt in this
work. Theissler et al [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] review the literature on XAI for TSC and of the 9 papers they discuss on
Instance-based explanations using Prototypes or Features, it is remarkable that not a single one is
specialized for Local explanations, rather they are all Global. In this work we fill this gap, and introduce
a way of simplifying a time series by straight-line segments, and doing this in a way that is robust with
respect to the classification of a given TSC ℎ.¹ Our ORS-Algorithm, for Optimal Robust Simplifications, is
model-agnostic, with the robustness of a simplification being the fraction of local perturbations that retain
the classification under the TSC ℎ. There are various ways of perturbing time series, see e.g. [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
         ]. In
this work we develop a different method of computing a set of perturbations, both because we start from a
simplification into straight-line segments and because we want the aggregate information about
their classifications to be visualized in an informative way. When computing robustness we start with a
straight-line simplification, and any perturbation must to a certain degree keep that position and shape,
while it is also allowed to deviate as the perturbation moves inside a band around the simplification, see
Figure 8.
      </p>
      <p>The ORS-Algorithm computing a robust simplification s of a given time series t, under a TSC ℎ,
has three parameters α, β, γ that allow balancing between error (Squared Euclidean distance between t and
s), simplicity (number of segments of s), and robustness (what fraction of perturbations of s have the
same classification as t by ℎ), and s is defined as the minimization of an objective function taking
all this into account, see Definition 1. Perhaps surprisingly, in Theorem 4 we show that under some mild
conditions on ℎ and the three parameters, we can find the optimal solution in time polynomial in the
length of the original time series t, by first solving for error and simplicity, and then adding robustness.
Note that our algorithm allows not only the balance between error, simplicity and robustness to change,
but also their computation, i.e. the algorithm is agnostic to the particular methods used to calculate
these values. Hence Squared Euclidean distance can be replaced by some other distance function as long
as it is amenable to the dynamic programming, simplicity can depend on other aspects than the number of
segments, and robustness can be computed by other types of perturbations.</p>
      <p>In the rest of this paper we first discuss related work and cover some standard definitions before
presenting the ORS-Algorithm together with proof of correctness and runtime. We then discuss the
usefulness of the robust simplifications, with a focus in this paper on time series that are discrete,
univariate, evenly sampled, aperiodic, short, and non-stationary, based on UCR datasets Chinatown,
Italy Power Demand and ECG200. We also provide an aid to visualize the robustness information. Our
findings suggest that the robust simplifications of a prototype t can be used to extract explanations of
the classification done by the TSC, as they simplify t to the point where human intuition can more
easily be applied. We conclude the paper by discussing some directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Several techniques have been applied to generate explanations from TSC models [
        <xref ref-type="bibr" rid="ref10 ref14">10, 14</xref>
         ]. Specifically,
techniques previously developed for Convolutional Neural Networks or Recurrent Neural Networks have
been applied when these architectures are used for TSC. For instance, the authors in [
         <xref ref-type="bibr" rid="ref7">7</xref>
         ] apply
to time series several XAI methods previously used in the image and text domains. They also introduce
verification techniques specific to time series, in particular a perturbation analysis and a sequence
evaluation. Related to the datasets used in our paper, [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] presents a user study of XAI methods to
explain the ECG200 dataset from UCR.
      </p>
      <p>
        Another alternative is to produce explanations through examples, and these can be specifically
utilized in the time series domain. One type of this explanation method involves giving the nearest
example from the training dataset that acts as a prototype to depict the normal behavior of a similar
sample [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. A method to generate prototypes for time series data using an autoencoder is presented in
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The main novelty of that work is its method for generating diverse prototypes.
      </p>
      <p>
        Given that in many cases raw time series can be too complex for humans, several studies have tried
to employ simplified versions as explanations. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the authors propose a dimensionality reduction
technique for time series, called Piecewise Aggregate Approximation, as a tool for similarity search
in large time series databases. (¹Note that if one pieces together several such local explanations, say for prototypes of each class, then this
can still form a global explanation of the given TSC ℎ.) The work [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] presents a review of piecewise linear approximations
employed for time series data, and the authors also propose a method named SWAB (based on the
Sliding Window and Bottom-up approaches). Finally, in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] Camponogara and Nazari introduce a range
of piecewise-linear models and algorithms for unknown functions. Another option is to segment time
series into fixed-width windows and employ these to justify decisions. In [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] Schlegel et al propose
TS-MULE, a model explanation method working to segment and perturb time series data and extending
LIME. A study on the effect of segmentation on time series, in particular in the field of finance for the
use of pattern matching, was presented in [22]. In [23], Si and Yin also apply segmentation to financial
time series data, now as a preprocessing step to locate technical patterns.
      </p>
      <p>
        Perturbations have also been previously employed for XAI and time series. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the authors
propose a framework to generate explanations for time series classifications based on a generative
model that creates within-distribution perturbations. Enguehard, in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], introduces a technique to explain
multivariate time series predictions using a perturbation-based saliency method. Recently, the approach
ContraLSP has been detailed in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This method is based on a locally sparse model that produces
counterfactual samples to build uninformative perturbations while keeping them in-distribution using
contrastive learning.
      </p>
      <p>Some tools have been proposed to allow users to interact with time series and, in that way, to get some
insights about their classification, like [24] where model inspection is the key aspect. In [25], the authors
develop an interactive XAI tool for loan applications that allows users to experiment with hypothetical
input values and inspect their effect on the outcomes of the model, and they perform a user evaluation on
MTurk. [26] presents a Python package providing a unified interface for the interpretation of time
series classification.</p>
      <p>Some papers have developed a similar approach to our proposal. In [27], Tang et al. present the
method Dual Prototypical Shapelet Networks (DPSN) for few-shot time-series classification. This
method trains a neural network from few examples and also interprets the model at two levels of
granularity: a global overview with representative time series examples, and local highlights with
discriminative shapelets. Another related paper, for sequence learning, is [28]. That work proposes
ProSeNet, an RNN-based method for deep sequence modeling. The approach combines prototype learning
and RNNs to construct models with high accuracy and interpretability. The method considers three
criteria in constructing prototypes: Simplicity, the prototypes can be subsequences with only the key
events determining the output; Diversity, redundant prototypes should be avoided; Sparsity, for each
example to explain only a few prototypes are shown to avoid long explanations. The authors show the
validity of ProSeNet for time series using the MIT-BIH Arrhythmia ECG dataset.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Standard definitions</title>
      <p>Let us present formal definitions for Time Series Classification (TSC) and recall basic notions.</p>
      <p>
        Staying consistent with earlier notation [
        <xref ref-type="bibr" rid="ref10 ref6">10, 6</xref>
        ] a time series t = {t1, t2, . . . , tn} is an ordered set of
n real-valued observations (or time steps). Note we may also view each ti as a pair consisting of an
x-value (the time) and a y-value (the observation). A time series dataset D = {t(1), t(2), ..., t(m)} ∈ ℝ^(m×n)
is a collection of such time series where each time series t(i) has a class label c(i), forming a vector of class
labels. In this paper we consider only binary classification tasks. Given such a dataset, Time Series
Classification is the task of training a mapping ℎ from the space of possible inputs to a probability
distribution over the class values. Thus, a black-box classifier ℎ(t) takes a time series t as input and
predicts a probability output over the class values.
      </p>
      <p>Prototypes are time series exemplifying the main aspects responsible for a classifier’s specific decision
outcome. A prototype can be a real instance (which is what we opt for), sampled from the dataset, that is important
and meaningful because it summarizes the shape of many other similar instances, or a synthetic one,
for example a cluster centroid or an instance generated by following some ad-hoc process.</p>
    </sec>
    <sec id="sec-4">
      <title>4. An Algorithm for Optimal Robust Simplifications</title>
      <p>In this section we give the ORS-Algorithm that takes a binary Time Series Classifier ℎ and a univariate
time series t, on n points p1, ..., pn given by increasing x-values, and computes the optimal robust
simplification ORS(t, ℎ, α, β, γ), where α, β, γ are factors freely chosen to balance between error,
simplicity and robustness. This simplification will select k + 1 of the n points, p_{i_1}, p_{i_2}, ..., p_{i_{k+1}} with
i_j &lt; i_{j+1} for all j, to form a simplified time series s on k segments by drawing a line between each
consecutive pair of the k + 1 points and by letting the first segment and the last segment extend to the
x-values of p1 and pn, respectively. See Figure 1.</p>
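      <p>To make this construction concrete, the following minimal Python sketch (our own code and naming, not taken from the authors' implementation) evaluates a candidate simplification on the x-grid of the original series, drawing straight lines between consecutive chosen points and extending the first and last segments as described above:</p>

```python
def build_simplification(xs, ys, chosen):
    # Given increasing indices `chosen` of the selected points, return the
    # y-values of the simplification at every x-value of the original series:
    # straight lines between consecutive chosen points, with the first and
    # last segments extended to the x-values of the first and last points.
    cx = [xs[i] for i in chosen]
    cy = [ys[i] for i in chosen]

    def line(a, b, x):
        # y-value at x of the line through chosen points a and b
        slope = (cy[b] - cy[a]) / (cx[b] - cx[a])
        return cy[a] + slope * (x - cx[a])

    out = []
    for x in xs:
        if cx[0] >= x:                    # extend the first segment leftwards
            out.append(line(0, 1, x))
        elif x >= cx[-1]:                 # extend the last segment rightwards
            out.append(line(len(cx) - 2, len(cx) - 1, x))
        else:                             # interior: find the covering segment
            j = 1
            while x > cx[j]:
                j += 1
            out.append(line(j - 1, j, x))
    return out
```

      <p>For example, choosing the two middle points of a collinear series reproduces the series exactly once the segment is extended to both ends.</p>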
      <p>Definition 1. We define ORS(t, ℎ, α, β, γ) formally as the minimum, over all 2^n − n − 1 possible
choices of 2 ≤ c ≤ n points that give a simplification s with ℎ(s) = ℎ(t), of the value
α · E(t, s) + β · k + γ · F(t, s, ℎ)
comparing the original time series t and the simplification s on k = c − 1 segments defined by
these c points.</p>
      <p>
        The optimal simplification thus reflects a selected balance of error, simplicity and robustness, achieved
by freely choosing the 3 factors α, β, γ:
• α for error: E(t, s) is the Squared Euclidean distance between t and s, i.e. the sum, over all
x-values, of the square of the difference between the corresponding y-values of t and s.
• β for simplicity: k is the number of segments.
• γ for robustness: F(t, s, ℎ) is a measure of how robust the classifier ℎ is on perturbations
of s, measured by the fraction of perturbations that do not fall in the same class as t, thus
in the range [0, 1] with 0 being most robust (F is short for fragility and can be viewed as
(1 − robustness)). See subsection 4.2.2 for the definition of the perturbations used.
      </p>
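      <p>As a minimal illustration (with our own helper names, writing the three factors as alpha, beta, gamma and the error and fragility terms as E and F), the objective can be computed directly once t and s are given as y-values on a common x-grid and the fragility has been estimated:</p>

```python
def squared_euclidean(t_ys, s_ys):
    # E(t, s): the sum, over all x-values, of the squared difference
    # between the corresponding y-values of t and s.
    return sum((a - b) ** 2 for a, b in zip(t_ys, s_ys))

def ors_objective(t_ys, s_ys, k, fragility, alpha, beta, gamma):
    # The value minimized in Definition 1: alpha*E(t,s) + beta*k + gamma*F,
    # where k is the number of segments and fragility is F(t, s, h) in [0, 1].
    return alpha * squared_euclidean(t_ys, s_ys) + beta * k + gamma * fragility
```
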
      <p>Perhaps surprisingly, under some mild assumptions this complicated objective function still allows
an algorithm that finds the optimum in time polynomial in n. In this section we describe this algorithm
in detail.
We first solve the case of γ = 0, i.e. without giving any weight to robustness. Firstly, Least Squares is a
foundational problem in statistics and numerical analysis: given points p1, p2, ..., pn in the plane, find
the straight line that minimizes the squared error. In the more general Segmented Least Squares (SLS)
problem we ask for a sequence of straight-line segments that minimizes cost, which is some balance of
error and simplicity (number of segments). The SLS problem is famously solvable in polynomial time
by a textbook Dynamic Programming Algorithm based on the following recurrence for OPT(j), the
minimum cost for points p1, ..., pj:
OPT(j) =
{ 0,
if j = 0
else
min_{1 ≤ i ≤ j} {e(i, j) + C + OPT(i − 1)}
where e(i, j) is the least square error over all single lines that cover p_i to p_j, and C is the cost
of one extra segment, with the value of C chosen to balance between error and simplicity. This problem
formulation differs from our setting in 3 important ways:
• We want a continuous set of segments, with each segment starting and ending in given points.
• We want the first (resp. last) segment to be defined by any two points p_i, p_j and the line drawn
between them continued until it hits the x-value of p1 (resp. of pn).</p>
      <p>• We want robustness to perturbations as part of the objective function.</p>
      <p>The first two differences are relatively easy to accommodate, as follows:</p>
      <p>OPT(j) =
⎧ 0, if j = 0
⎨ min_{1 ≤ i ≤ j} {[α · e(i, j) + C + OPT(i)], [α · e(1, i, j) + C]}, if j &lt; n
⎩ min_{1 ≤ i ≤ n} min_{i+1 ≤ j ≤ n} {[α · e(i, j, n) + C + OPT(i)], [α · e(1, i, j, n) + C]}, if j = n
where
• e(i, j) is the Squared Euclidean distance between the straight line from p_i to p_j and the part of
the original time series t given by points p_i, ..., p_j
• e(i, j, n) is the same except the straight line continues past p_j to the x-value of p_n and we
compare to the final part of t given by p_i, ..., p_n
• e(1, i, j) is similar, except the line continues in the other direction to the x-value of p_1
and we compare to the initial part of t, p_1, ..., p_j
• e(1, i, j, n) compares t to a single line covering all x-values and going through points p_i, p_j
For the case of ORS(t, ℎ, α, β, γ) where γ = 0, i.e. where robustness does not play a part, the
argument for correctness of the above recurrence is similar to the textbook argument for correctness of
the recurrence of SLS, so we do not repeat it here, except to note the new parts being that
• We use OPT(i) rather than OPT(i − 1) since now for two consecutive segments the endpoint
of the first is the startpoint of the next.
• When defining OPT(j) we use e(1, i, j) so the first segment can end in p_j.
• When defining OPT(n) we use e(i, j, n) so the last segment can choose any pair p_i, p_j, and
e(1, i, j, n) to allow a simplification with a single segment.</p>
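      <p>For concreteness, this recurrence for the no-robustness case can be evaluated directly as below (a plain Python sketch under our own naming, not the authors' code; it recomputes every error term on the fly instead of precomputing them, and returns only the optimal cost rather than the chosen points):</p>

```python
import math

def seg_err(xs, ys, i, j, lo, hi):
    # Squared Euclidean distance between the line through points i and j
    # (0-indexed) and the original series on points lo..hi inclusive.
    slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
    return sum((ys[m] - (ys[i] + slope * (xs[m] - xs[i]))) ** 2
               for m in range(lo, hi + 1))

def simplification_cost(xs, ys, alpha, C):
    # OPT[j] is the cheapest simplification of the prefix ending at point j:
    # the segment ending at j either extends a solution ending at i (cost
    # OPT[i]) or is the first segment, extended left to the first x-value.
    n = len(xs)
    OPT = [math.inf] * n
    OPT[0] = 0.0
    for j in range(1, n - 1):
        for i in range(j):
            interior = alpha * seg_err(xs, ys, i, j, i, j) + C + OPT[i]
            first = alpha * seg_err(xs, ys, i, j, 0, j) + C
            OPT[j] = min(OPT[j], interior, first)
    # Last segment: any pair i, j with the line extended right to the last
    # x-value, or a single segment covering all x-values.
    best = math.inf
    for i in range(n - 1):
        for j in range(i + 1, n):
            last = alpha * seg_err(xs, ys, i, j, i, n - 1) + C + OPT[i]
            single = alpha * seg_err(xs, ys, i, j, 0, n - 1) + C
            best = min(best, last, single)
    return best
```

      <p>On collinear input the single-segment branch wins, so the returned cost is exactly the per-segment cost C.</p>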
      <p>Transforming this recurrence into a dynamic programming algorithm is straightforward, by first
computing each of the O(n^2) error terms in time O(n) each, followed by a nested loop computing the
OPT(j) values in O(n) time each. Retrieving the optimal solution is done by a second pass over the
values stored, also standard. This gives the following result.</p>
      <p>Theorem 1. Given a time series t of length n, a binary TSC ℎ, and factors α, β, γ that balance between
error, simplicity and robustness. For the case of γ = 0 (no robustness) we can compute the optimal
simplification achieving the minimum ORS(t, ℎ, α, β, γ = 0) in time O(n^3) by dynamic programming.
4.2. Accommodating Robustness: DP with Heaps
Dynamic programming relies on the property that an optimal solution to a larger problem consists of
optimal solutions to subproblems. Robustness depends on the TS Classifier ℎ and we cannot expect it to
satisfy this property, since its classification on only a part of a time series will not in general correspond
to its classification on the full time series. Thus, we cannot incorporate robustness as part of the DP
scheme. Instead, our algorithm for Optimal Robust Simplifications (ORS), the ORS-Algorithm, uses a
2-stage approach:
Stage 1: Use DP to compute a 2-dimensional table T of size n × K that in cell T[i, k] stores the kth least
costly simplification for the points p1, ..., pi of the input time series t, considering only error
and simplicity but not robustness, as in the above DP.</p>
      <p>Stage 2: Consider the K least costly simplifications s1, ..., sK, ordered by non-decreasing cost under
error and simplicity as stored in T[n, 1]..T[n, K] respectively. Compute robustness for any
simplification that ℎ classifies the same as t.</p>
      <p>The ORS-Algorithm then returns the simplification having the lowest overall total cost, i.e. the s
minimizing the value of α · E(t, s) + β · k + γ · F(t, s, ℎ).</p>
      <p>Pseudo-code is given in the Appendix. We will here give a high-level explanation, but let us first
state the following formal result.</p>
      <p>Theorem 2. Let the difference in cost between s1 and sK, as computed by Stage 1, be D. For any
value of γ ≤ D the simplification returned by the ORS-Algorithm will then be an optimal one achieving
ORS(t, ℎ, α, β, γ).</p>
      <p>Proof 1. We prove the statement by contradiction. Let the simplification s returned by the ORS-Algorithm
have k segments and assume there is another simplification s′ with k′ segments such that
α · E(t, s′) + β · k′ + γ · F(t, s′, ℎ) &lt;</p>
      <p>α · E(t, s) + β · k + γ · F(t, s, ℎ)
Since Stage 2 optimized this value over all simplifications s1, ..., sK we cannot have s′ among these,
and thus by the assumption in the statement of the theorem we have α · E(t, s′) + β · k′ ≥
D + α · E(t, s1) + β · k1, with k1 the number of segments of s1. Thus we get
D + α · E(t, s1) + β · k1 + γ · F(t, s′, ℎ) &lt;
α · E(t, s) + β · k + γ · F(t, s, ℎ) ≤
α · E(t, s1) + β · k1 + γ · F(t, s1, ℎ)
with the latter inequality following since s was the minimum over s1, ..., sK. Rearranging, this gives
D &lt; γ · F(t, s1, ℎ) − γ · F(t, s′, ℎ) + ((α · E(t, s1) + β · k1) − (α · E(t, s1) + β · k1))
with the latter term being zero and thus</p>
      <p>
        D &lt; γ · (F(t, s1, ℎ) − F(t, s′, ℎ))
Note we have F(·) in the range [0, 1] which means the above implies that γ &gt; D, contradicting the
assumption in the statement of the theorem. □
4.2.1. Important details of Stage 1.
      </p>
      <p>We describe some key details of how to implement Stage 1, computing a table T of size n × K with T[i, k]
storing the kth least costly simplification of points p1, ..., pi under error and simplicity only. The
difference to the earlier DP is that we compute not only the least costly but the K least costly with
K &gt; 1. After computing the error terms we run an outer loop from j = 1 to n with two inner loops one
after the other. The first loop from i = 1 to j − 1 is similar to the previous DP, computing the cost
T[i, 1] + e(i, j) of having the last segment go from p_i to p_j, but now adding all these values to a heap
H_j. The second loop from k = 1 to K fills the T[j, k] entries by extracting the minimum element from
H_j; say this element is T[i, k′] + e(i, j), then first set T[j, k] to this element, but most importantly
also add to the heap H_j the new value T[i, k′ + 1] + e(i, j), since the k′ + 1st least costly solution
up to p_i may be lower than many other solutions in the heap. For details, in particular regarding the
edge-cases which are cumbersome to implement properly, see the pseudo-code in the appendix.</p>
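      <p>The heap pattern at the core of Stage 1 can be isolated in a few lines (a simplified sketch with our own names, abstracting away the DP details): each row contributes a sorted stream of candidate costs plus a fixed last-segment cost, and extracting the kth candidate of a row lazily inserts its (k+1)st:</p>

```python
import heapq

def k_smallest_sums(T, e, K):
    # Row T[i] holds, in non-decreasing order, the costs of the least costly
    # solutions ending at point i, and e[i] is the cost of a last segment
    # from point i.  Return the K smallest values of T[i][k] + e[i].
    # Popping the kth candidate of row i lazily pushes the (k+1)st, so the
    # heap stays small, mirroring the description in the text.
    heap = [(row[0] + e[i], i, 0) for i, row in enumerate(T)]
    heapq.heapify(heap)
    out = []
    while heap and K > len(out):
        val, i, k = heapq.heappop(heap)
        out.append(val)
        if len(T[i]) > k + 1:
            heapq.heappush(heap, (T[i][k + 1] + e[i], i, k + 1))
    return out
```

      <p>Each of the K extractions triggers at most one insertion, which is what keeps the per-heap work bounded in the runtime analysis below.</p>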
      <sec id="sec-4-1">
        <title>4.2.2. Definition of Perturbations and Robustness in Stage 2.</title>
        <p>A simplified time series s on k segments is computed by choosing k + 1 points p_{i_1}, ..., p_{i_{k+1}} from
an original time series t on points p1, ..., pn. When computing the robustness of s we focus on the
following k + 1 pivot points q1, p_{i_2}, ..., p_{i_k}, q2 of s (note the first and last may not be points of t),
where q1 (and q2) is the point of s that corresponds to the smallest (and largest) x-value, where the
smallest is the same as the x-value of p1 (and the largest the same as the x-value of pn). In Figure 1, q1 is the leftmost
circled point and q2 the rightmost circled point, at x-values 0 and 23 respectively. For these k + 1 pivot
points of s we allow an increase or decrease of the y-value by an ε which is 10% of the y-range of the
dataset the classifier ℎ was trained on, i.e. ε = 0.1 · (ymax − ymin) with ymax and ymin the maximum
and minimum y-value in this dataset.</p>
        <p>We compute 10,000 random perturbations of s, each consisting of k segments anchored at the
same x-values as the k + 1 pivot points of s, but with the y-value of each pivot point shifted by an
amount drawn independently for the k + 1 values uniformly at random in the range [−ε, +ε]. Thus
the perturbation will lie inside a band around s of height 2 · ε at each pivot point. See Figure 2. We
then use ℎ to classify each of the perturbations, and define the F-value of s as the fraction of the
10,000 random perturbations that have the opposite classification to t.</p>
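        <p>A sketch of this estimate in Python (our own function and parameter names; `classify` stands in for the trained TSC ℎ and is assumed to accept the pivot-point representation directly, whereas in practice the perturbed segments would be interpolated back onto the original x-grid before classification):</p>

```python
import random

def estimate_fragility(pivot_xs, pivot_ys, classify, target_class,
                       eps, trials=10000, seed=0):
    # Estimate F(t, s, h): shift the y-value of every pivot point
    # independently, uniformly in [-eps, +eps], keep the x-values fixed,
    # and return the fraction of perturbations whose class differs from
    # the target class of the original series.
    rng = random.Random(seed)
    flipped = 0
    for _ in range(trials):
        perturbed = [(x, y + rng.uniform(-eps, eps))
                     for x, y in zip(pivot_xs, pivot_ys)]
        if classify(perturbed) != target_class:
            flipped += 1
    return flipped / trials
```

        <p>Here eps would be set to 0.1 · (ymax − ymin) of the training data, as described above.</p>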
        <p>(a) F = 0.45, blue. Not robust. (b) F = 0.04, blue. Very robust. (c) F = 0.02, red. Very robust.</p>
        <p>Theorem 3. The runtime of the ORS-Algorithm is O(n · max{n^2, K log n}).</p>
        <p>Proof 2. The initialization of the error terms is O(n^3) as in the first DP version. We use binary heaps having
O(log n) insertion and extraction operations. Then, for each of the n values of 1 ≤ j ≤ n we have two
for-loops, one inserting at most n values into heap H_j (for time O(n log n)), and the other extracting and
inserting at most K values of H_j (for time O(K log n)). Note there are never more than n elements in any
heap H_j as the first loop never inserts more than n elements and the latter loop extracts one element and
inserts at most one new element. This completes the proof.</p>
        <p>Theorem 4. The optimal simplification ORS(t, ℎ, α, β, γ) can be computed in polynomial time for a
time series t on n points, a binary TSC ℎ, and balancing factors α, β, γ, if there exists a constant K′ such
that the difference is ≥ γ between the least costly simplification that does not take robustness into account
and the K′th least costly one.</p>
        <p>Proof 3. This follows from Theorems 2 and 3 by running the ORS-Algorithm with K = K′.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Usefulness of the robust simplifications</title>
      <p>We have implemented² the ORS-Algorithm and in this section we illustrate its use for explaining a Time
Series Classifier. The algorithm is designed for classifiers working on univariate discrete time series
with a binary classification. Moreover, we believe its use is more easily evaluated if the time series
are evenly sampled, aperiodic and non-stationary. However, it is important to note that, apart from the
above constraints, we did not want simple datasets where the binary classification is particularly easy
or could be described (e.g. by ourselves) in any straightforward way. Based on these criteria we have
chosen to focus on three datasets.</p>
      <p>• From UCR [29]: Chinatown. This dataset shows the number of pedestrians on a street corner
of Chinatown in Melbourne over a 24-hour time period, and classifies these into Weekend and
Weekdays.
• From UCR [29]: ECG200. This dataset traces the electrical activity recorded in 96 time steps
during one heartbeat and classifies the time series into a normal heartbeat and a Myocardial
Infarction.
• From UCR [29]: Italy Power Demand. This dataset shows power demand in Italian households
over a 24-hour time period, and classifies these into Winter (October-March) and Summer
(April-September).</p>
      <sec id="sec-5-1">
        <title>²Available on GitHub: https://github.com/BrigtHaavardstun/kSimplification</title>
        <p>
          For each of these datasets we trained a fully convolutional neural network (FCN), originally proposed
by Wang et al. [30]. Specifically we implemented a model closely following the work by Delaney et al.
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. To select prototypes we used SPOTGreedy by Gurumoorthy et al. [31], implemented in the Python
package InterpretML [32]. We then ran the robust simplification algorithm on the resulting prototypes,
with varying balancing factors, to evaluate their power as explanatory tools for the trained classifiers.
In this section we focus on three aspects of the robust simplifications to argue that they are helpful
explanations of the classifier model.
        </p>
        <p>• Showing prototypes together with their simplifications is more informative than showing
prototypes only.
• Varying the balance of error vs simplicity vs robustness will highlight different aspects of the
prototypes.
• Extracting a rule for class membership from this information can give simple and powerful
explanations.</p>
      </sec>
      <sec id="sec-5-2">
        <title>We cover these aspects in separate subsections.</title>
        <p>5.1. Simplifications are informative as explanations</p>
        <sec id="sec-5-2-1">
          <title>5.1.1. Chinatown</title>
          <p>For the Chinatown dataset we have computed 3 prototypes from each of the two classes, that we call Red
and Blue. These 6 prototypes are presented in Figure 3. From these prototypes only it seems dificult to
identify with any kind of certainty a rule-of-thumb that can be used to decide the classification of new
instances. On the next figure, Figure 4, we show a single simplification added to each of the prototypes.
These simplifications are chosen with a well-balanced mix of error, simplicity and robustness, choosing
each of , ,  in the mid-range of values 3 These simplifications function as explanations for the
classification of each of the prototypes, as they also incorporate the robustness to 10.000 perturbations,
and as such they allow a rule-of-thumb to be more readily guessed at. Looking at the first segment
of each of the robust simplifications it is striking that the blue slopes are quite sharply downwards,
whereas none of the red are. A natural guess is as follows:
• Rule-of-thumb for Chinatown classification: if the time series starts by a noticable downward
slope then it is Blue, otherwise Red
3Exact values used for parameters can be found on the github page. Here we only refer to them as Low-Mid-High to keep the
presentation simple.</p>
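          <p>The robustness used here is the fraction of perturbed copies of a time series that receive the same class as the original. As a minimal illustration of how such a fraction can be estimated, the sketch below simply adds i.i.d. Gaussian noise; the paper introduces its own perturbation scheme, and the function and parameter names (robustness, predict, sigma) are ours:</p>
          <preformat>
```python
import random

def robustness(ts, predict, n_perturb=10000, sigma=1.0, seed=0):
    """Estimate robustness of ts under the classifier predict as the
    fraction of noisy copies that keep the same predicted class.
    Illustration only: the paper uses its own perturbation scheme,
    not plain Gaussian noise."""
    rng = random.Random(seed)
    base = predict(ts)
    same = sum(
        1
        for _ in range(n_perturb)
        if predict([y + rng.gauss(0.0, sigma) for y in ts]) == base
    )
    return same / n_perturb
```
          </preformat>
          <p>With 10,000 perturbations, as in the experiments, the estimated fraction is stable enough to visualize along the simplification.</p>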
          <p>Figure 5 panel captions: (a) 11 segs: Mid-Low-Mid; (b) 6 segs: Mid-Mid-Mid; (c) 2 segs: Mid-High-Mid; (d) 6 segs: Mid-Mid-High, giving the Low/Mid/High levels of the error, simplicity and robustness factors, respectively.</p>
        </sec>
        <sec id="sec-5-2-varying">
          <title>5.2. Varying balancing factors</title>
        </sec>
        <sec id="sec-5-2-2">
          <title>5.2.1. Chinatown</title>
          <p>In Figure 5 we see the efect of varying the balancing factors for a Red prototype in the Chinatown
dataset. We see that increasing the importance of simplicity from low to middle to high will decrease
the number of segments, while naturally also increasing the Euclidean distance to the original time
series. We also see that increasing the importance of robustness changes the slope of the first segment
from slightly downward (in the other 6-segment simplification) to slightly upward. This means that
(a) 5 segs: Mid  - Mid  - Mid 
a higher percentage of time series with first segment having slightly downward slope in this vicinity
are classified as Blue, as compared to the percentage classified Blue having slightly upward slope. This
change is significant information, and it holds even in the presence of a higher Euclidean distance to the
original prototype. Note that this observation constitutes supporting evidence for the rule-of-thumb
guessed at previously for Chinatown.</p>
        </sec>
        <sec id="sec-5-2-3">
          <title>5.2.2. Italy Power Demand</title>
          <p>In Figure 6 we see a single prototype from ItalyPowerDemand in black and two robust simplifications
of it in blue. Both simplifications have the same balancing factor for error and simplicity, but (a) has 4
segments while (b) has 5 segments resulting from an increase in the factor for robustness. Simplifying,
we could say that a single segment in (a) has been replaced by two segments in (b) to give two peaks.
This indicates a rule-of-thumb for the classifier trained on ItalyPowerDemand, namely that two such
peaks at or around those -values is important for the Blue classification.
5.2.3. ECG200
In Figure 7 we see a prototype from ECG200 in black and two distinct simplifications of it in blue. Note
that even though the factor for robustness is much higher in (b) than (a), the two simplifications are
(almost) identical, both on 5 segments. To explain this lack of significance of robustness we turn to
Theorem 2 which says that the diference between the best cost and th best cost output by Stage 1 of
the ORS Algorithm, may limit the values of  for which Stage 2 returns the optimal robust simplification.
For the simplifications of Figure 7 we ran Stage 1 with the high value of  = 1.000.000 and found
this diference to be 0.03630-0.02884=0.00746. Thus by Theorem 2 the simplification in (a) may not
be optimal for that value of  = 0.01 &gt; 0.00746, whereas the one in (b) is likely not optimal, as
 = 0.1 &gt;&gt; 0.00746. Let us explain why we believe this happens. This time series has a length of
96, and note the DP to get 5 segments optimizes over all possible ways of choosing 6 points from 96.
Note that there are many such choices of 6 points that give 5 segments similar to the ones in Figure 7.
Counting the number of ways of choosing these 6 points, starting with the rightmost point and going
left, a back-of-the-envelope calculation estimates this to be in the ballpark of 35*20*10*10*3*5 which
is over 1 million. Thus, we take this as evidence that almost all the  best simplifications returned by
the dynamic programming of Stage 1, that do not take robustness into account, are very close to these
5 segments. Thus increasing the importance of robustness cannot help, as the result will always be a
simplification similar to the one given. We leave for future work to deal with such cases, for example
by a heuristic during the DP that discards simplifications that are only marginally diferent from others.
5.3. Evaluating a classification rule guessed from robust simplifications</p>
        </sec>
        <sec id="sec-5-2-4">
          <title>5.3.1. Chinatown</title>
          <p>For the Chinatown classification we made a rule-of-thumb based on information gathered from the
explanations of the classifications given for 6 prototypes, by means of the robust simplifications. To
check if this rule-of-thumb is actually useful we need to make it more specific. Looking at the values
on the -axis for Figure 4 we see that the first segments of the 3 Blue simplifications are lines starting
at point (0, 500) and decreasing to about (5, 0) while the first segments of the Red simplifications are
only very slightly decreasing or increasing. Thus, the diference between the -values at  = 0 and
 = 5 would distinguish between these Red and Blue prototypes. We guess that this diference also
distinguishes between many other time series in the dataset, as this insight is gathered from the robust
simplifications. Taking the halfway point of 250 between the -values of 500 and 0 as the cutof the
simple classification rule becomes
• Simple Rule for Chinatown: if the -value at  = 0 minus the -value at  = 5 is larger than 250
then  is classified Blue, otherwise Red</p>
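          <p>The Simple Rule is easy to state in code. A minimal sketch (the function name and the cutoff parameter are our naming; the series is assumed to be a list of y-values indexed by time step):</p>
          <preformat>
```python
def chinatown_simple_rule(ts, cutoff=250):
    """Classify by the drop between time steps 0 and 5:
    'Blue' if ts[0] - ts[5] is larger than the cutoff, else 'Red'."""
    return "Blue" if ts[0] - ts[5] > cutoff else "Red"

# Illustrative (made-up) series: a Blue-like series drops steeply over
# the first 5 steps, a Red-like series stays roughly flat.
blue_like = [500, 400, 300, 200, 100, 0] + [0] * 18
red_like = [120, 118, 119, 121, 117, 118] + [118] * 18
```
          </preformat>
          <p>Raising the cutoff only means changing the default argument.</p>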
          <p>The Chinatown dataset has 363 time series. We checked their classification by the model and compared
to the Simple Rule. Indeed, the Simple Rule classifies all but 1 of the model-classified Blue time series as
Blue, and it classifies all but 19 of the Red time series as Red, for an accuracy of 94.5 %. Note this is
accuracy with respect to the model, not the ground truth, as the Simple Rule is trying to explain the
model. Let us mention that the accuracy of the model wrt ground truth is 96.4 %, while accuracy of the
simple rule wrt ground truth is 97 %.</p>
          <p>Finally, let us also mention that if we raise the cutof in the Simple Rule from 250 to 325 it actually
achieves an accuracy of 99.7 % (with accuracy of the simple rule vs ground truth down to 96.7%) failing
on only one instance. In retrospect, we could ask if there was any information available to us from the
robust simplifications that would point to this higher cutof. In fact, consider Figure 5 (a) that visualizes
the robustness of a simplification whose first segment goes from (0, 400) to (7, 50). Note the red part of
the band around the first segment stretches up above (0, 250). This should have made us wonder if the
cutof for the simple rule should be higher than 250. We could have made a test of this, for example by
constructing a time series whose first segment goes from (0, 300) to (5, 0) and visualize its robustness.
See Figure 8, where we have done exactly this, and pay close attention to the colourings of the band
around the first segment. This band is predominantly blue down to somewhere above (0, 300), and
already then it turns red. This in itself could have made us guess a higher cutof than 250 for the Simple
Rule. We believe these insights show the usefulness of the robust simplifications and their visualization.</p>
        </sec>
        <sec id="sec-5-2-5">
          <title>5.3.2. ItalyPowerDemand</title>
          <p>In the previous subsection we saw that two peaks in the mid-afternoon seemed to signify Blue class
for the classifier on the ItalyPowerDemand dataset. Looking more closely at Figure 6 we note that the
diference between the high part of the peaks and the low part is about 0.5. We also note from the
original prototype that the first peak is centered at 10, the second peak at 18, and that in (b) the bottom
of the simplification is centered at 14. From this we extract the following rule:
• Simple Rule for ItalyPowerDemand: let 1 = maximum of the -values over  = 9, 10, 11, let
2= minimum of the -values over  = 13, 14, 15 and 3 = maximum of the -values over
 = 17, 18, 19. If 1 − 2 ≥ 0.5 and 3 − 2 ≥ 0.5 then classify this instance Blue,
else Red.</p>
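          <p>Stated in code, the Simple Rule for ItalyPowerDemand is again a few lines. A minimal sketch (the variable names v1, v2, v3 and the function name are ours; the series is assumed to be a list of y-values indexed by time step, of length 24):</p>
          <preformat>
```python
def italy_simple_rule(ts):
    """Two-peak rule: 'Blue' iff both peaks rise at least 0.5 above the valley."""
    v1 = max(ts[9], ts[10], ts[11])    # first peak region
    v2 = min(ts[13], ts[14], ts[15])   # valley between the peaks
    v3 = max(ts[17], ts[18], ts[19])   # second peak region
    if v1 - v2 >= 0.5 and v3 - v2 >= 0.5:
        return "Blue"
    return "Red"
```
          </preformat>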
          <p>The dataset ItalyPowerDemand has 1096 time series. We checked their classification by the model
and compared to the Simple Rule. Indeed, the Simple Rule classifies all but 50 of the model-classified
Blue correctly, and all but 49 of the Red correctly, for an accuracy of 91%. Let us mention that accuracy
of the model wrt ground truth is 96.3%, and of the Simple Rule wrt Ground Truth 89.6%. Note that
classifiers of ItalyPowerDemand have previously been shown hard to explain to end-users [ 24]. Thus,
the Simple Rule, developed by examining the robust simplifications in Figure 6, has a surprisingly high
accuracy.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>In this paper, we addressed the challenge of interpretability in Time Series Classification (TSC) by
introducing the Optimal Robust Simplifications (ORS) algorithm. Our approach focuses on simplifying
time series data into straight-line segments while ensuring that these simplifications remain robust
with respect to the classifications made by a given TSC model.</p>
      <p>Our ORS-Algorithm is model-agnostic and allows a balance between error (fidelity with respect to
the original instance), simplicity (complexity of the simplification), and robustness (fraction of local
perturbations that retain the same classification as the original instance). We prove that the ORS-Algorithm, under certain
mild conditions, finds the optimal solution in polynomial time, by a first dynamic programming stage
solving for error and simplicity, and then incorporating robustness into the solution. This simplification
process aids in making time series data more intuitive and interpretable for humans.</p>
      <p>Our experimental results, based on UCR datasets Chinatown, Italy Power Demand, and ECG200,
confirm the practical utility of our approach. The robust simplifications provided by the ORS algorithm
make time series data more accessible to human intuition, facilitating better user understanding and
trust in the TSC models. In conclusion, our work contributes a novel method for enhancing the
interpretability of TSC models through robust simplifications.</p>
      <p>Future research could explore other ways to compute the factors employed in the algorithm. For
instance, the squared Euclidean distance could be replaced by some other distance function, or simplicity
can depend on other aspects than only the number of segments, and robustness can be computed by
other types of perturbations. Finally, we plan to conduct experiments including human studies to
demonstrate empirically the benefits of using optimal robust simplifications to explain TSC models and
compare to related XAI methods.</p>
      <p>explanations for time series forecast models, 2021. arXiv:2109.08438.
[22] Y. Wan, X. Gong, Y.-W. Si, Effect of segmentation on financial time series pattern matching,
Applied Soft Computing 38 (2016) 346–359. URL: https://www.sciencedirect.com/science/article/
pii/S1568494615006341. doi:10.1016/j.asoc.2015.10.012.
[23] Y.-W. Si, J. Yin, OBST-based segmentation approach to financial time series, Engineering
Applications of Artificial Intelligence 26 (2013) 2581–2596. URL: https://www.sciencedirect.com/science/
article/pii/S0952197613001723. doi:10.1016/j.engappai.2013.08.015.
[24] B. Håvardstun, C. Ferri, K. Flikka, J. A. Telle, XAI for time series classification: Evaluating the
benefits of model inspection for end-users, in: Explainable Artificial Intelligence, 2024. URL:
https://xaiworldconference.com/2024/, to be published.
[25] Z. J. Wang, J. W. Vaughan, R. Caruana, D. H. Chau, GAM Coach: Towards interactive and
user-centered algorithmic recourse, in: Proc. Conf. on Human Factors in Computing Systems, ACM,
2023, pp. 835:1–835:20.
[26] J. Höllig, C. Kulbach, S. Thoma, TSInterpret: A Python package for the interpretability of time
series classification, J. Open Source Softw. 8 (2023) 5220.
[27] W. Tang, L. Liu, G. Long, Interpretable time-series classification on few-shot samples, in: 2020</p>
      <p>International Joint Conference on Neural Networks (IJCNN), IEEE, 2020, pp. 1–8.
[28] Y. Ming, P. Xu, H. Qu, L. Ren, Interpretable and steerable sequence learning via prototypes, in:
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data
Mining, 2019, pp. 903–913.
[29] H. A. Dau, E. Keogh, K. Kamgar, C.-C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana,
Yanping, B. Hu, N. Begum, A. Bagnall, A. Mueen, G. Batista, Hexagon-ML, The UCR time series
classification archive, 2018.
[30] Z. Wang, W. Yan, T. Oates, Time series classification from scratch with deep neural
networks: A strong baseline, CoRR abs/1611.06455 (2016). URL: http://arxiv.org/abs/1611.06455.
arXiv:1611.06455.
[31] K. S. Gurumoorthy, P. Jawanpuria, B. Mishra, Spot: A framework for selection of prototypes using
optimal transport, 2021. arXiv:2103.10159.
[32] InterpretML contributors, InterpretML: Fit interpretable models, explain blackbox machine learning, GitHub
repository (2023). URL: https://github.com/interpretml/interpret.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Appendix</title>
      <p>
        Define a 2-dimensional DP table T[i, k] that stores the k-th least costly simplification for the points
x1, ..., xi ending at xi. Set T[1, 1] = 0. Here E denotes the error of a candidate segment over the points it covers,
α and β are the balancing factors for error and simplicity, and K is the number of best solutions kept.
      </p>
      <p>Compute all T[i, k] for k ≥ 1 as follows:
for i = 2 to n do
  Initialize an empty heap H to store triples (value, index, rank) with value being the key.
  for j = 1 to i − 1 do
    H.add(α · E(j, i) + β + T[j, 1], j, 1)
    if j ≠ 1 then
      H.add(α · E(1, j, i) + β, j, −1)
    end if
  end for
  for k = 1 to K do
    (v, j, r) = H.extractMin()
    T[i, k] = v
    if r ≠ −1 then
      H.add(α · E(j, i) + β + T[j, r + 1], j, r + 1)
    end if
  end for
end for</p>
      <p>To ensure that the last segment is not required to stop at a data point, but instead can be a continuation of the line
through two other points xj and xi, go over all pairs of points and evaluate the error if we continue
them to the end.</p>
      <p>Store these possibilities on a heap and extract the K best, similar to what we did previously.
Initialize an empty heap H to store quadruples (value, index, rank, end) with value being the key.
for i = 2 to n do
  for j = 1 to i − 1 do
    H.add(α · E(j, i, n) + β + T[j, 1], j, 1, i)
    if j ≠ 1 then
      H.add(α · E(1, j, i, n) + β, j, −1, i)
    end if
  end for
end for
for k = 1 to K do
  (v, j, r, e) = H.extractMin()
  S[k] = v
  if r ≠ −1 then
    H.add(α · E(j, e, n) + β + T[j, r + 1], j, r + 1, e)
  end if
end for</p>
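      <p>To make the cost structure of Stage 1 concrete, here is a small Python sketch under our own naming (seg_error, w_err, w_seg for the error and per-segment weights). It collects candidate costs directly and keeps the K smallest, rather than using the incremental heap extraction above, so it illustrates the recurrence rather than the optimized algorithm, and it requires every segment to end at a data point:</p>
      <preformat>
```python
import heapq

def seg_error(ts, j, i):
    """Squared error of approximating points j..i by the straight
    line through (j, ts[j]) and (i, ts[i])."""
    total = 0.0
    for t in range(j, i + 1):
        y = ts[j] + (ts[i] - ts[j]) * (t - j) / (i - j)  # line value at t
        total += (ts[t] - y) ** 2
    return total

def k_best_costs(ts, K, w_err=1.0, w_seg=0.1):
    """Costs of the K least costly segmentations of ts into straight-line
    segments between data points, cost = w_err * error + w_seg per segment.
    T[i] holds the up-to-K best costs over segmentations of ts[0..i]."""
    n = len(ts)
    T = [[] for _ in range(n)]
    T[0] = [0.0]  # the single point ts[0] costs nothing
    for i in range(1, n):
        cand = []
        for j in range(i):  # last segment runs from point j to point i
            step = w_err * seg_error(ts, j, i) + w_seg
            cand.extend(prev + step for prev in T[j])
        T[i] = heapq.nsmallest(K, cand)
    return T[n - 1]
```
      </preformat>
      <p>On a perfectly linear series the best cost is a single segment (one per-segment charge), and the next best costs correspond to adding one extra breakpoint at zero error.</p>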
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajkomar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Oren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hajaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          , et al.,
          <article-title>Scalable and accurate deep learning with electronic health records</article-title>
          ,
          <source>NPJ digital medicine 1</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Susto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cenedese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Terzi</surname>
          </string-name>
          ,
          <article-title>Time-series classification methods: Review and applications to power systems data, Big data application in power systems (</article-title>
          <year>2018</year>
          )
          <fpage>179</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ismail Fawaz</surname>
          </string-name>
          , G. Forestier,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Idoumghar</surname>
          </string-name>
          , P.-A. Muller,
          <article-title>Deep learning for time series classification: a review, Data mining and knowledge discovery 33 (</article-title>
          <year>2019</year>
          )
          <fpage>917</fpage>
          -
          <lpage>963</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Guillemé</surname>
          </string-name>
          , V. Masson,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rozé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Termier</surname>
          </string-name>
          ,
          <article-title>Agnostic local explanation for time series classification</article-title>
          ,
          <source>in: 31st IEEE International Conference on Tools with Artificial Intelligence</source>
          ,
          <source>ICTAI</source>
          <year>2019</year>
          , Portland, OR, USA, November 4-6, 2019, IEEE, 2019, pp.
          <fpage>432</fpage>
          -
          <lpage>439</lpage>
          . URL: https://doi.org/10.1109/ICTAI.2019.00067. doi:10.1109/ICTAI.2019.00067.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ismail</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Gunady</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Bravo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feizi</surname>
          </string-name>
          ,
          <article-title>Benchmarking deep learning interpretability in time series predictions</article-title>
          ,
          <source>in: Annual Conference on Neural Information Processing Systems</source>
          <year>2020</year>
          ,
          <article-title>NeurIPS 2020</article-title>
          , December 6-
          <issue>12</issue>
          ,
          <year>2020</year>
          , virtual,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Delaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <article-title>Instance-based counterfactual explanations for time series classification</article-title>
          ,
          <source>in: Case-Based Reasoning Research and Development - Int. Conference, ICCBR 2021</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>U.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Arnout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>El-Assady</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oelke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Keim</surname>
          </string-name>
          ,
          <article-title>Towards a rigorous evaluation of xai methods on time series</article-title>
          , in: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), IEEE,
          <year>2019</year>
          , pp.
          <fpage>4197</fpage>
          -
          <lpage>4201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Siddiqui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mercier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Munir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dengel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <article-title>Tsviz: Demystification of deep learning models for time-series analysis</article-title>
          ,
          <source>IEEE Access 7</source>
          (
          <year>2019</year>
          )
          <fpage>67027</fpage>
          -
          <lpage>67040</lpage>
          . URL: https://doi.org/10.1109/ACCESS.2019.2912823. doi:10.1109/ACCESS.2019.2912823.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Håvardstun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ferri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Telle</surname>
          </string-name>
          ,
          <article-title>An interactive tool for interpretability of time series classification</article-title>
          ,
          <source>in: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases</source>
          ,
          <year>2024</year>
          . To be published.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Theissler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Spinnato</surname>
          </string-name>
          , U. Schlegel,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <article-title>Explainable AI for time series classification: A review, taxonomy and research directions</article-title>
          ,
          <source>IEEE Access 10</source>
          (
          <year>2022</year>
          )
          <fpage>100700</fpage>
          -
          <lpage>100724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Triguero</surname>
          </string-name>
          ,
          <article-title>Explaining time series classifiers through meaningful perturbation and optimisation</article-title>
          ,
          <source>Information Sciences 645</source>
          (
          <year>2023</year>
          )
          119334. URL: https://www.sciencedirect.com/science/article/pii/S0020025523009192. doi:10.1016/j.ins.2023.119334.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Enguehard</surname>
          </string-name>
          ,
          <article-title>Learning perturbations to explain time series predictions</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>9329</fpage>
          -
          <lpage>9342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>Explaining time series via contrastive and locally sparse perturbations</article-title>
          ,
          <source>in: The Twelfth International Conference on Learning Representations</source>
          ,
          <year>2024</year>
          . URL: https://openreview.net/forum?id=qDdSRaOiyb.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rojat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Puget</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Filliat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Del</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Díaz-Rodríguez</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (xai) on timeseries data: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2104.00950</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Obermair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pernkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Felsberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Apollonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wollmann</surname>
          </string-name>
          ,
          <article-title>Example or prototype? learning concept-based explanations in time-series</article-title>
          ,
          <source>in: Asian Conference on Machine Learning, PMLR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>816</fpage>
          -
          <lpage>831</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Geler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kurbalija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ivanović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Radovanović</surname>
          </string-name>
          ,
          <article-title>Weighted kNN and constrained elastic distances for time-series classification</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>162</volume>
          (
          <year>2020</year>
          )
          <fpage>113829</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Gee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>García-Olano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Paydarfar</surname>
          </string-name>
          ,
          <article-title>Explaining deep classification of time-series data with learned prototypes</article-title>
          ,
          <year>2019</year>
          . URL: https://ceur-ws.org/Vol-2429/paper3.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E.</given-names>
            <surname>Keogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <article-title>Dimensionality reduction for fast similarity search in large time series databases</article-title>
          ,
          <source>Knowledge and Information Systems</source>
          <volume>3</volume>
          (
          <year>2001</year>
          )
          <fpage>263</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>E.</given-names>
            <surname>Keogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          ,
          <article-title>Segmenting time series: A survey and novel approach</article-title>
          ,
          <source>in: Data mining in time series databases</source>
          , World Scientific,
          <year>2004</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>E.</given-names>
            <surname>Camponogara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Nazari</surname>
          </string-name>
          ,
          <article-title>Models and algorithms for optimal piecewise-linear function approximation</article-title>
          ,
          <source>Mathematical Problems in Engineering</source>
          <year>2015</year>
          (
          <year>2015</year>
          )
          <fpage>876862</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>U.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Keim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seebacher</surname>
          </string-name>
          ,
          <article-title>TS-MULE: Local interpretable model-agnostic explanations for time series forecast models</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>