<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>World Conference on eXplainable Artificial Intelligence</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>Generate Explanations for Time-series Classification by ChatGPT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhechang Xue</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yiran Huang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hongnan Ma</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Beigl</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Kaiserstraße 12, 76131 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science</institution>
          ,
          <addr-line>Merchant Venturers Building, 75 Woodland Road, BS8 1TH Bristol</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>Telecooperation Office, Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Vincenz-Prießnitz-Straße 1, 76131 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>7</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>With the development of machine learning, the concept of explainability has gained increasing significance. It plays a crucial role in instilling trust among clients regarding the results generated by AI systems. Traditionally, researchers have relied on feature importance to explain why AI produces certain outcomes. However, this method has limitations. Despite the existence of documents that introduce various samples and describe formulas, comprehending the implicit meaning of these features remains challenging. As a result, establishing a clear and understandable connection between features and data can be a daunting task. In this paper, we aim to introduce a novel method for explaining time-series classification, leveraging the capabilities of ChatGPT to enhance the interpretability of results and foster a deeper understanding of feature contributions within time-series data.</p>
      </abstract>
      <kwd-group>
<kwd>Time-series classification</kwd>
        <kwd>ChatGPT</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Feature extraction becomes one of the most essential preliminary steps in time-series classification [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Normally, these features are
calculated by complex formulas.
      </p>
<p>However, these features are not easily understandable. Take
fft_coefficient__attr_‘abs’__coef_30 as an example. This feature is generated by TSFRESH1, one
of the most popular feature extraction packages. The documentation of TSFRESH, which explains
the meaning of every feature, says that this feature is a Fourier coefficient of the
one-dimensional discrete Fourier transform for real input. However, this explanation is purely
mathematical and does not easily translate into a physical interpretation in the real world.</p>
<p>An ideal explanation would be: “This data is classified to class [shake hand], since the feature
fft_coefficient__attr_‘abs’__coef_30 has a high value, indicating the presence of a periodic pattern
likely caused by a shaking motion.” This explanation explicitly links the mathematical definition
of the feature to a real-world motion, making it intuitively understandable.</p>
      <p>
        Given ChatGPT’s ability for causal inference, we aim to utilize this ability to connect
classification outcomes to easily understandable explanations. Long et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showcased that LLMs
can accurately generate Directed Acyclic Graphs (DAGs) with proper prompts, indicating their
capacity for causal reasoning. Moreover, ChatGPT has demonstrated prowess in data analysis.
Xue and Salim [
        <xref ref-type="bibr" rid="ref4">4</xref>
] conducted the first prompt-based time-series forecasting and concluded that its
prediction performance is better than that of numerical-based methods.
      </p>
<p>Based on these previous works, we find it feasible to use LLMs to analyze
time-series datasets. So far, all LLM-based experiments on time series have addressed forecasting; there is
no research on time-series classification and its explainability. Thus, we want to conduct
innovative research on this topic.</p>
      <p>In this paper, we will explore a method for generating explanations in time-series classification.
Our approach involves using ChatGPT to classify data samples and subsequently explaining
the rationale behind its classifications. It is important to note that GPT’s numerical processing
capabilities are inferior to those of numerical methods. Therefore, our goal is to trade off
a small degree of accuracy in exchange for the ability to generate explanations. The code for this
method is available on GitHub.2</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Some traditional methods for explaining time-series classification have been proposed. Senin et
al. pioneered interpretable time series classification using SAX and vector space models [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
This approach enables the ranking of time-series patterns by importance, serving as a precursor
to FIR. However, despite its interpretability, the classification performance is not ideal.
      </p>
      <p>
        Motivated by LIME [
        <xref ref-type="bibr" rid="ref6">6</xref>
], a well-known approach for explaining classifier predictions, Guillemé et al.
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed the first agnostic Local Explainer For TIme Series classificaTion (LEFTIST), which
provides explanations for predictions made by any time series classifier. They showed that
explanations generated by LEFTIST can help users understand the classification in easy cases.
      </p>
      <p>
        Coincidentally, Sivill and Flach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed a framework called LIMESegment, which has been
shown to produce more faithful and robust explanations than existing
state-of-the-art adaptations of LIME for time-series classification (e.g. Neves et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
      </p>
      <sec id="sec-2-1">
        <title>1https://tsfresh.readthedocs.io/en/latest/ 2https://github.com/lab992/Generate_explanations_for_classification_by_ChatGPT</title>
        <p>[Fig. 1: Rule-extraction pipeline: raw train dataset, TSFRESH feature table, feature ranking, then classification with the top-k features.]</p>
        <p>We have found that local explanation plays a big role in time-series classification: by observing
the patterns in the data, reliable explanations can be generated. We can try to combine LLMs with
local explanation to make explanations not only reliable but also understandable.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Components of prompt</title>
        <p>The main idea of this methodology is to translate time-series data into a description of the
movement. Based on this description, GPT will classify the data into a specific class. Finally,
GPT will explain the rationale behind its classification.</p>
        <p>To elicit a comprehensive response, a well-structured prompt is essential. A prompt comprises
a context and a query. An ideal response should include both a classification and an explanation.</p>
        <p>The context component encompasses background information and a description of
classification rules. The background part should outline the data measurement scenario, providing
ChatGPT with a clear understanding of the mission. This understanding aids ChatGPT in
grasping the context and objectives of the subsequent tasks. The description of classification
rules acts as a set of instructions derived from the training dataset. These instructions guide
ChatGPT in accurately classifying the given description of the test data sample based on features.
In the mission part of the query, a precise inquiry should be formulated concerning both the
correct classification and an explanation. Additionally, the desired answer format should be
specified to enhance readability. It’s imperative to instruct ChatGPT to refrain from providing
code-based responses and instead use textual explanations.</p>
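<p>The prompt structure described above (a context holding the background and classification rules, followed by a query holding the test description and mission) can be sketched as follows; build_prompt and all of its strings are hypothetical illustrations, not the paper’s exact prompt.</p>
<preformat>
```python
# Hypothetical sketch of the prompt assembly described in Section 3.1.
# All names and strings here are illustrative, not the paper's actual prompt.

def build_prompt(background: str, rules: str, test_description: str) -> str:
    """Combine the context (background + classification rules) with the query."""
    context = (
        f"Background: {background}\n"
        f"Classification rules: {rules}\n"
    )
    query = (
        f"Test data sample: {test_description}\n"
        "Mission: classify this sample into one of the classes above and "
        "explain the rationale. Answer in the format 'Class: [label]', "
        "followed by a textual explanation. Do not answer with code."
    )
    return context + query

prompt = build_prompt(
    background="Accelerometer measurements (x-axis) of hand gestures.",
    rules="If there are few direction changes and a big decreasing trend, then class 3.",
    test_description="There are a lot of changes in the direction.",
)
```
</preformat>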
        <p>Subsequently, a detailed methodology for generating descriptions of classification rules and
test data samples will be introduced.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Extract classification rules</title>
        <p>
          Indeed, utilizing data descriptions in the methodology, rather than directly inputting raw
time-series data into ChatGPT, offers several advantages. Firstly, it overcomes the limitation of
input length, as raw data often exceeds ChatGPT’s input capacity. Secondly, ChatGPT may
struggle to understand multi-digit numbers, which can lead to inaccuracies in analysis
and classification [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Therefore, employing a method to describe time-series data effectively
becomes crucial, and generating classification rules from the training dataset is a viable
approach to addressing these challenges.
        </p>
        <p>
          The process of extracting classification rules is outlined in Fig. 1. Describing
data using features provides a more comprehensive understanding than using only
numerical values: features add context and meaning to the data, allowing for deeper analysis
and interpretation of its characteristics. In the initial step, TSFRESH [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is utilized to construct
a feature table comprising hundreds of features from the training dataset. Recognizing that an
explanation based on hundreds of features can be perplexing, the features generated by TSFRESH
are then ranked using a decision tree, and only the top-k features are retained. Subsequently,
the feature table is utilized to train a decision tree model. Finally, classification rules are extracted
from the decision tree and translated into textual form, offering guidance for ChatGPT to
accurately classify data.
        </p>
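<p>The ranking and rule-extraction steps can be sketched with scikit-learn as below; the feature table is synthetic (three made-up columns standing in for the hundreds of TSFRESH features), so this illustrates the shape of the pipeline under stated assumptions, not the released implementation.</p>
<preformat>
```python
# Sketch of the rule-extraction pipeline (Fig. 1), assuming the feature table
# has already been built (in the paper, by TSFRESH). The table here is
# synthetic and the feature names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
feature_names = ["number_crossing_0",
                 "agg_linear_trend__attr_intercept",
                 "noise_feature"]
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labels

# Step 1: rank the features with a decision tree and keep the top-k.
ranker = DecisionTreeClassifier(random_state=0).fit(X, y)
top_k = np.argsort(ranker.feature_importances_)[::-1][:2]

# Step 2: train a small decision tree on the top-k features only.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[:, top_k], y)

# Step 3: extract the rules in textual form, to be rewritten as sentences
# via the lookup table before entering the prompt.
rules = export_text(tree, feature_names=[feature_names[i] for i in top_k])
print(rules)
```
</preformat>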
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Generate lookup table</title>
<p>To ensure the generation of accurate descriptions for both classification rules and test data
samples, a lookup table is necessary. This table serves as a reference point, providing
the meanings of various features used in the classification rules and test data. With the lookup
table in place, the process of translating data into descriptions becomes more precise and
consistent, enhancing the overall accuracy of the generated descriptions. This table consists
of four columns: ‘Feature name’, ‘Meaning’, ‘Type’, and ‘Value’. An example of the lookup
table generated by classification rules is presented in Fig. 2. “Feature name” refers to the name
assigned to a specific feature. “Meaning” represents the physical interpretation, providing
a contextual understanding of its role in the explanation. “Type” distinguishes whether the
meaning associated with a feature is a noun or an adjective, aiding in structuring the description
appropriately. “Value” quantifies the extent or significance of the meaning associated with a
feature, providing additional detail about its impact or amount.</p>
<p>As shown in Fig. 2, a single classification rule comprises multiple features, many of which
are not readily understandable even with reference to TSFRESH’s documentation. For
example, number_crossing_0 is explained as the number of crossings, which still lacks a clear
physical interpretation.</p>
<p>To address this issue, we ask ChatGPT to explain these features. The question and ChatGPT’s
answer are detailed in Fig. 3. ChatGPT answers that a large value of this feature implies
frequent changes in direction or acceleration. This demonstrates that ChatGPT can
provide concrete and understandable explanations for features.</p>
<p>As shown in Fig. 2, the meanings of the features are manually selected from the answers
provided by ChatGPT.</p>
        <p>Lastly, the feature values will also be directly recorded, forming an essential basis for
determining how to describe the features.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Translate data to description</title>
        <p>After constructing the lookup table, a translation from data to descriptions can be conducted.
The "Type" and "Value" columns in the lookup table play a crucial role in determining how to
describe the meaning associated with each feature.</p>
        <p>[Fig. 2: Building a lookup-table entry from the rule “if
number_crossing_0 &lt; 6.0, and agg_linear_trend_attr_‘‘intercept’’_ &gt; 0.4, then class: 3”. The
entry for number_crossing_0 reads: Meaning “direction changes” (obtained via GPT), Type
“noun” (selected manually), Value 6.0 (taken directly from the rule).]</p>
        <p>[Fig. 3: User: “number_crossing_0 is a time series feature, what can you infer from this feature if this value is
big during a measurement of accelerometer on x-axis? Give me only the one most important pattern.”
ChatGPT: “A significant value of ‘number_crossing_0’ in the context of accelerometer measurements
on the x-axis typically indicates a pattern of rapid and frequent changes in direction or acceleration.
This could suggest that the object or system being measured is undergoing dynamic and possibly
complex movements, such as sudden stops and starts, sharp turns, or oscillations. Such patterns are
often associated with activities like vehicular motion, sports activities, or machinery operations where
rapid changes in movement direction or intensity occur frequently.”]</p>
        <p>When the count of a feature is the crucial consideration, it is defined as a noun. Phrases
such as ‘A lot of’ or ‘Few’ are used to describe it: if the provided data feature value exceeds
the corresponding value in the lookup table, the phrase ‘A lot of’ is used; otherwise, ‘Few’
is used. Similarly, when the extent of the feature’s impact or significance is emphasized, it is
defined as an adjective, and terms like ‘Big’ or ‘Slight’ are employed for the description.</p>
<p>For classification rules, consider the rule in Fig. 4 as an example: “if
number_crossing_0 &lt; 6.0, and agg_linear_trend__attr_“intercept”_3 &gt; 0.4, then class: 3”. The program
automatically finds the corresponding meanings of the features in the lookup table and replaces
them with those meanings. The translated rule becomes: “If there are few direction changes and a big decreasing
trend, then class 3.”</p>
<p>For a test data sample, consider the example in Fig. 4: number_crossing_0 = 9.0 is translated into “A lot of
changes in the direction”, since 9.0 exceeds 6.0. Ultimately, the description of the test sample is assembled from the
meanings of all its features.</p>
        <p>Finally, the background, a description of classification rules, a description of test data samples,
and the mission are sent to ChatGPT. ChatGPT will classify the data and provide explanations
for its decisions.</p>
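<p>The translation rule just described can be sketched as follows; LOOKUP, describe, and the threshold values mirror the running example but are hypothetical code, not the paper’s released implementation.</p>
<preformat>
```python
# Hypothetical sketch of the data-to-description translation (Section 3.4).
# The lookup table mirrors the running number_crossing_0 example.
LOOKUP = {
    "number_crossing_0": {"meaning": "changes in the direction",
                          "type": "noun", "threshold": 6.0},
    "agg_linear_trend__attr_intercept": {"meaning": "decreasing trend",
                                         "type": "adjective", "threshold": 0.4},
}

def describe(feature: str, value: float) -> str:
    """Render a feature value as a phrase using the Type and Value columns."""
    row = LOOKUP[feature]
    if row["type"] == "noun":
        # Counts are qualified with 'A lot of' / 'Few'.
        qualifier = "A lot of" if value > row["threshold"] else "Few"
        return f"{qualifier} {row['meaning']}"
    # Extents are qualified with 'big' / 'slight'.
    qualifier = "a big" if value > row["threshold"] else "a slight"
    return f"{qualifier} {row['meaning']}"

print(describe("number_crossing_0", 9.0))  # A lot of changes in the direction
print(describe("agg_linear_trend__attr_intercept", 0.23))  # a slight decreasing trend
```
</preformat>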
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment Setup</title>
<p>Dataset Benchmark We use human activity recognition datasets because their explanations are
intuitively understandable: for example, a sample with a periodic
pattern can be linked to a periodic activity such as a shaking motion. In this paper, we use 3 datasets:</p>
        <sec id="sec-4-1-1">
          <title>3This feature indicates a decreasing trend.</title>
          <p>[Fig. 4: Translation example. The classification rules “number_crossing_0 &lt; 6.0” and
“agg_linear_trend__attr_"intercept" &gt; 0.4” pass through the lookup table and become the description
“If there are few changes in the direction and a big decreasing trend, then class 3.” The test data
sample “number_crossing_0 = 9.0, agg_linear_trend__attr_"intercept" = 0.23” becomes the description
“There are a lot of changes in the direction and a slight decreasing trend.”]</p>
          <p>AllGestureWiimoteX4, Basketball motion5, and HMP6. For each dataset, we randomly selected
three classes.</p>
<p>Evaluation Metrics To assess the performance of our classification models, we employ accuracy
as the metric, which evaluates the model’s capability to accurately classify the provided samples.
Accuracy is determined by the number of correctly classified test data samples divided by the
total number of samples.</p>
<p>Models We conducted evaluations using the datasets across three models: GPT-3.5-turbo-0301,
GPT-3.5-turbo-0613, and GPT-4-0613.</p>
<p>Baseline In this study, we select a decision tree as the baseline. We apply the feature
extraction package TSFRESH to the training samples. Then we select the top 3 features of each class
and let the decision tree classify the feature table generated by TSFRESH. The accuracy on
AllGestureWiimoteX is 68.33%, on Basketball 66.66%, and on HMP 100%.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Result</title>
<p>The results are shown in Table 1. The accuracies are all close to the baseline, so it can be
concluded that our method is applicable across human activity recognition datasets.</p>
<p>Besides, by observing the accuracy of the GPT models, we conclude that the performance of
our method improves with the advancement of the model: on all datasets, the experiment on
GPT-4 achieves higher accuracy than on GPT-3.5-turbo-0613, and GPT-3.5-turbo-0613
performs better than GPT-3.5-turbo-0301.</p>
        <p>A representative explanation is presented in Fig. 5, where we can see that
ChatGPT can classify a shaking movement, making the explanation intuitively understandable.</p>
        <p>[Fig. 5] ChatGPT: According to the given descriptions, your data sample seems to fall into two categories:
‘shake hand’, due to the lots of changes in direction, and ‘move to left’, due to a big decreasing trend and
few increasing values.</p>
        <p>However, the identification of a movement is often prioritized by the most distinctive feature that clearly
separates one class from the others. The ‘shake hand’ gesture is most prominently characterized by lots of
changes in the direction. This aspect, along with the decreasing trend and few increasing values, may
suggest that the hand movement involved turning the hand left and right repeatedly, while
gradually dropping it down a bit - a very likely movement pattern for a shaking hand - thus falling into
the ‘shake hand’ category.</p>
        <p>On the other hand, the ‘move to left’ gesture is characterized by few changes in the direction and a big
decreasing trend. Even though these aspects are present in your data sample, the lots of changes in
direction do not match the key characteristic of ‘move to left’, i.e., ‘few changes in the direction’. Hence,
it would be more accurate to categorize this data sample as ‘shake hand’.</p>
        <p>So, to summarize: [Class: shake hand].</p>
      </sec>
    </sec>
    <sec id="sec-limitation">
      <title>5. Limitation</title>
      <p>3-class classification Originally, the three datasets we used in the experiment contained
multiple classes. However, given our emphasis on ChatGPT’s ability to generate explanations,
we opted for a lightweight experiment. As such, we randomly selected three classes from each
dataset for analysis.</p>
      <p>Select meaning manually In Section 3.3, the meanings of the features are manually selected.
However, an alternative approach could involve training a model to automatically extract
keywords from ChatGPT’s answers regarding the meanings of these features.</p>
      <p>4https://www.timeseriesclassification.com/description.php?Dataset=AllGestureWiimoteX
5https://archive.ics.uci.edu/dataset/587/basketball+dataset
6https://github.com/wchill/HMP_Dataset</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion and future work</title>
<p>In this paper, we have introduced a method to generate explanations for time-series classification
and explored the application of LLMs in the field of human activity recognition. We have
shown that explanations can be generated by our method and that the resulting
classification achieves high accuracy. Besides, human evaluation shows that the explanations
generated by this method are understandable, meaning they are close
to human cognitive habits.</p>
<p>Furthermore, our pipeline has proved applicable to other human activity recognition datasets,
since all datasets yield results with high accuracy. We have also shown that the performance of our
pipeline improves with the advancement of the models: so far, GPT-4 has proved to be
better than GPT-3.5.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
<p>This work was partially funded by the Ministry of Science, Research and the
Arts Baden-Wuerttemberg as part of SDSC-BW and by the German Federal Ministry of Education
and Research as part of SDI-C (Grant 01IS19030A).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Antwarg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          , L. Rokach,
          <article-title>Explaining anomalies detected by autoencoders using shapley additive explanations</article-title>
          ,
          <source>Expert systems with applications 186</source>
          (
          <year>2021</year>
          )
          <fpage>115736</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barandas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Folgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abreu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bota</surname>
          </string-name>
          , H. Liu,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schultz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gamboa</surname>
          </string-name>
          , Tsfel: Time series feature extraction library,
          <source>SoftwareX</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>100456</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piché</surname>
          </string-name>
          , U. de Montreal, S. Research, et al.,
          <article-title>Can large language models build causal graphs?</article-title>
          ,
          <source>arXiv preprint arXiv:2303.05279</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Salim</surname>
          </string-name>
          ,
          <article-title>Promptcast: A new prompt-based learning paradigm for time series forecasting</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Senin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malinchik</surname>
          </string-name>
          ,
          <article-title>Sax-vsm: Interpretable time series classification using sax and vector space model</article-title>
          ,
          <source>in: 2013 IEEE 13th international conference on data mining, IEEE</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1175</fpage>
          -
          <lpage>1180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
<article-title>“Why should I trust you?” Explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Guillemé</surname>
          </string-name>
          , V. Masson,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rozé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Termier</surname>
          </string-name>
          ,
          <article-title>Agnostic local explanation for time series classification</article-title>
          ,
          <source>in: 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>432</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sivill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Flach</surname>
          </string-name>
          , Limesegment: Meaningful, realistic time series explanations,
          <source>in: International Conference on Artificial Intelligence and Statistics</source>
          , PMLR,
          <year>2022</year>
          , pp.
          <fpage>3418</fpage>
          -
          <lpage>3433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Neves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Folgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barandas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ronzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gamboa</surname>
          </string-name>
          ,
          <article-title>Interpretable heartbeat classification using local model-agnostic explanations on ecgs</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>133</volume>
          (
          <year>2021</year>
          )
          <fpage>104393</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gweon</surname>
          </string-name>
          ,
          <article-title>Does chatgpt comprehend the place value in numbers when solving math word problems</article-title>
          ,
          <source>in: Proceedings of the Workshop” Towards the Future of AI-augmented Human Tutoring in Math Learning” co-located with The 24th International Conference on Artificial Intelligence in Education (AIED</source>
          <year>2023</year>
          ), Tokyo, Japan, volume
          <volume>3491</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Christ</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Neufer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Kempa-Liehr</surname>
          </string-name>
          ,
          <article-title>Time series feature extraction on basis of scalable hypothesis tests (tsfresh-a python package)</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>307</volume>
          (
          <year>2018</year>
          )
          <fpage>72</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>