<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Challenging Dataset of Jet Engine Fault Scenarios</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoforos Romesis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stasinos Konstantopoulos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Informatics and Telecommunications, NCSR Demokritos, Patr.Gregoriou E &amp; 27 Neapoleos Str</institution>
          ,
          <addr-line>15341 Agia Paraskevi</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This dataset description paper introduces a challenging sequence processing task. Specifically, the task is to recognize faults from aircraft gas turbine engine conditions where previous faults determine how the presently observed conditions should be interpreted. Several state-of-the-art deep-learning sequence processors have been tested on the task, and these preliminary results demonstrate that they cannot correctly model the phenomenon.</p>
      </abstract>
      <kwd-group>
        <kwd>time-series dataset</kwd>
        <kwd>sequence processing</kwd>
        <kwd>long-range dependencies</kwd>
        <kwd>gas turbine faults</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation and Purpose</title>
      <p>
        Time series forecasting and time series classification are
the major sequence processing tasks, and both depend on a
method’s ability to identify time-based patterns in the data.
When dealing with long sequence processing in particular,
successful methods must identify non-local patterns such
as trends, seasonality, and long-range dependencies.
Observing large-scale and high-impact repositories such as
the Monash Time Series Forecasting Archive [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] reveals an
abundance of datasets that exhibit long-term trends and
long-period seasonal patterns, but a lack of datasets that
exhibit non-periodic long-range dependencies.
Unsurprisingly, successful methods stem from the non-parametric
statistics field or from the deep learning field.
      </p>
      <p>The sequential integration of sub-symbolic and symbolic
modules allows connectionist recognition of single events
to interact with long-range dependencies between events.
Such neuro-symbolic approaches are powerful, but to
demonstrate their effectiveness they must be applied to datasets
and tasks where the following conditions hold
simultaneously: (a) the recognition of individual events in the raw
data is a non-trivial pattern recognition task for which
connectionist methods are more appropriate; and (b) this
recognition is affected by unknown symbolic patterns that need to
be discovered in conjunction with the sub-symbolic patterns
themselves.</p>
      <p>In this dataset description paper we present such a task,
where purely connectionist sequence processing performs
substantially below the level established on the usual
trend and seasonality tasks and where neuro-symbolic
methods can demonstrate their effectiveness in challenging
real-world applications.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset Description</title>
      <p>Jet engines play a vital role in aviation, and ensuring their
safety and reliability is paramount. The harsh environments
they operate in can cause malfunctions that affect
aerothermodynamic measurements. Detecting faults in-flight from
time-series data remains challenging due to limited
instrumentation, similar fault effects, concurrent faults, and prior
events. This paper introduces a dataset of simulated
time-series measurements from a turbofan engine, including
various common faults, to aid in developing fault detection
strategies. The dataset is valuable for the Machine Learning
community because it presents a complex engineering problem with
unique features. It includes 2,410 time series of 3,600
time steps each; measurements are recorded at every time step,
and faults are introduced during the series. Some faults have nearly
indistinguishable effects, which makes them hard to differentiate, while
others exhibit long-term dependencies. These attributes
make the dataset suitable for research in sequential and
timestep-wise classification and for investigating long-term
dependencies. The dataset comprises time-series measurements
from a low bypass ratio turbofan engine, typical of modern
civil aviation engines, generated using an Engine
Performance Model (EPM). The dataset has been archived and is publicly
accessible from https://doi.org/10.5281/zenodo.15856441</p>
      <sec id="sec-2-1">
        <title>2.1. Engine Performance Model</title>
        <p>
          An EPM interrelates parameters that represent engine
component health and operating conditions with measurements
performed on an engine [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and can be expressed through
Equation (1):
        </p>
        <p>
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math>Y = g(u, f)</tex-math>
          </disp-formula>
where g is a vector function representing the EPM, u is
a vector of measured quantities defining the engine’s
operating point, Y is a set of measurements for condition
monitoring, and f is a vector of engine component health
parameters. Typically, two health parameters are used for
each component: the flow factor (SW), indicating the
component’s swallowing capacity, and the efficiency factor (SE),
representing thermal efficiency. The application of such
parameters for assessing engine component health has been
discussed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. A deviation of a health parameter from its
nominal value signals a fault in that component. As
extensively discussed in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the ratio of deviations between two
health parameters can serve as a characteristic metric for
faults such as fouling, increased tip clearance, erosion, and
foreign object damage. The severity of the fault correlates
with the level of this deviation. The deviation (or delta) of a
health parameter, Δf, is defined as:
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math>\Delta f\,(\%) = \frac{f - f_o}{f_o}</tex-math>
          </disp-formula>
where f denotes the value of the health parameter, and fo
its nominal value.
        </p>
        <p>Using the EPM, the nominal values of the available
measurement set, denoted as Yo, can be computed for a
specified operating point u. These values correspond to the
nominal health parameters, fo, and are derived via the
following relation:</p>
        <p>
          <disp-formula id="eq-3">
            <label>(3)</label>
            <tex-math>Y_o = g(u, f_o)</tex-math>
          </disp-formula>
        </p>
        <p>Once a measurement vector Y is obtained from the
engine, the relative deviation ΔY(%) from its nominal value
Yo is defined as:
          <disp-formula id="eq-4">
            <label>(4)</label>
            <tex-math>\Delta Y\,(\%) = \frac{Y - Y_o}{Y_o}</tex-math>
          </disp-formula>
        </p>
        <p>
          To emulate realistic measurement conditions, random
noise is superimposed on the measurement deviations. The
noise levels considered are consistent with those typical of
this instrumentation, as documented in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
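        <p>The delta and noise steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation: it assumes that the (%) in the definitions means the ratio is scaled by 100, that the noise is zero-mean Gaussian, and that the measurement values and noise levels shown are hypothetical placeholders rather than values from the dataset or from [5].</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def percent_delta(value, nominal):
    """Relative deviation of a value from its nominal value, in percent."""
    return 100.0 * (value - nominal) / nominal


def noisy_deltas(measured, nominal, sigma_pct, rng):
    """Percent deltas per measurement, with zero-mean Gaussian noise added.

    sigma_pct holds one assumed noise standard deviation (in percent)
    for each measurement channel.
    """
    return [
        percent_delta(y, y0) + rng.gauss(0.0, s)
        for y, y0, s in zip(measured, nominal, sigma_pct)
    ]


# Hypothetical values for three measurement channels (not dataset values).
nominal = [100.0, 850.0, 2.5]    # Yo = g(u, fo)
measured = [99.0, 858.5, 2.5]    # Y  = g(u, f)
sigma_pct = [0.1, 0.2, 0.1]      # assumed noise levels, in percent

clean = [percent_delta(y, y0) for y, y0 in zip(measured, nominal)]
noisy = noisy_deltas(measured, nominal, sigma_pct, random.Random(0))
```
]]></preformat>
        <p>Passing a seeded random.Random instance keeps the noise reproducible, which is convenient when comparing classifiers on identical inputs.</p>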
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Dataset Generation</title>
        <p>
          The EPM used for dataset generation was developed in
MATLAB. At each time step, the data includes a labeled vector of
measurement deltas produced by the EPM at a specified
operating point and a specific engine health condition. A total
of 10 operating points were considered, covering a broad
range of the engine’s flight envelope. Throughout a time
series, all time steps correspond to the same operating point.
Additionally, five distinct health conditions were considered,
as summarized in Table 1. The first health condition
represents the nominal, healthy operation of the engine. The
second health condition (coded as TIP) models an increased
clearance of the compressor blades. As reported in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], this
fault results in a ratio of 0.2 between the deviations of the two
related health parameters; the actual deviations are around
−1% and −5%, respectively. The
remaining three conditions involve a mistuning of the inlet
guide vanes of the compressor (coded as VGV), damage caused by foreign
objects within the compressor (coded as FOD), and the concurrent occurrence
of VGV and FOD faults (coded as VGV+FOD). These conditions result in ratios of
−3, 1, and 0.2, respectively. In all cases, deviations within
the range of 0.7 to 1.3 times the aforementioned values are
considered, simulating faults of varying severity. Notably,
the effects of TIP faults and the combined VGV+FOD faults
on the compressor’s health parameters are the same, and
therefore measurements under these conditions follow the
same distribution. Each dataset record is labeled according
to the specific health condition present at that time step.
        </p>
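        <p>As a rough illustration of the severity scaling, the sketch below draws one TIP fault. It assumes a single severity factor, sampled uniformly from [0.7, 1.3], that multiplies both nominal deviations, so the ratio of 0.2 is preserved. The uniform distribution, the shared factor, and the names delta_a and delta_b are assumptions made for this example; the text does not specify how severities were sampled.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random

# Nominal TIP deviations quoted in the text: about -1% and -5% (ratio 0.2).
TIP_NOMINAL = (-1.0, -5.0)


def sample_fault(nominal, rng, low=0.7, high=1.3):
    """Scale both nominal deviations by one common severity factor.

    Returns the scaled deviations and the factor that was drawn.
    Uniform sampling is an assumption of this sketch.
    """
    severity = rng.uniform(low, high)
    return tuple(severity * d for d in nominal), severity


rng = random.Random(42)
(delta_a, delta_b), severity = sample_fault(TIP_NOMINAL, rng)

# The ratio does not depend on severity, because the factor cancels out.
ratio = delta_a / delta_b
```
]]></preformat>
        <p>Because the severity factor cancels in the ratio, any condition with the same ratio produces the same family of deviations, which is consistent with TIP and VGV+FOD being indistinguishable at a single time step.</p>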
        <p>Each time series comprises 3,600 time steps. During this
period, various health conditions may develop. A time
series without any fault contains measurements indicative of
healthy engine operation at all time steps. In other time
series, the engine operates fault-free initially, but at a certain
point, a TIP, VGV, or FOD fault occurs and persists until the
end of the series. In some other time series, the engine also
operates fault-free initially, but at a certain point, a brief
period occurs during which a VGV fault is present. The
engine then continues to operate without faults until a later
time step, when a combined VGV+FOD fault manifests and
persists for the remainder of the time series. This scenario
exemplifies a specific fault condition characterized by a
severe VGV fault leading to FOD. The damage is attributed to
the detachment of a particle from the VGV mechanism, such
as a loosened bolt, which is subsequently ingested by the
engine. The failure of the VGV mechanism may be preceded
by a transient VGV fault event, serving as an early indicator
of the impending failure. This scenario was selected because
both the TIP and VGV+FOD faults exert similar impacts on
engine performance. The key differentiator between the TIP
fault and the VGV+FOD fault is the long-term dependency
of the latter on the initial, transient VGV fault.</p>
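        <p>The three kinds of time series described above can be pictured as per-time-step label sequences. The following sketch builds such sequences; the label string HEALTHY, the function names, and all onset times are hypothetical choices for illustration and are not taken from the dataset.</p>
        <preformat preformat-type="code"><![CDATA[
```python
N_STEPS = 3600  # series length stated in the text


def healthy_series(n=N_STEPS):
    """Series with no fault at any time step."""
    return ["HEALTHY"] * n


def persistent_fault_series(fault, onset, n=N_STEPS):
    """Fault-free until `onset`, then `fault` until the end of the series."""
    return ["HEALTHY"] * onset + [fault] * (n - onset)


def precursor_series(vgv_start, vgv_len, combined_onset, n=N_STEPS):
    """Brief VGV episode, a fault-free gap, then VGV+FOD until the end."""
    if not 0 <= vgv_start < vgv_start + vgv_len < combined_onset < n:
        raise ValueError("episodes must be ordered and fit inside the series")
    labels = ["HEALTHY"] * n
    labels[vgv_start:vgv_start + vgv_len] = ["VGV"] * vgv_len
    labels[combined_onset:] = ["VGV+FOD"] * (n - combined_onset)
    return labels


# Onset times below are illustrative assumptions, not dataset values.
tip = persistent_fault_series("TIP", onset=1200)
combo = precursor_series(vgv_start=500, vgv_len=20, combined_onset=2500)
```
]]></preformat>
        <p>The final segments of tip and combo carry different labels although, per the text, their measurements follow the same distribution; only the earlier VGV episode in combo separates the two cases.</p>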
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Concluding Remarks and Next Steps</title>
      <p>The dataset is used to evaluate the classification capabilities
of sequential machine learning methods on time-series data
with long-term dependencies. Preliminary results suggest
that the dataset challenges current techniques. For example,
Figure 1 shows four classification methods trained on the
dataset and tested on a time-series segment containing a
VGV+FOD fault. The methods include an RNN, a GRU,
and an LSTM, together with an MLP trained for
point-to-point classification for comparison. All methods failed to
detect the VGV+FOD fault, though they detected a precursor
VGV fault at time step 740. The sequential methods showed
limited ability to use information from prior time steps,
performing similarly to the point-wise MLP.</p>
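      <p>To make the long-range dependency concrete, the sketch below applies one hand-written symbolic rule on top of point-wise predictions: a prediction that is ambiguous between TIP and VGV+FOD is resolved as VGV+FOD only if a VGV event was seen earlier in the same series. This is an illustrative assumption about how such a dependency could be encoded, not a method evaluated in this paper; the function name and label strings are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Labels that are indistinguishable at a single time step, per the text.
AMBIGUOUS = {"TIP", "VGV+FOD"}


def apply_precursor_rule(pointwise):
    """Resolve ambiguous point-wise labels using the series history.

    An ambiguous label becomes "VGV+FOD" if a "VGV" label occurred
    earlier in the sequence, and "TIP" otherwise.
    """
    seen_vgv = False
    resolved = []
    for label in pointwise:
        if label == "VGV":
            seen_vgv = True
        if label in AMBIGUOUS:
            label = "VGV+FOD" if seen_vgv else "TIP"
        resolved.append(label)
    return resolved


# Hypothetical point-wise predictions for a short series.
pointwise = ["HEALTHY", "VGV", "HEALTHY", "TIP", "TIP"]
resolved = apply_precursor_rule(pointwise)
```
]]></preformat>
      <p>A rule of this kind relies on the precursor VGV episode being detected at all, so it complements rather than replaces the point-wise classifier.</p>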
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was co-funded by the European Union under
GA no. 101135782 (MANOLO project). Views and
opinions expressed are however those of the authors only and
do not necessarily reflect those of the European Union or
CNECT. Neither the European Union nor CNECT can be
held responsible for them.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Godahewa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bergmeir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Webb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hyndman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Montero-Manso</surname>
          </string-name>
          ,
          <article-title>Monash time series forecasting archive</article-title>
          ,
          <source>in: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)</source>
          ,
          <year>2021</year>
          . URL: https://openreview.net/forum?id=wEc1mgAjU-.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alexiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          ,
          <article-title>Development of Gas Turbine Performance Models Using a Generic Simulation Tool</article-title>
          , in:
          <source>Volume 1: Turbo Expo 2005</source>
          , ASMEDC, Reno, Nevada, USA,
          <year>2005</year>
          . doi:
          <pub-id pub-id-type="doi">10.1115/gt2005-68678</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stamatis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsalavoutas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aretakis</surname>
          </string-name>
          ,
          <article-title>Performance analysis of industrial gas turbines for engine condition monitoring</article-title>
          ,
          <source>Proceedings of the Institution of Mechanical Engineers, Part A: Journal of Power and Energy</source>
          <volume>215</volume>
          (
          <year>2001</year>
          )
          <fpage>173</fpage>
          -
          <lpage>184</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1243/0957650011538442</pub-id>
          , publisher: SAGE Publications.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alexiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aretakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Romesis</surname>
          </string-name>
          ,
          <article-title>Signatures of Compressor and Turbine Faults in Gas Turbine Performance Diagnostics: A Review</article-title>
          ,
          <source>Energies</source>
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <elocation-id>3409</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/en17143409</pub-id>
          , publisher: MDPI AG.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Romesis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aretakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          ,
          <article-title>Model-Assisted Probabilistic Neural Networks for Effective Turbofan Fault Diagnosis</article-title>
          ,
          <source>Aerospace</source>
          <volume>11</volume>
          (
          <year>2024</year>
          )
          <elocation-id>913</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/aerospace11110913</pub-id>
          , publisher: MDPI AG.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>