<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transformer-based Analysis of Vehicle CAN Bus Data for Predictive Maintenance: A Real Case Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali Yassine</string-name>
          <email>ali_yassine@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Zorzan</string-name>
          <email>fzorzan@tierratelematics.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia Salvatori</string-name>
          <email>lsalvatori@tierratelematics.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Vassio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Cagliero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino</institution>
          ,
          <addr-line>Corso Duca degli Abruzzi 24, 10129 Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tierra S.p.A.</institution>
          ,
          <addr-line>Corso Francesco Ferrucci 112, 10138 Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>The use of machine learning techniques to analyze Controller Area Network (CAN) bus data transmitted by fleets of industrial vehicles has been increasingly explored. The main industrial applications include avoiding service disruptions and vehicle damage, improving operational efficiency, and reducing cybersecurity risks. However, the application of predictive maintenance techniques to industrial vehicle data is challenged by the high-dimensional and heterogeneous nature of the signals, their variable quality, and the limited availability of public benchmarks and human annotations. In this work, we describe a real-world industrial case study based on company data acquired from fleets of thousands of commercial vehicles over several years. We design a machine learning pipeline for the early detection of vehicle faults from CAN bus signals and evaluate the performance of several prediction models, including a newly proposed transformer-based architecture. We also show that existing public benchmarks fail to capture the complexity of real industrial scenarios, highlighting the need for more realistic and comprehensive analyses and benchmarks.</p>
      </abstract>
      <kwd-group>
        <kwd>Predictive maintenance</kwd>
        <kwd>Vehicle systems</kwd>
        <kwd>CAN bus data</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Time series analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern vehicles are complex cyber–physical systems composed of numerous Electronic Control Units
(ECUs) and sensors that continuously monitor mechanical, electrical, and thermal subsystems. This
increasing level of instrumentation enables detailed visibility into vehicle operation, introducing new
opportunities to improve reliability, maintenance, and fault management [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Unexpected failures can result in costly downtime, safety risks, and operational issues, particularly in
industrial fleet scenarios where vehicles operate continuously under diverse conditions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Traditional
maintenance strategies, such as reactive repairs or fixed-interval servicing, are often inadequate for
such complex systems. For instance, reactive maintenance typically leads to unplanned downtime
and expensive repairs, while scheduled preventive maintenance causes unnecessary interventions and
inefficient use of resources.
      </p>
      <p>
        Predictive Maintenance (PdM) aims to address these limitations by exploiting operational data
to anticipate equipment failures and schedule maintenance proactively to reduce downtime, lower
operational costs, and extend the life of industrial assets [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Classical PdM approaches rely on statistical
modeling, condition monitoring, and reliability-centered metrics such as Remaining Useful Life (RUL)
and Mean Time Between Failures (MTBF). These methods often use sensor-based measurements,
including vibration, temperature, pressure, and oil analysis to detect signs of degradation. As a drawback,
traditional approaches face challenges in modern vehicles due to the heterogeneous, high-dimensional
nature of sensor signals and complex inter-dependencies between subsystems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Published in the Proceedings of the Workshops of the EDBT/ICDT 2026 Joint Conference (March 24-27, 2026), Tampere, Finland
      </p>
      <p>CEUR Workshop Proceedings, ISSN 1613-0073</p>
      <p>
        The Controller Area Network (CAN) bus represents a key data source for PdM in vehicles. It provides
continuous streams of sensor measurements and control signals related to engine operation, cooling
systems, and other critical subsystems. These multivariate time series capture both nominal operating
conditions and subtle changes that may precede fault events. However, applying data-driven PdM
methods to real-world CAN data remains challenging due to the high dimensionality and heterogeneity
of the signals, their variable quality, and the limited availability of reliable fault annotations in industrial
settings. Specifically, state-of-the-art approaches rely on transformer models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], mainly addressing the
detection of short-horizon anomalies or intrusion. Little attention has been paid to long-horizon fault
anticipation and PdM using raw CAN data.
      </p>
      <p>In this work, we address long-horizon PdM for industrial vehicles through a real-world case study
involving large-scale CAN bus data collected from fleets of thousands of vehicles over multiple years.</p>
      <p>Our contributions are threefold:
• We develop a machine learning pipeline for early fault detection in CAN bus data and
evaluate a variety of prediction models.
• We propose a new transformer-based architecture designed to effectively capture long-range
temporal dependencies and cross-signal interactions in high-dimensional multivariate scenarios.
• We provide an analytical comparison between the data considered in this industrial
case study and the in-domain public benchmarks, justifying the ad hoc pipeline and the
experiments reported in the present study.</p>
      <p>The remainder of the paper is organized as follows. Section 2 describes the CAN dataset and
its comparison with public benchmarks. Section 3 presents the proposed transformer-based model.
Section 4 details the experimental setup. Section 5 reports experimental results and discusses their
implications. Section 6 discusses the main limitations of the present work. Finally, Section 7 concludes
the paper and outlines directions for future work.</p>
      <p>2. Vehicle Data &amp; Problem Description</p>
      <sec id="sec-1-1">
        <title>2.1. CAN Bus Data</title>
        <p>Modern vehicles rely on the Controller Area Network (CAN) protocol to enable communication among
electronic control units (ECUs) and sensors. The CAN bus provides a lightweight and reliable
broadcast-based communication mechanism, allowing multiple ECUs to transmit messages containing sensor
measurements, actuator states, and diagnostic information over a shared bus.</p>
        <p>Each CAN message is identified by a unique identifier and carries a payload encoding one or more
physical signals, such as engine speed, coolant temperature, pressures, or voltages. These raw messages
are continuously generated during vehicle operation at signal-specific frequencies, ranging from a few
milliseconds to several seconds, depending on the subsystem and sensor type.</p>
        <p>
          In industrial fleet deployments, CAN messages are collected by on-board telematics units and
transmitted to cloud-based infrastructures [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The resulting multivariate time series are stored at a fixed
sampling resolution and made available for downstream analytics, including monitoring, diagnostics,
and PdM.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>2.2. Dataset Construction and Labeling</title>
        <p>The dataset is constructed from raw CAN messages and fault logs collected continuously from a fleet
of industrial vehicles. To obtain a structured learning dataset suitable for PdM, we transform the raw
event streams into fixed-length temporal windows aligned on engine operating time.</p>
        <p>Windowing and aggregation. For each vehicle, CAN messages are grouped into consecutive,
non-overlapping windows of fixed duration, expressed in engine hours. Within each window, multivariate CAN
signals are aggregated using summary statistics such as minimum, maximum, and average, resulting in
a fixed-length representation that summarizes signal behavior over that interval. This process removes
the fine-grained temporal ordering within each window, but preserves the temporal sequence at the level of
windows.</p>
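        <p>The windowing and aggregation step can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the record layout, the function name aggregate_windows, the parameter window_hours, and the example signal coolant_temp are assumptions introduced here.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from statistics import mean


def aggregate_windows(records, window_hours):
    """Summarize CAN records per non-overlapping engine-hour window.

    records: iterable of (engine_hours, {signal_name: value}) pairs
             (hypothetical layout, one vehicle at a time).
    window_hours: window duration in engine hours (hypothetical name).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for engine_hours, signals in records:
        # Index of the window this record falls into.
        idx = int(engine_hours // window_hours)
        for name, value in signals.items():
            buckets[idx][name].append(value)

    windows = []
    for idx in sorted(buckets):  # window order is preserved
        features = {}
        for name, values in buckets[idx].items():
            # Ordering inside the window is discarded by these statistics.
            features[f"{name}_min"] = min(values)
            features[f"{name}_max"] = max(values)
            features[f"{name}_mean"] = mean(values)
        windows.append({
            "window": idx,
            "start_hours": idx * window_hours,
            "features": features,
        })
    return windows


# Illustrative values only.
records = [
    (0.2, {"coolant_temp": 80.0}),
    (0.7, {"coolant_temp": 90.0}),
    (1.1, {"coolant_temp": 95.0}),
]
windows = aggregate_windows(records, window_hours=1.0)
```
]]></preformat>
        <p>Each output entry is one fixed-length sample; windows without any record are simply absent in this sketch, which a real pipeline would likely need to handle explicitly.</p>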
        <p>Fault filtering and consolidation. Fault events are extracted from vehicle diagnostic logs and serve
as operational ground-truth labels for fault occurrence. These events are mapped to predefined fault
definitions, which include (i) system-reported diagnostic trouble codes generated by onboard vehicle
controllers and (ii) data-driven fault indicators derived from CAN signals using predefined rules based
on domain expertise. To avoid duplicated fault registrations caused by repeated triggering of the same
condition, consecutive occurrences of the same fault are filtered out. This step ensures that each fault
corresponds to a distinct underlying event rather than repeated notifications of an already active issue.</p>
        <p>Label assignment and prediction horizon. Each time window is labeled based on the occurrence
of fault events (defined from vehicle diagnostic logs and domain-informed fault definitions) within
a future prediction horizon of fixed length, expressed in engine hours. Windows are assigned to one of
three classes: normal, pre-faulty, or faulty. A window is labeled as faulty if it coincides with a fault
occurrence, pre-faulty if it is not faulty and a fault event occurs within the prediction horizon following
the window, and normal otherwise. This labeling scheme explicitly targets early fault prediction, which
is the primary objective in predictive maintenance, while preventing information leakage from future
observations.</p>
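        <p>The fault consolidation and labeling rules above can be sketched as follows. The function names, the parameters window_hours and horizon_hours, and the half-open interval boundaries are assumptions made for illustration; the study does not specify how boundary cases are treated.</p>
        <preformat><![CDATA[
```python
def consolidate_faults(events):
    """Drop consecutive repeats of the same fault code.

    events: list of (engine_hours, fault_code) pairs (hypothetical layout).
    """
    kept, last_code = [], None
    for hours, code in sorted(events):
        if code != last_code:
            kept.append((hours, code))
        last_code = code
    return kept


def label_windows(window_starts, window_hours, fault_hours, horizon_hours):
    """Assign 'faulty', 'pre-faulty', or 'normal' to each window."""
    labels = []
    for start in window_starts:
        end = start + window_hours
        if any(start <= f < end for f in fault_hours):
            # A fault falls inside the window itself.
            labels.append("faulty")
        elif any(end <= f < end + horizon_hours for f in fault_hours):
            # A fault falls inside the horizon that follows the window.
            labels.append("pre-faulty")
        else:
            labels.append("normal")
    return labels


# Illustrative values only.
events = [(10.2, "F1"), (10.3, "F1"), (50.0, "F2"), (60.0, "F1")]
distinct = consolidate_faults(events)

labels = label_windows(
    window_starts=[0.0, 1.0, 2.0, 3.0, 4.0],
    window_hours=1.0,
    fault_hours=[3.5],
    horizon_hours=2.0,
)
```
]]></preformat>
        <p>Only the label looks ahead in time; the features of a window are computed from that window alone.</p>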
        <sec id="sec-1-2-1">
          <title>Window Aggregation</title>
        </sec>
        <sec id="sec-1-2-2">
          <title>Label Assignment</title>
          <p>[Figure 1: Data processing pipeline. Panels: Raw CAN, Window Aggregation, and Label Assignment, each showing Signals A, B, and C against engine hours, with the fault event, the pre-faulty window, and the faulty window marked in the last panel.]</p>
          <p>Figure 1 provides an overview of the complete data processing pipeline, illustrating how decoded
CAN signals represented as multivariate time series are transformed into window-level samples with
aggregated features and future fault labels. Overall, this process results in a window-level dataset designed
to reflect realistic industrial constraints, including sparse fault events, noisy sensor measurements, and
long operational horizons.</p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>2.3. Our Case Study: Engine-Related Fault</title>
        <p>This study considers a single fault prediction task related to engine operation. The fault is defined by an
automotive domain expert and is directly related to two CAN signals, while being indirectly reflected
in other engine-related signals.</p>
        <p>The fault labels are treated as ground truth and are used for supervised training and evaluation of all
models.</p>
      </sec>
      <sec id="sec-1-4">
        <title>2.4. Comparison with Public Benchmarks</title>
        <p>We compare the characteristics of our industrial fleet-scale dataset with other publicly available
predictive maintenance benchmarks, i.e., SCANIA APS, SCANIA Component X, and MetroPT, to highlight
the unique characteristics and challenges of our scenario (Table 1).</p>
        <p>Unlike SCANIA APS, which provides tabular snapshots of operational states with a binary failure
label, our dataset captures high-frequency operational signals over multiple years, enabling longitudinal
modeling of component degradation. Compared to SCANIA Component X, which reports component
failures using a time-to-event (TTE) format and contains multivariate features represented as histograms
and numerical counters with irregular readout intervals, our dataset contains multiple diagnostic trouble
codes (DTCs) per vehicle, reflecting a broader and more complex set of failure modes. While MetroPT
offers high-frequency raw signals at 1 Hz, it is limited to a single vehicle with only a few documented
failures, making it unsuitable for fleet-scale predictive modeling.</p>
        <p>Our industrial dataset is derived from a real operational environment with heterogeneous vehicles,
variable usage patterns, and diverse operating conditions. This introduces challenges not present in
public benchmarks, such as imbalanced failure occurrences, uneven monitoring frequencies across
vehicles, and high-dimensional feature representations that include both accumulated counters and
multi-dimensional signals. Consequently, existing solutions tested on public datasets cannot be directly
applied, as they often rely on simplified assumptions about failure distributions, unit homogeneity, or
event sparsity.</p>
        <p>Overall, the combination of fleet-scale coverage, heterogeneous high-frequency signals, and a complex
failure landscape makes our dataset a challenging benchmark for predictive maintenance, bridging the
gap between publicly available datasets and real-world industrial scenarios.</p>
        <p>[Figure 2: CAN-Transformer architecture. Blocks: Feature Projection (MLP), CLS Injection &amp; Positional Encoding, a repeated encoder block (Multi-Head Attention, Add &amp; Norm, Feed Forward, Add &amp; Norm), CLS Token Output, Attention Pooling, Feature Fusion (MLP), Linear, Sigmoid, and Output Probability. Input signals are
projected into a latent space, enriched with positional encodings, and processed by a transformer
encoder stack. Global (CLS-based) and local (attention-pooled) representations are fused to produce
fault predictions.]</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology</title>
      <sec id="sec-2-1">
        <title>3.1. Problem Formulation</title>
        <p>This section presents the proposed framework for fault prediction from multivariate CAN time series.
Let $X = \{x_1, x_2, \dots, x_T\}$ denote a multivariate CAN time series segment of length $T$, where $x_t \in \mathbb{R}^{d}$
represents $d$ sensor measurements observed at time step $t$. The objective is to predict whether a fault
will occur within a predefined future horizon based on the observed sequence.</p>
        <p>We formulate this task as a binary classification problem:</p>
        <p>$$f \colon \mathbb{R}^{T \times d} \to [0, 1], \qquad \hat{y} = f(X),$$</p>
        <p>where $\hat{y}$ is the probability of a fault within a predefined horizon. The problem features
high-dimensional, noisy signals and strong class imbalance.</p>
        <p>3.2. Overview of the CAN-Transformer Framework</p>
        <p>The proposed framework, namely CAN-Transformer, is a transformer-based model designed to capture
long-range temporal dependencies and cross-signal interactions in multivariate CAN data. Figure 2
illustrates the overall pipeline.</p>
        <p>
          The architecture is inspired by transformer models in NLP, particularly BERT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and prior
applications of BERT-style architectures to CAN data, such as CAN-BERT [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Unlike these works, which
target intrusion detection on CAN ID sequences using unsupervised pretraining, our model addresses
supervised fault prediction on rich multivariate sensor sequences.
        </p>
        <p>Input Representation</p>
        <p>Each sensor channel is standardized offline via z-score normalization, then
projected into a latent space through a lightweight MLP:</p>
        <p>$$\tilde{x}_t = \frac{x_t - \mu}{\sigma}, \qquad h_t^{(0)} = \mathrm{MLP}(\tilde{x}_t),$$</p>
        <p>where $\mu$ and $\sigma$ are the per-channel mean and standard deviation. This step aligns heterogeneous signals for attention-based processing.</p>
        <p>Sequence Encoding</p>
        <p>Following the BERT paradigm, a learnable classification token (CLS) is
prepended to the input sequence, and positional embeddings are added to preserve temporal order:</p>
        <p>$$H^{(0)} = [h_{\mathrm{CLS}}; h_1^{(0)}; \dots; h_T^{(0)}] + P,$$</p>
        <p>where $P$ denotes the positional encoding matrix. This step ensures that the model can both summarize
the sequence via the CLS token and capture the relative position of each time step in the input.</p>
        <p>Transformer Encoder</p>
        <p>The encoded sequence is processed by a stack of transformer encoder layers.
Each layer consists of a multi-head self-attention module followed by a position-wise feedforward
network, with residual connections and layer normalization:</p>
        <p>$$\tilde{H}^{(\ell)} = \mathrm{LayerNorm}\big(H^{(\ell-1)} + \mathrm{MHA}(H^{(\ell-1)})\big), \qquad H^{(\ell)} = \mathrm{LayerNorm}\big(\tilde{H}^{(\ell)} + \mathrm{FFN}(\tilde{H}^{(\ell)})\big).$$</p>
        <p>This architecture enables the model to capture long-range temporal dependencies and complex
interactions between sensor channels.</p>
        <p>Dual-Representation Fusion</p>
        <p>Global (CLS) and local (attention-pooled) representations are concatenated:</p>
        <p>$$h_{\mathrm{fused}} = \phi([h_{\mathrm{CLS}}; h_{\mathrm{pool}}]),$$</p>
        <p>where $h_{\mathrm{CLS}}$ and $h_{\mathrm{pool}}$ denote the encoder outputs summarized by the CLS token and by attention pooling, respectively, and $\phi$ is a linear transformation with normalization and nonlinearity.</p>
        <p>Prediction Head</p>
        <p>The fused sequence representation $h_{\mathrm{fused}} \in \mathbb{R}^{d_{\mathrm{model}}}$ is mapped to a scalar logit and
passed through a sigmoid to produce a fault probability:</p>
        <p>$$\hat{y} = \sigma(w^{\top} h_{\mathrm{fused}} + b),$$</p>
        <p>where $\sigma(\cdot)$ here denotes the sigmoid function, and $w \in \mathbb{R}^{d_{\mathrm{model}}}$ and $b \in \mathbb{R}$ are trainable parameters optimized end-to-end.</p>
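        <p>The fusion and prediction steps can be illustrated with a small numerical sketch in plain Python. It makes several simplifying assumptions that are not taken from the paper: attention pooling is implemented as a softmax over dot products with a single query vector, the fusion MLP is replaced by plain concatenation (so the weight vector has twice the hidden size), and all names and values are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(token_states, query):
    """Weighted average of token states, weights = softmax(query . h_t).

    The single-query scoring rule is an assumption of this sketch.
    """
    weights = softmax([dot(query, h) for h in token_states])
    dim = len(token_states[0])
    return [sum(w * h[i] for w, h in zip(weights, token_states))
            for i in range(dim)]


def predict_fault_probability(encoded, query, w, b):
    """encoded[0] is the CLS output, encoded[1:] the per-step outputs."""
    h_cls, tokens = encoded[0], encoded[1:]
    h_pool = attention_pool(tokens, query)
    # Simplification: concatenation only, no fusion MLP.
    h_fused = h_cls + h_pool
    logit = dot(w, h_fused) + b
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


# Illustrative values only (hidden size 2, two time steps).
encoded = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
query = [0.0, 0.0]
p_neutral = predict_fault_probability(encoded, query, [0.0] * 4, 0.0)
p_raised = predict_fault_probability(encoded, query, [0.0, 0.0, 1.0, 0.0], 0.0)
```
]]></preformat>
        <p>With a zero weight vector the head returns 0.5, and a positive weight on the pooled component raises the probability, which is the behavior expected from a sigmoid applied to a linear logit.</p>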
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimental Setup</title>
      <p>
        This section describes how the proposed model and baselines are trained and evaluated. It also defines
the evaluation protocol and metrics used to assess PdM performance in industrial vehicle scenarios.
Loss choice The fault prediction task exhibits severe class imbalance, as faulty periods are much
rarer than normal periods. To address this, we employ an Adaptive Focal Loss, which extends standard
Focal Loss [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] by using class-specific weighting to emphasize positive (faulty) examples while focusing
training on hard-to-classify samples.
      </p>
      <p>The loss is defined as:</p>
      <p>$$\mathcal{L} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t),$$</p>
      <p>where $p_t$ is the predicted probability for the true class, $\gamma$ is the focusing parameter of the standard Focal Loss, and $\alpha_t$ is a per-sample weight that depends on
the class:</p>
      <p>$$\alpha_t = \alpha_{\mathrm{pos}} \cdot \mathrm{target} + \alpha_{\mathrm{neg}} \cdot (1 - \mathrm{target}),$$</p>
      <p>with the positive class corresponding to faulty periods and the negative class to normal periods. The
$(1 - p_t)^{\gamma}$ term ensures that misclassified or uncertain examples contribute more to the loss, allowing the
model to better learn rare and difficult fault events.</p>
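      <p>A per-sample version of this loss can be sketched in plain Python as follows. The default values of alpha_pos, alpha_neg, and gamma are placeholders chosen for illustration, not the settings used in the study, and the clipping constant eps is an added numerical safeguard.</p>
      <preformat><![CDATA[
```python
import math


def adaptive_focal_loss(prob_pos, target, alpha_pos=0.75, alpha_neg=0.25,
                        gamma=2.0, eps=1e-7):
    """Class-weighted focal loss for one sample.

    prob_pos: predicted probability of the positive class.
    target:   1 for the positive class, 0 for the negative class.
    """
    p = min(max(prob_pos, eps), 1.0 - eps)   # avoid log(0)
    p_t = p if target == 1 else 1.0 - p      # probability of the true class
    alpha_t = alpha_pos * target + alpha_neg * (1 - target)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Illustrative comparison: a hard positive versus an easy positive.
hard_positive = adaptive_focal_loss(0.1, 1)
easy_positive = adaptive_focal_loss(0.9, 1)
```
]]></preformat>
      <p>Setting gamma to 0 and both weights to 1 reduces the expression to the ordinary binary cross-entropy, which is a convenient sanity check.</p>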
      <p>The model is trained using gradient-based optimization with regularization techniques such as early
stopping and learning rate scheduling to ensure stable convergence and prevent overfitting.
Inference At inference, CAN data is segmented into fixed-length sequences and processed by the
trained model to produce fault probabilities, enabling continuous monitoring and early warnings.</p>
      <sec id="sec-3-1">
        <title>4.1. Training and Validation Procedure</title>
        <p>The dataset is divided to ensure temporal and vehicle-level separation. First, 80% of the time windows
(earlier operational periods) form the training set, and the remaining 20% (later periods) form the test
set.</p>
        <p>Second, the training set is split by vehicle: 80% of them are used for training and the remaining 20%
for validation. This ensures validation evaluates the model’s ability to generalize to unseen vehicles
while preventing temporal leakage.</p>
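        <p>The two-stage split can be sketched as follows. The record layout (keys vehicle and t), the global ordering by a single time key, and the random choice of validation vehicles are assumptions of this sketch; the study does not state these details.</p>
        <preformat><![CDATA[
```python
import random


def split_dataset(windows, time_frac=0.8, vehicle_frac=0.8, seed=0):
    """Temporal split first, then a vehicle-level split of the early part.

    windows: list of dicts with 'vehicle' and 't' keys (hypothetical layout).
    """
    ordered = sorted(windows, key=lambda w: w["t"])
    cut = int(len(ordered) * time_frac)
    early, test = ordered[:cut], ordered[cut:]   # later windows form the test set

    vehicles = sorted({w["vehicle"] for w in early})
    random.Random(seed).shuffle(vehicles)        # reproducible vehicle order
    n_train = int(len(vehicles) * vehicle_frac)
    train_ids = set(vehicles[:n_train])

    train = [w for w in early if w["vehicle"] in train_ids]
    val = [w for w in early if w["vehicle"] not in train_ids]
    return train, val, test


# Illustrative data: 10 vehicles with 10 windows each.
windows = [{"vehicle": f"v{v}", "t": t} for v in range(10) for t in range(10)]
train, val, test = split_dataset(windows)
```
]]></preformat>
        <p>In this sketch no vehicle appears in both the training and validation sets, and every test window is later than every training window.</p>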
        <p>Preprocessing includes normalization of sensor signals, handling missing values, and feature
aggregation over temporal windows.</p>
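        <p>The normalization step can be sketched as a per-channel z-score. Fitting the statistics on training rows only and replacing a zero standard deviation by 1 are assumptions of this sketch, as are the function names.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def fit_zscore(train_rows):
    """Per-channel mean and standard deviation from training rows."""
    columns = list(zip(*train_rows))
    mu = [mean(c) for c in columns]
    # Constant channels have zero deviation; fall back to 1.0 to avoid
    # division by zero.
    sigma = [pstdev(c) or 1.0 for c in columns]
    return mu, sigma


def apply_zscore(rows, mu, sigma):
    """Standardize rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, mu, sigma)] for row in rows]


# Illustrative values: two channels, the second one constant.
train_rows = [[1.0, 10.0], [3.0, 10.0]]
mu, sigma = fit_zscore(train_rows)
scaled = apply_zscore(train_rows, mu, sigma)
```
]]></preformat>
        <p>The same fitted statistics would then be reused unchanged for validation and test data.</p>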
        <p>The validation set is used to optimize model hyperparameters, select decision thresholds for fault
detection, and implement early stopping during training. This procedure ensures that the model
performance reported on the test set reflects generalization to unseen periods.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Baselines and Comparisons</title>
        <p>To evaluate the performance of the proposed model, we compare it against a diverse set of classical and
modern baseline models. These include: Neural Networks (Multi-Layer Perceptron, MLP); Tree-Based
Models (Decision Tree, Random Forest); Support Vector Machines (Linear and RBF SVM); Probabilistic
and Generative Models (Gaussian Process Classifier, Quadratic Discriminant Analysis, QDA, and Naive
Bayes); and a heuristic baseline called CAN Over Threshold, which flags a fault whenever selected
CAN signals exceed thresholds. The threshold hyperparameter was tuned using the validation set to
maximize the F1 score.</p>
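        <p>The threshold tuning for the heuristic baseline can be sketched as a small search over candidate values. The single-signal setting, the candidate grid, and the function names are assumptions introduced for illustration.</p>
        <preformat><![CDATA[
```python
def f1_score(y_true, y_pred):
    """F1 with respect to the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(values, y_true, candidates):
    """Return the candidate threshold with the highest validation F1."""
    def score(threshold):
        return f1_score(y_true, [v > threshold for v in values])
    best = max(candidates, key=score)
    return best, score(best)


# Illustrative validation data for one signal.
values = [70.0, 80.0, 95.0, 100.0, 105.0]
y_true = [0, 0, 1, 1, 1]
best_threshold, best_f1 = tune_threshold(values, y_true, [60.0, 90.0, 102.0])
```
]]></preformat>
        <p>The selected threshold is then frozen and applied unchanged to the test set.</p>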
        <p>All classical machine learning models are implemented using default scikit-learn hyperparameters,
except where otherwise noted. All baselines are trained and evaluated using the same dataset and
preprocessing pipeline as the proposed CAN-Transformer to ensure a fair and reproducible comparison.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Evaluation Metrics</title>
        <p>In line with the predictive maintenance objective, all models are trained and evaluated to detect
pre-faulty periods, i.e., periods preceding an actual fault occurrence, rather than to identify faults
at the moment they occur.</p>
        <p>Performance is evaluated using standard classification metrics: Precision (P), Recall (R), and F1-score
(F1). Since we consider a single fault type, these metrics are computed with respect to the positive class,
corresponding to pre-faulty predictions.</p>
        <p>Evaluation is conducted at the period level to reflect the predictive maintenance objective of early
fault detection. Consecutive temporal windows are grouped into fixed-length periods whose duration
is an integer multiple of the window length, so that each period spans a whole number of windows. Model
predictions are aggregated at the period level. A period is labeled as pre-faulty if it occurs within the
prediction horizon preceding a fault event, and as faulty if it coincides with a fault occurrence; all
remaining periods are labeled as normal. By construction, each faulty period is preceded by a pre-faulty
period, while normal periods may occur before a pre-faulty period or after a faulty one.</p>
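        <p>Period-level scoring can be sketched as follows. The aggregation rule used here (a period is positive if any of its windows is positive) is an assumption, since the study does not state how window-level predictions are combined; the function names are likewise hypothetical.</p>
        <preformat><![CDATA[
```python
def to_periods(window_flags, windows_per_period):
    """Group window flags into periods (assumed rule: any positive window)."""
    return [any(window_flags[i:i + windows_per_period])
            for i in range(0, len(window_flags), windows_per_period)]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 with respect to the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative window-level flags (1 = pre-faulty), four windows per period.
true_windows = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_windows = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
true_periods = to_periods(true_windows, 4)
pred_periods = to_periods(pred_windows, 4)
precision, recall, f1 = precision_recall_f1(true_periods, pred_periods)
```
]]></preformat>
        <p>Other aggregation rules, such as a majority vote or a maximum over predicted probabilities, would fit the same structure.</p>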
      </sec>
      <sec id="sec-3-4">
        <title>4.4. Model Hyperparameters</title>
        <p>The CAN-Transformer is trained using the following hyperparameter settings, selected based on
validation performance to maximize period-level F1-score:
• Architecture: number of encoder layers = 2, hidden size ($d_{\mathrm{model}}$) = 128, number of attention
heads = 4, feedforward size ($d_{\mathrm{ff}}$) = 256, dropout = 0.2, maximum sequence length = 1024, input
sequence length = 32.</p>
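        <p>For reference, the listed settings can be collected in a small configuration object. The values are those reported above; the class name, field names, and consistency checks are hypothetical conveniences of this sketch.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanTransformerConfig:
    """Architecture settings reported in the text (field names are hypothetical)."""
    num_layers: int = 2
    d_model: int = 128
    num_heads: int = 4
    d_ff: int = 256
    dropout: float = 0.2
    max_seq_len: int = 1024
    input_seq_len: int = 32

    def __post_init__(self):
        # Multi-head attention splits the hidden size evenly across heads.
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        if self.input_seq_len > self.max_seq_len:
            raise ValueError("input_seq_len exceeds max_seq_len")

    @property
    def head_dim(self):
        return self.d_model // self.num_heads


config = CanTransformerConfig()
```
]]></preformat>
        <p>With these values each attention head operates on 32 dimensions.</p>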
        <p>Model Size and Hyperparameter Considerations The CAN-Transformer is intentionally designed
to be lightweight, balancing model capacity with the risk of overfitting given the limited number
of labeled fault events. Hyperparameters were tuned empirically by observing performance on the
validation set, ensuring the model is both compact enough for efficient training and inference, and
sufficiently expressive to capture essential temporal dependencies and cross-signal interactions in
multivariate CAN data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results and Discussion</title>
      <p>This section presents the period-level performance of the proposed CAN-Transformer model compared
to a comprehensive set of classical and heuristic baselines on the dataset described in Section 2, restricted
to failure-positive vehicles.</p>
      <p>Table 2 shows that the CAN-Transformer achieves the highest F1-score, reflecting a strong balance
between precision and recall. Classical models, such as Random Forests, SVMs, and Gaussian Processes,
often attain near-perfect recall but extremely low precision, resulting in numerous false positives.
Heuristic approaches (CAN Over Threshold) improve precision but are limited in overall F1-score, while
simple models like MLP achieve slightly higher precision at the cost of much lower recall, meaning
many faults would be missed in practice.</p>
      <p>From an industrial perspective, both types of errors carry significant costs: false positives trigger
unnecessary inspections and maintenance, increasing operational downtime and labor expenses, while
false negatives risk missing real faults, potentially causing equipment damage and unplanned repairs.
The CAN-Transformer mitigates these issues by leveraging long-range temporal dependencies and
cross-signal interactions in high-dimensional, noisy CAN bus data. Its combination of high recall and
moderate precision provides a practical trade-off, ensuring faults are rarely missed while keeping false
alarms manageable.</p>
      <p>These results demonstrate the effectiveness of attention-based sequence modeling for industrial
vehicle fault detection, where both operational cost and fault risk must be carefully balanced.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Limitations</title>
      <p>While this work provides insights into predictive maintenance on industrial fleets, several limitations
should be acknowledged:
• Data limitations: The dataset, although large and high-frequency, contains noise due to sensor
errors, device misconfigurations, and irregular sampling. Pre-fault periods are estimated from
operational data rather than directly observed, so actual fault progression may differ. Some variables
are aggregated or anonymized, limiting the interpretability of individual signals. Additionally,
the dataset reflects a specific industrial environment, which may reduce the generalizability of
findings to other vehicle types or operational contexts.
• Evaluation constraints: Ground-truth labels for pre-fault periods are inherently uncertain,
and DTCs or maintenance records may not fully capture the onset or severity of faults. This
introduces ambiguity into model evaluation, and reported metrics should be interpreted with this
context in mind.
• Generality and external validity: While comparisons with public benchmarks highlight the
challenges of fleet-scale industrial scenarios, the findings may not generalize to datasets with
different types of vehicles, operational patterns, or sensor configurations. Additionally, some
public datasets provide only simplified or snapshot data, which limits direct comparison of
modeling performance.
• Technical and operational constraints: Variability in vehicle usage, maintenance schedules,
and ECU software versions introduces heterogeneity in the data that may affect model
performance. Models developed under these conditions may require adaptation before deployment in
other fleets.</p>
      <p>Acknowledging these limitations emphasizes the need for careful interpretation of the results and
the continued development of robust predictive maintenance methods capable of handling noisy,
heterogeneous, and high-dimensional industrial data.</p>
    </sec>
    <sec id="sec-6">
      <title>7. Conclusion &amp; Future Work</title>
      <p>In this work, we presented CAN-Transformer, a transformer-based architecture for predictive
maintenance on multivariate vehicle CAN bus data. Evaluated on real-world fleet data from thousands
of commercial vehicles over several years, the model captures complex temporal dependencies and
cross-signal interactions, enabling accurate fault prediction under challenging industrial conditions.
CAN-Transformer outperforms classical machine learning baselines and heuristic approaches, achieving
a balanced trade-off between precision and recall that is critical for practical deployment, where both
false alarms and missed faults carry operational costs.</p>
      <p>Future work includes extending the framework to multi-fault and multi-horizon prediction to improve
early warning across diverse vehicle systems, integrating temporal attention explainability to aid
operator interpretation, investigating transfer learning across different vehicle types or fleets to enhance
generalization, and exploring online and continual learning strategies to adapt to evolving vehicle
behavior and operational conditions in real time.</p>
      <p>Overall, CAN-Transformer demonstrates that transformer-based sequence modeling is a powerful
tool for industrial predictive maintenance, providing robust, interpretable, and scalable fault detection
in complex, large-scale vehicular environments.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-5.2 for proofreading and formatting
assistance. After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Machinery health prognostics: A systematic review from data acquisition to RUL prediction</article-title>
          ,
          <source>Mechanical Systems and Signal Processing</source>
          <volume>104</volume>
          (
          <year>2018</year>
          )
          <fpage>799</fpage>
          -
          <lpage>834</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Buccafusco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Megaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vaccarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Salvatori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Loti</surname>
          </string-name>
          ,
          <article-title>Profiling industrial vehicle duties using CAN bus signal segmentation and clustering</article-title>
          , in: C. Costa, E. Pitoura (Eds.),
          <source>Proceedings of the Workshops of the EDBT/ICDT 2021 Joint Conference</source>
          , Nicosia, Cyprus, March 23, 2021, volume
          <volume>2841</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2021</year>
          . URL: https://ceur-ws.org/Vol-2841/DARLI-AP_7.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Mobley</surname>
          </string-name>
          ,
          <source>An Introduction to Predictive Maintenance</source>
          , Plant Engineering, second ed., Butterworth-Heinemann, Burlington,
          <year>2002</year>
          . URL: https://www.sciencedirect.com/book/9780750675314/an-introduction-to-predictive-maintenance.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Buccafusco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Megaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vaccarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Loti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Salvatori</surname>
          </string-name>
          ,
          <article-title>Learning industrial vehicles' duty patterns: A real case</article-title>
          ,
          <source>Comput. Ind.</source>
          <volume>145</volume>
          (
          <year>2023</year>
          ) 103826. URL: https://doi.org/10.1016/j.compind.2022.103826. doi:10.1016/J.COMPIND.2022.103826.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>APS Failure at Scania Trucks</article-title>
          ,
          <source>UCI Machine Learning Repository</source>
          ,
          <year>2016</year>
          . DOI: https://doi.org/10.24432/C51S51.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kharazian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lindgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Magnússon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Steinert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Reyna</surname>
          </string-name>
          ,
          <article-title>SCANIA Component X dataset: a real-world multivariate time series dataset for predictive maintenance</article-title>
          ,
          <source>Scientific Data</source>
          <volume>12</volume>
          (
          <year>2025</year>
          )
          493. doi:10.1038/s41597-025-04802-6.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <article-title>The MetroPT dataset for predictive maintenance</article-title>
          ,
          <source>Scientific Data</source>
          <volume>9</volume>
          (
          <year>2022</year>
          )
          <fpage>764</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , in: J. Burstein, C. Doran, T. Solorio (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Alkhatib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mushtaq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ghauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Danger</surname>
          </string-name>
          ,
          <article-title>CAN-BERT do it? Controller area network intrusion detection system based on BERT</article-title>
          , in:
          <source>Proc. IEEE/ACS AICCSA</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi:10.1109/AICCSA56895.2022.10017800.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <article-title>Focal loss for dense object detection</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>42</volume>
          (
          <year>2020</year>
          )
          <fpage>318</fpage>
          -
          <lpage>327</lpage>
          . doi:10.1109/TPAMI.2018.2858826.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>