<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Online Learning Supported by Foundation Models for Anomaly Detection in Industrial Settings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aurora Esteban Toscano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sepideh Pashami</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Nilsson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luka Smeets</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sławomir Nowaczyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Applied Intelligent Systems Research (CAISR), Halmstad University</institution>
          ,
          <addr-line>Halmstad</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>HMS Industrial Networks AB</institution>
          ,
          <addr-line>Halmstad</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>HQ RAXTAR</institution>
          ,
          <addr-line>Veldhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Modern industrial monitoring systems must detect anomalies in real time under evolving operating conditions and without reliance on labeled data. Traditional online anomaly detectors offer fast adaptation but struggle when normal behavior shifts or when rare anomalies are unintentionally learned as normal. On the other hand, recently introduced foundation models for time series capture richer structure but are computationally expensive for continuous deployment. We propose a dual-learner anomaly detection framework that bridges a fast online learner based on Half-Space Trees with a time-series foundation model (MOMENT) acting as a background learner. A confidence-based routing mechanism determines, for each incoming instance, whether to trust the online model, defer to the foundation model, or combine both through confidence-weighted ensembling. The confidence estimation method is fully unsupervised and robust to drift, requiring no labels or sliding windows. We validate the approach on two real-world elevator (hoist) installations, demonstrating that the system operates efficiently in streaming conditions and matches or surpasses strong online baselines. Furthermore, we show that fine-tuning the foundation model on one installation provides measurable performance gains when transferred to a different installation, indicating that foundation-model adaptation can support cross-site knowledge transfer in industrial monitoring. The results highlight the promise of integrating online learning with foundation models to achieve both responsiveness and robustness in long-term industrial anomaly detection.</p>
      </abstract>
      <kwd-group>
<kwd>Streaming Machine Learning</kwd>
        <kwd>Time series analysis</kwd>
        <kwd>Transfer learning</kwd>
        <kwd>Anomaly Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern industrial systems generate large volumes of sensor measurements at high frequency, often
under continuously evolving operating conditions. In these environments, online learning is particularly
attractive because models must process data as it arrives, without relying on long-term data storage or
expensive offline retraining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. At the same time, real-world deployment constraints require minimal
supervision, fast adaptation, and robustness to distributional shifts.
      </p>
      <p>
        A central challenge in this context is anomaly detection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Malfunctions are rare by definition,
leading to an extreme imbalance where only normal data is abundant. Furthermore, labels for anomalous
events are typically unavailable or arrive with significant delay, making fully supervised learning
impractical. As a result, semi-supervised or unsupervised approaches—particularly those that learn
normal behavior and detect deviations—are widely used. Common online anomaly detectors such
as statistical models and incremental tree ensembles (e.g., Half-Space Trees) offer fast inference and
incremental updates, but their performance degrades in complex industrial environments due to high
dimensionality, varying anomaly signatures, and model contamination when anomalous instances are
unintentionally incorporated into training [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        To address these limitations, we propose a dual-learner system that bridges fast online anomaly
detection with the representational capacity of a time-series foundation model. Our design is inspired
by the “fast and slow thinking” framework from cognitive psychology [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: a lightweight, adaptive
online model provides rapid decisions under normal conditions (“think fast”), while a more expressive
but computationally heavier background model (“think slow”) is invoked when additional reasoning is
needed. Specifically, we combine:
• Half-Space Trees [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as an efficient online anomaly detector capable of processing high-speed
industrial data streams.
• MOMENT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a large-scale foundation model for time series analysis, used as a background
model to refine uncertain predictions and detect subtle temporal deviations.
      </p>
      <p>A confidence-driven routing mechanism governs the interaction between the two learners. The
online detector makes the initial prediction and assesses its reliability. If the confidence is low, the
system queries the foundation model. If both models exhibit uncertainty, their anomaly scores are
combined via confidence-weighted ensembling. This design enables real-time operation under limited
compute while leveraging the generalization ability of a large pre-trained model when necessary.</p>
      <p>A motivating application for this work is the monitoring of industrial elevator (hoist) systems used
in high-rise construction projects. These systems operate under rapidly evolving mechanical and
environmental conditions, leading to non-stationary sensor distributions and a mixture of subtle and
abrupt anomalies. Ride durations, loading patterns, and vibration signatures vary throughout the
construction process, making it difficult for purely online or purely offline approaches to remain reliable
over long periods. This domain highlights the need for a method that combines (i) fast
instance-by-instance adaptation to local changes and (ii) stable, domain-general representations that are robust to
drift and noise. While our proposed architecture is general and applicable to any multivariate
time-series stream, the industrial hoist scenario provides a concrete real-world environment in which these
challenges—and the benefits of our dual-learner design—naturally arise. Thus, we evaluate our system
on real industrial elevator monitoring data across two deployment projects. Additionally, we study the
effect of fine-tuning the foundation model on data from the first project and transferring the adapted
model to the second project. Our results demonstrate that foundation model adaptation improves the
performance and robustness of the online learner through indirect knowledge transfer, opening a new
direction for integrating online learning and foundation models in industrial applications.</p>
      <p>This work makes the following contributions:
• We introduce a dual-learner anomaly detection architecture that integrates a fast online model
with a foundation time-series model via confidence-based routing.
• We propose a dual confidence estimation method that is fully unsupervised and allows reliable
model-switching under scarce anomalies based on the consistency of the predictions produced
by both models.
• We demonstrate that fine-tuning the foundation model in one industrial project improves online
anomaly detection in a second project, revealing a pathway for domain-level transfer learning in
streaming applications.
• We validate the proposed approach on real-world industrial systems with high-rate streaming
sensor data.</p>
      <p>The rest of the paper is organized as follows. Section 2 reviews related work on foundation
models for time series and previous approaches in the field of online anomaly detection. Section 3
presents the proposed dual-learner architecture. Section 4 presents the case study used to validate the
proposal. Finally, Section 5 summarizes conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Online Anomaly Detection in Data Streams</title>
        <p>
          Online anomaly detection in data streams focuses on learning models incrementally while adapting to
evolving data distributions. Among the most influential approaches are ensemble-based tree models
designed for non-stationary data. Half-Space Trees (HST) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] introduced a streaming extension of
isolation-based anomaly detection, where random projections define partitions that adapt to new data
without explicit retraining. Similarly, Robust Random Cut Forest (RRCF) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] uses randomized
partitioning trees to detect points that cause large changes in the model structure, emphasizing robustness to
high-dimensional and noisy data streams. StreamRHF [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] extends this line by incorporating adaptive
memory management to maintain bounded model size. More recently, streaming variants of Isolation
Forest have been proposed. Online Isolation Forest (OIF) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] updates isolation trees incrementally
using reservoir-based subsampling to preserve model diversity, while Streaming Isolation Forest (SIF)
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] focuses on maintaining statistical representativeness under drift by integrating age-based sample
replacement. These models provide fast predictive performance, but their learning dynamics can
inadvertently absorb abnormal behavior as the concept of “normal” shifts over time.
        </p>
        <p>
          Hybrid systems combining streaming learners with offline or historical models have also been
explored. For example, the work of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] presents a hybrid system where an offline model retains
general characteristics (a bias) and an online model continuously learns and adapts. Other approaches
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] explore unsupervised online learning methods, which are inherently reactive but typically still
need a "warm-up" or initial training phase, or rely on a dynamically adapting threshold (such as
quantile-based filters). Particularly in industrial settings, several approaches include periodically retraining
autoencoders or clustering models on buffered data [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>Overall, these systems typically rely on scheduled batch retraining or assume stable long-term
distributions, limiting their responsiveness in complex industrial processes. In contrast, our approach
integrates an online detector with a foundation model in a fully instance-based, confidence-driven
routing scheme that requires no batching, no labels, and no hold-out reference windows.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Foundation Models for Time Series</title>
        <p>
          Foundation models for time series have recently emerged as a parallel to large pretrained models
in vision and language. MOMENT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] demonstrated that transformer-based architectures trained
on large-scale multivariate time series collections can be reused for reconstruction and forecasting
across domains, supporting zero-shot and fine-tuned anomaly detection. MOMENT introduced
chunk-based temporal tokenization to enable efficient learning of long-range dependencies while maintaining
moderate computation cost. Subsequent work has expanded foundation models to improve efficiency
and transferability. Tiny Time Mixers (TTM) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] proposed a lightweight architecture based on TSMixer
with adaptive patching and multi-resolution sampling. TTM models achieve strong zero- and few-shot
performance while remaining computationally efficient enough for CPU-only deployments, addressing
one of the main deployment barriers of foundation models. Likewise, Mantis [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] focuses on time series
classification, employing a ViT-based encoder trained with contrastive learning and equipped with
channel-adaptive adapters to reduce fine-tuning cost and improve calibration, highlighting the growing
interest in generalizable, reusable time-series backbones.
        </p>
        <p>Unlike forecasting- or classification-oriented foundation models, our use case requires instance-level
anomaly reconstruction under streaming conditions. The proposed dual-learner framework leverages
the high-level temporal priors encoded in MOMENT while retaining the flexibility and adaptability of
online learners. To our knowledge, this is the first work to integrate a time-series foundation model
directly within a real-time, unsupervised anomaly detection and routing mechanism.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Online Learning Supported by a Foundation Model for Anomaly Detection</title>
      <p>We propose a dual-learner anomaly detection framework that integrates a fast online detector with a
foundation model for time series as shown in Fig. 1. The system operates continuously on streaming data
and adapts its prediction strategy based on model confidence. The architecture consists of three main
components: (1) an online learner based on Half-Space Trees for real-time detection, (2) a background
learner based on the MOMENT foundation model for deep representation-driven anomaly scoring, and
(3) a confidence-based routing mechanism that dynamically selects or combines predictions from both
learners.</p>
      <p>We assume a streaming setting in which each incoming instance x_t is a multivariate time series of d
dimensions and variable length T. To support both fast online processing and deep temporal modeling,
the architecture uses two complementary representations: (i) a compact tabular descriptor derived from
statistical and spectral features, and (ii) the raw or lightly preprocessed time-series input consumed
directly by the foundation model. The processing pipeline for each new observation x_t ∈ R^(T×d) is
therefore:
x_tab ← statistical descriptors of x_t
x_seq ← x_t padded to the nearest multiple of 8 to match MOMENT’s patch embedding</p>
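      <p>The two-view pipeline above can be sketched as follows. This is a minimal Python illustration: the tabular descriptor uses only a small subset of per-channel statistics (the full descriptor has 302 features), and the ride layout (T, d) is an assumption.</p>

```python
import numpy as np

def make_views(x):
    """Build the two complementary views of a ride x of shape (T, d).

    Illustrative sketch: only a handful of the statistical descriptors
    mentioned in the text are computed here.
    """
    # (i) compact tabular descriptor: per-channel statistics, concatenated
    tab = np.concatenate([x.mean(axis=0), x.std(axis=0),
                          x.min(axis=0), x.max(axis=0)])
    # (ii) sequence view: zero-pad T up to the nearest multiple of 8,
    # matching MOMENT's patch-embedding requirement
    pad = (-x.shape[0]) % 8
    seq = np.pad(x, ((0, pad), (0, 0)))
    return tab, seq
```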
      <p>The proposed architecture is intentionally modular: each component addresses a specific limitation
of either pure online methods or pure foundation-model approaches. The online learner provides
fast streaming predictions; the foundation model contributes rich temporal priors; and the confidence
router manages when to query which model under a fixed computational budget. This modularity does
introduce several design choices (e.g., the online and background methods, the form of the confidence
metric, or routing thresholds), which we motivate in the following subsections.</p>
      <sec id="sec-4-1">
        <title>3.1. Online Learner: Half-Space Trees with Ensemble-Based Confidence</title>
        <p>
          The online learner is implemented using HST [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], an incremental anomaly detection model designed
for streaming environments. HST recursively partitions the feature space into nested half-spaces,
maintaining compact statistics that approximate data density in each region. Given an observation x_tab,
each tree i in the ensemble outputs a score s_i(x_tab), and the overall anomaly score is the average.
        </p>
        <p>In this setting, we introduce a confidence metric that minimizes the computational load and maintains
the original unsupervised learning approach, since it does not require labels. Our confidence metric
combines two complementary notions of agreement within the tree ensemble: (i) the variance of the
raw anomaly scores, and (ii) the entropy of the induced binary anomaly votes. We use the score variance
to capture continuous agreement: if all trees assign similar scores, we treat the prediction as reliable,
regardless of whether the absolute score is high or low. However, relying on variance alone can be
brittle when a few trees produce outlier scores due to local partitioning artifacts. To mitigate this, we
also consider the entropy of binary votes obtained by thresholding each tree’s score, since entropy is
insensitive to the scale of the scores and instead measures the robustness of the decision: if most trees
agree on either “normal” or “anomaly”, the entropy is low and confidence is high. This term stabilizes
the confidence estimate in regions where the variance may be inflated by a small subset of disagreeing
trees. However, it is highly sensitive to the choice of the anomaly threshold, which we set here to 0.5.</p>
        <p>We compute the score-variance consistency as:</p>
        <p>Consist_var = 1 − Var_i(s_i(x_tab)),
where s_i(x_tab) is the score of tree i and the variance is scaled so that Consist_var lies in [0, 1].</p>
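        <p>The ensemble-based confidence can be sketched as follows. This is an illustrative Python fragment, not the authors’ implementation: the [0, 1] score range, the variance normalization by 0.25, and the final averaging of the two terms are assumptions drawn from the surrounding text.</p>

```python
import numpy as np

def ensemble_confidence(tree_scores, threshold=0.5):
    """Unsupervised confidence from agreement within the HST ensemble.

    Sketch only: assumes per-tree scores lie in [0, 1], so their variance
    is at most 0.25, which is used here to normalize Consist_var.
    """
    scores = np.asarray(tree_scores, dtype=float)
    # (i) continuous agreement: low score variance -> high consistency
    consist_var = 1.0 - np.var(scores) / 0.25
    # (ii) decision robustness: entropy of thresholded binary votes
    p = float(np.mean(scores > threshold))   # fraction voting "anomaly"
    if p == 0.0 or p == 1.0:
        consist_entr = 1.0                   # unanimous vote: zero entropy
    else:
        consist_entr = 1.0 + p * np.log2(p) + (1 - p) * np.log2(1 - p)
    # final confidence: average of the two consistency terms
    return 0.5 * (consist_var + consist_entr)
```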
        <p>Optionally, MOMENT may be fine-tuned on unlabeled data from similar domains, improving domain
alignment and enabling knowledge transfer to subsequent deployments.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.3. Confidence Router with Online Beta Distribution Fitting</title>
        <p>The router determines whether to trust the online learner (fast path), the foundation model (slow path),
or a weighted ensemble. For each model, we maintain a running estimate of its confidence distribution.</p>
        <p>
          The final per-instance confidence c_online ∈ [0, 1] of the online model is obtained by averaging the
two previous consistency metrics to preserve each one’s benefits. Using only one of the terms would
make the confidence overly sensitive either to outlier scores (variance only) or to the particular choice
of the voting threshold (entropy only). Using the minimum of the two would be overly conservative
in practice, leading the router to treat many instances as low-confidence and unnecessarily query the
foundation model, increasing computational load without clear gains in detection quality.
        </p>
        <p>Finally, since the online score is computed for every instance in the data stream, we track reliability
over time by applying an Exponential Moving Average (EMA):
c_online^t = λ · c_online^(t−1) + (1 − λ) · Consistency_t</p>
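        <p>A one-line sketch of the EMA update; the smoothing factor λ is an assumed value, not specified in this excerpt.</p>

```python
def ema_update(prev_conf, consistency, lam=0.9):
    """One EMA step tracking the online model's reliability over time.

    lam (the smoothing factor lambda) is an illustrative default.
    """
    return lam * prev_conf + (1.0 - lam) * consistency
```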
        <p>v_i(x_tab) = 1[s_i(x_tab) &gt; τ],   p = (1/n) Σ_{i=1}^{n} v_i(x_tab),
Consist_entr = 1 + (p log₂ p + (1 − p) log₂(1 − p))</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.2. Background Learner: MOMENT Foundation Model</title>
        <p>
          The background learner is MOMENT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a time-series foundation model pre-trained on large-scale
multivariate sensor datasets. MOMENT consumes the raw time series x_seq (padded to a length divisible
by 8), producing a reconstruction x̂_seq.
        </p>
        <p>To estimate the confidence of the model, we apply a perturbation-based analysis: we inject small
Gaussian noise into K − 1 copies of x_seq, producing K reconstructions in one parallel forward
pass, which does not increase the time complexity of the model compared to reconstructing only the
original time series. The confidence is obtained from the standard deviation across the K reconstruction
errors, with low reconstruction variability meaning high confidence:
c_fm = exp(−σ(e_fm,1:K))</p>
        <p>For each model, the mean and variance of the observed confidences are used with the method of
moments to fit a Beta(α, β) distribution.</p>
        <p>From these Beta models, we compute a dynamic trust threshold for each model, τ_online and τ_fm
respectively, as the q-th percentile:</p>
        <p>τ_m = Beta^(−1)(q; α_m, β_m),</p>
        <p>which selects predictions falling in the top-q percentile region of each model’s own historical confidence
range. The routing rule always first obtains the output of the online model (s_online, c_online), which is
cheap to compute: each of the t trees is traversed in time logarithmic in its size. The confidence is then
evaluated against its percentile, c_online &gt; τ_online, meaning that it lies in the top region of that model’s
typical confidence. If this condition is true, the system’s output is the online anomaly score s_online. Only
when this condition is not met does the router query the background model, whose confidence is evaluated
against its own percentile following the same principle: c_fm &gt; τ_fm. If this condition is true, the system’s
output is the background anomaly score s_fm. When neither model achieves the desired percentile,
the final score is obtained through confidence-weighted ensembling:
s = w · s_online + (1 − w) · s_fm,
where w is adapted to the relative confidence of the two learners:
w = c_online / (c_online + c_fm)</p>
        <p>This makes the routing procedure scale-invariant across models and robust to drift; unsupervised,
since it does not depend on labels; and efficient, as it does not store historical windows and has constant
memory and computation.</p>
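        <p>The routing procedure can be sketched end-to-end as follows. This is an illustrative Python fragment: function and parameter names are our own, and SciPy’s Beta inverse CDF stands in for Beta^(−1).</p>

```python
import numpy as np
from scipy.stats import beta as beta_dist

def fm_confidence(recon_errors):
    """Perturbation-based confidence of the background model:
    c_fm = exp(-std of the K reconstruction errors)."""
    return float(np.exp(-np.std(np.asarray(recon_errors, dtype=float))))

def beta_threshold(confidences, q):
    """Dynamic trust threshold: fit Beta(a, b) to observed confidences
    by the method of moments and return the q-th percentile."""
    c = np.clip(np.asarray(confidences, dtype=float), 1e-6, 1 - 1e-6)
    mu, var = c.mean(), c.var()
    k = mu * (1.0 - mu) / var - 1.0      # method-of-moments common factor
    a, b = mu * k, (1.0 - mu) * k
    return beta_dist.ppf(q, a, b)        # tau_m = Beta^{-1}(q; a_m, b_m)

def route(s_online, c_online, tau_online, query_fm, tau_fm):
    """Fast path / slow path / confidence-weighted ensemble.

    query_fm is a callable returning (s_fm, c_fm); the background model
    is only consulted when the online confidence misses its threshold.
    """
    if c_online > tau_online:                 # fast path: trust online model
        return s_online
    s_fm, c_fm = query_fm()                   # slow path: consult MOMENT
    if c_fm > tau_fm:
        return s_fm
    w = c_online / (c_online + c_fm)          # relative-confidence weight
    return w * s_online + (1.0 - w) * s_fm    # weighted ensemble
```

Note that `query_fm` is passed as a callable precisely so the expensive foundation-model forward pass is skipped on the fast path.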
        <p>However, the efficiency of the model depends heavily on the configuration of these thresholds. In
particular, the one associated with the online model, τ_online, essentially controls how likely the router is
to consult the background model for each new observation. Therefore, this parameter should be set in
relation to the arrival rate of new instances, in a range that allows the model to minimize calls
to the background model and avoid bottlenecks. For example, setting q_online = 0.3 allows the model to
trust the top 70% of the observed confidences.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Case Study: Industrial Hoist Installations</title>
      <sec id="sec-5-1">
        <title>4.1. Problem Description</title>
        <p>We evaluate the system on two distinct industrial elevator (hoist) installations in high-rise construction
environments. Each instance corresponds to a full hoist ride, represented both as a multivariate time
series and a derived feature vector for the online model. The details of each installation are presented in Table 1.</p>
        <p>Across both installations, the average ride duration is approximately 25 seconds, but it varies
substantially with construction progress, as the hoist load, travel height, and acceleration phases change over
time. Each ride provides two complementary data views, both of a multivariate time series nature:
• Mechanical sensors related to vibration intensity and impulsiveness (-RMS, -Peak, -RMS),
crest factor (shock-dominated behavior), and operating temperature.
• Programmable Logic Controller (PLC) sensors that include motor electrical variables (current,
torque, frequency, voltage), mechanical variables (position, speed, brake state), and drive and
motor temperature estimates.
• Tabular descriptor for the online learner: the multivariate time series is summarized in statistical
features (mean, std, min, max, iqr), time-domain features (rms, skewness, kurtosis),
frequency-domain features (dominant, weighted average...), and trend features (trend slope and intercept).</p>
        <p>In total, the tabular representation of the series has 302 features.
• Raw multivariate time series for the background learner: the original sequence is fed to the
foundation model for reconstruction almost as it is, preserving all temporal dynamics. However,
because elevator rides are multivariate and nonstationary at startup and shutdown, we compute the
anomaly score using the central 80% of the ride:
e_fm = (1 / (0.8 T)) Σ_{t=⌊0.1T⌋}^{⌊0.9T⌋} ‖x_seq,t − x̂_seq,t‖²,
where T is the ride length. This choice does not remove anomalous behavior, but instead addresses
two practical issues: (i) MOMENT reconstructions exhibit systematic edge artifacts due to chunk-based
patching and padding requirements, which inflate the reconstruction error at the beginning
and end of each sequence independently of the true system state. (ii) The physical hoist presents
transient behavior during brake release, initial acceleration, and final deceleration. These short
intervals show large but normal fluctuations in vibration and torque, which would dominate
the reconstruction loss and lead to false positives without improving anomaly separability.
Importantly, only the foundation-model score applies this trimming; the online learner continues
to use a full representation of the ride. Thus, trimming reduces model-induced noise without
discarding diagnostically relevant information.</p>
        <p>After this, following MOMENT’s chunk-based transformer architecture, the sequence is
zero-padded to the nearest multiple of 8, and this extra portion is masked so that it does not affect the
reconstruction-error metric. We configure this metric as the Mean Squared Error (MSE).</p>
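        <p>The trimmed ride-level score can be sketched as follows. This is an illustrative Python fragment assuming a ride layout of shape (T, d); it is not the authors’ implementation.</p>

```python
import numpy as np

def trimmed_recon_error(x, x_hat):
    """MSE over the central 80% of a ride.

    x, x_hat: arrays of shape (T, d) holding the ride and its
    reconstruction; the first and last 10% of time steps are dropped
    to avoid edge artifacts and normal start/stop transients.
    """
    T = x.shape[0]
    a, b = int(np.floor(0.1 * T)), int(np.floor(0.9 * T))
    per_step = np.sum((x[a:b] - x_hat[a:b]) ** 2, axis=1)  # squared error per step
    return float(per_step.mean())
```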
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Experimental Settings</title>
        <p>
          The proposed dual-learner system is implemented in the CapyMOA streaming analytics framework [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ],
and integrates the MOMENT-base implementation provided by the authors [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The experiments are
evaluated in a fully unsupervised, prequential (test-then-train) setting. Each ride is evaluated when it
arrives, producing an anomaly score before any model update. The true label, when available for offline
inspection, is not used for training or thresholding. This evaluation protocol matches real deployment
conditions, where anomalies are rare, labels are delayed, and models must adapt online.
        </p>
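        <p>The prequential protocol can be sketched as a simple loop. The `score_one` / `learn_one` method names follow common streaming-library conventions and are assumptions, not the exact CapyMOA API.</p>

```python
def prequential(stream, model):
    """Prequential (test-then-train) evaluation loop.

    `model` is any detector exposing score_one / learn_one; each
    instance is scored before it is used for any model update.
    """
    scores = []
    for x in stream:
        scores.append(model.score_one(x))  # test: score before updating
        model.learn_one(x)                 # then train: update the model
    return scores
```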
        <p>
          We compare our method against widely used streaming anomaly detectors available in CapyMOA:
• Half-Space Trees (HST) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
• Robust Random Cut Forest (RRCF) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
• StreamRHF [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
• Streaming Isolation Forest (SIF) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
No per-dataset hyperparameter optimization is performed. Instead, we use default configurations
and unify parameters where applicable (e.g., number of trees = 25 in all cases) to ensure a fair and
deployment-realistic comparison. For our approach, the parameters related to HST and to MOMENT
are kept as proposed by their authors. Apart from these, the only relevant parameters are the
aforementioned percentile-based confidence thresholds, which are set to q_online = 0.1 and q_fm = 0.3, and a
warm-start parameter specifying that for the first n = 100 observations the foundation model is
used, until the online learner has built a stable model.
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Evaluation Scenarios</title>
        <p>Performance is assessed using offline ground-truth inspection logs and event reports from maintenance
records, but these are never used during learning. Metrics reported include both anomaly ranking
quality (AUC-PR, AUC-ROC) and alarm efficiency (F1-score under a fixed anomaly threshold of 0.5).
We evaluate our approach, Dual Anomaly Detection (DAD), in two modes:
1. Zero-shot: the foundation model is used without adaptation. We refer to this as DADzs.
2. Cross-Domain Fine-Tuning: the foundation model is fine-tuned on Installation A and tested
on Installation B. We refer to this as DADft.
This setting tests whether knowledge gained from one hoist installation can improve anomaly detection
in another—an important capability for scalable industrial monitoring systems.</p>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Experimental Results</title>
        <p>Table 2 summarizes the performance of all methods across both hoist installations. Several trends
highlight the strengths of the proposed dual-learner system and the role of cross-installation adaptation.
First, the zero-shot configuration DADzs achieves competitive performance on both installations,
matching or surpassing the best-performing baselines. On Installation A, it provides the highest
AUC-PR (0.1933) and AUC-ROC (0.6148), with a slight improvement in F1-score over the nearest alternative
(HST). On Installation B, DADzs achieves performance comparable to the best baseline configurations.
This suggests that the representation and reconstruction mechanisms of MOMENT, in combination
with the online learner, generalize to new data distributions without explicit model tuning.</p>
        <p>However, the benefits of cross-domain fine-tuning are especially visible in the second installation.
DADft, which incorporates adaptation from Installation A to Installation B, improves both AUC-PR (from
0.3454 to 0.3591) and AUC-ROC (from 0.4670 to 0.5018). This improvement is especially relevant because
Installation B has a higher anomaly ratio (36% vs. 15%): The presence of more operational irregularities
increases the variance of the reconstruction error distributions, making threshold selection more
challenging for simpler detectors, but fine-tuning on normal data from previous projects recalibrates
the reconstruction baseline toward the characteristics of the new system, improving separability of
anomalous behavior. This supports the hypothesis that foundation models for industrial time-series
benefit from lightweight adaptation rather than from being applied in a completely zero-shot manner.</p>
        <p>In terms of execution time, methods such as HST and SIF remain the fastest, making them suitable
for lower-resource deployments. However, their performance is consistently lower, particularly in
Installation A, where they present limited ability to capture subtle deviations. In contrast, classical
random-cut forest methods (RRCF, StreamRHF) show significantly higher computational cost (up to
160k seconds in the largest setting), making them impractical for real-time monitoring. DADzs and
DADft fall between these two extremes. While their execution times (55–180 seconds) are higher than
the simplest models, they remain easily compatible with real-time operation, since processing a ride
occurs after acquisition and well within cycle times of industrial maintenance systems. Crucially, the
added computational cost translates into improved modeling of temporal dynamics and more accurate
anomaly scoring.</p>
        <p>The comparison between installations underlines the differences in data complexity. Installation B has
nearly three times more rides and more anomalies, both due to more prolonged operation and evolving
mechanical wear during the later construction phase. Models that rely solely on online learning can
struggle to maintain precision as the normal behavior drifts. The dual-learner architecture mitigates
this through continual online adaptation while still preserving a stable baseline representation via the
foundation model.</p>
        <p>Finally, it is important to note that the ground-truth labels used for evaluation in this real-world case
study derive from maintenance reports and operator annotations, which do not always correspond to
precise or isolated temporal events. In many cases, an anomaly label may describe a period of degraded
performance spanning multiple rides rather than a sharply defined instance. This misalignment between
real anomaly manifestation and discrete ride-level labels introduces uncertainty into the calculated
metrics. Improving label granularity remains an open challenge in industrial predictive maintenance
and is a direction for future work.</p>
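        <p>The labeling issue described above can be made concrete with a small hypothetical example (the timestamps, period boundaries, and helper below are invented for illustration): a maintenance report yields a degraded period, which is flattened into per-ride binary labels even though not every ride inside the window is necessarily anomalous.</p>

```python
# Hypothetical rides with timestamps, and one reported degraded period (start, end).
rides = [{"id": i, "t": 100 * i} for i in range(10)]
reports = [(250, 620)]

def ride_labels(rides, reports):
    """Label a ride anomalous iff its timestamp falls inside any reported period."""
    return [int(any(r["t"] >= start and end >= r["t"] for start, end in reports))
            for r in rides]

print(ride_labels(rides, reports))  # → [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

        <p>Every ride in the window inherits the anomaly label, which is exactly the granularity mismatch that blurs ride-level metrics.</p>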
      </sec>
      <sec id="sec-5-5">
        <title>4.5. Model Behavior across the Data Stream</title>
        <p>Figure 2 illustrates the dynamics of the dual-learner system during the full prequential evaluation of
Installation A. The top panel shows the ground-truth anomalous periods, which appear in
concentrated bursts corresponding to well-documented mechanical degradation phases and maintenance
interventions, together with the anomaly score produced by DAD, which closely follows changes in
system behavior and increases during annotated anomalous intervals, indicating effective sensitivity to
deteriorating ride dynamics. Similarly, Figure 3 shows the dynamics of the model for Installation B.</p>
        <p>The most informative aspect is the second panel, which shows the model selection mechanism:
whether the final anomaly score arises from the background model (0), the online model (2), or their
ensemble combination (1). In particular, for Installation A there are 22,877 instances scored by the
online model, 690 by the background model and 204 by the ensemble; in Installation B the corresponding
counts are 51,119, 137 and 9,434. In both installations, at the beginning of the stream, the system
relies almost exclusively on the background model, since the online one may be affected by the cold-start
problem. As the hoist behavior gradually evolves due to increased load, wear and environmental
conditions, the online model becomes more influential, reflecting the increasing need for local adaptation.
The switching pattern is neither abrupt nor purely periodic, but instead responds to changes in the
statistical consistency of incoming data. Meanwhile, the online model's confidence (third panel) starts
relatively high but slowly decreases as operational variability grows, before stabilizing in a regime
where both learners contribute. This coordinated interplay demonstrates the intended function of the
dual-learner architecture: the background model provides robust performance to prevent overreacting
to early noise, while the online model specializes to installation-specific dynamics and provides fast
predictions. When anomalous periods occur, the online learner's confidence drops, indicating that
these rides deviate from learned local behavior and possibly also from the broader expected operational
patterns encoded by the foundation model, as seen in Installation B, where the model resorts to
the ensemble a considerable number of times. In this case, given the high confidence that the
background model exhibits most of the time, this may be due to fine-tuning on the previous
installation overfitting the model and making it less effective on new anomalies from the second
installation.</p>
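        <p>The routing behavior described in this section can be sketched in a few lines of Python (the thresholds, scores, and confidence trajectory are invented for illustration and are not the paper's exact rule): each ride is scored by the background model (0), the ensemble (1), or the online model (2), depending on the online learner's current confidence.</p>

```python
import numpy as np

HIGH, LOW = 0.8, 0.4  # hypothetical confidence thresholds

def route(score_bg, score_online, confidence):
    """Return (source_label, anomaly_score) for one ride."""
    if confidence >= HIGH:   # online model trusted: fast, locally adapted score
        return 2, score_online
    if confidence >= LOW:    # intermediate confidence: average both learners
        return 1, 0.5 * (score_bg + score_online)
    return 0, score_bg       # cold start or drift: fall back to the background model

rng = np.random.default_rng(1)
sources = []
for t in range(1000):
    # Toy confidence trajectory: grows as the online model sees more rides.
    conf = min(1.0, t / 400 + rng.normal(0.0, 0.05))
    src, _ = route(rng.random(), rng.random(), conf)
    sources.append(src)

print(sources[:5], sources[-5:])  # background model early, online model late
```

        <p>In the actual system, the analogous three-way split is what produces the per-installation counts reported above, with the proportions governed by how confidence evolves over the stream.</p>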
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>This work introduced a dual-learner architecture for unsupervised anomaly detection in industrial
streaming environments. The system integrates a fast online model with a time-series foundation
model through a confidence-based routing mechanism that adapts dynamically to changes in data
distribution. In this way, the online learner and the foundation model carry complementary strengths:
the former adapts quickly to local conditions, while the latter provides a stable representation of normal
operation learned from large-scale data. The confidence estimation and routing strategy allows the
system to balance these strengths without requiring labels or maintaining a historical buffer. Experiments
on two industrial elevator installations showed that the proposed approach performs competitively
in a fully unsupervised and prequential evaluation. The zero-shot configuration demonstrates strong
generalization to new deployments, while fine-tuning the foundation model on one installation and
transferring it to another yields further improvements. These findings confirm that foundation models
can act as reusable priors for industrial streaming tasks, reducing the need for manual configuration
and prior training data.</p>
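        <p>For readers unfamiliar with the prequential protocol used in the experiments, the loop below sketches its test-then-train structure on synthetic data, with a running-mean distance detector standing in for the actual models (everything here is illustrative):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
stream = rng.normal(0.0, 1.0, size=(500, 8))  # synthetic ride feature vectors

mean = np.zeros(8)
n = 0
scores = []
for x in stream:
    # 1) Test first: score the incoming ride with the model as it currently stands.
    scores.append(float(np.linalg.norm(x - mean)))
    # 2) Then train: update the model (here, a running mean) with the same ride.
    n += 1
    mean += (x - mean) / n

print(len(scores))  # one unsupervised score per ride, no held-out set needed
```

        <p>Because every instance is scored before it is used for learning, the protocol needs no separate test set and no labels, which is what makes the evaluation fully unsupervised.</p>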
      <p>While still preliminary, this study suggests that integrating online learners with foundation
models is a viable path toward scalable and reliable industrial data-stream systems. However, there is
still room for improvement in terms of validation across a wider range of domains. Future work will
explore integration with other promising architectures for time-series analysis, such as diffusion models,
active learning strategies for selective labeling under minimal supervision, and broader transfer
across different domains and types of machinery.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This project is funded by the Swedish Knowledge Foundation (KKS).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and DeepL for grammar and
spelling checking. After using these tools, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] V. Rodriguez-Fernandez, D. Camacho, Recent trends and advances in machine learning challenges and applications for industry 4.0, Expert Systems 41 (2024) e13506.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] L. Correia, J.-C. Goos, P. Klein, T. Bäck, A. V. Kononova, Online model-based anomaly detection in multivariate time series: Taxonomy, survey, research challenges and future directions, Engineering Applications of Artificial Intelligence 138 (2024) 109323. doi:10.1016/j.engappai.2024.109323.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] W. Hao, T. Yang, Q. Yang, Hybrid statistical-machine learning for real-time anomaly detection in industrial cyber-physical systems, IEEE Transactions on Automation Science and Engineering 20 (2021) 32-46.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Kahneman, Thinking, fast and slow, Macmillan, 2011.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. C. Tan, K. M. Ting, T. F. Liu, Fast anomaly detection for streaming data, in: 22nd International Joint Conference on Artificial Intelligence, 2011, pp. 1511-1516. doi:10.5591/978-1-57735-516-8/IJCAI11-254.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] M. Goswami, K. Szafer, A. Choudhry, Y. Cai, S. Li, A. Dubrawski, MOMENT: A family of open time-series foundation models, in: 41st International Conference on Machine Learning (ICML 2024), 2024, pp. 1-16.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] S. Guha, N. Mishra, G. Roy, O. Schrijvers, Robust random cut forest based anomaly detection on streams, in: 33rd International Conference on Machine Learning (ICML 2016), 2016, pp. 3987-3999.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Nesic, A. Putina, M. Bahri, A. Huet, J. M. Navarro, D. Rossi, M. Sozio, StreamRHF: Tree-based unsupervised anomaly detection for data streams, in: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), volume 2022-December, 2022, pp. 1-8. doi:10.1109/AICCSA56895.2022.10017876.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] F. Leveni, G. W. Cassales, B. Pfahringer, A. Bifet, G. Boracchi, Online isolation forest, in: 41st International Conference on Machine Learning (ICML 2024), 2024, pp. 27288-27298.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. J. Liu, G. W. Cassales, F. T. Liu, B. Pfahringer, A. Bifet, Streaming isolation forest, in: Advances in Knowledge Discovery and Data Mining, 2025, pp. 95-107. doi:10.1007/978-981-96-8170-9_8.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] M. Odiathevar, W. K. G. Seah, M. Frean, A hybrid online offline system for network anomaly detection, in: 2019 28th International Conference on Computer Communication and Networks (ICCCN), 2019, pp. 1-9. doi:10.1109/ICCCN.2019.8847011.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] L. Cerdà-Alabern, G. Iuhasz, G. Gemmi, Anomaly detection for fault detection in wireless community networks using machine learning, Computer Communications 202 (2023) 191-203. doi:10.1016/j.comcom.2023.02.019.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. Nsor, Predictive maintenance using machine learning for engineering systems through real-time sensor data and anomaly detection models, International Journal of Research Publication and Reviews 6 (2025) 5167-5183. doi:10.55248/gengpi.6.0725.2541.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] V. Ekambaram, A. Jati, P. Dayama, S. Mukherjee, N. H. Nguyen, W. M. Gifford, C. Reddy, J. Kalagnanam, Tiny time mixers (TTMs): Fast pre-trained models for enhanced zero/few-shot forecasting of multivariate time series, in: 38th Conference on Neural Information Processing Systems (NeurIPS 2024), 2024.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] V. Feofanov, S. Wen, M. Alonso, R. Ilbert, H. Guo, M. Tiomoko, L. Pan, J. Zhang, I. Redko, Mantis: Lightweight calibrated foundation model for user-friendly time series classification, in: 1st ICML Workshop on Foundation Models for Structured Data (FMSD @ ICML 2025), 2025, pp. 1-21.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] H. M. Gomes, A. Lee, N. Gunasekara, Y. Sun, G. W. Cassales, J. J. Liu, M. Heyden, V. Cerqueira, M. Bahri, Y. S. Koh, B. Pfahringer, A. Bifet, CapyMOA: Efficient machine learning for data streams in Python, 2025. URL: https://arxiv.org/abs/2502.07432. arXiv:2502.07432.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>