<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Online Learning Supported by Foundation Models for Anomaly Detection in Industrial Settings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aurora Esteban Toscano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sepideh Pashami</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Nilsson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luka Smeets</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sławomir Nowaczyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Applied Intelligent Systems Research (CAISR), Halmstad University</institution>
          ,
          <addr-line>Halmstad</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>HMS Industrial Networks AB</institution>
          ,
          <addr-line>Halmstad</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>HQ RAXTAR</institution>
          ,
          <addr-line>Veldhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Modern industrial monitoring systems must detect anomalies in real time under evolving operating conditions and without reliance on labeled data. Traditional online anomaly detectors offer fast adaptation but struggle when normal behavior shifts or when rare anomalies are unintentionally learned as normal. On the other hand, recently introduced foundation models for time series capture richer structure but are computationally expensive for continuous deployment. We propose a dual-learner anomaly detection framework that bridges a fast online learner based on Half-Space Trees with a time-series foundation model (MOMENT) acting as a background learner. A confidence-based routing mechanism determines, for each incoming instance, whether to trust the online model, defer to the foundation model, or combine both through confidence-weighted ensembling. The confidence estimation method is fully unsupervised and robust to drift, requiring no labels or sliding windows. We validate the approach on two real-world elevator (hoist) installations, demonstrating that the system operates efficiently in streaming conditions and matches or surpasses strong online baselines. Furthermore, we show that fine-tuning the foundation model on one installation provides measurable performance gains when transferred to a different installation, indicating that foundation-model adaptation can support cross-site knowledge transfer in industrial monitoring. The results highlight the promise of integrating online learning with foundation models to achieve both responsiveness and robustness in long-term industrial anomaly detection.</p>
      </abstract>
      <kwd-group>
<kwd>Streaming Machine Learning</kwd>
        <kwd>Time series analysis</kwd>
        <kwd>Transfer learning</kwd>
        <kwd>Anomaly Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern industrial systems generate large volumes of sensor measurements at high frequency, often
under continuously evolving operating conditions. In these environments, online learning is particularly
attractive because models must process data as it arrives, without relying on long-term data storage or
expensive offline retraining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. At the same time, real-world deployment constraints require minimal
supervision, fast adaptation, and robustness to distributional shifts.
      </p>
      <p>
        A central challenge in this context is anomaly detection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Malfunctions are rare by definition,
leading to an extreme imbalance where only normal data is abundant. Furthermore, labels for anomalous
events are typically unavailable or arrive with significant delay, making fully supervised learning
impractical. As a result, semi-supervised or unsupervised approaches—particularly those that learn
normal behavior and detect deviations—are widely used. Common online anomaly detectors such
as statistical models and incremental tree ensembles (e.g., Half-Space Trees) offer fast inference and
incremental updates, but their performance degrades in complex industrial environments due to high
dimensionality, varying anomaly signatures, and model contamination when anomalous instances are
unintentionally incorporated into training [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        To address these limitations, we propose a dual-learner system that bridges fast online anomaly
detection with the representational capacity of a time-series foundation model. Our design is inspired
by the “fast and slow thinking” framework from cognitive psychology [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: a lightweight, adaptive
online model provides rapid decisions under normal conditions (“think fast”), while a more expressive
but computationally heavier background model (“think slow”) is invoked when additional reasoning is
needed. Specifically, we combine:
• Half-Space Trees [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as an efficient online anomaly detector capable of processing high-speed
industrial data streams.
• MOMENT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a large-scale foundation model for time series analysis, used as a background
model to refine uncertain predictions and detect subtle temporal deviations.
      </p>
      <p>A confidence-driven routing mechanism governs the interaction between the two learners. The
online detector makes the initial prediction and assesses its reliability. If the confidence is low, the
system queries the foundation model. If both models exhibit uncertainty, their anomaly scores are
combined via confidence-weighted ensembling. This design enables real-time operation under limited
compute while leveraging the generalization ability of a large pre-trained model when necessary.</p>
      <p>A motivating application for this work is the monitoring of industrial elevator (hoist) systems used
in high-rise construction projects. These systems operate under rapidly evolving mechanical and
environmental conditions, leading to non-stationary sensor distributions and a mixture of subtle and
abrupt anomalies. Ride durations, loading patterns, and vibration signatures vary throughout the
construction process, making it difficult for purely online or purely offline approaches to remain reliable
over long periods. This domain highlights the need for a method that combines (i) fast
instance-by-instance adaptation to local changes and (ii) stable, domain-general representations that are robust to
drift and noise. While our proposed architecture is general and applicable to any multivariate
time-series stream, the industrial hoist scenario provides a concrete real-world environment in which these
challenges—and the benefits of our dual-learner design—naturally arise. Thus, we evaluate our system
on real industrial elevator monitoring data across two deployment projects. Additionally, we study the
effect of fine-tuning the foundation model on data from the first project and transferring the adapted
model to the second project. Our results demonstrate that foundation model adaptation improves the
performance and robustness of the online learner through indirect knowledge transfer, opening a new
direction for integrating online learning and foundation models in industrial applications.</p>
      <p>This work makes the following contributions:
• We introduce a dual-learner anomaly detection architecture that integrates a fast online model
with a foundation time-series model via confidence-based routing.
• We propose a dual confidence estimation method that is fully unsupervised and allows reliable
model-switching under scarce anomalies based on the consistency of the predictions produced
by both models.
• We demonstrate that fine-tuning the foundation model in one industrial project improves online
anomaly detection in a second project, revealing a pathway for domain-level transfer learning in
streaming applications.
• We validate the proposed approach on real-world industrial systems with high-rate streaming
sensor data.</p>
      <p>The rest of the paper is organized as follows. Section 2 reviews related work on foundation
models for time series and previous approaches in the field of online anomaly detection. Section 3
presents the proposed dual-learner architecture. Section 4 presents the case study used to validate the
proposal. Finally, Section 5 summarizes conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Online Anomaly Detection in Data Streams</title>
        <p>
          Online anomaly detection in data streams focuses on learning models incrementally while adapting to
evolving data distributions. Among the most influential approaches are ensemble-based tree models
designed for non-stationary data. Half-Space Trees (HST) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] introduced a streaming extension of
isolation-based anomaly detection, where random projections define partitions that adapt to new data
without explicit retraining. Similarly, Robust Random Cut Forest (RRCF) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] uses randomized
partitioning trees to detect points that cause large changes in the model structure, emphasizing robustness to
high-dimensional and noisy data streams. StreamRHF [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] extends this line by incorporating adaptive
memory management to maintain bounded model size. More recently, streaming variants of Isolation
Forest have been proposed. Online Isolation Forest (OIF) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] updates isolation trees incrementally
using reservoir-based subsampling to preserve model diversity, while Streaming Isolation Forest (SIF)
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] focuses on maintaining statistical representativeness under drift by integrating age-based sample
replacement. These models provide fast predictive performance, but their learning dynamics can
inadvertently absorb abnormal behavior as the concept of “normal” shifts over time.
        </p>
        <p>
          Hybrid systems combining streaming learners with offline or historical models have also been
explored. For example, the work of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] presents a hybrid system where an offline model retains
general characteristics (a bias) and an online model continuously learns and adapts. Other approaches
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] explore unsupervised online learning methods, which are inherently reactive but typically still
need a "warm-up" or initial training phase, or rely on a dynamically adapting threshold (such as
quantile-based filters). Particularly in industrial settings, several approaches include periodically retraining
autoencoders or clustering models on buffered data [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>Overall, these systems typically rely on scheduled batch retraining or assume stable long-term
distributions, limiting their responsiveness in complex industrial processes. In contrast, our approach
integrates an online detector with a foundation model in a fully instance-based, confidence-driven
routing scheme that requires no batching, no labels, and no hold-out reference windows.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Foundation Models for Time Series</title>
        <p>
          Foundation models for time series have recently emerged as a parallel to large pretrained models
in vision and language. MOMENT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] demonstrated that transformer-based architectures trained
on large-scale multivariate time series collections can be reused for reconstruction and forecasting
across domains, supporting zero-shot and fine-tuned anomaly detection. MOMENT introduced
chunk-based temporal tokenization to enable efficient learning of long-range dependencies while maintaining
moderate computation cost. Subsequent work has expanded foundation models to improve efficiency
and transferability. Tiny Time Mixers (TTM) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] proposed a lightweight architecture based on TSMixer
with adaptive patching and multi-resolution sampling. TTM models achieve strong zero- and few-shot
performance while remaining computationally efficient enough for CPU-only deployments, addressing
one of the main deployment barriers of foundation models. Likewise, Mantis [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] focuses on time series
classification, employing a ViT-based encoder trained with contrastive learning and equipped with
channel-adaptive adapters to reduce fine-tuning cost and improve calibration, highlighting the growing
interest in generalizable, reusable time-series backbones.
        </p>
        <p>Unlike forecasting- or classification-oriented foundation models, our use case requires instance-level
anomaly reconstruction under streaming conditions. The proposed dual-learner framework leverages
the high-level temporal priors encoded in MOMENT while retaining the flexibility and adaptability of
online learners. To our knowledge, this is the first work to integrate a time-series foundation model
directly within a real-time, unsupervised anomaly detection and routing mechanism.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Online Learning Supported by a Foundation Model for Anomaly Detection</title>
      <p>We propose a dual-learner anomaly detection framework that integrates a fast online detector with a
foundation model for time series as shown in Fig. 1. The system operates continuously on streaming data
and adapts its prediction strategy based on model confidence. The architecture consists of three main
components: (1) an online learner based on Half-Space Trees for real-time detection, (2) a background
learner based on the MOMENT foundation model for deep representation-driven anomaly scoring, and
(3) a confidence-based routing mechanism that dynamically selects or combines predictions from both
learners.</p>
      <p>We assume a streaming setting in which each incoming instance x_t is a multivariate time series of d
dimensions and variable length T. To support both fast online processing and deep temporal modeling,
the architecture uses two complementary representations: (i) a compact tabular descriptor derived from
statistical and spectral features, and (ii) the raw or lightly preprocessed time-series input consumed
directly by the foundation model. The processing pipeline for each new observation x_t ∈ R^(T×d) is
therefore:
x_tab ← statistical descriptors of x_t
x_seq ← x_t padded to the nearest multiple of 8 to match MOMENT’s patch embedding</p>
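      <p>The two-view pipeline above can be sketched as follows. This is a minimal Python illustration: the tabular descriptor uses only a small subset of per-channel statistics (the full descriptor has 302 features), and the ride layout (T, d) is an assumption.</p>

```python
import numpy as np

def make_views(x):
    """Build the two complementary views of a ride x of shape (T, d).

    Illustrative sketch: only a handful of the statistical descriptors
    mentioned in the text are computed here.
    """
    # (i) compact tabular descriptor: per-channel statistics, concatenated
    tab = np.concatenate([x.mean(axis=0), x.std(axis=0),
                          x.min(axis=0), x.max(axis=0)])
    # (ii) sequence view: zero-pad T up to the nearest multiple of 8,
    # matching MOMENT's patch-embedding requirement
    pad = (-x.shape[0]) % 8
    seq = np.pad(x, ((0, pad), (0, 0)))
    return tab, seq
```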
      <p>The proposed architecture is intentionally modular: each component addresses a specific limitation
of either pure online methods or pure foundation-model approaches. The online learner provides
fast streaming predictions; the foundation model contributes rich temporal priors; and the confidence
router manages when to query which model under a fixed computational budget. This modularity does
introduce several design choices (e.g., the online and background methods, the form of the confidence
metric, or routing thresholds), which we motivate in the following subsections.</p>
      <sec id="sec-4-1">
        <title>3.1. Online Learner: Half-Space Trees with Ensemble-Based Confidence</title>
        <p>
          The online learner is implemented using HST [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], an incremental anomaly detection model designed
for streaming environments. HST recursively partitions the feature space into nested half-spaces,
maintaining compact statistics that approximate data density in each region. Given an observation x_tab,
each tree i in the ensemble outputs a score s_i(x_tab), and the overall anomaly score is the average.
        </p>
        <p>In this setting, we introduce a confidence metric that minimizes the computational load and maintains
the original unsupervised learning approach, since it does not require labels. Our confidence metric
combines two complementary notions of agreement within the tree ensemble: (i) the variance of the
raw anomaly scores, and (ii) the entropy of the induced binary anomaly votes. We use the score variance
to capture continuous agreement: if all trees assign similar scores, we treat the prediction as reliable,
regardless of whether the absolute score is high or low. However, relying on variance alone can be
brittle when a few trees produce outlier scores due to local partitioning artifacts. To mitigate this, we
also consider the entropy of binary votes obtained by thresholding each tree’s score, since entropy is
insensitive to the scale of the scores and instead measures the robustness of the decision: if most trees
agree on either “normal” or “anomaly”, the entropy is low and confidence is high. This term stabilizes
the confidence estimate in regions where the variance may be inflated by a small subset of disagreeing
trees. However, it is highly sensitive to the choice of the anomaly threshold, which we set here to 0.5.</p>
        <p>We compute the score-variance consistency as:</p>
        <p>Consist_var = 1 − Var_i(s_i(x_tab)),
where s_i(x_tab) is the score of tree i and the variance is scaled so that Consist_var lies in [0, 1].</p>
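        <p>The ensemble-based confidence can be sketched as follows. This is an illustrative Python fragment, not the authors’ implementation: the [0, 1] score range, the variance normalization by 0.25, and the final averaging of the two terms are assumptions drawn from the surrounding text.</p>

```python
import numpy as np

def ensemble_confidence(tree_scores, threshold=0.5):
    """Unsupervised confidence from agreement within the HST ensemble.

    Sketch only: assumes per-tree scores lie in [0, 1], so their variance
    is at most 0.25, which is used here to normalize Consist_var.
    """
    scores = np.asarray(tree_scores, dtype=float)
    # (i) continuous agreement: low score variance -> high consistency
    consist_var = 1.0 - np.var(scores) / 0.25
    # (ii) decision robustness: entropy of thresholded binary votes
    p = float(np.mean(scores > threshold))   # fraction voting "anomaly"
    if p == 0.0 or p == 1.0:
        consist_entr = 1.0                   # unanimous vote: zero entropy
    else:
        consist_entr = 1.0 + p * np.log2(p) + (1 - p) * np.log2(1 - p)
    # final confidence: average of the two consistency terms
    return 0.5 * (consist_var + consist_entr)
```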
        <p>Optionally, MOMENT may be fine-tuned on unlabeled data from similar domains, improving domain
alignment and enabling knowledge transfer to subsequent deployments.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.3. Confidence Router with Online Beta Distribution Fitting</title>
        <p>The router determines whether to trust the online learner (fast path), the foundation model (slow path),
or a weighted ensemble. For each model, we maintain a running estimate of its confidence distribution.</p>
        <p>
          The final per-instance confidence c_online ∈ [0, 1] of the online model is obtained by averaging the
two previous consistency metrics to preserve each one’s benefits. Using only one of the terms would
make the confidence overly sensitive either to outlier scores (variance only) or to the particular choice
of the voting threshold (entropy only). Using the minimum of the two would be overly conservative
in practice, leading the router to treat many instances as low-confidence and unnecessarily query the
foundation model, increasing computational load without clear gains in detection quality.
        </p>
        <p>Finally, since the online score is computed for every instance in the data stream, we track reliability
over time by applying an Exponential Moving Average (EMA):
c_online^t = λ · c_online^(t−1) + (1 − λ) · Consistency_t</p>
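        <p>A one-line sketch of the EMA update; the smoothing factor λ is an assumed value, not specified in this excerpt.</p>

```python
def ema_update(prev_conf, consistency, lam=0.9):
    """One EMA step tracking the online model's reliability over time.

    lam (the smoothing factor lambda) is an illustrative default.
    """
    return lam * prev_conf + (1.0 - lam) * consistency
```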
        <p>v_i(x_tab) = 1[s_i(x_tab) &gt; τ],   p = (1/n) Σ_{i=1}^{n} v_i(x_tab),
Consist_entr = 1 + (p log₂ p + (1 − p) log₂(1 − p))</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.2. Background Learner: MOMENT Foundation Model</title>
        <p>
          The background learner is MOMENT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a time-series foundation model pre-trained on large-scale
multivariate sensor datasets. MOMENT consumes the raw time series x_seq (padded to a length divisible
by 8), producing a reconstruction x̂_seq.
        </p>
        <p>To estimate the confidence of the model, we apply a perturbation-based analysis: we inject small
Gaussian noise into K − 1 copies of x_seq, producing K reconstructions in one parallel forward
pass, which does not increase the time complexity of the model compared to reconstructing only the
original time series. The confidence is obtained from the standard deviation across the K reconstruction
errors, with low reconstruction variability meaning high confidence:
c_fm = exp(−σ(e_fm,1:K))</p>
        <p>For each model, the mean and variance of the observed confidences are used with the method of
moments to fit a Beta(α, β) distribution.</p>
        <p>From these Beta models, we compute a dynamic trust threshold for each model, τ_online and τ_fm
respectively, as the q-th percentile:</p>
        <p>τ_m = Beta^(−1)(q; α_m, β_m),</p>
        <p>which selects predictions falling in the top-q percentile region of each model’s own historical confidence
range. The routing rule always first obtains the output of the online model (s_online, c_online), which is
cheap to compute: each of the t trees is traversed in time logarithmic in its size. The confidence is then
evaluated against its percentile, c_online &gt; τ_online, meaning that it lies in the top region of that model’s
typical confidence. If this condition is true, the system’s output is the online anomaly score s_online. Only
when this condition is not met does the router query the background model, whose confidence is evaluated
against its own percentile following the same principle: c_fm &gt; τ_fm. If this condition is true, the system’s
output is the background anomaly score s_fm. When neither model achieves the desired percentile,
the final score is obtained through confidence-weighted ensembling:
s = w · s_online + (1 − w) · s_fm,
where w is adapted to the relative confidence of the two learners:
w = c_online / (c_online + c_fm)</p>
        <p>This makes the routing procedure scale-invariant across models and robust to drift; unsupervised,
since it does not depend on labels; and efficient, as it does not store historical windows and has constant
memory and computation.</p>
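        <p>The routing procedure can be sketched end-to-end as follows. This is an illustrative Python fragment: function and parameter names are our own, and SciPy’s Beta inverse CDF stands in for Beta^(−1).</p>

```python
import numpy as np
from scipy.stats import beta as beta_dist

def fm_confidence(recon_errors):
    """Perturbation-based confidence of the background model:
    c_fm = exp(-std of the K reconstruction errors)."""
    return float(np.exp(-np.std(np.asarray(recon_errors, dtype=float))))

def beta_threshold(confidences, q):
    """Dynamic trust threshold: fit Beta(a, b) to observed confidences
    by the method of moments and return the q-th percentile."""
    c = np.clip(np.asarray(confidences, dtype=float), 1e-6, 1 - 1e-6)
    mu, var = c.mean(), c.var()
    k = mu * (1.0 - mu) / var - 1.0      # method-of-moments common factor
    a, b = mu * k, (1.0 - mu) * k
    return beta_dist.ppf(q, a, b)        # tau_m = Beta^{-1}(q; a_m, b_m)

def route(s_online, c_online, tau_online, query_fm, tau_fm):
    """Fast path / slow path / confidence-weighted ensemble.

    query_fm is a callable returning (s_fm, c_fm); the background model
    is only consulted when the online confidence misses its threshold.
    """
    if c_online > tau_online:                 # fast path: trust online model
        return s_online
    s_fm, c_fm = query_fm()                   # slow path: consult MOMENT
    if c_fm > tau_fm:
        return s_fm
    w = c_online / (c_online + c_fm)          # relative-confidence weight
    return w * s_online + (1.0 - w) * s_fm    # weighted ensemble
```

Note that `query_fm` is passed as a callable precisely so the expensive foundation-model forward pass is skipped on the fast path.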
        <p>However, the efficiency of the model depends heavily on the configuration of these thresholds. In
particular, the one associated with the online model, τ_online, essentially controls how likely the router is
to consult the background model for each new observation. Therefore, this parameter should be set in
relation to the arrival rate of new instances, in a range that allows the model to minimize calls
to the background model and avoid bottlenecks. For example, setting q_online = 0.3 allows the model to
trust the top 70% of the observed confidences.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Case Study: Industrial Hoist Installations</title>
      <sec id="sec-5-1">
        <title>4.1. Problem Description</title>
        <p>We evaluate the system on two distinct industrial elevator (hoist) installations in high-rise construction
environments. Each instance corresponds to a full hoist ride, represented both as a multivariate time
series and a derived feature vector for the online model. The details of each installation are presented in Table 1.</p>
        <p>Across both installations, the average ride duration is approximately 25 seconds, but it varies
substantially with construction progress, as the hoist load, travel height, and acceleration phases change over
time. Each ride provides two complementary data views, both of a multivariate time series nature:
• Mechanical sensors related to vibration intensity and impulsiveness (-RMS, -Peak, -RMS),
crest factor (shock-dominated behavior), and operating temperature.
• Programmable Logic Controller (PLC) sensors that include motor electrical variables (current,
torque, frequency, voltage), mechanical variables (position, speed, brake state), and drive and
motor temperature estimates.
• Tabular descriptor for the online learner: the multivariate time series is summarized in statistical
features (mean, std, min, max, iqr), time-domain features (rms, skewness, kurtosis),
frequency-domain features (dominant, weighted average...), and trend features (trend slope and intercept).</p>
        <p>In total, the tabular representation of the series has 302 features.
• Raw multivariate time series for the background learner: the original sequence is fed to the
foundation model for reconstruction almost as it is, preserving all temporal dynamics. However,
because elevator rides are multivariate and nonstationary at startup and shutdown, we compute the
anomaly score using the central 80% of the ride:
e_fm = (1 / (0.8 T)) Σ_{t=⌊0.1T⌋}^{⌊0.9T⌋} ‖x_seq,t − x̂_seq,t‖²,
where T is the ride length. This choice does not remove anomalous behavior, but instead addresses
two practical issues: (i) MOMENT reconstructions exhibit systematic edge artifacts due to chunk-based
patching and padding requirements, which inflate the reconstruction error at the beginning
and end of each sequence independently of the true system state. (ii) The physical hoist presents
transient behavior during brake release, initial acceleration, and final deceleration. These short
intervals show large but normal fluctuations in vibration and torque, which would dominate
the reconstruction loss and lead to false positives without improving anomaly separability.
Importantly, only the foundation-model score applies this trimming; the online learner continues
to use a full representation of the ride. Thus, trimming reduces model-induced noise without
discarding diagnostically relevant information.</p>
        <p>After this, following MOMENT’s chunk-based transformer architecture, the sequence is
zero-padded to the nearest multiple of 8, and this extra portion is masked so that it does not affect the
reconstruction-error metric. We configure this metric as the Mean Squared Error (MSE).</p>
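        <p>The trimmed ride-level score can be sketched as follows. This is an illustrative Python fragment assuming a ride layout of shape (T, d); it is not the authors’ implementation.</p>

```python
import numpy as np

def trimmed_recon_error(x, x_hat):
    """MSE over the central 80% of a ride.

    x, x_hat: arrays of shape (T, d) holding the ride and its
    reconstruction; the first and last 10% of time steps are dropped
    to avoid edge artifacts and normal start/stop transients.
    """
    T = x.shape[0]
    a, b = int(np.floor(0.1 * T)), int(np.floor(0.9 * T))
    per_step = np.sum((x[a:b] - x_hat[a:b]) ** 2, axis=1)  # squared error per step
    return float(per_step.mean())
```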
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Experimental Settings</title>
        <p>
          The proposed dual-learner system is implemented in the CapyMOA streaming analytics framework [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ],
and integrates the MOMENT-base implementation provided by the authors [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The experiments are
evaluated in a fully unsupervised, prequential (test-then-train) setting. Each ride is evaluated when it
arrives, producing an anomaly score before any model update. The true label, when available for offline
inspection, is not used for training or thresholding. This evaluation protocol matches real deployment
conditions, where anomalies are rare, labels are delayed, and models must adapt online.
        </p>
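        <p>The prequential protocol can be sketched as a simple loop. The `score_one` / `learn_one` method names follow common streaming-library conventions and are assumptions, not the exact CapyMOA API.</p>

```python
def prequential(stream, model):
    """Prequential (test-then-train) evaluation loop.

    `model` is any detector exposing score_one / learn_one; each
    instance is scored before it is used for any model update.
    """
    scores = []
    for x in stream:
        scores.append(model.score_one(x))  # test: score before updating
        model.learn_one(x)                 # then train: update the model
    return scores
```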
        <p>
          We compare our method against widely used streaming anomaly detectors available in CapyMOA:
• Half-Space Trees (HST) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
• Robust Random Cut Forest (RRCF) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
• StreamRHF [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
• Streaming Isolation Forest (SIF) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
No per-dataset hyperparameter optimization is performed. Instead, we use default configurations
and unify parameters where applicable (e.g., number of trees = 25 in all cases) to ensure a fair and
deployment-realistic comparison. For our approach, the parameters related to HST and to MOMENT
are kept as proposed by their authors. Apart from these, the only relevant parameters are the
aforementioned percentile-based confidence thresholds, which are set to q_online = 0.1 and q_fm = 0.3, and a
warm-start parameter specifying that for the first n = 100 observations the foundation model is
used, until the online learner has built a stable model.
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Evaluation Scenarios</title>
        <p>Performance is assessed using offline ground-truth inspection logs and event reports from maintenance
records, but these are never used during learning. Metrics reported include both anomaly ranking
quality (AUC-PR, AUC-ROC) and alarm efficiency (F1-score under a fixed anomaly threshold of 0.5).
We evaluate our approach, Dual Anomaly Detection (DAD), in two modes:
1. Zero-shot: the foundation model is used without adaptation. We refer to this as DADzs.
2. Cross-Domain Fine-Tuning: the foundation model is fine-tuned on Installation A and tested
on Installation B. We refer to this as DADft.
This setting tests whether knowledge gained from one hoist installation can improve anomaly detection
in another—an important capability for scalable industrial monitoring systems.</p>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Experimental Results</title>
        <p>Table 2 summarizes the performance of all methods across both hoist installations. Several trends
highlight the strengths of the proposed dual-learner system and the role of cross-installation adaptation.
First, the zero-shot configuration DADzs achieves competitive performance on both installations,
matching or surpassing the best-performing baselines. On Installation A, it provides the highest
AUC-PR (0.1933) and AUC-ROC (0.6148), with a slight improvement in F1-score over the nearest alternative
(HST). On Installation B, DADzs achieves performance comparable to the best baseline configurations.
This suggests that the representation and reconstruction mechanisms of MOMENT, in combination
with the online learner, generalize to new data distributions without explicit model tuning.</p>
        <p>However, the benefits of cross-domain fine-tuning are especially visible in the second installation.
DADft, which incorporates adaptation from Installation A to Installation B, improves both AUC-PR (from
0.3454 to 0.3591) and AUC-ROC (from 0.4670 to 0.5018). This improvement is especially relevant because
Installation B has a higher anomaly ratio (36% vs. 15%): The presence of more operational irregularities
increases the variance of the reconstruction error distributions, making threshold selection more
challenging for simpler detectors, but fine-tuning on normal data from previous projects recalibrates
the reconstruction baseline toward the characteristics of the new system, improving separability of
anomalous behavior. This supports the hypothesis that foundation models for industrial time-series
benefit from lightweight adaptation rather than from being applied in a completely zero-shot manner.</p>
        <p>In terms of execution time, methods such as HST and SIF remain the fastest, making them suitable
for lower-resource deployments. However, their performance is consistently lower, particularly in
Installation A, where they present limited ability to capture subtle deviations. In contrast, classical
random-cut forest methods (RRCF, StreamRHF) show significantly higher computational cost (up to
160k seconds in the largest setting), making them impractical for real-time monitoring. DADzs and
DADft fall between these two extremes. While their execution times (55–180 seconds) are higher than
the simplest models, they remain easily compatible with real-time operation, since processing a ride
occurs after acquisition and well within cycle times of industrial maintenance systems. Crucially, the
added computational cost translates into improved modeling of temporal dynamics and more accurate
anomaly scoring.</p>
        <p>The comparison between installations underlines the differences in data complexity. Installation B has
nearly three times more rides and more anomalies, both due to more prolonged operation and evolving
mechanical wear during the later construction phase. Models that rely solely on online learning can
struggle to maintain precision as the normal behavior drifts. The dual-learner architecture mitigates
this through continual online adaptation while still preserving a stable baseline representation via the
foundation model.</p>
        <p>Finally, it is important to note that the ground-truth labels used for evaluation in this real-world case
study derive from maintenance reports and operator annotations, which do not always correspond to
precise or isolated temporal events. In many cases, an anomaly label may describe a period of degraded
performance spanning multiple rides rather than a sharply defined instance. This misalignment between
real anomaly manifestation and discrete ride-level labels introduces uncertainty into the calculated
metrics. Improving label granularity remains an open challenge in industrial predictive maintenance
and is a direction for future work.</p>
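        <p>The labeling issue described above can be made concrete with a small hypothetical example (the timestamps, period boundaries, and helper below are invented for illustration): a maintenance report yields a degraded period, which is flattened into per-ride binary labels even though not every ride inside the window is necessarily anomalous.</p>

```python
# Hypothetical rides with timestamps, and one reported degraded period (start, end).
rides = [{"id": i, "t": 100 * i} for i in range(10)]
reports = [(250, 620)]

def ride_labels(rides, reports):
    """Label a ride anomalous iff its timestamp falls inside any reported period."""
    return [int(any(r["t"] >= start and end >= r["t"] for start, end in reports))
            for r in rides]

print(ride_labels(rides, reports))  # → [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

        <p>Every ride in the window inherits the anomaly label, which is exactly the granularity mismatch that blurs ride-level metrics.</p>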
      </sec>
      <sec id="sec-5-5">
        <title>4.5. Model Behavior across the Data Stream</title>
        <p>Figure 2 illustrates the dynamics of the dual-learner system during the full prequential evaluation of
Installation A. The top panel shows the ground-truth anomalous periods, which appear in
concentrated bursts corresponding to well-documented mechanical degradation phases and maintenance
interventions, together with the anomaly score produced by DAD, which closely follows changes in
system behavior and increases during annotated anomalous intervals, indicating effective sensitivity to
deteriorating ride dynamics. Similarly, Figure 3 shows the dynamics of the model for Installation B.</p>
        <p>The most informative aspect is the second panel, which shows the model selection mechanism:
whether the final anomaly score arises from the background model (0), the online model (2), or their
ensemble combination (1). In particular, for Installation A there are 22,877 instances scored by the
online model, 690 by the background model and 204 by the ensemble; in Installation B the corresponding
counts are 51,119, 137 and 9,434. In both installations, at the beginning of the stream, the system
relies almost exclusively on the background model, since the online one may be affected by the cold-start
problem. As the hoist behavior gradually evolves due to increased load, wear and environmental
conditions, the online model becomes more influential, reflecting the increasing need for local adaptation.
The switching pattern is neither abrupt nor purely periodic, but instead responds to changes in the
statistical consistency of incoming data. Meanwhile, the online model's confidence (third panel) starts
relatively high but slowly decreases as operational variability grows, before stabilizing in a regime
where both learners contribute. This coordinated interplay demonstrates the intended function of the
dual-learner architecture: the background model provides robust performance to prevent overreacting
to early noise, while the online model specializes to installation-specific dynamics and provides fast
predictions. When anomalous periods occur, the online learner's confidence drops, indicating that
these rides deviate from learned local behavior and possibly also from the broader expected operational
patterns encoded by the foundation model, as seen in Installation B, where the model resorts to
the ensemble a considerable number of times. In this case, given the high confidence that the
background model exhibits most of the time, this may be due to fine-tuning on the previous
installation overfitting the model and making it less effective on new anomalies from the second
installation.</p>
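        <p>The routing behavior described in this section can be sketched in a few lines of Python (the thresholds, scores, and confidence trajectory are invented for illustration and are not the paper's exact rule): each ride is scored by the background model (0), the ensemble (1), or the online model (2), depending on the online learner's current confidence.</p>

```python
import numpy as np

HIGH, LOW = 0.8, 0.4  # hypothetical confidence thresholds

def route(score_bg, score_online, confidence):
    """Return (source_label, anomaly_score) for one ride."""
    if confidence >= HIGH:   # online model trusted: fast, locally adapted score
        return 2, score_online
    if confidence >= LOW:    # intermediate confidence: average both learners
        return 1, 0.5 * (score_bg + score_online)
    return 0, score_bg       # cold start or drift: fall back to the background model

rng = np.random.default_rng(1)
sources = []
for t in range(1000):
    # Toy confidence trajectory: grows as the online model sees more rides.
    conf = min(1.0, t / 400 + rng.normal(0.0, 0.05))
    src, _ = route(rng.random(), rng.random(), conf)
    sources.append(src)

print(sources[:5], sources[-5:])  # background model early, online model late
```

        <p>In the actual system, the analogous three-way split is what produces the per-installation counts reported above, with the proportions governed by how confidence evolves over the stream.</p>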
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>This work introduced a dual-learner architecture for unsupervised anomaly detection in industrial
streaming environments. The system integrates a fast online model with a time-series foundation
model through a confidence-based routing mechanism that adapts dynamically to changes in data
distribution. In this way, the online learner and the foundation model carry complementary strengths:
the former adapts quickly to local conditions, while the latter provides a stable representation of normal
operation learned from large-scale data. The confidence estimation and routing strategy allows the
system to balance these strengths without requiring labels or maintaining a historical buffer. Experiments
on two industrial elevator installations showed that the proposed approach performs competitively
in a fully unsupervised and prequential evaluation. The zero-shot configuration demonstrates strong
generalization to new deployments, while fine-tuning the foundation model on one installation and
transferring it to another yields further improvements. These findings confirm that foundation models
can act as reusable priors for industrial streaming tasks, reducing the need for manual configuration
and prior training data.</p>
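        <p>For readers unfamiliar with the prequential protocol used in the experiments, the loop below sketches its test-then-train structure on synthetic data, with a running-mean distance detector standing in for the actual models (everything here is illustrative):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
stream = rng.normal(0.0, 1.0, size=(500, 8))  # synthetic ride feature vectors

mean = np.zeros(8)
n = 0
scores = []
for x in stream:
    # 1) Test first: score the incoming ride with the model as it currently stands.
    scores.append(float(np.linalg.norm(x - mean)))
    # 2) Then train: update the model (here, a running mean) with the same ride.
    n += 1
    mean += (x - mean) / n

print(len(scores))  # one unsupervised score per ride, no held-out set needed
```

        <p>Because every instance is scored before it is used for learning, the protocol needs no separate test set and no labels, which is what makes the evaluation fully unsupervised.</p>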
      <p>While still preliminary, this study suggests that integrating online learners with foundation
models is a viable path toward scalable and reliable industrial data-stream systems. However, there is
still room for improvement in terms of validation across a wider range of domains. Future work will
explore integration with other promising architectures for time-series analysis, such as diffusion models,
active learning strategies for selective labeling under minimal supervision, and broader transfer
across different domains and types of machinery.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This project is funded by the Swedish Knowledge Foundation (KKS).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and DeepL for grammar and
spelling checking. After using these tools, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] V. Rodriguez-Fernandez, D. Camacho, Recent trends and advances in machine learning challenges and applications for industry 4.0, Expert Systems 41 (2024) e13506.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] L. Correia, J.-C. Goos, P. Klein, T. Bäck, A. V. Kononova, Online model-based anomaly detection in multivariate time series: Taxonomy, survey, research challenges and future directions, Engineering Applications of Artificial Intelligence 138 (2024) 109323. doi:10.1016/j.engappai.2024.109323.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] W. Hao, T. Yang, Q. Yang, Hybrid statistical-machine learning for real-time anomaly detection in industrial cyber-physical systems, IEEE Transactions on Automation Science and Engineering 20 (2021) 32-46.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Kahneman, Thinking, fast and slow, Macmillan, 2011.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. C. Tan, K. M. Ting, T. F. Liu, Fast anomaly detection for streaming data, in: 22nd International Joint Conference on Artificial Intelligence, 2011, pp. 1511-1516. doi:10.5591/978-1-57735-516-8/IJCAI11-254.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] M. Goswami, K. Szafer, A. Choudhry, Y. Cai, S. Li, A. Dubrawski, MOMENT: A family of open time-series foundation models, in: 41st International Conference on Machine Learning (ICML 2024), 2024, pp. 1-16.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] S. Guha, N. Mishra, G. Roy, O. Schrijvers, Robust random cut forest based anomaly detection on streams, in: 33rd International Conference on Machine Learning (ICML 2016), 2016, pp. 3987-3999.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Nesic, A. Putina, M. Bahri, A. Huet, J. M. Navarro, D. Rossi, M. Sozio, StreamRHF: Tree-based unsupervised anomaly detection for data streams, in: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), volume 2022-December, 2022, pp. 1-8. doi:10.1109/AICCSA56895.2022.10017876.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] F. Leveni, G. W. Cassales, B. Pfahringer, A. Bifet, G. Boracchi, Online isolation forest, in: 41st International Conference on Machine Learning (ICML 2024), 2024, pp. 27288-27298.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. J. Liu, G. W. Cassales, F. T. Liu, B. Pfahringer, A. Bifet, Streaming isolation forest, in: Advances in Knowledge Discovery and Data Mining, 2025, pp. 95-107. doi:10.1007/978-981-96-8170-9_8.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] M. Odiathevar, W. K. G. Seah, M. Frean, A hybrid online offline system for network anomaly detection, in: 2019 28th International Conference on Computer Communication and Networks (ICCCN), 2019, pp. 1-9. doi:10.1109/ICCCN.2019.8847011.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] L. Cerdà-Alabern, G. Iuhasz, G. Gemmi, Anomaly detection for fault detection in wireless community networks using machine learning, Computer Communications 202 (2023) 191-203. doi:10.1016/j.comcom.2023.02.019.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. Nsor, Predictive maintenance using machine learning for engineering systems through real-time sensor data and anomaly detection models, International Journal of Research Publication and Reviews 6 (2025) 5167-5183. doi:10.55248/gengpi.6.0725.2541.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] V. Ekambaram, A. Jati, P. Dayama, S. Mukherjee, N. H. Nguyen, W. M. Gifford, C. Reddy, J. Kalagnanam, Tiny time mixers (TTMs): Fast pre-trained models for enhanced zero/few-shot forecasting of multivariate time series, in: 38th Conference on Neural Information Processing Systems (NeurIPS 2024), 2024.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] V. Feofanov, S. Wen, M. Alonso, R. Ilbert, H. Guo, M. Tiomoko, L. Pan, J. Zhang, I. Redko, Mantis: Lightweight calibrated foundation model for user-friendly time series classification, in: 1st ICML Workshop on Foundation Models for Structured Data (FMSD @ ICML 2025), 2025, pp. 1-21.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] H. M. Gomes, A. Lee, N. Gunasekara, Y. Sun, G. W. Cassales, J. J. Liu, M. Heyden, V. Cerqueira, M. Bahri, Y. S. Koh, B. Pfahringer, A. Bifet, CapyMOA: Efficient machine learning for data streams in Python, 2025. URL: https://arxiv.org/abs/2502.07432. arXiv:2502.07432.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>