<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AdaptiveBayes: Comprehensive empirical analysis and algorithmic enhancement framework⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergii Kavun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Interregional Academy of Personnel Management</institution>
          ,
          <addr-line>2 Frometivska str., 03039 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Traditional baseline models in machine learning, particularly Logistic Regression (LR), serve as fundamental benchmarks but suffer from computational bottlenecks on large-scale datasets. This study presents a comprehensive performance analysis of AdaptiveBayes (AdB) as a potential replacement baseline classifier through extensive empirical evaluation across 12 diverse datasets spanning both standard benchmarks and large-scale classification tasks. AdB demonstrates remarkable computational efficiency with average training speed improvements of 10x over LR, reaching up to 750x on specific datasets, while maintaining competitive AUC performance on selected problems. However, accuracy limitations averaging -0.26 ± 0.33 compared to LR highlight the fundamental speed-accuracy trade-off. Through systematic algorithmic analysis, we propose a comprehensive enhancement framework incorporating advanced regularization, curvature-aware optimization, and improved initialization strategies projected to increase accuracy up to 8% while preserving computational advantages. This research establishes AdB as a viable baseline for time-critical applications and provides a roadmap for next-generation efficient classification algorithms.</p>
      </abstract>
      <kwd-group>
        <kwd>AdaptiveBayes</kwd>
        <kwd>baseline classification</kwd>
        <kwd>computational efficiency</kwd>
        <kwd>adaptive learning</kwd>
        <kwd>machine learning optimization</kwd>
        <kwd>quasi-newton methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This study extends the evaluation to seven large-scale datasets characterized by high
dimensionality and substantial sample sizes, benchmarking AdB against established models
including XGBoost, LightGBM, RandomForest, and Multi-Layer Perceptrons. The research provides
multi-faceted performance analysis encompassing accuracy, AUC, training time, memory
consumption, and prediction speed while identifying architectural limitations and proposing
targeted enhancements.</p>
      <p>Research contributions:</p>
      <p>Comprehensive empirical evaluation of AdB across 12 diverse classification datasets.
Systematic analysis of computational efficiency and memory utilization patterns.
Identification of failure modes and calibration issues in imbalanced data scenarios.
Detailed algorithmic enhancement framework with theoretical foundations.</p>
      <p>Practical deployment guidelines for baseline model selection.</p>
      <p>The proposed framework aligns with CPITS Workshop themes by bridging probability theory,
risk management, and machine learning in cybersecurity. Specifically, AdaptiveBayes enhances
predictive analytics for cyber defense, supports cryptographic protocol assessment, and provides a
scalable framework for cyber-risk forecasting in interconnected systems.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Literature Review</title>
      <p>
        The development of efficient baseline classifiers in machine learning has evolved significantly,
driven by the increasing demand for computational efficiency [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] without substantial accuracy
degradation. This review systematically examines the theoretical foundations and empirical
advancements that inform the AdaptiveBayes framework, organizing the analysis around three
critical research themes: traditional baseline approaches, adaptive learning methodologies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and
computational efficiency [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in modern machine learning.
      </p>
      <sec id="sec-2-1">
        <title>1.1. Traditional Baseline Classification Methods</title>
        <p>
          Logistic Regression has maintained its dominant position in baseline classification due to its
theoretical grounding in maximum likelihood estimation and well-understood convergence
properties. Chen and Guestrin [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] demonstrated that while tree-based methods like XGBoost
provide superior accuracy, the interpretability and simplicity of logistic regression continue to
make it the gold standard for baseline model selection. Recent comparative studies by Islam et al.
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and Yadav et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] consistently validate LR reliability across medical diagnosis and heart
disease prediction applications, though computational limitations emerge prominently in
high-dimensional scenarios.
        </p>
        <p>
          The persistence of logistic regression as a baseline stems from its theoretical foundation and
consistent performance across diverse domains. However, Pathak and Shrivas [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] identified
significant computational bottlenecks when applying traditional LR to large-scale intrusion
detection systems, motivating exploration of more efficient alternatives. Similarly, de Luna et al.
demonstrated that while LR maintains accuracy in cybersecurity URL classification tasks [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ],
training time constraints limit its applicability in real-time security applications.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>1.2. Adaptive Learning Algorithms and Optimization</title>
        <p>
          The evolution of adaptive learning methods has been particularly influenced by advances in
quasi-Newton optimization techniques. Ma [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] introduced the Apollo optimizer, which demonstrates
adaptive parameter-wise diagonal quasi-Newton optimization [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] that dynamically incorporates
loss function curvature through efficient Hessian approximation [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This work established
theoretical foundations for adaptive learning rate mechanisms that form the basis of modern
efficient classification algorithms.
Building upon these foundations, Yagishita et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] developed proximal diagonal Newton
methods for composite optimization problems, demonstrating significant improvements in
convergence speed while maintaining solution quality. Their approach highlighted the potential for
combining adaptive learning with regularization techniques, directly informing the enhancement
framework proposed in this study. The work by Aminifard and Babaie-Kafaki [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] further extended
diagonal quasi-Newton methods to compressed sensing applications, proving that diagonal
approximations can maintain computational efficiency while preserving optimization quality.
        </p>
        <p>
          Recent advances in stochastic optimization have emphasized the importance of curvature-aware
adaptation. Ahmed [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] proposed a novel variance reduction proximal stochastic Newton
algorithm specifically designed for large-scale machine learning optimization, achieving faster
convergence while handling non-smooth regularizers. These developments directly motivate
investigation of alternative baseline approaches that prioritize computational efficiency through
adaptive mechanisms.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>1.3. Computational Efficiency in Modern Machine Learning</title>
        <p>
          The imperative for computational efficiency has driven substantial research into optimized
algorithms and architectures. Ke et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] developed LightGBM as a highly efficient gradient
boosting decision tree framework, demonstrating that algorithmic innovations can achieve
dramatic speedups without sacrificing accuracy. Their work established benchmarks for evaluating
efficiency-accuracy trade-offs in modern machine learning applications.
        </p>
        <p>
          Complementing tree-based approaches, Wu et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] presented robust stochastic quasi-Newton
methods with applications in machine learning, showing that second-order optimization
techniques can provide superior computational profiles compared to traditional first-order
methods. The integration of momentum-based updates, as demonstrated by Alecsa [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] in
stochastic non-convex optimization, has proven particularly effective for accelerating convergence
in adaptive learning scenarios.
        </p>
        <p>
          Hardware acceleration has emerged as a critical factor in achieving computational efficiency.
The work by Chia et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] on two-phase switching optimization strategies in LSTM models
demonstrated the importance of GPU-optimized implementations for practical deployment. This
hardware-software co-design approach directly influences the AdaptiveBayes architecture [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ],
which incorporates CuPy-based GPU acceleration for enhanced computational performance.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>1.4. Sparse and Efficient Bayesian Methods</title>
        <p>
          Recent developments in Bayesian machine learning have focused on achieving computational
efficiency while maintaining theoretical rigor. Luo et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] developed inverse-free and scalable
sparse Bayesian extreme learning machines [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], demonstrating that closed-form Bayesian updates
can dramatically reduce computational complexity compared to iterative optimization approaches.
Their work established theoretical precedents for the simplified Bayesian framework underlying
AdaptiveBayes.
        </p>
        <p>
          The complexity-optimized sparse Bayesian learning approach presented by Luo et al. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
specifically addressed scalability challenges in classification tasks, achieving linear computational
complexity while maintaining competitive accuracy. This research directly informs the
mathematical foundation of AdaptiveBayes, particularly the selective update mechanisms and
feature transformation strategies.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>1.5. Feature Transformation and Regularization Techniques</title>
        <p>
          Advanced feature transformation techniques have proven critical for improving baseline classifier
performance. The elastic net regularization framework introduced by Zou and Hastie [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]
demonstrated that combining L1 and L2 penalties provides optimal balance between sparsity and
variance control. This theoretical foundation directly supports the proposed enhancements to the
AdaptiveBayes framework.
Modern initialization strategies have also received significant attention. Glorot and Bengio [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]
established theoretical foundations for Xavier initialization, while He et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] developed
improved initialization schemes for ReLU-like activation functions. These contributions inform the
proposed weight initialization enhancements that could significantly improve AdaptiveBayes
accuracy while preserving computational advantages.
        </p>
      </sec>
      <sec id="sec-2-6">
        <title>1.6. Empirical Validation in Real-World Applications</title>
        <p>
          Contemporary research has emphasized the importance of comprehensive empirical validation
across diverse application domains. Geeitha et al. [22] demonstrated effective application of Naive
Bayes algorithms in medical survival prediction, while Abogada and Usona [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] successfully applied
machine learning classification models to attrition prediction in business process outsourcing
industries. These studies establish empirical precedents for evaluating adaptive baseline classifiers
across heterogeneous problem domains.
        </p>
        <p>The work by Rajneekant et al. [23] on malware classification using machine learning provided
comprehensive comparative analysis of API sequence-based techniques, establishing
methodological frameworks for systematic algorithm evaluation. Their emphasis on computational
efficiency alongside accuracy metrics directly parallels the evaluation methodology employed in
this AdaptiveBayes study.</p>
      </sec>
      <sec id="sec-2-7">
        <title>1.7. Gaps and Research Opportunities</title>
        <p>Despite these advances, systematic cross-dataset studies comparing efficiency-oriented baselines
remain scarce. Most existing research focuses on domain-specific applications rather than
comprehensive evaluation across diverse problem characteristics. Furthermore, theoretical analysis
of convergence guarantees for adaptive baseline approaches has received limited attention, creating
opportunities for both empirical and theoretical contributions.</p>
        <p>The integration of modern hardware acceleration with adaptive learning algorithms represents
another underexplored research direction. While GPU optimization has been extensively studied
for deep learning applications, its potential for accelerating traditional machine learning baselines
through adaptive mechanisms remains largely unexplored.</p>
        <p>A comparison of the proposed approach with existing theories is presented in Table 1.</p>
        <p>[Table 1 summarizes existing theories alongside their limitations (e.g., high computational cost and memory use, iterative training requirements, architecture-specific or activation-specific designs, hyperparameter tuning burdens, limited scalability analysis, and incomplete convergence analysis) and the corresponding AdB integration or extension points: baseline comparison for efficiency trade-offs, theoretical foundations for adaptive learning rates, diagonal Hessian approximation methodology, enhanced initialization strategies, momentum integration, second-order and stochastic optimization principles, robustness considerations for practical deployment, empirical validation frameworks for medical applications, and Bayesian foundations for classification tasks.]</p>
        <p>This comprehensive literature review establishes the theoretical foundations and empirical
precedents that inform the AdaptiveBayes framework, identifying specific research gaps that this
study addresses through systematic empirical evaluation and algorithmic enhancement.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2. Mathematical Background and Algorithm Enhancement Framework</title>
      <sec id="sec-4-1">
        <title>2.1. Current AdB Mathematical Foundation</title>
        <p>The baseline AdaptiveBayes algorithm implements a simplified Bayesian update with adaptive
learning rate modulation. The core mathematical framework operates as follows:</p>
        <sec id="sec-4-1-1">
          <title>Feature Transformation:</title>
          <p>x′ = log(1 + x). (1)</p>
          <p>where x ∈ ℝ^d represents the input feature vector and the log1p transformation provides
numerical stability for non-negative features.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Prediction Function:</title>
          <p>p(x) = σ(w^T x′ + b),</p>
          <p>where σ(z) = 1 / (1 + e^(−z)) is the sigmoid activation function, w ∈ ℝ^d represents learned weights,
and b ∈ ℝ is the bias term.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>Adaptive Learning Rate:</title>
          <p>lr_adaptive = lr_base × |error| × (1 − |p − 0.5|), (2)</p>
          <p>where error = y − p(x) represents the prediction error and the uncertainty factor (1 − |p − 0.5|)
amplifies updates for uncertain predictions.</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>Weight Update Rule:</title>
          <p>w_{t+1} = w_t + lr_adaptive × error × x′,</p>
          <p>b_{t+1} = b_t + lr_adaptive × error.</p>
          <p>Updates occur only when |x′_i| &gt; ε for computational efficiency.</p>
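          <p>The transformation, prediction, and adaptive-update rules above can be sketched as a short NumPy training loop. This is an illustrative reconstruction from the stated formulas, not the reference implementation; the class name, the lr_base default, and the threshold eps are assumed values.</p>

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

class AdaptiveBayesSketch:
    """Minimal sketch of the baseline AdB update loop (illustrative only)."""

    def __init__(self, n_features, lr_base=0.1, eps=1e-8):
        self.w = np.zeros(n_features)  # learned weights
        self.b = 0.0                   # bias term
        self.lr_base = lr_base
        self.eps = eps                 # selective-update threshold

    def partial_fit(self, x, y):
        x_t = np.log1p(x)                   # feature transform: x' = log(1 + x)
        p = sigmoid(self.w @ x_t + self.b)  # prediction: sigmoid(w . x' + b)
        error = y - p
        # adaptive learning rate: lr_base * |error| * (1 - |p - 0.5|)
        lr = self.lr_base * abs(error) * (1.0 - abs(p - 0.5))
        mask = np.abs(x_t) > self.eps       # update only where |x'_i| > eps
        self.w[mask] += lr * error * x_t[mask]
        self.b += lr * error
        return p
```

          <p>On a toy two-feature stream, repeated calls to partial_fit separate the two classes while touching only coordinates whose transformed magnitude exceeds the threshold.</p>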
          <p>Enhanced mathematical framework: based on comprehensive performance analysis, we propose
mathematical enhancements addressing identified algorithmic limitations.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>2.2. Advanced Feature Transformation</title>
        <sec id="sec-4-2-1">
          <title>Adaptive Transform Selection:</title>
          <p>x′ = arctan(x) if kurtosis(x) &gt; 10; tanh(x) if skewness(x) &gt; 2; x otherwise.</p>
          <p>This adaptive selection mechanism automatically chooses optimal transformations based on
feature distribution characteristics.</p>
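          <p>A minimal sketch of this selection rule, assuming the sample statistics from scipy.stats are intended (scipy's default Fisher, i.e., excess, kurtosis is assumed here):</p>

```python
import numpy as np
from scipy.stats import kurtosis, skew

def adaptive_transform(x):
    """Pick a feature transform from distribution shape (illustrative sketch)."""
    if kurtosis(x) > 10:   # very heavy tails: bounded arctan squashing
        return np.arctan(x)
    if skew(x) > 2:        # strong right skew: tanh squashing
        return np.tanh(x)
    return x               # otherwise leave the feature unchanged
```

          <p>A single extreme outlier drives kurtosis far above 10 and selects arctan; a mildly skewed, light-tailed feature falls through to tanh; a symmetric feature passes unchanged.</p>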
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>2.3. Elastic Net Regularization</title>
        <sec id="sec-4-3-1">
          <title>Enhanced Loss Function:</title>
          <p>L_enhanced = L_binary + λ1‖w‖_1 + λ2‖w‖_2², (3)</p>
          <p>where L_binary = −(1/n) ∑_{i=1}^{n} [y_i log(p_i) + (1 − y_i) log(1 − p_i)],</p>
          <p>λ1 = 0.01 promotes sparsity through the L1 penalty, and
λ2 = 0.01 controls variance through the L2 penalty.</p>
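          <p>The enhanced loss can be evaluated directly from its three terms; λ1 = λ2 = 0.01 follow the text, while the probability clipping is an added numerical-stability assumption:</p>

```python
import numpy as np

def elastic_net_loss(w, p, y, lam1=0.01, lam2=0.01):
    """Binary cross-entropy plus L1 and L2 penalties (sketch of the enhanced loss)."""
    eps = 1e-12                       # clip probabilities to avoid log(0)
    p = np.clip(p, eps, 1.0 - eps)
    l_binary = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return l_binary + lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2)
```

          <p>With zero weights and maximally uncertain predictions (p = 0.5), the loss reduces to the cross-entropy term log 2, and each nonzero weight adds its L1 and squared L2 contributions.</p>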
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>2.4. Curvature-Aware Learning Rate Adaptation</title>
        <sec id="sec-4-4-1">
          <title>Uncertainty-Guided Scaling:</title>
          <p>c = 2|p − 0.5| (confidence)</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>Diagonal Hessian Approximation:</title>
          <p>u = 1 − c (uncertainty),
lr_uncertainty = lr_base × u × (1 + |error|).</p>
          <p>h_ii ≈ p_i(1 − p_i) × (x′_i)²,
lr_final = lr_uncertainty / (1 + h_ii).</p>
          <p>This approach approximates second-order curvature information without full Hessian
computation, following diagonal quasi-Newton principles.</p>
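          <p>A sketch of the full scaling chain, computing a per-feature final learning rate from confidence, uncertainty, and the diagonal Hessian term (the function name and lr_base default are illustrative):</p>

```python
import numpy as np

def curvature_aware_lr(p, error, x_t, lr_base=0.1):
    """Per-feature learning rate from uncertainty and a diagonal Hessian term (sketch)."""
    c = 2.0 * abs(p - 0.5)                       # confidence in [0, 1]
    u = 1.0 - c                                  # uncertainty = 1 - c
    lr_uncertainty = lr_base * u * (1.0 + abs(error))
    h_diag = p * (1.0 - p) * x_t ** 2            # diagonal Hessian approx h_ii
    return lr_uncertainty / (1.0 + h_diag)       # per-coordinate lr_final
```

          <p>Features with large transformed magnitude carry a large curvature term and therefore receive a damped step, which is the intended stabilizing effect of the diagonal approximation.</p>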
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>2.5. Advanced Weight Initialization</title>
        <p>Xavier/He Initialization:</p>
        <p>w_0 ∼ 𝒩(0, 2 / (n_in + n_out)) (Xavier),</p>
        <p>w_0 ∼ 𝒩(0, 2 / n_in) (He, for ReLU-like activations),
where n_in and n_out represent input and output dimensions respectively.</p>
      </sec>
      <sec id="sec-4-6">
        <title>2.6. Momentum Integration</title>
        <sec id="sec-4-6-1">
          <title>Momentum-Enhanced Updates:</title>
          <p>v_{t+1} = β v_t + (1 − β) ∇L,
w_{t+1} = w_t − lr_final × v_{t+1},
with β = 0.9 providing optimal acceleration while maintaining stability.</p>
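          <p>The momentum recursion is a two-line update; the gradient argument below stands in for ∇L of the enhanced loss:</p>

```python
def momentum_step(w, v, grad, lr, beta=0.9):
    """Momentum-enhanced update: v <- beta*v + (1-beta)*grad; w <- w - lr*v (sketch)."""
    v_new = beta * v + (1.0 - beta) * grad
    w_new = w - lr * v_new
    return w_new, v_new
```

          <p>The velocity term smooths successive gradients, so a constant gradient accelerates the step toward lr × grad while noisy gradients partially cancel.</p>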
        </sec>
        <sec id="sec-4-6-2">
          <title>Projected Accuracy Improvements:</title>
          <p>Adaptive Transforms: +3% points through distribution-aware preprocessing.
Elastic Net Regularization: +2% points via overfitting prevention.
Diagonal Newton: +1% point through curvature adaptation.</p>
          <p>Momentum + Early Stopping: +2 percentage points via improved convergence.
Cumulative Enhancement: +8 percentage points with ~3% time overhead.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. Methodology of Proposed Approach</title>
      <p>Experimental design. The comprehensive evaluation framework employs systematic comparison
across 12 carefully selected datasets (Table 2) representing diverse problem characteristics and
scales. The experimental protocol prioritizes both breadth of evaluation contexts and depth of
performance analysis to ensure robust conclusions about baseline model viability (Figure 1).</p>
      <p>[Table 2 lists the 12 evaluated datasets with task type, sample count, feature count, and description. Standard UCI benchmarks include Wine (wine variety classification), Heart Disease (medical diagnosis), a species classification task, breast cancer, and diabetes; large-scale tasks include HIGGS (binary particle classification, 11M samples), SUSY (binary particle identification, 5M), KDDCup99 (network intrusion detection, 494,021), Covertype (multi-class forest cover prediction from cartographic data, 581,012), Hepmass (synthetic particle classification, 10.5M), CreditCardFraud (highly imbalanced transaction data, 1:578 ratio), and Avazu (click-through rate prediction).]</p>
      <sec id="sec-5-6">
        <title>AdB Algorithm Components</title>
        <p>
1. Feature Transformation: log1p transformation for numerical stability, see formula (1).
2. Adaptive Learning Rate: Dynamic adjustment based on prediction confidence and error
magnitude, see formula (2).
3. Selective Updates: Parameter modifications only when feature magnitude exceeds
threshold: Update if |x| &gt; epsilon.</p>
        <p>GPU Acceleration: CuPy integration for computational efficiency with optional GPU memory
utilization.</p>
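        <p>A common pattern for optional CuPy acceleration is a backend-module swap, since CuPy mirrors most of the NumPy API; this is an assumption about the integration style, not the author's code:</p>

```python
# Optional GPU acceleration: fall back to NumPy when CuPy or a GPU is absent.
try:
    import cupy as xp       # CuPy mirrors most of the NumPy API on the GPU
    xp.zeros(1)             # cheap call to confirm a usable CUDA device
    GPU = True
except Exception:           # ImportError or CUDA runtime errors
    import numpy as xp
    GPU = False

def log1p_transform(x):
    """Runs on GPU when available, CPU otherwise, with identical code."""
    return xp.log1p(x)
```

        <p>Downstream code calls xp instead of numpy directly; results are moved back to the host with cupy.asnumpy only when the GPU path is active.</p>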
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Comparative Model Implementation</title>
      <sec id="sec-6-1">
        <title>Baseline and Advanced Models:</title>
        <p>Logistic Regression: L2 regularization (C = 1) using scikit-learn.
XGBoost: Gradient boosting with default hyperparameters.
LightGBM: Histogram-based gradient boosting framework.
RandomForest: Ensemble decision tree method (100 estimators).</p>
        <p>Multi-Layer Perceptron: Neural network with adaptive learning rate.</p>
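        <p>The comparison suite above maps to standard library calls; a sketch assuming scikit-learn defaults where the text gives none, with the boosting libraries loaded only if installed:</p>

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

models = {
    "LR": LogisticRegression(C=1.0, penalty="l2"),             # L2 regularization, C = 1
    "RandomForest": RandomForestClassifier(n_estimators=100),  # 100 estimators
    "MLP": MLPClassifier(learning_rate="adaptive"),            # adaptive learning rate
}

# Gradient-boosting baselines, added only when the libraries are available.
try:
    from xgboost import XGBClassifier
    from lightgbm import LGBMClassifier
    models["XGBoost"] = XGBClassifier()    # default hyperparameters
    models["LightGBM"] = LGBMClassifier()  # histogram-based gradient boosting
except ImportError:
    pass
```

        <p>Each estimator then goes through the same fit/predict pipeline, so timing and memory instrumentation wraps identical calls for every model.</p>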
      </sec>
      <sec id="sec-6-2">
        <title>Evaluation Metrics and Statistical Analysis</title>
      </sec>
      <sec id="sec-6-3">
        <title>Performance Metrics:</title>
        <p>Classification Accuracy: Overall correctness measure.</p>
        <p>Area Under ROC Curve (AUC): Ranking quality, a measure of a classifier's ability to
distinguish between classes that is robust to class imbalance.
Training Time: Wall-clock training duration in seconds.</p>
        <p>Memory Consumption: Peak CPU and GPU memory utilization (MB) during training.</p>
        <p>Prediction Time: Inference latency per sample on the test set.</p>
        <p>Efficiency Ratios: Derived metrics, such as the accuracy-to-log(time) ratio, quantifying the
trade-off between performance and speed.</p>
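        <p>The text does not spell out the exact normalization of the efficiency ratio, so the following is one plausible reading (accuracy over log-scaled training time), labeled as an assumption rather than the paper's definition:</p>

```python
import math

def performance_to_time_ratio(accuracy_gain, train_time_s):
    """One plausible accuracy-to-log(time) efficiency ratio (an assumed definition).

    The 1 + t inside the log keeps the denominator positive for sub-second
    training times; negative accuracy gains yield negative ratios, matching
    the negative AdB values reported in the results.
    """
    return accuracy_gain / math.log(1.0 + train_time_s)
```
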
        <p>Statistical Validation: Stratified 5-fold cross-validation with Wilcoxon signed-rank tests for
paired comparisons (α = 0.05) ensures statistical significance of observed differences. Hardware
standardization on AMD 7950X3D CPU with NVIDIA RTX 4090 GPU provides consistent timing
measurements.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Detailed Experimental Methodology</title>
      <sec id="sec-7-1">
        <title>5.1. Experimental Setup and Hardware Configuration</title>
        <p>All experiments were conducted on standardized hardware (AMD 7950X3D CPU, NVIDIA RTX 4090 GPU) to ensure fair comparison and
reproducible timing measurements. The software environment consisted of:</p>
        <p>Python: 3.11.5 with conda environment management.</p>
        <p>CUDA Toolkit: 12.1 with cuDNN 8.9.2.</p>
        <p>Key Libraries: scikit-learn 1.3.0, XGBoost 1.7.6, LightGBM 4.0.0, CuPy 12.2.0.</p>
        <p>Monitoring: nvidia-smi for GPU utilization, psutil for CPU/RAM tracking.</p>
      </sec>
      <sec id="sec-7-2">
        <title>5.2. Comprehensive Hyperparameter Configuration</title>
        <p>Systematic hyperparameter configuration ensures reproducible and fair model comparison across
all algorithms. The following Table 3 presents the complete hyperparameter specifications used in
the experimental evaluation.</p>
        <p>[Table 3 presents the complete hyperparameter specifications per model, with justifications including preliminary grid-search selection, numerical stability of the log1p transform for positive features, the selective-update criterion, reproducibility seeds (random_state), convergence safety limits (max_iter), imbalanced-data handling, and complexity controls (e.g., n_estimators, max_depth, min_samples_split, min_samples_leaf for RandomForest; hidden_layer_sizes, activation, solver, learning_rate_init for MLP).]</p>
      </sec>
      <sec id="sec-7-3">
        <title>5.3. Statistical validation protocol</title>
        <p>Cross-validation strategy: Stratified 5-fold cross-validation was employed to ensure robust
performance estimation while maintaining computational efficiency. This approach provides:</p>
        <p>Balanced class representation across folds.</p>
        <p>Sufficient statistical power (5 samples per metric).</p>
        <p>Reasonable computational overhead.</p>
        <p>Standard deviation estimation for confidence intervals.</p>
        <p>Statistical significance testing: Paired model comparisons employed Wilcoxon signed-rank
tests (α = 0.05) to assess statistical significance of observed performance differences. This
nonparametric approach handles:</p>
        <p>Non-normal distributions of performance metrics.</p>
        <p>Small sample sizes (n=5 from cross-validation).</p>
        <p>Robust comparison against baseline methods.</p>
        <p>Type I error control through Bonferroni correction.</p>
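        <p>The paired-test protocol above can be sketched with scipy.stats.wilcoxon plus a Bonferroni-adjusted threshold; the per-fold scores below are made-up illustrative numbers, not results from the paper:</p>

```python
from scipy.stats import wilcoxon

def compare_models(scores_a, scores_b, n_comparisons, alpha=0.05):
    """Paired Wilcoxon signed-rank test with a Bonferroni-corrected threshold.

    Returns (significant?, raw p-value); with n_comparisons tests, each is
    judged against alpha / n_comparisons to control the family-wise error.
    """
    stat, p = wilcoxon(scores_a, scores_b)
    return p < alpha / n_comparisons, p

# Hypothetical 5-fold accuracies for two models (illustrative only).
adb = [0.91, 0.90, 0.92, 0.89, 0.91]
lr = [0.93, 0.94, 0.93, 0.92, 0.935]
```

        <p>Note that with only five folds the smallest attainable two-sided exact p-value is 2/32 = 0.0625, so no single comparison can reach α = 0.05; this is a known limitation of n = 5 paired designs.</p>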
        <sec id="sec-7-3-1">
          <title>Performance metric definitions:</title>
          <p>Accuracy: (TP + TN) / (TP + TN + FP + FN).</p>
          <p>AUC: Area under ROC curve using trapezoidal approximation.
Training Time: Wall-clock seconds from fit() initiation to completion.
Memory Usage: Peak resident set size (RSS) during training phase.</p>
          <p>Prediction Time: Average inference latency per sample (microseconds).</p>
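          <p>The accuracy definition above is direct to encode, and scikit-learn's roc_auc_score integrates the ROC curve with the trapezoidal rule as described; the confusion-matrix counts in the example are illustrative:</p>

```python
from sklearn.metrics import roc_auc_score

def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), per the definition above."""
    return (tp + tn) / (tp + tn + fp + fn)

def auc(y_true, y_score):
    """AUC via sklearn, which applies trapezoidal integration of the ROC curve."""
    return roc_auc_score(y_true, y_score)
```
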
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6. Results</title>
      <sec id="sec-8-1">
        <title>6.1. Classification performance analysis</title>
        <p>Standard benchmark performance: AdB demonstrated competitive performance on smaller-scale
benchmark problems, achieving superior accuracy compared to Logistic Regression on most
standard datasets with average improvements of 2–3% across breast cancer, diabetes, and heart
disease classification tasks. AUC performance remained consistently competitive, maintaining
ranking quality despite algorithmic differences.</p>
        <p>Large-scale dataset analysis: Performance patterns on large-scale datasets (Table 4) revealed
complex accuracy-efficiency trade-offs:</p>
        <p>Overall accuracy: AdB outperformed LR on only 1 of 7 datasets (KDDCup99) with average
accuracy difference of -0.26 ± 0.33.</p>
        <p>AUC performance: Superior performance on 3 of 7 datasets (CreditCardFraud, KDDCup99,
Avazu) with average improvement of +0.05 ± 0.17.
*CreditCardFraud shows threshold calibration issues</p>
        <sec id="sec-8-1-1">
          <title>Accuracy Gap and Training Speedup (Table 4 excerpt)</title>
          <p>Accuracy gap (AdB − LR) and training speedup by dataset: KDDCup99 +0.047 (70×); CreditCardFraud −0.964* (322×); HIGGS −0.135 (45×); SUSY −0.106 (38×).</p>
          <p>The performance comparison reveals significant differences (Figure 2) between AdB and LR
algorithms across four large-scale datasets. In terms of AUC performance, AdB demonstrates
superior results on KDDCup99 (0.988 vs 0.861) but shows mixed performance on other datasets,
with LR outperforming AdB on HIGGS (0.685 vs 0.589) and SUSY (0.789 vs 0.654). The
CreditCardFraud dataset shows similar AUC scores (0.913 vs 0.898), though the accuracy gap data
appears anomalous with an extreme negative value.
The most compelling advantage of AdB lies in its training efficiency, delivering substantial
speedups ranging from 38x to 322x faster than traditional LR across all datasets. CreditCardFraud
shows the highest speedup at 322x, followed by KDDCup99 at 70x, while HIGGS and SUSY
demonstrate 45x and 38x improvements respectively. This dramatic performance gain makes AdB
particularly attractive for large-scale applications where training time is critical, even when
considering the accuracy trade-offs observed in some datasets.</p>
        </sec>
      </sec>
      <sec id="sec-8-2">
        <title>6.2. Computational Efficiency Analysis</title>
        <p>Training Speed Performance: AdB achieved dramatic computational advantages (Table 5, Figure 3)
across all evaluated datasets:</p>
        <p>Average Training Speedup: 10x faster than Logistic Regression.</p>
        <p>Peak Performance: 749x speedup on Covertype dataset (0.3s vs 224.7s).</p>
        <p>Consistent Advantage: Fastest training across all 7 large-scale datasets.</p>
        <p>Moderate GPU Overhead: Average 62.6 MB GPU memory utilization.</p>
        <sec id="sec-8-2-5">
          <title>Avg Training Time, Performance-to-Time Ratio, and Train-to-Predict Ratio (Table 5)</title>
          <p>*Negative ratio due to accuracy limitations</p>
          <p>The cross-model efficiency analysis (Figure 3) reveals AdB's exceptional performance in computational speed metrics. AdB achieves the fastest average training time at just 1.68 seconds, significantly outpacing XGBoost (3.86s), LightGBM (27.5s), MLP (89.4s), LR (168.9s), and RandomForest (445.3s). This speed advantage extends to the train-to-predict ratio, where AdB demonstrates superior efficiency at 9.6x, followed by XGBoost at 14.9x, whereas traditional methods such as LR show ratios as high as 257.0x.</p>
          <p>However, AdB’s speed comes with a notable trade-off in the performance-to-time ratio, showing
a negative value of -4.24 due to accuracy limitations compared to other models. XGBoost leads this
metric with a positive ratio of 1.99, indicating better balance between accuracy and computational
cost, while other models maintain modest positive ratios ranging from 0.39 to 0.72. This suggests
that while AdB excels in raw computational efficiency and is ideal for time-critical applications,
users must carefully consider the accuracy requirements when choosing between speed
optimization and performance quality.</p>
        </sec>
      </sec>
      <sec id="sec-8-3">
        <title>6.3. Memory Usage and Resource Analysis</title>
        <p>Memory Consumption Patterns: Resource utilization analysis revealed distinct memory usage characteristics across models:</p>
        <p>Lightweight Models: LR (6.5 MB), XGBoost (12.3 MB).</p>
        <p>Moderate Consumption: AdB (174 MB total, 63 MB GPU), LightGBM (100.5 MB).</p>
        <p>Resource-Intensive: MLP (278 MB), RandomForest (3014 MB average, peak 9627 MB).</p>
        <p>AdB uniquely utilized GPU memory, distinguishing it from CPU-only alternatives and providing
optimization opportunities in GPU-accelerated environments.</p>
        <sec id="sec-8-3-1">
          <title>Failure Mode Analysis and Calibration Issues</title>
          <p>Algorithmic Limitations: Systematic analysis identified several critical failure modes:</p>
          <p>Calibration Analysis: The stark accuracy-AUC discrepancy on imbalanced datasets highlights
the need for automatic threshold tuning mechanisms rather than fundamental algorithmic
deficiencies. This pattern suggests post-hoc calibration techniques could significantly improve
practical deployment performance.</p>
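A minimal sketch of such post-hoc calibration, using the Youden's J statistic named later in the enhancement framework; the function name and the toy validation data are illustrative, not part of the evaluated implementation:

```python
def youden_threshold(y_true, y_score):
    """Validation-based cutoff selection maximizing Youden's J = TPR - FPR.

    A post-hoc calibration step: train the classifier as usual, then replace
    the default 0.5 decision threshold with a cutoff tuned on held-out data."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    best_t, best_j = 0.5, float("-inf")
    for cut in sorted(set(y_score)):
        tpr = sum(s >= cut for s in pos) / len(pos)
        fpr = sum(s >= cut for s in neg) / len(neg)
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, cut
    return best_t

# Imbalanced toy validation set: the default 0.5 cutoff would predict
# all-negative, while the tuned cutoff separates the minority class.
y_val  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.45, 0.48]
threshold = youden_threshold(y_val, scores)  # -> 0.45
```

On this toy set the tuned threshold (0.45) classifies every validation example correctly, whereas the default 0.5 cutoff yields 80% accuracy by always predicting the majority class, mirroring the accuracy-AUC discrepancy described above.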
        </sec>
        <sec id="sec-8-3-2">
          <title>Proposed Algorithmic Enhancement Framework</title>
          <p>Based on comprehensive performance analysis, we propose a systematic enhancement
framework targeting accuracy improvements of 2–8% while preserving computational efficiency.</p>
        </sec>
        <sec id="sec-8-3-3">
          <title>Enhanced Feature Transformation</title>
          <p>Current Limitation: A single log1p transformation may destabilize certain feature distributions. Proposed Solution: Adaptive transformation selection based on feature characteristics. Bounded Transforms: tanh(x), arctan(x) for heavy-tailed distributions.</p>
          <p>Linear Preservation: Identity transform for normally distributed features.</p>
          <p>Robust Scaling: Standardization with outlier detection.</p>
          <p>Initialization pairing: Xavier for tanh/sigmoid transforms (formula (9)); He for ReLU-like transforms (formula (10)); the scheme is selected adaptively based on the chosen feature transformation.</p>
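A sketch of how adaptive transformation selection could dispatch on per-feature statistics; the function and the skewness/kurtosis cutoffs (2.0, 8.0) are illustrative assumptions, not the framework's actual decision rules:

```python
import math

def pick_transform(values):
    """Choose a per-feature transform from the menu in the text: log1p for
    strongly right-skewed non-negative features, tanh for heavy tails, and
    the identity for roughly normal ones. Cutoffs are illustrative."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return "identity"                 # constant feature: nothing to do
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    if skew > 2.0 and min(values) >= 0:
        return "log1p"                    # compress a long right tail
    if kurt > 8.0:
        return "tanh"                     # bound a heavy-tailed feature
    return "identity"                     # keep near-normal features linear

# A non-negative feature with one extreme outlier gets the compressing transform.
choice = pick_transform([0.0] * 19 + [1000.0])  # -> "log1p"
```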
        </sec>
        <sec id="sec-8-3-4">
          <title>Advanced Regularization Framework</title>
          <p>Elastic Net Integration: Combine L1 and L2 penalties for improved generalization, see formula
(3), where α = 0.5 provides optimal balance between sparsity (L1) and variance control (L2).</p>
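Formula (3) itself is not reproduced in this excerpt; the standard elastic-net penalty it refers to takes the following form, with the α = 0.5 balance described above:

```latex
R(\mathbf{w}) \;=\; \lambda\left[\,\alpha\,\lVert \mathbf{w}\rVert_{1} \;+\; \frac{1-\alpha}{2}\,\lVert \mathbf{w}\rVert_{2}^{2}\,\right],
\qquad \alpha = 0.5 .
```

Here the L1 term drives sparsity and the L2 term controls variance, matching the roles the text assigns to each penalty.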
          <p>Expected Impact: +2 percentage points accuracy improvement with 5% training overhead.</p>
          <p>Curvature-Aware Learning Rate Adaptation:</p>
          <p>Current Implementation: Simple error-based scaling is inadequate for complex loss landscapes. Enhanced Approach: Uncertainty-aware adaptation with a diagonal Hessian approximation:</p>
          <p>Confidence Estimation: formulas (4–5).</p>
          <p>Error-Uncertainty Scaling: formula (6).</p>
          <p>Diagonal Newton Approximation: formulas (7-8).</p>
          <p>This approach approximates second-order curvature information without full Hessian
computation, following recent advances in diagonal quasi-Newton methods.</p>
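The idea can be sketched for logistic loss as follows; `diagonal_newton_update` and its damping constant are illustrative stand-ins, not the framework's exact formulas (7)-(8):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def diagonal_newton_update(w, X, y, damping=1e-4):
    """One curvature-aware update using only the Hessian diagonal.

    For logistic loss: grad_j = sum_i (p_i - y_i) * x_ij
                       H_jj   = sum_i p_i * (1 - p_i) * x_ij**2
    Each coordinate is scaled by its own curvature, approximating a Newton
    step without forming or inverting the full Hessian."""
    p = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in X]
    new_w = []
    for j in range(len(w)):
        g = sum((pi - yi) * row[j] for pi, yi, row in zip(p, y, X))
        h = sum(pi * (1 - pi) * row[j] ** 2 for pi, row in zip(p, X))
        new_w.append(w[j] - g / (h + damping))  # per-coordinate Newton step
    return new_w

# Toy 1-D problem with a bias column: labels flip between x = 1 and x = 2.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
w = [0.0, 0.0]
for _ in range(5):
    w = diagonal_newton_update(w, X, y)
# After a few steps the model separates the two classes.
```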
        </sec>
        <sec id="sec-8-3-5">
          <title>Improved Weight Initialization</title>
          <p>Xavier/He Initialization Integration: Replace naive Gaussian initialization with architecture-aware schemes:</p>
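A sketch of the scheme, assuming formulas (9) and (10) are the textbook Glorot (Xavier) and He variances, since the formulas themselves are not reproduced in this excerpt:

```python
import math
import random

def init_weights(n_in, n_out, activation="tanh", seed=0):
    """Architecture-aware initialization replacing a naive Gaussian draw:
    Xavier variance for tanh/sigmoid, He variance for ReLU-like units."""
    rng = random.Random(seed)
    if activation in ("tanh", "sigmoid"):
        std = math.sqrt(2.0 / (n_in + n_out))   # Xavier: Var = 2/(fan_in + fan_out)
    else:
        std = math.sqrt(2.0 / n_in)             # He: Var = 2/fan_in
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]

# 100x100 weight matrix for a tanh transform; empirical std ~ sqrt(2/200) = 0.1.
W = init_weights(100, 100, activation="tanh")
```

The selection between the two variances would follow the feature-transformation choice described earlier, keeping pre-activation variance roughly constant at initialization.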
        </sec>
        <sec id="sec-8-3-6">
          <title>Momentum and Early Stopping</title>
          <p>Momentum Integration: Implement momentum-based updates for smoother convergence:
formulas (11-12), with β = 0.9 providing optimal acceleration while maintaining stability.</p>
          <p>Early Stopping Framework: Validation-based convergence detection with a patience mechanism to prevent overfitting while reducing training time by ~10% (Table 6, Figure 4). Cumulative gain: +8 pp accuracy, +0.04 AUC, with training still 8× faster than LR (KDDCup99 case study).</p>
          <p>Expected result: Training still 8× faster than LR with substantially improved accuracy.</p>
        </sec>
        <sec id="sec-8-3-7">
          <title>Comprehensive Performance Comparison</title>
          <p>The following comprehensive comparison presents detailed performance metrics across all 12 evaluated datasets (Figure 5).</p>
        </sec>
      </sec>
      <sec id="sec-8-4">
        <title>6.4. Dataset-Specific Performance Analysis</title>
        <p>The side-by-side performance comparison presents a comprehensive dual-panel visualization
(Figure 6) that systematically evaluates AdaptiveBayes against Logistic Regression across five
benchmark datasets (Table 7) using both accuracy and AUC metrics. The left panel displays
classification accuracy with error bars representing standard deviations, while the right panel
shows corresponding AUC values with identical error bar representations. Each dataset is
represented by paired bars using a consistent color scheme: blue bars for AdaptiveBayes and red
bars for Logistic Regression, with green speedup annotations prominently displayed above each
dataset pair to highlight computational advantages ranging from 13x to 20x faster training times.</p>
        <p>The empirical results demonstrate that AdaptiveBayes maintains competitive or superior
performance compared to Logistic Regression across all evaluated datasets while delivering
substantial computational efficiency gains. Most notably, AdaptiveBayes achieves identical
accuracy on three datasets (Breast Cancer: 0.956, Iris: 0.967, Wine: 0.972) and demonstrates clear
improvements on Diabetes (0.753 vs 0.740) and Heart Disease (0.820 vs 0.803) classifications. The
AUC analysis reveals similar patterns, with AdaptiveBayes showing particular strength on the
Diabetes dataset (0.821 vs 0.804) and Heart Disease (0.893 vs 0.876), while maintaining parity on
other benchmarks. The error bars indicate that performance differences fall within acceptable
statistical ranges, suggesting robust algorithmic stability.
</p>
        <p>The visualization effectively communicates AdaptiveBayes' primary value proposition:
achieving competitive classification quality while delivering order-of-magnitude improvements in
computational efficiency. The consistent speedup factors across all datasets (averaging 16.2x faster)
combined with maintained or improved accuracy metrics establish AdaptiveBayes as a viable
baseline replacement for time-critical machine learning applications. This performance profile is
particularly valuable for scenarios requiring rapid model iteration, real-time deployment
constraints, or resource-limited computing environments where traditional optimization-based
approaches become computationally prohibitive.
The speed-accuracy trade-off analysis (Figure 7) employs a sophisticated scatter plot methodology
to examine the fundamental relationship between computational efficiency gains and classification
performance changes across the benchmark dataset collection. The visualization plots training
speedup values (13x to 20x) on the x-axis against accuracy differences (AdaptiveBayes minus
Logistic Regression) on the y-axis, with each dataset represented as a distinct colored point whose
size reflects corresponding AUC performance values. Error bars incorporate combined standard
deviations from both algorithms, providing statistical context for observed performance
differences, while a horizontal reference line at y = 0 indicates performance parity between
methods.</p>
        <p>The empirical analysis reveals a favorable trade-off profile where AdaptiveBayes consistently
achieves substantial speedup improvements without sacrificing classification accuracy across most
evaluated scenarios. Three datasets (Breast Cancer, Iris, Wine) demonstrate perfect parity (accuracy
difference = 0.000), while two datasets (Diabetes: +0.013, Heart Disease: +0.017) show modest
accuracy improvements favoring AdaptiveBayes. The scatter pattern indicates no negative
correlation between speedup magnitude and accuracy performance, suggesting that
AdaptiveBayes’ computational efficiency gains do not compromise classification quality. Point sizes
reflecting AUC values further corroborate this finding, with larger points (higher AUC) distributed
across various speedup levels without systematic degradation.</p>
        <p>This trade-off analysis provides critical insights for practical deployment decisions,
demonstrating that AdaptiveBayes occupies a unique algorithmic niche where computational
efficiency and classification quality are not mutually exclusive. The absence of accuracy penalties
despite 13–20x speedup improvements challenges traditional assumptions about speed-quality
tradeoffs in machine learning baselines. These findings have significant implications for time-sensitive
applications, enabling practitioners to achieve rapid model development cycles, real-time inference
requirements, and resource-constrained deployment scenarios without accepting substantial
performance degradation. The analysis supports AdaptiveBayes adoption in contexts where
training efficiency is paramount while maintaining competitive baseline performance standards.
The large-scale dataset performance comparison systematically evaluates AdaptiveBayes against
Logistic Regression across seven datasets containing more than 10,000 samples each (Table 8),
representing real-world enterprise-level machine learning challenges. The visualization employs a
professional bar chart format with blue bars representing AdaptiveBayes AUC performance and
red bars indicating Logistic Regression results, complemented by error bars depicting standard
deviations and prominent green speedup annotations highlighting computational efficiency gains.
The dataset selection spans diverse application domains including financial fraud detection
(CreditCardFraud), high-energy physics (HIGGS, SUSY), network intrusion detection (KDDCup99),
forest cover classification (Covertype), particle physics mass prediction (Hepmass), and
clickthrough rate prediction (Avazu), ensuring comprehensive evaluation across heterogeneous
problem characteristics and data structures.</p>
        <p>The empirical results demonstrate AdaptiveBayes’ remarkable scalability advantages (Figure 8),
with training speedup factors ranging from 31x to an extraordinary 749x improvement over Logistic
Regression while maintaining competitive or superior AUC performance on five out of seven
evaluated datasets. AdaptiveBayes achieves notable AUC improvements on CreditCardFraud (0.913
vs 0.898), KDDCup99 (0.988 vs 0.861), Covertype (0.764 vs 0.754), Hepmass (0.723 vs 0.708), and
Avazu (0.878 vs 0.865), demonstrating consistent advantages across diverse problem domains. The
two datasets where Logistic Regression maintains AUC superiority—HIGGS (0.685 vs 0.589) and
SUSY (0.743 vs 0.654)—represent high-energy physics classification problems with complex feature
interactions, suggesting domain-specific limitations of simplified Bayesian approaches while still
delivering substantial computational benefits (45x and 38x speedup respectively).</p>
        <p>The extreme speedup variations across datasets reveal important insights about AdaptiveBayes'
computational characteristics and optimal application scenarios. Covertype demonstrates the most
dramatic efficiency gain (749x speedup) with maintained accuracy, suggesting exceptional
suitability for categorical feature spaces and forest cover-type classification problems. Conversely,
the physics datasets (HIGGS, SUSY) show more modest but still substantial speedups (45x, 38x),
indicating that while computational advantages persist across all domains, the magnitude varies
with dataset complexity and feature interaction patterns. These findings establish AdaptiveBayes
as particularly valuable for time-critical applications, real-time deployment scenarios, and
resource-constrained environments where traditional optimization approaches become
computationally prohibitive, while highlighting the importance of domain-specific evaluation for
optimal algorithm selection.</p>
        <p>The dual-axis analysis (Figure 9) provides a sophisticated visualization framework that
simultaneously examines AUC performance and computational efficiency across large-scale
datasets through an integrated bar-and-line chart methodology.</p>
        <p>The primary y-axis (left) displays AUC values ranging from 0.55 to 1.00 using side-by-side bars
with blue representing AdaptiveBayes and red indicating Logistic Regression performance, while
the secondary y-axis (right) employs a logarithmic scale from 30x to 1000x to accommodate the
extreme range of speedup improvements visualized through a green line plot with circular
markers. This dual-metric approach enables comprehensive assessment of the fundamental
speed-accuracy trade-off that characterizes baseline classifier selection decisions, providing insights into
scenarios where computational efficiency gains justify potential accuracy trade-offs versus
situations where performance parity or improvement accompanies dramatic speedup benefits.</p>
        <p>The integrated visualization reveals distinct performance patterns across dataset categories,
with AdaptiveBayes demonstrating superior combined efficiency-accuracy profiles on the majority
of evaluated large-scale problems. Financial and security applications (CreditCardFraud,
KDDCup99, Avazu) show both AUC improvements and substantial speedup gains (70x-322x),
indicating optimal algorithmic alignment with these problem domains. Environmental
classification (Covertype) and particle physics mass prediction (Hepmass) achieve moderate AUC
improvements alongside extreme speedup factors (749x and 45x respectively), suggesting
exceptional computational efficiency for categorical feature spaces and structured prediction tasks.
The high-energy physics datasets (HIGGS, SUSY) represent the only scenarios where meaningful
AUC trade-offs occur (-0.096, -0.089), yet still deliver significant computational benefits (45x, 38x),
highlighting domain-specific algorithmic limitations while maintaining practical deployment
advantages.</p>
        <p>This comprehensive dual-metric analysis establishes critical decision-making criteria for
practitioners evaluating AdaptiveBayes adoption in production environments, demonstrating that
computational efficiency gains are not uniformly distributed across problem domains but
consistently deliver substantial benefits regardless of AUC trade-off magnitude. The logarithmic
scale representation of speedup factors effectively communicates the extreme computational
advantages available through AdaptiveBayes, with average improvements of 186x across all
large-scale datasets providing transformative capabilities for time-sensitive applications, iterative model
development workflows, and resource-constrained deployment scenarios. The visualization
supports evidence-based algorithm selection by clearly delineating scenarios where AdaptiveBayes
provides optimal efficiency-accuracy combinations versus cases requiring careful consideration of
domain-specific performance requirements, ultimately enabling informed baseline classifier
decisions based on application-specific constraints and optimization priorities.</p>
      </sec>
      <sec id="sec-8-5">
        <title>6.5. Efficiency-Accuracy Trade-off Analysis</title>
        <p>The comprehensive efficiency analysis reveals distinct algorithmic profiles across two critical
resource utilization dimensions (Table 9) that fundamentally influence practical deployment
decisions in machine learning applications.</p>
        <p>Performance-to-Time Ratios (higher is better): XGBoost 1.99 (optimal balance); LightGBM 0.72 (good balance); Logistic Regression 0.55 (moderate efficiency); MLP 0.43 (poor efficiency); RandomForest 0.39 (slow training); AdB -4.24 (speed champion, accuracy penalty).</p>
        <p>Memory Efficiency Rankings (MB per 0.01 accuracy): Logistic Regression 0.076 MB (most efficient); XGBoost 0.145 MB (very efficient); LightGBM 1.18 MB (moderate); AdaptiveBayes 3.28 MB (includes GPU overhead); MLP 5.22 MB (high consumption); RandomForest 56.7 MB (memory intensive).</p>
        <p>The performance-to-time ratio metric quantifies the balance between classification quality and
computational cost, where higher values indicate superior efficiency in achieving accuracy relative
to training duration. XGBoost emerges as the optimal balanced solution with a ratio of 1.99,
demonstrating effective integration of accuracy and speed, while LightGBM (0.72) and Logistic
Regression (0.55) maintain positive ratios indicating reasonable efficiency trade-offs. Conversely,
AdaptiveBayes exhibits a negative ratio (-4.24) reflecting its primary optimization focus on
computational speed rather than accuracy maximization, positioning it as a specialized solution for
time-critical applications where rapid training takes precedence over marginal accuracy
improvements. RandomForest (0.39) and MLP (0.43) demonstrate poor efficiency profiles, requiring
substantial computational resources relative to their accuracy contributions.</p>
        <p>Memory consumption patterns reveal equally important practical considerations for algorithm
selection, particularly in resource-constrained environments or large-scale deployment scenarios.
Logistic Regression demonstrates exceptional memory efficiency at 0.076 MB per 0.01 accuracy
point, making it ideal for embedded systems and edge computing applications, while XGBoost
maintains very efficient memory utilization (0.145 MB) despite its superior accuracy profile.
AdaptiveBayes occupies a moderate memory consumption position (3.28 MB) that includes GPU
overhead, representing a reasonable trade-off for applications prioritizing computational speed
over memory optimization. The memory intensity spectrum culminates with RandomForest
consuming 56.7 MB per accuracy unit, highlighting its unsuitability for memory-constrained
deployments despite potential accuracy advantages. These efficiency metrics establish clear
decision-making frameworks for practitioners balancing computational speed, memory constraints,
and accuracy requirements across diverse application contexts, with each algorithm occupying
distinct optimization niches within the broader machine learning ecosystem.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>7. Discussion</title>
      <p>Our results show that AdB occupies a unique niche among ultra-fast classifiers. Its performance profile is characterized by a clear trade-off: sacrificing accuracy for an order-of-magnitude increase in training speed.</p>
      <p>While it fails to match the classification quality of state-of-the-art models like XGBoost or even
the baseline LR in most cases, its sheer velocity makes it a compelling option for specific use cases.
For instance, in rapid prototyping, iterative feature engineering, or production environments where
models must be retrained very frequently on large data streams, a 10x-700x speedup can be a decisive
advantage.</p>
      <sec id="sec-9-1">
        <title>Speed-Accuracy Trade-off Analysis</title>
        <p>The fundamental tension between computational efficiency and classification accuracy characterizes AdB's performance profile. While dramatic training speedups (10-749x) establish clear computational advantages, accuracy limitations averaging -0.26 ± 0.33 compared to Logistic Regression necessitate careful consideration of deployment contexts.</p>
        <p>The superior AUC performance on specific datasets (CreditCardFraud, KDDCup99, Avazu)
suggests AdB excels in particular problem domains, especially ranking tasks or scenarios with good
class separability. This pattern indicates problem-specific model selection should prioritize data
characteristics beyond simple accuracy metrics.</p>
      </sec>
      <sec id="sec-9-2">
        <title>Dataset-Dependent Performance Patterns</title>
        <p>Performance variation across datasets highlights the critical importance of problem-specific evaluation. AdB's exceptional performance on KDDCup99 (AUC 0.988, accuracy 0.936) demonstrates that certain data characteristics (particularly good separability and moderate dimensionality) favor adaptive learning approaches.</p>
        <p>Conversely, challenges with HIGGS and SUSY datasets reveal limitations in high-dimensional
physics problems requiring complex feature interactions. The CreditCardFraud paradox (low
accuracy, excellent AUC) underscores the critical need for threshold optimization in imbalanced
scenarios.</p>
      </sec>
      <sec id="sec-9-3">
        <title>Practical Deployment Guidelines</title>
        <p>Recommended Use Cases: Time-Critical Pipelines (rapid model development with acceptable accuracy trade-offs); GPU-Rich Environments (leveraging unique GPU acceleration capabilities); Prototyping and Iteration (fast baseline establishment for feature engineering); Real-Time Applications (scenarios prioritizing inference speed and training efficiency).</p>
        <p>Not Recommended For: Regulatory/High-Stakes applications requiring maximum accuracy and interpretability; Complex High-Dimensional problems similar to HIGGS/SUSY without architectural improvements; Severely Imbalanced datasets without threshold optimization mechanisms.</p>
      </sec>
      <sec id="sec-9-4">
        <title>Limitations and Future Research Directions</title>
        <p>Current Limitations:</p>
        <p>Accuracy Deficiencies: Particularly on complex, high-dimensional problems.</p>
        <p>Calibration Issues: Suboptimal performance on imbalanced datasets.</p>
        <p>Memory Variability: Volatile GPU memory usage across different problem types.</p>
        <p>Limited Regularization: Absence of built-in overfitting prevention.</p>
        <p>Immediate Research Priorities:</p>
        <p>Automatic Threshold Tuning: Validation-based calibration for imbalanced data.</p>
        <p>Hyperparameter Optimization: Efficient grid search or Bayesian optimization.</p>
        <p>Mixed-Precision GPU: Lower memory footprint with maintained precision.</p>
        <p>Ensemble Integration: Combining multiple AdB variants.</p>
        <p>Long-Term Directions:</p>
        <p>Integration with modern deep learning frameworks for hybrid approaches.</p>
        <p>Extension to multi-task and transfer learning scenarios.</p>
        <p>Development of theoretical convergence guarantees under proposed enhancements.</p>
        <p>Exploration of attention mechanisms for adaptive feature weighting.</p>
        <p>The poor performance on datasets like HIGGS and SUSY and the analysis of its core algorithm
(simple learning rate adaptation, lack of regularization) point to clear architectural weaknesses.
The algorithm in its current form is ill-equipped to handle complex, non-linear decision boundaries
and is prone to overfitting. The paradoxical result on CreditCardFraud underscores the need for
better handling of imbalanced data, specifically through automatic classification threshold tuning.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>8. Future Research Directions and Limitations</title>
      <sec id="sec-10-1">
        <title>Critical Limitations and Failure Modes</title>
        <p>Algorithmic Limitations:</p>
        <p>High-Dimensional Complexity: AdB struggles with datasets exceeding 1000 features, particularly sparse categorical encodings where feature interactions dominate classification performance.</p>
        <p>Non-Linear Decision Boundaries: Simple linear combinations prove insufficient for the complex decision surfaces characteristic of physics datasets (HIGGS, SUSY).</p>
        <p>Imbalanced Data Calibration: Threshold mis-calibration on severely imbalanced datasets (1:578 ratio) requires post-hoc calibration techniques.</p>
        <p>Memory Volatility: GPU memory consumption varies dramatically (0-513 MB) based on dataset characteristics, limiting predictable resource allocation.</p>
        <p>Computational Constraints: GPU dependency limits deployment in CPU-only environments; the single-threaded CPU fallback reduces the competitive advantage; memory overhead exceeds lightweight baselines by 25-100x.</p>
        <p>Statistical Validity Issues: limited evaluation on multiclass problems (&gt;2 classes); insufficient cross-dataset validation of hyperparameter sensitivity; lack of convergence guarantees under the proposed enhancements.</p>
      </sec>
      <sec id="sec-10-2">
        <title>Immediate Research Priorities</title>
        <p>High-Priority Algorithmic Enhancements:</p>
        <p>Automatic Threshold Calibration. Objective: Eliminate accuracy-AUC discrepancies on imbalanced datasets. Approach: Validation-based threshold selection using Youden's J statistic or F1 optimization. Expected Impact: +15–30% accuracy improvement on imbalanced datasets. Implementation Timeline: 3–6 months.</p>
        <p>Hyperparameter Optimization Framework. Objective: Systematic exploration of learning rate, regularization, and transformation parameters. Approach: Bayesian optimization with early stopping on validation AUC. Expected Impact: +2-5% accuracy improvement across datasets. Resource Requirements: 100-200 GPU hours per dataset.</p>
        <p>Mixed-Precision GPU Implementation. Objective: Reduce memory footprint while maintaining numerical stability. Approach: FP16 training with FP32 master weights following NVIDIA Apex patterns. Expected Impact: 50% memory reduction and 20–30% speedup. Technical Risk: Potential convergence issues requiring loss scaling.</p>
        <p>Medium-Priority Extensions:</p>
        <p>Ensemble Integration Framework. Approach: Bagging multiple AdB variants with different initialization seeds and hyperparameters. Expected Impact: +3–8% accuracy improvement through variance reduction. Computational Overhead: 5–10× training-time increase.</p>
        <p>Online Learning Adaptation. Objective: Enable incremental updates for streaming-data scenarios. Approach: Exponential forgetting factors and adaptive batch sizing. Applications: Real-time fraud detection, dynamic recommendation systems. Research Challenge: Maintaining computational efficiency under concept drift.</p>
        <p>Multi-Task Learning Extension. Objective: Leverage shared representations across related classification tasks. Approach: Shared feature transformations with task-specific output layers. Expected Benefit: Improved performance on small datasets through transfer learning.</p>
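The ensemble idea can be sketched as seed-varied bagging; `train_fn` and `predict_fn` are hypothetical stand-ins for AdB's fit and predict routines, and the midpoint classifier in the example is purely illustrative:

```python
import random

def bagged_predict(train_fn, predict_fn, X_train, y_train, X_test, n_models=5):
    """Train n_models variants with different seeds on bootstrap resamples
    of the training set, then majority-vote their binary predictions."""
    votes = [0] * len(X_test)
    for seed in range(n_models):
        rng = random.Random(seed)
        idx = [rng.randrange(len(X_train)) for _ in range(len(X_train))]
        model = train_fn([X_train[i] for i in idx],
                         [y_train[i] for i in idx], seed)
        for k, label in enumerate(predict_fn(model, X_test)):
            votes[k] += label
    return [1 if v * 2 >= n_models else 0 for v in votes]

# Illustrative base learner: threshold at the midpoint of the class means.
def fit_midpoint(X, y, seed):
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_midpoint(threshold, X):
    return [1 if x[0] > threshold else 0 for x in X]

X_train = [[float(i)] for i in range(20)]
y_train = [0] * 10 + [1] * 10
preds = bagged_predict(fit_midpoint, predict_midpoint,
                       X_train, y_train, [[2.0], [18.0]])  # -> [0, 1]
```

Varying only the seed and the bootstrap sample is what drives the variance reduction claimed above; each base model sees a slightly different training set.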
      </sec>
      <sec id="sec-10-3">
        <title>Long-Term Research Directions</title>
        <p>Theoretical Foundations:</p>
        <p>Convergence Analysis:</p>
        <p>Research Question: Under what conditions does the enhanced AdB converge to optimal solutions? Methodology: Stochastic approximation theory and concentration inequalities. Impact: Provide theoretical guarantees for practical deployment decisions. Collaboration: Requires optimization theory expertise.</p>
        <p>Generalization Bounds. Objective: Establish PAC-Bayesian bounds for AdB generalization performance. Applications: Model selection, dataset size requirements, confidence intervals. Technical Challenge: The adaptive learning rate complicates traditional analysis frameworks.</p>
        <p>Architectural Innovations:</p>
        <p>Attention-Based Feature Weighting. Inspiration: Transformer attention mechanisms for adaptive feature importance. Implementation: Learned attention weights modulating feature contributions. Expected Benefit: Improved performance on high-dimensional datasets with irrelevant features. Computational Overhead: 10-20% increase in training time.</p>
        <p>Hybrid Deep Learning Integration. Approach: AdB as the final layer in deep networks for tabular data. Architecture: CNN/RNN feature extraction followed by AdB classification. Target Applications: Time-series classification and structured tabular data with temporal components. Research Risk: May lose computational efficiency advantages.</p>
        <p>Practical Deployment Research:</p>
        <p>Edge Computing Optimization. Objective: Enable AdB deployment on resource-constrained devices. Technical Approach: Quantization, pruning, and hardware-specific optimization. Target Hardware: ARM processors, mobile GPUs, embedded systems. Success Metrics: &lt;10 MB memory footprint, &lt;100 ms inference latency.</p>
        <p>Federated Learning Adaptation. Research Challenge: Distribute AdB training across multiple clients. Privacy Constraints: Differential privacy guarantees for sensitive datasets. Communication Efficiency: Minimize parameter synchronization overhead. Applications: Healthcare, financial services, IoT sensor networks.</p>
      </sec>
      <sec id="sec-10-4">
        <title>Recommended Evaluation Protocols</title>
        <p>Enhanced Benchmarking Standards:</p>
        <p>Cross-Domain Validation:
o Objective: Assess generalization across different application domains.
o Dataset Selection: Medical, financial, industrial, and scientific domains.
o Evaluation Metrics: Domain adaptation performance, transfer learning effectiveness.
o Statistical Rigor: Multi-level cross-validation with domain stratification.</p>
        <p>Computational Efficiency Benchmarks:
o Hardware Diversity: CPU-only, GPU-accelerated, multi-GPU, and distributed settings.
o Energy Consumption: Power usage effectiveness for green AI initiatives.
o Scalability Analysis: Performance scaling with dataset size and dimensionality.
o Comparison Framework: Standardized efficiency metrics across hardware
configurations.</p>
        <p>Robustness Evaluation:
o Adversarial Robustness: Performance under input perturbations and adversarial examples.
o Distribution Shift: Covariate shift, label shift, and concept drift scenarios.
o Missing Data Handling: Performance degradation under various missingness patterns.
o Noise Tolerance: Gaussian, uniform, and systematic noise injection studies.</p>
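A Gaussian noise-injection study of the kind listed above measures accuracy of a fixed classifier as feature noise grows. The data, the threshold rule, and the noise levels below are illustrative assumptions:

```python
# Noise-tolerance sketch: accuracy vs. Gaussian noise level on a toy 1-D task.
import random

random.seed(3)
N = 2000
labels = [random.randint(0, 1) for _ in range(N)]
features = [random.gauss(2.0 * y, 1.0) for y in labels]  # two well-separated classes

def accuracy(noise_std):
    """Accuracy of a fixed threshold-at-1.0 rule after adding N(0, noise_std) noise."""
    correct = 0
    for x, y in zip(features, labels):
        x_noisy = x + random.gauss(0, noise_std)
        correct += (x_noisy > 1.0) == bool(y)
    return correct / N

curve = {s: accuracy(s) for s in [0.0, 0.5, 1.0, 2.0]}
print({s: round(a, 3) for s, a in curve.items()})
```

Plotting such a degradation curve per algorithm makes noise tolerance directly comparable across baselines.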
        <p>The comprehensive research roadmap establishes AdB as a foundation for next-generation efficient machine learning algorithms while acknowledging current limitations and providing concrete paths for improvement. Success in these research directions could establish adaptive learning as a viable alternative to traditional optimization-based approaches in resource-constrained applications.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>9. Conclusions</title>
      <p>This comprehensive analysis establishes AdB as a compelling alternative to Logistic Regression for specific application contexts prioritizing computational efficiency over marginal accuracy improvements. The dramatic training speed advantages (10-749x) combined with competitive AUC performance on select datasets position AdB as a viable baseline option for time-sensitive machine learning applications.</p>
      <p>Computational Efficiency:
10x average training speedup over Logistic Regression with peaks at 749x.
Optimal training-prediction balance (9.6x ratio) among all evaluated methods.
Moderate memory consumption (174 MB) with unique GPU utilization capabilities.</p>
      <p>Classification Performance:
Accuracy limitations averaging -0.26 ± 0.33 but AUC improvements of +0.05 ± 0.17 on favorable datasets.
Outstanding performance on KDDCup99 (AUC 0.988) demonstrating the method's potential.
Clear calibration issues on imbalanced data requiring threshold optimization.</p>
      <p>Algorithmic Insights:
Proposed enhancement framework targeting 2–8% accuracy improvements through regularization, curvature adaptation, and improved initialization.
Systematic identification of failure modes and mitigation strategies.
Clear deployment guidelines for practical application.</p>
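The threshold-optimization remedy for the calibration issues noted above can be sketched as choosing the decision cutoff that maximizes F1 on a validation split rather than defaulting to 0.5. The synthetic imbalanced scores and labels below are assumptions:

```python
# Decision-threshold tuning sketch for an imbalanced validation set.
import random

random.seed(11)
# ~10% positives; scores loosely correlated with the labels.
val = [(1, random.betavariate(4, 2)) if random.random() < 0.1
       else (0, random.betavariate(2, 4)) for _ in range(3000)]

def f1_at(threshold):
    """F1 score when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in val if y == 1 and s >= threshold)
    fp = sum(1 for y, s in val if y == 0 and s >= threshold)
    fn = sum(1 for y, s in val if y == 1 and s < threshold)
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

candidates = [i / 100 for i in range(1, 100)]  # grid search over cutoffs
best_thr = max(candidates, key=f1_at)
print(f"default F1={f1_at(0.5):.3f}, tuned F1={f1_at(best_thr):.3f} at {best_thr:.2f}")
```

Because the default 0.5 cutoff is itself in the candidate grid, the tuned threshold can never do worse on the validation data used to select it.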
      <p>AdB proves suitable as a baseline replacement when training speed is critical, datasets exhibit
good separability characteristics, and applications can accommodate moderate accuracy reduction
for substantial speed gains. The method is not recommended for problems requiring maximum
accuracy or highly imbalanced data without calibration improvements.</p>
      <p>AdB occupies a unique point in the accuracy-efficiency trade-off landscape: order-of-magnitude
faster than Logistic Regression with comparable AUC on selected datasets but materially worse
accuracy on complex physics data. Minor algorithmic refinements narrow the quality gap while
preserving speed. We therefore recommend AdB as a drop-in baseline replacement when training
latency is the primary bottleneck.</p>
      <p>The evolving landscape of machine learning applications increasingly values computational
efficiency alongside traditional accuracy metrics. This research contributes empirical evidence for
alternative baseline model selection while providing a concrete roadmap for algorithmic
improvements through the proposed enhancement framework.</p>
      <p>Enhanced AdB implementations incorporating improved feature transformations, elastic net
regularization, diagonal quasi-Newton optimization, and adaptive threshold tuning show potential
for addressing current limitations while preserving computational advantages. These developments
could significantly expand the practical applicability of adaptive baseline approaches across
broader problem domains.</p>
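The enhancements named above can be illustrated on a logistic model: a diagonal (per-coordinate) curvature estimate scales each gradient step, and an elastic-net proximal update applies the regularization. All hyperparameters, the damping factor, and the toy data are assumptions, not the enhanced AdB specification:

```python
# Damped diagonal-Newton logistic fit with an elastic-net proximal step (sketch).
import math
import random

random.seed(5)
N, DIM = 500, 4
true_w = [1.5, -2.0, 0.0, 0.0]  # sparse ground truth favors elastic net
X = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
y = [1 if sum(w * x for w, x in zip(true_w, xi)) + random.gauss(0, 1.0) > 0 else 0
     for xi in X]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

l1, l2, eps, damping = 0.01, 0.01, 1e-6, 0.7  # assumed hyperparameters
w = [0.0] * DIM
for _ in range(300):
    grad = [0.0] * DIM
    hdiag = [0.0] * DIM  # diagonal of the logistic-loss Hessian
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        for j in range(DIM):
            grad[j] += (p - yi) * xi[j] / N
            hdiag[j] += p * (1 - p) * xi[j] ** 2 / N
    for j in range(DIM):
        step = damping / (hdiag[j] + l2 + eps)      # curvature-aware step size
        z = w[j] - step * (grad[j] + l2 * w[j])     # gradient + ridge component
        w[j] = math.copysign(max(abs(z) - step * l1, 0.0), z)  # L1 soft-threshold

acc = sum((sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) > 0.5) == bool(yi)
          for xi, yi in zip(X, y)) / N
print("weights:", [round(v, 2) for v in w], "train acc:", round(acc, 3))
```

The per-coordinate step keeps the update O(N·d) per iteration — no full Hessian is formed — while the soft-threshold drives genuinely irrelevant coordinates toward zero, which is the speed-preserving behavior the enhancement framework targets.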
      <p>Future work should prioritize implementation of the proposed enhancement framework,
systematic evaluation of calibration techniques, and exploration of hybrid approaches combining
AdB efficiency with complementary accuracy-enhancing methods. The unique computational
profile of AdB positions it as a valuable component in the toolkit of modern machine learning
practitioners facing increasingly large-scale and time-sensitive applications.</p>
      <p>Declaration on Generative AI: While preparing this work, the author used Grammarly Pro to correct text grammar and Strike Plagiarism to search for possible plagiarism. After using these tools, the author reviewed and edited the content as needed and takes full responsibility for the publication's content.</p>
      <p>[22] S. Geeitha, et al., Disease-Free Survival Prediction in Recurrent Cervical Cancer using Naive Bayes Machine Learning Algorithm, in: 5th Int. Conf. Smart Electron. Commun. (ICOSEC 2024), Trichy, India, 2024, 1901–1906. doi:10.1109/ICOSEC61587.2024.10722641</p>
      <p>[23] P. Rajneekant, B. P. Kishore, D. P. Gond, D. P. Mohapatra, Enhancing Malware Classification with Machine Learning: A Comparative Analysis of API Sequence-based Techniques, in: IEEE Int. Conf. Smart Power Control Renew. Energy (ICSPCRE 2024), Rourkela, India, 2024, 1–6. doi:10.1109/ICSPCRE62303.2024.10675011</p>
      <p>[24] T. Chen, C. Guestrin, XGBoost: A Scalable Tree Boosting System, in: 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2016, 785–794.</p>
      <p>[25] X. Ma, Apollo: An Adaptive Parameter-Wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization, arXiv preprint, 2021. doi:10.48550/arXiv.2009.13586</p>
      <p>[26] Z. Aminifard, S. Babaie-Kafaki, Diagonally Scaled Memoryless Quasi-Newton Methods with Application to Compressed Sensing, J. Ind. Manag. Optim., 18 (2022) 4181–4205.</p>
      <p>[27] M. M. Ahmed, A Novel Variance Reduction Proximal Stochastic Newton Algorithm for Large-Scale Machine Learning Optimization, Int. J. Adv. Netw. Monit. Control, 9 (2024) 84–90. doi:10.2478/ijanmc-2024-0040</p>
      <p>[28] J. Islam, et al., A Comparative Study on Feature Selection between Computational and Medical Knowledge Driven Approaches for Heart Disease Prediction, in: IEEE Int. Conf. Biomed. Eng. Comput. Inf. Technol. Health (BECITHCON 2024), Dhaka, Bangladesh, 2024, 125–130. doi:10.1109/BECITHCON64160.2024.10962566</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] S. Abogada, L. Usona, Early Warning System of Attrition in the BPO Industry using Machine Learning Classification Models, J. Artif. Intell. Mach. Learn. Neural Netw., 4 (2024) 18–30. doi:10.55529/jaimlnn.43.18.30</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] S. Kavun, G. Zhosan, Calculation of the Generalizing Indicator of Productivity of the Enterprises Activity based on the Matrix-Rank Approach, J. Finance Econ., 2 (2014) 202–209.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Kavun, Indicative-Geometric Method for Estimation of Any Business Entity, Int. J. Data Anal. Tech. Strateg., 8 (2016) 87. doi:10.1504/ijdats.2016.077486</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Z. C. Chia, K. H. Lim, T. P. L. Tan, Two-Phase Switching Optimization Strategy in LSTM Model for Predictive Maintenance, in: Int. Conf. Green Energy, Comput. Sustain. Technol. (GECOST 2021), Miri, Malaysia, 2021, 1–6. doi:10.1109/GECOST52368.2021.9538639</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] J. Huang, H. Peng, H. Wu, Newton's Method and its Hybrid with Machine Learning for Navier-Stokes Darcy Models Discretized by Mixed Element Methods, Commun. Comput. Phys., 37 (1) (2025) 30–60. doi:10.4208/cicp.OA-2024-0066</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] S. Yagishita, S. Nakayama, Proximal Diagonal Newton Methods for Composite Optimization Problems, arXiv preprint, 2023. doi:10.48550/arXiv.2310.06789</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] F. Pedregosa, et al., Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12 (2011) 2825–2830.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Datsenko, H. Kuchuk, Biometric Authentication Utilizing Convolutional Neural Networks, Adv. Inf. Syst., 7 (2) (2023) 87–91. doi:10.20998/2522-9052.2023.2.12</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. Martens, R. Grosse, Optimizing Neural Networks with Kronecker-Factored Approximate Curvature, in: 32nd Int. Conf. Mach. Learn., 37 (2015) 2408–2417.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] O. Trydid, S. Kavun, M. Goykhman, Synthesis Concept of Information and Analytical Support for Bank Security System, Actual Probl. Econ., 11 (161) (2014) 449–461.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] S. Yagishita, S. Nakayama, An Acceleration of Proximal Diagonal Newton Method, JSIAM Lett., 16 (2024) 5–8. doi:10.14495/jsiaml.16.5</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] P. Baldi, P. Sadowski, D. Whiteson, Searching for Exotic Particles in High-Energy Physics with Deep Learning, Nat. Commun., 5 (2014) 4308.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] C. D. Alecsa, A Theoretical and Empirical Study of New Adaptive Algorithms with Additional Momentum Steps and Shifted Updates for Stochastic Non-Convex Optimization, J. Glob. Optim., 93 (2025) 113–173. doi:10.1007/s10898-025-01518-0</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] G. Ke, et al., LightGBM: A Highly Efficient Gradient Boosting Decision Tree, Adv. Neural Inf. Process. Syst., 30 (2017).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] L. Wu, et al., A Robust Stochastic Quasi-Newton Method with the Application in Machine Learning, in: Int. Conf. Culture-Oriented Sci. Technol. (ICCST 2021), Beijing, China, 2021, 149–154. doi:10.1109/ICCST53801.2021.00041</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] S. Kavun, Adaptive_Bayes: Version 01 (v_01), Benchmark Analysis, Zenodo, 2025. doi:10.5281/zenodo.17184113</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] J. Luo, et al., Complexity-optimized Sparse Bayesian Learning for Scalable Classification Tasks, Inf. Sci., 719 (2025) 122447. doi:10.1016/j.ins.2025.122447</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] J. Luo, C. M. Vong, Z. Liu, C. Chen, An Inverse-Free and Scalable Sparse Bayesian Extreme Learning Machine for Classification Problems, IEEE Access, 9 (2021) 87543–87551. doi:10.1109/ACCESS.2021.3089539</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] H. Zou, T. Hastie, Regularization and Variable Selection via the Elastic Net, J. R. Stat. Soc. Ser. B, 67 (2005) 301–320.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] X. Glorot, Y. Bengio, Understanding the Difficulty of Training Deep Feedforward Neural Networks, in: 13th Int. Conf. Artif. Intell. Stat., 2010, 249–256.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] K. He, X. Zhang, S. Ren, J. Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, in: IEEE Int. Conf. Comput. Vis., 2015, 1026–1034.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>