<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>S. Vladov, V. Vysotska, V. Sokurenko, O. Muzychuk, M. Nazarkevych, V. Lytvyn, Neural
Network System for Predicting Anomalous Data in Applied Sensor Systems, Applied System
Innovation</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/asi7050088</article-id>
      <title-group>
        <article-title>learning approach⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerii Sokurenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>victoria.a.vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktor Vasylenko</string-name>
          <email>Vasylenko_Viktor@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatolii Timoshyn</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lidia Kalienichenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykola Marchuk</string-name>
          <email>marchuk_m_i@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Salmanov</string-name>
          <email>salmanov.ua@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Networks Department, Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera Street 12 79013 Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kharkiv National University of Internal Affairs</institution>
          ,
          <addr-line>L. Landau Avenue 27 61080 Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>7</volume>
      <issue>5</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Accurate prediction of financial distress is essential for corporate governance, credit evaluation, and systemic stability. Yet such forecasting remains challenging due to two structural issues: pronounced class imbalance, as distress events are rare, and strong temporal dependence within panel data. This study introduces the Temporal-Balanced Random Forest (T-BRF), a novel ensemble method that extends the BalancedRandomForestClassifier by incorporating time-aware sampling, lagged feature construction, and memory-based voting to address both challenges simultaneously. Evaluation on a realistic synthetic panel of financial indicators demonstrates that T-BRF consistently surpasses baseline models, including GLM, Random Forest, XGBoost, and BRF, across key performance metrics. It achieves superior results in AUCPR (0.512 vs. 0.438 for XGBoost, +7.4 percentage points) and Recall@Top-10% (76.3% vs. 63.5% for XGBoost, +12.8 percentage points), delivering the strongest early-risk detection capability among all tested models. SHAP-based interpretability analysis reveals that T-BRF identifies economically meaningful patterns, emphasizing deteriorating profitability, increasing leverage, and weakening interest coverage. The model's architecture ensures transparency and operational robustness, making it well suited for integration into risk monitoring and supervisory systems. Overall, T-BRF reconciles predictive accuracy with interpretability in imbalanced longitudinal environments and provides a resilient, transparent framework for forward-looking corporate risk assessment.</p>
      </abstract>
      <kwd-group>
        <kwd>financial risk modeling</kwd>
        <kwd>imbalanced classification</kwd>
        <kwd>ensemble learning</kwd>
        <kwd>temporal panel data</kwd>
        <kwd>SHAP interpretability</kwd>
        <kwd>early warning system</kwd>
        <kwd>BalancedRandomForest</kwd>
        <kwd>machine learning in finance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Machine learning models have shown promise in predicting corporate distress by leveraging large-scale financial data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, two persistent challenges limit their effectiveness in real-world settings: class imbalance, because distressed firms constitute a small fraction of the population, and temporal dependence inherent in panel data, where observations for the same entity are correlated across time. Standard ensemble methods, such as Random Forest or XGBoost, often fail to account for these characteristics, leading to poor recall and unstable predictions over time.
      </p>
      <p>While techniques like BalancedRandomForestClassifier address class imbalance through
undersampling, they assume independent and identically distributed (i.i.d.) data, an assumption
violated in longitudinal settings. Similarly, models that incorporate time-series dynamics often
neglect class imbalance, resulting in biased forecasts. This gap motivates the development of hybrid
approaches that jointly model temporal evolution and imbalanced learning.</p>
      <p>To bridge this gap, the Temporal-Balanced Random Forest (T-BRF), a novel ensemble method designed specifically for panel data with rare-event outcomes, is proposed. T-BRF extends the BalancedRandomForest framework by integrating time-decay sampling weights, lagged feature augmentation, and a memory-aware voting mechanism to improve early-risk detection under imbalance. The contributions of this work are threefold:</p>
      <p>The challenge of financial risk prediction is formalized as a temporal imbalanced classification problem in panel data.</p>
      <p>T-BRF, a new ensemble algorithm that explicitly accounts for recency, heterogeneity, and persistence in risk states, is proposed.</p>
      <p>A reproducible evaluation framework with SHAP-based interpretability analysis, enhancing model transparency and trust, is provided.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Recent progress in machine learning and data accessibility has reshaped the landscape of financial
risk modeling. The following review synthesizes research on corporate distress prediction,
imbalanced classification, temporal modeling, and hybrid architectures, establishing the foundation
for the proposed Temporal-Balanced Random Forest (T-BRF).</p>
      <p>
        Machine Learning for Financial Distress Prediction. Traditional statistical models, such as
Altman’s Z-score, have been increasingly complemented by machine learning methods capable of
capturing non-linear relationships in firm-level data. Comparative studies have shown that
ensemble algorithms, including Random Forest and XGBoost, substantially outperform
conventional econometric techniques in bankruptcy and default forecasting [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Deep learning
architectures have further advanced this field by modeling sequential financial behavior. Recurrent
and attention-based networks, such as LSTM and Transformer variants, demonstrate superior
early-warning performance by learning temporal dependencies in multi-period statements [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Addressing Class Imbalance in Financial Risk Modeling. The rarity of distress events, typically
below 10% of firm-year observations, poses a persistent challenge to classifier calibration. Various
rebalancing techniques, such as SMOTE, ADASYN, and cost-sensitive learning have been explored
to improve minority detection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Among these, the Balanced Random Forest (BRF) [17] has
gained wide adoption due to its stability and strong recall. Applications in different contexts
confirm its benefits: for instance, Akbayrak et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] reported a 27% gain in F1-score for Turkish
firms, while Li et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] demonstrated that combining BRF with SHAP-based feature selection
enhances both accuracy and interpretability in ESG risk assessment.
      </p>
      <p>
        Temporal and Panel Data Modeling. Financial variables evolve over time and exhibit firm-specific dependence, which violates the i.i.d. assumption of most standard models. To address this,
recent studies have integrated temporal and cross-sectional structure into predictive frameworks.
Neural architectures with attention or gated recurrent units improve sensitivity to early signals [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
whereas hybrid econometric–ML methods combine fixed effects with non-linear boosting to
account for unobserved heterogeneity [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Time-aware sampling strategies that assign higher
importance to recent periods have also shown promise in survival and default prediction [10].
These developments highlight the need for models that jointly capture temporal dependence and
sample imbalance.
      </p>
      <p>Hybrid and Multimodal Risk Models. The growing availability of textual and ESG data has motivated the integration of non-financial
signals into traditional quantitative models. Studies that fuse financial ratios with sentiment
indicators or corporate disclosures report meaningful improvements in predictive accuracy [11,12].
Text-based features often anticipate financial deterioration by several quarters, reinforcing the
importance of multimodal learning for early detection.</p>
      <p>Interpretability and Explainable AI. Transparency remains a crucial requirement in financial
applications. The SHAP framework introduced by Lundberg and Lee [16] has become the standard
tool for post-hoc explanation in finance and credit analytics. Subsequent research has applied
SHAP to reveal latent biases, sector-specific drivers, and temporal evolution of feature importance
[13–15]. Such interpretability is essential for regulatory compliance and practitioner trust.</p>
      <p>Research Gap and Contribution. Despite these advances, existing models typically neglect temporal recency in sampling, treat lagged features and weighting schemes separately, and provide limited interpretability when applied to imbalanced panel data. The proposed T-BRF addresses this gap by unifying time-aware sampling, lagged feature augmentation, and memory-based voting within a single interpretable ensemble.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology and Methods</title>
      <p>To address the limitations of conventional ensemble methods in modeling financial risk under class
imbalance and temporal dependence, the Temporal-Balanced Random Forest (T-BRF), a hybrid
ensemble classifier specifically designed for panel data with rare-event dynamics has been
proposed. T-BRF extends the BalancedRandomForestClassifier by integrating temporal weighting,
lagged feature engineering, group-aware sampling, and memory-based prediction smoothing into a
unified framework. This section presents the formal mathematical structure of T-BRF, its algorithmic design, and its theoretical justification. The model is evaluated on a synthetic but realistic financial panel dataset and demonstrates superior AUC-PR and Recall@Top-10% compared to state-of-the-art baselines.</p>
      <sec id="sec-3-1">
        <title>3.1. Problem Formulation</title>
        <p>Let D = {(x_it, y_it)} for i = 1, …, N and t = 1, …, T denote a balanced panel dataset, where i indexes firms, t indexes time periods (e.g., quarters), x_it ∈ R^d is a vector of financial and/or reputational features, and y_it ∈ {0, 1} is a binary indicator of a high-risk (distress) state.</p>
        <p>The goal is to learn a predictive function ŷ_it = f(x_{i,≤t}) that maximizes early detection of rare distress events while maintaining robustness to noise, heterogeneity, and class imbalance.</p>
        <p>Due to the rarity of yit = 1, standard models suffer from low recall and overconfidence in
majority-class predictions. Moreover, classical i.i.d. assumptions are violated due to within-firm
autocorrelation and evolving risk states.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Lagged Feature Augmentation</title>
        <p>To capture temporal dynamics, the input space is augmented with lagged observations:
x̃_it = [x_it, x_{i,t−1}, …, x_{i,t−k}] ∈ R^{d(k+1)}. (1)</p>
        <p>Alternatively, to reduce dimensionality and smooth noise, exponential moving averaging is applied:
μ_it^{(j)} = (1 − β) Σ_{τ=0}^{k} β^τ x_{i,t−τ}^{(j)}, β ∈ (0, 1), j = 1, …, d, (2)
where β controls the decay rate of historical influence; x̃_it = μ_it is then used as the enhanced feature vector.</p>
        <p>This transformation allows the model to detect trends (e.g., declining ROA over three quarters)
rather than isolated anomalies.</p>
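        <p>The two augmentation options above can be sketched in NumPy as follows (a minimal illustration; the function names are ours, not part of the paper):</p>

```python
import numpy as np

def augment_with_lags(x, k):
    """Eq. (1): stack each observation with its k predecessors,
    x_t -> [x_t, x_{t-1}, ..., x_{t-k}]. `x` is a (T, d) array for one firm;
    the first k rows are dropped because they lack full history."""
    T, d = x.shape
    return np.hstack([x[k - j: T - j] for j in range(k + 1)])

def ema_features(x, beta):
    """Eq. (2): exponential moving average with truncated history,
    mu_t = (1 - beta) * sum_{tau=0..t} beta**tau * x_{t-tau}."""
    T, d = x.shape
    mu = np.empty_like(x, dtype=float)
    for t in range(T):
        taus = np.arange(t + 1)                 # tau = 0 .. t
        w = (1.0 - beta) * beta ** taus         # decaying weights
        mu[t] = (w[:, None] * x[t - taus]).sum(axis=0)
    return mu
```

        <p>For a constant series the EMA converges toward that constant, which is the smoothing behavior the text relies on for trend detection.</p>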
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Temporal-Weighted Undersampling</title>
        <p>BalancedRandomForestClassifier performs random undersampling of the majority class (y = 0) in
each bootstrap sample. However, this treats all historical observations equally, violating the
principle of recency.</p>
        <p>Temporal weighting is introduced during sampling. The inclusion probability of a non-event observation (i, t) is proportional to its temporal weight:
w_it^{time} = α^{T−t}, α ∈ (0.8, 1.0], (3)
where T is the final time point. This ensures that more recent “healthy” states have a higher selection probability, reflecting their greater relevance to current risk assessment.</p>
        <p>Additionally, to account for firm heterogeneity (e.g., sector size), group-level correction weights are defined. Let G_j be the set of firms in the j-th sector; then:
w_i^{group} = 1 / √|G_j|, i ∈ G_j. (4)
The combined sampling weight is:
w_it = w_it^{time} · w_i^{group}. (5)</p>
        <p>During bootstrap sampling for each tree, |D1| instances from D0 (non-events) are sampled with probability proportional to w_it, ensuring both balance and temporal-group relevance.</p>
        <p>The use of exponential time decay in sampling weights follows recent advances in time-aware
resampling for longitudinal risk prediction [10] which demonstrate that prioritizing recent
observations improves model relevance in non-stationary environments.</p>
        <p>
          To address class imbalance, a balanced sampling strategy is adopted. Recent studies in credit
risk modeling [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] demonstrate that undersampling with adaptive weighting improves
minority-class recall without sacrificing overall stability.
        </p>
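        <p>A minimal sketch of the combined weighting and the balanced draw, assuming a per-observation period index and sector label are available (names and interfaces are illustrative, not the authors' code):</p>

```python
import numpy as np

def sampling_weights(t, sector, T, alpha=0.9):
    """Eqs. (3)-(5): w_it = alpha**(T - t) / sqrt(|G_j|) for firm i in sector j."""
    t = np.asarray(t)
    sector = np.asarray(sector)
    labels, counts = np.unique(sector, return_counts=True)
    size = dict(zip(labels, counts))            # |G_j| per sector
    w_time = alpha ** (T - t)
    w_group = np.array([1.0 / np.sqrt(size[s]) for s in sector])
    return w_time * w_group

def draw_balanced(idx_neg, w_neg, n_pos, rng):
    """Sample |D1| non-event rows with probability proportional to w_it."""
    p = w_neg / w_neg.sum()
    return rng.choice(idx_neg, size=n_pos, replace=False, p=p)
```

        <p>More recent periods and firms from small sectors thus enter each bootstrap sample with higher probability.</p>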
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Tree Ensemble Construction</title>
        <p>Each decision tree T_m, m = 1, …, M, is trained on a balanced subsample S_m drawn as above. The prediction of the m-th tree is:</p>
        <p>p̂_it^{(m)} = P_m(y_it = 1 | x̃_it). (6)</p>
        <p>The base ensemble prediction is a weighted average:
p̂_it = Σ_{m=1}^{M} γ_m · p̂_it^{(m)}, (7)
where γ_m is the weight of tree m, typically derived from out-of-bag (OOB) accuracy:
γ_m = OOB-Accuracy_m / Σ_{k=1}^{M} OOB-Accuracy_k. (8)</p>
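        <p>The OOB-weighted aggregation step can be sketched as follows (a hedged illustration of the arithmetic, not the authors' implementation):</p>

```python
import numpy as np

def oob_weights(oob_acc):
    """gamma_m = OOB-Accuracy_m / sum_k OOB-Accuracy_k."""
    oob_acc = np.asarray(oob_acc, dtype=float)
    return oob_acc / oob_acc.sum()

def ensemble_proba(tree_probas, gamma):
    """Weighted average of per-tree probabilities: p_hat = sum_m gamma_m * p_m.
    `tree_probas` is an (M, n) array of per-tree scores for n instances."""
    return np.tensordot(np.asarray(gamma), np.asarray(tree_probas), axes=1)
```

        <p>Trees with higher out-of-bag accuracy therefore pull the ensemble score toward their own predictions.</p>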
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Temporal Voting Mechanism</title>
        <p>To improve prediction stability and incorporate memory, a temporal voting layer is introduced. It adjusts the current prediction based on prior forecasts for the same firm:
ŷ_it = 1 if p̂_it + λ Σ_{τ=1}^{k} δ^τ ŷ_{i,t−τ} &gt; 0.5, and 0 otherwise, (9)
where ŷ_{i,t−τ} ∈ {0, 1} are previous binary predictions, δ &lt; 1 exponentially discounts older predictions, and λ &gt; 0 controls the strength of the memory effect.</p>
        <p>This mechanism prevents erratic oscillations (e.g., 0→1→0) and mimics human judgment. Once
a firm enters a "high-risk zone", it remains under scrutiny unless sustained improvement is
observed.</p>
        <p>An alternative probabilistic version uses smoothed history:
ŷ_it = 1 if p̂_it + λ Σ_{τ=1}^{k} δ^τ p̂_{i,t−τ} &gt; 0.5, and 0 otherwise, (10)
which avoids hard thresholds in intermediate steps.</p>
        <p>This memory mechanism aligns with emerging research on dynamic interpretability, where risk
states are modeled as evolving processes rather than isolated events [15].</p>
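        <p>The hard-threshold variant of the voting rule, applied sequentially over one firm's timeline, can be sketched as (defaults follow the tuned values in Section 4; the function name is ours):</p>

```python
import numpy as np

def temporal_vote(p_hat, lam=0.3, delta=0.7, k=2):
    """Memory-based voting: y_t = 1 iff p_t + lam * sum_{tau=1..k} delta**tau * y_{t-tau}
    exceeds 0.5; earlier positive calls keep the firm 'under scrutiny'."""
    y = np.zeros(len(p_hat), dtype=int)
    for t in range(len(p_hat)):
        memory = sum(delta ** tau * y[t - tau]
                     for tau in range(1, k + 1) if t - tau >= 0)
        y[t] = int(p_hat[t] + lam * memory > 0.5)
    return y
```

        <p>For example, a score sequence [0.6, 0.45, 0.2] stays flagged for two extra periods before the memory decays, while a sequence that never crosses 0.5 is never flagged.</p>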
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Algorithm Of Temporal-Balanced Random Forest (T-BRF)</title>
        <p>The proposed algorithm of Temporal-Balanced Random Forest (T-BRF) builds on the concept of the
Omni-Temporal Balanced Random Forest (OT-BRF) introduced by Bayramli et al. (2022) [25],
which enforces the inclusion of temporal variables in every tree. T-BRF extends this idea by
incorporating dynamic weighting of samples and recurrent prediction correction, allowing it to
better capture temporal dependencies and handle severe class imbalance in corporate financial
datasets. Specifically, temporal weights prioritize recent observations, group-based corrections
adjust for heterogeneous class distributions, and lagged features are used to augment the input
space, improving predictive performance over the original OT-BRF approach.</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Complexity and Scalability</title>
        <p>Training complexity per tree is O(n_sub · d′ · log n_sub), where d′ = d(k + 1) is the augmented feature dimension. Prediction is O(M · d′) per instance, with negligible overhead from temporal voting.
The method scales linearly with N and M, making it suitable for large-scale corporate monitoring
systems.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>The comprehensive experimental evaluation of the proposed Temporal-Balanced Random Forest
(T-BRF) assesses its performance under class imbalance, temporal dependence, and heterogeneous
firm dynamics. The experiments are designed to answer the following research questions:
1. Does T-BRF outperform state-of-the-art models in predicting financial risk?
2. How do temporal weighting and memory mechanisms contribute to improved
performance?
3. Is the model robust across different evaluation metrics, particularly under severe class
imbalance?</p>
      <p>All models are evaluated on the synthetic panel dataset (see Section 4.1). The dataset includes six
financial ratios (ROA, Current Ratio, Debt/Equity, Net Margin, Revenue Growth, Interest Coverage)
augmented with one lag (k = 1). The target variable is defined as a binary indicator of high-risk
state, triggered by a combination of deteriorating financial indicators, with an overall event rate of
9%. Crucially, only 12% of firms experience at least one crisis episode, reflecting the rarity of
distress events in real-world settings.</p>
      <p>Baseline Models. T-BRF is compared against the following models:</p>
      <p>GLM (logistic regression with L2 regularization and lagged features).</p>
      <p>Random Forest (standard ensemble without class balancing).</p>
      <p>BalancedRandomForestClassifier (BRF) (RF with undersampling of the majority class).</p>
      <p>XGBoost (gradient boosting with scale_pos_weight to handle class imbalance).</p>
      <p>All models use the same feature set and train/test split strategy.</p>
      <p>Evaluation Protocol. To respect the temporal structure of panel data, time-series cross-validation with a rolling window is employed. Specifically, TimeSeriesSplit(n_splits=5) partitions the data chronologically, ensuring that no future information leaks into training. Each fold uses earlier periods for training and later ones for testing. Models are evaluated
using the following metrics:</p>
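      <p>The paper uses scikit-learn's TimeSeriesSplit; the standalone sketch below mirrors its expanding-window index logic so the protocol is explicit (a simplified re-implementation for illustration, not the library code):</p>

```python
def time_series_splits(n_samples, n_splits=5):
    """Expanding-window splits mirroring scikit-learn's TimeSeriesSplit:
    each fold trains on all periods before its test window, so no future
    information leaks into training."""
    test_size = n_samples // (n_splits + 1)
    for i in range(n_splits):
        test_start = n_samples - (n_splits - i) * test_size
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test

# 20 quarterly periods, 5 chronological folds
for train, test in time_series_splits(20, 5):
    assert max(train) < min(test)   # strictly chronological
```

      <p>Every training index precedes every test index in each fold, which is the leakage guarantee the protocol depends on.</p>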
      <p>AUC-ROC. It measures overall discrimination ability.</p>
      <p>AUC-PR (Precision-Recall). More informative than AUC-ROC under class imbalance.</p>
      <p>Recall@Top-10%. The proportion of true high-risk cases captured within the top 10% of predictions; critical for early warning systems.</p>
      <p>F1-Score (harmonic mean of precision and recall).</p>
      <p>Each metric is averaged across the five folds, with standard deviation reported for stability
assessment.</p>
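      <p>Recall@Top-10% is not a stock library metric; a minimal implementation consistent with the description above might look like this (the function name is ours):</p>

```python
import numpy as np

def recall_at_top(y_true, scores, frac=0.10):
    """Share of all true high-risk cases that fall within the top `frac`
    of model scores (Recall@Top-10% when frac=0.10)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_top = max(1, int(np.ceil(frac * len(scores))))
    top = np.argsort(-scores)[:n_top]   # indices of the highest-risk predictions
    return y_true[top].sum() / y_true.sum()
```

      <p>The metric rewards models that concentrate true distress cases at the very top of the ranking, which is what an early warning system acts on.</p>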
      <p>Hyperparameters. Hyperparameter tuning was performed via grid search on a validation subset
(last 20% of training time). Final configurations:</p>
      <p>T-BRF: α = 0.9, β = 0.8, δ = 0.7, λ = 0.3, k = 2, M = 100 trees.</p>
      <p>BRF &amp; RF: 100 trees, max depth = 6.</p>
      <p>XGBoost: learning rate = 0.1, max depth = 6, scale_pos_weight ≈ 10 (to match imbalance
ratio).</p>
      <p>GLM: C = 1.0 (L2 penalty).</p>
      <p>All code is implemented in Python using scikit-learn, imbalanced-learn, xgboost and shap.</p>
      <sec id="sec-4-1">
        <title>4.1. Data Description and Synthetic Dataset Generation</title>
        <p>
          Financial Features. This study employs a synthetic panel dataset designed to simulate the financial
dynamics of corporate entities over time, with the goal of modeling financial risk under conditions
of class imbalance and temporal dependence. While real-world data on financial indicators are
often fragmented, inaccessible, or biased, synthetic data offer a controlled environment in which
causal mechanisms can be explicitly defined, validated and iteratively refined [
          <xref ref-type="bibr" rid="ref1">1, 12</xref>
          ]. The proposed
dataset is structured to reflect realistic financial behavior, incorporating key stylized facts observed
in empirical corporate finance: autocorrelation, heterogeneity across firms, mean reversion and
rare but critical distress events.
        </p>
        <p>The dataset includes six core financial ratios, selected for their established relevance in credit risk assessment, bankruptcy prediction (e.g., Altman’s Z-score), and financial stability monitoring: ROA, Current Ratio, Debt/Equity, Net Profit Margin, Revenue Growth, and Interest Coverage.</p>
        <p>These variables are chosen to cover multiple dimensions of financial health: profitability,
liquidity, leverage, efficiency and growth, ensuring multidimensionality in risk assessment.</p>
        <p>Panel Structure and Temporal Dynamics. The dataset consists of N = 100 companies observed
over T = 20 quarterly periods (approximately 5 years), resulting in 2000 observations. Each
company is modeled as an independent entity with heterogeneous baseline parameters drawn from
normal distributions centered around industry-typical values (e.g., average ROA ≈ 5%, D/E ≈ 1.0).</p>
        <p>
          Temporal dynamics are generated using autoregressive processes of order one (AR(1)) with added Gaussian noise to simulate natural volatility:
x_{i,t} = ρ · x_{i,t−1} + (1 − ρ) · μ_i + ε_{i,t}, ε_{i,t} ∼ N(0, σ²), (11)
where ρ = 0.75 ensures mean reversion, μ_i is firm-specific, and σ controls volatility. This structure mimics real-world persistence in financial metrics while allowing for shocks and trends [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>To enhance realism, all variables are bounded within economically plausible ranges (e.g.,
Current Ratio ≥ 0.4, Interest Coverage ≥ 0.3) to avoid nonsensical values.</p>
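        <p>The AR(1) generator of Eq. (11), with firm-specific means and optional clipping to plausible ranges, can be sketched as follows (the dispersion of the firm-specific means is our assumption; the text does not state it):</p>

```python
import numpy as np

def simulate_ar1(n_firms, n_periods, mu, rho=0.75, sigma=0.02,
                 lo=None, hi=None, seed=0):
    """Eq. (11): x_t = rho*x_{t-1} + (1-rho)*mu_i + eps, eps ~ N(0, sigma^2).
    mu_i is a firm-specific mean (dispersion 0.01 is an assumption); values are
    optionally clipped to an economically plausible range [lo, hi]."""
    rng = np.random.default_rng(seed)
    mu_i = rng.normal(mu, 0.01, size=n_firms)        # firm heterogeneity
    x = np.empty((n_firms, n_periods))
    x[:, 0] = mu_i
    for t in range(1, n_periods):
        x[:, t] = rho * x[:, t - 1] + (1 - rho) * mu_i + rng.normal(0, sigma, n_firms)
    if lo is not None or hi is not None:
        x = np.clip(x, lo, hi)
    return x
```

        <p>With ρ = 0.75 each series drifts back toward its firm-specific mean after a shock, reproducing the persistence described above.</p>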
        <p>Target Variable Definition. The binary target variable y_it ∈ {0, 1} indicates a high-risk financial
state. It is not assigned randomly but triggered by a combination of deteriorating financial
conditions:</p>
        <p>ROA &lt; 0.01;
Net Profit Margin &lt; 0;
Debt/Equity &gt; 1.8;
Revenue Growth &lt; −0.08.</p>
        <p>Once these conditions are met at time t, the target is set to 1 at t + 1 or t + 2, simulating a lag
between financial weakening and full-blown crisis. This delay reflects real-world inertia in
corporate decline and allows models to learn early warning signals.</p>
        <p>To ensure statistical robustness and sufficient learning signal, the final dataset is calibrated so
that 9 % of all observations are labeled as high-risk (y = 1). However, only 12 % of companies
experience such events, aligning with real-world estimates where financial distress is rare at the
firm level but cumulatively significant. On average, each affected company exhibits 1.2 consecutive
high-risk quarters, reflecting the typical duration of acute financial instability.</p>
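        <p>A sketch of the labeling rule: the text lists four trigger conditions and a one-to-two-period delay; here all four conditions are conjoined and a fixed one-period lag is used, which is our reading of "a combination" and of the delay:</p>

```python
import numpy as np

def label_high_risk(roa, margin, de, growth, lag=1):
    """Set y = 1 `lag` periods after all four trigger thresholds hold,
    mimicking the delay between financial weakening and full-blown crisis.
    Inputs are (n_firms, n_periods) arrays."""
    trigger = (roa < 0.01) & (margin < 0) & (de > 1.8) & (growth < -0.08)
    y = np.zeros(trigger.shape, dtype=int)
    y[:, lag:] = trigger[:, :-lag].astype(int)   # shift the flag forward in time
    return y
```

        <p>The forward shift is what lets a model learn pre-crisis signals: the features at time t predict a label realized at t + 1.</p>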
        <p>Validity and Reliability of the Synthetic Design. Although synthetic, the dataset adheres to
principles of construct validity:</p>
        <p>Face validity: all features and thresholds are grounded in financial theory;
Content validity: variables cover major domains of financial analysis;</p>
        <p>Criterion validity: the target is linked to empirically known predictors of distress;</p>
        <p>Ecological validity: temporal patterns mirror real financial time series.</p>
        <p>Moreover, the design supports reproducibility and transparency: every parameter, distribution,
and decision rule is explicitly defined, enabling other researchers to replicate, extend, or challenge
the results.</p>
        <p>This controlled yet realistic framework makes the dataset suitable for evaluating machine
learning models, particularly imbalanced classification algorithms in forecasting low-probability,
high-impact financial events.</p>
        <p>Correlation Structure. The linear relationships between features are quantified using the Pearson correlation coefficient:
r_xy = Σ_{i,t} (x_it − x̄)(y_it − ȳ) / √( Σ_{i,t} (x_it − x̄)² Σ_{i,t} (y_it − ȳ)² ). (12)</p>
        <p>Computed for all pairwise combinations, the correlation matrix (Figure 1) reveals economically meaningful associations:
Strong positive correlation between ROA and Net Profit Margin (r = 0.68), reflecting shared dependence on profitability;
Moderate negative correlation between Debt/Equity and Interest Coverage (r = −0.54), aligning with theoretical expectations: higher leverage reduces debt-servicing capacity;
Weak inter-feature correlations (|r| &lt; 0.3) for Current Ratio, confirming its independence as a liquidity signal.</p>
        <p>Further, the Variance Inflation Factor (VIF) has been computed to assess multicollinearity:
VIF_j = 1 / (1 − R_j²), (13)
where R_j² is the coefficient of determination from regressing feature j on all others. All VIF values are below 3.0 (max = 2.7 for Net Margin), indicating no severe multicollinearity that could bias model training. This structure supports multivariate modeling: features provide complementary information without redundancy.</p>
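        <p>The VIF of Eq. (13) can be reproduced with plain least squares (a minimal sketch; in practice statsmodels' variance_inflation_factor offers the same computation):</p>

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an OLS regression of
    feature j on all remaining features (with an intercept)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(len(Z)), Z])   # add intercept column
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

        <p>Independent features yield VIF values near 1; values approaching the 3.0 ceiling reported above would signal growing redundancy.</p>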
        <p>Class Separability and Imbalance. Despite class imbalance (9% positive cases), separability is evaluated based on Cohen’s d effect size for key features:
d = (μ₁ − μ₀) / √((σ₁² + σ₀²) / 2), (14)
where μ₁, σ₁ and μ₀, σ₀ are the means and standard deviations for distressed and non-distressed firms, respectively.</p>
        <p>The large effect sizes (|Cohen’s d| &gt; 0.8) in Table 1 indicate strong separation between distressed
and non-distressed firms for all listed features. Negative values for ROA, Interest Coverage, and
Net Profit Margin suggest that lower values are associated with higher risk, while the positive
value for Debt/Equity indicates that higher leverage increases risk.</p>
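        <p>Cohen's d with a pooled standard deviation, matching the symbols in the text, reduces to a few lines (the function name is ours):</p>

```python
import numpy as np

def cohens_d(x_pos, x_neg):
    """Pooled-SD effect size: d = (mu1 - mu0) / sqrt((s1^2 + s0^2) / 2),
    comparing distressed (x_pos) against non-distressed (x_neg) samples."""
    m1, m0 = np.mean(x_pos), np.mean(x_neg)
    s1, s0 = np.std(x_pos, ddof=1), np.std(x_neg, ddof=1)
    return (m1 - m0) / np.sqrt((s1 ** 2 + s0 ** 2) / 2.0)
```

        <p>A negative d (as for ROA here) means the distressed group sits below the healthy group; |d| above 0.8 is the conventional "large effect" threshold used in Table 1.</p>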
        <p>Additionally, PCA projection (Figure 2a) reveals partial separation in the space spanned by the
first two principal components computed as:</p>
        <p>PC₁ = Xw₁, w₁ = arg max_{‖w‖=1} Var(Xw), (15)
where X is the standardized feature matrix. The first two components explain 62% of total variance, with PC1 dominated by profitability and leverage. This suggests that classification is challenging but feasible, making the dataset well suited for evaluating advanced ensemble methods.</p>
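        <p>The projection of Eq. (15) can be reproduced via the SVD of the standardized feature matrix (a sketch; the exact 62% figure depends on the generated data and is not asserted here):</p>

```python
import numpy as np

def first_two_pcs(X):
    """Project the standardized feature matrix onto its first two principal
    components via SVD; also return each component's variance share."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize columns
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T                            # PC1 and PC2 scores
    explained = (S ** 2 / (S ** 2).sum())[:2]        # variance explained per PC
    return scores, explained
```

        <p>The right singular vectors are the weight vectors w₁, w₂, so the two score columns are orthogonal by construction.</p>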
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Exploratory Data Analysis</title>
        <p>Before model development, an exploratory analysis has been conducted to validate the structural
properties of the synthetic dataset, examine feature distributions, assess multicollinearity and
identify early warning patterns associated with financial distress. This stage ensures that the
generated data not only follow predefined rules but also exhibit emergent behaviors consistent
with real-world financial dynamics.</p>
        <p>
          Standard ensemble methods such as Random Forest and XGBoost have shown promise in
financial prediction tasks [18], but suffer under severe class imbalance. Techniques like
BalancedRandomForestClassifier address this through undersampling [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ], yet often neglect
temporal structure.
        </p>
        <p>Distributional Properties and Normality. To assess the realism of the feature distributions, empirical density functions are calculated using kernel density estimation (KDE):
f̂_h(x) = (1/n) Σ_{i=1}^{n} K_h(x − x_i), (16)
where K_h(·) is the Gaussian kernel with bandwidth h and x_i are observed values of a given financial ratio (e.g., ROA). In addition, Z-score normalization is applied to detect extreme values:
z_it = (x_it − μ_x) / σ_x, (17)
where μ_x and σ_x are the mean and standard deviation across all firms and time points. Fewer than 2% of observations exceed |z| &gt; 3, confirming the absence of unrealistic outliers.</p>
        <p>Figure 3 presents kernel density estimates for the six financial variables across all observations.
The distributions reflect realistic heterogeneity and skewness commonly observed in corporate
financial data:
1. ROA and Net Profit Margin are approximately normally distributed around their means
(0.05 and 0.10, respectively), with visible left tails indicating loss-making firms.
2. Debt/Equity exhibits right-skewness, consistent with empirical findings: most firms
maintain moderate leverage, while a few are highly indebted.
3. Current Ratio is centered around 1.8, with minimal instances below 1.0, reflecting baseline
liquidity resilience.
4. Revenue Growth shows higher volatility, including negative values, simulating market
fluctuations.
5. Interest Coverage Ratio spans a wide range, with a concentration above 3.0 and notable cases below 1.5, a commonly cited threshold for default risk.</p>
        <p>These patterns confirm that the AR(1) generation process with bounded noise successfully
replicates key statistical properties of real financial indicators.</p>
        <p>To identify early warning patterns preceding crisis events, average feature trajectories are computed over the lead window:
x̄(τ) = (1/|C|) Σ_{(i,t)∈C} x_{i,t+τ}, τ ∈ {−4, …, −1}, (18)
where C is the set of crisis events. Figure 4 shows these trends for key indicators.</p>
        <p>Notably, a significant decline in ROA is observed starting at τ = −3. To quantify this trend, a linear model is fitted within the lead window:
ΔROA(τ) = α + β · τ + ε, τ ∈ {−4, −3, −2, −1}. (19)</p>
        <p>For high-risk firms, the slope is β̂ = −0.008 (p &lt; 0.01), indicating a statistically significant
downward trend of 0.8 percentage points per quarter.</p>
        <p>Similarly, Debt/Equity increases with slope β̂ = +0.12 (p &lt; 0.05), signaling growing financial
pressure.</p>
        <p>These results confirm the presence of predictable degradation patterns in the pre-crisis phase.</p>
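        <p>The lead-window regression of Eq. (19) reduces to an ordinary least-squares fit over four points. A minimal sketch, with illustrative ΔROA values chosen to reproduce a slope of −0.008 (they are not the paper's data):</p>

```python
import numpy as np

# Illustrative lead-window averages of Delta ROA at lags tau = -4..-1;
# values are chosen to mirror the reported slope, not taken from the data.
tau = np.array([-4, -3, -2, -1])
delta_roa = np.array([0.001, -0.006, -0.014, -0.023])

# Eq. (19): fit Delta ROA(tau) = alpha + beta * tau over the lead window.
beta, alpha = np.polyfit(tau, delta_roa, deg=1)
```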
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Baseline Models and Their Limitations</title>
        <p>To establish a performance benchmark and identify key challenges in financial risk prediction,
several standard machine learning models under class imbalance and temporal dependence are
evaluated. The results reveal systematic limitations that motivate the design of Temporal-Balanced
Random Forest (T-BRF).</p>
        <p>Generalized Linear Model (GLM). To start with, let logistic regression be a baseline:
log( P(y_it = 1) / (1 − P(y_it = 1)) ) = β_0 + ∑_{j=1}^{d} β_j x_it^{(j)}
(20)
Features are standardized, and L2 regularization is applied to prevent overfitting. To incorporate
dynamics, lagged versions of each feature are included:
x_it^{aug} = [x_it, x_{i,t−1}, x_{i,t−2}]
(21)</p>
        <p>Despite its interpretability, GLM achieves only moderate performance (AUC-ROC = 0.72),
primarily due to:
inability to capture non-linear interactions (e.g., ROA × Debt/Equity);
sensitivity to outliers in imbalanced settings;
the assumption of linearity in the log-odds.</p>
        <p>
          Logistic regression remains a widely used baseline in financial distress prediction due to its
interpretability and statistical transparency [
          <xref ref-type="bibr" rid="ref1">1, 12</xref>
          ]. However, its linearity assumption limits its
ability to capture complex interactions between financial ratios.
        </p>
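        <p>A minimal sketch of this GLM baseline, assuming scikit-learn and a synthetic lag-augmented feature matrix (Eq. (21)) in place of the actual panel:</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical panel rows already augmented with lags, as in Eq. (21):
# columns are x_t, x_{t-1}, x_{t-2} for one financial ratio.
X_aug = rng.normal(size=(500, 3))
y = (X_aug @ np.array([1.0, 0.5, 0.25]) + rng.normal(size=500) > 1.0).astype(int)

# Eq. (20): L2-regularized logistic regression on standardized features.
X_std = StandardScaler().fit_transform(X_aug)
glm = LogisticRegression(penalty="l2", C=1.0).fit(X_std, y)
p_crisis = glm.predict_proba(X_std)[:, 1]  # P(y_it = 1)
```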
        <p>Standard Random Forest (RF). Random Forest leverages ensemble learning and handles
nonlinearity well:
ŷ_it = MajorityVote({T_m(x_it)}, m = 1, …, M)
(22)
However, on our panel data RF suffers from:</p>
        <p>Temporal leakage: trees are trained assuming i.i.d. samples, ignoring within-firm
autocorrelation;
Overfitting to noise: in the absence of class balancing, it largely predicts y = 0;
Instability in rare-event forecasting: small changes in sampling lead to large prediction
swings.</p>
        <p>As shown in Table 2, RF achieves AUC-ROC = 0.76 but only Recall@Top-10% = 54%, indicating
poor detection of high-risk cases.</p>
        <p>Random Forest has demonstrated strong performance in corporate risk modeling due to its
robustness to noise and non-linear relationships [18]. However, in the absence of class balancing it
tends to over-predict the majority class in imbalanced settings.</p>
        <p>BalancedRandomForestClassifier (BRF). BRF addresses class imbalance by undersampling the
majority class (non-crisis observations) in each bootstrap sample:</p>
        <p>S_m ∼ Sample(D_1 ∪ D_0^(m)), |D_0^(m)| = |D_1|
(23)
This improves minority-class recall (Recall@Top-10% = 62%) but introduces new issues:</p>
        <p>Uniform sampling ignores time: crisis examples from five years ago are weighted equally
with recent ones despite structural economic shifts;
No memory mechanism: predictions at t and t + 1 for the same company can fluctuate
wildly (e.g., 0 → 1 → 0), reducing operational reliability;
Firm heterogeneity is ignored: small firms in volatile sectors remain underrepresented even
after balancing.</p>
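        <p>The per-tree balanced sampling of Eq. (23) can be sketched directly with NumPy (synthetic labels; a real BRF draws such a sample for every tree in the ensemble):</p>

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical imbalanced panel labels: roughly 5% crisis observations.
y = (rng.random(1000) < 0.05).astype(int)
idx_pos = np.flatnonzero(y == 1)   # D_1: crisis observations
idx_neg = np.flatnonzero(y == 0)   # D_0: non-crisis observations

def balanced_bootstrap(rng):
    """Eq. (23): keep D_1, undersample D_0 to |D_1| for one tree's sample."""
    neg_sample = rng.choice(idx_neg, size=idx_pos.size, replace=False)
    return np.concatenate([idx_pos, neg_sample])

S_m = balanced_bootstrap(rng)      # indices for tree m's training set
```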
        <p>
          The BalancedRandomForestClassifier addresses class imbalance through random under
sampling of the majority class. This technique is shown to improve recall in financial forecasting
tasks [
          <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
          ]. However, standard implementations assume i.i.d. data and do not account for temporal
dependence in panel structures.
        </p>
        <p>Gradient Boosting (XGBoost). XGBoost with scale_pos_weight performs best among the
baselines (AUC-ROC = 0.79), leveraging boosting to focus on hard-to-classify instances:
ŷ^(t) = ŷ^(t−1) + η⋅f_t(x)
(24)</p>
        <p>Yet, it still assumes i.i.d. data and lacks explicit temporal modeling. Feature importance analysis
shows overreliance on single-period spikes (e.g., one negative news event), rather than sustained
trends.</p>
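        <p>The additive update of Eq. (24) can be observed via staged predictions. The sketch below uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost on synthetic data; class weighting (XGBoost's scale_pos_weight) is omitted for brevity:</p>

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=400) > 0.5).astype(int)

# Eq. (24): each boosting stage adds eta * f_t(x) to the running score,
# so the staged decision function exposes y_hat^(t) for t = 1..50.
gbm = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1)
gbm.fit(X, y)
staged_scores = list(gbm.staged_decision_function(X))
```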
        <p>
          XGBoost is among the most effective gradient boosting frameworks for structured data,
achieving state-of-the-art results in financial risk prediction [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. When combined with SHAP
analysis, it offers valuable insights into feature importance [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Nevertheless, like RF, it requires
explicit modifications to handle temporal dynamics and recency bias.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Analysis of Feature Importance Using SHAP Results for BRF and XGBoost</title>
        <p>Beyond predictive performance, interpretability is crucial in financial risk modeling where
stakeholders demand transparent and actionable insights. To understand how baseline models
utilize input features, a SHAP (SHapley Additive exPlanations)-based feature importance analysis
(see Table 3) has been performed. The results reveal that both BalancedRandomForest (BRF)
and XGBoost prioritize lagged financial indicators, with Debt_Equity_lag1 and ROA_lag1
being the strongest predictors of distress. This indicates that the models do not rely solely on
contemporaneous values but effectively capture temporal trends such as deteriorating profitability
or rising leverage, aligning with established economic theory.</p>
        <p>Notably, the top three most influential features are identical in rank between BRF and XGBoost,
suggesting robustness in signal detection across different ensemble architectures. However,
differences emerge in lower-ranked features: XGBoost assigns higher importance to
Revenue_Growth (Rank 4) compared to BRF (Rank 6), while BRF places greater weight on liquidity
through Current_Ratio_lag1. These variations reflect inherent architectural distinctions such as
XGBoost’s sensitivity to non-linear interactions and feature synergies, versus BRF’s stability under
class-balanced sampling variation.</p>
        <p>The consistent prominence of lagged features across both models underscores the critical role of
temporal dynamics in risk assessment. Yet, neither model explicitly incorporates time-aware
mechanisms into its inference process, such as adaptive sampling based on recency or
memory-based prediction smoothing, a limitation that motivates the design of our proposed
Temporal-Balanced Random Forest (T-BRF).</p>
        <p>Summary of Key Findings:</p>
        <p>Dominance of Lagged Financial Indicators.</p>
        <p>Debt_Equity_lag1;
ROA_lag1;
Net_Margin_lag1;
Interest_Coverage.</p>
        <p>This demonstrates that the models leverage historical trends, such as sustained increases in
leverage or declining profitability, as early warning signals, rather than reacting to isolated
point-in-time anomalies.</p>
        <p>Consistency with Financial Theory. The top-ranked features align closely with
well-established financial distress frameworks, such as Altman’s Z-score model. Specifically:
High values of Debt_Equity consistently increase predicted risk (positive SHAP values),
reflecting over-leveraging and heightened default probability;
Low or negative values of ROA and Net_Margin strongly contribute to higher risk scores,
signaling weak operational performance and erosion of profitability;
Declining Interest_Coverage emerges as a key predictor, particularly in XGBoost, indicating
heightened sensitivity to a firm’s ability to service its debt obligations.</p>
        <p>Model-Specific Differences. While agreement on the top three features suggests
convergence on core risk drivers, subtle differences highlight architectural nuances:
BRF exhibits relatively higher sensitivity to Current_Ratio_lag1, indicating a stronger
emphasis on short-term liquidity constraints;
XGBoost assigns greater importance to Revenue_Growth, likely due to its capacity to detect
complex, non-linear interactions between growth trajectories and financial instability.
Directionality of Impact. As illustrated in the SHAP summary plots (Figures 6 and 7), the
direction of each feature's impact on the predicted risk is both intuitive and economically
interpretable:
Red points (high values) of Debt_Equity shift predictions toward high risk, confirming
expected behavior;
Blue points (low values) of ROA and Interest_Coverage also push predictions upward,
validating theoretical expectations;
Some features, such as Revenue_Growth, exhibit heterogeneous effects: moderate growth
reduces risk, while extreme volatility or sharp declines significantly increase it.</p>
        <p>Robustness Across Models.</p>
        <p>Despite architectural differences, the rank correlation between BRF and XGBoost in terms of
feature importance is high (Spearman’s ρ ≈ 0.87) reinforcing confidence in the identified risk
drivers. This consistency across diverse learning paradigms strengthens the validity of the detected
patterns and supports their generalizability.</p>
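        <p>The reported rank agreement can be reproduced in form (not in value) with SciPy's spearmanr; the ranks below are illustrative stand-ins, not those of Table 3:</p>

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical importance ranks (1 = most important) of the same six
# features under the two models; values are illustrative only.
rank_brf = [1, 2, 3, 6, 4, 5]   # BRF ranks Current_Ratio_lag1 higher
rank_xgb = [1, 2, 3, 4, 6, 5]   # XGBoost ranks Revenue_Growth higher

rho, _ = spearmanr(rank_brf, rank_xgb)  # rank correlation of importances
```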
        <p>This interpretability is essential for real-world deployment, where risk assessments must be
explainable to analysts, regulators and decision-makers.</p>
        <p>Furthermore, the strong alignment between model behavior and financial theory validates the
design of our synthetic dataset, in which financial deterioration systematically precedes the target
event by several periods (see Table 3).</p>
        <p>SHAP (SHapley Additive exPlanations) provides a theoretically grounded approach to
interpreting ensemble models [16]. It has been widely adopted in financial applications due to its
consistency and local accuracy [13]. Recent extensions enable dynamic interpretation over time
(Kim &amp; Park, 2024), supporting the development of time-aware models like T-BRF.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Performance Comparison</title>
        <p>Table 4 summarizes the average performance of all models across the five folds.</p>
        <sec id="sec-5-2-1">
          <title>T-BRF (proposed)</title>
          <p>As shown, T-BRF achieves the highest scores across all metrics. Most notably:
+3.8 percentage points (pp) in AUC-ROC over XGBoost;
+7.4 pp in AUC-PR, indicating superior precision under high recall;
+12.8 pp in Recall@Top-10%, crucial for operational risk monitoring;
+8.3 pp in F1-score, confirming a balanced improvement in precision and recall.</p>
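          <p>Recall@Top-10% has a simple operational definition, sketched here on a toy example (the frac parameter and data are illustrative):</p>

```python
import numpy as np

def recall_at_top_k(y_true, scores, frac=0.10):
    """Recall@Top-10%: fraction of true crisis cases that appear in the
    top `frac` of observations when ranked by predicted risk score."""
    k = max(1, int(len(scores) * frac))
    top = np.argsort(scores)[::-1][:k]       # indices of highest scores
    return y_true[top].sum() / y_true.sum()

# Toy example: 3 crisis cases; top 20% of 10 observations = 2 slots.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0, 0, 1])
scores = np.array([.1, .2, .9, .3, .4, .8, .05, .15, .25, .7])
recall = recall_at_top_k(y_true, scores, frac=0.2)
```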
          <p>The consistent outperformance demonstrates that integrating temporal awareness and adaptive
sampling significantly enhances predictive power.</p>
          <p>AUC-PR and Recall@Top-10% are reported as primary metrics for evaluating rare-event
prediction in financial contexts [19].</p>
          <p>5.3. Ablation Study. To isolate the contribution of each component in T-BRF, an ablation
study is conducted (Table 5).</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>Ablation Variants</title>
          <p>The variants compared in Table 5 are: T-BRF (full); without temporal voting; without
temporal weights; without group correction; without lagged features.</p>
          <p>The largest drop occurs when lagged features are removed, confirming their importance in
capturing dynamics. Temporal voting and weighting also play significant roles, reducing false
alarms and improving temporal coherence.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The experimental results demonstrate that the proposed Temporal-Balanced Random Forest
(T-BRF) significantly outperforms established baseline models in predicting financial risk within a
panel data setting. This section discusses the implications of these findings, interprets the model’s
behavior in light of domain knowledge, addresses its limitations and outlines potential applications
and future research directions.</p>
      <p>Theoretical Advantages. Compared to BRF, T-BRF offers several improvements:</p>
      <p>Time-aware sampling. It prioritizes recent data via the weight w_it^time, reducing reliance on
outdated patterns;
Dynamic feature space. Lagged inputs enable detection of trend-based degradation;
Stable inference. Temporal voting reduces false alarms and improves operational reliability;
Heterogeneity adjustment. Group weights prevent underrepresentation of small sectors.</p>
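      <p>Time-aware sampling can be sketched as exponential decay weights over the panel's quarters; the decay rate lam is an assumed parameter, not a value reported here:</p>

```python
import numpy as np

# Minimal sketch of exponential time-decay sampling weights w_it^time;
# the decay rate lam is an assumed illustrative parameter.
T = 20                              # quarterly periods in the panel
t = np.arange(1, T + 1)
lam = 0.1
w_time = np.exp(-lam * (T - t))     # largest weight at the latest quarter
w_time /= w_time.sum()              # normalize to a sampling distribution
```

Under this scheme the most recent quarter is sampled e^{lam·(T−1)} times more often than the oldest one, which shifts each tree's training set toward current economic conditions.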
      <p>Moreover, T-BRF preserves the interpretability of tree ensembles, enabling post-hoc analysis
via SHAP [16], a method widely adopted in fintech for transparent risk scoring [13].</p>
      <p>Discussion of Results. The results confirm that T-BRF effectively addresses the dual challenges
of class imbalance and temporal dependence. While BRF and XGBoost improve over naive RF by
balancing classes, they fail to account for recency and persistence in risk states. In contrast,
T-BRF’s temporal voting mechanism ensures that once a firm enters a high-risk zone, it remains
under scrutiny unless sustained recovery is observed.</p>
      <p>Moreover, the substantial gain in AUC-PR and Recall@Top-10% highlights T-BRF’s strength in
early detection, making it particularly suitable for proactive risk management systems where
identifying rare but critical events is paramount [26].</p>
      <p>T-BRF shows a clear advantage across all metrics, with consistent gains over every baseline.</p>
      <p>
        The significant gain in AUC-PR aligns with findings by Liang et al. [19], who emphasize the
importance of precision-focused metrics in early-warning systems. Similarly, our use of time-aware
balancing extends the resampling strategies analyzed by Chen et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] into the temporal domain.
      </p>
      <sec id="sec-6-1">
        <title>6.1. Interpretation of Results</title>
        <p>T-BRF achieves superior performance across all key metrics, particularly in AUC-PR and
Recall@Top-10%, confirming that explicitly modeling temporal dynamics and class imbalance leads
to more effective early-warning systems. The proposed T-BRF addresses these shortcomings by
integrating temporal weighting, group-aware resampling, lagged feature construction, and
memory-based voting within a single interpretable ensemble. The ablation study further reveals
that each component of T-BRF contributes meaningfully to its overall effectiveness:</p>
        <p>Lagged features enable the detection of deteriorating trends rather than isolated anomalies;
Temporal weighting ensures that recent observations have greater influence on training,
reflecting structural shifts in firm behavior over time;
Temporal voting introduces memory into predictions, reducing erratic fluctuations and
improving operational reliability.</p>
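        <p>The memory-based voting described above can be sketched as exponential smoothing of a firm's score sequence; delta is an assumed memory parameter, not a value from the paper:</p>

```python
import numpy as np

def temporal_vote(raw_scores, delta=0.6):
    """Hedged sketch of memory-based temporal voting: blend each period's
    raw ensemble score with the smoothed score from the previous period.
    delta is an assumed memory parameter."""
    smoothed = np.empty(len(raw_scores), dtype=float)
    smoothed[0] = raw_scores[0]
    for t in range(1, len(raw_scores)):
        smoothed[t] = delta * smoothed[t - 1] + (1 - delta) * raw_scores[t]
    return smoothed

# An erratic 0 -> 1 -> 0 style sequence becomes a stable trajectory.
raw = np.array([0.1, 0.9, 0.2, 0.85, 0.8])
smooth = temporal_vote(raw)
```

Once the smoothed score rises, a single low raw score cannot immediately pull it back down, which is the "remains under scrutiny unless sustained recovery is observed" behavior.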
        <p>These mechanisms align with real-world risk monitoring practices, where analysts track
evolving financial health over multiple periods rather than reacting to single-point indicators. The
SHAP analysis reinforces this interpretation, showing that T-BRF assigns high importance to
economically meaningful variables such as Debt_Equity_lag1 and ROA_lag1, consistent with
classical distress models like Altman’s Z-score.</p>
        <p>Moreover, the strong agreement between T-BRF and the baseline models (BRF, XGBoost) on
top-ranked features, despite architectural differences, validates both the realism of our synthetic
dataset and the robustness of the identified risk signals.</p>
        <p>
          The success of incorporating lagged financial indicators aligns with recent findings in
longitudinal risk modeling [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], where modeling temporal dynamics significantly improves predictive
accuracy.
        </p>
        <p>This design extends beyond both standard BRF and deep learning approaches, offering a
transparent and temporally consistent framework for proactive financial risk monitoring.</p>
        <p>Interpretability and Practical Implications. The SHAP analysis confirms that both ensemble
models learn economically meaningful patterns rather than spurious correlations.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Limitations</title>
        <p>Despite its advantages, this study has several limitations:</p>
        <p>Synthetic Data. While carefully constructed to reflect realistic financial dynamics, the
dataset does not capture all complexities of real-world corporate environments, including
macroeconomic shocks, sudden regulatory changes or strategic restructurings;
Assumption of Regular Time Intervals. The model assumes equally spaced observations
(e.g., quarterly), which may not hold in settings with irregular reporting (e.g., startups or
private firms);
Fixed Memory Mechanism. The temporal voting layer uses an exponential decay structure
with fixed parameters (δ, λ). In practice, optimal memory depth may vary by industry or
crisis type;
Scalability to High-Dimensional Features. Although efficient for moderate feature spaces,
performance may degrade if hundreds of noisy or redundant features (e.g., from social
media scraping) are included without preprocessing;
Static Group Structure. The current implementation assumes stable group membership (e.g.,
sector), but firms may change sectors or business models over time.</p>
        <p>These limitations notwithstanding, the overall results validate the design choices behind T-BRF.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Future Research Directions</title>
        <p>This study opens several promising avenues for future work:</p>
        <p>Integration of Unstructured Data: incorporating textual data (news, earnings calls, ESG
reports) via NLP embeddings to create hybrid financial indicators;
Dynamic Group Assignment: using clustering or topic modeling to adaptively assign firms
to risk groups based on behavioral similarity;
Online Learning Variant: developing an incremental version of T-BRF capable of updating
trees as new data arrive, suitable for real-time monitoring;
Causal Interpretation: extending SHAP-based analysis with causal discovery methods to
distinguish correlation from causation in risk drivers;
Benchmarking on Real Data: validating T-BRF on real-world datasets such as Compustat,
ORBIS or Ukrainian corporate registries when available.</p>
        <p>
          Furthermore, the framework can be generalized beyond finance, for example, to healthcare risk
prediction or supply chain disruption forecasting, where rare events, temporal dependence and
class imbalance coexist. Future work will extend T-BRF to incorporate unstructured data (e.g.,
earnings calls, press releases) using NLP techniques, following the multimodal approach of Wu et
al. [11]. Additionally, integration with ESG risk scoring, as demonstrated by Chen et al. [21], offers
a natural extension for reputational risk modeling. The framework can also benefit from advanced
panel-data learning methods such as PanelGBM [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This study introduces the Temporal-Balanced Random Forest (T-BRF)—a hybrid ensemble method
designed to improve corporate risk prediction in longitudinal financial datasets marked by strong
temporal dependence and severe class imbalance. The model addresses key limitations of
conventional algorithms such as Random Forest, XGBoost, and BalancedRandomForest, which
typically fail to capture both the rarity of distress events and the time-dependent structure of
firm-level data.</p>
      <p>T-BRF formulates financial risk forecasting as a temporal imbalanced classification problem and
integrates four complementary innovations:</p>
      <p>Lagged feature engineering to capture evolving financial trajectories;
Time-decay sampling weights that emphasize recent information;
Group-aware undersampling to respect firm heterogeneity;
Memory-based voting, which stabilizes predictions by incorporating past forecasts for the
same entity.</p>
      <p>Evaluation on a realistic synthetic panel of 100 firms across 20 quarterly periods shows that
T-BRF consistently outperforms established baselines (GLM, RF, BRF, XGBoost) across all major
metrics. It achieves substantial gains in AUC-PR (+7.4 pp) and Recall@Top-10% (+12.8 pp),
confirming its effectiveness in early-risk detection. Ablation analysis further highlights that lagged
features and temporal voting deliver the largest performance improvements.</p>
      <p>The scientific novelty of T-BRF lies in the integration of temporal dynamics and balanced
learning within an interpretable tree-based framework. Unlike deep learning models, it maintains
transparency and enables post-hoc explainability through SHAP analysis. The model
predominantly relies on economically meaningful indicators such as declining ROA, rising
Debt-to-Equity ratio, and deteriorating Interest Coverage, aligning with established financial theory.</p>
      <p>The practical relevance of T-BRF is reflected in its operational reliability, interpretability, and
suitability for proactive decision support. It can be deployed in various contexts, including:</p>
      <p>Early warning systems in banking and credit rating;
ESG and reputational risk monitoring;
Regulatory supervision tools;</p>
      <p>Investor due diligence in private equity or distressed markets.</p>
      <p>T-BRF’s interpretability ensures that stakeholders can understand why a firm is classified as
high-risk—a property essential for regulatory compliance, auditability, and trust in automated
decision-making. Its modular architecture also allows straightforward integration of alternative
data streams (e.g., NLP-derived sentiment scores or ESG event counts), making it adaptable to
hybrid and multimodal risk modeling consistent with recent work by Wu et al. (2023) [11] and
Chen et al. (2023) [21], who show that textual and sustainability signals often precede financial
deterioration.</p>
      <p>In summary, T-BRF bridges the gap between predictive accuracy and interpretability in
imbalanced longitudinal settings. By unifying temporal awareness, adaptive resampling, and
memory-driven inference, it offers a robust, transparent, and actionable framework for
next-generation corporate risk assessment. Future research will extend T-BRF to real-world datasets,
incorporate unstructured data via NLP techniques, and explore online learning variants for
continuous monitoring.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI GPT-5 for grammar and
spelling checking. After using these tools/services, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Forecasting corporate bankruptcy: Comparison of machine learning methods</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>168</volume>
          (
          <year>2021</year>
          )
          <article-title>114054</article-title>
          . https://doi.org/10.1016/j.eswa.2020.114054
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Alaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Oyedele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Akinade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bilal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Owolabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ajayi</surname>
          </string-name>
          ,
          <article-title>Machine learning in construction cost, time and risk analysis: A review, Automation in Construction 117 (</article-title>
          <year>2020</year>
          )
          <article-title>103277</article-title>
          . https://doi.org/10.1016/j.autcon.2020.103277
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Deep learning for corporate default prediction</article-title>
          ,
          <source>Journal of Banking &amp; Finance</source>
          <volume>128</volume>
          (
          <year>2021</year>
          )
          <article-title>106151</article-title>
          . https://doi.org/10.1016/j.jbankfin.2021.106151
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Transformer-based corporate failure prediction using longitudinal financial data, Decision Support Systems 158 (</article-title>
          <year>2023</year>
          )
          <article-title>114089</article-title>
          . https://doi.org/10.1016/j.dss.2023.114089
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Imbalanced learning in credit risk modeling: A comparative study of resampling and cost-sensitive methods</article-title>
          ,
          <source>Applied Soft Computing</source>
          <volume>121</volume>
          (
          <year>2022</year>
          )
          <article-title>108765</article-title>
          . https://doi.org/10.1016/j.asoc.2022.108765
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Akbayrak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ozkan-Ozen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazancoglu</surname>
          </string-name>
          ,
          <article-title>Bankruptcy prediction using SMOTE and machine learning models</article-title>
          ,
          <source>Financial Innovation</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <article-title>58</article-title>
          . https://doi.org/10.1186/s40854-021-00298-y
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>SHAP-guided feature selection for ESG risk modeling in financial institutions</article-title>
          ,
          <source>Technological Forecasting and Social Change</source>
          <volume>192</volume>
          (
          <year>2023</year>
          )
          <article-title>122567</article-title>
          . https://doi.org/10.1016/j.techfore.2023.122567
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>ESG risk forecasting using deep learning: A temporal attention model</article-title>
          ,
          <source>Finance Research Letters</source>
          <volume>48</volume>
          (
          <year>2022</year>
          )
          <article-title>103021</article-title>
          . https://doi.org/10.1016/j.frl.2022.103021
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>PanelGBM: Gradient boosting for panel data with fixed effects, Knowledge-Based Systems 260 (</article-title>
          <year>2023</year>
          )
          <article-title>110589</article-title>
          . https://doi.org/10.1016/j.knosys.2023.110589
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>