<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hybrid fractal-machine learning framework for urban air pollution forecasting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Kuchanskyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karina Zhumagulova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”</institution>
          ,
          <addr-line>Beresteiskyi Ave., 37, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Artificial Intelligence and Data Science, Astana IT University</institution>
          ,
          <addr-line>Mangilik El, Block C1, Astana, 010000</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Precise prediction of urban air pollution is vital not only for people's health but also for environmental management. The present research proposes a hybrid forecasting framework that integrates traditional time-series models (ARIMA, SARIMAX, Prophet), machine-learning techniques (LSTM), and fractal-based indicators to produce forecasts of air pollutant concentrations for Astana, Kazakhstan. The methodology combines fractal analysis with multimodel forecasting, estimating fractal features such as the Hurst exponent, DFA scaling, and fractal dimension to capture both short-term dynamics and long-range dependencies. The results reveal that LSTM networks combined with fractal metrics outperform traditional statistical models, achieving the highest prediction accuracy for PM2.5, PM10, CO, NO2, and SO2 levels. The hybrid ensemble method improved overall accuracy and robustness by capturing persistent nonlinear trends. By combining fractal analysis with machine learning, the approach enables more reliable urban air quality forecasts, supporting timely public health interventions and data-driven pollution mitigation. The results indicate the usefulness of fractal-based models for urban air quality monitoring and, therefore, support the implementation of advanced pollution mitigation strategies.</p>
      </abstract>
      <kwd-group>
        <kwd>air pollution forecasting</kwd>
        <kwd>fractal analysis</kwd>
        <kwd>Hurst exponent</kwd>
        <kwd>DFA</kwd>
        <kwd>LSTM</kwd>
        <kwd>ARIMA</kwd>
        <kwd>SARIMAX</kwd>
        <kwd>STL</kwd>
        <kwd>time series</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Urban air pollution is one of the major threats that directly impacts the quality of life in cities,
environmental stability, and regulatory planning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The primary drivers behind rising concentrations of
pollutants such as particulate matter (PM2.5 and PM10), nitrogen dioxide, and sulfur dioxide include
rapid urbanization, industrial emissions, and heavy vehicular traffic. Consequently, accurate forecasting
of pollutant concentrations is essential for informed decision-making, timely mitigation measures, and
effective public health interventions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Forecasting urban air pollution is one of the most challenging tasks in environmental science, largely
due to the complexity of environmental time series data. Nonstationarity, long-range dependence,
and stochastic fluctuations are inherent characteristics that traditional linear models fail to adequately
capture [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. These properties produce chaotic and scale-invariant patterns, ultimately reducing the
reliability and accuracy of conventional forecasting techniques [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Fractal analysis offers a robust approach to quantifying these structural complexities. Metrics such
as the Hurst exponent and fractal dimension reflect persistence, roughness, and long-memory effects
that frequently appear in pollutant time series [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Numerous empirical studies have demonstrated
multifractal behavior in PM2.5 and PM10 series across various global cities, further supporting the
relevance of fractal methods in analyzing urban air quality dynamics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Machine Learning (ML) and Deep Learning (DL) methods have also proven effective in modeling
nonlinear, high-dimensional time series [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, when applied directly to complex, nonstationary
environmental data, these models may fail to capture intrinsic scale-invariant correlations unless
enhanced with appropriate feature engineering [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        To address these limitations, this study proposes a Hybrid Fractal–Machine Learning (HFML)
approach [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The framework extracts fractal-based complexity features through multifractal analysis and
integrates them into advanced ML predictors, thus combining structural interpretability with strong
nonlinear modeling capabilities. We hypothesize that this hybrid approach can significantly improve
the accuracy and robustness of urban air pollution forecasts, moving beyond traditional statistical
descriptions toward a more generalizable predictive methodology.
      </p>
      <p>Thus, the aim of this study is to overcome the limitations of existing forecasting methods by developing
a framework that combines fractal analysis with machine learning for the analysis of urban air pollution
data. To achieve this aim, the following tasks are formulated:
1. To compute the key fractal characteristics of pollutant time series, in particular the Hurst exponent
<italic>H</italic> as an indicator of long-term memory, and to assess the degree of nonlinear persistence in the
data.
2. To integrate fractal features as a preprocessing stage into predictive models (ARIMA, SARIMAX,
Prophet, and LSTM), thereby forming a hybrid forecasting approach.
3. To evaluate the effectiveness of the proposed hybrid model in comparison with traditional
approaches using real-world air pollution data from the city of Astana, testing the hypothesis
that hybrid models provide higher forecasting accuracy and robustness.</p>
      <p>Unlike previous studies that separately examined fractal properties or applied machine learning
techniques to pollution data, this work combines both approaches by directly embedding fractal
characteristics into the forecasting process.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Urban air pollution presents a complex and pressing challenge for public health, environmental
management, and regulatory planning. Accurate prediction of the concentrations of pollutants such as
PM2.5, PM10, CO, NO2, and SO2 requires models that can represent the non-linear, non-stationary, and
long-range dependent dynamics typical of these
environmental time series [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. Fractal and multifractal time series analysis has gained
recognition as an effective way to characterize these complexities, exposing persistence and
volatility in hidden structures of the data that traditional linear methods often fail to
reveal [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Traditional statistical models, such as ARIMA, are often unable to adequately describe
the complex nonlinear and long-term dependent dynamics of air pollution data. In contrast, machine
learning and deep learning methods have demonstrated higher accuracy due to their ability to model
complex patterns. These models can be enhanced further with additional features, for example,
estimates of fractal characteristics, yet the integration of fractal characteristics into
predictive models remains limited in the literature. This study addresses this gap by
combining several fractal features that are directly integrated into an ensemble predictive framework.
This section examines key theoretical foundations, methodological developments, and applications,
concluding with the incorporation of fractal analysis into advanced predictive modeling.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Theoretical background and overview of fractal time series analysis</title>
        <p>
          Environmental systems are significantly impacted by the interactions of physical, chemical, and
human-caused processes, resulting in irregular and highly dynamic behavior [
          <xref ref-type="bibr" rid="ref10 ref14">10, 14</xref>
          ]. Observed pollutant
concentrations frequently exhibit self-similar and scale-invariant fluctuations, indicating long-range
dependence and complex feedback processes [
          <xref ref-type="bibr" rid="ref6">15, 6</xref>
          ]. Fractal time series methodology provides a
theoretical background for recognizing such properties by means of measures like the fractal dimension
<italic>D</italic> and the Hurst exponent <italic>H</italic>, which indicate the level of geometric complexity and the degree of
persistence, respectively [16, 17].
        </p>
        <p>
          Extensions of fractal theory to urban meteorology and anomalous
diffusion processes have shown that these parameters characterize not only pollutant complexity
but also interactions with environmental and human factors [18]. Across studies of
air, water, and soil pollution, fractal analysis has shown its
strength in capturing the non-linear, persistent, and chaotic dynamics that underlie environmental
monitoring and forecasting [
          <xref ref-type="bibr" rid="ref2">2, 19</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Methods of fractal and multifractal time series analysis in environmental research</title>
        <p>
          Fractal methods are applied as quantitative tools for the analysis of environmental time series data that
are irregular and scale-invariant in nature. The fractal dimension (<italic>D</italic>) quantifies a dataset’s
geometric complexity or roughness, while the Hurst exponent (<italic>H</italic>) is associated with long-term
memory and persistence [
          <xref ref-type="bibr" rid="ref11">20, 11</xref>
          ]. Environmental datasets often exhibit multifractality,
where small and large fluctuations follow different scaling laws, which can be identified by
applying Multifractal Detrended Fluctuation Analysis (MF-DFA) [
          <xref ref-type="bibr" rid="ref12">16, 12, 21</xref>
          ].
        </p>
        <p>
          Together, the fractal dimension, Hurst exponent, and multifractal spectrum allow researchers to
separate stochastic noise from the deterministic patterns that drive pollution
dynamics. This provides a multi-faceted understanding of the hierarchical and non-linear nature of
pollution dynamics [
          <xref ref-type="bibr" rid="ref10">10, 22</xref>
          ]. These methodological tools form the foundation for both descriptive
and predictive modeling of environmental phenomena, guiding the selection of appropriate analytical
techniques for time series forecasting.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Application of fractal analysis in air pollution monitoring</title>
        <p>
          Fractal and multifractal techniques have been widely applied in the analysis of urban air
pollution. In the case of Astana, Kazakhstan, Biloshchytskyi and colleagues [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] used R/S analysis
to demonstrate long-term memory in the PM2.5 and PM10 time series. Likewise, strong fractal
signatures, with Hurst exponents consistently exceeding 0.8 (characteristic of a persistent trend), were
observed in the PM10 data sets from Athens, Greece [
          <xref ref-type="bibr" rid="ref6">6, 21</xref>
          ]. Multifractal studies in Taipei [15] and
Shanghai [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] not only showed scale invariance and long-range correlations in air
pollutant concentrations but also suggested that multifractal methods are essential for accurate
forecasting.
        </p>
        <p>In addition, a growing number of researchers acknowledge the practical utility of fractal analysis in
air quality management. For example, the study by Evagelopoulos et al. [20] in
Greece made it possible to distinguish between background and episodic pollution, and the one by
Prada et al. [23] in Colombia revealed policy-relevant persistence in PM2.5 and PM10 concentrations.
Recent research incorporates multi-source data and spatiotemporal analysis for a
fuller representation of urban pollution dynamics [24]. Taken together, these findings indicate that
fractal and multifractal metrics have become indispensable in predictive modeling of urban air
quality.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Integration of fractal analysis with other advanced techniques</title>
        <p>
          Although fractal methods effectively capture non-linear and persistent dynamics, their predictive power
is still being improved by combining them with ML, DL, and advanced statistical models.
Hybrid frameworks use the transparency of fractal measures and the power of
data-driven algorithms to predict urban pollution more accurately [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The FII-LSTM model is one
example, in which fractal interpolation is combined with LSTM networks to capture long-term temporal
correlations [
          <xref ref-type="bibr" rid="ref3">3, 25</xref>
          ] along with structural complexity. Additionally, the SARFIMA-NARX
methodology incorporates fractional-order Lorenz dynamics for chaotic air pollution patterns [19].
Hybrid models combining ARIMA, LSTM, Random Forest, CNN,
GRU, and AIoT-driven sensor networks have also been proposed, showing a substantial increase in predictive performance
for PM2.5 and AQI time series over traditional models [
          <xref ref-type="bibr" rid="ref5 ref9">9, 5, 26</xref>
          ].
        </p>
        <p>The integration of fractal and hybrid models is a step forward in urban air pollution monitoring because
it reveals scale-invariant structures while still making use of state-of-the-art prediction
algorithms. These methods are among the most capable in environmental forecasting and provide a
solid basis for accurately anticipating pollutant dynamics and informing mitigation strategies [27, 28].</p>
        <p>Although previous studies have applied fractal analysis to characterize pollution time series and
used modern ML methods for forecasting, only a few works combine these two approaches within a
single predictive framework. The literature lacks empirical evidence that fractal indicators can directly
improve the predictive performance of machine-learning models for air quality. This article fills this
gap.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset description</title>
      <sec id="sec-3-1">
        <title>3.1. Data sources and collection</title>
        <p>This study is based on a comprehensive environmental dataset from Astana, Kazakhstan, comprising
hourly measurements of key air pollutants and meteorological variables. The dataset provides a detailed
representation of urban air quality dynamics, forming the empirical foundation for fractal time series
analysis and hybrid machine learning forecasting.</p>
        <p>The dataset used in this study includes key air quality and meteorological parameters essential for
fractal analysis and hybrid forecasting. The primary air quality indicators comprise the Air Quality
Index (AQI), which provides an aggregated measure of overall air pollution, and particulate matter
concentrations PM2.5 and PM10, representing particles with aerodynamic diameters less than or equal
to 2.5 µm and 10 µm, respectively.</p>
        <p>Gaseous pollutants included in the dataset are carbon monoxide (CO), sulfur dioxide (SO2), and
nitrogen dioxide (NO2). In addition, meteorological variables such as temperature, relative humidity,
precipitation, wind speed, wind direction, and atmospheric pressure were incorporated to support
comprehensive environmental analysis.</p>
        <p>Data were collected from multiple publicly accessible and reliable sources to ensure high temporal
resolution and broad spatial coverage across the city of Astana:
• Kazhydromet National Monitoring Network [29] – providing real-time and historical air
quality measurements from automatic and stationary monitoring stations;
• AQI India Kazakhstan Dashboard [30] – supplying real-time AQI and pollutant concentration
data;
• Meteostat Python Library [31] – delivering historical meteorological datasets for preprocessing
and analytical integration.</p>
        <p>An overview of the dataset structure and representative records is shown in Figure 1. The dataset spans
the period from January 1, 2020, to November 9, 2025, enabling the capture of extended environmental
variability and supporting robust fractal and predictive modeling. All observations originate from
the Meteostat monitoring station “Astana/Prigorodnyy” (ID: UACC0) in Astana.</p>
        <p>The compiled dataset contains 15 columns in tabular form, with each row representing a single
hourly measurement. The structure includes:
• Air quality indicators: AQI, PM2.5, PM10, CO, SO2, NO2, and O3;
• Meteorological variables: temperature, dew point, relative humidity, precipitation, wind speed,
wind direction, and atmospheric pressure.</p>
        <p>Most columns are numerical types, supporting statistical and fractal analysis. The AQI and pollutant
columns reflect hourly concentrations essential for air quality monitoring, while meteorological data
(e.g., temperature, humidity, wind) provide key context for interpreting pollution variability. Non-null
counts indicate data completeness for each column; minor gaps in meteorological variables (e.g.,
humidity, precipitation) are addressed during preprocessing. The dataset is structured for straightforward
integration into fractal time series analysis pipelines and further modeling.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data cleaning and preprocessing</title>
        <p>To ensure the quality and reliability of the dataset for fractal analysis and predictive modeling, a
multi-stage data cleaning and preprocessing pipeline was implemented.</p>
        <p>Duplicate removal and validation. Duplicate rows, identified via the timestamp column, were
removed to prevent redundancy and bias in time series modeling. Pollutant measurements (PM2.5,
PM10, CO, SO2, NO2, O3) were checked for invalid values, such as negative or zero readings. These
values were replaced with missing entries (NaN) to maintain physical validity.</p>
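        <p>As an illustration of this validation step, the following minimal Python sketch drops repeated timestamps and masks non-positive pollutant readings. It is not the authors' pipeline: the record layout, the column names in <monospace>POLLUTANTS</monospace>, and the keep-first policy for duplicates are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
import math

# Hypothetical column names; the source does not list the actual ones.
POLLUTANTS = ("pm25", "pm10", "co", "so2", "no2", "o3")


def clean_records(records):
    """Drop duplicate timestamps and mask non-positive pollutant readings.

    `records` is a list of dicts with a "timestamp" key. The first record
    seen for each timestamp is kept (an assumed policy).
    """
    seen = set()
    cleaned = []
    for row in records:
        ts = row["timestamp"]
        if ts in seen:  # duplicate row identified via the timestamp
            continue
        seen.add(ts)
        fixed = dict(row)
        for col in POLLUTANTS:
            value = fixed.get(col)
            # Negative or zero readings are physically invalid: store NaN.
            if value is not None and not value > 0:
                fixed[col] = math.nan
        cleaned.append(fixed)
    return cleaned


rows = [
    {"timestamp": "2020-01-01 00:00", "pm25": 12.0, "pm10": 20.0},
    {"timestamp": "2020-01-01 00:00", "pm25": 12.0, "pm10": 20.0},  # duplicate
    {"timestamp": "2020-01-01 01:00", "pm25": -3.0, "pm10": 0.0},   # invalid
]
result = clean_records(rows)
```
]]></preformat>
        <p>Masking rather than deleting invalid readings keeps the hourly index intact, so later interpolation or gap handling can still see where the values were missing.</p>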
        <p>Temporal regularization. The timestamp column was assessed for uniform hourly intervals.
Irregularities in sampling were corrected through interpolation or appropriate handling of missing
entries, ensuring a consistent temporal resolution across the dataset.</p>
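        <p>One possible way to enforce a uniform hourly grid is sketched below in plain Python. The mapping from <monospace>datetime</monospace> to value and the choice of linear interpolation in time are assumptions for illustration; the source does not specify which interpolation scheme was used.</p>
        <preformat><![CDATA[
```python
from datetime import datetime, timedelta


def to_hourly_grid(series):
    """Reindex {datetime: value} onto a complete hourly grid.

    Hours without an observation are returned as None so that later
    steps can decide whether to interpolate them.
    """
    start, end = min(series), max(series)
    grid = []
    current = start
    while current <= end:
        grid.append((current, series.get(current)))
        current += timedelta(hours=1)
    return grid


def interpolate_linear(grid):
    """Fill interior None values by linear interpolation in time."""
    values = [v for _, v in grid]
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            delta = values[right] - values[left]
            values[i] = values[left] + delta * (i - left) / span
    return [(ts, v) for (ts, _), v in zip(grid, values)]


obs = {
    datetime(2020, 1, 1, 0): 10.0,
    datetime(2020, 1, 1, 1): 12.0,
    datetime(2020, 1, 1, 4): 18.0,  # hours 02:00 and 03:00 are missing
}
filled = interpolate_linear(to_hourly_grid(obs))
```
]]></preformat>
        <p>Separating the reindexing from the filling makes it easy to apply a different rule per variable, as the precipitation and wind-direction steps of this section require.</p>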
        <p>Handling missing meteorological data. Continuous meteorological variables, such as temperature,
dew point, relative humidity, atmospheric pressure, and wind speed, were interpolated using
time-aware methods based on temporal proximity. Wind direction, being a cyclic variable, was interpolated
via trigonometric transformation to avoid discontinuities across the 0°–360° boundary.</p>
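        <p>The reason a cyclic variable needs special treatment can be seen in a small sketch. The function below is one common form of trigonometric interpolation (blending sine and cosine components and converting back with <monospace>atan2</monospace>); the function name and the linear blending weight are assumptions, since the source names only the general technique.</p>
        <preformat><![CDATA[
```python
import math


def interpolate_direction(deg_a, deg_b, weight):
    """Interpolate between two wind directions (degrees) on the circle.

    Each angle is mapped to a unit vector (sin, cos); the vectors are
    blended linearly and converted back with atan2. A weight of 0
    returns deg_a and a weight of 1 returns deg_b. The result is
    undefined when the blended vector has zero length (for example,
    the midpoint of two exactly opposite directions).
    """
    a, b = math.radians(deg_a), math.radians(deg_b)
    s = (1 - weight) * math.sin(a) + weight * math.sin(b)
    c = (1 - weight) * math.cos(a) + weight * math.cos(b)
    return math.degrees(math.atan2(s, c)) % 360.0


# Arithmetic averaging of 350 and 10 degrees points south (180 degrees),
# although both observations are close to north.
naive = (350 + 10) / 2
circular = interpolate_direction(350, 10, 0.5)  # close to north (0 or 360)
```
]]></preformat>
        <p>Because the result is reduced modulo 360, a value just below 360° and a value just above 0° describe the same direction and should be compared on the circle.</p>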
        <p>Precipitation data treatment. Precipitation (prcp) values required specialized procedures:
• short gaps of up to three consecutive hours were interpolated;
• single missing entries flanked by zero precipitation were imputed as zero;
• longer missing intervals were left unchanged to preserve data integrity.</p>
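        <p>The three precipitation rules above can be expressed compactly. The sketch below is a hedged illustration rather than the authors' code: it assumes missing hours are marked with <monospace>None</monospace>, uses linear interpolation for short gaps, and leaves gaps at the very start or end of the series untouched because they lack a neighbour on one side.</p>
        <preformat><![CDATA[
```python
def fill_precipitation(values, max_gap=3):
    """Apply the gap rules to an hourly precipitation list.

    None marks a missing hour. Gaps longer than `max_gap` hours and
    gaps touching either end of the series are left as None.
    """
    filled = list(values)
    i, n = 0, len(values)
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i
        while i < n and values[i] is None:
            i += 1
        end = i  # the gap covers indices start .. end - 1
        length = end - start
        if start == 0 or end == n:
            continue  # no flanking observation on one side
        left, right = values[start - 1], values[end]
        if length == 1 and left == 0 and right == 0:
            filled[start] = 0.0  # dry hour between two dry hours
        elif length <= max_gap:
            step = (right - left) / (length + 1)
            for k in range(length):
                filled[start + k] = left + step * (k + 1)
        # longer gaps stay as None to preserve data integrity
    return filled


prcp = [0.0, None, 0.0, 1.0, None, None, 4.0, None, None, None, None, 0.5]
result = fill_precipitation(prcp)
```
]]></preformat>
        <p>In this example the single missing hour between two zeros becomes zero, the two-hour gap is interpolated, and the four-hour gap remains missing.</p>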
        <p>The resulting dataset is cleaned, regularized, and minimally gapped, providing a reliable foundation
for fractal time series characterization and hybrid machine learning forecasting.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Time series exploratory data analysis</title>
        <p>Before performing fractal analysis and prediction, an exhaustive exploratory study was carried out to
describe the statistical and temporal characteristics of the dataset. PM2.5 is taken as the
example because of its decisive contribution to the evaluation of urban air quality and its very high sensitivity
to changes in weather and human activities. The same analysis process was performed on the other
pollutant variables.</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Time series structure</title>
          <p>The temporal evolution of PM2.5 concentrations is illustrated in Figure 2. The hourly time series was
visualized to inspect long-term trends, seasonal fluctuations, and short-term irregularities. The plot
reveals that there is considerable variability across all hours, days, and seasons, which is characteristic
of the emission intensification that is known to occur in wintertime in northern cities such as Astana.
The visualization also shows some sharp peaks, which may be the result of pollution episodes or faulty
sensors. All these findings point to the need for a more detailed decomposition into the individual
components.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Trend, seasonality, and residual decomposition</title>
          <p>The series was decomposed using Seasonal-Trend decomposition based on Loess (STL) in order to
inspect its hidden structure. The trend component shows long-run changes driven by heating
seasons, industrial cycles, and meteorological variability. The seasonal component shows a very strong
intraweek and yearly periodicity, while the residual component captures short-term fluctuations that are
not explained by deterministic patterns. This decomposition provides an important basis both for fractal
analysis, where long-range dependence is sensitive to structural components, and for ML models, which
can separate the signals more clearly as a result. Figure 3 presents the STL decomposition results,
consisting of four subplots: the original PM2.5 time series at daily resolution, the extracted long-term
trend component, the seasonal component capturing recurring periodic patterns, and the residual
component representing short-term irregular fluctuations not explained by the trend or seasonality.</p>
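          <p>In symbols, the additive decomposition that STL produces can be written as follows. The notation is introduced here for illustration and is not taken from the source: <italic>y</italic> denotes the PM2.5 series, and <italic>T</italic>, <italic>S</italic>, and <italic>R</italic> denote the trend, seasonal, and residual components at time <italic>t</italic>.</p>
          <preformat><![CDATA[
```latex
y_t = T_t + S_t + R_t, \qquad R_t = y_t - T_t - S_t
```
]]></preformat>
          <p>The residual is therefore whatever remains once the estimated trend and seasonal components are subtracted, which is why it carries the short-term irregular fluctuations.</p>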
        </sec>
        <sec id="sec-3-3-3">
          <title>3.3.3. Autocorrelation structure and cyclicity</title>
          <p>The autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, shown in
Figures 4 and 5, respectively, were employed to reveal the characteristics of cyclicity, persistence, and lag
dependencies. The ACF shows strong correlation at both short and long lags, indicating slow-decaying
memory effects that are typical of fractal and long-range dependent processes. The PACF identifies
the main short-range dependencies and the corresponding cutoff points for possible hybrid
ARIMA, SARIMA, or SARFIMA components. The periodicity of these structures allows integrating
fractal measures in the modeling process, especially the Hurst exponent and the DFA-based scaling
exponents.</p>
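          <p>To show what the ACF measures, the sketch below computes the sample autocorrelation in plain Python and applies it to a synthetic signal with a 24-hour cycle. The synthetic series, its noise level, and the function name are assumptions for illustration only; they are not the PM2.5 data analysed here.</p>
          <preformat><![CDATA[
```python
import math
import random


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    variance = sum(c * c for c in centred)
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centred[t] * centred[t + lag] for t in range(n - lag))
        acf.append(cov / variance)
    return acf


random.seed(0)
# Synthetic hourly signal: a 24-hour cycle plus Gaussian noise.
signal = [
    math.sin(2 * math.pi * t / 24) + random.gauss(0, 0.3)
    for t in range(24 * 30)
]
acf = sample_acf(signal, 48)
# A daily cycle appears as a positive peak at lag 24 and a negative
# trough at lag 12.
```
]]></preformat>
          <p>On real pollutant data, peaks at lags that are multiples of 24 hours would suggest daily cyclicity, while a slow overall decay of the ACF would point to long memory.</p>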
        </sec>
        <sec id="sec-3-3-4">
          <title>3.3.4. Stationarity</title>
          <p>For the purpose of forecasting, the stationarity of the time series was assessed with both the
Augmented Dickey–Fuller (ADF) and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests. The ADF
test determines whether the series has a unit root (non-stationarity caused by stochastic
trends), while the KPSS test checks for stationarity around a deterministic trend. In accordance with
previous similar studies, PM2.5 demonstrated mixed stationarity behavior: ADF most often indicates
non-stationarity, while KPSS confirms a trend-stationary or weakly non-stationary structure most of the
time. These findings support detrending, seasonal adjustment, and possibly fractional differencing in
subsequent modeling. The results are summarized in Table 1.</p>
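          <table-wrap>
            <label>Table 1</label>
            <table>
              <thead>
                <tr><th>Test</th><th>Value / Decision</th><th>Formula / Interpretation</th></tr>
              </thead>
              <tbody>
                <tr><td>ADF Statistic</td><td>−14.2523</td><td><disp-formula><label>(1)</label><tex-math>\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t</tex-math></disp-formula></td></tr>
                <tr><td>ADF p-value</td><td></td><td></td></tr>
                <tr><td>ADF Critical Values</td><td></td><td>Comparison: −14.2523 &lt; all critical values ⇒ stationary.</td></tr>
                <tr><td>ADF Result</td><td></td><td>Series is stationary because the test statistic is far below critical thresholds.</td></tr>
                <tr><td>KPSS Statistic</td><td>0.3894</td><td>KPSS test (random walk): <disp-formula><label>(2)</label><tex-math>y_t = r_t + \varepsilon_t, \qquad r_t = r_{t-1} + u_t</tex-math></disp-formula> <disp-formula><label>(3)</label><tex-math>\mathrm{KPSS} = \frac{1}{T^2} \sum_{t=1}^{T} \frac{S_t^2}{\hat{\sigma}^2}, \qquad S_t = \sum_{i=1}^{t} \varepsilon_i</tex-math></disp-formula></td></tr>
                <tr><td>KPSS p-value</td><td></td><td></td></tr>
                <tr><td>KPSS Critical Values</td><td></td><td>Fail to reject <inline-formula><tex-math>H_0</tex-math></inline-formula>: trend-stationary process. Statistic (0.3894) is near but below most thresholds ⇒ stationary.</td></tr>
                <tr><td>KPSS Result</td><td>Likely stationary</td><td>Series exhibits stationarity under KPSS framework.</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>Note: Variables in ADF test formula: <inline-formula><tex-math>y_t</tex-math></inline-formula> – observed PM2.5 concentration at time <italic>t</italic>, Δ – first difference,
<italic>t</italic> – time index, <italic>α</italic> – intercept, <italic>β</italic> – deterministic trend coefficient, <italic>γ</italic> – unit root parameter, <inline-formula><tex-math>\delta_i</tex-math></inline-formula> – short-term
dynamic coefficients, <italic>p</italic> – lag order, <inline-formula><tex-math>\varepsilon_t</tex-math></inline-formula> – white-noise error. Variables in KPSS test formula: <inline-formula><tex-math>r_t</tex-math></inline-formula> – stochastic
trend (KPSS), <inline-formula><tex-math>\varepsilon_t</tex-math></inline-formula> – stationary residual, <inline-formula><tex-math>u_t</tex-math></inline-formula> – random walk innovation, <inline-formula><tex-math>S_t</tex-math></inline-formula> – cumulative sum of residuals,
<inline-formula><tex-math>\hat{\sigma}^2</tex-math></inline-formula> – long-run variance, <italic>T</italic> – sample size.</p>
        </sec>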
        <sec id="sec-3-3-16">
          <title>3.3.5. Distributional properties and outlier detection</title>
          <p>Distributional analyses were conducted using histogram estimates, which reveal that PM2.5 exhibits a
right-skewed and heavy-tailed distribution. This behavior indicates that low pollutant concentrations
are the most frequent, with rare severe pollution events. Box plots based on
the interquartile range (IQR) were also used to detect outliers and highlighted several pronounced spikes (see Figure 6).</p>
          <p>To systematically detect anomalous observations, multiple complementary techniques were applied:
• Z-score thresholding, used to identify statistically extreme deviations from the mean;
• IQR-based outlier detection, flagging values outside the conventional 1.5 × IQR limits;
• Isolation Forest, a robust machine learning algorithm capable of capturing irregular or
anomalous patterns in temporal data.</p>
          <p>The results of these anomaly detection methods are illustrated in Figure 7.</p>
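          <p>The first two techniques in the list above are simple enough to sketch with the Python standard library. The thresholds (3 standard deviations and 1.5 × IQR) follow common convention, and the sample values are invented for illustration; Isolation Forest is omitted because it requires a third-party library.</p>
          <preformat><![CDATA[
```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices whose distance from the mean exceeds threshold * std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * std]


def iqr_outliers(values, k=1.5):
    """Indices outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Invented hourly PM2.5 readings with one extreme spike at the end.
pm25 = [12, 15, 14, 13, 16, 15, 14, 13, 12, 15, 14, 16, 13, 15, 14, 250]
z_flags = zscore_outliers(pm25)
iqr_flags = iqr_outliers(pm25)
```
]]></preformat>
          <p>Both rules flag the spike in this toy series. On heavy-tailed data the z-score rule is less reliable, because a few extreme values inflate the standard deviation, which is one reason to combine several detectors.</p>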
          <p>These combined approaches enable a clear distinction between sensor-induced noise and authentic
pollution events. Outliers attributable to measurement errors were corrected or removed during
preprocessing, while true high-pollution episodes were retained to preserve the environmental validity
of the dataset.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methods and models</title>
      <sec id="sec-4-1">
        <title>4.1. Fractal methods</title>
        <p>The first step involves characterising the fractal properties of each environmental parameter (AQI, PM2.5,
PM10, CO, SO2, NO2, temperature, humidity, precipitation, atmospheric pressure, soil surface condition).
These properties provide foundational insights into the inherent complexity, persistence, and scaling
behaviour of the series—features that critically inform subsequent modelling decisions. Identifying
long-range dependencies, fluctuation heterogeneity, and multi-scale structure helps determine whether
pollution indicators are predictable, mean-reverting, or dominated by stochastic variability.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Hurst Exponent Estimation</title>
          <p>The Hurst exponent (<italic>H</italic>) is the primary metric used to quantify long-term memory in environmental
time series.</p>
          <p>• <italic>H</italic> &gt; 0.5: persistent behaviour, in which increases tend to be followed by further increases.
• <italic>H</italic> ≈ 0.5: random behaviour, in which fluctuations behave similarly to a random walk.
• <italic>H</italic> &lt; 0.5: anti-persistent behaviour, in which increases tend to be followed by decreases.</p>
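          <p>To make these regimes concrete, the following plain-Python sketch estimates a scaling exponent in two ways, with a rescaled-range fit and with first-order detrended fluctuation analysis, and maps the result to the three regimes. It is a simplified illustration, not the authors' implementation: the window sizes, the linear detrending order, the tolerance <monospace>tol</monospace> around 0.5, and the synthetic test series are all assumptions.</p>
          <preformat><![CDATA[
```python
import math
import random


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def hurst_rs(series, sizes=(16, 32, 64, 128, 256)):
    """Hurst exponent from the slope of log(R/S) against log(n).

    Assumes len(series) is at least max(sizes).
    """
    log_n, log_rs = [], []
    for n in sizes:
        ratios = []
        for start in range(0, len(series) - n + 1, n):
            seg = series[start:start + n]
            mean = sum(seg) / n
            cum, total = [], 0.0
            for x in seg:
                total += x - mean  # cumulative deviation from the mean
                cum.append(total)
            r = max(cum) - min(cum)
            s = math.sqrt(sum((x - mean) ** 2 for x in seg) / n)
            if s > 0:
                ratios.append(r / s)
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
    return _slope(log_n, log_rs)


def dfa_alpha(series, sizes=(16, 32, 64, 128, 256)):
    """DFA scaling exponent with linear (order-1) detrending.

    Assumes len(series) is at least max(sizes).
    """
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for x in series:
        total += x - mean  # integrated (cumulative-sum) signal
        profile.append(total)
    log_n, log_f = [], []
    for n in sizes:
        ts = list(range(n))
        sq_sum, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope = _slope(ts, window)
            intercept = sum(window) / n - slope * (n - 1) / 2
            sq_sum += sum(
                (y - (intercept + slope * t)) ** 2 for t, y in zip(ts, window)
            )
            count += n
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / count))  # log of RMS fluctuation
    return _slope(log_n, log_f)


def regime(h, tol=0.05):
    """Map an exponent to a regime; `tol` is an assumed tolerance."""
    if h > 0.5 + tol:
        return "persistent"
    if h < 0.5 - tol:
        return "anti-persistent"
    return "random"


random.seed(1)
noise = [random.gauss(0, 1) for _ in range(4096)]  # uncorrelated series
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)  # strongly persistent series
h_noise = hurst_rs(noise)
a_noise = dfa_alpha(noise)
a_walk = dfa_alpha(walk)
```
]]></preformat>
          <p>For uncorrelated noise both estimates should lie near 0.5, although the rescaled-range estimate is known to be biased upward for short windows. For a random walk the DFA exponent is close to 1.5, which illustrates that DFA exponents above 1 describe non-stationary, strongly persistent signals.</p>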
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Rescaled Range (R/S) Analysis</title>
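          <p>Given sufficiently long and continuous data, the Hurst exponent can be estimated via classical rescaled
range analysis.</p>
          <p>1. Split the time series into segments of length <italic>n</italic>.</p>
          <p>2. For each segment, compute the cumulative deviation:
            <disp-formula><label>(4)</label><tex-math>Y(t) = \sum_{i=1}^{t} (x_i - \bar{x}), \quad t = 1, \dots, n,</tex-math></disp-formula>
where <inline-formula><tex-math>x_i</tex-math></inline-formula> is the value of the series at time <italic>i</italic>, <inline-formula><tex-math>\bar{x}</tex-math></inline-formula> is the mean of the segment, and <inline-formula><tex-math>Y(t)</tex-math></inline-formula> is the
cumulative deviation.</p>
          <p>3. Compute the range:
            <disp-formula><label>(5)</label><tex-math>R(n) = \max_{t} Y(t) - \min_{t} Y(t),</tex-math></disp-formula>
where <inline-formula><tex-math>R(n)</tex-math></inline-formula> represents the maximum fluctuation within the segment.</p>
          <p>4. Compute the standard deviation:
            <disp-formula><label>(6)</label><tex-math>S(n) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2},</tex-math></disp-formula>
where <inline-formula><tex-math>S(n)</tex-math></inline-formula> is the standard deviation of the segment.</p>
          <p>5. Estimate the scaling relationship:
            <disp-formula><label>(7)</label><tex-math>\frac{R(n)}{S(n)} \sim c \, n^{H},</tex-math></disp-formula>
where <italic>c</italic> is a constant of proportionality. The exponent <italic>H</italic> is obtained by fitting a log–log regression:
            <disp-formula><label>(8)</label><tex-math>\log\left(\frac{R(n)}{S(n)}\right) = H \log(n) + \log c.</tex-math></disp-formula></p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Detrended Fluctuation Analysis (DFA)</title>
          <p>DFA is the primary method used in this study due to its robustness to non-stationarity, missing segments,
trends, and structural breaks, conditions frequently observed in pollution monitoring data.
1. Integrate the time series:</p>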
          <p>() = ∑︁( − ¯),</p>
          <p>=1
() =  () − ^(),
⎯</p>
          <p>() = ⎷⎸⎸ 1 ∑︁ ()2,</p>
          <p>=1
where  is the original series at time , ¯ is the mean of the series, and  () is the cumulative
sum (integrated signal).
2. Divide into windows of size  and in each window fit a local polynomial trend ^(), where  is
the window size and ^() is the local trend estimate.
3. Detrend the signal:</p>
          <p>where () is the detrended series in window .
4. Compute the fluctuation function:
where  () measures the root-mean-square fluctuation of the detrended series, and  is the total
number of points.</p>
          <p>(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
5. Estimate the scaling law:
where () and () are the AR and MA polynomials, and   is white-noise error.</p>
          <p>The autoregressive and moving-average polynomials are:</p>
          <p>() ∼   ,
 = .</p>
          <p>= 2 − .</p>
          <p>= | − 0.5|.
′ = (1 − ) ,</p>
          <p>= −1 ,
()′ = () ,
() = 1 −  1 − · · · −</p>
          <p>,
() = 1 +  1 + · · · +  ,
^+ℎ = (+ℎ | , −1 , . . . ),</p>
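          <p>The DFA steps can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the local trend is assumed to be a first-order (linear) polynomial, windows are non-overlapping, and the two input series are synthetic.</p>

```python
import math
import random


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def dfa_fluctuation(profile, n):
    """Root-mean-square fluctuation F(n) after linear detrending in windows of size n."""
    squared, count = 0.0, 0
    ks = list(range(n))
    for start in range(0, len(profile) - n + 1, n):
        window = profile[start:start + n]
        slope, intercept = linear_fit(ks, window)   # local trend (order 1 is an assumption)
        for k, value in zip(ks, window):
            squared += (value - (slope * k + intercept)) ** 2
            count += 1
    return math.sqrt(squared / count)


def dfa_exponent(series, window_sizes):
    """Estimate alpha as the slope of log F(n) versus log n."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:                            # integrated signal Y(k)
        total += value - mean
        profile.append(total)
    log_n = [math.log(n) for n in window_sizes]
    log_f = [math.log(dfa_fluctuation(profile, n)) for n in window_sizes]
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


# Synthetic inputs: uncorrelated noise and its running sum (a random walk)
random.seed(1)
white = [random.gauss(0.0, 1.0) for _ in range(4096)]
walk, level = [], 0.0
for step in white:
    level += step
    walk.append(level)

sizes = [16, 32, 64, 128, 256]
alpha_white = dfa_exponent(white, sizes)
alpha_walk = dfa_exponent(walk, sizes)
print(round(alpha_white, 2), round(alpha_walk, 2))
```

          <p>Uncorrelated noise should give an exponent close to 0.5, while an integrated, non-stationary series such as a random walk should give a value close to 1.5. The identification of the DFA exponent with the Hurst exponent therefore applies to noise-like (stationary) series.</p>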
        </sec>
        <sec id="sec-4-1-4">
          <title>4.1.4. Fractal Dimension and Predictability Index</title>
          <p>From $H$, additional fractal characteristics can be derived:
• Fractal dimension:</p>
          <p>$$D = 2 - H. \tag{14}$$</p>
          <p>• Predictability index (optional):</p>
          <p>$$\mathrm{PI} = |H - 0.5|. \tag{15}$$</p>
          <p>Higher values of $\mathrm{PI}$ indicate stronger deviation from randomness, and thus higher theoretical
predictability.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Models</title>
        <p>Different modelling techniques are used to translate the fractal characteristics of the environmental time
series into predictive power. The methods differ in their structural assumptions, their degree of flexibility, and
their suitability for linear, seasonal, or nonlinear dynamics. The selected models are ARIMA, SARIMA, LSTM, and
Prophet. Each model offers a distinct view of pollution dynamics, which makes a sound comparison possible across
linear, seasonal, nonlinear, and trend-adaptive forecasting methods.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. ARIMA Model</title>
          <p>The ARIMA($p$, $d$, $q$) model captures linear temporal dependence using autoregression, differencing, and
moving-average components. After differencing $d$ times:</p>
          <p>$$y'_t = (1 - B)^d y_t, \qquad B y_t = y_{t-1}, \tag{16}$$</p>
          <p>where $y_t$ is the original series, $y'_t$ is the $d$-times differenced series, and $B$ is the backshift operator.</p>
          <p>The ARIMA model satisfies:</p>
          <p>$$\phi(B)\, y'_t = \theta(B)\, \varepsilon_t, \tag{17}$$</p>
          <p>where $\phi(B)$ and $\theta(B)$ are the AR and MA polynomials, and $\varepsilon_t$ is white-noise error.</p>
          <p>The autoregressive and moving-average polynomials are:</p>
          <p>$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \tag{18}$$</p>
          <p>where $p$ is the AR order and $\phi_i$ are autoregressive coefficients, $i = 1, \ldots, p$.</p>
          <p>$$\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q, \tag{19}$$</p>
          <p>where $q$ is the MA order and $\theta_j$ are moving-average coefficients, $j = 1, \ldots, q$.</p>
          <p>Forecasts follow the AR–MA recursion:</p>
          <p>$$\hat{y}_{t+h} = E(y_{t+h} \mid y_t, y_{t-1}, \ldots), \tag{20}$$</p>
          <p>where $\hat{y}_{t+h}$ is the $h$-step-ahead forecast and $E(\cdot \mid \cdot)$ denotes conditional expectation given past
observations.</p>
          <p>ARIMA provides an interpretable baseline, particularly when the differenced series is stationary and
exhibits long-memory behaviour.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>SARIMA Model</title>
          <p>SARIMA extends ARIMA by incorporating seasonal structure:</p>
          <p>$$\mathrm{SARIMA}(p, d, q)(P, D, Q)_s, \tag{21}$$</p>
          <p>where $p$, $d$, $q$ are the non-seasonal AR, differencing, and MA orders, $P$, $D$, $Q$ are the seasonal AR,
differencing, and MA orders, and $s$ is the seasonal period.</p>
          <p>Combined differencing is given by:</p>
          <p>$$w_t = (1 - B)^d (1 - B^s)^D y_t, \tag{22}$$</p>
          <p>where $y_t$ is the original series, $w_t$ is the differenced series, $B$ is the backshift operator, $d$ is the
non-seasonal differencing order, $D$ is the seasonal differencing order, and $s$ is the seasonal period.</p>
          <p>The full model is:</p>
          <p>$$\Phi(B^s)\, \phi(B)\, w_t = \Theta(B^s)\, \theta(B)\, \varepsilon_t, \tag{23}$$</p>
          <p>where $\phi(B)$ and $\theta(B)$ are the non-seasonal AR and MA polynomials, $\Phi(B^s)$ and $\Theta(B^s)$ are the
seasonal AR and MA polynomials, and $\varepsilon_t$ is white-noise error.</p>
          <p>Seasonal polynomials:</p>
          <p>$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}, \tag{24}$$</p>
          <p>where $P$ is the seasonal AR order and $\Phi_i$ are seasonal AR coefficients, $i = 1, \ldots, P$.</p>
          <p>$$\Theta(B^s) = 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs}, \tag{25}$$</p>
          <p>where $Q$ is the seasonal MA order and $\Theta_j$ are seasonal MA coefficients, $j = 1, \ldots, Q$.</p>
          <p>SARIMA is effective for pollutants exhibiting daily or annual periodicity.</p>
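          <p>The combined differencing operator can be illustrated with a short Python sketch. The function names and the toy series (a linear trend plus a repeating pattern of period 4) are hypothetical and chosen only to show that one seasonal and one regular difference remove both components.</p>

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: w_t = y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=0, D=0, s=1):
    """Apply (1 - B)^d (1 - B^s)^D to the series."""
    out = list(series)
    for _ in range(D):          # seasonal differences at lag s
        out = difference(out, lag=s)
    for _ in range(d):          # regular differences at lag 1
        out = difference(out, lag=1)
    return out


# Toy data (not from the study): linear trend plus a period-4 pattern
pattern = [0.0, 3.0, 1.0, -2.0]
series = [0.5 * t + pattern[t % 4] for t in range(24)]
w = sarima_difference(series, d=1, D=1, s=4)
print(len(w), max(abs(v) for v in w))
```

          <p>Each difference at lag $k$ shortens the series by $k$ points, so the result here has 24 - 4 - 1 = 19 values. The seasonal difference cancels the repeating pattern and leaves the constant trend increment, which the regular difference then removes.</p>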
        </sec>
        <sec id="sec-4-2-3">
          <title>4.2.2. LSTM Network</title>
          <p>LSTM networks model nonlinear and long-range dependencies using gated memory cells. For each
time step $t$, with input $x_t$, previous hidden state $h_{t-1}$, and cell state $c_{t-1}$:</p>
          <p>Forget gate:</p>
          <p>$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \tag{26}$$</p>
          <p>where $f_t$ is the forget gate vector, $\sigma$ is the sigmoid activation, $W_f$ and $U_f$ are input and hidden weights,
and $b_f$ is the bias.</p>
          <p>Input gate and candidate state:</p>
          <p>$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \tag{27}$$</p>
          <p>where $i_t$ is the input gate vector, $W_i$, $U_i$ are weights, and $b_i$ is the bias.</p>
          <p>$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \tag{28}$$</p>
          <p>where $\tilde{c}_t$ is the candidate cell state, $W_c$, $U_c$ are weights, and $b_c$ is the bias.</p>
          <p>Cell update:</p>
          <p>$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \tag{29}$$</p>
          <p>where $c_t$ is the updated cell state and $\odot$ denotes element-wise multiplication.</p>
          <p>Output gate and hidden state:</p>
          <p>$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o), \tag{30}$$</p>
          <p>where $o_t$ is the output gate vector, $W_o$, $U_o$ are weights, and $b_o$ is the bias.</p>
          <p>$$h_t = o_t \odot \tanh(c_t), \tag{31}$$</p>
          <p>where $h_t$ is the hidden state passed to the next time step or output layer.</p>
          <p>LSTMs effectively capture nonlinear patterns, persistence, and extreme pollution events.</p>
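          <p>A single LSTM time step can be written out directly from the gate equations. The sketch below uses plain Python lists so that each operation is visible; the function names and the all-zero parameters are hypothetical, and a practical model would use a deep learning library with trained weights.</p>

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W[r], x)) + sum(u * hi for u, hi in zip(U[r], h)) + b[r]
        for r in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (W, U, b) triple."""
    f_gate = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]       # forget gate
    i_gate = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]       # input gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]    # candidate state
    o_gate = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]       # output gate
    # Cell update: keep part of the old state, add part of the candidate
    c = [f * cp + i * ct for f, cp, i, ct in zip(f_gate, c_prev, i_gate, c_tilde)]
    # Hidden state: gated, squashed view of the cell state
    h = [o * math.tanh(ct) for o, ct in zip(o_gate, c)]
    return h, c


# Illustrative parameters: 2 inputs, 1 hidden unit, all weights and biases zero
zeros = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zeros for gate in ("f", "i", "c", "o")}
h, c = lstm_step([1.0, -1.0], [0.0], [2.0], params)
print(h, c)
```

          <p>With zero parameters every gate equals 0.5 and the candidate state is 0, so the cell state is halved and the hidden state is half of its hyperbolic tangent. This makes the example easy to check by hand against the equations.</p>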
        </sec>
        <sec id="sec-4-2-4">
          <title>4.2.3. Prophet Model</title>
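          <p>Prophet models the time series as an additive decomposition:</p>
          <p>$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t, \tag{32}$$</p>
          <p>where $y(t)$ is the observed series, $g(t)$ is the trend, $s(t)$ is the seasonal component, $h(t)$ represents
holidays or events, and $\varepsilon_t$ is the error term.</p>
          <p>Trend (piecewise linear):</p>
          <p>$$g(t) = \left(k + \sum_{j=1}^{J} \delta_j\, \mathbb{1}(t &gt; s_j)\right) t + \left(m + \sum_{j=1}^{J} \gamma_j\, \mathbb{1}(t &gt; s_j)\right), \tag{33}$$</p>
          <p>where $k$ is the base growth rate, $m$ is the offset, $J$ is the number of change points at times $s_j$, $\delta_j$ and $\gamma_j$
are adjustments to the slope and offset, and $\mathbb{1}(\cdot)$ is the indicator function.</p>
          <p>Seasonality (Fourier series):</p>
          <p>$$s(t) = \sum_{n=1}^{K} \left[ a_n \cos\left(\frac{2\pi n t}{T}\right) + b_n \sin\left(\frac{2\pi n t}{T}\right) \right], \tag{34}$$</p>
          <p>where $K$ is the number of Fourier components, $T$ is the period (e.g., 24 for a daily cycle in hourly data or 365
for a yearly cycle in daily data), and $a_n$, $b_n$ are Fourier coefficients.</p>
          <p>Events/holidays:</p>
          <p>$$h(t) = \sum_{l=1}^{L} \kappa_l\, \mathbb{1}(t \in D_l), \tag{35}$$</p>
          <p>where $L$ is the number of special events, $D_l$ is the set of times corresponding to event $l$, and $\kappa_l$ is the
effect size of the event.</p>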
          <p>Prophet handles multiple seasonalities, trend shifts, missing data, and abrupt environmental changes.</p>
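          <p>The trend and seasonal components can be evaluated directly from their definitions. The sketch below is illustrative only and does not call the Prophet library: the function names and parameter values are hypothetical, and the offset adjustment at the change point is set to minus the change-point time multiplied by the slope adjustment, an assumption made here so that the trend stays continuous.</p>

```python
import math


def trend(t, k, m, changepoints, deltas, gammas):
    """Piecewise-linear trend: slope and offset are adjusted after each change point."""
    slope = k + sum(d for s, d in zip(changepoints, deltas) if t > s)
    offset = m + sum(g for s, g in zip(changepoints, gammas) if t > s)
    return slope * t + offset


def seasonality(t, period, a, b):
    """Truncated Fourier series with len(a) harmonics of the given period."""
    return sum(
        a_n * math.cos(2 * math.pi * n * t / period) + b_n * math.sin(2 * math.pi * n * t / period)
        for n, (a_n, b_n) in enumerate(zip(a, b), start=1)
    )


# Hypothetical parameters: slope 1 that doubles after t = 10
changepoints, deltas = [10.0], [1.0]
gammas = [-s * d for s, d in zip(changepoints, deltas)]   # continuity assumption
g_before = trend(5.0, 1.0, 0.0, changepoints, deltas, gammas)
g_after = trend(12.0, 1.0, 0.0, changepoints, deltas, gammas)

# One cosine harmonic with a 24-step period (a daily cycle in hourly data)
s_peak = seasonality(0.0, 24.0, [1.0], [0.0])
s_trough = seasonality(12.0, 24.0, [1.0], [0.0])
print(g_before, g_after, s_peak, s_trough)
```

          <p>Before the change point the trend equals $t$; after it the slope is 2 and the offset is -10, so the two pieces meet at $t = 10$. The seasonal term peaks at the start of the cycle and reaches its minimum half a period later.</p>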
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Model architecture</title>
        <p>Figure 8 illustrates the overall architecture of the proposed forecasting model, which combines classical
statistical methods, machine-learning techniques, and fractal indicators in a single predictive system. The system
is designed to capture both the short-term and the long-range dependence of environmental time-series data.
Forecasting starts with data preprocessing: the raw hourly measurements are cleaned and aligned, and optionally
decomposed by STL into trend, seasonal, and residual components. Fractal analysis is then performed on the
detrended or residual series to compute the Hurst exponent, fractal dimension, and spectral scaling, among other
indicators. These indicators describe the persistence, complexity, and multi-scale behaviour of the data and are
treated as an additional feature set for the forecasting models. The input to each model merges the preprocessed
pollutant and meteorological data with the fractal-derived features. The architecture incorporates a wide range of
forecasting methods, namely ARIMA, SARIMA, ETS, STL-based models, Prophet, tree-based methods, and LSTM
networks. Each model produces its own forecast and uncertainty estimate. A hybrid ensemble layer then combines
these outputs through either performance-based weighting or stacked learning. This merging step yields higher
prediction accuracy, robustness, and adaptability for air pollution time series whose fractal regimes vary.</p>
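        <p>Performance-based weighting in the ensemble layer can be sketched as follows. The text does not specify the weighting scheme, so this example assumes weights inversely proportional to each model's validation error; the function names, error values, and forecasts are hypothetical.</p>

```python
def inverse_error_weights(validation_errors):
    """Weights proportional to 1/error, normalised to sum to one."""
    inverse = {name: 1.0 / err for name, err in validation_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted average of per-model forecasts of equal length."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(horizon)
    ]


# Hypothetical validation errors and two-step forecasts, for illustration only
errors = {"arima": 4.0, "prophet": 2.0, "lstm": 1.0}
forecasts = {
    "arima": [10.0, 12.0],
    "prophet": [11.0, 13.0],
    "lstm": [12.0, 14.0],
}
weights = inverse_error_weights(errors)
blended = combine(forecasts, weights)
print(weights, blended)
```

        <p>With these numbers the weights are 1/7, 2/7, and 4/7, so the blended forecast is pulled toward the model with the lowest validation error. Stacked learning would replace the fixed weights with a meta-model trained on the individual forecasts.</p>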
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>This section reports the predictive performance of the hybrid fractal–machine learning framework across
four model families: ARIMA, SARIMAX, Prophet, and LSTM. Model accuracy was assessed with six complementary
metrics (MAE, RMSE, MAPE, sMAPE, MASE, and $R^2$), which together cover both absolute and relative forecasting
error. Table 2 summarizes the results.</p>
      <sec id="sec-5-1">
        <title>5.1. Evaluation Metrics</title>
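        <p>Let the following notations be defined:
• $y_t$: the observed (actual) value at time $t$;
• $\hat{y}_t$: the predicted value at time $t$;
• $n$: the total number of observations in the test set;
• $\bar{y} = \frac{1}{n} \sum_{t=1}^{n} y_t$: the mean of the actual values;
• $e_t = y_t - \hat{y}_t$: the forecast error;
• $d_t = y_t - y_{t-1}$: the naïve forecast error used in MASE.</p>
        <sec id="sec-5-1-1">
          <title>Mean Absolute Error (MAE)</title>
          <p>$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|. \tag{36}$$</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>Root Mean Squared Error (RMSE)</title>
          <p>$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}. \tag{37}$$</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>Mean Absolute Percentage Error (MAPE)</title>
          <p>$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|. \tag{38}$$</p>
        </sec>
        <sec id="sec-5-1-4">
          <title>Symmetric Mean Absolute Percentage Error (sMAPE)</title>
          <p>$$\mathrm{sMAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{(|y_t| + |\hat{y}_t|)/2}. \tag{39}$$</p>
        </sec>
        <sec id="sec-5-1-5">
          <title>Mean Absolute Scaled Error (MASE)</title>
          <p>$$\mathrm{MASE} = \frac{\frac{1}{n} \sum_{t=1}^{n} |e_t|}{\frac{1}{n-1} \sum_{t=2}^{n} |d_t|}. \tag{40}$$</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Coefficient of Determination ($R^2$)</title>
        <p>$$R^2 = 1 - \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{n} (y_t - \bar{y})^2}. \tag{41}$$</p>
        <p>The forecasting performance of the different models, evaluated using multiple error metrics, is
summarized in Table 2.</p>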
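        <p>The six metrics can be computed together in a few lines of Python. The sketch below rests on stated assumptions rather than on the study's code: sMAPE uses the mean of the absolute actual and predicted values in the denominator, MASE scales by the mean absolute naïve error over the same series, and the function name and sample values are hypothetical.</p>

```python
import math


def forecast_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE, sMAPE, MASE and R^2 for two equal-length sequences."""
    n = len(actual)
    errors = [y - p for y, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when an actual value is zero
    mape = 100.0 / n * sum(abs(e / y) for e, y in zip(errors, actual))
    # Assumed sMAPE variant: denominator is the mean of |y| and |y_hat|
    smape = 100.0 / n * sum(
        2.0 * abs(e) / (abs(y) + abs(p)) for e, y, p in zip(errors, actual, predicted)
    )
    # Assumed MASE scaling: mean absolute one-step naive error d_t = y_t - y_{t-1}
    naive = sum(abs(actual[t] - actual[t - 1]) for t in range(1, n)) / (n - 1)
    mase = mae / naive
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "sMAPE": smape, "MASE": mase, "R2": r2}


# Hypothetical values, for illustration only
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]
scores = forecast_metrics(actual, predicted)
for name, value in scores.items():
    print(name, round(value, 3))
```

        <p>A MASE below 1 means the forecast beats the naïve one-step forecast on average, and a negative $R^2$ means the forecast fits worse than a constant equal to the mean of the observations.</p>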
      </sec>
      <sec id="sec-5-3">
        <title>5.2. Interpretation of results</title>
        <p>The results indicate a clear performance hierarchy across model types. The ARIMA model demonstrates
the weakest forecasting ability, with a negative $R^2$ and the highest error rates, indicating that linear
autoregressive patterns alone cannot adequately capture the complexity of urban air pollution dynamics.
Figure 9 shows the ARIMA forecast for AQI, which illustrates the model’s limitations in capturing complex
temporal patterns.</p>
        <p>The SARIMAX model improves substantially by incorporating exogenous meteorological variables,
reducing MAE by approximately 30% compared to ARIMA and achieving a positive $R^2$ of 0.50. The SARIMAX
forecast, which demonstrates improved accuracy over ARIMA, is shown in Figure 10. Prophet provides further
improvements through flexible trend and seasonality modelling, achieving moderate errors and strong
generalization performance.</p>
        <p>The LSTM model augmented with fractal features achieved the highest forecasting accuracy among
the evaluated models, with a coefficient of determination of $R^2 = 0.93$ and a mean absolute percentage
error (MAPE) of less than 9%. This performance substantially exceeds that of the best-performing
traditional model, Prophet ($R^2 \approx 0.76$), as well as the baseline ARIMA model, whose negative $R^2$ value
means that it fits worse than a constant forecast equal to the mean of the observations.</p>
        <p>These results confirm the effectiveness of the proposed hybrid approach. The LSTM architecture is
capable of capturing complex temporal dependencies, particularly when supplemented with information
on long-range dependence through fractal indicators. Notably, the high values of the Hurst exponent
($H \approx 0.9$) observed in particulate matter time series reflect persistent dynamics. Under such conditions,
the fractal-enhanced LSTM model demonstrates superior predictive performance, whereas the ARIMA
model fails to achieve comparable forecasting accuracy.</p>
        <p>Figures 11–13 illustrate the forecasting results obtained from different models. Figure 11 presents
the LSTM forecast over the full AQI dataset, highlighting the model’s capability to capture complex
nonlinear temporal patterns. Figure 12 shows the hybrid LSTM+ARIMA forecast, demonstrating
improved performance by combining linear and nonlinear components. Finally, Figure 13 displays the
Prophet model forecast, which effectively captures trends and seasonal effects in the AQI series.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.3. Visual comparison of forecasts</title>
        <p>In addition to numerical evaluation, forecast performance was assessed visually using:
• time-series forecast plots comparing predicted and observed pollutant concentrations,
• residual plots highlighting error distribution and autocorrelation,
• overlaid multi-model forecast curves illustrating differences across model families.</p>
        <p>The visualizations show that the LSTM predictions follow both short-term changes and long-term trends
more closely than the baseline models. Prophet and SARIMAX capture the medium-term behaviour but lag
during rapid regime shifts, while ARIMA is unable to model the nonlinear variability and strong seasonal
components. The hybrid ensemble further reduces the forecasting error by approximately 10–15% compared with
the best-performing individual model, indicating a complementary effect among the constituent predictors.
This improvement suggests that each model captures distinct structural characteristics of the data. In
particular, linear dynamics and seasonal patterns are effectively modelled by the ARIMA and Prophet
approaches, whereas nonlinear long-term dependencies are accounted for by the fractal-enhanced LSTM
model. These complementary strengths justify the proposed hybrid approach for a forecasting task as complex
as urban air pollution.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussions and future directions</title>
      <p>The outcomes demonstrate that incorporating fractal traits and long-range dependence into the
prediction models markedly improves the accuracy of urban air pollution forecasts. Traditional
models such as ARIMA and SARIMAX capture linear trends and short-term dependencies but
cannot handle nonlinear fluctuations and multi-scale patterns, which is reflected in their
relatively high errors and low $R^2$ values. Prophet performs better by modelling multiple seasonalities
and structural trends, yet its piecewise-linear assumptions limit responsiveness to the abrupt regime shifts
observed in the dataset.</p>
      <p>LSTM achieved the best performance across all metrics, demonstrating its strength in capturing
non-linear interactions, long-range dependencies, and complex temporal patterns. This aligns with the
persistent behaviour revealed by Hurst exponent and DFA analyses, suggesting that models capable of
leveraging fractal features gain a predictive advantage. Incorporating fractal indicators as auxiliary
inputs enhances model awareness of scaling behaviour and memory effects, which classical models
may overlook.</p>
      <p>Overall, these findings confirm that hybrid fractal–machine learning approaches provide a more
robust framework for environmental forecasting, particularly in systems with persistence, multi-scale
variability, and non-linear dynamics. The study highlights the practical value of combining advanced
statistical, deep learning, and fractal-based methods for urban air quality prediction.</p>
      <p>Although the hybrid model demonstrates clear advantages, several limitations of the study
should be noted. First, the results are based on data from a single city, so the proposed
approach needs to be tested in other urban conditions to confirm its general applicability. Second,
the framework currently operates in a static mode for retrospective forecasting; real-time use would
require adaptation and optimization, because the described models may be computationally demanding
for streaming data. Third, the study applies a univariate approach to each pollutant and does not
consider interactions between pollutants; integrating these interdependencies is a promising direction
for further research. Despite these limitations, the method has high practical potential for real-world
air quality management.</p>
      <p>One priority for future research is to account for a broader range of patterns, which can
further improve model accuracy and is important for effective environmental monitoring.
An important direction is the development of multidimensional deep learning models that
simultaneously use fractal features, multiple pollutants, and exogenous variables. Integration of this
framework with real-time sensor networks is also planned. A further step is the quantitative assessment
of forecast uncertainty, so that decision-makers receive confidence intervals as well as point
forecasts.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>In this study, the stated goal of improving the accuracy of urban air pollution forecasting was achieved
through a new hybrid fractal–machine learning framework. The framework addresses a scientific gap by
accounting for long-term dependence and multiscale variability through fractal features that traditional
models are unable to reproduce. Tested on air quality data from the city of Astana, the proposed approach
provided a significant improvement in forecasting accuracy. In particular, the fractal-informed LSTM model
reduced MAPE by more than 80% compared with ARIMA and achieved an $R^2$ value of 0.93, which supports the
hypothesis that fractal metrics enhance the predictive capabilities of the models. Combining this LSTM with
classical models in an ensemble yielded a robust tool that outperforms individual methods while balancing
interpretability and accuracy in pollution forecasting.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The authors express sincere gratitude to Kazambayev Ilyas for his insightful comments during the
development of this work.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[15] C.-K. Lee, Multifractal characteristics in air pollutant concentration time series, Water, Air, and Soil Pollution 135 (2002) 389–409. doi:10.1023/A:1014768632318.
[16] J. W. Kantelhardt, S. A. Zschiegner, E. Koscielny-Bunde, A. Bunde, S. Havlin, H. E. Stanley, Multifractal detrended fluctuation analysis of nonstationary time series, arXiv preprint (2002). URL: https://arxiv.org/abs/physics/0202070. arXiv:physics/0202070.
[17] G. Zhao, X. Guo, X. Wang, D. Zheng, Using a novel fractal-time-series prediction model to predict coal consumption, Discrete Dynamics in Nature and Society 2023 (2023) 8606977. doi:10.1155/2023/8606977.
[18] P. Pacheco, E. Mera, Fractal dimension of pollutants and urban meteorology of a basin geomorphology: Study of its relationship with entropic dynamics and anomalous diffusion, Fractal and Fractional 9 (2025) 255. doi:10.3390/fractalfract9040255.
[19] A. H. Bukhari, M. A. Z. Raja, M. Shoaib, A. K. Kiani, Fractional order Lorenz based physics informed SARFIMA-NARX model to monitor and mitigate megacities air pollution, Chaos, Solitons &amp; Fractals 161 (2022) 112375. doi:10.1016/j.chaos.2022.112375.
[20] V. Evagelopoulos, S. Zoras, A. G. Triantafyllou, T. A. Albanis, PM10-PM2.5 time series and fractal analysis, Global NEST Journal 8 (2006) 234–240. URL: https://journal.gnest.org/sites/default/files/Journal%20Papers/234_240-EVAGELOPOULOS_372_8-3.pdf.
[21] D. Nikolopoulos, A. Alam, E. Petraki, P. Yannakopoulos, K. Moustris, Multifractal patterns in 17-year PM10 time series in Athens, Greece, Environments 10 (2023) 9. doi:10.3390/environments10010009.
[22] L. Pei, J. Chen, J. Zhou, H. Huang, Z. Zhou, C. Chen, F. Yao, A fractal prediction method for safety monitoring deformation of core rockfill dams, Mathematical Problems in Engineering 2021 (2021) 6655657. doi:10.1155/2021/6655657.
[23] D. A. Prada, D. Parra, J. D. Tarazona, M. F. Silva, P. Vera, S. Montoya, A. Acevedo, J. Gomez, Fractal analysis of the time series of particulate material, Journal of Physics: Conference Series 1514 (2020) 012016. doi:10.1088/1742-6596/1514/1/012016.
[24] C. Lorinț, E. Traistă, A. Florea, D. Marchiș, S. M. Radu, A. Nicola, E. Rezmerița, Spatiotemporal distribution and evolution of air pollutants based on comparative analysis of long-term monitoring data and snow samples in Petroșani mountain depression, Romania, Sustainability 17 (2025) 3141. URL: https://www.mdpi.com/2071-1050/17/7/3141.
[25] K. Kolesnikova, L. Naizabayeva, A. Myrzabayeva, R. Lisnevskyi, Use of neural networks in prediction of environmental processes, in: 2024 IEEE 4th International Conference on Smart Information Systems and Technologies (SIST), IEEE, Astana, Kazakhstan, 2024, pp. 625–630. doi:10.1109/SIST61555.2024.10629330.
[26] C. Ma, G. Dai, J. Zhou, Short-term traffic flow prediction for urban road sections based on time series analysis and LSTM_BILSTM method, IEEE Transactions on Intelligent Transportation Systems 23 (2022) 5615–5624. doi:10.1109/TITS.2021.3055258.
[27] Y. Andrashko, O. Kuchanskyi, A. Biloshchytskyi, A. Neftissov, S. Biloshchytska, Forecasting air pollutant emissions using deep sparse transformer networks: A case study of the Ekibastuz coal-fired power plant, Sustainability 17 (2025) 5115. doi:10.3390/su17115115.
[28] O. Kuchanskyi, A. Biloshchytskyi, Y. Andrashko, A. Neftissov, S. Biloshchytska, S. Bronin, Predictability of air pollutants based on detrended fluctuation analysis: Ekibastuz coal-mining center in northeastern Kazakhstan, Urban Science 9 (2025) 273. doi:10.3390/urbansci9070273.
[29] Kazhydromet, [kazhydromet], https://www.kazhydromet.kz/ru/, 2025. Retrieved November 9, 2025.
[30] AQI.in, Kazakhstan air quality index (AQI) dashboard: Astana, https://www.aqi.in/ru/dashboard/kazakhstan/astana, 2025. Retrieved November 9, 2025.
[31] Meteostat, Hourly data structure, https://dev.meteostat.net/python/hourly.html#data-structure, n.d.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biloshchytskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kuchansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andrashko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neftissov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vatskel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yedilkhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herych</surname>
          </string-name>
          ,
          <article-title>Building a model for choosing a strategy for reducing air pollution based on data predictive analysis</article-title>
          ,
          <source>Eastern-European Journal of Enterprise Technologies</source>
          <volume>3</volume>
          (
          <year>2022</year>
          )
          <fpage>23</fpage>
          -
          <lpage>30</lpage>
          . doi:10.15587/1729-4061.2022.259323.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Development of a trend forecasting model for environmental pollution monitoring</article-title>
          ,
          <source>Management of Development of Complex Systems</source>
          <volume>57</volume>
          (
          <year>2024</year>
          )
          <fpage>62</fpage>
          -
          <lpage>66</lpage>
          . URL: http://mdcs.knuba.edu.ua/article/view/301806.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Research on urban ecological environment vulnerability prediction method based on fii-lstm</article-title>
          ,
          <source>SSRN Electronic Journal</source>
          ,
          <year>2024</year>
          . URL: https://ssrn.com/abstract=5249549, preprint.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Jelinek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ahammer</surname>
          </string-name>
          ,
          <article-title>Operationalizing fractal linguistics: toward a unified framework for cross-disciplinary fractal analysis</article-title>
          ,
          <source>Frontiers in Physics</source>
          <volume>13</volume>
          (
          <year>2025</year>
          )
          1645620. doi:10.3389/fphy.2025.1645620.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Bhatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. A.</given-names>
            <surname>Bhatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Syam</surname>
          </string-name>
          ,
          <article-title>Aiot-driven multi-source sensor emission monitoring and forecasting using multi-source sensor integration with reduced noise series decomposition</article-title>
          ,
          <source>Journal of Cloud Computing</source>
          <volume>13</volume>
          (
          <year>2024</year>
          ). doi:10.1186/s13677-024-00598-9.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nikolopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Moustris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Petraki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koulougliotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cantzos</surname>
          </string-name>
          ,
          <article-title>Fractal and long-memory traces in PM10 time series in Athens, Greece</article-title>
          ,
          <source>Environments</source>
          <volume>6</volume>
          (
          <year>2019</year>
          )
          29. doi:10.3390/environments6030029.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ramadevi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bingi</surname>
          </string-name>
          ,
          <article-title>Chaotic time series forecasting approaches using machine learning techniques: A review</article-title>
          ,
          <source>Symmetry</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          955. doi:10.3390/sym14050955.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Autoregressive models in environmental forecasting time series: A theoretical and application review</article-title>
          ,
          <source>Environmental Science and Pollution Research</source>
          <volume>30</volume>
          (
          <year>2023</year>
          )
          <fpage>19617</fpage>
          -
          <lpage>19641</lpage>
          . doi:10.1007/s11356-023-25148-9.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.-W.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Horng</surname>
          </string-name>
          ,
          <article-title>Hybrid time-series framework for daily-based PM2.5 forecasting</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>104162</fpage>
          -
          <lpage>104174</lpage>
          . doi:10.1109/ACCESS.2021.3099111.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Laib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guignard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kanevski</surname>
          </string-name>
          ,
          <article-title>Analysis of air pollution time series using complexity-invariant distance and information measures</article-title>
          ,
          <source>Physica A: Statistical Mechanics and its Applications</source>
          <volume>547</volume>
          (
          <year>2020</year>
          )
          <elocation-id>124391</elocation-id>
          . doi:10.1016/j.physa.2020.124391.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biloshchytskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neftissov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuchanskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andrashko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biloshchytska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukhatayev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kazambayev</surname>
          </string-name>
          ,
          <article-title>Fractal analysis of air pollution time series in urban areas in Astana, Republic of Kazakhstan</article-title>
          ,
          <source>Urban Science</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <elocation-id>131</elocation-id>
          . URL: https://www.mdpi.com/2413-8851/8/3/131.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>On the multifractal analysis of air quality index time series before and during COVID-19 partial lockdown: A case study of Shanghai, China</article-title>
          ,
          <source>Physica A</source>
          <volume>565</volume>
          (
          <year>2020</year>
          )
          <elocation-id>125551</elocation-id>
          . doi:10.1016/j.physa.2020.125551.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biloshchytskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuchanskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neftissov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andrashko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biloshchytska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kazambayev</surname>
          </string-name>
          ,
          <article-title>Fractal analysis of mining wastewater time series parameters: Balkhash urban region and Sayak ore district</article-title>
          ,
          <source>Urban Science</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <elocation-id>200</elocation-id>
          . URL: https://www.mdpi.com/2413-8851/8/4/200.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biloshchytskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neftissov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuchanskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Andrashko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biloshchytska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukhatayev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kazambayev</surname>
          </string-name>
          ,
          <article-title>Fractal analysis of air pollution time series in urban areas in Astana, Republic of Kazakhstan</article-title>
          ,
          <source>Urban Science</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <elocation-id>131</elocation-id>
          . URL: https://www.mdpi.com/2413-8851/8/3/131.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>