<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Computational Social Science</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.15407/dse2022.02.037</article-id>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>Victoria.A.Vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktoriia Yakovlieva</string-name>
          <email>viktoriia.yakovlieva.sa.2022@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sofiia Ivaniv</string-name>
          <email>sofiia.ivaniv.sa.2022@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Shakleina</string-name>
          <email>ioshakleina@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera 12, 79013 Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>48</volume>
      <issue>2</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>The article presents the results of an intelligent data analysis of the relationship between the dynamics of the number of Ukrainian refugees and the frequency of attacks by russia on the civilian and political infrastructure of Ukraine. The study aims to identify statistical relationships between these phenomena and visualize their dynamics using the R programming language. Three data samples were processed: the number of attacks on civilian objects, the number of attacks on political objects, and the dynamics of the number of Ukrainian refugees abroad. The paper applies preliminary statistical analysis, time series smoothing (moving average, median filtering, exponential smoothing) and correlation analysis. The results indicate a strong connection between the intensity of attacks, especially on political objects, and the growth of the number of refugees. The analysis allows for a deeper understanding of the impact of military actions on migration processes and may be helpful in predicting future trends.</p>
      </abstract>
      <kwd-group>
        <kwd>Intelligence analysis</kwd>
        <kwd>big data analysis</kwd>
        <kwd>refugees</kwd>
        <kwd>R</kwd>
        <kwd>shelling</kwd>
        <kwd>time series</kwd>
        <kwd>smoothing</kwd>
        <kwd>correlation</kwd>
        <kwd>aggression</kwd>
        <kwd>statistical analysis</kwd>
        <kwd>migration</kwd>
        <kwd>infrastructure</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The full-scale invasion of Ukraine by russia has caused not only the destruction of infrastructure but also
large-scale humanitarian consequences, including mass migration of the population. The number of
Ukrainian refugees forced to leave the country has grown in parallel with the intensity of attacks on
civilian and political targets. The scientific community is actively researching migration processes,
but most of the work focuses on the sociological or political aspects of the problem and leaves aside
the quantitative relationships between the frequency of attacks and the dynamics of migration. In this
context, modern tools for data mining and mathematical modelling are of particular importance, notably the R
programming language, which supports deep statistical processing, visualization, and correlation
analysis of large data sets. The work aims to fill this
scientific gap and has applied value for the development of effective strategies for responding to
humanitarian crises caused by armed conflicts. For Ukraine, the results of such a study are significant
for strategic planning, social protection of the population, and international
cooperation.</p>
      <p>The purpose of the work is to identify and formalize the relationships between the dynamics of
the number of Ukrainian refugees and the frequency of russian attacks on the civilian and political
infrastructure of Ukraine. To achieve this goal, the following tasks must be solved:
<list list-type="bullet">
<list-item><p>generate data samples on shelling and the number of refugees;</p></list-item>
<list-item><p>perform pre-processing, normalization and visualization of data;</p></list-item>
<list-item><p>apply time series smoothing methods to identify trends;</p></list-item>
<list-item><p>create correlation models of dependencies between parameters;</p></list-item>
<list-item><p>carry out cluster analysis of dynamics based on statistical indicators.</p></list-item>
</list></p>
      <p>The object of the study is the forced migration of the population of Ukraine under the influence
of military actions. The subject of the study is the dependence of the dynamics of changes in the
number of Ukrainian refugees on the frequency and nature of attacks on civilian and political
infrastructure. For the first time, a comprehensive approach to the analysis of the relationship
between the number of refugees and the intensity of attacks is proposed, which is based on a
combination of statistical methods, time series smoothing and clustering in the R environment. New
results were obtained regarding the degree of influence of attacks on political infrastructure on the
increase in the number of refugees. It was established that these attacks have a stronger correlation
with the dynamics of migration, which was not covered in previous studies. The work improves the
methodology for analysing migration processes in conditions of armed conflict, providing the
opportunity for operational forecasting of the humanitarian situation in the country.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>The issue of the impact of armed conflicts on migration processes attracted the attention of
researchers long before the start of the full-scale war in Ukraine. However, it was the events after
2022 that became the impetus for the active study of changes in the structure and scale of forced
migration as a result of armed aggression. The works [1,2] investigated the general trends in
population displacement as a result of the war and, in particular, studied the socio-economic
consequences and challenges for host countries. Specific attention was paid to Ukraine as the largest
source of refugees in Europe after 2022 [3].</p>
      <p>A number of studies [4,5] analyse the demographic characteristics of refugees, gender
composition, access to services, and adaptation conditions. However, insufficient attention has been
paid to statistical modelling and attempts to establish a connection between migration waves and
specific military events. Study [6] is one of the few that uses regression analysis to identify a
correlation between the number of attacks on infrastructure and the number of migrants. Work [7]
emphasizes the importance of building forecasting models but uses only aggregated indicators
without deep temporal detail.</p>
      <p>Existing publications mainly focus on qualitative analysis of the phenomenon, which creates a
need for more formalized approaches using statistical methods of time series analysis, smoothing,
clustering and data normalization. Such techniques allow moving from describing the phenomenon
to identifying hidden patterns and creating tools for operational forecasting, which is especially
relevant for Ukraine in the context of a long-term threat from the aggressor.</p>
      <p>The issue of forced migration as a result of russia’s armed aggression against Ukraine has received
considerable attention in modern scientific and intergovernmental literature. The works of Libanova,
Poznyak, and Tsymbal [1] provide a fundamental analysis of demographic changes in Ukraine after
the outbreak of full-scale war. The researchers emphasize the complexity of the problem, including
social, economic, and security aspects. Digital methods of tracking migration flows have become an
important research tool: Wycoff et al. [2] demonstrated the effectiveness of analysing digital traces,
in particular data from Google and social networks, to monitor the movement of Ukrainian refugees
in real-time. Minora et al. [3] worked in a similar direction, using Facebook advertising data to build
a model of Ukrainian movements within the EU, confirming the feasibility of using big data in times
of crisis.</p>
      <p>The organizational and political aspects are reflected in the reports of international organizations.
OECD analysis [4] emphasizes the unprecedented nature of the current flow of refugees to Europe,
suggesting ways to improve integration strategies. The specific impact of massive shelling on
migration dynamics is revealed in a study by the International Center for Ukrainian Victory [5],
where particular data shows how waves of attacks lead to a sharp increase in the number of people
leaving the country.</p>
      <p>The focus of the study by Kovtun and Salabay [6] is the integration of Ukrainians in host
countries, particularly in Germany. The work includes statistical processing of questionnaire data,
which is a valuable addition to macro-level assessments. Publications by Reuters [7] and The
Guardian [8] complement the scientific picture with current context: they emphasize the threat
of “migration weapons” as an element of hybrid warfare and draw attention to the potential decrease
in international support due to donor fatigue. This review allows us to conclude that although the
issue of forced migration is actively researched, most of the existing work does not focus on
quantitative modelling of the relationship between attacks on infrastructure and migration
dynamics. Therefore, a study based on time series and statistical analysis is a logical and vital
continuation of this scientific discussion.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods and materials</title>
      <p>The study used a set of methods of mathematical statistics, data mining, and computer modelling
[9-16], which allowed for a comprehensive analysis of the relationship between the number of
Ukrainian refugees and the frequency of attacks on the civilian and political infrastructure of Ukraine
by russia. The primary tool for implementing all stages of the analysis was the R programming
language, which provides extensive capabilities for processing, visualization, and statistical
modelling of data. The use of the R language is due to its openness, flexibility, and the availability of
specialized packages for processing time series (zoo, forecast, TTR), plotting (ggplot2, plotly), and
conducting cluster analysis (cluster, factoextra). R made it possible to process three
independent samples effectively: data on the number of refugees, on attacks on civilian infrastructure, and on
attacks on political objects.</p>
      <p>At the stage of primary data processing, methods of normalization, cleaning and structuring of
data were used, followed by their presentation in the format of time series. Smoothing methods were
chosen to study the dynamics, in particular:
<list list-type="bullet">
<list-item><p>simple moving average: to eliminate random fluctuations and identify the primary trend;</p></list-item>
<list-item><p>weighted moving average: to better take into account the asymmetric effects of events;</p></list-item>
<list-item><p>median filtering: an effective means of highlighting long-term trends in the presence of outliers;</p></list-item>
<list-item><p>exponential smoothing: to reflect the inertia and adaptability of processes.</p></list-item>
</list></p>
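      <p>As an illustration only, the four smoothing methods listed above can be sketched in base R. The series, the window width of 3, the weights (1, 2, 1)/4 and the smoothing factor alpha = 0.3 are assumptions chosen for the example, not the data or parameters of the study, and the helper exp_smooth is hypothetical.</p>
      <preformat><![CDATA[
```r
# Hypothetical monthly event counts; the values are illustrative only.
events <- c(30, 25, 40, 35, 28, 700, 650, 400, 300, 250, 220, 240)

# Simple moving average: centred window of width 3, equal weights.
sma <- stats::filter(events, rep(1 / 3, 3), sides = 2)

# Weighted moving average: the centre point counts twice as much as its neighbours.
wma <- stats::filter(events, c(1, 2, 1) / 4, sides = 2)

# Median filtering: running median of width 3, robust to isolated outliers.
med <- stats::runmed(events, k = 3)

# Simple exponential smoothing with an assumed smoothing factor alpha.
exp_smooth <- function(x, alpha = 0.3) {
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}
ses <- exp_smooth(events)

# Side-by-side view; the centred filters return NA at both ends of the series.
print(data.frame(events, sma = as.numeric(sma), wma = as.numeric(wma), med, ses))
```
]]></preformat>
      <p>Packages such as zoo or TTR provide comparable rolling functions; base R is used here only to keep the sketch self-contained.</p>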
      <p>Correlation analysis methods were used to identify the nature of the dependencies between
variables, including the calculation of Pearson correlation coefficients, coefficients of determination
and correlation ratios, and the construction of correlation fields. These methods made it possible to establish the degree of
connection between the number of attacks and the change in the number of refugees. In addition,
hierarchical agglomerative cluster analysis was used to analyse structural changes in the data: it
groups periods with similar characteristics of attacks and migration dynamics and thereby helps to identify
phase transitions in the migration behaviour of the population. Thus, the selected methods and tools
provided a reliable basis for a comprehensive analysis of complex nonlinear dependencies in time
series, allowing the identification of hidden relationships between military events and the behaviour
of the civilian population in conditions of armed conflict.</p>
      <p>R was chosen for this work as the language best suited to statistical data analysis. R is an
open-source programming language used for statistical computing and graphics (visualization). It is used
to solve complex problems of mathematical statistics,
perform primary data analysis, and perform mathematical modelling. With R, data can be prepared
for research and experimental results can be processed in many fields, such as medicine, nature
management, environmental protection, econometrics and financial analysis, marketing, and
engineering calculations. R not only supports a wide range of statistical and numerical methods
but can also be extended with packages, that is, libraries for specific functions or special areas of
application. The first versions of R were created in the 1990s. Since then, it has been constantly
evolving: new packages and features are added, existing ones are improved, and bugs are fixed. Thanks to an
active community of users and developers, R remains relevant and is constantly being updated. The
language has its own syntax and runtime environment, and it is actively used
in artificial intelligence and machine learning. At first glance, the language may seem
quite complex, but in fact it is logical and straightforward. R was created by developers for
scientists who have experience and knowledge in the field of mathematical analysis, statistical methods,
and probability. It has a number of advantages:
<list list-type="bullet">
<list-item><p>code can be run without compilation, because R uses an interpreter that executes the program interactively;</p></list-item>
<list-item><p>R is efficient and productive due to its vectorized approach.</p></list-item>
</list></p>
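      <p>The R programming language is used to work with data:
<list list-type="bullet">
<list-item><p>collecting and analysing data from various sources;</p></list-item>
<list-item><p>searching for patterns and deviations;</p></list-item>
<list-item><p>testing and validating hypotheses;</p></list-item>
<list-item><p>visualizing data in multiple ways;</p></list-item>
<list-item><p>working with statistical data to identify anomalies.</p></list-item>
</list></p>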
      <p>Thus, R remains one of the most flexible and powerful programming languages designed
specifically for data analysis. It can handle many kinds of objects, such as vectors, matrices, lists
and data frames (tables), and many data types, for example floating-point numbers,
integers, character strings, date and time values, and logical values.</p>
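      <p>Returning to the correlation and cluster analysis steps named earlier in this section, a minimal base R sketch is given here. The two short series are hypothetical, and the Ward linkage and the choice of two clusters are assumptions made for the illustration; the study does not state which linkage or number of clusters was used.</p>
      <preformat><![CDATA[
```r
# Hypothetical monthly values; not the study's data.
attacks  <- c(20, 25, 30, 700, 650, 400, 300, 250)
refugees <- c(0.1, 0.1, 0.2, 4.5, 7.9, 6.5, 6.2, 6.0)  # millions of people

# Pearson correlation coefficient and the coefficient of determination.
r  <- cor(attacks, refugees, method = "pearson")
r2 <- r^2

# Agglomerative hierarchical clustering of the months.
# Features are standardised so that neither variable dominates the distance.
features <- scale(cbind(attacks, refugees))
tree     <- hclust(dist(features), method = "ward.D2")  # assumed linkage
phase    <- cutree(tree, k = 2)                         # assumed number of clusters

print(round(c(r = r, r2 = r2), 3))
print(phase)
```
]]></preformat>
      <p>Each value of phase labels a month with its cluster, so a change of label along the time axis marks a candidate phase transition.</p>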
      <p>Three datasets [17-18] were selected for analysis, all directly related to the topic of the
work. Dataset 1 covers the shelling of Ukrainian civilian infrastructure and includes information on the number and
scale of attacks. Dataset 2 covers the shelling of the political
infrastructure of Ukraine: attacks on administrative buildings, government
facilities and other political institutions. Dataset 3 is the number of Ukrainian refugees abroad, that is,
the number of Ukrainians who left the country due to the war. With these three datasets, we aim to show how
the shelling of Ukraine affects the number of Ukrainians leaving the country and
how attacks on civilian and political infrastructure are related to forced population
migration. Dataset 1 (Fig. 1a) contains 1,821 records of shelling of civilian infrastructure
in different regions of Ukraine from January 2018 to October 2024. The data are structured by
administrative units (regions and cities) with corresponding codes and the number of events.</p>
      <p>Dataset 2 (Fig. 1b) is larger (3,146 records) and covers the shelling of political infrastructure.
The number of events (Events) varies significantly between regions, from
single cases to 40 events in individual areas, which indicates an uneven distribution of attacks.
Dataset 3 (Fig. 1c) shows the dynamics of the number of Ukrainian refugees abroad from April 25,
2022, to March 12, 2024. Over this period the number of refugees increases from 85,000 to 5,982,920
people. The data are stored in CSV format and contain 470 rows after cleaning; condensed views of the data
values are given in Tables 1-3.</p>
      <p>The graphical representation of the data is given in Fig. 2-4 for the respective datasets. For dataset
1 (Fig. 2a) in the initial period (2018-2022), the data shows a relatively low and stable level of
incidents, the number of which fluctuates within 20-40 events per month. In 2022, there is a sharp
jump in the number of incidents to 700-800 incidents. It is the most intense period on the graph.
After the main surge (2022-2024), the number of incidents decreased but remained significantly
higher than before 2022, fluctuating between 200-400 events per month. Towards the end of the
graph, there is a sharp decrease in the number of incidents.</p>
      <p>[Figure caption fragment: … shelling and (c) dynamics of the number of Ukrainian refugees abroad since the beginning of the full-scale
war, in the Cartesian and polar coordinate systems.]</p>
      <p>According to the data analysis for dataset 2 (Fig. 2b), the initial period (2018-2022) has a relatively
stable level of events. The indicators fluctuate within 1000-1500 cases per month, and some seasonal
fluctuations are observed. In 2020-2021 there was a noticeable decrease in the number of events,
to approximately 500 cases per month, and the level stayed relatively stable and low during this
period. In 2022 there was a dramatic increase in the number of events: peak values reach about 5000 cases per month,
the most intense period in the entire observation window. In 2022-2024 the indicators stabilize at a high level,
fluctuating within 4000-4500 cases per month with noticeable periodic fluctuations in intensity.
Towards the end of the graph, a sharp decrease in the number of incidents is observed.</p>
      <p>According to the data analysis for dataset 3 (Fig. 2c), the initial period (early 2022) shows a sharp
increase in the number of refugees from almost zero to about 4.5 million in a very short period, with the
most rapid growth in the first weeks. The peak value of about 8 million people is reached
in mid-2022, followed by relative stabilization at a high level. Two noticeable “step” declines occur in
2022-2023: the first to about 6.5 million and the second to about 6 million. In 2023-2024 the level is relatively
stable at about 6 million people, with minor fluctuations in this range and a
tendency to level off slowly. Descriptive statistics (quantitative characteristics of the data) for the
datasets are presented in Fig. 3.</p>
      <p>Dataset 1 (Fig. 3) describes events involving civilian targets. On average, there were 5.1 events
with 4.8 casualties. The data have significant variability (coefficient of variation 169% for events and
524% for casualties). There is a strong right-sided skew (5.0 for events and 21.6 for casualties),
indicating the presence of extreme values. The maximum number of events in a single case is 139,
and the maximum number of casualties is 774. In total, 1821 observations were recorded, with 9293
events and 8696 casualties.</p>
      <p>Dataset 2 (Fig. 3) describes events of political violence. The average number of events is much
higher: 54.4, with 38.4 victims. Variability is also high (coefficient of variation 183% for events
and 480% for victims). The right-sided asymmetry is less pronounced (3.0 for events and 11.0 for
victims). The maximum values are significantly higher: 757 events and 3757 victims. In total, 3146
observations cover 171148 events and 120782 victims. Dataset 3 (Fig. 3) is of a different nature,
as it describes the number of refugees and has entirely different columns. The average value
is about 6.17 million, with relatively low variability (coefficient of variation 19.3%). The negative
asymmetry (-2.5) indicates a left-skewed distribution. The values range from 85,000 to 7.9 million
over a total of 470 observations.</p>
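      <p>The descriptive indicators discussed above (mean, coefficient of variation, asymmetry, extremes and totals) can be computed along the following lines. This is a sketch under stated assumptions: the helper describe is hypothetical, the sample is invented, and the moment-based skewness formula is one of several common definitions, so it need not match the estimator used for Fig. 3.</p>
      <preformat><![CDATA[
```r
# Hypothetical helper: basic descriptive statistics for a numeric vector.
describe <- function(x) {
  m <- mean(x)
  s <- sd(x)
  # Moment-based skewness: third central moment over the 1.5 power of the second.
  skew <- mean((x - m)^3) / (mean((x - m)^2))^1.5
  c(n = length(x), mean = m, sd = s,
    cv_percent = 100 * s / m,   # coefficient of variation in percent
    skewness = skew,
    min = min(x), max = max(x), total = sum(x))
}

# Invented event counts with one extreme value, giving a right-skewed sample.
events <- c(1, 2, 1, 3, 2, 1, 40)
stats_events <- describe(events)
print(round(stats_events, 2))
```
]]></preformat>
      <p>A positive skewness value corresponds to the right-sided asymmetry reported for datasets 1 and 2, and a negative value to the left-sided asymmetry reported for dataset 3.</p>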
    </sec>
    <sec id="sec-4">
      <title>4. Experiments, results and discussion</title>
      <sec id="sec-4-1">
        <title>4.1. Data pre-processing and presentation of results</title>
        <p>The histogram in Fig. 4a for dataset 1 shows a sharply asymmetric distribution with a maximum
of about 1500 cases at the beginning of the scale. Most events are concentrated in the range of 0-50
attacks. There is a sharp decrease in frequency as the number of events increases. Such a distribution
may indicate that single attacks or small series of attacks occur most often. It has a pronounced right-sided
asymmetry. The shape of the distribution resembles an exponential or Poisson distribution. A sharp
decrease in frequency with an increase in the number of events is characteristic of an exponential
distribution law. Since the data are discrete and represent the number of events, the Poisson
distribution may be the most suitable approximation. The histogram in Fig. 4b for dataset 2 also
shows an asymmetric distribution but with a higher peak (over 2000 cases). The distribution is more
stretched along the X-axis (up to 800 events). The frequency of events gradually decreases with an
increase in their number. It indicates more intense attacks on political infrastructure than on civilian
infrastructure. It also exhibits right-sided asymmetry. As in the first case, the shape corresponds to
an exponential distribution. It can be approximated by a gamma distribution, which is more flexible
and can better account for the "heavy tail" of the distribution. Again, given the discreteness of the
data, a Poisson distribution may be appropriate.</p>
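        <p>A quick way to judge whether a Poisson model is plausible is to compare the variance with the mean, since the two are equal for a Poisson distribution. The sketch below applies this check to the summary values reported in Section 3 (mean events of 5.1 and 54.4, coefficients of variation of 169% and 183%). The helper name dispersion_index is hypothetical, and the check is a rough diagnostic rather than a formal goodness-of-fit test.</p>
        <preformat><![CDATA[
```r
# Variance-to-mean ratio recovered from a mean and a coefficient of variation.
# For a Poisson distribution this ratio equals 1.
dispersion_index <- function(mean_value, cv) {
  sd_value <- cv * mean_value   # CV = sd / mean
  sd_value^2 / mean_value
}

# Summary values for events as reported in Section 3.
d1 <- dispersion_index(mean_value = 5.1,  cv = 1.69)  # dataset 1, civilian targets
d2 <- dispersion_index(mean_value = 54.4, cv = 1.83)  # dataset 2, political targets

print(round(c(dataset1 = d1, dataset2 = d2), 1))
```
]]></preformat>
        <p>Both ratios come out well above 1, which points to overdispersion; in that case a negative binomial distribution may be worth considering alongside the Poisson and gamma candidates discussed above.</p>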
        <p>The histogram in Fig. 4c for dataset 3 has a fundamentally different distribution pattern - close to
normal. The peak of the distribution falls in the range of about 5-6 million refugees. The distribution
is more symmetrical compared to the previous histograms. There are a small number of cases with
a small number of refugees (about 0-2.5 million). The bulk of the data is concentrated in the range of
5-7.5 million refugees. There is some asymmetry, but much less than in the previous cases. It can be
approximated by a normal distribution or, to better account for the asymmetry, by a lognormal
distribution. The gamma distribution can also be considered as an alternative, since it works well with
data that have a slight asymmetry.</p>
        <p>For dataset 1, most events (shellings) are concentrated at the beginning of the scale (0-50), with a very
pronounced peak of about 1500 events. The cumulative curve increases rapidly
and reaches a plateau, indicating that most events occur in the first intervals (Fig. 5a-b). After 50
events, only isolated cases are observed.</p>
        <p>Dataset 2 has a distribution similar to that of the first dataset, but with a higher peak (about 2000 events). It also has an
intense concentration of events at the beginning of the scale. The cumulative curve shows a similar
dynamic of rapid growth with subsequent plateauing (Fig. 5c-d). The distribution is more stretched
along the scale (up to 800 events).</p>
        <p>Dataset 3 differs significantly from the first two in the nature of the
distribution (Fig. 5e-f). It is close to a normal distribution with a peak at about 5-6 million people. The
cumulative curve is S-shaped, which is typical of a normal distribution. The bulk of the
data is concentrated in the range of 4-7 million. There is a small number of observations at the
beginning of the scale (0-2 million).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Time series trend detection using smoothing methods</title>
        <p>Smoothing methods are used to reduce the influence of the random component (random
fluctuations) in time series. They make it possible to obtain cleaner values that consist
only of deterministic components. Some of the methods aim to isolate a single
component, such as the trend. Smoothing methods can be conditionally divided into two classes, which
are based on different approaches: the analytical approach and the algorithmic approach. The analytical
approach is based on the selection of a mathematical function (for example, an exponential,
polynomial or hyperbola) that best fits the data trend, determined visually. Then, the parameters of
this function are estimated using mathematical or statistical methods, which form a model to describe
the time series. The algorithmic approach focuses on calculating new values of the series using
algorithms such as the moving average method, weighted average method, exponential smoothing
method, and median smoothing method.</p>
        <p>From 2018 to early 2022, the data (Fig. 6a) show relatively stable low-level activity, where the
moving average (red line) closely follows the actual data points (blue dots). There is a sharp spike in
early 2022, after which the level remains elevated but gradually stabilizes. The moving average
smooths out the volatile spikes in the data while preserving the overall trend. The trend is clearly
nonlinear, especially after 2022. The second graph (Fig. 6b) shows more stable activity in 2018-2020 with
approximately 1000 events and a small decline in 2020-2021. As in the first graph, there is a
sharp increase in 2022. The trend is highly nonlinear, with several clear phases. Given the nonlinear
nature of both data sets, a weighted moving average would be more appropriate than a simple
moving average. A simple moving average tends to lag behind significant changes in the data,
especially during sharp increases or decreases, so it may not accurately
reflect the actual dynamics of the processes during periods of rapid change.</p>
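        <p>The lag of the simple moving average can be seen on a synthetic step series. The sketch below assumes trailing (one-sided) windows of width 3 and linearly decreasing weights (3, 2, 1)/6 for the weighted variant; these choices, like the series itself, are illustrative and are not taken from the study.</p>
        <preformat><![CDATA[
```r
# Synthetic series with an abrupt jump from 10 to 100 at position 6.
x <- c(rep(10, 5), rep(100, 5))

# Trailing simple moving average: equal weights on the current and two past values.
sma <- stats::filter(x, rep(1 / 3, 3), sides = 1)

# Trailing weighted moving average: with sides = 1 the first coefficient
# applies to the current value, so the newest observation gets weight 3/6.
wma <- stats::filter(x, c(3, 2, 1) / 6, sides = 1)

# Compare both averages around the jump (positions 5 to 8).
print(cbind(x, sma, wma)[5:8, ])
# At position 6: sma = 40, wma = 55
# At position 7: sma = 70, wma = 85
```
]]></preformat>
        <p>Because the most recent observation carries the largest weight, the weighted average moves towards the new level faster than the simple average, which is the behaviour described above.</p>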
        <p>Graph 1 in Fig. 7a (Civilian-targeted events) illustrates the weighted moving average (grey line),
which better reflects the dynamics of changes compared to the simple moving average. In the period
2018-2021, the trend line reacts more sensitively to fluctuations in the data. After a sharp jump in
2022, the weighted average adapts more quickly to the new level of activity. The smoothing is less
aggressive, which allows us to better track fundamental changes in the data. At the end of the period
(2023-2024), the trend towards stabilization at the new level is more clearly visible. Graph 2 in Fig. 7b
(dataset 2) illustrates that until 2022 the weighted moving average more accurately reflects fluctuations
in activity around the level of 1,000 events. The gradual decrease in activity in 2020-2021 is more
noticeable. After the jump in 2022, the method better reflects the actual dynamics of growth.
There is less lag behind the actual data during sharp changes. The formation of a new stable level in
2023-2024 is more clearly visible.</p>
        <p>The least aggressive smoothing (w=3) is shown in Fig. 8a-b for datasets 1-2. It better reflects
short-term fluctuations, is more responsive to outliers, gives more detail in the process dynamics, and
lags less behind the real data.</p>
        <p>Stronger smoothing than with w=3 is shown in Figure 8c-d. It filters out random
fluctuations better and shows medium-term trends more clearly, but it lags more behind the real data
and is less sensitive to local extremes. The most aggressive smoothing is shown in Figure 8e-f. It is best
at detecting long-term trends and significantly reduces the impact of outliers, at the cost of the largest lag
behind the real data. It is best suited for detecting a general trend. Comparative analysis of methods:
<list list-type="order">
<list-item><p>For events targeting civilians:</p>
<list list-type="bullet">
<list-item><p>all methods clearly show a sharp jump in 2022;</p></list-item>
<list-item><p>non-linear smoothing (w=7) best shows stabilization after the jump;</p></list-item>
<list-item><p>w=3 is better suited for current monitoring, and w=7 for trend analysis.</p></list-item>
</list></list-item>
<list-item><p>For events targeting political targets:</p>
<list list-type="bullet">
<list-item><p>all methods reflect the overall dynamics well;</p></list-item>
<list-item><p>non-linear smoothing best shows the transition between different modes of activity;</p></list-item>
<list-item><p>larger values of w are better suited for analysing long-term changes.</p></list-item>
</list></list-item>
</list></p>
        <p>Therefore, linear smoothing with w = 3 is best suited for operational monitoring, w = 5 for
medium-term analysis, and nonlinear smoothing with w = 7 for identifying long-term trends.</p>
        <p>Fig. 9a illustrates the very low level of events (about 50) from 2018 to early 2022, a sharp peak of
about 700 events in 2022, a further decrease and stabilization at 200-300 events during 2023-2024,
and a sharp drop in early 2025. Fig. 9b shows a relatively stable period from 2018 to early 2022 with rates of about
1000-1500 events, a sharp increase in 2022 to about 4000-5000 events,
a sustained high level (about 4000 events) throughout 2023-2024, and a sharp drop in early 2025. Both
graphs show a dramatic change in the situation starting in 2022, coinciding with the start of russia’s
full-scale invasion of Ukraine. It is noticeable that the number of events directed at political targets
significantly exceeds the number of events directed at civilians. Median filtering (shown by the
orange line) helps to smooth out short-term fluctuations and identify significant trends in the data.</p>
        <p>The initial data in Fig. 10 shows a relatively low and stable level of both events and fatalities from
2018 to 2021, followed by a sharp spike in 2022 when over 2,000 incidents occurred. In normalized
data (on a scale of 0-1), both metrics show almost zero activity until 2022. The spike in 2022 reaches
the maximum normalized value (1.0) for both events and fatalities. After 2022, the number of events
(orange line) remains at a higher normalized level (around 0.3-0.5) compared to the number of
fatalities (red line), which decreases to around 0.1. It suggests that although attacks continue, they
have become relatively less lethal.</p>
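        <p>Normalization to the 0-1 scale is commonly done by min-max rescaling, which is assumed here since the text does not name the exact formula. The helper min_max and the yearly values are hypothetical and serve only to show how two series with different magnitudes become comparable on one axis.</p>
        <preformat><![CDATA[
```r
# Hypothetical helper: rescale a numeric vector to the interval [0, 1].
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))  # constant series: avoid division by zero
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Invented yearly totals on very different scales.
events     <- c(20, 30, 2000, 900, 700)
fatalities <- c(5, 8, 1500, 150, 120)

normalized <- data.frame(
  events_norm     = min_max(events),
  fatalities_norm = min_max(fatalities)
)
print(round(normalized, 3))
```
]]></preformat>
        <p>After rescaling, the largest value of each series maps to 1 and the smallest to 0, so a later gap between the two normalized lines reflects a change in their relative levels rather than in their units.</p>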
        <p>Consistently low levels of political events (blue line) in 2018-2021 are shown in Fig. 11. Both
indicators increase sharply from 2022, and fatalities (green line) follow a more erratic pattern with extreme spikes. In
the normalized data, events (orange line) show higher, more consistent normalized values (0.75-1.0) after
2022, while fatalities (red line) show more variation but generally lower normalized values. The
relationship between events and fatalities is less intense than in the infrastructure dataset.</p>
        <p>A high correlation coefficient (≥ 0.7) indicates that the smoothed series well preserves the general
trend of the original series (Table 4). At the same time, smoothing removes local fluctuations (noise)
but does not destroy the structure of the data. Turning points are local maxima and minima in the
series. A significant reduction in the number of turning points in the smoothed series indicates that
smoothing effectively eliminates short-term fluctuations (noise).</p>
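        <p>The turning-point count described above can be sketched as follows. This is a minimal Python illustration (the study itself works in R), assuming that a turning point is a strict local maximum or minimum and that smoothing is a plain centered moving average. The helper names and the sample values are hypothetical and are not the study's data.</p>

```python
def count_turning_points(series):
    """Count strict local maxima and minima in a sequence of numbers."""
    count = 0
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        is_peak = cur > prev and cur > nxt
        is_trough = prev > cur and nxt > cur
        if is_peak or is_trough:
            count += 1
    return count


def moving_average(series, window):
    """Centered simple moving average; the output is shorter by window - 1 points."""
    half = window // 2
    return [
        sum(series[i - half:i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]


# Hypothetical monthly counts, for illustration only.
raw = [3, 7, 4, 9, 5, 12, 8, 15, 11, 30, 22, 26, 19, 24, 18]
smoothed = moving_average(raw, 5)

raw_tp = count_turning_points(raw)
smoothed_tp = count_turning_points(smoothed)
print(raw_tp, smoothed_tp)
```

        <p>On this toy series every interior point of the raw data is a turning point, while the smoothed series keeps only one, which is the kind of reduction the comparison of turning points is meant to capture.</p>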
        <p>Accordingly, the original series has more "noise" or "chaotic changes", which can often be
insignificant for analysis. A decrease in the number of turning points is a sign that the smoothed
series shows the primary trend but with less detail. Fig. 12 shows the results of smoothing using the
Kendall formulas. The data show a sharp peak of activity around point 55 on the time axis, reaching
approximately 700 attacks. After the peak, there is a stabilization at around 200-250 attacks. Method
B provides a smoother visualization of the trend (Fig. 16). Both methods show a similar overall
picture, but Method B better reflects long-term trends.</p>
        <p>There is a significant increase in the number of attacks starting from point 50 on the time axis.
The peak value reaches about 5000 attacks. Method B (sequential smoothing) shows a smoother
curve compared to method A (Fig. 17). Larger window sizes (w11-w15) give a smoother result but
may lose important local features of the data. At the end of the period, there is a sharp decline in
activity. The graph in Fig. 18 shows a rapid increase in the number of refugees at the beginning of
the period (up to point 100). The maximum value reaches about 8 million people. Two noticeable
declines are observed (around points 200 and 300). Both smoothing methods give very similar results,
which indicates relatively “clean” initial data. At the end of the period, there is a stabilization at the
level of about 6 million people.</p>
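        <p>The difference between the two smoothing variants can be illustrated with a short Python sketch. It assumes, since the definitions are not restated in this section, that Method A applies every window directly to the original series and that Method B applies the windows one after another. The edge handling (a shrinking window) is also an assumption and does not reproduce the Kendall end-point formulas. All names and values are hypothetical.</p>

```python
def moving_average(series, window):
    """Centered moving average; edge points use a shrunken window so length is kept."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


WINDOWS = [3, 5, 7, 9, 11, 13, 15]


def method_a(series, windows=WINDOWS):
    """Assumed Method A: each window is applied directly to the original series."""
    return {w: moving_average(series, w) for w in windows}


def method_b(series, windows=WINDOWS):
    """Assumed Method B: windows are applied in sequence, each to the previous result."""
    results, current = {}, list(series)
    for w in windows:
        current = moving_average(current, w)
        results[w] = current
    return results


# Hypothetical series with one sharp peak, loosely shaped like the plots discussed.
attacks = [20] * 10 + [700] + [220] * 10
a = method_a(attacks)
b = method_b(attacks)
print(round(max(a[3]), 1), round(max(b[5]), 1))
```

        <p>Under these assumptions both methods coincide for the first window (w3) and diverge afterwards, because Method B keeps averaging an already smoothed curve.</p>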
        <p>In Fig. 19, for dataset 1, a strong positive correlation is observed between all smoothing windows
(all values &gt; 0.82). The strongest correlation is observed between neighbouring smoothing windows
(for example, Window_5 and Window_7 correlate at 0.9884). The correlation gradually decreases with
increasing differences in the size of the smoothing windows. The original series has the strongest
correlation with smaller smoothing windows (Window_3: 0.9432) and the weakest with larger ones
(Window_15: 0.8205). There is a very high positive correlation between all smoothing windows (all
values &gt; 0.94) in Fig. 19 for dataset 2. Correlation values are generally higher than for civil events.
There is also a trend towards a stronger correlation between neighbouring windows. The original
series has a consistently high correlation with all smoothing windows (from 0.9416 to 0.9872). There
is an extremely high positive correlation between all smoothing windows (all values &gt; 0.99) in Fig.
19 for dataset 3, the highest correlation values among all three matrices. There is practically no
difference between the correlations of neighbouring and distant smoothing windows. The original
series has a very high correlation with all smoothing windows (all values &gt; 0.99).</p>
        <p>In the diagram in Fig. 20a, the initial number of points is smaller for dataset 1 (about 23-24).
Method A shows unstable behaviour with local peaks and troughs. Method B shows a constant
decrease in the number of points. The most significant difference between the methods is observed
at medium window sizes (9-11). At the maximum window size (15), both methods show the smallest
number of turning points. In the diagram in Fig. 20b, the highest number of turning points is observed
for dataset 2 (about 30) at the smallest window size (3). Method A shows a smoother decrease in the
number of points and stabilizes at about 15-20 points. Method B shows a sharp drop at the beginning
and stabilizes at about 4 points. Both methods show a tendency to decrease the number of turning
points with increasing window size. At large window sizes (11-15), the difference between the
methods becomes more pronounced.</p>
        <p>Dataset 3 has the highest initial number of turning points (more than 50) among all three plots in
Fig. 20. Both methods show a similar downward trend. Method A retains more turning points at all
window sizes. After window size 11, both methods show relative stability. The difference between
the methods remains almost constant at large window sizes.</p>
        <p>According to Fig. 21a, the correlation between civilian shelling and the number of refugees
according to the Kendall method shows a weak linear relationship. The absolute value of the correlation
coefficient is &lt; 0.5, and the coefficient of determination is less than 25% (Table 5).</p>
        <p>According to Fig. 21b, the correlation between the shelling of political targets and the number of
refugees, according to the Kendall method, shows a linear relationship of medium strength. The
absolute value of the correlation coefficient is less than 0.7 but more than 0.5, and the coefficient of
determination is less than 50% but more than 25%. According to Fig. 22a, the correlation between the
fatal cases caused by the shelling of civilian targets and the number of refugees, according to the
Kendall method, shows a weak linear relationship. The absolute value of the correlation coefficient is &lt; 0.5,
and the coefficient of determination is less than 25%. According to Fig. 22b, the correlation between
the fatal cases caused by the shelling of political targets and the number of refugees, according to
the Kendall method, also shows a weak linear relationship. The absolute value of the correlation coefficient is &lt;
0.5, and the coefficient of determination is less than 25%.</p>
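        <p>The link between the two quantities quoted above is that, for a simple linear relationship, the coefficient of determination is the square of the correlation coefficient, so an absolute value below 0.5 gives at most 25%, and one below 0.7 gives at most 49%. A minimal Python sketch follows (the study uses R). The strength labels reuse the 0.5 and 0.7 thresholds from the text, while the function names and the sample values are hypothetical.</p>

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strength(r):
    """Label the absolute value of r using the thresholds 0.5 and 0.7."""
    magnitude = abs(r)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.5:
        return "medium"
    return "weak"


# Hypothetical values, for illustration only.
attacks = [1, 2, 3, 4, 5]
refugees = [2, 1, 4, 3, 5]

r = pearson_r(attacks, refugees)
r_squared = r ** 2  # coefficient of determination for a simple linear fit
print(round(r, 3), round(r_squared, 3), strength(r))
```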
        <p>The correlation ratio is 0.581, indicating a moderate relationship between the variables (Fig. 23a).
The points are scattered around the midline. The group variance
(89068252167.606) is significantly smaller than the total variance (153339421323248.07), confirming
the presence of a moderate relationship. Most events are concentrated in the range of 5-7 events,
which may indicate a specific pattern in the frequency of shelling (Table 6).</p>
        <p>The high correlation ratio of 0.935 indicates a very strong relationship between the variables (Fig.
23b). The points are located more densely relative to the mean line. The group variance
(143336071722265.27) is close to the total (153339421323248.07), which confirms the strong
relationship. There is a clear trend of an increase in the number of refugees with an increase in the
number of attacks on political targets.</p>
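        <p>The comparison of group variance with total variance can be sketched in Python as follows (the study uses R). The sketch assumes that the correlation ratio is the share of between-group variation in total variation. Whether the authors report this share or its square root is not stated here, although the value 0.935 agrees with the plain ratio of the two variances quoted for Fig. 23b. The grouping and the sample values are hypothetical.</p>

```python
def correlation_ratio_squared(groups):
    """Between-group sum of squares divided by the total sum of squares."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical refugee counts grouped by binned attack counts (illustration only).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
eta_sq = correlation_ratio_squared(groups)

# Ratio of the group variance to the total variance quoted for Fig. 23b.
quoted_ratio = 143336071722265.27 / 153339421323248.07
print(round(eta_sq, 3), round(quoted_ratio, 3))
```

        <p>A value close to 1 means that almost all variation in the number of refugees lies between the groups rather than within them.</p>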
        <p>The correlation ratio of 0.791 indicates a strong relationship (Fig. 24a). The points have a
noticeable spread but retain the general trend. The group variance (1212784734739.6) is significantly
smaller than the total, which indicates the presence of other influencing factors. The main
concentration of events is observed in the range of 4-8 fatal cases.</p>
        <p>The very high correlation ratio of 0.976 indicates an almost functional relationship (Fig. 24b). The
points lie closer to the midline than in the other graphs. The group variance
(149724110730.35) is nearly equal to the total, which confirms a very strong relationship. A direct
relationship between the number of fatalities and the number of refugees is clearly visible.</p>
        <p>There is a very strong positive autocorrelation (0.969 at lag 1), which gradually decreases with
increasing lag (Fig. 25-26). Even at lag 10, the autocorrelation remains noticeable (0.628). It indicates
a stable trend and inertia of the migration process - the number of refugees in the next period
strongly depends on the previous period. The smooth decrease in autocorrelation indicates a
relatively stable nature of migration processes.</p>
        <p>Fig. 27 shows the autocorrelation of events (attacks) and casualties separately. The
autocorrelation of events decreases more slowly (from 0.956 to 0.527 at lag 10). The autocorrelation
of the number of casualties decreases much faster (from 0.863 to 0.16). It means that the attacks
themselves are more systematic, while the number of casualties is more random and less predictable.</p>
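        <p>The correlograms discussed here rest on the sample autocorrelation function. A minimal Python sketch is given below (the study uses R), assuming the usual estimator in which the lagged covariance is divided by the total sum of squared deviations. The function name and the sample series are hypothetical.</p>

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(
            (series[t] - mean) * (series[t + lag] - mean)
            for t in range(n - lag)
        )
        acf.append(num / denom)
    return acf


# Hypothetical steadily growing series: strong inertia, slowly decaying correlogram.
trend = list(range(1, 41))
acf = autocorrelation(trend, 10)
print([round(v, 3) for v in acf])
```

        <p>A trending series such as this one keeps high autocorrelation at long lags, which is the pattern reported for the refugee data, whereas an erratic series loses it within a few lags.</p>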
        <p>The number of events (attacks) in Fig. 28 shows a rapid decrease in autocorrelation - from 1.0 to
0.12 over 10 months, which indicates a somewhat chaotic and less systematic nature of the attacks
over time. The sharp drop is especially noticeable after the 5th lag (month). In contrast, the
autocorrelation of the number of victims decreases more slowly - from 1.0 to 0.465, maintaining
higher values throughout the period. It may indicate that although the attacks themselves become
less predictable, their lethality retains a certain systematicity and dependence on previous periods.
Such a pattern may indicate a change in attack tactics - from regular, systematic attacks to more
sporadic (irregular), but with similar effectiveness in terms of victims. Fig. 29 shows a generalized
plot of the results of smoothing using the Pollard formulas. These plots demonstrate the dynamics
of attacks on political infrastructure using two smoothing methods. Both cases share the following features.</p>
        <p>The initial period has a relatively stable event rate (around 1000-1500 events), followed by a sharp increase
after the 50th period to a peak of around 4000-5000 events. Method B shows smoother transitions between
periods, especially in the area of sharp increase. Different window sizes (w3-w15) affect the degree
of smoothing, with larger windows giving a smoother curve. The graphs in Fig. 30 show a low level
of events at the beginning (around 20-30 cases), a sharp peak of activity around the 50th period (up
to 700 cases), and further stabilization at the level of 200-300 cases. Method B provides a smoother
representation of the data, especially in the peak area. Larger window sizes (w11-w15) significantly
smooth out the peak values. The graphs in Fig. 31 show a rapid increase in the number of refugees
at the beginning (up to 8 million), two sharp declines (around the 180th and 300th periods), and
stabilization after the 300th period at the level of about 6 million. Both methods give almost identical
results for this data set. The window size has a minimal effect on the shape of the curve, indicating
greater stability of the data.</p>
        <p>There is a robust positive correlation between all smoothing windows (coefficients from 0.9137
to 0.9999). The strongest correlation is observed between neighbouring smoothing windows (Fig. 32,
dataset 1). The original series has the strongest correlation with smaller smoothing windows
(Window_3, Window_5) and somewhat weaker with larger windows. As the smoothing window size
increases, the correlation with the original series gradually decreases (from 0.9696 to 0.8951).</p>
        <p>We see an interesting difference from the previous correlogram (Fig. 41) - here, "Victims" (red
colour) have a stronger autocorrelation than "Events" (green colour). "Victims" starts with a
high autocorrelation (1.0). It slowly decreases to 0.46 at lag 10. It maintains relatively high values
even at considerable lags. "Events" also starts with a high autocorrelation (1.0). It decreases much
faster to 0.133 at lag 10. After lag 5, the autocorrelation becomes relatively weak (&lt;0.4). Such a
structure may indicate that in attacks on political infrastructure, the number of victims is more
predictable and systematic than the events themselves. This may be due to the fact that political
objects usually have a certain number of permanent personnel, so the number of potential victims is
more stable. Instead, the events themselves (the shelling) may occur more chaotically and less
predictably.</p>
        <p>The overall trend in Figure 42a shows a relatively low level until period 40. There is a sharp peak
of activity around period 60. Smoothing with different alpha values helps to visualize the overall
trend better. As in the first graph, smaller alpha values give a smoother curve.</p>
        <p>The data in Fig. 42b show two main periods of intensity: the first at about
1000-1500 cases (up to the 40th period), and the second after a sharp increase to 4000-5000 cases (after
the 60th period). Smaller alpha values (0.1, 0.15) give a smoother curve, filtering out short-term
fluctuations. Larger alpha values (0.25, 0.3) better track sharp changes but retain more "noise" in the
data. The original data (pink line) shows significant volatility (variability). The graph in Fig. 43 shows
a rapid increase in the number of refugees at the beginning of the period (up to the 100th period).
After reaching the peak, a sharp decline is observed. Then, the curve stabilizes with a slight gradual
decrease. Different alpha values produce very similar smoothing results, indicating relatively "clean"
data with fewer random fluctuations.</p>
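        <p>The effect of the alpha parameter can be seen in a minimal Python sketch of simple exponential smoothing (the study uses R). It assumes the standard recursion in which each smoothed value is alpha times the new observation plus (1 - alpha) times the previous smoothed value, initialized with the first observation. The step-shaped series is hypothetical and only mimics the jump from about 1000 to about 4500 cases.</p>

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical step change in the number of events, for illustration only.
events = [1000] * 5 + [4500] * 5

smooth_low = exponential_smoothing(events, 0.1)   # smoother, slower to react
smooth_high = exponential_smoothing(events, 0.3)  # tracks the jump faster
print(round(smooth_low[5]), round(smooth_high[5]))
```

        <p>Right after the jump the curve for alpha = 0.3 has moved much further towards the new level than the curve for alpha = 0.1, which is the trade-off between smoothness and responsiveness described above.</p>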
        <p>There is a robust positive correlation between all smoothing levels (Alpha), with coefficients
ranging from 0.77 to 0.99 (Fig. 44, dataset 1). The strongest relationship is between adjacent
smoothing levels, and the weakest is between the original series and the smoothed data, which is the
expected result. The "Political Events" matrix (Fig. 44, dataset 2) shows extremely high correlation
values between all smoothing levels (0.86-0.99). The original series has a slightly stronger
relationship with the smoothed data compared to civil events, which may indicate a greater
regularity in political events. The "Refugee Data" matrix (Fig. 44, dataset 3) shows the highest
correlation values among all three matrices (0.95-0.99), including the relationship with the original
series. It indicates that the refugee data have the most stable and consistent dynamics, with fewer
random fluctuations. With exponential smoothing for dataset 1, the graph shows a similar trend as
the second graph but with a slightly smaller growth amplitude (from 19 to 32 points). There is a
noticeable plateau at alpha values of 0.15-0.20, which may indicate some stability in the pattern of
attacks on civilian infrastructure in this smoothing range.</p>
        <p>Exponential smoothing for dataset 2 shows a gradual increase in the number of turning points
from 18 to 35 as alpha increases, with the sharpest increase at alpha &gt; 0.25. It may indicate a more
complex structure and irregularity in the data on the shelling of political infrastructure at higher
values of the smoothing parameter. The graph for exponential smoothing for dataset 3 shows the
smoothest and most consistent increase in the number of turning points from 14 to almost 40,
without obvious plateaus. It may indicate more regular dynamics of the refugee movement process
and less abrupt changes compared to the shelling data. Fig. 46a shows a weak linear relationship.
The absolute value of the correlation coefficient is &lt; 0.5, and the coefficient of determination is less than 25%
(Table 9). Fig. 46b shows a linear relationship of medium strength. The absolute value of the correlation
coefficient is less than 0.7 but more than 0.5, and the coefficient of determination is less than 50% but
more than 25%.</p>
        <p>Fig. 47a shows a weak linear relationship. The absolute value of the correlation coefficient is &lt; 0.5, and
the coefficient of determination is less than 25%. Fig. 47b also shows a weak linear relationship. The
absolute value of the correlation coefficient is &lt; 0.5, and the coefficient of determination is less than 25%.
The correlation coefficient (Fig. 48a) is 0.815, indicating a strong positive relationship between the
variables. It suggests that there is a significant relationship between the shelling and the parameter
under study, where an increase in one indicator is accompanied by an increase in the other (Table
10). With a correlation ratio of 0.827 (Fig. 48b), there is a very strong positive relationship between the
shelling of civilian targets and the number of refugees. It demonstrates that the intensity of shelling
of civilian objects has a direct and significant impact on the increase in the number of refugees.</p>
        <p>The correlation coefficient of 0.586 (Figure 49a) shows a moderate positive relationship between
the number of attacks on political targets and the number of refugees. This relationship is less
pronounced compared to previous indicators but still indicates some dependence between the
variables. The correlation coefficient of 0.872 (Figure 49b) shows a very strong positive relationship
between the number of fatalities caused by attacks and the number of refugees. It indicates that,
among the factors studied, the increase in the number of victims has the most significant impact on
the rise in the number of refugees.</p>
        <p>Figure 51 shows a strong positive autocorrelation that gradually decreases with increasing lag. It
indicates a clear temporal dependence in the refugee data, where current values are strongly
correlated with previous periods, suggesting persistent trends in migration processes.</p>
        <p>The graph in Fig. 52 shows a high initial autocorrelation for both metrics (casualties and events),
with a sharper decline for the casualties indicator. It suggests that while both indicators are
time-dependent, the number of casualties has a less stable dynamic compared to the number of shelling
events. Similar to the previous correlogram, the indicators in Fig. 53 show a high initial
autocorrelation with a gradual decline but with a minor difference between the event and casualties
metrics. It suggests more consistent dynamics between the number of attacks and their
consequences for the political infrastructure.</p>
        <p>The median smoothing data (Fig. 54) shows a sharp spike around time point 50, followed by
fluctuations at a higher level. Both smoothing methods are effective in reducing the extreme spike
while maintaining the overall pattern. The sequential smoothing in Method B creates somewhat
smoother transitions between periods of change, which can be helpful for analysing long-term trends
in attacks on civilian infrastructure. The original data in Fig. 55 show significant volatility with a
large spike around time point 60, followed by a sharp drop around time point 80. Method A and
Method B produce similar smoothing effects, but Method B (sequential smoothing) provides
somewhat more stable trends while maintaining the underlying patterns of the data. Both methods
are effective in reducing noise while preserving the key features of the trend: the initial lower level
of events, the sharp increase, and the final decrease.</p>
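        <p>The way median smoothing suppresses an isolated spike can be shown with a minimal Python sketch (the study uses R). It assumes a centered running median with a shrinking window at the edges. The edge rule, the function name and the sample values are assumptions made for illustration.</p>

```python
from statistics import median


def median_filter(series, window):
    """Centered running median; edge points use the part of the window that exists."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(median(series[lo:hi]))
    return out


# Hypothetical counts with one extreme spike, for illustration only.
attacks = [20, 22, 21, 700, 23, 25, 24]
filtered = median_filter(attacks, 3)
print(filtered)
```

        <p>Unlike a moving average, the median does not spread the spike over neighbouring points: the outlier is simply replaced by a typical nearby value, while the overall level of the series is kept.</p>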
        <p>Both methods in Fig. 56 show almost identical results for this dataset, probably because the
original data already has a relatively smooth trend. The data show:
<list list-type="bullet">
<list-item><p>a rapid initial increase in the number of refugees;</p></list-item>
<list-item><p>a plateau around the 200th time point;</p></list-item>
<list-item><p>a significant drop followed by stabilization;</p></list-item>
<list-item><p>a slight upward trend in the final period.</p></list-item>
</list></p>
        <p>The correlation matrix in Fig. 57 (dataset 1) shows a high correlation between the different
smoothing windows (all values above 0.92). The highest correlation is observed between
neighbouring window sizes, which is logical since they process the data similarly. The original data
has the lowest correlation with the largest smoothing windows (Window_13, Window_15),
indicating more smoothing and loss of detail as the window size increases.</p>
        <p>The correlation matrix in Fig. 57 (dataset 2) shows a similar pattern to the first table but with
slightly higher correlation coefficients (all values above 0.93). It suggests that median smoothing
produces more consistent results for political events compared to civil events.</p>
        <p>The correlation matrix in Fig. 57 (dataset 3) shows a perfect correlation (all values = 1) between
all smoothing windows. It indicates that the refugee data are very smooth, and different smoothing
window sizes have little effect on the shape of the trend. With median smoothing, the high
correlation with the original data is maintained, indicating that essential data characteristics are
preserved during smoothing.</p>
        <p>The diagram in Fig. 58a shows a similar pattern but with a greater difference between the
methods. Method A shows unstable behaviour with oscillations, while Method B consistently reduces
the number of turning points. Method B is significantly more effective in reducing the number of
turning points compared to Method A for all window sizes (Fig. 58b). Method B quickly stabilizes at
a low level. The plot in Fig. 58c shows the lowest number of turning points among all data sets.
Method B almost eliminates turning points after a window of size 5, while Method A retains a certain
number of turning points even at large window sizes.</p>
        <p>The graph (Fig. 59a) shows a negative correlation - with an increase in the number of attacks on
civilian targets, there is a tendency for the number of refugees to decrease. However, the data have
significant variability (as can be seen from the scatter of blue points), and the confidence interval
(grey zone) expands with an increase in the number of attacks, which indicates a lower reliability of
the forecast at higher values. The linear relationship is weak: the absolute value of the correlation coefficient is
&lt; 0.5, and the coefficient of determination is less than 25% (Table 11).</p>
        <p>In Fig. 59b, a positive correlation is observed - with an increase in the number of attacks on
political targets, the number of refugees also increases. The trend is more pronounced, although the
red line shows significant fluctuations. The confidence interval expands at the edges of the graph,
which indicates lower reliability of the forecast at extreme values. The linear relationship is of medium
strength: the absolute value of the correlation coefficient is less than 0.7 but more than 0.5, and the coefficient of
determination is less than 50% but more than 25%.</p>
        <p>The graph in Fig. 60a shows a negative correlation - with an increase in the number of deaths, a
decrease in the number of refugees is observed. Sharp fluctuations are especially noticeable at the
beginning of the graph and then smooth out. The confidence interval expands significantly with
an increase in the number of cases. The linear relationship is weak: the absolute value of the correlation coefficient is
&lt; 0.5, and the coefficient of determination is less than 25%.</p>
        <p>In Fig. 60b, a positive correlation is observed - the increase in the number of deaths correlates
with the rise in the number of refugees. The trend is relatively stable, although the red line shows
periodic fluctuations. The confidence interval remains relatively narrow in the middle part of the
graph, which indicates a greater reliability of the forecast in this range. The linear relationship is weak:
the absolute value of the correlation coefficient is &lt; 0.5, and the coefficient of determination is less than 25%.</p>
        <p>The correlation ratio, according to Fig. 61a and Table 12, is 0.803. This value indicates a strong
relationship between the variables. 80.3% of the variation in the dependent variable (number of
refugees) can be explained by the shelling of civilian targets. It indicates a significant impact of the
shelling of civilian targets on migration processes. The correlation ratio, according to Fig. 61b
and Table 12, is 0.753. The indicator demonstrates a strong connection between the shelling of
political targets and the number of refugees. The shelling of political targets explains 75.3% of the
variation in the number of refugees. This indicates a significant, although somewhat smaller
compared to the first case, impact of political shelling.</p>
        <p>The correlation ratio, according to Fig. 62a and Table 12, is 0.575. This value indicates a
moderate relationship between fatalities from the shelling of civilian targets and the number of
refugees. This factor can explain 57.5% of the variation. The relationship is less pronounced
compared to previous indicators.</p>
        <p>The correlation ratio, according to Fig. 62b and Table 12, is 0.844. The highest value of the
correlation ratio among all graphs indicates a powerful relationship between fatalities from the
shelling of political targets and the number of refugees. This factor can explain 84.4% of the variation
in the number of refugees. It indicates the most significant impact of this indicator on migration
processes. In Fig. 63, a strong positive correlation (0.793) is observed between total events
(Events_Mean_Smoothed) and the number of refugees (NoRefugees_Mean_Smoothed), indicating
that an increase in conflict events leads to a rise in the number of refugees. There is a robust positive
correlation (0.814) between civilian events (Civilian_Events_Mean_Smoothed) and civilian fatalities
(Civilian_Fatalities_Mean_Smoothed), which logically reflects a direct relationship between
incidents and their consequences. There is a moderate negative correlation (-0.428) between total
events and civilian events, which may indicate that not all conflict events are directly related to
civilians. There is a strong negative correlation (-0.731) between total events and civilian fatalities,
which may indicate that a significant proportion of events do not result in civilian casualties. It is
noteworthy that the number of refugees has a negative correlation (-0.748) with civilian casualties,
which may indicate that timely evacuation of the population (refugees) reduces the number of
civilian casualties.</p>
        <p>Overall, these correlations demonstrate a complex interdependence between different aspects of
a conflict situation, where an increase in the total number of events leads to an increase in the
number of refugees but not necessarily to the rise in civilian casualties, perhaps due to population
evacuation. In Fig. 64, a very strong positive autocorrelation is observed for small lags (0-3 days),
where the coefficients exceed 0.9, which indicates a high inertia of the process in the short term. It
means that the number of refugees on a given day is very strongly related to the number in the
previous 1-3 days. With an increase in the time lag (from 4 to 10 days), a gradual decrease in the
strength of the autocorrelation is observed - from 0.85 to 0.624, which indicates a weakening of the
connection between observations with a larger time gap. A smooth, almost linear decrease in
autocorrelation without sharp jumps indicates a stable nature of the migration process without
sudden changes in trends. Even with a lag of 10 days, the autocorrelation remains moderately high
(0.624), which indicates the presence of long-term trends in the migration process and the relative
predictability of the dynamics of the number of refugees.</p>
        <p>In general, this nature of the autocorrelation function is typical for mass migration processes. It
indicates that changes in the number of refugees occur gradually, without sharp fluctuations. The
current situation strongly depends on previous days, which is essential to consider when planning
humanitarian assistance and developing appropriate policies.</p>
        <p>The autocorrelation in Fig. 65 for both victims and events starts at a very high level (around 1.0)
and gradually decreases over the 10 months. The correlation remains significant throughout the
period, with events (blue) having a consistently higher autocorrelation than victims (red). It suggests
a systematic and persistent nature for both events and victims, with events showing more predictable
dynamics over time.</p>
        <p>In Fig. 66, the autocorrelation pattern is noticeably different. Although both metrics start at high
values, the correlation of events (blue) drops much faster and becomes very weak after 5 months. At
the same time, the number of victims (red) maintains a moderately strong correlation throughout
the period. It suggests that attacks on political infrastructure are more random or situational, while
the number of victims resulting from them maintains a more consistent pattern over time.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Hierarchical agglomerative cluster analysis of multidimensional data</title>
        <p>The most significant number of shellings was recorded in Kharkiv, Kherson and Donetsk regions
(Fig. 67). This can be seen from the values of “Amount” and “Maximum”, which are the highest for
these regions. Also, these regions have high average values (“Average”). Some regions, such as Volyn,
Zakarpattia and Rivne, have the minimum number of recorded shellings. It is reflected by zero or
minimum values in most columns. Significant deviations from the average (“Stand From”) indicate
an uneven distribution of shellings during the observation period. For example, Kyiv has a high
standard deviation, which means periods of intense bombardment alternating with periods of
relative calm. The “Median” and “Mode” indicators are often equal to 1, which indicates that, most
often, one shelling was recorded during a specific period. However, for some regions, such as
Kharkiv, Kherson, and Donetsk, these figures are higher, confirming the greater intensity of shelling
in these regions.</p>
        <p>[Table column headers: Kurtosis, Asymmetry, Range, Min, Max, Amount, Observations]</p>
        <p>Eastern and southern regions were most affected: Donetsk, Kharkiv, Zaporizhia, Sumy, Kherson,
and Luhansk regions experienced the highest number of shellings (Fig. 68). Uneven distribution of
shelling: The intensity of shelling fluctuated significantly, as evidenced by the high standard
deviation for many regions. Frequency of shelling: Most often, one shelling was recorded during the
observation period (Median and Mode often = 1).</p>
        <p>According to Fig. 69, the period of mass departure was April-September 2022, which saw the largest outflow and rapid growth
in the number of refugees, with high volatility of the data at the beginning. Stabilization: from October 2022 to
January 2023, the number of refugees remained relatively stable at a high level. Return: since
February 2023, there has been a trend towards the return of refugees to Ukraine, and the number is
gradually decreasing. The data were fairly reliable throughout the period. In summary, a mass
exodus of Ukrainians abroad characterized the first months of the war. Over time, the situation
stabilized, and since the beginning of 2023, there has been a trend towards return.</p>
        <p>The largest positive deviations from the average (more shelling), according to Table 70, are in the Kherson
(3.679), Kharkiv (1.953), and Donetsk (1.259) regions. It confirms that these regions experienced
significantly more shelling than the average for Ukraine. Close to the average are Dnipropetrovsk
(0.139), Kyiv (0.415), Zaporizhia (0.416), and Sumy (0.403). The largest negative deviations from the
average (less shelling) are in the Rivne (-0.637), Volyn (-0.637), and Zakarpattia (-0.637) regions. These
regions experienced significantly less shelling than the average.</p>
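        <p>Deviations of this kind are usually obtained by standardization. The following Python sketch (the study uses R) assumes that each value is the regional count minus the mean, divided by the standard deviation. Whether the authors used the population or the sample standard deviation is not stated, and the population form is assumed here. Region labels and counts are hypothetical.</p>

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical shelling counts per region, for illustration only.
counts = {"A": 0, "B": 0, "C": 10, "D": 30}
scores = dict(zip(counts, standardize(list(counts.values()))))

for region, z in scores.items():
    print(region, round(z, 3))
```

        <p>Regions with identical raw counts receive identical scores, which would explain why Rivne, Volyn and Zakarpattia share the same value (-0.637) in the text.</p>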
        <p>The most significant deviations from the average (more attacks on political infrastructure),
according to Table 71, are in the Donetsk (3.061), Sumy (1.782), Kharkiv (1.534), Zaporizhia (1.458), and Kherson
(1.344) regions. These regions experienced significantly more attacks on political infrastructure than
the average for Ukraine. Close to the average: Many areas have values close to zero, indicating that
the number of attacks on political infrastructure is close to the average for the country. The most
significant deviation from the average (fewer attacks on political infrastructure): Most western
regions, as well as some central ones, such as Chernihiv, have negative values, indicating that the
number of attacks on political infrastructure is lower than the average. The most significant outflows
(significantly above average), according to Table 72, are mainly in the first months after the start of
the full-scale invasion: April (-3.257), May (-2.684), September (0.824), October (1.029), November
(1.111), December (1.085) 2022. April and May stand out in particular with tremendous negative
values, indicating a sharp jump in the number of refugees immediately after the start of the war. The
positive values from September to December show that the number of refugees remained
significantly above average throughout the fall and early winter of 2022. Gradual stabilization and
decline (close to average or below): Since the beginning of 2023, the "Average" values have been
closer to 0, and since June 2023, they have been primarily negative, indicating a gradual return of
refugees and a decrease in their number abroad relative to the average for the entire period.</p>
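        <p>The deviations discussed for Tables 70-72 read like standardized values. The following is a
minimal sketch of such a standardization, assuming the tabulated numbers are z-scores of the form
(value - mean) / standard deviation. The region names and counts are hypothetical, the choice of the
population standard deviation is an assumption, and Python is used only for illustration (the study
itself used R).</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def standardize(counts):
    """Return z-scores (x - mean) / sd for a dict of name -> raw count.

    Assumption: the population standard deviation is used; a sample
    standard deviation would rescale every score by the same factor.
    """
    mu = mean(counts.values())
    sigma = pstdev(counts.values())
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return {name: (x - mu) / sigma for name, x in counts.items()}


# Hypothetical shelling counts per region (illustrative only, not the study's data).
example_counts = {"Region A": 120, "Region B": 40, "Region C": 0, "Region D": 0}
z_scores = standardize(example_counts)

for name, z in sorted(z_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {z:+.3f}")
```
]]></preformat>
        <p>In this sketch, regions with identical raw counts receive identical standardized values, which
would be consistent with the equal scores reported for Rivne, Volyn, and Zakarpattia in Table 70.</p>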
        <p>According to Fig. 73, where smaller values indicate greater similarity, the Kharkiv region is
most similar to Donetsk (3.44), Sumy (4.26), and Kherson (4.26). This confirms that these regions,
located in the east and south of Ukraine, experienced shelling of similar intensity and nature.
Kherson region: most similar to Kharkiv (4.26), Donetsk (6.68), and Mykolaiv (7.31). Again, these
regions are relatively close to each other and experienced intense shelling. Donetsk region: most
similar to Kharkiv (3.44) and Luhansk (6.51). These are neighbouring regions that were the epicentre
of hostilities. Sumy region: most similar to Kharkiv (4.26) and Chernihiv (3.83). Western regions
(Zakarpattia, Volyn, Rivne, Ternopil, Ivano-Frankivsk, Chernivtsi) show the highest values relative to
most of the eastern and southern regions. This means that the nature of shelling in the western
regions differed significantly from that in the east and south, which is expected given their
geographical location and the intensity of hostilities in other regions.</p>
        <p>According to Fig. 74, the Kharkiv region is most similar to Donetsk (3.46), Sumy (7.29), and
Luhansk (3.82). The similarity with Donetsk remains very high, which is expected since these regions
were on the front line. However, unlike for the shelling of civilian infrastructure, the similarity
with the Kherson region is lower here. Kherson region: most similar to Mykolaiv (3.00), Zaporizhia
(3.37), and Dnipropetrovsk (2.44). This shift towards the southern regions may indicate a different
nature of attacks on political infrastructure in this region. Donetsk region: most similar to Kharkiv
(3.46) and Luhansk (4.09). As in the previous analysis, similarities with neighbouring regions remain
high. Sumy region: most similar to Chernihiv (2.02) and Kharkiv (7.29). Western regions (Zakarpattia,
Volyn, Rivne, Ternopil, Ivano-Frankivsk, Chernivtsi) again show the highest values relative to most
of the eastern and southern regions. This confirms that the nature of attacks on political
infrastructure in the western regions differed significantly from that in the east and south.</p>
        <p>According to Fig. 75, the following periods can be distinguished. The first months after the
start of the full-scale invasion (April-June 2022): April and May 2022 show a relatively high
proximity value (9.45), which indicates a similarity of dynamics during this period (rapid growth in
the number of refugees). June 2022 shows less similarity with these months, which may indicate the
beginning of changes in dynamics. Summer-autumn 2022 (July-November): July, August, September,
October, and November 2022 show relatively low proximity values to each other, which indicates
similar dynamics during this period (relative stabilization and further growth). The similarity
between July and August (1.28), as well as between September, October, and November (values around
1-2), is especially noticeable. Winter 2022 - spring 2023 (December 2022 - May 2023): this period is
characterized by greater variability. December 2022 shows relatively low similarity with previous
months, which may indicate the beginning of a new phase. Starting from January 2023, the proximity
values between neighbouring months tend to increase, although with some fluctuations. Summer 2023 -
spring 2024 (June 2023 - March 2024): starting from June 2023, the proximity values decrease again,
which indicates the formation of a new trend, different from the previous one. July and August 2023
have the lowest value (0), which indicates identical dynamics.</p>
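        <p>The proximity values read from Figs. 73-75 can be illustrated with a small pairwise distance
computation. This is a minimal sketch assuming Euclidean distance between standardized feature
vectors. The metric, the object names, and the feature values are all assumptions made for
illustration and are not taken from the study.</p>
        <preformat><![CDATA[
```python
from math import dist


def distance_matrix(profiles):
    """Pairwise Euclidean distances between named feature vectors."""
    names = list(profiles)
    return {a: {b: dist(profiles[a], profiles[b]) for b in names} for a in names}


def nearest(matrix, name, k=2):
    """Return the k objects with the smallest distance to `name`."""
    others = [(d, other) for other, d in matrix[name].items() if other != name]
    return [other for d, other in sorted(others)[:k]]


# Hypothetical standardized profiles (illustrative only, not the study's data).
example_profiles = {
    "East 1": (3.0, 2.5),
    "East 2": (2.6, 2.9),
    "West 1": (-0.6, -0.5),
    "West 2": (-0.6, -0.5),
}
proximity = distance_matrix(example_profiles)

for name in example_profiles:
    print(name, "is closest to", nearest(proximity, name, k=1)[0])
```
]]></preformat>
        <p>A value of 0 between two objects means their profiles coincide, and large values between the
"East" and "West" objects mark dissimilar profiles, which is how the low and high entries in the
proximity matrices are interpreted in the text.</p>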
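        <p>We use the nearest-neighbour (single-linkage) strategy to perform agglomerative hierarchical
cluster analysis (Fig. 76). The distance between two groups is defined as the distance between the
two nearest elements of these groups. This strategy is monotonic and strongly contracts the feature
space; its parameters are <inline-formula><tex-math>\alpha_i = \alpha_j = 0.5</tex-math></inline-formula>,
<inline-formula><tex-math>\beta = 0</tex-math></inline-formula>,
<inline-formula><tex-math>\gamma = -0.5</tex-math></inline-formula>.</p>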
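        <p>The four parameters quoted for the nearest-neighbour strategy match the coefficients of the
Lance-Williams distance update, under which they reduce to taking the minimum of the two distances.
The sketch below assumes that this is the intended formula. The function name is hypothetical, and
the three input distances are the Kharkiv-Donetsk, Kharkiv-Kherson, and Donetsk-Kherson values quoted
for Fig. 73, used here only as sample numbers rather than as a merge that actually occurs in the
analysis.</p>
        <preformat><![CDATA[
```python
from math import isclose


def lance_williams(d_ki, d_kj, d_ij, alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=-0.5):
    """Distance from cluster k to the merged cluster (i u j).

    d(k, i u j) = alpha_i*d(k,i) + alpha_j*d(k,j) + beta*d(i,j)
                  + gamma*|d(k,i) - d(k,j)|
    The default coefficients are those of the nearest-neighbour strategy.
    """
    return (alpha_i * d_ki + alpha_j * d_kj + beta * d_ij
            + gamma * abs(d_ki - d_kj))


# Sample distances quoted in the text for Fig. 73 (merge chosen for illustration).
d_kharkiv_donetsk = 3.44
d_kharkiv_kherson = 4.26
d_donetsk_kherson = 6.68

updated = lance_williams(d_kharkiv_donetsk, d_kharkiv_kherson, d_donetsk_kherson)
print(round(updated, 2))
```
]]></preformat>
        <p>With these coefficients the update equals min(d(k,i), d(k,j)), which is why the distance between
two groups coincides with the distance between their two nearest elements.</p>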
        <p>First steps (1-10) for dataset 1 (Fig. 76a): mergers occur at relatively small metric values
(from 0.014 to 0.312). This means that at the beginning of the algorithm, regions with very similar
shelling patterns are merged. Middle steps (11-17): metric values begin to increase (from 0.361 to
1.553), so less similar clusters are merged. Last steps (18-25): a significant increase in metric
values is observed (from 1.652 to 7.289), which indicates the merging of large and relatively
heterogeneous clusters. The last step (25) stands out in particular, with a metric value of 7.289:
at this step, two large clusters with very different shelling patterns were combined, which
completes the hierarchical clustering (Fig. 77).</p>
        <p>First steps (1-10) for dataset 2 (Fig. 76b): mergers occur at relatively small metric values
(from 0.09 to 0.446). This indicates that at the beginning of the algorithm, regions with very similar
patterns of shelling of political infrastructure are merged. Middle steps (11-17): metric values
gradually increase (from 0.48 to 1.167), so less similar clusters, or individual regions and existing
clusters, are merged. The metric grows more smoothly here than in the analysis of the shelling of
civilian infrastructure. Last steps (18-25): metric values grow faster (from 1.233 to 8.451), which
indicates the merging of large and relatively heterogeneous clusters. As in the previous case, the
last step (25) has a very large metric value (8.451), which indicates the combination of two large
clusters with the most different patterns of attacks on political infrastructure (Fig. 78).</p>
        <p>First steps (1-10) for dataset 3 (Fig. 76c): mergers occur at relatively small metric values
(from 0.591 to 1.097). This means that at the beginning of the algorithm, months with very similar
trends in the number of refugees are merged. Middle steps (11-20): metric values gradually increase
(from 1.169 to 2.011), so less similar clusters, or individual months and existing clusters, are
merged. Last steps (21-23): a sharp increase in metric values is observed (from 6.345 to 9.788), which
indicates the merging of large and very heterogeneous clusters. The last two steps stand out in
particular, with very large metric values: the clusters merged at these steps differ significantly
in the dynamics of the number of refugees (Fig. 79).</p>
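        <p>The step-by-step metric values described above form an agglomeration schedule. The following
is a minimal sketch of how such a schedule arises under the nearest-neighbour strategy, assuming a
symmetric table of pairwise distances. The labels and distances are hypothetical, and this simple
implementation is written for clarity rather than speed.</p>
        <preformat><![CDATA[
```python
from itertools import combinations


def pair_distance(dist, a, b):
    """Look up a symmetric distance stored under either key order."""
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]


def cluster_distance(dist, left, right):
    """Nearest-neighbour rule: distance between the two closest members."""
    return min(pair_distance(dist, a, b) for a in left for b in right)


def single_linkage_schedule(labels, dist):
    """Return the merge history as (step, left, right, height) tuples."""
    clusters = [(label,) for label in labels]
    schedule = []
    step = 0
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        height, left, right = min(
            ((cluster_distance(dist, a, b), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        step += 1
        schedule.append((step, left, right, height))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return schedule


# Hypothetical objects and distances (illustrative only, not the study's data).
example_labels = ["A", "B", "C", "D"]
example_dist = {
    ("A", "B"): 0.2, ("A", "C"): 1.5, ("A", "D"): 7.0,
    ("B", "C"): 1.1, ("B", "D"): 7.5, ("C", "D"): 6.3,
}
schedule = single_linkage_schedule(example_labels, example_dist)
heights = [height for _, _, _, height in schedule]

for step, left, right, height in schedule:
    print(step, left, "+", right, "at", height)
```
]]></preformat>
        <p>Clustering n objects takes n - 1 merges, which agrees with the 25 steps reported for the
regional datasets and the 23 steps for the monthly dataset. The merge heights never decrease under
this strategy, so a large jump in the final steps signals that clearly dissimilar groups are being
joined.</p>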
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Based on the multi-stage study of the relationship between russian aggression and migration
processes in Ukraine, which included statistical, time series, and cluster analysis, the following
conclusions can be drawn:</p>
      <p>1. The methodology of the study was based on the integrated application of several analysis
methods:</p>
      <p>
        <list list-type="bullet">
          <list-item><p>The use of the R programming language for statistical data processing proved highly efficient owing to its powerful data processing and visualization capabilities;</p></list-item>
          <list-item><p>The use of several time series smoothing methods (Kendall, Pollard, and exponential smoothing) allowed us to identify fundamental trends and patterns;</p></list-item>
          <list-item><p>The sequential smoothing method (method B) gave better results than direct smoothing (method A), providing smoother curves and better preservation of long-term trends;</p></list-item>
          <list-item><p>Hierarchical agglomerative cluster analysis effectively revealed hidden patterns in the data and allowed us to group regions by the similarity of their situation.</p></list-item>
        </list>
      </p>
      <p>2. The study of attacks on civilian infrastructure showed a clear evolution in the intensity of attacks:
        <list list-type="bullet">
          <list-item><p>Until 2022, a relatively low and stable level of incidents was observed (20-40 events per month);</p></list-item>
          <list-item><p>A sharp surge in activity in 2022, to 700-800 incidents;</p></list-item>
          <list-item><p>Further stabilization at the level of 200-400 events per month;</p></list-item>
          <list-item><p>A clear geographical pattern was identified: the eastern and southern regions of Ukraine (Kharkiv, Kherson, Donetsk) suffered the largest number of attacks;</p></list-item>
          <list-item><p>Cluster analysis showed the formation of stable groups of regions with similar patterns of attacks.</p></list-item>
        </list>
      </p>
      <p>3. The characteristics of attacks on political targets demonstrate distinct dynamics:
        <list list-type="bullet">
          <list-item><p>A stable level of 1,000-1,500 cases per month in 2018-2022;</p></list-item>
          <list-item><p>A dramatic increase to 5,000 cases per month in 2022;</p></list-item>
          <list-item><p>Stabilization at a high level of 4,000-4,500 cases;</p></list-item>
          <list-item><p>More intense attacks compared to civilian infrastructure;</p></list-item>
          <list-item><p>Donetsk, Sumy, Kharkiv, Zaporizhia, and Kherson regions formed the core of the most affected areas.</p></list-item>
        </list>
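      </p>
      <p>4. Research on refugee dynamics revealed a clear structure of migration processes:
        <list list-type="bullet">
          <list-item><p>Rapid growth from almost zero to 4.5 million in a short period in early 2022;</p></list-item>
          <list-item><p>A peak at around 8 million;</p></list-item>
          <list-item><p>Two notable "stepped" declines to 6.5 and 6 million;</p></list-item>
          <list-item><p>Relative stabilization at around 6 million;</p></list-item>
          <list-item><p>Three key periods are identified:</p>
            <list list-type="alpha-lower">
              <list-item><p>Initial period (April-May 2022) with a massive outflow of population;</p></list-item>
              <list-item><p>Stabilization period (summer-autumn 2022);</p></list-item>
              <list-item><p>Period of gradual return (from early 2023).</p></list-item>
            </list>
          </list-item>
        </list>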
      </p>
      <p>5. A complex system of correlations between different aspects of the conflict has been identified:
        <list list-type="bullet">
          <list-item><p>There is a strong relationship between the intensity of attacks and the growth of the number of refugees (correlation 0.793);</p></list-item>
          <list-item><p>Shelling of civilian infrastructure has a strong impact (correlation ratio 0.803);</p></list-item>
          <list-item><p>Attacks on political infrastructure show a significant impact (correlation ratio 0.753);</p></list-item>
          <list-item><p>The highest correlation ratio (0.844) is between fatalities from attacks on political targets and the number of refugees;</p></list-item>
          <list-item><p>There is a negative correlation (-0.748) between the number of refugees and civilian casualties.</p></list-item>
        </list>
      </p>
      <p>6. The analysis of time characteristics revealed:
        <list list-type="bullet">
          <list-item><p>High inertia of migration processes, especially over short periods (0-3 days);</p></list-item>
          <list-item><p>A gradual decrease in the strength of autocorrelation over time, with significance maintained even at a lag of 10 days;</p></list-item>
          <list-item><p>More predictable dynamics of shelling of civilian infrastructure compared to attacks on political objects;</p></list-item>
          <list-item><p>A clear change in the nature of all studied indicators with the beginning of the full-scale invasion.</p></list-item>
        </list>
      </p>
      <p>7. The research results have wide practical application:
        <list list-type="bullet">
          <list-item><p>Forecasting migration flows and planning humanitarian aid;</p></list-item>
          <list-item><p>Risk assessment for different regions of Ukraine;</p></list-item>
          <list-item><p>Planning civil protection measures;</p></list-item>
          <list-item><p>Development of strategies for the restoration of affected territories;</p></list-item>
          <list-item><p>Optimization of the distribution of humanitarian aid;</p></list-item>
          <list-item><p>Documentation of russian war crimes;</p></list-item>
          <list-item><p>Improvement of early warning systems for threats.</p></list-item>
        </list>
      </p>
      <p>8. The following limitations of the study were identified:
        <list list-type="bullet">
          <list-item><p>Potential delay in data registration;</p></list-item>
          <list-item><p>Difficulty in taking into account all factors influencing migration;</p></list-item>
          <list-item><p>Limitations of statistical methods in the analysis of extreme events;</p></list-item>
          <list-item><p>The analysis is limited to the periods for which data are available.</p></list-item>
        </list>
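      </p>
      <p>9. Directions for further research were identified:
        <list list-type="bullet">
          <list-item><p>Expansion of the period of analysis;</p></list-item>
          <list-item><p>Inclusion of additional influencing factors;</p></list-item>
          <list-item><p>Development of predictive models;</p></list-item>
          <list-item><p>Detailing regional features;</p></list-item>
          <list-item><p>Improvement of data analysis methods;</p></list-item>
          <list-item><p>Application of other cluster analysis methods;</p></list-item>
          <list-item><p>In-depth analysis of cause-and-effect relationships.</p></list-item>
        </list>
      </p>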
      <p>The research demonstrates the systemic nature of russian aggression aimed at destroying both
the civilian and political infrastructure of Ukraine, which led to large-scale forced migration of the
population. Clear patterns and relationships between the intensity of military actions and the scale
of migration were identified, which is critical for understanding the nature of the conflict and its
impact on the population. The use of a set of statistical methods made it possible to identify hidden
patterns and trends that can be used to forecast and plan a humanitarian response. Of particular
importance is the identification of different dynamics of attacks on civilian and political targets,
which is fundamental for understanding the aggressor's strategy and developing effective protective
measures. The results create a methodological basis for further analysis and forecasting of the
situation. They can be used both for scientific purposes and for the practical planning of
humanitarian assistance and management of migration processes.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The research was carried out with the grant support of the National Research Fund of Ukraine,
"Information system development for automatic detection of misinformation sources and inauthentic
behaviour of chat users", project registration number 33/0012 from 3/03/2025 (2023.04/0012). We also
thank the reviewers for their precise and concise recommendations, which improved the presentation
of the results.</p>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>