Journal of Computational Social Science

10.15407/dse2022.02.037

Victoria Vysotska

Victoria.A.Vysotska@lpnu.ua

Viktoriia Yakovlieva

viktoriia.yakovlieva.sa.2022@lpnu.ua

Sofiia Ivaniv

sofiia.ivaniv.sa.2022@lpnu.ua

Iryna Shakleina

ioshakleina@gmail.com

Lviv Polytechnic National University, Stepan Bandera 12, 79013 Lviv, Ukraine

2022

48 2 0000 0001

The article presents the results of a data mining analysis of the relationship between the dynamics of the number of Ukrainian refugees and the frequency of attacks by russia on the civilian and political infrastructure of Ukraine. The study aims to identify statistical relationships between these phenomena and to visualize their dynamics using the R programming language. Three data samples were processed: the number of attacks on civilian objects, the number of attacks on political objects, and the dynamics of the number of Ukrainian refugees abroad. The paper applies preliminary statistical analysis, time series smoothing (moving average, median filtering, exponential smoothing) and correlation analysis. The results indicate a strong connection between the intensity of attacks, especially on political objects, and the growth of the number of refugees. The analysis allows for a deeper understanding of the impact of military actions on migration processes and may be helpful in predicting future trends.

Keywords: intelligence analysis, big data analysis, refugees, R, shelling, time series, smoothing, correlation, aggression, statistical analysis, migration, infrastructure

1. Introduction

The full-scale invasion of Ukraine by russia has caused not only the destruction of infrastructure but also large-scale humanitarian consequences, including mass migration of the population. The number of Ukrainian refugees forced to leave the country has grown in parallel with the intensity of attacks on civilian and political targets. The scientific community is actively researching migration processes, but most of the work focuses on the sociological or political aspects of the problem and leaves aside the analytical relationships between the frequency of attacks and the dynamics of migration. In this context, modern tools for data mining and mathematical modelling are of particular importance, in particular the R programming language, which supports deep statistical processing, visualization, and correlation analysis of large data sets. The work aims to fill this scientific gap and has applied value for developing effective strategies for responding to humanitarian crises caused by armed conflicts. For Ukraine, the results of such a study are significant for strategic planning, social protection of the population, and international cooperation.

The purpose of the work is to identify and formalize the relationships between the dynamics of the number of Ukrainian refugees and the frequency of russian attacks on the civilian and political infrastructure of Ukraine. To achieve this goal, the following tasks must be solved: form data samples on shelling and the number of refugees; pre-process, normalize and visualize the data; apply time series smoothing methods to identify trends; build correlation models of the dependencies between parameters; carry out cluster analysis of the dynamics based on statistical indicators.

The object of the study is the forced migration of the population of Ukraine under the influence of military actions. The subject of the study is the dependence of the dynamics of changes in the number of Ukrainian refugees on the frequency and nature of attacks on civilian and political infrastructure. For the first time, a comprehensive approach to the analysis of the relationship between the number of refugees and the intensity of attacks is proposed, which is based on a combination of statistical methods, time series smoothing and clustering in the R environment. New results were obtained regarding the degree of influence of attacks on political infrastructure on the increase in the number of refugees. It was established that these attacks have a stronger correlation with the dynamics of migration, which was not covered in previous studies. The work improves the methodology for analysing migration processes in conditions of armed conflict, providing the opportunity for operational forecasting of the humanitarian situation in the country.

2. Related works

The issue of the impact of armed conflicts on migration processes attracted the attention of researchers long before the start of the full-scale war in Ukraine. However, it was the events after 2022 that became the impetus for the active study of changes in the structure and scale of forced migration as a result of armed aggression. The works [1,2] investigated the general trends in population displacement as a result of the war and, in particular, studied the socio-economic consequences and challenges for host countries. Specific attention was paid to Ukraine as the largest source of refugees in Europe after 2022 [3].

A number of studies [4,5] analyse the demographic characteristics of refugees, gender composition, access to services, and adaptation conditions. However, insufficient attention has been paid to statistical modelling and attempts to establish a connection between migration waves and specific military events. Study [6] is one of the few that uses regression analysis to identify a correlation between the number of attacks on infrastructure and the number of migrants. Work [7] emphasizes the importance of building forecasting models but uses only aggregated indicators without deep temporal detail.

Existing publications mainly focus on qualitative analysis of the phenomenon, which creates a need for more formalized approaches using statistical methods of time series analysis, smoothing, clustering and data normalization. Such techniques allow moving from describing the phenomenon to identifying hidden patterns and creating tools for operational forecasting, which is especially relevant for Ukraine in the context of a long-term threat from the aggressor.

The issue of forced migration as a result of russia’s armed aggression against Ukraine has received considerable attention in modern scientific and intergovernmental literature. The works of Libanova, Poznyak, and Tsymbal [1] provide a fundamental analysis of demographic changes in Ukraine after the outbreak of full-scale war. The researchers emphasize the complexity of the problem, including social, economic, and security aspects. Digital methods of tracking migration flows have become an important research tool: Wycoff et al. [2] demonstrated the effectiveness of analysing digital traces, in particular data from Google and social networks, to monitor the movement of Ukrainian refugees in real-time. Minora et al. [3] worked in a similar direction, using Facebook advertising data to build a model of Ukrainian movements within the EU, confirming the feasibility of using big data in times of crisis.

The organizational and political aspects are reflected in the reports of international organizations. OECD analysis [4] emphasizes the unprecedented nature of the current flow of refugees to Europe, suggesting ways to improve integration strategies. The specific impact of massive shelling on migration dynamics is revealed in a study by the International Center for Ukrainian Victory [5], where particular data shows how waves of attacks lead to a sharp increase in the number of people leaving the country.

The focus of the study by Kovtun and Salabay [6] is the integration of Ukrainians in host countries, particularly in Germany. The work includes statistical processing of questionnaire data, which is a valuable addition to macro-level assessments. Publications by Reuters [7] and The Guardian [8] complement the scientific picture with a modern context - they emphasize the threat of “migration weapons” as an element of hybrid warfare and draw attention to the potential decrease in international support due to donor fatigue. This review allows us to conclude that although the issue of forced migration is actively researched, most of the existing work does not focus on quantitative modelling of the relationship between attacks on infrastructure and migration dynamics. Therefore, a study based on time series and statistical analysis is a logical and vital continuation of this scientific discussion.

3. Methods and materials

The study used a set of methods of mathematical statistics, data mining, and computer modelling [9-16], which allowed for a comprehensive analysis of the relationship between the number of Ukrainian refugees and the frequency of attacks on the civilian and political infrastructure of Ukraine by russia. The primary tool for implementing all stages of the analysis was the R programming language, which provides extensive capabilities for processing, visualization, and statistical modelling of data. The use of the R language is due to its openness, flexibility, and the availability of specialized packages for processing time series (zoo, forecast, TTR), plotting (ggplot2, plotly), and conducting cluster analysis (cluster, factoextra). It allowed for the effective processing of three independent samples: data on the number of refugees, attacks on civilian infrastructure, and political objects.

At the stage of primary data processing, methods of normalization, cleaning and structuring of data were used, followed by their presentation in the format of time series. The following smoothing methods were chosen to study the dynamics:

simple moving average, to eliminate random fluctuations and identify the primary trend;

weighted moving average, to better take into account the asymmetric effects of events;

median filtering, as an effective means of highlighting long-term trends in the presence of outliers;

exponential smoothing, to reflect the inertia and adaptability of processes.
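The four smoothing methods listed above can be sketched in a few lines. This is a minimal Python illustration (the study itself used R); the function names, the window of 3, the weights (1, 2, 3), the value alpha = 0.3 and the sample series are all assumptions made for the example, not settings taken from the study.

```python
from statistics import median


def simple_moving_average(series, window=3):
    """Centered simple moving average; edge points without a full window are dropped."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


def weighted_moving_average(series, weights=(1, 2, 3)):
    """Trailing weighted average; the last weight applies to the newest point."""
    n, total = len(weights), sum(weights)
    return [sum(w * x for w, x in zip(weights, series[i - n + 1:i + 1])) / total
            for i in range(n - 1, len(series))]


def median_filter(series, window=3):
    """Centered running median; a single outlier inside the window is ignored."""
    half = window // 2
    return [median(series[i - half:i + half + 1])
            for i in range(half, len(series) - half)]


def exponential_smoothing(series, alpha=0.3):
    """Each smoothed value mixes the new observation with the previous smoothed value."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up monthly counts with one extreme spike (not data from the study).
monthly_events = [30, 25, 40, 35, 700, 38, 42, 36]

print(simple_moving_average(monthly_events))
print(weighted_moving_average(monthly_events))
print(median_filter(monthly_events))
print(exponential_smoothing(monthly_events))
```

On this toy series the running median removes the isolated spike entirely, while the averaging methods spread it over neighbouring points, which is the property that makes median filtering attractive when outliers are present.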

Correlation analysis methods were used to identify the nature of the dependencies between variables, including the calculation of Pearson correlation coefficients, coefficients of determination and correlation ratios, and the construction of correlation fields. These methods made it possible to establish the degree of connection between the number of attacks and the change in the number of refugees. In addition, hierarchical agglomerative cluster analysis was used to analyse structural changes in the data: it groups periods with similar characteristics of attacks and migration dynamics and thereby helps identify phase transitions in the migration behaviour of the population. Thus, the selected methods and tools provided a reliable basis for a comprehensive analysis of complex nonlinear dependencies in time series, allowing the identification of hidden relationships between military events and the behaviour of the civilian population in conditions of armed conflict.

For this work, R was chosen as the language best suited to statistical data analysis. R is an open-source programming language used for statistical computing and graphics (visualization). It is used to solve complex problems of mathematical statistics, perform primary data analysis, and carry out mathematical modelling. With R, one can prepare data for research and process experimental results in areas such as medicine, nature management, environmental protection, econometrics and financial analysis, marketing, and engineering calculations. R not only supports a wide range of statistical and numerical methods but can also be extended with packages, that is, libraries for specific functions or special areas of application. The first versions of R were created in the 1990s. Since then, it has been constantly evolving: new packages and features are added, existing ones are improved, and bugs are fixed. Thanks to an active community of users and developers, R remains relevant and is constantly being updated.

The language has its own distinctive syntax and runtime environment, and it is actively used in artificial intelligence and machine learning. At first glance it may seem quite complex, but in fact it is logical and straightforward. R was created for scientists with experience and knowledge in mathematical analysis, statistical methods, and probability theory. It has a number of advantages:

Code in this programming language can be run without compilation, as it uses an interpreter that demonstrates how the program works in real-time;

R is efficient and productive due to its vector approach.

The R programming language is used to work with data: collecting and analysing data from various sources; searching for patterns and deviations; testing and validating hypotheses; visualizing data in multiple ways; working with statistical data to identify anomalies.

Thus, R is and remains one of the most flexible and powerful programming languages designed specifically for data analysis. The R language is capable of processing a large number of types of various objects - vectors, matrices, lists, data tables, etc. The R programming language can also work with a large number of data types. These can be, for example, numbers with a fractional part, integers, text records, date and time values, logical operation values, etc.

Three datasets [17-18] directly related to the topic of the work were selected for analysis. The first dataset covers the shelling of Ukrainian civilian infrastructure and contains information on the number and scale of attacks. The second covers the shelling of the political infrastructure of Ukraine: attacks on administrative buildings, government facilities and other political institutions. The third is the number of Ukrainian refugees abroad, that is, the number of Ukrainians who left the country due to the war. Together, these three datasets are used to show how the shelling of Ukraine affects the number of Ukrainians travelling abroad and how attacks on civilian and political infrastructure are related to forced population migration. Dataset 1 (Fig. 1a) contains 1,821 records of shelling of civilian infrastructure in different regions of Ukraine from January 2018 to October 2024. The data are structured by administrative units (regions and cities) with corresponding codes and the number of events.

Dataset 2 (Fig. 1b) is larger (3,146 records) and covers shelling of political infrastructure. Notably, the number of events (Events) varies significantly between regions, from single cases to 40 events in individual areas, which indicates an uneven distribution of attacks. Dataset 3 (Fig. 1c) shows the dynamics of the number of Ukrainian refugees abroad from April 25, 2022, to March 12, 2024. Over this period the number of refugees increases from 85,000 to 5,982,920 people. The data are stored in CSV format and contain 470 rows after cleaning. Condensed views of the three datasets are given in Tables 1-3.

The graphical representation of the data is given in Fig. 2-4 for the respective datasets. For dataset 1 (Fig. 2a), the initial period (2018-2022) shows a relatively low and stable level of incidents, fluctuating within 20-40 events per month. In 2022, the number of incidents jumps sharply to 700-800, the most intense period on the graph. After the main surge (2022-2024), the number of incidents decreases but remains significantly higher than before 2022, fluctuating between 200 and 400 events per month. Towards the end of the graph, there is a sharp decrease in the number of incidents.

[Figure caption, partially recovered: ... shelling and (c) Ukrainian refugees abroad number dynamics since the beginning of the full-scale war, in the Cartesian coordinate system and in the polar coordinate system.]

For dataset 2 (Fig. 2b), the initial period (2018-2022) shows a relatively stable level of events: the indicators fluctuate within 1000-1500 cases per month, with some seasonal fluctuations. In 2020-2021 the number of events decreases noticeably, to approximately 500 cases per month, and stays at this relatively stable low level. A dramatic increase follows in 2022, with peak values of about 5000 cases per month, the most intense period over the entire time of observations. In 2022-2024 the indicators stabilize at a high level, fluctuating within 4000-4500 cases per month with noticeable periodic changes in intensity. Towards the end of the graph, a sharp decrease in the number of incidents is observed.

For dataset 3 (Fig. 2c), the initial period (early 2022) shows a sharp increase in the number of refugees from almost zero to about 4.5 million in a very short time, with the most rapid growth in the first weeks. The peak value of about 8 million people is reached in mid-2022, followed by relative stabilization at a high level. Two noticeable step-like declines occur in 2022-2023: the first to about 6.5 million and the second to about 6 million. In 2023-2024 the level is relatively stable at about 6 million people, with minor fluctuations in this range and a tendency to level off slowly. Descriptive statistics, that is, the quantitative characteristics of the data, are presented for each dataset in Fig. 3.

Dataset 1 (Fig. 3) describes events involving civilian targets. On average, there were 5.1 events with 4.8 casualties. The data have significant variability (coefficient of variation 169% for events and 524% for casualties). There is a strong right-sided skew (5.0 for events and 21.6 for casualties), indicating the presence of extreme values. The maximum number of events in a single case is 139, and the maximum number of casualties is 774. In total, 1821 observations were recorded, with 9293 events and 8696 casualties.

Dataset 2 (Fig. 3) describes events of political violence. The average number of events is much higher, 54.4, with 38.4 casualties on average. Variability is also high (coefficient of variation 183% for events and 480% for casualties). The right-sided asymmetry is less pronounced (3.0 for events and 11.0 for casualties). The maximum values are significantly higher: 757 events and 3757 casualties. In total, 3146 observations were recorded, with 171148 events and 120782 casualties. Dataset 3 (Fig. 3) is of a different nature, since it describes the number of refugees and has completely different columns. The average value is about 6.17 million, and variability is relatively low (coefficient of variation 19.3%). The negative asymmetry (-2.5) indicates a left-skewed distribution. The values range from 85,000 to 7.9 million over a total of 470 observations.
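The indicators quoted above (mean, coefficient of variation, asymmetry, maximum, total) can be computed as in the following Python sketch (the study itself used R). It assumes the population standard deviation and the moment-based skewness coefficient; the paper does not state which variants were used, and the sample values are invented for illustration.

```python
from statistics import mean, pstdev


def describe(values):
    """Return the summary indicators discussed in the text for one numeric column."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    return {
        "mean": m,
        "cv_percent": 100 * sd / m,  # coefficient of variation
        "skewness": sum((x - m) ** 3 for x in values) / (len(values) * sd ** 3),
        "max": max(values),
        "total": sum(values),
    }


# Invented per-record event counts: many small values and one large one.
events_per_record = [1, 1, 2, 1, 3, 1, 2, 40]
stats = describe(events_per_record)
print(stats)
```

A coefficient of variation above 100% together with a positive skewness, as in this toy column, is the same pattern reported for the event counts in datasets 1 and 2.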

4. Experiments, results and discussion

4.1. Data pre-processing and presentation of results

The histogram in Fig. 4a for dataset 1 shows a sharply asymmetric distribution with a maximum of about 1500 cases at the beginning of the scale. Most events are concentrated in the range of 0-50 attacks, and the frequency drops sharply as the number of events increases. Such a distribution suggests that single attacks or small series of attacks occur most often. The distribution has a pronounced right-sided asymmetry, and its shape resembles an exponential or Poisson distribution: a sharp decrease in frequency with an increase in the number of events is characteristic of an exponential law, while the discrete, count-valued nature of the data means that the Poisson distribution may be the most suitable approximation.

The histogram in Fig. 4b for dataset 2 also shows an asymmetric distribution but with a higher peak (over 2000 cases). The distribution is more stretched along the X-axis (up to 800 events), and the frequency of events decreases gradually as their number increases. This indicates more intense attacks on political infrastructure than on civilian infrastructure. The distribution is also right-skewed and, as in the first case, its shape corresponds to an exponential distribution. It can be approximated by a gamma distribution, which is more flexible and can better account for the "heavy tail". Again, given the discreteness of the data, a Poisson distribution may be appropriate.
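One quick way to weigh the Poisson suggestion against the gamma alternative is the dispersion index (variance divided by mean), which equals 1 for a Poisson law. The Python sketch below (the study itself used R) is a reader's check rather than a result of the study: it plugs in the means and coefficients of variation for event counts reported in Section 3, and the function name is an assumption.

```python
def dispersion_index(mean_value, cv_percent):
    """Variance-to-mean ratio recovered from a mean and a coefficient of variation."""
    standard_deviation = mean_value * cv_percent / 100
    return standard_deviation ** 2 / mean_value


# Event counts per record, values as reported in Section 3.
d1 = dispersion_index(5.1, 169)   # dataset 1: civilian targets
d2 = dispersion_index(54.4, 183)  # dataset 2: political targets

print(round(d1, 1), round(d2, 1))
```

Both ratios come out far above 1, so under these reported figures a plain Poisson law would understate the spread of the counts; this is consistent with preferring the more flexible heavy-tailed alternative mentioned in the text.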

The histogram in Fig. 4c for dataset 3 has a fundamentally different distribution pattern, close to normal. The peak of the distribution falls in the range of about 5-6 million refugees, and the distribution is more symmetrical than in the previous histograms. A small number of observations have low refugee counts (about 0-2.5 million), while the bulk of the data is concentrated in the range of 5-7.5 million refugees. Some asymmetry is present, but much less than in the previous cases. The distribution can be approximated by a normal distribution or, to better account for the asymmetry, by a lognormal distribution. The gamma distribution can also be considered as an alternative, since it works well with data that have a slight asymmetry.

For dataset 1 (Fig. 5a-b), most events (shellings) are concentrated at the beginning of the scale (0-50), with a very pronounced peak of about 1500 events. The cumulative curve increases rapidly and reaches a plateau, indicating that most events fall in the first intervals. After 50 events, only isolated cases are observed.

Dataset 2 (Fig. 5c-d) has a distribution similar to that of the first dataset, but with a higher peak (about 2000 events). It also has an intense concentration of events at the beginning of the scale, and the cumulative curve shows the same rapid growth followed by a plateau. The distribution is more stretched along the scale (up to 800 events). Dataset 3 (Fig. 5e-f) differs significantly from the first two in the nature of its distribution: it is close to normal, with a peak at about 5-6 million people. The cumulative curve is S-shaped, which is typical of a normal distribution. The bulk of the data is concentrated in the range of 4-7 million, with a small number of observations at the beginning of the scale (0-2 million).
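The histograms with cumulative curves described above combine two quantities per bin: the count and the running share of observations. A minimal Python sketch follows (the study itself used R); the bin width and the sample values are assumptions chosen for illustration.

```python
def histogram_with_cumulative(values, bin_width):
    """Return (bin_start, count, cumulative_share) rows for equal-width bins."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    rows, running = [], 0
    for start in sorted(counts):
        running += counts[start]
        rows.append((start, counts[start], running / len(values)))
    return rows


# Invented event counts: most are small, a few are large.
sample = [1, 2, 3, 10, 60, 120]
rows = histogram_with_cumulative(sample, bin_width=50)
for start, count, share in rows:
    print(start, count, round(share, 2))
```

When most values fall in the first bin, the cumulative share jumps close to 1 immediately and then flattens, which is the plateau shape described for datasets 1 and 2.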

4.2. Time series trend detection using smoothing methods

Smoothing methods are used to reduce the influence of the random component (random fluctuations) in time series. They provide an opportunity to obtain more "clean" values, consisting only of deterministic components. Some of the methods are aimed at highlighting only some elements, such as a trend. Smoothing methods can be conditionally divided into two classes, which are based on different approaches: analytical approach and algorithmic approach. The analytical approach is based on the selection of a mathematical function (for example, an exponential, polynomial or hyperbola) that best fits the data trend, determined visually. Then, the parameters of this function are estimated using mathematical or statistical methods, which form a model to describe the time series. The algorithmic approach focuses on calculating new values of the series using algorithms such as the moving average method, weighted average method, exponential smoothing method, and median smoothing method.
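As an illustration of the analytical approach, the sketch below fits an exponential trend y = a * exp(b * t) by ordinary least squares on the logarithms of the values. It is written in Python (the study itself used R); the choice of an exponential function, the function name and the synthetic series are assumptions, and the method requires strictly positive values.

```python
import math


def fit_exponential_trend(values):
    """Least-squares fit of y = a * exp(b * t), t = 0, 1, 2, ..., via a line through log(y)."""
    n = len(values)
    t = list(range(n))
    log_y = [math.log(v) for v in values]  # requires values > 0
    t_mean = sum(t) / n
    y_mean = sum(log_y) / n
    b = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, log_y))
         / sum((ti - t_mean) ** 2 for ti in t))
    a = math.exp(y_mean - b * t_mean)
    return a, b


# Synthetic series that follows an exact exponential law.
series = [100 * math.exp(0.2 * k) for k in range(10)]
a, b = fit_exponential_trend(series)
print(round(a, 3), round(b, 3))
```

The algorithmic approach, by contrast, needs no assumed functional form: each smoothed value is computed directly from neighbouring observations.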

From 2018 to early 2022, the data in the first graph (Fig. 6a) show relatively stable low-level activity, where the moving average (red line) closely follows the actual data points (blue dots). There is a sharp spike in early 2022, after which the level remains elevated but gradually stabilizes. The moving average smooths out the volatile spikes in the data while preserving the overall trend, which is clearly nonlinear, especially after 2022. The second graph (Fig. 6b) shows more stable activity in 2018-2020, at approximately 1000 events, with a small decline in 2020-2021. As in the first graph, there is a sharp increase in 2022. The trend is highly nonlinear, with several clear phases. Given the nonlinear nature of both data sets, a weighted moving average is more appropriate than a simple moving average: a simple moving average tends to lag behind significant changes in the data, especially during sharp increases or decreases, and so may not accurately reflect the actual dynamics of the processes during periods of rapid change.
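The lag of the simple moving average can be seen on an idealized step series. In this Python sketch (the study itself used R) both averages are trailing with a window of 3; the weights (1, 2, 3), the function name and the step values are assumptions for illustration.

```python
def trailing_weighted_average(series, weights):
    """Trailing weighted average; equal weights give the simple moving average."""
    n, total = len(weights), sum(weights)
    return [sum(w * x for w, x in zip(weights, series[i - n + 1:i + 1])) / total
            for i in range(n - 1, len(series))]


# Idealized jump from a low level to a high level at position 5.
step = [10] * 5 + [100] * 5

sma = trailing_weighted_average(step, (1, 1, 1))
wma = trailing_weighted_average(step, (1, 2, 3))

# Output index 3 corresponds to step[5], the first point after the jump.
print(sma[3], wma[3])  # 40.0 55.0
print(sma[4], wma[4])  # 70.0 85.0
```

Because the newest observation carries the largest weight, the weighted average moves toward the new level faster than the equal-weight average at each step after the jump.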

Graph 1 in Fig. 7a (civilian-targeted events) illustrates the weighted moving average (grey line), which reflects the dynamics of changes better than the simple moving average. In the period 2018-2021, the trend line reacts more sensitively to fluctuations in the data. After the sharp jump in 2022, the weighted average adapts more quickly to the new level of activity. The smoothing is less aggressive, which makes fundamental changes in the data easier to track. At the end of the period (2023-2024), the trend towards stabilization at the new level is more visible. Graph 2 in Fig. 7b (dataset 2) shows that before 2022 the weighted moving average more accurately reflects fluctuations in activity around the level of 1,000 events. The gradual decrease in activity in 2020-2021 is more noticeable. After the jump in 2022, the method better reflects the fundamental dynamics of growth, with less lag from the actual data during sharp changes. The formation of a new stable level in 2023-2024 is more clearly visible.

The least aggressive smoothing (w = 3) is shown in Fig. 8a-b for datasets 1-2. It reflects short-term fluctuations better and is more responsive to outliers in the data, giving more detail of the process dynamics and less lag from the real data.

Stronger smoothing compared to w = 3 is shown in Fig. 8c-d; it filters out random fluctuations better but lags further behind the real data. It shows medium-term trends more clearly and is less sensitive to local extremes. The most aggressive smoothing is shown in Fig. 8e-f; it detects long-term trends best, significantly reduces the impact of outliers, and has the largest lag from the real data. It is best suited for detecting a general trend.

Comparative analysis of methods:

1. For events targeting civilians: all methods clearly show the sharp jump in 2022; non-linear smoothing (w = 7) best shows the stabilization after the jump; w = 3 is better suited for current monitoring and w = 7 for trend analysis.

2. For events targeting political targets: all methods reflect the overall dynamics well; non-linear smoothing best shows the transition between different modes of activity; larger values of w are better suited for analysing long-term changes.

Therefore, linear smoothing with w = 3 is best suited for operational monitoring, w = 5 for medium-term analysis, and nonlinear smoothing with w = 7 for identifying long-term trends.

Fig. 9a shows a very low level of events (about 50) from 2018 to early 2022, a sharp peak of about 700 events in 2022, a subsequent decrease and stabilization at 200-300 events during 2023-2024, and a sharp drop in early 2025. Fig. 9b shows a relatively stable period from 2018 to early 2022 with about 1000-1500 events, a sharp increase in 2022 to about 4000-5000 events, a sustained high level (about 4000 events) throughout 2023-2024, and a sharp drop in early 2025. Both graphs show a dramatic change in the situation starting in 2022, coinciding with the start of russia’s full-scale invasion of Ukraine. The number of events directed at political targets significantly exceeds the number of events directed at civilians. Median filtering (orange line) helps to smooth out short-term fluctuations and identify significant trends in the data.

The initial data in Fig. 10 show a relatively low and stable level of both events and fatalities from 2018 to 2021, followed by a sharp spike in 2022, when over 2,000 incidents occurred. In the normalized data (on a scale of 0-1), both metrics show almost zero activity until 2022. The spike in 2022 reaches the normalized maximum (1.0) for both events and fatalities. After 2022, the number of events (orange line) remains at a higher normalized level (around 0.3-0.5) than the number of fatalities (red line), which decreases to around 0.1. This suggests that although attacks continue, they have become relatively less lethal.
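Normalization to the 0-1 scale mentioned above is typically done with min-max scaling, which maps the smallest value of a series to 0 and the largest to 1. The Python sketch below (the study itself used R) assumes this is the scaling used; the function name and the two short series are invented for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly so that min -> 0.0 and max -> 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # constant series: no spread to rescale
    return [(v - lowest) / (highest - lowest) for v in values]


# Invented yearly totals: a quiet year, a peak year, then two later years.
events = [20, 2000, 600, 900]
fatalities = [5, 750, 60, 90]

norm_events = min_max_normalize(events)
norm_fatalities = min_max_normalize(fatalities)
print([round(v, 2) for v in norm_events])
print([round(v, 2) for v in norm_fatalities])
```

Putting both series on the same scale makes their post-peak levels directly comparable even though the raw magnitudes differ.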

Consistently low levels of political events (blue line) in 2018-2021 are shown in Fig. 11. Both indicators rise sharply from 2022, and fatalities (green line) follow a more erratic pattern with extreme spikes. In the normalized data, events (orange line) show higher, more consistent values (0.75-1.0) after 2022, while fatalities (red line) vary more but generally take lower normalized values. The relationship between events and fatalities is less intense than in the infrastructure dataset.

A high correlation coefficient (≥ 0.7) indicates that the smoothed series well preserves the general trend of the original series (Table 4). At the same time, smoothing removes local fluctuations (noise) but does not destroy the structure of the data. Turning points are local maxima and minima in the series. A significant reduction in the number of turning points in the smoothed series indicates that smoothing effectively eliminates short-term fluctuations (noise).
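Both quality checks described above, the correlation between the original and smoothed series and the count of turning points, can be sketched as follows. The code is in Python (the study itself used R); strict local extrema, a centered moving average with w = 3, and the sample series are assumptions made for the example.

```python
import math


def count_turning_points(series):
    """Count interior points that are strict local maxima or minima."""
    return sum(
        1 for prev, cur, nxt in zip(series, series[1:], series[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def centered_average(series, window):
    """Centered moving average; edge points without a full window are dropped."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


# Invented noisy upward series.
raw = [3, 5, 4, 6, 5, 8, 7, 10, 9, 12, 11, 14]
smooth = centered_average(raw, 3)
aligned_raw = raw[1:-1]  # drop the edge points that have no smoothed value

print(count_turning_points(raw), count_turning_points(smooth))
print(round(pearson(aligned_raw, smooth), 3))
```

On this toy series the smoothed version keeps a high correlation with the original while losing every turning point, which is the combination the text treats as a sign of effective smoothing.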

Accordingly, the original series has more "noise" or "chaotic changes", which are often insignificant for the analysis. A decrease in the number of turning points is a sign that the smoothed series shows the primary trend but with less detail. Fig. 12 shows the results of smoothing using the Kendall formulas. The data show a sharp peak of activity around point 55 on the time axis, reaching approximately 700 attacks. After the peak, activity stabilizes at around 200-250 attacks. Method B provides a smoother visualization of the trend (Fig. 16). Both methods show a similar overall picture, but Method B better reflects long-term trends.

In Fig. 17, the number of attacks increases significantly starting from point 50 on the time axis, and the peak value reaches about 5000 attacks. Method B (sequential smoothing) gives a smoother curve than method A. Larger window sizes (w11-w15) give a smoother result but may lose important local features of the data. At the end of the period, there is a sharp decline in activity. The graph in Fig. 18 shows a rapid increase in the number of refugees at the beginning of the period (up to point 100). The maximum value reaches about 8 million people. Two noticeable declines are observed (around points 200 and 300). Both smoothing methods give very similar results, which indicates relatively “clean” initial data. At the end of the period, the number stabilizes at about 6 million people.

In Fig. 19, for dataset 1, a strong positive correlation is observed between all smoothing windows (all values > 0.82). The strongest correlation is between neighbouring smoothing windows (for example, Window_5 and Window_7 correlate at 0.9884), and the correlation gradually decreases as the difference in window size grows. The original series correlates most strongly with the smaller smoothing windows (Window_3: 0.9432) and most weakly with the larger ones (Window_15: 0.8205).

For dataset 2 (Fig. 19), there is a very high positive correlation between all smoothing windows (all values > 0.94). Correlation values are generally higher than for civilian events, and there is again a trend towards stronger correlation between neighbouring windows. The original series has a consistently high correlation with all smoothing windows (from 0.9416 to 0.9872).

For dataset 3 (Fig. 19), there is an extremely high positive correlation between all smoothing windows (all values > 0.99), the highest among the three matrices. There is practically no difference between the correlations of neighbouring and distant smoothing windows. The original series has a very high correlation with all smoothing windows (all values > 0.99).

In the diagram in Fig. 20a, the initial number of points is smaller for dataset 1 (about 23-24). Method A shows unstable behaviour with local peaks and troughs. Method B shows a constant decrease in the number of points. The most significant difference between the methods is observed at medium window sizes (9-11). At the maximum window size (15), both methods show the smallest number of turning points. In the diagram in Fig. 20b, the highest number of turning points is observed for dataset 2 (about 30) at the smallest window size (3). Method A shows a smoother decrease in the number of points and stabilizes at about 15-20 points. Method B shows a sharp drop at the beginning and stabilizes at about 4 points. Both methods show a tendency to decrease the number of turning points with increasing window size. At large window sizes (11-15), the difference between the methods becomes more pronounced.

Dataset 3 has the highest initial number of turning points (more than 50) among all three plots in Fig. 20. Both methods show a similar downward trend. Method A retains more turning points at all window sizes. After window size 11, both methods show relative stability. The difference between the methods remains almost constant at large window sizes.
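Counting turning points can be sketched in Python as below. The definition used here (a point strictly higher or strictly lower than both of its neighbours) is a common one and is an assumption about how the counts in Fig. 20 were obtained.

```python
def count_turning_points(series):
    """Count local peaks and troughs (strict inequality on both sides)."""
    count = 0
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        is_peak = cur > prev and cur > nxt
        is_trough = cur < prev and cur < nxt
        if is_peak or is_trough:
            count += 1
    return count


# Synthetic examples (not the study's data).
zigzag = [1, 3, 2, 4, 3, 5]
trend = [1, 2, 3, 4]
print(count_turning_points(zigzag), count_turning_points(trend))
```

A series that only rises or only falls has no turning points, which is why heavier smoothing tends to drive the count down.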

According to Fig. 21a, the Kendall correlation between civilian shelling and the number of refugees indicates a weak linear relationship. The absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25% (Table 5).

According to Fig. 21b, the Kendall correlation between the shelling of political targets and the number of refugees indicates a linear relationship of medium strength. The absolute value of the correlation coefficient is between 0.5 and 0.7, and the coefficient of determination is between 25% and 50%. According to Fig. 22a, the Kendall correlation between the fatalities caused by the shelling of civilian targets and the number of refugees indicates a weak linear relationship. The absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25%. According to Fig. 22b, the Kendall correlation between the fatalities caused by the shelling of political targets and the number of refugees also indicates a weak linear relationship. The absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25%.
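The classification used above (weak below 0.5, medium between 0.5 and 0.7, with the coefficient of determination taken as the squared coefficient) can be sketched in Python. The Kendall coefficient here is the simple tau-a variant without tie correction, which is an assumption; the data are synthetic.

```python
def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def strength(coefficient):
    """Thresholds taken from the text: < 0.5 weak, 0.5-0.7 medium, else strong."""
    magnitude = abs(coefficient)
    if magnitude < 0.5:
        return "weak"
    if magnitude < 0.7:
        return "medium"
    return "strong"


# Synthetic ranks (not the study's data).
attacks = [1, 2, 3, 4]
refugees = [1, 3, 2, 4]
tau = kendall_tau(attacks, refugees)
determination = tau ** 2  # coefficient of determination
print(round(tau, 3), round(determination, 3), strength(tau))
```

The two thresholds are consistent with each other: 0.5 squared is 25%, and 0.7 squared is 49%, roughly the 50% boundary quoted above.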

The correlation ratio is 0.581, indicating a moderate relationship between the variables (Fig. 23a). A scattered nature of the points around the midline is observed. The group variance (89068252167.606) is significantly smaller than the total variance (153339421323248.07), confirming the presence of a moderate relationship. Most events are concentrated in the range of 5-7 events, which may indicate a specific pattern in the frequency of shelling (Table 6).

The high correlation ratio of 0.935 indicates a very strong relationship between the variables (Fig. 23b). The points are located more densely around the mean line. The group variance (143336071722265.27) is close to the total variance (153339421323248.07), which confirms the strong relationship. There is a clear trend of an increase in the number of refugees with an increase in the number of attacks on political targets.

The correlation ratio of 0.791 indicates a strong relationship (Fig. 24a). The points have a noticeable spread but retain the general trend. The group variance (1212784734739.6) is significantly smaller than the total, which indicates the presence of other influencing factors. The main concentration of events is observed in the range of 4-8 fatal cases.

The very high correlation ratio of 0.976 indicates an almost functional relationship (Fig. 24b). The points lie closest to the midline compared with the other graphs. The group variance (149724110730.35) is nearly equal to the total, which confirms a very strong relationship. A direct relationship between the number of fatalities and the number of refugees is clearly visible.
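The link between the group variance, the total variance, and the correlation ratio can be sketched in Python. The sketch assumes the ratio is computed as the between-group sum of squares divided by the total sum of squares; dividing the two variances quoted for Fig. 23b gives approximately 0.935, which matches the reported value. The grouped data in the example are synthetic.

```python
def correlation_ratio(groups):
    """Between-group sum of squares divided by the total sum of squares.

    groups maps each value of the explanatory variable to the list of
    observed responses for that value.
    """
    values = [v for observations in groups.values() for v in observations]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
        for obs in groups.values()
    )
    return ss_between / ss_total


# Synthetic grouped data (not the study's data).
fully_explained = {1: [1, 1], 2: [3, 3]}
unexplained = {1: [1, 3], 2: [1, 3]}
print(correlation_ratio(fully_explained), correlation_ratio(unexplained))

# Variances quoted in the text for Fig. 23b.
ratio_23b = 143336071722265.27 / 153339421323248.07
print(round(ratio_23b, 3))
```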

There is a very strong positive autocorrelation (0.969 at lag 1), which gradually decreases with increasing lag (Fig. 25-26). Even at lag 10, the autocorrelation remains noticeable (0.628). It indicates a stable trend and inertia of the migration process - the number of refugees in the next period strongly depends on the previous period. The smooth decrease in autocorrelation indicates a relatively stable nature of migration processes.
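The autocorrelation at a given lag can be sketched in Python as below, assuming the usual sample estimator (lagged covariance divided by the total variance of the series). The series is a synthetic linear trend, chosen because it shows the same slow, smooth decay described above.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((v - mean) ** 2 for v in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Synthetic trending series (not the study's data).
trend = list(range(20))
acf = [autocorrelation(trend, lag) for lag in range(0, 11)]
print([round(v, 3) for v in acf])
```

A strongly trending or inert series keeps high values over many lags, whereas a series dominated by noise drops towards zero after the first few lags.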

Fig. 27 shows the autocorrelation of events (attacks) and casualties separately. The autocorrelation of events decreases more slowly (from 0.956 to 0.527 at lag 10). The autocorrelation of the number of casualties decreases much faster (from 0.863 to 0.16). It means that the attacks themselves are more systematic, while the number of casualties is more random and less predictable.

The number of events (attacks) in Fig. 28 shows a rapid decrease in autocorrelation - from 1.0 to 0.12 over 10 months, which indicates a somewhat chaotic and less systematic nature of the attacks over time. The sharp drop is especially noticeable after the 5th lag (month). In contrast, the autocorrelation of the number of victims decreases more slowly - from 1.0 to 0.465, maintaining higher values throughout the period. It may indicate that although the attacks themselves become less predictable, their lethality retains a certain systematicity and dependence on previous periods. Such a pattern may indicate a change in attack tactics - from regular, systematic attacks to more sporadic (irregular), but with similar effectiveness in terms of victims. Fig. 29 shows a generalized plot of the results of smoothing using the Pollard formulas. These plots demonstrate the dynamics of attacks on political infrastructure using two smoothing methods. In both cases, there is:

An initial period with a relatively stable event rate (around 1000-1500 events), then a sharp increase after the 50th period to a peak of around 4000-5000 events. Method B shows smoother transitions between periods, especially in the area of the sharp increase. Different window sizes (w3-w15) affect the degree of smoothing, with larger windows giving a smoother curve. The graphs in Fig. 30 show a low level of events at the beginning (around 20-30 cases), a sharp peak of activity around the 50th period (up to 700 cases), and further stabilization at the level of 200-300 cases. Method B provides a smoother representation of the data, especially in the peak area. Larger window sizes (w11-w15) significantly smooth out the peak values. The graphs in Fig. 31 show a rapid increase in the number of refugees at the beginning (up to 8 million), two sharp declines (around the 180th and 300th periods), and stabilization after the 300th period at the level of about 6 million. Both methods give almost identical results for this data set. The window size has a minimal effect on the shape of the curve, indicating greater stability of the data.

There is a robust positive correlation between all smoothing windows (coefficients from 0.9137 to 0.9999). The strongest correlation is observed between neighbouring smoothing windows (Fig. 32, dataset 1). The original series has the strongest correlation with smaller smoothing windows (Window_3, Window_5) and somewhat weaker with larger windows. As the smoothing window size increases, the correlation with the original series gradually decreases (from 0.9696 to 0.8951).

We see an interesting difference from the previous correlogram (Fig. 41) - here, "Victims" (red colour) have a more substantial autocorrelation than "Events" (green colour). "Victims" starts with a high autocorrelation (1.0). It slowly decreases to 0.46 at lag 10. It maintains relatively high values even at considerable lags. "Events" also starts with a high autocorrelation (1.0). It decreases much faster to 0.133 at lag 10. After lag 5, the autocorrelation becomes relatively weak (<0.4). Such a structure may indicate that in attacks on political infrastructure, the number of victims is more predictable and systematic than the events themselves. This may be due to the fact that political objects usually have a certain number of permanent personnel, so the number of potential victims is more stable. Instead, the events themselves (the shelling) may occur more chaotically and less predictably.

The overall trend in Figure 42a shows a relatively low level until period 40. There is a sharp peak of activity around period 60. Smoothing with different alpha values helps to visualize the overall trend better. As in the first graph, smaller alpha values give a smoother curve.

The data in Fig. 42b show two main levels of intensity under exponential smoothing: the first at about 1000-1500 cases (up to the 40th period), and the second after a sharp increase to 4000-5000 cases (after the 60th period). Smaller alpha values (0.1, 0.15) give a smoother curve, filtering out short-term fluctuations. Larger alpha values (0.25, 0.3) track sharp changes better but retain more "noise" in the data. The original data (pink line) show significant volatility. The graph in Fig. 43 shows a rapid increase in the number of refugees at the beginning of the period (up to the 100th period). After reaching the peak, a sharp decline is observed. Then, the curve stabilizes with a slight gradual decrease. Different alpha values produce very similar smoothing results, indicating relatively "clean" data with fewer random fluctuations.
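The effect of alpha can be illustrated with a minimal Python sketch of simple exponential smoothing. It assumes the standard recursion, in which each smoothed value is alpha times the current observation plus (1 - alpha) times the previous smoothed value, started from the first observation; the series is synthetic.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing initialised with the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Synthetic series with a level shift (not the study's data).
series = [10, 11, 9, 10, 40, 42, 39, 41]
for alpha in (0.1, 0.15, 0.25, 0.3):
    print(alpha, [round(v, 2) for v in exponential_smoothing(series, alpha)])

slow = exponential_smoothing(series, 0.1)
fast = exponential_smoothing(series, 0.3)
```

With a small alpha the smoothed curve reacts slowly to the jump, while a larger alpha follows it more closely and keeps more of the short-term variation.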

There is a robust positive correlation between all smoothing levels (Alpha), with coefficients ranging from 0.77 to 0.99 (Fig. 44, dataset 1). The strongest relationship is between adjacent smoothing levels, and the weakest is between the original series and the smoothed data, which is the expected result. The "Political Events" matrix (Fig. 44, dataset 2) shows extremely high correlation values between all smoothing levels (0.86-0.99). The original series has a slightly stronger relationship with the smoothed data compared to civil events, which may indicate a greater regularity in political events. The "Refugee Data" matrix (Fig. 44, dataset 3) shows the highest correlation values among all three matrices (0.95-0.99), including the relationship with the original series. It indicates that the refugee data have the most stable and consistent dynamics, with fewer random fluctuations. With exponential smoothing for dataset 1, the graph shows a similar trend as the second graph but with a slightly smaller growth amplitude (from 19 to 32 points). There is a noticeable plateau at alpha values of 0.15-0.20, which may indicate some stability in the pattern of attacks on civilian infrastructure in this smoothing range.

Exponential smoothing for dataset 2 shows a gradual increase in the number of turning points from 18 to 35 as alpha increases, with the sharpest increase at alpha > 0.25. It may indicate a more complex structure and irregularity in the data on the shelling of political infrastructure at higher values of the smoothing parameter. The graph for exponential smoothing for dataset 3 shows the smoothest and most consistent increase in the number of turning points from 14 to almost 40, without obvious plateaus. It may indicate more regular dynamics of the refugee movement process and less abrupt changes compared to the shelling data. Fig. 46a shows a weak linear relationship. The absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25% (Table 9). Fig. 46b shows a linear relationship of medium strength. The absolute value of the correlation coefficient is between 0.5 and 0.7, and the coefficient of determination is between 25% and 50%.

Fig. 47a shows a weak linear relationship. The absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25%. Fig. 47b also shows a weak linear relationship, with the same bounds on the correlation coefficient and the coefficient of determination. The correlation coefficient (Fig. 48a) is 0.815, indicating a strong positive relationship between the variables. It suggests that there is a significant relationship between the shelling and the parameter under study, where an increase in one indicator is accompanied by a proportional increase in the other (Table 10). With a correlation ratio of 0.827 (Fig. 48b), there is a very strong positive relationship between the shelling of civilian targets and the number of refugees. It demonstrates that the intensity of shelling of civilian objects has a direct and significant impact on the increase in the number of refugees.

The correlation coefficient of 0.586 (Figure 49a) shows a moderate positive relationship between the number of attacks on political targets and the number of refugees. This relationship is less pronounced compared to previous indicators but still indicates some dependence between the variables. The correlation coefficient of 0.872 (Figure 49b) shows a very strong positive relationship between the number of fatalities caused by attacks and the number of refugees. It indicates that, among the factors studied, the increase in the number of victims has the most significant impact on the rise in the number of refugees.

Figure 51 shows a strong positive autocorrelation that gradually decreases with increasing lag. It indicates a clear temporal dependence in the refugee data, where current values are strongly correlated with previous periods, suggesting persistent trends in migration processes.

The graph in Fig. 52 shows a high initial autocorrelation for both metrics (casualties and events), with a sharper decline for the casualties indicator. It suggests that while both indicators are time-dependent, the number of casualties has a less stable dynamic compared to the number of shelling events. Similar to the previous correlogram, the indicators in Fig. 53 show a high initial autocorrelation with a gradual decline but with a smaller difference between the event and casualties metrics. It suggests more consistent dynamics between the number of attacks and their consequences for the political infrastructure.

The median smoothing data (Fig. 54) shows a sharp spike around time point 50, followed by fluctuations at a higher level. Both smoothing methods are effective in reducing the extreme spike while maintaining the overall pattern. The sequential smoothing in Method B creates somewhat smoother transitions between periods of change, which can be helpful for analysing long-term trends in attacks on civilian infrastructure. The original data in Fig. 55 show significant volatility with a large spike around time point 60, followed by a sharp drop around time point 80. Method A and Method B produce similar smoothing effects, but Method B (sequential smoothing) provides somewhat more stable trends while maintaining the underlying patterns of the data. Both methods are effective in reducing noise while preserving the key features of the trend—the initial lower level of events, the sharp increase, and the final decrease.
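Why median smoothing suppresses an isolated spike can be seen in a minimal Python sketch. It assumes a centred running median whose window shrinks at the edges; the sequential variant (assumed here to correspond to Method B) feeds each result into the next window. The data are synthetic.

```python
from statistics import median


def median_filter(series, window):
    """Centred running median; the window shrinks near the edges (assumption)."""
    half = window // 2
    return [
        median(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]


def sequential_median(series, windows):
    """Assumed Method B: apply the windows one after another."""
    current = list(series)
    for w in windows:
        current = median_filter(current, w)
    return current


# Synthetic series with one extreme spike (not the study's data).
spiky = [1, 1, 100, 1, 1]
print(median_filter(spiky, 3))
print(sequential_median([5, 7, 90, 8, 9, 30, 31, 29, 32], [3, 5]))
```

Unlike a moving average, the running median removes a single outlier completely instead of spreading it over neighbouring points.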

Both methods in Fig. 56 show almost identical results for this dataset, probably because the original data already has a relatively smooth trend. The data show:
• a rapid initial increase in the number of refugees;
• a plateau around the 200th time point;
• a significant drop followed by stabilization;
• a slight upward trend in the final period.

The correlation matrix in Fig. 57 (dataset 1) shows a high correlation between the different smoothing windows (all values above 0.92). The highest correlation is observed between neighbouring window sizes, which is logical since they process the data similarly. The original data has the lowest correlation with the largest smoothing windows (Window_13, Window_15), indicating more smoothing and loss of detail as the window size increases.


The correlation matrix in Fig. 57 (dataset 2) shows a similar pattern to the first table but with slightly higher correlation coefficients (all values above 0.93). It suggests that median smoothing produces more consistent results for political events compared to civil events.

The correlation matrix in Fig. 57 (dataset 3) shows a perfect correlation (all values = 1) between all smoothing windows. It indicates that the refugee data are very smooth, and different smoothing window sizes have little effect on the shape of the trend. With median smoothing, the high correlation with the original data is maintained, indicating that essential data characteristics are preserved during smoothing.

The diagram in Fig. 58a shows a similar pattern but with a greater difference between the methods. Method A shows unstable behaviour with oscillations, while Method B consistently reduces the number of turning points. Method B is significantly more effective in reducing the number of turning points compared to Method A for all window sizes (Fig. 58b). Method B quickly stabilizes at a low level. The plot in Fig. 58c shows the lowest number of turning points among all data sets. Method B almost eliminates turning points after a window of size 5, while Method A retains a certain number of turning points even at large window sizes.

The graph (Fig. 59a) shows a negative correlation: as the number of attacks on civilian targets increases, the number of refugees tends to decrease. However, the data have significant variability (as can be seen from the scatter of blue points), and the confidence interval (grey zone) widens as the number of attacks increases, which indicates a lower reliability of the forecast at higher values. The linear relationship is weak: the absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25% (Table 11).

In Fig. 59b, a positive correlation is observed: as the number of attacks on political targets increases, the number of refugees also increases. The trend is more pronounced, although the red line shows significant fluctuations. The confidence interval widens at the edges of the graph, which indicates lower reliability of the forecast at extreme values. The linear relationship is of medium strength: the absolute value of the correlation coefficient is between 0.5 and 0.7, and the coefficient of determination is between 25% and 50%.

The graph in Fig. 60a shows a negative correlation: as the number of deaths increases, the number of refugees decreases. Sharp fluctuations are especially noticeable at the beginning of the graph and then smooth out. The confidence interval widens significantly as the number of cases increases. The linear relationship is weak: the absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25%.

In Fig. 60b, a positive correlation is observed: the increase in the number of deaths correlates with the rise in the number of refugees. The trend is relatively stable, although the red line shows periodic fluctuations. The confidence interval remains relatively narrow in the middle part of the graph, which indicates a greater reliability of the forecast in this range. The linear relationship is weak: the absolute value of the correlation coefficient is below 0.5, and the coefficient of determination is less than 25%.

The correlation ratio, according to Fig. 61a and Table 12, is 0.803. This value indicates a strong relationship between the variables. 80.3% of the variation in the dependent variable (number of refugees) can be explained by the shelling of civilian targets. It indicates a significant impact of the shelling of civilian targets on migration processes. The correlation coefficient, according to Fig. 61b and Table 12, is 0.753. The indicator demonstrates a strong connection between the shelling of political targets and the number of refugees. The shelling of political targets explains 75.3% of the variation in the number of refugees. This indicates a significant, although somewhat smaller compared to the first case, impact of political shelling.

The correlation coefficient, according to Fig. 62a and Table 12, is 0.575. This value indicates a moderate relationship between fatalities from the shelling of civilian targets and the number of refugees. This factor can explain 57.5% of the variation. The relationship is less pronounced compared to previous indicators.

The correlation ratio, according to Fig. 62b and Table 12, is 0.844. The highest value of the correlation ratio among all graphs indicates a powerful relationship between fatalities from the shelling of political targets and the number of refugees. This factor can explain 84.4% of the variation in the number of refugees. It indicates the most significant impact of this indicator on migration processes. In Fig. 63, a strong positive correlation (0.793) is observed between total events (Events_Mean_Smoothed) and the number of refugees (NoRefugees_Mean_Smoothed), indicating that an increase in conflict events leads to a rise in the number of refugees. There is a robust positive correlation (0.814) between civilian events (Civilian_Events_Mean_Smoothed) and civilian fatalities (Civilian_Fatalities_Mean_Smoothed), which logically reflects a direct relationship between incidents and their consequences. There is a moderate negative correlation (-0.428) between total events and civilian events, which may indicate that not all conflict events are directly related to civilians. There is a strong negative correlation (-0.731) between total events and civilian fatalities, which may indicate that a significant proportion of events do not result in civilian casualties. It is noteworthy that the number of refugees has a negative correlation (-0.748) with civilian casualties, which may indicate that timely evacuation of the population (refugees) reduces the number of civilian casualties.

Overall, these correlations demonstrate a complex interdependence between different aspects of a conflict situation, where an increase in the total number of events leads to an increase in the number of refugees but not necessarily to a rise in civilian casualties, perhaps due to population evacuation. In Fig. 64, a very strong positive autocorrelation is observed for short lags (0-3 days), where the coefficients exceed 0.9, which indicates a high inertia of the process in the short term. It means that the number of refugees on a given day is very strongly related to the number in the previous 1-3 days. As the time lag increases (from 4 to 10 days), the autocorrelation gradually weakens, from 0.85 to 0.624, which indicates a weakening of the connection between observations separated by a larger time gap. A smooth, almost linear decrease in autocorrelation without sharp jumps indicates a stable migration process without sudden changes in trend. Even with a lag of 10 days, the autocorrelation remains moderately high (0.624), which indicates the presence of long-term trends in the migration process and the relative predictability of the dynamics of the number of refugees.

In general, this nature of the autocorrelation function is typical for mass migration processes. It indicates that changes in the number of refugees occur gradually, without sharp fluctuations. The current situation strongly depends on previous days, which is essential to consider when planning humanitarian assistance and developing appropriate policies.

The autocorrelation in Fig. 65 for both victims and events starts at a very high level (around 1.0) and gradually decreases over the 10 months. The correlation remains significant throughout the period, with events (blue) having a consistently higher autocorrelation than victims (red). It suggests a systematic and persistent nature for both events and victims, with events showing more predictable dynamics over time.

In Fig. 66, the autocorrelation pattern is noticeably different. Although both metrics start at high values, the correlation of events (blue) drops much faster and becomes very weak after 5 months. At the same time, the number of victims (red) maintains a moderately strong correlation throughout the period. It suggests that attacks on political infrastructure are more random or situational, while the number of victims resulting from them maintains a more consistent pattern over time.

4.3. Hierarchical agglomerative cluster analysis of multidimensional data

The most significant number of shellings was recorded in Kharkiv, Kherson and Donetsk regions (Fig. 67). This can be seen from the values of “Amount” and “Maximum”, which are the highest for these regions. Also, these regions have high average values (“Average”). Some regions, such as Volyn, Zakarpattia and Rivne, have the minimum number of recorded shellings. It is reflected by zero or minimum values in most columns. Significant deviations from the average (“Stand From”) indicate an uneven distribution of shellings during the observation period. For example, Kyiv has a high standard deviation, which means periods of intense bombardment alternating with periods of relative calm. The “Median” and “Mode” indicators are often equal to 1, which indicates that, most often, one shelling was recorded during a specific period. However, for some regions, such as Kharkiv, Kherson, and Donetsk, these figures are higher, confirming the greater intensity of shelling in these regions.

[Table column headers: Kurtosis, Asymmetry, Range, Min, Max, Amount, Observations]

Eastern and southern regions were most affected: Donetsk, Kharkiv, Zaporizhia, Sumy, Kherson, and Luhansk regions experienced the highest number of shellings (Fig. 68). Uneven distribution of shelling: the intensity of shelling fluctuated significantly, as evidenced by the high standard deviation for many regions. Frequency of shelling: most often, one shelling was recorded during the observation period (Median and Mode often equal 1).

According to Fig. 69, the period of mass departure was April-September 2022: the largest outflow, with rapid growth in the number of refugees and high volatility of the data at the beginning. Stabilization: from October 2022 to January 2023, the number of refugees remained relatively stable at a high level. Return: since February 2023, there has been a trend towards the return of refugees to Ukraine, and their number is gradually decreasing. High reliability: the data were fairly reliable throughout the period. In summary, a mass exodus of Ukrainians abroad characterized the first months of the war; over time, the situation stabilized, and since the beginning of 2023, there has been a trend towards return.


The most significant deviation above the average (more shelling), according to Table 70, is observed in the Kherson (3.679), Kharkiv (1.953), and Donetsk (1.259) regions. It confirms that these regions experienced significantly more shelling than the average for Ukraine. Close to the average: Dnipropetrovsk (0.139), Kyiv (0.415), Zaporizhia (0.416), Sumy (0.403). The most significant deviation below the average (less shelling): Rivne (-0.637), Volyn (-0.637), and Zakarpattia (-0.637) regions. These regions experienced significantly less shelling than the average.
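Deviations of this kind are typically obtained by standardization. The Python sketch below assumes a z-score (value minus mean, divided by the sample standard deviation); whether the study used the sample or the population standard deviation is not stated, so that choice is an assumption, and the counts and region labels are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-scores: (value - mean) / sample standard deviation (assumption)."""
    centre = mean(values)
    spread = stdev(values)
    return [(v - centre) / spread for v in values]


# Hypothetical shelling counts for four regions (not the study's data).
counts = {"region_A": 0, "region_B": 0, "region_C": 0, "region_D": 10}
scores = dict(zip(counts, standardize(list(counts.values()))))
print(scores)
```

Regions with identical raw counts receive identical standardized values, which is consistent with the three regions that share the value -0.637 above.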


The most significant deviation above the average (more attacks on political infrastructure), according to Table 71, is observed in the Donetsk (3.061), Sumy (1.782), Kharkiv (1.534), Zaporizhia (1.458), and Kherson (1.344) regions. These regions experienced significantly more attacks on political infrastructure than the average for Ukraine. Close to the average: many regions have values close to zero, indicating that the number of attacks on political infrastructure is close to the average for the country. The most significant deviation below the average (fewer attacks on political infrastructure): most western regions, as well as some central ones, such as Chernihiv, have negative values, indicating that the number of attacks on political infrastructure is lower than the average. The most significant outflows, according to Table 72, occur mainly in the first months after the start of the full-scale invasion: April (-3.257), May (-2.684), September (0.824), October (1.029), November (1.111), December (1.085) 2022. April and May stand out in particular with large negative values, indicating a sharp jump in the number of refugees immediately after the start of the war. The positive values from September to December show that the number of refugees remained significantly above average throughout the autumn and early winter of 2022. Gradual stabilization and decline (close to average or below): since the beginning of 2023, the "Average" values have been closer to 0, and since June 2023, they have been primarily negative, indicating a gradual return of refugees and a decrease in their number abroad relative to the average for the entire period.

According to Fig. 73, the Kharkiv region is most similar to Donetsk (3.44), Sumy (4.26), and Kherson (4.26). It confirms that these regions, which are located in the east and south of Ukraine, experienced similar intensity and nature of shelling. Kherson region: most similar to Kharkiv (4.26), Donetsk (6.68) and Mykolaiv (7.31). Again, these are regions that are relatively close to each other and experience intense shelling. Donetsk region: most similar to Kharkiv (3.44) and Luhansk (6.51). These are neighbouring regions that were the epicentre of hostilities. Sumy region: most similar to Kharkiv (4.26) and Chernihiv (3.83). Western regions (Zakarpattia, Volyn, Rivne, Ternopil, Ivano-Frankivsk, Chernivtsi) show the highest values relative to most of the eastern and southern regions. This means that the nature of shelling in the western regions was significantly different from that of shelling in the east and south, which is reasonably expected, given the geographical location and intensity of hostilities in other regions.
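A proximity matrix in which smaller values mean greater similarity can be sketched in Python as follows. The sketch assumes Euclidean distance between standardized feature vectors; the metric actually used for Fig. 73 is not stated in this section, and the region labels and feature values are hypothetical.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def distance_matrix(rows):
    """Pairwise Euclidean distances between named feature vectors."""
    names = list(rows)
    return {a: {b: dist(rows[a], rows[b]) for b in names} for a in names}


# Hypothetical standardized features per region (not the study's data).
rows = {
    "region_A": (0.0, 0.0),
    "region_B": (3.0, 4.0),
    "region_C": (0.5, 0.5),
}
matrix = distance_matrix(rows)
nearest_to_a = min((r for r in rows if r != "region_A"),
                   key=lambda r: matrix["region_A"][r])
print(matrix["region_A"]["region_B"], nearest_to_a)
```

The matrix is symmetric with zeros on the diagonal, so the most similar region to a given one is the off-diagonal minimum of its row.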

According to Fig. 74, the Kharkiv region has the most significant similarity with Donetsk (3.46), Sumy (7.29) and Luhansk (3.82). The similarity with Donetsk remains very high, which is expected since these regions were on the front line. However, unlike the shelling of civilian infrastructure, the similarity with the Kherson region is lower here. Kherson region: the most significant similarity with Mykolaiv (3.00), Zaporizhia (3.37) and Dnipropetrovsk (2.44). The shift of emphasis to southern regions may indicate a different nature of attacks on political infrastructure in this region. Donetsk region: most similar to Kharkiv (3.46) and Luhansk (4.09). As in the previous analysis, similarities with neighbouring regions remain high. Sumy region: most similar to Chernihiv (2.02) and Kharkiv (7.29). Western regions (Zakarpattia, Volyn, Rivne, Ternopil, Ivano-Frankivsk, Chernivtsi) again show the highest values relative to most of the eastern and southern regions. It confirms that the nature of attacks on political infrastructure in the western regions was significantly different from the nature of attacks in the east and south.

According to Fig. 75, the dynamics fall into several periods. The first months after the start of the full-scale invasion (April-June 2022): April and May 2022 show a relatively high closeness value (9.45), which indicates a similarity of dynamics during this period (rapid growth in the number of refugees). June 2022 shows less similarity with these months, which may indicate the beginning of changes in dynamics. Summer-autumn 2022 (July-November): July, August, September, October and November 2022 show relatively low closeness values to each other, which indicates similar dynamics during this period (relative stabilization and further growth). The similarity between July and August (1.28), as well as between September, October and November (values around 1-2), is especially noticeable. Winter 2022 - Spring 2023 (December 2022 - May 2023): this period is characterized by greater variability. December 2022 shows relatively low similarity with previous months, which may indicate the beginning of a new phase. Starting from January 2023, there is a tendency for the proximity values between neighbouring months to increase, although with some fluctuations. Summer 2023 - Spring 2024 (June 2023 - March 2024): starting from June 2023, the proximity values decrease again, which indicates the formation of a new trend, different from the previous one. July and August 2023 have the lowest value (0), which indicates identical dynamics.

We use the nearest neighbour strategy to perform agglomerative hierarchical cluster analysis (Fig. 76). The distance between two groups is defined as the distance between the two nearest elements of these groups. This strategy is monotonic and strongly contracts the feature space; its parameters are $\alpha_i = \alpha_j = 0.5$, $\beta = 0$, $\gamma = -0.5$.
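The nearest neighbour rule can be written as a Lance-Williams update, $d(k, i \cup j) = \alpha_i d(k,i) + \alpha_j d(k,j) + \beta d(i,j) + \gamma |d(k,i) - d(k,j)|$, which for $\alpha_i = \alpha_j = 0.5$, $\beta = 0$, $\gamma = -0.5$ reduces to $\min(d(k,i), d(k,j))$. The R sketch below applies this update in a naive agglomeration loop and compares the resulting merge levels with those of the built-in `hclust(method = "single")`. The helper names `lance_williams` and `agglomerate`, the symbol names and the random data are illustrative assumptions, not the authors' implementation.

```r
# Lance-Williams update: distance from cluster k to the merged cluster (i U j).
lance_williams <- function(d_ki, d_kj, d_ij,
                           alpha_i = 0.5, alpha_j = 0.5, beta = 0, gamma = -0.5) {
  alpha_i * d_ki + alpha_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)
}

# Naive agglomeration that records the metric value of every merge step.
agglomerate <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                      # a cluster is never merged with itself
  heights <- numeric(0)
  while (nrow(m) > 1) {
    idx <- which(m == min(m), arr.ind = TRUE)[1, ]   # closest pair of clusters
    i <- idx[["row"]]
    j <- idx[["col"]]
    heights <- c(heights, m[i, j])
    others <- setdiff(seq_len(nrow(m)), c(i, j))
    # Distances from every remaining cluster to the merged cluster.
    new_d <- lance_williams(m[others, i], m[others, j], m[i, j])
    # Store the merged cluster in position i and drop cluster j.
    m[i, others] <- new_d
    m[others, i] <- new_d
    m <- m[-j, -j, drop = FALSE]
  }
  heights
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 10)     # 10 hypothetical objects, 2 features
d <- dist(x)

h_manual <- agglomerate(d)
h_hclust <- hclust(d, method = "single")$height

# Both sequences of merge levels should agree up to rounding error.
print(all.equal(h_manual, h_hclust))
```

Because the update always keeps the smaller of the two distances, the merge levels never decrease from one step to the next, which is the monotonicity property mentioned above.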

First steps (1-10) for dataset 1 (Fig. 76a): Mergers occur at relatively small metric values (from 0.014 to 0.312). This means that at the beginning of the algorithm, regions with very similar shelling patterns are merged. Middle steps (11-17): Metric values begin to increase (from 0.361 to 1.553). This means that less similar clusters are merged. Last steps (18-25): A significant increase in metric values is observed (from 1.652 to 7.289). This indicates the merging of large and relatively heterogeneous clusters. The last step (25) stands out in particular, where the metric value reaches 7.289. At this step, two large clusters with very different shelling patterns were combined, which completes the process of hierarchical clustering (Fig. 77).

First steps (1-10) for dataset 2 (Fig. 76b): Mergers occur at relatively small metric values (from 0.09 to 0.446). This indicates that at the beginning of the algorithm, regions with very similar patterns of shelling of political infrastructure are merged. Middle steps (11-17): Metric values gradually increase (from 0.48 to 1.167). This means that less similar clusters, or individual regions and existing clusters, are merged. The metric grows more smoothly here than in the analysis of the shelling of civilian infrastructure. Last steps (18-25): A faster growth of metric values is observed (from 1.233 to 8.451). This indicates the merging of large and relatively heterogeneous clusters. As in the previous case, the last step (25) is characterized by a very large metric value (8.451), which indicates the combination of two large clusters with the most different nature of attacks on the political infrastructure (Fig. 78).

First steps (1-10) for dataset 3 (Fig. 76c): Mergers occur at relatively small metric values (from 0.591 to 1.097). This means that at the beginning of the algorithm, months with very similar trends in the number of refugees are merged. Middle steps (11-20): Metric values gradually increase (from 1.169 to 2.011). This means that less similar clusters, or individual months and existing clusters, are merged. Last steps (21-23): A sharp increase in metric values is observed (from 6.345 to 9.788). This indicates the merging of large and very heterogeneous clusters. The last two steps stand out in particular, where metric values become very large. This means that clusters that differ significantly in the dynamics of the number of refugees were merged at these steps (Fig. 79).
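The step-by-step reading of the metric values used above can be sketched in R as follows. The example builds an agglomeration schedule from an `hclust` result and uses the largest step-to-step jump in the metric as a cut-off. The simulated data, the variable names and the "largest jump" rule are assumptions made for illustration; the study itself interprets the schedules in Figs. 76-79.

```r
# Hypothetical data: two well-separated groups of 8 objects each (invented values).
set.seed(42)
x <- rbind(
  matrix(rnorm(16, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(16, mean = 5, sd = 0.3), ncol = 2)
)
rownames(x) <- paste0("obs", seq_len(nrow(x)))

# Nearest neighbour (single linkage) clustering.
hc <- hclust(dist(x), method = "single")

# Agglomeration schedule: one row per merge step with its metric value.
schedule <- data.frame(step = seq_along(hc$height), metric = hc$height)
print(schedule)

# Growth of the metric between consecutive steps; the largest jump marks
# the point after which heterogeneous clusters start to be merged.
jumps <- diff(hc$height)
last_homogeneous_step <- which.max(jumps)

# Stopping before that jump leaves (objects - completed merges) clusters.
k <- nrow(x) - last_homogeneous_step
groups <- cutree(hc, k = k)
print(table(groups))
```

Calling `plot(hc)` on the same object draws the corresponding dendrogram, in which the late, high merges appear as long vertical branches.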

5. Conclusions

Based on the multi-stage comprehensive study of the relationship between russian aggression and migration processes in Ukraine, which included statistical analysis, time series analysis and cluster analysis, the following conclusions can be drawn:

1. Methodological aspects of the study were based on the integrated application of various analysis methods:

- The use of the R programming language for statistical data processing proved highly efficient owing to its powerful data processing and visualization capabilities;
- The use of various time series smoothing methods (Kendall, Pollard, and exponential smoothing) allowed us to identify fundamental trends and patterns;
- The sequential smoothing method (method B) showed better results than direct smoothing (method A), providing smoother curves and better preservation of long-term trends;
- Hierarchical agglomerative cluster analysis effectively revealed hidden patterns in the data and allowed us to group regions by the similarity of their situation.

2. The study of attacks on civilian infrastructure showed a clear evolution of the intensity of attacks:

- Until 2022, a relatively low and stable level of incidents was observed (20-40 events per month);
- A sharp surge in activity in 2022, to 700-800 incidents;
- Further stabilization at the level of 200-400 events per month;
- A clear geographical pattern was identified: the eastern and southern regions of Ukraine (Kharkiv, Kherson, Donetsk) suffered the largest number of attacks;
- Cluster analysis showed the formation of stable groups of regions with similar patterns of attacks.

3. The characteristics of attacks on political targets demonstrate distinct dynamics:

- A stable level of 1,000-1,500 cases per month in 2018-2022;
- A dramatic increase to 5,000 cases per month in 2022;
- Stabilization at a high level of 4,000-4,500 cases;
- More intense attacks compared to civilian infrastructure;
- Donetsk, Sumy, Kharkiv, Zaporizhia and Kherson regions formed the core of the most affected areas.

4. Research on refugee dynamics revealed a clear structure of migration processes:

- Rapid growth from almost zero to 4.5 million in a short period in early 2022;
- Peaking at around 8 million;
- Two notable "stepped" declines to 6.5 and 6 million;
- Relative stabilization at around 6 million;
- Three key periods are identified:
  a. Initial period (April-May 2022) with a massive outflow of population;
  b. Stabilization period (summer-autumn 2022);
  c. Period of gradual return (from early 2023).

5. A complex system of correlations between different aspects of the conflict has been identified:

- There is a strong relationship between the intensity of attacks and the growth of the number of refugees (correlation 0.793);
- A strong impact of the shelling of civilian infrastructure (correlation ratio 0.803);
- Attacks on political infrastructure show a significant impact (correlation ratio 0.753);
- The highest correlation ratio (0.844) is between fatalities from attacks on political targets and the number of refugees;
- Negative correlation (-0.748) between the number of refugees and civilian casualties.

6. The analysis of time characteristics revealed:

- High inertia of migration processes, especially over short-term periods (0-3 days);
- A gradual decrease in the strength of autocorrelation over time, which nevertheless remains significant even at a lag of 10 days;
- More predictable dynamics of shelling of civilian infrastructure compared to attacks on political objects;
- A clear change in the nature of all studied indicators with the beginning of the full-scale invasion.

7. The research results have wide practical application:

- Forecasting migration flows and planning humanitarian aid;
- Risk assessment for different regions of Ukraine;
- Planning civil protection measures;
- Development of strategies for the restoration of affected territories;
- Optimization of the distribution of humanitarian aid;
- Documentation of russian war crimes;
- Improvement of early warning systems for threats.

8. The following limitations of the study were identified:

- Potential delay in data registration;
- Difficulty in taking into account all factors influencing migration;
- Limitations of statistical methods in the analysis of extreme events;
- The analysis is limited to the periods for which data are available.

9. Directions for further research were identified:

- Expansion of the period of analysis;
- Inclusion of additional influencing factors;
- Development of predictive models;
- Detailing regional features;
- Improvement of data analysis methods;
- Application of other cluster analysis methods;
- In-depth analysis of cause-and-effect relationships.

The conducted research convincingly demonstrates the systemic nature of russian aggression aimed at destroying both the civilian and political infrastructure of Ukraine, which led to large-scale forced migration of the population. Clear patterns and relationships between the intensity of military actions and the scale of migration were identified, which is critical for understanding the nature of the conflict and its impact on the population. The use of a set of statistical methods made it possible to identify hidden patterns and trends that can be used to predict and plan a humanitarian response. Of particular importance is the identification of different dynamics of attacks on civilian and political targets, which helps in understanding the aggressor's strategy and developing effective protective measures. The results of the research create a methodological basis for further analysis and forecasting of the situation. They can also be used both for scientific purposes and for practical planning of humanitarian assistance and management of migration processes.

Acknowledgements

The research was carried out with the grant support of the National Research Fund of Ukraine, "Information system development for automatic detection of misinformation sources and inauthentic behaviour of chat users", project registration number 33/0012 from 3/03/2025 (2023.04/0012). Also, we would like to thank the reviewers for their precise and concise recommendations that improved the presentation of the results obtained. The authors have not employed any Generative AI tools.

References