<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Workshop of IT-professionals on Artificial Intelligence, October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Using Machine Learning Methods to Analyze HIV Incidence in Ukraine</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yurii Parfeniuk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmytro Kurinniy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kseniia Bazilevych</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ievgen Meniailov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Aerospace University “Kharkiv Aviation Institute”</institution>
          ,
          <addr-line>Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>V.N. Karazin Kharkiv National University</institution>
          ,
          <addr-line>Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>HIV remains a persistent public health issue in Ukraine, with complex socio-economic and geopolitical factors influencing its incidence. This study investigates the application of machine learning techniques to analyze and predict HIV incidence trends across the regions of Ukraine. Using publicly available epidemiological and demographic data, we apply several methods, including decision trees, random forests, logistic regression, and clustering algorithms, to identify key risk factors and uncover spatial and temporal patterns in HIV transmission. The results demonstrate that machine learning models can improve the accuracy of HIV incidence predictions and support data-driven decision-making for public health interventions. The study highlights the potential of machine learning tools to enhance disease surveillance and inform targeted prevention strategies in Ukraine's evolving healthcare landscape.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>epidemiological data</kwd>
        <kwd>epidemic surveillance</kwd>
        <kwd>infectious diseases</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>HIV is a chronic infectious disease caused by the human immunodeficiency virus. It affects the
immune system, gradually reducing its ability to resist infections and diseases. HIV is transmitted
through blood, sexual contact, and from mother to child during pregnancy, childbirth, or
breastfeeding. It is one of the most serious viral diseases known to humanity since the late 20th
century.</p>
      <p>
        Globally, HIV is one of the leading causes of death from infectious diseases [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], ranking
alongside tuberculosis. According to the World Health Organization (WHO), around 650,000 people
die each year from HIV-related illnesses [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This makes HIV/AIDS one of the most significant
public health challenges. The problem is further complicated by the fact that HIV is not only a
severe infection but also an issue of access to treatment. According to WHO, about 38 million
people worldwide live with HIV, but only slightly more than half of them receive antiretroviral
therapy (ART), which is essential for controlling the virus.
      </p>
      <p>
        Lack of access to treatment leads to high mortality rates – without ART, the life expectancy
after an AIDS diagnosis ranges from several months to three years [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The situation in Ukraine
remains challenging. Although a national HIV/AIDS control program has been established, the
prevalence of the disease remains high. In recent years, there has been a slight decrease in the
number of new infections; however, the issue is still relevant. The main obstacles in the fight
against HIV are late diagnosis, insufficient population coverage with testing, and a low level of
awareness regarding preventive measures.
      </p>
      <p>
        In 2010, the HIV prevalence rate in Ukraine was 0.9% among the adult population, which was
one of the highest rates in Eastern Europe. The HIV/AIDS-related mortality rate also remained
high, reaching 10.2 cases per 100,000 population [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The most vulnerable groups continue to be
young people, people who inject drugs, and sex workers, among whom the infection
rate can exceed 20%.
      </p>
      <p>
        Despite certain improvements, Ukraine continues to face serious challenges in combating HIV.
The main issues include insufficient public awareness about the modes of virus transmission, late
diagnosis, and inadequate coverage with antiretroviral therapy. In order to reduce the prevalence
of HIV, the government has developed a national strategy to combat the epidemic, which includes
improving access to testing and treatment. In addition, Ukraine has submitted an application to the
Global Fund to Fight AIDS, Tuberculosis and Malaria for approximately 100 million US dollars,
which will be used for HIV prevention and treatment [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Thus, the fight against HIV, especially
among vulnerable population groups, is a key public health priority at both the national and
international levels.
      </p>
      <p>
        A modern approach to data analysis and forecasting the spread of infection can play a
significant role in the fight against HIV. Machine learning (ML) has become a critically important
tool in the field of healthcare [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], enabling the analysis of large volumes of data and the
identification of hidden patterns that are essential for predicting the epidemiological situation. For
example, methods such as decision trees, random forest, support vector machines (SVM), and
neural networks demonstrate high effectiveness in predicting the spread of HIV by analyzing
complex factors, including behavioral aspects, demographic data, and the level of accessibility of
medical services [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Machine learning methods make it possible not only to forecast the spread of HIV infection but
also to identify the most vulnerable regions and social groups, which allows for more effective
allocation of medical resources and the implementation of targeted preventive measures [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The main objective of the study is to identify and implement methods that enable the analysis of
HIV incidence and the identification of areas of HIV spread in Ukraine using machine learning
techniques.</p>
      <p>Object of the study: the process of HIV infection spread in Ukraine. Subject of the study: the use
of machine learning methods for studying HIV incidence in Ukraine.</p>
      <p>To achieve this objective, it was necessary to address the following tasks:
• analyze the epidemiological situation regarding HIV infection in Ukraine;
• conduct an analytical review of machine learning methods that can be used to study
incidence and identify areas of HIV spread;
• develop algorithmic models to solve the research tasks;
• develop software to implement the research tasks;
• evaluate the results obtained.</p>
      <p>
        The current research is part of a comprehensive information system for assessing the impact of
emergencies on the spread of infectious diseases described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>2. Development of a software application for analyzing HIV incidence in Ukraine using machine learning methods</title>
        <p>
          The software is built on five algorithmic models: a model for smoothing time series based on
the moving average method; a model based on the K-Nearest Neighbors method; a model based on the
Random Forest method; a model based on ensemble methods; and a clustering model based on the
K-means method. To evaluate the effectiveness of the models, three accuracy metrics were
calculated: MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and R² (Coefficient of
Determination). The study utilized an open dataset from the Public Health Center [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The user specifies the year for analysis, after which the
following processes are performed: data preprocessing, including missing data removal and
normalization of indicators (for example, the number of new cases, testing coverage percentage,
and ART coverage); and clustering of Ukrainian regions using the K-means algorithm, which groups
regions with similar epidemiological characteristics. To assess the quality of
clustering, the purity metric was used: it measures how well each cluster corresponds to known
groups (for example, by geography or infection zones) and ranges from 0 to 1. The
optimal number of clusters is determined using the "elbow" method, which balances
accuracy against generalization of the models. HIV spread forecasting is carried out using the
Random Forest model, which provides high accuracy due to the use of many independent decision
trees. The obtained clustering results are presented as a color-coded map of Ukraine, where each
region is highlighted according to its cluster. This allows for a visual assessment of:
• territorial concentration of morbidity;
• regions similar in prevalence levels;
• zones of epidemic risk.
        </p>
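        <p>The purity metric mentioned above can be illustrated with a minimal sketch. It assumes that
purity is computed in the usual way, as the share of regions that belong to the majority reference
group of their cluster. The function name purity, the cluster ids, and the macro-region labels are
hypothetical and are not taken from the study's software or dataset.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def purity(cluster_labels, reference_labels):
    """Share of items that fall in the majority reference group of their cluster."""
    if len(cluster_labels) != len(reference_labels):
        raise ValueError("label lists must have the same length")
    if not cluster_labels:
        raise ValueError("label lists must not be empty")

    # Collect the reference labels observed inside each cluster.
    members = {}
    for cluster, reference in zip(cluster_labels, reference_labels):
        members.setdefault(cluster, []).append(reference)

    # For every cluster, count only the items of its most frequent reference group.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(cluster_labels)


# Hypothetical example: six regions, cluster ids vs. assumed macro-region labels.
clusters = [0, 0, 0, 1, 1, 2]
reference = ["west", "west", "centre", "east", "east", "south"]
score = purity(clusters, reference)
print(f"purity = {score:.3f}")
```
]]></preformat>
        <p>A value of 1 means that every cluster contains regions from a single reference group; lower
values indicate that clusters mix regions from different groups.</p>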
        <p>In addition, the graphs display actual and predicted values of new HIV cases by year and
region, providing convenient data visualization and facilitating decision-making. The aim of this
stage was to study the current state of the epidemic situation and the dynamics of HIV infection
spread by region, and to identify areas of increased risk. The analysis included:
• collection and systematization of statistical data for the years 2019–2024;
• visualization of HIV prevalence on the country map;
• identification of regional differences in morbidity levels.</p>
        <sec id="sec-1-1-1">
          <title>Forecasting morbidity included:</title>
          <p>• preprocessing and normalization of time series data (including smoothing with the moving
average method);
• splitting data into training and testing datasets;
• application of machine learning algorithms to build forecasting models (linear regression,
Random Forest, k-means, gradient boosting);
• evaluation of the accuracy of the constructed models using relevant metrics.</p>
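          <p>The first two steps of this list can be sketched as follows. This is an illustration under
stated assumptions, not the study's implementation: it assumes a trailing moving average with a
window of three points and a chronological hold-out of the last observations. The function names
and the quarterly counts are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def moving_average(values, window=3):
    """Trailing moving average; early points average whatever history is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def chronological_split(series, test_size):
    """Hold out the last test_size points so that no future value leaks into training."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


# Hypothetical quarterly counts of new cases (not taken from the study's dataset).
cases = [120, 135, 90, 150, 160, 110, 170, 180]
smoothed = moving_average(cases, window=3)
train, test = chronological_split(smoothed, test_size=2)
print(len(train), len(test))
```
]]></preformat>
          <p>A chronological split is used here instead of a random one because the observations are
ordered in time; whether the original software splits the data this way is not stated in the text.</p>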
        </sec>
        <sec id="sec-1-1-2">
          <title>The forecasting focused on the following specific indicators:</title>
          <p>• the expected level of HIV incidence in future periods (based on data from previous years);
• identification of regions with potential increases or decreases in incidence;
• detection of patterns and trends that can be used for developing preventive measures.</p>
          <p>Cluster analysis was applied to group Ukrainian regions by the similarity of their
epidemiological situation (HIV prevalence rate, growth dynamics, socio-economic characteristics,
etc.), with the following aims:</p>
          <p>• identification of typical infection spread profiles, allowing for a more targeted approach to
planning countermeasures;</p>
          <p>• improvement of visual and statistical understanding of the data prior to building
forecasting models.</p>
          <p>Figure 1 presents the overall scheme of the process for building and using the information
model for analyzing HIV morbidity, which addresses the tasks described above.</p>
          <p>The diagram covers all key stages, from data loading and processing to task selection, model
training, validation, and final evaluation. The central element is the "Task selection" block,
which integrates functional modules such as Multi-Year Overview, Enhanced Clustering, Spread
Analysis, and War Impact Analysis. These modules represent specific directions of data analysis.
Next, parameter tuning, model training, and quality verification take place, with the possibility of
iterative optimization. Upon achieving acceptable accuracy, the final model evaluation is
performed.</p>
          <p>Input data for the analysis and forecasting of HIV infection incidence are based on official
statistics regarding the number of individuals newly diagnosed with HIV, as well as related medical
and social indicators. The data cover quarterly or annual statistics over several years, allowing for
the analysis of infection spread trends and the construction of forecasting models based on
machine learning methods.</p>
          <p>Main characteristics:
1. year and quarters – data are structured by years and quarters, allowing consideration of
seasonal fluctuations and analysis of the dynamics of new HIV infection cases.
2. number of new HIV cases – for each quarter or year, the number of new infection cases is
indicated. This is the primary indicator for analyzing epidemiological dynamics.
3. medical indicators – additional parameters are taken into account, such as:
• the level of HIV testing coverage among the general population and key groups,
• the percentage coverage of antiretroviral therapy,
• the number of late diagnoses (CD4).</p>
          <p>Regional data – statistics are provided for each oblast of Ukraine, except territories where
official statistics are unavailable (Luhansk, parts of Donetsk, and the Autonomous Republic of
Crimea). This allows for cluster analysis of regions to identify territories with the highest risks.
Data on HIV prevalence in Ukraine were obtained from official materials of the State Institution
“Public Health Center of the Ministry of Health of Ukraine.”</p>
          <p>The input data are a critical element for evaluating the effectiveness of the machine learning
models applied in the process of forecasting HIV incidence. After completing all stages of data
processing and training, the modeling results allow not only for making predictions but also for
quantitatively assessing their accuracy in relation to real data (Table 1).</p>
          <p>As shown in Table 1, the model based on ensemble methods demonstrated the lowest MAE (16.0)
and RMSE (26.5), indicating the highest accuracy among the tested algorithms. Since MAE and
RMSE are absolute error metrics, they represent the average deviation of the forecast from the
actual values in the same units as the target variable, namely the number of HIV cases. This
means that, for example, when using Random Forest, the average error is 34 cases, whereas with
ensemble methods it is only 16 cases. RMSE, which is more sensitive to large deviations, also
confirms the advantage of ensemble methods. Thus, models with lower MAE and RMSE values
better reflect the actual dynamics of incidence and can be recommended for practical use in
forecasting the spread of HIV.</p>
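          <p>For reference, the three metrics can be computed as in the sketch below, which uses their
standard definitions. The function names and the example values are hypothetical and do not
reproduce the figures of Table 1.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math


def mae(actual, predicted):
    """Mean Absolute Error, in the units of the target variable."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error; squaring penalizes large deviations more strongly."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted numbers of new cases.
actual = [100, 150, 200, 250]
predicted = [110, 140, 190, 270]
print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```
]]></preformat>
          <p>In this example one forecast misses by 20 cases and the others by 10, so RMSE is larger than
MAE, which illustrates its greater sensitivity to large deviations.</p>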
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>3. Analysis of the obtained results</title>
        <p>During the study, statistical indicators of HIV incidence across the regions of Ukraine for the
period 2019–2024 were analyzed. Based on these data, the dynamics of key indicators were
plotted: the number of new cases, testing coverage, the number of patients with CD4 counts
below 350, and others.</p>
        <p>Identified trends. Regions with high and stable incidence: Odesa, Dnipropetrovsk, and
Mykolaiv regions demonstrated consistent growth in indicators, indicating a sustained high level of
risk. These regions also experienced a high burden on the healthcare system. Regions with
declining incidence: Khmelnytskyi, Ternopil, and Zakarpattia regions recorded a gradual decrease
in the number of new cases. This may indicate the effectiveness of preventive measures or
improvements in the testing system. Regions with unstable dynamics: Poltava, Cherkasy, and
Kharkiv regions showed irregular data patterns, complicating the interpretation of results and
requiring additional monitoring.</p>
        <p>Results of cluster analysis. Using the K-means method, several clusters of regions were
identified:</p>
        <p>• Cluster 1: regions with a high level of new cases and active testing – Dnipropetrovsk,
Odesa, Kyiv.</p>
        <p>• Cluster 2: regions with a medium level of prevalence and stable dynamics – Vinnytsia,
Lviv, Zaporizhzhia.</p>
        <p>• Cluster 3: regions with a low detection rate and probable undercoverage of the population
– Chernivtsi, Rivne, Volyn.</p>
        <p>The obtained results allow us to assert that the regional approach to the prevention and
treatment of HIV infection should be differentiated. Regions with a high burden require intensive
support, whereas in regions with low indicators, it is important to ensure the accuracy and
completeness of reporting.</p>
        <p>Based on the clustering maps for 2019 and 2023, significant changes in the cluster structure of
certain regions can be observed. In particular, Donetsk region still belonged to cluster 2 in 2021,
but in 2022 and 2023 it shifted to cluster 1 (Figure 2), which may indicate a decrease in officially
registered cases or problems with data access due to the armed conflict. Kherson region shows a
similar dynamic, changing from cluster 2 to cluster 1 (Figure 2). This may also be related to the
temporary occupation of part of the territory, changes in the reporting system, or a decrease in
case detection due to limited access to medical services.</p>
        <p>These changes indicate that the cluster structure is not static and is sensitive to socio-economic
and political changes. Therefore, dynamic cluster analysis is an important element of
epidemiological monitoring. Within the scope of this study, the epidemiological development of
HIV infection was forecasted using machine learning methods.</p>
        <p>
          As an example, consider a comparison of the statistical data for 2023 with the results of
K-means clustering for the same year. Based on the available statistics [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], the highest incidence rates in
that year were recorded in Odesa (78.8) and Dnipropetrovsk (42.1) regions.
The visualization of the clustering results provides a generalized representation of these data by
grouping regions with similar levels of HIV incidence into clusters (Figure 2). Cluster One: includes
the majority of Ukrainian regions with a relatively low level of HIV incidence. This cluster covers a
significant part of northern, central, and western Ukraine. Cluster Two: includes regions with a
higher level of HIV incidence, mainly located in the eastern part of the country. Cluster Three:
corresponds to the regions with the highest level of HIV incidence (Dnipropetrovsk and Odesa),
which stand out according to statistical data. These regions form a separate cluster, indicating the
severity of the HIV problem in these areas.
        </p>
        <p>The clustering map simplifies the understanding of the geographical distribution of HIV, as it
replaces the need to analyze individual figures for each region with a visual representation of
regional trends. The modeling approaches employed included linear regression (Figure 3), the
Random Forest algorithm (Figure 4), and ensemble methods, notably gradient boosting.</p>
        <sec id="sec-1-2-1">
          <title>The models were trained on preprocessed time series data.</title>
          <p>The resulting forecasts enabled the estimation of the number of new infection cases (Figure 4),
the prevalence of HIV across different regions, as well as the growth rates of incidence (Figure 5).</p>
          <p>The Ensemble model demonstrated several advantages on this dataset, including the highest
accuracy among all tested models, as shown in Figure 6 (R² = 0.997, MAE = 3.33). This model
combines the strengths of multiple algorithms, which reduces the errors inherent to individual
models and ensures greater stability of the results. Consequently, it is less sensitive to variations in
the data or noise, making it a robust choice for predicting HIV incidence.</p>
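          <p>Why combining models can reduce error is easiest to see with a simple averaging ensemble.
The sketch below is only an illustration of that idea: it averages the predictions of two
hypothetical base models and is not the gradient boosting ensemble used in the study. All names
and numbers are assumptions.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def average_ensemble(prediction_sets, weights=None):
    """Combine per-model predictions into one weighted-average forecast."""
    n_models = len(prediction_sets)
    if n_models == 0:
        raise ValueError("at least one set of predictions is required")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    length = len(prediction_sets[0])
    if any(len(preds) != length for preds in prediction_sets):
        raise ValueError("all models must predict the same number of points")
    total = sum(weights)
    return [
        sum(w * preds[i] for w, preds in zip(weights, prediction_sets)) / total
        for i in range(length)
    ]


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values: model_a tends to overestimate, model_b to underestimate.
actual = [100, 120, 140]
model_a = [110, 126, 150]
model_b = [94, 110, 134]
combined = average_ensemble([model_a, model_b])

for name, preds in [("A", model_a), ("B", model_b), ("ensemble", combined)]:
    print(name, round(mean_abs_error(actual, preds), 2))
```
]]></preformat>
          <p>Because the two base models err in opposite directions in this example, their errors partly
cancel in the average. Real ensembles benefit in the same way only to the extent that the errors
of the base models are not strongly correlated.</p>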
          <p>For a deeper understanding of the changes in the epidemic situation in Ukraine, an analysis of
the dynamics of HIV spread during the period 2015–2024 was conducted. The graphical
representation of the data made it possible to identify both long-term trends of increasing or
decreasing incidence, as well as short-term anomalies that may indicate the influence of external
factors or changes in reporting practices.</p>
          <p>To identify HIV infection spread zones in Ukraine based on epidemiological similarity, the
K-means clustering algorithm was applied (Figure 7). During the analysis, the selection of three
clusters was justified (Figure 8), representing regions with high, medium, and low levels of HIV
prevalence. This approach enabled the structuring of data and identification of typical infection
spread profiles, which is beneficial for regional planning of prevention and control measures.</p>
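          <p>The elbow method used to justify the number of clusters can be sketched as follows. The
example runs a basic K-means (Lloyd's algorithm) on a single hypothetical indicator and reports
the within-cluster sum of squares (inertia) for each k. The one-dimensional data, the
deterministic initialization, and the function name kmeans_1d are assumptions made for the
illustration; the study's software clusters regions on several indicators.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def kmeans_1d(values, k, max_iter=100):
    """Basic K-means on scalar data; returns (centroids, inertia)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic start: centroids taken at evenly spaced positions of the sorted data.
    centroids = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, inertia


# Hypothetical incidence rates per 100,000 population for nine regions.
incidence = [8, 10, 12, 28, 30, 32, 70, 76, 82]
inertia_by_k = {k: kmeans_1d(incidence, k)[1] for k in range(1, 5)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
```
]]></preformat>
          <p>For these values the inertia falls sharply up to k = 3 and barely changes at k = 4, which is
the "elbow" that suggests three clusters.</p>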
          <p>The results obtained demonstrate a high degree of consistency between the predicted and actual
values, indicating a strong quality of the modeling process.</p>
          <p>The collected statistical data for the period 2019–2024 were structured and analyzed, followed
by the construction of a geographic visualization (Figure 9). The visual map enables prompt
identification of regions with the highest incidence rates, which is critical for assessing territorial
risk and strategic planning of healthcare interventions (Figure 10, example from 2024).</p>
          <p>The study also included a statistical analysis of the impact of emergency situations,
specifically the COVID-19 pandemic and the full-scale war that began in 2022. This
functionality was implemented programmatically, with the results presented in the form of a
comprehensive analytical report. The analysis covered data from 2019 to 2024 and focused on the
differences between the pre-war period (2019–2021) and the wartime period (2022–2024).</p>
          <p>It was established that until 2020, the annual number of HIV antibody screening tests in Ukraine
remained stable, ranging from 2.3 to 2.5 million. However, due to quarantine restrictions associated
with the COVID-19 pandemic and as a consequence of the war in 2022, the number of tests
dropped to a record low of 1.6 million. In 2023, the volume of examinations increased by 40%,
reaching 2.25 million, primarily owing to a 70% rise in testing initiated by healthcare professionals
and patients. Accordingly, in 2023, the HIV antibody testing rate per 100,000 population increased
1.6-fold compared with 2022. Between 2019 and 2023, the overall HIV prevalence decreased from
0.9% to 0.6%. This decline can be attributed to the exclusion, starting in 2022, of HIV/AIDS
statistical data from the Donetsk, Luhansk, Zaporizhzhia, and Kherson regions, which were
previously known for high HIV prevalence.</p>
          <p>From 2019 to 2023, a marked decrease in the HIV incidence rate was observed against the
backdrop of the COVID-19 pandemic and the war of 2022, falling from 42.5 to 28.4 per 100,000
population. Traditionally, the highest HIV incidence per 100,000 population has been recorded in
the south-eastern region of Ukraine. In 2023, the highest rates were registered in Dnipropetrovsk
and Odesa regions. Active migration flows in 2022–2023 contributed to a 1.5–2-fold increase in HIV
incidence in the western and central regions of Ukraine. For example, in Volyn region, the rate rose
from 9.7 to 16.6 per 100,000 population; in Zakarpattia, from 5.9 to 9.0; in Lviv, from 14.6 to 21.2; in
Khmelnytskyi, from 11.0 to 15.7; in Chernivtsi, from 7.1 to 10.4; and in Chernihiv, from 28.4 to 38.6.
In Kyiv, the indicator increased from 29.5 to 36.8.</p>
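          <p>The fold increase implied by each pair of rates is the later value divided by the earlier
one. The short sketch below computes these ratios from the figures quoted in the preceding
paragraph; the variable names are hypothetical, and the rates are copied from the text as given.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Incidence per 100,000 population as quoted in the text: (earlier value, later value).
rates = {
    "Volyn": (9.7, 16.6),
    "Zakarpattia": (5.9, 9.0),
    "Lviv": (14.6, 21.2),
    "Khmelnytskyi": (11.0, 15.7),
    "Chernivtsi": (7.1, 10.4),
    "Chernihiv": (28.4, 38.6),
    "Kyiv": (29.5, 36.8),
}

# Fold change = later rate / earlier rate.
fold_change = {region: after / before for region, (before, after) in rates.items()}

for region, ratio in sorted(fold_change.items(), key=lambda item: item[1], reverse=True):
    print(f"{region}: {ratio:.2f}-fold")
```
]]></preformat>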
          <p>It was also found that at the current stage of the HIV epidemic, as in previous stages, the
majority of HIV-positive individuals reside in urban areas (77% of new HIV cases in 2023). Among
them, 65% are men, and 78% belong to the 25–49 age group. It is noteworthy that the epidemic is
aging: over the past five years, the proportion of individuals first diagnosed with HIV at age 50 or
older has increased from 16% to 19%.</p>
          <p>During the war, with the support of international organizations, it was possible to rapidly
organize the provision of preventive and therapeutic HIV services in healthcare facilities of various
profiles. This led to the continuation, in 2022–2023, of the pre-COVID trend in HIV transmission
routes: the proportion of sexually transmitted cases increased (from 68.3% to 74.6%), while
parenteral transmission through injecting drug use decreased (from 31.3% to 25.4%).</p>
          <p>The data demonstrate a gradual decline in the share of parenteral transmission (through
injecting drug use) and an increasing role of sexual transmission. This indicates a transformation of
the epidemic: from concentration within key risk groups to wider dissemination in the general
population, necessitating a reorientation of prevention strategies.</p>
          <p>Conclusions based on the results of the performed calculations:
Preliminary data preprocessing revealed no missing values in the dataset. The data were scaled,
checked for anomalies, and 95% of the useful information was retained after cleaning.</p>
          <p>An initial clustering was performed using the K-means method, where the optimal number of
clusters was determined by the elbow method. The clusters were successfully visualized on the
map of Ukraine for different years.</p>
          <p>For forecasting disease incidence, the Random Forest method was employed, achieving high
accuracy with a mean absolute error of approximately 12.6, root mean squared error of about 15.4,
and R² = 0.937. The model was trained in approximately 0.4 seconds. Additionally, the K-Nearest
Neighbors method was tested, which demonstrated lower accuracy on the test set with an R² of
0.82. To smooth the time series data, a moving average method was applied, which reduced noise
influence and improved forecast stability.</p>
          <p>The visualization of results was carried out using an
interactive map of Ukraine, where each region was automatically colored according to the risk
level or cluster membership. It can be observed that HIV incidence in Ukraine exhibits a distinct
geographic distribution. The highest concentration of cases is found in the eastern and southern
regions, whereas most oblasts in central and western Ukraine demonstrate lower incidence rates.</p>
          <p>Given the varying levels of HIV incidence across different clusters, a differentiated approach is
necessary for the development and implementation of prevention programs. For instance, in
regions belonging to the third cluster (green), more intensive measures focused on prevention,
testing, and treatment of HIV are required. Clustering allows for the identification of priority
regions for the implementation of HIV control programs. Concentrating resources and efforts in
regions with the highest incidence rates may be more effective than distributing them evenly
across the entire country. The identified clusters can be used for further investigation of factors
influencing the spread of HIV in each region. For example, socio-economic conditions, behavioral
factors, and accessibility of medical services can be studied across different clusters.</p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>4. Conclusions</title>
        <p>This study conducted a detailed analysis of the epidemiological situation of HIV infection in
Ukraine based on multi-year statistical data. Special attention was given to the dynamics of disease
prevalence in the regions before and after the onset of the full-scale war, as well as to identifying
trends and changes in the regional distribution of indicators.</p>
        <p>To achieve the stated objectives, a comprehensive set of approaches was employed, including:
• clustering of Ukrainian regions using the K-means method to identify groups of regions
with similar HIV prevalence characteristics;</p>
        <p>• identification of regions with high, medium, and low levels of infection spread, enabling
clear delineation of risk zones;</p>
        <p>• analysis of cluster stability over time, which demonstrated that some regions change their
cluster affiliation, while others remain stable (e.g., Dnipropetrovsk and Odesa).</p>
        <p>For the forecasting tasks, machine learning models were implemented and tested, including
Random Forest, ensemble methods, and the K-Nearest Neighbors algorithm.</p>
        <p>The evaluation of model accuracy using MAE, RMSE, and R² metrics confirmed the high
effectiveness of the forecasts. The outcome of the work was the development of an information
system in the form of a software application, which: allows users to select forecast parameters
(year, quarter); performs clustering; visualizes results through graphs and maps; provides the
ability to save results in a convenient format. The information system is designed for practical
application in the field of public health, supports visual analysis of trends, and can be adapted for
other infectious diseases or medical statistics indicators. Thus, the use of machine learning methods
combined with visual tools significantly enhances the quality of epidemiological analysis and
improves the effectiveness of decision-making in healthcare.</p>
        <p>A promising direction for future investigation is the assessment of complex
sociodemographic factors. In particular, data on population migration, especially internal movements
of people, are critically important, because the movement of large groups of people can
affect access to medical services, testing, and treatment. Changes in the behavior of the population
caused by war, such as increased use of drugs and alcohol, can also contribute to the
spread of HIV. Integrating these factors will make it possible to build more accurate
models and more effective approaches to countering the HIV/AIDS epidemic.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Acknowledgements</title>
        <p>This study was funded by the National Research Foundation of Ukraine in the framework of the
research project 2023.03/0197 on the topic “Multidisciplinary study of the impact of emergency
situations on the infectious diseases spreading to support management decision making in the field
of population biosafety”.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors did not use Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Hoidyk</surname>
          </string-name>
          ,
          <article-title>Overview of the epidemiological situation of HIV/AIDS in Odesa region</article-title>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Holubovska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. I.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. V.</given-names>
            <surname>Bezrodna</surname>
          </string-name>
          ,
          “
          <article-title>The role of primary health care in patients with blood-borne infections (HIV infection and hepatitis B and C)</article-title>
          ,”
          <source>Infectious Diseases</source>
          , no.
          <issue>1</issue>
          (
          <year>2017</year>
          ):
          <fpage>5</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Dovbysh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Vasylyev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. O.</given-names>
            <surname>Liubchak</surname>
          </string-name>
          ,
          <source>Intelligent Information Technologies in E-learning</source>
          , Sumy: Sumy State University,
          <year>2013</year>
          , 172 p.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Yu. P.</given-names>
            <surname>Zaichenko</surname>
          </string-name>
          ,
          <source>Fundamentals of Designing Intelligent Systems: A Textbook</source>
          , Kyiv: Slovo,
          <year>2004</year>
          , 352 p.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Kaliuzhna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V.</given-names>
            <surname>Hrechanska</surname>
          </string-name>
          ,
          “
          <article-title>Associations of sexually transmitted infections in HIV-infected individuals</article-title>
          ,”
          <source>Ukrainian Journal of Dermatology, Venereology, Cosmetology</source>
          , no.
          <issue>1</issue>
          (
          <year>2004</year>
          ):
          <fpage>78</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V. F.</given-names>
            <surname>Mariievskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. I.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          “
          <article-title>Determining promising directions for countering HIV infection in the current epidemic situation</article-title>
          ,”
          <source>Infectious Diseases</source>
          , no.
          <issue>4</issue>
          (
          <year>2013</year>
          ):
          <fpage>17</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chumachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Meniailov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bazilevych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chumachenko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          , “
          <article-title>Investigation of Statistical Machine Learning Models for COVID-19 Epidemic Process Simulation: Random Forest, K-Nearest Neighbors, Gradient Boosting</article-title>
          ,”
          <source>Computation</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>6</issue>
          , p.
          <fpage>86</fpage>
          ,
          <year>2022</year>
          , doi: https://doi.org/10.3390/computation10060086.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohammadi</surname>
          </string-name>
          , et al., “
          <article-title>Comparative study of linear regression and SIR models of COVID-19 propagation in Ukraine before vaccination</article-title>
          ,”
          <source>Radioelectronic and Computer Systems</source>
          , vol.
          <volume>2021</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2021</year>
          , doi: https://doi.org/10.32620/reks.2021.3.01.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chumachenko</surname>
          </string-name>
          et al., “
          <article-title>Methodology for assessing the impact of emergencies on the spread of infectious diseases</article-title>
          ,”
          <source>Radioelectronic and Computer Systems</source>
          , vol.
          <volume>2024</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>6</fpage>
          -
          <lpage>26</lpage>
          , Aug.
          <year>2024</year>
          , doi: https://doi.org/10.32620/reks.2024.3.01.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Public Health Center of Ukraine, HIV/AIDS Statistics. Available at: https://phc.org.ua/kontrolzakhvoryuvan/vilsnid/statistika-z-vilsnidu</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>