<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Application of machine learning for predicting fraudulent anomalies in financial transactions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>victoria.a.vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmytro Uhryn</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Iliuk</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuriy Ushenko</string-name>
          <email>y.ushenko@chnu.edu.ua</email>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Networks Department, Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera Street 12 79013 Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>Ruska Street 56, 46025, Ternopil</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Twente</institution>
          ,
          <addr-line>Drienerlolaan 5, 7522 NB, Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vinnytsia</institution>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Yuriy Fedkovych Chernivtsi National University</institution>
          ,
          <addr-line>Kotsiubynskoho Street 2 58012, Chernivtsi</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The article is devoted to the topical problem of detecting fraudulent anomalies in financial transactions using machine learning methods. In the context of the rapid digital transformation of financial systems and growing transaction volumes, traditional methods of fraud detection are becoming ineffective, which highlights the urgent need to implement automated and adaptive solutions. The research is based on a step-by-step approach that includes data preparation and processing, building and training classification models, and evaluating their effectiveness. A comparative analysis of seven popular machine learning algorithms was conducted: logistic regression, decision trees, random forest, neural networks, gradient boosting, XGBoost, and SVC. The key findings of the study showed that ensemble methods demonstrate the highest effectiveness in detecting fraud: Random Forest, Gradient Boosting, and XGBoost proved to be the most suitable for fraud detection tasks, demonstrating consistently high results. This is especially important given the typical class imbalance (a small number of fraudulent transactions compared to legitimate ones) in real financial data. These models significantly outperform the other algorithms considered, indicating their ability to detect complex, non-obvious patterns in the data. The critical importance of correctly configuring model hyperparameters and accounting for class imbalance to achieve maximum accuracy and completeness in detecting fraudulent transactions has been confirmed. This avoids overfitting on the dominant class and increases the system's sensitivity to rare but important fraudulent cases. The practical significance of the study lies in the fact that the proposed approach allows financial institutions to significantly improve operational efficiency, minimize financial losses, and strengthen customer trust. The implementation of such systems provides comprehensive and adaptive protection of the financial system in today's dynamic digital environment. The results of the study confirm the effectiveness of machine learning as a powerful tool for combating financial fraud.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine learning</kwd>
        <kwd>financial fraud</kwd>
        <kwd>anomaly prediction</kwd>
        <kwd>ensemble methods</kwd>
        <kwd>Random Forest</kwd>
        <kwd>Gradient Boosting</kwd>
        <kwd>XGBoost</kwd>
        <kwd>class imbalance</kwd>
        <kwd>financial transactions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In today’s world, financial systems are undergoing rapid digital transformation, marked by
growing transaction volumes and increasingly complex financial operations. These changes,
however, bring heightened risks of fraudulent activity, posing a serious threat to both financial
institutions and their clients. Fraud in the financial sector is becoming more sophisticated,
leveraging modern technologies to bypass traditional security measures.</p>
      <p>According to the Association of Certified Fraud Examiners (ACFE), global organizations lose
over 5% of their annual revenue to financial fraud. The 2024 report highlights that 53% of fraud
cases were linked to factors stemming from the COVID-19 pandemic, and for the first time since
2016, the average loss per case increased. Criminals are increasingly using cryptocurrencies to
cover their tracks and often operate in regions with weaker financial oversight. On average, a
typical fraud scheme lasts around 12 months before detection, underscoring the urgent need for
more effective monitoring tools.</p>
      <p>Traditional fraud detection methods—relying on static rules and manual analysis—are
insufficient for today's challenges, driving interest in machine learning and AI for automated,
adaptive real-time fraud detection. This paper develops a machine learning approach for
identifying financial anomalies, aiming to create an effective system that detects suspicious
transactions early to minimize losses, reputational harm, and legal costs. Automation boosts
efficiency, ensures regulatory compliance, and builds client trust. Ultimately, advanced analytics
provide comprehensive financial system protection, bolstering resilience in a dynamic digital
landscape.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Digital transformation has reshaped financial transactions, making them faster, more convenient,
and accessible via innovations like digital banking, mobile payments, cryptocurrencies, and fintech,
which boost interconnectivity. Yet, it amplifies risks, especially financial fraud, threatening
institutions' stability and customer trust.</p>
      <p>
        Fraud includes identity theft, phishing, card fraud, money laundering, etc. Research [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]
highlights growing challenges in countering it amid surging transaction volumes and evolving
tactics.
      </p>
      <p>
        Traditional methods—rule-based systems, risk filters, manual reviews—rely on fixed criteria
(e.g., amount, location) but show limited effectiveness [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7">3-7</xref>
        ] in dynamic settings, often detecting
fraud post-fact and causing losses. Figure 1 illustrates the rising payment fraud attempts over time.
      </p>
      <p>
        The emergence of modern technologies—particularly machine learning (ML), artificial
intelligence (AI), and big data—has opened new opportunities for detecting and preventing
complex fraud schemes. ML algorithms enable real-time transaction analysis, anomaly detection,
and adaptive learning based on changing patterns of malicious behavior [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8–10</xref>
        ]. Behavioral
analytics plays a key role as well, allowing institutions to build customer profiles and identify
deviations from typical interaction patterns [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11–13</xref>
        ].
      </p>
      <p>
        In addition, technologies such as blockchain and distributed ledger technology (DLT) enhance
the transparency of financial processes and make fraudulent activities more difficult to execute by
ensuring that data entries cannot be altered or forged without leaving a trace [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
      <p>
        Comprehensive, ready-to-deploy fraud detection systems are rarely available in the public
domain. Most fraud detection models are developed under commercial contracts tailored to specific
financial institutions or enterprises, using confidential, business-specific training data that restricts
applicability in academic or open-source projects. Public alternatives are research models on
platforms like Kaggle, using open datasets (e.g., Credit Card Fraud Detection) for full development,
from data cleaning and normalization to model building and visualization. These are valuable for
evaluating algorithms in controlled environments, but challenging to integrate into real-world
processes due to variances in data scale, structure, and dynamics.
      </p>
      <p>
        Academic literature reflects a growing interest in applying AI methods to combat financial
fraud. Studies [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] explore the use of neural networks,
decision trees, naïve Bayes classifiers, and ensemble models. Some researchers [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ] have
proposed using recurrent neural networks (RNNs) to process transaction sequences, achieving
significantly improved results compared to traditional algorithms [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. There is also an increasing
emphasis on the need for continuous model adaptation to evolving fraud patterns [
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ]. To
address this, hybrid systems combining supervised and unsupervised learning are being proposed,
enabling the detection of new, previously unclassified types of fraud.
      </p>
      <p>
        According to analytical reports (ACFE, 2022; PwC, 2023), fraud detection is becoming a
multidisciplinary challenge that spans not only risk management and IT but also marketing,
customer service, and strategic management. Modern organizations must integrate risk analytics
into all business processes, building collaborative teams that bring together experts from various
fields. The cost of financial fraud is typically assessed by calculating both direct and indirect
losses: fraudulent transaction losses, software and tool expenses, analyst salaries, legal fees, and
opportunity costs resulting from diminished customer trust. A visualization of this approach is
presented in the diagram below (Figure 2).
      </p>
      <p>This diagram illustrates a comprehensive assessment of the financial impact of fraud on an
organization, capturing the various components that contribute to the overall cost:
1. Direct fraud losses. These are the actual financial damages incurred due to fraudulent
activity, such as unauthorized withdrawals, transaction manipulation, or data theft.
2. Human resources and salary costs. Expenses related to the personnel involved in fraud
detection, investigation, and prevention, including their salaries, working hours, and other
risk management-related costs.
3. Fraud protection tools and staffing. Investments in software, technical security systems,
analytical platforms, and cybersecurity and compliance specialists.
4. Profit margin allocated to fraud-related costs. A portion of the company’s overall profit that
must be redirected toward covering fraud-related expenses, both direct and indirect.</p>
      <p>The diagram highlights that the cost of fraud extends far beyond direct financial losses—it also
includes expenditures on staff, technology, prevention tools, and lost profits. For this reason,
organizations are increasingly motivated to implement efficient fraud detection and prevention
systems to minimize the total financial burden.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem formulation</title>
      <p>The aim of this study is to develop an effective system for detecting fraudulent transactions using
machine learning methods. To achieve this, it is necessary to perform a set of tasks that cover the
stages of data preparation, model development and tuning, as well as evaluation of their
performance.</p>
      <p>Data preparation and preprocessing. The first step is to create a high-quality input dataset,
which includes several stages:
Data cleaning. This involves removing duplicates, eliminating incorrect entries, and
handling missing values.</p>
      <p>Normalization of numerical features. For example, scaling transaction amounts to ensure
stable performance of the algorithms.</p>
      <p>Encoding categorical variables. This is achieved through techniques such as one-hot
encoding or label encoding.</p>
      <p>Feature engineering. At this stage, new features are generated, including temporal,
geographical, or behavioral patterns that may be informative for classification.</p>
      <p>Building machine learning models. The next stage is the selection and configuration of
models capable of recognizing anomalous transactions. Since the problem is a classification
task, several approaches will be tested:</p>
      <sec id="sec-3-1">
        <title>Candidate models</title>
        <p>Logistic regression; decision trees; ensemble methods, including Random Forest, Gradient Boosting, and XGBoost; and artificial neural networks.</p>
        <p>Because the dataset is often highly imbalanced, with fraudulent transactions representing only
about 1% of the total, special training strategies are required. These include adjusting class weights,
oversampling, or undersampling. In addition, hyperparameter optimization will be performed to
improve the accuracy and stability of the models.</p>
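        <p>The oversampling strategy mentioned above can be sketched as follows. This is an illustrative example, not the study's code: the toy DataFrame and its Amount and Class columns are assumptions. The in-model alternative, class weighting, is available in scikit-learn classifiers via class_weight="balanced".</p>
        <preformat>
```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced dataset (column names are illustrative assumptions).
df = pd.DataFrame({"Amount": range(100), "Class": [0] * 95 + [1] * 5})

majority = df[df["Class"] == 0]
minority = df[df["Class"] == 1]

# Oversample the minority class with replacement until the classes match.
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])
```
        </preformat>
        <p>Undersampling is the mirror image: resample the majority class down to the minority size, trading information loss for a smaller, balanced training set.</p>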
        <p>3. Evaluation of model performance. The effectiveness of the models will be measured using
several metrics: Accuracy, Recall, Precision, F1-score, ROC-AUC. To provide deeper
insights, the models will be compared on datasets with different class distributions,
including 50/50, 99/1, and 83/17. This will allow for assessment of system robustness under
varying levels of imbalance and adaptability to real-world scenarios.
4. Data selection and preparation for experimentation. Since access to real financial
transaction data is limited, publicly available datasets from Kaggle will be used. The choice
of datasets will be guided by the availability of fraud labels, the structure of the data fields
(categorical and numerical features), and sample size. After being downloaded, the data will
undergo preprocessing and only then will be used for training the models.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methods and materials</title>
      <p>In this study, a machine learning method detects fraudulent anomalies in financial transactions
to minimize losses for institutions, businesses, and individuals. It is implemented stepwise using
modern Python libraries.</p>
      <p>Stage 1: Data preparation. Load dataset with pandas; clean by removing missing values,
duplicates, and irrelevant attributes. Result: structured DataFrame.</p>
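      <p>A minimal sketch of Stage 1; the column names and the CSV file name are assumptions for illustration:</p>
      <preformat>
```python
import pandas as pd

def clean(df: pd.DataFrame, drop_cols=("ID",)) -> pd.DataFrame:
    """Stage 1: remove duplicates, missing values, and irrelevant attributes."""
    df = df.drop_duplicates()                                   # duplicate records
    df = df.dropna()                                            # missing values
    df = df.drop(columns=[c for c in drop_cols if c in df.columns])  # irrelevant fields
    return df

# Typical usage (the file name is a placeholder):
# df = clean(pd.read_csv("transactions.csv"))
```
      </preformat>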
      <p>Stage 2: Data scaling and normalization. Normalize numerical features (e.g., Amount) via
RobustScaler from sklearn.preprocessing (median-based, outlier-resistant). Essential for
scale-sensitive models like logistic regression, SVMs, neural networks, and gradient descent. Remove
unsuitable features (e.g., ID, geography, non-informative fields).</p>
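      <p>A sketch of this normalization step with made-up amounts: RobustScaler centers on the median and scales by the interquartile range, so the single extreme value barely distorts the rest.</p>
      <preformat>
```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Illustrative transaction amounts with one extreme outlier.
amount = np.array([[2.0], [3.0], [5.0], [1000.0]])

# Median/IQR-based scaling: resistant to the outlier in the last row.
scaled = RobustScaler().fit_transform(amount)
```
      </preformat>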
      <p>Stage 3: Training/test set formation. Split data 80/20 using train_test_split (test_size=0.2,
random_state=42, stratify=y) to prevent overfitting, tune hyperparameters, evaluate on unseen
data, and preserve class balance.</p>
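      <p>The split described above can be reproduced as follows; synthetic arrays stand in for the real feature matrix and labels:</p>
      <preformat>
```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)          # placeholder feature matrix
y = np.array([0] * 90 + [1] * 10)   # imbalanced labels (10% positives)

# 80/20 stratified split with fixed randomness, as in the methodology.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```
      </preformat>
      <p>With stratify=y, the 10% positive rate is preserved in both partitions, so the 20-record test set contains exactly two fraudulent examples.</p>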
      <p>Stage 4: Model construction and training. Binary classification (fraud/non-fraud) is applied.
Models (e.g., logistic regression, decision trees, random forest, gradient boosting) are tuned
individually with parameters like tree count, depth, or learning rate, balancing quality and
resources.</p>
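      <p>A sketch of this stage for two of the listed models, using the solver and depth settings mentioned later in this section; the synthetic dataset is an assumption standing in for transaction data.</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced binary task (fraud vs. non-fraud stand-in).
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

# Logistic regression with class weighting and the saga solver.
logreg = LogisticRegression(solver="saga", class_weight="balanced",
                            max_iter=2000).fit(X, y)

# Depth-limited decision tree with node-split thresholds to curb overfitting.
tree = DecisionTreeClassifier(max_depth=5, min_samples_split=20,
                              min_samples_leaf=10, random_state=42).fit(X, y)
```
      </preformat>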
      <p>Stage 5: Performance evaluation. Assess models using confusion matrix (TP, FP, FN, TN),
classification report (precision, recall, F1-score), AUC-ROC (class separation at varying thresholds),
and accuracy (correct predictions share; supplementary due to imbalance).</p>
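      <p>These metrics can be computed with scikit-learn; the tiny label and score vectors below are illustrative only:</p>
      <preformat>
```python
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

y_true  = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]      # ground-truth labels
y_pred  = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]      # hard predictions
y_score = [0.1, 0.2, 0.1, 0.6, 0.9, 0.8, 0.4, 0.3, 0.7, 0.2]  # fraud probabilities

# Confusion matrix unpacked as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Per-class precision, recall, F1-score.
report = classification_report(y_true, y_pred)

# Class separation across thresholds.
auc = roc_auc_score(y_true, y_score)
```
      </preformat>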
      <p>The methodology is adaptable to other fraud detection datasets.</p>
      <p>The study evaluated seven ML algorithms for financial anomaly detection; each has strengths
and limitations suiting specific data types.</p>
      <p>1. Logistic Regression. Despite its simplicity, this model can be applied to binary classification.</p>
      <p>It allows for class weighting, which is important for imbalanced datasets. The best results
were obtained with the saga solver, while liblinear was more effective for smaller samples. The model is
limited in its ability to capture complex nonlinear relationships and requires prior feature
scaling.
2. Decision Tree. An interpretable and fast model that does not require feature scaling. In the
study, it demonstrated a tendency toward overfitting, which is why depth was restricted to
five levels, and thresholds for minimum node splits and samples per leaf were introduced.
The model is sensitive to changes in data and performs worse than ensemble methods in
terms of accuracy.
3. Random Forest. An ensemble approach based on decision trees that significantly reduces
the risk of overfitting. It showed consistently high results across all datasets. The model
used 50 trees with a maximum depth of 10 levels and a minimum of 10 samples per leaf.</p>
      <p>Another advantage is the ability to determine feature importance.
4. Neural Network (MLP Classifier). A multilayer model that performs well with complex and
high-dimensional data. Its main advantages are flexibility and the ability to model complex
dependencies. However, the model requires careful tuning of its architecture, considerable
computational resources, and a large amount of training data. It is also difficult to interpret.
5. Gradient Boosting. An ensemble method with sequential tree training that corrects the
errors of previous models. It demonstrated high accuracy with moderate depth (up to 3) and
number of trees (100). The learning rate was set at 0.1 to balance speed and performance. A
key advantage is robustness to imbalanced data due to class weight support.
6. XGBoost. An optimized version of gradient boosting with high performance. It supports
handling missing values and includes built-in regularization that reduces the risk of
overfitting. It was applied with the same parameters as Gradient Boosting and confirmed its
effectiveness across different datasets.
7. Support Vector Classifier (SVC). One of the most accurate but also the most
resource-intensive algorithms. It performs best on imbalanced datasets, where class inequality can be
compensated by class weights. Training took a significant amount of time (up to an hour),
but the model demonstrated strong capability in separating complex classes.</p>
      <p>These results confirm the suitability of ensemble methods (Random Forest, Gradient Boosting,
XGBoost) for fraud detection tasks and highlight the importance of parameter tuning and class
imbalance handling to improve model accuracy.</p>
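      <p>As a sketch, the ensemble configurations reported above can be instantiated as follows on synthetic data. XGBoost's XGBClassifier would take analogous parameters but requires the separate xgboost package, so it is omitted here.</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic imbalanced stand-in for transaction data.
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

# Random Forest: 50 trees, maximum depth 10, at least 10 samples per leaf.
rf = RandomForestClassifier(n_estimators=50, max_depth=10,
                            min_samples_leaf=10, random_state=0).fit(X, y)

# Gradient Boosting: 100 trees, depth up to 3, learning rate 0.1.
gb = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                learning_rate=0.1, random_state=0).fit(X, y)
```
      </preformat>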
      <p>Data splitting into training and test sets is a fundamental stage in the machine learning process,
as it allows for an objective evaluation of the model’s ability to generalize to new, unseen data.
This approach divides the dataset into two parts.</p>
      <p>The training set is used to build the model and usually represents 70–80% of the data,
providing the model with a wide range of examples to learn transaction patterns of
different types. The larger the training set, the more patterns, both legitimate and
fraudulent, the model can capture, which increases its predictive effectiveness.</p>
      <p>The test set consists of the remaining 20–30% of the data, which is not used during training.
It serves to evaluate the performance of the model on new examples, allowing for an
assessment of its generalization ability. This corresponds to the supervised learning
concept, where the model is first “trained” and then “tested” on an independent set.</p>
      <p>When splitting data, parameters include: test_size=0.2 (20% for testing), stratify=y (ensures
proportional class representation in train/test sets, crucial for imbalances), random_state=42 (fixes
randomness for reproducibility and bias prevention). These minimize overfitting, where models
perform well on training data but poorly on new data.</p>
      <p>Evaluation uses the classification report to summarize key metrics; Precision is the proportion
of correctly predicted positives among all predicted positives.</p>
      <p>A high precision value indicates a small number of false positives. It is calculated using the
formula:</p>
      <p>Precision = TP / (TP + FP) (1)</p>
      <p>where TP represents true positives and FP represents false positive predictions. Recall (sensitivity)
is a metric that reflects the model’s ability to detect all actual positive cases. It is calculated as the
proportion of correctly classified positive results among all real positive examples. A high recall
value indicates a small number of false negative predictions. Formally, it is computed using the
formula:</p>
      <p>Recall = TP / (TP + FN) (2)</p>
      <p>where TP represents true positives and FN represents false negatives. The F1-score is the harmonic
mean between precision and recall, allowing a balance between these two metrics. It is particularly
useful in cases of imbalanced data, where it is important to account for both false positive and false
negative predictions. It is calculated using the following formula:</p>
      <p>F1 = 2 / (recall⁻¹ + precision⁻¹) = 2 · precision · recall / (precision + recall) (3)</p>
      <p>Support: Number of actual instances per class in the test set; aids in interpreting precision and
recall relative to class sizes.</p>
      <p>ROC-AUC: Measures model's class distinction at varying thresholds. ROC curve plots TPR vs.
FPR as threshold changes. AUC: 1.0 = perfect, 0.5 = random guessing, &lt;0.5 = worse than random.</p>
      <p>Accuracy: Proportion of correct classifications overall. Simple but misleading on imbalanced data
(e.g., always predicting a 95% majority class yields 95% accuracy but fails to detect the minority):</p>
      <p>Accuracy = (TP + TN) / (TP + TN + FP + FN) (4)</p>
      <p>where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
Metrics like the confusion matrix, AUC-ROC, and accuracy enable comprehensive model evaluation,
highlighting strengths and limitations. This supports informed selection of optimal methods,
forming the basis for effective fraud detection systems that minimize risks for businesses and
users.</p>
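      <p>A worked numeric instance of formulas (1)–(4), using an assumed confusion matrix for illustration:</p>
      <preformat>
```python
# Assumed confusion-matrix counts for a 1,000-transaction test set.
tp, fp, fn, tn = 40, 10, 20, 930

precision = tp / (tp + fp)                           # (1) -> 0.80
recall = tp / (tp + fn)                              # (2) -> 0.666...
f1 = 2 * precision * recall / (precision + recall)   # (3) harmonic mean
accuracy = (tp + tn) / (tp + tn + fp + fn)           # (4) -> 0.97
```
      </preformat>
      <p>Note how accuracy (0.97) looks excellent while recall (about 0.67) reveals that a third of fraudulent cases are missed, which is exactly why accuracy is treated as a supplementary metric under imbalance.</p>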
    </sec>
    <sec id="sec-5">
      <title>5. Analysis of the database</title>
      <sec id="sec-5-1">
        <title>5.1. Selection of machine learning research datasets for predicting fraudulent anomalies in financial transactions</title>
        <p>The first dataset contains 284,807 credit card transaction records, of which 492 are fraudulent (a 99:1
imbalance). It includes 28 anonymized features V1–V28, plus Time and Amount. The second is
artificially generated and balanced (560,863 records per class) and structurally similar (V1–V28, no Time,
and the uninformative ID field removed); it is used for training to avoid imbalance effects. The third,
used for near-real-world testing, has 5,100 records (83:17); it is less anonymized and structurally
different, so models were trained on it separately. The datasets are referenced below as 99:1, 50:50,
and 83:17.</p>
        <p>Figure 5 presents the three selected datasets for the study of fraudulent anomalies in financial
transactions: (a) containing credit card transaction information over a specified period; (b)
artificially generated with a balanced class distribution; (c) approximating real-world conditions.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Result and discussion</title>
        <p>Research and testing ML models are crucial for building effective fraud detection systems in
financial transactions. This validates methods in near-real conditions, where accuracy minimizes
losses and fosters bank trust. Datasets with class imbalances (50:50, 83:17, 99:1) mirrored rare fraud
scenarios, enabling full assessment of anomaly detection in highly skewed data. The study
pinpointed optimal models for institutions that cut risks, boost efficiency, reduce false positives,
lower security costs, and ensure live system stability. Below: detailed performance analysis per
model across datasets. Logistic Regression model across the three datasets:</p>
        <p>Dataset 50%/50%. The model demonstrated high accuracy at 96% with balanced precision
and recall metrics, indicating its effectiveness on balanced data.</p>
        <p>Dataset 99%/1%. Model accuracy sharply dropped to 10%, showing practically zero
effectiveness in detecting fraudulent transactions (class 1) under strong class imbalance.
Although the model correctly predicted all fraudulent transactions, it misclassified a
significant number of legitimate operations as fraudulent. This may indicate potential
overfitting, data leakage, or insufficient parameter tuning.</p>
        <p>Dataset 83%/17%. Accuracy was 87%; however, the F1-score for class 1 significantly
decreased (0.62), indicating frequent errors in fraud detection.</p>
        <p>Table 1 and a screenshot (Fig. 4) below illustrate the accuracy metrics and training results of the
Logistic Regression algorithm on the 83%/17% dataset, as mentioned in the previous sections.</p>
        <p>Decision Tree model:
1. 50%/50% set. The model showed high precision and recall (96% for both classes) and an
overall accuracy of 96%, indicating its good performance on uniform data.
2. 99%/1% set. The model maintained high accuracy for class 0, but precision for class 1
decreased significantly. The overall accuracy was 58%, indicating a problem with imbalance.
3. 83%/17% set. The overall accuracy of the model was high at 97%. The F1-score for class 1
was 0.88, which is better than Logistic Regression, but still shows a decrease in recall.</p>
        <p>Below is Table 2 and a screenshot (Fig. 4) showing the accuracy metrics mentioned in the
previous sections and the results of training the Decision Tree algorithm on the 83%/17% dataset.
Figure 7 shows the Decision Tree console output on the 83%/17% set with corresponding metrics.</p>
        <p>Random Forest model: 50%/50% set: Demonstrated impressive results: 98% accuracy and high
F1-score values for both classes. 99%/1% set: Despite 71% accuracy, precision for class 1 was close to
0, indicating a lack of recognition of fraudulent transactions. 83%/17% set: Accuracy remained at
90%, but the F1-score for class 1 dropped to 0.44, indicating problems with detecting the smaller
class.</p>
        <p>Below is Table 3 and a screenshot (Fig. 8) showing the accuracy metrics mentioned in the
previous sections and the results of training the Random Forest algorithm on the 83%/17% dataset.</p>
        <p>Figure 8 shows the Random Forest console output on the 83%/17% set with corresponding
metrics.</p>
        <p>Neural Network (MLP Classifier) model: 50%/50% set. Showed almost perfect performance with
98% accuracy and an F1-score of 0.99 for both classes. 99%/1% set. There was a significant problem
with precision for class 1, although the overall accuracy of 83% remained acceptable. 83%/17% set.
Accuracy dropped to 80%, and the F1-score for class 1 was only 0.53, indicating insufficient
adaptation to the imbalance. Below is Table 4 and a screenshot (Fig. 6) showing the accuracy
metrics mentioned in the previous sections and the results of training the MLP Classifier algorithm
on the 83%/17% dataset.</p>
        <p>Figure 9 shows the Neural Network console output on the 83%/17% set with the corresponding
metrics.</p>
        <p>Gradient Boosting model: 50%/50% set. The experiment demonstrated high efficiency on a
uniform data set with an accuracy of 97%. 99%/1% set. The overall accuracy was 47% due to high
imbalance, which led to a significant decrease in the F1-score for class 1. 83%/17% set. The accuracy
reached 99%. The model handled the imbalanced data well, although the F1-score for class 1
decreased slightly to 0.97.</p>
        <p>Below is Table 5 and a screenshot (Fig. 7) showing the accuracy metrics mentioned in the
previous sections and the results of training the Gradient Boosting algorithm on the 83%/17%
dataset.</p>
        <p>XGBoost model: 50%/50% set: The model showed a high accuracy of 97% with well-balanced
metrics for both classes. 99%/1% set. The model demonstrated the best result among all models for
the 99%/1% set, achieving 99% accuracy and an F1-score for class 1 of 0.96. 83%/17% set. The model
maintained high accuracy of 99%, adapting well to partially imbalanced data.</p>
        <p>Below is Table 6 and a screenshot (Fig. 8) showing the accuracy metrics mentioned in the
previous sections and the results of training the XGBoost algorithm on the 83%/17% dataset.</p>
        <p>Figure 11 shows the XGBoost console output on the 83%/17% split with the corresponding
metrics.</p>
        <p>SVC model: 50%/50% set. The model showed average performance with an accuracy of 52%,
indicating its low ability to detect both classes. 99%/1% set. The model shows a sharp drop in the
F1-score for class 1, with an accuracy of only 55%. 83%/17% set. The model's accuracy was 81%, but
the F1-score for class 1 was only 0.10, indicating poor recognition of the smaller class.</p>
        <p>Below is Table 7 and a screenshot (Fig. 9) showing the accuracy metrics mentioned in the
previous sections and the results of training the SVC algorithm on the 83%/17% dataset.</p>
        <sec id="sec-5-2-1">
          <title>The following table shows the results of studies in a 99%/1% set.</title>
          <p>Models compared: Random Forest, MLP Classifier, Gradient Boosting, XGBoost, and SVC.</p>
          <p>In general, the results of the study support the following conclusions about the
effectiveness of the models used. The XGBoost and Gradient Boosting algorithms demonstrated the
highest performance across all class-imbalance scenarios, indicating their high adaptability. In contrast,
Logistic Regression and the Support Vector Classifier (SVC) proved ineffective under significant
imbalance, losing the ability to accurately classify fraudulent transactions. Tree-based methods
generally cope better with partially imbalanced data but require careful tuning when
working with heavily skewed samples. Overall, most of the tested models confirmed their
suitability for detecting fraudulent anomalies.</p>
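          <p>The evaluation loop behind these conclusions can be sketched end to end. The snippet below uses scikit-learn's synthetic data generator as a stand-in for the transaction dataset (an assumption; the paper's actual data, splits, and hyperparameters are not reproduced here):</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the transaction data, roughly an 83%/17% class split.
X, y = make_classification(n_samples=3000, n_features=10,
                           weights=[0.83], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Report both overall accuracy and the minority-class F1-score,
# the two metrics compared throughout this section.
print(f"accuracy={accuracy_score(y_te, pred):.2f}")
print(f"class-1 F1={f1_score(y_te, pred):.2f}")
```

          <p>Swapping GradientBoostingClassifier for any of the other estimators in the table reproduces the comparison loop; stratified splitting keeps the class ratio identical in train and test sets.</p>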
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>The article examines the application of machine learning for predicting fraudulent anomalies in
financial transactions. The aim of the study was to develop an effective framework capable of
identifying suspicious transactions at early stages, thereby minimizing financial losses, reputational
risks, and legal costs. According to the Association of Certified Fraud Examiners (ACFE), annual
losses from financial fraud have exceeded 5% of organizational revenue worldwide, and the average
loss in 2024 increased for the first time since 2016.</p>
      <p>Traditional fraud detection methods, which rely on static rules and manual analysis, are no
longer adequate to meet modern challenges due to the growing volume of data and the increasing
complexity of financial operations. These methods cannot process large datasets in a timely
manner or recognize complex anomalous behavior patterns. Consequently, the implementation of
machine learning and artificial intelligence technologies has become increasingly relevant, as they
enable the creation of automated, adaptive systems for real-time fraud prediction.</p>
      <p>The study developed a step-by-step approach for detecting fraudulent anomalies using machine
learning, which includes data preparation and preprocessing, building and training classification
models, and evaluating their effectiveness. The performance of seven popular machine learning
algorithms was analyzed: logistic regression, decision trees, Random Forest, neural
networks (MLP), Gradient Boosting, XGBoost, and SVC.</p>
      <p>The research demonstrated that ensemble methods such as Random Forest, Gradient Boosting,
and XGBoost are the most suitable for fraud detection tasks. These models consistently delivered
high performance across different datasets, including when working with imbalanced data, which
is typical of real financial transactions. The study also confirmed the importance of proper
parameter tuning and accounting for class imbalance to improve model accuracy.</p>
      <p>Applying the proposed approach enables financial institutions to significantly enhance
operational efficiency, minimize financial losses, and strengthen client trust, providing
comprehensive protection for the financial system in a dynamic digital environment.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We thank our colleagues for their valuable advice and constructive feedback, which contributed to
refining the methodology and interpreting the results.</p>
      <sec id="sec-7-1">
        <title>Declaration on Generative AI</title>
        <p>The authors have not employed any Generative AI tools.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. W. T.</given-names>
            <surname>Ngai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. H.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Data mining techniques in financial fraud detection: classification framework</article-title>
          ,
          <source>Decision Support Systems</source>
          <volume>50</volume>
          (
          <year>2011</year>
          )
          <fpage>559</fpage>
          -
          <lpage>569</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Detecting fraud in online transactions using machine learning: a review</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tharakunnel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Westland</surname>
          </string-name>
          ,
          <article-title>Data mining for credit card fraud: a comparative study</article-title>
          ,
          <source>Decision Support Systems</source>
          <volume>50</volume>
          (
          <year>2011</year>
          )
          <fpage>602</fpage>
          -
          <lpage>613</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jurgovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Granitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Calabretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Portier</surname>
          </string-name>
          ,
          <article-title>Sequence classification for credit-card fraud detection</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>100</volume>
          (
          <year>2018</year>
          )
          <fpage>234</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Whitrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Hand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Juszczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <article-title>Transaction aggregation for credit card fraud detection</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>18</volume>
          (
          <year>2009</year>
          )
          <fpage>30</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Blockchain-based financial fraud detection: a review</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>111697</fpage>
          -
          <lpage>111707</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Uhryn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ushenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lozynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ilin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hostiuk</surname>
          </string-name>
          ,
          <article-title>Intelligent GIS model for migration forecasting</article-title>
          ,
          <source>Int. Journal of Modern Education and Computer Science</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <fpage>69</fpage>
          -
          <lpage>79</lpage>
          . https://doi.org/10.5815/ijmecs.2023.04.06
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] Association of Certified Fraud Examiners,
          <source>Report to the Nations: 2022 Global Study on Occupational Fraud and Abuse</source>
          , ACFE, Austin,
          <year>2022</year>
          . URL: https://www.acfe.com/report-to-the-nations/2022/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] PricewaterhouseCoopers,
          <source>Global Economic Crime and Fraud Survey</source>
          <year>2023</year>
          , PwC,
          <year>2023</year>
          . URL: https://www.pwc.com/gx/en/services/forensics/economic-crime-survey.html
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] International Monetary Fund, Fraud in financial institutions,
          <year>2025</year>
          . URL: https://www.imf.org/external/index.htm
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          Scikit-learn,
          <source>Machine learning in Python - documentation</source>
          ,
          <year>2025</year>
          . URL: https://scikit-learn.org/stable/documentation.html
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          XGBoost,
          <source>Extreme Gradient Boosting documentation</source>
          ,
          <year>2025</year>
          . URL: https://xgboost.readthedocs.io/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          Google AI Blog,
          <article-title>Fighting fraud with machine learning</article-title>
          ,
          <year>2025</year>
          . URL: https://ai.googleblog.com/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          Kaggle,
          <source>Fraud detection datasets</source>
          ,
          <year>2025</year>
          . URL: https://www.kaggle.com/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <source>Pattern Recognition and Machine Learning</source>
          , Springer,
          <year>2006</year>
          . URL: https://www.springer.com/gp/book/9780387310732
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] Ministry of Finance of Ukraine, Financial security,
          <year>2025</year>
          . URL: https://mof.gov.ua/en/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          Financial Times,
          <source>Combating financial fraud with AI</source>
          ,
          <year>2025</year>
          . URL: https://www.ft.com/
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <article-title>AI in fraud detection</article-title>
          ,
          <source>International Journal of Financial Studies</source>
          ,
          <year>2025</year>
          . URL: https://www.mdpi.com/journal/ijfs
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vladov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sokurenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Muzychuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chyrun</surname>
          </string-name>
          ,
          <article-title>The Intelligent Data Measurement System Using Neural Network Technologies and Fuzzy Logic Under Operating Implementation Conditions</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          <volume>8</volume>
          :
          <issue>12</issue>
          (
          <year>2024</year>
          )
          <fpage>189</fpage>
          . https://doi.org/10.3390/bdcc8120189
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          Visa Inc.,
          <source>Fraud prevention with advanced analytics</source>
          ,
          <year>2025</year>
          . URL: https://usa.visa.com
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Uhryn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Andrunyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chyrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Antonyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dyyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Naum</surname>
          </string-name>
          ,
          <article-title>Service-oriented architecture as integration platform in tourism</article-title>
          ,
          in:
          <source>Proc. 2nd Int. Workshop on Modern Machine Learning Technologies and Data Science (MoMLeT+DS 2020), CEUR-WS 2631</source>
          ,
          <year>2020</year>
          ,
          <fpage>221</fpage>
          -
          <lpage>236</lpage>
          . URL: https://ceur-ws.org/Vol-2631/paper17.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>