<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analysis of state register data to identify anomalies and corruption threats in tenders and procurements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kyrylo Chornyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Galina Shilo</string-name>
          <email>shilo.gn@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Lebedieva-Dychko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Zaporizhzhia National University</institution>
          ,
          <addr-line>Zhukovs'koho St, 66, Zaporizhzhia, Zaporizhzhia Oblast, 69600</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1861</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Public procurement plays a crucial role in economic development, accounting for a significant portion of government expenditures worldwide. However, the sector remains highly vulnerable to corruption, inefficiency, and financial mismanagement, with severe economic and social consequences. Corruption in public procurement not only undermines fair competition but also results in inflated costs, poor-quality services, and weakened public trust in governmental institutions. The digitization of legal algorithms offers a promising way to address these challenges. By leveraging artificial intelligence (AI), big data, blockchain, and automated decision-making systems, governments can strengthen compliance, detect fraudulent activities, and enhance oversight in public procurement. This paper proposes a system for detecting fraudulent activities in public procurement and tenders. The system retrieves data from open databases on companies, their activities, tenders, and court cases, as well as tax and legal information, and applies an algorithm to detect activities that pose corruption risks. Based on the analysis of hundreds of tenders, a dataset was formed. This dataset of tenders, companies, and their activities will be used to train a machine-learning-based algorithm to further enhance the analytical capabilities of the system.</p>
      </abstract>
      <kwd-group>
        <kwd>Procurement</kwd>
        <kwd>corruption</kwd>
        <kwd>government</kwd>
        <kwd>tenders</kwd>
        <kwd>fraud</kwd>
        <kwd>cluster analysis</kwd>
        <kwd>dataset</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Corruption in public procurement is one of the most serious problems of the modern public
administration system. It undermines public trust, limits economic development and leads to
inefficient spending of budget funds. At the same time, existing control tools cannot provide timely
and comprehensive monitoring of possible corruption schemes and connections between
procurement participants because of the large volume of data and the complexity of the
relationships involved. As a result, violations are often identified only after procedures have been
completed, which greatly reduces the chances of restoring fairness and ensuring the efficient use of resources.</p>
      <p>
        The construction sector is particularly susceptible to corruption risks because of its large volumes
of funding and the complexity of its projects. According to studies by the European Commission and the
Organisation for Economic Co-operation and Development (OECD), up to a fifth of the European Union's
GDP is spent through public procurement, and losses from corrupt practices reach 20-25% of the
allocated funds. The most common forms of corruption in procurement include bribery, collusion
between participants, conflicts of interest, and hidden lobbying. Recent Eurobarometer surveys
confirm the prevalence of the problem: a significant share
of entrepreneurs consider corruption a serious barrier to participation in tenders [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>To address these issues, European countries are actively implementing digital technologies and
legislative measures aimed at increasing transparency. Examples of successful initiatives include
access-to-information laws such as the UK Freedom of Information Act, which gives the public access to
government data and thereby significantly reduces the opportunities for corruption. In addition, over the
past decade the EU has carried out a large-scale transition from paper to electronic tendering procedures.
E-procurement platforms such as Prozorro in Ukraine and TED (Tenders Electronic Daily) in the EU
provide a full "electronic trail", increasing transparency and accessibility of information for all
stakeholders.</p>
      <p>Prozorro, the platform operating in Ukraine, provides a unified information space where tender
announcements, commercial offers, and contracts are published. However, since the beginning of the
full-scale invasion, Ukraine has faced significant problems with access to information. Limited access
to legal registers prevents full monitoring and timely detection of corruption risks, which significantly
reduces the effectiveness of the state's anti-corruption policy.</p>
      <p>
        The digitization of legal algorithms as a tool for mitigating corruption in public procurement is
explored in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. According to this study, corruption in procurement arises from inefficiencies, conflicts of interest,
and weak oversight, leading to financial mismanagement and security risks. The study highlights how
digital technologies, including AI-driven analytics, blockchain, and big data, can enhance transparency,
automate oversight, and detect fraudulent activities in procurement processes. Implementing
AI-based monitoring systems and aligning national procurement policies with international best
practices can significantly reduce misappropriation risks.
      </p>
      <p>
        The use of quantitative indicators to detect corruption risks in public procurement is
examined in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The study applies machine learning techniques to roadwork contracts in Italy,
identifying new red flags derived from police investigations and judicial practice. It finds that
multiparameter awarding criteria are systematically linked to high corruption risk. Urgency-based
procedures, by contrast, are more obvious red flags but prove ineffective because they are predictable. The
research also demonstrates that corruption risk prediction improves when unmonitored indicators
are included, highlighting how corrupt actors adapt to scrutiny. Furthermore, the study
emphasizes that private firm competition and transparent bidding processes help mitigate corruption
risks. It concludes that enhancing data collection on public contracts and strengthening coordination
between courts and regulatory authorities could improve corruption detection. Finally, the findings
suggest that concealing specific monitoring criteria may prevent corrupt actors from adapting their
strategies, ultimately aiding enforcement efforts.
      </p>
      <p>
        The object of the research is the process of public procurement and tendering within Ukrainian
electronic registry systems. The subject of the research comprises the methods and digital tools for
analyzing data from state registries to identify anomalous patterns and detect corruption risks in tenders
and procurement activities. Operational control over procurement transparency is a pressing problem
for reducing corruption risks in Ukraine [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Current Ukrainian solutions for analyzing public procurement and tenders offer fragmented
information: Prozorro contains data on tenders, while YouControl provides business analytics about
companies. To solve this problem, an integrated approach is proposed that enables automated risk
assessment based on multidimensional parameters and the use of artificial intelligence. Artificial
intelligence technologies may help identify previously unknown corruption schemes [
        <xref ref-type="bibr" rid="ref4 ref5">4-5</xref>
        ]. The paper aims to develop a specialized
analytical system as a web application to aggregate and process data from open sources (Zakupivli.pro
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], YouControl [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], etc.) and provide a comprehensive analysis of tender purchases by means of artificial
intelligence technologies. The developed software solution allows for a detailed analysis of winners,
identifying links between companies and officials, visualizing data at various stages of procurement,
and generating analytical reports available to citizens and experts. The implementation of such an
automated system is aimed at increasing transparency, promptly identifying and preventing
corruption risks, and strengthening public confidence in public institutions and procurement
procedures.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The web application for tender analysis</title>
      <p>One common method of abuse is to create the appearance of competition when related
companies participate in a tender. This can happen through several mechanisms:
• affiliation through managers and beneficiaries (several legal entities belong to the same group of
owners);
• registration of fictitious firms to submit alternative bids and simulate competition;
• price manipulation (prior agreements between participants to divide the winnings by overstating
or understating their offers).</p>
      <p>The proposed web application automatically collects the following data via API: information
about tenders, the amounts of concluded contracts, and data on contractors and subcontractors,
including their identification codes (EDRPOU), legal addresses, and information on directors and
ultimate beneficiaries. After collection, the data are consolidated in a single database.</p>
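      <p>The consolidation step can be illustrated with a minimal sketch that stores collected records in SQLite, keyed by EDRPOU code. The table and column names, the sample companies and the <monospace>consolidate</monospace> helper are hypothetical and do not describe the system's actual schema; fetching the records from the APIs is omitted.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS company (
    edrpou TEXT PRIMARY KEY,   -- EDRPOU kept as text to preserve leading zeros
    name TEXT NOT NULL,
    legal_address TEXT
);
CREATE TABLE IF NOT EXISTS person_role (
    edrpou TEXT NOT NULL REFERENCES company(edrpou),
    person TEXT NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('director', 'beneficiary'))
);
CREATE TABLE IF NOT EXISTS tender (
    tender_id TEXT PRIMARY KEY,
    customer_edrpou TEXT NOT NULL,
    winner_edrpou TEXT REFERENCES company(edrpou),
    amount REAL
);
"""

def consolidate(conn, companies, roles, tenders):
    """Merge records fetched from different sources into one database."""
    conn.executescript(SCHEMA)
    # INSERT OR REPLACE keeps a single row per EDRPOU / tender even when
    # several sources report the same entity.
    conn.executemany("INSERT OR REPLACE INTO company VALUES (?, ?, ?)", companies)
    conn.executemany("INSERT INTO person_role VALUES (?, ?, ?)", roles)
    conn.executemany("INSERT OR REPLACE INTO tender VALUES (?, ?, ?, ?)", tenders)
    conn.commit()

conn = sqlite3.connect(":memory:")
consolidate(
    conn,
    companies=[("11111111", "Alpha LLC", "1 Main St"),
               ("22222222", "Beta LLC", "1 Main St"),
               ("33333333", "Gamma LLC", "5 Oak Ave")],
    roles=[("11111111", "I. Petrenko", "director"),
           ("22222222", "I. Petrenko", "beneficiary")],
    tenders=[("T-1", "99999999", "11111111", 1_000_000.0)],
)

# Pairs of distinct companies registered at the same legal address.
shared_address = conn.execute(
    """SELECT a.edrpou, b.edrpou FROM company a
       JOIN company b ON a.legal_address = b.legal_address
                     AND a.edrpou < b.edrpou"""
).fetchall()
print(shared_address)  # [('11111111', '22222222')]
```
]]></preformat>
      <p>Once all sources sit in one database, a check such as shared legal addresses reduces to a self-join. In this sketch the person_role table has no key, so repeated runs would duplicate rows; a production schema would need a uniqueness constraint there as well.</p>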
      <p>The analysis is based on the following key parameters:
• overlap of beneficiaries, directors and legal addresses of companies;
• frequency with which one company wins tenders from the same customer;
• links between participants through intermediary entities (company affiliation);
• unusual changes in the prices of tender offers.</p>
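      <p>The first and third parameters can be combined by treating companies as nodes linked by shared directors, beneficiaries or legal addresses; connected components then also capture links that run through intermediary companies. The sketch below assumes each company is described by a set of attribute strings (such as "person:..." or "address:...") and uses a union-find structure; the function name and the sample data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

def affiliated_groups(companies):
    """Return groups of companies linked directly or through intermediaries.

    `companies` maps an EDRPOU code to a set of attribute strings such as
    "person:<name>" (director or beneficiary) or "address:<text>"; this
    encoding is an assumption made for the sketch.
    """
    parent = {code: code for code in companies}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Index companies by attribute: any attribute held by two or more
    # companies links them.
    holders = defaultdict(list)
    for code, attrs in companies.items():
        for attr in attrs:
            holders[attr].append(code)
    for codes in holders.values():
        for other in codes[1:]:
            parent[find(other)] = find(codes[0])

    groups = defaultdict(set)
    for code in companies:
        groups[find(code)].add(code)
    # Singletons have no detected link and are not reported.
    return [group for group in groups.values() if len(group) > 1]

companies = {
    "A": {"person:I. Petrenko", "address:1 Main St"},
    "B": {"person:I. Petrenko"},
    "C": {"address:1 Main St", "person:O. Koval"},
    "D": {"person:O. Koval"},      # linked to A only through C
    "E": {"person:M. Shevchenko"},
}
print(affiliated_groups(companies))  # one group containing A, B, C and D
```
]]></preformat>
      <p>Company D shares nothing with A directly, yet ends up in the same group through C, which is the kind of indirect affiliation a pairwise comparison would miss.</p>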
      <p>This helps to identify companies that systematically win tenders from the same customers and to
find connections between procurement winners through founders, directors and registration
addresses.</p>
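      <p>The frequency parameter can be sketched as the share of a customer's tenders won by its most frequent winner. The thresholds (at least 5 tenders and a 60% share) and the function name are illustrative assumptions, not values used by the system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict

def dominant_winners(awards, min_tenders=5, share_threshold=0.6):
    """Flag (customer, winner, share) when one company wins most of a
    customer's tenders. `awards` holds (customer_edrpou, winner_edrpou)
    pairs; both thresholds are illustrative, not calibrated."""
    wins = defaultdict(Counter)
    for customer, winner in awards:
        wins[customer][winner] += 1
    flagged = []
    for customer, counter in wins.items():
        total = sum(counter.values())
        if total < min_tenders:
            continue  # too few tenders to judge
        winner, count = counter.most_common(1)[0]
        share = count / total
        if share >= share_threshold:
            flagged.append((customer, winner, round(share, 2)))
    return flagged

awards = ([("C1", "W1")] * 4 + [("C1", "W2")]
          + [("C2", "W1"), ("C2", "W2"), ("C2", "W3")] * 2
          + [("C3", "W1")] * 2)
print(dominant_winners(awards))  # [('C1', 'W1', 0.8)]
```
]]></preformat>
      <p>Customer C2 spreads its tenders evenly and C3 has too few tenders to judge, so only C1 is flagged.</p>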
      <p>The program analyzes tender processes: selecting the contractor, viewing detailed information
on purchases, and analyzing the winners. The system displays tender data, including the name of the
purchase, the winner, the contract amount and the list of participants. The winner information is
interactive, allowing immediate in-depth analysis (Figure 1).</p>
      <p>This tool supports the monitoring of public procurement, helping journalists and anti-corruption
agencies quickly identify possible violations. By using open data and modern analysis algorithms, the
web application promotes transparency and fairness in procurement processes.</p>
      <p>Based on the system's analysis of several tenders, characteristic patterns indicating possible
corruption risks were identified; they are presented in Tables 1 and 2.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Category</th>
              <th>Attributes</th>
              <th>Patterns of corruption risks</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Financial flows</td>
              <td>
                <list list-type="bullet">
                  <list-item><p>Total amount of tenders</p></list-item>
                  <list-item><p>Average contract price</p></list-item>
                  <list-item><p>Share of the budget going to one subcontractor</p></list-item>
                  <list-item><p>Average difference between the forecast and actual contract price</p></list-item>
                </list>
              </td>
              <td>Analysing the distribution of financial flows makes it possible not only to determine the share of the budget attributable to each contractor, but also to identify anomalies such as the systematic and disproportionate allocation of funds to one bidder. Such behaviour may indicate possible affiliation with the contracting authority, restriction of competition, or the existence of corrupt agreements.</td>
            </tr>
            <tr>
              <td>Regional schemes and differences in procurement</td>
              <td>The subcontractors that participated in previous tenders with the same contractor</td>
              <td>Contractors may favour specific subcontractors on the basis of personal gain rather than professional requirements.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr>
              <th>Category</th>
              <th>Attributes</th>
              <th>Patterns of corruption risks</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Contract terms</td>
              <td>
                <list list-type="bullet">
                  <list-item><p>Distribution of tenders by month (are there any spikes in November/December?)</p></list-item>
                  <list-item><p>Share of tenders conducted urgently (less than 10 days between the announcement and the submission of applications)</p></list-item>
                </list>
              </td>
              <td>Under wartime rules, a tender may be conducted in as little as 7 days, whereas the usual period is 15 days. If a customer frequently uses the 7-day minimum, this is suspicious and may indicate a tender tailored to a single bidder.<break/>Analysis of the temporal characteristics of participation in tenders makes it possible to identify typical and anomalous patterns of contractor behaviour, for example, a concentration of wins in certain periods (usually at the end of the budget year), bursts of activity in November-December, and irregular or selective frequency of participation. Such temporal anomalies may indicate attempts to steer budget allocation in favour of a particular bidder, participation in pre-agreed procedures, or fictitious competition.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
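      <p>Several attributes listed in Tables 1 and 2 reduce to simple aggregates over one customer's tenders. In this sketch the <monospace>Tender</monospace> record and its field names (for example <monospace>forecast_price</monospace> for the expected value announced by the customer) are hypothetical; the 10-day urgency threshold and the November/December months follow the attributes in the tables.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from datetime import date
from statistics import mean

@dataclass
class Tender:
    # Hypothetical record; field names are assumptions for illustration.
    winner: str
    forecast_price: float      # expected value announced by the customer
    contract_price: float      # value of the concluded contract
    announced: date
    submission_deadline: date

def risk_indicators(tenders, urgent_days=10):
    """Aggregate a few Table 1 and Table 2 attributes for one customer."""
    total = sum(t.contract_price for t in tenders)
    by_winner = {}
    for t in tenders:
        by_winner[t.winner] = by_winner.get(t.winner, 0.0) + t.contract_price
    top_winner, top_amount = max(by_winner.items(), key=lambda kv: kv[1])
    urgent = [t for t in tenders
              if (t.submission_deadline - t.announced).days < urgent_days]
    year_end = [t for t in tenders if t.announced.month in (11, 12)]
    return {
        "total_amount": total,
        "average_contract_price": total / len(tenders),
        "top_winner": top_winner,
        "top_winner_budget_share": top_amount / total,
        "average_forecast_gap": mean(t.forecast_price - t.contract_price
                                     for t in tenders),
        "urgent_share": len(urgent) / len(tenders),
        "nov_dec_share": len(year_end) / len(tenders),
    }

tenders = [
    Tender("W1", 100.0, 99.0, date(2024, 11, 1), date(2024, 11, 8)),
    Tender("W1", 200.0, 198.0, date(2024, 12, 2), date(2024, 12, 9)),
    Tender("W2", 100.0, 90.0, date(2024, 3, 1), date(2024, 3, 16)),
    Tender("W1", 100.0, 97.0, date(2024, 6, 3), date(2024, 6, 18)),
]
report = risk_indicators(tenders)
print(report["top_winner"], round(report["top_winner_budget_share"], 2))  # W1 0.81
```
]]></preformat>
      <p>In the sample, one bidder receives about 81% of the budget and half of the tenders are both urgent and announced in November or December, a combination the tables describe as suspicious. Thresholds for what counts as anomalous would have to be set from real data.</p>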
    </sec>
    <sec id="sec-3">
      <title>3. Cluster analysis to anomaly detection in tenders</title>
      <p>To further enhance the system for tender analysis, a machine-learning-based approach can be
used. Cluster analysis identifies groups of companies that exhibit anomalous behavior in tenders. This
approach segments companies into clusters in a multidimensional feature space encompassing tender
activity, behavioral patterns, affiliations, geography, interactions with contracting authorities, and
financial indicators. It also helps identify companies that deviate significantly from typical behavioral
profiles. Unsupervised machine learning techniques such as k-means clustering, DBSCAN, and
Isolation Forest make it possible to detect individual companies or small groups exhibiting unusual
patterns. Applying them first requires a suitable dataset on which to train the model.</p>
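      <p>As a minimal illustration of the clustering step, the following standard-library sketch standardizes company features, runs Lloyd's k-means with deterministic farthest-point seeding, and flags members of very small clusters as potential anomalies. The two features, the choice of k = 3 and the 20% size cutoff are assumptions for illustration, and the small-cluster rule is only one of several possible anomaly criteria.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each feature column so no single scale dominates distances."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    take the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def small_cluster_members(labels, min_share=0.2):
    """Indices of points whose cluster holds less than min_share of all points."""
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    return [i for i, lab in enumerate(labels) if sizes[lab] < min_share * len(labels)]

# Hypothetical features per company: (tenders won, share of wins from one customer).
features = [(2, 0.30), (3, 0.25), (2, 0.35), (3, 0.30),
            (20, 0.30), (22, 0.25), (21, 0.35), (20, 0.30),
            (12, 0.95)]
_, labels = kmeans(standardize(features), k=3)
print(small_cluster_members(labels))  # [8]
```
]]></preformat>
      <p>The company at index 8, which wins almost all of its tenders from one customer, ends up alone in its cluster and is flagged. In practice, the KMeans, DBSCAN and IsolationForest classes of scikit-learn would replace this hand-written loop; the sketch only shows the mechanics.</p>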
      <p>Datasets can be classified by the type of data they contain. In our case, the dataset is tabular:
a dump of the database into CSV files.</p>
      <p>The dataset consists of three CSV files, each representing a specific table in the database schema.
The file contractors.csv contains information about the companies that announce tenders, as shown in
Table 3.</p>
      <p>The file tenders.csv contains information about the tenders announced by these companies, as
shown in Table 4.</p>
      <p>The file subcontractors.csv contains information about the companies that participate in and win
those tenders, as shown in Table 5.</p>
      <p>The current dataset schema is incomplete because some government data sources are currently
closed due to the war, so only sources in the public domain can be used. Public-domain data are
generally of good quality; however, in some cases the information is incomplete or insufficient for
analysis.</p>
      <p>In particular, the following categories of information are missing:
• history of changes in founders, directors, and ultimate beneficiaries, which is critical for
identifying hidden company affiliations;
• data on company turnover, debts, and VAT payments, which are needed to distinguish real
businesses from fictitious firms;
• data from civil servants' declarations, which are important for establishing connections between
officials and companies that win tenders;
• data on contract execution, including the involvement of subcontractors, certificates of completion
of work, and payments, which are often not published;
• historical tender data, including data on defense procurement and critical infrastructure facilities,
which are currently closed.</p>
      <table-wrap id="tab4">
        <label>Table 4</label>
        <table>
          <thead>
            <tr>
              <th>Column</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>contractor</td></tr>
            <tr><td>winner_edrpou</td></tr>
            <tr><td>winner</td></tr>
            <tr><td>amount</td></tr>
            <tr><td>date</td></tr>
            <tr><td>subcontractors</td></tr>
            <tr><td>created_at</td></tr>
            <tr><td>url_tenders</td></tr>
          </tbody>
        </table>
      </table-wrap>
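      <p>A minimal sketch of loading tenders.csv with Python's csv module, using the column names listed for the file. The value formats (ISO dates, a dot as the decimal separator, participants in the subcontractors column separated by semicolons) and the sample rows are assumptions. Counting empty cells per column gives a quick measure of how incomplete the public data are.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
from collections import defaultdict
from datetime import date

# Column names as listed for tenders.csv; the value formats are assumptions.
FIELDS = ["contractor", "winner_edrpou", "winner", "amount", "date",
          "subcontractors", "created_at", "url_tenders"]

def load_tenders(stream):
    """Parse tenders.csv rows, tolerating empty cells and counting them."""
    rows, missing = [], defaultdict(int)
    for rec in csv.DictReader(stream):
        for field in FIELDS:
            if not (rec.get(field) or "").strip():
                missing[field] += 1
        rows.append({
            "contractor": rec["contractor"],
            # EDRPOU stays a string so leading zeros are not lost.
            "winner_edrpou": rec["winner_edrpou"] or None,
            "amount": float(rec["amount"]) if rec["amount"] else None,
            "date": date.fromisoformat(rec["date"]) if rec["date"] else None,
            # Assumed format: participants separated by semicolons.
            "subcontractors": [s.strip() for s in rec["subcontractors"].split(";")
                               if s.strip()],
        })
    return rows, dict(missing)

sample = io.StringIO(
    "contractor,winner_edrpou,winner,amount,date,subcontractors,created_at,url_tenders\n"
    "City Council,11111111,Alpha LLC,1500000.50,2024-11-20,"
    "Alpha LLC; Beta LLC,2024-11-01,https://example.org/t/1\n"
    "City Council,,,,2024-12-05,,2024-11-28,\n"
)
rows, missing = load_tenders(sample)
print(missing)
```
]]></preformat>
      <p>The second sample row has no winner, amount, participants or URL, so those columns each show one missing value; over a real dump, such counts indicate which fields are too sparse to use as model features.</p>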
      <p>The absence of these data limits the capabilities of machine analysis and does not allow for the
full identification of affiliation schemes, price gouging, fictitious competition and other signs of
corruption. Gaining access to such registers, at least in a limited form for scientific and analytical
purposes, is critically important for building an effective public procurement monitoring system in
Ukraine.</p>
      <p>Another difficulty in tender data analysis is that different terms in text attributes may represent
the same concept. Applying natural language processing (NLP) technologies during data preparation
to identify synonyms in text data is an effective approach that improves the quality of subsequent
analyses.</p>
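      <p>Before a full NLP pipeline is introduced, the synonym step can be approximated by normalizing text and mapping variants to canonical terms through a small dictionary, with a fuzzy fallback from Python's difflib for spelling variants. The synonym table, the 0.8 cutoff and the function names are hypothetical; synonyms that share no spelling would still require lemmatization or word embeddings.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from difflib import get_close_matches

# Hypothetical hand-made synonym table: variant -> canonical term.
SYNONYMS = {
    "road repair": "road repair",
    "repair of roads": "road repair",
    "roadway repair": "road repair",
    "school renovation": "school renovation",
    "renovation of school": "school renovation",
}

def normalize(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def canonical(text, cutoff=0.8):
    """Map a free-text value to a canonical term via exact or fuzzy lookup."""
    key = normalize(text)
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Fuzzy fallback catches typos and small spelling variants.
    match = get_close_matches(key, SYNONYMS.keys(), n=1, cutoff=cutoff)
    return SYNONYMS[match[0]] if match else key

print(canonical("Repair of roads."))     # road repair
print(canonical("Roadway repiar"))       # road repair (typo matched fuzzily)
print(canonical("Bridge construction"))  # bridge construction (no match)
```
]]></preformat>
      <p>Because the regular expression uses the Unicode-aware \w class, the same normalization also applies to Ukrainian-language text.</p>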
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>Thus, the use of modern technologies in public procurement is an important step in the fight
against corruption. Digital platforms improve the transparency of procedures and provide
information for analysis and identification of possible corruption schemes.</p>
      <p>Using web applications for monitoring tenders helps to effectively identify fictitious competition
schemes and contract manipulation. The system for identifying relationships between companies
helps to minimize the influence of affiliated structures on procurement results.</p>
      <p>However, significant challenges remain, such as insufficient access to data, the complexity of
integrating different sources of information and the limited capabilities of law enforcement agencies.
To achieve maximum effect, it is necessary to further develop the legislative framework, ensure open
access to state registers and improve data analysis methods.</p>
      <p>To improve the accuracy of analysis and further automated detection of corruption schemes, we
propose the formation of a dataset that will include:
• historical data on tenders and their winners;
• information on company connections, directors and beneficiaries;
• financial indicators and amounts of concluded contracts;
• tags of previously identified corruption cases and investigations;
• data on unnatural price jumps and discrepancies with market conditions.</p>
      <p>This dataset will be used to train machine learning models that can automatically find suspicious
anomalies and offer analytical reports for experts and government agencies.</p>
      <p>This direction of the system's development is aimed at use by both government anti-corruption
agencies and independent researchers and journalists.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
        <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>European Union</collab>.
          <source>Eurobarometer site</source>.
          URL: <ext-link ext-link-type="uri" xlink:href="https://europa.eu/eurobarometer/surveys/detail/3180">https://europa.eu/eurobarometer/surveys/detail/3180</ext-link>
          (accessed: <date-in-citation content-type="access-date">14.02.2025</date-in-citation>).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>F.</given-names> <surname>Decarolis</surname></string-name> and
          <string-name><given-names>C.</given-names> <surname>Giorgiantonio</surname></string-name>.
          <article-title>Corruption red flags in public procurement: new evidence from Italian calls for tenders</article-title>.
          <source>EPJ Data Science</source>, <volume>11</volume>(<issue>1</issue>), <year>2022</year>.
          doi:<pub-id pub-id-type="doi">10.1140/epjds/s13688-022-00325-x</pub-id>
          (accessed: <date-in-citation content-type="access-date">27.03.2025</date-in-citation>).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>O.</given-names> <surname>Makarenkov</surname></string-name>.
          <article-title>Digitisation of legal algorithms to prevent public procurement corruption</article-title>.
          <source>Baltic Journal of Economic Studies</source>, <volume>10</volume>(<issue>5</issue>): <fpage>254</fpage>-<lpage>265</lpage>, <year>2024</year>.
          doi:<pub-id pub-id-type="doi">10.30525/2256-0742/2024-10-5-254-265</pub-id>
          (accessed: <date-in-citation content-type="access-date">27.03.2025</date-in-citation>).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>A.</given-names> <surname>Herreros-Martínez</surname></string-name> et al.
          <article-title>Applied machine learning to anomaly detection in enterprise purchase processes: a hybrid approach using clustering and isolation forest</article-title>.
          <source>Information</source>, <volume>16</volume>(<issue>3</issue>): <elocation-id>177</elocation-id>, <year>2025</year>.
          doi:<pub-id pub-id-type="doi">10.3390/info16030177</pub-id>
          (accessed: <date-in-citation content-type="access-date">25.04.2025</date-in-citation>).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>M.</given-names> <surname>Busu</surname></string-name> and
          <string-name><given-names>C.</given-names> <surname>Busu</surname></string-name>.
          <article-title>Detecting bid-rigging in public procurement. A cluster analysis approach</article-title>.
          <source>Administrative Sciences</source>, <volume>11</volume>(<issue>1</issue>): <elocation-id>13</elocation-id>, <year>2021</year>.
          doi:<pub-id pub-id-type="doi">10.3390/admsci11010013</pub-id>
          (accessed: <date-in-citation content-type="access-date">25.04.2025</date-in-citation>).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <collab>Prozorro</collab>.
          <source>Zakupivli Platform</source>.
          URL: <ext-link ext-link-type="uri" xlink:href="https://zakupivli.pro/">https://zakupivli.pro/</ext-link>
          (accessed: <date-in-citation content-type="access-date">06.02.2025</date-in-citation>).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <collab>YouControl</collab>.
          <source>Business Analytics Platform</source>.
          URL: <ext-link ext-link-type="uri" xlink:href="http://youcontrol.com.ua">http://youcontrol.com.ua</ext-link>
          (accessed: <date-in-citation content-type="access-date">18.01.2025</date-in-citation>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>