Public Procurement Fraud Detection and Artificial Intelligence Techniques: a Literature Review

Roberto Nai1,*, Emilio Sulis1 and Rosa Meo1
1 Computer Science Department, University of Turin, Italy

Abstract
Every year, a significant part of public and private organisations’ revenues is lost to fraud. Recently, increasing digitisation has also brought more attention to organisational processes, procurement, and fraud data. Several automated methods have already been proposed to extract information from these kinds of sources, including public procurement, and to develop predictive models for fraud detection. In addition, artificial intelligence techniques including machine learning, neural networks, natural language processing, and network analysis methodologies have been adopted to address the issue. This study offers a review of the most recent emerging studies on fraud detection for public organisations. Finally, it summarises the main existing research and reviews current challenges in the field.

Keywords
Corruption detection, artificial intelligence, literature review, public procurement

1. Introduction
Organizations are increasingly focused on mitigating the chances of experiencing fraud, which represents a significant loss of revenue. For instance, the 2020 edition of an accredited biennial study carried out by the Association of Certified Fraud Examiners claims that on average 5% of a company’s revenue is lost to unchecked fraud every year. Among the reasons for these large losses are that it takes about 14 months for a fraud to be discovered and that audits capture only 3 percent of actual fraud. This necessitates the use of better tools and processes to quickly and inexpensively identify potential criminals [1]. Research in political science, economics and sociology has investigated the field, trying to highlight possible flaws in the systems that lead to such risks, with a view to prevention.
Recently, the new possibilities offered by information technology allow for new studies in the area of fraud detection as well. Recent changes include the availability of large data sets at low cost, the use of increasingly powerful computing devices and the development of applications that enable the training of machine learning (ML) models [2, 3]. Public organizations are also subject to fraud risks, starting with public procurement. A major challenge is to be able to detect potential fraud automatically, through appropriate artificial intelligence (AI) techniques. Machine learning methods, in particular, have proven very effective in a wide range of practical applications [4, 5, 6]. In addition, the most recent methodologies have also adopted Natural Language Processing (NLP) [7] techniques, as well as neural networks [8].

EKAW’22: Companion Proceedings of the 23rd International Conference on Knowledge Engineering and Knowledge Management, September 26–29, 2022, Bozen-Bolzano, IT
* Corresponding author.
Email: roberto.nai@unito.it (R. Nai); emilio.sulis@unito.it (E. Sulis); rosa.meo@unito.it (R. Meo)
ORCID: 0000-0003-4031-5376 (R. Nai); 0000-0003-1746-3733 (E. Sulis); 0000-0002-0434-4850 (R. Meo)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

In order to systematise the existing research on the topic of AI techniques applied to fraud detection in public procurement, we propose a systematic review of the research literature. To summarize our goals, we explored the following three research questions:
• RQ1: Which disciplinary areas are more interested in investigating frauds in public procurement?
• RQ2: What AI techniques are being applied to investigate fraud in public procurement contracts?
• RQ3: Which research studies are most influential in the field?
The remainder of the paper is organised as follows: section 2 introduces some related works. In section 3 we describe the proposed methodology, while section 4 provides insights about the results of our review. Finally, section 5 provides a summary of the main research, and section 6 concludes the paper.

2. Related works
Previous works have focused on the application of AI techniques to public procurement. In [9], 102 articles published between 2015 and 2019 were selected from the Scopus and WoS databases, focusing on the primary data mining techniques used to prevent corruption. The main AI techniques observed are those based on Bayes’ theorem, neural networks, Support Vector Machines (SVM), decision trees, Random Forest, and logistic and linear regression.

In another recent survey, 147 articles published between 2015 and 2019 were selected from Scopus and WoS [10]. Among the types of corruption addressed by these works, fraud (77.49%), overpricing (7.05%), bribery (5.05%) and favouritism (4.66%) generate the most citations. A large part of the research analyses business intelligence literature for bank fraud using text mining. For the geographical distribution of the authors, the first author of each publication was considered; the leading countries for articles on data mining and corruption are the United States (16.3%), China (10.9%) and the United Kingdom (8.9%).

Another work focused on the most used methods to detect different types of “corruption” [11], exploring 23 articles published between 2016 and 2021. Data mining and ML methods are applied to large amounts of data collected from different data sets, such as contract registers, blacklists of economic operators, business registers and so on.
The methods include classification techniques, aimed at detecting connections between economic operators and contracting authorities and at finding companies that participated in collusion, as well as association rules and graph database algorithms.

Another recent review uses Social Network Analysis (SNA) to capture the contributions of the scientific community to the topic of corruption in public procurement [12]. The authors analyzed 18 articles from 2011 to 2021, identifying the most recurrent authors, their interactions, the number of citations, and the keywords and their repetitions. To perform network analysis on the collected dataset and represent the interactions between the actors or nodes of the graph, the open-source tool VOSviewer (footnote 1) was used; it allowed the authors to identify the publications, authors, journals, institutions, keywords, and countries with the most significant impact on the research in repositories of scientific articles (from parameters such as degree centrality and edge weight).

Our work builds on previous research by focusing on recent work (2016-2021) related to fraud detection with ML methods and techniques.

3. Methodology
For our survey, we follow the methodology of [13], which divides the process into three phases: planning, conducting the review, and reporting the review. The workflow in figure 1 summarises the activity of this research. Table 1 describes the objective of the survey through the research questions and the expected output variables.

Figure 1: Workflow for planning and conducting the review.

3.1. Semantic structure of the search
In the first phase of the workflow, the following keywords have been defined: public tenders, public competitions, public procurement, e-procurement, state laws, fraud detection, corruption, crime, criminal, prediction, predictive, modeling, detection, artificial intelligence, machine learning, deep learning, neural networks.
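Keyword groups of this kind are usually combined into a single boolean search string. The following is a minimal sketch of that step, assuming that alternatives within a group are joined with OR, that groups are joined with AND, and that the result is wrapped in the Scopus field code TITLE-ABS-KEY. The helper names (`quote`, `or_group`, `build_query`) and the grouping of the keywords are illustrative assumptions, not the authors' script.

```python
def quote(term: str) -> str:
    """Quote multi-word or hyphenated terms so they are searched as phrases."""
    return f'"{term}"' if " " in term or "-" in term else term


def or_group(terms: list[str]) -> str:
    """Join alternative keywords into one parenthesised OR clause."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(groups: list[list[str]], field: str = "TITLE-ABS-KEY") -> str:
    """AND together the OR clauses and wrap them in a field code."""
    return f"{field} (" + " AND ".join(or_group(g) for g in groups) + ")"


# Hypothetical grouping of a subset of the keywords (domain, target, technique).
domain = ["public tender", "public procurement", "e-procurement"]
target = ["fraud", "corruption", "crime"]
technique = ["machine learning", "deep learning", "artificial intelligence"]

query = build_query([domain, target, technique])
print(query)
```

Keeping the groups as plain lists makes it straightforward to generate several keyword combinations, or to swap the field code when a different database syntax is needed.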
In a second phase, various combinations of the keywords have been prepared as queries for the scientific databases (Scopus, WoS and IEEE Xplore). In a third phase, specific scripts have been prepared following the syntax of each database. For instance, one of the queries in Scopus (very similar in WoS) has been:

TITLE-ABS-KEY ((("PUBLIC TENDER" OR "PUBLIC PROCUREMENT" OR "E-PROCUREMENT" OR "public competitions" OR "public regulations" OR "state laws") AND ("DETECTION" OR fraud OR corruption OR crime OR criminal) AND (prediction OR predictive OR "machine learning" OR "deep learning" OR "neural networks" OR "modeling" OR "artificial intelligence")))

Footnote 1: https://www.vosviewer.com

| Question | Type of answer sought |
|---|---|
| RQ1: Which disciplinary areas are more interested in investigating frauds in public procurement? | List of disciplinary areas |
| RQ2: What AI techniques are being applied to investigate fraud in public procurement contracts? | List of AI techniques |
| RQ3: Which research studies are most influential in the field? | Weighted graph of citations between articles |

Table 1: Research questions.

3.2. Inclusion and exclusion criteria
The results have been filtered by categories (Computer Science, Engineering, Business, Politics and Business Management) and joined with the Scimago Journal Rating (Quartile and H-Index). The search has been automated in Python, combining the Scopus APIs with WoS and IEEE Xplore to automatically retrieve the list of results together with the Scimago rating indexes (footnote 2). Finally, 15 studies have been selected for this survey.

Footnote 2: Python script using Scopus APIs and Scimago Journal Rating is available here: https://github.com/roberto-nai-unito/scopus-api

4. Results
We summarize the main results of our survey by exploring for each paper some features of interest: the disciplines involved, the dataset used as input (Input), and the AI techniques, methodologies and technologies adopted (Methods). Table 2 and table 3 summarise the results about RQ1. The disciplines involved are mostly Computer Science, but also Business, Management and Accounting or Engineering, and the venues vary, e.g. conferences or journals in Quartile 1 or 2 (Q). Regarding the geographical distribution of the authors, all the authors are considered. Most researchers come from Europe (Spain, Portugal, Italy) and America (Brazil, U.S.A., Paraguay).

Finally, the studies summarized in table 4 have been selected with respect to research question RQ2. The most frequent methods are typical supervised and unsupervised ML algorithms. Regarding technologies, Python emerges as the most used programming language (libraries such as Scikit-learn are becoming the state of the art in the subject), alongside Java and R. Among the tools used, we notice Neo4j and KNIME. Finally, some works explore the adoption of neural networks, as well as social network analysis methods.

| Paper | # Cit | Venue | Disciplines | Q |
|---|---|---|---|---|
| [14] | - | Electronics (Switzerland) | Computer Science; Engineering | Q2 |
| [15] | 3 | International Journal of Forecasting | Business and International Management | Q1 |
| [16] | 7 | International Transactions in Operational Research | Business, Management and Accounting; Computer Science; Decision Sciences | Q1 |
| [17] | - | Conference paper | - | - |
| [18] | 3 | Conference paper | - | - |
| [19] | 3 | Governance | Public administration | Q1 |
| [20] | - | Automation in Construction | Engineering | Q1 |
| [8] | 2 | Conference paper | - | - |
| [21] | 7 | Proceedings | Computer Science | - |
| [22] | 1 | Proceedings | Computer Science | - |
| [23] | 1 | Proceedings | Computer Science | - |
| [24] | 8 | Complexity | Computer Science | Q1 |
| [25] | 4 | Business and Politics | Business, Management and Accounting | Q1 |
| [26] | 5 | Conference paper | - | - |
| [27] | 10 | SSRN Electronic Journal | - | - |

Table 2: Disciplinary area of the selected papers with citation count (Cit), the Venue, and the Journal’s Quartile (Q).

| Country | Number of authors |
|---|---|
| Spain | 12 |
| Brazil | 11 |
| United States | 8 |
| Portugal | 5 |
| Italy | 4 |
| Paraguay | 3 |
| Croatia | 2 |
| Australia, Austria, Colombia, Slovenia, United Kingdom | 1 |

Table 3: Geographical distribution of the authors.
A co-occurrence network has been generated in figure 2, starting from the main keyword “public procurement”. By applying a community detection algorithm (Louvain) [28], we detect the densest groups of terms. In particular, a first group includes terms related to corruption, risk and governance with respect to public administration, public spending and the public sector. A second group includes terms about ICT technologies (blockchain, e-procurement, information and communication, data processing). Interestingly, a third group includes Artificial Intelligence, data mining, social network (and fraud detection). A fourth group includes terms about network methods: semantic web, knowledge graphs, semantic technologies (and anomaly detection). These results confirm the effectiveness of our approach, and we will use these categories in the presentation of the research in section 5.

Figure 2: Visualization of a term co-occurrence network.

| Paper | Input data | Methods & Technologies |
|---|---|---|
| [14] | Public Procurement System (SERCOP) of Ecuador | Clustering (K-Means), Self-Organizing Map (SOM), Support Vector Machine (SVM) and Principal Component Analysis (PCA). Technologies: Python Scikit-learn library, MiniSom, Self Organizing Map (SOM), AZURE Machine Learning. |
| [15] | Sistema Electronico de Contratación Pública (SECOP) of Colombia | Lasso classification model, gradient boosting classification model (GBM). Technologies: n.a. |
| [16] | Various from the states of Brazil | Graph theory, network analysis, clusterization, regression analysis. Technologies: n.a. |
| [17] | Diario Oficial da Uniao (DOU) of Brazil | Bottleneck deep neural network and Bi-LSTM. Technologies: n.a. |
| [18] | Public procurement open data from Spain | Machine learning, pattern detection. Technologies: Python, R, Neo4j, others. |
| [19] | Italian dataset managed by the ANAC | Binary logistic regression, random forest, and Gradient Boosting Machines (GBM). Technologies: R. |
| [20] | Public procurement open data from Brazil, Italy, Japan, Switzerland and USA | SGD (Stochastic Gradient Descent), Extra Trees (Extremely Randomized Trees), Random Forest, Ada Boost, Gradient Boosting, SVC (C-Support Vector Classification), K Neighbors, MLP (Multi-Layer Perceptron), Bernoulli Naive Bayes and Gaussian Naive Bayes, Gaussian Process. Technologies: Python and Scikit-learn library. |
| [8], [21] | Electronic Public Procurement of Croatia | NLP, naïve Bayes (NB), logistic regression (LR), support vector machines (SVM). Technologies: Python. |
| [22] | Public procurement open data from Paraguay | Unsupervised learning model for anomaly detection based on the Isolation Forest algorithm. Technologies: KNIME framework. |
| [23] | Portuguese Public Procurement | Supervised machine learning, graph-oriented database. Technologies: Python Scikit-learn library, Neo4j. |
| [24] | Public procurement open data from Spain and EU | Random forest regression method. Technologies: Random Forest Regressor from Scikit-learn (Python). |
| [25] | European Economic Area members and associate countries | Random forest. Technologies: n.a. |
| [26] | Various (private and public) | Text Analytics, Social Network Analysis, Unsupervised learning, Online probabilistic learning. Technologies: Python, Java, DB/2. |
| [27] | Italian dataset managed by the ANAC | Lasso, ridge regression, and random forest. Technologies: n.a. |

Table 4: Papers selected for literature review.

5. Summary of main research
We can summarise the main results of the papers by grouping them into three classes. Most of the papers adopt typical ML methods, while two smaller groups deal mainly with neural networks and network analysis.

Typical machine learning methods. In [14], a multi-phase model was used (the identification of anomalies and generation of the detection model), which uses different algorithms, such as clustering (K-Means), Self-Organizing Map (SOM), Support Vector Machine (SVM) and Principal Component Analysis (PCA).
Following this methodology, a semi-supervised learning model is built for the detection of anomalies, which obtains an accuracy of 95%, allowing the detection of procedures where the aim is to benefit a particular supplier by means of the qualification assignment parameters.

Two machine learning models have been used in [15] to predict whether a contract will result in malfeasance, breach of contract, or inefficiency: a lasso classification model [29] and a gradient boosting classification model [30]. These methods make it possible to describe which variables contribute to the likelihood that a contract will be problematic, and in which way, which is very useful from the perspective of policymakers; for instance, variables associated with projects, such as their size or duration, were important predictors of malfeasance. Also, the time lag between adjudicating the contract and the nearest election showed high predictive value.

Alternative predictive models were estimated in [19]; the article traces the organization of corruption in public procurement by theoretically and empirically assessing the contribution of Extra-legal Governance Organizations (EGO) to supporting it. The authors used traditional regression and supervised machine-learning methods to identify and validate proxy indicators for EGO presence in public procurement, such as single bidding or municipal spending concentration. The predictive models were binary logistic regression, random forest, and Gradient Boosting Machines (GBM). When prediction accuracy is tested on unseen data, GBM achieves 85%. Looking at external validity, the model’s predicted EGO score also correlates significantly and moderately strongly with established indicators of organized criminality both within Italy and across Europe.
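The kind of comparison reported for [19], where logistic regression, random forest and gradient boosting are scored on unseen data, can be sketched with Scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic (generated with `make_classification`), the hyperparameters are library defaults, and nothing here reproduces the indicators or results of the cited study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for contract-level indicators (NOT real procurement data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)

# Hold out 25% of the rows so accuracy is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Using the same train/test split for all three models keeps the accuracies comparable; with real data, cross-validation would give a more stable estimate than a single split.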
The accuracy of eleven ML algorithms for detecting collusion is tested in [20], using collusive datasets obtained from Brazil, Italy, Japan, Switzerland and the United States; while the use of ML in public procurement remains largely unexplored, its potential use to identify collusion is promising. The three top-performing ML algorithms have been Extra Trees, Random Forest and Ada Boost (ensemble methods). In the scenario where all auction information was available, these algorithms’ accuracy (detection rates) ranged between 81% and 95%, with a balanced accuracy generally above 73% (excluding the US dataset).

In [18], a prototype called SALER is proposed. Inside SALER, several internal and external data sources are analysed and assessed to explore possible irregularities in budget and cash management, public service accounts, salaries, disbursement, grants, subsidies, etc. SALER combines descriptive and predictive machine learning models, and the results can be accessed through a web interface. Finally, the authors mention two frameworks similar to SALER: zIndex (footnote 3), a public procurement benchmarking tool for rating contracting authorities, which is being developed in the Czech Republic by researchers from Charles University in Prague, and Arachne (footnote 4), considered by the European Commission a good tool amongst anti-fraud measures; this risk-scoring tool generates more than 100 risk indicators sorted into specific risk categories to help managing authorities and intermediate bodies prevent and detect errors and irregularities among projects, beneficiaries, contracts and contractors.

Footnote 3: https://www.zindex.cz
Footnote 4: https://ec.europa.eu/social/main.jsp?catId=325&intPageId=3587&langId=en

The relation between the award price and the bidding price is investigated by [24]. An award price estimator is proposed that uses the random forest [31] regression method over Spanish open data from 2012 to 2018.
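As a rough illustration of an award price estimator of the kind described for [24], the sketch below fits Scikit-learn's `RandomForestRegressor` to predict an award price from a bidding price. The data are synthetic, and both the extra feature (`n_bidders`) and the assumed relation between the number of bidders and the discount are invented for the example; they are not taken from the cited article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic tenders: bidding price and a hypothetical number of bidders.
bid_price = rng.uniform(10_000, 1_000_000, size=n)
n_bidders = rng.integers(1, 15, size=n)

# Assumed relation (illustration only): more bidders -> larger discount.
discount = np.clip(0.02 * n_bidders + rng.normal(0, 0.02, size=n), 0, 0.5)
award_price = bid_price * (1 - discount)

X = np.column_stack([bid_price, n_bidders])
X_train, X_test, y_train, y_test = train_test_split(
    X, award_price, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Compare against a naive baseline that always predicts the training mean.
mae = mean_absolute_error(y_test, pred)
baseline_mae = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))
print(f"MAE random forest: {mae:,.0f} | MAE mean baseline: {baseline_mae:,.0f}")
```

The mean baseline is included because an error figure is hard to interpret on its own; the estimator is only useful to the extent that it beats such a trivial predictor.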
Finally, a similar analysis, employing a dataset from European countries (TED, footnote 5), is presented to compare and generalise the results. The article illustrates how a machine learning algorithm can be useful; in particular, random forest predicts the award prices with less uncertainty, adapting to the real market.

Machine learning tools are used by [25] to analyze a large dataset of public contracts from across Europe, in order to identify the conditions under which close connections, defined in terms of both repeated interaction and geographical dispersion, appear. In this case, an in-depth study of the diversity of ties between buyers and sellers in public contracts adopted a statistical analysis with random forest models, starting from 3.3 million European Union contracts between 2009 and 2015.

In [27], three main results are presented, based on detailed data on the content of calls for tenders involving roadwork contracts in Italy. The prediction capability of various corruption indicators has been tested using standard ML algorithms: lasso, ridge regression, and random forest. The article shows that, among ML methods, the random forest algorithm provides the most accurate prediction. At a more general level, the article suggests that a higher standardization of call for tenders documents can contribute to reducing corruption risks. For this purpose, sector authorities or specialized public bodies can play a crucial role.

In [22], the initial results of an anomaly detection experiment are discussed, in which the Isolation Forest algorithm is applied to a publicly available dataset, i.e. the public procurement of Paraguay.
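To make the Isolation Forest approach mentioned for [22] more concrete, here is a minimal sketch using Scikit-learn's `IsolationForest` (the cited work used the KNIME framework, so this is a substitute implementation). The two features, the injected odd rows and the `contamination` value are assumptions chosen for illustration, not properties of the Paraguayan dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Two hypothetical features per procurement process:
# log of the contract amount and number of bidders.
normal = np.column_stack(
    [rng.normal(11.0, 0.5, 300), rng.normal(6.0, 1.5, 300)]
)
# Three invented, deliberately unusual processes appended at the end.
odd = np.array([[16.0, 1.0], [15.5, 1.0], [5.0, 12.0]])
X = np.vstack([normal, odd])

# contamination is the assumed share of anomalies; it sets the decision threshold.
detector = IsolationForest(n_estimators=200, contamination=0.02, random_state=0)
labels = detector.fit_predict(X)    # -1 = flagged as anomaly, 1 = normal
scores = detector.score_samples(X)  # lower score = more anomalous

flagged = np.where(labels == -1)[0]
print("flagged row indices:", flagged)
```

Because the model is unsupervised, known problematic cases such as upheld protests or complaints are not needed for training; they can instead serve afterwards to check whether the flagged processes overlap with them.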
The effectiveness of the model is validated against locally known anomalous procurement processes, namely: a) processes protested by entities involved in the contracting process, which were decided in favor of the protestant, and b) complaints about the contracting process from external entities with the possibility of anonymity. The results show an accuracy of over 90% in detecting these known anomalies as early as the tender stage and during the contracting stage.

Network analysis and text mining. A Decision Support System (DSS) is proposed in [16] to allow law enforcement agencies to establish priorities concerning the companies to be investigated. This DSS incorporates data mining algorithms for quantifying dozens of corruption risk patterns for all public contractors inside a specific jurisdiction, leading to improvements in the quality of public spending and to the identification of more cases of fraud. These algorithms combine operations research tools such as graph theory, clusterization, and regression analysis with advanced data science methods to allow the identification of the main risk patterns.

Starting from various datasets and social network analysis (graph model based), an unsupervised learning model has been developed by [26] for clustering fraudulent employees.

In [8] and [21], the use of advanced text mining to improve the procurement process is explored, based on the Electronic Public Procurement of Croatia (footnote 6). The authors introduce the use of NLP to improve the search for fraud, comparing common classification algorithms: Naïve Bayes (NB), Logistic Regression (LR) and Support Vector Machines (SVM). The models have been trained and tested on all data, and by groups of procurement lots (food, medical equipment, construction, IT services, etc.) defined in the Common Procurement Vocabulary (CPV).
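A comparison of NB, LR and SVM on tender text, as described for [8] and [21], could be set up along the following lines. This is a sketch under stated assumptions: the handful of tender descriptions and their labels are invented toy data, TF-IDF is assumed as the text representation, and `LinearSVC` stands in for the SVM; none of this is taken from the Croatian dataset or the authors' code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy tender descriptions; label 1 = marked as suspicious (hypothetical).
texts = [
    "supply of office paper, open to all registered suppliers",
    "road maintenance works, standard technical requirements",
    "medical equipment, brand and model fixed in advance",
    "it services, certificate held by a single vendor required",
    "food supply for schools, open procedure",
    "construction works, deadline for bids three days",
    "cleaning services, open to all registered suppliers",
    "software licences, exact product version fixed in advance",
]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

classifiers = {
    "NB": MultinomialNB(),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

new_text = ["supply of servers, brand and model fixed in advance"]

# Each classifier gets its own TF-IDF + model pipeline fitted on the same texts.
predictions = {}
for name, clf in classifiers.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(texts, labels)
    predictions[name] = int(pipeline.predict(new_text)[0])

print(predictions)
```

Training one pipeline per group of procurement lots, rather than one on all data, would only require filtering `texts` and `labels` by group before calling `fit`.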
Groups such as IT services, repair and maintenance services, and health and social work services have good prediction results; conversely, groups such as architecture, construction, engineering and inspection services yield poor metrics, precisely because of the lack of information on technical and professional abilities.

Footnote 5: https://data.europa.eu/data/datasets/ted-csv?locale=en
Footnote 6: https://eojn.nn.hr

Neural networks. The types of fraud investigated by [17] are mainly collusion (bid-rigging), overpricing, and delivery fraud (quality and quantity of services and materials). To evaluate the reference dataset, bottleneck deep neural networks and bidirectional recurrent neural networks (Bi-LSTM) [32] were chosen. The deep neural network models were built using TensorFlow [33]. Both the bottleneck deep neural network and the Bi-LSTM proved to be competitive with traditional classifiers and achieved better precision, which is more desirable than recall in a criminal fraud investigation.

In [23], starting from the Portuguese Public Procurement portal, a graph-oriented user interface is proposed to support decision-making, using Cypher queries. Besides this, supervised machine learning methods are used to find suspicious procurement.

Having summarised the main research with respect to the methods and technologies used, it is important to note that this kind of research can have limitations. It is widely known that ML algorithms are akin to a black box whose results are difficult to explain to non-experts. Lawyers and stakeholders can be interested in explanations of the results, while the inherent complexity of the problem being analyzed does not facilitate the task (at least not in a straightforward manner) [34]. Another issue is the need for a substantial amount of reliable historical data, some of which (especially the collusion-related data) may not always be made available by competition commissions or law enforcement agencies [20].

6. Conclusions and future work
We provided a review of the most recent studies on fraud detection for public organisations. We found typical methods based on ML algorithms or network analysis, with some emerging interest in neural networks. As future work, we are interested in detecting relevant authors in the field, according to [35]. In particular, we aim to perform a bibliometric network analysis (graph-based and timeline-based) to find the centrality and density of authors connected to the topic of machine learning and fraud detection in public procurement.

References
[1] ACFE, 2020 Report to the Nations: the ACFE’s 11th study on the costs and effects of occupational fraud (2020). URL: https://acfepublic.s3-us-west-2.amazonaws.com/2020-Report-to-the-Nations.pdf.
[2] H. R. Varian, Big data: New tricks for econometrics, Journal of Economic Perspectives 28 (2014) 3–28. URL: https://www.aeaweb.org/articles?id=10.1257/jep.28.2.3. doi:10.1257/jep.28.2.3.
[3] S. Mullainathan, J. Spiess, Machine learning: An applied econometric approach, Journal of Economic Perspectives 31 (2017) 87–106. URL: https://www.aeaweb.org/articles?id=10.1257/jep.31.2.87. doi:10.1257/jep.31.2.87.
[4] Z. Zhou, Machine Learning, Springer, 2021. URL: https://doi.org/10.1007/978-981-15-1967-3. doi:10.1007/978-981-15-1967-3.
[5] E. Sulis, L. Humphreys, F. Vernero, I. A. Amantea, D. Audrito, L. D. Caro, Exploiting co-occurrence networks for classification of implicit inter-relationships in legal texts, Inf. Syst. 106 (2022) 101821. doi:10.1016/j.is.2021.101821.
[6] E. Sulis, L. B. Humphreys, D. Audrito, L. Di Caro, Exploiting textual similarity techniques in harmonization of laws, in: S. Bandini, F. Gasparini, V. Mascardi, M. Palmonari, G. Vizzari (Eds.), AIxIA 2021 – Advances in Artificial Intelligence, Springer International Publishing, Cham, 2022, pp. 185–197.
[7] S. Bird, E. Klein, E. Loper, Natural Language Processing with Python, 1st ed., O’Reilly Media, Inc., 2009.
[8] N. Modrušan, K. Rabuzin, L.
Mrsic, Improving public sector efficiency using advanced text mining in the procurement process, 2020, pp. 200–206. doi:10.5220/0009823102000206.
[9] Y. Torres Berru, V. F. López Batista, P. Torres-Carrión, M. G. Jimenez, Artificial intelligence techniques to detect and prevent corruption in procurement: A systematic literature review, in: M. Botto-Tobar, M. Zambrano Vizuete, P. Torres-Carrión, S. Montes León, G. Pizarro Vásquez, B. Durakovic (Eds.), Applied Technologies, Springer International Publishing, Cham, 2020, pp. 254–268.
[10] Y. Torres Berrú, V. Batista, P. Torres-Carrion, Data mining to detect and prevent corruption in contracts: Systematic mapping review, RISTI - Revista Iberica de Sistemas e Tecnologias de Informacao (2020) 13–25.
[11] N. Modrusan, K. Rabuzin, L. Mrsic, Review of public procurement fraud detection techniques powered by emerging technologies, International Journal of Advanced Computer Science and Applications 12 (2021). doi:10.14569/IJACSA.2021.0120272.
[12] Public procurement fraud detection: A review using network analysis, in: R. Benito, C. Cherifi, H. Cherifi, E. Moro, L. Rocha, M. Sales-Pardo (Eds.), Complex Networks & Their Applications X, volume I of Studies in Computational Intelligence, Springer, Cham, 2022, pp. 116–129. URL: https://complexnetworks.org. doi:10.1007/978-3-030-93409-5_11.
[13] P. V. Torres-Carrión, C. S. González-González, S. Aciar, G. Rodríguez-Morales, Methodology for systematic literature review applied to engineering and education, in: 2018 IEEE Global Engineering Education Conference (EDUCON), 2018, pp. 1364–1373. doi:10.1109/EDUCON.2018.8363388.
[14] Y. Torres-Berru, V. F. López Batista, Data mining to identify anomalies in public procurement rating parameters, Electronics 10 (2021). URL: https://www.mdpi.com/2079-9292/10/22/2873. doi:10.3390/electronics10222873.
[15] J. Gallego, G. Rivero, J.
Martínez, Preventing rather than punishing: An early warning model of malfeasance in public procurement, International Journal of Forecasting 37 (2021) 360–377. URL: https://www.sciencedirect.com/science/article/pii/S0169207020300935. doi:10.1016/j.ijforecast.2020.06.006.
[16] R. B. Velasco, I. Carpanese, R. Interian, O. C. G. P. Neto, C. C. Ribeiro, A decision support system for fraud detection in public procurement, Int. Trans. Oper. Res. 28 (2021) 27–47. URL: https://doi.org/10.1111/itor.12811. doi:10.1111/itor.12811.
[17] M. Lima, R. Silva, F. Lopes de Souza Mendes, L. R. de Carvalho, A. Araujo, F. de Barros Vidal, Inferring about fraudulent collusion risk on Brazilian public works contracts in official texts using a Bi-LSTM approach, in: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 1580–1588. URL: https://aclanthology.org/2020.findings-emnlp.143. doi:10.18653/v1/2020.findings-emnlp.143.
[18] F. Plumed, J. Casamayor, C. Ferri, J. Gómez, E. Vendrell Vidal, SALER: A Data Science Solution to Detect and Prevent Corruption in Public Administration, 2019, pp. 103–117. doi:10.1007/978-3-030-13453-2_9.
[19] M. Fazekas, S. Sberna, A. Vannucci, The extra-legal governance of corruption: Tracing the organization of corruption in public procurement, Governance (2021).
[20] M. J. García Rodríguez, V. Rodríguez-Montequín, P. Ballesteros-Pérez, P. E. Love, R. Signor, Collusion detection in public procurement auctions with machine learning algorithms, Automation in Construction 133 (2022) 104047. URL: https://www.sciencedirect.com/science/article/pii/S0926580521004982. doi:10.1016/j.autcon.2021.104047.
[21] K. Rabuzin, N.
Modrušan, Prediction of public procurement corruption indices using machine learning methods, in: Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KMIS, INSTICC, SciTePress, 2019, pp. 333–340. doi:10.5220/0008353603330340.
[22] M. Niessen, J. Paciello, J. Fernandez, Anomaly detection in public procurements using the open contracting data standard, 2020, pp. 127–134. doi:10.1109/ICEDEG48599.2020.9096674.
[23] D. Carneiro, P. Veloso, A. Ventura, G. Palumbo, J. Costa, Network Analysis for Fraud Detection in Portuguese Public Procurement, 2020, pp. 390–401. doi:10.1007/978-3-030-62365-4_37.
[24] M. J. García Rodríguez, V. Montequín, F. Ortega-Fernández, J. Balsera, Public procurement announcements in Spain: Regulations, data analysis, and award price estimator using machine learning, Complexity 2019 (2019) 1–20. doi:10.1155/2019/2360610.
[25] M. Popa, Uncovering the structure of public procurement transactions, Business and Politics 21 (2019) 351–384. doi:10.1017/bap.2019.1.
[26] A. Dhurandhar, B. Graves, R. Ravi, G. Maniachari, M. Ettl, Big data system for analyzing risky procurement entities, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, Association for Computing Machinery, New York, NY, USA, 2015, pp. 1741–1750. URL: https://doi.org/10.1145/2783258.2788563. doi:10.1145/2783258.2788563.
[27] F. Decarolis, C. Giorgiantonio, Corruption red flags in public procurement: New evidence from Italian calls for tenders, SSRN Electronic Journal (2020). doi:10.2139/ssrn.3612661.
[28] X. Que, F. Checconi, F. Petrini, J. A. Gunnels, Scalable community detection with the Louvain algorithm, in: 2015 IEEE International Parallel and Distributed Processing Symposium, IEEE, 2015, pp. 28–37.
[29] R.
Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological) 58 (1996) 267–288.
[30] J. H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of Statistics (2001) 1189–1232.
[31] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32. doi:10.1023/A:1010933404324.
[32] M. Schuster, K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing 45 (1997) 2673–2681. doi:10.1109/78.650093.
[33] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, Tensorflow: Large-scale machine learning on heterogeneous distributed systems, 2015. URL: http://download.tensorflow.org/paper/whitepaper2015.pdf.
[34] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models, ACM Comput. Surv. 51 (2018). URL: https://doi.org/10.1145/3236009. doi:10.1145/3236009.
[35] A. Perianes-Rodriguez, L. Waltman, N. J. van Eck, Constructing bibliometric networks: A comparison between full and fractional counting, Journal of Informetrics 10 (2016) 1178–1195. URL: https://www.sciencedirect.com/science/article/pii/S1751157716302036. doi:10.1016/j.joi.2016.10.006.