Measuring the Risk of Public Contracts Using Bayesian Classifiers Leonardo J. Sales1,3 and Rommel N. Carvalho1,2 1 Department of Research and Strategic Information at the Brazilian Office of the Comptroller General ⇤ 2 Department of Computer Science at the University of Brası́lia † 3 Department of Economics at the University of Brası́lia ‡ {leonardo.sales,rommel.carvalho}@cgu.gov.br Abstract are the institutional means by which consumption mate- rializes, having important role in the search for efficiency and effectiveness of public spending. Bayesian Classifiers are widely used in ma- chine learning supervised models where there Given the huge number of contracts and purchasing pro- is a reasonable reliability in the dependent vari- cesses to audit, this context raises the challenge of act- able. This work aims to create a risk measure- ing effectively in the pursuit of management problems, ment model of companies that negotiate with fraud, and corruption. This is the responsibility of the the government using indicators grouped into governmental control units, which specially in Brazil has four risk dimensions: operational capacity, his- limited resources. tory of penalties and findings, bidding profile, Take the example of the Office of the Comptroller Gen- and political ties. It is expected that this model eral (CGU), the central unit of internal control of the contributes to the selection of contracts to be Brazilian federal government, which is responsible for audited under the central unit of internal con- auditing any transaction that represents federal spending. trol of the Brazilian government, responsible The CGU should audit both spending conducted directly for auditing more than 30,000 public contracts (by the central units of the ministries) as the ones con- per year. ducted indirectly (by almost 20,000 decentralized units), including all payments made by any state or municipal- ity that receives federal funds through voluntary transfers 1 INTRODUCTION (Brazil, 2003). Nevertheless, the CGU has only 1,200 auditors working directly in the oversight of these ex- Public contracts can be understood as adjustments made penditures. between public administration and private sector for the In this context, a big issue arises involving the need to ra- attainment of public interest objectives (Di Pietro, 1999). tionalize the use of auditing capabilities. There is a clear The contract terms are set by the governmental unit, this need to optimize the choice of what will be effectively being understood as any body or public authority of fed- audited, since the complete census is impossible and un- eral, municipal, or state level. economical. Acting in a preventive way to avoid future Government spending coming from public contracts and problems is also important since most of the errors found direct purchases of goods and services account for ap- generate irrecoverable damage, such as paralysis of a en- proximately 19% of the Brazilian GDP in recent years. gineering project or the need to redo it. Data from the Brazilian Institute of Geography and Both the rationalization of choices (in a subsequent op- Statistics (IBGE), published in National Accounts Re- eration) and the understanding and treatment of vulner- port in the 2015 fourth quarter, quantifies in R$ 1.07 tril- ability (in preventive action) can be analyzed within the lion the amount of government consumption expenditure more general concept of risk assessment. After all, what in that year (IBGE, 2016). The bidding and procurement is sought in both cases is to identify factors or char- ⇤ SAS, Quadra 01, Bloco A, Edificio Darcy Ribeiro Brasilia, acteristics of purchases or contracts which increase the DF, Brazil chance of future problems such as mismanagement or † Campus Darcy Ribeiro Brası́lia, DF, Brazil even fraud. ‡ Campus Darcy Ribeiro Brası́lia, DF, Brazil BMAW 2016 - Page 7 of 59 Supervised learning models have been used in similar tails of the methodology used in the study, including problems in private sector. Financial institutions assess the understanding of data modeling, the creation of the the risk of potential borrowers, among many suitors with networks, and the validation of the models. Section 4 different characteristics and history using such models, presents and discusses the results. Finally, Section 5 pro- in this case called credit scoring (Lessmann et al., 2015). vides conclusions and considerations on gaps and oppor- Insurance companies also use such statistical models to tunities for future work. assign the value of insurance for a certain good. The techniques learn from the transaction history and quan- 2 THEORETICAL REFERENCES tify the weight of certain characteristics in determining the risk of a client or specific process. Thus, the auto in- In this section we describe the public bidding process surance company knows that unmarried young men offer in Brazil, the Bayesian classifiers used for learning the more risk than married women with children. predictive models, and some related works. In practice, these models are applications of statistical and computational techniques of regression and classi- 2.1 PUBLIC BIDDING IN BRAZIL fication using databases that have information of trans- action history and labeled cases of “success” and “fail- The whole process of buying products or hiring services ure” (Friedman et al., 2001). A good condition in the in the Brazilian federal government takes place accord- construction of this type of risk analysis model is the ex- ing to the rules of Law 8666/1993 (Brazil, 1993), called istence of information on transaction history, with vari- Procurement Law. Other regulatory acts complement ables representing different characteristics of each trans- this law, such as Law 10520/2005 (Brazil, 2002), estab- action. Thus, one can distinguish and identify correla- lishing the types of Auction and Complementary Law tions between groups. 123/2006 (Brazil, 2006) establishing privileges for mi- cro and small businesses in bidding. Law 8666/1993 This paper proposes to create a predictive model of risk (Brazil, 1993) details the stages of the bidding process in contracts based on Bayesian classifiers. It will re- itself, the bidding types allowed, types of contracts, as- sult in the quantification of the propensity that a sup- pects of qualification of companies, and also defines ad- plier has problems in government contracts, according ministrative and criminal penalties to be applied to sup- to the company’s characteristics. Learning models using pliers in case of noncompliance. Bayesian networks are especially useful when you need to organize or discover the knowledge of a particular area The Procurement Law, together with other mentioned through the construction of cause and effect relationships legislation, defines the following administrative penalties captured from a set of data (Spiegelhalter et al., 1993). to suppliers, due to total or partial non-performance of Besides this, Bayesian Classifiers have been incorpo- contracts: rated into risk measurement studies, especially when it is important to capture and explain the relationships of • warning; cause and effect between the different prediction param- • pecuniary penalty; eters, avoiding the “black box” issue, common in other techniques. • temporary suspension of bid; The model will be used to select high-risk contracts to • declaration of non-trustworthiness; and be audited by the CGU and will be based on the estima- • impediment to bid and hire. tion of the relations of cause and effect between various indicators that are related to the propensity of contrac- The whole process of procurement and contracting in the tual risk. The dependent variable is the occurrence of federal government is done using the government’s Gen- more severe punishment that can be given to a supplier eral Services Administration System (SIASG). Each pur- in Brazil: the impediment to bidding. The indicators chase or contract is recorded in this system, since the that will be used as predictors represent characteristics opening of the process to the issue of commitment. grouped into four risk dimensions: operational capacity, history of penalties and findings, bidding profile, and po- Existing since 1994, the SIASG started to be used by the litical ties. government gradually and it already has more than 5 mil- lion purchases. All federal administration is required to This work is divided into 5 sections. Besides this in- use this system. Annually it records over 700,000 bids. troduction, Section 2 presents the theoretical framework Some of these bids representing continued provision of that supports the central idea of the work and the method- services or delivery of goods turns into contracts, gener- ological approach adopted. Section 3 contains the de- ating nearly 30,000 new contracts per year. BMAW 2016 - Page 8 of 59 2.2 BAYESIAN CLASSIFIERS MODELS • the logarithm of the Bayesian Dirichlet equivalent score (bde); and Since Bayesian networks (BNs) have been successfully used in classification problems – e.g., see (Sahami et al., • the logarithm of the modified Bayesian Dirichlet 1998; Friedman et al., 1997; Goldszmidt et al., 2010; equivalent score (mbde). Friedman and Goldszmidt, 1996; Cheng and Greiner, 1999; Ceccon et al., 2014; Ye et al., 2014; Shi et al., 2.3 RELATED WORKS 2013) –, we decided to experiment with different BN learning algorithms in order to classify the companies Many studies use supervised learning models in order to that sell service and goods to the government with high predict risk in business transactions. The area where it likelihood of noncompliance. is more common this type of approach is the bank credit (Lessmann et al., 2015; Hand and Henley, 1997). Score-based learning is a popular method for inducing BNs. The main idea is to assign a score to a model based These learning models attempt to quantify how the char- on how well it represents the data set used for learning. acteristics of potential borrowers influence the probabil- Thus, the purpose of the algorithm is to maximize the ity of default. Classically, the techniques most used for goodness-of-fit score. this purpose are Logistic Regression and Discriminant Analysis (Ghodselahi, 2011). Other studies have been In this work we use standard and well-known Bayesian testing and comparing some modern techniques (Baesens network classifiers, which are aimed at classification. et al., 2002). In other areas, such as insurance, such mod- More specifically, we use two algorithms available in the els are also widely used. bnlearn R package1 (Scutari, 2009): Bayesian Classifiers have been incorporated into these studies, especially when you want to capture and explain • Naı̈ve Bayes (naive.bayes): a simple algorithm the relationships of cause and effect between the differ- that assumes that all explanatory variables are in- ent prediction parameters, avoiding the “black box” is- dependent of each other. In other words, the target sue, common in other techniques (Jiang and Wu, 2009; variable is the only parent of all other variables. Zonneveldt et al., 2010; Baesens et al., 2002). Bayesian • Tree-Augmented Naı̈ve Bayes (tree.bayes): an al- algorithms provide more clear insights when modeling gorithm that relaxes the simple Naı̈ve Bayes as- causal relationships. sumption of independence, by allowing the explana- A new approach to credit scoring by synthesizing Sim- tory variables to have one other variable as parent ple Naı̈ve Bayesian Classifier (SNBC) and the Rough Set besides the target one. Theory is presented by (Jiang and Wu, 2009). A compar- ison between Naı̈ve Bayes (NB) models, different aug- Besides that, we also tried two different score-based mented NB models, and a handcrafted causal network is learning algorithms, which are also available in the made by (Zonneveldt et al., 2010). bnlearn R package used in this work (Scutari, 2009): In the context of public procurement, some initiatives al- • Hill-Climbing (hc): a hill climbing greedy search ready exist in order to implement similar models in pre- on the space of the directed graphs. dicting irregularities or contractual problems. For exam- ple, Naı̈ve Bayes algorithms are used by (Balaniuk et al., • Tabu Search (tabu): a modified hill-climbing able 2012) in an unsupervised approach to quantify the com- to escape local optima. bined risk of private companies and government units in the execution of contracts. The bnlearn package implements random restart with (Sales, 2014) built a model with the same objective of configurable perturbing operations for both algorithms. this work (to measure the risk of public contracts) and A number of different scores were used to fine tune the with similar data. In that case the accuracy using Logistic models learned from the score-based algorithms and to Regression and Decision Tree were compared, resulting improve their performance, which are also available in in the best accuracy of 64%. the bnlearn package (Scutari, 2009): 3 METHODS AND PROCEDURES • the Akaike Information Criterion score (aic); The first step in building the Bayesian classification • the Bayesian Information Criterion score (bic); model was the definition of the criteria for characteriza- 1 The package is available at http://www.bnlearn.com/. tion of the companies with the highest risk (the “Bad”). BMAW 2016 - Page 9 of 59 In this sense, we chose to characterize the “Bad” group • Bidding profile: company profile when participat- all companies that suffered the following punishments in ing in bids, as the average quantity of offers, and the the years 2015 and 2016: temporary suspension of bid, degree of success of business (percentage of wins). declaration of non-trustworthiness, and impediment to bid and hire. The group of low-risk companies (here- – Quantity of indicators: 12. inafter “Good”) are companies with existing contracts in – Examples of indicators: quantity of purchases, the same period but without such punishment. purchase quantity of items, average amount of offers, number of units of the federation, num- The database used contained 1,448 companies, of which ber of wins, percentage of victory, value of 724 were previously classified as “Bad” and other 724 contracts, the difference in days between the previously classified as “Good”2 . opening of the company and the first participa- From this initial setting, the second step was the creation tion in a public procurement. of risk indicators, which cover the past of relations be- tween companies and government, considering the pe- • Political ties: company relationship with politi- riod since 2011, as well as other information that are cians, via donations in campaigns. independent of the period, such as those from the reg- – Quantity of indicators: 01. istry of companies. The idea is to answer the following – Examples of indicators: amount donated in po- question: What happened in the recent past of the com- litical campaigns. panies that contributed to its contractual default in 2015 and 2016? The next step was the transformation of all variables in These indicators were obtained from the four dimensions factors (categories), using a simple process of discretiza- of risk: operational capacity, history of penalties and tion, where values of each variable were divided into findings, bidding profile, and political ties. The mean- three intervals of equal size. Once complete, the database ing of each of the risk dimensions and some indicators has been divided in training set (70%) and test (30%). used are described below: The discretization was carried out due to the limitation of some algorithms used. In future experiments, we will • Operational capacity: irregularities related to the learn models using algorithms that allows continous vari- existence or insufficient physical and operational ables. structure of the contracted company. At first, we used standard Bayesian classifiers available – Quantity of indicators: 11. in the bnlearn R package, Naı̈ve Bayes (NB) and Tree- – Examples of indicators: number of employ- Augmented Naı̈ve Bayes (TAN). ees, number of partners, the total amount re- ceived from the government, amount received As the database does not have a very large number of ob- from the government per employee, value re- servations, we used a process of estimation with cross- ceived from the government for partner, av- validation in the training subset for both algorithms. The erage salary of employees, average salary of Cross-Validation procedure applied was the random di- the partners, company size, number of activi- vision of training based on 10 sample partitions of equal ties carried out by the company, age from the size, for use in cycles of modeling where 9 partitions are company. used for training and one for testing. Error measures are then combined to have a single measurement error. • History of penalties and findings: pre-existence of punishment or audit findings related to the com- The estimation with cross-validation was performed us- pany. ing a Score-based learning algorithm, which ranks the network structures created with emphasis on model fit. – Quantity of indicators: 04. In these algorithms, various parameters can be adjusted – Examples of indicators: quantity of received in search of the best results forecast. punishments, number of alerts generated in CGU monitoring. The loss function used to measure the model results was 2 the misclassification, where the dependent variable value The 724 companies in the “Bad” group are all companies that meet the criteria described for this class. The 724 compa- is the result of local distributions (from its parents) and nies in “Good” group was obtained by sampling in the set of the error function is measured by coincidence or not with 41,000 companies that meet the requirements described. Sam- the actual values (hit rate). pling the second group was made in order to solve the dominant class issue, in a process called undersampling (see (Japkowicz Since an important aspect of machine learning is the pa- et al., 2000) for more details of this process). rameter tuning and both NB and TAN in bnlearn do BMAW 2016 - Page 10 of 59 not have any parameters to be tuned, we decided to also performance with the test set. The 95% confidence in- try another set of algorithms. In bnlearn, a set of al- terval of the accuracy was (0.69, 0.77), which shows that gorithms that allow many different configurations is the the model generalizes well. The sensitivity of the model score-based learning algorithms, namely: Hill-Climbing (prediction ability of “BAD” companies) was 76%. Ta- (HC) and Tabu Search (Tabu), both using incremental ble 2 shows the results of prediction on the test set. search. Tabu introduces changes in HC in order to avoid local optima. Table 2: The table shows the model results, that pre- In score-based algorithms, it is critical to set the network sented a total accuracy of 73%, with higher quality in score calculation method, which measures the quality of the identification of “Bad” cases. the network created using the quantification of poste- Real values rior probability. Two variables were used in the score parameterization: type of score and penalty parameter. Prediction Good Bad The tested scores types were AIC (Akaike Information Good 170 47 Criterion Score), BIC (Bayesian Information Criterion Bad 69 148 Score), BDE (Bayesian Dirichlet Equivalent Score), and % 71% (specificity) 76% (sensitivity) MBDE (Modified Bayesian Dirichlet Equivalent Score), suitable for categorical variables. Besides that, we also We consider this a good result in the context of gov- tried many different penalty parameters. ernment contracts, especially when compared with other The central idea was to try different values of each pa- similar works. Taking as reference the results obtained rameter in order to find the setting that present the best by (Sales, 2014), you can see a reasonable gain in predic- predictive ability. For better understanding, Table 1 tive ability. The sensitivity of the model is particularly shows some of these tested settings and its accuracy mea- important since what really matters is the identification sure, aiming to compare the Naı̈ve Bayes (NB) algorithm of high-risk cases, even assuming the cost of auditing setting with different configurations3 of Score-Based al- some low risk contracts, which were misclassified. gorithms. 5 CONCLUSION AND FUTURE WORK Table 1: The table shows that despite our efforts in set- ting up the Score-Based Algorithms, there was no sig- This work is consistent with a great effort that has been nificant difference than the Naı̈ve Bayes and TAN algo- developed by government control institutions to rational- rithms. The Accuracy here is the proportion of true re- ize the use of their human and material resources in order sults, either true positive or true negative. to provide more effective results at lower operating and financial costs. Algorithm Setting Accuracy - 95% CI Considering the current Brazilian context, where a se- NB - (0.70, 0.76) vere economic crisis has been treated through large cuts TAN - (0.72, 0.78) in public budgets (reducing the sending of resources to Tabu AIC, K=0.1 (0.67, 0.73) control bodies), the efficient use of resources should be a Tabu BDE, ISS=25 (0.65, 0.70) permanent goal. HC MBDE, ISS=10 (0.64, 0.70) The attempt to use statistical models based on Bayesian networks is in addition to other initiatives presented in Section 2. The main purpose of these studies is to extract 4 RESULTS knowledge from various databases that government con- trol institutions have access in order to facilitate the se- Since the best models did not present a statistically sig- lection of audit objects more likely to present problems. nificant difference in performance and usually the sim- The classification results are slightly better than other su- pler the model the better the generalization, we chose pervised models applied in government databases with the Naı̈ve Bayes algorithm to run the final model with the same goal (see (Sales, 2014), described in section all the data from the training set in order to check the 2.3). However, we believe that there is room for im- 3 provement in two possible ways: the inclusion of new The parameters used to set the algorithm were the score- based algorithm, Hill-Climbing (HC) or Tabu Search (Tabu), indicators that capture aspects ignored by this model and the score types (AIC, BIC, BDE or MBDE) and the penalty the use of optimization algorithms in the parameteriza- parameter (ISS or K). tion of score-based networks. BMAW 2016 - Page 11 of 59 Each step in direction of improving these models is a per- http://link.springer.com/article/ manent gain for the public auditing activity, and conse- 10.1023/A%3A1007465528199. quently to society. Ahmad Ghodselahi. A hybrid support vector machine ensemble model for credit scoring. International Jour- References nal of Computer Applications, 17(5):1–5, 2011. Moises Goldszmidt, James J. Cochran, Louis A. Bart Baesens, Michael Egmont-Petersen, Robert Cox, Pinar Keskinocak, Jeffrey P. Kharoufeh, Castelo, and Jan Vanthienen. Learning bayesian and J. Cole Smith. Bayesian network classi- network classifiers for credit scoring using markov fiers. In Wiley Encyclopedia of Operations Re- chain monte carlo search. In Pattern Recognition, search and Management Science. John Wiley & 2002. Proceedings. 16th International Conference on, Sons, Inc., 2010. ISBN 9780470400531. URL volume 3, pages 49–52. IEEE, 2002. http://onlinelibrary.wiley.com/doi/ Remis Balaniuk, Pierre Bessiere, Emmanuel Mazer, and 10.1002/9780470400531.eorms0099/ Paulo Cobbe. Risk based Government Audit Plan- abstract. ning using Nave Bayes Classifiers. In Advances in David J Hand and William E Henley. Statistical clas- Knowledge-Based and Intelligent Information and En- sification methods in consumer credit scoring: a re- gineering Systems, 2012. URL https://hal. view. Journal of the Royal Statistical Society: Series archives-ouvertes.fr/hal-00746198/. A (Statistics in Society), 160(3):523–541, 1997. Brazil. Lei n 8666, de 1993, 1993. IBGE. Indicadores do Instituto Brasileiro de Geografia Brazil. Lei n 10520, de 2002, 2002. Estatstica, Contas Nacionais Trimestrais, 2016. Brazil. Lei n 10683, de 2003, 2003. Nathalie Japkowicz et al. Learning from imbalanced data sets: a comparison of various strategies. In AAAI Brazil. Lei Complementar n 123, de 2006, 2006. workshop on learning from imbalanced data sets, vol- S. Ceccon, D.F. Garway-Heath, D.P. Crabb, and ume 68, pages 10–15. Menlo Park, CA, 2000. A. Tucker. Exploring early glaucoma and the visual Yi Jiang and Li Hua Wu. Credit scoring model based field test: Classification and clustering using bayesian on simple naive bayesian classifier and a rough set. networks. IEEE Journal of Biomedical and Health In- In 2009 International Conference on Computational formatics, 18(3):1008–1014, May 2014. ISSN 2168- Intelligence and Software Engineering, 2009. 2194. doi: 10.1109/JBHI.2013.2289367. Stefan Lessmann, Bart Baesens, Hsin-Vonn Seow, and Jie Cheng and Russell Greiner. Comparing bayesian Lyn C. Thomas. Benchmarking state-of-the-art clas- network classifiers. In Proceedings of the Fif- sification algorithms for credit scoring: An up- teenth Conference on Uncertainty in Artificial In- date of research. European Journal of Opera- telligence, UAI’99, page 101108, San Francisco, tional Research, 247(1):124–136, November 2015. CA, USA, 1999. Morgan Kaufmann Publishers Inc. ISSN 03772217. doi: 10.1016/j.ejor.2015.05. ISBN 1-55860-614-9. URL http://dl.acm. 030. URL http://linkinghub.elsevier. org/citation.cfm?id=2073796.2073808. com/retrieve/pii/S0377221715004208. Maria Sylvia Zanella Di Pietro. Direito administrativo, Mehran Sahami, Susan Dumais, David Heckerman, and volume 22. Atlas São Paulo, 1999. Eric Horvitz. A bayesian approach to filtering junk Jerome Friedman, Trevor Hastie, and Robert Tibshi- e-mail. In Learning for Text Categorization: Papers rani. The elements of statistical learning, vol- from the 1998 workshop, volume 62, page 98105, ume 1. Springer series in statistics Springer, Berlin, 1998. 2001. URL http://statweb.stanford.edu/ Leonardo Jorge Sales. Risk prevention brazilian gov- ˜tibs/book/preface.ps. ernment contracts using credit scoring. In Interdisci- Nir Friedman and Moises Goldszmidt. Building clas- plinary Insights on Fraud, chapter 11, pages 264–286. sifiers using bayesian networks. In Proceedings of Cambridge Scholars Publishing, 2014. the national conference on artificial intelligence, page Marco Scutari. Learning bayesian networks with the 12771284, 1996. bnlearn r package. arXiv preprint arXiv:0908.3817, Nir Friedman, Dan Geiger, and Moises Goldszmidt. 2009. Bayesian network classifiers. Machine Learning, 29 Wei Shi, Yao Wu Pei, Liang Sun, Jian Guo Wang, and (2-3):131–163, November 1997. ISSN 0885-6125, Shao Qing Ren. The defect identification of LED 1573-0565. doi: 10.1023/A:1007465528199. URL chips based on bayesian classifier. Applied Mechanics BMAW 2016 - Page 12 of 59 and Materials, 333-335:1564–1568, July 2013. ISSN 1662-7482. doi: 10.4028/www.scientific.net/AMM. 333-335.1564. URL http://www.scientific. net/AMM.333-335.1564. David J. Spiegelhalter, A. Philip Dawid, Steffen L. Lau- ritzen, and Robert G. Cowell. Bayesian Analysis in Expert Systems. Statistical Science, 8(3):219–247, 1993. URL http://www.jstor.org/stable/ 2245959. Ye Ye, Fuchiang (Rich) Tsui, Michael Wagner, Jeremy U. Espino, and Qi Li. Influenza de- tection from emergency department reports us- ing natural language processing and bayesian network classifiers. Journal of the American Medical Informatics Association, pages amiajnl– 2013–001934, January 2014. ISSN , 1527- 974X. doi: 10.1136/amiajnl-2013-001934. URL http://jamia.bmj.com/content/early/ 2014/01/09/amiajnl-2013-001934. S Zonneveldt, K Korb, and A Nicholson. Bayesian network classifiers for the german credit data. Tech- nical report, Technical report, 2010/1, Bayesian Intelligence. http://www. Bayesian-intelligence. com/publications. php, 2010. BMAW 2016 - Page 13 of 59