=Paper= {{Paper |id=Vol-1663/bmaw2016_paper_3 |storemode=property |title=Bayesian Networks on Income Tax Audit Selection - A Case Study of Brazilian Tax Administration |pdfUrl=https://ceur-ws.org/Vol-1663/bmaw2016_paper_3.pdf |volume=Vol-1663 |authors=Leon Silva,Henrique Rigitano,Rommel Novaes Carvalho,João Carlos Felix Souza |dblpUrl=https://dblp.org/rec/conf/uai/SilvaRCS16 }} ==Bayesian Networks on Income Tax Audit Selection - A Case Study of Brazilian Tax Administration== https://ceur-ws.org/Vol-1663/bmaw2016_paper_3.pdf
Bayesian Networks on Income Tax Audit Selection - A Case Study of Brazilian Tax Administration


Leon Sólon da Silva*
Secretariat of Federal Revenue of Brazil
Universidade de Brasília
leon.silva@rfb.gov.br

Henrique de C. Rigitano†
Secretariat of Federal Revenue of Brazil
henrique.rigitano@rfb.gov.br

Rommel N. Carvalho
Brazil's Office of the Comptroller General‡
Universidade de Brasília§
rommel.carvalho@cgu.gov.br

João Carlos F. Souza¶
Universidade de Brasília
jocafs@unb.br


Abstract

Tax administrations in most countries have more corporate and personal information than any other government office. Because of the large number of tax returns received every year, data mining techniques can be applied to many different problems. In the present work we present a trial by the Brazilian Tax Administration of using Bayesian networks to predict taxpayer behavior based on historical analysis of income tax compliance. More specifically, we tried to improve a previous risk-based audit selection that flags a large number of taxpayers as high risk; in its current form, it identifies many more cases than the tax auditors can handle. Our first results are promising, considerably improving tax audit performance.

1 INTRODUCTION

Tax administrations have more information on people and companies than any other government office. Tax returns, bank transactions, and invoices arrive as hundreds of millions of records every year. The Secretariat of Federal Revenue of Brazil (RFB) is both the Brazilian Tax Administration and Brazilian Customs. This combination is a major leverage and also a challenge.

Basically, there are two types of taxes: sales taxes and income taxes. Sales taxes include value-added taxes and are based on the value of the product being sold. Income tax is based on how much a person or a company earns. In most countries, sales tax revenues are considerably larger than income tax revenues (OECD, 2013). In Brazil, corporate and personal income taxes account for about 50% of the country's revenue (RFB, 2016). Although corporate tax has a much greater impact on the final numbers, personal income tax audits affect a considerably large share of Brazilian citizens. There are 27 million individual taxpayers in Brazil, about 13% of the population (RFB, 2016).

In order to facilitate and prioritize audits on personal income tax, RFB created the concept of a "fiscal lattice". One can understand the fiscal lattice as a first audit selection based on a historical risk analysis of taxpayers' tax compliance. This lattice is a complex process in which many tax auditors specialized in personal income tax fraud create risk-based rules for audit selection. The main difference between a regular audit and a fiscal lattice audit is that the latter has a much simpler analysis process for determining whether or not to punish a taxpayer.

Since the number of taxpayers has increased and the ratio of tax auditors to citizens has been decreasing (RFB, 2016), the number of income taxpayers caught in the fiscal lattice has increased as well. From 2010 to 2014, the number of taxpayers selected for this kind of audit increased substantially (RFB, 2016). This changing scenario is pushing the tax administration to the limit of its tax auditors' capacity for analysis. RFB, Brazil's major tax office, has about 10,000 tax auditors and a huge backlog of fiscal lattice audits to analyze.

Data mining techniques can help to better select taxpayers for audit, and the present work offers one solution to improve the selection of this kind of audit. In Section 2.1 we discuss how Bayesian networks can be used as a classification algorithm in order to create predictive models.

The document is organized as follows: Section 2 describes some background information about Bayesian networks; Section 3 details the solution to the tax audit selection problem, from its methodology to our first results; Section 4 presents the conclusion and future work.

* Anexo Ministério da Defesa, 5º andar, Brasília, DF, Brazil
† Av. Rogerio Weber, 1752 - Centro, Porto Velho, RO, Brazil
‡ SAS, Quadra 01, Bloco A, Edificio Darcy Ribeiro, Brasília, DF, Brazil
§ Campus Darcy Ribeiro, Brasília, DF, Brazil
¶ Campus Darcy Ribeiro, Brasília, DF, Brazil

                                                        BMAW 2016 - Page 14 of 59

2     BACKGROUND

In this section we introduce some tax administration concepts, formulate the problem addressed by the present work, and discuss Bayesian networks for prediction.

2.1 BAYESIAN NETWORKS FOR PREDICTIVE MODELS

As stated in (Korb and Nicholson, 2010), Bayesian networks (BNs) are graphical models for reasoning under uncertainty, where the nodes represent variables (discrete or continuous) and arcs represent direct connections between them. These direct connections are often causal connections. In addition, BNs model the quantitative strength of the connections between variables, allowing probabilistic beliefs about them to be updated automatically as new information becomes available.

Bayesian networks are useful for learning from data and discovering causal relationships between variables, and they can be used as a classification algorithm. They have been used for prediction in many different problems, from genetics (Jansen et al., 2003) and prognostics of breast cancer (Gevaert et al., 2006) to the identification of split purchases (Carvalho et al., 2014). In the present work, we use Bayesian networks to predict whether a taxpayer is compliant or non-compliant with tax obligations. In more detail, our approach improves tax audit selection by using Bayesian networks to build predictive models. In Section 3 we present the details of the solution to our problem, as well as the first results. The next subsections describe two different types of Bayesian networks, Naïve Bayes and Tree-Augmented Naïve Bayes.

2.1.1 Naïve Bayes

Naïve Bayes is the simplest version of a Bayesian network. It makes a strong independence assumption: the class node is the only parent of every explanatory variable (node), and the explanatory variables are considered independent of each other given the class. Despite its simplicity, it has many applications with good results and fast run time, as stated in (Zhang, 2004).

Figure 1: Example of Naïve Bayes Network (Zhang, 2004)

2.1.2 Tree-Augmented Naïve Bayes

Tree-Augmented Naïve Bayes (TAN), as explained in (Zheng and Webb, 2011), relaxes the assumption of complete independence of the explanatory variables by enforcing a tree structure. In this case, each explanatory variable depends only on the class and at most one other explanatory variable. This relaxation allows the representation of more complex models, leading to possible performance improvements, as shown in (Carvalho et al., 2014).

Figure 2: Example of Tree-Augmented Naïve Bayes Network (Jiang et al., 2009)

2.2 RELATED WORK

As stated in (Silva et al., 2015), many tax administrations have been using data mining techniques to create predictive models for tax compliance risk. Despite the topic being of great interest, tax administrations have many concerns about publishing internal projects. Since taxpayer information is classified and must be protected by tax officers, many of them do not share the details of their tax compliance risk projects.

Intergovernmental organizations are a source of such information, case studies, methodologies, and best practices. For tax administrations and customs, the World Customs Organization (WCO) and the Organisation for Economic Co-operation and Development (OECD) are important sources. In a recent survey covering many countries, the OECD presented a comparative chart that shows the use of data mining to detect tax fraud (OECD, 2013).

Tax administrations' internal publications also present many studies that can be applied by other countries, and many of these administrations have developed methodologies based on statistical analysis and data mining to create tax compliance risk systems. Most countries use data mining to classify taxpayers according to their risk of non-compliance.

Some studies, however, reveal different data analysis approaches being used in tax administration. According to (Castellón González and Velásquez, 2013), the US Internal Revenue Service (IRS) uses data mining for different purposes, among which are tax compliance risk based taxpayer classification, tax fraud detection, tax refund fraud, criminal activities, and money laundering (Watkins et al., 2003).

Another related reference is Jani Martikainen's master's thesis (Martikainen et al., 2012). He presents results of studies conducted by the Australian Taxation Office (ATO) concerning the use of models to detect high-risk tax refund claims. Also according to the author, the ATO avoided paying refunds of about US$ 665,000,000.00 between 2010 and 2011 based on data mining tools. The ATO uses refund models based on social network discovery algorithms that detect connections between individuals, companies, partnerships, or tax returns. The models are updated and refined to enhance detection and increase the recognition of new fraud (Martikainen et al., 2012).

More closely related to the present work, (Gupta and Nagadevara, 2007) describes in detail different approaches to using data mining techniques to improve tax audit selection. The main difference is that in (Gupta and Nagadevara, 2007) the main taxes are value-added taxes, in contrast with income taxes, the object of the present research. Also, in (Kirkos et al., 2007) data mining is used to detect fraud in financial statements, an approach that can easily be adapted to tax returns and tax evasion/fraud.

3 SOLUTION AND FIRST RESULTS

In this section we describe the methodology used in the present work and detail each step of the data analysis, from information and data gathering to the construction of predictive models for improving tax audit selection.

3.1 METHODOLOGY

The methodology of the present work follows the well-known Cross Industry Standard Process for Data Mining (CRISP-DM). CRISP-DM is a technology-independent methodology and reference model for implementing data mining processes in any business. It describes each phase that every data mining project should go through. Each phase is equally relevant for the success of the data analysis process and should not be underestimated. The process has six phases, and it is possible to perform the same step more than once. The phases of CRISP-DM are (Wirth and Hipp, 2000):

Figure 3: CRISP-DM Reference Model (Wirth and Hipp, 2000)

Business Understanding
Every data analysis process is designed to answer business questions in order to achieve business goals. In the business understanding phase of CRISP-DM these questions are asked and possible solutions are proposed. Possible quantitative and qualitative improvements to the business process are also detailed, in order to justify the use of data mining techniques to solve business problems.



According to (Chapman et al., 2000), this initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition and a preliminary project plan designed to achieve the objectives.

Data Understanding
Once the business questions are clear, it is time to understand the information required to make the changes needed in the business process and achieve the goals identified in the previous phase. In data understanding, all sources of information needed to perform the analysis are determined. The first insights and main patterns are also identified in the first contact with the data available from the possible sources. Each business question needs to be mapped to every data source (systems, databases, webpages, etc.) in order to address every goal and identify possible gaps and missing information.

In (Wirth and Hipp, 2000) it is stated that there is a close link between business understanding and data understanding. The formulation of the data mining problem and the project plan require at least some understanding of the available data.

Data Preparation
The data preparation phase covers all activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection, data cleaning, construction of new attributes, and transformation of data for modeling tools.

Modeling
In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques require specific data formats. There is a close link between data preparation and modeling. Often, one realizes data problems while modeling or gets ideas for constructing new data.

Evaluation
At this stage in the project you have built one or more models that appear to have high quality from a data analysis perspective. Before proceeding to final deployment of the model, it is important to evaluate the model more thoroughly and review the steps executed to construct it, to be certain it properly achieves the business objectives. A key objective is to determine whether there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.

Deployment
Creation of the model is generally not the end of the project. Usually, the knowledge gained will need to be organized and presented in a way that the customer can use it. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. In many cases it will be the user, not the data analyst, who will carry out the deployment steps. In any case, it is important to understand up front what actions will need to be carried out in order to actually make use of the created models.

3.2 BUSINESS UNDERSTANDING

Our main goal is to improve the selection of individual taxpayers for tax audit. We try to achieve better audit process performance by making better use of the tax auditors' knowledge and of the time available to perform these audits. As in any tax administration, there are far more taxpayer returns and information to analyze than tax officers, and to achieve the revenue goals and tax fairness it is essential that audit selection be as risk based as possible.

In Brazil, individual taxpayers pay their income taxes every month. Since the tax is calculated on a yearly basis, by April of the following year taxpayers are obliged to file their income tax return in order to adjust their debt (or credit). Every year, tens of millions of returns are sent to RFB, many more than it could handle if there were no risk-based selection.

RFB created the concept of the "fiscal lattice" to select personal income tax returns based on tax compliance risk. In this technique, personal income tax fraud experts analyze the history of all taxpayers, together with their own previous knowledge, in order to come up with parameters to select tax returns for audit. Once a tax return is caught in the "fiscal lattice", only a tax officer can release it, which prevents fraudsters from receiving a possible credit. There are three main purposes in using this technique:

  • to better select taxpayers based on tax compliance risk;
  • to facilitate verification by tax auditors, since each parameter has well-defined analysis and treatment activities;
  • to ease the self-correction of tax returns by taxpayers, since many of them were caught due to errors in filling in their returns.

Despite all of the Brazilian tax administration's efforts in selecting individual tax audits, the number of audits selected by the fiscal lattice increased from 569,000 in 2011 to 937,000 in 2014¹, in contrast with the number of tax officers, which decreased from 12,273 in 2010 to 10,419 in 2015 (RFB, 2016).

More specifically, we intend to use data mining techniques to discharge as many taxpayers as possible from the fiscal lattice, with minimum compliance risk to the tax administration. With thousands of audits already finalized by experienced tax auditors, it is possible to address this problem with machine learning tools and achieve the best results in releasing those taxpayers who pose less tax compliance risk.

In our first approach to applying data mining techniques to the problem, we selected a specific RFB unit that has been suffering from the large number of fiscal lattice audits. The "Delegacia Especial de Pessoa Física" (DERPF), or "Individual Taxpayers Special Office", is a unit specialized in individual taxpayers located in Sao Paulo City, the largest Brazilian city, in the most economically active federation unit (State of Sao Paulo). This unit has operated at its limit of fiscal audits since its creation in 2014 and has the largest number of this kind of audit in the whole country. It was a natural choice for our first experiments.

3.3 DATA UNDERSTANDING AND PREPARATION

To answer the business question of how to improve the selection of individual taxpayers caught in the fiscal lattice, we evaluated the sources of the information needed to perform the data mining analysis. Our sample was taken from audits performed by DERPF from 2014 to 2016.

Basically, all individual taxpayer information was taken from internal systems, ranging from online systems to data marts and data warehouses. Most of the information on taxpayers caught in the fiscal lattice is available from tax returns, but some information is taken from invoices and financial operations. The exact properties retrieved by the data extraction, as well as the fraud/non-compliance rate, are classified information.

The final taxpayer table has 25,322 taxpayers' returns analyzed by tax auditors and classified as compliant or non-compliant. Besides the dependent variable (compliant), each line has 20 other characteristics of taxpayers and information retrieved from returns and other systems. Of these, 13,547 are women and 10,730 are men. The other explanatory variables are tax return information and unfortunately cannot be specified, because they could reveal classified information: the result of the analysis could lead taxpayers to learn fraud patterns and use that information to avoid being caught.

For preparation, all independent variables were analyzed in order to remove incomplete rows and to discretize continuous ones, to comply with the constraints of the Bayesian network algorithms. The numeric variables were classified into bands in terms of average multipliers (one average, half average, three times average, etc.). After data preparation, the final number of individual taxpayer returns was 24,277.

All data preparation took place using the R language² and its packages.

3.4 MODELING AND EVALUATION

We used the bnlearn R package³ to run the Bayesian network algorithms. Specifically, the functions naive.bayes and tree.bayes were chosen to create the predictive models. The first is the well-known Naïve Bayes algorithm, which does not take parameters for customizing the models, and the latter is an implementation of the Tree-Augmented Naïve Bayes (TAN) algorithm. The TAN algorithm takes whitelist (force the inclusion of arcs in the Bayesian network), blacklist (force the exclusion of arcs in the Bayesian network), and mi⁴ parameters.

To create the predictive models, we took the compliant variable as the dependent variable and the other 35 (thirty-five) variables as independent variables. The sample of 24,277 returns was divided into training (80%) and test (20%) sets. No separate validation sample was needed, since we used the 10-fold cross-validation technique with bnlearn's function bn.cv().

As stated in the bn.cv() documentation (CRAN, 2016), k-fold cross-validation is a technique in which the data is split into k subsets of equal size. For each subset in turn, bn is fitted (and possibly learned as well) on the other k - 1 subsets, and the loss function is then computed using that subset. Loss estimates for each of the k subsets are then combined to give an overall loss for the data.

¹ In 2015 this number decreased to 670,000 due to efforts in better selecting individual tax returns for audit.
² https://www.r-project.org/.
³ http://www.bnlearn.com/.
⁴ The estimator used for the mutual information coefficients for the Chow-Liu algorithm in TAN. Possible values are mi (discrete mutual information) and mi-g (Gaussian mutual information). We use discrete mutual information since all explanatory variables have been discretized.
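The preparation and modeling steps above were carried out in R with bnlearn's naive.bayes and bn.cv(); the sketch below is not that code. It is a minimal, standard-library Python illustration, under stated assumptions, of the same three ideas: banding numeric variables by multiples of their mean (the edges 0.5, 1 and 3 times the mean are a hypothetical choice), a discrete Naïve Bayes classifier (the add-one smoothing is our assumption), and k-fold cross-validation using 0-1 classification error as the loss (also an assumption, since the loss used in the study is not stated). The names income and deduction and all data in the usage part are made up.

```python
import math
import random
from collections import Counter


def discretize_by_mean(values, multipliers=(0.5, 1.0, 3.0)):
    """Map numeric values to bands bounded by multiples of their mean.

    Assumes non-negative values with a positive mean, so the edges are ascending.
    With the default (hypothetical) multipliers the bands are:
    band0 <= 0.5*mean < band1 <= mean < band2 <= 3*mean < band3.
    """
    mean = sum(values) / len(values)
    edges = [m * mean for m in multipliers]
    # The band index is the number of edges the value strictly exceeds.
    return [f"band{sum(v > e for e in edges)}" for v in values]


class NaiveBayes:
    """Discrete Naive Bayes: P(class) * prod_j P(feature_j | class), add-one smoothed."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[c][j][v]: number of training rows of class c whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def log_posterior(c):
            # Sum log-probabilities to avoid underflow with many explanatory variables.
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[c][j][v] + 1
                den = self.class_counts[c] + len(self.values[j])
                score += math.log(num / den)
            return score

        # Iterating over sorted classes makes tie-breaking deterministic.
        return max(sorted(self.class_counts), key=log_posterior)


def k_fold_error(rows, labels, k=10, seed=0):
    """Mean 0-1 classification error over k folds of (nearly) equal size."""
    if len(rows) < k:
        raise ValueError("need at least k rows")
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the other k - 1 folds, then compute the loss on the held-out fold.
        model = NaiveBayes().fit([rows[i] for i in train], [labels[i] for i in train])
        wrong = sum(model.predict(rows[i]) != labels[i] for i in fold)
        fold_errors.append(wrong / len(fold))
    return sum(fold_errors) / k


if __name__ == "__main__":
    # Synthetic data: the hypothetical "deduction" drives the label, "income" is noise.
    rng = random.Random(1)
    income = [rng.uniform(0, 100) for _ in range(200)]
    deduction = [rng.uniform(0, 100) for _ in range(200)]
    labels = ["non-compliant" if d > 60 else "compliant" for d in deduction]
    rows = list(zip(discretize_by_mean(income), discretize_by_mean(deduction)))
    print(f"10-fold error: {k_fold_error(rows, labels, k=10):.3f}")
```

Summing log-probabilities instead of multiplying raw probabilities keeps the score numerically stable as the number of explanatory variables grows. A TAN variant would additionally give each explanatory variable at most one other variable as parent, chosen by a Chow-Liu maximum-weight spanning tree over mutual information conditioned on the class; this sketch does not attempt that.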




Since the proportion of compliant/non-compliant taxpayers is classified information, we present the results of the predictive models in terms of improvements over the current process of discharging taxpayers from the fiscal lattice. Since our dependent variable is compliant/non-compliant, we are more interested in evaluating the models by specificity than by sensitivity, because it is more dangerous to let a non-compliant taxpayer go without being audited than to select a compliant one for audit.

Each Brazilian tax administration local unit is autonomous and may choose whatever criteria it finds best to dismiss taxpayers from the fiscal lattice. So, for the purpose of comparison with our proposal, we consider a linear cut (random selection) of taxpayers until the unit's capacity is reached. If, for example, an office has the capacity to audit 2,000 taxpayers per month and there are 3,000, we consider that the current process randomly chooses the 1,000 to be dismissed. The overall proportion of wrongly dismissed taxpayers is then the same as the proportion of non-compliant taxpayers among all those caught in the fiscal lattice.

Our goal is to better predict whether a taxpayer caught in the fiscal lattice is compliant or not. If we achieve a specificity considerably better than that of random selection, we achieve our goal of letting go as few non-compliant taxpayers as possible.

As we learn from Table 1, Naïve Bayes is already a good tool for selecting those taxpayers who can and cannot be dismissed from being audited. Tree-Augmented Naïve Bayes had no major advantages, despite the customization of parameters (root chosen automatically or user defined).

Table 1: Predictive Models by Algorithm/Parameters

  Algorithm              Performance Rate
  Naive Bayes            41%
  TAN (auto root)        34%
  TAN (selected root)    35%

Therefore, the predictive models in these first results are promising, yielding an increase of more than 30% in tax audit selection in comparison with randomly discharging taxpayers. It is important to recall that the taxpayers caught in the fiscal lattice have already been through a risk-based selection process, so any improvement over this criterion shows the value of using Bayesian networks to build models of tax compliance.

4 CONCLUSION AND FUTURE WORK

Brazil has been through a major crisis, and the responsibility of RFB as a tax administration has also increased in order to guarantee revenue for public policies. A better selection of tax audits saves resources and increases the performance of the tax collection process. Our approach of creating predictive models to improve the risk-based selection of the so-called "fiscal lattice" proved to be a promising one based on the first results.

We intend to use different approaches and Bayesian network algorithms in order to create compliance risk scores, leaving the decision of whether taxpayers are compliant or not to the tax officers and possibly increasing the specificity. The approach in the present work delegates this decision to the prediction algorithm.

Furthermore, we will try to build Bayesian networks with larger samples and more tax units, and to include more information about the taxpayer, since in this work we basically used income tax returns and registry information. Financial transactions and invoice data could be interesting explanatory variables and will be used in future applications.

Acknowledgements

The authors would like to thank RFB, especially DERPF, for providing the resources necessary to carry out this research, as well as for allowing its publication.

References

Rommel N. Carvalho, Leonardo Sales, Henrique A. Da Rocha, and Gilson Libório Mendes. Using Bayesian networks to identify and prevent split purchases in Brazil. In BMA@UAI, pages 70–78, 2014.

Pamela Castellón González and Juan D. Velásquez. Characterization and detection of taxpayers with false invoices using data mining techniques. Expert Systems with Applications, 40(5):1427–1436, 2013.

Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rüdiger Wirth. CRISP-DM 1.0: Step-by-step data mining guide. 2000.

CRAN. CRAN project: package bnlearn. https://cran.r-project.org/web/packages/bnlearn/index.html, 2016. Accessed: 2016-05-08.

Olivier Gevaert, Frank De Smet, Dirk Timmerman, Yves Moreau, and Bart De Moor. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics, 22(14):e184–e190, 2006.

Manish Gupta and Vishnuprasad Nagadevara. Audit selection strategy for improving tax compliance: Application of data mining techniques. In Foundations of Risk-Based Audits. Proceedings of the Eleventh International Conference on e-Governance, Hyderabad, India, December, pages 28–30, 2007.

Ronald Jansen, Haiyuan Yu, Dov Greenbaum, Yuval Kluger, Nevan J. Krogan, Sambath Chung, Andrew Emili, Michael Snyder, Jack F. Greenblatt, and Mark Gerstein. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science, 302(5644):449–453, 2003.

Liangxiao Jiang, Harry Zhang, and Zhihua Cai. A novel Bayes model: Hidden naive Bayes. IEEE Transactions on Knowledge and Data Engineering, 21(10):1361–1371, 2009.

Efstathios Kirkos, Charalambos Spathis, and Yannis Manolopoulos. Data mining techniques for the detection of fraudulent financial statements. Expert Systems with Applications, 32(4):995–1003, 2007.

Kevin B. Korb and Ann E. Nicholson. Bayesian Artificial Intelligence. CRC Press, 2010.

Jani Martikainen et al. Data mining in tax administration - using analytics to enhance tax compliance. Department of Information and Service Economy, Aalto University, 2012.

OECD. Tax administration 2013 - comparative information on OECD and other advanced and emerging economies. Technical Report 2308-7331, Organisation for Economic Co-operation and Development, Paris, 2013. URL http://www.oecd-ilibrary.org/content/serial/23077727.

RFB. Secretariat of Federal Revenue of Brazil (RFB) website. http://www.receita.fazenda.gov.br, 2016. Accessed: 2016-05-08.

Leon Sólon da Silva, Rommel Novaes Carvalho, and João Carlos Felix Souza. Predictive models on tax refund claims - essays of data mining in Brazilian tax administration. In Electronic Government and the Information Systems Perspective, pages 220–228. Springer, 2015.

R. Cory Watkins, K. Michael Reynolds, Ron Demara, Michael Georgiopoulos, Avelino Gonzalez, and Ron Eaglin. Tracking dirty proceeds: exploring data mining technologies as tools to investigate money laundering. Police Practice and Research, 4(2):163–178, 2003.

Rüdiger Wirth and Jochen Hipp. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, pages 29–39. Citeseer, 2000.

Harry Zhang. The optimality of naive Bayes. AA, 1(2):3, 2004.

Fei Zheng and Geoffrey I. Webb. Tree augmented naive Bayes. In Encyclopedia of Machine Learning, pages 990–991. Springer, 2011.