Data Science Methods and Techniques for Goods and Services Trading Taxation: a Systematic Mapping Study

Douglas Silva*, Sergio Carvalho**

*Federal University of Goias, Goiania-GO, Brazil, douglas.bernardes@inf.ufg.br
**Federal University of Goias, Goiania-GO, Brazil, sergiocarvalho@ufg.br

Abstract: Taxation on goods and services trading operations is the main revenue source for states and provinces around the world. Collecting such taxes, however, faces constant challenges, ranging from incorrectly filled tax documents (which lead to an incorrect calculation of the tax due) to attempted tax fraud. Because this context involves analyzing a very large amount of data, data science techniques are a promising way to provide effective solutions to the problems that arise. This article describes a systematic mapping of the literature that aims to identify how data science methods and techniques have been applied in this context and how the problems inherent in this domain are being handled. Results show that there are well-defined categories of problems being researched in this area, and that data science can be used efficiently to improve the collection of these types of taxes.

Keywords: value-added tax, goods and services tax, sales tax, data science, systematic mapping study.

1. Introduction

Taxes are compulsory financial charges imposed by a government on an individual or corporate taxpayer in order to fund public expenditures (Mathews et al. 2018, Rad and Shahbahrami 2015). They are regulated by specific laws that describe their composition, their collection and compliance processes and, where applicable, how the resulting revenue is to be applied. Taxation on the sale of goods and provision of services is the main source of revenue for most states and provinces around the world and is therefore a relevant kind of indirect tax.
It is applied to each link in the consumption chain and generates, at each transaction, one or more tax documents with a complete record of the items and parties involved (Yu et al. 2019), including their tax classification and the applicable tax rate. This information is usually registered in an electronic invoice.

Trading operations taxation has been implemented in different ways, but usually as a non-cumulative tax owed proportionally by each taxpayer in the consumption chain. Some taxes of this nature are Sales Tax in the United States (Buxton et al. 2019), ICMS (Tax on Circulation of Goods and Services) in Brazil, GST (Goods and Services Tax) in countries such as Australia, Canada, Singapore and, recently, India (Mathews et al. 2018, Mehta et al. 2018), and variations of VAT (Value-Added Tax), which is used in most countries, including China (Yu et al. 2019) and those of the European Union.

Ongoing Research. Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Tax law applied to goods and services trading, however, is complex and changes constantly, and taxpayers are not always up to date on the tax rules applicable to each product they sell or each service they provide (Lahann et al. 2019). Tax benefits and exemptions are also granted seasonally, for a specific period of time, to specific segments of taxpayers, and all of these possibilities directly affect the tax bookkeeping that taxpayers declare. These situations allow taxpayers, intentionally or unintentionally, to cause losses to the public treasury and consequently undermine the provision of public services to citizens. It therefore becomes necessary not only to collect the taxes, but also to verify whether taxpayers have paid them properly and to proceed with debt collection when necessary (Abe et al. 2010).
The analysis of tax compliance information is currently the responsibility of tax auditors. The limited number of auditors, combined with the volume of information generated, however, makes conventional procedures ineffective and inefficient (Wang 2012). It is necessary to direct the auditor's attention so that less effort goes to formalities and more to signs of anomalies or fraud (Basta et al. 2009). Although technological development has enabled the automation of operational processes, analyzing massive amounts of data to identify anomalies, inconsistencies and behavior patterns that indicate fraud and tax non-compliance is still a challenge. The methods traditionally used to solve these problems are time-consuming, costly and imprecise, and in a big data scenario they are impractical.

Although governments have always analyzed tax data, and analytics, AI and modern technology help them do it better, big data in this domain is recent. GST itself, for example, was implemented in India only in 2017 (Das and Kolya 2017), and electronic invoicing became mandatory for Italian companies only in January 2019 (Bardelli et al. 2020). Problems related to data characteristics, such as volume, inconsistency and incompleteness, are hence also recent, and it becomes necessary to map how computing research deals with this domain.

Data science models and strategies are, in general, useful in contexts like this one. However, their applicability varies according to the characteristics of the available data. It is therefore necessary to identify which techniques can be used and for what reasons, and also to identify aspects of these models and strategies that have not yet been addressed.
This article presents a systematic mapping of the literature that seeks to understand the domain of tax collection in goods and services trading operations and how data science has been used to solve the problems that emerge in this context, identifying a possible consensus or good practices in handling these situations. It is important to note, however, that this is a systematic mapping and not a systematic review. Its main objective is to map the domain area, the characteristics of its datasets and how they influence researchers' choice of techniques, so that it can help guide future researchers. The techniques themselves, and how they handle the problems found in this domain, would be better explored in a systematic review with research questions aimed at that end.

The remainder of the paper is organized as follows. Section 2 describes the review protocol used for the systematic mapping. Section 3 presents the results of the search and selection process, and Section 4 analyzes them. Section 5 discusses the results and our perceptions of them. Section 6 addresses threats to validity, and Section 7 presents the concluding remarks.

2. Review Protocol

This systematic mapping of the literature followed the procedure described by Petersen et al. (2008). As part of the process, a research protocol was defined, which is detailed in the following items.

2.1 Research Questions and Search Strategy

This mapping sought to establish the state of the art in scientific research applying data science to tax data from goods and services trading operations. The specific review questions addressed were:

1) What problems in the domain of goods and services trading taxation have been studied in the area of data science?
2) What types of techniques and learning strategies have been applied?
3) Which data sources are used in the analysis?
4) Do the selected attributes vary according to the region / location (where the problem occurs)?
5) Which datasets are used?
6) How big are these datasets?
7) Has the volume of data been a complicating factor for the analysis?
8) How has the problem of data volume been dealt with in this context?

From the main keywords identified in these research questions, an initial string was defined and calibrated through a pilot search in the IEEE Xplore, ACM and Scopus digital libraries, in order to reduce the likelihood of bias. The assessment also took into account that taxes with these characteristics are called Sales Tax in the United States, GST in India (among other countries) and VAT in the European Union and China. By including these three variations, we appear to have reached all (or most) of the targeted publications. The evaluation of the pilot search results led to the following search string:

("value-added tax" OR "goods and services tax" OR "sales tax") AND ("data science" OR "artificial intelligence" OR "data mining" OR "machine learning" OR "neural network")

After defining the string, we selected the publication databases most commonly used for systematic mappings and reviews of the literature in the area of Software Engineering (Dyba et al. 2005), namely: ACM Digital Library, IEEE Xplore, ScienceDirect, SpringerLink and Scopus.

2.2 Criteria for Study Selection

Once primary studies were obtained from these sources, inclusion and exclusion criteria were applied to select those relevant to the objective of the systematic mapping. Studies were considered eligible if they had tax collection in the trade of goods and services as their motivation and as the scenario for implementing or validating the proposed method, or if their method was applicable to a similar context.
Selected studies were also evaluated for their relevance (they should involve data science techniques) and formality. Publications that did not meet the eligibility criteria above were excluded from the review, as were:

• Papers that do not propose the use of data science methods or techniques to solve a problem found in the mentioned domain;
• Papers that do not present the method proposed to solve the problem;
• Publications that have not been subjected to peer review;
• Publications that are not in English or Portuguese;
• Publications without the full text or unavailable;
• Repeated publications.

The number of excluded papers, as well as the reason for their exclusion, was recorded as the articles were evaluated. The process for selecting studies followed the one proposed by Meline (2006):

• Step 1 (screening): eligibility criteria were applied to the search results through a preliminary evaluation of their title, abstract and keywords;
• Step 2: studies were then discarded if they met one or more exclusion criteria, based on the same elements as in step 1;
• Step 3 (full text review): eligibility and exclusion criteria were then applied to the remaining (accepted) studies, now evaluating their full text.

2.3 Methods for Data Extraction and Study Synthesis

After evaluating the full text of the accepted articles, we recorded them using a data extraction form, which standardized the results found in each study and allowed them to be analyzed and summarized. The following information was extracted from the selected articles: (i) title, authors and year of publication; (ii) research problem; (iii) proposed data analysis technique; (iv) learning paradigm and strategy, if applicable; (v) data sources used; (vi) datasets used; (vii) dataset volume (and inherent problems); and (viii) gaps observed by the researchers.

3. Results

Following the process described in Section 2.2, we carried out a literature search on December 29, 2020, which initially returned 867 papers. Of these, 24 papers came from the IEEE Xplore digital library, 66 from the ACM database, 258 from Scopus, 218 from ScienceDirect and 301 from SpringerLink. After an initial reading of titles, abstracts and keywords, 71 duplicate articles were found and discarded, and 747 articles were rejected because they did not meet the eligibility criteria. Of these 747, 742 were excluded because neither their motivation nor their validation scenario involved problems related to tax collection in the trade of goods and services, not even by similarity, and another 5 were rejected for proposing computational techniques not related to data science (such as blockchain or ontologies) aimed at some other aspect of the tax domain.

It is important to highlight that, in our view, the string contains only the terms necessary to direct the results: three variations of the name this consumption tax is given around the world, and the names of techniques or areas that could indicate the application of data science to this domain. To avoid false positives, even the known abbreviations of these taxes were left out of the search string. Even so, several articles mention the tax itself, often only once, or how useful data science would be in dealing with it, without making that application or the domain itself their focus; whenever that happened, they were discarded. Thus, of the 867 articles initially returned, 49 remained for full text evaluation.

These 49 articles were obtained and evaluated in full text. Four of them were rejected because their main motivation was not the improvement of tax collection in goods and services trading operations (Hoglund 2017, Kong et al.
2014, Krzikallová 2020, Meservy 1992), and three were discarded under the exclusion criterion concerning the non-use of data science methods or techniques to solve a problem in the domain (Akinboade et al. 2009, Bogdanov et al. 2015, Cai et al. 2011). In addition, two were rejected because they were not written in Portuguese or English (Cadena et al. 2019, Hasanli et al. 2014). Finally, three could not be accessed (Loan et al. 2018, Mathews et al. 2018, Vicente et al. 2016). After the full texts were examined, 37 articles remained. We then applied the data extraction form defined in Section 2.3 and carried out the analyses shown below.

4. Results Analysis

The evaluated publications start, chronologically, with the proposal of Voorhees (2006) to forecast revenue from goods and services trading using neural networks. He argues that a neural network is better suited to this purpose than regression analysis, since the latter is limited by its requirements that independent variables are not correlated, that residuals are independent and that errors are equally distributed.

Defa and Jing (2010) and Buxton et al. (2019) also present approaches to forecasting revenue from this tax. Defa and Jing combine three prediction models (a regression equation model, a time series model and a gray model), maximizing their combined accuracy and achieving an error below 5%. Buxton et al., in more recent work, also combine two models, an auto-regressive multilayer perceptron and an LSTM, and forecast effectively the revenue collected from different product categories, such as fuels, construction and medicines.

The expected tax, however, does not always match the tax collected. The process of verifying, and pursuing, the correctness of the tax declared by the taxpayer is known as tax compliance. In this sense, Lahann et al.
(2019) present an anomaly detection approach to identify transactions that are very likely to be associated with a false tax code (which, consequently, leads the taxpayer to pay the wrong tax and, in most cases, a smaller one). Along the same lines, Fjeldstad et al. (2020) propose a model based on a decision tree that verifies whether the taxpayer's behavior and documents correspond to the tax operation expected of them.

Mehta et al. (2019), to increase compliance levels, propose a regression model to identify defaulting debtors and user-friendly Android apps to assist auditors in collecting tax. However, they also deal with another aspect of the effort to guarantee correct collection: the verification of tax evasion. To do this, they explore the detection and analysis of a tax evasion mechanism known as circular trading, using advanced social network analysis and algorithmic techniques.

Mehta et al. (2019) have published a series of studies involving the analysis of tax data and the detection of tax fraud and tax evasion behavior by taxpayers. From their group alone (apparently), eight other articles were selected for full-text review in this systematic mapping. Mathews et al. (2018) had already started exploring the circular trading problem. In this type of transaction, a group of merchants "manufactures" sales and/or purchases among themselves, which results in goods flowing in a circle without any added value. To the collecting entity, the taxpayer (or the group) appears entitled to a reduction of the tax to be paid, since by the nature of the tax each taxpayer owes tax only on the value it added to the product. However, as there was no genuine acquisition in the first place, this "credit" is free, and in fact the taxpayer is merely withholding the tax that would be due on the sale of the goods.
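The circular pattern described above can be pictured as a cycle in a directed graph whose vertices are taxpayers and whose edges are sales. The following is a minimal sketch in Python, not the algorithm used in any of the reviewed papers: the function name `find_trading_cycles`, the `max_len` bound and the dealer labels are hypothetical, and real invoice data would also carry amounts and dates that this toy example ignores.

```python
from collections import defaultdict


def find_trading_cycles(sales, max_len=5):
    """Return simple cycles (tuples of dealers) in a directed sales graph.

    `sales` is an iterable of (seller, buyer) pairs (hypothetical labels).
    Each cycle is reported once, starting at its smallest dealer label,
    and cycles with more than `max_len` dealers are not explored.
    """
    graph = defaultdict(set)
    for seller, buyer in sales:
        graph[seller].add(buyer)

    cycles = []

    def dfs(start, node, path):
        for nxt in sorted(graph[node]):
            if nxt == start and len(path) >= 2:
                # The walk returned to where it began: a closed loop of sales.
                cycles.append(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit dealers "larger" than the start, so each cycle
                # is found exactly once, from its smallest dealer.
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles


# Toy example: A sells to B, B to C, and C back to A; C also sells to D.
sales = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(find_trading_cycles(sales))  # [('A', 'B', 'C')]
```

A cycle found this way is only a candidate for inspection, since legitimate trade can also form loops; the bound on cycle length simply keeps the search small in this illustration.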
To solve this problem, the entire series of articles published by the group seeks to model the relationships between taxpayers, as well as the commercial transactions that take place between them, as a graph (where the taxpayers are the vertices and their relationships are the edges), so that machine learning models can identify patterns and outliers in these relationships.

In another paper by Mathews et al. (2018), suspected taxpayers are classified in three steps. First, taxpayers are clustered based on 7 correlations between variables such as tax paid, total amount of sales, amount of tax paid in cash and amount of tax-free sales. The authors then apply Benford's law to classify the taxpayers in each cluster as "trusted" or "suspect". Finally, they use data from trusted taxpayers to fit a linear regression model, which is then applied to suspect taxpayers to predict the amount of tax each tends to evade in the next period.

Mehta et al. (2018) try to predict whether a taxpayer is likely to declare tax appropriately in the next reference period. Their model is based on each company's filing behavior in previous years, on the turnover of the current month, on the value of interactions with other taxpayers and on the mean absolute deviation from Benford's law when applied to the taxpayer's sales transactions. It also uses information from transport documents to establish associations (every shipment of products requires this auxiliary document).

5. Discussion

Table 1 shows a comparison of the accepted papers. As can be seen, unsupervised machine learning techniques predominate in two major, and related, categories of tax problem: fraud and tax evasion.

Table 1. Papers grouped by the learning paradigm used to solve each tax problem found

Tax Collection Problem | Learning Paradigm
Revenue Forecasting | Supervised (Buxton et al. 2019, Defa and Jing 2010, Voorhees 2006)
Compliance | Supervised (Fjeldstad et al. 2020, Lahann et al. 2019, Mehta et al. 2019)
Debt Collection | Reinforcement (Abe et al. 2010)
Debt Collection | Statistical learning (Holkova and Falat 2017)
Fraud | Supervised (Basta et al. 2009, Castellón González et al. 2013, Rad and Shahbahrami 2015, Yu et al. 2019)
Fraud | Unsupervised (Castellón González et al. 2013, Junqué de Fortuny et al. 2014, Mathews et al. 2018, Mehta et al. 2019, Mehta et al. 2019, Mittal et al. 2018, Priya et al. 2019, Vanhoeyveld et al. 2020, Zha 2020)
Fraud | Reinforcement (He et al. 2020)
Evasion | Supervised (Didimo et al. 2020, Mathews et al. 2018, Mathews et al. 2021, Rahimikia et al. 2017, Wu et al. 2012)
Evasion | Unsupervised (Assylbekov et al. 2016, González-Martel et al. 2020, Mathews et al. 2018, Mehta et al. 2019, Mehta et al. 2018, Mehta et al. 2019, Mehta et al. 2020, Wang 2012, Wu et al. 2020)
Evasion | Semi-supervised (Kleanthous and Chatzis 2020)
Evasion | Positive and unlabeled learning (Wu et al. 2019)
Evasion | Transfer learning (Zhu et al. 2018)

Evasion occurs when any action by the taxpayer results in the Public Administration not collecting the taxes due to it. It can occur intentionally or not, but either way it causes losses to the treasury, and for this reason it is combated. Fraud is a more specific case of evasion, in which the taxpayer (or a group of taxpayers) intentionally uses techniques or subterfuge to avoid being held responsible for the purchases and sales of goods they carry out. The most prominent of these, according to the results of the systematic mapping, is circular trading (Mathews et al. 2018, Mathews et al. 2018, Mehta et al. 2019, Mehta et al. 2019, Priya et al. 2019, Mathews et al. 2021). However, there are other practices, such as declaring a false operating address to escape tax obligations, known as residence fraud (Junqué de Fortuny et al. 2014), and the clandestine transportation of goods without a tax document.
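Some of the fraud and evasion studies reviewed in Section 4 screen taxpayers with Benford's law, under which the leading digit d of naturally occurring amounts is expected with probability log10(1 + 1/d). The sketch below is an illustrative assumption, not code from any reviewed paper: it computes the mean absolute deviation between observed and expected first-digit frequencies. The names `leading_digit` and `benford_mad` are hypothetical, and no cut-off for labelling a taxpayer "suspect" is implied.

```python
import math
from collections import Counter

# Expected first-digit frequencies under Benford's law: P(d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(amount):
    """Return the first significant digit of a non-zero amount."""
    # Scientific notation puts the first significant digit in position 0,
    # e.g. 320 -> '3.200000000000000e+02'.
    return int(f"{abs(amount):.15e}"[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and Benford digit frequencies."""
    digits = [leading_digit(a) for a in amounts if a]  # zero amounts are skipped
    if not digits:
        raise ValueError("no non-zero amounts to analyze")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - p) for d, p in BENFORD.items()) / 9


# Toy example: every amount starts with the digit 1, far from Benford's shape.
print(round(benford_mad([1, 10, 100]), 4))  # 0.1553
```

A larger deviation only signals that a taxpayer's sales amounts depart from the expected digit distribution; deciding how large is too large would need labelled audit outcomes, which the reviewed studies note are scarce.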
As we analyze the results of the mapping, it is clear that the techniques and learning paradigms vary widely, but in general they are associated with the characteristics of the data available in each context. When a problem involves a history of past operations, such as completed audits or revenue collected in previous months, the paradigm is usually supervised, since the data tend to be labeled. This is also the case for tax compliance, which inherently involves knowing the expected tax classification for each item and checking whether the proper rate has been assigned to it.

Tax fraud or evasion cases, on the other hand, can be approached from both points of view. If the data analysis uses information from completed audits, with proof that a certain behavior was in fact that of a "fraudulent" taxpayer, learning will be supervised and the algorithm will use the characteristics associated with the given label to rank the next taxpayers. This is a rarer case, however, as the volume of audits performed and recorded is still small compared to the volume of tax documents issued. Therefore, the trend observed in the systematic mapping is that algorithms and learning techniques use the relationships between taxpayers, and the commercial transactions carried out by them, to identify patterns and outliers that indicate suspicious behavior in an effective and efficient way.

It is also worth noting that the use of machine learning in this domain is recent. According to the mapping, 75% of the selected works in this area were published in the last 5 years. This is due, in part, to the fact that the tax documents processed in goods and services trading operations have only recently become electronic. In the state of Goias, for example, they have been 100% electronic only since 2018.

Finally, it is necessary to highlight that the volume of tax data to be processed during the learning process was not mentioned as a problem.
However, this may be due to a fact mentioned in several studies: fiscal secrecy prevents researchers outside revenue agencies from accessing data on commercial transactions, which limits the scope of the proposals. If access to tax data is granted, this could become a new opportunity for evaluating new learning techniques.

6. Threats to Validity

Despite the systematic character of the mapping, some aspects threaten its validity. The main one stems from a characteristic inherent to any systematic mapping or review: when specific research questions are addressed and the most appropriate terms are chosen for the search string, the search may fail to return results of interest to the review or mapping simply because they do not match the chosen terms (Kitchenham and Charters 2007). In this work, we defined that one of the mandatory expressions would be value-added tax (with its syntactic variations), due to its recurrence as a tax on goods and services trading operations in different parts of the world. However, its acronym (VAT) was not included because, although well known, it is also associated with many unrelated expressions (such as Visceral Adipose Tissue, in medical articles). As a consequence, articles of interest that use only the known acronyms of the surveyed taxes (VAT, GST), without naming them in full, were not returned by this review.

Another threat to validity is that the mapping was carried out by a single reviewer, which may have biased the interpretation of the papers in some way.

7. Concluding Remarks

The systematic mapping study presented here showed, within the scope of the main digital libraries used to index studies published in the field of Computing, the state of the art of the approaches proposed to deal with aspects of tax collection in goods and services trading operations through data science methods and techniques.
The mapping showed that five major problems are researched by the scientific community in this context, with a greater focus on identifying and predicting tax evasion behavior by taxpayers, whether due to incorrectly filled tax documents or to intentional attempts at tax fraud and evasion. The mapping also showed that each of these problems requires specific data analysis methods and techniques, and that the nature of the data drives the choice of the appropriate learning technique in each case. To address tax compliance (verifying whether the proper rate is being applied to each product), for example, characteristics related to each tax class are labeled and a supervised learning algorithm is needed to classify products and taxpayers. To detect tax evasion or fraud attempts, such as circular trading, not only purchase and sale operations are analyzed, but also the relationships between taxpayers, in order to identify outliers in their behavior. For this, an unsupervised learning technique for clustering these taxpayers seems more suitable.

Regarding the datasets used, there are two considerations. Contrary to our initial expectation, the volume of data was not, in general, mentioned as an issue to be handled. On the other hand, this may be because most of the returned papers had difficulty accessing tax data, owing to the confidentiality involved, which limited the amount and variability of the data used to validate the proposed methods. It also guided, and may have biased, the choice of learning technique in some cases.

Major implications for future research include a need for more varied analyses of taxpayer behavior. As data are limited in amount and depth by confidentiality, only some aspects of taxpayer behavior, such as the amount of sales and the related tax, are usually investigated. Some work has been done on fraud techniques such as circular trading and residence fraud, but it is still limited.
Taxpayers exploit regulatory gaps in the tax domain to commit fraud without breaking tax procedures, and therefore without being seen as anomalies. Tax benefits and exemptions, granted seasonally and for a specific period of time to specific segments of taxpayers, are also a huge opportunity for tax evaders. These exceptions and unusual behaviors must be taken into account and added to current models for improvement and performance analysis.

Furthermore, it would be interesting to systematically evaluate the techniques currently proposed to handle tax evasion, how they cope with incomplete and inconsistent tax data, and whether a consensus emerges from them. This could properly be done with a systematic literature review focused on data science methods and techniques specifically proposed for tax evasion and fraud behavior.

Finally, there is still no evaluation of the efficiency lost due to incomplete tax data, caused by the confidentiality issue, nor a definition of how to deal with this problem definitively. This could be achieved through a comparison of performance and effectiveness between complete and incomplete data scenarios.

References

Abe, N., & Melville, P., & Pendus, C., & Reddy, C.K., & Jensen, D.L., & Thomas, V.P., & Bennett, J.J., & Anderson, G.F., & Cooley, B.R., & Kowalczyk, M., & Domick, M., & Gardinier, T. (2010). Optimizing debt collections using constrained reinforcement learning. Proceedings of the 16th ACM SIGKDD. pp. 75–84.
Akinboade, O.A., & Kinfack, E.C., & Mokwena, M.P., & Kumo, W.L. (2009). Benchmarking tax compliance efficiency among South African retail firms using stochastic frontier approach. 32(13), 1124–1146.
Assylbekov, Z., & Melnykov, I., & Bekishev, R., & Baltabayeva, A., & Bissengaliyeva, D., & Mamlin, E., & Czarnowski, I., & Caballero, A.M., & Howlett, R.J., & Jain, L.C. (2016). Detecting Value-Added Tax Evasion by Business Entities of Kazakhstan. pp. 37–49.
Bardelli, C., & Rondinelli, A., & Vecchio, R., & Figini, S. (2020). Automatic electronic invoice classification using machine learning models. Machine Learning and Knowledge Extraction 2(4), 617–629.
Basta, S., & Fassetti, F., & Guarascio, M., & Manco, G., & Giannotti, F., & Pedreschi, D., & Spinsanti, L., & Papi, G., & Pisani, S. (2009). High quality true-positive prediction for fiscal fraud detection. pp. 7–12.
Bogdanov, D., & Jõemets, M., & Siim, S., & Vaht, M., T., O., R., B. (2015). How the Estonian Tax and Customs Board evaluated a tax fraud detection system based on secure multi-party computation. vol. 8975, pp. 227–234.
Buxton, E., & Kriz, K., & Cremeens, M., & Jay, K. (2019). An auto regressive deep learning model for sales tax forecasting from multiple short time series. Intern. Conf. on Machine Learning and Applications. pp. 1359–1364.
Cadena, M., & Morán, E. (2019). Analysis for possible tax evasions from the value added tax in Ecuador using an stochastic model with a non-parametric technique. pp. 428–438.
Cai, D., & Zhang, A., & Cai, J. (2011). The improvement on China’s regional standard value added tax revenue estimate method - the construction, application and verification of standard rate model. pp. 783–786.
Castellón González, P., & Velásquez, J.D. (2013). Characterization and detection of taxpayers with false invoices using data mining techniques. 40(5), 1427–1436.
Das, S., & Kolya, A.K. (2017). Sense GST: Text mining & sentiment analysis of GST tweets by Naive Bayes algorithm. pp. 239–244.
Defa, C., & Jing, C. (2010). Construction of combination forecasting model and related validation – based on combined forecast of sales tax and enterprise income tax in Heilongjiang Province. pp. 328–331.
Didimo, W., & Grilli, L., & Liotta, G., & Menconi, L., M., F., P., D. (2020). Combining network visualization and data mining for tax risk assessment. pp. 16073–16086.
Dyba, T., & Kitchenham, B.A., & Jorgensen, M. (2005). Evidence-based software engineering for practitioners. pp. 58–65.
Fjeldstad, O.H., & Kagoma, C., & Mdee, E., & Sjursen, I.H., & Somville, V. (2020). The customer is king: Evidence on VAT compliance in Tanzania. 128, 104841.
Junqué de Fortuny, E., & Stankova, M., & Moeyersoms, J., & Minnaert, B., & Provost, F., & Martens, D. (2014). Corporate residence fraud detection. pp. 1650–1659. KDD ’14.
González-Martel, C., & Hernández, J.M., & Manrique-de Lara-Peñate, C. (2020). Identifying business misreporting in VAT using network analysis. p. 113464.
Hasanli, Y., & Agayev, S. (2014). Assessment of tax evasion risks for VAT payers. 153(3), 487–495.
He, Y., & Wang, C., & Li, N., & Zeng, Z. (2020). Attention and memory-augmented networks for dual-view sequential learning. Proceedings of the 26th ACM SIGKDD. pp. 125–134. KDD ’20.
Hoglund, H. (2017). Tax payment default prediction using genetic algorithm-based variable selection. 88, 368–375.
Holkova, B., & Falat, L. (2017). Statistical learning as a tool for optimizing the level of excise tax of mineral oils in Slovakia. 192, 318–323.
Kitchenham, B., & Charters, S. (2007). Guidelines for performing systematic literature reviews in software engineering.
Kleanthous, C., & Chatzis, S. (2020). Gated mixture variational autoencoders for value added tax audit case selection. 188, 105048.
Kong, D., & Saar-Tsechansky, M. (2014). Collaborative information acquisition for data-driven decisions. 95, 71–86.
Krzikallová, K., & Tošenovský, F. (2020). Is the value added tax system sustainable? The case of the Czech and Slovak Republics. 12(12).
Lahann, J., & Scheid, M., & Fettke, P. (2019). Utilizing machine learning techniques to reveal VAT compliance violations in accounting data. IEEE 21st Conference on Business Informatics (CBI). vol. 01, pp. 1–10.
Loan, N.T., & Hac, L.D., & Anh, N.V.H., & Anh, L.H., & Dong, L.S., & Kreinovich, V., & Thach, N.N. (2018). Application of Statistical Methods for Tax Inspection of Enterprises: A Case Study in Vietnam. pp. 648–655.
Mathews, J., & Mehta, P., & Babu, C.S., & Kasi V. Rao, S.V. (2018). An algorithmic approach to handle circular trading in commercial taxation system. pp. 67–75.
Mathews, J., & Mehta, P., & Babu, C.S., & Kasi V. Rao, S.V. (2018). Clustering collusive dealers in commercial taxation system. Advances in Intelligent Systems and Computing, vol. 869, pp. 703–717.
Mathews, J., & Mehta, P., & Kuchibhotla, S., & Bisht, D., & Chintapalli, S.B., & Rao, S.V.K.V. (2018). Regression analysis towards estimating tax evasion in goods and services tax. IEEE/WIC/ACM WI. 758–761.
Mathews, J., & Mehta, P., & Suryamukhi, K., & Babu, S. (2021). Link prediction techniques to handle tax evasion. 8th ACM IKDD CODS and 26th COMAD. pp. 307–315.
Mehta, P., & Mathews, J., & Bisht, D., & Suryamukhi, K., & Kumar, S., & Babu, C.S., & W., A., G., K. (2020). Detecting tax evaders using trustrank and spectral clustering. vol. 389 LNBIP, pp. 169–183.
Mehta, P., & Mathews, J., & Kasi V. Rao, S.V., & Kumar, K.S., & Suryamukhi, K., & Babu, C.S. (2019). Identifying malicious dealers in goods and services tax. pp. 312–316.
Mehta, P., & Mathews, J., & Kumar, S., & Suryamukhi, K., & Babu, C.S. (2019). Curtailing the tax leakages by nabbing return defaulters in taxation system. vol. 1127 CCIS, pp. 183–195.
Mehta, P., & Mathews, J., & Kumar, S., & Suryamukhi, K., & Babu, C.S., & Rao, S.V.K.V., & Shivapujimath, V., & Bisht, D. (2019). Big data analytics for tax administration. vol. 11709 LNCS, pp. 47–57.
Mehta, P., & Mathews, J., & Kumar, S., & Suryamukhi, K., & Sobhan Babu, C., & Kasi Visweswara Rao, S.V. (2019). Big data analytics for nabbing fraudulent transactions in taxation system. vol. 11514 LNCS, pp. 95–109.
Mehta, P., & Mathews, J., & Suryamukhi, K., & Kumar, K.S., & Babu, C.S. (2018). Predictive modeling for identifying return defaulters in goods and services tax. pp. 631–637.
Meline, T. (2006). Selecting studies for systematic review: inclusion and exclusion criteria. 33, 21–27.
Meservy, R.D., & Denna, E.L., & Hansen, J.V. (1992). Application of artificial intelligence to accounting, tax, and audit services. 4(2), 213–218.
Mittal, S., & Reich, O., & Mahajan, A. (2018). Who is bogus? Using one-sided labels to identify fraudulent firms from tax returns. In: Proceedings of. COMPASS ’18.
Petersen, K., & Feldt, R., & Mujtaba, S., & Mattsson, M. (2008). Systematic mapping studies in software engineering. In: 12th EASE. pp. 1–10.
Priya, & Mathews, J., & Kumar, K.S., & Babu, C.S., & Rao, S.V.K.V. (2019). A collusion set detection in value added tax using Benford’s analysis. vol. 858, pp. 909–921.
Rad, M.S., & Shahbahrami, A. (2015). High performance implementation of tax fraud detection algorithm. pp. 6–9.
Rahimikia, E., & Mohammadi, S., & Rahmani, T., & Ghazanfari, M. (2017). Detecting corporate tax evasion using a hybrid intelligent system: A case study of Iran. 25, pp. 1–17.
Vanhoeyveld, J., & Martens, D., & Peeters, B. (2020). Value-added tax fraud detection with scalable anomaly detection techniques.
Vicente, E., & Mateos, A., & Jiménez-Martín, A., & Torra, V., & Narukawa, Y., & Navarro-Arribas, G., & Yañez, C. (2016). Complicity Functions for Detecting Organized Crime Rings. vol. 9880, pp. 205–216.
Voorhees, W.R. (2006). Neural networks and revenue forecasting: a smarter forecast? 1(4), 379–388.
Wang, G.L. (2012). Research on sampling method of tax-checking based on neural network. pp. 1541–1546.
Wu, R.S., & Ou, C.S., & Lin, H.y., & Chang, S.I., & Yen, D.C. (2012). Using data mining technique to enhance tax evasion detection performance. 39(10), 8769–8777.
Wu, Y., & Dong, B., & Zheng, Q., & Wei, R., & Wang, Z., & Li, X. (2020). A novel tax evasion detection framework via fused transaction network representation. pp. 235–244.
Wu, Y., & Zheng, Q., & Gao, Y., & Dong, B., & Wei, R., & Z., F., & He, H. (2019). Tedm-pu: A tax evasion detection method based on positive and unlabeled learning. pp. 1681–1686.
Yu, J., & Qiao, Y., & Shu, N., & Sun, K., & Zhou, S., & Yang, J. (2019). Neural network based transaction classification system for Chinese transaction behavior analysis. 2019 IEEE BigData Congress. pp. 64–71.
Zha, Z. (2020). Taxaa: A reliable tax auditor assistant for exploring suspicious transactions. WWW ’20. p. 240–244.
Zhu, X., & Yan, Z., & Ruan, J., & Zheng, Q., & Dong, B. (2018). Irted-tl: An inter-region tax evasion detection method based on transfer learning. pp. 1224–1235.

About the Authors

Douglas Silva
Douglas B. Silva is a PhD student in Computer Science at the Federal University of Goias, Goiania-GO, Brazil. His research interests include Data Science, Artificial Intelligence, Computer Systems and E-Government. He currently works at the Public Treasury of the State of Goias, Brazil, analyzing goods and services trading operations data.

Sergio Carvalho
Sergio T. Carvalho is a full professor at the Informatics Institute of the Federal University of Goias in Goiania-GO, Brazil. He received a bachelor's degree in Computer Science from the Federal University of Goias and Master's and Doctoral degrees in Computer Science, both from the Fluminense Federal University, Brazil. He has experience in Distributed Systems and Software Engineering, and his main areas of expertise are ubiquitous computing, with a focus on healthcare applications, adaptive distributed systems, and software architecture.