<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Data Science Methods and Techniques for Goods and Services Trading Taxation: a Systematic Mapping Study</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Douglas</forename><surname>Silva</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Federal University of Goias</orgName>
								<address>
									<settlement>Goiania-GO</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Federal University of Goias</orgName>
								<address>
									<settlement>Goiania-GO</settlement>
									<country key="BR">Brazil</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Sergio</forename><surname>Carvalho</surname></persName>
							<email>sergiocarvalho@ufg.br</email>
						</author>
						<title level="a" type="main">Data Science Methods and Techniques for Goods and Services Trading Taxation: a Systematic Mapping Study</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">22485AAF70BB705E915748D871B3B6E5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T21:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>value-added tax</term>
					<term>goods and services tax</term>
					<term>sales tax</term>
					<term>data science</term>
					<term>systematic mapping study</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Taxation on goods and services trading operations is the main revenue source for states and provinces around the world. Collecting such taxes, however, faces a series of recurring challenges, ranging from the incorrect filling of the tax documents involved (which leads to the incorrect calculation of the tax due) to attempts at tax fraud. As this context involves analyzing a very large amount of data, data science techniques appear as an interesting alternative for providing effective solutions to the problems that arise. This article describes a systematic mapping of the literature aimed at identifying how data science methods and techniques have been applied in this context and how the problems inherent in this domain are being handled. Results show that there are very well-defined categories of problems being researched in this area, and that data science can be used efficiently to improve the collection of these types of taxes.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Taxes are compulsory financial charges imposed by the government on an individual or legal entity in order to fund public expenditures <ref type="bibr">(Mathews et al. 2018</ref><ref type="bibr" target="#b42">, Rad et al. 2015)</ref>. They are regulated by specific laws that describe their composition, their collection and compliance processes and, where applicable, how the resulting revenue is to be applied.</p><p>Taxation on the sale of goods and provision of services is the main source of revenue for most states and provinces around the world and is therefore an important kind of indirect tax. It is applied to each link in the consumption chain and generates, at each transaction, one or more tax documents with a complete record of the items and parties involved in the transaction <ref type="bibr" target="#b51">(Yu et al. 2019)</ref>, including their tax classification and the tax rate due. This information is usually registered in an electronic invoice.</p><p>Taxation of trading operations has been implemented in different ways, but usually as a non-cumulative tax due proportionally from each taxpayer in the consumption chain. Taxes of this nature include Sales Tax in the United States <ref type="bibr" target="#b6">(Buxton et al. 2019)</ref>, ICMS (Tax on Circulation of Goods and Services) in Brazil, GST (Goods and Services Tax) in countries like Australia, Canada, Singapore and, recently, India <ref type="bibr">(Mathews et al. 2018</ref><ref type="bibr">, Mehta et al. 2018)</ref>, and variations of VAT (Value-Added Tax), which is used in most countries, such as China <ref type="bibr" target="#b51">(Yu et al. 
2019</ref>) and the European Union.</p><p>Tax law applied to goods and services trading, however, is not only complex but also changes constantly, and taxpayers are not always up to date on the tax rules applicable to each product they sell or each service they provide <ref type="bibr" target="#b25">(Lahann et al. 2019)</ref>. Tax benefits and exemptions are also granted seasonally, for a specific period of time, to specific segments of taxpayers, and all of these possibilities directly affect the tax bookkeeping that taxpayers declare. These situations allow taxpayers, intentionally or unintentionally, to cause losses to the public treasury and consequently to undermine the provision of public services to citizens. It then becomes necessary not only to collect the taxes, but also to verify whether taxpayers have paid them correctly and to proceed with debt collection when necessary <ref type="bibr" target="#b0">(Abe et al. 2010)</ref>.</p><p>The analysis of tax compliance information is currently the responsibility of tax auditors. However, the limited number of auditors, combined with the volume of information generated, makes conventional procedures ineffective and inefficient <ref type="bibr" target="#b47">(Wang 2012)</ref>. It is necessary to direct auditors' attention so that they spend less effort on formalities and more on signs of anomalies or fraud <ref type="bibr" target="#b4">(Basta et al. 
2009)</ref>.</p><p>Although technological development has enabled the automation of operational processes, analyzing massive amounts of data to identify anomalies, inconsistencies and behavior patterns that provide evidence of fraud and tax non-compliance is still a challenge.</p><p>Methods traditionally used to solve these problems are time-consuming, costly and imprecise, and in a big data scenario they become impractical.</p><p>Although governments have always analyzed tax data, and analytics, AI and modern technology help them do it better, big data in this domain is recent. GST itself, for example, was implemented in India only in 2017 <ref type="bibr" target="#b10">(Das et al. 2017)</ref>, and electronic invoicing became mandatory for Italian companies only in January 2019 <ref type="bibr" target="#b3">(Bardelli et al. 2020)</ref>. Problems related to data characteristics, such as volume, inconsistency and incompleteness, are hence also recent, and it becomes necessary to map how the computing field deals with this domain.</p><p>Data science models and strategies are, in general, useful in contexts analogous to this one. However, their applicability varies according to the characteristics of the available data. Therefore, it is necessary to identify which techniques could be used and for which reasons, and also to identify aspects of these models and strategies that have not yet been addressed. This article presents a systematic mapping of the literature that aims to understand the domain of tax collection in goods and services trading operations and how data science has been used to solve problems that emerge in this context, identifying a possible consensus or good practices in handling these situations. However, it is important to understand that this is a systematic mapping and not a systematic review. 
Its main objective is to map the domain area, the characteristics of its datasets and how they influence researchers' choice of techniques, so that it can help guide future research. The techniques themselves, and how they handle the problems found in this domain, would be better explored in a systematic review, with research questions designed for that purpose.</p><p>The remainder of the paper is organized as follows. The second section introduces concepts involving taxes due on the trade of goods and services, and data science techniques. The third section presents the materials and methods used to define the systematic mapping protocol. The fourth section displays the results of the systematic mapping performed, while the fifth section analyzes these results. The sixth section discusses the results found and our perceptions of them. The seventh and last section presents the final considerations on the procedure performed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Review Protocol</head><p>This systematic mapping of the literature followed the procedure described in <ref type="bibr" target="#b40">(Petersen et al. 2008)</ref>. As part of the process, a research protocol was defined, which is detailed in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Research Questions and Search Strategy</head><p>This mapping sought to establish the state of the art in scientific research that applies data science to tax data from goods and services trading operations. The specific review questions addressed were:</p><p>1) What problems in the domain of goods and services trading taxation have been studied in the area of data science?</p><p>2) What types of techniques and learning strategies have been applied?</p><p>3) Which data sources are used in the analysis?</p><p>4) Do the selected attributes vary according to the region / location (where the problem occurs)?</p><p>5) Which datasets are used?</p><p>6) How big are these datasets?</p><p>7) Has the volume of data been a complicating factor for the analysis?</p><p>8) How has the problem of data volume been dealt with in this context?</p><p>From the main keywords identified in these research questions, an initial string was defined and calibrated through a pilot search in the IEEE Xplore, ACM and Scopus digital libraries, in order to reduce the likelihood of bias.</p><p>The assessment also took into account that taxes with these characteristics are called Sales Tax in the United States, GST in India (among other countries) and VAT in the European Union and China. By adding these three variations, we believe we reached all, or most, of the targeted publications.</p><p>The evaluation of the pilot search results led to the following search string:</p><p>("value-added tax" OR "goods and services tax" OR "sales tax") AND ("data science" OR "artificial intelligence" OR "data mining" OR "machine learning" OR "neural network")</p><p>After defining the string, we selected the publication databases most commonly used for systematic mappings and reviews of the literature in the area of Software Engineering <ref type="bibr" target="#b13">(Dyba et al. 2005)</ref>, namely: ACM Digital Library, IEEE Xplore, ScienceDirect, SpringerLink and Scopus.</p></div>
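<div xmlns="http://www.tei-c.org/ns/1.0"><p>The structure of the search string above (two OR-groups joined by AND) can be sketched programmatically, which may help when the string has to be recalibrated or adapted. This is only an illustrative sketch: the helper names or_group and build_search_string are hypothetical, and each digital library has its own query syntax, so the generic string produced here would probably need adapting before use.</p>
<p>
```python
# Term groups copied from the search string in Section 2.1.
TAX_TERMS = ["value-added tax", "goods and services tax", "sales tax"]
TECHNIQUE_TERMS = [
    "data science",
    "artificial intelligence",
    "data mining",
    "machine learning",
    "neural network",
]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join('"' + term + '"' for term in terms) + ")"


def build_search_string(tax_terms, technique_terms):
    """Join the two OR-groups with AND (hypothetical helper)."""
    return or_group(tax_terms) + " AND " + or_group(technique_terms)


SEARCH_STRING = build_search_string(TAX_TERMS, TECHNIQUE_TERMS)
print(SEARCH_STRING)
```
</p>
<p>Keeping the two term groups as separate lists makes it easy to see the effect of adding or removing a variation (for instance, a tax acronym) during a pilot search.</p></div>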
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Criteria for Study Selection</head><p>Once primary studies were obtained from the aforementioned sources, inclusion and exclusion criteria were applied to them in order to select those clearly relevant to the objective of the systematic mapping.</p><p>Studies were considered eligible if they had tax collection in the trade of goods and services as their motivation and as the scenario for implementing or validating the proposed method, or if their method was applicable to a similar context.</p><p>Selected studies were also evaluated for their relevance (they should involve data science techniques) and formality. Publications that did not meet the aforementioned eligibility criteria were excluded from the review, as were:</p><p>• Papers that do not propose the use of data science methods or techniques to solve a problem found in the mentioned domain;</p><p>• Papers that do not present the method proposed to solve the problem;</p><p>• Publications that have not been subjected to peer review;</p><p>• Publications that are not in English or Portuguese;</p><p>• Publications whose full text is unavailable;</p><p>• Repeated publications.</p><p>The number of excluded papers, as well as the reason for their exclusion, was recorded as the articles were evaluated.</p><p>The process for selecting studies followed the one proposed by <ref type="bibr" target="#b37">Meline (2006)</ref>:</p><p>• Step 1 (screening): eligibility criteria were applied to the search results through a preliminary evaluation of their title, abstract and keywords;</p><p>• Step 2: studies were then discarded if they met one or more exclusion criteria, based on the same elements evaluated in step 1;</p><p>• Step 3 (full text review): eligibility and exclusion criteria were then applied to the remaining accepted studies, now evaluating their full text.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Methods for Data Extraction and Study Synthesis</head><p>After evaluating the full text of the accepted articles, we recorded them using a data extraction form, which standardized the results found in each study and allowed them to be analyzed and summarized.</p><p>The following information was extracted from the selected articles: (i) title, authors and year of publication; (ii) research problem; (iii) proposed data analysis technique; (iv) learning paradigm and strategy, if applicable; (v) data sources used; (vi) datasets used; (vii) dataset volume (and inherent problems); and (viii) gaps observed by the researchers.</p></div>
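<div xmlns="http://www.tei-c.org/ns/1.0"><p>One possible way to represent the extraction form is as a small record type, so that every accepted article is described by the same fields. The sketch below is an assumption about how such a form could be encoded, not the instrument actually used in this mapping; the class name ExtractionRecord and its field names are hypothetical. The example record uses only what this paper states about Voorhees (2006), and the title is deliberately left as a placeholder.</p>
<p>
```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ExtractionRecord:
    """One filled-in extraction form (items i to viii of Section 2.3)."""
    title: str                                # (i)
    authors: List[str]                        # (i)
    year: int                                 # (i)
    research_problem: str                     # (ii)
    technique: str                            # (iii)
    learning_paradigm: Optional[str] = None   # (iv), only if applicable
    data_sources: List[str] = field(default_factory=list)  # (v)
    datasets: List[str] = field(default_factory=list)      # (vi)
    dataset_volume: Optional[str] = None      # (vii)
    gaps: List[str] = field(default_factory=list)          # (viii)


# Example built only from statements made in this paper; fields the
# paper does not report are left at their empty defaults.
record = ExtractionRecord(
    title="(title not reproduced here)",
    authors=["Voorhees"],
    year=2006,
    research_problem="Revenue Forecasting",
    technique="neural network",
    learning_paradigm="Supervised",
)
print(asdict(record))
```
</p>
<p>Optional fields default to empty values, which mirrors the fact that not every study reports its datasets or their volume.</p></div>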
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>Following the process described in Section 2.2, we carried out a literature search on December 29, 2020, which initially returned 867 papers. Of these, 24 papers came from the IEEE Xplore digital library, 66 from the ACM database, 258 from Scopus, 218 from the ScienceDirect digital database and 301 from the SpringerLink database.</p><p>After the initial reading of title, abstract and keywords, 71 duplicate articles were found and discarded, and a further 747 articles were rejected because they did not meet the eligibility criteria. Of these 747, 742 were excluded because neither their motivation nor their validation scenario involved problems related to tax collection in the trade of goods and services, not even by similarity, and another 5 were rejected for proposing computational techniques not related to data science (such as blockchain or ontologies) that address some other aspect of the mentioned tax domain.</p><p>It is important to highlight that, in our view, the string contains only the terms necessary to direct the results: three variations of what this consumption tax is called around the world, and the names of techniques or areas that could indicate the application of data science to this domain. To avoid false positives, even the known abbreviations of these taxes were removed from the search string. Even so, several articles mention the tax itself, or how useful it would be to use data science to deal with it, often only once. Neither the application nor the domain itself was the focus of these articles, and whenever that happened, they were discarded.</p><p>Thus, of the 867 articles initially returned, 49 remained for full text evaluation. 
These 49 articles were obtained and evaluated in full, and we found that four of them should be rejected because their main motivation was not the improvement of tax collection in goods and services trading operations <ref type="bibr" target="#b19">(Hoglund 2017</ref><ref type="bibr" target="#b23">, Kong et al. 2014</ref><ref type="bibr" target="#b24">, Krzikallová 2020</ref><ref type="bibr" target="#b38">, Meservy 1992)</ref>, and three were discarded under the exclusion criterion concerning the non-use of data science methods or techniques to solve a problem found in the mentioned domain <ref type="bibr" target="#b1">(Akinboade et al. 2009</ref><ref type="bibr" target="#b5">, Bogdanov et al. 2015</ref><ref type="bibr" target="#b8">, Cai et al. 2011</ref>). In addition, two were rejected for not being written in Portuguese or English <ref type="bibr" target="#b7">(Cadena et al. 2019</ref><ref type="bibr" target="#b17">, Hasanli et al. 2014)</ref>. Finally, three were not accessible <ref type="bibr" target="#b26">(Loan et al. 2018</ref><ref type="bibr">, Mathews et al. 2018</ref><ref type="bibr" target="#b45">, Vicente et al. 2016)</ref>.</p><p>After examining the full texts, 37 articles remained. We then applied the data extraction form defined in Section 2.3 and carried out the analyses shown below.</p></div>
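<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection figures reported in this section can be cross-checked with a few lines of arithmetic. The sketch below simply restates the counts given above; the variable names and the short labels for the rejection reasons are ours and purely illustrative.</p>
<p>
```python
# Papers returned per source on December 29, 2020 (Section 3).
returned = {
    "IEEE Xplore": 24,
    "ACM": 66,
    "Scopus": 258,
    "ScienceDirect": 218,
    "SpringerLink": 301,
}
total = sum(returned.values())

# Screening of title, abstract and keywords.
duplicates = 71
ineligible = 742 + 5  # off-domain papers + non-data-science techniques
full_text_candidates = total - duplicates - ineligible

# Rejections after full text review (labels are illustrative).
full_text_rejections = {
    "different main motivation": 4,
    "no data science method": 3,
    "language": 2,
    "not accessible": 3,
}
accepted = full_text_candidates - sum(full_text_rejections.values())

print(total, full_text_candidates, accepted)  # 867 49 37
```
</p>
<p>The three printed values match the 867 papers initially returned, the 49 evaluated in full and the 37 finally accepted.</p></div>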
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results Analysis</head><p>The evaluated publications start, chronologically, with the proposal of <ref type="bibr" target="#b46">Voorhees (2006)</ref> to forecast goods and services trading revenue through neural networks. He argues that using a neural network for this purpose is better than performing a regression analysis, since regression is limited in that independent variables cannot be correlated, residuals must be independent and errors must be equally distributed. <ref type="bibr" target="#b11">Defa and Jing (2010)</ref> and <ref type="bibr" target="#b6">Buxton et al. (2019)</ref> also present approaches to forecasting revenue from this tax. Defa and Jing combine three prediction models (a regression equation model, a time series model and a gray model), maximizing their combined accuracy and reaching an error below 5%. <ref type="bibr">Buxton et al.</ref>, in a more recent work, also combine two models, an auto-regressive multilayer perceptron and an LSTM, and forecast effectively the revenue collected from different product categories, such as fuels, construction and medicines.</p><p>The expected tax, however, does not always match the tax collected. The process of verifying, and pursuing, the correctness of the tax declared by the taxpayer is known as tax compliance. In this sense, <ref type="bibr" target="#b25">Lahann et al. (2019)</ref> present an anomaly detection approach to identify conspicuous transactions that have a high probability of being associated with a false tax code (and that consequently lead the taxpayer to pay an incorrect tax, in most cases a smaller one). Along the same lines, <ref type="bibr" target="#b14">Fjeldstad et al. (2020)</ref> propose a model based on a decision tree that verifies whether the expected behavior and the taxpayer's documents correspond to the tax operation planned for that taxpayer. <ref type="bibr">Mehta et al. 
(2019)</ref>, to increase compliance levels, propose a regression model to identify defaulting debtors, together with user-friendly Android apps to assist auditors in collecting tax. They also address another aspect of ensuring correct collection: the verification of tax evasion. To do this, they explore the detection and analysis of a tax evasion mechanism known as circular trading, using advanced social network analysis and algorithmic techniques. <ref type="bibr">Mehta et al. (2019)</ref> have published a series of studies involving the analysis of tax data and the detection of tax fraud and tax evasion behavior by taxpayers. From their research group alone, apparently, eight other articles were selected for full-text review in this systematic mapping. <ref type="bibr">Mathews et al. (2018)</ref> had already started exploring the circular trading problem. In this type of transaction, a group of merchants "manufactures" sales and/or purchases among themselves, which results in goods flowing in a circular manner without any added value. From the point of view of the collecting entity, the taxpayer (or the group) is entitled to a reduction of the tax to be paid, since by the nature of the tax the taxpayer must pay tax only on the value it added to the product. However, as there was no actual acquisition in the first place, this "credit" costs nothing, and in fact the taxpayer is simply withholding the tax that would be due on the sale of the goods.</p><p>To solve this problem, the entire series of articles published by the group seeks to model the relationships between taxpayers, as well as the commercial transactions that take place between them, in the form of a graph (where the taxpayers are the vertices and their relations are the edges), so that machine learning models can identify patterns and outliers in these relationships.</p><p>In another paper by <ref type="bibr">Mathews et al. (2018)</ref>, the classification of suspect taxpayers is performed in three steps. 
In the first, taxpayers are clustered based on 7 correlations between variables such as tax paid, the total amount of sales, the amount of tax paid in cash and the amount of tax-free sales. In the second, an application of Benford's law is used to classify the taxpayers in each cluster as "trusted" or "suspect". Finally, data from trusted taxpayers are used to create a linear regression model, which is then applied to suspect taxpayers to predict the amount of tax each tends to evade in the next period. <ref type="bibr" target="#b36">Mehta et al. (2018)</ref> try to predict whether a taxpayer will declare the tax appropriately in the next reference period. Their prediction is based on the behavior of each company's statements in previous years, on the turnover of the current month, on the value of interactions with other taxpayers and on the mean absolute deviation obtained from Benford's law when applied to the taxpayer's sales transactions. They also use information from transport communications to make associations (every shipment of products requires this auxiliary document).</p></div>
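<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the Benford's law feature mentioned above more concrete, the following minimal sketch computes the mean absolute deviation between the observed first-digit frequencies of a list of amounts and the frequencies Benford's law predicts, log10(1 + 1/d) for each leading digit d. It is a generic illustration and not the implementation used in the reviewed papers: the function names are hypothetical, the two sample lists are synthetic, and no threshold for labeling a taxpayer as "trusted" or "suspect" is implied, since the source does not report one.</p>
<p>
```python
import math
from collections import Counter

# Expected first-digit frequencies under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Return the leading non-zero digit of a non-zero amount."""
    text = "{:.15e}".format(abs(amount))  # e.g. '1.2345...e+03'
    return int(text[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and Benford frequencies."""
    leading = [first_digit(a) for a in amounts if a != 0]
    if not leading:
        raise ValueError("at least one non-zero amount is required")
    counts = Counter(leading)
    n = len(leading)
    deviations = [abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)]
    return sum(deviations) / 9


# Synthetic data for illustration only (not taken from any study).
plausible = [2 ** k for k in range(1, 200)]   # spreads over all leading digits
suspicious = [500, 520, 580, 5900, 51]        # every amount starts with 5

print(benford_mad(plausible))
print(benford_mad(suspicious))
```
</p>
<p>A larger deviation only indicates that the first digits depart from the Benford distribution; in the works described above it is one input among several, not a verdict on its own.</p></div>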
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>Table <ref type="table">1</ref> shows a comparison of the accepted papers. As can be seen, unsupervised machine learning techniques predominate in two major and closely related categories of tax problem: fraud and tax evasion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1. Papers grouped by the learning paradigm used to solve each tax problem found</head><table><row role="label"><cell>Tax Collection Problem</cell><cell>Learning Paradigm</cell></row>
<row><cell>Revenue Forecasting</cell><cell>Supervised <ref type="bibr" target="#b6">(Buxton et al. 2019</ref><ref type="bibr" target="#b11">, Defa and Jing 2010</ref><ref type="bibr" target="#b46">, Voorhees 2006)</ref></cell></row>
<row><cell>Compliance</cell><cell>Supervised <ref type="bibr" target="#b14">(Fjeldstad et al. 2020</ref><ref type="bibr" target="#b25">, Lahann et al. 2019</ref><ref type="bibr">, Mehta et al. 2019)</ref></cell></row>
<row><cell>Debt Collection</cell><cell>Reinforcement <ref type="bibr" target="#b0">(Abe et al. 2010)</ref></cell></row>
<row><cell/><cell>Statistical learning <ref type="bibr" target="#b20">(Holkova and Falat 2017)</ref></cell></row>
<row><cell>Fraud</cell><cell>Supervised <ref type="bibr" target="#b4">(Basta et al. 2009</ref><ref type="bibr">, Castellón González et al. 2013</ref><ref type="bibr" target="#b42">, Rad and Shahbahrami 2015</ref><ref type="bibr" target="#b51">, Yu et al. 2019)</ref></cell></row>
<row><cell/><cell>Unsupervised <ref type="bibr">(Castellón González et al. 2013</ref><ref type="bibr" target="#b15">, Junqué de Fortuny et al. 2014</ref><ref type="bibr">, Mathews et al. 2018</ref><ref type="bibr">, Mehta et al. 2019</ref><ref type="bibr">, Mehta et al. 2019</ref><ref type="bibr" target="#b39">, Mittal et al. 2018</ref><ref type="bibr" target="#b41">, Priya et al. 2019</ref><ref type="bibr" target="#b44">, Vanhoeyveld et al. 2020</ref><ref type="bibr" target="#b52">, Zha 2020)</ref></cell></row>
<row><cell/><cell>Reinforcement <ref type="bibr" target="#b18">(He et al. 2020)</ref></cell></row>
<row><cell>Evasion</cell><cell>Supervised <ref type="bibr" target="#b12">(Didimo et al. 2020</ref><ref type="bibr">, Mathews et al. 2018</ref><ref type="bibr" target="#b30">, Mathews et al. 2021</ref><ref type="bibr" target="#b43">, Rahimikia et al. 2017</ref><ref type="bibr" target="#b48">, Wu et al. 2012)</ref></cell></row>
<row><cell/><cell>Unsupervised <ref type="bibr" target="#b2">(Assylbekov et al. 2016</ref><ref type="bibr" target="#b16">, González-Martel et al. 2020</ref><ref type="bibr">, Mathews et al. 2018</ref><ref type="bibr">, Mehta et al. 2019</ref><ref type="bibr">, Mehta et al. 2018</ref><ref type="bibr">, Mehta et al. 2019</ref><ref type="bibr" target="#b31">, Mehta et al. 2020</ref><ref type="bibr" target="#b47">, Wang 2012</ref><ref type="bibr" target="#b49">, Wu et al. 2020)</ref></cell></row>
<row><cell/><cell>Semi-supervised (Kleanthous and Chatzis 2020)</cell></row>
<row><cell/><cell>Positive learning <ref type="bibr" target="#b50">(Wu et al. 2019)</ref></cell></row>
<row><cell/><cell>Transfer learning <ref type="bibr" target="#b53">(Zhu et al. 2018)</ref></cell></row>
</table></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Evasion occurs when any action by the taxpayer results in the Public Administration not collecting the taxes due to it. It can occur intentionally or not, but either way it causes losses to the treasury, and for this reason it is combated. Fraud is a more specific case of evasion, in which the taxpayer (or a group of taxpayers) intentionally uses techniques or subterfuge to avoid being held responsible for the purchase and sale of goods they carry out. The most prominent of these, according to the results of the systematic mapping, is circular trading <ref type="bibr">(Mathews et al. 2018</ref><ref type="bibr">, Mathews et al. 2018</ref><ref type="bibr">, Mehta et al. 2019</ref><ref type="bibr">, Mehta et al. 2019</ref><ref type="bibr" target="#b41">, Priya et al. 2019</ref><ref type="bibr" target="#b30">, Mathews et al. 2021)</ref>. However, there are other practices, such as declaring a false operating address to escape tax obligations, known as residence fraud (Junqué de Fortuny et al. 2014), and the clandestine transportation of goods without a tax document.</p><p>As we analyze the results of the mapping, it is clear that the techniques and learning paradigms vary widely, but in general they are associated with the characteristics of the data available in each context.</p><p>When a problem involves historical records of operations already carried out, such as completed audits or revenue collected in previous months, the paradigm is usually supervised, since the data tend to be labeled. 
This is also the case for tax compliance, which inherently requires knowing the expected tax classification for each item in order to check whether the proper rate has been assigned to it.</p><p>Tax fraud or evasion cases, on the other hand, can be approached from both points of view. If the data analysis makes use of information from audits already carried out, with proof that a certain behavior was actually due to a "fraudulent" taxpayer, learning will be supervised and the algorithm will use the characteristics associated with the given label to rank the next taxpayers. This is a rarer case, however, as the volume of audits performed and recorded is still small compared to the volume of tax documents issued. Therefore, the trend observed in the systematic mapping is that the algorithms and learning techniques use the relationships between taxpayers, and the commercial transactions carried out by them, to identify patterns and outliers that indicate suspicious behavior in an effective and efficient way.</p><p>It is also worth noting that the use of machine learning in this domain is recent. According to the mapping, 75% of the selected works in this area were published in the last 5 years. This is due, in part, to the fact that the tax documents processed in goods and services trading operations have only recently become electronic. In the state of Goias, for example, they have been 100% electronic only since 2018.</p><p>Finally, it is necessary to highlight that the volume of tax data to be processed during the learning process was not mentioned as a problem. However, this may be due to a fact mentioned in several studies: fiscal secrecy prevents researchers outside revenue agencies from having access to data from commercial transactions, limiting the scope of the proposals. Should access to tax data be granted, this could become a new opportunity for evaluating new learning techniques.</p></div>
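<div xmlns="http://www.tei-c.org/ns/1.0"><p>The graph view of taxpayer relationships discussed above can be illustrated with a small example. Assuming sales are stored as a mapping from each seller to the taxpayers it sold to, a depth-first search can report one directed cycle, which is the basic structural signature of circular trading. This is a deliberately simplified sketch and not the method of any reviewed paper: the function find_cycle and the taxpayer labels are hypothetical, and a cycle in real data is only a signal for further analysis, since legitimate trade can also form cycles.</p>
<p>
```python
def find_cycle(sales):
    """Return one directed cycle as a list of taxpayers, or None.

    `sales` maps each seller to the list of taxpayers it sold to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for buyer in sales.get(node, []):
            state = color.get(buyer, WHITE)
            if state == GREY:
                # Back edge: buyer is already on the current path.
                return path[path.index(buyer):] + [buyer]
            if state == WHITE:
                found = visit(buyer)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for seller in list(sales):
        if color.get(seller, WHITE) == WHITE:
            found = visit(seller)
            if found:
                return found
    return None


# Hypothetical taxpayers: A sells to B, B to C, and C back to A.
circular = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
chain = {"A": ["B"], "B": ["C"]}

print(find_cycle(circular))  # ['A', 'B', 'C', 'A']
print(find_cycle(chain))     # None
```
</p>
<p>In practice, the edges would also carry amounts and dates, so that cycles without added value can be separated from ordinary trade; those attributes are left out here to keep the sketch short.</p></div>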
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Threats to Validity</head><p>Despite the systematic character of the mapping, some aspects are threats to its validity. The main one stems from a characteristic inherent to any systematic mapping or review: when addressing specific research questions, and choosing for this purpose the most appropriate terms for the search string, the search may fail to return results of interest to the review or mapping simply because they do not match the chosen terms <ref type="bibr" target="#b21">(Kitchenham et al. 2007)</ref>.</p><p>For this work, we defined that one of the mandatory expressions would be value-added tax (with its syntactic variations), due to its recurrence as a tax on operations in the trade of goods and services in different parts of the world. However, its acronym (VAT) was not included because, although well known, it is associated with many unrelated expressions (such as Visceral Adipose Tissue, in medical articles). As a consequence, articles of interest to this research that use only the known acronyms of the surveyed taxes (VAT, GST), without naming them in full, were not returned by the search.</p><p>Another threat to validity is that the mapping was carried out by a single reviewer, which may have biased the interpretation of the papers in some way.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Concluding Remarks</head><p>The systematic mapping study presented here showed, within the scope of the main digital libraries used to index studies published in the field of Computing, the state of the art of the approaches proposed to deal with aspects of tax collection in goods and services trading operations through data science methods and techniques.</p><p>The mapping showed that there are five major problems researched by the scientific community in this context, with a greater focus on identifying and predicting tax evasion behavior by taxpayers, whether due to incorrect filling of tax documents or intentional attempts at tax fraud and evasion.</p><p>The mapping also showed that each of these problems requires specific data analysis methods and techniques, and that the nature of the data guides the choice of the appropriate learning technique for each case. To address tax compliance (verifying whether the proper rate is being applied to each product), for example, characteristics related to each tax class are labeled and a supervised learning algorithm is needed to classify products and taxpayers. To detect tax evasion or fraud attempts, such as circular trading, not only purchase and sale operations are analyzed, but also the relationships between taxpayers, in order to identify outliers in their behavior. For this, an unsupervised learning technique for clustering these taxpayers seems to be more suitable.</p><p>Regarding the datasets used, there are two considerations. Contrary to our initial expectation, the volume of data was not, in general, mentioned as an issue to be handled. On the other hand, this may be because most of the returned papers reported difficulty in accessing tax data, due to the confidentiality involved, which limited the amount and variability of data used in the validation of the proposed methods. 
It also guided -and maybe biased -the choice of the learning technique to be used in some cases.</p><p>Major implications for future research include a need for more taxpayers' behavior analysis variations. As data is limited -in amount and depth, by confidentiality -only some aspects of taxpayer behavior, as amount of sales and related tax, are usually investigated. Some works have been done around fraud techniques as circular trading and residence fraud, mas it is still limited. Taxpayers use regulation gaps in tax domain to apply fraud without breaking out tax procedures, and therefore not being seen as an anomaly. Tax benefits and exemptions, granted seasonally and for a specific period of time to specific segments of taxpayers, are also a huge opportunity for tax evaders. This exceptions and unusual behaviors must be taken into account and be added to current models for improvement and performance analysis.</p><p>Furthermore, it would be interesting to systematically evaluate techniques current proposed to handle tax evasion, how they arrange to adapt incomplete and inconsistent tax data and if a consensus emerge of it. This could be proper done with a systematic literature review focused on data science methods and techniques specifically proposed for tax evasion and fraud behavior.</p><p>Finally, it lacks an evaluation of efficiency loss due to incomplete tax data, by the confidentiality issue, and a definition of how to definitely deal with this problem. It could be achieved throw a comparison of performance and effectiveness between a complete and incomplete data scenarios.</p></div>		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>About the Authors</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sergio Carvalho</head><p>Sergio T. Carvalho is a full professor at the Informatics Institute of the Federal University of Goias in Goiania-GO, Brazil. He received a bachelor's degree in Computer Science from the Federal University of Goias and Master's and Doctoral degrees in Computer Science from the Fluminense Federal University, Brazil. He has experience in Distributed Systems and Software Engineering, and his main areas of expertise are ubiquitous computing, with a focus on healthcare applications, adaptive distributed systems, and software architecture.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Optimizing debt collections using constrained reinforcement learning</title>
		<author>
			<persName><forename type="first">N</forename><surname>Abe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Melville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pendus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Jensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">P</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Bennett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">F</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Cooley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kowalczyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Domick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gardinier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th ACM SIGKDD</title>
				<meeting>the 16th ACM SIGKDD</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="75" to="84" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Benchmarking tax compliance efficiency among south african retail firms using stochastic frontier approach</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">A</forename><surname>Akinboade</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Kinfack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Mokwena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">L</forename><surname>Kumo</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="1124" to="1146" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Detecting Value-Added Tax Evasion by Business Entities of Kazakhstan</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Assylbekov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Melnykov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bekishev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Baltabayeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bissengaliyeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mamlin</surname></persName>
		</author>
		<editor>
			<persName><forename type="first">I</forename><surname>Czarnowski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Caballero</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Howlett</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Jain</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="37" to="49" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Automatic electronic invoice classification using machine learning models</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bardelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rondinelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vecchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Figini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning and Knowledge Extraction</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="617" to="629" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">High quality true-positive prediction for fiscal fraud detection</title>
		<author>
			<persName><forename type="first">S</forename><surname>Basta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Fassetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guarascio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Manco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Giannotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pedreschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Spinsanti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Papi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pisani</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="7" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">How the estonian tax and customs board evaluated a tax fraud detection system based on secure multi-party computation</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bogdanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jõemets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Siim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vaht</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">8975</biblScope>
			<biblScope unit="page" from="227" to="234" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An auto regressive deep learning model for sales tax forecasting from multiple short time series</title>
		<author>
			<persName><forename type="first">E</forename><surname>Buxton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kriz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cremeens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Intern. Conf. on Machine Learning and Applications</title>
		<imprint>
			<biblScope unit="page" from="1359" to="1364" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Analysis for possible tax evasions from the value added tax in ecuador using an stochastic model with a non-parametric technique</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cadena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Morán</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="428" to="438" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">The improvement on china&apos;s regional standard value added tax revenue estimate method -the construction, application and verification of standard rate model</title>
		<author>
			<persName><forename type="first">D</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cai</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="783" to="786" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Characterization and detection of taxpayers with false invoices using data mining techniques</title>
		<author>
			<persName><forename type="first">P</forename><surname>Castellón González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Velásquez</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="1427" to="1436" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Sense gst: Text mining &amp; sentiment analysis of gst tweets by naive bayes algorithm</title>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Kolya</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="239" to="244" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Construction of combination forecasting model and related validation -based on combined forecast of sales tax and enterprise income tax in heilongjiang province</title>
		<author>
			<persName><forename type="first">C</forename><surname>Defa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jing</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="328" to="331" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Combining network visualization and data mining for tax risk assessment</title>
		<author>
			<persName><forename type="first">W</forename><surname>Didimo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Grilli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Liotta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Menconi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Montecchiani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pagliuca</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="16073" to="16086" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Evidence-based software engineering for practitioners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Dyba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">A</forename><surname>Kitchenham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jorgensen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="58" to="65" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">The customer is king: Evidence on vat compliance in tanzania</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">H</forename><surname>Fjeldstad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kagoma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mdee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Sjursen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Somville</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">128</biblScope>
			<biblScope unit="page">104841</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Corporate residence fraud detection</title>
		<author>
			<persName><forename type="first">E</forename><surname>Junqué De Fortuny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stankova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Moeyersoms</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Minnaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Provost</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Martens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">KDD &apos;14</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1650" to="1659" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Identifying business misreporting in vat using network analysis</title>
		<author>
			<persName><forename type="first">C</forename><surname>González-Martel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Hernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manrique-de-Lara-Peñate</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page">113464</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Assessment of tax evasion risks for vat payers</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Hasanli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agayev</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">153</biblScope>
			<biblScope unit="page" from="487" to="495" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Attention and memory-augmented networks for dual-view sequential learning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zeng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th ACM SIGKDD</title>
				<meeting>the 26th ACM SIGKDD</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="125" to="134" />
		</imprint>
	</monogr>
	<note>KDD &apos;20</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Tax payment default prediction using genetic algorithm-based variable selection</title>
		<author>
			<persName><forename type="first">H</forename><surname>Hoglund</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">88</biblScope>
			<biblScope unit="page" from="368" to="375" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Statistical learning as a tool for optimizing the level of excise tax of mineral oils in slovakia</title>
		<author>
			<persName><forename type="first">B</forename><surname>Holkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Falat</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">192</biblScope>
			<biblScope unit="page" from="318" to="323" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Guidelines for performing systematic literature reviews in software engineering</title>
		<author>
			<persName><forename type="first">B</forename><surname>Kitchenham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Charters</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Gated mixture variational autoencoders for value added tax audit case selection</title>
		<author>
			<persName><forename type="first">C</forename><surname>Kleanthous</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chatzis</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">188</biblScope>
			<biblScope unit="page">105048</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Collaborative information acquisition for data-driven decisions</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Saar-Tsechansky</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">95</biblScope>
			<biblScope unit="page" from="71" to="86" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Is the value added tax system sustainable? the case of the czech and slovak republics</title>
		<author>
			<persName><forename type="first">K</forename><surname>Krzikallová</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tošenovský</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Utilizing machine learning techniques to reveal vat compliance violations in accounting data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lahann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Scheid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fettke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 21st Conference on Business Informatics (CBI)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">01</biblScope>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Application of Statistical Methods for Tax Inspection of Enterprises: A Case Study in Vietnam</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">T</forename><surname>Loan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Hac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V H</forename><surname>Anh</surname></persName>
		</author>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Anh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><forename type="middle">S</forename><surname>Dong</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Kreinovich</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><forename type="middle">N</forename><surname>Thach</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="648" to="655" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">An algorithmic approach to handle circular trading in commercial taxation system</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V K V</forename><surname>Rao</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="67" to="75" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Clustering collusive dealers in commercial taxation system</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V K V</forename><surname>Rao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Intelligent Systems and Computing</title>
		<imprint>
			<biblScope unit="volume">869</biblScope>
			<biblScope unit="page" from="703" to="717" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Regression analysis towards estimating tax evasion in goods and services tax</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kuchibhotla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bisht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Chintapalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V K V</forename><surname>Rao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE/WIC/ACM WI</title>
		<imprint>
			<biblScope unit="page" from="758" to="761" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Link prediction techniques to handle tax evasion</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suryamukhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Babu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">8th ACM IKDD CODS and 26th COMAD</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="307" to="315" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">Detecting tax evaders using trustrank and spectral clustering</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bisht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suryamukhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<editor>LNBIP</editor>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">389</biblScope>
			<biblScope unit="page" from="169" to="183" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Identifying malicious dealers in goods and services tax</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V K V</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suryamukhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="312" to="316" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">Curtailing the tax leakages by nabbing return defaulters in taxation system</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suryamukhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>CCIS</publisher>
			<biblScope unit="volume">1127</biblScope>
			<biblScope unit="page" from="183" to="195" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Big data analytics for tax administration</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suryamukhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V K V</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Shivapujimath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bisht</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">LNCS</title>
		<imprint>
			<biblScope unit="volume">11709</biblScope>
			<biblScope unit="page" from="47" to="57" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Big data analytics for nabbing fraudulent transactions in taxation system</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suryamukhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">Sobhan</forename><surname>Babu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V Kasi Visweswara</forename><surname>Rao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">LNCS</title>
		<imprint>
			<biblScope unit="volume">11514</biblScope>
			<biblScope unit="page" from="95" to="109" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<title level="m" type="main">Predictive modeling for identifying return defaulters in goods and services tax</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Suryamukhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="631" to="637" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<title level="m" type="main">Selecting studies for systematic review: inclusion and exclusion criteria</title>
		<author>
			<persName><forename type="first">T</forename><surname>Meline</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="21" to="27" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<title level="m" type="main">Application of artificial intelligence to accounting, tax, and audit services</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Meservy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">L</forename><surname>Denna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">V</forename><surname>Hansen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1992">1992</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="213" to="218" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Who is bogus? using one-sided labels to identify fraudulent firms from tax returns</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mittal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Reich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mahajan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of COMPASS &apos;18</title>
		<meeting>COMPASS &apos;18</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Systematic mapping studies in software engineering</title>
		<author>
			<persName><forename type="first">K</forename><surname>Petersen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Feldt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mujtaba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mattsson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">12th EASE</title>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<monogr>
		<title level="m" type="main">A collusion set detection in value added tax using Benford&apos;s analysis</title>
		<author>
			<persName><surname>Priya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mathews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Babu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V K V</forename><surname>Rao</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">858</biblScope>
			<biblScope unit="page" from="909" to="921" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<monogr>
		<title level="m" type="main">High performance implementation of tax fraud detection algorithm</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Rad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shahbahrami</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="6" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<monogr>
		<title level="m" type="main">Detecting corporate tax evasion using a hybrid intelligent system: A case study of Iran</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rahimikia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mohammadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rahmani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ghazanfari</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="1" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<monogr>
		<title level="m" type="main">Value-added tax fraud detection with scalable anomaly detection techniques</title>
		<author>
			<persName><forename type="first">J</forename><surname>Vanhoeyveld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Martens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Peeters</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b45">
	<monogr>
		<title level="m" type="main">Complicity Functions for Detecting Organized Crime Rings</title>
		<author>
			<persName><forename type="first">E</forename><surname>Vicente</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mateos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jiménez-Martín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Torra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Narukawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Navarro-Arribas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yañez</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">9880</biblScope>
			<biblScope unit="page" from="205" to="216" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<monogr>
		<title level="m" type="main">Neural networks and revenue forecasting: a smarter forecast</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">R</forename><surname>Voorhees</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="379" to="388" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<monogr>
		<title level="m" type="main">Research on sampling method of tax-checking based on neural network</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L</forename><surname>Wang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1541" to="1546" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<monogr>
		<title level="m" type="main">Using data mining technique to enhance tax evasion detection performance</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Ou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">I</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Yen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="8769" to="8777" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<monogr>
		<title level="m" type="main">A novel tax evasion detection framework via fused transaction network representation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="235" to="244" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b50">
	<monogr>
		<title level="m" type="main">TEDM-PU: A tax evasion detection method based on positive and unlabeled learning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1681" to="1686" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b51">
	<analytic>
		<title level="a" type="main">Neural network based transaction classification system for Chinese transaction behavior analysis</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE BigData Congress</title>
		<imprint>
			<biblScope unit="page" from="64" to="71" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b52">
	<analytic>
		<title level="a" type="main">TaxAA: A reliable tax auditor assistant for exploring suspicious transactions</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">WWW &apos;20</title>
		<imprint>
			<biblScope unit="page" from="240" to="244" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b53">
	<monogr>
		<title level="m" type="main">IRTED-TL: An inter-region tax evasion detection method based on transfer learning</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ruan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dong</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1224" to="1235" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
