<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards a Data Mining Methodology for the Banking Domain</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Veronika</forename><surname>Plotnikova</surname></persName>
							<email>veronika.plotnikova@ut.ee</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">University of Tartu</orgName>
								<address>
									<addrLine>J. Liivi 2</addrLine>
									<postCode>50409</postCode>
									<settlement>Tartu</settlement>
									<country key="EE">Estonia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marlon</forename><surname>Dumas</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">University of Tartu</orgName>
								<address>
									<addrLine>J. Liivi 2</addrLine>
									<postCode>50409</postCode>
									<settlement>Tartu</settlement>
									<country key="EE">Estonia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fredrik</forename><forename type="middle">P</forename><surname>Milani</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">University of Tartu</orgName>
								<address>
									<addrLine>J. Liivi 2</addrLine>
									<postCode>50409</postCode>
									<settlement>Tartu</settlement>
									<country key="EE">Estonia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Robert</forename><surname>Kitt</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">University of Tartu</orgName>
								<address>
									<addrLine>J. Liivi 2</addrLine>
									<postCode>50409</postCode>
									<settlement>Tartu</settlement>
									<country key="EE">Estonia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards a Data Mining Methodology for the Banking Domain</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">843A8BB16FC65D2556CEFE6454E052B8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T12:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Big data</term>
					<term>Data mining</term>
					<term>CRISP-DM</term>
					<term>Banking</term>
					<term>Financial services</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Telecoms and financial service industries are leaders in adopting data analytics technologies and practices, and they invest heavily in "Big Data" tools and related competence development. However, many of them fail to realize the benefits of data-driven decision making and to maximize "Big Data" business value due to a lack of knowledge on how to frame, approach and tackle complex data analytics projects. Existing data mining methodologies are domain-independent, general, abstract and partially outdated. Several refinements of data mining methodologies have been proposed, but they address specific aspects or tasks and remain fragmented. The goal of this doctoral project is to develop a domain-specific data mining methodology for the financial sector, which (1) represents a consolidation of the existing body of knowledge, and (2) is validated on a sample of real-life data mining projects. The proposed illustrative case studies approach is based on a broad portfolio of typical data mining use cases executed across different geographical regions and business areas of the financial institution.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The "Big Data" phenomenon, technological advances in data processing and the development of algorithmic techniques have fostered widespread adoption of data analytics across different industries. According to recent market studies <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref>, the adoption rate of "Big Data" analytics tripled across all companies, reaching 53% in 2017, up from 17% in 2015. A study based on a global in-depth survey of 583 business and IT professionals <ref type="bibr" target="#b2">[3]</ref> revealed that 40% of organizations already use data analytics across key business functions, and this share is forecast to double: the rate should exceed 70% in 2018 and reach 90% in 2020. Telecommunications and financial services are the leading industry adopters, with 87% and 76% of companies in the respective sectors already reporting data analytics usage [1-2], well above average figures. As early adopters, the telecoms and financial sectors have developed specific datasets and varieties of data, and they execute a broad set of data mining tasks to solve industry-specific business problems. Therefore, both industries are naturally the most suitable sectors for in-depth exploration of the data analytics<ref type="foot" target="#foot_0">1</ref> phenomenon and its impact on organizations and business practices. Both telecoms and financial services also explicitly demonstrate a trend of heavy investment in data analytics technologies and competences, seeking to realize benefits from data-driven decision-making and maximize "Big Data" business value. However, many of them fail due to a lack of knowledge on how to approach and tackle complex data analytics projects. Well-developed, comprehensive, domain-specific methodologies and guidelines to govern data analytics deliveries are a key prerequisite for their success. 
Business value is realized through the reusability, repeatability, scaling and actionability of the resulting data analytics products, solutions and insights across the organization, and it depends on domain-specific factors.</p><p>Academic literature to date has studied <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b9">10]</ref> data mining use cases catering to a broad variety of business problems, along with application-specific issues <ref type="bibr" target="#b4">[5]</ref>. In contrast, existing standard data mining methodologies have not been extensively and explicitly discussed; they are domain-independent, rather generic, abstract and partially outdated. There are attempts to introduce refinements, but they are also fragmented and concentrated at two opposite ends of the spectrum: they either propose additional elements for the data mining process or focus on organizational aspects (integration of general data mining processes and tools into business, enterprise and IT architectures); domain-specific factors are not considered.</p><p>Comprehensive, domain-specific methodologies for data analytics projects are critical for business value realization, but they do not exist. The purpose of this PhD project is to bridge this gap and develop such a data mining methodology. As telecoms and financial services are identified as the most suitable sectors for in-depth exploration of data analytics business practices, the new methodology will be designed for one of them: the banking domain<ref type="foot" target="#foot_1">2</ref>. The project's research proposal is structured as follows. Section 2 introduces the necessary basic concepts and terminology and reflects on their current usage by practitioners. Section 3 offers a literature review, followed by identification of existing research gaps and formulation of research questions. Section 4 proposes the research methodology, and Section 5 concludes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Basic Concepts and Related Terminology</head><p>Data mining is defined as a set of rules, processes and algorithms designed to find valuable "knowledge", extract patterns, identify relationships, etc. from large data warehouses or datasets <ref type="bibr" target="#b9">[10]</ref>. It involves automated data extraction, processing and modeling with the help of a vast range of methods and techniques from statistics, machine learning, artificial intelligence, etc. Three major standard methodologies have been developed and are widely used in academic research and in business practice: CRISP-DM, SEMMA and ASUM-DM. A short overview of each, together with current usage practices, is presented in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Overview of Existing Standard Data Mining Methodologies</head><p>CRISP-DM (Cross-Industry Standard Process for Data Mining) is a set of industry-driven guidelines for performing data mining on large datasets <ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b9">[10]</ref><ref type="bibr" target="#b10">[11]</ref>. It originated from the KDD (Knowledge Discovery in Databases) field, for which the KDD process was developed in 1996 <ref type="bibr" target="#b7">[8]</ref>. Essentially, CRISP-DM was built on KDD process fundamentals<ref type="foot" target="#foot_2">3</ref>; however, with several abstraction layers it has achieved a much higher level of complexity and detail (e.g. the generic tasks level consists of 24 tasks and outputs), thereby representing a refinement of the KDD process. CRISP-DM development was led by an industrial consortium, with the final version published in 2000; attempts to update it, initiated in 2006, were unsuccessful. CRISP-DM divides the data mining process into six phases that are not strictly sequential but iterative: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. SEMMA (Sample, Explore, Modify, Model and Assess) is a list of sequential steps guiding the implementation of the data mining process, developed by SAS Institute <ref type="bibr" target="#b9">[10]</ref><ref type="bibr" target="#b10">[11]</ref>.</p><p>ASUM-DM (Analytics Solutions Unified Method for Data Mining) was released in 2015 by IBM with the purpose of refining and extending CRISP-DM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Data Mining Methodologies Usage Patterns</head><p>According to the KDNuggets<ref type="foot" target="#foot_3">4</ref> poll results presented in Table <ref type="table" target="#tab_0">1</ref>, the leading methodology for the data mining process is CRISP-DM, followed by SEMMA and KDD <ref type="bibr" target="#b5">[6]</ref>. However, the usage of CRISP-DM has reached a plateau, while the others are steadily declining. Importantly, usage of data scientists' own methodologies stays above the 25% rate and, coupled with other methodologies (domain-specific and non-domain-specific), is steadily increasing, reaching a usage rate of over 30% <ref type="bibr" target="#b5">[6]</ref>. This indicates a decline in CRISP-DM adoption rates and a potential need for revision and modification. Indeed, this methodology, though widely used, has not been updated since 2000, while data mining usage, methods and tools have developed exponentially.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Literature Review</head><p>The literature review was conducted using the key principles of the Systematic Literature Review approach <ref type="bibr" target="#b6">[7]</ref>. The corpus of scientific research articles, publications and books was retrieved in the following steps.</p><p>Step 1 - The Scopus and Web of Science databases were searched with the names of the three major standard methodologies described in Section 2, i.e. "CRISP-DM", "SEMMA" and "ASUM-DM", jointly with the domain keyword "banking"<ref type="foot" target="#foot_4">5</ref>. All texts returned by the databases were retrieved and included in the literature corpus.</p><p>Step 2 - The same procedure as in Step 1 was performed for the Google Scholar database, with one delimitation: the text corpus was retrieved for the first 100 hits only. The threshold was determined empirically, based on an evaluation of the relevancy of texts beyond the first 100 search results. The relevancy of the retrieved texts beyond this threshold declined significantly, and they did not contribute additional insights.</p><p>In both steps, no time restrictions were set; all texts were retrieved as many years back as the database contained. The oldest publication dates back to 1998 and the newest to 2018. One third of the studies were published over the last 3 years, while approximately half of the scientific texts are concentrated in the last 5 years. The overall text corpus was reviewed and evaluated on an iterative basis with respect to the relevancy of the studies. Summary statistics of the literature reviewed are presented in Table <ref type="table" target="#tab_1">2</ref>.</p><p>The first research class (hereinafter, Class 1) relates to the application of various data mining methodologies in specific case studies. Importantly, the typical purpose of the case studies is to solve various business problems of financial institutions by means of modeling tasks. The case studies can be further categorized as follows:</p><p>1. 
customer behavior modeling with the purpose of identifying customers likely to churn or loyal customers <ref type="bibr" target="#b12">[13]</ref>, 2. profiling customers either according to usage patterns of various digital channels while interacting with the bank and patterns of electronic transactions, e.g. <ref type="bibr" target="#b12">[13]</ref><ref type="bibr" target="#b13">[14]</ref>, or based on other features, 3. overall customer relationship management, including customer segmentation tasks and customer targeting <ref type="bibr" target="#b15">[15]</ref>, 4. modeling tasks to support a variety of risk management processes: a. credit risk identification and management (credit scoring, modeling and identification of defaults) <ref type="bibr" target="#b16">[16]</ref>, b. identification and prevention of fraud behavior and/or ALM risks, c. risk control activities including auditing (internal/external in the bank domain) <ref type="bibr" target="#b17">[17]</ref>, 5. efficiency studies, e.g. optimization of the branch network <ref type="bibr" target="#b18">[18]</ref>. In Class 1 publications, the relevant data mining methodologies are used to structure the data mining process and achieve data mining goals. Critical discussions are not common and, if present, are structured around the method application at best, typically considering data.</p><p>Also, Class 1 research concentrates on the application of the particular scientific technique processing aspects, types of modeling techniques with associated selection of the best one based on evaluation results, model validation aspects, feature selection and the final set of the best predictors. At the same time, there is a lack of critical evaluation of methodology aspects, and discussion of methodology steps and substeps that need to be modified or added, or that are redundant, is largely omitted. 
Knowledge discovery in relation to executing the data mining task methodologically remains "hidden", "tacit" and confined within the individual experience of data mining experts. This might be evidenced by the growth in usage of own methodologies identified in subsection 2.2.</p><p>The second class of publications (hereinafter, Class 2) concentrates on data mining methodologies or processes at higher abstraction levels. A subset of these studies also contains case studies, similarly to Class 1 publications, but in contrast, these experiments are conducted on a broader scope with a larger number of organizations and/or data mining tasks. Also, Class 2 publications typically present a critical evaluation of existing standard data mining methodologies. Such an approach supports identification of deficiencies and suggests improvements. Importantly, Class 2 research takes various domain and industry perspectives. However, most of the studies focus on the analysis of a specific step of the methodologies. A very rare exception is <ref type="bibr" target="#b11">[12]</ref>, which proposes a novel direction: the design of a fuzzy expert system to evaluate the overall success of data mining projects by evaluating each step of the process methodology.</p><p>Critical evaluation results and proposed suggestions can be structured based on the following methodology phases, steps or areas.</p><p>Deployment phase and business process. The CRISP-DM methodology is identified as lacking deployment phase details that can support integration of data mining results into the business process <ref type="bibr" target="#b19">[19]</ref>. Pivk et al. identify a relationship between data and data mining sophistication levels, and propose improvements through the use of ontologies (domain, business process and data mining), including extension elements to CRISP-DM, and a Service-Oriented Architecture for data mining. <ref type="bibr" target="#b20">[20]</ref> proposes a new deployment framework (DEEPER). 
Associated concepts of ontologies and a broader business architecture for establishing data mining systems in an organization are also discussed <ref type="bibr" target="#b21">[21]</ref>.</p><p>Data preparation phase and data requirements. A number of studies propose additional substeps and techniques for the data preparation stage, starting from adjustments to KDD initiated in <ref type="bibr" target="#b22">[22]</ref>, or alternatively, specific methodologies for gathering and structuring data requirements in the broader context of data-intensive projects and data governance <ref type="bibr" target="#b23">[23]</ref>. These studies are performed in the context of IT system architecture, discussing enterprise data warehouses, "data lakes" and associated data and information modeling and management concepts (e.g. Business Information Modeling). Given that ~80% of the time in the data mining process is taken by data preprocessing and preparation steps, this part of the research is of utmost importance.</p><p>Model evaluation and selection phase. This research direction focuses on relevant methodology enhancements to the model evaluation and selection steps based on a decision-support framework, e.g. <ref type="bibr" target="#b24">[24]</ref> proposes a hybrid methodology and procedure for generating and selecting the most appropriate causal explanatory model.</p><p>Novel methodology enhancements and adjustments. A limited but valuable number of studies have emerged in response to legislation and regulatory requirements, e.g. <ref type="bibr" target="#b25">[25]</ref> developed the DADM (Discrimination-aware data mining) framework. Another valuable direction of research is represented by authors proposing the extension of methodological frameworks from other business areas or processes. The Adaptive Software Development (ASD) methodology is adopted and introduced as ASD-DM for predictive data mining in <ref type="bibr" target="#b26">[26]</ref>. 
Other research is associated with modifications of Lean Six Sigma methodologies and their application in the data mining process context, e.g. the DMAIC<ref type="foot" target="#foot_5">6</ref> application discussed in <ref type="bibr" target="#b27">[27]</ref>.</p><p>BI technologies, tools and IT architecture perspectives. Some studies acknowledge the importance of data mining processes and associated methodologies when designing and implementing the respective BI and Data Science technologies and tools in organizations. Such studies lack an enhancement perspective; however, they discuss aspects relevant for successful integration of the data mining process into the overall IT architecture <ref type="bibr" target="#b28">[28]</ref>.</p><p>Organizational perspective. Finally, there is a set of Class 2 publications progressing to higher levels of generalization <ref type="bibr" target="#b29">[29]</ref>. These studies do not focus on the application of data mining methodologies, but rather concentrate on a broader investigation of the adoption of data mining as such. These studies, though not addressing concrete methodological aspects, are rather important as they discuss relevant motivational and organizational aspects. These aspects are disregarded in existing standard data mining methodologies; however, they represent an inseparable part of the practical context and implementation environment in which a data mining methodology is used.</p><p>The literature review showed a few well-developed frameworks for data mining, which have been created for wide industry application. Existing data mining methodologies do not cater to specific industry needs such as those of the banking domain. Thus, the existing research gap can be formulated as follows: Research Gap - Lack of a comprehensive data mining methodology applicable and adapted to the banking industry. The following research questions address it: RQ1: What are the existing data mining frameworks and what components do they include? 
RQ2: What within the existing frameworks could be reused or removed, and what needs to be added, in order to develop a data mining methodology for the banking domain?</p><p>The research methodology to address the research questions is presented in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Research Methodology</head><p>The research methodology consists of two phases, summarized in Table <ref type="table" target="#tab_3">3</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>The Systematic Literature Review for the research project (documented in Section 3) has identified a few well-developed frameworks for data mining created for wide industry application, which do not cater to specific industry needs such as those of the banking domain. Also, the scarcity of research on this topic in the specific financial services domain provides opportunities for new insights and novel findings relevant for both practitioners and academia. Section 4 proposed the project research methodology to (1) elicit and consolidate domain-specific refinements to existing data mining methodologies from the existing body of knowledge, and (2) validate them against a portfolio of real-life data mining projects executed in the banking domain. The result of the study will be a conceptualized, enhanced data mining methodology specifically designed to frame and tackle complex data analytics projects in the financial services industry.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>KDNuggets Poll on Data Mining Methodology results <ref type="bibr" target="#b5">[6]</ref></figDesc><table><row><cell>Poll Years</cell><cell>2002</cell><cell>2004</cell><cell>2007</cell><cell>2014</cell></row><row><cell>CRISP-DM</cell><cell>51%</cell><cell>42%</cell><cell>42%</cell><cell>43%</cell></row><row><cell>SEMMA</cell><cell>12%</cell><cell>10%</cell><cell>13%</cell><cell>8.5%</cell></row><row><cell>KDD process</cell><cell></cell><cell></cell><cell>7%</cell><cell>7.5%</cell></row><row><cell>My organization's</cell><cell>7%</cell><cell>6%</cell><cell>5%</cell><cell>3.5%</cell></row><row><cell>My own</cell><cell>23%</cell><cell>28%</cell><cell>19%</cell><cell>27.5%</cell></row><row><cell>Other (incl. domain specific)</cell><cell>4%</cell><cell>6%</cell><cell>9% (5%)</cell><cell>10% (2%)</cell></row><row><cell>None</cell><cell>4%</cell><cell>7%</cell><cell>5%</cell><cell>0%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Summary statistics on retrieved publications</figDesc><table><row><cell>Database</cell><cell>Scopus and Web of Science</cell><cell>Google Scholar</cell><cell>Total</cell><cell>Class 1 texts</cell><cell>Class 2 texts</cell></row><row><cell>No. of texts (string CRISP-DM)</cell><cell>57</cell><cell>91</cell><cell>148</cell><cell></cell><cell></cell></row><row><cell>No. of texts (string SEMMA)</cell><cell>9</cell><cell>94</cell><cell>103</cell><cell></cell><cell></cell></row><row><cell>No. of texts (string ASUM-DM)</cell><cell>1</cell><cell>3</cell><cell>4</cell><cell></cell><cell></cell></row><row><cell>Total (excl. duplications)</cell><cell>61</cell><cell>163</cell><cell>224</cell><cell></cell><cell></cell></row><row><cell>Total (excl. irrelevant)</cell><cell>55</cell><cell>132</cell><cell>187</cell><cell>83</cell><cell>104</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>Scientific publications from databases were supplemented by additional set of general materials (over 20 various texts). They were primarily retrieved from industry web-sites via general search and provide descriptive information on data mining methodologies and processes in industry context. Analysis of the selected publications corpus enables to perform next research steps: 1. construct high-level typification of research performed in the field over the last 10 years, 2. identify and categorize the existing research gaps, and 3. formulate research questions. Based on analysis of scientific publications, existing research can be broadly typified into two major classes.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 .</head><label>3</label><figDesc>Research methodology overview for Phase 1 and 2. Expected outcome of the research is conceptualized, refined data mining methodology with adaptations to financial services domain, which (1) represents consolidation of existing body of knowledge, and (2) is validated on the sample of real life data-mining projects. The proposed illustrative case studies approach is based on broad, typical data mining use cases portfolio executed across different geographical regions and business areas of the financial institution.</figDesc><table><row><cell>Phase</cell><cell>RQs</cell><cell>Activities</cell><cell>Expected Outcome</cell></row><row><cell>1 - Comprehensive review of existing frameworks</cell><cell>RQ1</cell><cell>Systematic Literature Review and analysis of its results</cell><cell>Comprehensive overview of existing DM frameworks</cell></row><row><cell>2.1 - Refinements generation</cell><cell>RQ2</cell><cell>Identification, consolidation of refinements towards existing DM methodologies from existing literature</cell><cell>Structured list of refinements to DM methodologies phases, steps and deliverables</cell></row><row><cell>2.2 - Validation</cell><cell>RQ2</cell><cell>Validating refinements proposed in phase 2.1 with sample of real-life data mining projects. Removal of conflicting, irrelevant refinements</cell><cell>Common, validated refinements set</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">In this paper, data analytics and data mining are used as synonyms, even though it is acknowledged that data analytics is a broader field, as it encompasses statistical analysis methods that are traditionally not associated with data mining.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">In this paper, banking domain refers to the universal banking business model with an extensive portfolio of products and services offered to all types of clientele, and with a variety of support functions (risk, operations, etc.).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">KDD process consists of 9 steps: learning application domain, dataset creation, data cleaning &amp; processing, data reduction &amp; projection, choosing the function of data mining, choosing data mining algorithm, interpretation, using discovered knowledge.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">One of the leading websites on Business Analytics, Data Mining, and Data Science (edited by Gregory I. Piatetsky-Shapiro, one of the major contributors to Knowledge Discovery and Data Mining concepts).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">As the CRISP-DM methodology is an elaborated derivation and refinement of the KDD process (as described in Section 2.1), KDD was omitted from the direct search.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">Acronym for Define, Measure, Analyze, Improve and Control, refers to a data-driven improvement cycle used for improving, optimizing and stabilizing business processes and designs.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Big Data Analytics Market Study</title>
		<ptr target="https://globenewswire.com/news-release/2017/12/20/1267022/0/en/Dresner-Advisory-Services-Publishes-2017-Big-Data-Analytics-Market-Study.html" />
		<imprint>
			<date type="published" when="2017">2017. 2018/04/06</date>
		</imprint>
		<respStmt>
			<orgName>Nasdaq GlobeNewswire ; Dresner Advisory Services Publishes</orgName>
		</respStmt>
	</monogr>
	<note>news feed</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="https://www.forbes.com/sites/louiscolumbus/2017/12/24/53-of-companies-are-adopting-big-data-analytics/#4cf12a2139a1" />
		<title level="m">Forbes homepage</title>
				<imprint>
			<date type="published" when="2018-04-06">2018/04/06</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The Future Belongs To Those Who Monetize And Maximize Their Data</title>
		<author>
			<persName><forename type="first">Forrester</forename><surname>Consulting</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Industry report</title>
		<imprint>
			<date type="published" when="2017-01">January 2017. last accessed 2018/04/06</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A review on data mining in banking sector</title>
		<author>
			<persName><forename type="first">V</forename><surname>Jayasree</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">V S</forename><surname>Balan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">American Journal of Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="1160" to="1165" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Data mining in business services</title>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">L</forename><surname>Olson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Service Business</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="181" to="193" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html" />
		<title level="m">KDNuggets Homepage</title>
				<imprint>
			<date type="published" when="2018-04-07">2018/04/07</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Customer relationship management mechanisms: A systematic review of the state of the art literature and recommendations for future research</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Soltani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">J</forename><surname>Navimipour</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers in Human Behavior</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="page" from="667" to="688" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The KDD process for extracting useful knowledge from volumes of data</title>
		<author>
			<persName><forename type="first">U</forename><surname>Fayyad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Piatetsky-Shapiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Smyth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="27" to="34" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kerber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Khabaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Reinartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Shearer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wirth</surname></persName>
		</author>
		<title level="m" type="main">CRISP-DM 1.0, step-by-step data mining guide</title>
				<imprint>
			<publisher>SPSS Inc</publisher>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Morabito</surname></persName>
		</author>
		<title level="m">The future of digital business innovation: Trends and practices</title>
		<imprint>
			<pubPlace>Switzerland</pubPlace>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>1st edition</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A Proposed Data Mining Methodology and its Application to Industrial Procedures</title>
		<author>
			<persName><forename type="first">S</forename><surname>Rohanizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moghadam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Industrial Engineering</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="37" to="50" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Evaluating the success level of data mining projects based on CRISP-DM methodology by a Fuzzy expert system</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nadali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">E</forename><surname>Kakhky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Nosratabadi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Electronics Computer Technology (ICECT)</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="161" to="165" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Intelligent data analysis approaches to churn as a business problem: a survey</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">À</forename><surname>Nebot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vellido</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge and Information Systems</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="719" to="774" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Application of a data mining process model: A case study-profiling internet banking users in Jamaica</title>
		<author>
			<persName><forename type="first">G</forename><surname>Mansingh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-M</forename><surname>Osei-Bryson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mills</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AMCIS</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title/>
	</analytic>
	<monogr>
		<title level="j">Proceedings</title>
		<imprint>
			<biblScope unit="volume">439</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Implementation of CRISP-DM model in order to define the sales pipelines of PT X</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">U</forename><surname>Daihani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Feblian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceeding of 9th International Seminar on Industrial Engineering and Management</title>
				<meeting>9th International Seminar on Industrial Engineering and Management</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Prediction of financial distress: An empirical study of listed Chinese companies using data mining</title>
		<author>
			<persName><forename type="first">R</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Bose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European Journal of Operational Research</title>
		<imprint>
			<biblScope unit="volume">241</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="236" to="247" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">E-commerce impact: Emerging technology - Electronic auditing</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Shaikh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Managerial Auditing Journal</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="408" to="421" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Branch Efficiency and Location Forecasting Application of Ziraat Bank</title>
		<author>
			<persName><forename type="first">I</forename><surname>Met</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tunali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Erkoç</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tanrikulu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Applied Finance &amp; Banking</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">On approach for the implementation of data mining to business process optimisation in commercial companies</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pivk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vasilecas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kalibatiene</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rupnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Technological and Economic Development of Economy</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="237" to="256" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A portfolio theoretic approach to administering advanced analytics: The case of multi-stage campaign management</title>
		<author>
			<persName><forename type="first">S</forename><surname>Balkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Goul</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 44th Annual Hawaii International Conference on System Sciences</title>
				<meeting>the 44th Annual Hawaii International Conference on System Sciences</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Promoting data mining methodologies by architecture-level optimizations</title>
		<author>
			<persName><forename type="first">G</forename><surname>Xin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Enjie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hongxia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings 2009 2nd International Workshop on Knowledge Discovery and Data Mining</title>
				<meeting>2009 2nd International Workshop on Knowledge Discovery and Data Mining<address><addrLine>WKDD</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="179" to="182" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">An extended process model of knowledge discovery in database</title>
		<author>
			<persName><forename type="first">Tianrui</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Da</forename><surname>Ruan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Enterprise Information Management</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="169" to="177" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Business information modeling: A methodology for data-intensive projects, data science and big data governance</title>
		<author>
			<persName><forename type="first">T</forename><surname>Priebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Markus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings 2015 IEEE International Conference on Big Data (IEEE Big Data 2015)</title>
				<meeting>2015 IEEE International Conference on Big Data (IEEE Big Data 2015)</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="2056" to="2065" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A hybrid decision support framework for generating and selecting causal explanatory regression splines models for information systems research</title>
		<author>
			<persName><forename type="first">K.-M</forename><surname>Osei-Bryson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems Frontiers</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="845" to="856" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Better decision support through exploratory discrimination-aware data mining: Foundations and empirical evidence</title>
		<author>
			<persName><forename type="first">B</forename><surname>Berendt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Preibusch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence and Law</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="175" to="209" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Applying Adaptive Software Development (ASD) agile modeling on predictive data mining applications: ASD-DM methodology</title>
		<author>
			<persName><forename type="first">M</forename><surname>Alnoukari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Alzoabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hanna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings International Symposium on Information Technology</title>
				<meeting>International Symposium on Information Technology<address><addrLine>ITSim</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page">2</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Lean Six Sigma meets data science: Integrating two approaches based on three case studies</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Zwetsloot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kuiper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Akkerhuis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>De Koning</surname></persName>
		</author>
		<idno type="DOI">10.1080/08982112.2018.1434892</idno>
	</analytic>
	<monogr>
		<title level="j">Quality Engineering</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>online journal</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Data warehousing and analytics in banking: Implementation</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">V</forename><surname>Narayanan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Banking Technology and Management: Impacts of ICT and CRM</title>
				<editor>
			<persName><forename type="first">Vadlamani</forename><surname>Ravi</surname></persName>
		</editor>
		<imprint>
			<pubPlace>Hershey, New York</pubPlace>
			<publisher>Information Science Reference</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="217" to="231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Extending data mining methodologies to encompass organizational factors</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C W</forename><surname>Debuse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Systems Research and Behavioral Science</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="183" to="190" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
