<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Data Mining Methodology for the Banking Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Veronika Plotnikova</string-name>
          <email>veronika.plotnikova@ut.ee</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Supervisors: Marlon Dumas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fredrik P. Milani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Kitt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Tartu, Institute of Computer Science</institution>
          ,
          <addr-line>J. Liivi 2, 50409 Tartu</addr-line>
          ,
          <country country="EE">Estonia</country>
        </aff>
      </contrib-group>
      <fpage>46</fpage>
      <lpage>54</lpage>
      <abstract>
        <p>The telecoms and financial services industries are leaders in adopting data analytics technologies and practices, and they invest heavily in “Big Data” tools and related competence development. However, many companies in these industries fail to realize the benefits of data-driven decision making and to maximize the business value of “Big Data” because they lack knowledge of how to frame, approach and tackle complex data analytics projects. Existing data mining methodologies are domain-independent, general, abstract and partially outdated. Several refinements of data mining methodologies have been proposed, but they address specific aspects or tasks and remain fragmented. The goal of this doctoral project is to develop a domain-specific data mining methodology for the financial sector which (1) consolidates the existing body of knowledge and (2) is validated on a sample of real-life data mining projects. The proposed illustrative case study approach is based on a broad portfolio of typical data mining use cases executed across different geographical regions and business areas of a financial institution.</p>
      </abstract>
      <kwd-group>
        <kwd>Big data</kwd>
        <kwd>Data mining</kwd>
        <kwd>CRISP-DM</kwd>
        <kwd>Banking</kwd>
        <kwd>Financial services</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The “Big Data” phenomenon, technological advances in data processing, and the
development of algorithmic techniques have fostered widespread adoption of data analytics
across different industries. According to recent market studies [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ], the adoption
rate of “Big Data” analytics tripled across all companies, reaching 53% in 2017, up from
17% in 2015. A study based on a global in-depth survey of 583 business and IT
professionals [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] revealed that 40% of organizations already use data analytics across
key business functions and forecast that this share would roughly double: the rate should exceed 70% in
2018 and reach 90% in 2020. Telecommunications and financial services are the
leading industry adopters, with 87% and 76% of companies in the respective sectors already
reporting data analytics usage [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ], well above the average.
      </p>
      <p>As early adopters, the telecoms and financial sectors have developed specific datasets
and varieties of data, and they execute a broad set of data mining tasks to solve industry-specific
business problems. Both industries are therefore naturally the most suitable sectors
for in-depth exploration of data analytics<sup>1</sup> phenomena and their impact on organizations
and business practices. Both telecoms and financial services also explicitly
demonstrate the trend of heavy investment in data analytics technologies and
competences, seeking to realize the benefits of data-driven decision-making and to maximize the
business value of “Big Data”. However, many of these companies nevertheless fail because they lack
knowledge of how to approach and tackle complex data analytics projects.
Well-developed, comprehensive, domain-specific methodologies and guidelines to govern
data analytics deliveries are a key prerequisite for their success. Business value is
realized through the reusability, repeatability, scaling and actionability of the resulting data
analytics products, solutions and insights across the organization, and it depends on
domain-specific factors.</p>
      <p>
        The academic literature to date has studied [
        <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
        ] data mining use cases catering to
a broad variety of business problems, along with application-specific issues [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In
contrast, existing standard data mining methodologies have not been extensively and
explicitly discussed; they are domain-independent, rather generic, abstract and
partially outdated. There have been attempts to introduce refinements, but these are also fragmented
and concentrated at two opposite ends of the spectrum: they either propose additional
elements for the data mining process or focus on organizational aspects (integration of general
data mining processes and tools into business, enterprise and IT
architectures); domain-specific factors are not considered.
      </p>
      <p>Comprehensive, domain-specific methodologies for data analytics projects are
critical for business value realization, but they do not exist. The purpose of this PhD
project is to bridge this gap and develop such a data mining methodology. As telecoms and
financial services are identified as among the most suitable sectors for in-depth
exploration of data analytics business practices, the new methodology will be designed for
one of them: the banking domain<sup>2</sup>. The research proposal is structured as
follows. Section 2 introduces the necessary basic concepts and terminology, and reflects on
their current usage by practitioners. Section 3 offers a literature review, followed by the
identification of existing research gaps and the formulation of research questions. Section
4 proposes the research methodology, and Section 5 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Basic Concepts and Related Terminology</title>
      <p>
        Data mining is defined as a set of rules, processes and algorithms designed to find
valuable “knowledge”, extract patterns, identify relationships, etc. from large data
warehouses or datasets [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This involves automated data extraction, processing and
modeling with the help of a vast range of methods and techniques from statistics, machine
learning, artificial intelligence, etc. There are three major standard methodologies
developed and widely used in academic research and in business practice:
CRISP-DM, SEMMA and ASUM-DM. A short overview of each and of current usage practices is
presented in the following subsections.
      </p>
      <p><sup>1</sup> In this paper, data analytics and data mining are used as synonyms, even though it is
acknowledged that data analytics is a broader field, as it encompasses statistical analysis methods that
are traditionally not associated with data mining.</p>
      <p><sup>2</sup> In this paper, the banking domain refers to the universal banking business model with an extensive
portfolio of products and services offered to all types of clientele, and with a variety of support
functions (risk, operations, etc.).</p>
      <sec id="sec-2-1">
        <title>Overview of Existing Standard Data Mining Methodologies</title>
        <p>
          CRISP-DM (Cross-Industry Standard Process for Data Mining) is a set of industry-driven
guidelines for performing data mining on large datasets [
          <xref ref-type="bibr" rid="ref10 ref11 ref9">9-11</xref>
          ]. It originated from the KDD
(Knowledge Discovery in Databases) field, whose KDD process was developed in
1996 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Essentially, CRISP-DM was built on KDD process fundamentals<sup>3</sup>; however,
with several abstraction layers it achieves a much higher level of complexity and
detail (e.g. the generic task level consists of 24 tasks and outputs), thereby representing
a refinement of the KDD process. CRISP-DM development was led by an industrial
consortium, with the final version published in 2000; attempts to update it, initiated in 2006, were
unsuccessful. CRISP-DM divides the data mining process into six phases that are not strictly sequential
but iterative: business understanding, data understanding, data
preparation, modeling, evaluation, and deployment.
        </p>
        <p>
          SEMMA (Sample, Explore, Modify, Model and Assess) is a list of sequential steps,
developed by SAS Institute, that guides the implementation of the data mining process [
          <xref ref-type="bibr" rid="ref10 ref11">10-11</xref>
          ].
        </p>
        <p>ASUM-DM (Analytics Solutions Unified Method for Data Mining) was released in
2015 by IBM with the purpose of refining and extending CRISP-DM.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Data Mining Methodologies Usage Patterns</title>
        <p>
          According to the KDnuggets<sup>4</sup> poll results presented in Table 1, the leading
methodology for the data mining process is CRISP-DM, followed by SEMMA and KDD [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
However, the usage of CRISP-DM has reached a plateau, while the usage of the others is steadily
declining. Importantly, the usage of data scientists' own methodologies stays above the 25% rate
and, coupled with other ones (domain-specific and non-domain-specific), is steadily increasing,
reaching a usage rate of over 30% [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This indicates a decline in the adoption rates of
CRISP-DM and a potential need for its revision and modification. Indeed, this
methodology, though widely used, has not been updated since 2000, while data mining usage, methods
and tools have developed exponentially.
        </p>
        <p>Table 1. KDnuggets poll results on data mining methodology usage, by poll year.
2007: 42%, 13%, 7%, 5%, 19%, 9% (5%), 5%.
2014: 43%, 8.5%, 7.5%, 3.5%, 27.5%, 10% (2%), 0%.</p>
        <p><sup>3</sup> The KDD process consists of 9 steps: learning the application domain, dataset creation, data
cleaning &amp; processing, data reduction &amp; projection, choosing the function of data mining,
choosing the data mining algorithm, data mining, interpretation, and using discovered knowledge.</p>
        <p><sup>4</sup> One of the leading websites on Business Analytics, Data Mining, and Data Science (edited
by Gregory I. Piatetsky-Shapiro, one of the major contributors to Knowledge Discovery and
Data Mining concepts).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Literature Review</title>
      <p>
        The literature review was conducted using the key principles of the Systematic Literature
Review approach [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The corpus of scientific research articles, publications and
books was retrieved in the following steps.
      </p>
      <p>Step 1 - The Scopus and Web of Science databases were searched with a search
string combining each of the three major standard methodologies described in Section 2, i.e.
“CRISP-DM”, “SEMMA” and “ASUM-DM”, with the domain keyword “banking”<sup>5</sup>. All texts
returned by the databases were retrieved and included in the literature corpus.</p>
      <p>Step 2 - The same procedure as in Step 1 was performed for the Google Scholar
database, with one limitation: texts were retrieved only for the first 100 hits.
The threshold was determined empirically by evaluating the relevance of texts
beyond the first 100 search results. The relevance of the texts retrieved beyond this
threshold declined significantly, and they did not contribute additional insights.</p>
      <p>In both steps, no time restrictions were set; texts were retrieved as far
back as the databases reach. The oldest publication dates from 1998 and the newest from
2018. One third of the studies were published over the last 3 years, while approximately half of
the scientific texts are concentrated in the last 5 years. The overall text corpus was
reviewed and evaluated iteratively with respect to the relevance of the studies.
Summary statistics of the literature reviewed are presented in Table 2 below.</p>
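      <table-wrap>
        <label>Table 2</label>
        <caption>
          <p>Summary statistics of the literature reviewed.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Database</th>
              <th>No. of texts (string CRISP-DM)</th>
              <th>No. of texts (string SEMMA)</th>
              <th>No. of texts (string ASUM-DM)</th>
              <th>Total (excl. duplications)</th>
              <th>Total (excl. irrelevant)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Scopus and Web of Science</td>
              <td>57</td>
              <td>9</td>
              <td>1</td>
              <td>61</td>
              <td>55</td>
            </tr>
            <tr>
              <td>Total</td>
              <td>148</td>
              <td>103</td>
              <td>4</td>
              <td>224</td>
              <td>187</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Of the 187 relevant texts, 83 are Class 1 texts and 104 are Class 2 texts (the classes are defined below).</p>
      <p>Scientific publications from the databases were supplemented by an additional set of general
materials (over 20 various texts). They were primarily retrieved from industry
websites via general search and provide descriptive information on data mining
methodologies and processes in an industry context.</p>
      <p><sup>5</sup> As the CRISP-DM methodology is an elaborated derivation and refinement of the KDD process (as
described in Section 2.1), KDD was omitted from the direct search.</p>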
      <p>Analysis of the selected corpus of publications makes it possible to perform the next research steps:
1. construct a high-level typology of the research performed in the field over the
last 10 years,
2. identify and categorize the existing research gaps, and
3. formulate research questions.</p>
      <p>Based on analysis of scientific publications, existing research can be broadly
typified into two major classes.</p>
      <p>
        The first research class (hereinafter, Class 1) relates to the application of various data
mining methodologies in specific case studies. Importantly, the typical purpose of the
case studies is to solve various business problems of financial institutions by
means of modeling tasks. The case studies can be further categorized as follows:
1. customer behavior modeling with the purpose of identifying customers likely to
churn or loyal customers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
2. profiling customers either according to their usage patterns of various digital
channels while interacting with the bank and their patterns of electronic transactions,
e.g. [
        <xref ref-type="bibr" rid="ref13 ref14">13-14</xref>
        ], or based on other features,
3. overall customer relationship management, including customer segmentation
tasks and customer targeting [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
4. modeling tasks to support a variety of risk management processes:
a. credit risk identification and management: credit scoring,
modeling and identification of defaults [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
b. identification and prevention of fraudulent behavior and/or ALM risks,
c. risk control activities, including auditing (internal/external in the banking
domain) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ],
5. efficiency studies, e.g. optimization of the branch network [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>In Class 1 publications, the relevant data mining methodologies are used to
structure the data mining process and achieve data mining goals. Critical discussions are
uncommon and, where present, are at best structured around the application of the method,
typically considering the data.</p>
      <p>Class 1 research also concentrates on applying a particular scientific
technique: processing aspects, types of modeling techniques and selection
of the best one based on evaluation results, model validation aspects, feature selection
and the final set of the best predictors. At the same time, critical
evaluation of methodology aspects is lacking, and discussion of the methodology steps and substeps that
need to be modified or added, or that are redundant, is largely omitted. Knowledge about
how to execute the data mining task methodologically remains “hidden”,
“tacit” and confined to the individual experience of data mining experts. This may
be evidenced by the growth in the usage of own methodologies identified in subsection 2.2.</p>
      <p>
        The second class of publications (hereinafter, Class 2) concentrates on data mining
methodologies or processes at a higher level of abstraction. A subset of these studies
also contains case studies, similarly to Class 1 publications, but in contrast, these
experiments are conducted on a broader scope, with a larger number of organizations
and/or data mining tasks. Class 2 publications also typically present a critical
evaluation of existing standard data mining methodologies. Such an approach supports the
identification of deficiencies and suggests improvements. Importantly, Class 2 research takes
various domain and industry perspectives. However, most of the studies focus on the
analysis of a specific step of the methodologies. A very rare exception is [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which
proposes a novel direction: the design of a fuzzy expert system to evaluate the overall success of
data mining projects by evaluating each step of the process methodology.
      </p>
      <p>Critical evaluation results and proposed suggestions can be structured based on the
following methodology phases, steps or areas.</p>
      <p>
        Deployment phase and business process. The CRISP-DM methodology is identified as
lacking deployment-phase details that could support the integration of data mining results
into business processes [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Pivk et al. identify a relationship between data and data
mining sophistication levels, and propose improvements through the use of ontologies (domain,
business process and data mining), including extension elements to CRISP-DM and
a Service-Oriented Architecture for data mining. Reference [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] proposes a new deployment
framework (DEEPER). The associated concepts of ontologies and a broader business
architecture for establishing data mining systems in an organization are also discussed [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        Data preparation phase and data requirements. A number of studies propose
additional substeps and techniques for the data preparation stage, starting from the adjustments to
KDD initiated in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], or alternatively propose specific methodologies for gathering and
structuring data requirements in the broader context of data-intensive projects and data
governance [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. These studies are performed in the context of IT system architecture,
discussing enterprise data warehouses, “data lakes” and the associated data and
information modeling and management concepts (e.g. Business Information Modeling).
Given that roughly 80% of the time in the data mining process is taken by data preprocessing
and preparation steps, this part of the research is of utmost importance.
      </p>
      <p>
        Model evaluation and selection phase. This research direction focuses on
methodology enhancements to the model evaluation and selection steps based on a
decision-support framework; e.g. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] proposes a hybrid methodology and procedure for
generating and selecting the most appropriate causal explanatory model.
      </p>
      <p>
        Novel methodology enhancements and adjustments. A limited but valuable
number of studies have emerged in response to legislative and regulatory requirements;
e.g. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] developed the DADM (Discrimination-Aware Data Mining) framework. Another
valuable direction of research is represented by authors proposing extensions of
methodological frameworks from other business areas or processes. The Adaptive Software
Development (ASD) methodology is adapted and introduced as ASD-DM for
predictive data mining in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Other research is associated with modifications of Lean Six Sigma
methodologies and their application in the data mining process context, e.g. the DMAIC<sup>6</sup>
application discussed in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
      <p>
        BI technologies, tools and IT architecture perspectives. Some studies
acknowledge the importance of data mining processes and associated methodologies
when designing and implementing the respective BI and Data Science technologies and tools
in organizations. Such studies lack an enhancement perspective; however, they
discuss aspects relevant to the successful integration of the data mining process into the overall IT
architecture [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
      <p><sup>6</sup> Acronym for Define, Measure, Analyze, Improve and Control; refers to a data-driven
improvement cycle used for improving, optimizing and stabilizing business processes and
designs.</p>
      <p>
        Organizational perspective. Finally, there is a set of Class 2 publications
progressing to higher levels of generalization [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. These studies do not focus on the application
of data mining methodologies, but rather concentrate on a broader investigation of the
adoption of data mining as such. Although they do not address concrete
methodological aspects, these studies are important because they discuss relevant motivational and
organizational aspects. These aspects are disregarded in existing standard data mining
methodologies, yet they represent an inseparable part of the practical context
and implementation environment in which a data mining methodology is used.
      </p>
      <p>The literature review identified a few well-developed frameworks for data mining,
all created for wide industry application. Existing data mining
methodologies do not cater to specific industry needs, such as those of the banking domain. Thus,
the existing research gap can be formulated as follows:
Research Gap – Lack of a comprehensive data mining methodology applicable to and
adapted for the banking industry.</p>
      <p>The following research questions address it:
RQ1: What are the existing data mining frameworks and what components do they
include?
RQ2: What within the existing frameworks could be re-used, should be removed or needs to be
added in order to develop a data mining methodology for the banking domain?</p>
      <p>The research methodology to address the research questions is presented in Section 4.</p>
    </sec>
    <sec id="sec-4">
      <title>Research Methodology</title>
      <p>The research methodology consists of two phases, summarized in Table 3.
The expected outcome of the research is a conceptualized, refined data mining methodology
with adaptations to the financial services domain, which (1) consolidates the
existing body of knowledge and (2) is validated on a sample of real-life
data mining projects. The proposed illustrative case study approach is based on a broad
portfolio of typical data mining use cases executed across different geographical regions
and business areas of a financial institution.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The Systematic Literature Review for the research project (documented in Section 3)
identified a few well-developed frameworks for data mining, created for wide
industry application, which do not cater to specific industry needs such as those of the banking
domain. In addition, the scarcity of research on this topic in the specific financial services
domain provides opportunities for new insights and novel findings relevant to both
practitioners and academia. Section 4 proposed the project research methodology to (1)
elicit and consolidate domain-specific refinements of existing data mining
methodologies from the existing body of knowledge, and (2) validate them against a portfolio
of real-life data mining projects executed in the banking domain. The result of the study
will be a conceptualized, enhanced data mining methodology specifically designed to
frame and tackle complex data analytics projects in the financial services industry.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Nasdaq GlobeNewswire, https://globenewswire.com/newsrelease/2017/12/20/1267022/0/en/Dresner-Advisory-Services-Publishes-2017-Big-DataAnalytics-Market-Study.html,
          <source>news feed: Dresner Advisory Services Publishes 2017 Big Data Analytics Market Study</source>
          , last accessed
          <year>2018</year>
          /04/06
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Forbes homepage, https://www.forbes.com/sites/louiscolumbus/2017/12/24/53-ofcompanies-are-adopting-big-data-analytics/#4cf12a2139a1, last accessed
          <year>2018</year>
          /04/06
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Forrester Consulting:
          <article-title>The Future Belongs To Those Who Monetize And Maximize Their Data</article-title>
          .
          <source>Industry report</source>
          , January
          <year>2017</year>
          , last accessed
          <year>2018</year>
          /04/06
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jayasree</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balan</surname>
            ,
            <given-names>R.V.S.:</given-names>
          </string-name>
          <article-title>A review on data mining in banking sector</article-title>
          .
          <source>American Journal of Applied Sciences</source>
          ,
          <volume>10</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1160</fpage>
          -
          <lpage>1165</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Olson</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Data mining in business services</article-title>
          .
          <source>Service Business</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>181</fpage>
          -
          <lpage>193</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. KDnuggets homepage, https://www.kdnuggets.com/2014/10/crisp-dm-top-methodologyanalytics-data-mining-data-science-projects.html, last accessed
          <year>2018</year>
          /04/07
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Soltani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navimipour</surname>
            ,
            <given-names>N.J.:</given-names>
          </string-name>
          <article-title>Customer relationship management mechanisms: A systematic review of the state of the art literature and recommendations for future research</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>61</volume>
          ,
          <fpage>667</fpage>
          -
          <lpage>688</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fayyad</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piatetsky-Shapiro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The KDD process for extracting useful knowledge from volumes of data</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>39</volume>
          (
          <issue>11</issue>
          ),
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clinton</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kerber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khabaza</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reinartz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shearer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wirth</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>CRISP-DM 1.0: Step-by-step data mining guide</article-title>
          . SPSS Inc. (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Morabito</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The future of digital business innovation: Trends and practices</article-title>
          .
          <source>1st edition</source>
          . Springer International Publishing Switzerland (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rohanizadeh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moghadam</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A Proposed Data Mining Methodology and its Application to Industrial Procedures</article-title>
          .
          <source>Journal of Industrial Engineering</source>
          ,
          <volume>4</volume>
          ,
          <fpage>37</fpage>
          -
          <lpage>50</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nadali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kakhky</surname>
            ,
            <given-names>N.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nosratabadi</surname>
            ,
            <given-names>E. H.</given-names>
          </string-name>
          :
          <article-title>Evaluating the success level of data mining projects based on CRISP-DM methodology by a Fuzzy expert system</article-title>
          .
          <source>3rd International Conference on Electronics Computer Technology (ICECT)</source>
          ,
          <fpage>161</fpage>
          -
          <lpage>165</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>García</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nebot</surname>
            ,
            <given-names>À.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellido</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Intelligent data analysis approaches to churn as a business problem: a survey</article-title>
          .
          <source>Knowledge and Information Systems</source>
          <volume>51</volume>
          (
          <issue>3</issue>
          ),
          <fpage>719</fpage>
          -
          <lpage>774</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mansingh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osei-Bryson</surname>
            ,
            <given-names>K.-M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mills</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Application of a data mining process model: A case study- profiling internet banking users in Jamaica</article-title>
          .
          <source>In: AMCIS 2010 Proceedings</source>
          , Paper 439 (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Daihani</surname>
            ,
            <given-names>D.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feblian</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Implementation of CRISP-DM model in order to define the sales pipelines of PT X</article-title>
          .
          <source>In: Proceeding of 9th International Seminar on Industrial Engineering and Management</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Geng</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Prediction of financial distress: An empirical study of listed Chinese companies using data mining</article-title>
          .
          <source>European Journal of Operational Research</source>
          ,
          <volume>241</volume>
          (
          <issue>1</issue>
          ),
          <fpage>236</fpage>
          -
          <lpage>247</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Shaikh</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>E-commerce impact: Emerging technology - Electronic auditing</article-title>
          .
          <source>Managerial Auditing Journal</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ),
          <fpage>408</fpage>
          -
          <lpage>421</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Met</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tunali</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erkoç</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanrikulu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Branch Efficiency and Location Forecasting Application of Ziraat Bank</article-title>
          .
          <source>Journal of Applied Finance &amp; Banking</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pivk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasilecas</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalibatiene</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rupnik</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>On approach for the implementation of data mining to business process optimisation in commercial companies</article-title>
          .
          <source>Technological and Economic Development of Economy</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <fpage>237</fpage>
          -
          <lpage>256</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Balkan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goul</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A portfolio theoretic approach to administering advanced analytics: The case of multi-stage campaign management</article-title>
          .
          <source>In: Proceedings of the 44th Annual Hawaii International Conference on System Sciences</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Enjie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hongxia</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Promoting data mining methodologies by architecture-level optimizations</article-title>
          .
          <source>In: Proceedings 2009 2nd International Workshop on Knowledge Discovery and Data Mining, WKDD 2009</source>
          ,
          <fpage>179</fpage>
          -
          <lpage>182</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>An extended process model of knowledge discovery in database</article-title>
          .
          <source>Journal of Enterprise Information Management</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <fpage>169</fpage>
          -
          <lpage>177</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Priebe</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markus</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Business information modeling: A methodology for data-intensive projects, data science and big data governance</article-title>
          .
          <source>In: Proceedings 2015 IEEE International Conference on Big Data (IEEE Big Data 2015)</source>
          ,
          <fpage>2056</fpage>
          -
          <lpage>2065</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Osei-Bryson</surname>
            ,
            <given-names>K.-M.</given-names>
          </string-name>
          :
          <article-title>A hybrid decision support framework for generating and selecting causal explanatory regression splines models for information systems research</article-title>
          .
          <source>Information Systems Frontiers</source>
          ,
          <volume>17</volume>
          (
          <issue>4</issue>
          ),
          <fpage>845</fpage>
          -
          <lpage>856</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Berendt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Preibusch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Better decision support through exploratory discrimination-aware data mining: Foundations and empirical evidence</article-title>
          .
          <source>Artificial Intelligence and Law</source>
          ,
          <volume>22</volume>
          (
          <issue>2</issue>
          ),
          <fpage>175</fpage>
          -
          <lpage>209</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Alnoukari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alzoabi</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanna</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Applying Adaptive Software Development (ASD) agile modeling on predictive data mining applications: ASD-DM methodology</article-title>
          .
          <source>In: Proceedings International Symposium on Information Technology, ITSim 2008</source>
          ,
          <volume>2</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Zwetsloot</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuiper</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akkerhuis</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Koning</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Lean Six Sigma meets data science: Integrating two approaches based on three case studies</article-title>
          .
          <source>Quality Engineering (online journal)</source>
          , DOI: 10.1080/08982112.2018.1434892 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          :
          <article-title>Data warehousing and analytics in banking: Implementation</article-title>
          . In: Ravi, V. (ed.)
          <source>Advances in Banking Technology and Management: Impacts of ICT and CRM</source>
          ,
          <fpage>217</fpage>
          -
          <lpage>231</lpage>
          . Information Science Reference, Hershey, New York (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Debuse</surname>
            ,
            <given-names>J.C.W.</given-names>
          </string-name>
          :
          <article-title>Extending data mining methodologies to encompass organizational factors</article-title>
          .
          <source>Systems Research and Behavioral Science</source>
          ,
          <volume>24</volume>
          (
          <issue>2</issue>
          ),
          <fpage>183</fpage>
          -
          <lpage>190</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>